Bump again :-)

On Sunday, March 1, 2020 at 10:30:07 AM UTC-5, Waldek Kozaczuk wrote:

Bump!

On Monday, February 10, 2020 at 12:14:52 AM UTC-5, Waldek Kozaczuk wrote:

Nadav,

What do you think about this patch? Also, do you think that the deadlock
issue related to vma_list_mutex that I discovered and had to overcome (see
my previous email in this chain) is the only one we need to worry about, or
are there other potential hidden problems like it that I have failed to notice?

Shall I make this lazy-stack feature conditional on an #ifdef for now, and
only make it permanent once we feel confident about it?

Waldek

On Sunday, February 2, 2020 at 11:15:30 AM UTC-5, Waldek Kozaczuk wrote:

As issue #143 explains, application stacks are currently eagerly allocated
(aka pre-populated), which may lead to substantial memory waste. We want
those stacks to be lazily allocated instead, so that physical memory behind
the stack gets allocated as needed by the standard fault-handling mechanism.

The page-fault handling logic requires that both interrupts and preemption
are enabled. Therefore this patch implements a simple strategy: read a
single byte on the stack one page (4K) deeper before disabling interrupts
or preemption.

It turns out we cannot read that single byte "blindly", because in some
scenarios interrupts might already be disabled when we are about to disable
preemption, and vice versa. Similarly, and more speculatively, disabling
preemption (and possibly interrupts) may nest. Therefore we need a
"conditional" read of that byte on the stack: more specifically, we should
ONLY read the byte IF both interrupts and preemption are enabled.

This patch achieves this by adding an extra thread-local counter that tracks
interrupt disabling/enabling and operates in a similar way to the preemption
counter. Every time interrupts are disabled we increment the counter, and we
decrement it every time interrupts are re-enabled. Then, every time before we
disable interrupts or preemption, we check whether both counters are 0 and
only then try to read a byte to potentially trigger a fault.
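To make the ordering described above concrete, here is a minimal, x86-64-only
sketch of the conditional pre-fault check. It is not the exact patch code (the
real implementation lives in arch/x64/arch.hh and include/osv/counters.hh
below, and keeps the preemption counter, the interrupt counter and the
lazy-stack flags in one thread-local union); the single combined_counter
variable and the *_sketch helpers here are illustrative stand-ins:

```
// A simplified sketch of the conditional pre-fault described above;
// combined_counter stands in for the patch's thread-local
// arch::irq_preempt_counters union, and the *_sketch functions stand in
// for the real preempt_disable()/irq_disable() paths.
#include <cstdint>

namespace sketch {

// Non-zero whenever preemption or interrupts are already disabled (or the
// lazy-stack logic is switched off for this thread); zero means it is safe
// to take and handle a page fault right now.
thread_local uint64_t combined_counter = 1;   // starts "disabled", as in the patch

inline void ensure_next_stack_page()
{
    if (combined_counter) {   // interrupts or preemption already disabled
        return;               // -> must not risk a page fault here
    }
    // Touch one byte a page (4K) below the stack pointer so that, if that
    // page of a lazily allocated stack is not populated yet, the fault is
    // taken NOW, while it can still be handled.
    char probe;
    asm volatile("movb -4096(%%rsp), %0" : "=r"(probe));
}

inline void preempt_disable_sketch()
{
    ensure_next_stack_page();          // pre-fault first ...
    ++combined_counter;                // ... then record "preemption disabled"
    asm volatile("" ::: "memory");     // compiler barrier, like OSv's barrier()
}

inline void preempt_enable_sketch()
{
    asm volatile("" ::: "memory");
    --combined_counter;
}

} // namespace sketch
```

The real patch folds the two counters and the flag bits into one 64-bit union
precisely so this check stays a single compare-and-branch, as the disassembly
below shows.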
Please note that performance-wise this patch is slightly more expensive than
originally hoped. As the disassembled snippets below illustrate, it costs 3-4
extra instructions every time we disable preemption or interrupts.

A. Disabling preemption in __tls_get_addr before the patch:
```
4035cae8:   e8 83 f6 ff ff          callq  4035c170 <elf::get_program()>
4035caed:   4c 8b 2d 6c 40 51 00    mov    0x51406c(%rip),%r13        # 40870b60 <sched::preempt_counter+0x40870b60>
4035caf4:   4c 8b 33                mov    (%rbx),%r14
4035caf7:   64 41 83 45 00 01       addl   $0x1,%fs:0x0(%r13)
4035cafd:   e8 6e f6 ff ff          callq  4035c170 <elf::get_program()>
```

B. Disabling preemption in __tls_get_addr after the patch:
```
4035fdd8:   e8 63 f6 ff ff          callq  4035f440 <elf::get_program()>
4035fddd:   4c 8b 2d a4 4d 51 00    mov    0x514da4(%rip),%r13        # 40874b88 <arch::irq_preempt_counters+0x40874b88>
4035fde4:   4c 8b 33                mov    (%rbx),%r14
4035fde7:   64 49 83 7d 00 00       cmpq   $0x0,%fs:0x0(%r13)
4035fded:   75 07                   jne    4035fdf6 <__tls_get_addr+0x76>
4035fdef:   8a 84 24 00 f0 ff ff    mov    -0x1000(%rsp),%al
4035fdf6:   64 41 83 45 00 01       addl   $0x1,%fs:0x0(%r13)
4035fdfc:   e8 3f f6 ff ff          callq  4035f440 <elf::get_program()>
```

As an example of the memory savings, tomcat using ~300 threads ended up using
365MB instead of the 620MB it used before the patch.

Fixes #143

Signed-off-by: Matthew Pabst <[email protected]>
Signed-off-by: Waldemar Kozaczuk <[email protected]>
---
 arch/aarch64/arch.hh    |  5 ++++
 arch/x64/arch-switch.hh | 12 ++++++++++
 arch/x64/arch.hh        | 34 ++++++++++++++++++++++++++
 arch/x64/mmu.cc         |  5 ++++
 core/mmu.cc             | 24 +++++++++++++++++++
 core/sched.cc           |  6 ++++-
 include/osv/counters.hh | 53 +++++++++++++++++++++++++++++++++++++++++
 include/osv/irqlock.hh  |  4 ++++
 include/osv/mmu-defs.hh |  3 +--
 include/osv/sched.hh    | 10 ++++----
 libc/mman.cc            |  9 ++++---
 libc/pthread.cc         | 10 +++++++-
 12 files changed, 162 insertions(+), 13 deletions(-)
 create mode 100644 include/osv/counters.hh

diff --git a/arch/aarch64/arch.hh b/arch/aarch64/arch.hh
index 855f1987..473e84e9 100644
--- a/arch/aarch64/arch.hh
+++ b/arch/aarch64/arch.hh
@@ -20,6 +20,11 @@ namespace arch {
 #define INSTR_SIZE_MIN 4
 #define ELF_IMAGE_START (OSV_KERNEL_BASE + 0x10000)
 
+inline void ensure_next_stack_page()
+{
+    //TODO: Implement lazy stack check for AARCH64
+}
+
 inline void irq_disable()
 {
     processor::irq_disable();
diff --git a/arch/x64/arch-switch.hh b/arch/x64/arch-switch.hh
index 6803498f..3e52f7f8 100644
--- a/arch/x64/arch-switch.hh
+++ b/arch/x64/arch-switch.hh
@@ -148,6 +148,13 @@ void thread::init_stack()
     _state.rip = reinterpret_cast<void*>(thread_main);
     _state.rsp = stacktop;
     _state.exception_stack = _arch.exception_stack + sizeof(_arch.exception_stack);
+
+    if (stack.lazy) {
+        // If the thread stack is set up to be lazily allocated, then, given that the thread
+        // initially starts running with preemption disabled, we need to pre-fault the first stack page.
+        volatile char r = *((char *) (stacktop - 1));
+        (void) r; // trick the compiler into thinking that r is used
+    }
 }
 
 void thread::setup_tcb()
@@ -311,6 +318,11 @@ void thread::free_tcb()
 
 void thread_main_c(thread* t)
 {
+    // The lazy stack pre-fault logic is disabled by default; enable it for this
+    // thread only if its stack is lazy (otherwise it is most likely a kernel thread)
+    if (t->get_stack_info().lazy) {
+        arch::irq_preempt_counters._data.set_lazy_flags_bit(arch::lazy_stack_disable, false);
+    }
     arch::irq_enable();
 #ifdef CONF_preempt
     preempt_enable();
diff --git a/arch/x64/arch.hh b/arch/x64/arch.hh
index 17df5f5c..74b728dc 100644
--- a/arch/x64/arch.hh
+++ b/arch/x64/arch.hh
@@ -10,6 +10,8 @@
 
 #include "processor.hh"
 #include "msr.hh"
+#include <osv/barrier.hh>
+#include <osv/counters.hh>
 
 // namespace arch - architecture independent interface for architecture
 // dependent operations (e.g. irq_disable vs. cli)
@@ -20,8 +22,33 @@ namespace arch {
 #define INSTR_SIZE_MIN 1
 #define ELF_IMAGE_START OSV_KERNEL_BASE
 
+inline void ensure_next_stack_page() {
+    // Because both the irq and preempt counters and the lazy stack
+    // flags are co-located within the same 8-byte-long union,
+    // we can check a single field for a non-zero value to determine
+    // whether we should trigger a pre-fault on the lazy stack
+    if (irq_preempt_counters.lazy_stack_no_pre_fault) {
+        return;
+    }
+
+    char i;
+    asm volatile("movb -4096(%%rsp), %0" : "=r"(i));
+}
+
+inline void ensure_next_two_stack_pages() {
+    if (irq_preempt_counters.lazy_stack_no_pre_fault) {
+        return;
+    }
+
+    char i;
+    asm volatile("movb -4096(%%rsp), %0" : "=r"(i));
+    asm volatile("movb -8192(%%rsp), %0" : "=r"(i));
+}
+
 inline void irq_disable()
 {
+    ensure_next_stack_page();
+    ++irq_preempt_counters._data.irq;
     processor::cli();
 }
 
@@ -36,11 +63,15 @@ inline void irq_disable_notrace()
 inline void irq_enable()
 {
     processor::sti();
+    --irq_preempt_counters._data.irq;
+    barrier();
 }
 
 inline void wait_for_interrupt()
 {
     processor::sti_hlt();
+    --irq_preempt_counters._data.irq;
+    barrier();
 }
 
 inline void halt_no_interrupts()
@@ -78,12 +109,15 @@ private:
 
 inline void irq_flag_notrace::save()
 {
+    ensure_next_stack_page();
+    ++irq_preempt_counters._data.irq;
     asm volatile("sub $128, %%rsp; pushfq; popq %0; add $128, %%rsp" : "=r"(_rflags));
 }
 
 inline void irq_flag_notrace::restore()
 {
     asm volatile("sub $128, %%rsp; pushq %0; popfq; add $128, %%rsp" : : "r"(_rflags));
+    --irq_preempt_counters._data.irq;
 }
 
 inline bool irq_flag_notrace::enabled() const
diff --git a/arch/x64/mmu.cc b/arch/x64/mmu.cc
index 24da5caa..ae619a9f 100644
--- a/arch/x64/mmu.cc
+++ b/arch/x64/mmu.cc
@@ -37,6 +37,11 @@ void page_fault(exception_frame *ef)
     assert(sched::preemptable());
     assert(ef->rflags & processor::rflags_if);
 
+    // Even though we are on the exception stack, which is not lazy,
+    // as a precaution we disable the pre-fault logic
+    // for the time the fault is being handled
+    arch::lazy_stack_flags_guard lazy_stack_guard(arch::lazy_stack_fault_disable);
+
     // And since we may sleep, make sure interrupts are enabled.
     DROP_LOCK(irq_lock) { // irq_lock is acquired by HW
         mmu::vm_fault(addr, ef);
diff --git a/core/mmu.cc b/core/mmu.cc
index ff3fab47..b6e7cc36 100644
--- a/core/mmu.cc
+++ b/core/mmu.cc
@@ -42,6 +42,18 @@ extern const char text_start[], text_end[];
 
 namespace mmu {
 
+static void prevent_stack_page_fault() {
+    // We need to ensure that the lazy stack is populated
+    // deeply enough (2 pages) for all cases where
+    // the vma_list_mutex is taken for write, so that we can disable
+    // the pre-fault check (arch::ensure_next_stack_page) which,
+    // if a page fault were triggered, would make the page-fault handling logic
+    // attempt to take the same vma_list_mutex for read and end up in a deadlock
+#ifndef AARCH64_PORT_STUB
+    arch::ensure_next_two_stack_pages();
+#endif
+}
+
 namespace bi = boost::intrusive;
 
 class vma_compare {
@@ -1183,6 +1195,8 @@ static void nohugepage(void* addr, size_t length)
 
 error advise(void* addr, size_t size, int advice)
 {
+    prevent_stack_page_fault();
+    arch::lazy_stack_flags_guard lazy_stack_guard(arch::lazy_stack_mmu_disable);
     WITH_LOCK(vma_list_mutex.for_write()) {
         if (!ismapped(addr, size)) {
             return make_error(ENOMEM);
@@ -1215,6 +1229,8 @@ void* map_anon(const void* addr, size_t size, unsigned flags, unsigned perm)
     size = align_up(size, mmu::page_size);
     auto start = reinterpret_cast<uintptr_t>(addr);
     auto* vma = new mmu::anon_vma(addr_range(start, start + size), perm, flags);
+    prevent_stack_page_fault();
+    arch::lazy_stack_flags_guard lazy_stack_guard(arch::lazy_stack_mmu_disable);
     SCOPE_LOCK(vma_list_mutex.for_write());
     auto v = (void*) allocate(vma, start, size, search);
     if (flags & mmap_populate) {
@@ -1241,6 +1257,8 @@ void* map_file(const void* addr, size_t size, unsigned flags, unsigned perm,
     auto start = reinterpret_cast<uintptr_t>(addr);
     auto *vma = f->mmap(addr_range(start, start + size), flags | mmap_file, perm, offset).release();
     void *v;
+    prevent_stack_page_fault();
+    arch::lazy_stack_flags_guard lazy_stack_guard(arch::lazy_stack_mmu_disable);
     WITH_LOCK(vma_list_mutex.for_write()) {
         v = (void*) allocate(vma, start, size, search);
         if (flags & mmap_populate) {
@@ -1607,6 +1625,8 @@ ulong map_jvm(unsigned char* jvm_addr, size_t size, size_t align, balloon_ptr b)
 
     auto* vma = new mmu::jvm_balloon_vma(jvm_addr, start, start + size, b, v->perm(), v->flags());
 
+    prevent_stack_page_fault();
+    arch::lazy_stack_flags_guard lazy_stack_guard(arch::lazy_stack_mmu_disable);
     WITH_LOCK(vma_list_mutex.for_write()) {
         // This means that the mapping that we had before was a balloon mapping
         // that was laying around and wasn't updated to an anon mapping. If we
@@ -1868,6 +1888,8 @@ void free_initial_memory_range(uintptr_t addr, size_t size)
 
 error mprotect(const void *addr, size_t len, unsigned perm)
 {
+    prevent_stack_page_fault();
+    arch::lazy_stack_flags_guard lazy_stack_guard(arch::lazy_stack_mmu_disable);
     SCOPE_LOCK(vma_list_mutex.for_write());
 
     if (!ismapped(addr, len)) {
@@ -1879,6 +1901,8 @@ error mprotect(const void *addr, size_t len, unsigned perm)
 
 error munmap(const void *addr, size_t length)
 {
+    prevent_stack_page_fault();
+    arch::lazy_stack_flags_guard lazy_stack_guard(arch::lazy_stack_mmu_disable);
     SCOPE_LOCK(vma_list_mutex.for_write());
 
     length = align_up(length, mmu::page_size);
diff --git a/core/sched.cc b/core/sched.cc
index 06f849d1..4510a85e 100644
--- a/core/sched.cc
+++ b/core/sched.cc
@@ -42,6 +42,11 @@ extern char _percpu_start[], _percpu_end[];
 using namespace osv;
 using namespace osv::clock::literals;
 
+namespace arch {
+// By default disable lazy stack pre-fault logic
+counters __thread irq_preempt_counters = {{1, 1, 1}};
+}
+
 namespace sched {
 
 TRACEPOINT(trace_sched_idle, "");
@@ -69,7 +74,6 @@ std::vector<cpu*> cpus __attribute__((init_priority((int)init_prio::cpus)));
 thread __thread * s_current;
 cpu __thread * current_cpu;
 
-unsigned __thread preempt_counter = 1;
 bool __thread need_reschedule = false;
 
 elf::tls_data tls;
diff --git a/include/osv/counters.hh b/include/osv/counters.hh
new file mode 100644
index 00000000..b862f32c
--- /dev/null
+++ b/include/osv/counters.hh
@@ -0,0 +1,53 @@
+/*
+ * Copyright (C) 2020 Waldemar Kozaczuk
+ *
+ * This work is open source software, licensed under the terms of the
+ * BSD license as described in the LICENSE file in the top-level directory.
+ */
+
+#ifndef OSV_COUNTERS_HH
+#define OSV_COUNTERS_HH
+
+namespace arch {
+
+// Both the preempt and irq counters are co-located next to each other,
+// along with the flags that drive the lazy-stack related logic,
+// in one union by design, so that the compiler can optimize ensure_next_stack_page()
+// to test against a single 64-bit lazy_stack_no_pre_fault field
+union counters {
+    struct data {
+        unsigned preempt;
+        u16 irq;
+        u16 lazy_stack_flags;
+
+        inline void set_lazy_flags_bit(unsigned nr, bool v) {
+            lazy_stack_flags &= ~(u64(1) << nr);
+            lazy_stack_flags |= u64(v) << nr;
+        }
+    } _data;
+    uint64_t lazy_stack_no_pre_fault;
+};
+
+extern counters __thread irq_preempt_counters;
+
+class lazy_stack_flags_guard {
+    unsigned _flag_nr;
+public:
+    lazy_stack_flags_guard(unsigned flag_nr) {
+        _flag_nr = flag_nr;
+        irq_preempt_counters._data.set_lazy_flags_bit(_flag_nr, true);
+    }
+    ~lazy_stack_flags_guard() {
+        irq_preempt_counters._data.set_lazy_flags_bit(_flag_nr, false);
+    }
+};
+
+enum {
+    lazy_stack_disable = 0,
+    lazy_stack_mmu_disable = 1,
+    lazy_stack_fault_disable = 2,
+};
+
+}
+
+#endif //OSV_COUNTERS_HH
diff --git a/include/osv/irqlock.hh b/include/osv/irqlock.hh
index 2d5372fc..1e0ab6fa 100644
--- a/include/osv/irqlock.hh
+++ b/include/osv/irqlock.hh
@@ -34,6 +34,10 @@ inline void irq_save_lock_type::lock()
 inline void irq_save_lock_type::unlock()
 {
     _flags.restore();
+#ifndef AARCH64_PORT_STUB
+    --arch::irq_preempt_counters._data.irq;
+    barrier();
+#endif
 }
 
 extern irq_lock_type irq_lock;
diff --git a/include/osv/mmu-defs.hh b/include/osv/mmu-defs.hh
index 18edf441..a731ed6e 100644
--- a/include/osv/mmu-defs.hh
+++ b/include/osv/mmu-defs.hh
@@ -15,8 +15,6 @@
 
 struct exception_frame;
 
-struct exception_frame;
-
 namespace mmu {
 
 constexpr uintptr_t page_size = 4096;
@@ -84,6 +82,7 @@ enum {
     mmap_small = 1ul << 5,
     mmap_jvm_balloon = 1ul << 6,
     mmap_file = 1ul << 7,
+    mmap_stack = 1ul << 8,
 };
 
 enum {
diff --git a/include/osv/sched.hh b/include/osv/sched.hh
index 0fb44f77..96453a8b 100644
--- a/include/osv/sched.hh
+++ b/include/osv/sched.hh
@@ -25,6 +25,7 @@
 #include <osv/rcu.hh>
 #include <osv/clock.hh>
 #include <osv/timer-set.hh>
+#include "osv/counters.hh"
 #include <string.h>
 
 typedef float runtime_t;
@@ -335,6 +336,7 @@ public:
         void* begin;
         size_t size;
         void (*deleter)(stack_info si); // null: don't delete
+        bool lazy = false;
         static void default_deleter(stack_info si);
     };
     struct attr {
@@ -978,14 +980,13 @@ inline void release(mutex_t* mtx)
     }
 }
 
-extern unsigned __thread preempt_counter;
 extern bool __thread need_reschedule;
 
 #ifdef __OSV_CORE__
 inline unsigned int get_preempt_counter()
 {
     barrier();
-    return preempt_counter;
+    return arch::irq_preempt_counters._data.preempt;
 }
 
 inline bool preemptable()
@@ -1005,14 +1006,15 @@ inline void preempt()
 
 inline void preempt_disable()
 {
-    ++preempt_counter;
+    arch::ensure_next_stack_page();
+    ++arch::irq_preempt_counters._data.preempt;
     barrier();
 }
 
 inline void preempt_enable()
 {
     barrier();
-    --preempt_counter;
+    --arch::irq_preempt_counters._data.preempt;
     if (preemptable() && need_reschedule && arch::irq_enabled()) {
         cpu::schedule();
     }
diff --git a/libc/mman.cc b/libc/mman.cc
index d0803ac4..d00befc3 100644
--- a/libc/mman.cc
+++ b/libc/mman.cc
@@ -38,12 +38,11 @@ unsigned libc_flags_to_mmap(int flags)
         mmap_flags |= mmu::mmap_populate;
     }
     if (flags & MAP_STACK) {
-        // OSv currently requires that stacks be pinned (see issue #143). So
-        // if an application wants to mmap() a stack for pthread_attr_setstack
-        // and did us the courtesy of telling this to ue (via MAP_STACK),
-        // let's return the courtesy by returning pre-faulted memory.
-        // FIXME: If issue #143 is fixed, this workaround should be removed.
+#ifdef AARCH64_PORT_STUB
         mmap_flags |= mmu::mmap_populate;
+#else
+        mmap_flags |= mmu::mmap_stack;
+#endif
     }
     if (flags & MAP_SHARED) {
         mmap_flags |= mmu::mmap_shared;
diff --git a/libc/pthread.cc b/libc/pthread.cc
index 8c976bf6..2279ac0c 100644
--- a/libc/pthread.cc
+++ b/libc/pthread.cc
@@ -140,10 +140,18 @@ namespace pthread_private {
         return {attr.stack_begin, attr.stack_size};
     }
     size_t size = attr.stack_size;
-    void *addr = mmu::map_anon(nullptr, size, mmu::mmap_populate, mmu::perm_rw);
+#ifdef AARCH64_PORT_STUB
+    unsigned stack_flags = mmu::mmap_populate;
+#else
+    unsigned stack_flags = mmu::mmap_stack;
+#endif
+    void *addr = mmu::map_anon(nullptr, size, stack_flags, mmu::perm_rw);
     mmu::mprotect(addr, attr.guard_size, 0);
     sched::thread::stack_info si{addr, size};
     si.deleter = free_stack;
+#ifndef AARCH64_PORT_STUB
+    si.lazy = true;
+#endif
     return si;
 }

-- 
2.20.1
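One design point worth spelling out: the patch relies on union counters (in
include/osv/counters.hh above) overlaying the 4-byte preempt counter, the
2-byte irq counter and the 2-byte lazy_stack_flags exactly onto the single
8-byte lazy_stack_no_pre_fault field, so that one 64-bit compare (the cmpq
seen in the disassembly earlier) covers all three. Below is a small
self-contained sketch of that layout assumption; the u16/u64 aliases, the
static_asserts and the main() driver are illustrative additions, not part of
the patch:

```
// Minimal, self-contained sketch of the layout assumption behind
// arch::counters (include/osv/counters.hh in the patch above).
#include <cstdint>
#include <cstdio>

using u16 = uint16_t;
using u64 = uint64_t;

union counters {
    struct data {
        unsigned preempt;          // 4 bytes
        u16 irq;                   // 2 bytes
        u16 lazy_stack_flags;      // 2 bytes
    } _data;
    u64 lazy_stack_no_pre_fault;   // overlays all three fields above
};

// The single 64-bit test is only valid if the three fields exactly fill
// the 8-byte union (no padding).
static_assert(sizeof(counters) == 8, "counters must fit in one 64-bit word");
static_assert(sizeof(counters::data) == 8, "no padding expected in counters::data");

int main()
{
    counters c = {{1, 1, 1}};      // the patch's default: everything "disabled"
    // Reading lazy_stack_no_pre_fault after writing _data is union
    // type-punning: undefined by the C++ standard but defined by GCC/Clang,
    // which is exactly what the patch itself relies on.
    std::printf("combined=%llu\n", (unsigned long long)c.lazy_stack_no_pre_fault);

    c._data.preempt = 0;
    c._data.irq = 0;
    c._data.lazy_stack_flags = 0;
    // Now the combined field reads zero, which is the only state in which
    // ensure_next_stack_page() would touch the next stack page.
    std::printf("combined=%llu\n", (unsigned long long)c.lazy_stack_no_pre_fault);
    return 0;
}
```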
