Nadav,

Thanks for the review and the answers to my questions. I will try to prepare 
the next (hopefully final) version of the patch.

Please see some of my comments below.

On Tuesday, March 6, 2018 at 6:25:47 PM UTC-5, Nadav Har'El wrote:
>
>
> On Mon, Feb 5, 2018 at 7:58 AM, Waldek Kozaczuk <jwkoz...@gmail.com> wrote:
>
>> I expect this patch will need some tweaking, but I think it fundamentally 
>> implements the SYSCALL stack switch. I have tested it using my golang 
>> working branch.
>>
>
> Excellent.
>  
>
>>
>> First of all, I am an assembly novice, so I am not sure I have not messed 
>> up some stuff in entry.S. For example, I am not sure I have not messed up 
>> the CFI - on the other hand, given my previous GDB research it may be a 
>> moot point, since gdb cannot handle backtracing non-contiguous stacks.
>>
>
> I think not doing the CFI business around the new function call is fine 
> (we won't debug this function...) but there's possibly one error in the CFI 
> thing which I pointed out in my review.
> Anyway, this CFI stuff is hard to get right without actually testing gdb...
>  
>
>>
>> Secondly, I am not sure the tiny stack is big enough (1024 bytes). I am 
>> afraid there may be scenarios where a thread which calls 
>> setup_large_syscall_stack() gets interrupted (or something else happens) 
>> and a bigger stack is needed.
>>
>> For example, this is a crash I got only once, when the tiny stack was 512 
>> bytes deep (I am not sure if increasing it to 1024 has fully addressed it, 
>> as I have not seen it since):
>>
>
> I'm not sure what exactly I'm seeing in this stack trace - it seems 
> corrupt. It is conceivable that 512 bytes is not enough, but I don't really 
> know why - which of the functions on the malloc() path uses a lot of stack 
> space?
> 512 is not a lot - just our "push all the registers on the stack" shtick 
> consumes 128 bytes (16*8) of that.
>  
>  
>
>> [backtrace]
>> 0x000000000044d85c <osv::generate_signal(siginfo&, exception_frame*)+140>
>> 0x000000000044d88a <osv::handle_mmap_fault(unsigned long, int, exception_frame*)+26>
>> 0x000000000032b3eb <mmu::vm_fault(unsigned long, exception_frame*)+171>
>> 0x000000000038b4e8 <page_fault+136>
>> 0x000000000038a356 <???+3711830>
>> 0x00000000003e5381 <sched::cpu::reschedule_from_interrupt(bool, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)+561>
>> 0x00000000003e583b <sched::cpu::schedule()+27>
>> 0x00000000003e5f01 <sched::thread::wait()+17>
>> 0x00000000003c5957 <lockfree::mutex::lock()+407>
>> 0x00000000003d4885 <???+4016261>
>> 0x00000000003d4a5a <???+4016730>
>> 0xffff8000015d203f <???+22880319>
>> 0x00000000003e5381 <sched::cpu::reschedule_from_interrupt(bool, std::chrono::duration<long, std::ratio<1l, 1000000000l> >)+561>
>> 0x00000000003e583b <sched::cpu::schedule()+27>
>> 0x00000000003e5f01 <sched::thread::wait()+17>
>> 0x00000000003c5957 <lockfree::mutex::lock()+407>
>> 0x00000000003d4885 <???+4016261>
>> 0x00000000003d4a5a <???+4016730>
>> 0x00000000003d4d46 <malloc+70>
>> 0x00000000003e2780 <sched::thread::setup_large_syscall_stack()+48>
>> 0x000000000038b324 <syscall_entry+36>
>> 0x000010000113cd91 <???+18075025>
>> 0x0000100001125e23 <???+17980963>
>> 0x000010000112c35f <???+18006879>
>> 0x000010000112c2e5 <???+18006757>
>> 0x0000100001306bf2 <???+19950578>
>> 0x0000200000ddff3f <???+14548799>
>> qemu failed.
>>
>> I also tried to use mmap() and mprotect() (instead of malloc()) to 
>> allocate memory for the large stack, and it did not quite work (please see 
>> the commented-out code in the patch). Every time I ran it, it would crash 
>> like this (not even in the same thread where mprotect() was called):
>>
>> #0  0x0000000000225930 in abort (fmt=fmt@entry=0x71d020 "exception nested too deeply") at runtime.cc:130
>> #1  0x0000000000388804 in sched::arch_cpu::enter_exception (this=<optimized out>) at arch/x64/arch-cpu.cc:19
>> #2  sched::exception_guard::exception_guard (this=<optimized out>) at arch/x64/arch-cpu.cc:37
>> #3  0x000000000038c6da in general_protection (ef=0xffff800000f2f048) at arch/x64/exceptions.cc:313
>> #4  <signal handler called>
>> #5  0x000000000038890e in safe_load<frame*> (data=<synthetic pointer>, potentially_bad_pointer=0x56415741e5894855) at arch/x64/safe-ptr.hh:33
>> #6  backtrace_safe (pc=pc@entry=0xffff800000f2d330, nr=nr@entry=128) at arch/x64/backtrace.cc:25
>> #7  0x000000000020203c in print_backtrace () at runtime.cc:79
>> #8  0x000000000022591c in abort (fmt=fmt@entry=0x71d3a0 "general protection fault\n") at runtime.cc:121
>> #9  0x000000000038c73a in general_protection (ef=0xffff800000f2e048) at arch/x64/exceptions.cc:322
>> #10 <signal handler called>
>> #11 std::unique_ptr<sched::thread::detached_state, std::default_delete<sched::thread::detached_state> >::get (this=0x5bd8894808c483d0) at /usr/include/c++/5/bits/unique_ptr.h:305
>> #12 sched::thread::do_wake_with<waiter::wake()::{lambda()#1}>(waiter::wake()::{lambda()#1}, unsigned int) (allowed_initial_states_mask=24, action=..., this=0x5bd8894808c48348) at include/osv/sched.hh:1261
>> #13 sched::thread::wake_with_from_mutex<waiter::wake()::{lambda()#1}>(waiter::wake()::{lambda()#1}) (action=..., this=0x5bd8894808c48348) at include/osv/sched.hh:1287
>> #14 waiter::wake (this=0x33185e <mmu::initialized_anonymous_page_provider::fill(void*, unsigned long, unsigned long)+30>) at include/osv/wait_record.hh:38
>> #15 lockfree::mutex::lock (this=0xffffa00002166648, this@entry=0xffff800002953040) at core/lfmutex.cc:74
>> #16 lockfree_mutex_lock (m=m@entry=0xffffa00002166648) at core/lfmutex.cc:282
>> #17 0x00000000002e213b in mutex_lock (m=0xffffa00002166648) at include/osv/mutex.h:59
>> #18 sa_bulk_lookup (hdl=0xffffa00002166648, attrs=attrs@entry=0xffff8000029579c0, count=3) at bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/sa.c:1851
>> #19 0x0000000000316cd4 in zfs_getattr (vp=0xffffa0000221e780, vap=0xffff800002957aa0) at bsd/sys/cddl/contrib/opensolaris/uts/common/fs/zfs/zfs_vnops.c:2055
>> #20 0x000000000042cf2c in vn_stat (vp=vp@entry=0xffffa0000221e780, st=st@entry=0xffff800002957b60) at fs/vfs/vfs_vnode.cc:325
>> #21 0x000000000043029a in vfs_file::stat (this=<optimized out>, st=0xffff800002957b60) at fs/vfs/vfs_fops.cc:118
>> #22 0x000000000041d48c in size (f=...) at fs/fs.cc:16
>> #23 0x00000000003300ed in mmu::file_vma::fault (this=0xffffa0000221ed00, addr=17592205996032, ef=0xffff800002958098) at core/mmu.cc:1689
>> #24 0x000000000032b484 in mmu::vm_fault (addr=<optimized out>, addr@entry=17592205997277, ef=ef@entry=0xffff800002958098) at core/mmu.cc:1342
>> #25 0x000000000038b529 in page_fault (ef=0xffff800002958098) at arch/x64/mmu.cc:38
>> #26 <signal handler called>
>> #27 runtime.memmove () at /home/wkozaczuk/tools/go/src/runtime/memmove_amd64.s:161
>> #28 0x000010000112864f in runtime.recordForPanic (b=...) at /home/wkozaczuk/tools/go/src/runtime/print.go:45
>>
>>
> I'm not parsing this. Where is a system call here?
>  
>
>> When I commented out mprotect() (still using mmap()), the test app would 
>> work but occasionally crash (I do not have an example). So I wonder if 
>> there is some fundamental constraint which is violated when calling 
>> mmap()/mprotect() while setting up the large stack (called from assembly 
>> on the tiny stack), compared to regular malloc(). Is it because mprotect() 
>> requires updating the memory page tables, which triggers some logic that 
>> messes things up? If I cannot use mmap(), how do I implement canary-type 
>> logic with malloc()?
>>
>
> Hmm, I just now realized you were using mprotect() to guard one page at 
> the end of the stack (i.e., its beginning) and not the entire thing :-) 
> Makes more sense now...
> I can't say I understand why mmap() should not work here. Sounds like it 
> should have... We're already using mmap() for stacks, successfully.
> To implement a canary with malloc(), what you can do is put a fixed value, 
> say 0xdeadbeef, at the beginning of the stack buffer, and every time you 
> exit from a system call, check that it still has this value.
> This doesn't catch stack overflows immediately, but at least it will not 
> allow some other thread whose data we just ruined to run. It's not 
> foolproof, though - it's possible that the data we overflowed is actually 
> OSv's own data, and ruining it messed with the function of the system call 
> itself.
>
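
For reference, what I tried with mmap()/mprotect() was roughly the following 
(just a sketch, not the exact patch code - the stack size and the helper name 
are illustrative): allocate the large stack with mmap() and turn its lowest 
page into an inaccessible guard page, since the stack grows down:

#include <sys/mman.h>
#include <cstddef>

static constexpr size_t large_syscall_stack_size = 16 * 4096;
static constexpr size_t page_size = 4096;

static void* allocate_guarded_syscall_stack()
{
    void* stack = mmap(nullptr, large_syscall_stack_size,
                       PROT_READ | PROT_WRITE,
                       MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (stack == MAP_FAILED) {
        return nullptr;
    }
    // The stack grows down, so the guard page is the lowest page.
    mprotect(stack, page_size, PROT_NONE);
    return stack;
}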

I think we can implement the canary logic later, so meanwhile I will stick to 
regular malloc(). Also, aren't my problems with mmap()/mprotect() related to 
this issue - https://github.com/cloudius-systems/osv/issues/143 ?
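
To make sure I understand the canary idea, it would be something like this 
(just a sketch - these helper names are mine, not from the patch):

#include <cstdint>
#include <cstdlib>

static constexpr uint32_t syscall_stack_canary = 0xdeadbeef;

// Called once, right after the large syscall stack is malloc()ed;
// stack_bottom is the lowest address of the buffer.
static void set_canary(void* stack_bottom)
{
    *static_cast<uint32_t*>(stack_bottom) = syscall_stack_canary;
}

// Called on every SYSCALL exit: if the lowest bytes of the buffer no
// longer hold the magic value, the stack overflowed - stop immediately.
static void check_canary(void* stack_bottom)
{
    if (*static_cast<uint32_t*>(stack_bottom) != syscall_stack_canary) {
        std::abort();
    }
}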

>  
>
>>    struct thread_control_block { 
>>>      thread_control_block* self; 
>>>      void* tls_base; 
>>> +    // These fields, per-thread stacks for the SYSCALL instruction, 
>>> +    // are used in arch/x64/entry.S via an offset from %fs. 
>>> +    // We currently keep these fields in the TCB to make them easier to 
>>> +    // access in assembly code through a known offset into %fs. But with 
>>> +    // more effort, we could have used an ordinary thread-local variable 
>>> +    // instead and avoided extending the TCB here. 
>>> +    void* tiny_syscall_stack_addr; 
>>> +    void* large_syscall_stack_addr; 
>>  
>> Should we instead store a single pointer to a struct that has pointers to 
>> both the tiny and large stacks?
>>
>
> I don't think so, but see my comments about whether these two separate 
> pointers are really needed in the TCB.
>  
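Just to write down my own understanding (a sketch, not part of the patch): 
with the TCB extended as above, the %fs-relative offsets used by the assembly 
below follow directly from the field order, and could be pinned down with a 
static_assert:

#include <cstddef>

struct thread_control_block {
    thread_control_block* self;      // %fs:0
    void* tls_base;                  // %fs:8
    void* tiny_syscall_stack_addr;   // %fs:16
    void* large_syscall_stack_addr;  // %fs:24
};

static_assert(offsetof(thread_control_block, large_syscall_stack_addr) == 24,
              "entry.S hard-codes %fs:24 for the large syscall stack pointer");
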
> +    # Skip setting up the large stack if it has already happened 
> +    pushq %rcx 
> +    movq %fs:24, %rcx 
> +    cmpq $0, %rcx 
> +    jne large_stack_has_been_setup 
> + 
>
>>  
>> Am I saving too many registers? 
>>
>
> I think not but see my comments on alignment in my review.
>  
>
>
