Hi, thanks for the prompt replies. I was just typing up a long response (and had come to suspect similar culprits) when I noticed there were two new replies. Thanks for looking into it. I'm guessing this means it is more or less impossible to use the pxz application on OSv, unless I rewrite the source so that it no longer forks and execs the xz executable?
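On the huge malloc in the quoted snippet: a guard along these lines might at least keep an unexpected sysconf(_SC_ARG_MAX) result from becoming an enormous allocation request. This is only a sketch under assumptions, not pxz's actual code: the names clamp_arg_max, alloc_cmd_buffer, CMD_BUF_FALLBACK and CMD_BUF_CAP are hypothetical, and the 2 MiB cap is an arbitrary value picked for illustration. What it relies on is that POSIX lets sysconf() return -1 (on error, or when the limit is indeterminate) and that _POSIX_ARG_MAX is 4096.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical limits, chosen for illustration only (not taken from pxz). */
#define CMD_BUF_FALLBACK ((size_t)4096)             /* equals _POSIX_ARG_MAX */
#define CMD_BUF_CAP      ((size_t)2 * 1024 * 1024)  /* arbitrary 2 MiB ceiling */

/* Turn a raw sysconf(_SC_ARG_MAX) result into a bounded buffer size. */
static size_t clamp_arg_max(long reported)
{
    if (reported <= 0)                  /* -1: error or no determinate limit */
        return CMD_BUF_FALLBACK;
    if ((unsigned long)reported > CMD_BUF_CAP)
        return CMD_BUF_CAP;             /* refuse absurdly large values */
    return (size_t)reported;
}

/* Allocate the command buffer; returns NULL if malloc itself fails. */
static char *alloc_cmd_buffer(size_t *len_out)
{
    size_t len = clamp_arg_max(sysconf(_SC_ARG_MAX));
    assert(len > 0 && len <= CMD_BUF_CAP);

    char *buf = malloc(len);
    if (buf != NULL && len_out != NULL)
        *len_out = len;                 /* report the size actually used */
    return buf;
}
```

With this, a value like the 9223372036854775807 seen in the trace would be clamped to the cap rather than passed to malloc(). It does nothing about the execve() call, which would still need to be replaced for pxz to do real work on OSv.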
On Thursday, 21 May 2020 21:20:47 UTC+2, Waldek Kozaczuk wrote:
>
> I think this code in the app might explain this huge malloc:
>
> lzma_options_lzma lzma_options;
> xzcmd_max = sysconf(_SC_ARG_MAX);
> page_size = sysconf(_SC_PAGE_SIZE);
> xzcmd = malloc(xzcmd_max);
>
>
> On Thursday, May 21, 2020 at 3:16:29 PM UTC-4, Waldek Kozaczuk wrote:
>>
>> I connected with gdb and here is stacktrace I got for the main app thread:
>>
>> #0 sched::thread::switch_to (this=this@entry=0xffff8000001d1040) at arch/x64/arch-switch.hh:108
>> #1 0x000000004040dace in sched::cpu::reschedule_from_interrupt (this=0xffff80000001e040, called_from_yield=called_from_yield@entry=false, preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
>> #2 0x000000004040e800 in sched::cpu::schedule () at include/osv/sched.hh:1315
>> #3 0x000000004040e8e6 in sched::thread::wait (this=this@entry=0xffff800000f0a040) at core/sched.cc:1216
>> #4 0x000000004043ca86 in sched::thread::do_wait_for<lockfree::mutex, sched::wait_object<waitqueue> > (mtx=...) at include/osv/mutex.h:41
>> #5 sched::thread::wait_for<waitqueue&> (mtx=...) at include/osv/sched.hh:1225
>> #6 waitqueue::wait (this=this@entry=0x408fa650 <mmu::vma_list_mutex+48>, mtx=...) at core/waitqueue.cc:56
>> #7 0x00000000403eb27b in rwlock::reader_wait_lockable (this=<optimized out>) at core/rwlock.cc:174
>> #8 rwlock::rlock (this=this@entry=0x408fa620 <mmu::vma_list_mutex>) at core/rwlock.cc:29
>> #9 0x000000004034b88c in rwlock_for_read::lock (this=0x408fa620 <mmu::vma_list_mutex>) at include/osv/rwlock.h:113
>> #10 std::lock_guard<rwlock_for_read&>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/9/bits/std_mutex.h:159
>> #11 lock_guard_for_with_lock<rwlock_for_read&>::lock_guard_for_with_lock (lock=..., this=<synthetic pointer>) at include/osv/mutex.h:89
>> #12 mmu::vm_fault (addr=17592186081280, addr@entry=17592186083096, ef=ef@entry=0xffff800000f0f068) at core/mmu.cc:1333
>> #13 0x00000000403adf7c in page_fault (ef=0xffff800000f0f068) at arch/x64/mmu.cc:42
>> #14 <signal handler called>
>> #15 0x00000000405bf0cd in _Unwind_IteratePhdrCallback ()
>> #16 0x000000004047fd37 in <lambda(const elf::program::modules_list&)>::operator() (ml=..., __closure=<synthetic pointer>) at libc/dlfcn.cc:118
>> #17 elf::program::with_modules<dl_iterate_phdr(int (*)(dl_phdr_info*, size_t, void*), void*)::<lambda(const elf::program::modules_list&)> > (f=..., this=0xffffa0000009cbb0) at include/osv/elf.hh:698
>> #18 dl_iterate_phdr (callback=0x405befa0 <_Unwind_IteratePhdrCallback>, data=0x200000700520) at libc/dlfcn.cc:99
>> #19 0x00000000405c0255 in _Unwind_Find_FDE ()
>> #20 0x00000000405bc693 in uw_frame_state_for ()
>> #21 0x00000000405be1da in _Unwind_RaiseException ()
>> #22 0x00000000404c4d1c in __cxa_throw ()
>> #23 0x0000000040205229 in mmu::find_hole (start=<optimized out>, size=<optimized out>) at include/osv/error.h:36
>> #24 0x000000004034ecea in mmu::allocate (v=v@entry=0xffffa00000cf2b80, start=35184372088832, start@entry=0, size=size@entry=9223372036854779904, search=search@entry=true) at core/mmu.cc:1113
>> #25 0x000000004034fa97 in mmu::map_anon (addr=addr@entry=0x0, size=size@entry=9223372036854779904, flags=flags@entry=2, perm=perm@entry=3) at core/mmu.cc:1219
>> #26 0x00000000403f89a0 in memory::mapped_malloc_large (offset=64, size=9223372036854779904) at core/mempool.cc:919
>> #27 memory::malloc_large (size=9223372036854779904, alignment=16, block=true, contiguous=false) at core/mempool.cc:919
>> #28 0x00000000403fa272 in std_malloc (size=9223372036854775807, alignment=16) at core/mempool.cc:1795
>> #29 0x00000000403fa63b in malloc (size=9223372036854775807) at core/mempool.cc:2001
>> #30 0x00001000000075d5 in main ()
>> #31 0x0000000040444c11 in osv::application::run_main (this=0xffffa0007ffb4210) at /usr/include/c++/9/bits/stl_vector.h:915
>> #32 0x0000000040444d65 in __libc_start_main (main=0x100000007560 <main>) at core/app.cc:37
>> #33 0x000010000000801e in _start ()
>>
>> It is trying to allocate tons of memory and it looks like we crash in find_hole() probably with throw make_error(ENOMEM);
>>
>> I wonder if it is app (https://github.com/jnovy/pxz/blob/master/pxz.c) passing such memory size or is there some bug on our side?
>>
>> (BTW osv info threads fails like this - would be nice to fix it:
>>
>> (gdb) osv info threads
>> 1 (0xffff800000017040) reclaimer cpu0 status::waiting condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:43 vruntime 6.07461e-25
>> Python Exception <class 'Exception'> Class does not extend list_base_hook: sched::timer_base:
>> Error occurred in Python: Class does not extend list_base_hook: sched::timer_base
>> )
>>
>> When I examined pxz.c it eventually calls execvpe() which will definitely NOT work in OSv (OSv does not support processes so forking does not work -> there is some research fork that does that which I sent paper about recently).
>>
>> void __attribute__((noreturn)) run_xz( char **argv, char **envp ) {
>>     execve(XZ_BINARY, argv, envp);
>>     error(0, errno, "execution of "XZ_BINARY" binary failed");
>>     exit(EXIT_FAILURE);
>> }
>>
>> xz seems to work fine (at least --help):
>>
>> ./scripts/manifest_from_host.sh -w xz && ./scripts/build --append-manifest fs=rofs
>> ./scripts/firecracker.py
>> OSv v0.55.0-9-gc13529d9
>> Booted up in 7.42 ms
>> Cmdline: /xz --help
>> Usage: /xz [OPTION]... [FILE]...
>> Compress or decompress FILEs in the .xz format.
>>
>>   -z, --compress      force compression
>>   -d, --decompress    force decompression
>>   -t, --test          test compressed file integrity
>>   -l, --list          list information about .xz files
>>   -k, --keep          keep (don't delete) input files
>>   -f, --force         force overwrite of output file and (de)compress links
>>   -c, --stdout        write to standard output and don't delete input files
>>   -0 ... -9           compression preset; default is 6; take compressor *and*
>>                       decompressor memory usage into account before using 7-9!
>>   -e, --extreme       try to improve compression ratio by using more CPU time;
>>                       does not affect decompressor memory requirements
>>   -T, --threads=NUM   use at most NUM threads; the default is 1; set to 0
>>                       to use as many threads as there are processor cores
>>   -q, --quiet         suppress warnings; specify twice to suppress errors too
>>   -v, --verbose       be verbose; specify twice for even more verbose
>>   -h, --help          display this short help and exit
>>   -H, --long-help     display the long help (lists also the advanced options)
>>   -V, --version       display the version number and exit
>>
>> With no FILE, or when FILE is -, read standard input.
>>
>> Report bugs to <[email protected]> (in English or Finnish).
>> XZ Utils home page: <https://tukaani.org/xz/>
>>
>> Waldek
>>
>> On Thursday, May 21, 2020 at 6:59:07 AM UTC-4, Nadav Har'El wrote:
>>>
>>> On Thu, May 21, 2020 at 12:46 PM De Vries <[email protected]> wrote:
>>>
>>>> Hi,
>>>>
>>>> Sorry if this is a bit of a newbie question. I'm trying to run a pretty simple application on OSv: pxz <https://github.com/jnovy/pxz>. I'm able to run other apps like mysql for example without any problem.
>>>> I have tried this the following way. First, I compiled the pxz executable with the -fPIE flag on the host machine, then put it in a new folder at osv/apps/pxz. I then ran the following:
>>>> ./scripts/manifest_from_host.sh -r ~/osv/apps/pxz/pxz > ./apps/pxz/usr.manifest
>>>> ./scripts/build image=pxz
>>>>
>>>> It generates the following usr.manifest
>>>> # (PIE) Position Independent Executable
>>>> /pxz: /home/user1/osv/apps/pxz/pxz
>>>> # --------------------
>>>> # Dependencies
>>>> # --------------------
>>>> /usr/lib/libgomp.so.1: /usr/lib/x86_64-linux-gnu/libgomp.so.1
>>>> /usr/lib/liblzma.so.5: /lib/x86_64-linux-gnu/liblzma.so.5
>>>> # --------------------
>>>>
>>>> Running it with
>>>> ./scripts/run.py -e "pxz --version"
>>>>
>>>> Results in
>>>> OSv v0.55.0-6-g557251e1
>>>> eth0: 192.168.122.15
>>>> Booted up in 407.56 ms
>>>> Cmdline: pxz --version
>>>>
>>>> But it just hangs. No errors, but also no output. I have tried actually using pxz (not just --version) to compress a file but that also hangs indefinitely (while this works fine on the host machine).
>>>>
>>>
>>> It's hard to say. It seems like you did everything right. I assume that if you run "pxz --version" on the host it works properly - prints a version number and exits - right?
>>> During the "hang", does OSv do some busy loop ("top" will show you the OSv vm taking 100% CPU) or waits for something?
>>>
>>> One thing you can do to figure out what is going on is to attach gdb to the running VM, and inquire from it what threads are running, and what they are waiting for.
>>> It's not trivial to do, but not particularly difficult either, and explained well (I hope) here:
>>> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>>> Note that you don't need to rebuild OSv specially for debugging to debug it this way.
>>>
>>>>
>>>> Running ./scripts/run.py with the -V flag looks completely fine except maybe for the last line that is printed (after it prints Cmdline: pxz --version):
>>>> sysconf(): stubbed for parameter 0
>>>>
>>>
>>> This is a _SC_ARG_MAX parameter to sysconf(), it is indeed not implemented (and can be trivially implemented) but I doubt that this is the problem causing the hang (I also wonder why this program would need to check _SC_ARG_MAX if it's just planning to print the version number, not exec() anything - you can look at this software's source code to see what it does with _SC_ARG_MAX).
>>>
>>>> I have also tried to run pxz using the way it's done in the native-example application, but that also results in it hanging indefinitely.
>>>> What could be the issue here?
>>>>
>>>> --
>>>> You received this message because you are subscribed to the Google Groups "OSv Development" group.
>>>> To unsubscribe from this group and stop receiving emails from it, send an email to [email protected].
>>>> To view this discussion on the web visit https://groups.google.com/d/msgid/osv-dev/9ce2c259-c6e9-475d-aa73-e7e6d71cd722%40googlegroups.com
>>>>
>>>
