On Thu, May 21, 2020 at 3:40 PM <[email protected]> wrote:
> Hi, thanks for the prompt replies. I was just typing up a long response
> (and ended up suspecting similar culprits) until I noticed there were 2 new
> replies. Thanks for looking into it.
> I'm guessing this means it is more or less impossible to use the pxz
> application, unless I rewrite the source such that it does not fork the xz
> executable?

Pretty much. Use some sort of multi-threading approach.
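(For reference, one process-free direction such a rewrite could take is liblzma's own
multi-threaded encoder. The sketch below is only illustrative: it assumes liblzma >= 5.2,
which provides lzma_stream_encoder_mt, omits most error reporting, and is not pxz's or
OSv's actual code. Link with -llzma.)

    // Sketch: threaded .xz compression inside one process,
    // instead of forking/exec'ing the xz binary (which OSv cannot do).
    #include <lzma.h>
    #include <stdint.h>
    #include <stdio.h>

    static int compress_stream_mt(FILE *in, FILE *out, uint32_t nthreads) {
        lzma_mt mt = {
            .threads = nthreads,            // must be >= 1
            .preset  = LZMA_PRESET_DEFAULT, // same default level (6) as xz
            .check   = LZMA_CHECK_CRC64,
        };
        lzma_stream strm = LZMA_STREAM_INIT;
        if (lzma_stream_encoder_mt(&strm, &mt) != LZMA_OK)
            return -1;

        uint8_t inbuf[1 << 16], outbuf[1 << 16];
        lzma_action action = LZMA_RUN;
        strm.next_out = outbuf;
        strm.avail_out = sizeof(outbuf);
        for (;;) {
            if (strm.avail_in == 0 && action == LZMA_RUN) {
                strm.next_in = inbuf;
                strm.avail_in = fread(inbuf, 1, sizeof(inbuf), in);
                if (feof(in))
                    action = LZMA_FINISH;   // no more input: flush and finish
            }
            lzma_ret ret = lzma_code(&strm, action);
            if (strm.avail_out == 0 || ret == LZMA_STREAM_END) {
                fwrite(outbuf, 1, sizeof(outbuf) - strm.avail_out, out);
                strm.next_out = outbuf;
                strm.avail_out = sizeof(outbuf);
            }
            if (ret == LZMA_STREAM_END)
                break;
            if (ret != LZMA_OK) {
                lzma_end(&strm);
                return -1;
            }
        }
        lzma_end(&strm);
        return 0;
    }

Alternatively, since the xz binary itself has a -T/--threads option (visible in the --help
output quoted further down), running xz -T0 directly on OSv, without pxz, may be the
simplest way to get parallel compression.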
>
> On Thursday, 21 May 2020 21:20:47 UTC+2, Waldek Kozaczuk wrote:
>>
>> I think this code in the app might explain this huge malloc:
>>
>> lzma_options_lzma lzma_options;
>> xzcmd_max = sysconf(_SC_ARG_MAX);
>> page_size = sysconf(_SC_PAGE_SIZE);
>> xzcmd = malloc(xzcmd_max);
>>
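(If sysconf() is stubbed and returns -1, or some other out-of-range sentinel, an unchecked
assignment into an unsigned size can turn into an enormous allocation like the one in the
trace below. A hypothetical hardened version of the snippet above; the fallback constant is
purely illustrative, not what pxz or OSv actually uses:)

    #include <stdlib.h>
    #include <unistd.h>

    #define XZCMD_FALLBACK (128 * 1024)   /* illustrative fallback only */

    static char *xzcmd;
    static size_t xzcmd_max;

    static void init_xzcmd(void) {
        long arg_max = sysconf(_SC_ARG_MAX);
        if (arg_max <= 0)                 /* stubbed, unsupported, or error */
            arg_max = XZCMD_FALLBACK;
        xzcmd_max = (size_t)arg_max;
        xzcmd = malloc(xzcmd_max);
        if (!xzcmd)
            abort();                      /* real code should report the error */
    }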
>> On Thursday, May 21, 2020 at 3:16:29 PM UTC-4, Waldek Kozaczuk wrote:
>>>
>>> I connected with gdb and here is the stack trace I got for the main app thread:
>>>
>>> #0  sched::thread::switch_to (this=this@entry=0xffff8000001d1040) at arch/x64/arch-switch.hh:108
>>> #1  0x000000004040dace in sched::cpu::reschedule_from_interrupt (this=0xffff80000001e040, called_from_yield=called_from_yield@entry=false, preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
>>> #2  0x000000004040e800 in sched::cpu::schedule () at include/osv/sched.hh:1315
>>> #3  0x000000004040e8e6 in sched::thread::wait (this=this@entry=0xffff800000f0a040) at core/sched.cc:1216
>>> #4  0x000000004043ca86 in sched::thread::do_wait_for<lockfree::mutex, sched::wait_object<waitqueue> > (mtx=...) at include/osv/mutex.h:41
>>> #5  sched::thread::wait_for<waitqueue&> (mtx=...) at include/osv/sched.hh:1225
>>> #6  waitqueue::wait (this=this@entry=0x408fa650 <mmu::vma_list_mutex+48>, mtx=...) at core/waitqueue.cc:56
>>> #7  0x00000000403eb27b in rwlock::reader_wait_lockable (this=<optimized out>) at core/rwlock.cc:174
>>> #8  rwlock::rlock (this=this@entry=0x408fa620 <mmu::vma_list_mutex>) at core/rwlock.cc:29
>>> #9  0x000000004034b88c in rwlock_for_read::lock (this=0x408fa620 <mmu::vma_list_mutex>) at include/osv/rwlock.h:113
>>> #10 std::lock_guard<rwlock_for_read&>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/9/bits/std_mutex.h:159
>>> #11 lock_guard_for_with_lock<rwlock_for_read&>::lock_guard_for_with_lock (lock=..., this=<synthetic pointer>) at include/osv/mutex.h:89
>>> #12 mmu::vm_fault (addr=17592186081280, addr@entry=17592186083096, ef=ef@entry=0xffff800000f0f068) at core/mmu.cc:1333
>>> #13 0x00000000403adf7c in page_fault (ef=0xffff800000f0f068) at arch/x64/mmu.cc:42
>>> #14 <signal handler called>
>>> #15 0x00000000405bf0cd in _Unwind_IteratePhdrCallback ()
>>> #16 0x000000004047fd37 in <lambda(const elf::program::modules_list&)>::operator() (ml=..., __closure=<synthetic pointer>) at libc/dlfcn.cc:118
>>> #17 elf::program::with_modules<dl_iterate_phdr(int (*)(dl_phdr_info*, size_t, void*), void*)::<lambda(const elf::program::modules_list&)> > (f=..., this=0xffffa0000009cbb0) at include/osv/elf.hh:698
>>> #18 dl_iterate_phdr (callback=0x405befa0 <_Unwind_IteratePhdrCallback>, data=0x200000700520) at libc/dlfcn.cc:99
>>> #19 0x00000000405c0255 in _Unwind_Find_FDE ()
>>> #20 0x00000000405bc693 in uw_frame_state_for ()
>>> #21 0x00000000405be1da in _Unwind_RaiseException ()
>>> #22 0x00000000404c4d1c in __cxa_throw ()
>>> #23 0x0000000040205229 in mmu::find_hole (start=<optimized out>, size=<optimized out>) at include/osv/error.h:36
>>> #24 0x000000004034ecea in mmu::allocate (v=v@entry=0xffffa00000cf2b80, start=35184372088832, start@entry=0, size=size@entry=9223372036854779904, search=search@entry=true) at core/mmu.cc:1113
>>> #25 0x000000004034fa97 in mmu::map_anon (addr=addr@entry=0x0, size=size@entry=9223372036854779904, flags=flags@entry=2, perm=perm@entry=3) at core/mmu.cc:1219
>>> #26 0x00000000403f89a0 in memory::mapped_malloc_large (offset=64, size=9223372036854779904) at core/mempool.cc:919
>>> #27 memory::malloc_large (size=9223372036854779904, alignment=16, block=true, contiguous=false) at core/mempool.cc:919
>>> #28 0x00000000403fa272 in std_malloc (size=9223372036854775807, alignment=16) at core/mempool.cc:1795
>>> #29 0x00000000403fa63b in malloc (size=9223372036854775807) at core/mempool.cc:2001
>>> #30 0x00001000000075d5 in main ()
>>> #31 0x0000000040444c11 in osv::application::run_main (this=0xffffa0007ffb4210) at /usr/include/c++/9/bits/stl_vector.h:915
>>> #32 0x0000000040444d65 in __libc_start_main (main=0x100000007560 <main>) at core/app.cc:37
>>> #33 0x000010000000801e in _start ()
>>>
>>> It is trying to allocate tons of memory, and it looks like we crash in find_hole(), probably with throw make_error(ENOMEM);
>>>
>>> I wonder if it is the app (https://github.com/jnovy/pxz/blob/master/pxz.c) passing such a memory size, or is there some bug on our side?
>>>
>>> (BTW, osv info threads fails like this - would be nice to fix it:
>>>
>>> (gdb) osv info threads
>>>    1 (0xffff800000017040) reclaimer cpu0 status::waiting condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:43 vruntime 6.07461e-25
>>> Python Exception <class 'Exception'> Class does not extend list_base_hook: sched::timer_base:
>>> Error occurred in Python: Class does not extend list_base_hook: sched::timer_base
>>> )
>>>
>>> When I examined pxz.c, I found that it eventually calls execvpe(), which will definitely NOT work in OSv (OSv does not support processes, so forking does not work -> there is a research fork that does that, which I sent a paper about recently).
>>>
>>> 135 void __attribute__((noreturn)) run_xz( char **argv, char **envp ) {
>>> 136         execve(XZ_BINARY, argv, envp);
>>> 137         error(0, errno, "execution of "XZ_BINARY" binary failed");
>>> 138         exit(EXIT_FAILURE);
>>> 139 }
>>>
>>> xz seems to work fine (at least --help):
>>>
>>> ./scripts/manifest_from_host.sh -w xz && ./scripts/build --append-manifest fs=rofs
>>> ./scripts/firecracker.py
>>> OSv v0.55.0-9-gc13529d9
>>> Booted up in 7.42 ms
>>> Cmdline: /xz --help
>>> Usage: /xz [OPTION]... [FILE]...
>>> Compress or decompress FILEs in the .xz format.
>>>
>>>   -z, --compress      force compression
>>>   -d, --decompress    force decompression
>>>   -t, --test          test compressed file integrity
>>>   -l, --list          list information about .xz files
>>>   -k, --keep          keep (don't delete) input files
>>>   -f, --force         force overwrite of output file and (de)compress links
>>>   -c, --stdout        write to standard output and don't delete input files
>>>   -0 ... -9           compression preset; default is 6; take compressor *and*
>>>                       decompressor memory usage into account before using 7-9!
>>>   -e, --extreme       try to improve compression ratio by using more CPU time;
>>>                       does not affect decompressor memory requirements
>>>   -T, --threads=NUM   use at most NUM threads; the default is 1; set to 0
>>>                       to use as many threads as there are processor cores
>>>   -q, --quiet         suppress warnings; specify twice to suppress errors too
>>>   -v, --verbose       be verbose; specify twice for even more verbose
>>>   -h, --help          display this short help and exit
>>>   -H, --long-help     display the long help (lists also the advanced options)
>>>   -V, --version       display the version number and exit
>>>
>>> With no FILE, or when FILE is -, read standard input.
>>>
>>> Report bugs to <[email protected]> (in English or Finnish).
>>> XZ Utils home page: <https://tukaani.org/xz/>
>>>
>>> Waldek
>>>
>>> On Thursday, May 21, 2020 at 6:59:07 AM UTC-4, Nadav Har'El wrote:
>>>>
>>>> On Thu, May 21, 2020 at 12:46 PM De Vries <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry if this is a bit of a newbie question. I'm trying to run a pretty simple application on OSv: pxz <https://github.com/jnovy/pxz>. I'm able to run other apps, like mysql for example, without any problem.
>>>>> I have tried this the following way. First, I compiled the pxz executable with the -fPIE flag on the host machine, then put it in a new folder at osv/apps/pxz. I then ran the following:
>>>>> ./scripts/manifest_from_host.sh -r ~/osv/apps/pxz/pxz > ./apps/pxz/usr.manifest
>>>>> ./scripts/build image=pxz
>>>>>
>>>>> It generates the following usr.manifest:
>>>>> # (PIE) Position Independent Executable
>>>>> /pxz: /home/user1/osv/apps/pxz/pxz
>>>>> # --------------------
>>>>> # Dependencies
>>>>> # --------------------
>>>>> /usr/lib/libgomp.so.1: /usr/lib/x86_64-linux-gnu/libgomp.so.1
>>>>> /usr/lib/liblzma.so.5: /lib/x86_64-linux-gnu/liblzma.so.5
>>>>> # --------------------
>>>>>
>>>>> Running it with
>>>>> ./scripts/run.py -e "pxz --version"
>>>>>
>>>>> results in
>>>>> OSv v0.55.0-6-g557251e1
>>>>> eth0: 192.168.122.15
>>>>> Booted up in 407.56 ms
>>>>> Cmdline: pxz --version
>>>>>
>>>>> But it just hangs. No errors, but also no output. I have tried actually using pxz (not just --version) to compress a file, but that also hangs indefinitely (while this works fine on the host machine).
>>>>
>>>> It's hard to say. It seems like you did everything right. I assume that if you run "pxz --version" on the host it works properly - prints a version number and exits - right?
>>>> During the "hang", does OSv do some busy loop ("top" will show you the OSv vm taking 100% CPU) or wait for something?
>>>>
>>>> One thing you can do to figure out what is going on is to attach gdb to the running VM, and inquire from it what threads are running, and what they are waiting for.
>>>> It's not trivial to do, but not particularly difficult either, and explained well (I hope) here:
>>>> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>>>> Note that you don't need to rebuild OSv specially for debugging to debug it this way.
>>>>
>>>>> Running ./scripts/run.py with the -V flag looks completely fine except maybe for the last line that is printed (after it prints Cmdline: pxz --version):
>>>>> sysconf(): stubbed for parameter 0
>>>>
>>>> This is the _SC_ARG_MAX parameter to sysconf(). It is indeed not implemented (and can be trivially implemented), but I doubt that this is the problem causing the hang. (I also wonder why this program would need to check _SC_ARG_MAX if it's just planning to print the version number, not exec() anything - you can look at this software's source code to see what it does with _SC_ARG_MAX.)
>>>>
>>>>> I have also tried to run pxz using the way it's done in the native-example application, but that also results in it hanging indefinitely.
>>>>> What could be the issue here?
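(On the sysconf(_SC_ARG_MAX) stub mentioned above, which Nadav notes "can be trivially
implemented": the following is only a hypothetical sketch of how a libc-style sysconf()
could answer that one parameter. The structure and the 128 KiB constant, the traditional
Linux ARG_MAX, are illustrative and are not OSv's actual code.)

    #include <unistd.h>
    #include <errno.h>

    /* Hypothetical helper, not OSv's sysconf(): return a sane value for
     * _SC_ARG_MAX instead of logging "stubbed" and leaving it unanswered. */
    long my_sysconf(int name) {
        switch (name) {
        case _SC_PAGESIZE:
            return 4096;
        case _SC_ARG_MAX:
            return 128 * 1024;   /* traditional Linux ARG_MAX; any sane bound works */
        default:
            errno = EINVAL;
            return -1;
        }
    }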
