On Thu, May 21, 2020 at 3:40 PM <[email protected]> wrote:
> Hi, thanks for the prompt replies. I was just typing up a long response
> (and ended up suspecting similar culprits) until I noticed there were 2 new
> replies. Thanks for looking into it.
> I'm guessing this means it is more or less impossible to use the pxz
> application, unless I rewrite the source such that it does not fork the xz
> executable?

Pretty much. Use some sort of multi-threading approach.
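(For reference, one process-free direction such a rewrite could take is liblzma's own
multi-threaded encoder. The sketch below is only illustrative: it assumes liblzma >= 5.2,
which provides lzma_stream_encoder_mt, omits most error reporting, and is not pxz's or
OSv's actual code. Link with -llzma.)

    // Sketch: threaded .xz compression inside one process,
    // instead of forking/exec'ing the xz binary (which OSv cannot do).
    #include <lzma.h>
    #include <stdint.h>
    #include <stdio.h>

    static int compress_stream_mt(FILE *in, FILE *out, uint32_t nthreads) {
        lzma_mt mt = {
            .threads = nthreads,            // must be >= 1
            .preset  = LZMA_PRESET_DEFAULT, // same default level (6) as xz
            .check   = LZMA_CHECK_CRC64,
        };
        lzma_stream strm = LZMA_STREAM_INIT;
        if (lzma_stream_encoder_mt(&strm, &mt) != LZMA_OK)
            return -1;

        uint8_t inbuf[1 << 16], outbuf[1 << 16];
        lzma_action action = LZMA_RUN;
        strm.next_out = outbuf;
        strm.avail_out = sizeof(outbuf);
        for (;;) {
            if (strm.avail_in == 0 && action == LZMA_RUN) {
                strm.next_in = inbuf;
                strm.avail_in = fread(inbuf, 1, sizeof(inbuf), in);
                if (feof(in))
                    action = LZMA_FINISH;   // no more input: flush and finish
            }
            lzma_ret ret = lzma_code(&strm, action);
            if (strm.avail_out == 0 || ret == LZMA_STREAM_END) {
                fwrite(outbuf, 1, sizeof(outbuf) - strm.avail_out, out);
                strm.next_out = outbuf;
                strm.avail_out = sizeof(outbuf);
            }
            if (ret == LZMA_STREAM_END)
                break;
            if (ret != LZMA_OK) {
                lzma_end(&strm);
                return -1;
            }
        }
        lzma_end(&strm);
        return 0;
    }

Alternatively, since the xz binary itself has a -T/--threads option (visible in the --help
output quoted further down), running xz -T0 directly on OSv, without pxz, may be the
simplest way to get parallel compression.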
>
> On Thursday, 21 May 2020 21:20:47 UTC+2, Waldek Kozaczuk wrote:
>>
>> I think this code in the app might explain this huge malloc:
>>
>> lzma_options_lzma lzma_options;
>> xzcmd_max = sysconf(_SC_ARG_MAX);
>> page_size = sysconf(_SC_PAGE_SIZE);
>> xzcmd = malloc(xzcmd_max);
>>
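(If sysconf() is stubbed and returns -1, or some other out-of-range sentinel, an unchecked
assignment into an unsigned size can turn into an enormous allocation like the one in the
trace below. A hypothetical hardened version of the snippet above; the fallback constant is
purely illustrative, not what pxz or OSv actually uses:)

    #include <stdlib.h>
    #include <unistd.h>

    #define XZCMD_FALLBACK (128 * 1024)   /* illustrative fallback only */

    static char *xzcmd;
    static size_t xzcmd_max;

    static void init_xzcmd(void) {
        long arg_max = sysconf(_SC_ARG_MAX);
        if (arg_max <= 0)                 /* stubbed, unsupported, or error */
            arg_max = XZCMD_FALLBACK;
        xzcmd_max = (size_t)arg_max;
        xzcmd = malloc(xzcmd_max);
        if (!xzcmd)
            abort();                      /* real code should report the error */
    }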
>> On Thursday, May 21, 2020 at 3:16:29 PM UTC-4, Waldek Kozaczuk wrote:
>>>
>>> I connected with gdb and here is the stack trace I got for the main app thread:
>>>
>>> #0  sched::thread::switch_to (this=this@entry=0xffff8000001d1040) at arch/x64/arch-switch.hh:108
>>> #1  0x000000004040dace in sched::cpu::reschedule_from_interrupt (this=0xffff80000001e040, called_from_yield=called_from_yield@entry=false, preempt_after=..., preempt_after@entry=...) at core/sched.cc:339
>>> #2  0x000000004040e800 in sched::cpu::schedule () at include/osv/sched.hh:1315
>>> #3  0x000000004040e8e6 in sched::thread::wait (this=this@entry=0xffff800000f0a040) at core/sched.cc:1216
>>> #4  0x000000004043ca86 in sched::thread::do_wait_for<lockfree::mutex, sched::wait_object<waitqueue> > (mtx=...) at include/osv/mutex.h:41
>>> #5  sched::thread::wait_for<waitqueue&> (mtx=...) at include/osv/sched.hh:1225
>>> #6  waitqueue::wait (this=this@entry=0x408fa650 <mmu::vma_list_mutex+48>, mtx=...) at core/waitqueue.cc:56
>>> #7  0x00000000403eb27b in rwlock::reader_wait_lockable (this=<optimized out>) at core/rwlock.cc:174
>>> #8  rwlock::rlock (this=this@entry=0x408fa620 <mmu::vma_list_mutex>) at core/rwlock.cc:29
>>> #9  0x000000004034b88c in rwlock_for_read::lock (this=0x408fa620 <mmu::vma_list_mutex>) at include/osv/rwlock.h:113
>>> #10 std::lock_guard<rwlock_for_read&>::lock_guard (__m=..., this=<synthetic pointer>) at /usr/include/c++/9/bits/std_mutex.h:159
>>> #11 lock_guard_for_with_lock<rwlock_for_read&>::lock_guard_for_with_lock (lock=..., this=<synthetic pointer>) at include/osv/mutex.h:89
>>> #12 mmu::vm_fault (addr=17592186081280, addr@entry=17592186083096, ef=ef@entry=0xffff800000f0f068) at core/mmu.cc:1333
>>> #13 0x00000000403adf7c in page_fault (ef=0xffff800000f0f068) at arch/x64/mmu.cc:42
>>> #14 <signal handler called>
>>> #15 0x00000000405bf0cd in _Unwind_IteratePhdrCallback ()
>>> #16 0x000000004047fd37 in <lambda(const elf::program::modules_list&)>::operator() (ml=..., __closure=<synthetic pointer>) at libc/dlfcn.cc:118
>>> #17 elf::program::with_modules<dl_iterate_phdr(int (*)(dl_phdr_info*, size_t, void*), void*)::<lambda(const elf::program::modules_list&)> > (f=..., this=0xffffa0000009cbb0) at include/osv/elf.hh:698
>>> #18 dl_iterate_phdr (callback=0x405befa0 <_Unwind_IteratePhdrCallback>, data=0x200000700520) at libc/dlfcn.cc:99
>>> #19 0x00000000405c0255 in _Unwind_Find_FDE ()
>>> #20 0x00000000405bc693 in uw_frame_state_for ()
>>> #21 0x00000000405be1da in _Unwind_RaiseException ()
>>> #22 0x00000000404c4d1c in __cxa_throw ()
>>> #23 0x0000000040205229 in mmu::find_hole (start=<optimized out>, size=<optimized out>) at include/osv/error.h:36
>>> #24 0x000000004034ecea in mmu::allocate (v=v@entry=0xffffa00000cf2b80, start=35184372088832, start@entry=0, size=size@entry=9223372036854779904, search=search@entry=true) at core/mmu.cc:1113
>>> #25 0x000000004034fa97 in mmu::map_anon (addr=addr@entry=0x0, size=size@entry=9223372036854779904, flags=flags@entry=2, perm=perm@entry=3) at core/mmu.cc:1219
>>> #26 0x00000000403f89a0 in memory::mapped_malloc_large (offset=64, size=9223372036854779904) at core/mempool.cc:919
>>> #27 memory::malloc_large (size=9223372036854779904, alignment=16, block=true, contiguous=false) at core/mempool.cc:919
>>> #28 0x00000000403fa272 in std_malloc (size=9223372036854775807, alignment=16) at core/mempool.cc:1795
>>> #29 0x00000000403fa63b in malloc (size=9223372036854775807) at core/mempool.cc:2001
>>> #30 0x00001000000075d5 in main ()
>>> #31 0x0000000040444c11 in osv::application::run_main (this=0xffffa0007ffb4210) at /usr/include/c++/9/bits/stl_vector.h:915
>>> #32 0x0000000040444d65 in __libc_start_main (main=0x100000007560 <main>) at core/app.cc:37
>>> #33 0x000010000000801e in _start ()
>>>
>>> It is trying to allocate tons of memory, and it looks like we crash in find_hole(), probably with throw make_error(ENOMEM);
>>>
>>> I wonder if it is the app (https://github.com/jnovy/pxz/blob/master/pxz.c) passing such a memory size, or is there some bug on our side?
>>>
>>> (BTW, osv info threads fails like this - would be nice to fix it:
>>>
>>> (gdb) osv info threads
>>>    1 (0xffff800000017040) reclaimer cpu0 status::waiting condvar::wait(lockfree::mutex*, sched::timer*) at core/condvar.cc:43 vruntime 6.07461e-25
>>> Python Exception <class 'Exception'> Class does not extend list_base_hook: sched::timer_base:
>>> Error occurred in Python: Class does not extend list_base_hook: sched::timer_base
>>> )
>>>
>>> When I examined pxz.c, I found that it eventually calls execvpe(), which will definitely NOT work in OSv (OSv does not support processes, so forking does not work -> there is a research fork that does that, which I sent a paper about recently).
>>>
>>> 135 void __attribute__((noreturn)) run_xz( char **argv, char **envp ) {
>>> 136         execve(XZ_BINARY, argv, envp);
>>> 137         error(0, errno, "execution of "XZ_BINARY" binary failed");
>>> 138         exit(EXIT_FAILURE);
>>> 139 }
>>>
>>> xz seems to work fine (at least --help):
>>>
>>> ./scripts/manifest_from_host.sh -w xz && ./scripts/build --append-manifest fs=rofs
>>> ./scripts/firecracker.py
>>> OSv v0.55.0-9-gc13529d9
>>> Booted up in 7.42 ms
>>> Cmdline: /xz --help
>>> Usage: /xz [OPTION]... [FILE]...
>>> Compress or decompress FILEs in the .xz format.
>>>
>>>   -z, --compress      force compression
>>>   -d, --decompress    force decompression
>>>   -t, --test          test compressed file integrity
>>>   -l, --list          list information about .xz files
>>>   -k, --keep          keep (don't delete) input files
>>>   -f, --force         force overwrite of output file and (de)compress links
>>>   -c, --stdout        write to standard output and don't delete input files
>>>   -0 ... -9           compression preset; default is 6; take compressor *and*
>>>                       decompressor memory usage into account before using 7-9!
>>>   -e, --extreme       try to improve compression ratio by using more CPU time;
>>>                       does not affect decompressor memory requirements
>>>   -T, --threads=NUM   use at most NUM threads; the default is 1; set to 0
>>>                       to use as many threads as there are processor cores
>>>   -q, --quiet         suppress warnings; specify twice to suppress errors too
>>>   -v, --verbose       be verbose; specify twice for even more verbose
>>>   -h, --help          display this short help and exit
>>>   -H, --long-help     display the long help (lists also the advanced options)
>>>   -V, --version       display the version number and exit
>>>
>>> With no FILE, or when FILE is -, read standard input.
>>>
>>> Report bugs to <[email protected]> (in English or Finnish).
>>> XZ Utils home page: <https://tukaani.org/xz/>
>>>
>>> Waldek
>>>
>>> On Thursday, May 21, 2020 at 6:59:07 AM UTC-4, Nadav Har'El wrote:
>>>>
>>>> On Thu, May 21, 2020 at 12:46 PM De Vries <[email protected]> wrote:
>>>>
>>>>> Hi,
>>>>>
>>>>> Sorry if this is a bit of a newbie question. I'm trying to run a pretty simple application on OSv: pxz <https://github.com/jnovy/pxz>. I'm able to run other apps, like mysql for example, without any problem.
>>>>> I have tried this the following way. First, I compiled the pxz executable with the -fPIE flag on the host machine, then put it in a new folder at osv/apps/pxz. I then ran the following:
>>>>> ./scripts/manifest_from_host.sh -r ~/osv/apps/pxz/pxz > ./apps/pxz/usr.manifest
>>>>> ./scripts/build image=pxz
>>>>>
>>>>> It generates the following usr.manifest:
>>>>> # (PIE) Position Independent Executable
>>>>> /pxz: /home/user1/osv/apps/pxz/pxz
>>>>> # --------------------
>>>>> # Dependencies
>>>>> # --------------------
>>>>> /usr/lib/libgomp.so.1: /usr/lib/x86_64-linux-gnu/libgomp.so.1
>>>>> /usr/lib/liblzma.so.5: /lib/x86_64-linux-gnu/liblzma.so.5
>>>>> # --------------------
>>>>>
>>>>> Running it with
>>>>> ./scripts/run.py -e "pxz --version"
>>>>>
>>>>> results in
>>>>> OSv v0.55.0-6-g557251e1
>>>>> eth0: 192.168.122.15
>>>>> Booted up in 407.56 ms
>>>>> Cmdline: pxz --version
>>>>>
>>>>> But it just hangs. No errors, but also no output. I have tried actually using pxz (not just --version) to compress a file, but that also hangs indefinitely (while this works fine on the host machine).
>>>>
>>>> It's hard to say. It seems like you did everything right. I assume that if you run "pxz --version" on the host it works properly - prints a version number and exits - right?
>>>> During the "hang", does OSv do some busy loop ("top" will show you the OSv vm taking 100% CPU) or wait for something?
>>>>
>>>> One thing you can do to figure out what is going on is to attach gdb to the running VM, and inquire from it what threads are running, and what they are waiting for.
>>>> It's not trivial to do, but not particularly difficult either, and explained well (I hope) here:
>>>> https://github.com/cloudius-systems/osv/wiki/Debugging-OSv#debugging-osv-with-gdb
>>>> Note that you don't need to rebuild OSv specially for debugging to debug it this way.
>>>>
>>>>> Running ./scripts/run.py with the -V flag looks completely fine except maybe for the last line that is printed (after it prints Cmdline: pxz --version):
>>>>> sysconf(): stubbed for parameter 0
>>>>
>>>> This is the _SC_ARG_MAX parameter to sysconf(). It is indeed not implemented (and can be trivially implemented), but I doubt that this is the problem causing the hang. (I also wonder why this program would need to check _SC_ARG_MAX if it's just planning to print the version number, not exec() anything - you can look at this software's source code to see what it does with _SC_ARG_MAX.)
>>>>
>>>>> I have also tried to run pxz using the way it's done in the native-example application, but that also results in it hanging indefinitely.
>>>>> What could be the issue here?
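(On the sysconf(_SC_ARG_MAX) stub mentioned above, which Nadav notes "can be trivially
implemented": the following is only a hypothetical sketch of how a libc-style sysconf()
could answer that one parameter. The structure and the 128 KiB constant, the traditional
Linux ARG_MAX, are illustrative and are not OSv's actual code.)

    #include <unistd.h>
    #include <errno.h>

    /* Hypothetical helper, not OSv's sysconf(): return a sane value for
     * _SC_ARG_MAX instead of logging "stubbed" and leaving it unanswered. */
    long my_sysconf(int name) {
        switch (name) {
        case _SC_PAGESIZE:
            return 4096;
        case _SC_ARG_MAX:
            return 128 * 1024;   /* traditional Linux ARG_MAX; any sane bound works */
        default:
            errno = EINVAL;
            return -1;
        }
    }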
