Hi Waldek,

Our tree is up to date with f7b6bee552b41f56a55, plus I manually
applied your patch from this thread (as at the time it wasn't
committed).

The error seems to happen regardless of the memory size I specify - but
we can't go below 2GB for memory reasons (btw - we seem to just freeze
when we run out of memory?).

Happens whether we use SMP or not.

We do have some modifications to OSv (mainly - do not use DHCP as we're
specifying the IP addresses in the cloudinit file, and some other hooks
into the network code that we're not actually using in this instance).
We've been using those changes for quite a while now - and I doubt this
is related. Again, it's hard for me to try with a stock OSv due to the
complexity of the setup (multiple interfaces, talking to database
server etc).

We're keen to keep moving forward with the OSv version due to the
number of very useful fixes you've found (that may be causing some of
the crashes we see in production).

Cheers,
Rick



On Thu, 2019-08-22 at 16:02 -0400, Waldek Kozaczuk wrote:
> Rick,
> 
> Does this error happen with a specific memory configuration, or is it
> more generic? I have lost track of whether in this email thread we are
> still talking about the error related to the change in memory
> allocation I made to use memory below the kernel. Also, are you using
> the latest master or 0.53 specifically?
> 
> I thought we were talking about the error when one passes 1.01 or
> 1.02 GB as the memory size. Is that true?
> 
> I understand we have found a slew of other possible bugs.
> 
> Sorry I am a bit confused,
> Waldek
> 
> On Thu, Aug 22, 2019 at 15:53 Rick Payne <ri...@rossfell.co.uk>
> wrote:
> > On Thu, 2019-08-22 at 21:49 +1100, Rick Payne wrote:
> > > On Thu, 2019-08-22 at 12:30 +0300, Nadav Har'El wrote:
> > > 
> > > > Please run "osv syms" to allow gdb to find your application
> > > > object files, and show lines there. Perhaps it's a segfault
> > > > inside your application, not the kernel?
> > > 
> > > I had, but I had forgotten to add our stuff to the usr.manifest
> > > so the tool could find them. I think this is better (from a
> > > different run, apologies):
> > 
> > Ok, with a debug build of the ERTS, it seems to be failing in the
> > garbage collector for the beam. At this point it's probably
> > allocating memory and moving objects around - so I'm a bit
> > suspicious of the changes in OSv in this area:
> > 
> > #44 <signal handler called>
> > #45 0x0000100005bfa75e in move_boxed (ptr=0x20006c2016e0, hdr=128,
> > hpp=0x200040f55738, 
> >     orig=0x20005fbffcc0) at beam/erl_gc.h:91
> > #46 0x0000100005c00014 in sweep (src_size=0, src=0x0, ohsz=0,
> > oh=0x20005f000028 "@\002", 
> >     type=ErtsSweepNewHeap, n_htop=0x20005fbffff0,
> > n_hp=0x20005fbffcc8)
> > at beam/erl_gc.c:2184
> > #47 sweep_new_heap (n_hp=0x20005f000028, n_htop=0x20005f001a58, 
> >     old_heap=0x20005f000028 "@\002", old_heap_size=0) at
> > beam/erl_gc.c:2237
> > #48 0x0000100005bff060 in do_minor (p=0x20004943dd78,
> > live_hf_end=0xfffffffffffffff8, 
> >     mature=0x20006b600028 "\200", mature_size=14266416,
> > new_sz=2072833, 
> >     objv=0x20004943de28, nobj=3) at beam/erl_gc.c:1678
> > #49 0x0000100005bfe1b4 in minor_collection (p=0x20004943dd78, 
> >     live_hf_end=0xfffffffffffffff8, need=0, objv=0x20004943de28,
> > nobj=3, 
> >     ygen_usage=1835980, recl=0x200040f55cf8) at beam/erl_gc.c:1426
> > #50 0x0000100005bfc2cf in garbage_collect (p=0x20004943dd78, 
> >     live_hf_end=0xfffffffffffffff8, need=0, objv=0x20004943de28,
> > nobj=3, fcalls=4000, 
> >     max_young_gen_usage=0) at beam/erl_gc.c:746
> > #51 0x0000100005bfc937 in erts_garbage_collect_nobump
> > (p=0x20004943dd78, need=0, 
> >     objv=0x20004943de28, nobj=3, fcalls=4000) at beam/erl_gc.c:882
> > #52 0x0000100005a8ecda in erts_execute_dirty_system_task
> > (c_p=0x20004943dd78)
> >     at beam/erl_process.c:10543
> > #53 0x0000100005a714bf in erts_dirty_process_main
> > (esdp=0xffff80007fc75d00)
> >     at beam/beam_emu.c:1201
> > #54 0x0000100005a8ac04 in sched_dirty_cpu_thread_func
> > (vesdp=0xffff80007fc75d00)
> >     at beam/erl_process.c:8512
> > #55 0x0000100005d0c7e8 in thr_wrapper (vtwd=0x2000002fea50) at
> > pthread/ethread.c:118
> > #56 0x0000000040461c96 in
> > pthread_private::pthread::<lambda()>::operator() (
> >     __closure=0xffffa0007f896a00) at libc/pthread.cc:114
> > #57 std::_Function_handler<void(),
> > pthread_private::pthread::pthread(void* (*)(void*), void*,
> > sigset_t,
> > const pthread_private::thread_attr*)::<lambda()> >::_M_invoke(const
> > std::_Any_data &) (__functor=...)
> >     at /usr/include/c++/7/bits/std_function.h:316
> > #58 0x00000000403f9647 in sched::thread_main_c
> > (t=0xffff800003579040)
> >     at arch/x64/arch-switch.hh:271
> > #59 0x000000004039a793 in thread_main () at arch/x64/entry.S:113
> > (gdb) 
> > 
> > Like I said, it could be the Erlang ERTS but I think that's pretty
> > unlikely.
> > 
> > Rick
> > 
