On Sat, 1 Sep 2018 at 20:23, Euan Kemp <[email protected]> wrote:
>
> > well ok, but it would be more interesting to be able to reproduce Niklas's
> > system behaviour without any such tricks.
>
> The oom-killer being invoked is the oom-killer being invoked. It's not really
> a trick, and the main point was to clarify why perkeep can't/isn't logging in
> this scenario.

Well, the point failed then.

> I agree this little venture into the exact details of the oom-mechanic isn't
> really interesting to this issue.

I did not say that. It is interesting to me.

> I think our next steps are the following:
>
> 1. Get pprof heap output from before a crash to see if there are any memory
> leaks in perkeep, fix said memory leak, or
> 2. Declare that for the work-load in question more memory is needed.
>
> Niklas, if you're familiar with reading pprof output, you could add the
> environment variable `CAMLI_HTTP_PPROF=true` to your perkeep instance, and
> then take regular snapshots of the "/debug/pprof/heap" and use "go tool
> pprof" to see if there are any memory leaks some time before the crash. Be
> sure that you're not exposing that endpoint to the outside world for security
> reasons.
>
> I think that's our best bet for a concrete next step, though if someone else
> has a better idea, please do suggest it :)
>
> On Sat, Sep 1, 2018 at 11:07 AM Mathieu Lonjaret <[email protected]>
> wrote:
>>
>> well ok, but it would be more interesting to be able to reproduce
>> Niklas's system behaviour without any such tricks.
>>
>> On Sat, 1 Sep 2018 at 20:01, Euan Kemp <[email protected]> wrote:
>> >
>> > You can reproduce a kernel oom-kill of a specific process using
>> > oom_score_adj and the magic-sysrq hotkey or file.
>> >
>> > For example, to make pid $pid get oom-killed, you can do the following:
>> >
>> > $ echo 1000 | sudo tee /proc/$pid/oom_score_adj
>> > $ echo f | sudo tee /proc/sysrq-trigger
>> >
>> > This assumes you haven't adjusted the oom score for any other pids
>> > significantly up (unlikely) and that you have CONFIG_MAGIC_SYSRQ in your
>> > kernel (very likely).
>> >
>> > Another way to do it would be to run the process in a specific cgroup
>> > (e.g. with cgexec) and enforce a very low memory limit on that cgroup, and
>> > that's probably a bit more robust though also a little more complicated.
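
The cgroup route mentioned above could look roughly like this. It is a minimal sketch, not something tested against perkeepd: it assumes the cgroup v1 memory controller and the libcgroup tools (`cgcreate`, `cgset`, `cgclassify`), and the group name `perkeep-oomtest` and the helper `run_with_memory_limit` are hypothetical. Instead of `cgexec` (which would need sudo and so run the command as root with a reset environment), it moves a subshell into the group with `cgclassify` and then execs the command, so the command keeps the invoking user and environment.

```shell
#!/usr/bin/env bash
# Minimal sketch: run a command inside its own memory cgroup with a small
# limit, so the kernel OOM-kills it once the group exceeds that limit.
# ASSUMPTIONS: cgroup v1 memory controller, libcgroup tools installed;
# CGROUP_NAME and run_with_memory_limit are hypothetical names.
CGROUP_NAME="${CGROUP_NAME:-perkeep-oomtest}"

# run_with_memory_limit LIMIT CMD [ARGS...]
# LIMIT is written to memory.limit_in_bytes, so suffixes such as 64M work.
run_with_memory_limit() {
  local limit="$1"
  shift
  sudo cgcreate -g "memory:${CGROUP_NAME}"
  sudo cgset -r "memory.limit_in_bytes=${limit}" "${CGROUP_NAME}"
  (
    # Move this subshell into the limited group, then replace it with the
    # command; exec keeps the cgroup, the current user and the environment.
    sudo cgclassify -g "memory:${CGROUP_NAME}" "$BASHPID"
    exec "$@"
  )
}
```

For example, `run_with_memory_limit 64M perkeepd`; once the group goes over its limit, the kill should show up in the kernel log (`dmesg`).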
>> >
>> > -- 
>> > You received this message because you are subscribed to the Google Groups
>> > "Perkeep" group.
>> > To unsubscribe from this group and stop receiving emails from it, send an
>> > email to [email protected].
>> > For more options, visit https://groups.google.com/d/optout.
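
For the regular heap snapshots suggested in the thread, a minimal sketch follows. It assumes perkeepd was started with `CAMLI_HTTP_PPROF=true` and that the profile is reachable at `http://localhost:3179/debug/pprof/heap`; that URL, the script, and the helper names `snapshot_name` and `take_snapshots` are hypothetical, so adjust them to your setup (and keep the endpoint off the public network, as noted above).

```shell
#!/usr/bin/env bash
# Minimal sketch: save numbered heap profiles from a running perkeepd so
# they can be compared later with `go tool pprof`.
# ASSUMPTION: PPROF_URL below is hypothetical; point it at wherever your
# instance actually serves /debug/pprof/heap.
PPROF_URL="${PPROF_URL:-http://localhost:3179/debug/pprof/heap}"

# snapshot_name DIR N -> DIR/heap-NNN.pb.gz (zero-padded so ls sorts in order)
snapshot_name() {
  printf '%s/heap-%03d.pb.gz\n' "$1" "$2"
}

# take_snapshots DIR COUNT INTERVAL_SECONDS
# Fetches COUNT profiles, sleeping INTERVAL_SECONDS between them.
take_snapshots() {
  local dir="$1" count="$2" interval="$3" i
  mkdir -p "$dir"
  for ((i = 1; i <= count; i++)); do
    curl -fsS -o "$(snapshot_name "$dir" "$i")" "$PPROF_URL"
    if ((i < count)); then sleep "$interval"; fi
  done
}

# Only start fetching when run with all three arguments, e.g.:
#   ./heap-snapshots.sh /tmp/perkeep-heap 12 300
if (($# == 3)); then
  take_snapshots "$@"
fi
```

Two snapshots taken some time apart can then be diffed with `go tool pprof -top -base /tmp/perkeep-heap/heap-001.pb.gz /tmp/perkeep-heap/heap-012.pb.gz`; allocation sites whose in-use memory keeps growing between snapshots are the leak candidates.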
