Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-05 Thread Vít Ondruch

Dne 04. 08. 20 v 21:38 Michael Catanzaro napsal(a):
> On Tue, Aug 4, 2020 at 10:31 am, Chris Murphy
>  wrote:
>> Should we go back to the old workaround for F33? Madness for one more
>> release? And then drop the madness once there's a dnf solution?
>
> We could, but we have installed so many other things that it's
> becoming quite hard to keep track of them all, and if we're going to
> have a workaround for any one package I would recommend we use the
> same workaround for them all. And that's the merge request I have
> above. And for that to work, we would need to require that anyone
> touching comps also add a corresponding Recommends: in fedora-release.
> That would be unfortunate.


Wouldn't it be better to replace this part of comps by soft
dependencies? I quite don't understand why we have not dropped comps (at
leas for the use case of installation basic OS) when we got soft
dependencies in RPM.

Admittedly, the soft dependencies would be repeatedly installed compared
to comps, but now you are asking DNF to actually install the content of
comps repetitively. So there won't be difference at the end.


Vít


>
> I'd rather have a proper dnf fix in place for F33.
>
> ___
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-05 Thread Vít Ondruch

Dne 04. 08. 20 v 20:58 Vitaly Zaitsev via devel napsal(a):
> On 04.08.2020 16:45, Vít Ondruch wrote:
>> I think the "don't use autoremove" is better suggestion ATM, because I
>> don't really want to keep earlyoom on the system in case there is
>> systemd-oomd or whatever should be the successor.
> You can always easily swap one package to another:
>
> sudo dnf swap earlyoom systemd-oomd --allowerasing
>

I know I can swap packages and what not, but primarily I want to keep my
system in "default" state, mostly following the changes Fedora
contributors are proposing. So if the proposal is to have earlyoom
installed by default, then it is surprising it might not be installed.
This situation should be fixed generally without me changing anything.
And that is the reason I bumped this thread.


Vít

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-05 Thread John M. Harris Jr
On Tuesday, August 4, 2020 1:45:52 AM MST Vít Ondruch wrote:
> Yesterday, I have updated my Rawhide and wondered why `dnf autoremove`
> would want to remove earlyoom just to discover that soft dependency in
> earlyoom was dropped [1] and hence nothing requires earlyoom and DNF is
> free to remove this package (and it is possibly not installed anymore on
> upgraded systems).
> 
> Therefore I wonder what is the status of EarlyOOM. Should I let the
> package go? If not, then the situation should be fixed somehow, probably
> either by reverting the revert or adding the dependency into
> fedora-release as was proposed elsewhere.

Generally, if you let the package go, your system won't suffer from your 
processes getting killed needlessly. This is likely a benefit, so I don't know 
if this is really a bug.

-- 
John M. Harris, Jr.

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Michael Catanzaro
On Tue, Aug 4, 2020 at 10:31 am, Chris Murphy  
wrote:

Should we go back to the old workaround for F33? Madness for one more
release? And then drop the madness once there's a dnf solution?


We could, but we have installed so many other things that it's becoming 
quite hard to keep track of them all, and if we're going to have a 
workaround for any one package I would recommend we use the same 
workaround for them all. And that's the merge request I have above. And 
for that to work, we would need to require that anyone touching comps 
also add a corresponding Recommends: in fedora-release. That would be 
unfortunate.


I'd rather have a proper dnf fix in place for F33.

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Vitaly Zaitsev via devel
On 04.08.2020 16:45, Vít Ondruch wrote:
> I think the "don't use autoremove" is better suggestion ATM, because I
> don't really want to keep earlyoom on the system in case there is
> systemd-oomd or whatever should be the successor.

You can always easily swap one package to another:

sudo dnf swap earlyoom systemd-oomd --allowerasing

-- 
Sincerely,
  Vitaly Zaitsev (vit...@easycoding.org)
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Chris Murphy
On Tue, Aug 4, 2020 at 8:46 AM Vít Ondruch  wrote:
>
>
> Dne 04. 08. 20 v 16:05 Vitaly Zaitsev via devel napsal(a):
> > On 04.08.2020 15:48, Michael Catanzaro wrote:
> >> In the meantime, if you want to keep earlyoom, don't use autoremove.
> > sudo dnf mark install earlyoom
> >
>
> I think the "don't use autoremove" is better suggestion ATM, because I
> don't really want to keep earlyoom on the system in case there is
> systemd-oomd or whatever should be the successor.

systemd-oomd is coming along but it'll be Fedora 34 most likely, but
possibly Fedora 35.

Hopefully there will be a way to do some kind of "rebase" where
recommended things are favored on upgrades (to a new release version),
but without having to obsolete them, and not applied to each update
within a given release.

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Chris Murphy
On Tue, Aug 4, 2020 at 7:49 AM Michael Catanzaro  wrote:
> In the meantime, it will get
> pulled in on upgrades to F32 due to the old workaround, but it's not
> currently being pulled in on upgrades to F33.

Should we go back to the old workaround for F33? Madness for one more
release? And then drop the madness once there's a dnf solution?

By the way... just in case it matters
https://src.fedoraproject.org/fork/catanzaro/rpms/fedora-release/c/a0df346ba785363adccdedeab4cc5d3edb24?branch=master

line 562 should be `zram-generator-defaults` - 'zram' will be
obsoleted by 'zram-generator-defaults` in F33.


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Vít Ondruch

Dne 04. 08. 20 v 16:05 Vitaly Zaitsev via devel napsal(a):
> On 04.08.2020 15:48, Michael Catanzaro wrote:
>> In the meantime, if you want to keep earlyoom, don't use autoremove.
> sudo dnf mark install earlyoom
>

I think the "don't use autoremove" is better suggestion ATM, because I
don't really want to keep earlyoom on the system in case there is
systemd-oomd or whatever should be the successor.


Vít
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Vitaly Zaitsev via devel
On 04.08.2020 15:48, Michael Catanzaro wrote:
> In the meantime, if you want to keep earlyoom, don't use autoremove.

sudo dnf mark install earlyoom

-- 
Sincerely,
  Vitaly Zaitsev (vit...@easycoding.org)
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Michael Catanzaro


On Tue, Aug 4, 2020 at 10:45 am, Vít Ondruch  
wrote:

Yesterday, I have updated my Rawhide and wondered why `dnf autoremove`
would want to remove earlyoom just to discover that soft dependency in
earlyoom was dropped [1] and hence nothing requires earlyoom and DNF 
is
free to remove this package (and it is possibly not installed anymore 
on

upgraded systems).

Therefore I wonder what is the status of EarlyOOM. Should I let the
package go? If not, then the situation should be fixed somehow, 
probably

either by reverting the revert or adding the dependency into
fedora-release as was proposed elsewhere.


We're tracking this problem in 
https://pagure.io/fedora-workstation/issue/138 and 
https://bugzilla.redhat.com/show_bug.cgi?id=1814306. It's high priority 
for Workstation, but it's blocked on dnf. We've been working around it 
in an ad-hoc way for each package we add in a different way in every 
release. In this case, I removed our original workaround in 
https://src.fedoraproject.org/rpms/earlyoom/pull-request/2 because we 
intended to replace it with a new workaround, 
https://src.fedoraproject.org/fork/catanzaro/rpms/fedora-release/c/a0df346ba785363adccdedeab4cc5d3edb24?branch=master. 
However, we decided the new workaround was a little outrageous and we 
would just wait for a dnf fix instead. In the meantime, if you want to 
keep earlyoom, don't use autoremove. In the meantime, it will get 
pulled in on upgrades to F32 due to the old workaround, but it's not 
currently being pulled in on upgrades to F33.


Michael

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-08-04 Thread Vít Ondruch
Yesterday, I have updated my Rawhide and wondered why `dnf autoremove`
would want to remove earlyoom just to discover that soft dependency in
earlyoom was dropped [1] and hence nothing requires earlyoom and DNF is
free to remove this package (and it is possibly not installed anymore on
upgraded systems).

Therefore I wonder what is the status of EarlyOOM. Should I let the
package go? If not, then the situation should be fixed somehow, probably
either by reverting the revert or adding the dependency into
fedora-release as was proposed elsewhere.


Vít


[1]
https://src.fedoraproject.org/rpms/earlyoom/c/a6d0f45a3524830642a4120704e8d295598f8ec3?branch=master


Dne 03. 01. 20 v 20:18 Ben Cotton napsal(a):
> https://fedoraproject.org/wiki/Changes/EnableEarlyoom
>
> == Summary ==
> Install earlyoom package, and enable it by default. This will cause
> the kernel oomkiller to trigger sooner, but will not affect which
> process it chooses to kill off. The idea is to recover from out of
> memory situations sooner, rather than the typical complete system hang
> in which the user has no other choice but to force power off.
>
>
> == Owner ==
> * Name: [[User:chrismurphy| Chris Murphy]]
> * Email: bugzi...@colorremedies.com
>
> == Detailed Description ==
> Workstation working group has discussed "better interactivity in
> low-memory situations" for some months. In certain use cases,
> typically compiling, if all RAM and swap are completely consumed,
> system responsiveness becomes so abysmal that a reasonable user can
> consider the system "lost", and resorts to forcing a power off. This
> is objective a very bad UX. The broad discussion of this problem, and
> some ideas for near term and long term solutions, is located here:
>
> Recent long discussions on "Better interactivity in low-memory situations"
> https://pagure.io/fedora-workstation/issue/98
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/XUZLHJ5O32OX24LG44R7UZ2TMN6NY47N/
>
> Fedora editions and spins, have the in-kernel OOM (out-of-memory)
> manager enabled. The manager's concern is keeping the kernel itself
> functioning. It has no concern about user space function or
> interactivity. This proposed change attempts to improve the user
> experience, in the short term, by triggering the in-kernel process
> killing mechanism, sooner. Instead of the system becoming completely
> unresponsive for tens of minutes, hours or days, the expectation is an
> offending process (determined by oom_score, same as now) will be
> killed off within seconds or a few minutes. This is an incremental
> improvement in user experience, but admittedly still suboptimal. There
> is additional work on-going to improve the user experience further.
>
> Workstation working group discussion specific to enabling earlyoom by default
> https://pagure.io/fedora-workstation/issue/119
>
> Other in-progress solutions:
> https://gitlab.freedesktop.org/hadess/low-memory-monitor
>
> Background information on this complicated problem:
> https://www.kernel.org/doc/gorman/html/understand/understand016.html
> https://lwn.net/Articles/317814/
>
> == Benefit to Fedora ==
>
> There are two major benefits to Fedora:
>
> * improved user experience by more quickly regaining control over
> one's system, rather than having to force power off in low-memory
> situations where there's aggressive swapping. Once a system becomes
> unresponsive, it's completely reasonable for the user to assume the
> system is lost, but that includes high potential for data loss.
>
> * reducing forced poweroff as the main work around will increase data
> collection, improving understanding of low memory situations and how
> to handle them better
>
>
> == Scope ==
> * Proposal owners:
> a. Modify 
> {{code|https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in}}
> to include earlyoom package for Workstation.
> b. Modify 
> {{code|https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstation.preset}}
> to include:
> 
> # enable earlyoom by default on workstation
> enable earlyoom.service
> 
>
> * Other developers:
> Restricted to Workstation edition, unless other editions/spins want to opt-in.
>
> * Release engineering: [https://pagure.io/releng/issues #9141] (a
> check of an impact with Release Engineering is needed) 
>
> * Policies and guidelines: N/A
> * Trademark approval: N/A
>
> == Upgrade/compatibility impact ==
> earlyoom.service will be enabled on upgrade. An upgraded system should
> exhibit the same behaviors as a clean installed system.
>
> == How To Test ==
> * Fedora 30/31 users can test today, any edition or spin:
> {{code|sudo dnf install earlyoom}}
> {{code|sudo systemctl enable --now earlyoom}}
>
> And then attempt to cause an out of memory situation. Examples:
> {{code|tail /dev/zero}}
> {{code|https://lkml.org/lkml/2019/8/4/15}}
>
> * Fedora Workstation 32 (and Rawhide) users will see this service is
> already enabled. It can be toggled with  {{code|sudo 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-15 Thread Neal Gompa
On Wed, Jan 15, 2020 at 6:40 PM John M. Harris Jr  wrote:
>
> On Wednesday, January 8, 2020 12:24:23 PM MST Chris Murphy wrote:
> > Looks like PSI based oom killing doesn't work without swap. Therefore
> > oomd can't be considered a universal solution. Quite a lot of
> > developers have workstations with quite a decent amount of RAM,
> > ~64GiB, and do not use swap at all. Server baremetal are likewise
> > mixed, depending on workloads, and in cloud it's rare for swap to
> > exist.
>
> While it's true some systems without swap are possible, it's not common,
> except in the case of virtual machines, in which case it depends heavily on
> the vendor.
>

Nearly all of our servers don't have swap in $DAYJOB's server fleet
(thousands of servers). In my experience, it's an increasingly common
thing.



-- 
真実はいつも一つ!/ Always, there's only one truth!
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-15 Thread John M. Harris Jr
On Wednesday, January 8, 2020 12:24:23 PM MST Chris Murphy wrote:
> Looks like PSI based oom killing doesn't work without swap. Therefore
> oomd can't be considered a universal solution. Quite a lot of
> developers have workstations with quite a decent amount of RAM,
> ~64GiB, and do not use swap at all. Server baremetal are likewise
> mixed, depending on workloads, and in cloud it's rare for swap to
> exist.

While it's true some systems without swap are possible, it's not common, 
except in the case of virtual machines, in which case it depends heavily on 
the vendor.

-- 
John M. Harris, Jr.
Splentity

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-10 Thread Chris Murphy
On Fri, Jan 10, 2020 at 2:05 AM Lennart Poettering  wrote:
>
> On Mi, 08.01.20 12:24, Chris Murphy (li...@colorremedies.com) wrote:
>
> > On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering  
> > wrote:
> > >
> > > - facebook is working on making oomd something that just works for
> > >   everyone, they are in the final rounds of canonicalizing the
> > >   configuration so that it can just work for all workloads without
> > >   tuning. The last bits for this to be deployable are currently being
> > >   done on the kernel side ("iocost"), when that's in, they'll submit
> > >   oomd (or simplified parts of it) to systemd, so that it's just there
> > >   and works. It's their expressive intention to make this something
> > >   that also works for desktop stuff and requires no further
> > >   tuning. they also will do the systemd work necessary. time frame:
> > >   half a year, maybe one year, but no guarantees.
> >
> > Looks like PSI based oom killing doesn't work without swap. Therefore
> > oomd can't be considered a universal solution. Quite a lot of
> > developers have workstations with quite a decent amount of RAM,
> > ~64GiB, and do not use swap at all. Server baremetal are likewise
> > mixed, depending on workloads, and in cloud it's rare for swap to
> > exist.
> >
> > https://github.com/facebookincubator/oomd/issues/80
> >
> > We think earlyoom can be adjusted to work well for both the swap and
> > no swap use cases.
>
> Isn't rearlyoom also watching the swap metrics only?

No, memory free and swap free, as a percentage. Super simplistic. If
there is no swap, then the percent only applies to MemAvailable.

https://pagure.io/fedora-workstation/issue/119#comment-619749


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-10 Thread Chris Murphy
On-going related discussions on linux-mm@ list

user space unresponsive, followup: lsf/mm congestion
https://marc.info/?t=15784291223=1=2

Gist here is the kernel is working as expected. The process is asking
for resources that don't exist and the kernel can't really assume
either the workload or the user's intent or wish. Maybe they want the
system to finish the task even if it's unusuable?

OOM killer not nearly agressive enough?
https://marc.info/?t=15784298701=1=2

Not clear where it's stuck, reclaim or waiting on a lock - more info needed.

---
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-10 Thread drago01
On Wed, Jan 8, 2020 at 8:25 PM Chris Murphy  wrote:
>
> On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering  
> wrote:
> >
> > - facebook is working on making oomd something that just works for
> >   everyone, they are in the final rounds of canonicalizing the
> >   configuration so that it can just work for all workloads without
> >   tuning. The last bits for this to be deployable are currently being
> >   done on the kernel side ("iocost"), when that's in, they'll submit
> >   oomd (or simplified parts of it) to systemd, so that it's just there
> >   and works. It's their expressive intention to make this something
> >   that also works for desktop stuff and requires no further
> >   tuning. they also will do the systemd work necessary. time frame:
> >   half a year, maybe one year, but no guarantees.
>
> Looks like PSI based oom killing doesn't work without swap. Therefore
> oomd can't be considered a universal solution. Quite a lot of
> developers have workstations with quite a decent amount of RAM,
> ~64GiB, and do not use swap at all. Server baremetal are likewise
> mixed, depending on workloads, and in cloud it's rare for swap to
> exist.
>
> https://github.com/facebookincubator/oomd/issues/80
>
> We think earlyoom can be adjusted to work well for both the swap and
> no swap use cases.

How? On a system with 64GB of ram and no swap all it does currently is
reducing the amount of usable memory significantly.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-10 Thread Lennart Poettering
On Mi, 08.01.20 12:24, Chris Murphy (li...@colorremedies.com) wrote:

> On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering  
> wrote:
> >
> > - facebook is working on making oomd something that just works for
> >   everyone, they are in the final rounds of canonicalizing the
> >   configuration so that it can just work for all workloads without
> >   tuning. The last bits for this to be deployable are currently being
> >   done on the kernel side ("iocost"), when that's in, they'll submit
> >   oomd (or simplified parts of it) to systemd, so that it's just there
> >   and works. It's their expressive intention to make this something
> >   that also works for desktop stuff and requires no further
> >   tuning. they also will do the systemd work necessary. time frame:
> >   half a year, maybe one year, but no guarantees.
>
> Looks like PSI based oom killing doesn't work without swap. Therefore
> oomd can't be considered a universal solution. Quite a lot of
> developers have workstations with quite a decent amount of RAM,
> ~64GiB, and do not use swap at all. Server baremetal are likewise
> mixed, depending on workloads, and in cloud it's rare for swap to
> exist.
>
> https://github.com/facebookincubator/oomd/issues/80
>
> We think earlyoom can be adjusted to work well for both the swap and
> no swap use cases.

Isn't rearlyoom also watching the swap metrics only?

Lennart

--
Lennart Poettering, Berlin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-09 Thread Chris Murphy
On Thu, Jan 9, 2020 at 5:58 AM Benjamin Berg  wrote:
>
> On Wed, 2020-01-08 at 12:24 -0700, Chris Murphy wrote:
> > On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering  
> > wrote:
> > > - facebook is working on making oomd something that just works for
> > >   everyone, they are in the final rounds of canonicalizing the
> > >   configuration so that it can just work for all workloads without
> > >   tuning. The last bits for this to be deployable are currently being
> > >   done on the kernel side ("iocost"), when that's in, they'll submit
> > >   oomd (or simplified parts of it) to systemd, so that it's just there
> > >   and works. It's their expressive intention to make this something
> > >   that also works for desktop stuff and requires no further
> > >   tuning. they also will do the systemd work necessary. time frame:
> > >   half a year, maybe one year, but no guarantees.
> >
> > Looks like PSI based oom killing doesn't work without swap. Therefore
> > oomd can't be considered a universal solution. Quite a lot of
> > developers have workstations with quite a decent amount of RAM,
> > ~64GiB, and do not use swap at all. Server baremetal are likewise
> > mixed, depending on workloads, and in cloud it's rare for swap to
> > exist.
> >
> > https://github.com/facebookincubator/oomd/issues/80
> >
> > We think earlyoom can be adjusted to work well for both the swap and
> > no swap use cases.
>
> But so can oomd, after all, they are willing to implement a plugin that
> uses the MaxAvailable heuristic. It just won't be available in the
> short term.
>
> In principle, I think what we are trying to achieve here is to keep the
> system mostly responsive from a user perspective. This seems to imply
> keeping pages in main memory that belong to "important" processes.

Right, not merely clobbering a process the user ostensibly wants to
run and complete. The just don't want it to take over.


> Should oomd not manage to do this well enough out of the box, then I
> see two main methods we have to improve things:
>
>  * Aggressively kill when we think important pages might get evicted
>- earlyoom does this based on MemAvailable
>- oomd plugin could do the same if deemed the right thing
>  * Actively protect important processes[1]:
>- set MemoryMin, MemoryLow on important units
>- limit "normal" processes more e.g. MemoryHigh for applications
>- in the long run: adjust the OOMScore/MemoryHigh dynamically based
>  on whether the user is interacting with an application at the time
>
> earlyoom does the first and has the big advantage that it can be
> shipped in F32. However, it is not clear to me that this aggressive
> heuristic is actually better overall. And even if it is, we would
> likely still move it into oomd in the long run.

I agree, although the decisions made in this release cycle can really
only be made based on what we know now. Earlyoom has a chance of
making this a better experience in the case where something really
should be OOM killed, just sooner than the kernel's oom-killer would
have. It doesn't solve the unresponsiveness problem that happens once
RAM is full, but before swap reaches 10%. In any case, it's not going
on a process kill spree. It's not going to magically free up a system
every time its under swap duress.

I've got cases where a system is under significant duress with only
50% swap use - earlyoom does nothing for that.

>
> Finally, for F32 we might already be able to improve things quite a lot
> simply by setting a few configuration options in GNOME systemd units.

Maybe. What are the risks? Is it fair to characterize this as more of
optimization of existing functionality, than it is a feature? That's a
technical question. Of course, if this improves responsivity of the
system while under swap thrashing, it's definitely a marketable
feature!

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-09 Thread Benjamin Berg
On Wed, 2020-01-08 at 12:24 -0700, Chris Murphy wrote:
> On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering  
> wrote:
> > - facebook is working on making oomd something that just works for
> >   everyone, they are in the final rounds of canonicalizing the
> >   configuration so that it can just work for all workloads without
> >   tuning. The last bits for this to be deployable are currently being
> >   done on the kernel side ("iocost"), when that's in, they'll submit
> >   oomd (or simplified parts of it) to systemd, so that it's just there
> >   and works. It's their expressive intention to make this something
> >   that also works for desktop stuff and requires no further
> >   tuning. they also will do the systemd work necessary. time frame:
> >   half a year, maybe one year, but no guarantees.
> 
> Looks like PSI based oom killing doesn't work without swap. Therefore
> oomd can't be considered a universal solution. Quite a lot of
> developers have workstations with quite a decent amount of RAM,
> ~64GiB, and do not use swap at all. Server baremetal are likewise
> mixed, depending on workloads, and in cloud it's rare for swap to
> exist.
> 
> https://github.com/facebookincubator/oomd/issues/80
> 
> We think earlyoom can be adjusted to work well for both the swap and
> no swap use cases.

But so can oomd, after all, they are willing to implement a plugin that
uses the MaxAvailable heuristic. It just won't be available in the
short term.

In principle, I think what we are trying to achieve here is to keep the
system mostly responsive from a user perspective. This seems to imply
keeping pages in main memory that belong to "important" processes.

Should oomd not manage to do this well enough out of the box, then I
see two main methods we have to improve things:

 * Aggressively kill when we think important pages might get evicted
   - earlyoom does this based on MemAvailable
   - oomd plugin could do the same if deemed the right thing
 * Actively protect important processes[1]:
   - set MemoryMin, MemoryLow on important units
   - limit "normal" processes more e.g. MemoryHigh for applications
   - in the long run: adjust the OOMScore/MemoryHigh dynamically based
 on whether the user is interacting with an application at the time

earlyoom does the first and has the big advantage that it can be
shipped in F32. However, it is not clear to me that this aggressive
heuristic is actually better overall. And even if it is, we would
likely still move it into oomd in the long run.

Finally, for F32 we might already be able to improve things quite a lot
simply by setting a few configuration options in GNOME systemd units.

Benjamin

[1] I do not know how well this works, so it may be nice if people
experimented with it[2]. For GNOME you can easily add a systemd drop-in 
for various services. e.g. to protect the shell (in a wayland session)
simply do:

$ systemctl edit --user gnome-shell-wayland.service
[Service]
MemoryMin=250M
MemoryLow=500M

Which I suspect should already help a lot in many scenarios.

[2] Unfortunately, I guess that such measurements may be skewed a lot
on systems that use swap due to unrelated lags. i.e. Jan Grulich's mail
from earlier today titled "Lagging system with latest kernels".
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-08 Thread Chris Murphy
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering  wrote:
>
> - facebook is working on making oomd something that just works for
>   everyone, they are in the final rounds of canonicalizing the
>   configuration so that it can just work for all workloads without
>   tuning. The last bits for this to be deployable are currently being
>   done on the kernel side ("iocost"), when that's in, they'll submit
>   oomd (or simplified parts of it) to systemd, so that it's just there
>   and works. It's their expressive intention to make this something
>   that also works for desktop stuff and requires no further
>   tuning. they also will do the systemd work necessary. time frame:
>   half a year, maybe one year, but no guarantees.

Looks like PSI based oom killing doesn't work without swap. Therefore
oomd can't be considered a universal solution. Quite a lot of
developers have workstations with quite a decent amount of RAM,
~64GiB, and do not use swap at all. Server baremetal are likewise
mixed, depending on workloads, and in cloud it's rare for swap to
exist.

https://github.com/facebookincubator/oomd/issues/80

We think earlyoom can be adjusted to work well for both the swap and
no swap use cases.

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Chris Murphy
On Tue, Jan 7, 2020 at 1:48 PM Mark Otaris  wrote:
>
> I intended to demonstrate that cgroups can be used to cause the kernel OOM
> killer to react appropriately and fast enough, implying that replacing the
> OOM killer is not necessary and that replacing it by a userspace OOM killer
> that does not account for cgroups can be undesirable. The exact same controls
> set with my example commands, and others, can be set with scopes as well,
> so this should be applicable.
>
> > https://lore.kernel.org/linux-fsdevel/20200104090955.gf23...@dread.disaster.area/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f
>
> Okay, interesting. But that’s a statement from just one person, and it has to
> be interpreted in the context of what it is confirming; that is, that the OOM
> killer is “mainly concerned about kernel survival in low memory situations”,
> which is weaker than your claim that “their concern with kernel oom-killer is
> strictly with keeping the kernel functioning”. I don’t know if the OOM 
> killer’s
> main purpose is to keep the kernel alive (Michal Hocko appears to think so,
> maybe others disagree), but it is in any case not an abuse of the OOM killer 
> to
> also use it to keep userspace responsive,

The oom killer doesn't keep user space responsive per se, in your
example that's done by cgroups restricting resources. And that's neat,
and necessary to keep making forward progress on. But we don't have
that for unprivileged process right now, unless the user knows the
secret decoder ring command to use to do this every time they run
something in Terminal; and then have some idea to hint at what
resources are needed for the task to succeed rather than just get
clobbered anyway.

That's maybe the elephant in the room with earlyoom (or one of them),
yes we've recovered sooner, the user can hopefully save their data and
reboot. But did their task succeed? No. It got clobbered.


>and there is no reason to think that
> kernel folks are not interested in helping achieve this goal.

I did mean with a kernel only solution. I've been tracking this issue
for 6-7 months including the congestion and kswapd discussions
on-going, so I know they do care broadly about providing some
mechanisms by which user space can better behave. But all of that
requires varying degrees of opt-in, and quite a lot of it involves
considerable work to even understand it, let alone implement it.

>The only
> advantage I see to earlyoom so far is that it sends SIGTERM before taking
> further steps that will kill processes.

Yes and it happens sooner. Probably not soon enough for many users.
There may be some risk by overpromising and under delivering: by
making it the default and then for the vast majority of cases it
doesn't matter, because users are long since conditioned to just force
power off within a minute or less of the GUI stuttering or freezing up
on them. It is very workload and system specific.

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Mark Otaris
I intended to demonstrate that cgroups can be used to cause the kernel OOM
killer to react appropriately and fast enough, implying that replacing the
OOM killer is not necessary and that replacing it by a userspace OOM killer
that does not account for cgroups can be undesirable. The exact same controls
set with my example commands, and others, can be set with scopes as well,
so this should be applicable.

> https://lore.kernel.org/linux-fsdevel/20200104090955.gf23...@dread.disaster.area/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f

Okay, interesting. But that’s a statement from just one person, and it has to
be interpreted in the context of what it is confirming; that is, that the OOM
killer is “mainly concerned about kernel survival in low memory situations”,
which is weaker than your claim that “their concern with kernel oom-killer is
strictly with keeping the kernel functioning”. I don’t know if the OOM killer’s
main purpose is to keep the kernel alive (Michal Hocko appears to think so,
maybe others disagree), but it is in any case not an abuse of the OOM killer to
also use it to keep userspace responsive, and there is no reason to think that
kernel folks are not interested in helping achieve this goal. The only
advantage I see to earlyoom so far is that it sends SIGTERM before taking
further steps that will kill processes.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jan 07, 2020 at 09:19:47AM -0600, Michael Catanzaro wrote:
> On Tue, Jan 7, 2020 at 5:27 am, Mark Otaris  wrote:
> >Try it. With a memory limit,
> >
> >podman run --rm -it --memory=1G fedora bash -c 'dnf install -y
> >stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm
> >100'
> >
> >will use CPU but keep your system responsive. Without the memory limit
> >(this will hang your system),
> >
> >podman run --rm -it fedora bash -c 'dnf install -y stress-ng &&
> >stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'
> >
> >the system hangs and doesn’t recover after 15 minutes.
> 
> I don't think we can use this, though; or at least, I don't see how.
> systemd allows limiting the memory accessible to a scope, but it
> doesn't allow carving out memory for one particular scope that is
> not to be accessible to other scopes. So I don't see a way to use
> these memory limits to ensure sufficient memory remains available to
> critical session processes. (Am I missing something?)

systemd is just a proxy for the kernel here.
The kernel allows memory.min to be set, which is defined as [1]
> Hard memory protection. If the memory usage of a cgroup is within
> its effective min boundary, the cgroup’s memory won’t be reclaimed
> under any conditions.

There is also memory.low which is weaker:
> Best-effort memory protection. If the memory usage of a cgroup is
> within its effective low boundary, the cgroup’s memory won’t be
> reclaimed unless there is no reclaimable memory available in
> unprotected cgroups.

I think that a combination of those two settings could be sufficient
to give us appropriate memory protection for a graphical session.
I envision the limits as being set using some simple formula based
on the total RAM available and the desktop environment used at machine
boot.

[1] 
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files

Zbyszek
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Lennart Poettering
On Di, 07.01.20 09:27, Michael Catanzaro (mcatanz...@gnome.org) wrote:

> On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering 
> wrote:
> > - oomd currently polls some parameters in time intervals too,
> >   still. They are working on getting rid of that too, so that
> >   everything is event based via PSI. Given their own focus on servers
> >   it's not a primary goal, but still a goal.
>
> Alexey seems really pessimistic about PSI. It looks like he expects any
> solution based on PSI will fail:
>
> https://pagure.io/fedora-workstation/issue/98#comment-619086
>
> So that seems like the most important problem right now. Looks like Benjamin
> has already solved the problem of isolating apps into separate systemd
> scopes. Alexey's concern about browser tabs is similarly solvable. But if
> PSI in general is too difficult to configure, this plan isn't going to work.

Well, I personally certainly trust Tejun to deliver if he says he'll
deliver. He has a pretty good track record, and it's his explicit goal
to make this stuff work.

Lennart

--
Lennart Poettering, Berlin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Chris Murphy
On Tue, Jan 7, 2020 at 8:55 AM Chris Murphy  wrote:
>
> On Mon, Jan 6, 2020 at 10:28 PM Mark Otaris  wrote:
> >
> > > For now, kernel developers have made it clear they do not care about
> > > user space responsiveness. At all. Their concern with kernel
> > > oom-killer is strictly with keeping the kernel functioning.
> >
> > This is false. The stated purpose of the OOM killer is not only to keep the
> > kernel alive.
>
> https://lore.kernel.org/linux-fsdevel/20200104090955.gf23...@dread.disaster.area/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f

Sorry, that's a long email and set of threads. As it relates to the
above, the phrase to search for is: "This is indeed the case."


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Chris Murphy
On Mon, Jan 6, 2020 at 10:28 PM Mark Otaris  wrote:
>
> > For now, kernel developers have made it clear they do not care about
> > user space responsiveness. At all. Their concern with kernel
> > oom-killer is strictly with keeping the kernel functioning.
>
> This is false. The stated purpose of the OOM killer is not only to keep the
> kernel alive.

https://lore.kernel.org/linux-fsdevel/20200104090955.gf23...@dread.disaster.area/T/#m8b25fd42501d780d8053fc7aa9f4e3a28a19c49f


>Nor does the fact the kernel has not solved userspace
> responsiveness yet imply that kernel folks do not care. Rather, it means that
> they will not solve it on their own because the kernel does not have all the
> information it needs. Kernel folks do care, or we wouldn’t have PSI or 
> cgroups.

OK.


> > Can it be done with cgroupv2 and PSI alone? Unclear.
>
> Of course it can. Just run 100 instances of every stress-ng memory worker in
> a podman container with a cgroup memory limit. The system will not hang.

a. Not everything is running or will run in a container;
b. To what degree cgroups and PSI, making no other changes,
solves/avoids the problem under discussion is workload dependent.
That's stated in the last part of the email response I reference
above.

Of course, Fedora Workstation is a general purpose operating system.
It's untenable to have workload specific operating systems. By what
mechanism is the workload categorized? And by what mechanism is the
system dynamical (re)configured?


> Try it. With a memory limit,
>
> podman run --rm -it --memory=1G fedora bash -c 'dnf install -y stress-ng && 
> stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'

When I ask the question "can it be done" I'm asking for cake and
eating it too. I'm not asking for an example of running something that
doesn't implode the system, I know about that. I want to compile
something, and have the system figure out the resources it can give to
that task, without killing it, and without impacting the responsivity
of my computer.


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Michael Catanzaro
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering 
 wrote:

- oomd currently polls some parameters in time intervals too,
  still. They are working on getting rid of that too, so that
  everything is event based via PSI. Given their own focus on servers
  it's not a primary goal, but still a goal.


Alexey seems really pessimistic about PSI. It looks like he expects any 
solution based on PSI will fail:


https://pagure.io/fedora-workstation/issue/98#comment-619086

So that seems like the most important problem right now. Looks like 
Benjamin has already solved the problem of isolating apps into separate 
systemd scopes. Alexey's concern about browser tabs is similarly 
solvable. But if PSI in general is too difficult to configure, this 
plan isn't going to work.


___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Michael Catanzaro

On Tue, Jan 7, 2020 at 5:27 am, Mark Otaris  wrote:

Try it. With a memory limit,

podman run --rm -it --memory=1G fedora bash -c 'dnf install -y 
stress-ng && stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'


will use CPU but keep your system responsive. Without the memory limit
(this will hang your system),

podman run --rm -it fedora bash -c 'dnf install -y stress-ng && 
stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'


the system hangs and doesn’t recover after 15 minutes.


I don't think we can use this, though; or at least, I don't see how. 
systemd allows limiting the memory accessible to a scope, but it 
doesn't allow carving out memory for one particular scope that is not 
to be accessible to other scopes. So I don't see a way to use these 
memory limits to ensure sufficient memory remains available to critical 
session processes. (Am I missing something?)


___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Michael Catanzaro

On Tue, Jan 7, 2020 at 11:23 am, Kamil Paral  wrote:
In your example you forget that swap needs to filled almost to full 
for early-oom to start reacting. That takes time during which the 
system responsibility is abysmal. The UX difference happens only 
after you've already suffered through a serious responsivity 
degradation, and the only difference is the end state, *if* you've 
managed to wait long enough for early-oom to kick in (which happens 
earlier than kernel oom and with better results about which process 
gets killed, according to Chris).


Right, we understand this. earlyoom (or a systemd-level OOM solution) 
is only half the solution. The other half will be fixing swap. That 
will probably require (a) reducing the amount of swap created by 
anaconda, and/or (b) swap on zram.


___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Michael Catanzaro


On Tue, Jan 7, 2020 at 10:09 am, Benjamin Berg 
 wrote:

Even if that is the case, on F31 (with GNOME 3.34.2) we do place most
user processes into separate scopes[1]. This is not perfect, because 
it

currently only affects processes launched by gnome-shell, gnome-
settings-daemon and gnome-session. So everything spawned by e.g.
nautilus (easily fixable) or the terminal may still end up in their
parents scope.


Awesome.

What about D-Bus activation?

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Sheogorath via devel
On Sat, 2020-01-04 at 12:17 -0600, Michael Catanzaro wrote:
> On Sat, Jan 4, 2020 at 11:38 am, Zbigniew Jędrzejewski-Szmek 
>  wrote:
> > What about using the memory controller for user units to allocate
> > memory resources between the processes in the user session? Thanks
> > to
> > recent developments, the gnome session uses separate systemd units
> > (and thus separate cgroups) for various services. We could set 
> > attributes
> > like memory.low for "the basic components of the user session",
> > and on the other hand, memory.swap.max for "the payload", i.e.
> > various
> > user processes on top.
> 
> This looks interesting. I'd love to see more serious discussion of
> this 
> proposal. Carving out dedicated memory for essential desktop
> processes 
> seems like something we should be able to do in 2020.
> 

And it seems like it is: In the issue about this whole topic some
implemented solutions where mentioned: 
https://github.com/Nefelim4ag/Ananicy

But not further commented at least on pagure. 
https://pagure.io/fedora-workstation/issue/98#comment-615424

Which I think is quite sad as those seem to be the way better way to
handle those things. Having a daemon that assigns cgroups to processes
seems to let the kernel do its thing and keep us all sane and keeps the
system reasonable responsive.

I guess the important question here is: Does it really prevent hanging
and what's the origin of hanging? Is it that the kernel starts to swap
and therefore eats up all CPU time or is it the programs in foreground
that suddenly all try to get their piece of memory back that forces
kswapd onto the CPU?

My guess would be the latter, but I'm sure the group who did the
research on this topic has a better insight into this.

-- 
Signed
Sheogorath

OpenPGP: https://shivering-isles.com/openpgp/0xFCB98C2A3EC6F601.txt


signature.asc
Description: This is a digitally signed message part
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Benjamin Berg
On Tue, 2020-01-07 at 11:28 +0100, Lennart Poettering wrote:
> On Mo, 06.01.20 14:53, Michael Catanzaro (mcatanz...@gnome.org) wrote:
> 
> > On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering 
> > wrote:
> > > - facebook is working on making oomd something that just works for
> > >   everyone, they are in the final rounds of canonicalizing the
> > >   configuration so that it can just work for all workloads without
> > >   tuning. The last bits for this to be deployable are currently being
> > >   done on the kernel side ("iocost"), when that's in, they'll submit
> > >   oomd (or simplified parts of it) to systemd, so that it's just there
> > >   and works. It's their expressive intention to make this something
> > >   that also works for desktop stuff and requires no further
> > >   tuning. they also will do the systemd work necessary. time frame:
> > >   half a year, maybe one year, but no guarantees.
> > 
> > Asking around, I understand oomd only operates at the cgroup level, i.e. it
> > kills an entire cgroup at once, not individual processes. So I understand
> > this would also depend on GNOME-level work to ensure individual applications
> > get launched in their own systemd scopes, yes?
> 
> That would be a good idea, yes. But there'd be a knob for that in the
> unit files.
> 
> I mean, OOMPolicy= currently can be set to "stop", "kill" or
> "continue", where "stop" means "when a process of service X is OOM
> killed, attempt to shutdown all of X in a friendly way"; "kill" means
> "when a process of service X is OOM killed, forcibly kill all other
> processes of X too"; "continue" means "if a process of service X is
> OOM killed, do nothing else".

Yep, changing the OOMPolicy was considered at first. But creating new
scopes for spawned children is simple enough and it also solves some
other issues (e.g. not killing children when gnome-shell is restarted).

> The expectation here is that most services will want "stop" but
> services that are more "application servers" than an individual
> service (think: apache with its cgi scripts or crond with its
> cronjobs) would set OOMPolicy=continue, since if one of their jobs
> misbheaves they probably should continue running.
> 
> But yeah, the focus where things are going are clearly towards making
> a cgroup the unit that is managed as a whole.

Benjamin


signature.asc
Description: This is a digitally signed message part
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Benjamin Berg
Hi,

[resend this older message for the list]

On Mon, 2020-01-06 at 14:53 -0600, Michael Catanzaro wrote:
> On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering 
>  wrote:
> > - facebook is working on making oomd something that just works for
> >   everyone, they are in the final rounds of canonicalizing the
> >   configuration so that it can just work for all workloads without
> >   tuning. The last bits for this to be deployable are currently being
> >   done on the kernel side ("iocost"), when that's in, they'll submit
> >   oomd (or simplified parts of it) to systemd, so that it's just there
> >   and works. It's their expressive intention to make this something
> >   that also works for desktop stuff and requires no further
> >   tuning. they also will do the systemd work necessary. time frame:
> >   half a year, maybe one year, but no guarantees.
> 
> Asking around, I understand oomd only operates at the cgroup level, 
> i.e. it kills an entire cgroup at once, not individual processes. So I 
> understand this would also depend on GNOME-level work to ensure 
> individual applications get launched in their own systemd scopes, yes?

Even if that is the case, on F31 (with GNOME 3.34.2) we do place most
user processes into separate scopes[1]. This is not perfect, because it
currently only affects processes launched by gnome-shell, gnome-
settings-daemon and gnome-session. So everything spawned by e.g.
nautilus (easily fixable) or the terminal may still end up in their
parents scope.

But, I would say the cgroup separation is pretty much good enough
already. So even if it is a requirement, I would not worry about it
beyond making sure that some applications like nautilus get fixes.

Benjamin

[1] They are named gnome-launched-X-Y.scope and get bound to the
lifetime of the session using a drop-in.
Personally I also added a drop-in to limit memory consumption for
Evolution that way. It tends to just disappear sometimes now. Which is
kind of neat but it would be nice to also get a notification.


signature.asc
Description: This is a digitally signed message part
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Benjamin Berg
On Tue, 2020-01-07 at 11:44 +0100, Benjamin Berg wrote:
> On Tue, 2020-01-07 at 10:21 +, Zbigniew Jędrzejewski-Szmek wrote:
> > I'm quoting from my mail from this same thread:
> > 
> > │ ├─gnome-shell-wayland.service
> > │ │ ├─1501571 /usr/bin/gnome-shell
> > │ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx
> > -core
> > -auth /run/user/1000/.mutter-Xwaylandauth.SCXID0
> > -listen 4 -listen 5 -displayfd 6
> > │ │ ├─1501713 ibus-daemon --panel disable -r --xim
> > │ │ ├─1501718 /usr/libexec/ibus-dconf
> > │ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
> > │ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
> > │ │ ├─1501980 /usr/libexec/ibus-engine-simple
> > │ │ ├─1503586 /usr/lib64/firefox/firefox
> > │ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2
> > -isForBrowser ...
> > │ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3
> > -isForBrowser ...
> > │ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4
> > -isForBrowser ...
> > │ │ ├─1520219 bwrap --args 35 telegram-desktop --
> > │ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
> > │ │ ├─1520230 xdg-dbus-proxy --args=37
> > │ │ ├─1520232 bwrap --args 35 telegram-desktop --
> > │ │ ├─1520233 /app/bin/Telegram --
> > │ │ ├─1540753 pavucontrol
> > ...
> 
> (Oh, what is the command to get this output?)

Aha, systemd-cgls, and this should be a normal F31:

│   │ ├─gnome-shell-wayland.service
│   │ │ ├─2160536 /usr/bin/gnome-shell
│   │ │ ├─2160575 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth 
/run/user/1000/.mutter-Xwaylandauth.4R9ED0 -listen 4 -listen 5 -displayfd 6
│   │ │ ├─2160744 ibus-daemon --panel disable -r --xim
│   │ │ ├─2160754 /usr/libexec/ibus-dconf
│   │ │ ├─2160755 /usr/libexec/ibus-extension-gtk3
│   │ │ ├─2160759 /usr/libexec/ibus-x11 --kill-daemon
│   │ │ └─2160998 /usr/libexec/ibus-engine-simple

Benjamin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Benjamin Berg
On Tue, 2020-01-07 at 10:21 +, Zbigniew Jędrzejewski-Szmek wrote:
> On Tue, Jan 07, 2020 at 11:07:49AM +0100, Benjamin Berg wrote:
> > On Tue, 2020-01-07 at 09:47 +, Zbigniew Jędrzejewski-Szmek wrote:
> > > I wanted to ask about this too... but didn't know where ;)
> > > As of today, gnome-shell in F31 seems to start almost everything 
> > > as separate systemd user scopes:
> > > 
> > > - various services started automaticlly like /usr/libexec/gsd-
> > > power,
> > >   /usr/libexec/gsd-sound, etc.
> > > 
> > > - flatpaks (this seems to be new, I had them running under
> > >   gnome-shell-wayland.service last week!)
> > 
> > Hmm, pretty sure flatpaks have always created their own scopes.
> 
> I'm quoting from my mail from this same thread:
> 
> │ ├─gnome-shell-wayland.service
> │ │ ├─1501571 /usr/bin/gnome-shell
> │ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core
> -auth /run/user/1000/.mutter-Xwaylandauth.SCXID0
> -listen 4 -listen 5 -displayfd 6
> │ │ ├─1501713 ibus-daemon --panel disable -r --xim
> │ │ ├─1501718 /usr/libexec/ibus-dconf
> │ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
> │ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
> │ │ ├─1501980 /usr/libexec/ibus-engine-simple
> │ │ ├─1503586 /usr/lib64/firefox/firefox
> │ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2
> -isForBrowser ...
> │ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3
> -isForBrowser ...
> │ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4
> -isForBrowser ...
> │ │ ├─1520219 bwrap --args 35 telegram-desktop --
> │ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
> │ │ ├─1520230 xdg-dbus-proxy --args=37
> │ │ ├─1520232 bwrap --args 35 telegram-desktop --
> │ │ ├─1520233 /app/bin/Telegram --
> │ │ ├─1540753 pavucontrol
> ...

(Oh, what is the command to get this output?)

This is not what I am seeing here. My gnome-shell cgroup only contains
the gnome-shell, Xwayland and ibus. And I have separate .scope units
for Telegram (flatpak-org.telegram.desktop-2162569.scope), evolution
(gnome-launched-org.gnome.Epiphany.desktop-2162690.scope), …

And I am pretty sure that flatpak/bwrap has always taken care of
scoping flatpaks correctly.

Benjamin


> 
> So maybe a bug? I'll keep watching if it happens again.
> 
> > > Stuff started from the run dialog (alt-f2) and from
> > > the overview still seems to land in gnome-shell-wayland.service,
> > > but maybe this is fixed in gnome-shell 3.35?
> > 
> > This should have changed with the gnome-shell 3.34.2 update in
> > Fedora
> > 31. It may be that it has not reached rawhide yet though.
> 
> I'm still on gnome-shell-3.34.1-4.fc31.x86_64. I'll try the latest
> version.
> 
> > > Another issue is that things that are started through the gnome
> > > terminal also land in gnome-terminal-server.service. They need to
> > > get their own scopes to make resource allocation robust.
> > 
> > Do you think we should just place each VT into its own scope?
> 
> Yes. Everything starting at the shell (or whatever command is
> configured as the "payload", should get its own scope) and a separate
> set of resources than gnome-terminal-server.service.
> 
> > That seems like a reasonable start in principle, though graphical
> > applications launched from the terminal may still not be moved into
> > their own scope then.
> 
> I think it is OK. After all, starting graphical applications from
> the terminal is a special case. If desired, the user may run
> 'systemd-run --user foo' if they want to segregate it. (Actually,
> we might teach some apps to put themselves into a scope when started
> from a command line. This makes sense for stuff like firefox, but
> also
> screen/tmux and others. But I consider this a completely separate
> issue.)
> 
> Zbyszek
> 
> > > It seems we're quite close! Do we just need to wait for another
> > > gnome release and then we'll have everything nicely segregated?
> > 
> > Likely not perfect, but hopefully close enough for many purposes :)
> > 
> > Benjamin
> 
> 
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread David Schwörer
On 1/7/20 11:07 AM, Benjamin Berg wrote:
> On Tue, 2020-01-07 at 09:47 +, Zbigniew Jędrzejewski-Szmek wrote:
>> On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
>>> On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering
>>>  wrote:
 - facebook is working on making oomd something that just works for
  everyone, they are in the final rounds of canonicalizing the
  configuration so that it can just work for all workloads without
  tuning. The last bits for this to be deployable are currently being
  done on the kernel side ("iocost"), when that's in, they'll submit
  oomd (or simplified parts of it) to systemd, so that it's just there
  and works. It's their expressive intention to make this something
  that also works for desktop stuff and requires no further
  tuning. they also will do the systemd work necessary. time frame:
  half a year, maybe one year, but no guarantees.
>>>
>>> Asking around, I understand oomd only operates at the cgroup level,
>>> i.e. it kills an entire cgroup at once, not individual processes. So
>>> I understand this would also depend on GNOME-level work to ensure
>>> individual applications get launched in their own systemd scopes,
>>> yes?
>>
>> I wanted to ask about this too... but didn't know where ;)
>> As of today, gnome-shell in F31 seems to start almost everything 
>> as separate systemd user scopes:
>>
>> - various services started automaticlly like /usr/libexec/gsd-power,
>>   /usr/libexec/gsd-sound, etc.
>>
>> - flatpaks (this seems to be new, I had them running under
>>   gnome-shell-wayland.service last week!)
> 
> Hmm, pretty sure flatpaks have always created their own scopes.
> 
>> Stuff started from the run dialog (alt-f2) and from
>> the overview still seems to land in gnome-shell-wayland.service,
>> but maybe this is fixed in gnome-shell 3.35?
> 
> This should have changed with the gnome-shell 3.34.2 update in Fedora
> 31. It may be that it has not reached rawhide yet though.
> 

Just had a look at awesome, all applications seem to be in the same
cgroup, according to systemd-cgtop. Thus if the whole cgroup would be
killed, that means rather then stopping firefox if it uses to much
memory, my whole session would be terminated.

>> Another issue is that things that are started through the gnome
>> terminal also land in gnome-terminal-server.service. They need to
>> get their own scopes to make resource allocation robust.
> 
> Do you think we should just place each VT into its own scope?
> 
> That seems like a reasonable start in principle, though graphical
> applications launched from the terminal may still not be moved into
> their own scope then.
> 
>> It seems we're quite close! Do we just need to wait for another
>> gnome release and then we'll have everything nicely segregated?
> 
> Likely not perfect, but hopefully close enough for many purposes :)
> 
> Benjamin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Lennart Poettering
On Mo, 06.01.20 14:53, Michael Catanzaro (mcatanz...@gnome.org) wrote:

> On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering 
> wrote:
> > - facebook is working on making oomd something that just works for
> >   everyone, they are in the final rounds of canonicalizing the
> >   configuration so that it can just work for all workloads without
> >   tuning. The last bits for this to be deployable are currently being
> >   done on the kernel side ("iocost"), when that's in, they'll submit
> >   oomd (or simplified parts of it) to systemd, so that it's just there
> >   and works. It's their expressive intention to make this something
> >   that also works for desktop stuff and requires no further
> >   tuning. they also will do the systemd work necessary. time frame:
> >   half a year, maybe one year, but no guarantees.
>
> Asking around, I understand oomd only operates at the cgroup level, i.e. it
> kills an entire cgroup at once, not individual processes. So I understand
> this would also depend on GNOME-level work to ensure individual applications
> get launched in their own systemd scopes, yes?

That would be a good idea, yes. But there'd be a knob for that in the
unit files.

I mean, OOMPolicy= currently can be set to "stop", "kill" or
"continue", where "stop" means "when a process of service X is OOM
killed, attempt to shutdown all of X in a friendly way"; "kill" means
"when a process of service X is OOM killed, forcibly kill all other
processes of X too"; "continue" means "if a process of service X is
OOM killed, do nothing else".

The expectation here is that most services will want "stop" but
services that are more "application servers" than an individual
service (think: apache with its cgi scripts or crond with its
cronjobs) would set OOMPolicy=continue, since if one of their jobs
misbheaves they probably should continue running.

But yeah, the focus where things are going are clearly towards making
a cgroup the unit that is managed as a whole.

Lennart

--
Lennart Poettering, Berlin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Kamil Paral
On Mon, Jan 6, 2020 at 8:52 PM Roberto Ragusa  wrote:

> On 2020-01-06 18:31, Kamil Paral wrote:
>
> > FWIW, the behavior on Android is very close to what is proposed here. If
> your application exceeds the amount of available memory, it simply closes
> right in front of your eyes. No explanation, nothing, it's just gone (might
> be different on latest Android versions). The same thing would happen with
> EarlyOOM - some application would disappear.
> >
>
> The analogy is not completely fair.
> On Android applications are designed to be Started and Stopped by the
> system, and they are supposed to save
> their entire state so that when restarted nothing has apparently happened,
> from the point of view of the user.
> (many applications are badly written, but that's another story...)
> And we are talking about background applications, on a system where only
> one application is in foreground
> (only very recently you can have two applications in foreground).
> Finally, it is the applications that are stopped (by asking them nicely
> trough an event), not general system
> processes; Android would never kill a wpa_supplicant process, for example.
>
> Android has a concept of "cache" of background applications, they are
> there, if possible,  just to have them
> back very quickly; it is similar to how Linux keeps dirty disk content in
> RAM and pushes it to disk
> when RAM must be freed.
>

Sure, Android is quite a different world, I'm not saying the comparison
applies 1:1. But the end-effect is similar - a window just disappears. And
it's even less obvious why it happened, because you don't have any swap and
therefore you haven't gone through any performance degradation. That's all
I wanted to note. (Also, I'm also skeptical about the app saving state
before being killed, because it already went out of memory and can't
function properly. But let's not go off-topic.)
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Kamil Paral
On Tue, Jan 7, 2020 at 8:09 AM Aleksandra Fedorova 
wrote:

> UX before: system works, I run heavy application, system starts to hang, i
> understand that there is an issue, i can kill the app or reboot, which
> gives me clean and working system.
>
> UX after: system works, no visible problems. Suddenly random app
> disappears, no errors or crashes reported to me. It might be that my active
> app is killed, then I know that smth happened, but what if background
> process is killed? Maybe my messenger app?
>

Or actually:

UX before: system works, I run a heavy application, system starts to hang,
I can't even move my mouse, the application doesn't respond to Alt+F4, I
wait patiently for a few minutes then give up and hard-reboot

UX after: system works, I run a heavy application, system starts to hang, I
can't even move my mouse, the application doesn't respond to Alt+F4, I wait
patiently for a few minutes then the application disappears and I have a
functional system again

In your example you forget that swap needs to filled almost to full for
early-oom to start reacting. That takes time during which the system
responsibility is abysmal. The UX difference happens only after you've
already suffered through a serious responsivity degradation, and the only
difference is the end state, *if* you've managed to wait long enough for
early-oom to kick in (which happens earlier than kernel oom and with better
results about which process gets killed, according to Chris).
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jan 07, 2020 at 11:07:49AM +0100, Benjamin Berg wrote:
> On Tue, 2020-01-07 at 09:47 +, Zbigniew Jędrzejewski-Szmek wrote:
> > On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
> > > On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering
> > >  wrote:
> > > > - facebook is working on making oomd something that just works for
> > > >  everyone, they are in the final rounds of canonicalizing the
> > > >  configuration so that it can just work for all workloads without
> > > >  tuning. The last bits for this to be deployable are currently being
> > > >  done on the kernel side ("iocost"), when that's in, they'll submit
> > > >  oomd (or simplified parts of it) to systemd, so that it's just there
> > > >  and works. It's their expressive intention to make this something
> > > >  that also works for desktop stuff and requires no further
> > > >  tuning. they also will do the systemd work necessary. time frame:
> > > >  half a year, maybe one year, but no guarantees.
> > > 
> > > Asking around, I understand oomd only operates at the cgroup level,
> > > i.e. it kills an entire cgroup at once, not individual processes. So
> > > I understand this would also depend on GNOME-level work to ensure
> > > individual applications get launched in their own systemd scopes,
> > > yes?
> > 
> > I wanted to ask about this too... but didn't know where ;)
> > As of today, gnome-shell in F31 seems to start almost everything 
> > as separate systemd user scopes:
> > 
> > - various services started automaticlly like /usr/libexec/gsd-power,
> >   /usr/libexec/gsd-sound, etc.
> > 
> > - flatpaks (this seems to be new, I had them running under
> >   gnome-shell-wayland.service last week!)
> 
> Hmm, pretty sure flatpaks have always created their own scopes.

I'm quoting from my mail from this same thread:

│ ├─gnome-shell-wayland.service
│ │ ├─1501571 /usr/bin/gnome-shell
│ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth 
/run/user/1000/.mutter-Xwaylandauth.SCXID0
-listen 4 -listen 5 -displayfd 6
│ │ ├─1501713 ibus-daemon --panel disable -r --xim
│ │ ├─1501718 /usr/libexec/ibus-dconf
│ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
│ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
│ │ ├─1501980 /usr/libexec/ibus-engine-simple
│ │ ├─1503586 /usr/lib64/firefox/firefox
│ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser 
...
│ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser 
...
│ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser 
...
│ │ ├─1520219 bwrap --args 35 telegram-desktop --
│ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
│ │ ├─1520230 xdg-dbus-proxy --args=37
│ │ ├─1520232 bwrap --args 35 telegram-desktop --
│ │ ├─1520233 /app/bin/Telegram --
│ │ ├─1540753 pavucontrol
...


So maybe a bug? I'll keep watching if it happens again.

> > Stuff started from the run dialog (alt-f2) and from
> > the overview still seems to land in gnome-shell-wayland.service,
> > but maybe this is fixed in gnome-shell 3.35?
> 
> This should have changed with the gnome-shell 3.34.2 update in Fedora
> 31. It may be that it has not reached rawhide yet though.

I'm still on gnome-shell-3.34.1-4.fc31.x86_64. I'll try the latest
version.

> > Another issue is that things that are started through the gnome
> > terminal also land in gnome-terminal-server.service. They need to
> > get their own scopes to make resource allocation robust.
> 
> Do you think we should just place each VT into its own scope?

Yes. Everything starting at the shell (or whatever command is
configured as the "payload", should get its own scope) and a separate
set of resources than gnome-terminal-server.service.

> That seems like a reasonable start in principle, though graphical
> applications launched from the terminal may still not be moved into
> their own scope then.

I think it is OK. After all, starting graphical applications from
the terminal is a special case. If desired, the user may run
'systemd-run --user foo' if they want to segregate it. (Actually,
we might teach some apps to put themselves into a scope when started
from a command line. This makes sense for stuff like firefox, but also
screen/tmux and others. But I consider this a completely separate
issue.)

Zbyszek

> > It seems we're quite close! Do we just need to wait for another
> > gnome release and then we'll have everything nicely segregated?
> 
> Likely not perfect, but hopefully close enough for many purposes :)
> 
> Benjamin

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Benjamin Berg
On Tue, 2020-01-07 at 09:47 +, Zbigniew Jędrzejewski-Szmek wrote:
> On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
> > On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering
> >  wrote:
> > > - facebook is working on making oomd something that just works for
> > >  everyone, they are in the final rounds of canonicalizing the
> > >  configuration so that it can just work for all workloads without
> > >  tuning. The last bits for this to be deployable are currently being
> > >  done on the kernel side ("iocost"), when that's in, they'll submit
> > >  oomd (or simplified parts of it) to systemd, so that it's just there
> > >  and works. It's their expressive intention to make this something
> > >  that also works for desktop stuff and requires no further
> > >  tuning. they also will do the systemd work necessary. time frame:
> > >  half a year, maybe one year, but no guarantees.
> > 
> > Asking around, I understand oomd only operates at the cgroup level,
> > i.e. it kills an entire cgroup at once, not individual processes. So
> > I understand this would also depend on GNOME-level work to ensure
> > individual applications get launched in their own systemd scopes,
> > yes?
> 
> I wanted to ask about this too... but didn't know where ;)
> As of today, gnome-shell in F31 seems to start almost everything 
> as separate systemd user scopes:
> 
> - various services started automaticlly like /usr/libexec/gsd-power,
>   /usr/libexec/gsd-sound, etc.
> 
> - flatpaks (this seems to be new, I had them running under
>   gnome-shell-wayland.service last week!)

Hmm, pretty sure flatpaks have always created their own scopes.

> Stuff started from the run dialog (alt-f2) and from
> the overview still seems to land in gnome-shell-wayland.service,
> but maybe this is fixed in gnome-shell 3.35?

This should have changed with the gnome-shell 3.34.2 update in Fedora
31. It may be that it has not reached rawhide yet though.

> Another issue is that things that are started through the gnome
> terminal also land in gnome-terminal-server.service. They need to
> get their own scopes to make resource allocation robust.

Do you think we should just place each VT into its own scope?

That seems like a reasonable start in principle, though graphical
applications launched from the terminal may still not be moved into
their own scope then.

> It seems we're quite close! Do we just need to wait for another
> gnome release and then we'll have everything nicely segregated?

Likely not perfect, but hopefully close enough for many purposes :)

Benjamin


signature.asc
Description: This is a digitally signed message part
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Zbigniew Jędrzejewski-Szmek
On Tue, Jan 07, 2020 at 08:57:04AM +0200, Damian Ivanov wrote:
> I am not using a swap partition at all, the system always hangs when
> OOM but sometimes also at just less than 20%

This might be https://gitlab.gnome.org/GNOME/gnome-shell/issues/1981
or one of the duplicates.

Zbyszek
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Zbigniew Jędrzejewski-Szmek
On Mon, Jan 06, 2020 at 02:53:13PM -0600, Michael Catanzaro wrote:
> On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering
>  wrote:
> >- facebook is working on making oomd something that just works for
> >  everyone, they are in the final rounds of canonicalizing the
> >  configuration so that it can just work for all workloads without
> >  tuning. The last bits for this to be deployable are currently being
> >  done on the kernel side ("iocost"), when that's in, they'll submit
> >  oomd (or simplified parts of it) to systemd, so that it's just there
> >  and works. It's their expressive intention to make this something
> >  that also works for desktop stuff and requires no further
> >  tuning. they also will do the systemd work necessary. time frame:
> >  half a year, maybe one year, but no guarantees.
> 
> Asking around, I understand oomd only operates at the cgroup level,
> i.e. it kills an entire cgroup at once, not individual processes. So
> I understand this would also depend on GNOME-level work to ensure
> individual applications get launched in their own systemd scopes,
> yes?

I wanted to ask about this too... but didn't know where ;)
As of today, gnome-shell in F31 seems to start almost everything 
as separate systemd user scopes:

- various services started automaticlly like /usr/libexec/gsd-power,
  /usr/libexec/gsd-sound, etc.

- flatpaks (this seems to be new, I had them running under
  gnome-shell-wayland.service last week!)

Stuff started from the run dialog (alt-f2) and from
the overview still seems to land in gnome-shell-wayland.service,
but maybe this is fixed in gnome-shell 3.35?

Another issue is that things that are started through the gnome
terminal also land in gnome-terminal-server.service. They need to
get their own scopes to make resource allocation robust.

It seems we're quite close! Do we just need to wait for another
gnome release and then we'll have everything nicely segregated?

Zbyszek
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Kevin Kofler
Chris Murphy wrote:
> It's not correct that the Workstation working group doesn't want to
> see it supported, it's a question of whether and to what degree it can
> be supported, and making sure users have expectations proper set. I
> wouldn't want users thinking it'll work by advertising that it does,
> and then it eats their data.
> 
> Does the hardware support it? Does the hardware properly advertise
> what it does support? What mechanisms are needed in the kernel and
> systemd to support it, and what to do when there are bugs that break
> it? It's not practical for the Fedora kernel team to become
> responsible for supporting it when it breaks, nor is it practical to
> block the release on such bugs.

The biggest issue is that "Secure Boot" (Restricted Boot) mode disables 
hibernation.

Kevin Kofler
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-07 Thread Chris Murphy
On Tue, Jan 7, 2020 at 12:08 AM Aleksandra Fedorova  wrote:
>
>
>
> On Mon, 6 Jan 2020, 18:32 Kamil Paral,  wrote:
>>
>> On Sun, Jan 5, 2020 at 12:43 PM Aleksandra Fedorova  
>> wrote:
>>>
>>> I wonder, how I as a user going to be informed about the
>>> earlyoom-event? I assume abrt will recognize the crash? Will it be
>>> easily visible from the abrt report that it was the OOM?
>>>
>>> The concern is: if we enable such a service, will we get large amount
>>> of vague bug reports from users who don't understand what has
>>> happened. Can we make it somehow easier to debug?
>>
>>
>> FWIW, the behavior on Android is very close to what is proposed here. If 
>> your application exceeds the amount of available memory, it simply closes 
>> right in front of your eyes. No explanation, nothing, it's just gone (might 
>> be different on latest Android versions). The same thing would happen with 
>> EarlyOOM - some application would disappear.
>>
>> I agree it would be nice to inform the user before or at least after. 
>> Windows can do it - they show a notification roughly saying "Your system is 
>> running out of memory and some application might get closed". (At least they 
>> used to in the old days, I haven't run out of memory for a long time, and I 
>> don't know whether Windows 10 behaves the same way). But I think it should 
>> not be a stopper for the proposal as it is. Even without the notification 
>> the user experience is improved over the default behavior.
>
>
> I am not convinced that it is an improvement to be honest.
>
> UX before: system works, I run heavy application, system starts to hang, i 
> understand that there is an issue, i can kill the app or reboot, which gives 
> me clean and working system.
>
> UX after: system works, no visible problems. Suddenly random app disappears, 
> no errors or crashes reported to me. It might be that my active app is 
> killed, then I know that smth happened, but what if background process is 
> killed? Maybe my messenger app?

This comparison is not accurate.
1. In the UX before case, it's unfair you're comparing to user
intervention to kill the app rather than oom-killer.
2. oom-killer reports to the journal. earlyoom reports to the journal.
They're the same.
3. Quite a lot of errors and crashes are only ever reported to the journal.
4. UX after (i.e. with earlyoom running), the system starts to hang,
you should understand there's an issue, recovery shouldn't take quite
as long but you'll still wish the system hadn't become hung up in the
first place
5. The app is quit with SIGTERM, not killed
6. kernel oom-killer can kill background processes too

> I am going to keep working in my main app without noticing that I lost 
> something, not knowing that I need to take action. And my system now runs in 
> a weird state, and can stay there for days, which will lead to more weird and 
> nonreproducible errors later.

No different than with oom-killer, assuming you're willing to wait for
it take action. If you force power off instead, there's some chance
you're still going to do that with earlyoom because the responsivity
problem has more to do with congestion as a result of heavy swapping.

> The "hang" of a system was the feedback user got from the system that there 
> is something wrong. Not ideal, but at least there was something. With this 
> feature we don't solve the issue, we remove the "bad" feedback, and we don't 
> provide any replacement for it making memory problem completely invisible.
>
> Is it really a good UX?

Insofar as aggravation is definitely not good UX, I'd say for sure
it's better to reduce user aggravation. They will still experience the
hang. It just won't last quite as long, and yet it still will be too
long.

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Aleksandra Fedorova
On Mon, 6 Jan 2020, 18:32 Kamil Paral,  wrote:

> On Sun, Jan 5, 2020 at 12:43 PM Aleksandra Fedorova 
> wrote:
>
>> I wonder, how I as a user going to be informed about the
>> earlyoom-event? I assume abrt will recognize the crash? Will it be
>> easily visible from the abrt report that it was the OOM?
>>
>> The concern is: if we enable such a service, will we get large amount
>> of vague bug reports from users who don't understand what has
>> happened. Can we make it somehow easier to debug?
>>
>
> FWIW, the behavior on Android is very close to what is proposed here. If
> your application exceeds the amount of available memory, it simply closes
> right in front of your eyes. No explanation, nothing, it's just gone (might
> be different on latest Android versions). The same thing would happen with
> EarlyOOM - some application would disappear.
>
> I agree it would be nice to inform the user before or at least after.
> Windows can do it - they show a notification roughly saying "Your system is
> running out of memory and some application might get closed". (At least
> they used to in the old days, I haven't run out of memory for a long time,
> and I don't know whether Windows 10 behaves the same way). But I think it
> should not be a stopper for the proposal as it is. Even without the
> notification the user experience is improved over the default behavior.
>

I am not convinced that it is an improvement to be honest.

UX before: system works, I run heavy application, system starts to hang, i
understand that there is an issue, i can kill the app or reboot, which
gives me clean and working system.

UX after: system works, no visible problems. Suddenly random app
disappears, no errors or crashes reported to me. It might be that my active
app is killed, then I know that smth happened, but what if background
process is killed? Maybe my messenger app?

I am going to keep working in my main app without noticing that I lost
something, not knowing that I need to take action. And my system now runs
in a weird state, and can stay there for days, which will lead to more
weird and nonreproducible errors later.

The "hang" of a system was the feedback user got from the system that there
is something wrong. Not ideal, but at least there was something. With this
feature we don't solve the issue, we remove the "bad" feedback, and we
don't provide any replacement for it making memory problem completely
invisible.

Is it really a good UX?


___
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct:
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives:
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
>
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Damian Ivanov
I am not using a swap partition at all, the system always hangs when
OOM but sometimes also at just less than 20%

On Tue, Jan 7, 2020 at 8:43 AM Peter Hutterer  wrote:
>
> On Sat, Jan 04, 2020 at 12:15:20PM -0600, Michael Catanzaro wrote:
> > Let's keep this desktop-focused, since the proposal does not affect Server
> > edition.
> >
> > On Sat, Jan 4, 2020 at 12:48 pm, drago01  wrote:
> > > As for the desktop case the running web browers in a cgroup to keep them
> > > in check would solve most real world problems - other common desktop
> > > apps don't use enough memory to cause such issues (unless your system is
> > > really memory constrained but then the "buy more memory" solution is the
> > > better fix).
> >
> > The last time I saw my desktop hang due to a web browser using too much
> > memory was 2015.
>
> just FTR, this happens relatively frequently for me. Some websites seem to
> cause Firefox to swap itself into nirvana. Sometimes within a short time
> but sometimes it takes longer. I've come back from a lunch break a few times
> to a desktop swapping itself to death.
>
> Not yet fully identified but it does happen.
>
> Cheers,
>Peter
>
> >
> > The freezes I've encountered in the past five years were all related to
> > software development:
> >
> > * When compiling large software projects, it's possible to run out of RAM
> > either when building lots of files in parallel, or when linking
> > * GNOME Builder runs ctags, and ctags likes to use dozens of GB of RAM to
> > index large software projects. I think it sometimes gets into a loop where
> > it just allocates more and more RAM until the desktop dies
> >
> > ___
> > devel mailing list -- devel@lists.fedoraproject.org
> > To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> > Fedora Code of Conduct: 
> > https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> > List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> > List Archives: 
> > https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
> ___
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct: 
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Peter Hutterer
On Sat, Jan 04, 2020 at 12:15:20PM -0600, Michael Catanzaro wrote:
> Let's keep this desktop-focused, since the proposal does not affect Server
> edition.
> 
> On Sat, Jan 4, 2020 at 12:48 pm, drago01  wrote:
> > As for the desktop case the running web browers in a cgroup to keep them
> > in check would solve most real world problems - other common desktop
> > apps don't use enough memory to cause such issues (unless your system is
> > really memory constrained but then the "buy more memory" solution is the
> > better fix).
> 
> The last time I saw my desktop hang due to a web browser using too much
> memory was 2015.

just FTR, this happens relatively frequently for me. Some websites seem to
cause Firefox to swap itself into nirvana. Sometimes within a short time
but sometimes it takes longer. I've come back from a lunch break a few times
to a desktop swapping itself to death.

Not yet fully identified but it does happen.

Cheers,
   Peter

> 
> The freezes I've encountered in the past five years were all related to
> software development:
> 
> * When compiling large software projects, it's possible to run out of RAM
> either when building lots of files in parallel, or when linking
> * GNOME Builder runs ctags, and ctags likes to use dozens of GB of RAM to
> index large software projects. I think it sometimes gets into a loop where
> it just allocates more and more RAM until the desktop dies
> 
> ___
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct: 
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Mark Otaris
> For now, kernel developers have made it clear they do not care about
> user space responsiveness. At all. Their concern with kernel
> oom-killer is strictly with keeping the kernel functioning.

This is false. The stated purpose of the OOM killer is not only to keep the
kernel alive. Nor does the fact the kernel has not solved userspace
responsiveness yet imply that kernel folks do not care. Rather, it means that
they will not solve it on their own because the kernel does not have all the
information it needs. Kernel folks do care, or we wouldn’t have PSI or cgroups.
A userspace solution is needed, but does not need to replace the OOM killer;
cgroups are also a userspace solution. If earlyoom breaks them, it can make
things worse than the status quo.

> Can it be done with cgroupv2 and PSI alone? Unclear.

Of course it can. Just run 100 instances of every stress-ng memory worker in
a podman container with a cgroup memory limit. The system will not hang. Do
the same without the memory limit. The system will hang within seconds and never
recover. Thus demonstrating that cgroups work and do the things they were
intended to do.

Try it. With a memory limit,

podman run --rm -it --memory=1G fedora bash -c 'dnf install -y stress-ng && 
stress-ng --malloc 100 --memcpy 100 --mmap 100 --vm 100'

will use CPU but keep your system responsive. Without the memory limit
(this will hang your system),

podman run --rm -it fedora bash -c 'dnf install -y stress-ng && stress-ng 
--malloc 100 --memcpy 100 --mmap 100 --vm 100'

the system hangs and doesn’t recover after 15 minutes. Same thing
with `tail /dev/zero`:

podman run --rm -it --memory=1G fedora tail /dev/zero

activates the OOM killer after three seconds, with

kernel: Memory cgroup out of memory: Killed process 8814 (tail) 
total-vm:3141408kB, anon-rss:1042028kB, file-rss:4kB, shmem-rss:0kB, UID:1000 
pgtables:6336512kB oom_score_adj:0
systemd[943]: 
libpod-e061e1cb57dde204632531a556d37efbd51c9ab67346a8bc4d5e26c7301c165b.scope: 
A process of this unit has been killed by the OOM killer.kernel: oom_reaper: 
reaped process 8814 (tail), now anon-rss:0kB, file-rss:0kB, shmem-rss:0kB

logged in the system journal. You were saying the OOM killer activates too late
and rarely kills the right process? Well, here it activates early enough and
knows exactly what to stop. It is worth trying with ninja and WebKit too.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Artem Tim
Seems like this bug:

**Kills multiple processes at once**
https://github.com/rfjakob/earlyoom/issues/121

but according to github it's fixed now. 
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Michael Catanzaro
On Mon, Jan 6, 2020 at 7:09 pm, Lennart Poettering 
 wrote:

- facebook is working on making oomd something that just works for
  everyone, they are in the final rounds of canonicalizing the
  configuration so that it can just work for all workloads without
  tuning. The last bits for this to be deployable are currently being
  done on the kernel side ("iocost"), when that's in, they'll submit
  oomd (or simplified parts of it) to systemd, so that it's just there
  and works. It's their expressive intention to make this something
  that also works for desktop stuff and requires no further
  tuning. they also will do the systemd work necessary. time frame:
  half a year, maybe one year, but no guarantees.


Asking around, I understand oomd only operates at the cgroup level, 
i.e. it kills an entire cgroup at once, not individual processes. So I 
understand this would also depend on GNOME-level work to ensure 
individual applications get launched in their own systemd scopes, yes?


Michael

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Chris Murphy
On Mon, Jan 6, 2020 at 1:14 PM Robbie Harwood  wrote:
>
> Chris Murphy  writes:

> > As for swap size options including no swap, and maybe swap-on-ZRAM:
> > https://pagure.io/fedora-workstation/issue/120
> > https://bugzilla.redhat.com/show_bug.cgi?id=1731978
> >
> > There are all kinds of useful and necessary discussions to have there
> > (rather than here).
>
> The links are appreciated; I was not aware of these discussions and will
> follow them.  However, since we're discussing behavior of the system
> under heavy load, I think how we handle swap (the thing that makes it
> slow down when you're low on memory...) is extremely relevant.

It's perhaps the most relevant thing, it's what's to be avoided
because it causes the responsiveness problem in the first place. I
just meant in terms of this feature proposal, there are no swap
changes. And what to change is elsewhere, and elsewhen. :D


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Robbie Harwood
Chris Murphy  writes:

> Robbie Harwood  wrote:
>> "John M. Harris Jr"  writes:
>>> On Friday, January 3, 2020 1:51:00 PM MST Robbie Harwood wrote:
 Robbie Harwood  writes:
> Ben Cotton  writes:
>
>> https://fedoraproject.org/wiki/Changes/EnableEarlyoom
>>
>> == Summary ==
>> Install earlyoom package, and enable it by default. This will cause
>> the kernel oomkiller to trigger sooner, but will not affect which
>> process it chooses to kill off. The idea is to recover from out of
>> memory situations sooner, rather than the typical complete system hang
>> in which the user has no other choice but to force power off.
>>
>> # enable earlyoom by default on workstation
>> enable earlyoom.service
>> 
>
> The OOM killer is a kernel function.  I have no opinion on this proposal
> as it stands, but I would like it to include an explanation of why this
> requires a service in userspace to fix.

 Another thought.  Wouldn't some of the pain here be alleviated by
 setting vm.swappiness=0?  Currently it seems to be 60, which results
 in somewhat aggressive swap use; 1 seems better (minimal swapping
 without disabling), while 0 will disable it for general use (while
 preserving it for hibernation).  This would at least improve the disk
 thrashing during OOM situations.
>>>
>>> To clarify, according to the Workstation group, hibernation isn't even
>>> supported.
>>
>> If that's true - and I don't know how I'd check it, so I didn't - we
>> should revisit enabling swap in the default install, and *definitely*
>> should remove the warning for not having it from anaconda.
>
> It's not correct that the Workstation working group doesn't want to
> see it supported, it's a question of whether and to what degree it can
> be supported, and making sure users have expectations proper set. I
> wouldn't want users thinking it'll work by advertising that it does,
> and then it eats their data.

I think enabling it by default very strongly suggests it's supported,
regardless of what the intentions are.  I have no quarrel with the
kernel team in either direction they wish to decide (supported or non),
but if it's non-supported, it shouldn't look like it's supported.

> As for swap size options including no swap, and maybe swap-on-ZRAM:
> https://pagure.io/fedora-workstation/issue/120
> https://bugzilla.redhat.com/show_bug.cgi?id=1731978
>
> There are all kinds of useful and necessary discussions to have there
> (rather than here).

The links are appreciated; I was not aware of these discussions and will
follow them.  However, since we're discussing behavior of the system
under heavy load, I think how we handle swap (the thing that makes it
slow down when you're low on memory...) is extremely relevant.

Thanks,
--Robbie


signature.asc
Description: PGP signature
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Chris Murphy
On Mon, Jan 6, 2020 at 12:11 PM Robbie Harwood  wrote:
>
> "John M. Harris Jr"  writes:
>
> > On Friday, January 3, 2020 1:51:00 PM MST Robbie Harwood wrote:
> >> Robbie Harwood  writes:
> >>> Ben Cotton  writes:
> >>>
>  https://fedoraproject.org/wiki/Changes/EnableEarlyoom
> 
>  == Summary ==
>  Install earlyoom package, and enable it by default. This will cause
>  the kernel oomkiller to trigger sooner, but will not affect which
>  process it chooses to kill off. The idea is to recover from out of
>  memory situations sooner, rather than the typical complete system hang
>  in which the user has no other choice but to force power off.
> 
>  # enable earlyoom by default on workstation
>  enable earlyoom.service
>  
> >>>
> >>> The OOM killer is a kernel function.  I have no opinion on this proposal
> >>> as it stands, but I would like it to include an explanation of why this
> >>> requires a service in userspace to fix.
> >>
> >> Another thought.  Wouldn't some of the pain here be alleviated by
> >> setting vm.swappiness=0?  Currently it seems to be 60, which results
> >> in somewhat aggressive swap use; 1 seems better (minimal swapping
> >> without disabling), while 0 will disable it for general use (while
> >> preserving it for hibernation).  This would at least improve the disk
> >> thrashing during OOM situations.
> >
> > To clarify, according to the Workstation group, hibernation isn't even
> > supported.
>
> If that's true - and I don't know how I'd check it, so I didn't - we
> should revisit enabling swap in the default install, and *definitely*
> should remove the warning for not having it from anaconda.

It's not correct that the Workstation working group doesn't want to
see it supported, it's a question of whether and to what degree it can
be supported, and making sure users have expectations proper set. I
wouldn't want users thinking it'll work by advertising that it does,
and then it eats their data.

Does the hardware support it? Does the hardware properly advertise
what it does support? What mechanisms are needed in the kernel and
systemd to support it, and what to do when there are bugs that break
it? It's not practical for the Fedora kernel team to become
responsible for supporting it when it breaks, nor is it practical to
block the release on such bugs. The most recent topic I found on this:

Disabling kernel's hibernate support by default, allow re-enabling it
with a kernel cmdline option
https://lists.fedoraproject.org/archives/list/ker...@lists.fedoraproject.org/message/TLTA6HAYJWQYHV3ZHFXUIXM4IJVWBEJJ/

As for swap size options including no swap, and maybe swap-on-ZRAM:
https://pagure.io/fedora-workstation/issue/120
https://bugzilla.redhat.com/show_bug.cgi?id=1731978

There are all kinds of useful and necessary discussions to have there
(rather than here).


--
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Roberto Ragusa

On 2020-01-06 18:31, Kamil Paral wrote:


FWIW, the behavior on Android is very close to what is proposed here. If your 
application exceeds the amount of available memory, it simply closes right in 
front of your eyes. No explanation, nothing, it's just gone (might be different 
on latest Android versions). The same thing would happen with EarlyOOM - some 
application would disappear.



The analogy is not completely fair.
On Android applications are designed to be Started and Stopped by the system, 
and they are supposed to save
their entire state so that when restarted nothing has apparently happened, from 
the point of view of the user.
(many applications are badly written, but that's another story...)
And we are talking about background applications, on a system where only one 
application is in foreground
(only very recently you can have two applications in foreground).
Finally, it is the applications that are stopped (by asking them nicely trough 
an event), not general system
processes; Android would never kill a wpa_supplicant process, for example.

Android has a concept of "cache" of background applications, they are there, if 
possible,  just to have them
back very quickly; it is similar to how Linux keeps dirty disk content in RAM 
and pushes it to disk
when RAM must be freed.

Regards.

--
   Roberto Ragusamail at robertoragusa.it
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Michael Catanzaro
On Mon, Jan 6, 2020 at 11:53 am, Chris Murphy  
wrote:

And yes the idea is to go a little faster. Earlyoom is easy to take
out. And I have no problem with it coming out in fc33 if oomd or (more
likely) lmm are ready by then.


Brainstorming: if a systemd-level solution were to be ready in the F33 
or F34 timeframe, I'd be OK with just waiting for that. We've had 31 
Fedora releases without earlyoom and one or two more isn't the end of 
the world. Seems easier than installing earlyoom on everybody's 
computers and then calling "backsies!" a year later.


Of course we would need to monitor progress at the systemd level to 
make sure this solution is advancing as desired, and fall back to plans 
for earlyoom if things get off track.


Michael

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Robbie Harwood
"John M. Harris Jr"  writes:

> On Friday, January 3, 2020 1:51:00 PM MST Robbie Harwood wrote:
>> Robbie Harwood  writes:
>>> Ben Cotton  writes:
>>>
 https://fedoraproject.org/wiki/Changes/EnableEarlyoom
 
 == Summary ==
 Install earlyoom package, and enable it by default. This will cause
 the kernel oomkiller to trigger sooner, but will not affect which
 process it chooses to kill off. The idea is to recover from out of
 memory situations sooner, rather than the typical complete system hang
 in which the user has no other choice but to force power off.
 
 # enable earlyoom by default on workstation
 enable earlyoom.service
 
>>> 
>>> The OOM killer is a kernel function.  I have no opinion on this proposal
>>> as it stands, but I would like it to include an explanation of why this
>>> requires a service in userspace to fix.
>> 
>> Another thought.  Wouldn't some of the pain here be alleviated by
>> setting vm.swappiness=0?  Currently it seems to be 60, which results
>> in somewhat aggressive swap use; 1 seems better (minimal swapping
>> without disabling), while 0 will disable it for general use (while
>> preserving it for hibernation).  This would at least improve the disk
>> thrashing during OOM situations.
>
> To clarify, according to the Workstation group, hibernation isn't even 
> supported.

If that's true - and I don't know how I'd check it, so I didn't - we
should revisit enabling swap in the default install, and *definitely*
should remove the warning for not having it from anaconda.

Thanks,
--Robbie


signature.asc
Description: PGP signature
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Chris Murphy
On Mon, Jan 6, 2020 at 11:09 AM Lennart Poettering  wrote:
>
> On Mo, 06.01.20 11:22, Michael Catanzaro (mcatanz...@gnome.org) wrote:
>
> So I talked to Tejun Heo about this (kernel cgroups maintainer,
> working for facebook with the people who did the PSI stuff, kernel mm
> guy). Here's the gist:
>
> - earlyoom might be OK as short time stopgap if people really want to
>   hurry something, as long as it watches only swap depletion (which it
>   pretty much does already). But it should then also determine what
>   to kill taking the swap use into account and little else (which it
>   apparently does not). This doesn't make any sense to have though if
>   there is no swap.
>
> - Don't bother with the OOM score the kernel calculates for processes,
>   it doesn't take the swap use into account. That said, do take the
>   configurable OOM score *adjustment* into account, so that processes
>   which set that are respected, i.e. journald, udevd, and such. (or in
>   otherwords, ignore /proc/$PID/oom_score, but respect
>   /proc/PID/oom_score_adj).
>
> - going down to 100ms poll intervals is a bad idea, 1s is sufficient,
>   maybe higher.
>
> - facebook is working on making oomd something that just works for
>   everyone, they are in the final rounds of canonicalizing the
>   configuration so that it can just work for all workloads without
>   tuning. The last bits for this to be deployable are currently being
>   done on the kernel side ("iocost"), when that's in, they'll submit
>   oomd (or simplified parts of it) to systemd, so that it's just there
>   and works. It's their expressive intention to make this something
>   that also works for desktop stuff and requires no further
>   tuning. they also will do the systemd work necessary. time frame:
>   half a year, maybe one year, but no guarantees.
>
> - oomd currently polls some parameters in time intervals too,
>   still. They are working on getting rid of that too, so that
>   everything is event based via PSI. Given their own focus on servers
>   it's not a primary goal, but still a goal.
>
> Or in other words: oomd is the way to go in the long run, developed
> alongside the kernel features backing it. You can use it already if
> you like, but there are still too many knobs for generic
> deployment. earlyoom might be a valid temporary stopgap if you want to
> hurry this.
>
> (And now I hope I paraphrased everything he said more or less
> correctly...)

Thanks for all of that, and it's consistent with the research and
discussion the working group have done in the past 6 months on this
subject. What I can't estimate is whether oomd or lmm will be better
long term for Fedora  Workstation, or if there's an advantage of them
co-existing.

And yes the idea is to go a little faster. Earlyoom is easy to take
out. And I have no problem with it coming out in fc33 if oomd or (more
likely) lmm are ready by then.


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Kamil Paral
On Mon, Jan 6, 2020 at 7:10 PM Lennart Poettering 
wrote:

> - going down to 100ms poll intervals is a bad idea, 1s is sufficient,
>   maybe higher.
>

According to the project readme, the query interval is 100ms only if the
lack or free RAM starts to get severe. Otherwise the interval is claimed to
be longer. I haven't checked the code, though.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Lennart Poettering
On Mo, 06.01.20 11:22, Michael Catanzaro (mcatanz...@gnome.org) wrote:

So I talked to Tejun Heo about this (kernel cgroups maintainer,
working for facebook with the people who did the PSI stuff, kernel mm
guy). Here's the gist:

- earlyoom might be OK as short time stopgap if people really want to
  hurry something, as long as it watches only swap depletion (which it
  pretty much does already). But it should then also determine what
  to kill taking the swap use into account and little else (which it
  apparently does not). This doesn't make any sense to have though if
  there is no swap.

- Don't bother with the OOM score the kernel calculates for processes,
  it doesn't take the swap use into account. That said, do take the
  configurable OOM score *adjustment* into account, so that processes
  which set that are respected, i.e. journald, udevd, and such. (or in
  otherwords, ignore /proc/$PID/oom_score, but respect
  /proc/PID/oom_score_adj).

- going down to 100ms poll intervals is a bad idea, 1s is sufficient,
  maybe higher.

- facebook is working on making oomd something that just works for
  everyone, they are in the final rounds of canonicalizing the
  configuration so that it can just work for all workloads without
  tuning. The last bits for this to be deployable are currently being
  done on the kernel side ("iocost"), when that's in, they'll submit
  oomd (or simplified parts of it) to systemd, so that it's just there
  and works. It's their expressive intention to make this something
  that also works for desktop stuff and requires no further
  tuning. they also will do the systemd work necessary. time frame:
  half a year, maybe one year, but no guarantees.

- oomd currently polls some parameters in time intervals too,
  still. They are working on getting rid of that too, so that
  everything is event based via PSI. Given their own focus on servers
  it's not a primary goal, but still a goal.

Or in other words: oomd is the way to go in the long run, developed
alongside the kernel features backing it. You can use it already if
you like, but there are still too many knobs for generic
deployment. earlyoom might be a valid temporary stopgap if you want to
hurry this.

(And now I hope I paraphrased everything he said more or less
correctly...)

if you want to know more about fb's oomd:
https://cfp.all-systems-go.io/ASG2019/talk/DQX3DH/

(but before this will enter systemd it's gonna be dumbed down, i.e,
less configuration, more "just works")

Lennart

--
Lennart Poettering, Berlin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Kamil Paral
On Fri, Jan 3, 2020 at 8:20 PM Ben Cotton  wrote:

> https://fedoraproject.org/wiki/Changes/EnableEarlyoom
>
> == Summary ==
> Install earlyoom package, and enable it by default. This will cause
> the kernel oomkiller to trigger sooner, but will not affect which
> process it chooses to kill off. The idea is to recover from out of
> memory situations sooner, rather than the typical complete system hang
> in which the user has no other choice but to force power off.
>

I've read the whole thread (phew!) and I support the proposal. The user
experience is improved and I don't see any substantial disadvantages (power
management etc can hopefully be fine-tuned). Of course the code should be
well inspected by someone knowledgeable, if it's going to run with high
privileges. And if there are serious candidates with a better approach
(e.g. something from systemd), it might make sense to delay this and wait a
while. OTOH, if verifying the code and setting it up is not that much work,
those candidates can *replace* early-oom in the future, and no delay is
necessary. Overall +1 from me.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Kamil Paral
On Sun, Jan 5, 2020 at 12:43 PM Aleksandra Fedorova 
wrote:

> I wonder, how I as a user going to be informed about the
> earlyoom-event? I assume abrt will recognize the crash? Will it be
> easily visible from the abrt report that it was the OOM?
>
> The concern is: if we enable such a service, will we get large amount
> of vague bug reports from users who don't understand what has
> happened. Can we make it somehow easier to debug?
>

FWIW, the behavior on Android is very close to what is proposed here. If
your application exceeds the amount of available memory, it simply closes
right in front of your eyes. No explanation, nothing, it's just gone (might
be different on latest Android versions). The same thing would happen with
EarlyOOM - some application would disappear.

I agree it would be nice to inform the user before or at least after.
Windows can do it - they show a notification roughly saying "Your system is
running out of memory and some application might get closed". (At least
they used to in the old days, I haven't run out of memory for a long time,
and I don't know whether Windows 10 behaves the same way). But I think it
should not be a stopper for the proposal as it is. Even without the
notification the user experience is improved over the default behavior.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Michael Catanzaro
On Mon, Jan 6, 2020 at 5:47 pm, Lennart Poettering 
 wrote:

Yes, l-m-m is great. If we can deploy l-m-m today already, why isn't
it good enoug for earlyoom?



GMemoryMonitor is the GLib API that's implemented using 
low-memory-monitor's D-Bus API.


In practice, using it for OOM killing is not that great, though. We've 
rejected this approach because the OOM killing is causing serious 
problems and there are no plans to fix it: 
https://gitlab.freedesktop.org/hadess/low-memory-monitor/issues/8. 
Therefore most likely we'll use l-m-m only for advisory memory pressure 
notifications.


On Mon, Jan 6, 2020 at 5:47 pm, Lennart Poettering 
 wrote:

Sounds like someone needs to do their homework, if this is "unclear"?

I mean, you basically admit here that this isn't really figured out to
the end. Maybe let's give this a bit more time and figure things out a
bit more, instead of rushing earlyoom in?

Adopting something now, at a point we already clearly know that PSI is
how this should be done sounds very wrong to me.


I think it would absolutely be reasonable to defer from F32 -> F33 if 
we have concrete plans to use that delay to implement an OOM solution. 
E.g. if you or someone else wanted to throw together a systemd-level 
solution:



Sounds like something that is relatively easily implementable in
systemd though, in a much better way, i.e. hooked to PSI...



I mean, wouldn't this all be solved much nicer, much more future
proof, if someone would just do what l-m-m does as part of systemd
service management, i.e. let's say an option StopOnMemoryPressure=
that watches PSI and terminates services *cleanly* when needed,
i.e. goes through ExecStop= and such?

And you know what, PSI is precisely defined to be used for purposes
like this, we already have experience with it (see l-m-m) and a patch
adding this to systemd isn#t really that hard either...


So again, the problem with PSI so far is that we haven't gotten it to 
work well. If systemd can make it work well, that would be super 
lovely. Sounds like that would also avoid continuous wakeups, which 
would be very nice.


I don't think anybody would object to a systemd-level solution. If it's 
part of systemd, there would no longer be concerns about architecture 
or code quality, and it'd feel much less hackish. We would want to test 
it to ensure responsiveness is comparable to what earlyoom would offer, 
of course.


Michael

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Lennart Poettering
On Mo, 06.01.20 10:09, Michael Catanzaro (mcatanz...@gnome.org) wrote:

> Is there a way to check memory usage without periodic wakeups?

PSI. It measures latency though. Which is the right thing to measure
here... You can configure thresholds there and it wakes you up when
those are hit. Thus userspace doesn't have to poll at all...

> In WebKit we wake up every 5s to check memory usage if we saw low memory
> usage on the last wakeup, or every 1s if it was high, with a scale in
> between. Would be good to experiment with the timings and see how long we
> can get away with before the polling interval is too large to prevent system
> lockups. (The WebKit timings are designed for cache clearing, not for
> maintaining system responsiveness, so I wouldn't trust those.)

Watch things with PSI.

> > 2. New code using system() in the year 2020? Really?
> >
> > 3. Fixed size buffers and implicit, undetected, truncation of strings
> >at various places (for example, when formatting the shell string to
> >pass to system()).
>
> Thanks. The code review is much appreciated. If we're going to be running a
> superuser deamon, then we need to be confident that it doesn't do these
> dangerous things. And these choices do raise quality concerns about what
> might be lurking in the rest of the code, as well.

BTW, this should not be a root daemon anyway. It only needs one cap:
CAP_SYS_KILL. Hence, drop privs to some user of its own, and keep that
one cap. Use AmbientCapabilities= in the unit file.

> My understanding is that experiments with PSI have indicated that it's hard
> to make it work well in practice. Alexey (hakavlad) has investigated this
> topic extensively, and his conclusion was:
>
> "PSI-based process killing should not be used by default, because this topic
> is still poorly understood and we don’t know what thresholds are desirable
> for most users: it’s hard to find good default values."

If things are poorly understood, then understand them better... Don't
just adopt some stuff that isn't much better understood either...

> > But even if we'd ignore that in order fight latencies one should watch
> > latencies: OOM killing per process is just not appropriate on a
> > systemd system: all our system services (and a good chunk of our user
> > services too) are sorted neatly into cgroups, and we really should
> > kill them as a whole and not just individual processes inside
> > them. systemd manages that today, and makes exceptions configurable
> > via OOMPolicy=, and with your earlyoom stuff you break that.
> >
> > This looks like second guessing the kernel memory management folks at
> > a place where one can only lose, and at the time breaking correct OOM
> > reporting by the kernel via cgroups and stuff.
>
> I think it's very clear at this point that this is extremely unlikely to be
> fixed at the kernel level. If that changes, great, but in the meantime we
> need a userspace solution to prevent Fedora from locking up. The Workstation
> WG doesn't have much (any?) kernel development experience, and we're aware
> that historical discussions on fixing this issue at the kernel level have
> concluded negatively, so we're limiting our interest to userspace
> solutions.

Well, it's not that the kernel folks wouldn't provide you with some
tools to improve the situation, see PSI...

> > Also: what precisely is this even supposed to do? Replace the
> > algorithm for detecting *when* to go on a kill rampage? Or actually
> > replace the algorithm selecting *what* to kill during a kill rampage?
>
> earlyoom is restricted to the former, although in the future we might be
> interested in doing the later as well, either by enhancing earlyoom or
> switching to another tool, e.g. to prevent sshd or journald from being
> killed.

These services should set the OOMScoreAdjust= value to something
sensible. journald and udevd do that. maybe ssh should do too... (it's
a bit harder for ssh, since it needs to undo the setting for its
sessions again, since oom scores are propagated down the process tree)

> > If it's the former (which the name of the project suggests,
> > _early_oom)), then at the most basic the tool should let the kernel do
> > the killing, i.e. "echo f > /proc/sysrq-trigger". That way the
> > reporting via cgroups isn't fucked, and systemd can still do its
> > thing, and the kernel can kill per cgroup rather than per process...
>
> Problem is that letting the kernel do the work can cause data loss. earlyoom
> needs to handle process termination itself so that it can send SIGTERM
> first, instead of jumping straight to SIGKILL and corrupting who knows what.

Well, then tell systemd to do it for you... Use the D-Bus call
GetUnitByPID() and then issue StopUnit().

Lennart

--
Lennart Poettering, Berlin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Lennart Poettering
On Mo, 06.01.20 17:47, Lennart Poettering (mzerq...@0pointer.de) wrote:

> On Mo, 06.01.20 08:51, Chris Murphy (li...@colorremedies.com) wrote:
>
> > On Mon, Jan 6, 2020 at 3:08 AM Lennart Poettering  
> > wrote:
> > >>
> > > Looking at the sources very superficially I see a couple of problems:
> > >
> > > 1. Waking up all the time in 100ms intervals? We generally try to
> > >avoid waking the CPU up all the time if nothing happens. Saving
> > >power and things.
> >
> > I agree. What do you think is a reasonable interval? Given that
> > earlyoom won't SIGTERM until both 10% memory free and 10% swap free,
> > and that will take at least some seconds, what about an interval of 3
> > seconds?
>
> None. Use PSI. It wakes you up only when pressure stalls reach
> threshold you declare. Which basically means you never steal the CPUs
> on an idle system, you never cause a wakeup whatsoever.
>
> > > But more importantly: are we sure this actually operates the way we
> > > should? i.e. PSI is really what should be watched. It is not
> > > interesting who uses how much memory and triggering kills on
> > > that. What matters is to detect when the system becomes slow due to
> > > that, i.e. *latencies* introduced due to memory pressure and that's
> > > what PSI is about, and hence what should be used.
> >
> > Earlyoom is a short term stop gap while a more sophisticated solution
> > is still maturing. That being low-memory-monitor, which does leverage
> > PSI.
>
> Yes, l-m-m is great. If we can deploy l-m-m today already, why isn't
> it good enoug for earlyoom?

Oops, sorry. I mean GMemoryMonitor. I assumed l-m-m and GMemoryMonitor
was the same thing, but they aren't. I am not sure about l-m-m,
haven't looked at it in detail.

  GMemoryMonitor = great
  l-m-m = no idea

Lennart

--
Lennart Poettering, Berlin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Lennart Poettering
On Mo, 06.01.20 08:51, Chris Murphy (li...@colorremedies.com) wrote:

> On Mon, Jan 6, 2020 at 3:08 AM Lennart Poettering  
> wrote:
> >>
> > Looking at the sources very superficially I see a couple of problems:
> >
> > 1. Waking up all the time in 100ms intervals? We generally try to
> >avoid waking the CPU up all the time if nothing happens. Saving
> >power and things.
>
> I agree. What do you think is a reasonable interval? Given that
> earlyoom won't SIGTERM until both 10% memory free and 10% swap free,
> and that will take at least some seconds, what about an interval of 3
> seconds?

None. Use PSI. It wakes you up only when pressure stalls reach
threshold you declare. Which basically means you never steal the CPUs
on an idle system, you never cause a wakeup whatsoever.

> > But more importantly: are we sure this actually operates the way we
> > should? i.e. PSI is really what should be watched. It is not
> > interesting who uses how much memory and triggering kills on
> > that. What matters is to detect when the system becomes slow due to
> > that, i.e. *latencies* introduced due to memory pressure and that's
> > what PSI is about, and hence what should be used.
>
> Earlyoom is a short term stop gap while a more sophisticated solution
> is still maturing. That being low-memory-monitor, which does leverage
> PSI.

Yes, l-m-m is great. If we can deploy l-m-m today already, why isn't
it good enoug for earlyoom?

> > But even if we'd ignore that in order fight latencies one should watch
> > latencies: OOM killing per process is just not appropriate on a
> > systemd system: all our system services (and a good chunk of our user
> > services too) are sorted neatly into cgroups, and we really should
> > kill them as a whole and not just individual processes inside
> > them. systemd manages that today, and makes exceptions configurable
> > via OOMPolicy=, and with your earlyoom stuff you break that.
>
> OOMPolicy= depends on the kernel oom-killer, which is extremely
> reluctant to trigger at all. Consistently in my testing, the vast
> majority of the time, kernel oom-killer takes > 30 minutes to trigger.
> And it may not even kill the worst offender, but rather something like
> sshd. A couple of times, I've seen it kill systemd-journald. That's
> not a small problem.

Well, that sounds as if OOMScoreAdjust= of these services should be
tweaked. In journald we us OOMScoreAdjust=-250 and in udevd
OOMScoreAdjust=-1000.

If journald is still killed too likely, we can certainly bump it to
-900 or so, please file a bug.

> earlyoom first sends SIGTERM. It's not different from the user saying,
> enough of this, let's just gracefully quit the offending process. Only
> if the problem continues to get worse is SIGKILL sent.

This sounds as if you want low-memory-monitor, but for all services,
right?

Sounds like something that is relatively easily implementable in
systemd though, in a much better way, i.e. hooked to PSI...

> For now, kernel developers have made it clear they do not care about
> user space responsiveness. At all. Their concern with kernel

References to this? I mean, the kernel developers are not a single
person, they tend to have different opinions...

> > Also: what precisely is this even supposed to do? Replace the
> > algorithm for detecting *when* to go on a kill rampage? Or actually
> > replace the algorithm selecting *what* to kill during a kill rampage?
>
> a. It's never a kill rampage.

it calls kill(), which I call a "kill rampage"...

> It isn't replacing anything. It's acting as a user advocate by
> approximating what a reasonable user would do, SIGTERM. The user can't
> do this themselves because during heavy swap system responsivity is
> already lost, before we're even close to OOM.
>
> You're right, someone should absolutely solve the responsivity
> problem. Kernel folks have clearly ceded this. Can it be done with
> cgroupv2 and PSI alone? Unclear.

Sounds like someone needs to do their homework, if this is "unclear"?

I mean, you basically admit here that this isn't really figured out to
the end. Maybe let's give this a bit more time and figure things out a
bit more, instead of rushing earlyoom in?

Adopting something now, at a point we already clearly know that PSI is
how this should be done sounds very wrong to me.

> That would be a killing rampage. sysrq+f issues SIGKILL and definitely
> results in data loss, always. Earlyoom uses SIGTERM as a first step,
> which is a much more conservative first attempt.

But it sends SIGKILL next? Why? Why not sysrq+f trggred from userspace
for that?

I must say the idea that there are effectively multiple process
babysitters now, which both want to decide when to terminate services
sounds very wrong to me...

I mean, wouldn't this all be solved much nicer, much more future
proof, if someone would just do what l-m-m does as part of systemd
service management, i.e. let's say an option StopOnMemoryPressure=
that watches PSI and terminates 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Michael Catanzaro


On Mon, Jan 6, 2020 at 11:07 am, Lennart Poettering 
 wrote:

Hmm, are we sure this is something we want to have in the default
install? Is the code really good enough for that?

Looking at the sources very superficially I see a couple of problems:

1. Waking up all the time in 100ms intervals? We generally try to
   avoid waking the CPU up all the time if nothing happens. Saving
   power and things.


Is there a way to check memory usage without periodic wakeups?

In WebKit we wake up every 5s to check memory usage if we saw low 
memory usage on the last wakeup, or every 1s if it was high, with a 
scale in between. Would be good to experiment with the timings and see 
how long we can get away with before the polling interval is too large 
to prevent system lockups. (The WebKit timings are designed for cache 
clearing, not for maintaining system responsiveness, so I wouldn't 
trust those.)



2. New code using system() in the year 2020? Really?

3. Fixed size buffers and implicit, undetected, truncation of strings
   at various places (for example, when formatting the shell string to
   pass to system()).


Thanks. The code review is much appreciated. If we're going to be 
running a superuser deamon, then we need to be confident that it 
doesn't do these dangerous things. And these choices do raise quality 
concerns about what might be lurking in the rest of the code, as well.



But more importantly: are we sure this actually operates the way we
should? i.e. PSI is really what should be watched. It is not
interesting who uses how much memory and triggering kills on
that. What matters is to detect when the system becomes slow due to
that, i.e. *latencies* introduced due to memory pressure and that's
what PSI is about, and hence what should be used.


My understanding is that experiments with PSI have indicated that it's 
hard to make it work well in practice. Alexey (hakavlad) has 
investigated this topic extensively, and his conclusion was:


"PSI-based process killing should not be used by default, because this 
topic is still poorly understood and we don’t know what thresholds 
are desirable for most users: it’s hard to find good default values."


https://pagure.io/fedora-workstation/issue/98#comment-615425

Details at: https://github.com/rfjakob/earlyoom/issues/100

So: already considered, but rejected for now.


But even if we'd ignore that in order fight latencies one should watch
latencies: OOM killing per process is just not appropriate on a
systemd system: all our system services (and a good chunk of our user
services too) are sorted neatly into cgroups, and we really should
kill them as a whole and not just individual processes inside
them. systemd manages that today, and makes exceptions configurable
via OOMPolicy=, and with your earlyoom stuff you break that.

This looks like second guessing the kernel memory management folks at
a place where one can only lose, and at the time breaking correct OOM
reporting by the kernel via cgroups and stuff.


I think it's very clear at this point that this is extremely unlikely 
to be fixed at the kernel level. If that changes, great, but in the 
meantime we need a userspace solution to prevent Fedora from locking 
up. The Workstation WG doesn't have much (any?) kernel development 
experience, and we're aware that historical discussions on fixing this 
issue at the kernel level have concluded negatively, so we're limiting 
our interest to userspace solutions.


I think everybody would be happy to hold off on userspace solutions if 
a kernel solution is in the works. I'd love to see kernel devs 
acknowledge the issue, using the same test cases that we're using 
(either 'ninja build' WebKit or simply 'tail /dev/zero'), and propose a 
real concrete solution. But I'm not going to hold my breath for that. 
My understanding is that previous discussions have concluded that the 
kernel OOM is designed to ensure enough memory remains available to the 
kernel, and that userspace is responsible for determining how to keep 
userspace responsive.



Also: what precisely is this even supposed to do? Replace the
algorithm for detecting *when* to go on a kill rampage? Or actually
replace the algorithm selecting *what* to kill during a kill rampage?


earlyoom is restricted to the former, although in the future we might 
be interested in doing the later as well, either by enhancing earlyoom 
or switching to another tool, e.g. to prevent sshd or journald from 
being killed.



If it's the former (which the name of the project suggests,
_early_oom)), then at the most basic the tool should let the kernel do
the killing, i.e. "echo f > /proc/sysrq-trigger". That way the
reporting via cgroups isn't fucked, and systemd can still do its
thing, and the kernel can kill per cgroup rather than per process...


Problem is that letting the kernel do the work can cause data loss. 
earlyoom needs to handle process termination itself so that it can send 
SIGTERM first, instead of 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Chris Murphy
On Mon, Jan 6, 2020 at 4:57 AM Roberto Ragusa  wrote:
>
> On 1/5/20 12:38 AM, Chris Murphy wrote:
> > On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova  
> > wrote:
> >
> >> Since in the Change we are not introducing just the earlyoom tool but 
> >> enable it with a specific profile I would add those details here. Smth 
> >> like:
> >>
> >> "earlyoom service will choose the offending process based on the same 
> >> oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM 
> >> left, and SIGKILL on 5%"
> >
> > I add this information to the summary. Also, I think these numbers may
> > need to change to avoid prematurely sending SIGTERM when the system
> > has no swap device.
> I read that sentence in a different way:
> "earlyoom will make only 90% of your RAM available,
> so it is effectively using 10% of your RAM".
>
> On my 32GB laptop that means 3.2GB of RAM gets unusable.
> And on my 64GB machine I'm being robbed of 6.4GB. Wow.
>
> How low can these numbers be pushed? Even 3% would be 1GB out of 32GB.

What you say is only true in the case of systems with no swap. That's
mentioned in the proposal. If swap is being used, for sure essentially
all of your RAM is being used, so it's swap that's the determining
factor. If you don't have swap, yes RAM becomes the determining factor
and I agree that on systems with a lot of RAM, 10% is too high.

The ideal scenario is to not run earlyoom at all on systems that do
not have a swap device. They're not going to run into the responsivity
problem anyway, which is a direct consequence of heavy swapping.

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Chris Murphy
On Mon, Jan 6, 2020 at 3:08 AM Lennart Poettering  wrote:
>>
> Looking at the sources very superficially I see a couple of problems:
>
> 1. Waking up all the time in 100ms intervals? We generally try to
>avoid waking the CPU up all the time if nothing happens. Saving
>power and things.

I agree. What do you think is a reasonable interval? Given that
earlyoom won't SIGTERM until both 10% memory free and 10% swap free,
and that will take at least some seconds, what about an interval of 3
seconds?


> But more importantly: are we sure this actually operates the way we
> should? i.e. PSI is really what should be watched. It is not
> interesting who uses how much memory and triggering kills on
> that. What matters is to detect when the system becomes slow due to
> that, i.e. *latencies* introduced due to memory pressure and that's
> what PSI is about, and hence what should be used.

Earlyoom is a short term stop gap while a more sophisticated solution
is still maturing. That being low-memory-monitor, which does leverage
PSI.

> But even if we'd ignore that in order fight latencies one should watch
> latencies: OOM killing per process is just not appropriate on a
> systemd system: all our system services (and a good chunk of our user
> services too) are sorted neatly into cgroups, and we really should
> kill them as a whole and not just individual processes inside
> them. systemd manages that today, and makes exceptions configurable
> via OOMPolicy=, and with your earlyoom stuff you break that.

OOMPolicy= depends on the kernel oom-killer, which is extremely
reluctant to trigger at all. Consistently in my testing, the vast
majority of the time, kernel oom-killer takes > 30 minutes to trigger.
And it may not even kill the worst offender, but rather something like
sshd. A couple of times, I've seen it kill systemd-journald. That's
not a small problem.

earlyoom first sends SIGTERM. It's not different from the user saying,
enough of this, let's just gracefully quit the offending process. Only
if the problem continues to get worse is SIGKILL sent.


> This looks like second guessing the kernel memory management folks at
> a place where one can only lose, and at the time breaking correct OOM
> reporting by the kernel via cgroups and stuff.

It is intended to be a substitute for the user hitting the power
button. It's not intended as a substitute for the OS, as a whole,
improving its user advocacy to do the right thing in the first place,
which currently it isn't.

For now, kernel developers have made it clear they do not care about
user space responsiveness. At all. Their concern with kernel
oom-killer is strictly with keeping the kernel functioning. And the
congestion that results from heavy simultaneous page-in and page-out
also appears to not be a concern for kernel developers, it's a well
known problem, and they haven't made any break through in this area.

So it's really going to need to be user space managed, leveraging PSI
and cgroupv2. And that's the next step.


> Also: what precisely is this even supposed to do? Replace the
> algorithm for detecting *when* to go on a kill rampage? Or actually
> replace the algorithm selecting *what* to kill during a kill rampage?

a. It's never a kill rampage.
b. When: It first uses SIGTERM at 10% remaining for both memory and
swap; and SIGKILL at 5%.
In hundreds of tests I've never seen earlyoom use SIGKILL, so far
everything responds fairly immediately to SIGTERM. But I'm also
testing with well behaved programs, nothing malicious. And that's
intentional. This problem is actually far worse if it were malicious.
c. What: Same as kernel oom-killer. It uses oom_score.

It isn't replacing anything. It's acting as a user advocate by
approximating what a reasonable user would do, SIGTERM. The user can't
do this themselves because during heavy swap system responsivity is
already lost, before we're even close to OOM.

You're right, someone should absolutely solve the responsivity
problem. Kernel folks have clearly ceded this. Can it be done with
cgroupv2 and PSI alone? Unclear.


> If it's the former (which the name of the project suggests,
> _early_oom)), then at the most basic the tool should let the kernel do
> the killing, i.e. "echo f > /proc/sysrq-trigger". That way the
> reporting via cgroups isn't fucked, and systemd can still do its
> thing, and the kernel can kill per cgroup rather than per process...

That would be a killing rampage. sysrq+f issues SIGKILL and definitely
results in data loss, always. Earlyoom uses SIGTERM as a first step,
which is a much more conservative first attempt.

> Anyway, this all sounds very very fishy to me. Not thought to the end,
> and I am pretty sure this is something the kernel memory management
> folks should give a blessing to. Second guessing the kernel like that
> is just a bad idea if you ask me.

There's no first or second guessing. The kernel oom-killer is strictly
responsible for maintaining enough resources for the 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Matthias Clasen
On Mon, Jan 6, 2020 at 5:08 AM Lennart Poettering 
wrote:

>
> I mean, yes, the OOM killer might not be that great currently, but
> this sounds like something to fix in kernel land, and if that doesn't
> work out for some reason because kernel devs can't agree, then do it
> as fallback in userspace, but with sound input from the kernel folks,
> and the blessing of at least some of the kernel folks.
>
>
I agree that the implementation may need some work, but one thing should be
clear:
it is not a winning strategy to wait for the kernel folks to fix this. They
have for all practical
purposes given up on this problem, and are not going to solve the issue for
us.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Roberto Ragusa

On 1/5/20 12:38 AM, Chris Murphy wrote:

On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova  wrote:


Since in the Change we are not introducing just the earlyoom tool but enable it 
with a specific profile I would add those details here. Smth like:

"earlyoom service will choose the offending process based on the same oom_score as 
kernel uses. It will send a SIGTERM signal on 10% of RAM left, and SIGKILL on 5%"


I add this information to the summary. Also, I think these numbers may
need to change to avoid prematurely sending SIGTERM when the system
has no swap device.

I read that sentence in a different way:
"earlyoom will make only 90% of your RAM available,
so it is effectively using 10% of your RAM".

On my 32GB laptop that means 3.2GB of RAM gets unusable.
And on my 64GB machine I'm being robbed of 6.4GB. Wow.

How low can these numbers be pushed? Even 3% would be 1GB out of 32GB.

Regards.

--
   Roberto Ragusamail at robertoragusa.it
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-06 Thread Lennart Poettering
On Fr, 03.01.20 14:18, Ben Cotton (bcot...@redhat.com) wrote:

> https://fedoraproject.org/wiki/Changes/EnableEarlyoom
>
> == Summary ==
> Install earlyoom package, and enable it by default. This will cause
> the kernel oomkiller to trigger sooner, but will not affect which
> process it chooses to kill off. The idea is to recover from out of
> memory situations sooner, rather than the typical complete system hang
> in which the user has no other choice but to force power off.

Hmm, are we sure this is something we want to have in the default
install? Is the code really good enough for that?

Looking at the sources very superficially I see a couple of problems:

1. Waking up all the time in 100ms intervals? We generally try to
   avoid waking the CPU up all the time if nothing happens. Saving
   power and things.

2. New code using system() in the year 2020? Really?

3. Fixed size buffers and implicit, undetected, truncation of strings
   at various places (for example, when formatting the shell string to
   pass to system()).

But more importantly: are we sure this actually operates the way we
should? i.e. PSI is really what should be watched. It is not
interesting who uses how much memory and triggering kills on
that. What matters is to detect when the system becomes slow due to
that, i.e. *latencies* introduced due to memory pressure and that's
what PSI is about, and hence what should be used.

But even if we'd ignore that in order fight latencies one should watch
latencies: OOM killing per process is just not appropriate on a
systemd system: all our system services (and a good chunk of our user
services too) are sorted neatly into cgroups, and we really should
kill them as a whole and not just individual processes inside
them. systemd manages that today, and makes exceptions configurable
via OOMPolicy=, and with your earlyoom stuff you break that.

This looks like second guessing the kernel memory management folks at
a place where one can only lose, and at the time breaking correct OOM
reporting by the kernel via cgroups and stuff.

Also: what precisely is this even supposed to do? Replace the
algorithm for detecting *when* to go on a kill rampage? Or actually
replace the algorithm selecting *what* to kill during a kill rampage?

If it's the former (which the name of the project suggests,
_early_oom)), then at the most basic the tool should let the kernel do
the killing, i.e. "echo f > /proc/sysrq-trigger". That way the
reporting via cgroups isn't fucked, and systemd can still do its
thing, and the kernel can kill per cgroup rather than per process...

Anyway, this all sounds very very fishy to me. Not thought to the end,
and I am pretty sure this is something the kernel memory management
folks should give a blessing to. Second guessing the kernel like that
is just a bad idea if you ask me.

I mean, yes, the OOM killer might not be that great currently, but
this sounds like something to fix in kernel land, and if that doesn't
work out for some reason because kernel devs can't agree, then do it
as fallback in userspace, but with sound input from the kernel folks,
and the blessing of at least some of the kernel folks.

Lennart

--
Lennart Poettering, Berlin
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-05 Thread Chris Murphy
On Sun, Jan 5, 2020 at 4:43 AM Aleksandra Fedorova  wrote:
>
> I wonder, how I as a user going to be informed about the
> earlyoom-event?

Same as a kernel oom-killer event. Primary source is the journal.

But well before either earlyoom sends SIGTERM or kernel oom-killer
kills something, the user will know something is wrong, because system
responsivity will be stuttering or even already intermittently
hanging. Earlyoom is not aggressively clobbering things, except for
system configuration that have no swap device. That configuration
needs some earlyoom tweaking, probably, and we're looking at that, but
then those folks also aren't experiencing much reduced system
responsiveness in these cases because their system can't heavily swap.

>I assume abrt will recognize the crash? Will it be
> easily visible from the abrt report that it was the OOM?

No. It's not a crash. Earlyoom sends SIGTERM first, and only sends
SIGKILL if the process isn't responding in time to SIGTERM. And the
kernel oom-killer also doesn't result in an abrt report.

> The concern is: if we enable such a service, will we get large amount
> of vague bug reports from users who don't understand what has
> happened. Can we make it somehow easier to debug?

Unless further real world testing uncovers something very new and
different from my testing (entirely possible, but I can't estimate
that probability), there won't be a measurable increase in bug reports
related to this.

Based on my limited testing (I've done around 200+ tests of
oom-killer, earlyoom SIGTERM (never have seen a SIGKILL), and nohang;
and perhaps 80 of those tests involved forced power off during heavy
swap, compile and system use) there really isn't anything that
requires the user to get involved.

Also, there isn't a per se bug here. It's a series of intentional
designs that are colliding together in a deeply problematic user
experience for the desktop: that the "operating system", i.e. Fedora
Workstation providing kernel, systemd, a bunch of services, libraries,
policies - permits an unprivileged process to ask for essentially
unlimited resources and overcommit the system *and* then heavy swap
use results in compromised system responsiveness and control.

Earlyoom doesn't change any of that, it just selects a process for
SIGTERM much sooner than the kernel oom-killer. And that only stops
the bad experience, by stopping the resource hogging process. It isn't
actually fixing anything. It is in some sense an act of desperation,
that's been a long time coming. Arguably, earlyoom isn't aggressive
enough, doesn't stop the badness soon enough.


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-05 Thread Chris Murphy
On Sun, Jan 5, 2020 at 2:18 AM Zbigniew Jędrzejewski-Szmek
 wrote:
>
> On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
> > My understanding of systemd OOMPolicy= behavior, is it looks for the
> > kernel's oom-killer messages and acts upon those. Whereas earlyoom
> > uses the same metric (oom_score) as the oom-killer, it does not invoke
> > the oom-killer. Therefore systemd probably does not get the proper
> > hint to implement OOMPolicy=
>
> Yes. The kernel reports oom events in the cgroup file memory.events,
> and systemd waits for an inotify event on that file; OOMPolicy=stop is
> implemented that way. And the OOMPolicy=kill option is "implemented"
> by setting memory.oom.group=1 in the kernel [1] and having the kernel
> kill all the processes. So systemd is providing a thin wrapper around
> the kernel functionality.
>
> If processes are not killed by the kernel but through a signal from
> userspace, all of this will not work.

The gotcha on the desktop with kernel oom-killer, is that if it's
needed, it's way past too late. And it may never trigger.

The central problem to be solved isn't even what does OOM killing or
when: the ridiculously bad system responsiveness during heavy swap
usage.

My top criticism of the feature proposal is that it doesn't address
the responsivity problem head on. It just reduces the duration of
badness. And the reduction isn't near enough.

One thing that helps the heavy swap problem, today? A much smaller
swap partition. In fact, no swap partition alleviates the problem
entirely, but of course that has other consequences (that the working
group is discussing in #120).

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-05 Thread Zbigniew Jędrzejewski-Szmek
On Sun, Jan 05, 2020 at 12:29:40PM +0100, Aleksandra Fedorova wrote:
> On Sun, Jan 5, 2020 at 10:18 AM Zbigniew Jędrzejewski-Szmek
>  wrote:
> >
> > On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
> > > On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova  
> > > wrote:
> > >
> > > > Since in the Change we are not introducing just the earlyoom tool but 
> > > > enable it with a specific profile I would add those details here. Smth 
> > > > like:
> > > >
> > > > "earlyoom service will choose the offending process based on the same 
> > > > oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM 
> > > > left, and SIGKILL on 5%"
> > >
> > > I add this information to the summary. Also, I think these numbers may
> > > need to change to avoid prematurely sending SIGTERM when the system
> > > has no swap device.
> > >
> > > > As I understand in the current setup we are looking more for a 
> > > > controlled failure scenario rather than for a solution.
> > >
> > > Yes, it's fair to say this proposal is to make things "less bad". It
> > > doesn't improve system responsiveness. Once heavy swap starts, the
> > > system is sluggish, stutters, and briefly stalls. This proposal
> > > doesn't fix that. There is a lot of room for improvement.
> > >
> > >
> > > > Can we get a specific manual, what users supposed to do, once they 
> > > > trigger the earlyoom? Does earlyoom help in reporting? Which logs we 
> > > > need to look at?
> > > >
> > > > Maybe add a section in UX part of the change, or setup a dedicated wiki 
> > > > page?
> > >
> > > The user shouldn't need to do anything differently than if the kernel
> > > oom-killer had triggered. The system journal will contain messages
> > > showing what was killed and why:
> > >
> > > Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below
> > > SIGTERM limits: mem 10 %, swap 10 %
> > > Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process
> > > 27421 "chrome": badness 305, VmRSS 42 MiB
> > >
> > >
> > > > Additionally, there was a question during the chat discussion: how the 
> > > > earlyoom setup will work together with OOMPolicy and any other related 
> > > > options of systemd units? Will systemd recognize the OOM event?
> > >
> > > My understanding of systemd OOMPolicy= behavior, is it looks for the
> > > kernel's oom-killer messages and acts upon those. Whereas earlyoom
> > > uses the same metric (oom_score) as the oom-killer, it does not invoke
> > > the oom-killer. Therefore systemd probably does not get the proper
> > > hint to implement OOMPolicy=
> >
> > Yes. The kernel reports oom events in the cgroup file memory.events,
> > and systemd waits for an inotify event on that file; OOMPolicy=stop is
> > implemented that way. And the OOMPolicy=kill option is "implemented"
> > by setting memory.oom.group=1 in the kernel [1] and having the kernel
> > kill all the processes. So systemd is providing a thin wrapper around
> > the kernel functionality.
> >
> > If processes are not killed by the kernel but through a signal from
> > userspace, all of this will not work.
> 
> I grepped /usr/lib/systemd and /etc/systemd for "OOM" on my
> workstation and it seems that we have only OOMScoreAdjust option used
> in the installed systemd units. And this option will be respected by
> earlyoom.
> 
> Since on workstation we don't use tweaking of the OOMPolicy on the
> unit level, I'd say we can leave the tweaking to the system
> administrators: when there is need to adjust OOMPolicy of a service,
> administrators would need to tweak or disable earlyoom service as
> well.

Having "conflicts" between things, in the sense that using one feature
means that another feature needs to be disabled, is always an option.
But it's never a very good option. I think that it isn't too important
to keep OOMPolicy= working, since its a new and relatively unused thing.
Nevertheless, it would be nice to find a way to avoid this and
support both features at the same time. This thread 'til now is mostly
about establishing whether there really is a conflict (it seems yes)
and whether there is some easy way to avoid it (not sure yet...). I
think we should explore that before settling on the easy but suboptimal
answer.

> But I'd like to understand better the difference between _default_
> OOM-event and _default_ earlyoom-event:
> 
> Afaik DefaultOOMPolicy is set to "stop", which means if one of the
> processes in the service is killed by OOM, other processes from the
> same service are gracefully stopped by systemd.
> 
> What is the default behavior of the systemd service on external
> SIGTERM/SIGKILL signal sent to the process by earlyoom?

It depends on which of the processes is killed. If the main process
is killed with SIGTERM, systemd kill consider this a normal successful 
termination.
If the main process is killed with SIGKILL, systemd will consider this a 
failure.
(Both of those cases modified by SuccessExitStatus=.)
If some random subprocess 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-05 Thread Aleksandra Fedorova
On Sun, Jan 5, 2020 at 12:39 AM Chris Murphy  wrote:
>
> On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova  wrote:
>
> > Since in the Change we are not introducing just the earlyoom tool but 
> > enable it with a specific profile I would add those details here. Smth like:
> >
> > "earlyoom service will choose the offending process based on the same 
> > oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, 
> > and SIGKILL on 5%"
>
> I add this information to the summary. Also, I think these numbers may
> need to change to avoid prematurely sending SIGTERM when the system
> has no swap device.
>
> > As I understand in the current setup we are looking more for a controlled 
> > failure scenario rather than for a solution.
>
> Yes, it's fair to say this proposal is to make things "less bad". It
> doesn't improve system responsiveness. Once heavy swap starts, the
> system is sluggish, stutters, and briefly stalls. This proposal
> doesn't fix that. There is a lot of room for improvement.
>
>
> > Can we get a specific manual, what users supposed to do, once they trigger 
> > the earlyoom? Does earlyoom help in reporting? Which logs we need to look 
> > at?
> >
> > Maybe add a section in UX part of the change, or setup a dedicated wiki 
> > page?
>
> The user shouldn't need to do anything differently than if the kernel
> oom-killer had triggered. The system journal will contain messages
> showing what was killed and why:
>
> Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below
> SIGTERM limits: mem 10 %, swap 10 %
> Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process
> 27421 "chrome": badness 305, VmRSS 42 MiB
>

I wonder, how I as a user going to be informed about the
earlyoom-event? I assume abrt will recognize the crash? Will it be
easily visible from the abrt report that it was the OOM?

The concern is: if we enable such a service, will we get large amount
of vague bug reports from users who don't understand what has
happened. Can we make it somehow easier to debug?

> > Additionally, there was a question during the chat discussion: how the 
> > earlyoom setup will work together with OOMPolicy and any other related 
> > options of systemd units? Will systemd recognize the OOM event?
>
> My understanding of systemd OOMPolicy= behavior, is it looks for the
> kernel's oom-killer messages and acts upon those. Whereas earlyoom
> uses the same metric (oom_score) as the oom-killer, it does not invoke
> the oom-killer. Therefore systemd probably does not get the proper
> hint to implement OOMPolicy=
>
> Fedora need to discuss how big of a problem that is, if there's anyway
> to mitigate it, or tolerate it, weighing the pros of earlyoom for a
> short period, versus the cons of punting this problem for another
> release. This proposal does not intend to step on other superseding
> work in this area, but if it does, it'll be withdrawn.
>
>
> --
> Chris Murphy
> ___
> devel mailing list -- devel@lists.fedoraproject.org
> To unsubscribe send an email to devel-le...@lists.fedoraproject.org
> Fedora Code of Conduct: 
> https://docs.fedoraproject.org/en-US/project/code-of-conduct/
> List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
> List Archives: 
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org

-- 
Aleksandra Fedorova
bookwar
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-05 Thread Aleksandra Fedorova
On Sun, Jan 5, 2020 at 10:18 AM Zbigniew Jędrzejewski-Szmek
 wrote:
>
> On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
> > On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova  
> > wrote:
> >
> > > Since in the Change we are not introducing just the earlyoom tool but 
> > > enable it with a specific profile I would add those details here. Smth 
> > > like:
> > >
> > > "earlyoom service will choose the offending process based on the same 
> > > oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM 
> > > left, and SIGKILL on 5%"
> >
> > I add this information to the summary. Also, I think these numbers may
> > need to change to avoid prematurely sending SIGTERM when the system
> > has no swap device.
> >
> > > As I understand in the current setup we are looking more for a controlled 
> > > failure scenario rather than for a solution.
> >
> > Yes, it's fair to say this proposal is to make things "less bad". It
> > doesn't improve system responsiveness. Once heavy swap starts, the
> > system is sluggish, stutters, and briefly stalls. This proposal
> > doesn't fix that. There is a lot of room for improvement.
> >
> >
> > > Can we get a specific manual, what users supposed to do, once they 
> > > trigger the earlyoom? Does earlyoom help in reporting? Which logs we need 
> > > to look at?
> > >
> > > Maybe add a section in UX part of the change, or setup a dedicated wiki 
> > > page?
> >
> > The user shouldn't need to do anything differently than if the kernel
> > oom-killer had triggered. The system journal will contain messages
> > showing what was killed and why:
> >
> > Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below
> > SIGTERM limits: mem 10 %, swap 10 %
> > Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process
> > 27421 "chrome": badness 305, VmRSS 42 MiB
> >
> >
> > > Additionally, there was a question during the chat discussion: how the 
> > > earlyoom setup will work together with OOMPolicy and any other related 
> > > options of systemd units? Will systemd recognize the OOM event?
> >
> > My understanding of systemd OOMPolicy= behavior, is it looks for the
> > kernel's oom-killer messages and acts upon those. Whereas earlyoom
> > uses the same metric (oom_score) as the oom-killer, it does not invoke
> > the oom-killer. Therefore systemd probably does not get the proper
> > hint to implement OOMPolicy=
>
> Yes. The kernel reports oom events in the cgroup file memory.events,
> and systemd waits for an inotify event on that file; OOMPolicy=stop is
> implemented that way. And the OOMPolicy=kill option is "implemented"
> by setting memory.oom.group=1 in the kernel [1] and having the kernel
> kill all the processes. So systemd is providing a thin wrapper around
> the kernel functionality.
>
> If processes are not killed by the kernel but through a signal from
> userspace, all of this will not work.

I grepped /usr/lib/systemd and /etc/systemd for "OOM" on my
workstation and it seems that we have only OOMScoreAdjust option used
in the installed systemd units. And this option will be respected by
earlyoom.

Since on workstation we don't use tweaking of the OOMPolicy on the
unit level, I'd say we can leave the tweaking to the system
administrators: when there is need to adjust OOMPolicy of a service,
administrators would need to tweak or disable earlyoom service as
well.

But I'd like to understand better the difference between _default_
OOM-event and _default_ earlyoom-event:

Afaik DefaultOOMPolicy is set to "stop", which means if one of the
processes in the service is killed by OOM, other processes from the
same service are gracefully stopped by systemd.

What is the default behavior of the systemd service on external
SIGTERM/SIGKILL signal sent to the process by earlyoom?

> [1] 
> https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files
>
> Zbyszek
>

-- 
Aleksandra Fedorova
bookwar
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-05 Thread Zbigniew Jędrzejewski-Szmek
On Sat, Jan 04, 2020 at 04:38:19PM -0700, Chris Murphy wrote:
> On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova  wrote:
> 
> > Since in the Change we are not introducing just the earlyoom tool but 
> > enable it with a specific profile I would add those details here. Smth like:
> >
> > "earlyoom service will choose the offending process based on the same 
> > oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, 
> > and SIGKILL on 5%"
> 
> I add this information to the summary. Also, I think these numbers may
> need to change to avoid prematurely sending SIGTERM when the system
> has no swap device.
> 
> > As I understand in the current setup we are looking more for a controlled 
> > failure scenario rather than for a solution.
> 
> Yes, it's fair to say this proposal is to make things "less bad". It
> doesn't improve system responsiveness. Once heavy swap starts, the
> system is sluggish, stutters, and briefly stalls. This proposal
> doesn't fix that. There is a lot of room for improvement.
> 
> 
> > Can we get a specific manual, what users supposed to do, once they trigger 
> > the earlyoom? Does earlyoom help in reporting? Which logs we need to look 
> > at?
> >
> > Maybe add a section in UX part of the change, or setup a dedicated wiki 
> > page?
> 
> The user shouldn't need to do anything differently than if the kernel
> oom-killer had triggered. The system journal will contain messages
> showing what was killed and why:
> 
> Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below
> SIGTERM limits: mem 10 %, swap 10 %
> Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process
> 27421 "chrome": badness 305, VmRSS 42 MiB
> 
> 
> > Additionally, there was a question during the chat discussion: how the 
> > earlyoom setup will work together with OOMPolicy and any other related 
> > options of systemd units? Will systemd recognize the OOM event?
> 
> My understanding of systemd OOMPolicy= behavior, is it looks for the
> kernel's oom-killer messages and acts upon those. Whereas earlyoom
> uses the same metric (oom_score) as the oom-killer, it does not invoke
> the oom-killer. Therefore systemd probably does not get the proper
> hint to implement OOMPolicy=

Yes. The kernel reports oom events in the cgroup file memory.events,
and systemd waits for an inotify event on that file; OOMPolicy=stop is
implemented that way. And the OOMPolicy=kill option is "implemented"
by setting memory.oom.group=1 in the kernel [1] and having the kernel
kill all the processes. So systemd is providing a thin wrapper around
the kernel functionality.

If processes are not killed by the kernel but through a signal from
userspace, all of this will not work.

[1] 
https://www.kernel.org/doc/html/latest/admin-guide/cgroup-v2.html#memory-interface-files

Zbyszek

> Fedora need to discuss how big of a problem that is, if there's anyway
> to mitigate it, or tolerate it, weighing the pros of earlyoom for a
> short period, versus the cons of punting this problem for another
> release. This proposal does not intend to step on other superseding
> work in this area, but if it does, it'll be withdrawn.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Kevin Kofler
Michael Catanzaro wrote:
> nohang has experimented with PSI, but it actually isn't using PSI
> metrics by default because they've proven to be less effective than
> hoped for. In theory, using an interactivity measure like PSI should
> provide for the best results, but in practice it just hasn't worked out
> well.

I think this really needs to be handled entirely in the kernel to be 
effective, because if the interactivity is already down the drain, your 
userspace PSI monitor will not get to run at all in a reasonable timeframe.

I think that to ensure interactivity, the kernel needs to synchronously 
check the interactivity metrics each and every time it gets a swap-in 
request, and fail the request (and kill the process, most likely) if the 
requesting process is known to hurt interactivity too much with its previous 
requests. Anything asynchronous will just not work, because asynchronous 
event handlers stop working when the interactivity is too poor.

Kevin Kofler
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Kevin Kofler
John M. Harris Jr wrote:
> This is simply not the case. It may be for GNOME, but I haven't tested
> that. It definitely is not the case for Plasma.

… unless you want to run KMail/Akonadi on it. :-)

But yes, Plasma itself works fine with 2 GiB (I haven't actually tested with 
less than 4 GiB, but you wrote you did and I believe you there), most 
applications should work too, and if you need an e-mail client, you can run 
a lightweight one such as Trojitá (Qt-based, fast, and requires less than 50 
MiB exclusive memory here, with over 5 mails in my IMAP inbox – if I 
start scrolling through the inbox, it loads old metadata on demand, growing 
the memory usage to still less than 100 MiB).

And my Core 2 Duo with 4 GiB RAM definitely works fine with Plasma (desktop 
environment), Falkon (web browser), Trojitá (e-mail client), Krusader (file 
manager), etc.

Kevin Kofler
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread John M. Harris Jr
On Saturday, January 4, 2020 2:29:11 PM MST drago01 wrote:
> A modern desktop with apps on top will not run well enough on 2GB,
> lets stop pretending it does.

This is simply not the case. It may be for GNOME, but I haven't tested that. 
It definitely is not the case for Plasma.

-- 
John M. Harris, Jr.
Splentity

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Chris Murphy
On Sat, Jan 4, 2020 at 2:51 AM Aleksandra Fedorova  wrote:

> Since in the Change we are not introducing just the earlyoom tool but enable 
> it with a specific profile I would add those details here. Smth like:
>
> "earlyoom service will choose the offending process based on the same 
> oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left, 
> and SIGKILL on 5%"

I add this information to the summary. Also, I think these numbers may
need to change to avoid prematurely sending SIGTERM when the system
has no swap device.

> As I understand in the current setup we are looking more for a controlled 
> failure scenario rather than for a solution.

Yes, it's fair to say this proposal is to make things "less bad". It
doesn't improve system responsiveness. Once heavy swap starts, the
system is sluggish, stutters, and briefly stalls. This proposal
doesn't fix that. There is a lot of room for improvement.


> Can we get a specific manual, what users supposed to do, once they trigger 
> the earlyoom? Does earlyoom help in reporting? Which logs we need to look at?
>
> Maybe add a section in UX part of the change, or setup a dedicated wiki page?

The user shouldn't need to do anything differently than if the kernel
oom-killer had triggered. The system journal will contain messages
showing what was killed and why:

Jan 04 16:05:42 fmac.local earlyoom[4896]: low memory! at or below
SIGTERM limits: mem 10 %, swap 10 %
Jan 04 16:05:42 fmac.local earlyoom[4896]: sending SIGTERM to process
27421 "chrome": badness 305, VmRSS 42 MiB


> Additionally, there was a question during the chat discussion: how the 
> earlyoom setup will work together with OOMPolicy and any other related 
> options of systemd units? Will systemd recognize the OOM event?

My understanding of systemd OOMPolicy= behavior, is it looks for the
kernel's oom-killer messages and acts upon those. Whereas earlyoom
uses the same metric (oom_score) as the oom-killer, it does not invoke
the oom-killer. Therefore systemd probably does not get the proper
hint to implement OOMPolicy=

Fedora need to discuss how big of a problem that is, if there's anyway
to mitigate it, or tolerate it, weighing the pros of earlyoom for a
short period, versus the cons of punting this problem for another
release. This proposal does not intend to step on other superseding
work in this area, but if it does, it'll be withdrawn.


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Chris Murphy
On Sat, Jan 4, 2020 at 2:30 PM drago01  wrote:
>
> On Sat, Jan 4, 2020 at 7:32 PM Chris Murphy  wrote:
> >
> > It might be. And it might need to be tweaked. Perhaps 6% for SIGTERM
> > and 3% for SIGKILL. Or even 5% and 2.5%. For sure using a percentage
> > of RAM and swap is too simplistic. But it's easy for users to
> > understand. Something more sophisticated, based on kernel pressure
> > stall information would likely be better, and folks are working on
> > that.
>
> Yes that would be a way better metric than a percent value which is
> either to close to full ram or to early if you have lots of ram.
> 6% of 4GB is 254MB while for 32GB its almost 2GB - killing processes
> while you have 2GB left is just wasteful.

If there's a swap device, that won't happen. The case where SIGTERM
really happens at 10% RAM free, is when there's no swap device. And
even though the no swap device configuration is not a default, and
explicitly not recommended, right now, by the installer (as in, if you
try to do such an installation, it warns you) - it is a configuration
we allow, and I happen to know it's somewhat common among developers
with systems with lots of RAM expressly because swap thrashing even to
SSD results in such poor UX.

Consider the following 'vmstat 10' while doing a compile:

procs ---memory-- ---swap-- -io -system-- --cpu-
 r  b   swpd   free   buff  cache   si   sobibo   in   cs us sy id wa st

 6 11 4168060 1821580 40 736604 30234 10841 46533 13805 19230
29799 74 12  1 13  0

At this time, the GUI was completely unresponsive, not even the mouse
arrow moves, for about 1 minute. Seemingly plenty of RAM and swap, and
idle CPU. But rather heavy swap in and out.


10  9 4459648 200912 40 569260 11218 18856 28846 19997 15164 35256
28  9  9 53  0
 6  8 4207328 807092 40 636156 26205 16744 35472 18287 20179 34087
62 12  3 23  0

At these two lines, the mouse arrow is stuttering, the GUI is very
sluggish, even unresponsive much of the time.

Jan 04 15:37:18 fmac.local earlyoom[4896]: mem avail:  1212 of  7865
MiB (15 %), swap free: 4807 of 8195 MiB (58 %)

Near the same time. The system is no where near either RAM or swap
exhaustion. But swap si/so are high. This is an SSD BTW.

Can I get to the compile and force quit? Eventually, it would take a
couple minutes.  But good progress is being made with the compile
during this whole time.

earlyoom doesn't SIGTERM this compile until 20 minutes of this
behavior. With default settings. So it really isn't solving the
sluggish, stuttering problem. But what does happen, is it SIGTERMs the
compile before the system gets to a state where essentially all of the
work is only swap in and swap out, and no other work is being done.

Here is the output (2 week expiration)
https://pastebin.com/0iZHNjg7

Retest with no swap at all, and yes, compile gets a SIGTERM when free
memory gets to 10% (because swap is already considered to be 0% free,
since it doesn't exist). But also? The system isn't under any swap io
duress. The system is completely responsive throughout.

This is why we see developers giving up on swap partitions entirely.
swap-on-ZRAM might be a compromise. That's related issue #120.


> > That's not a fix either, it's a work around that papers over the
> > problem. Same as earlyoom, except RAM costs money, and may not be an
> > option due to hardware limitations. A modern operating system needs to
> > know better than to allow unprivileged processes to take down the
> > whole system.
>
> I think you misunderstood me. Yes the OS should behave better than
> this but if you are running a server you don't want your DB, web
> server to not be reachable because the system run out of memory - the
> only way to "fix" that
> is to provide enough resources. No amount of OOM killing would help
> you here. The system may be up but not the server process the machine
> is running for ...

Perhaps, but two points:

a. this feature is for Workstation. If the Server working group wants
to give it a go, that's up to them. But they may prefer experimenting
with more server oriented user space oom daemons like recent versions
of oomd. And for that use case, Facebook (and others) have
investigated this and find that avoiding OOM even by process killing,
is far less bad than the system hanging itself. As in better for
recovery and better for limited sysadmin resources. There's a video
about it from the recent All Systems Go conference.

b. earlyoom does SIGTERM first, I have yet to see a single process
(hundreds of tests, but that's really nothing, and also not a
scientific sample) that doesn't respond to SIGTERM, where SIGKILL is
needed.


> > > And btw we should really update the minimum memory requirements in our 
> > > documentation, the current ones have nothing to do with reality (if you 
> > > want a pleasant user experience).
> >
> > Can you be more specific?
> >
> > On getfedora.org it reads:
> > Fedora requires a minimum of 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread drago01
On Sat, Jan 4, 2020 at 7:32 PM Chris Murphy  wrote:
>
> On Sat, Jan 4, 2020 at 4:48 AM drago01  wrote:
> >
> >
> >
> > On Saturday, January 4, 2020, Neal Gompa  wrote:
> >>
> >> On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel
> >>  wrote:
> >> >
> >> > On 03.01.2020 22:27, Neal Gompa wrote:
> >> > > and servers...
> >> >
> >> > Admins will be very happy when such user-space killer will kill for
> >> > example PgSQL database server and cause DB corruption or loss of banking
> >> > transactions.
> >> >
> >>
> >> This is already happening anyway. The idea is that earlyoom will just
> >> do it slightly earlier so we have a responsive system when the
> >> failures happen. Unlike a lot of the other options, earlyoom is just
> >> doing what the kernel does, just slightly earlier so that the system
> >> doesn't become unresponsive.
> >>
> >>
> >>
> >> That is *hugely* valuable for sysadmins
> >> to be able to recover the systems without power cycling. As a sysadmin
> >> myself, I *hate* power cycling servers because it takes forever and
> >> its a lot bigger loss of productivity (and potentially money!
> >
> >
> > Except that slightly earlier is way to early on systems which have lots of 
> > memory (see mails from before).
>
> It might be. And it might need to be tweaked. Perhaps 6% for SIGTERM
> and 3% for SIGKILL. Or even 5% and 2.5%. For sure using a percentage
> of RAM and swap is too simplistic. But it's easy for users to
> understand. Something more sophisticated, based on kernel pressure
> stall information would likely be better, and folks are working on
> that.

Yes that would be a way better metric than a percent value which is
either to close to full ram or to early if you have lots of ram.
6% of 4GB is 254MB while for 32GB its almost 2GB - killing processes
while you have 2GB left is just wasteful.

> >
> > And if a server runs into a oom situation your software is either broken 
> > (leaking) or you didn't allocate enough resources for your use case.
> >
> > So the fix is not oom killing nor power cycling but to either allocate more 
> > memory of it is a VM or buy more if it is a hardware server (or fix the 
> > memory leak in your software).
>
> That's not a fix either, it's a work around that papers over the
> problem. Same as earlyoom, except RAM costs money, and may not be an
> option due to hardware limitations. A modern operating system needs to
> know better than to allow unprivileged processes to take down the
> whole system.

I think you misunderstood me. Yes the OS should behave better than
this but if you are running a server you don't want your DB, web
server to not be reachable because the system run out of memory - the
only way to "fix" that
is to provide enough resources. No amount of OOM killing would help
you here. The system may be up but not the server process the machine
is running for ...

>
> > And btw we should really update the minimum memory requirements in our 
> > documentation, the current ones have nothing to do with reality (if you 
> > want a pleasant user experience).
>
> Can you be more specific?
>
> On getfedora.org it reads:
> Fedora requires a minimum of 20GB disk, 2GB RAM, to install and run
> successfully. Double those amounts is recommended.


I simply do not think 2GB is sufficient, the "recommended double" i.e
4GB should be the "required" and drop the double part all together.
A modern desktop with apps on top will not run well enough on 2GB,
lets stop pretending it does. But anyways that's off topic as it is
not part of the proposal.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Chris Murphy
On Sat, Jan 4, 2020 at 11:20 AM John M. Harris Jr  wrote:
>
> On Saturday, January 4, 2020 11:16:24 AM MST Michael Catanzaro wrote:
> > On Fri, Jan 3, 2020 at 5:52 pm, John M. Harris Jr
> >  wrote:
> >
> > > In that case, I'd suggest waiting the 15 minutes, and then not
> > > bogging down
> > > your system that badly the next time. This is, really, the best
> > > option.
> >
> >
> > I'm going to suggest you stop replying in this thread if you're not
> > interested in responding with productive comments.
> >
> > The user experience requirement here is "desktop should not hang for 15
> > minutes when under memory pressure." Your comment indicates that it
> > *should* hang, presumably to punish users for using too much memory.
> > This is so absurd that I don't think you're engaging in good-faith
> > discussion anymore.
>
> Whether or not it should or should not is irrelevant. I don't see much of an
> alternative than what seems to be a "hang", honestly. It has nothing to do
> with something to "punish" users, it's to get the system to a state where you
> can `sync` and reboot.

The point of this feature proposal is precisely to get the system into
a state where they can save their work and do a proper reboot. It's
safer, less esoteric, and more reliable than sysrq+b.

It cannot become a user's burden to know the kernel is still doing
something, when there's zero feedback and zero control. When will the
system recover on its own? An hour? A day? A week? I can tell you for
sure in my test case, it was consistently stuck for > 30 minutes. I
let it go that long, many times, only to demonstrate it's not a
temporary hang, and users are acting rationally to force power off.


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread John M. Harris Jr
On Saturday, January 4, 2020 11:31:49 AM MST Chris Murphy wrote:
> A modern operating system needs to
> know better than to allow unprivileged processes to take down the
> whole system.

Well, you can configure quotas if you really want, but the idea is that it's 
YOUR COMPUTER, and you should be able to use it however you like. If you want 
to run software that requires more RAM than your system has, you can do that, 
and it will run, just not well.

-- 
John M. Harris, Jr.
Splentity

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Chris Murphy
On Sat, Jan 4, 2020 at 4:48 AM drago01  wrote:
>
>
>
> On Saturday, January 4, 2020, Neal Gompa  wrote:
>>
>> On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel
>>  wrote:
>> >
>> > On 03.01.2020 22:27, Neal Gompa wrote:
>> > > and servers...
>> >
>> > Admins will be very happy when such user-space killer will kill for
>> > example PgSQL database server and cause DB corruption or loss of banking
>> > transactions.
>> >
>>
>> This is already happening anyway. The idea is that earlyoom will just
>> do it slightly earlier so we have a responsive system when the
>> failures happen. Unlike a lot of the other options, earlyoom is just
>> doing what the kernel does, just slightly earlier so that the system
>> doesn't become unresponsive.
>>
>>
>>
>> That is *hugely* valuable for sysadmins
>> to be able to recover the systems without power cycling. As a sysadmin
>> myself, I *hate* power cycling servers because it takes forever and
>> its a lot bigger loss of productivity (and potentially money!
>
>
> Except that slightly earlier is way to early on systems which have lots of 
> memory (see mails from before).

It might be. And it might need to be tweaked. Perhaps 6% for SIGTERM
and 3% for SIGKILL. Or even 5% and 2.5%. For sure using a percentage
of RAM and swap is too simplistic. But it's easy for users to
understand. Something more sophisticated, based on kernel pressure
stall information would likely be better, and folks are working on
that.

>
> And if a server runs into a oom situation your software is either broken 
> (leaking) or you didn't allocate enough resources for your use case.
>
> So the fix is not oom killing nor power cycling but to either allocate more 
> memory of it is a VM or buy more if it is a hardware server (or fix the 
> memory leak in your software).

That's not a fix either, it's a work around that papers over the
problem. Same as earlyoom, except RAM costs money, and may not be an
option due to hardware limitations. A modern operating system needs to
know better than to allow unprivileged processes to take down the
whole system.


> And btw we should really update the minimum memory requirements in our 
> documentation, the current ones have nothing to do with reality (if you want 
> a pleasant user experience).

Can you be more specific?

On getfedora.org it reads:
Fedora requires a minimum of 20GB disk, 2GB RAM, to install and run
successfully. Double those amounts is recommended.


-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread John M. Harris Jr
On Saturday, January 4, 2020 4:48:04 AM MST drago01 wrote:
> And btw we should really update the minimum memory requirements in our
> documentation, the current ones have nothing to do with reality (if you
> want a pleasant user experience).

That is not necessary, at all. I'm running Fedora on a 2009 Core 2 Duo system 
with 2 GiB of RAM, and have not had any issues, after disabling the compositor 
in Plasma. My daily driver is an X200 Tablet with 4 GiB of RAM, similarly. 
These amounts are more than sufficient for most users, and Firefox has never 
led my system to an OOM event. The only time I've ever run into an OOM on this 
system was while compiling some poorly written software (that I wrote, years 
ago).

-- 
John M. Harris, Jr.
Splentity

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread John M. Harris Jr
On Saturday, January 4, 2020 11:16:24 AM MST Michael Catanzaro wrote:
> On Fri, Jan 3, 2020 at 5:52 pm, John M. Harris Jr 
>  wrote:
> 
> > In that case, I'd suggest waiting the 15 minutes, and then not 
> > bogging down
> > your system that badly the next time. This is, really, the best 
> > option.
> 
> 
> I'm going to suggest you stop replying in this thread if you're not 
> interested in responding with productive comments.
> 
> The user experience requirement here is "desktop should not hang for 15 
> minutes when under memory pressure." Your comment indicates that it 
> *should* hang, presumably to punish users for using too much memory. 
> This is so absurd that I don't think you're engaging in good-faith 
> discussion anymore.

Whether or not it should or should not is irrelevant. I don't see much of an 
alternative than what seems to be a "hang", honestly. It has nothing to do 
with something to "punish" users, it's to get the system to a state where you 
can `sync` and reboot.

-- 
John M. Harris, Jr.
Splentity

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Chris Murphy
On Sat, Jan 4, 2020 at 12:45 AM  wrote:
>
> В Суб, 04/01/2020 в 08:27 +0100, Vitaly Zaitsev via devel пишет:
>
> > I'm strongly against adding of any user-space OOM killers to Fedora
> > default images. Users should explicitly enable them only when needed.
>
> Just my 2 cents: i tested early versions of earlyoom and have weird
> experience with it: it killing not Chromium or Chromium processes,
> instead it killing tiny processes which it shouldn't, probably. I guess
> it could kill dnf process as well easily.
>
> I am skeptically too about enabling such things by default, but in same
> time would be nice to massively test this.

earlyoom uses oom_score to determine the victim process to SIGTERM and
SIGKILL. The same metric used by the kernel oom-killer. I too have
seen inexplicable kernel oom-killer invoked on processes that should
not be targets: sssd, sshd, and even once systemd-journald. This is
very weird and I don't have an explanation why any process with a
score of 0 is getting killed before the dozens of processes with a
score much higher, and yet I've seen it. It's suspicious.

The nice thing about earlyoom, even though it's a hammer? It's a small
hammer. It's not going to go on a wrecking ball spree. It can, and
likely will, be backed out as other solutions become more useful. And
the documentation reflects its oversimplification of a complex
problem.

-- 
Chris Murphy
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread John M. Harris Jr
On Friday, January 3, 2020 11:34:13 PM MST Andreas Tunek wrote:
> Den lör 4 jan. 2020 kl 01:53 skrev John M. Harris Jr :
> > On Friday, January 3, 2020 4:25:20 PM MST Chris Murphy wrote:
> > > in the cases were I could issue syrq+b, responsiveness was so bad
> > > it'd take upwards of 15 minutes just to type out the command
> > 
> > In that case, I'd suggest waiting the 15 minutes, and then not bogging
> > down
> > your system that badly the next time. This is, really, the best option.
> 
> *Remembers to be excellent to each other.*
> Or maybe we should try to make operating systems that actually work under
> heavy load.

If we had something that would "actually work under heavy load" (we do, but it 
doesn't work for some people), then my advice wouldn't be necessary. However, 
what I've said is for the safety of your installed system. It should only be 
followed if the integrity of your data is important to you.

-- 
John M. Harris, Jr.
Splentity

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Michael Catanzaro
On Sat, Jan 4, 2020 at 11:38 am, Zbigniew Jędrzejewski-Szmek 
 wrote:

What about using the memory controller for user units to allocate
memory resources between the processes in the user session? Thanks to
recent developments, the gnome session uses separate systemd units
(and thus separate cgroups) for various services. We could set 
attributes

like memory.low for "the basic components of the user session",
and on the other hand, memory.swap.max for "the payload", i.e. various
user processes on top.


This looks interesting. I'd love to see more serious discussion of this 
proposal. Carving out dedicated memory for essential desktop processes 
seems like something we should be able to do in 2020.


Michael

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Michael Catanzaro
On Fri, Jan 3, 2020 at 5:52 pm, John M. Harris Jr 
 wrote:
In that case, I'd suggest waiting the 15 minutes, and then not 
bogging down
your system that badly the next time. This is, really, the best 
option.


I'm going to suggest you stop replying in this thread if you're not 
interested in responding with productive comments.


The user experience requirement here is "desktop should not hang for 15 
minutes when under memory pressure." Your comment indicates that it 
*should* hang, presumably to punish users for using too much memory. 
This is so absurd that I don't think you're engaging in good-faith 
discussion anymore.


___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Michael Catanzaro



On Fri, Jan 3, 2020 at 11:12 pm, Tom Seewald  wrote:
I think this would be a really big improvement for workstation and 
other desktop spins, the handling of out of memory situations have 
been a consistent paint point on Linux.  However, may I ask why 
EarlyOOM was chosen over something like NoHang [1]?  I am a bit 
concerned that EarlyOOM's heuristics may be too coarse, as it does 
not take into account the newly-added PSI metrics [2][3] that other 
projects like NoHang, oomd, and low-memory-monitor utilize.  For 
example, if the system is thrashing, but swap is not full, to my 
knowledge EarlyOOM will not see a problem, however it would be 
visible via PSI.


We've been working closely with Alexey, the maintainer of nohang, on 
this proposal. He has recommended using either earlyoom or nohang as 
the two best choices over other available options (e.g. oomd, or 
low-memory-monitor). I'm not completely certain why earlyoom was chosen 
over nohang, but I think simplicity and code maturity were likely 
important considerations in the final choice.


nohang has experimented with PSI, but it actually isn't using PSI 
metrics by default because they've proven to be less effective than 
hoped for. In theory, using an interactivity measure like PSI should 
provide for the best results, but in practice it just hasn't worked out 
well.


In our experiments, low-memory-monitor is currently significantly worse 
at handling OOM conditions (as has been noted elsewhere in this 
thread). Although we're likely to enable low-memory-monitor in 
Workstation, we would use it only for advisory memory pressure 
notifications (GMemoryMonitor).


Michael

___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Michael Catanzaro
Let's keep this desktop-focused, since the proposal does not affect 
Server edition.


On Sat, Jan 4, 2020 at 12:48 pm, drago01  wrote:
As for the desktop case the running web browers in a cgroup to keep 
them in check would solve most real world problems - other common 
desktop apps don't use enough memory to cause such issues (unless 
your system is really memory constrained but then the "buy more 
memory" solution is the better fix).


The last time I saw my desktop hang due to a web browser using too much 
memory was 2015.


The freezes I've encountered in the past five years were all related to 
software development:


* When compiling large software projects, it's possible to run out of 
RAM either when building lots of files in parallel, or when linking
* GNOME Builder runs ctags, and ctags likes to use dozens of GB of RAM 
to index large software projects. I think it sometimes gets into a loop 
where it just allocates more and more RAM until the desktop dies


___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread drago01
On Saturday, January 4, 2020, Neal Gompa  wrote:

> On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel
>  wrote:
> >
> > On 03.01.2020 22:27, Neal Gompa wrote:
> > > and servers...
> >
> > Admins will be very happy when such user-space killer will kill for
> > example PgSQL database server and cause DB corruption or loss of banking
> > transactions.
> >
>
> This is already happening anyway. The idea is that earlyoom will just
> do it slightly earlier so we have a responsive system when the
> failures happen. Unlike a lot of the other options, earlyoom is just
> doing what the kernel does, just slightly earlier so that the system
> doesn't become unresponsive.


>
> That is *hugely* valuable for sysadmins
> to be able to recover the systems without power cycling. As a sysadmin
> myself, I *hate* power cycling servers because it takes forever and
> its a lot bigger loss of productivity (and potentially money!
>

Except that slightly earlier is way to early on systems which have lots of
memory (see mails from before).

And if a server runs into a oom situation your software is either broken
(leaking) or you didn't allocate enough resources for your use case.

So the fix is not oom killing nor power cycling but to either allocate more
memory of it is a VM or buy more if it is a hardware server (or fix the
memory leak in your software).

As for the desktop case the running web browers in a cgroup to keep them in
check would solve most real world problems - other common desktop apps
don't use enough memory to cause such issues (unless your system is really
memory constrained but then the "buy more memory" solution is the better
fix).

And btw we should really update the minimum memory requirements in our
documentation, the current ones have nothing to do with reality (if you
want a pleasant user experience).
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Zbigniew Jędrzejewski-Szmek
On Fri, Jan 03, 2020 at 02:18:40PM -0500, Ben Cotton wrote:
> https://fedoraproject.org/wiki/Changes/EnableEarlyoom
> 
> == Summary ==
> Install earlyoom package, and enable it by default. This will cause
> the kernel oomkiller to trigger sooner, but will not affect which
> process it chooses to kill off. The idea is to recover from out of
> memory situations sooner, rather than the typical complete system hang
> in which the user has no other choice but to force power off.

Hi,
I'll throw out another idea out here, in hope that people can provide
insight. It's something I wanted to look into for a while, but I admit
to not having done any research myself, so the approach might be totally
useless...

What about using the memory controller for user units to allocate
memory resources between the processes in the user session? Thanks to
recent developments, the gnome session uses separate systemd units
(and thus separate cgroups) for various services. We could set attributes
like memory.low for "the basic components of the user session",
and on the other hand, memory.swap.max for "the payload", i.e. various
user processes on top.

Doing something like this effectively would most likely require some
changes to how we assign processes to cgroups. I still get some processes
in "wrong" cgroups:

│ ├─gnome-shell-wayland.service 
│ │ ├─1501571 /usr/bin/gnome-shell
│ │ ├─1501606 /usr/bin/Xwayland :0 -rootless -noreset -accessx -core -auth 
/run/user/1000/.mutter-Xwaylandauth.SCXID0 -listen 4 -listen 5 -displayfd 6
│ │ ├─1501713 ibus-daemon --panel disable -r --xim
│ │ ├─1501718 /usr/libexec/ibus-dconf
│ │ ├─1501719 /usr/libexec/ibus-extension-gtk3
│ │ ├─1501724 /usr/libexec/ibus-x11 --kill-daemon
│ │ ├─1501980 /usr/libexec/ibus-engine-simple
│ │ ├─1503586 /usr/lib64/firefox/firefox
│ │ ├─1503691 /usr/lib64/firefox/firefox -contentproc -childID 2 -isForBrowser 
...
│ │ ├─1503701 /usr/lib64/firefox/firefox -contentproc -childID 3 -isForBrowser 
...
│ │ ├─1503747 /usr/lib64/firefox/firefox -contentproc -childID 4 -isForBrowser 
...
│ │ ├─1520219 bwrap --args 35 telegram-desktop --
│ │ ├─1520229 bwrap --args 35 xdg-dbus-proxy --args=37
│ │ ├─1520230 xdg-dbus-proxy --args=37
│ │ ├─1520232 bwrap --args 35 telegram-desktop --
│ │ ├─1520233 /app/bin/Telegram --
│ │ ├─1540753 pavucontrol
...

(and firefox and anything-running-as-flatpak would be the prime
candidates to split out into their own cgroups and build resource
limits around...)

The cgroup hierarchy is mostly flat (most user services get
cgroups directly in the root of the user tree under
/sys/fs/cgroup/user.slice/user-nnn.slice/user@nnn.service/).
To make resource assignment effective, I would like to see a
"basic.slice" (name TBD) that would gather various "core" stuff like
gnome-shell-wayland.service, dbus-broker.service, and whatever
other services that the graphical session depends on. This would
get mimimum memory protections and such.

Then there should be "payload.slice", and underneath that all the
non-essential services and everything that the user starts from the
terminal.

What I *don't know* is: how much of an overhead enabling the memory
controller has, and whether those resource limits would actually have
the desired effect (and more generally, how they should be best set).

Zbyszek
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Neal Gompa
On Sat, Jan 4, 2020 at 2:33 AM Vitaly Zaitsev via devel
 wrote:
>
> On 03.01.2020 22:27, Neal Gompa wrote:
> > and servers...
>
> Admins will be very happy when such user-space killer will kill for
> example PgSQL database server and cause DB corruption or loss of banking
> transactions.
>

This is already happening anyway. The idea is that earlyoom will just
do it slightly earlier so we have a responsive system when the
failures happen. Unlike a lot of the other options, earlyoom is just
doing what the kernel does, just slightly earlier so that the system
doesn't become unresponsive. That is *hugely* valuable for sysadmins
to be able to recover the systems without power cycling. As a sysadmin
myself, I *hate* power cycling servers because it takes forever and
its a lot bigger loss of productivity (and potentially money!).



-- 
真実はいつも一つ!/ Always, there's only one truth!
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-04 Thread Aleksandra Fedorova
On Fri, 3 Jan 2020, 20:19 Ben Cotton,  wrote:

> https://fedoraproject.org/wiki/Changes/EnableEarlyoom
>
> == Summary ==
> Install earlyoom package, and enable it by default. This will cause
> the kernel oomkiller to trigger sooner, but will not affect which
> process it chooses to kill off. The idea is to recover from out of
> memory situations sooner, rather than the typical complete system hang
> in which the user has no other choice but to force power off.
>

Since in the Change we are not introducing just the earlyoom tool but
enable it with a specific profile I would add those details here. Smth like:

"earlyoom service will choose the offending process based on the same
oom_score as kernel uses. It will send a SIGTERM signal on 10% of RAM left,
and SIGKILL on 5%"


> == Owner ==
> * Name: [[User:chrismurphy| Chris Murphy]]
> * Email: bugzi...@colorremedies.com
>
> == Detailed Description ==
> Workstation working group has discussed "better interactivity in
> low-memory situations" for some months. In certain use cases,
> typically compiling, if all RAM and swap are completely consumed,
> system responsiveness becomes so abysmal that a reasonable user can
> consider the system "lost", and resorts to forcing a power off. This
> is objective a very bad UX. The broad discussion of this problem, and
> some ideas for near term and long term solutions, is located here:
>
> Recent long discussions on "Better interactivity in low-memory
> situations"
> https://pagure.io/fedora-workstation/issue/98
>
> https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org/message/XUZLHJ5O32OX24LG44R7UZ2TMN6NY47N/
> 
>
> Fedora editions and spins, have the in-kernel OOM (out-of-memory)
> manager enabled. The manager's concern is keeping the kernel itself
> functioning. It has no concern about user space function or
> interactivity. This proposed change attempts to improve the user
> experience, in the short term, by triggering the in-kernel process
> killing mechanism, sooner. Instead of the system becoming completely
> unresponsive for tens of minutes, hours or days, the expectation is an
> offending process (determined by oom_score, same as now) will be
> killed off within seconds or a few minutes. This is an incremental
> improvement in user experience, but admittedly still suboptimal. There
> is additional work on-going to improve the user experience further.
>
> Workstation working group discussion specific to enabling earlyoom by
> default
> https://pagure.io/fedora-workstation/issue/119
>
> Other in-progress solutions:
> https://gitlab.freedesktop.org/hadess/low-memory-monitor
>
> Background information on this complicated problem:
> https://www.kernel.org/doc/gorman/html/understand/understand016.html
> https://lwn.net/Articles/317814/
>
> == Benefit to Fedora ==
>
> There are two major benefits to Fedora:
>
> * improved user experience by more quickly regaining control over
> one's system, rather than having to force power off in low-memory
> situations where there's aggressive swapping. Once a system becomes
> unresponsive, it's completely reasonable for the user to assume the
> system is lost, but that includes high potential for data loss.
>
> * reducing forced poweroff as the main work around will increase data
> collection, improving understanding of low memory situations and how
> to handle them better
>

As I understand in the current setup we are looking more for a controlled
failure scenario rather than for a solution.

Can we get a specific manual, what users supposed to do, once they trigger
the earlyoom? Does earlyoom help in reporting? Which logs we need to look
at?

Maybe add a section in UX part of the change, or setup a dedicated wiki
page?



>
> == Scope ==
> * Proposal owners:
> a. Modify {{code|
> https://pagure.io/fedora-comps/blob/master/f/comps-f32.xml.in}}
> to include earlyoom package for Workstation.
> b. Modify {{code|
> https://src.fedoraproject.org/rpms/fedora-release/blob/master/f/80-workstation.preset
> }}
> to include:
> 
> # enable earlyoom by default on workstation
> enable earlyoom.service
> 
>
> * Other developers:
> Restricted to Workstation edition, unless other editions/spins want to
> opt-in.
>
> * Release engineering: [https://pagure.io/releng/issues #9141] (a
> check of an impact with Release Engineering is needed) 
>
> * Policies and guidelines: N/A
> * Trademark approval: N/A
>
> == Upgrade/compatibility impact ==
> earlyoom.service will be enabled on upgrade. An upgraded system should
> exhibit the same behaviors as a clean installed system.
>
> == How To Test ==
> * Fedora 30/31 users can test today, any edition or spin:
> {{code|sudo dnf install earlyoom}}
> {{code|sudo systemctl enable --now earlyoom}}
>
> And then attempt to cause an out of memory situation. Examples:
> {{code|tail /dev/zero}}
> {{code|https://lkml.org/lkml/2019/8/4/15}}
>
> * Fedora Workstation 32 (and Rawhide) users will see this service is
> already enabled. It can be toggled 

Re: Fedora 32 System-Wide Change proposal (late): Enable EarlyOOM

2020-01-03 Thread ego . cordatus
В Суб, 04/01/2020 в 08:27 +0100, Vitaly Zaitsev via devel пишет:

> I'm strongly against adding of any user-space OOM killers to Fedora
> default images. Users should explicitly enable them only when needed.

Just my 2 cents: i tested early versions of earlyoom and have weird
experience with it: it killing not Chromium or Chromium processes,
instead it killing tiny processes which it shouldn't, probably. I guess
it could kill dnf process as well easily.

I am skeptically too about enabling such things by default, but in same
time would be nice to massively test this.
___
devel mailing list -- devel@lists.fedoraproject.org
To unsubscribe send an email to devel-le...@lists.fedoraproject.org
Fedora Code of Conduct: 
https://docs.fedoraproject.org/en-US/project/code-of-conduct/
List Guidelines: https://fedoraproject.org/wiki/Mailing_list_guidelines
List Archives: 
https://lists.fedoraproject.org/archives/list/devel@lists.fedoraproject.org


  1   2   >