Re: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-07 Thread Thomas Hellstrom

Hi,

On 02/07/2018 02:22 PM, Christian König wrote:

Understood, but why is that?

Well because customers requested it :)

What we are trying to do here is have a parameter which says: when less than
X megabytes of memory are left, fail the allocation.


This is basically to prevent buggy applications, which try to allocate
as much memory as possible until they receive -ENOMEM, from running
into the OOM killer.


OK. Understood.
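
For illustration, the kind of check Christian describes could look roughly
like the sketch below; the helper name and the limit variable are assumptions
for illustration, not code from this series:

#include <linux/mm.h>	/* si_mem_available() */

static unsigned long ttm_low_mem_limit_mb = 256;

static bool ttm_below_low_mem_limit(void)
{
	/* si_mem_available() estimates how many pages are still allocatable. */
	unsigned long available_mb = si_mem_available() >> (20 - PAGE_SHIFT);

	return available_mb < ttm_low_mem_limit_mb;
}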



That's true, but with VRAM, TTM overcommits swap space which may lead 
to ugly memory allocation failures at hibernate time.
Yeah, that is exactly the reason why I said that Roger should disable 
the limit during suspend swap out :)


Well, that was really in the context of the swapping implementation rather
than in the context of this change, so it was a little off-topic. Even with
this limit disabled, TTM can overcommit. But looking at the swapping
implementation is a different issue.


/Thomas

Regards,
Christian.

Am 07.02.2018 um 14:17 schrieb Thomas Hellstrom:

Hi, Roger.

On 02/07/2018 09:25 AM, He, Roger wrote:
Why should TTM be different in that aspect? It would be good to 
know your reasoning WRT this?


Now, TTM's struct ttm_bo_device already has the member no_retry to
indicate your choice.
If you prefer that TTM not trigger the OOM killer, set it to true. The
default is false, to keep the original behavior.

For now, AMD prefers a no-memory return value rather than an OOM.


Understood, but why is that? I mean, just because TTM doesn't invoke
the OOM killer, that doesn't mean that the process won't, the next
millisecond, page in a number of pages and invoke it? So this
mechanism would be pretty susceptible to races?
One thing I looked at at one point was to have TTM do the 
swapping itself instead of handing it off to the shmem system. That 
way we could pre-allocate swap entries for all swappable (BO) 
memory, making sure that we wouldn't run out of swap space when,


I prefer the current swap-out mechanism. At the beginning the swapped
pages stay in system memory, held by shmem, until the OS comes under
high memory pressure; that has an obvious advantage. For example, if
the BO is swapped out into shmem but not actually flushed to the swap
disk, then validating it and swapping it back in at that moment has a
small overhead compared to swapping in from disk.


But that is true for a page handed off to the swap cache as well. It
won't be immediately flushed to disk, only when the swap cache is
shrunk.


In addition, no swap space reservation is needed for TTM pages at
allocation time, since the swap disk is shared and not exclusive to TTM.


That's true, but with VRAM, TTM overcommits swap space which may lead 
to ugly memory allocation failures at hibernate time.


So again, we provide a no_retry flag in struct ttm_bo_device that the
driver can set according to its requirements.


Thanks,
Thomas





Thanks
Roger(Hongbo.He)

-----Original Message-----
From: Thomas Hellstrom [mailto:tho...@shipmail.org]
Sent: Wednesday, February 07, 2018 2:43 PM
To: He, Roger <hongbo...@amd.com>; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org

Cc: Koenig, Christian <christian.koe...@amd.com>
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

Hi, Roger,

On 02/06/2018 10:04 AM, Roger He wrote:

Currently the TTM code has no allocation limit, so it allows unlimited
page allocation until OOM: once swap space is full of swapped pages,
system memory fills up with TTM pages, and then any memory allocation
request triggers the OOM killer.

I'm a bit curious, isn't this the way things are supposed to work on
a Linux system?
If all memory resources are used up, the OOM killer will kill the
most memory-hungry (perhaps rogue) process rather than processes
being nice and trying to find out themselves whether allocations
will succeed?
Why should TTM be different in that aspect? It would be good to know
your reasoning WRT this?


Admittedly, graphics process OOM memory accounting doesn't work very
well, due to not all BOs being CPU mapped, but it looks like there
is recent work towards fixing this?


One thing I looked at at one point was to have TTM do the swapping
itself instead of handing it off to the shmem system. That way we
could pre-allocate swap entries for all swappable (BO) memory,
making sure that we wouldn't run out of swap space when, for
example, hibernating, and that would also limit the pinned
non-swappable memory (from TTM driver kernel allocations, for
example) to half the system memory resources.


Thanks,
Thomas







Re: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-07 Thread Christian König

Understood, but why is that?

Well because customers requested it :)

What we are trying to do here is have a parameter which says: when less than
X megabytes of memory are left, fail the allocation.


This is basically to prevent buggy applications, which try to allocate as
much memory as possible until they receive -ENOMEM, from running into
the OOM killer.


That's true, but with VRAM, TTM overcommits swap space which may lead 
to ugly memory allocation failures at hibernate time.
Yeah, that is exactly the reason why I said that Roger should disable 
the limit during suspend swap out :)


Regards,
Christian.

Am 07.02.2018 um 14:17 schrieb Thomas Hellstrom:

Hi, Roger.

On 02/07/2018 09:25 AM, He, Roger wrote:
Why should TTM be different in that aspect? It would be good to 
know your reasoning WRT this?


Now, TTM's struct ttm_bo_device already has the member no_retry to
indicate your choice.
If you prefer that TTM not trigger the OOM killer, set it to true. The
default is false, to keep the original behavior.

For now, AMD prefers a no-memory return value rather than an OOM.


Understood, but why is that? I mean, just because TTM doesn't invoke
the OOM killer, that doesn't mean that the process won't, the next
millisecond, page in a number of pages and invoke it? So this
mechanism would be pretty susceptible to races?
One thing I looked at at one point was to have TTM do the 
swapping itself instead of handing it off to the shmem system. That 
way we could pre-allocate swap entries for all swappable (BO) memory, 
making sure that we wouldn't run out of swap space when,


I prefer the current swap-out mechanism. At the beginning the swapped
pages stay in system memory, held by shmem, until the OS comes under
high memory pressure; that has an obvious advantage. For example, if
the BO is swapped out into shmem but not actually flushed to the swap
disk, then validating it and swapping it back in at that moment has a
small overhead compared to swapping in from disk.


But that is true for a page handed off to the swap cache as well. It
won't be immediately flushed to disk, only when the swap cache is shrunk.


In addition, no swap space reservation is needed for TTM pages at
allocation time, since the swap disk is shared and not exclusive to TTM.


That's true, but with VRAM, TTM overcommits swap space which may lead 
to ugly memory allocation failures at hibernate time.


So again, we provide a no_retry flag in struct ttm_bo_device that the
driver can set according to its requirements.


Thanks,
Thomas





Thanks
Roger(Hongbo.He)

-----Original Message-----
From: Thomas Hellstrom [mailto:tho...@shipmail.org]
Sent: Wednesday, February 07, 2018 2:43 PM
To: He, Roger <hongbo...@amd.com>; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org

Cc: Koenig, Christian <christian.koe...@amd.com>
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

Hi, Roger,

On 02/06/2018 10:04 AM, Roger He wrote:

Currently the TTM code has no allocation limit, so it allows unlimited
page allocation until OOM: once swap space is full of swapped pages,
system memory fills up with TTM pages, and then any memory allocation
request triggers the OOM killer.

I'm a bit curious, isn't this the way things are supposed to work on
a Linux system?
If all memory resources are used up, the OOM killer will kill the
most memory-hungry (perhaps rogue) process rather than processes
being nice and trying to find out themselves whether allocations
will succeed?
Why should TTM be different in that aspect? It would be good to know
your reasoning WRT this?


Admittedly, graphics process OOM memory accounting doesn't work very
well, due to not all BOs being CPU mapped, but it looks like there
is recent work towards fixing this?


One thing I looked at at one point was to have TTM do the swapping
itself instead of handing it off to the shmem system. That way we
could pre-allocate swap entries for all swappable (BO) memory,
making sure that we wouldn't run out of swap space when, for
example, hibernating, and that would also limit the pinned
non-swappable memory (from TTM driver kernel allocations, for
example) to half the system memory resources.


Thanks,
Thomas







Re: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-07 Thread Thomas Hellstrom

Hi, Roger.

On 02/07/2018 09:25 AM, He, Roger wrote:

Why should TTM be different in that aspect? It would be good to know 
your reasoning WRT this?

Now, TTM's struct ttm_bo_device already has the member no_retry to indicate
your choice.
If you prefer that TTM not trigger the OOM killer, set it to true. The default
is false, to keep the original behavior.
For now, AMD prefers a no-memory return value rather than an OOM.


Understood, but why is that? I mean, just because TTM doesn't invoke the
OOM killer, that doesn't mean that the process won't, the next
millisecond, page in a number of pages and invoke it? So this mechanism
would be pretty susceptible to races?

One thing I looked at at one point was to have TTM do the swapping 
itself instead of handing it off to the shmem system. That way we could 
pre-allocate swap entries for all swappable (BO) memory, making sure that we 
wouldn't run out of swap space when,

I prefer the current swap-out mechanism. At the beginning the swapped pages
stay in system memory, held by shmem, until the OS comes under high memory
pressure; that has an obvious advantage. For example, if the BO is swapped out
into shmem but not actually flushed to the swap disk, then validating it and
swapping it back in at that moment has a small overhead compared to swapping
in from disk.


But that is true for a page handed off to the swap cache as well. It
won't be immediately flushed to disk, only when the swap cache is shrunk.



In addition, no swap space reservation is needed for TTM pages at allocation
time, since the swap disk is shared and not exclusive to TTM.


That's true, but with VRAM, TTM overcommits swap space which may lead to 
ugly memory allocation failures at hibernate time.



So again, we provide a no_retry flag in struct ttm_bo_device that the driver
can set according to its requirements.


Thanks,
Thomas





Thanks
Roger(Hongbo.He)

-----Original Message-----
From: Thomas Hellstrom [mailto:tho...@shipmail.org]
Sent: Wednesday, February 07, 2018 2:43 PM
To: He, Roger <hongbo...@amd.com>; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org
Cc: Koenig, Christian <christian.koe...@amd.com>
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

Hi, Roger,

On 02/06/2018 10:04 AM, Roger He wrote:

Currently the TTM code has no allocation limit, so it allows unlimited
page allocation until OOM: once swap space is full of swapped pages,
system memory fills up with TTM pages, and then any memory allocation
request triggers the OOM killer.


I'm a bit curious, isn't this the way things are supposed to work on a Linux
system?
If all memory resources are used up, the OOM killer will kill the most
memory-hungry (perhaps rogue) process rather than processes being nice and
trying to find out themselves whether allocations will succeed?
Why should TTM be different in that aspect? It would be good to know your
reasoning WRT this?

Admittedly, graphics process OOM memory accounting doesn't work very well,
due to not all BOs being CPU mapped, but it looks like there is recent work
towards fixing this?

One thing I looked at at one point was to have TTM do the swapping itself
instead of handing it off to the shmem system. That way we could pre-allocate
swap entries for all swappable (BO) memory, making sure that we wouldn't run
out of swap space when, for example, hibernating, and that would also limit
the pinned non-swappable memory (from TTM driver kernel allocations, for
example) to half the system memory resources.

Thanks,
Thomas





RE: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-07 Thread He, Roger
Sure, I will turn it off by default and make the limit configurable.

Thanks
Roger(Hongbo.He)
-----Original Message-----
From: Christian König [mailto:ckoenig.leichtzumer...@gmail.com] 
Sent: Wednesday, February 07, 2018 4:41 PM
To: He, Roger <hongbo...@amd.com>; Thomas Hellstrom <tho...@shipmail.org>; 
amd-...@lists.freedesktop.org; dri-devel@lists.freedesktop.org
Cc: Koenig, Christian <christian.koe...@amd.com>
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

> AMD prefers return value of no memory rather than OOM for now.
Yeah, but Thomas is right that the default for this feature should be that it 
is turned off.

That's why I said that we should make the limit configurable.

Regards,
Christian.

Am 07.02.2018 um 09:25 schrieb He, Roger:
>   Why should TTM be different in that aspect? It would be good to know 
> your reasoning WRT this?
>
> Now, TTM's struct ttm_bo_device already has the member no_retry to indicate
> your choice.
> If you prefer that TTM not trigger the OOM killer, set it to true. The default
> is false, to keep the original behavior.
> For now, AMD prefers a no-memory return value rather than an OOM.
>
>
>
>   One thing I looked at at one point was to have TTM do the swapping 
> itself instead of handing it off to the shmem system. That way we 
> could pre-allocate swap entries for all swappable (BO) memory, making 
> sure that we wouldn't run out of swap space when,
>
> I prefer the current swap-out mechanism. At the beginning the swapped pages
> stay in system memory, held by shmem, until the OS comes under high memory
> pressure; that has an obvious advantage. For example, if the BO is swapped
> out into shmem but not actually flushed to the swap disk, then validating it
> and swapping it back in at that moment has a small overhead compared to
> swapping in from disk. In addition, no swap space reservation is needed for
> TTM pages at allocation time, since the swap disk is shared and not exclusive
> to TTM.
> So again, we provide a no_retry flag in struct ttm_bo_device that the driver
> can set according to its requirements.
>
>
> Thanks
> Roger(Hongbo.He)
>
> -----Original Message-----
> From: Thomas Hellstrom [mailto:tho...@shipmail.org]
> Sent: Wednesday, February 07, 2018 2:43 PM
> To: He, Roger <hongbo...@amd.com>; amd-...@lists.freedesktop.org; 
> dri-devel@lists.freedesktop.org
> Cc: Koenig, Christian <christian.koe...@amd.com>
> Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM
>
> Hi, Roger,
>
> On 02/06/2018 10:04 AM, Roger He wrote:
>> Currently the TTM code has no allocation limit, so it allows unlimited
>> page allocation until OOM: once swap space is full of swapped pages,
>> system memory fills up with TTM pages, and then any memory allocation
>> request triggers the OOM killer.
>>
> I'm a bit curious, isn't this the way things are supposed to work on a Linux
> system?
> If all memory resources are used up, the OOM killer will kill the most
> memory-hungry (perhaps rogue) process rather than processes being nice and
> trying to find out themselves whether allocations will succeed?
> Why should TTM be different in that aspect? It would be good to know your
> reasoning WRT this?
>
> Admittedly, graphics process OOM memory accounting doesn't work very well,
> due to not all BOs being CPU mapped, but it looks like there is recent
> work towards fixing this?
>
> One thing I looked at at one point was to have TTM do the swapping itself
> instead of handing it off to the shmem system. That way we could pre-allocate
> swap entries for all swappable (BO) memory, making sure that we wouldn't run
> out of swap space when, for example, hibernating, and that would also limit
> the pinned non-swappable memory (from TTM driver kernel allocations, for
> example) to half the system memory resources.
>
> Thanks,
> Thomas
>


Re: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-07 Thread Christian König

AMD prefers return value of no memory rather than OOM for now.
Yeah, but Thomas is right that the default for this feature should be 
that it is turned off.


That's why I said that we should make the limit configurable.

Regards,
Christian.

Am 07.02.2018 um 09:25 schrieb He, Roger:

Why should TTM be different in that aspect? It would be good to know 
your reasoning WRT this?

Now, TTM's struct ttm_bo_device already has the member no_retry to indicate
your choice.
If you prefer that TTM not trigger the OOM killer, set it to true. The default
is false, to keep the original behavior.
For now, AMD prefers a no-memory return value rather than an OOM.



One thing I looked at at one point was to have TTM do the swapping 
itself instead of handing it off to the shmem system. That way we could 
pre-allocate swap entries for all swappable (BO) memory, making sure that we 
wouldn't run out of swap space when,

I prefer the current swap-out mechanism. At the beginning the swapped pages
stay in system memory, held by shmem, until the OS comes under high memory
pressure; that has an obvious advantage. For example, if the BO is swapped out
into shmem but not actually flushed to the swap disk, then validating it and
swapping it back in at that moment has a small overhead compared to swapping
in from disk. In addition, no swap space reservation is needed for TTM pages
at allocation time, since the swap disk is shared and not exclusive to TTM.
So again, we provide a no_retry flag in struct ttm_bo_device that the driver
can set according to its requirements.


Thanks
Roger(Hongbo.He)

-----Original Message-----
From: Thomas Hellstrom [mailto:tho...@shipmail.org]
Sent: Wednesday, February 07, 2018 2:43 PM
To: He, Roger <hongbo...@amd.com>; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org
Cc: Koenig, Christian <christian.koe...@amd.com>
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

Hi, Roger,

On 02/06/2018 10:04 AM, Roger He wrote:

Currently the TTM code has no allocation limit, so it allows unlimited
page allocation until OOM: once swap space is full of swapped pages,
system memory fills up with TTM pages, and then any memory allocation
request triggers the OOM killer.


I'm a bit curious, isn't this the way things are supposed to work on a Linux
system?
If all memory resources are used up, the OOM killer will kill the most
memory-hungry (perhaps rogue) process rather than processes being nice and
trying to find out themselves whether allocations will succeed?
Why should TTM be different in that aspect? It would be good to know your
reasoning WRT this?

Admittedly, graphics process OOM memory accounting doesn't work very well,
due to not all BOs being CPU mapped, but it looks like there is recent work
towards fixing this?

One thing I looked at at one point was to have TTM do the swapping itself
instead of handing it off to the shmem system. That way we could pre-allocate
swap entries for all swappable (BO) memory, making sure that we wouldn't run
out of swap space when, for example, hibernating, and that would also limit
the pinned non-swappable memory (from TTM driver kernel allocations, for
example) to half the system memory resources.

Thanks,
Thomas



RE: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-07 Thread He, Roger
Why should TTM be different in that aspect? It would be good to know 
your reasoning WRT this?

Now, TTM's struct ttm_bo_device already has the member no_retry to indicate
your choice.
If you prefer that TTM not trigger the OOM killer, set it to true. The default
is false, to keep the original behavior.
For now, AMD prefers a no-memory return value rather than an OOM.
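
For illustration only, a driver opting in could look roughly like the sketch
below, assuming the no_retry member of struct ttm_bo_device described here;
the device structure and function name are made up:

#include <drm/ttm/ttm_bo_driver.h>

struct example_device {
	struct ttm_bo_device bdev;
	/* ... other driver-specific state ... */
};

static void example_prefer_enomem_over_oom(struct example_device *edev)
{
	/*
	 * After ttm_bo_device_init(): make TTM fail allocations with
	 * -ENOMEM instead of retrying until the OOM killer is invoked.
	 */
	edev->bdev.no_retry = true;
}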



One thing I looked at at one point was to have TTM do the swapping 
itself instead of handing it off to the shmem system. That way we could 
pre-allocate swap entries for all swappable (BO) memory, making sure that we 
wouldn't run out of swap space when, 

I prefer the current swap-out mechanism. At the beginning the swapped pages
stay in system memory, held by shmem, until the OS comes under high memory
pressure; that has an obvious advantage. For example, if the BO is swapped out
into shmem but not actually flushed to the swap disk, then validating it and
swapping it back in at that moment has a small overhead compared to swapping
in from disk. In addition, no swap space reservation is needed for TTM pages
at allocation time, since the swap disk is shared and not exclusive to TTM.
So again, we provide a no_retry flag in struct ttm_bo_device that the driver
can set according to its requirements.
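
For reference, the shmem hand-off described above works roughly like the
simplified sketch below, modeled loosely on TTM's swap-out path; error
handling and the persistent-storage case are omitted, and the function name
is illustrative:

#include <linux/err.h>
#include <linux/fs.h>
#include <linux/highmem.h>
#include <linux/mm.h>
#include <linux/pagemap.h>
#include <linux/shmem_fs.h>
#include <linux/swap.h>

/* Copy a BO's pages into a shmem file; the shmem pages then stay in RAM
 * until the core MM decides to write them to swap under memory pressure.
 * A real implementation keeps swap_storage around for a later swap-in. */
static int example_swapout_to_shmem(struct page **pages, unsigned long num_pages)
{
	struct file *swap_storage;
	struct address_space *swap_space;
	unsigned long i;

	swap_storage = shmem_file_setup("ttm swap", num_pages << PAGE_SHIFT, 0);
	if (IS_ERR(swap_storage))
		return PTR_ERR(swap_storage);

	swap_space = swap_storage->f_mapping;
	for (i = 0; i < num_pages; ++i) {
		struct page *to_page = shmem_read_mapping_page(swap_space, i);

		if (IS_ERR(to_page))
			return PTR_ERR(to_page);
		copy_highpage(to_page, pages[i]);	/* BO page -> shmem page */
		set_page_dirty(to_page);
		mark_page_accessed(to_page);
		put_page(to_page);
	}
	return 0;
}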


Thanks
Roger(Hongbo.He)

-----Original Message-----
From: Thomas Hellstrom [mailto:tho...@shipmail.org] 
Sent: Wednesday, February 07, 2018 2:43 PM
To: He, Roger <hongbo...@amd.com>; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org
Cc: Koenig, Christian <christian.koe...@amd.com>
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

Hi, Roger,

On 02/06/2018 10:04 AM, Roger He wrote:
> Currently the TTM code has no allocation limit, so it allows unlimited
> page allocation until OOM: once swap space is full of swapped pages,
> system memory fills up with TTM pages, and then any memory allocation
> request triggers the OOM killer.
>

I'm a bit curious, isn't this the way things are supposed to work on a Linux
system?
If all memory resources are used up, the OOM killer will kill the most
memory-hungry (perhaps rogue) process rather than processes being nice and
trying to find out themselves whether allocations will succeed?
Why should TTM be different in that aspect? It would be good to know your
reasoning WRT this?

Admittedly, graphics process OOM memory accounting doesn't work very well, due
to not all BOs being CPU mapped, but it looks like there is recent work
towards fixing this?

One thing I looked at at one point was to have TTM do the swapping itself
instead of handing it off to the shmem system. That way we could pre-allocate
swap entries for all swappable (BO) memory, making sure that we wouldn't run
out of swap space when, for example, hibernating, and that would also limit
the pinned non-swappable memory (from TTM driver kernel allocations, for
example) to half the system memory resources.

Thanks,
Thomas



Re: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-06 Thread Thomas Hellstrom

Hi, Roger,

On 02/06/2018 10:04 AM, Roger He wrote:

Currently the TTM code has no allocation limit, so it allows unlimited
page allocation until OOM: once swap space is full of swapped pages,
system memory fills up with TTM pages, and then any memory allocation
request triggers the OOM killer.



I'm a bit curious, isn't this the way things are supposed to work on a
Linux system?
If all memory resources are used up, the OOM killer will kill the most
memory-hungry (perhaps rogue) process rather than processes being nice
and trying to find out themselves whether allocations will succeed?
Why should TTM be different in that aspect? It would be good to know
your reasoning WRT this?


Admittedly, graphics process OOM memory accounting doesn't work very
well, due to not all BOs being CPU mapped, but it looks like there is
recent work towards fixing this?

One thing I looked at at one point was to have TTM do the swapping
itself instead of handing it off to the shmem system. That way we could
pre-allocate swap entries for all swappable (BO) memory, making sure
that we wouldn't run out of swap space when, for example, hibernating,
and that would also limit the pinned non-swappable memory (from TTM
driver kernel allocations, for example) to half the system memory
resources.


Thanks,
Thomas



RE: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-06 Thread He, Roger
Move the new call out of ttm_mem_global_reserve() and into
ttm_page_alloc.c or ttm_page_alloc_dma.c (but keep it in ttm_memory.c).
ttm_mem_global_reserve() is called for each page allocated, and
si_mem_available() is a bit too heavy for that.

Good idea! I agree with you completely; initially I had the same concern but
saw no better way at the time.
I am going to improve the patches. Thanks!

-----Original Message-----
From: Koenig, Christian 
Sent: Tuesday, February 06, 2018 5:52 PM
To: He, Roger <hongbo...@amd.com>; amd-...@lists.freedesktop.org; 
dri-devel@lists.freedesktop.org
Cc: tho...@shipmail.org
Subject: Re: [PATCH 0/5] prevent OOM triggered by TTM

Nice work, but a few comments.

First of all you need to reorder the patches. Adding the exceptions to the 
restrictions should come first, then the restriction itself. 
Otherwise we might break a setup in between the patches and that is bad for 
bisecting.

Then make all values configurable, e.g. take a closer look at ttm_memory.c. 
Just add attributes directly under the memory_accounting directory (see 
ttm_mem_global_init).

In addition to that, you can't put device-specific information (the no_retry
flag) into ttm_mem_global; that structure is not driver-specific and it won't
work like this.

Move the new call out of ttm_mem_global_reserve() and into ttm_page_alloc.c or
ttm_page_alloc_dma.c (but keep it in ttm_memory.c).
ttm_mem_global_reserve() is called for each page allocated, and
si_mem_available() is a bit too heavy for that.

Maybe name TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY something like _FORCE_ALLOCATION or 
_ALLOW_OOM.

And please also check whether a criterion like (si_mem_available() +
get_nr_swap_pages()) < limit works as well. This way we would have only a
single new limit.

Regards,
Christian.

Am 06.02.2018 um 10:04 schrieb Roger He:
> Currently the TTM code has no allocation limit, so it allows unlimited
> page allocation until OOM: once swap space is full of swapped pages,
> system memory fills up with TTM pages, and then any memory allocation
> request triggers the OOM killer.
>
>
> The following patches prevent OOM situations triggered by TTM.
> The basic idea is to check the free swap space first when allocating TTM
> pages and, if it is below a fixed limit, to reject the allocation request.
> There are two exceptions which should be allowed regardless of the zone
> memory accounting limit:
> a. page fault
>    If ttm_mem_global_reserve() is serving a page fault routine, allowing
>    the allocation is harmless because the page fault path has already
>    grabbed system memory. Otherwise it would trigger the OOM killer.
> b. suspend
>    We should always allow suspend to succeed.
>
>
> Finally, if bdev.no_retry is false (the default), the original behavior
> is kept with no change.
>
> Roger He (5):
>drm/ttm: check if the free swap space is under limit 256MB
>drm/ttm: keep original behavior except with flag no_retry
>drm/ttm: use bit flag to replace allow_reserved_eviction in
>  ttm_operation_ctx
>drm/ttm: add bit flag TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY
>drm/ttm: add input parameter allow_allo_anyway for ttm_bo_evict_mm
>
>   drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  4 +--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 10 +++---
>   drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  2 +-
>   drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  8 +++--
>   drivers/gpu/drm/nouveau/nouveau_drm.c   |  2 +-
>   drivers/gpu/drm/qxl/qxl_object.c|  4 +--
>   drivers/gpu/drm/radeon/radeon_device.c  |  6 ++--
>   drivers/gpu/drm/radeon/radeon_object.c  |  6 ++--
>   drivers/gpu/drm/radeon/radeon_object.h  |  3 +-
>   drivers/gpu/drm/ttm/ttm_bo.c| 19 +++
>   drivers/gpu/drm/ttm/ttm_bo_vm.c |  6 ++--
>   drivers/gpu/drm/ttm/ttm_memory.c| 51 
> ++---
>   drivers/gpu/drm/ttm/ttm_page_alloc_dma.c|  1 -
>   drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  6 ++--
>   include/drm/ttm/ttm_bo_api.h| 14 ++--
>   include/drm/ttm/ttm_memory.h|  6 
>   18 files changed, 111 insertions(+), 43 deletions(-)
>



Re: [PATCH 0/5] prevent OOM triggered by TTM

2018-02-06 Thread Christian König

Nice work, but a few comments.

First of all you need to reorder the patches. Adding the exceptions to 
the restrictions should come first, then the restriction itself. 
Otherwise we might break a setup in between the patches and that is bad 
for bisecting.


Then make all values configurable, e.g. take a closer look at 
ttm_memory.c. Just add attributes directly under the memory_accounting 
directory (see ttm_mem_global_init).
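
For illustration, a writable limit of that kind could be exposed roughly like
the generic kobject/sysfs sketch below; the attribute name, variable, and
default are assumptions, and the series would hook into ttm_memory.c's own
attribute machinery rather than use this standalone code:

#include <linux/kernel.h>
#include <linux/kobject.h>
#include <linux/mm.h>
#include <linux/sysfs.h>

/* Lower limit in pages, default 256 MB. */
static u64 ttm_lower_mem_limit = (256ULL << 20) >> PAGE_SHIFT;

static ssize_t lower_mem_limit_show(struct kobject *kobj,
				    struct kobj_attribute *attr, char *buf)
{
	return sprintf(buf, "%llu\n", ttm_lower_mem_limit);
}

static ssize_t lower_mem_limit_store(struct kobject *kobj,
				     struct kobj_attribute *attr,
				     const char *buf, size_t count)
{
	int ret = kstrtoull(buf, 10, &ttm_lower_mem_limit);

	return ret ? ret : count;
}

static struct kobj_attribute lower_mem_limit_attr =
	__ATTR(lower_mem_limit, 0644, lower_mem_limit_show, lower_mem_limit_store);

/* Registered with sysfs_create_file() on the memory_accounting kobject. */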


In addition to that, you can't put device-specific information (the
no_retry flag) into ttm_mem_global; that structure is not driver-specific
and it won't work like this.


Move the new call out of ttm_mem_global_reserve() and into
ttm_page_alloc.c or ttm_page_alloc_dma.c (but keep it in ttm_memory.c).
ttm_mem_global_reserve() is called for each page allocated, and
si_mem_available() is a bit too heavy for that.


Maybe name TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY something like 
_FORCE_ALLOCATION or _ALLOW_OOM.


And please also check whether a criterion like (si_mem_available() +
get_nr_swap_pages()) < limit works as well. This way we would have only
a single new limit.
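
A minimal sketch of that single combined criterion, with the helper name and
the limit variable as assumptions rather than code from the series:

#include <linux/mm.h>	/* si_mem_available() */
#include <linux/swap.h>	/* get_nr_swap_pages() */

/*
 * Treat free RAM plus free swap as one pool and refuse further TTM page
 * allocations once it drops below a single configurable limit (in pages).
 */
static bool ttm_below_combined_limit(u64 limit_pages)
{
	u64 available = si_mem_available() + get_nr_swap_pages();

	return available < limit_pages;
}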


Regards,
Christian.

Am 06.02.2018 um 10:04 schrieb Roger He:

Currently the TTM code has no allocation limit, so it allows unlimited
page allocation until OOM: once swap space is full of swapped pages,
system memory fills up with TTM pages, and then any memory allocation
request triggers the OOM killer.


The following patches prevent OOM situations triggered by TTM.
The basic idea is to check the free swap space first when allocating TTM
pages and, if it is below a fixed limit, to reject the allocation request.
There are two exceptions which should be allowed regardless of the zone
memory accounting limit:
a. page fault
   If ttm_mem_global_reserve() is serving a page fault routine, allowing
   the allocation is harmless because the page fault path has already
   grabbed system memory. Otherwise it would trigger the OOM killer.
b. suspend
   We should always allow suspend to succeed.


Finally, if bdev.no_retry is false (the default), the original behavior is
kept with no change.

Roger He (5):
   drm/ttm: check if the free swap space is under limit 256MB
   drm/ttm: keep original behavior except with flag no_retry
   drm/ttm: use bit flag to replace allow_reserved_eviction in
 ttm_operation_ctx
   drm/ttm: add bit flag TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY
   drm/ttm: add input parameter allow_allo_anyway for ttm_bo_evict_mm

  drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 10 +++---
  drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  2 +-
  drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  8 +++--
  drivers/gpu/drm/nouveau/nouveau_drm.c   |  2 +-
  drivers/gpu/drm/qxl/qxl_object.c|  4 +--
  drivers/gpu/drm/radeon/radeon_device.c  |  6 ++--
  drivers/gpu/drm/radeon/radeon_object.c  |  6 ++--
  drivers/gpu/drm/radeon/radeon_object.h  |  3 +-
  drivers/gpu/drm/ttm/ttm_bo.c| 19 +++
  drivers/gpu/drm/ttm/ttm_bo_vm.c |  6 ++--
  drivers/gpu/drm/ttm/ttm_memory.c| 51 ++---
  drivers/gpu/drm/ttm/ttm_page_alloc_dma.c|  1 -
  drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  6 ++--
  include/drm/ttm/ttm_bo_api.h| 14 ++--
  include/drm/ttm/ttm_memory.h|  6 
  18 files changed, 111 insertions(+), 43 deletions(-)





[PATCH 0/5] prevent OOM triggered by TTM

2018-02-06 Thread Roger He
Currently the TTM code has no allocation limit, so it allows unlimited
page allocation until OOM: once swap space is full of swapped pages,
system memory fills up with TTM pages, and then any memory allocation
request triggers the OOM killer.


The following patches prevent OOM situations triggered by TTM.
The basic idea is to check the free swap space first when allocating TTM
pages and, if it is below a fixed limit, to reject the allocation request.
There are two exceptions which should be allowed regardless of the zone
memory accounting limit:
a. page fault
   If ttm_mem_global_reserve() is serving a page fault routine, allowing
   the allocation is harmless because the page fault path has already
   grabbed system memory. Otherwise it would trigger the OOM killer.
b. suspend
   We should always allow suspend to succeed.


Finally, if bdev.no_retry is false (the default), the original behavior is
kept with no change.
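
As a rough sketch of the check described above (the helper name, the flag
parameter, and the constant are illustrative only; the actual patches wire
this into ttm_memory.c and the ttm_operation_ctx flags):

#include <linux/errno.h>
#include <linux/mm.h>		/* PAGE_SHIFT */
#include <linux/swap.h>		/* get_nr_swap_pages() */

#define TTM_SWAP_LOWER_LIMIT_PAGES	((256ULL << 20) >> PAGE_SHIFT) /* 256 MB */

/*
 * Called before reserving pages for a TTM allocation.  The page-fault and
 * suspend paths pass force_alloc = true so they can never fail here.
 */
static int example_check_swap_space(bool force_alloc)
{
	if (force_alloc)
		return 0;

	if (get_nr_swap_pages() < TTM_SWAP_LOWER_LIMIT_PAGES)
		return -ENOMEM;	/* reject instead of risking the OOM killer */

	return 0;
}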

Roger He (5):
  drm/ttm: check if the free swap space is under limit 256MB
  drm/ttm: keep original behavior except with flag no_retry
  drm/ttm: use bit flag to replace allow_reserved_eviction in
ttm_operation_ctx
  drm/ttm: add bit flag TTM_OPT_FLAG_ALLOW_ALLOC_ANYWAY
  drm/ttm: add input parameter allow_allo_anyway for ttm_bo_evict_mm

 drivers/gpu/drm/amd/amdgpu/amdgpu_cs.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_debugfs.c |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_device.c  |  4 +--
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.c  | 10 +++---
 drivers/gpu/drm/amd/amdgpu/amdgpu_object.h  |  2 +-
 drivers/gpu/drm/amd/amdgpu/amdgpu_ttm.c |  8 +++--
 drivers/gpu/drm/nouveau/nouveau_drm.c   |  2 +-
 drivers/gpu/drm/qxl/qxl_object.c|  4 +--
 drivers/gpu/drm/radeon/radeon_device.c  |  6 ++--
 drivers/gpu/drm/radeon/radeon_object.c  |  6 ++--
 drivers/gpu/drm/radeon/radeon_object.h  |  3 +-
 drivers/gpu/drm/ttm/ttm_bo.c| 19 +++
 drivers/gpu/drm/ttm/ttm_bo_vm.c |  6 ++--
 drivers/gpu/drm/ttm/ttm_memory.c| 51 ++---
 drivers/gpu/drm/ttm/ttm_page_alloc_dma.c|  1 -
 drivers/gpu/drm/vmwgfx/vmwgfx_drv.c |  6 ++--
 include/drm/ttm/ttm_bo_api.h| 14 ++--
 include/drm/ttm/ttm_memory.h|  6 
 18 files changed, 111 insertions(+), 43 deletions(-)

-- 
2.7.4
