Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Wed 07-06-17 10:10:36, Wei Yang wrote: [...] > Hmm... Let me be more specific. With two factors, costly or not, flag set or > not, we have four combinations. Here it is classified into two categories. > > 1. __GFP_RETRY_MAYFAIL not set > > Brief description on behavior: > costly: pick up the shortcut, so no OOM > !costly: no shortcut and will OOM I think > > Impact from this patch set: > No. true > My personal understanding: > The allocation without __GFP_RETRY_MAYFAIL is not effected by this patch > set. Since !costly allocation will trigger OOM, this is the reason why > "small allocations never fail _practically_", as mentioned in > https://lwn.net/Articles/723317/. > > > 3. __GFP_RETRY_MAYFAIL set > > Brief description on behavior: > costly/!costly: no shortcut here and no OOM invoked > > Impact from this patch set: > For those allocations with __GFP_RETRY_MAYFAIL, OOM is not invoked for > both. yes > My personal understanding: > This is the semantic you are willing to introduce in this patch set. By > cutting off the OOM invoke when __GFP_RETRY_MAYFAIL is set, you makes this > a middle situation between NOFAIL and NORETRY. yes > page_alloc will try some luck to get some free pages without disturb other > part of the system. By doing so, the never fail allocation for !costly > pages will be "fixed". If I understand correctly, you are willing to make > this the default behavior in the future? I do not think we can make this a default in a foreseeable future unfortunately. That's why I've made it a gfp modifier in the first place. I assume many users will opt in by using the flag. In future we can even help by adding a highlevel GFP_$FOO flag but I am worried that this would just add to the explosion of existing highlevel gfp masks (e.g. do we want GFP_NOFS_MAY_FAIL, GFP_USER_MAY_FAIL, GFP_USER_HIGH_MOVABLE_MAYFAIL etc...) -- Michal Hocko SUSE Labs
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Wed 07-06-17 10:10:36, Wei Yang wrote: [...] > Hmm... Let me be more specific. With two factors, costly or not, flag set or > not, we have four combinations. Here it is classified into two categories. > > 1. __GFP_RETRY_MAYFAIL not set > > Brief description on behavior: > costly: pick up the shortcut, so no OOM > !costly: no shortcut and will OOM I think > > Impact from this patch set: > No. true > My personal understanding: > The allocation without __GFP_RETRY_MAYFAIL is not effected by this patch > set. Since !costly allocation will trigger OOM, this is the reason why > "small allocations never fail _practically_", as mentioned in > https://lwn.net/Articles/723317/. > > > 3. __GFP_RETRY_MAYFAIL set > > Brief description on behavior: > costly/!costly: no shortcut here and no OOM invoked > > Impact from this patch set: > For those allocations with __GFP_RETRY_MAYFAIL, OOM is not invoked for > both. yes > My personal understanding: > This is the semantic you are willing to introduce in this patch set. By > cutting off the OOM invoke when __GFP_RETRY_MAYFAIL is set, you makes this > a middle situation between NOFAIL and NORETRY. yes > page_alloc will try some luck to get some free pages without disturb other > part of the system. By doing so, the never fail allocation for !costly > pages will be "fixed". If I understand correctly, you are willing to make > this the default behavior in the future? I do not think we can make this a default in a foreseeable future unfortunately. That's why I've made it a gfp modifier in the first place. I assume many users will opt in by using the flag. In future we can even help by adding a highlevel GFP_$FOO flag but I am worried that this would just add to the explosion of existing highlevel gfp masks (e.g. do we want GFP_NOFS_MAY_FAIL, GFP_USER_MAY_FAIL, GFP_USER_HIGH_MOVABLE_MAYFAIL etc...) -- Michal Hocko SUSE Labs
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Tue, Jun 06, 2017 at 02:03:15PM +0200, Michal Hocko wrote: >On Tue 06-06-17 11:04:01, Wei Yang wrote: >> On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote: >> >On Sat 03-06-17 10:24:40, Wei Yang wrote: >> >> Hi, Michal >> >> >> >> Just go through your patch. >> >> >> >> I have one question and one suggestion as below. >> >> >> >> One suggestion: >> >> >> >> This patch does two things to me: >> >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL >> >> 2. Adjust the logic in page_alloc to provide the middle semantic >> >> >> >> My suggestion is to split these two task into two patches, so that readers >> >> could catch your fundamental logic change easily. >> > >> >Well, the rename and the change is intentionally tight together. My >> >previous patches have removed all __GFP_REPEAT users for low order >> >requests which didn't have any implemented semantic. So as of now we >> >should only have those users which semantic will not change. I do not >> >add any new low order user in this patch so it in fact doesn't change >> >any existing semnatic. >> > >> >> >> >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: >> >> >From: Michal Hocko>> >[...] >> >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned >> >> >int order, >> >> > >> >> > /* >> >> > * Do not retry costly high order allocations unless they are >> >> >- * __GFP_REPEAT >> >> >+ * __GFP_RETRY_MAYFAIL >> >> > */ >> >> >-if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & >> >> >__GFP_REPEAT)) >> >> >+if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & >> >> >__GFP_RETRY_MAYFAIL)) >> >> > goto nopage; >> >> >> >> One question: >> >> >> >> From your change log, it mentions will provide the same semantic for >> >> !costly >> >> allocations. While the logic here is the same as before. >> >> >> >> For a !costly allocation with __GFP_REPEAT flag, the difference after this >> >> patch is no OOM will be invoked, while it will still continue in the loop. >> > >> >Not really. There are two things. The above will shortcut retrying if >> >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will >> >back of in __alloc_pages_may_oom. >> > >> >> Maybe I don't catch your point in this message: >> >> >> >> __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to >> >> the page allocator. This has been true but only for allocations requests >> >> larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for >> >> smaller sizes. This is a bit unfortunate because there is no way to >> >> express the same semantic for those requests and they are considered too >> >> important to fail so they might end up looping in the page allocator for >> >> ever, similarly to GFP_NOFAIL requests. >> >> >> >> I thought you will provide the same semantic to !costly allocation, or I >> >> misunderstand? >> > >> >yes and that is the case. __alloc_pages_may_oom will back off before OOM >> >killer is invoked and the allocator slow path will fail because >> >did_some_progress == 0; >> >> Thanks for your explanation. >> >> So same "semantic" doesn't mean same "behavior". >> 1. costly allocations will pick up the shut cut > >yes and there are no such allocations yet (based on my previous >cleanups) > >> 2. !costly allocations will try something more but finally fail without >> invoking OOM. > >no, the behavior will not change for those. > Hmm... Let me be more specific. With two factors, costly or not, flag set or not, we have four combinations. Here it is classified into two categories. 1. __GFP_RETRY_MAYFAIL not set Brief description on behavior: costly: pick up the shortcut, so no OOM !costly: no shortcut and will OOM I think Impact from this patch set: No. My personal understanding: The allocation without __GFP_RETRY_MAYFAIL is not effected by this patch set. Since !costly allocation will trigger OOM, this is the reason why "small allocations never fail _practically_", as mentioned in https://lwn.net/Articles/723317/. 3. __GFP_RETRY_MAYFAIL set Brief description on behavior: costly/!costly: no shortcut here and no OOM invoked Impact from this patch set: For those allocations with __GFP_RETRY_MAYFAIL, OOM is not invoked for both. My personal understanding: This is the semantic you are willing to introduce in this patch set. By cutting off the OOM invoke when __GFP_RETRY_MAYFAIL is set, you makes this a middle situation between NOFAIL and NORETRY. page_alloc will try some luck to get some free pages without disturb other part of the system. By doing so, the never fail allocation for !costly pages will be "fixed". If I understand correctly, you are willing to make this the default behavior in the future? >> Hope this time I catch your point. >> >> BTW, did_some_progress mostly means the OOM works to
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Tue, Jun 06, 2017 at 02:03:15PM +0200, Michal Hocko wrote: >On Tue 06-06-17 11:04:01, Wei Yang wrote: >> On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote: >> >On Sat 03-06-17 10:24:40, Wei Yang wrote: >> >> Hi, Michal >> >> >> >> Just go through your patch. >> >> >> >> I have one question and one suggestion as below. >> >> >> >> One suggestion: >> >> >> >> This patch does two things to me: >> >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL >> >> 2. Adjust the logic in page_alloc to provide the middle semantic >> >> >> >> My suggestion is to split these two task into two patches, so that readers >> >> could catch your fundamental logic change easily. >> > >> >Well, the rename and the change is intentionally tight together. My >> >previous patches have removed all __GFP_REPEAT users for low order >> >requests which didn't have any implemented semantic. So as of now we >> >should only have those users which semantic will not change. I do not >> >add any new low order user in this patch so it in fact doesn't change >> >any existing semnatic. >> > >> >> >> >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: >> >> >From: Michal Hocko >> >[...] >> >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned >> >> >int order, >> >> > >> >> > /* >> >> > * Do not retry costly high order allocations unless they are >> >> >- * __GFP_REPEAT >> >> >+ * __GFP_RETRY_MAYFAIL >> >> > */ >> >> >-if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & >> >> >__GFP_REPEAT)) >> >> >+if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & >> >> >__GFP_RETRY_MAYFAIL)) >> >> > goto nopage; >> >> >> >> One question: >> >> >> >> From your change log, it mentions will provide the same semantic for >> >> !costly >> >> allocations. While the logic here is the same as before. >> >> >> >> For a !costly allocation with __GFP_REPEAT flag, the difference after this >> >> patch is no OOM will be invoked, while it will still continue in the loop. >> > >> >Not really. There are two things. The above will shortcut retrying if >> >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will >> >back of in __alloc_pages_may_oom. >> > >> >> Maybe I don't catch your point in this message: >> >> >> >> __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to >> >> the page allocator. This has been true but only for allocations requests >> >> larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for >> >> smaller sizes. This is a bit unfortunate because there is no way to >> >> express the same semantic for those requests and they are considered too >> >> important to fail so they might end up looping in the page allocator for >> >> ever, similarly to GFP_NOFAIL requests. >> >> >> >> I thought you will provide the same semantic to !costly allocation, or I >> >> misunderstand? >> > >> >yes and that is the case. __alloc_pages_may_oom will back off before OOM >> >killer is invoked and the allocator slow path will fail because >> >did_some_progress == 0; >> >> Thanks for your explanation. >> >> So same "semantic" doesn't mean same "behavior". >> 1. costly allocations will pick up the shut cut > >yes and there are no such allocations yet (based on my previous >cleanups) > >> 2. !costly allocations will try something more but finally fail without >> invoking OOM. > >no, the behavior will not change for those. > Hmm... Let me be more specific. With two factors, costly or not, flag set or not, we have four combinations. Here it is classified into two categories. 1. __GFP_RETRY_MAYFAIL not set Brief description on behavior: costly: pick up the shortcut, so no OOM !costly: no shortcut and will OOM I think Impact from this patch set: No. My personal understanding: The allocation without __GFP_RETRY_MAYFAIL is not effected by this patch set. Since !costly allocation will trigger OOM, this is the reason why "small allocations never fail _practically_", as mentioned in https://lwn.net/Articles/723317/. 3. __GFP_RETRY_MAYFAIL set Brief description on behavior: costly/!costly: no shortcut here and no OOM invoked Impact from this patch set: For those allocations with __GFP_RETRY_MAYFAIL, OOM is not invoked for both. My personal understanding: This is the semantic you are willing to introduce in this patch set. By cutting off the OOM invoke when __GFP_RETRY_MAYFAIL is set, you makes this a middle situation between NOFAIL and NORETRY. page_alloc will try some luck to get some free pages without disturb other part of the system. By doing so, the never fail allocation for !costly pages will be "fixed". If I understand correctly, you are willing to make this the default behavior in the future? >> Hope this time I catch your point. >> >> BTW, did_some_progress mostly means the OOM works to me. Are there
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Tue 06-06-17 11:04:01, Wei Yang wrote: > On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote: > >On Sat 03-06-17 10:24:40, Wei Yang wrote: > >> Hi, Michal > >> > >> Just go through your patch. > >> > >> I have one question and one suggestion as below. > >> > >> One suggestion: > >> > >> This patch does two things to me: > >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL > >> 2. Adjust the logic in page_alloc to provide the middle semantic > >> > >> My suggestion is to split these two task into two patches, so that readers > >> could catch your fundamental logic change easily. > > > >Well, the rename and the change is intentionally tight together. My > >previous patches have removed all __GFP_REPEAT users for low order > >requests which didn't have any implemented semantic. So as of now we > >should only have those users which semantic will not change. I do not > >add any new low order user in this patch so it in fact doesn't change > >any existing semnatic. > > > >> > >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: > >> >From: Michal Hocko> >[...] > >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int > >> >order, > >> > > >> > /* > >> > * Do not retry costly high order allocations unless they are > >> >- * __GFP_REPEAT > >> >+ * __GFP_RETRY_MAYFAIL > >> > */ > >> >- if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) > >> >+ if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & > >> >__GFP_RETRY_MAYFAIL)) > >> > goto nopage; > >> > >> One question: > >> > >> From your change log, it mentions will provide the same semantic for > >> !costly > >> allocations. While the logic here is the same as before. > >> > >> For a !costly allocation with __GFP_REPEAT flag, the difference after this > >> patch is no OOM will be invoked, while it will still continue in the loop. > > > >Not really. There are two things. The above will shortcut retrying if > >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will > >back of in __alloc_pages_may_oom. > > > >> Maybe I don't catch your point in this message: > >> > >> __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to > >> the page allocator. This has been true but only for allocations requests > >> larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for > >> smaller sizes. This is a bit unfortunate because there is no way to > >> express the same semantic for those requests and they are considered too > >> important to fail so they might end up looping in the page allocator for > >> ever, similarly to GFP_NOFAIL requests. > >> > >> I thought you will provide the same semantic to !costly allocation, or I > >> misunderstand? > > > >yes and that is the case. __alloc_pages_may_oom will back off before OOM > >killer is invoked and the allocator slow path will fail because > >did_some_progress == 0; > > Thanks for your explanation. > > So same "semantic" doesn't mean same "behavior". > 1. costly allocations will pick up the shut cut yes and there are no such allocations yet (based on my previous cleanups) > 2. !costly allocations will try something more but finally fail without > invoking OOM. no, the behavior will not change for those. > Hope this time I catch your point. > > BTW, did_some_progress mostly means the OOM works to me. Are there some other > important situations when did_some_progress is set to 1? Yes e.g. for GFP_NOFS when we cannot really invoke the OOM killer yet we cannot fail the allocation. -- Michal Hocko SUSE Labs
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Tue 06-06-17 11:04:01, Wei Yang wrote: > On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote: > >On Sat 03-06-17 10:24:40, Wei Yang wrote: > >> Hi, Michal > >> > >> Just go through your patch. > >> > >> I have one question and one suggestion as below. > >> > >> One suggestion: > >> > >> This patch does two things to me: > >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL > >> 2. Adjust the logic in page_alloc to provide the middle semantic > >> > >> My suggestion is to split these two task into two patches, so that readers > >> could catch your fundamental logic change easily. > > > >Well, the rename and the change is intentionally tight together. My > >previous patches have removed all __GFP_REPEAT users for low order > >requests which didn't have any implemented semantic. So as of now we > >should only have those users which semantic will not change. I do not > >add any new low order user in this patch so it in fact doesn't change > >any existing semnatic. > > > >> > >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: > >> >From: Michal Hocko > >[...] > >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int > >> >order, > >> > > >> > /* > >> > * Do not retry costly high order allocations unless they are > >> >- * __GFP_REPEAT > >> >+ * __GFP_RETRY_MAYFAIL > >> > */ > >> >- if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) > >> >+ if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & > >> >__GFP_RETRY_MAYFAIL)) > >> > goto nopage; > >> > >> One question: > >> > >> From your change log, it mentions will provide the same semantic for > >> !costly > >> allocations. While the logic here is the same as before. > >> > >> For a !costly allocation with __GFP_REPEAT flag, the difference after this > >> patch is no OOM will be invoked, while it will still continue in the loop. > > > >Not really. There are two things. The above will shortcut retrying if > >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will > >back of in __alloc_pages_may_oom. > > > >> Maybe I don't catch your point in this message: > >> > >> __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to > >> the page allocator. This has been true but only for allocations requests > >> larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for > >> smaller sizes. This is a bit unfortunate because there is no way to > >> express the same semantic for those requests and they are considered too > >> important to fail so they might end up looping in the page allocator for > >> ever, similarly to GFP_NOFAIL requests. > >> > >> I thought you will provide the same semantic to !costly allocation, or I > >> misunderstand? > > > >yes and that is the case. __alloc_pages_may_oom will back off before OOM > >killer is invoked and the allocator slow path will fail because > >did_some_progress == 0; > > Thanks for your explanation. > > So same "semantic" doesn't mean same "behavior". > 1. costly allocations will pick up the shut cut yes and there are no such allocations yet (based on my previous cleanups) > 2. !costly allocations will try something more but finally fail without > invoking OOM. no, the behavior will not change for those. > Hope this time I catch your point. > > BTW, did_some_progress mostly means the OOM works to me. Are there some other > important situations when did_some_progress is set to 1? Yes e.g. for GFP_NOFS when we cannot really invoke the OOM killer yet we cannot fail the allocation. -- Michal Hocko SUSE Labs
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote: >On Sat 03-06-17 10:24:40, Wei Yang wrote: >> Hi, Michal >> >> Just go through your patch. >> >> I have one question and one suggestion as below. >> >> One suggestion: >> >> This patch does two things to me: >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL >> 2. Adjust the logic in page_alloc to provide the middle semantic >> >> My suggestion is to split these two task into two patches, so that readers >> could catch your fundamental logic change easily. > >Well, the rename and the change is intentionally tight together. My >previous patches have removed all __GFP_REPEAT users for low order >requests which didn't have any implemented semantic. So as of now we >should only have those users which semantic will not change. I do not >add any new low order user in this patch so it in fact doesn't change >any existing semnatic. > >> >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: >> >From: Michal Hocko>[...] >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int >> >order, >> > >> >/* >> > * Do not retry costly high order allocations unless they are >> >-* __GFP_REPEAT >> >+* __GFP_RETRY_MAYFAIL >> > */ >> >- if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) >> >+ if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & >> >__GFP_RETRY_MAYFAIL)) >> >goto nopage; >> >> One question: >> >> From your change log, it mentions will provide the same semantic for !costly >> allocations. While the logic here is the same as before. >> >> For a !costly allocation with __GFP_REPEAT flag, the difference after this >> patch is no OOM will be invoked, while it will still continue in the loop. > >Not really. There are two things. The above will shortcut retrying if >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will >back of in __alloc_pages_may_oom. > >> Maybe I don't catch your point in this message: >> >> __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to >> the page allocator. This has been true but only for allocations requests >> larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for >> smaller sizes. This is a bit unfortunate because there is no way to >> express the same semantic for those requests and they are considered too >> important to fail so they might end up looping in the page allocator for >> ever, similarly to GFP_NOFAIL requests. >> >> I thought you will provide the same semantic to !costly allocation, or I >> misunderstand? > >yes and that is the case. __alloc_pages_may_oom will back off before OOM >killer is invoked and the allocator slow path will fail because >did_some_progress == 0; Thanks for your explanation. So same "semantic" doesn't mean same "behavior". 1. costly allocations will pick up the shut cut 2. !costly allocations will try something more but finally fail without invoking OOM. Hope this time I catch your point. BTW, did_some_progress mostly means the OOM works to me. Are there some other important situations when did_some_progress is set to 1? >-- >Michal Hocko >SUSE Labs -- Wei Yang Help you, Help me signature.asc Description: PGP signature
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Mon, Jun 05, 2017 at 08:43:43AM +0200, Michal Hocko wrote: >On Sat 03-06-17 10:24:40, Wei Yang wrote: >> Hi, Michal >> >> Just go through your patch. >> >> I have one question and one suggestion as below. >> >> One suggestion: >> >> This patch does two things to me: >> 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL >> 2. Adjust the logic in page_alloc to provide the middle semantic >> >> My suggestion is to split these two task into two patches, so that readers >> could catch your fundamental logic change easily. > >Well, the rename and the change is intentionally tight together. My >previous patches have removed all __GFP_REPEAT users for low order >requests which didn't have any implemented semantic. So as of now we >should only have those users which semantic will not change. I do not >add any new low order user in this patch so it in fact doesn't change >any existing semnatic. > >> >> On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: >> >From: Michal Hocko >[...] >> >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int >> >order, >> > >> >/* >> > * Do not retry costly high order allocations unless they are >> >-* __GFP_REPEAT >> >+* __GFP_RETRY_MAYFAIL >> > */ >> >- if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) >> >+ if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & >> >__GFP_RETRY_MAYFAIL)) >> >goto nopage; >> >> One question: >> >> From your change log, it mentions will provide the same semantic for !costly >> allocations. While the logic here is the same as before. >> >> For a !costly allocation with __GFP_REPEAT flag, the difference after this >> patch is no OOM will be invoked, while it will still continue in the loop. > >Not really. There are two things. The above will shortcut retrying if >there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will >back of in __alloc_pages_may_oom. > >> Maybe I don't catch your point in this message: >> >> __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to >> the page allocator. This has been true but only for allocations requests >> larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for >> smaller sizes. This is a bit unfortunate because there is no way to >> express the same semantic for those requests and they are considered too >> important to fail so they might end up looping in the page allocator for >> ever, similarly to GFP_NOFAIL requests. >> >> I thought you will provide the same semantic to !costly allocation, or I >> misunderstand? > >yes and that is the case. __alloc_pages_may_oom will back off before OOM >killer is invoked and the allocator slow path will fail because >did_some_progress == 0; Thanks for your explanation. So same "semantic" doesn't mean same "behavior". 1. costly allocations will pick up the shut cut 2. !costly allocations will try something more but finally fail without invoking OOM. Hope this time I catch your point. BTW, did_some_progress mostly means the OOM works to me. Are there some other important situations when did_some_progress is set to 1? >-- >Michal Hocko >SUSE Labs -- Wei Yang Help you, Help me signature.asc Description: PGP signature
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Sat 03-06-17 10:24:40, Wei Yang wrote: > Hi, Michal > > Just go through your patch. > > I have one question and one suggestion as below. > > One suggestion: > > This patch does two things to me: > 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL > 2. Adjust the logic in page_alloc to provide the middle semantic > > My suggestion is to split these two task into two patches, so that readers > could catch your fundamental logic change easily. Well, the rename and the change is intentionally tight together. My previous patches have removed all __GFP_REPEAT users for low order requests which didn't have any implemented semantic. So as of now we should only have those users which semantic will not change. I do not add any new low order user in this patch so it in fact doesn't change any existing semnatic. > > On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: > >From: Michal Hocko[...] > >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int > >order, > > > > /* > > * Do not retry costly high order allocations unless they are > >- * __GFP_REPEAT > >+ * __GFP_RETRY_MAYFAIL > > */ > >-if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) > >+if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & > >__GFP_RETRY_MAYFAIL)) > > goto nopage; > > One question: > > From your change log, it mentions will provide the same semantic for !costly > allocations. While the logic here is the same as before. > > For a !costly allocation with __GFP_REPEAT flag, the difference after this > patch is no OOM will be invoked, while it will still continue in the loop. Not really. There are two things. The above will shortcut retrying if there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will back of in __alloc_pages_may_oom. > Maybe I don't catch your point in this message: > > __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to > the page allocator. This has been true but only for allocations requests > larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for > smaller sizes. This is a bit unfortunate because there is no way to > express the same semantic for those requests and they are considered too > important to fail so they might end up looping in the page allocator for > ever, similarly to GFP_NOFAIL requests. > > I thought you will provide the same semantic to !costly allocation, or I > misunderstand? yes and that is the case. __alloc_pages_may_oom will back off before OOM killer is invoked and the allocator slow path will fail because did_some_progress == 0; -- Michal Hocko SUSE Labs
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Sat 03-06-17 10:24:40, Wei Yang wrote: > Hi, Michal > > Just go through your patch. > > I have one question and one suggestion as below. > > One suggestion: > > This patch does two things to me: > 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL > 2. Adjust the logic in page_alloc to provide the middle semantic > > My suggestion is to split these two task into two patches, so that readers > could catch your fundamental logic change easily. Well, the rename and the change is intentionally tight together. My previous patches have removed all __GFP_REPEAT users for low order requests which didn't have any implemented semantic. So as of now we should only have those users which semantic will not change. I do not add any new low order user in this patch so it in fact doesn't change any existing semnatic. > > On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: > >From: Michal Hocko [...] > >@@ -3776,9 +3784,9 @@ __alloc_pages_slowpath(gfp_t gfp_mask, unsigned int > >order, > > > > /* > > * Do not retry costly high order allocations unless they are > >- * __GFP_REPEAT > >+ * __GFP_RETRY_MAYFAIL > > */ > >-if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & __GFP_REPEAT)) > >+if (order > PAGE_ALLOC_COSTLY_ORDER && !(gfp_mask & > >__GFP_RETRY_MAYFAIL)) > > goto nopage; > > One question: > > From your change log, it mentions will provide the same semantic for !costly > allocations. While the logic here is the same as before. > > For a !costly allocation with __GFP_REPEAT flag, the difference after this > patch is no OOM will be invoked, while it will still continue in the loop. Not really. There are two things. The above will shortcut retrying if there is _no_ __GFP_RETRY_MAYFAIL. If the flags _is_ specified we will back of in __alloc_pages_may_oom. > Maybe I don't catch your point in this message: > > __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to > the page allocator. This has been true but only for allocations requests > larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for > smaller sizes. This is a bit unfortunate because there is no way to > express the same semantic for those requests and they are considered too > important to fail so they might end up looping in the page allocator for > ever, similarly to GFP_NOFAIL requests. > > I thought you will provide the same semantic to !costly allocation, or I > misunderstand? yes and that is the case. __alloc_pages_may_oom will back off before OOM killer is invoked and the allocator slow path will fail because did_some_progress == 0; -- Michal Hocko SUSE Labs
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
Hi, Michal Just go through your patch. I have one question and one suggestion as below. One suggestion: This patch does two things to me: 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL 2. Adjust the logic in page_alloc to provide the middle semantic My suggestion is to split these two task into two patches, so that readers could catch your fundamental logic change easily. On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: >From: Michal Hocko> >__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to >the page allocator. This has been true but only for allocations requests >larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for >smaller sizes. This is a bit unfortunate because there is no way to >express the same semantic for those requests and they are considered too >important to fail so they might end up looping in the page allocator for >ever, similarly to GFP_NOFAIL requests. > >Now that the whole tree has been cleaned up and accidental or misled >usage of __GFP_REPEAT flag has been removed for !costly requests we can >give the original flag a better name and more importantly a more useful >semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user that >the allocator would try really hard but there is no promise of a >success. This will work independent of the order and overrides the >default allocator behavior. Page allocator users have several levels of >guarantee vs. cost options (take GFP_KERNEL as an example) >- GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_ > attempt to free memory at all. The most light weight mode which even > doesn't kick the background reclaim. Should be used carefully because > it might deplete the memory and the next user might hit the more > aggressive reclaim >- GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic > allocation without any attempt to free memory from the current context > but can wake kswapd to reclaim memory if the zone is below the low > watermark. Can be used from either atomic contexts or when the request > is a performance optimization and there is another fallback for a slow > path. >- (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) - non > sleeping allocation with an expensive fallback so it can access some > portion of memory reserves. Usually used from interrupt/bh context with > an expensive slow path fallback. >- GFP_KERNEL - both background and direct reclaim are allowed and the > _default_ page allocator behavior is used. That means that !costly > allocation requests are basically nofail (unless the requesting task > is killed by the OOM killer) and costly will fail early rather than > cause disruptive reclaim. >- GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior and > all allocation requests fail early rather than cause disruptive > reclaim (one round of reclaim in this implementation). The OOM killer > is not invoked. >- GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator behavior > and all allocation requests try really hard. The request will fail if the > reclaim cannot make any progress. The OOM killer won't be triggered. >- GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior > and all allocation requests will loop endlessly until they > succeed. This might be really dangerous especially for larger orders. > >Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL because >they already had their semantic. No new users are added. >__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if >there is no progress and we have already passed the OOM point. This >means that all the reclaim opportunities have been exhausted except the >most disruptive one (the OOM killer) and a user defined fallback >behavior is more sensible than keep retrying in the page allocator. > >Signed-off-by: Michal Hocko >--- > Documentation/DMA-ISA-LPC.txt| 2 +- > arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 +- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- > drivers/mmc/host/wbsd.c | 2 +- > drivers/s390/char/vmcp.c | 2 +- > drivers/target/target_core_transport.c | 2 +- > drivers/vhost/net.c | 2 +- > drivers/vhost/scsi.c | 2 +- > drivers/vhost/vsock.c| 2 +- > fs/btrfs/check-integrity.c | 2 +- > fs/btrfs/raid56.c| 2 +- > include/linux/gfp.h | 32 +++- > include/linux/slab.h | 3 ++- > include/trace/events/mmflags.h | 2 +- > mm/hugetlb.c | 4 ++-- > mm/internal.h| 2 +- > mm/page_alloc.c | 14 +--- > mm/sparse-vmemmap.c | 4
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
Hi, Michal Just go through your patch. I have one question and one suggestion as below. One suggestion: This patch does two things to me: 1. Replace __GFP_REPEAT with __GFP_RETRY_MAYFAIL 2. Adjust the logic in page_alloc to provide the middle semantic My suggestion is to split these two task into two patches, so that readers could catch your fundamental logic change easily. On Tue, Mar 07, 2017 at 04:48:41PM +0100, Michal Hocko wrote: >From: Michal Hocko > >__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to >the page allocator. This has been true but only for allocations requests >larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for >smaller sizes. This is a bit unfortunate because there is no way to >express the same semantic for those requests and they are considered too >important to fail so they might end up looping in the page allocator for >ever, similarly to GFP_NOFAIL requests. > >Now that the whole tree has been cleaned up and accidental or misled >usage of __GFP_REPEAT flag has been removed for !costly requests we can >give the original flag a better name and more importantly a more useful >semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user that >the allocator would try really hard but there is no promise of a >success. This will work independent of the order and overrides the >default allocator behavior. Page allocator users have several levels of >guarantee vs. cost options (take GFP_KERNEL as an example) >- GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_ > attempt to free memory at all. The most light weight mode which even > doesn't kick the background reclaim. Should be used carefully because > it might deplete the memory and the next user might hit the more > aggressive reclaim >- GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic > allocation without any attempt to free memory from the current context > but can wake kswapd to reclaim memory if the zone is below the low > watermark. Can be used from either atomic contexts or when the request > is a performance optimization and there is another fallback for a slow > path. >- (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) - non > sleeping allocation with an expensive fallback so it can access some > portion of memory reserves. Usually used from interrupt/bh context with > an expensive slow path fallback. >- GFP_KERNEL - both background and direct reclaim are allowed and the > _default_ page allocator behavior is used. That means that !costly > allocation requests are basically nofail (unless the requesting task > is killed by the OOM killer) and costly will fail early rather than > cause disruptive reclaim. >- GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior and > all allocation requests fail early rather than cause disruptive > reclaim (one round of reclaim in this implementation). The OOM killer > is not invoked. >- GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator behavior > and all allocation requests try really hard. The request will fail if the > reclaim cannot make any progress. The OOM killer won't be triggered. >- GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior > and all allocation requests will loop endlessly until they > succeed. This might be really dangerous especially for larger orders. > >Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL because >they already had their semantic. No new users are added. >__alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if >there is no progress and we have already passed the OOM point. This >means that all the reclaim opportunities have been exhausted except the >most disruptive one (the OOM killer) and a user defined fallback >behavior is more sensible than keep retrying in the page allocator. > >Signed-off-by: Michal Hocko >--- > Documentation/DMA-ISA-LPC.txt| 2 +- > arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 +- > arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- > drivers/mmc/host/wbsd.c | 2 +- > drivers/s390/char/vmcp.c | 2 +- > drivers/target/target_core_transport.c | 2 +- > drivers/vhost/net.c | 2 +- > drivers/vhost/scsi.c | 2 +- > drivers/vhost/vsock.c| 2 +- > fs/btrfs/check-integrity.c | 2 +- > fs/btrfs/raid56.c| 2 +- > include/linux/gfp.h | 32 +++- > include/linux/slab.h | 3 ++- > include/trace/events/mmflags.h | 2 +- > mm/hugetlb.c | 4 ++-- > mm/internal.h| 2 +- > mm/page_alloc.c | 14 +--- > mm/sparse-vmemmap.c | 4 ++-- > mm/util.c
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
[I am sorry but I didn't get to this earlier] On Thu 25-05-17 11:21:05, NeilBrown wrote: > On Tue, Mar 07 2017, Michal Hocko wrote: > > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > > index 2bfcfd33e476..60af7937c6f2 100644 > > --- a/include/linux/gfp.h > > +++ b/include/linux/gfp.h > > @@ -25,7 +25,7 @@ struct vm_area_struct; > > #define ___GFP_FS 0x80u > > #define ___GFP_COLD0x100u > > #define ___GFP_NOWARN 0x200u > > -#define ___GFP_REPEAT 0x400u > > +#define ___GFP_RETRY_MAYFAIL 0x400u > > #define ___GFP_NOFAIL 0x800u > > #define ___GFP_NORETRY 0x1000u > > #define ___GFP_MEMALLOC0x2000u > > @@ -136,26 +136,38 @@ struct vm_area_struct; > > * > > * __GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd > > reclaim. > > * > > - * __GFP_REPEAT: Try hard to allocate the memory, but the allocation > > attempt > > - * _might_ fail. This depends upon the particular VM implementation. > > + * The default allocator behavior depends on the request size. We have a > > concept > > + * of so called costly allocations (with order > PAGE_ALLOC_COSTLY_ORDER). > > Boundary conditions is one of my pet peeves > The description here suggests that an allocation of > "1<inconsistent with how those words would normally be interpreted. > > Looking at the code I see comparisons like: > >order < PAGE_ALLOC_COSTLY_ORDER > or >order >= PAGE_ALLOC_COSTLY_ORDER > > which supports the documented (but incoherent) meaning. > > But I also see: > > order = max_t(int, PAGE_ALLOC_COSTLY_ORDER - 1, 0); this smells fishy. Very similarly to other PAGE_ALLOC_COSTLY_ORDER usage out of the mm proper. Many of them can be simply changed to use kvmalloc. I will put this on my todo list for a later cleanup. There shouldn't be any real need to care about PAGE_ALLOC_COSTLY_ORDER. > which looks like it is trying to perform the largest non-costly > allocation, but is making a smaller allocation than necessary. > > I would *really* like it if the constant actually meant what its name > implied. > > PAGE_ALLOC_MAX_NON_COSTLY > ?? Yeah, I can see how this can be confusing. Maybe this is worth a separate cleanup? I wouldn't like to conflate it with this patch. > > + * !costly allocations are too essential to fail so they are implicitly > > + * non-failing (with some exceptions like OOM victims might fail) by > > default while > > + * costly requests try to be not disruptive and back off even without > > invoking > > + * the OOM killer. The following three modifiers might be used to override > > some of > > + * these implicit rules > > + * > > + * __GFP_NORETRY: The VM implementation must not retry indefinitely and > > will > > + * return NULL when direct reclaim and memory compaction have failed to > > allow > > + * the allocation to succeed. The OOM killer is not called with the > > current > > + * implementation. This is a default mode for costly allocations. > > The name here is "NORETRY", but the text says "not retry indefinitely". > So does it retry or not? > I would assuming it "tried" once, and only once. > However it could be that a "try" is not a simple well defined task. This is the case unfortunatelly. E.g. we have that node_reclaim thing which will try to reclaim a local node before falling back to other nodes. And that counts as a direct reclaim attempt and that happens in the allocator fast path. We do get to the allocator slow path where we do the full direct reclaim attempt and give up only if that fails. Confusing? I can see how... > Maybe some escalation happens on the 2nd or 3rd "try", so they are really > trying different things? > > The word "indefinitely" implies there is a definite limit. It might > help to say what that is, or at least say that it is small. OK. > Also, this documentation is phrased to tell the VM implementor what is, > or is not, allowed. Most readers will be more interested is the > responsibilities of the caller. > > __GFP_NORETRY: The VM implementation will not retry after all > reasonable avenues for finding free memory have been pursued. The > implementation may sleep (i.e. call 'schedule()'), but only while > waiting for another task to perform some specific action. > The caller must handle failure. This flag is suitable when failure can > easily be handled at small cost, such as reduced throughput. The above is not precise. What about the following? __GFP_NORETRY: The VM implementation will not try only very lightweight memory direct reclaim to get some memory under memory pressure (thus it can sleep). It will avoid disruptive actions like OOM killer. The caller must handle the failure which is quite likely to happen under heavy memory pressure. The flag is suitable when failure can easily be handled at small cost, such as
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
[I am sorry but I didn't get to this earlier] On Thu 25-05-17 11:21:05, NeilBrown wrote: > On Tue, Mar 07 2017, Michal Hocko wrote: > > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > > index 2bfcfd33e476..60af7937c6f2 100644 > > --- a/include/linux/gfp.h > > +++ b/include/linux/gfp.h > > @@ -25,7 +25,7 @@ struct vm_area_struct; > > #define ___GFP_FS 0x80u > > #define ___GFP_COLD0x100u > > #define ___GFP_NOWARN 0x200u > > -#define ___GFP_REPEAT 0x400u > > +#define ___GFP_RETRY_MAYFAIL 0x400u > > #define ___GFP_NOFAIL 0x800u > > #define ___GFP_NORETRY 0x1000u > > #define ___GFP_MEMALLOC0x2000u > > @@ -136,26 +136,38 @@ struct vm_area_struct; > > * > > * __GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd > > reclaim. > > * > > - * __GFP_REPEAT: Try hard to allocate the memory, but the allocation > > attempt > > - * _might_ fail. This depends upon the particular VM implementation. > > + * The default allocator behavior depends on the request size. We have a > > concept > > + * of so called costly allocations (with order > PAGE_ALLOC_COSTLY_ORDER). > > Boundary conditions is one of my pet peeves > The description here suggests that an allocation of > "1< inconsistent with how those words would normally be interpreted. > > Looking at the code I see comparisons like: > >order < PAGE_ALLOC_COSTLY_ORDER > or >order >= PAGE_ALLOC_COSTLY_ORDER > > which supports the documented (but incoherent) meaning. > > But I also see: > > order = max_t(int, PAGE_ALLOC_COSTLY_ORDER - 1, 0); this smells fishy. Very similarly to other PAGE_ALLOC_COSTLY_ORDER usage out of the mm proper. Many of them can be simply changed to use kvmalloc. I will put this on my todo list for a later cleanup. There shouldn't be any real need to care about PAGE_ALLOC_COSTLY_ORDER. > which looks like it is trying to perform the largest non-costly > allocation, but is making a smaller allocation than necessary. > > I would *really* like it if the constant actually meant what its name > implied. > > PAGE_ALLOC_MAX_NON_COSTLY > ?? Yeah, I can see how this can be confusing. Maybe this is worth a separate cleanup? I wouldn't like to conflate it with this patch. > > + * !costly allocations are too essential to fail so they are implicitly > > + * non-failing (with some exceptions like OOM victims might fail) by > > default while > > + * costly requests try to be not disruptive and back off even without > > invoking > > + * the OOM killer. The following three modifiers might be used to override > > some of > > + * these implicit rules > > + * > > + * __GFP_NORETRY: The VM implementation must not retry indefinitely and > > will > > + * return NULL when direct reclaim and memory compaction have failed to > > allow > > + * the allocation to succeed. The OOM killer is not called with the > > current > > + * implementation. This is a default mode for costly allocations. > > The name here is "NORETRY", but the text says "not retry indefinitely". > So does it retry or not? > I would assuming it "tried" once, and only once. > However it could be that a "try" is not a simple well defined task. This is the case unfortunatelly. E.g. we have that node_reclaim thing which will try to reclaim a local node before falling back to other nodes. And that counts as a direct reclaim attempt and that happens in the allocator fast path. We do get to the allocator slow path where we do the full direct reclaim attempt and give up only if that fails. Confusing? I can see how... > Maybe some escalation happens on the 2nd or 3rd "try", so they are really > trying different things? > > The word "indefinitely" implies there is a definite limit. It might > help to say what that is, or at least say that it is small. OK. > Also, this documentation is phrased to tell the VM implementor what is, > or is not, allowed. Most readers will be more interested is the > responsibilities of the caller. > > __GFP_NORETRY: The VM implementation will not retry after all > reasonable avenues for finding free memory have been pursued. The > implementation may sleep (i.e. call 'schedule()'), but only while > waiting for another task to perform some specific action. > The caller must handle failure. This flag is suitable when failure can > easily be handled at small cost, such as reduced throughput. The above is not precise. What about the following? __GFP_NORETRY: The VM implementation will not try only very lightweight memory direct reclaim to get some memory under memory pressure (thus it can sleep). It will avoid disruptive actions like OOM killer. The caller must handle the failure which is quite likely to happen under heavy memory pressure. The flag is suitable when failure can easily be handled at small cost, such as reduced throughput > > + * > > + * __GFP_RETRY_MAYFAIL: Try
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Tue, Mar 07 2017, Michal Hocko wrote: > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 2bfcfd33e476..60af7937c6f2 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -25,7 +25,7 @@ struct vm_area_struct; > #define ___GFP_FS0x80u > #define ___GFP_COLD 0x100u > #define ___GFP_NOWARN0x200u > -#define ___GFP_REPEAT0x400u > +#define ___GFP_RETRY_MAYFAIL 0x400u > #define ___GFP_NOFAIL0x800u > #define ___GFP_NORETRY 0x1000u > #define ___GFP_MEMALLOC 0x2000u > @@ -136,26 +136,38 @@ struct vm_area_struct; > * > * __GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd reclaim. > * > - * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt > - * _might_ fail. This depends upon the particular VM implementation. > + * The default allocator behavior depends on the request size. We have a > concept > + * of so called costly allocations (with order > PAGE_ALLOC_COSTLY_ORDER). Boundary conditions is one of my pet peeves The description here suggests that an allocation of "1<= PAGE_ALLOC_COSTLY_ORDER which supports the documented (but incoherent) meaning. But I also see: order = max_t(int, PAGE_ALLOC_COSTLY_ORDER - 1, 0); which looks like it is trying to perform the largest non-costly allocation, but is making a smaller allocation than necessary. I would *really* like it if the constant actually meant what its name implied. PAGE_ALLOC_MAX_NON_COSTLY ?? > + * !costly allocations are too essential to fail so they are implicitly > + * non-failing (with some exceptions like OOM victims might fail) by default > while > + * costly requests try to be not disruptive and back off even without > invoking > + * the OOM killer. The following three modifiers might be used to override > some of > + * these implicit rules > + * > + * __GFP_NORETRY: The VM implementation must not retry indefinitely and will > + * return NULL when direct reclaim and memory compaction have failed to > allow > + * the allocation to succeed. The OOM killer is not called with the > current > + * implementation. This is a default mode for costly allocations. The name here is "NORETRY", but the text says "not retry indefinitely". So does it retry or not? I would assuming it "tried" once, and only once. However it could be that a "try" is not a simple well defined task. Maybe some escalation happens on the 2nd or 3rd "try", so they are really trying different things? The word "indefinitely" implies there is a definite limit. It might help to say what that is, or at least say that it is small. Also, this documentation is phrased to tell the VM implementor what is, or is not, allowed. Most readers will be more interested is the responsibilities of the caller. __GFP_NORETRY: The VM implementation will not retry after all reasonable avenues for finding free memory have been pursued. The implementation may sleep (i.e. call 'schedule()'), but only while waiting for another task to perform some specific action. The caller must handle failure. This flag is suitable when failure can easily be handled at small cost, such as reduced throughput. > + * > + * __GFP_RETRY_MAYFAIL: Try hard to allocate the memory, but the allocation > attempt > + * _might_ fail. All viable forms of memory reclaim are tried before the > fail. > + * The OOM killer is excluded because this would be too disruptive. This > can be > + * used to override non-failing default behavior for !costly requests as > well as > + * fortify costly requests. What does "Try hard" mean? In part, it means "retry everything a few more times", I guess in the hope that something happened in the mean time. It also seems to mean waiting for compaction to happen, which I guess is only relevant for >PAGE_SIZE allocations? Maybe it also means waiting for page-out to complete. So the summary would be that it waits for a little while, hoping for a miracle. __GFP_RETRY_MAYFAIL: The VM implementation will retry memory reclaim procedures that have previously failed if there is some indication that progress has been made else where. It can wait for other tasks to attempt high level approaches to freeing memory such as compaction (which removed fragmentation) and page-out. There is still a definite limit to the number of retries, but it is a larger limit than with __GFP_NORERY. Allocations with this flag may fail, but only when there is genuinely little unused memory. While these allocations do not directly trigger the OOM killer, their failure indicates that the system is likely to need to use the
Re: [RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
On Tue, Mar 07 2017, Michal Hocko wrote: > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > index 2bfcfd33e476..60af7937c6f2 100644 > --- a/include/linux/gfp.h > +++ b/include/linux/gfp.h > @@ -25,7 +25,7 @@ struct vm_area_struct; > #define ___GFP_FS0x80u > #define ___GFP_COLD 0x100u > #define ___GFP_NOWARN0x200u > -#define ___GFP_REPEAT0x400u > +#define ___GFP_RETRY_MAYFAIL 0x400u > #define ___GFP_NOFAIL0x800u > #define ___GFP_NORETRY 0x1000u > #define ___GFP_MEMALLOC 0x2000u > @@ -136,26 +136,38 @@ struct vm_area_struct; > * > * __GFP_RECLAIM is shorthand to allow/forbid both direct and kswapd reclaim. > * > - * __GFP_REPEAT: Try hard to allocate the memory, but the allocation attempt > - * _might_ fail. This depends upon the particular VM implementation. > + * The default allocator behavior depends on the request size. We have a > concept > + * of so called costly allocations (with order > PAGE_ALLOC_COSTLY_ORDER). Boundary conditions is one of my pet peeves The description here suggests that an allocation of "1<= PAGE_ALLOC_COSTLY_ORDER which supports the documented (but incoherent) meaning. But I also see: order = max_t(int, PAGE_ALLOC_COSTLY_ORDER - 1, 0); which looks like it is trying to perform the largest non-costly allocation, but is making a smaller allocation than necessary. I would *really* like it if the constant actually meant what its name implied. PAGE_ALLOC_MAX_NON_COSTLY ?? > + * !costly allocations are too essential to fail so they are implicitly > + * non-failing (with some exceptions like OOM victims might fail) by default > while > + * costly requests try to be not disruptive and back off even without > invoking > + * the OOM killer. The following three modifiers might be used to override > some of > + * these implicit rules > + * > + * __GFP_NORETRY: The VM implementation must not retry indefinitely and will > + * return NULL when direct reclaim and memory compaction have failed to > allow > + * the allocation to succeed. The OOM killer is not called with the > current > + * implementation. This is a default mode for costly allocations. The name here is "NORETRY", but the text says "not retry indefinitely". So does it retry or not? I would assuming it "tried" once, and only once. However it could be that a "try" is not a simple well defined task. Maybe some escalation happens on the 2nd or 3rd "try", so they are really trying different things? The word "indefinitely" implies there is a definite limit. It might help to say what that is, or at least say that it is small. Also, this documentation is phrased to tell the VM implementor what is, or is not, allowed. Most readers will be more interested is the responsibilities of the caller. __GFP_NORETRY: The VM implementation will not retry after all reasonable avenues for finding free memory have been pursued. The implementation may sleep (i.e. call 'schedule()'), but only while waiting for another task to perform some specific action. The caller must handle failure. This flag is suitable when failure can easily be handled at small cost, such as reduced throughput. > + * > + * __GFP_RETRY_MAYFAIL: Try hard to allocate the memory, but the allocation > attempt > + * _might_ fail. All viable forms of memory reclaim are tried before the > fail. > + * The OOM killer is excluded because this would be too disruptive. This > can be > + * used to override non-failing default behavior for !costly requests as > well as > + * fortify costly requests. What does "Try hard" mean? In part, it means "retry everything a few more times", I guess in the hope that something happened in the mean time. It also seems to mean waiting for compaction to happen, which I guess is only relevant for >PAGE_SIZE allocations? Maybe it also means waiting for page-out to complete. So the summary would be that it waits for a little while, hoping for a miracle. __GFP_RETRY_MAYFAIL: The VM implementation will retry memory reclaim procedures that have previously failed if there is some indication that progress has been made else where. It can wait for other tasks to attempt high level approaches to freeing memory such as compaction (which removed fragmentation) and page-out. There is still a definite limit to the number of retries, but it is a larger limit than with __GFP_NORERY. Allocations with this flag may fail, but only when there is genuinely little unused memory. While these allocations do not directly trigger the OOM killer, their failure indicates that the system is likely to need to use the OOM killer soon. The caller must handle failure, but can reasonably do so by failing a higher-level request, or completing it only in a much less efficient manner. If the allocation does fail, and
[RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
From: Michal Hocko__GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to the page allocator. This has been true but only for allocations requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for smaller sizes. This is a bit unfortunate because there is no way to express the same semantic for those requests and they are considered too important to fail so they might end up looping in the page allocator for ever, similarly to GFP_NOFAIL requests. Now that the whole tree has been cleaned up and accidental or misled usage of __GFP_REPEAT flag has been removed for !costly requests we can give the original flag a better name and more importantly a more useful semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user that the allocator would try really hard but there is no promise of a success. This will work independent of the order and overrides the default allocator behavior. Page allocator users have several levels of guarantee vs. cost options (take GFP_KERNEL as an example) - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_ attempt to free memory at all. The most light weight mode which even doesn't kick the background reclaim. Should be used carefully because it might deplete the memory and the next user might hit the more aggressive reclaim - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic allocation without any attempt to free memory from the current context but can wake kswapd to reclaim memory if the zone is below the low watermark. Can be used from either atomic contexts or when the request is a performance optimization and there is another fallback for a slow path. - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) - non sleeping allocation with an expensive fallback so it can access some portion of memory reserves. Usually used from interrupt/bh context with an expensive slow path fallback. - GFP_KERNEL - both background and direct reclaim are allowed and the _default_ page allocator behavior is used. That means that !costly allocation requests are basically nofail (unless the requesting task is killed by the OOM killer) and costly will fail early rather than cause disruptive reclaim. - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior and all allocation requests fail early rather than cause disruptive reclaim (one round of reclaim in this implementation). The OOM killer is not invoked. - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator behavior and all allocation requests try really hard. The request will fail if the reclaim cannot make any progress. The OOM killer won't be triggered. - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior and all allocation requests will loop endlessly until they succeed. This might be really dangerous especially for larger orders. Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL because they already had their semantic. No new users are added. __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if there is no progress and we have already passed the OOM point. This means that all the reclaim opportunities have been exhausted except the most disruptive one (the OOM killer) and a user defined fallback behavior is more sensible than keep retrying in the page allocator. Signed-off-by: Michal Hocko --- Documentation/DMA-ISA-LPC.txt| 2 +- arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- drivers/mmc/host/wbsd.c | 2 +- drivers/s390/char/vmcp.c | 2 +- drivers/target/target_core_transport.c | 2 +- drivers/vhost/net.c | 2 +- drivers/vhost/scsi.c | 2 +- drivers/vhost/vsock.c| 2 +- fs/btrfs/check-integrity.c | 2 +- fs/btrfs/raid56.c| 2 +- include/linux/gfp.h | 32 +++- include/linux/slab.h | 3 ++- include/trace/events/mmflags.h | 2 +- mm/hugetlb.c | 4 ++-- mm/internal.h| 2 +- mm/page_alloc.c | 14 +--- mm/sparse-vmemmap.c | 4 ++-- mm/util.c| 6 +++--- mm/vmalloc.c | 2 +- mm/vmscan.c | 8 +++ net/core/dev.c | 6 +++--- net/core/skbuff.c| 2 +- net/sched/sch_fq.c | 2 +- tools/perf/builtin-kmem.c| 2 +- 25 files changed, 66 insertions(+), 45 deletions(-) diff --git a/Documentation/DMA-ISA-LPC.txt
[RFC PATCH 2/4] mm, tree wide: replace __GFP_REPEAT by __GFP_RETRY_MAYFAIL with more useful semantic
From: Michal Hocko __GFP_REPEAT was designed to allow retry-but-eventually-fail semantic to the page allocator. This has been true but only for allocations requests larger than PAGE_ALLOC_COSTLY_ORDER. It has been always ignored for smaller sizes. This is a bit unfortunate because there is no way to express the same semantic for those requests and they are considered too important to fail so they might end up looping in the page allocator for ever, similarly to GFP_NOFAIL requests. Now that the whole tree has been cleaned up and accidental or misled usage of __GFP_REPEAT flag has been removed for !costly requests we can give the original flag a better name and more importantly a more useful semantic. Let's rename it to __GFP_RETRY_MAYFAIL which tells the user that the allocator would try really hard but there is no promise of a success. This will work independent of the order and overrides the default allocator behavior. Page allocator users have several levels of guarantee vs. cost options (take GFP_KERNEL as an example) - GFP_KERNEL & ~__GFP_RECLAIM - optimistic allocation without _any_ attempt to free memory at all. The most light weight mode which even doesn't kick the background reclaim. Should be used carefully because it might deplete the memory and the next user might hit the more aggressive reclaim - GFP_KERNEL & ~__GFP_DIRECT_RECLAIM (or GFP_NOWAIT)- optimistic allocation without any attempt to free memory from the current context but can wake kswapd to reclaim memory if the zone is below the low watermark. Can be used from either atomic contexts or when the request is a performance optimization and there is another fallback for a slow path. - (GFP_KERNEL|__GFP_HIGH) & ~__GFP_DIRECT_RECLAIM (aka GFP_ATOMIC) - non sleeping allocation with an expensive fallback so it can access some portion of memory reserves. Usually used from interrupt/bh context with an expensive slow path fallback. - GFP_KERNEL - both background and direct reclaim are allowed and the _default_ page allocator behavior is used. That means that !costly allocation requests are basically nofail (unless the requesting task is killed by the OOM killer) and costly will fail early rather than cause disruptive reclaim. - GFP_KERNEL | __GFP_NORETRY - overrides the default allocator behavior and all allocation requests fail early rather than cause disruptive reclaim (one round of reclaim in this implementation). The OOM killer is not invoked. - GFP_KERNEL | __GFP_RETRY_MAYFAIL - overrides the default allocator behavior and all allocation requests try really hard. The request will fail if the reclaim cannot make any progress. The OOM killer won't be triggered. - GFP_KERNEL | __GFP_NOFAIL - overrides the default allocator behavior and all allocation requests will loop endlessly until they succeed. This might be really dangerous especially for larger orders. Existing users of __GFP_REPEAT are changed to __GFP_RETRY_MAYFAIL because they already had their semantic. No new users are added. __alloc_pages_slowpath is changed to bail out for __GFP_RETRY_MAYFAIL if there is no progress and we have already passed the OOM point. This means that all the reclaim opportunities have been exhausted except the most disruptive one (the OOM killer) and a user defined fallback behavior is more sensible than keep retrying in the page allocator. Signed-off-by: Michal Hocko --- Documentation/DMA-ISA-LPC.txt| 2 +- arch/powerpc/include/asm/book3s/64/pgalloc.h | 2 +- arch/powerpc/kvm/book3s_64_mmu_hv.c | 2 +- drivers/mmc/host/wbsd.c | 2 +- drivers/s390/char/vmcp.c | 2 +- drivers/target/target_core_transport.c | 2 +- drivers/vhost/net.c | 2 +- drivers/vhost/scsi.c | 2 +- drivers/vhost/vsock.c| 2 +- fs/btrfs/check-integrity.c | 2 +- fs/btrfs/raid56.c| 2 +- include/linux/gfp.h | 32 +++- include/linux/slab.h | 3 ++- include/trace/events/mmflags.h | 2 +- mm/hugetlb.c | 4 ++-- mm/internal.h| 2 +- mm/page_alloc.c | 14 +--- mm/sparse-vmemmap.c | 4 ++-- mm/util.c| 6 +++--- mm/vmalloc.c | 2 +- mm/vmscan.c | 8 +++ net/core/dev.c | 6 +++--- net/core/skbuff.c| 2 +- net/sched/sch_fq.c | 2 +- tools/perf/builtin-kmem.c| 2 +- 25 files changed, 66 insertions(+), 45 deletions(-) diff --git a/Documentation/DMA-ISA-LPC.txt b/Documentation/DMA-ISA-LPC.txt index