Re: [Question] Should direct reclaim time be bounded?

2019-07-12 Thread Mike Kravetz
On 7/11/19 10:47 PM, Hillf Danton wrote: > > On Thu, 11 Jul 2019 02:42:56 +0800 Mike Kravetz wrote: >> >> It is quite easy to hit the condition where: >> nr_reclaimed == 0 && nr_scanned == 0 is true, but we skip the previous test >> > Then skipping check of __GFP_RETRY_MAYFAIL makes no sense in

Re: [Question] Should direct reclaim time be bounded?

2019-07-12 Thread Mel Gorman
On Thu, Jul 11, 2019 at 09:12:45AM +0200, Michal Hocko wrote: > On Wed 10-07-19 16:36:58, Mike Kravetz wrote: > > On 7/10/19 12:44 PM, Michal Hocko wrote: > > > On Wed 10-07-19 11:42:40, Mike Kravetz wrote: > > > [...] > > >> As Michal suggested, I'm going to do some testing to see what impact > >

Re: [Question] Should direct reclaim time be bounded?

2019-07-11 Thread Michal Hocko
On Wed 10-07-19 16:36:58, Mike Kravetz wrote: > On 7/10/19 12:44 PM, Michal Hocko wrote: > > On Wed 10-07-19 11:42:40, Mike Kravetz wrote: > > [...] > >> As Michal suggested, I'm going to do some testing to see what impact > >> dropping the __GFP_RETRY_MAYFAIL flag for these huge page allocations

Re: [Question] Should direct reclaim time be bounded?

2019-07-10 Thread Mike Kravetz
On 7/10/19 12:44 PM, Michal Hocko wrote: > On Wed 10-07-19 11:42:40, Mike Kravetz wrote: > [...] >> As Michal suggested, I'm going to do some testing to see what impact >> dropping the __GFP_RETRY_MAYFAIL flag for these huge page allocations >> will have on the number of pages allocated. > > Just

Re: [Question] Should direct reclaim time be bounded?

2019-07-10 Thread Michal Hocko
On Wed 10-07-19 11:42:40, Mike Kravetz wrote: [...] > As Michal suggested, I'm going to do some testing to see what impact > dropping the __GFP_RETRY_MAYFAIL flag for these huge page allocations > will have on the number of pages allocated. Just to clarify. I didn't mean to drop

Re: [Question] Should direct reclaim time be bounded?

2019-07-10 Thread Mike Kravetz
On 7/7/19 10:19 PM, Hillf Danton wrote: > On Mon, 01 Jul 2019 20:15:51 -0700 Mike Kravetz wrote: >> On 7/1/19 1:59 AM, Mel Gorman wrote: >>> >>> I think it would be reasonable to have should_continue_reclaim allow an >>> exit if scanning at higher priority than DEF_PRIORITY - 2, nr_scanned is >>>

Re: [Question] Should direct reclaim time be bounded?

2019-07-04 Thread Mike Kravetz
On 7/4/19 4:09 AM, Michal Hocko wrote: > On Wed 03-07-19 16:54:35, Mike Kravetz wrote: >> On 7/3/19 2:43 AM, Mel Gorman wrote: >>> Indeed. I'm getting knocked offline shortly so I didn't give this the >>> time it deserves but it appears that part of this problem is >>> hugetlb-specific when one

Re: [Question] Should direct reclaim time be bounded?

2019-07-04 Thread Michal Hocko
On Wed 03-07-19 16:54:35, Mike Kravetz wrote: > On 7/3/19 2:43 AM, Mel Gorman wrote: > > Indeed. I'm getting knocked offline shortly so I didn't give this the > > time it deserves but it appears that part of this problem is > > hugetlb-specific when one node is full and can enter into this

Re: [Question] Should direct reclaim time be bounded?

2019-07-03 Thread Mike Kravetz
On 7/3/19 2:43 AM, Mel Gorman wrote: > Indeed. I'm getting knocked offline shortly so I didn't give this the > time it deserves but it appears that part of this problem is > hugetlb-specific when one node is full and can enter into this continual > loop due to __GFP_RETRY_MAYFAIL requiring both

Re: [Question] Should direct reclaim time be bounded?

2019-07-03 Thread Mel Gorman
On Mon, Jul 01, 2019 at 08:15:50PM -0700, Mike Kravetz wrote: > On 7/1/19 1:59 AM, Mel Gorman wrote: > > On Fri, Jun 28, 2019 at 11:20:42AM -0700, Mike Kravetz wrote: > >> On 4/24/19 7:35 AM, Vlastimil Babka wrote: > >>> On 4/23/19 6:39 PM, Mike Kravetz wrote: > > That being said, I do not

Re: [Question] Should direct reclaim time be bounded?

2019-07-01 Thread Mike Kravetz
On 7/1/19 1:59 AM, Mel Gorman wrote: > On Fri, Jun 28, 2019 at 11:20:42AM -0700, Mike Kravetz wrote: >> On 4/24/19 7:35 AM, Vlastimil Babka wrote: >>> On 4/23/19 6:39 PM, Mike Kravetz wrote: > That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It > looks like there is

Re: [Question] Should direct reclaim time be bounded?

2019-07-01 Thread Mel Gorman
On Fri, Jun 28, 2019 at 11:20:42AM -0700, Mike Kravetz wrote: > On 4/24/19 7:35 AM, Vlastimil Babka wrote: > > On 4/23/19 6:39 PM, Mike Kravetz wrote: > >>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It > >>> looks like there is something wrong in the reclaim going on. > >>

Re: [Question] Should direct reclaim time be bounded?

2019-06-28 Thread Mike Kravetz
On 4/24/19 7:35 AM, Vlastimil Babka wrote: > On 4/23/19 6:39 PM, Mike Kravetz wrote: >>> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It >>> looks like there is something wrong in the reclaim going on. >> >> Ok, I will start digging into that. Just wanted to make sure before

Re: [Question] Should direct reclaim time be bounded?

2019-04-24 Thread Vlastimil Babka
On 4/23/19 6:39 PM, Mike Kravetz wrote: >> That being said, I do not think __GFP_RETRY_MAYFAIL is wrong here. It >> looks like there is something wrong in the reclaim going on. > > Ok, I will start digging into that. Just wanted to make sure before I got > into it too deep. > > BTW - This is

Re: [Question] Should direct reclaim time be bounded?

2019-04-23 Thread Mike Kravetz
On 4/23/19 12:19 AM, Michal Hocko wrote: > On Mon 22-04-19 21:07:28, Mike Kravetz wrote: >> In our distro kernel, I am thinking about making allocations try "less hard" >> on nodes where we start to see failures. less hard == NORETRY/NORECLAIM. >> I was going to try something like this on an

Re: [Question] Should direct reclaim time be bounded?

2019-04-23 Thread Michal Hocko
On Mon 22-04-19 21:07:28, Mike Kravetz wrote: [...] > However, consider the case of a 2 node system where: > node 0 has 2GB memory > node 1 has 4GB memory > > Now, if one wants to allocate 4GB of huge pages they may be tempted to simply, > "echo 2048 > nr_hugepages". At first this will go well
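
The two-node example quoted above can be checked with a little arithmetic. What follows is only a sketch under stated assumptions, not a description of the kernel's allocator: the 2 MB huge page size is inferred from "2048 pages for 4GB", the even split across nodes is assumed for illustration, and the helper name even_split is hypothetical.

```python
HUGE_PAGE_MB = 2  # assumption: 2 MB huge pages, so 2048 pages == 4 GB


def even_split(requested_pages, node_mem_mb):
    """Hypothetical helper: spread the request evenly over the nodes.

    Returns one (wanted_pages, upper_bound_pages) tuple per node, where
    upper_bound_pages is the node's total memory expressed in huge pages.
    """
    base, extra = divmod(requested_pages, len(node_mem_mb))
    plan = []
    for i, mem_mb in enumerate(node_mem_mb):
        wanted = base + (1 if i < extra else 0)
        upper_bound = mem_mb // HUGE_PAGE_MB
        plan.append((wanted, upper_bound))
    return plan


# node 0 has 2 GB, node 1 has 4 GB, and 2048 huge pages are requested
plan = even_split(2048, [2048, 4096])
for node, (wanted, upper_bound) in enumerate(plan):
    print(f"node {node}: wants {wanted} pages, upper bound {upper_bound}")
```

Under these assumptions node 0 is asked for 1024 pages, which equals its entire 2 GB. Some of that memory is always in use for other purposes, so the request against node 0 cannot be met in full, while node 1 still has room to spare.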

[Question] Should direct reclaim time be bounded?

2019-04-22 Thread Mike Kravetz
I was looking into an issue on our distro kernel where allocation of huge pages via "echo X > /proc/sys/vm/nr_hugepages" was taking a LONG time. In this particular case, we were actually allocating huge pages VERY slowly at the rate of about one every 30 seconds. I don't want to talk about the