Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, 11 Feb 2014 19:59:40 -0800 (PST) David Rientjes wrote:

> On Tue, 11 Feb 2014, Luiz Capitulino wrote:
>
> > > > HugeTLB command-line option hugepages= allows the user to specify how
> > > > many huge pages should be allocated at boot. On NUMA systems, this
> > > > argument automatically distributes huge pages allocation among nodes,
> > > > which can be undesirable.
> > > >
> > >
> > > And when hugepages can no longer be allocated on a node because it is too
> > > small, the remaining hugepages are distributed over nodes with memory
> > > available, correct?
> >
> > No. hugepagesnid= tries to obey what was specified by the user as much as
> > possible.
>
> I'm referring to what I quoted above, the hugepages= parameter.

Oh, OK.

> I'm saying that using existing functionality you can reserve an excess of
> hugepages and then free unneeded hugepages at runtime to get the desired
> amount allocated only on a specific node.

I got that part. I only think this is not a good solution, as I explain
below.

> > > Strange, it would seem better to just reserve as many hugepages as you
> > > want so that you get the desired number on each node and then free the
> > > ones you don't need at runtime.
> >
> > You mean, for example, if I have a 2 node system and want 2 1G huge pages
> > from node 1, then I have to allocate 4 1G huge pages and then free 2 pages
> > on node 0 after boot? That seems very cumbersome to me. Besides, what if
> > node0 needs this memory during boot?
> >
>
> All of this functionality, including the current hugepages= reservation at
> boot, needs to show that it can't be done as late as when you could run an
> initscript to do the reservation at runtime and fragmentation is at its
> lowest level when userspace first becomes available.

It's not that it can't. The point is that for 1G huge pages it's more
reliable to allocate them as early as possible during the kernel boot
process.

I'm all for having/improving 1G allocation support at run-time, and
volunteer to help with that effort, but that's something that can (and IMO
should) be done on top of this series.

> I don't see any justification given in the patchset that suggests you
> can't simply do this in an initscript if it is possible to allocate 1GB
> pages at runtime. If it's too late because of oom, then your userspace is
> going to oom anyway if you reserve the hugepages at boot; if it's too late
> because of fragmentation, let's work on that issue (and justification why
> things like movablecore= don't work for you).
>

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Wed, 12 Feb 2014 03:37:11 +0100 Andi Kleen wrote:

> > The real syntax is hugepagesnid=nid,nr-pages,size. Which looks
> > straightforward to me. I honestly can't think of anything better than
> > that, but I'm open to suggestions.
>
> hugepages_node=nid:nr-pages:size,... ?

Looks good, I'll consider using it for v2.
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, Feb 11, 2014 at 06:15:57PM -0200, Marcelo Tosatti wrote:
> On Tue, Feb 11, 2014 at 05:10:35PM +, Mel Gorman wrote:
> > On Tue, Feb 11, 2014 at 01:26:29PM -0200, Marcelo Tosatti wrote:
> > > > Or take a stab at allocating 1G pages at runtime. It would require
> > > > finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at
> > > > runtime. I would expect it would only work very early in the lifetime of
> > > > the system but if the user is willing to use kernel parameters to
> > > > allocate them then it should not be an issue.
> > >
> > > Can be an improvement on top of the current patchset? Certain use-cases
> > > require allocation guarantees (even if that requires kernel parameters).
> > >
> >
> > Sure, they're not mutually exclusive. It would just avoid the need to
> > create a new kernel parameter and use the existing interfaces.
>
> Yes, the problem is there is no guarantee, is there?
>

There is no guarantee anyway and early in the lifetime of the system there
is going to be very little difference in success rates. In case there is a
misunderstanding here, I'm not looking to NAK a series that adds another
kernel parameter. If it was me, I would have tried runtime allocation first
to avoid adding a new interface but it's a personal preference.

--
Mel Gorman
SUSE Labs
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Wed, 12 Feb 2014, Andi Kleen wrote:

> > The real syntax is hugepagesnid=nid,nr-pages,size. Which looks
> > straightforward to me. I honestly can't think of anything better than
> > that, but I'm open to suggestions.
>
> hugepages_node=nid:nr-pages:size,... ?
>

I think that if we actually want this support, it should behave like
hugepages= and hugepagesz=, i.e. you specify a hugepagesnode= and, if
present, all remaining hugepages= and hugepagesz= parameters act only on
that node unless overridden by another hugepagesnode=.
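
The semantics proposed above can be sketched in a few lines of Python. This is only an illustration of the idea, not kernel code: the function name assign_by_node is hypothetical, the input is assumed to be the command line already split into ordered (key, value) pairs, and letting the last hugepagesz= carry over to later hugepages= entries is an assumption of the sketch rather than something stated in the thread.

```python
def assign_by_node(params):
    """Apply the proposed hugepagesnode= semantics to ordered (key, value) pairs.

    Returns {node: {page_size: count}}. The key None stands for "no node
    selected", i.e. the existing distribute-across-nodes behaviour.
    """
    node = None        # node chosen by the most recent hugepagesnode=
    size = "default"   # size chosen by the most recent hugepagesz= (assumption)
    plan = {}
    for key, value in params:
        if key == "hugepagesnode":
            node = int(value)
        elif key == "hugepagesz":
            size = value
        elif key == "hugepages":
            plan.setdefault(node, {})[size] = int(value)
        # any other parameter is ignored by this sketch
    return plan
```

With this reading, "hugepagesnode=0 hugepagesz=1G hugepages=2 hugepagesnode=1 hugepagesz=1G hugepages=4" would request two 1G pages on node 0 and four on node 1.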
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, 11 Feb 2014, Luiz Capitulino wrote:

> > > HugeTLB command-line option hugepages= allows the user to specify how many
> > > huge pages should be allocated at boot. On NUMA systems, this argument
> > > automatically distributes huge pages allocation among nodes, which can
> > > be undesirable.
> > >
> >
> > And when hugepages can no longer be allocated on a node because it is too
> > small, the remaining hugepages are distributed over nodes with memory
> > available, correct?
>
> No. hugepagesnid= tries to obey what was specified by the user as much as
> possible.

I'm referring to what I quoted above, the hugepages= parameter. I'm
saying that using existing functionality you can reserve an excess of
hugepages and then free unneeded hugepages at runtime to get the desired
amount allocated only on a specific node.

> > Strange, it would seem better to just reserve as many hugepages as you
> > want so that you get the desired number on each node and then free the
> > ones you don't need at runtime.
>
> You mean, for example, if I have a 2 node system and want 2 1G huge pages
> from node 1, then I have to allocate 4 1G huge pages and then free 2 pages
> on node 0 after boot? That seems very cumbersome to me. Besides, what if
> node0 needs this memory during boot?
>

All of this functionality, including the current hugepages= reservation at
boot, needs to show that it can't be done as late as when you could run an
initscript to do the reservation at runtime and fragmentation is at its
lowest level when userspace first becomes available.

I don't see any justification given in the patchset that suggests you
can't simply do this in an initscript if it is possible to allocate 1GB
pages at runtime. If it's too late because of oom, then your userspace is
going to oom anyway if you reserve the hugepages at boot; if it's too late
because of fragmentation, let's work on that issue (and justification why
things like movablecore= don't work for you).
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
> The real syntax is hugepagesnid=nid,nr-pages,size. Which looks straightforward
> to me. I honestly can't think of anything better than that, but I'm open to
> suggestions.

hugepages_node=nid:nr-pages:size,... ?

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, 11 Feb 2014 22:17:32 +0100 Andi Kleen wrote:

> On Mon, Feb 10, 2014 at 12:27:44PM -0500, Luiz Capitulino wrote:
> > HugeTLB command-line option hugepages= allows the user to specify how many
> > huge pages should be allocated at boot. On NUMA systems, this argument
> > automatically distributes huge pages allocation among nodes, which can
> > be undesirable.
> >
> > The hugepagesnid= option introduced by this commit allows the user
> > to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> > pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> > from node 0 only. More details on patch 3/4 and patch 4/4.
>
> The syntax seems very confusing. Can you make that more obvious?

I guess that my bad description in this email may have contributed to making
it look confusing. The real syntax is hugepagesnid=nid,nr-pages,size. Which
looks straightforward to me. I honestly can't think of anything better than
that, but I'm open to suggestions.
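
To make the nid,nr-pages,size layout concrete, here is a minimal Python sketch of how such a value could be split into its three fields. It is not the parser from the patch series: the function names are hypothetical, and handling of the K/M/G size suffixes this way is an assumption of the sketch.

```python
# Illustrative sketch only; this is not the parser from the patch series.
# Treating K, M and G as binary size suffixes is an assumption.
_SUFFIX = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Convert a string such as '1G' or '2048K' to a size in bytes."""
    text = text.strip().upper()
    if text and text[-1] in _SUFFIX:
        return int(text[:-1]) * _SUFFIX[text[-1]]
    return int(text)


def parse_hugepagesnid(value):
    """Split 'nid,nr-pages,size' into a (nid, nr_pages, size_bytes) tuple."""
    fields = value.split(",")
    if len(fields) != 3:
        raise ValueError("expected nid,nr-pages,size")
    nid, nr_pages = int(fields[0]), int(fields[1])
    if nid < 0 or nr_pages < 0:
        raise ValueError("nid and nr-pages must be non-negative")
    return nid, nr_pages, parse_size(fields[2])
```

Under this reading, hugepagesnid=1,2,1G means node 1, two pages, 1G each.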
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, Feb 10, 2014 at 12:27:44PM -0500, Luiz Capitulino wrote:
> HugeTLB command-line option hugepages= allows the user to specify how many
> huge pages should be allocated at boot. On NUMA systems, this argument
> automatically distributes huge pages allocation among nodes, which can
> be undesirable.
>
> The hugepagesnid= option introduced by this commit allows the user
> to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> from node 0 only. More details on patch 3/4 and patch 4/4.

The syntax seems very confusing. Can you make that more obvious?

-Andi
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, Feb 11, 2014 at 05:10:35PM +, Mel Gorman wrote:
> On Tue, Feb 11, 2014 at 01:26:29PM -0200, Marcelo Tosatti wrote:
> > > Or take a stab at allocating 1G pages at runtime. It would require
> > > finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at
> > > runtime. I would expect it would only work very early in the lifetime of
> > > the system but if the user is willing to use kernel parameters to
> > > allocate them then it should not be an issue.
> >
> > Can be an improvement on top of the current patchset? Certain use-cases
> > require allocation guarantees (even if that requires kernel parameters).
> >
>
> Sure, they're not mutually exclusive. It would just avoid the need to
> create a new kernel parameter and use the existing interfaces.

Yes, the problem is there is no guarantee, is there?
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, Feb 11, 2014 at 01:26:29PM -0200, Marcelo Tosatti wrote:
> > Or take a stab at allocating 1G pages at runtime. It would require
> > finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at
> > runtime. I would expect it would only work very early in the lifetime of
> > the system but if the user is willing to use kernel parameters to
> > allocate them then it should not be an issue.
>
> Can be an improvement on top of the current patchset? Certain use-cases
> require allocation guarantees (even if that requires kernel parameters).
>

Sure, they're not mutually exclusive. It would just avoid the need to
create a new kernel parameter and use the existing interfaces.

--
Mel Gorman
SUSE Labs
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 10 Feb 2014 15:13:54 -0800 Andrew Morton wrote:

> On Mon, 10 Feb 2014 12:27:44 -0500 Luiz Capitulino wrote:
>
> > HugeTLB command-line option hugepages= allows the user to specify how many
> > huge pages should be allocated at boot. On NUMA systems, this argument
> > automatically distributes huge pages allocation among nodes, which can
> > be undesirable.
>
> Grumble. "can be undesirable" is the entire reason for the entire
> patchset. We need far, far more detail than can be conveyed in three
> words, please!

Right, sorry for that. I'll improve this for v2, but a better introduction
for the series would be something like the following.

Today, HugeTLB provides support for controlling allocation of persistent
huge pages on a NUMA system through sysfs. So, for example, if a sysadmin
wants to allocate 300 2M huge pages on node 1, s/he can do:

    echo 300 > /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages

This works as long as you have enough contiguous pages, which may work for
2M pages, but is harder for 1G huge pages. For those, it's better or even
required to reserve them at boot.

To this end we have the hugepages= command-line option, which works but
lacks per-node control. This option evenly distributes huge pages among
nodes. However, we have users who want more flexibility. They want to be
able to specify something like: allocate 2 1G huge pages from node0 and
4 1G huge pages from node1. This is what this series implements.

It's basically per-node allocation control for 1G huge pages, but it's
important to note that this series is not intrusive. All it does is set
the initial per-node allocation. All the functions and data structures added
by this series are only used once at boot; after that they are discarded
and rest in oblivion.
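
The runtime sysfs route described above can also be scripted. The following Python sketch builds the same per-node path as the echo example and writes a count to it; the function names are hypothetical, writing requires root on a NUMA system with hugetlb support, and reading the value back is included because the kernel may end up reserving fewer pages than requested.

```python
def nr_hugepages_path(node, page_size_kb):
    """Build the per-node sysfs path for a given huge page size in kB."""
    return (
        f"/sys/devices/system/node/node{node}"
        f"/hugepages/hugepages-{page_size_kb}kB/nr_hugepages"
    )


def set_node_hugepages(node, page_size_kb, count):
    """Request `count` persistent huge pages on `node` and report the result.

    Returns the number the kernel reports afterwards, which can be lower
    than `count` if not enough contiguous memory was available.
    """
    path = nr_hugepages_path(node, page_size_kb)
    with open(path, "w") as f:
        f.write(str(count))
    with open(path) as f:
        return int(f.read())
```

For example, set_node_hugepages(1, 2048, 300) corresponds to the echo command quoted in the message.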
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 10 Feb 2014 18:54:20 -0800 (PST) David Rientjes wrote:

> On Mon, 10 Feb 2014, Luiz Capitulino wrote:
>
> > HugeTLB command-line option hugepages= allows the user to specify how many
> > huge pages should be allocated at boot. On NUMA systems, this argument
> > automatically distributes huge pages allocation among nodes, which can
> > be undesirable.
> >
>
> And when hugepages can no longer be allocated on a node because it is too
> small, the remaining hugepages are distributed over nodes with memory
> available, correct?

No. hugepagesnid= tries to obey what was specified by the user as much as
possible. So, if you specify that 10 1G huge pages should be allocated from
node0 but only 7 1G pages can actually be allocated, then hugepagesnid=
will do just that.

> > The hugepagesnid= option introduced by this commit allows the user
> > to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> > pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> > from node 0 only. More details on patch 3/4 and patch 4/4.
> >
>
> Strange, it would seem better to just reserve as many hugepages as you
> want so that you get the desired number on each node and then free the
> ones you don't need at runtime.

You mean, for example, if I have a 2 node system and want 2 1G huge pages
from node 1, then I have to allocate 4 1G huge pages and then free 2 pages
on node 0 after boot? That seems very cumbersome to me. Besides, what if
node0 needs this memory during boot?

> That probably doesn't work because we can't free very large hugepages that
> are reserved at boot, would fixing that issue reduce the need for this
> patchset?

I don't think so.
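
The two approaches contrasted in this message come down to simple arithmetic, sketched here in Python. Both function names are hypothetical. The second function assumes hugepages= splits the total evenly across nodes and that every node can satisfy its share, which is exactly the assumption the over-allocation workaround depends on.

```python
def strict_per_node(requested, available):
    """Per-node request with no spill-over to other nodes.

    The node gets what it can satisfy and nothing is moved elsewhere.
    """
    return min(requested, available)


def overallocation_plan(num_nodes, target_node, wanted):
    """Plan the hugepages= workaround under an assumed even split.

    Returns (total to pass to hugepages=, {node: pages to free after boot}).
    """
    total = wanted * num_nodes
    to_free = {n: wanted for n in range(num_nodes) if n != target_node}
    return total, to_free
```

The first function reproduces the 10-requested, 7-available case from the message; the second reproduces the two-node example, where getting 2 pages on node 1 means booting with hugepages=4 and freeing 2 pages on node 0 afterwards.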
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, Feb 11, 2014 at 09:25:14AM +, Mel Gorman wrote:
> On Mon, Feb 10, 2014 at 06:54:20PM -0800, David Rientjes wrote:
> > On Mon, 10 Feb 2014, Luiz Capitulino wrote:
> >
> > > HugeTLB command-line option hugepages= allows the user to specify how many
> > > huge pages should be allocated at boot. On NUMA systems, this argument
> > > automatically distributes huge pages allocation among nodes, which can
> > > be undesirable.
> > >
> >
> > And when hugepages can no longer be allocated on a node because it is too
> > small, the remaining hugepages are distributed over nodes with memory
> > available, correct?
> >
> > > The hugepagesnid= option introduced by this commit allows the user
> > > to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> > > pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> > > from node 0 only. More details on patch 3/4 and patch 4/4.
> > >
> >
> > Strange, it would seem better to just reserve as many hugepages as you
> > want so that you get the desired number on each node and then free the
> > ones you don't need at runtime.

You have to know the behaviour of the allocator, and rely on that to
allocate the exact number of 1G hugepages on a particular node. Is that
desirable, in contrast with specifying the exact number, and location, of
hugepages to allocate?

> Or take a stab at allocating 1G pages at runtime. It would require
> finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at
> runtime. I would expect it would only work very early in the lifetime of
> the system but if the user is willing to use kernel parameters to
> allocate them then it should not be an issue.

Can be an improvement on top of the current patchset? Certain use-cases
require allocation guarantees (even if that requires kernel parameters).
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, Feb 10, 2014 at 06:54:20PM -0800, David Rientjes wrote:
> On Mon, 10 Feb 2014, Luiz Capitulino wrote:
>
> > HugeTLB command-line option hugepages= allows the user to specify how many
> > huge pages should be allocated at boot. On NUMA systems, this argument
> > automatically distributes huge pages allocation among nodes, which can
> > be undesirable.
> >
>
> And when hugepages can no longer be allocated on a node because it is too
> small, the remaining hugepages are distributed over nodes with memory
> available, correct?
>
> > The hugepagesnid= option introduced by this commit allows the user
> > to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> > pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> > from node 0 only. More details on patch 3/4 and patch 4/4.
> >
>
> Strange, it would seem better to just reserve as many hugepages as you
> want so that you get the desired number on each node and then free the
> ones you don't need at runtime.
>

Or take a stab at allocating 1G pages at runtime. It would require
finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at
runtime. I would expect it would only work very early in the lifetime of
the system but if the user is willing to use kernel parameters to
allocate them then it should not be an issue.

--
Mel Gorman
SUSE Labs
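
The search described above, for a properly aligned 1G worth of contiguous MAX_ORDER_NR_PAGES blocks, can be illustrated with a toy model in Python. This is not how the kernel tracks free memory: the list of booleans (one entry per block, True meaning free) and the function name are hypothetical, and the block count per huge page is left as a parameter. With 4K base pages and 1024-page blocks it would be 256, and both of those figures are assumptions of this note.

```python
def find_aligned_free_run(free_blocks, blocks_per_huge_page):
    """Find the first naturally aligned run of free blocks.

    `free_blocks` is a list of booleans, one per MAX_ORDER-sized block.
    Only start indexes that are multiples of the run length are tried,
    which models the alignment requirement of a huge page.
    Returns the start index, or None if no such run exists.
    """
    n = blocks_per_huge_page
    for start in range(0, len(free_blocks) - n + 1, n):
        if all(free_blocks[start:start + n]):
            return start
    return None
```

A single busy block inside an aligned window is enough to disqualify the whole window, which is one way to see why such a search is expected to succeed mostly early in the system's lifetime, before memory fragments.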
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, Feb 10, 2014 at 06:54:20PM -0800, David Rientjes wrote: On Mon, 10 Feb 2014, Luiz Capitulino wrote: HugeTLB command-line option hugepages= allows the user to specify how many huge pages should be allocated at boot. On NUMA systems, this argument automatically distributes huge pages allocation among nodes, which can be undesirable. And when hugepages can no longer be allocated on a node because it is too small, the remaining hugepages are distributed over nodes with memory available, correct? The hugepagesnid= option introduced by this commit allows the user to specify which NUMA nodes should be used to allocate boot-time HugeTLB pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages from node 0 only. More details on patch 3/4 and patch 4/4. Strange, it would seem better to just reserve as many hugepages as you want so that you get the desired number on each node and then free the ones you don't need at runtime. Or take a stab at allocating 1G pages at runtime. It would require finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at runtime. I would expect it would only work very early in the lifetime of the system but if the user is willing to use kernel parameters to allocate them then it should not be an issue. -- Mel Gorman SUSE Labs -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, Feb 11, 2014 at 09:25:14AM +, Mel Gorman wrote: On Mon, Feb 10, 2014 at 06:54:20PM -0800, David Rientjes wrote: On Mon, 10 Feb 2014, Luiz Capitulino wrote: HugeTLB command-line option hugepages= allows the user to specify how many huge pages should be allocated at boot. On NUMA systems, this argument automatically distributes huge pages allocation among nodes, which can be undesirable. And when hugepages can no longer be allocated on a node because it is too small, the remaining hugepages are distributed over nodes with memory available, correct? The hugepagesnid= option introduced by this commit allows the user to specify which NUMA nodes should be used to allocate boot-time HugeTLB pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages from node 0 only. More details on patch 3/4 and patch 4/4. Strange, it would seem better to just reserve as many hugepages as you want so that you get the desired number on each node and then free the ones you don't need at runtime. You have to know the behaviour of the allocator, and rely on that to allocate the exact number of 1G hugepages on a particular node. Is that desired in constrast with specifying the exact number, and location, of hugepages to allocated? Or take a stab at allocating 1G pages at runtime. It would require finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at runtime. I would expect it would only work very early in the lifetime of the system but if the user is willing to use kernel parameters to allocate them then it should not be an issue. Can be an improvement on top of the current patchset? Certain use-cases require allocation guarantees (even if that requires kernel parameters). -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 10 Feb 2014 18:54:20 -0800 (PST) David Rientjes rient...@google.com wrote: On Mon, 10 Feb 2014, Luiz Capitulino wrote: HugeTLB command-line option hugepages= allows the user to specify how many huge pages should be allocated at boot. On NUMA systems, this argument automatically distributes huge pages allocation among nodes, which can be undesirable. And when hugepages can no longer be allocated on a node because it is too small, the remaining hugepages are distributed over nodes with memory available, correct? No. hugepagesnid= tries to obey what was specified by the uses as much as possible. So, if you specify that 10 1G huge pages should be allocated from node0 but only 7 1G pages can actually be allocated, then hugepagesnid= will do just that. The hugepagesnid= option introduced by this commit allows the user to specify which NUMA nodes should be used to allocate boot-time HugeTLB pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages from node 0 only. More details on patch 3/4 and patch 4/4. Strange, it would seem better to just reserve as many hugepages as you want so that you get the desired number on each node and then free the ones you don't need at runtime. You mean, for example, if I have a 2 node system and want 2 1G huge pages from node 1, then I have to allocate 4 1G huge pages and then free 2 pages on node 0 after boot? That seems very cumbersome to me. Besides, what if node0 needs this memory during boot? That probably doesn't work because we can't free very large hugepages that are reserved at boot, would fixing that issue reduce the need for this patchset? I don't think so. -- To unsubscribe from this list: send the line unsubscribe linux-kernel in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 10 Feb 2014 15:13:54 -0800 Andrew Morton a...@linux-foundation.org wrote: On Mon, 10 Feb 2014 12:27:44 -0500 Luiz Capitulino lcapitul...@redhat.com wrote: HugeTLB command-line option hugepages= allows the user to specify how many huge pages should be allocated at boot. On NUMA systems, this argument automatically distributes huge pages allocation among nodes, which can be undesirable. Grumble. can be undesirable is the entire reason for the entire patchset. We need far, far more detail than can be conveyed in three words, please! Right, sorry for that. I'll improve this for v2, but a better introduction for the series would be something like the following. Today, HugeTLB provides support for controlling allocation of persistent huge pages on a NUMA system through sysfs. So, for example, if a sysadmin wants to allocate 300 2M huge pages on node 1, s/he can do: echo 300 /sys/devices/system/node/node1/hugepages/hugepages-2048kB/nr_hugepages This works as long as you have enough contiguous pages, which may work for 2M pages, but is harder for 1G huge pages. For those, it's better or even required to reserve them at boot. To this end we have the hugepages= command-line option, which works but misses the per node control. This option evenly distributes huge pages among nodes. However, we have users who want more flexibility. They want to be able to specify something like: allocate 2 1G huge pages from node0 and 4 1G huge page from node1. This is what this series implements. It's basically per node allocation control for 1G huge pages, but it's important to note that this series is not intrusive. All it does is to set the initial per node allocation. All the functions and data structure added by this series are only used once at boot, after that they are discarded and rest in oblivion. 
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, Feb 11, 2014 at 01:26:29PM -0200, Marcelo Tosatti wrote:

> > Or take a stab at allocating 1G pages at runtime. It would require
> > finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at
> > runtime. I would expect it would only work very early in the lifetime of
> > the system but if the user is willing to use kernel parameters to
> > allocate them then it should not be an issue.
>
> Can be an improvement on top of the current patchset? Certain use-cases
> require allocation guarantees (even if that requires kernel parameters).

Sure, they're not mutually exclusive. It would just avoid the need to
create a new kernel parameter and use the existing interfaces.

--
Mel Gorman
SUSE Labs
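To give a feel for what "properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES" means, here is a small sketch. The constants are assumptions, not taken from the thread: 4 KiB base pages and MAX_ORDER = 11, so that one maximum-order buddy block is 1024 pages (4 MiB). The helper find_aligned_run is hypothetical and only models free blocks as a set of indexes.

```python
PAGE_SIZE = 4 * 1024                       # assumed base page size
MAX_ORDER = 11                             # assumed buddy allocator limit
MAX_ORDER_NR_PAGES = 1 << (MAX_ORDER - 1)  # pages per max-order block
GIGANTIC_PAGE = 1 << 30                    # one 1G huge page

PAGES_PER_1G = GIGANTIC_PAGE // PAGE_SIZE
BLOCKS_PER_1G = PAGES_PER_1G // MAX_ORDER_NR_PAGES


def find_aligned_run(free_block_indexes, blocks_needed=BLOCKS_PER_1G):
    """Return the first block index of a free, naturally aligned run.

    free_block_indexes: indexes of max-order blocks that are entirely free.
    Only starts that are multiples of blocks_needed are considered, which
    models the 1G alignment requirement. Returns None if no run exists.
    """
    free = set(free_block_indexes)
    if not free:
        return None
    for start in range(0, max(free) + 1, blocks_needed):
        if all((start + i) in free for i in range(blocks_needed)):
            return start
    return None


print(PAGES_PER_1G, BLOCKS_PER_1G)
print(find_aligned_run(range(256, 512)))   # aligned and fully free
print(find_aligned_run(range(100, 400)))   # plenty free, but not aligned
```

With these assumed constants a 1G page needs 256 adjacent max-order blocks starting on a 1G boundary, and a single in-use block inside every such window defeats the search. That is why runtime allocation is expected to work only early in the system's lifetime.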
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, Feb 11, 2014 at 05:10:35PM +, Mel Gorman wrote:

> On Tue, Feb 11, 2014 at 01:26:29PM -0200, Marcelo Tosatti wrote:
>
> > > Or take a stab at allocating 1G pages at runtime. It would require
> > > finding properly aligned 1Gs worth of contiguous MAX_ORDER_NR_PAGES at
> > > runtime. I would expect it would only work very early in the lifetime of
> > > the system but if the user is willing to use kernel parameters to
> > > allocate them then it should not be an issue.
> >
> > Can be an improvement on top of the current patchset? Certain use-cases
> > require allocation guarantees (even if that requires kernel parameters).
>
> Sure, they're not mutually exclusive. It would just avoid the need to
> create a new kernel parameter and use the existing interfaces.

Yes, the problem is there is no guarantee, is there?
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, Feb 10, 2014 at 12:27:44PM -0500, Luiz Capitulino wrote:

> HugeTLB command-line option hugepages= allows the user to specify how many
> huge pages should be allocated at boot. On NUMA systems, this argument
> automatically distributes huge pages allocation among nodes, which can
> be undesirable.
>
> The hugepagesnid= option introduced by this commit allows the user
> to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> from node 0 only. More details on patch 3/4 and patch 4/4.

The syntax seems very confusing. Can you make that more obvious?

-Andi
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, 11 Feb 2014 22:17:32 +0100 Andi Kleen a...@firstfloor.org wrote:

> On Mon, Feb 10, 2014 at 12:27:44PM -0500, Luiz Capitulino wrote:
>
> > HugeTLB command-line option hugepages= allows the user to specify how many
> > huge pages should be allocated at boot. On NUMA systems, this argument
> > automatically distributes huge pages allocation among nodes, which can
> > be undesirable.
> >
> > The hugepagesnid= option introduced by this commit allows the user
> > to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> > pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> > from node 0 only. More details on patch 3/4 and patch 4/4.
>
> The syntax seems very confusing. Can you make that more obvious?

I guess that my bad description in this email may have contributed to
making it look confusing. The real syntax is hugepagesnid=nid,nr-pages,size,
which looks straightforward to me. I honestly can't think of anything better
than that, but I'm open to suggestions.
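For readers trying to picture the nid,nr-pages,size triple, a minimal parsing sketch follows. It is not the parser from patch 3/4 or 4/4; the function names and the exact set of accepted size suffixes are assumptions made for illustration.

```python
SIZE_SUFFIXES = {"K": 1 << 10, "M": 1 << 20, "G": 1 << 30}


def parse_size(text):
    """Turn '2M' or '1G' into bytes; a bare number is taken as bytes."""
    suffix = text[-1].upper()
    if suffix in SIZE_SUFFIXES:
        return int(text[:-1]) * SIZE_SUFFIXES[suffix]
    return int(text)


def parse_hugepagesnid(value):
    """Parse the value of hugepagesnid=nid,nr-pages,size.

    Returns (node id, number of pages, page size in bytes).
    Raises ValueError if the value does not have exactly three fields.
    """
    fields = value.split(",")
    if len(fields) != 3:
        raise ValueError("expected nid,nr-pages,size, got %r" % value)
    nid, nr_pages, size = fields
    return int(nid), int(nr_pages), parse_size(size)


print(parse_hugepagesnid("0,2,1G"))
```

Read this way, hugepagesnid=0,2,1G means node 0, two pages, 1G each. The adjacent "2,1G" is easy to mistype as "2,2G", as the cover letter's own example shows.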
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
> The real syntax is hugepagesnid=nid,nr-pages,size. Which looks
> straightforward to me. I honestly can't think of anything better than
> that, but I'm open for suggestions.

hugepages_node=nid:nr-pages:size,... ?

-Andi

--
a...@linux.intel.com -- Speaking for myself only.
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Tue, 11 Feb 2014, Luiz Capitulino wrote:

> > > HugeTLB command-line option hugepages= allows the user to specify how
> > > many huge pages should be allocated at boot. On NUMA systems, this
> > > argument automatically distributes huge pages allocation among nodes,
> > > which can be undesirable.
> >
> > And when hugepages can no longer be allocated on a node because it is too
> > small, the remaining hugepages are distributed over nodes with memory
> > available, correct?
>
> No. hugepagesnid= tries to obey what was specified by the user as much as
> possible.

I'm referring to what I quoted above, the hugepages= parameter. I'm
saying that using existing functionality you can reserve an excess of
hugepages and then free unneeded hugepages at runtime to get the desired
amount allocated only on a specific node.

> > Strange, it would seem better to just reserve as many hugepages as you
> > want so that you get the desired number on each node and then free the
> > ones you don't need at runtime.
>
> You mean, for example, if I have a 2 node system and want 2 1G huge pages
> from node 1, then I have to allocate 4 1G huge pages and then free 2 pages
> on node 0 after boot? That seems very cumbersome to me. Besides, what if
> node0 needs this memory during boot?

All of this functionality, including the current hugepages= reservation at
boot, needs to show that it can't be done as late as when you could run an
initscript to do the reservation at runtime and fragmentation is at its
lowest level when userspace first becomes available.

I don't see any justification given in the patchset that suggests you
can't simply do this in an initscript if it is possible to allocate 1GB
pages at runtime. If it's too late because of oom, then your userspace is
going to oom anyway if you reserve the hugepages at boot; if it's too late
because of fragmentation, let's work on that issue (and justification why
things like movablecore= don't work for you).
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Wed, 12 Feb 2014, Andi Kleen wrote:

> > The real syntax is hugepagesnid=nid,nr-pages,size. Which looks
> > straightforward to me. I honestly can't think of anything better than
> > that, but I'm open for suggestions.
>
> hugepages_node=nid:nr-pages:size,... ?

I think that if we actually want this support that it should behave like
hugepages= and hugepagesz=, i.e. you specify a hugepagesnode= and, if
present, all remaining hugepages= and hugepagesz= parameters act only on
that node unless overridden by another hugepagesnode=.
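The stateful behaviour proposed here can be sketched as a single left-to-right pass over the command line. Note that hugepagesnode= is only a suggestion made in this message, not an existing kernel parameter, and parse_cmdline is a hypothetical helper that records the requests without allocating anything.

```python
def parse_cmdline(cmdline):
    """Collect (node, page size, page count) reservations in order.

    State carried along the command line:
      hugepagesnode=N  selects the node for the parameters that follow
      hugepagesz=S     selects the page size for the parameters that follow
      hugepages=C      records a reservation using the current node and size
    A node of None stands for "no node selected", i.e. today's behaviour
    of spreading the pages over all nodes.
    """
    node = None
    size = None
    reservations = []
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        if not sep:
            continue  # flag without a value, not relevant here
        if key == "hugepagesnode":
            node = int(value)
        elif key == "hugepagesz":
            size = value
        elif key == "hugepages":
            reservations.append((node, size, int(value)))
    return reservations


print(parse_cmdline(
    "hugepagesnode=0 hugepagesz=1G hugepages=2 hugepagesnode=1 hugepages=4"
))
```

In this reading, the request from the cover letter (2 pages on node0, 4 on node1) is expressed by switching the node between two hugepages= parameters, while a command line without hugepagesnode= behaves as before.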
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 2014-02-10 at 15:13 -0800, Andrew Morton wrote:

> On Mon, 10 Feb 2014 12:27:44 -0500 Luiz Capitulino wrote:
>
> > HugeTLB command-line option hugepages= allows the user to specify how many
> > huge pages should be allocated at boot. On NUMA systems, this argument
> > automatically distributes huge pages allocation among nodes, which can
> > be undesirable.
>
> Grumble. "can be undesirable" is the entire reason for the entire
> patchset. We need far, far more detail than can be conveyed in three
> words, please!

One (not so real-world) scenario that comes right to mind which can
benefit from such a feature is the ability to study socket/node scaling for
hugepage-aware applications. Yes, we do have numactl to bind programs to
resources, but I don't mind having a way of finer-graining hugetlb
allocations, especially if it doesn't hurt anything.

Thanks,
Davidlohr
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 10 Feb 2014, Luiz Capitulino wrote:

> HugeTLB command-line option hugepages= allows the user to specify how many
> huge pages should be allocated at boot. On NUMA systems, this argument
> automatically distributes huge pages allocation among nodes, which can
> be undesirable.

And when hugepages can no longer be allocated on a node because it is too
small, the remaining hugepages are distributed over nodes with memory
available, correct?

> The hugepagesnid= option introduced by this commit allows the user
> to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> from node 0 only. More details on patch 3/4 and patch 4/4.

Strange, it would seem better to just reserve as many hugepages as you
want so that you get the desired number on each node and then free the
ones you don't need at runtime.

That probably doesn't work because we can't free very large hugepages that
are reserved at boot, would fixing that issue reduce the need for this
patchset?
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 10 Feb 2014 12:27:44 -0500 Luiz Capitulino wrote:

> HugeTLB command-line option hugepages= allows the user to specify how many
> huge pages should be allocated at boot. On NUMA systems, this argument
> automatically distributes huge pages allocation among nodes, which can
> be undesirable.

Grumble. "can be undesirable" is the entire reason for the entire
patchset. We need far, far more detail than can be conveyed in three
words, please!

> The hugepagesnid= option introduced by this commit allows the user
> to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> from node 0 only. More details on patch 3/4 and patch 4/4.
Re: [PATCH 0/4] hugetlb: add hugepagesnid= command-line option
On Mon, 10 Feb 2014 12:27:44 -0500 Luiz Capitulino wrote:

> The hugepagesnid= option introduced by this commit allows the user
> to specify which NUMA nodes should be used to allocate boot-time HugeTLB
> pages. For example, hugepagesnid=0,2,2G will allocate two 2G huge pages
> from node 0 only. More details on patch 3/4 and patch 4/4.

s/2G/1G

I repeatedly made this mistake even when testing... For some reason my
brain insists on typing "2,2G" instead of "2,1G".