Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On 09/20/2016 06:24 AM, David Rientjes wrote:
> On Sat, 17 Sep 2016, Anshuman Khandual wrote:
>
>>> I'm questioning if this information can be inferred from information
>>> already in /proc/zoneinfo and sysfs. We know the no-fallback zonelist is
>>> going to include the local node, and we know the other zonelists are
>>> either node ordered or zone ordered (or do we need to extend
>>> vm.numa_zonelist_order for default?). I may have missed what new
>>> knowledge this interface is imparting on us.
>>
>> IIUC /proc/zoneinfo lists down zone internal state and statistics for
>> all zones on the system at any given point of time. The no-fallback
>> list contains the zones from the local node and fallback (which gets
>> used more often than the no-fallback) list contains all zones either
>> in node-ordered or zone-ordered manner. In most of the platforms the
>> default being the node order but the sequence of present nodes in
>> that order is determined by various factors like NUMA distance, load,
>> presence of CPUs on the node etc. This order of nodes in the fallback
>> list is the most important information derived out of this interface.
>>
>
> The point is that all of this can be inferred with information already
> provided, so the additional interface seems unnecessary. The only
> extension I think that is needed is to determine if the order is node or
> zone when vm.numa_zonelist_order == default and we shouldn't parse this
> from dmesg.

Okay. Seems like the general view is that this interface is not
necessary. Hence won't be posting the debugfs version for now.
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On Sat, 17 Sep 2016, Anshuman Khandual wrote:

> > I'm questioning if this information can be inferred from information
> > already in /proc/zoneinfo and sysfs. We know the no-fallback zonelist is
> > going to include the local node, and we know the other zonelists are
> > either node ordered or zone ordered (or do we need to extend
> > vm.numa_zonelist_order for default?). I may have missed what new
> > knowledge this interface is imparting on us.
>
> IIUC /proc/zoneinfo lists down zone internal state and statistics for
> all zones on the system at any given point of time. The no-fallback
> list contains the zones from the local node and fallback (which gets
> used more often than the no-fallback) list contains all zones either
> in node-ordered or zone-ordered manner. In most of the platforms the
> default being the node order but the sequence of present nodes in
> that order is determined by various factors like NUMA distance, load,
> presence of CPUs on the node etc. This order of nodes in the fallback
> list is the most important information derived out of this interface.
>

The point is that all of this can be inferred with information already
provided, so the additional interface seems unnecessary. The only
extension I think that is needed is to determine if the order is node or
zone when vm.numa_zonelist_order == default and we shouldn't parse this
from dmesg.
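To make the gap David describes concrete, here is a minimal userspace sketch of classifying the text read from /proc/sys/vm/numa_zonelist_order. It assumes that only the first character of the value is significant (d/D, n/N, z/Z), which is how kernels of this era are understood to parse it; the enum and function names are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical names, for illustration only; not kernel identifiers. */
enum zl_order {
	ZL_ORDER_UNKNOWN,
	ZL_ORDER_DEFAULT,
	ZL_ORDER_NODE,
	ZL_ORDER_ZONE
};

/*
 * Classify the string read from /proc/sys/vm/numa_zonelist_order.
 * Assumption: only the first character matters.
 */
static enum zl_order classify_zonelist_order(const char *s)
{
	if (s == NULL)
		return ZL_ORDER_UNKNOWN;

	switch (s[0]) {
	case 'd': case 'D':
		return ZL_ORDER_DEFAULT;
	case 'n': case 'N':
		return ZL_ORDER_NODE;
	case 'z': case 'Z':
		return ZL_ORDER_ZONE;
	default:
		return ZL_ORDER_UNKNOWN;
	}
}

/* Nonzero when the sysctl text alone cannot say "node" or "zone". */
static int zonelist_order_is_ambiguous(const char *s)
{
	enum zl_order o = classify_zonelist_order(s);

	return o == ZL_ORDER_DEFAULT || o == ZL_ORDER_UNKNOWN;
}

/* Usage example: "default" is the case that stays unresolved. */
static void zonelist_order_example(void)
{
	assert(classify_zonelist_order("node") == ZL_ORDER_NODE);
	assert(zonelist_order_is_ambiguous("default"));
}
```

With an explicit "node" or "zone" setting the reader has its answer; with "default" it does not, which is the one extension asked for above.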
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On 09/12/2016 11:43 PM, David Rientjes wrote:
> On Mon, 12 Sep 2016, Anshuman Khandual wrote:
>
>>>> after memory or node hot[un]plug is desirable. This change adds one
>>>> new sysfs interface (/sys/devices/system/memory/system_zone_details)
>>>> which will fetch and dump this information.
>>> Doesn't this violate the "one value per file" sysfs rule? Does it
>>> belong in debugfs instead?
>>
>> Yeah sure. Will make it a debugfs interface.
>>
>
> So the intended reader of this file is running as root?

Yeah.

>
>>> I also really question the need to dump kernel addresses out, filtered
>>> or not. What's the point?
>>
>> Hmm, thought it to be an additional information. But yes it's additional
>> and can be dropped.
>>
>
> I'm questioning if this information can be inferred from information
> already in /proc/zoneinfo and sysfs. We know the no-fallback zonelist is
> going to include the local node, and we know the other zonelists are
> either node ordered or zone ordered (or do we need to extend
> vm.numa_zonelist_order for default?). I may have missed what new
> knowledge this interface is imparting on us.

IIUC /proc/zoneinfo lists down zone internal state and statistics for
all zones on the system at any given point of time. The no-fallback
list contains the zones from the local node and fallback (which gets
used more often than the no-fallback) list contains all zones either
in node-ordered or zone-ordered manner. In most of the platforms the
default being the node order but the sequence of present nodes in
that order is determined by various factors like NUMA distance, load,
presence of CPUs on the node etc. This order of nodes in the fallback
list is the most important information derived out of this interface.
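As a rough illustration of the distance part of that ordering, the sketch below sorts node ids by ascending NUMA distance from a given node, with the local node first. It is a simplified userspace model, not the kernel's algorithm: it assumes the caller already holds one row of distances (for instance the contents of /sys/devices/system/node/nodeN/distance), breaks ties by lower node id, and deliberately ignores the load and CPU-presence factors mentioned above, so on systems with many equal distances the kernel's actual order may differ. The function name is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Fill order[] with node ids sorted by ascending distance from `local`.
 * dist[i] is the distance from `local` to node i. The local node is
 * always placed first; ties among the rest go to the lower node id.
 * Returns the number of ids written, or 0 on invalid input.
 *
 * Assumption: distance only. Load and CPU presence are not modelled.
 */
static size_t order_nodes_by_distance(const int *dist, size_t nr_nodes,
				      size_t local, size_t *order)
{
	size_t i, j;

	if (dist == NULL || order == NULL || nr_nodes == 0 || local >= nr_nodes)
		return 0;

	/* Local node first, then every other node in id order. */
	order[0] = local;
	for (i = 0, j = 1; i < nr_nodes; i++)
		if (i != local)
			order[j++] = i;

	/* Stable insertion sort of the non-local nodes by distance. */
	for (i = 2; i < nr_nodes; i++) {
		size_t key = order[i];

		for (j = i; j > 1 && dist[order[j - 1]] > dist[key]; j--)
			order[j] = order[j - 1];
		order[j] = key;
	}
	return nr_nodes;
}

/* Usage example with a made-up distance row for node 1. */
static void order_nodes_example(void)
{
	const int dist[4] = { 20, 10, 40, 20 };
	size_t order[4];

	assert(order_nodes_by_distance(dist, 4, 1, order) == 4);
	assert(order[0] == 1);	/* the local node leads the list */
}
```

Whatever this model cannot reproduce (the effect of load and of CPU-less nodes on the final sequence) is the part of the ordering under dispute in this thread.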
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On Mon, 12 Sep 2016, Anshuman Khandual wrote:

> > > after memory or node hot[un]plug is desirable. This change adds one
> > > new sysfs interface (/sys/devices/system/memory/system_zone_details)
> > > which will fetch and dump this information.
> > Doesn't this violate the "one value per file" sysfs rule? Does it
> > belong in debugfs instead?
>
> Yeah sure. Will make it a debugfs interface.
>

So the intended reader of this file is running as root?

> > I also really question the need to dump kernel addresses out, filtered
> > or not. What's the point?
>
> Hmm, thought it to be an additional information. But yes it's additional
> and can be dropped.
>

I'm questioning if this information can be inferred from information
already in /proc/zoneinfo and sysfs. We know the no-fallback zonelist is
going to include the local node, and we know the other zonelists are
either node ordered or zone ordered (or do we need to extend
vm.numa_zonelist_order for default?). I may have missed what new
knowledge this interface is imparting on us.
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On 09/09/2016 01:54 AM, Dave Hansen wrote:
> On 09/07/2016 07:46 PM, Anshuman Khandual wrote:
>> after memory or node hot[un]plug is desirable. This change adds one
>> new sysfs interface (/sys/devices/system/memory/system_zone_details)
>> which will fetch and dump this information.
> Doesn't this violate the "one value per file" sysfs rule? Does it
> belong in debugfs instead?

Yeah sure. Will make it a debugfs interface.

>
> I also really question the need to dump kernel addresses out, filtered
> or not. What's the point?

Hmm, thought it to be an additional information. But yes it's additional
and can be dropped.
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On 09/09/2016 07:06 PM, Michal Hocko wrote:
> On Thu 08-09-16 08:16:58, Anshuman Khandual wrote:
>> Each individual node in the system has a ZONELIST_FALLBACK zonelist
>> and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
>> order of zones during memory allocations. Sometimes it helps to dump
>> these zonelists to see the priority order of various zones in them.
>>
>> Particularly platforms which support memory hotplug into previously
>> non existing zones (at boot), this interface helps in visualizing
>> which all zonelists of the system at what priority level, the new
>> hot added memory ends up in. POWER is such a platform where all the
>> memory detected during boot time remains with ZONE_DMA for good but
>> then hot plug process can actually get new memory into ZONE_MOVABLE.
>> So having a way to get the snapshot of the zonelists on the system
>> after memory or node hot[un]plug is desirable. This change adds one
>> new sysfs interface (/sys/devices/system/memory/system_zone_details)
>> which will fetch and dump this information.
> I am still not sure I understand why this is helpful and who is the
> consumer for this interface and how it will benefit from the
> information. Dave (who doesn't seem to be on the CC list re-added) had
> another objection that this breaks one-value-per-file rule for sysfs
> files.

It helps in understanding the relative priority of each memory zone of the
system during various allocation scenarios. It's particularly helpful after
hotplug/unplug of additional memory into a previously non-existing zone on a
node.

>
> This all smells like a debugging feature to me and so it should go into
> debugfs.

Sure, will make it a debugfs interface.
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On Thu 08-09-16 08:16:58, Anshuman Khandual wrote:
> Each individual node in the system has a ZONELIST_FALLBACK zonelist
> and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
> order of zones during memory allocations. Sometimes it helps to dump
> these zonelists to see the priority order of various zones in them.
>
> Particularly platforms which support memory hotplug into previously
> non existing zones (at boot), this interface helps in visualizing
> which all zonelists of the system at what priority level, the new
> hot added memory ends up in. POWER is such a platform where all the
> memory detected during boot time remains with ZONE_DMA for good but
> then hot plug process can actually get new memory into ZONE_MOVABLE.
> So having a way to get the snapshot of the zonelists on the system
> after memory or node hot[un]plug is desirable. This change adds one
> new sysfs interface (/sys/devices/system/memory/system_zone_details)
> which will fetch and dump this information.

I am still not sure I understand why this is helpful and who is the
consumer for this interface and how it will benefit from the
information. Dave (who doesn't seem to be on the CC list re-added) had
another objection that this breaks one-value-per-file rule for sysfs
files.

This all smells like a debugging feature to me and so it should go into
debugfs.

> Example zonelist information from a KVM guest.
>
> [NODE (0)]
> 	ZONELIST_FALLBACK
> 		(0) (node 0) (DMA 0xc0006300)
> 		(1) (node 1) (DMA 0xc0016300)
> 		(2) (node 2) (DMA 0xc0026300)
> 		(3) (node 3) (DMA 0xc003ffdba300)
> 	ZONELIST_NOFALLBACK
> 		(0) (node 0) (DMA 0xc0006300)
> [NODE (1)]
> 	ZONELIST_FALLBACK
> 		(0) (node 1) (DMA 0xc0016300)
> 		(1) (node 2) (DMA 0xc0026300)
> 		(2) (node 3) (DMA 0xc003ffdba300)
> 		(3) (node 0) (DMA 0xc0006300)
> 	ZONELIST_NOFALLBACK
> 		(0) (node 1) (DMA 0xc0016300)
> [NODE (2)]
> 	ZONELIST_FALLBACK
> 		(0) (node 2) (DMA 0xc0026300)
> 		(1) (node 3) (DMA 0xc003ffdba300)
> 		(2) (node 0) (DMA 0xc0006300)
> 		(3) (node 1) (DMA 0xc0016300)
> 	ZONELIST_NOFALLBACK
> 		(0) (node 2) (DMA 0xc0026300)
> [NODE (3)]
> 	ZONELIST_FALLBACK
> 		(0) (node 3) (DMA 0xc003ffdba300)
> 		(1) (node 0) (DMA 0xc0006300)
> 		(2) (node 1) (DMA 0xc0016300)
> 		(3) (node 2) (DMA 0xc0026300)
> 	ZONELIST_NOFALLBACK
> 		(0) (node 3) (DMA 0xc003ffdba300)
>
> Signed-off-by: Anshuman Khandual
> ---
> Changes in V4:
> - Explicitly included mmzone.h header inside page_alloc.c
> - Changed the kernel address printing from %lx to %pK
>
> Changes in V3:
> - Moved all these new sysfs code inside CONFIG_NUMA
>
> Changes in V2:
> - Added more details into the commit message
> - Added sysfs interface file details into the commit message
> - Added ../ABI/testing/sysfs-system-zone-details file
>
>  .../ABI/testing/sysfs-system-zone-details | 9
>  drivers/base/memory.c                     | 52 ++
>  mm/page_alloc.c                           | 1 +
>  3 files changed, 62 insertions(+)
>  create mode 100644 Documentation/ABI/testing/sysfs-system-zone-details
>
> diff --git a/Documentation/ABI/testing/sysfs-system-zone-details b/Documentation/ABI/testing/sysfs-system-zone-details
> new file mode 100644
> index 000..9c13b2e
> --- /dev/null
> +++ b/Documentation/ABI/testing/sysfs-system-zone-details
> @@ -0,0 +1,9 @@
> +What:		/sys/devices/system/memory/system_zone_details
> +Date:		Sep 2016
> +KernelVersion:	4.8
> +Contact:	khand...@linux.vnet.ibm.com
> +Description:
> +		This read only file dumps the zonelist and it's constituent
> +		zones information for both ZONELIST_FALLBACK and ZONELIST_
> +		NOFALLBACK zonelists for each online node of the system at
> +		any given point of time.
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index dc75de9..c7ab991 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -442,7 +442,56 @@ print_block_size(struct device *dev, struct device_attribute *attr,
>  	return sprintf(buf, "%lx\n", get_memory_block_size());
>  }
>
> +#ifdef CONFIG_NUMA
> +static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
> +{
> +	unsigned int i;
> +	ssize_t count = 0;
> +
> +	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
> +		count += sprintf(buf + count,
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
On 09/07/2016 07:46 PM, Anshuman Khandual wrote:
> after memory or node hot[un]plug is desirable. This change adds one
> new sysfs interface (/sys/devices/system/memory/system_zone_details)
> which will fetch and dump this information.

Doesn't this violate the "one value per file" sysfs rule? Does it
belong in debugfs instead?

I also really question the need to dump kernel addresses out, filtered
or not. What's the point?
Re: [PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
Hi Anshuman,

[auto build test ERROR on driver-core/driver-core-testing]
[also build test ERROR on v4.8-rc5]
[cannot apply to next-20160908]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]
[Suggest to use git(>=2.9.0) format-patch --base= (or --base=auto for convenience) to record what (public, well-known) commit your patch series was built on]
[Check https://git-scm.com/docs/git-format-patch for more information]

url:    https://github.com/0day-ci/linux/commits/Anshuman-Khandual/mm-Add-sysfs-interface-to-dump-each-node-s-zonelist-information/20160908-104922
config: x86_64-lkp (attached as .config)
compiler: gcc-6 (Debian 6.1.1-9) 6.1.1 20160705
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

All errors (new ones prefixed by >>):

   drivers/base/memory.c: In function 'dump_zonelist':
>> drivers/base/memory.c:455:4: error: 'zone_names' undeclared (first use in this function)
       zone_names[zonelist->_zonerefs[i].zone_idx],
       ^~
   drivers/base/memory.c:455:4: note: each undeclared identifier is reported only once for each function it appears in

vim +/zone_names +455 drivers/base/memory.c

   449		ssize_t count = 0;
   450
   451		for (i = 0; zonelist->_zonerefs[i].zone; i++) {
   452			count += sprintf(buf + count,
   453				"\t\t(%d) (node %d) (%-7s 0x%pK)\n", i,
   454				zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
 > 455				zone_names[zonelist->_zonerefs[i].zone_idx],
   456				(void *) zonelist->_zonerefs[i].zone);
   457		}
   458		return count;

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation
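The loop quoted in this report can be modelled in userspace to show the two things it depends on: a declaration of zone_names that is visible in the translation unit, and a NULL-terminated _zonerefs array. The sketch below is only an illustration under stated assumptions. The struct definitions are simplified stand-ins rather than the kernel's, the zone_names table is a local placeholder, the kernel address is left out of the format, and snprintf with explicit clamping is swapped in for the patch's unbounded sprintf. It does not claim to be the fix that was or should be applied to the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Simplified stand-ins for the kernel types; not the real definitions. */
struct zone { int node_id; };
struct zoneref { struct zone *zone; int zone_idx; };
struct zonelist { struct zoneref _zonerefs[8]; };

/*
 * Placeholder table. The point is that a declaration must be in scope
 * where the name is used; its absence is what the report above flags.
 */
static const char *const zone_names[] = { "DMA", "Normal", "Movable" };

/*
 * Write one line per zoneref until the NULL-zone terminator, never
 * writing more than `size` bytes. Returns the number of characters
 * stored in buf, excluding the trailing NUL.
 */
static size_t dump_zonelist(char *buf, size_t size, const struct zonelist *zl)
{
	size_t count = 0;
	unsigned int i;

	assert(buf != NULL && zl != NULL);

	for (i = 0; zl->_zonerefs[i].zone && count < size; i++) {
		int n = snprintf(buf + count, size - count,
				 "\t\t(%u) (node %d) (%-7s)\n", i,
				 zl->_zonerefs[i].zone->node_id,
				 zone_names[zl->_zonerefs[i].zone_idx]);
		if (n < 0)
			break;
		/* snprintf returns the untruncated length; clamp it. */
		if ((size_t)n < size - count)
			count += (size_t)n;
		else
			count = size - 1;
	}
	return count;
}
```

A caller would build a zero-initialised struct zonelist, fill in at most seven entries so that a NULL terminator remains, and pass a buffer with its size; when the buffer is too small the output is truncated instead of overrun.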
[PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
Each individual node in the system has a ZONELIST_FALLBACK zonelist
and a ZONELIST_NOFALLBACK zonelist. These zonelists decide fallback
order of zones during memory allocations. Sometimes it helps to dump
these zonelists to see the priority order of various zones in them.

Particularly platforms which support memory hotplug into previously
non existing zones (at boot), this interface helps in visualizing
which all zonelists of the system at what priority level, the new
hot added memory ends up in. POWER is such a platform where all the
memory detected during boot time remains with ZONE_DMA for good but
then hot plug process can actually get new memory into ZONE_MOVABLE.
So having a way to get the snapshot of the zonelists on the system
after memory or node hot[un]plug is desirable. This change adds one
new sysfs interface (/sys/devices/system/memory/system_zone_details)
which will fetch and dump this information.

Example zonelist information from a KVM guest.

[NODE (0)]
	ZONELIST_FALLBACK
		(0) (node 0) (DMA 0xc0006300)
		(1) (node 1) (DMA 0xc0016300)
		(2) (node 2) (DMA 0xc0026300)
		(3) (node 3) (DMA 0xc003ffdba300)
	ZONELIST_NOFALLBACK
		(0) (node 0) (DMA 0xc0006300)
[NODE (1)]
	ZONELIST_FALLBACK
		(0) (node 1) (DMA 0xc0016300)
		(1) (node 2) (DMA 0xc0026300)
		(2) (node 3) (DMA 0xc003ffdba300)
		(3) (node 0) (DMA 0xc0006300)
	ZONELIST_NOFALLBACK
		(0) (node 1) (DMA 0xc0016300)
[NODE (2)]
	ZONELIST_FALLBACK
		(0) (node 2) (DMA 0xc0026300)
		(1) (node 3) (DMA 0xc003ffdba300)
		(2) (node 0) (DMA 0xc0006300)
		(3) (node 1) (DMA 0xc0016300)
	ZONELIST_NOFALLBACK
		(0) (node 2) (DMA 0xc0026300)
[NODE (3)]
	ZONELIST_FALLBACK
		(0) (node 3) (DMA 0xc003ffdba300)
		(1) (node 0) (DMA 0xc0006300)
		(2) (node 1) (DMA 0xc0016300)
		(3) (node 2) (DMA 0xc0026300)
	ZONELIST_NOFALLBACK
		(0) (node 3) (DMA 0xc003ffdba300)

Signed-off-by: Anshuman Khandual
---
Changes in V4:
- Explicitly included mmzone.h header inside page_alloc.c
- Changed the kernel address printing from %lx to %pK

Changes in V3:
- Moved all these new sysfs code inside CONFIG_NUMA

Changes in V2:
- Added more details into the commit message
- Added sysfs interface file details into the commit message
- Added ../ABI/testing/sysfs-system-zone-details file

 .../ABI/testing/sysfs-system-zone-details | 9
 drivers/base/memory.c                     | 52 ++
 mm/page_alloc.c                           | 1 +
 3 files changed, 62 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-system-zone-details

diff --git a/Documentation/ABI/testing/sysfs-system-zone-details b/Documentation/ABI/testing/sysfs-system-zone-details
new file mode 100644
index 000..9c13b2e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-system-zone-details
@@ -0,0 +1,9 @@
+What:		/sys/devices/system/memory/system_zone_details
+Date:		Sep 2016
+KernelVersion:	4.8
+Contact:	khand...@linux.vnet.ibm.com
+Description:
+		This read only file dumps the zonelist and it's constituent
+		zones information for both ZONELIST_FALLBACK and ZONELIST_
+		NOFALLBACK zonelists for each online node of the system at
+		any given point of time.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index dc75de9..c7ab991 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -442,7 +442,56 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "%lx\n", get_memory_block_size());
 }

+#ifdef CONFIG_NUMA
+static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
+{
+	unsigned int i;
+	ssize_t count = 0;
+
+	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
+		count += sprintf(buf + count,
+			"\t\t(%d) (node %d) (%-7s 0x%pK)\n", i,
+			zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
+			zone_names[zonelist->_zonerefs[i].zone_idx],
+			(void *) zonelist->_zonerefs[i].zone);
+	}
+	return count;
+}
+
+static ssize_t dump_zonelists(char *buf)
+{
+	struct zonelist *zonelist;
+	unsigned int node;
+	ssize_t count = 0;
+
+	for_each_online_node(node) {
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_FALLBACK]);
+		count +=
[PATCH V4] mm: Add sysfs interface to dump each node's zonelist information
Each individual node in the system has a ZONELIST_FALLBACK zonelist and a
ZONELIST_NOFALLBACK zonelist. These zonelists decide the fallback order of
zones during memory allocations. Sometimes it helps to dump these zonelists
to see the priority order of the various zones in them.

Particularly on platforms which support memory hotplug into zones that did
not exist at boot, this interface helps in visualizing which zonelists of
the system the newly hot-added memory ends up in, and at what priority
level. POWER is such a platform, where all the memory detected during boot
time remains in ZONE_DMA for good, but the hotplug process can then get new
memory into ZONE_MOVABLE. So having a way to get a snapshot of the zonelists
on the system after memory or node hot[un]plug is desirable. This change
adds one new sysfs interface (/sys/devices/system/memory/system_zone_details)
which will fetch and dump this information.

Example zonelist information from a KVM guest:

[NODE (0)]
        ZONELIST_FALLBACK
                (0) (node 0) (DMA 0xc0006300)
                (1) (node 1) (DMA 0xc0016300)
                (2) (node 2) (DMA 0xc0026300)
                (3) (node 3) (DMA 0xc003ffdba300)
        ZONELIST_NOFALLBACK
                (0) (node 0) (DMA 0xc0006300)
[NODE (1)]
        ZONELIST_FALLBACK
                (0) (node 1) (DMA 0xc0016300)
                (1) (node 2) (DMA 0xc0026300)
                (2) (node 3) (DMA 0xc003ffdba300)
                (3) (node 0) (DMA 0xc0006300)
        ZONELIST_NOFALLBACK
                (0) (node 1) (DMA 0xc0016300)
[NODE (2)]
        ZONELIST_FALLBACK
                (0) (node 2) (DMA 0xc0026300)
                (1) (node 3) (DMA 0xc003ffdba300)
                (2) (node 0) (DMA 0xc0006300)
                (3) (node 1) (DMA 0xc0016300)
        ZONELIST_NOFALLBACK
                (0) (node 2) (DMA 0xc0026300)
[NODE (3)]
        ZONELIST_FALLBACK
                (0) (node 3) (DMA 0xc003ffdba300)
                (1) (node 0) (DMA 0xc0006300)
                (2) (node 1) (DMA 0xc0016300)
                (3) (node 2) (DMA 0xc0026300)
        ZONELIST_NOFALLBACK
                (0) (node 3) (DMA 0xc003ffdba300)

Signed-off-by: Anshuman Khandual
---
Changes in V4:
- Explicitly included mmzone.h header inside page_alloc.c
- Changed the kernel address printing from %lx to %pK

Changes in V3:
- Moved all these new sysfs code inside CONFIG_NUMA

Changes in V2:
- Added more details into the commit message
- Added sysfs interface file details into the commit message
- Added ../ABI/testing/sysfs-system-zone-details file

 .../ABI/testing/sysfs-system-zone-details |  9
 drivers/base/memory.c                     | 52 ++
 mm/page_alloc.c                           |  1 +
 3 files changed, 62 insertions(+)
 create mode 100644 Documentation/ABI/testing/sysfs-system-zone-details

diff --git a/Documentation/ABI/testing/sysfs-system-zone-details b/Documentation/ABI/testing/sysfs-system-zone-details
new file mode 100644
index 000..9c13b2e
--- /dev/null
+++ b/Documentation/ABI/testing/sysfs-system-zone-details
@@ -0,0 +1,9 @@
+What:		/sys/devices/system/memory/system_zone_details
+Date:		Sep 2016
+KernelVersion:	4.8
+Contact:	khand...@linux.vnet.ibm.com
+Description:
+		This read only file dumps the zonelist and it's constituent
+		zones information for both ZONELIST_FALLBACK and ZONELIST_
+		NOFALLBACK zonelists for each online node of the system at
+		any given point of time.
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index dc75de9..c7ab991 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -442,7 +442,56 @@ print_block_size(struct device *dev, struct device_attribute *attr,
 	return sprintf(buf, "%lx\n", get_memory_block_size());
 }
 
+#ifdef CONFIG_NUMA
+static ssize_t dump_zonelist(char *buf, struct zonelist *zonelist)
+{
+	unsigned int i;
+	ssize_t count = 0;
+
+	for (i = 0; zonelist->_zonerefs[i].zone; i++) {
+		count += sprintf(buf + count,
+			"\t\t(%d) (node %d) (%-7s 0x%pK)\n", i,
+			zonelist->_zonerefs[i].zone->zone_pgdat->node_id,
+			zone_names[zonelist->_zonerefs[i].zone_idx],
+			(void *) zonelist->_zonerefs[i].zone);
+	}
+	return count;
+}
+
+static ssize_t dump_zonelists(char *buf)
+{
+	struct zonelist *zonelist;
+	unsigned int node;
+	ssize_t count = 0;
+
+	for_each_online_node(node) {
+		zonelist = &(NODE_DATA(node)->
+				node_zonelists[ZONELIST_FALLBACK]);
+		count += sprintf(buf + count, "[NODE