Re: [PATCH RFC 04/14] mm: vmstat: convert slab vmstat counter to bytes
On Mon, Sep 16, 2019 at 02:38:40PM +0200, Johannes Weiner wrote:
> On Thu, Sep 05, 2019 at 02:45:48PM -0700, Roman Gushchin wrote:
> > In order to prepare for per-object slab memory accounting,
> > convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat
> > items to bytes.
> >
> > To make sure that these vmstats are in bytes, rename them
> > to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to
> > NR_KERNEL_STACK_KB).
> >
> > The size of slab memory shouldn't exceed 4GB on 32-bit machines,
> > so it will fit into the atomic_long_t we use for vmstats.
> >
> > Signed-off-by: Roman Gushchin

Hello Johannes!

Thank you for looking into the patchset!

> Maybe a crazy idea, but instead of mixing bytes and pages, would it be
> difficult to account all vmstat items in bytes internally? And provide
> two general APIs, byte- and page-based, to update and query the counts,
> instead of tying the unit to individual items?
>
> The vmstat_item_in_bytes() conditional shifting is pretty awkward in
> code that has a recent history littered with subtle breakages.
>
> The translation helper node_page_state_pages() will yield garbage if
> used with the page-based counters, which is another easy-to-misuse
> interface.
>
> We already have many places that multiply with PAGE_SIZE to get the
> stats in bytes or kB units.
>
> And the _B/_KB suffixes are kinda clunky.
>
> The stats use atomic_long_t, so switching to atomic64_t doesn't make a
> difference on 64-bit and is backward compatible with 32-bit.

I fully agree that having different stats in different units adds a lot
of mess to the code. But I've always thought that 64-bit atomics are slow
on 32-bit machines, so it might be a noticeable performance regression.
Don't you think so?

I'm happy to prepare such a patch(set), only I'd prefer to keep it
separate from this one. It can precede or follow the slab controller
rework; either way will work. The slab controller rework is already not
so small, so adding more code (and potential issues) here would only
make the review more complex.

> The per-cpu batch size you have to raise from s8 either way.

Yeah, to be honest I don't know why those aren't just unsigned long by
default. The space savings are negligible here, and I don't see any
other reason. It could even be slightly faster to use a larger type.
I tried to keep the patchset as small as possible (at least for the RFC
version), so I avoided any unnecessary changes. But overall, using s8 or
s16 here doesn't make much sense to me.

> It seems to me that would make the code and API a lot simpler and
> easier to use / harder to misuse.
Re: [PATCH RFC 04/14] mm: vmstat: convert slab vmstat counter to bytes
On Thu, Sep 05, 2019 at 02:45:48PM -0700, Roman Gushchin wrote:
> In order to prepare for per-object slab memory accounting,
> convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat
> items to bytes.
>
> To make sure that these vmstats are in bytes, rename them
> to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to
> NR_KERNEL_STACK_KB).
>
> The size of slab memory shouldn't exceed 4GB on 32-bit machines,
> so it will fit into the atomic_long_t we use for vmstats.
>
> Signed-off-by: Roman Gushchin

Maybe a crazy idea, but instead of mixing bytes and pages, would it be
difficult to account all vmstat items in bytes internally? And provide
two general APIs, byte- and page-based, to update and query the counts,
instead of tying the unit to individual items?

The vmstat_item_in_bytes() conditional shifting is pretty awkward in
code that has a recent history littered with subtle breakages.

The translation helper node_page_state_pages() will yield garbage if
used with the page-based counters, which is another easy-to-misuse
interface.

We already have many places that multiply with PAGE_SIZE to get the
stats in bytes or kB units.

And the _B/_KB suffixes are kinda clunky.

The stats use atomic_long_t, so switching to atomic64_t doesn't make a
difference on 64-bit and is backward compatible with 32-bit.

The per-cpu batch size you have to raise from s8 either way.

It seems to me that would make the code and API a lot simpler and
easier to use / harder to misuse.
[PATCH RFC 04/14] mm: vmstat: convert slab vmstat counter to bytes
In order to prepare for per-object slab memory accounting, convert the
NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat items to bytes.

To make sure that these vmstats are in bytes, rename them to
NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to
NR_KERNEL_STACK_KB).

The size of slab memory shouldn't exceed 4GB on 32-bit machines, so it
will fit into the atomic_long_t we use for vmstats.

Signed-off-by: Roman Gushchin
---
 drivers/base/node.c     | 11 ---
 fs/proc/meminfo.c       |  4 ++--
 include/linux/mmzone.h  | 10 --
 include/linux/vmstat.h  |  8
 kernel/power/snapshot.c |  2 +-
 mm/memcontrol.c         | 29 -
 mm/oom_kill.c           |  2 +-
 mm/page_alloc.c         |  8
 mm/slab.h               | 15 ---
 mm/slab_common.c        |  4 ++--
 mm/slob.c               | 12 ++--
 mm/slub.c               |  8
 mm/vmscan.c             |  3 ++-
 mm/vmstat.c             | 22 +++---
 mm/workingset.c         |  6 --
 15 files changed, 93 insertions(+), 51 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 296546ffed6c..56664222f3fd 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -368,8 +368,8 @@ static ssize_t node_read_meminfo(struct device *dev,
 	unsigned long sreclaimable, sunreclaimable;

 	si_meminfo_node(&i, nid);
-	sreclaimable = node_page_state(pgdat, NR_SLAB_RECLAIMABLE);
-	sunreclaimable = node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE);
+	sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
+	sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);

 	n = sprintf(buf,
 		    "Node %d MemTotal:       %8lu kB\n"
 		    "Node %d MemFree:        %8lu kB\n"
@@ -495,9 +495,14 @@ static ssize_t node_read_vmstat(struct device *dev,
 	int i;
 	int n = 0;

-	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
+	for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
+		unsigned long x = sum_zone_node_page_state(nid, i);
+
+		if (vmstat_item_in_bytes(i))
+			x >>= PAGE_SHIFT;
 		n += sprintf(buf+n, "%s %lu\n", vmstat_text[i],
-			     sum_zone_node_page_state(nid, i));
+			     x);
+	}

 #ifdef CONFIG_NUMA
 	for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index ac9247371871..87afa8683c1b 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -53,8 +53,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 		pages[lru] = global_node_page_state(NR_LRU_BASE + lru);

 	available = si_mem_available();
-	sreclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE);
-	sunreclaim = global_node_page_state(NR_SLAB_UNRECLAIMABLE);
+	sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B);
+	sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B);

 	show_val_kb(m, "MemTotal:       ", i.totalram);
 	show_val_kb(m, "MemFree:        ", i.freeram);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e8233d52971..2dbc2d042ef6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -215,8 +215,8 @@ enum node_stat_item {
 	NR_INACTIVE_FILE,	/*  "     "     "   "       "         */
 	NR_ACTIVE_FILE,		/*  "     "     "   "       "         */
 	NR_UNEVICTABLE,		/*  "     "     "   "       "         */
-	NR_SLAB_RECLAIMABLE,	/* Please do not reorder this item */
-	NR_SLAB_UNRECLAIMABLE,	/* and this one without looking at
+	NR_SLAB_RECLAIMABLE_B,	/* Please do not reorder this item */
+	NR_SLAB_UNRECLAIMABLE_B,/* and this one without looking at
 				 * memcg_flush_percpu_vmstats() first. */
 	NR_ISOLATED_ANON,	/* Temporary isolated pages from anon lru */
 	NR_ISOLATED_FILE,	/* Temporary isolated pages from file lru */
@@ -247,6 +247,12 @@ enum node_stat_item {
 	NR_VM_NODE_STAT_ITEMS
 };

+static __always_inline bool vmstat_item_in_bytes(enum node_stat_item item)
+{
+	return (item == NR_SLAB_RECLAIMABLE_B ||
+		item == NR_SLAB_UNRECLAIMABLE_B);
+}
+
 /*
  * We do arithmetic on the LRU lists in various places in the code,
  * so it is important to keep the active lists LRU_ACTIVE higher in
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index bdeda4b079fe..e5460e597e0c 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -194,6 +194,12 @@ static inline unsigned long global_node_page_state(enum node_stat_item item)
 	return x;
 }

+static inline
+unsigned long global_node_page_state_pages(enum node_stat_item item)
+{
+	return global_node_page_state(item) >> PAGE_SHIFT;
+}
+
 static inline unsigned long zone_page_state(struct zone *zone,