Re: [PATCH RFC 04/14] mm: vmstat: convert slab vmstat counter to bytes

2019-09-16 Thread Roman Gushchin
On Mon, Sep 16, 2019 at 02:38:40PM +0200, Johannes Weiner wrote:
> On Thu, Sep 05, 2019 at 02:45:48PM -0700, Roman Gushchin wrote:
> > In order to prepare for per-object slab memory accounting,
> > convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat
> > items to bytes.
> > 
> > To make sure that these vmstats are in bytes, rename them
> > to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to
> > NR_KERNEL_STACK_KB).
> > 
> > The size of slab memory shouldn't exceed 4GB on 32-bit machines,
> > so it will fit into atomic_long_t we use for vmstats.
> > 
> > Signed-off-by: Roman Gushchin 

Hello Johannes!

Thank you for looking into the patchset!

> 
> Maybe a crazy idea, but instead of mixing bytes and pages, would it be
> difficult to account all vmstat items in bytes internally? And provide
> two general apis, byte and page based, to update and query the counts,
> instead of tying the unit to individual items?
> 
> The vmstat_item_in_bytes() conditional shifting is pretty awkward in
> code that has a recent history littered with subtle breakages.
> 
> The translation helper node_page_state_pages() will yield garbage if
> used with the page-based counters, which is another easy to misuse
> interface.
> 
> We already have many places that multiply with PAGE_SIZE to get the
> stats in bytes or kb units.
> 
> And _B/_KB suffixes are kinda clunky.
> 
> The stats use atomic_long_t, so switching to atomic64_t doesn't make a
> difference on 64-bit and is backward compatible with 32-bit.

I fully agree that having different stats in different units
adds a lot of mess to the code. But I've always thought that 64-bit
atomics are slow on 32-bit machines, so switching to them might cause
a noticeable performance regression. Don't you think so?

I'm happy to prepare such a patch(set), only I'd prefer to keep it
separate from this one. It can precede or follow the slab controller
rework; either way will work. The slab controller rework is already not
small, so adding more code (and potential issues) here would only make
the review more complex.

> 
> The per-cpu batch size you have to raise from s8 either way.

Yeah, tbh I don't know why those aren't just unsigned long by default.
The space savings are negligible here, and I don't see any other reason.
Using a larger type could even be slightly faster.

I kinda tried to keep the patchset as small as possible (at least for
the RFC version), so I tried to avoid any unnecessary changes.
But overall, using s8 or s16 here doesn't make much sense to me.

> 
> It seems to me that would make the code and API a lot simpler and
> easier to use / harder to misuse.


Re: [PATCH RFC 04/14] mm: vmstat: convert slab vmstat counter to bytes

2019-09-16 Thread Johannes Weiner
On Thu, Sep 05, 2019 at 02:45:48PM -0700, Roman Gushchin wrote:
> In order to prepare for per-object slab memory accounting,
> convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat
> items to bytes.
> 
> To make sure that these vmstats are in bytes, rename them
> to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to
> NR_KERNEL_STACK_KB).
> 
> The size of slab memory shouldn't exceed 4GB on 32-bit machines,
> so it will fit into atomic_long_t we use for vmstats.
> 
> Signed-off-by: Roman Gushchin 

Maybe a crazy idea, but instead of mixing bytes and pages, would it be
difficult to account all vmstat items in bytes internally? And provide
two general apis, byte and page based, to update and query the counts,
instead of tying the unit to individual items?

The vmstat_item_in_bytes() conditional shifting is pretty awkward in
code that has a recent history littered with subtle breakages.

The translation helper node_page_state_pages() will yield garbage if
used with the page-based counters, which is another easy to misuse
interface.

We already have many places that multiply with PAGE_SIZE to get the
stats in bytes or kb units.

And _B/_KB suffixes are kinda clunky.

The stats use atomic_long_t, so switching to atomic64_t doesn't make a
difference on 64-bit and is backward compatible with 32-bit.

The per-cpu batch size you have to raise from s8 either way.

It seems to me that would make the code and API a lot simpler and
easier to use / harder to misuse.


[PATCH RFC 04/14] mm: vmstat: convert slab vmstat counter to bytes

2019-09-05 Thread Roman Gushchin
In order to prepare for per-object slab memory accounting,
convert NR_SLAB_RECLAIMABLE and NR_SLAB_UNRECLAIMABLE vmstat
items to bytes.

To make sure that these vmstats are in bytes, rename them
to NR_SLAB_RECLAIMABLE_B and NR_SLAB_UNRECLAIMABLE_B (similar to
NR_KERNEL_STACK_KB).

The size of slab memory shouldn't exceed 4GB on 32-bit machines,
so it will fit into atomic_long_t we use for vmstats.

Signed-off-by: Roman Gushchin 
---
 drivers/base/node.c | 11 ---
 fs/proc/meminfo.c   |  4 ++--
 include/linux/mmzone.h  | 10 --
 include/linux/vmstat.h  |  8 
 kernel/power/snapshot.c |  2 +-
 mm/memcontrol.c | 29 -
 mm/oom_kill.c   |  2 +-
 mm/page_alloc.c |  8 
 mm/slab.h   | 15 ---
 mm/slab_common.c|  4 ++--
 mm/slob.c   | 12 ++--
 mm/slub.c   |  8 
 mm/vmscan.c |  3 ++-
 mm/vmstat.c | 22 +++---
 mm/workingset.c |  6 --
 15 files changed, 93 insertions(+), 51 deletions(-)

diff --git a/drivers/base/node.c b/drivers/base/node.c
index 296546ffed6c..56664222f3fd 100644
--- a/drivers/base/node.c
+++ b/drivers/base/node.c
@@ -368,8 +368,8 @@ static ssize_t node_read_meminfo(struct device *dev,
unsigned long sreclaimable, sunreclaimable;
 
si_meminfo_node(&i, nid);
-   sreclaimable = node_page_state(pgdat, NR_SLAB_RECLAIMABLE);
-   sunreclaimable = node_page_state(pgdat, NR_SLAB_UNRECLAIMABLE);
+   sreclaimable = node_page_state_pages(pgdat, NR_SLAB_RECLAIMABLE_B);
+   sunreclaimable = node_page_state_pages(pgdat, NR_SLAB_UNRECLAIMABLE_B);
n = sprintf(buf,
   "Node %d MemTotal:   %8lu kB\n"
   "Node %d MemFree:%8lu kB\n"
@@ -495,9 +495,14 @@ static ssize_t node_read_vmstat(struct device *dev,
int i;
int n = 0;
 
-   for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++)
-   n += sprintf(buf+n, "%s %lu\n", vmstat_text[i],
-sum_zone_node_page_state(nid, i));
+   for (i = 0; i < NR_VM_ZONE_STAT_ITEMS; i++) {
+   unsigned long x = sum_zone_node_page_state(nid, i);
+
+   if (vmstat_item_in_bytes(i))
+   x >>= PAGE_SHIFT;
+   n += sprintf(buf+n, "%s %lu\n", vmstat_text[i],
+x);
+   }
 
 #ifdef CONFIG_NUMA
for (i = 0; i < NR_VM_NUMA_STAT_ITEMS; i++)
diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index ac9247371871..87afa8683c1b 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -53,8 +53,8 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
pages[lru] = global_node_page_state(NR_LRU_BASE + lru);
 
available = si_mem_available();
-   sreclaimable = global_node_page_state(NR_SLAB_RECLAIMABLE);
-   sunreclaim = global_node_page_state(NR_SLAB_UNRECLAIMABLE);
+   sreclaimable = global_node_page_state_pages(NR_SLAB_RECLAIMABLE_B);
+   sunreclaim = global_node_page_state_pages(NR_SLAB_UNRECLAIMABLE_B);
 
show_val_kb(m, "MemTotal:   ", i.totalram);
show_val_kb(m, "MemFree:", i.freeram);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 6e8233d52971..2dbc2d042ef6 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -215,8 +215,8 @@ enum node_stat_item {
NR_INACTIVE_FILE,   /*  " " "   "   " */
NR_ACTIVE_FILE, /*  " " "   "   " */
NR_UNEVICTABLE, /*  " " "   "   " */
-   NR_SLAB_RECLAIMABLE,/* Please do not reorder this item */
-   NR_SLAB_UNRECLAIMABLE,  /* and this one without looking at
+   NR_SLAB_RECLAIMABLE_B,  /* Please do not reorder this item */
+   NR_SLAB_UNRECLAIMABLE_B,/* and this one without looking at
 * memcg_flush_percpu_vmstats() first. */
NR_ISOLATED_ANON,   /* Temporary isolated pages from anon lru */
NR_ISOLATED_FILE,   /* Temporary isolated pages from file lru */
@@ -247,6 +247,12 @@ enum node_stat_item {
NR_VM_NODE_STAT_ITEMS
 };
 
+static __always_inline bool vmstat_item_in_bytes(enum node_stat_item item)
+{
+   return (item == NR_SLAB_RECLAIMABLE_B ||
+   item == NR_SLAB_UNRECLAIMABLE_B);
+}
+
 /*
  * We do arithmetic on the LRU lists in various places in the code,
  * so it is important to keep the active lists LRU_ACTIVE higher in
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index bdeda4b079fe..e5460e597e0c 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -194,6 +194,12 @@ static inline unsigned long global_node_page_state(enum node_stat_item item)
return x;
 }
 
+static inline
+unsigned long global_node_page_state_pages(enum node_stat_item item)
+{
+   return global_node_page_state(item) >> PAGE_SHIFT;
+}
+
 static inline unsigned long zone_page_state(struct zone *zone,