Re: [RFC PATCH v4 0/5] DAMON based tiered memory management for CXL memory
Hi Honggyu,

On Mon, 13 May 2024 20:59:15 +0900 Honggyu Kim wrote:

> Hi SeongJae,
>
> Thanks very much for your work! It got delayed due to the priority
> changes in my workplace for building another heterogeneous memory
> allocator.
> https://github.com/skhynix/hmsdk/wiki/hmalloc

No problem at all. We all work on our own schedule and nobody can
chase/push anybody :)

> > On Sun, 12 May 2024 10:54:42 -0700 SeongJae Park wrote:
> >
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > It says no implementation of the demote/promote DAMOS actions has been
> > made. This RFC is about its implementation for physical address space.
> >
> > Changes from RFC v3
> > (https://lore.kernel.org/20240405060858.2818-1-honggyu@sk.com):
>
> This link cannot be opened. I will share the link again here.
> https://lore.kernel.org/all/20240405060858.2818-1-honggyu@sk.com

Thank you for checking the link! It's weird though, since I can open
the link on my Chrome browser.

> > 0. Updated from v3 and posted by SJ on behalf of Honggyu under his
> >    approval.
> > 1. Do not reuse damon_pa_pageout() and drop 'enum migration_mode'.
> > 2. Drop vmstat change.
>
> I haven't checked whether I can collect useful information without
> vmstat, but the changes look good in general except for that.

I was thinking you could use DAMOS stat[1] for the schemes, and I took
the lack of reply to it as an agreement, but maybe I should have made
it clear. Do you think DAMOS stat cannot be used instead? If so, what
would be the limitation of DAMOS stat for your usage?

> > 3. Drop unnecessary page reference check.
>
> I will compare this patch series with my previous v3 patchset and get
> back to you later, maybe next week.

Thank you very much! Unless I get a good enough test setup and results
from it on my own or from others' help, your test result would be the
last requirement for dropping RFC from this patchset.

> Sorry, I will have another break this week.
No problem, I hope you have a nice break. Nobody can chase/push others.
We all do this work voluntarily for our own fun and profit, right? ;)

[1] https://lore.kernel.org/damon/20240405060858.2818-1-honggyu@sk.com

Thanks,
SJ

>
> Thanks,
> Honggyu
[RFC IDEA v2 6/6] drivers/virtio/virtio_balloon: integrate ACMA and ballooning
Let the host effectively inflate the balloon in an access/contiguity-aware
way when the guest kernel is compiled with the specific kernel config.
When the config is enabled and the host requests a balloon size change,
virtio-balloon adjusts ACMA's max-mem parameter instead of allocating
guest pages and putting them into the balloon. As a result, the host can
use the requested amount of guest memory, so from the host's perspective
the ballooning just works, but in a transparent and
access/contiguity-aware way.

Signed-off-by: SeongJae Park
---
 drivers/virtio/virtio_balloon.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 1f5b3dd31fcf..a954d75789ae 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -472,6 +472,32 @@ static void virtballoon_changed(struct virtio_device *vdev)
 	struct virtio_balloon *vb = vdev->priv;
 	unsigned long flags;
 
+#ifdef CONFIG_ACMA_BALLOON
+	s64 target;
+	u32 num_pages;
+
+	/* Legacy balloon config space is LE, unlike all other devices. */
+	virtio_cread_le(vb->vdev, struct virtio_balloon_config, num_pages,
+			&num_pages);
+
+	/*
+	 * Align up to guest page size to avoid inflating and deflating the
+	 * balloon endlessly.
+	 */
+	target = ALIGN(num_pages, VIRTIO_BALLOON_PAGES_PER_PAGE);
+
+	/*
+	 * If the given new max mem size is larger than the current acma max
+	 * mem size, this is the same as a normal max mem adjustment.
+	 * If the given new max mem size is smaller than the current acma max
+	 * mem size, strong aggressiveness is applied while the memory for
+	 * meeting the new max mem is stolen.
+	 */
+	acma_set_max_mem_aggressive(totalram_pages() - target);
+	return;
+#endif
+
 	spin_lock_irqsave(&vb->stop_update_lock, flags);
 	if (!vb->stop_update) {
 		start_update_balloon_size(vb);
--
2.39.2
[RFC IDEA v2 0/6] mm/damon: introduce Access/Contiguity-aware Memory Auto-scaling (ACMA)
exclusively used for only contiguous memory allocation, or allow
non-contiguous memory allocation to use it under special conditions such
as allowing only movable pages. The second approach improves memory
utilization, but sometimes suffers from pages that are movable by
definition but not easily movable in practice, similar to the memory
block-level page migration for memory hot-unplugging described in the
limitations section. Even setting aside migration reliability and speed,
finding the optimum size of the pool is challenging.

We could use an ACMA-like approach for dynamically allocating a memory
pool for contiguous memory allocation. It would be similar to ACMA but
would not report DAMOS-alloc-ed pages to the host. Instead, the regions
would be used as the contiguous memory allocation pool.

DRAM Power Consumption Saving
-----------------------------

DRAM consumes and emits a huge amount of power and carbon, respectively.
On bare-metal machines, we could scale down memory using ACMA,
hot-unplug completely DAMOS-alloc-ed memory blocks, and power off the
DRAM device if the hardware supports such an operation.

Discussion Points
=================

- Are there better existing alternatives for memory over-commit VM
  systems?
- Is it ok to reuse the page reporting infrastructure from ACMA?
- Is it ok to reuse virtio-balloon's interface for ACMA integration?
- Will access-aware migration make a real benefit?
- Do the future usages of access-aware memory allocation make sense?
SeongJae Park (6):
  mm/damon: implement DAMOS actions for access-aware contiguous memory
    allocation
  mm/damon: add the initial part of access/contiguity-aware memory
    auto-scaling module
  mm/page_reporting: implement a function for reporting specific pfn
    range
  mm/damon/acma: implement scale down feature
  mm/damon/acma: implement scale up feature
  drivers/virtio/virtio_balloon: integrate ACMA and ballooning

 drivers/virtio/virtio_balloon.c |  26 ++
 include/linux/damon.h           |  37 +++
 mm/damon/Kconfig                |  10 +
 mm/damon/Makefile               |   1 +
 mm/damon/acma.c                 | 546
 mm/damon/paddr.c                |  93 ++
 mm/damon/sysfs-schemes.c        |   4 +
 mm/page_reporting.c             |  27 ++
 8 files changed, 744 insertions(+)
 create mode 100644 mm/damon/acma.c

base-commit: 40475439de721986370c9d26f53596e2bd4e1416
--
2.39.2
Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
On Sat, 11 May 2024 13:16:17 -0700 SeongJae Park wrote:

> On Fri, 5 Apr 2024 12:19:07 -0700 SeongJae Park wrote:
>
> > On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim wrote:
> >
> > > This is a preparation patch that introduces migration modes.
> > >
> > > The damon_pa_pageout is renamed to damon_pa_migrate and it receives
> > > an extra argument for migration_mode.
> >
> > I personally think keeping damon_pa_pageout() as is and adding a new
> > function (damon_pa_migrate()) with some duplicated code is also ok,
> > but this approach also looks fine to me. So I have no strong opinion
> > here, but just letting you know I would have no objection to both
> > approaches.
>
> Meanwhile, we added one more piece of logic in damon_pa_pageout() for
> doing the page idleness double check on its own[1]. It makes reusing
> damon_pa_pageout() for multiple reasons a bit complex. I think the
> complexity added a problem in this patch that I also missed before due
> to the complexity. See the below comment inline. Hence now I think it
> would be better to do it the suggested way.
>
> If we use that approach, this patch is no longer necessary, and can
> therefore be dropped.
>
> [1] https://lore.kernel.org/20240426195247.100306-1...@kernel.org

I updated this patchset to address comments on this thread, and posted
it as RFC patchset v4 on behalf of Honggyu under his approval:
https://lore.kernel.org/20240512175447.75943-1...@kernel.org

Thanks,
SJ

[...]
[RFC PATCH v4 3/5] mm/migrate: add MR_DAMON to migrate_reason
From: Honggyu Kim

The current patch series introduces DAMON based migration across NUMA
nodes so it'd be better to have a new migrate_reason in trace events.

Signed-off-by: Honggyu Kim
Reviewed-by: SeongJae Park
Signed-off-by: SeongJae Park
---
 include/linux/migrate_mode.h   | 1 +
 include/trace/events/migrate.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index f37cc03f9369..cec36b7e7ced 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -29,6 +29,7 @@ enum migrate_reason {
 	MR_CONTIG_RANGE,
 	MR_LONGTERM_PIN,
 	MR_DEMOTION,
+	MR_DAMON,
 	MR_TYPES
 };

diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 0190ef725b43..cd01dd7b3640 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -22,7 +22,8 @@
 	EM( MR_NUMA_MISPLACED,	"numa_misplaced")	\
 	EM( MR_CONTIG_RANGE,	"contig_range")		\
 	EM( MR_LONGTERM_PIN,	"longterm_pin")		\
-	EMe(MR_DEMOTION,	"demotion")
+	EM( MR_DEMOTION,	"demotion")		\
+	EMe(MR_DAMON,		"damon")

 /*
  * First define the enums in the above macros to be exported to userspace
--
2.39.2
[RFC PATCH v4 0/5] DAMON based tiered memory management for CXL memory
es the execution time increase. However, the "DAMON tiered" result shows
less slowdown because the DAMOS_MIGRATE_COLD action at the DRAM node
proactively demotes pre-allocated cold memory to the CXL node, and this
freed space at DRAM increases the chance to allocate hot or warm pages
of redis-server to the fast DRAM node. Moreover, the DAMOS_MIGRATE_HOT
action at the CXL node also actively promotes hot pages of redis-server
to the DRAM node. As a result, more memory of redis-server stays in the
DRAM node compared to the "default" memory policy, and this yields the
performance improvement.

The following result of the latest distribution workload shows similar
data.

  2. YCSB latest distribution read only workload
     memory pressure with cold memory on node0 with 512GB of local DRAM.

  =============+================================================+=========
               |     cold memory occupied by mmap and memset    |
               |  0G   440G  450G  460G  470G  480G  490G  500G |
  =============+================================================+=========
  Execution time normalized to DRAM-only values                 | GEOMEAN
  -------------+------------------------------------------------+---------
  DRAM-only    | 1.00     -     -     -     -     -     -     - | 1.00
  CXL-only     | 1.18     -     -     -     -     -     -     - | 1.18
  default      |    -  1.18  1.19  1.18  1.18  1.17  1.19  1.18 | 1.18
  DAMON tiered |    -  1.04  1.04  1.04  1.05  1.04  1.05  1.05 | 1.04
  =============+================================================+=========
  CXL usage of redis-server in GB                               | AVERAGE
  -------------+------------------------------------------------+---------
  DRAM-only    |  0.0     -     -     -     -     -     -     - |  0.0
  CXL-only     | 52.6     -     -     -     -     -     -     - | 52.6
  default      |    -  20.5  27.1  33.2  39.5  45.5  50.4  50.5 | 38.1
  DAMON tiered |    -   0.2   0.4   0.7   1.6   1.2   1.1   3.4 |  1.2
  =============+================================================+=========

In summary of both results, our evaluation shows that "DAMON tiered"
memory management reduces the performance slowdown compared to the
"default" memory policy from 17~18% to 4~5% when the system runs with
high memory pressure on its fast tier DRAM nodes.

Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
tiered memory systems run more efficiently under high memory pressure.
Signed-off-by: Honggyu Kim
Signed-off-by: Hyeongtak Ji
Signed-off-by: Rakie Kim
Signed-off-by: SeongJae Park

[1] https://lore.kernel.org/damon/20231112195602.61525-1...@kernel.org
[2] https://lore.kernel.org/damon/20240311204545.47097-1...@kernel.org
[3] https://github.com/skhynix/hmsdk
[4] https://github.com/redis/redis/tree/7.0.0
[5] https://github.com/brianfrankcooper/YCSB/tree/0.17.0
[6] https://dl.acm.org/doi/10.1145/3503222.3507731
[7] https://dl.acm.org/doi/10.1145/3582016.3582063

Honggyu Kim (3):
  mm: make alloc_demote_folio externally invokable for migration
  mm/migrate: add MR_DAMON to migrate_reason
  mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

Hyeongtak Ji (2):
  mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
  mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion

 include/linux/damon.h          |  15 +++-
 include/linux/migrate_mode.h   |   1 +
 include/trace/events/migrate.h |   3 +-
 mm/damon/core.c                |   5 +-
 mm/damon/dbgfs.c               |   2 +-
 mm/damon/lru_sort.c            |   3 +-
 mm/damon/paddr.c               | 157 +
 mm/damon/reclaim.c             |   3 +-
 mm/damon/sysfs-schemes.c       |  35 +++-
 mm/internal.h                  |   1 +
 mm/vmscan.c                    |   3 +-
 11 files changed, 219 insertions(+), 9 deletions(-)

base-commit: edc60852c99779574e0748bcf766560db67eb423
--
2.39.2
Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
On Fri, 5 Apr 2024 12:19:07 -0700 SeongJae Park wrote:

> On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim wrote:
>
> > This is a preparation patch that introduces migration modes.
> >
> > The damon_pa_pageout is renamed to damon_pa_migrate and it receives
> > an extra argument for migration_mode.
>
> I personally think keeping damon_pa_pageout() as is and adding a new
> function (damon_pa_migrate()) with some duplicated code is also ok,
> but this approach also looks fine to me. So I have no strong opinion
> here, but just letting you know I would have no objection to both
> approaches.

Meanwhile, we added one more piece of logic in damon_pa_pageout() for
doing the page idleness double check on its own[1]. It makes reusing
damon_pa_pageout() for multiple reasons a bit complex. I think the
complexity added a problem in this patch that I also missed before due
to the complexity. See the below comment inline. Hence now I think it
would be better to do it the suggested way.

If we use that approach, this patch is no longer necessary, and can
therefore be dropped.

[1] https://lore.kernel.org/20240426195247.100306-1...@kernel.org

Thanks,
SJ

[...]
> > No functional changes applied.
> >
> > Signed-off-by: Honggyu Kim
> > ---
> >  mm/damon/paddr.c | 18 +++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index 081e2a325778..277a1c4d833c 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> > @@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> >  	return false;
> >  }
> >
> > -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> > +enum migration_mode {
> > +	MIG_PAGEOUT,
> > +};
>
> To avoid name conflicts, what about renaming to 'damos_migration_mode'
> and 'DAMOS_MIG_PAGEOUT'?
>
> > +
> > +static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > +		enum migration_mode mm)
>
> My poor brain got a bit confused by the name. What about calling it
> 'mode'?
>
> >  {
> >  	unsigned long addr, applied;
> >  	LIST_HEAD(folio_list);
> > @@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)

Before this line, damon_pa_pageout() calls folio_clear_referenced() and
folio_test_clear_young() for the folio, because this is pageout code.
The changed function, damon_pa_migrate(), is not only for cold pages but
for general migrations. Hence this should also be handled based on the
migration mode, but it is not handled. I think this problem came from
the increased complexity of this function. Hence I think it is better
to keep damon_pa_pageout() as is and add a new function for migration.

Thanks,
SJ

[...]
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
Hi Honggyu,

On Tue, 9 Apr 2024 18:54:14 +0900 Honggyu Kim wrote:

> On Mon, 8 Apr 2024 10:52:28 -0700 SeongJae Park wrote:
> > On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim wrote:
> > > On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park wrote:
> > > > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:

[...]

> > > I can remove it, but I would like to have more discussion about
> > > this issue. The current implementation allows only a single
> > > migration target with "target_nid", but users might want to provide
> > > fallback migration target nids.
> > >
> > > For example, if more than two CXL nodes exist in the system, users
> > > might want to migrate cold pages to any CXL nodes. In such cases,
> > > we might have to make "target_nid" accept comma separated node IDs.
> > > nodemask can be better but we should provide a way to change the
> > > scanning order.
> > >
> > > I would like to hear how you think about this.
> >
> > Good point. I think we could later extend the sysfs file to receive
> > the comma-separated numbers, or even a mask. For simplicity, adding
> > sysfs files dedicated to the different formats of inputs could also
> > be an option (e.g., target_nids_list, target_nids_mask). But starting
> > from this single node as is now looks ok to me.
>
> If you think we can start from a single node, then I will keep it as
> is. But are you okay if I change the same 'target_nid' to accept
> comma-separated numbers later? Or do you want to introduce another
> knob such as 'target_nids_list'? What about renaming 'target_nid' to
> 'target_nids' in the first place?

I have no strong concern or opinion about this at the moment. Please
feel free to rename it to 'target_nids' if you think that's better.

[...]

> Please note that I will be out of office this week so won't be able to
> answer quickly.

No problem, I hope you take and enjoy your time :)

Thanks,
SJ

[...]
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim wrote:

> On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park wrote:
> > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:

[...]

> > > Here is one of the example usages of this 'migrate_cold' action.
> > >
> > >   $ cd /sys/kernel/mm/damon/admin/kdamonds/
> > >   $ cat contexts//schemes//action
> > >   migrate_cold
> > >   $ echo 2 > contexts//schemes//target_nid
> > >   $ echo commit > state
> > >   $ numactl -p 0 ./hot_cold 500M 600M &
> > >   $ numastat -c -p hot_cold
> > >
> > >   Per-node process memory usage (in MBs)
> > >   PID             Node 0 Node 1 Node 2 Total
> > >   --------------  ------ ------ ------ -----
> > >   701 (hot_cold)     501      0    601   1101
> > >
> > > Since there are some common routines with pageout, many functions
> > > have similar logic between pageout and migrate cold.
> > >
> > > damon_pa_migrate_folio_list() is a minimized version of
> > > shrink_folio_list(), but it's minified only for demotion.
> >
> > MIGRATE_COLD is not only for demotion, right? I think the last two
> > words are better to be removed for reducing unnecessary confusion.
>
> You mean the last two sentences? I will remove them if you feel it's
> confusing.

Yes. My real intended suggestion was 's/only for demotion/only for
migration/', but entirely removing the sentences is also ok for me.

> > > Signed-off-by: Honggyu Kim
> > > Signed-off-by: Hyeongtak Ji
> > > ---
> > >  include/linux/damon.h    |   2 +
> > >  mm/damon/paddr.c         | 146 ++-
> > >  mm/damon/sysfs-schemes.c |   4 ++
> > >  3 files changed, 151 insertions(+), 1 deletion(-)

[...]

> > > --- a/mm/damon/paddr.c
> > > +++ b/mm/damon/paddr.c

[...]

> > > +{
> > > +	unsigned int nr_succeeded;
> > > +	nodemask_t allowed_mask = NODE_MASK_NONE;
> > > +
> >
> > I personally prefer not having empty lines in the middle of variable
> > declarations/definitions. Could we remove this empty line?
>
> I can remove it, but I would like to have more discussion about this
> issue.
> The current implementation allows only a single migration target with
> "target_nid", but users might want to provide fallback migration
> target nids.
>
> For example, if more than two CXL nodes exist in the system, users
> might want to migrate cold pages to any CXL nodes. In such cases, we
> might have to make "target_nid" accept comma separated node IDs.
> nodemask can be better but we should provide a way to change the
> scanning order.
>
> I would like to hear how you think about this.

Good point. I think we could later extend the sysfs file to receive the
comma-separated numbers, or even a mask. For simplicity, adding sysfs
files dedicated to the different formats of inputs could also be an
option (e.g., target_nids_list, target_nids_mask). But starting from
this single node as is now looks ok to me.

[...]

> > > +	/* 'folio_list' is always empty here */
> > > +
> > > +	/* Migrate folios selected for migration */
> > > +	nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > > +	/* Folios that could not be migrated are still in @migrate_folios */
> > > +	if (!list_empty(&migrate_folios)) {
> > > +		/* Folios which weren't migrated go back on @folio_list */
> > > +		list_splice_init(&migrate_folios, folio_list);
> > > +	}
> >
> > Let's not use braces for a single statement
> > (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
>
> Hmm.. I know the convention but left it as is because of the comment.
> If I remove the braces, it would have a weird alignment for the two
> comment and statement lines.

I don't really hate such alignment. But if you don't like it, how about
moving the comment out of the if statement? Having one comment for a
one-line if statement looks not bad to me.

> > > +
> > > +	try_to_unmap_flush();
> > > +
> > > +	list_splice(&ret_folios, folio_list);
> >
> > Can't we move the remaining folios in migrate_folios to ret_folios at
> > once?
>
> I will see if it's possible.

Thank you. Not a strict request, though.

[...]
> > > +	nid = folio_nid(lru_to_folio(folio_list));
> > > +	do {
> > > +		struct folio *folio = lru_to_folio(folio_list);
> > > +
> > > +		if (nid == folio_nid(folio)) {
> > > +			folio_clear_active(folio);
> >
> > I think this was necessary for demotion, but now this should be
> > removed since this function is no more for demotion but for migrating
> > random pages, right?
>
> Yeah, it can be removed because we do migration instead of demotion,
> but I need to make sure it doesn't change the performance evaluation
> results.

Yes, please ensure the test results are valid :)

Thanks,
SJ

[...]
Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory
Hello Honggyu,

On Fri, 5 Apr 2024 15:08:49 +0900 Honggyu Kim wrote:

> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> It says no implementation of the demote/promote DAMOS actions has been
> made. This RFC is about its implementation for physical address space.
>
> Changes from RFC v2:
>   1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
>   2. Create 'target_nid' to set the migration target node instead of
>      depending on node distance based information.
>   3. Instead of having page level access check in this patch series,
>      delegate the job to a new DAMOS filter type YOUNG[2].
>   4. Introduce vmstat counters "damon_migrate_{hot,cold}".
>   5. Rebase from v6.7 to v6.8.

Thank you for patiently keeping the discussion and making this great
version! I left comments on each patch, but found no special concerns.
The per-page access recheck for MIGRATE_HOT and the vmstat change are
catching my eye, though; I doubt whether those are really needed. It
would be nice if you could answer the comments. Once my comments on
this version are addressed, I would have no reason to object to
dropping the RFC tag from this patchset.

Nonetheless, I see some warnings and errors from checkpatch.pl. I don't
really care about those for RFC patches, so no problem at all. But if
you agree to my opinion about RFC tag dropping, and therefore if you
will send the next version without the RFC tag, please make sure you
also run checkpatch.pl before posting.

Thanks,
SJ

[...]
Re: [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat
On Fri, 5 Apr 2024 15:08:56 +0900 Honggyu Kim wrote:

> This patch adds "damon_migrate_{hot,cold}" under node specific vmstat
> counters at the following location.
>
>   /sys/devices/system/node/node*/vmstat
>
> The counted values are accumulated to the global vmstat so it also
> introduces the same counter at /proc/vmstat as well.

DAMON provides its own DAMOS stats via the DAMON sysfs interface. Do we
really need this change?

Thanks,
SJ

[...]
Re: [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion
On Fri, 5 Apr 2024 15:08:55 +0900 Honggyu Kim wrote:

> From: Hyeongtak Ji
>
> This patch introduces DAMOS_MIGRATE_HOT action, which is similar to
> DAMOS_MIGRATE_COLD, but it is targeted to migrate hot pages.

My understanding of our last discussion was that 'HOT/COLD' here is
only for the prioritization score function. If I'm not wrong, this is
not for targeting, but just for prioritizing the migration of hot pages
first under the quota.

> It migrates pages inside the given region to the 'target_nid' NUMA
> node in the sysfs.
>
> Here is one of the example usages of this 'migrate_hot' action.
>
>   $ cd /sys/kernel/mm/damon/admin/kdamonds/
>   $ cat contexts//schemes//action
>   migrate_hot
>   $ echo 0 > contexts//schemes//target_nid
>   $ echo commit > state
>   $ numactl -p 2 ./hot_cold 500M 600M &
>   $ numastat -c -p hot_cold
>
>   Per-node process memory usage (in MBs)
>   PID             Node 0 Node 1 Node 2 Total
>   --------------  ------ ------ ------ -----
>   701 (hot_cold)     501      0    601   1101
>
> Signed-off-by: Hyeongtak Ji
> Signed-off-by: Honggyu Kim
> ---
>  include/linux/damon.h    |  2 ++
>  mm/damon/paddr.c         | 12 ++--
>  mm/damon/sysfs-schemes.c |  4 +++-
>  3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index df8671e69a70..934c95a7c042 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
>   * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
>   * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
>   * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_HOT:	Migrate for the given hot region.

As commented on the previous patch, this could be re-phrased a bit.
Also, let's use tabs consistently.

>   * @DAMOS_MIGRATE_COLD:	Migrate for the given cold region.
>   * @DAMOS_STAT:	Do nothing but count the stat.
>   * @NR_DAMOS_ACTIONS:	Total number of DAMOS actions
> @@ -123,6 +124,7 @@ enum damos_action {
> 	DAMOS_NOHUGEPAGE,
> 	DAMOS_LRU_PRIO,
> 	DAMOS_LRU_DEPRIO,
> +	DAMOS_MIGRATE_HOT,
> 	DAMOS_MIGRATE_COLD,
> 	DAMOS_STAT,	/* Do nothing but only record the stat */
> 	NR_DAMOS_ACTIONS,
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index fe217a26f788..fd9d35b5cc83 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -229,6 +229,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
>
>  enum migration_mode {
>  	MIG_PAGEOUT,
> +	MIG_MIGRATE_HOT,
>  	MIG_MIGRATE_COLD,
>  };

It looks like we don't really need both MIG_MIGRATE_HOT and
MIG_MIGRATE_COLD, but just one, say, MIG_MIGRATE, since the code can
know which prioritization score function to use from the DAMOS action?
Also, as I commented on the previous one, I'd prefer having the DAMOS_
prefix.

> @@ -375,8 +376,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
>  	if (damos_pa_filter_out(s, folio))
>  		goto put_folio;
>
> -	folio_clear_referenced(folio);
> -	folio_test_clear_young(folio);
> +	if (mm != MIG_MIGRATE_HOT) {
> +		folio_clear_referenced(folio);
> +		folio_test_clear_young(folio);
> +	}

We agreed to do this check via the 'young' page type DAMOS filter, and
to let this code not care about it, right? If I'm not wrong, I think
this should be removed?
>  	if (!folio_isolate_lru(folio))
>  		goto put_folio;
>  	/*
> @@ -394,6 +397,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
>  	case MIG_PAGEOUT:
>  		applied = reclaim_pages(&folio_list);
>  		break;
> +	case MIG_MIGRATE_HOT:
>  	case MIG_MIGRATE_COLD:
>  		applied = damon_pa_migrate_pages(&folio_list, mm,
>  						 s->target_nid);
> @@ -454,6 +458,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
>  		return damon_pa_mark_accessed(r, scheme);
>  	case DAMOS_LRU_DEPRIO:
>  		return damon_pa_deactivate_pages(r, scheme);
> +	case DAMOS_MIGRATE_HOT:
> +		return damon_pa_migrate(r, scheme, MIG_MIGRATE_HOT);
>  	case DAMOS_MIGRATE_COLD:
>  		return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
>  	case DAMOS_STAT:
> @@ -476,6 +482,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
>  		return damon_hot_score(context, r, scheme);
>  	case DAMOS_LRU_DEPRIO:
>  		return damon_cold_score(context, r, scheme);
> +	case DAMOS_MIGRATE_HOT:
> +		return damon_hot_score(context, r, scheme);
>  	case DAMOS_MIGRATE_COLD:
>  		return damon_cold_score(context, r, scheme);
>  	default:
> diff
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
On Fri, 5 Apr 2024 16:55:57 +0900 Hyeongtak Ji wrote:

> On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:
>
> ...snip...
>
> > +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> > +					    enum migration_mode mm,
> > +					    int target_nid)
> > +{
> > +	int nid;
> > +	unsigned int nr_migrated = 0;
> > +	LIST_HEAD(node_folio_list);
> > +	unsigned int noreclaim_flag;
> > +
> > +	if (list_empty(folio_list))
> > +		return nr_migrated;
>
> How about checking if `target_nid` is `NUMA_NO_NODE` or not earlier,
>
> > +
> > +	noreclaim_flag = memalloc_noreclaim_save();
> > +
> > +	nid = folio_nid(lru_to_folio(folio_list));
> > +	do {
> > +		struct folio *folio = lru_to_folio(folio_list);
> > +
> > +		if (nid == folio_nid(folio)) {
> > +			folio_clear_active(folio);
> > +			list_move(&folio->lru, &node_folio_list);
> > +			continue;
> > +		}
> > +
> > +		nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > +							   NODE_DATA(nid), mm,
> > +							   target_nid);
> > +		nid = folio_nid(lru_to_folio(folio_list));
> > +	} while (!list_empty(folio_list));
> > +
> > +	nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > +						   NODE_DATA(nid), mm,
> > +						   target_nid);
> > +
> > +	memalloc_noreclaim_restore(noreclaim_flag);
> > +
> > +	return nr_migrated;
> > +}
> > +
> > ...snip...
> >
> > +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> > +				       struct pglist_data *pgdat,
> > +				       int target_nid)
> > +{
> > +	unsigned int nr_succeeded;
> > +	nodemask_t allowed_mask = NODE_MASK_NONE;
> > +
> > +	struct migration_target_control mtc = {
> > +		/*
> > +		 * Allocate from 'node', or fail quickly and quietly.
> > +		 * When this happens, 'page' will likely just be discarded
> > +		 * instead of migrated.
> > +		 */
> > +		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > +			__GFP_NOMEMALLOC | GFP_NOWAIT,
> > +		.nid = target_nid,
> > +		.nmask = &allowed_mask
> > +	};
> > +
> > +	if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> > +		return 0;
>
> instead of here.
Agree. As I replied in the previous reply, I think this check can be
done by the caller (or the caller of the caller) of this function.

> > +
> > +	if (list_empty(migrate_folios))
> > +		return 0;

Same for this.

> > +
> > +	/* Migration ignores all cpuset and mempolicy settings */
> > +	migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> > +		      (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> > +		      &nr_succeeded);
> > +
> > +	return nr_succeeded;
> > +}
> > +
> > ...snip...
>
> Kind regards,
> Hyeongtak

Thanks,
SJ
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:

> This patch introduces DAMOS_MIGRATE_COLD action, which is similar to
> DAMOS_PAGEOUT, but migrates folios to the given 'target_nid' in the
> sysfs instead of swapping them out.
>
> The 'target_nid' sysfs knob is created by this patch to inform the
> migration target node ID.

Isn't it created by the previous patch?

> Here is one of the example usages of this 'migrate_cold' action.
>
>   $ cd /sys/kernel/mm/damon/admin/kdamonds/
>   $ cat contexts//schemes//action
>   migrate_cold
>   $ echo 2 > contexts//schemes//target_nid
>   $ echo commit > state
>   $ numactl -p 0 ./hot_cold 500M 600M &
>   $ numastat -c -p hot_cold
>
>   Per-node process memory usage (in MBs)
>   PID             Node 0 Node 1 Node 2 Total
>   --------------  ------ ------ ------ -----
>   701 (hot_cold)     501      0    601   1101
>
> Since there are some common routines with pageout, many functions have
> similar logic between pageout and migrate cold.
>
> damon_pa_migrate_folio_list() is a minimized version of
> shrink_folio_list(), but it's minified only for demotion.

MIGRATE_COLD is not only for demotion, right? I think the last two
words are better to be removed for reducing unnecessary confusion.

> Signed-off-by: Honggyu Kim
> Signed-off-by: Hyeongtak Ji
> ---
>  include/linux/damon.h    |   2 +
>  mm/damon/paddr.c         | 146 ++-
>  mm/damon/sysfs-schemes.c |   4 ++
>  3 files changed, 151 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 24ea33a03d5d..df8671e69a70 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
>   * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
>   * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
>   * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.

Whether it will be for a cold region or not depends on the target
access pattern.
What about 'Migrate the regions in coldest regions first manner.'? Or, simply 'Migrate the regions (prioritize cold)' here, and explain the prioritization under quota in the detailed comments part? Also, let's use tab consistently. > * @DAMOS_STAT: Do nothing but count the stat. > * @NR_DAMOS_ACTIONS:Total number of DAMOS actions > * > @@ -122,6 +123,7 @@ enum damos_action { > DAMOS_NOHUGEPAGE, > DAMOS_LRU_PRIO, > DAMOS_LRU_DEPRIO, > + DAMOS_MIGRATE_COLD, > DAMOS_STAT, /* Do nothing but only record the stat */ > NR_DAMOS_ACTIONS, > }; > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c > index 277a1c4d833c..fe217a26f788 100644 > --- a/mm/damon/paddr.c > +++ b/mm/damon/paddr.c > @@ -12,6 +12,9 @@ > #include > #include > #include > +#include > +#include > +#include > > #include "../internal.h" > #include "ops-common.h" > @@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, > struct folio *folio) > > enum migration_mode { > MIG_PAGEOUT, > + MIG_MIGRATE_COLD, > }; > > +static unsigned int migrate_folio_list(struct list_head *migrate_folios, > +struct pglist_data *pgdat, > +int target_nid) To avoid name collisions, I'd prefer having a damon_pa_ prefix. I see this patch is defining damon_pa_migrate_folio_list() below, though. What about __damon_pa_migrate_folio_list()? > +{ > + unsigned int nr_succeeded; > + nodemask_t allowed_mask = NODE_MASK_NONE; > + I personally prefer not having empty lines in the middle of variable declarations/definitions. Could we remove this empty line? > + struct migration_target_control mtc = { > + /* > + * Allocate from 'node', or fail quickly and quietly. > + * When this happens, 'page' will likely just be discarded > + * instead of migrated.
> + */ > + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | > __GFP_NOWARN | > + __GFP_NOMEMALLOC | GFP_NOWAIT, > + .nid = target_nid, > + .nmask = &allowed_mask > }; > > + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE) > + return 0; > + > + if (list_empty(migrate_folios)) > + return 0; Can't these checks be done by the caller? > + > + /* Migration ignores all cpuset and mempolicy settings */ > + migrate_pages(migrate_folios, alloc_migrate_folio, NULL, > + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON, > + &nr_succeeded); > + > + return nr_succeeded; > +} > + > +static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list, > + struct pglist_data *pgdat, > +
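The two suggestions above (a `damon_pa_`/`__damon_pa_` prefix for the inner helper, and hoisting the cheap guard checks into the caller) can be sketched in plain userspace C. The types and the helper bodies below are simplified stand-ins for the kernel structures, not the actual patch:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumed for illustration). */
#define NUMA_NO_NODE (-1)

struct pglist_data { int node_id; };
struct list_head { struct list_head *next, *prev; };

static int list_empty_mock(const struct list_head *head)
{
	return head->next == head;
}

/*
 * Inner helper, named with the suggested __damon_pa_ prefix: it only does
 * the migration work and may assume the caller validated its inputs.
 */
static unsigned int __damon_pa_migrate_folio_list(struct list_head *folios,
						  struct pglist_data *pgdat,
						  int target_nid)
{
	(void)folios;
	(void)pgdat;
	(void)target_nid;
	/* migrate_pages(folios, ..., target_nid, ...) would run here */
	return 1; /* pretend one folio was migrated */
}

/*
 * Caller: performs the guard checks once, before invoking the helper,
 * as suggested in the review.
 */
static unsigned int damon_pa_migrate_folio_list(struct list_head *folios,
						struct pglist_data *pgdat,
						int target_nid)
{
	if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
		return 0;
	if (list_empty_mock(folios))
		return 0;
	return __damon_pa_migrate_folio_list(folios, pgdat, target_nid);
}
```

With this split, the helper stays a pure mechanism and the early-exit policy lives in exactly one place.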
Re: [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason
On Fri, 5 Apr 2024 15:08:53 +0900 Honggyu Kim wrote: > The current patch series introduces DAMON based migration across NUMA > nodes so it'd be better to have a new migrate_reason in trace events. > > Signed-off-by: Honggyu Kim Reviewed-by: SeongJae Park Thanks, SJ > --- > include/linux/migrate_mode.h | 1 + > include/trace/events/migrate.h | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > index f37cc03f9369..cec36b7e7ced 100644 > --- a/include/linux/migrate_mode.h > +++ b/include/linux/migrate_mode.h > @@ -29,6 +29,7 @@ enum migrate_reason { > MR_CONTIG_RANGE, > MR_LONGTERM_PIN, > MR_DEMOTION, > + MR_DAMON, > MR_TYPES > }; > > diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h > index 0190ef725b43..cd01dd7b3640 100644 > --- a/include/trace/events/migrate.h > +++ b/include/trace/events/migrate.h > @@ -22,7 +22,8 @@ > EM( MR_NUMA_MISPLACED, "numa_misplaced") \ > EM( MR_CONTIG_RANGE,"contig_range") \ > EM( MR_LONGTERM_PIN,"longterm_pin") \ > - EMe(MR_DEMOTION,"demotion") > + EM( MR_DEMOTION,"demotion") \ > + EMe(MR_DAMON, "damon") > > /* > * First define the enums in the above macros to be exported to userspace > -- > 2.34.1 > >
Re: [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration
On Fri, 5 Apr 2024 15:08:51 +0900 Honggyu Kim wrote: > The alloc_demote_folio can be used out of vmscan.c so it'd be better to > remove static keyword from it. > > This function can also be used for both demotion and promotion so it'd > be better to rename it from alloc_demote_folio to alloc_migrate_folio. I'm not sure if renaming is really needed, but have no strong opinion. > > Signed-off-by: Honggyu Kim I have one more trivial comment below, but find no blocker for me. Reviewed-by: SeongJae Park > --- > mm/internal.h | 1 + > mm/vmscan.c | 10 +++--- > 2 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/mm/internal.h b/mm/internal.h > index f309a010d50f..c96ff9bc82d0 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -866,6 +866,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct > file *, unsigned long, > unsigned long, unsigned long); > > extern void set_pageblock_order(void); > +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private); > unsigned long reclaim_pages(struct list_head *folio_list); > unsigned int reclaim_clean_pages_from_list(struct zone *zone, > struct list_head *folio_list); > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 4255619a1a31..9e456cac03b4 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -910,8 +910,7 @@ static void folio_check_dirty_writeback(struct folio > *folio, > mapping->a_ops->is_dirty_writeback(folio, dirty, writeback); > } > > -static struct folio *alloc_demote_folio(struct folio *src, > - unsigned long private) > +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private) > { > struct folio *dst; > nodemask_t *allowed_mask; > @@ -935,6 +934,11 @@ static struct folio *alloc_demote_folio(struct folio > *src, > if (dst) > return dst; > > + /* > + * Allocation failed from the target node so try to allocate from > + * fallback nodes based on allowed_mask. > + * See fallback_alloc() at mm/slab.c.
> + */ I think this might be better as a separate cleanup patch, but given its small size, I have no strong opinion. > mtc->gfp_mask &= ~__GFP_THISNODE; > mtc->nmask = allowed_mask; > > @@ -973,7 +977,7 @@ static unsigned int demote_folio_list(struct list_head > *demote_folios, > node_get_allowed_targets(pgdat, &allowed_mask); > > /* Demotion ignores all cpuset and mempolicy settings */ > - migrate_pages(demote_folios, alloc_demote_folio, NULL, > + migrate_pages(demote_folios, alloc_migrate_folio, NULL, > (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION, > _succeeded); > > -- > 2.34.1 Thanks, SJ
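The fallback flow that the new comment describes, try the target node first and only then retry from the allowed nodemask, follows this general shape. This is a userspace sketch with mocked allocation, not the kernel's `alloc_migrate_folio()`:

```c
#include <stdbool.h>

#define MAX_NODES 4

/* Mocked per-node availability; stands in for real page allocation. */
static bool node_has_memory[MAX_NODES];

static bool try_alloc_on_node(int nid)
{
	return node_has_memory[nid];
}

/*
 * Two-step policy mirroring the commented code path: a strict target-node
 * attempt (__GFP_THISNODE in the kernel), then a relaxed retry over the
 * allowed fallback nodes, as node_get_allowed_targets() would provide.
 */
static int alloc_migrate_target(int target_nid, const bool *allowed_mask)
{
	if (try_alloc_on_node(target_nid))
		return target_nid;
	for (int nid = 0; nid < MAX_NODES; nid++)
		if (allowed_mask[nid] && try_alloc_on_node(nid))
			return nid;
	return -1; /* allocation failed everywhere */
}
```

The key property, and the reason the comment points at `fallback_alloc()` in mm/slab.c, is that failure on the preferred node degrades gracefully instead of failing the migration outright.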
Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim wrote: > This is a preparation patch that introduces migration modes. > > The damon_pa_pageout is renamed to damon_pa_migrate and it receives an > extra argument for migration_mode. I personally think keeping damon_pa_pageout() as is and adding a new function (damon_pa_migrate()) with some duplicated code is also ok, but this approach also looks fine to me. So I have no strong opinion here, but just letting you know I would have no objection to either approach. > > No functional changes applied. > > Signed-off-by: Honggyu Kim > --- > mm/damon/paddr.c | 18 +++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c > index 081e2a325778..277a1c4d833c 100644 > --- a/mm/damon/paddr.c > +++ b/mm/damon/paddr.c > @@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, > struct folio *folio) > return false; > } > > -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos > *s) > +enum migration_mode { > + MIG_PAGEOUT, > +}; To avoid name conflicts, what about renaming to 'damos_migration_mode' and 'DAMOS_MIG_PAGEOUT'? > + > +static unsigned long damon_pa_migrate(struct damon_region *r, struct damos > *s, > + enum migration_mode mm) My poor brain got a bit confused by the name. What about calling it 'mode'? > { > unsigned long addr, applied; > LIST_HEAD(folio_list); > @@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct > damon_region *r, struct damos *s) > put_folio: > folio_put(folio); > } > - applied = reclaim_pages(&folio_list); > + switch (mm) { > + case MIG_PAGEOUT: > + applied = reclaim_pages(&folio_list); > + break; > + default: > + /* Unexpected migration mode.
*/ > + return 0; > + } > cond_resched(); > return applied * PAGE_SIZE; > } > @@ -297,7 +309,7 @@ static unsigned long damon_pa_apply_scheme(struct > damon_ctx *ctx, > { > switch (scheme->action) { > case DAMOS_PAGEOUT: > - return damon_pa_pageout(r, scheme); > + return damon_pa_migrate(r, scheme, MIG_PAGEOUT); > case DAMOS_LRU_PRIO: > return damon_pa_mark_accessed(r, scheme); > case DAMOS_LRU_DEPRIO: > -- > 2.34.1 Thanks, SJ
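The review's naming suggestions (a `damos_` prefix on the enum, `DAMOS_MIG_` on its values, and `mode` instead of `mm` for the parameter) would give the refactored function roughly this shape. This is a simplified, userspace sketch of the suggested naming, not the kernel code; the `4096` stands in for `PAGE_SIZE` and the body for `reclaim_pages()`:

```c
/* Reviewer-suggested names, prefixed to avoid collisions with mm-wide ones. */
enum damos_migration_mode {
	DAMOS_MIG_PAGEOUT,
};

/*
 * 'mode' instead of 'mm' avoids confusion with the ubiquitous
 * 'struct mm_struct *mm' convention in mm/ code.
 */
static unsigned long damon_pa_migrate(unsigned long nr_pages,
				      enum damos_migration_mode mode)
{
	unsigned long applied;

	switch (mode) {
	case DAMOS_MIG_PAGEOUT:
		applied = nr_pages; /* reclaim_pages(&folio_list) in the kernel */
		break;
	default:
		return 0; /* unexpected migration mode */
	}
	return applied * 4096; /* bytes applied; PAGE_SIZE stand-in */
}
```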
Re: [PATCH v9 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types
Hi Ho-Ren, On Fri, 29 Mar 2024 05:33:52 + "Ho-Ren (Jack) Chuang" wrote: > Since different memory devices require finding, allocating, and putting > memory types, these common steps are abstracted in this patch, > enhancing the scalability and conciseness of the code. > > Signed-off-by: Ho-Ren (Jack) Chuang > Reviewed-by: "Huang, Ying" > --- > drivers/dax/kmem.c | 20 ++-- > include/linux/memory-tiers.h | 13 + > mm/memory-tiers.c| 32 > 3 files changed, 47 insertions(+), 18 deletions(-) > [...] > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h > index 69e781900082..a44c03c2ba3a 100644 > --- a/include/linux/memory-tiers.h > +++ b/include/linux/memory-tiers.h > @@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist); > int mt_set_default_dram_perf(int nid, struct access_coordinate *perf, >const char *source); > int mt_perf_to_adistance(struct access_coordinate *perf, int *adist); > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, > + struct list_head > *memory_types); > +void mt_put_memory_types(struct list_head *memory_types); > #ifdef CONFIG_MIGRATION > int next_demotion_node(int node); > void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); > @@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct > access_coordinate *perf, int *adis > { > return -EIO; > } > + > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct > list_head *memory_types) > +{ > + return NULL; > +} > + > +void mt_put_memory_types(struct list_head *memory_types) > +{ > + > +} I found latest mm-unstable tree is failing kunit as below, and 'git bisect' says it happens from this patch. $ ./tools/testing/kunit/kunit.py run --build_dir ../kunit.out/ [11:56:40] Configuring KUnit Kernel ... [11:56:40] Building KUnit Kernel ... 
Populating config with: $ make ARCH=um O=../kunit.out/ olddefconfig Building with: $ make ARCH=um O=../kunit.out/ --jobs=36 ERROR:root:In file included from .../mm/memory.c:71: .../include/linux/memory-tiers.h:143:25: warning: no previous prototype for ‘mt_find_alloc_memory_type’ [-Wmissing-prototypes] 143 | struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types) | ^ .../include/linux/memory-tiers.h:148:6: warning: no previous prototype for ‘mt_put_memory_types’ [-Wmissing-prototypes] 148 | void mt_put_memory_types(struct list_head *memory_types) | ^~~ [...] Maybe we should set these as 'static inline', like below? I confirmed this fixes the kunit error. May I ask your opinion? diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index a44c03c2ba3a..ee6e53144156 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -140,12 +140,12 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis return -EIO; } -struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types) +static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types) { return NULL; } -void mt_put_memory_types(struct list_head *memory_types) +static inline void mt_put_memory_types(struct list_head *memory_types) { } Thanks, SJ
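The reason `static inline` fixes the warning: a plain function definition in a header becomes an external definition in every translation unit that includes it, which trips `-Wmissing-prototypes` and risks duplicate symbols at link time, while `static inline` gives each translation unit its own private copy. A minimal userspace illustration of the header-stub pattern (the names match the patch, the types are opaque here):

```c
#include <stddef.h>

struct memory_dev_type;
struct list_head;

/*
 * Header-style stubs for a disabled config option: 'static inline' keeps
 * each definition local to the including translation unit, so no external
 * prototype is needed and no duplicate symbols can arise.
 */
static inline struct memory_dev_type *
mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
{
	(void)adist;
	(void)memory_types;
	return NULL; /* feature compiled out */
}

static inline void mt_put_memory_types(struct list_head *memory_types)
{
	(void)memory_types;
}
```

An unused `static inline` is also silently discarded by the compiler, which is why this is the conventional form for CONFIG-off stubs in kernel headers.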
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 25 Mar 2024 15:53:03 -0700 SeongJae Park wrote: > On Mon, 25 Mar 2024 21:01:04 +0900 Honggyu Kim wrote: [...] > > On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park wrote: > > > On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim wrote: [...] > > > > I would like to hear how you think about this. > > So, to summarize my humble opinion, > > 1. I like the idea of having two actions. But I'd like to use names other > than >'promote' and 'demote'. > 2. I still prefer having a filter for the page granularity access re-check. > [...] > > I will join the DAMON Beer/Coffee/Tea Chat tomorrow as scheduled so I > > can talk more about this issue. > > Looking forward to chatting with you :) We met and discussed this topic in the chat series yesterday. Sharing the summary here to keep the discussion open. Honggyu thankfully accepted my humble suggestions on the last reply. Honggyu will post the third version of this patchset soon. The patchset will implement two new DAMOS actions, namely MIGRATE_HOT and MIGRATE_COLD. Those will migrate the DAMOS target regions to a user-specified NUMA node, but will have different prioritization score functions. As the names imply, they will prioritize hotter regions and colder regions, respectively. Honggyu, please feel free to fix if there is anything wrong or missed. And thanks to Honggyu again for patiently keeping this productive discussion and their awesome work. Thanks, SJ [...]
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 25 Mar 2024 21:01:04 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park wrote: > > On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim wrote: [...] > > > > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we > > > > discussed about > > > > this patchset in high level. Sharing the summary here for open > > > > discussion. As > > > > also discussed on the first version of this patchset[2], we want to > > > > make single > > > > action for general page migration with minimum changes, but would like > > > > to keep > > > > page level access re-check. We also agreed the previously proposed > > > > DAMOS > > > > filter-based approach could make sense for the purpose. > > > > > > Thanks very much for the summary. I have been trying to merge promote > > > and demote actions into a single migrate action, but I found an issue > > > regarding damon_pa_scheme_score. It currently calls damon_cold_score() > > > for demote action and damon_hot_score() for promote action, but what > > > should we call when we use a single migrate action? > > > > Good point! This is what I didn't think about when suggesting that. Thank > > you > > for letting me know this gap! I think there could be two approach, off the > > top > > of my head. > > > > The first one would be extending the interface so that the user can select > > the > > score function. This would let flexible usage, but I'm bit concerned if > > this > > could make things unnecessarily complex, and would really useful in many > > general use case. > > I also think this looks complicated and may not be useful for general > users. > > > The second approach would be letting DAMON infer the intention. In this > > case, > > I think we could know the intention is the demotion if the scheme has a youg > > pages exclusion filter. Then, we could use the cold_score(). And vice > > versa. > > To cover a case that there is no filter at all, I think we could have one > > assumption. 
My humble intuition says the new action (migrate) may be used > > more > > for promotion use case. So, in damon_pa_scheme_score(), if the action of > > the > > given scheme is the new one (say, MIGRATE), the function will further check > > if > > the scheme has a filter for excluding young pages. If so, the function will > > use cold_score(). Otherwise, the function will use hot_score(). > > Thanks for suggesting many ideas but I'm afraid that I feel this doesn't > look good. Thinking it again, I think we can think about keep using > DAMOS_PROMOTE and DAMOS_DEMOTE, In other words, keep having a dedicated DAMOS action per intuitive prioritization score function, or couple the prioritization with each action, right? I think this makes sense, and fits well with the documentation. The prioritization mechanism should be different for each action. For example, rarely accessed (colder) memory regions would be prioritized for page-out scheme action. In contrast, the colder regions would be deprioritized for huge page collapse scheme action. Hence, the prioritization mechanisms for each action are implemented in each DAMON operations set, together with the actions. In other words, each DAMOS action should let users intuitively understand what types of regions will be prioritized. We already have such pairs of DAMOS actions such as DAMOS_[NO]HUGEPAGE and DAMOS_LRU_[DE]PRIO. So adding a pair of actions for this case sounds reasonable to me. And I think this is better and simpler than having the inference-based behavior. That said, I'm concerned that 'PROMOTE' and 'DEMOTE' may still sound a bit ambiguous to people who don't know 'demote_folio_list()' and its friends. Meanwhile, the names might sound too detailed about what they do to people who know the functions, making them a bit inflexible. They might also get confused since we don't have 'promote_folio_list()'.
To my humble understanding, what you really want to do is migrating pages to specific address range (or node) prioritizing the pages based on the hotness. What about, say, MIGRATE_{HOT,COLD}? > but I can make them directly call > damon_folio_young() for access check instead of using young filter. > > And we can internally handle the complicated combination such as demote > action sets "young" filter with "matching" true and promote action sets > "young" filter with "matching" false. IMHO, this will make the usage > simpler. I think whether to exclude young/non-young (maybe idle is better
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Fri, 22 Mar 2024 17:27:34 +0900 Honggyu Kim wrote: [...] > OK. It could be a matter of preference and the current filter is already > in the mainline so I won't insist more. Thank you for accepting my humble suggestion. [...] > > I'd prefer improving the documents or > > user-space tool and keep the kernel code simple. > > OK. I will see if there is a way to improve damo tool for this instead > of making changes on the kernel side. Looking forward! [...] > Yeah, I made this thread too much about filter naming discussion rather > than tiered memory support. No problem at all. Thank you for keeping this productive discussion. [...] > Thanks again for your feedback. That's my pleasure :) Thanks, SJ [...]
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park wrote: > > On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim wrote: > > > > > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > > > posted at [1]. > > > > > > It says there is no implementation of the demote/promote DAMOS action > > > are made. This RFC is about its implementation for physical address > > > space. > > > > > > [...] > > Thank you for running the tests again with the new version of the patches > > and > > sharing the results! > > It's a bit late answer, but the result was from the previous evaluation. > I ran it again with RFC v2, but didn't see much difference so just > pasted the same result here. No problem, thank you for clarifying :) [...] > > > Honggyu Kim (3): > > > mm/damon: refactor DAMOS_PAGEOUT with migration_mode > > > mm: make alloc_demote_folio externally invokable for migration > > > mm/damon: introduce DAMOS_DEMOTE action for demotion > > > > > > Hyeongtak Ji (4): > > > mm/memory-tiers: add next_promotion_node to find promotion target > > > mm/damon: introduce DAMOS_PROMOTE action for promotion > > > mm/damon/sysfs-schemes: add target_nid on sysfs-schemes > > > mm/damon/sysfs-schemes: apply target_nid for promote and demote > > > actions > > > > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed > > about > > this patchset in high level. Sharing the summary here for open discussion. > > As > > also discussed on the first version of this patchset[2], we want to make > > single > > action for general page migration with minimum changes, but would like to > > keep > > page level access re-check. We also agreed the previously proposed DAMOS > > filter-based approach could make sense for the purpose. > > Thanks very much for the summary. I have been trying to merge promote > and demote actions into a single migrate action, but I found an issue > regarding damon_pa_scheme_score. 
It currently calls damon_cold_score() > for demote action and damon_hot_score() for promote action, but what > should we call when we use a single migrate action? Good point! This is what I didn't think about when suggesting that. Thank you for letting me know this gap! I think there could be two approaches, off the top of my head. The first one would be extending the interface so that the user can select the score function. This would allow flexible usage, but I'm a bit concerned that this could make things unnecessarily complex, and whether it would really be useful in many general use cases. The second approach would be letting DAMON infer the intention. In this case, I think we could know the intention is the demotion if the scheme has a young-pages exclusion filter. Then, we could use the cold_score(). And vice versa. To cover the case where there is no filter at all, I think we could have one assumption. My humble intuition says the new action (migrate) may be used more for the promotion use case. So, in damon_pa_scheme_score(), if the action of the given scheme is the new one (say, MIGRATE), the function will further check if the scheme has a filter for excluding young pages. If so, the function will use cold_score(). Otherwise, the function will use hot_score(). So I'd rather prefer the second approach. I think it would not be too late to consider the first approach once it turns out that more actions have such ambiguity and need a more general interface for explicitly setting the score function. Thanks, SJ [...]
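SJ's second approach, inferring the score function from the scheme's filters, would look roughly like this. It is a hypothetical, simplified sketch: `hot_score`/`cold_score` only mimic the shape of DAMON's damon_hot_score()/damon_cold_score() (higher score means higher priority), without reproducing their actual weighting:

```c
#include <stdbool.h>

/* Simplified region access pattern, as DAMON tracks per region. */
struct region {
	unsigned int nr_accesses; /* access frequency */
	unsigned int age;         /* how long the pattern persisted */
};

#define MAX_SCORE 100

/* Hotter (frequently accessed, long-lived) regions get higher priority. */
static int hot_score(const struct region *r)
{
	int s = (int)(r->nr_accesses + r->age);
	return s > MAX_SCORE ? MAX_SCORE : s;
}

/* Colder regions get higher priority: invert the hotness. */
static int cold_score(const struct region *r)
{
	return MAX_SCORE - hot_score(r);
}

/*
 * The inference rule from the mail: a MIGRATE scheme that excludes young
 * pages is presumed to demote (cold_score); otherwise it is presumed to
 * promote (hot_score).
 */
static int migrate_scheme_score(const struct region *r,
				bool excludes_young_pages)
{
	return excludes_young_pages ? cold_score(r) : hot_score(r);
}
```

The later MIGRATE_HOT/MIGRATE_COLD decision removes the `excludes_young_pages` inference entirely: each action simply hard-wires one of the two score functions.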
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
Hi Honggyu, On Wed, 20 Mar 2024 16:07:48 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Mon, 18 Mar 2024 12:07:21 -0700 SeongJae Park wrote: > > On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim wrote: > > > > > Hi SeongJae, > > > > > > On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park wrote: > > > > Hi Honggyu, > > > > > > > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim > > > > wrote: > > > > > > > > > Hi SeongJae, > > > > > > > > > > Thanks for the confirmation. I have a few comments on young filter so > > > > > please read the inline comments again. > > > > > > > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park > > > > > wrote: > > > > > > Hi Honggyu, > > > > > > > > > > > > > > -Original Message- > > > > > > > > From: SeongJae Park > > > > > > > > Sent: Tuesday, March 12, 2024 3:33 AM > > > > > > > > To: Honggyu Kim > > > > > > > > Cc: SeongJae Park ; kernel_team > > > > > > > > > > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory > > > > > > > > management for CXL memory > > > > > > > > > > > > > > > > Hi Honggyu, > > > > > > > > > > > > > > > > On Mon, 11 Mar 2024 12:51:12 + "honggyu@sk.com" > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi SeongJae, > > > > > > > > > > > > > > > > > > I've tested it again and found that "young" filter has to be > > > > > > > > > set > > > > > > > > > differently as follows. > > > > > > > > > - demote action: set "young" filter with "matching" true > > > > > > > > > - promote action: set "young" filter with "matching" false > > > > > > Thinking it again, I feel like "matching" true or false looks quite > > > vague to me as a general user. > > > > > > Instead, I would like to have more meaningful names for "matching" as > > > follows. > > > > > > - matching "true" can be either (filter) "out" or "skip". > > > - matching "false" can be either (filter) "in" or "apply". > > > > I agree the naming could be done much better. And thank you for the nice > > suggestions. 
I have a few concerns, though. > > I don't think my suggestion is best. I just would like to have more > discussion about it. I also understand my naming sense is far from good :) I'm grateful to have this constructive discussion! > > > Firstly, increasing the number of behavioral concepts. DAMOS filter feature > > has only single behavior: excluding some types of memory from DAMOS action > > target. The "matching" is to provide a flexible way for further specifying > > the > > target to exclude in a bit detail. Without it, we would need non-variant > > for > > each filter type. Compared to the current terms, the new terms feel like > > implying there are two types of behaviors. I think one behavior is easier > > to > > understand than two behaviors, and better match what internal code is doing. > > > > Secondly, ambiguity in "in" and "apply". To me, the terms sound like > > _adding_ > > something more than _excluding_. > > I understood that young filter "matching" "false" means apply action > only to young pages. Do I misunderstood something here? If not, Technically speaking, having a DAMOS filter with 'matching' parameter as 'false' for 'young pages' type means you want DAMOS to "exclude pages that are not young from the scheme's action target". That's the only thing it truly does, and what it tries to guarantee. Whether the action will be applied to young pages or not depends on more factors including additional filters and DAMOS parameters. IOW, that's not what the simple setting promises. Of course, I know you are assuming there is only the single filter. Hence, effectively you're correct. And the sentence may be a better wording for end users. However, it took me a bit of time to understand your assumption and conclude whether
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park wrote: > > Hi Honggyu, > > > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim wrote: > > > > > Hi SeongJae, > > > > > > Thanks for the confirmation. I have a few comments on young filter so > > > please read the inline comments again. > > > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park wrote: > > > > Hi Honggyu, > > > > > > > > > > -Original Message- > > > > > > From: SeongJae Park > > > > > > Sent: Tuesday, March 12, 2024 3:33 AM > > > > > > To: Honggyu Kim > > > > > > Cc: SeongJae Park ; kernel_team > > > > > > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory > > > > > > management for CXL memory > > > > > > > > > > > > Hi Honggyu, > > > > > > > > > > > > On Mon, 11 Mar 2024 12:51:12 + "honggyu@sk.com" > > > > > > wrote: > > > > > > > > > > > > > Hi SeongJae, > > > > > > > > > > > > > > I've tested it again and found that "young" filter has to be set > > > > > > > differently as follows. > > > > > > > - demote action: set "young" filter with "matching" true > > > > > > > - promote action: set "young" filter with "matching" false > > Thinking it again, I feel like "matching" true or false looks quite > vague to me as a general user. > > Instead, I would like to have more meaningful names for "matching" as > follows. > > - matching "true" can be either (filter) "out" or "skip". > - matching "false" can be either (filter) "in" or "apply". I agree the naming could be done much better. And thank you for the nice suggestions. I have a few concerns, though. Firstly, increasing the number of behavioral concepts. DAMOS filter feature has only single behavior: excluding some types of memory from DAMOS action target. The "matching" is to provide a flexible way for further specifying the target to exclude in a bit detail. Without it, we would need non-variant for each filter type. 
Compared to the current terms, the new terms feel like they imply there are two types of behaviors. I think one behavior is easier to understand than two behaviors, and better matches what the internal code is doing. Secondly, ambiguity in "in" and "apply". To me, the terms sound like _adding_ something more than _excluding_. I think that might confuse people in some cases. Actually, I have used the terms "filter-out" and "filter-in" on this and several other threads. When talking about the "filter-in" scenario, I had to emphasize the fact that it is not adding something but excluding others. I now think that was not a good approach. Finally, "apply" sounds a bit deterministic. I think it could be a bit confusing in some cases such as when using multiple filters in a combined way. For example, if we have two filters for 1) "apply" a memcg and 2) skip anon pages, the given DAMOS action will not be applied to anon pages of the memcg. I think this might be a bit confusing. > > Internally, the type of "matching" can be boolean, but it'd be better > for general users have another ways to set it such as "out"/"in" or > "skip"/"apply" via sysfs interface. I prefer "skip" and "apply" looks > more intuitive, but I don't have strong objection on "out" and "in" as > well. Unfortunately, DAMON sysfs interface is an ABI that we want to keep stable. Of course we could make some changes on it if really required. But I'm unsure if the problem of the current naming and the benefit of the suggested change are big enough to outweigh the stability risk and additional efforts. Also, DAMON sysfs interface is arguably not for _very_ general users. DAMON user-space tool is the one for _more_ general users. To quote the DAMON usage document, - *DAMON user space tool.* `This <https://github.com/awslabs/damo>`_ is for privileged people such as system administrators who want a just-working human-friendly interface. [...]
- *sysfs interface.* :ref:`This ` is for privileged user space programmers who want more optimized use of DAMON. [...] If the concept is that confused, I think we could improve the docum
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park wrote: > Hi Honggyu, > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim wrote: > > > Hi SeongJae, > > > > Thanks for the confirmation. I have a few comments on young filter so > > please read the inline comments again. > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park wrote: > > > Hi Honggyu, [...] > > Thanks. I see that it works fine, but I would like to have more > > discussion about "young" filter. What I think about filter is that if I > > apply "young" filter "true" for demotion, then the action applies only > > for "young" pages, but the current implementation works opposite. > > > > I understand the function name of internal implementation is > > "damos_pa_filter_out" so the basic action is filtering out, but the > > cgroup filter works in the opposite way for now. > > Does the memcg filter work in the opposite way? I don't think so because > __damos_pa_filter_out() sets 'matches' as 'true' only if the given folio > is > contained in the given memcg. 'young' filter also simply sets 'matches' as > 'true' only if the given folio is young. > > If it works in the opposite way, it's a bug that needs to be fixed. Please let > me know if I'm missing something. I just read the DAMOS filters part of the documentation for DAMON sysfs interface again. I believe it is explaining the meaning of 'matching' as I intended to, as below: You can write ``Y`` or ``N`` to ``matching`` file to filter out pages that does or does not match to the type, respectively. Then, the scheme's action will not be applied to the pages that specified to be filtered out.
But, I found the following example for memcg filter is wrong, as below: For example, below restricts a DAMOS action to be applied to only non-anonymous pages of all memory cgroups except ``/having_care_already``.:: # echo 2 > nr_filters # # filter out anonymous pages echo anon > 0/type echo Y > 0/matching # # further filter out all cgroups except one at '/having_care_already' echo memcg > 1/type echo /having_care_already > 1/memcg_path echo N > 1/matching Specifically, the last line of the commands should write 'Y' instead of 'N' to do what is explained. Without the fix, the action will be applied to only non-anonymous pages of 'having_care_already' memcg. This is definitely wrong. I will fix this soon. I'm unsure if this is what made you believe the memcg DAMOS filter is working in the opposite way, though. Thanks, SJ [...]
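The 'matching' semantics being debated across these mails reduce to a single rule: a page is excluded from the DAMOS action exactly when whether it matches the filter's type equals the filter's 'matching' setting. A tiny sketch of that rule (simplified; the real check lives in damos_pa_filter_out() and its helpers):

```c
#include <stdbool.h>

/*
 * A page is filtered out (excluded from the DAMOS action) iff whether it
 * matches the filter's type equals the filter's 'matching' setting.
 */
static bool damos_filter_out(bool page_matches_type, bool filter_matching)
{
	return page_matches_type == filter_matching;
}
```

This is why the memcg example above needs 'Y': with matching=Y, pages inside the named memcg match and are excluded, leaving all other memcgs as the action target; with matching=N, every page outside the memcg is excluded instead, inverting the intent.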
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
Hi Honggyu, On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim wrote: > Hi SeongJae, > > Thanks for the confirmation. I have a few comments on young filter so > please read the inline comments again. > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park wrote: > > Hi Honggyu, > > > > > > -Original Message- > > > > From: SeongJae Park > > > > Sent: Tuesday, March 12, 2024 3:33 AM > > > > To: Honggyu Kim > > > > Cc: SeongJae Park ; kernel_team > > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory > > > > management for CXL memory > > > > > > > > Hi Honggyu, > > > > > > > > On Mon, 11 Mar 2024 12:51:12 + "honggyu@sk.com" > > > > wrote: > > > > > > > > > Hi SeongJae, > > > > > > > > > > I've tested it again and found that "young" filter has to be set > > > > > differently as follows. > > > > > - demote action: set "young" filter with "matching" true > > > > > - promote action: set "young" filter with "matching" false > > > > > > > > DAMOS filter is basically for filtering "out" memory regions that > > > > matches to > > > > the condition. Hence in your setup, young pages are not filtered out > > > > from > > > > demote action target. > > > > > > I thought young filter true means "young pages ARE filtered out" for > > > demotion. > > > > You're correct. > > Ack. > > > > > > > > That is, you're demoting pages that "not" young. > > > > > > Your explanation here looks opposite to the previous statement. > > > > Again, you're correct. My intention was "non-young pages are not ..." but > > maybe I was out of my mind and mistakenly removed "non-" part. Sorry for > > the > > confusion. > > No problem. I also think it's quite confusing. > > > > > > > > And vice versa, so you're applying promote to non-non-young (young) > > > > pages. > > > > > > Yes, I understand "promote non-non-young pages" means "promote young > > > pages". 
> > > This might be understood as "young pages are NOT filtered out" for > > > promotion > > > but it doesn't mean that "old pages are filtered out" instead. > > > And we just rely hot detection only on DAMOS logics such as access > > > frequency > > > and age. Am I correct? > > > > You're correct. > > Ack. But if it doesn't mean that "old pages are filtered out" instead, It does mean that. Here, filtering is exclusive. Hence, "filter-in a type of pages" means "filter-out pages of other types". At least that's the intention. To quote the documentation (https://docs.kernel.org/mm/damon/design.html#filters),

    Each filter specifies the type of target memory, and whether it should
    exclude the memory of the type (filter-out), or all except the memory of
    the type (filter-in).

> then do we really need this filter for promotion? If not, maybe should > we create another "old" filter for promotion? As of now, the promotion > is mostly done inaccurately, but the accurate migration is done at > demotion level. Is this based on your theory? Or, a real behavior that you're seeing from your setup? If this is a real behavior, I think that should be a bug that needs to be fixed. > To avoid this issue, I feel we should promote only "young" pages after > filtering "old" pages out. > > > > > > > > I understand this is somewhat complex, but what we have for now. > > > > > > Thanks for the explanation. I guess you mean my filter setup is correct. > > > Is it correct? > > > > Again, you're correct. Your filter setup is what I expected to :) > > Thanks. I see that it works fine, but I would like to have more > discussion about "young" filter. What I think about filter is that if I > apply "young" filter "true" for demotion, then the action applies only > for "young" pages, but the current implementation works opposite. 
> > I understand the function name of internal implementation is > "damos_pa_filter_out" so the basic action is filtering out, but the > cgroup filter works in the opposite way for now.

Does memcg filter work in the opposite way? I don't think so because __damos_pa_filter_out() sets 'matches' as 'true' only if the given folio is contained in the given memcg. 'young' filter also simply sets 'matches' as 'true' only if the given folio is young.

If it works in the opposite way, it's a bug that needs to be fixed. Please let me know if I'm missing something.
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
Hello, On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park wrote: > On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim wrote: > > > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > > posted at [1]. > > > > It says there is no implementation of the demote/promote DAMOS action > > are made. This RFC is about its implementation for physical address > > space. [...] > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed > this patchset at a high level. Sharing the summary here for open discussion. > As > also discussed on the first version of this patchset[2], we want to make > single > action for general page migration with minimum changes, but would like to keep > page level access re-check. We also agreed the previously proposed DAMOS > filter-based approach could make sense for the purpose. > > Because I was anyway planning making such DAMOS filter for not only > promotion/demotion but other types of DAMOS action, I will start developing > the > page level access re-check results based DAMOS filter. Once the > implementation > of the prototype is done, I will share the early implementation. Then, > Honggyu > will adjust their implementation based on the filter, and run their tests > again > and share the results.

I just posted an RFC patchset for the page level access re-check DAMOS filter: https://lore.kernel.org/r/20240307030013.47041-1...@kernel.org

I hope it helps you better understand and test the idea.

> > [1] https://lore.kernel.org/damon/20220810225102.124459-1...@kernel.org/ > [2] https://lore.kernel.org/damon/20240118171756.80356-1...@kernel.org

Thanks,
SJ

[...]
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim wrote: > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > posted at [1]. > > It says there is no implementation of the demote/promote DAMOS action > are made. This RFC is about its implementation for physical address > space. > > > Introduction > > > With the advent of CXL/PCIe attached DRAM, which will be called simply > as CXL memory in this cover letter, some systems are becoming more > heterogeneous having memory systems with different latency and bandwidth > characteristics. They are usually handled as different NUMA nodes in > separate memory tiers and CXL memory is used as slow tiers because of > its protocol overhead compared to local DRAM. > > In this kind of systems, we need to be careful placing memory pages on > proper NUMA nodes based on the memory access frequency. Otherwise, some > frequently accessed pages might reside on slow tiers and it makes > performance degradation unexpectedly. Moreover, the memory access > patterns can be changed at runtime. > > To handle this problem, we need a way to monitor the memory access > patterns and migrate pages based on their access temperature. The > DAMON(Data Access MONitor) framework and its DAMOS(DAMON-based Operation > Schemes) can be useful features for monitoring and migrating pages. > DAMOS provides multiple actions based on DAMON monitoring results and it > can be used for proactive reclaim, which means swapping cold pages out > with DAMOS_PAGEOUT action, but it doesn't support migration actions such > as demotion and promotion between tiered memory nodes. > > This series supports two new DAMOS actions; DAMOS_DEMOTE for demotion > from fast tiers and DAMOS_PROMOTE for promotion from slow tiers. This > prevents hot pages from being stuck on slow tiers, which makes > performance degradation and cold pages can be proactively demoted to > slow tiers so that the system can increase the chance to allocate more > hot pages to fast tiers. 
> > The DAMON provides various tuning knobs but we found that the proactive > demotion for cold pages is especially useful when the system is running > out of memory on its fast tier nodes. > > Our evaluation result shows that it reduces the performance slowdown > compared to the default memory policy from 15~17% to 4~5% when the > system runs under high memory pressure on its fast tier DRAM nodes. > > > DAMON configuration > === > > The specific DAMON configuration doesn't have to be in the scope of this > patch series, but some rough idea is better to be shared to explain the > evaluation result. > > The DAMON provides many knobs for fine tuning but its configuration file > is generated by HMSDK[2]. It includes gen_config.py script that > generates a json file with the full config of DAMON knobs and it creates > multiple kdamonds for each NUMA node when the DAMON is enabled so that > it can run hot/cold based migration for tiered memory. I was feeling a bit confused from here since DAMON doesn't receive parameters via a file. To my understanding, the 'configuration file' means the input file for DAMON user-space tool, damo, not DAMON. Just a trivial thing, but making it clear if possible could help readers in my opinion. > > > Evaluation Workload > === > > The performance evaluation is done with redis[3], which is a widely used > in-memory database and the memory access patterns are generated via > YCSB[4]. We have measured two different workloads with zipfian and > latest distributions but their configs are slightly modified to make > memory usage higher and execution time longer for better evaluation. > > The idea of evaluation using these demote and promote actions covers > system-wide memory management rather than partitioning hot/cold pages of > a single workload. The default memory allocation policy creates pages > to the fast tier DRAM node first, then allocates newly created pages to > the slow tier CXL node when the DRAM node has insufficient free space. 
> Once the page allocation is done then those pages never move between > NUMA nodes. It's not true when using numa balancing, but it is not the > scope of this DAMON based 2-tier memory management support. > > If the working set of redis can be fit fully into the DRAM node, then > the redis will access the fast DRAM only. Since the performance of DRAM > only is faster than partially accessing CXL memory in slow tiers, this > environment is not useful to evaluate this patch series. > > To make pages of redis be distributed across fast DRAM node and slow > CXL node to evaluate our demote and promote actions, we pre-allocate > some cold memory externally using mmap and memset before launching > redis-server. We assumed that there are enough amount of cold memory in > datacenters as TMO[5] and TPP[6] papers mentioned. > > The evaluation sequence is as follows. > > 1. Turn on DAMON with DAMOS_DEMOTE action
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
On Thu, 18 Jan 2024 19:40:16 +0900 Hyeongtak Ji wrote: > Hi SeongJae, > > On Wed, 17 Jan 2024 SeongJae Park wrote: > > [...] > >> Let's say there are 3 nodes in the system and the first node0 and node1 > >> are the first tier, and node2 is the second tier.
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
> >> 0-1
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
> >> 2
> >>
> >> Here is the result of partitioning hot/cold memory and I put execution
> >> command at the right side of numastat result. I initially ran each
> >> hot_cold program with preferred setting so that they initially allocate
> >> memory on one of node0 or node2, but they gradually migrated based on
> >> their access frequencies.
> >>
> >> $ numastat -c -p hot_cold
> >> Per-node process memory usage (in MBs)
> >> PID               Node 0 Node 1 Node 2 Total
> >> ----------------  ------ ------ ------ -----
> >> 754 (hot_cold)      1800      0   2000  3800  <- hot_cold 1800 2000
> >> 1184 (hot_cold)      300      0    500   800  <- hot_cold 300 500
> >> 1818 (hot_cold)      801      0   3199  4000  <- hot_cold 800 3200
> >> 30289 (hot_cold)       4      0      5    10  <- hot_cold 3 5
> >> 30325 (hot_cold)      31      0     51    81  <- hot_cold 30 50
> >> ----------------  ------ ------ ------ -----
> >> Total               2938      0   5756  8695
> >>
> >> The final node placement result shows that DAMON accurately migrated
> >> pages by their hotness for multiple processes.
> >
> > What was the result when the corner cases handling logics were not applied?
>
> This is the result of the same test that Honggyu did, but with insufficient
> corner case handling logic.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID              Node 0 Node 1 Node 2 Total
> ---------------  ------ ------ ------ -----
> 862 (hot_cold)     2256      0   1545  3801  <- hot_cold 1800 2000
> 863 (hot_cold)      403      0    398   801  <- hot_cold 300 500
> 864 (hot_cold)     1520      0   2482  4001  <- hot_cold 800 3200
> 865 (hot_cold)        6      0      3     9  <- hot_cold 3 5
> 866 (hot_cold)       29      0     52    81  <- hot_cold 30 50
> ---------------  ------ ------ ------ -----
> Total              4215      0   4480  8695
>
> As time goes by, DAMON keeps trying to split the hot/cold region, but it does
> not seem to be enough.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID              Node 0 Node 1 Node 2 Total
> ---------------  ------ ------ ------ -----
> 862 (hot_cold)     2022      0   1780  3801  <- hot_cold 1800 2000
> 863 (hot_cold)      351      0    450   801  <- hot_cold 300 500
> 864 (hot_cold)     1134      0   2868  4001  <- hot_cold 800 3200
> 865 (hot_cold)        7      0      2     9  <- hot_cold 3 5
> 866 (hot_cold)       43      0     39    81  <- hot_cold 30 50
> ---------------  ------ ------ ------ -----
> Total              3557      0   5138  8695
>
> > And, what are the corner cases handling logic that seemed essential? I show
> > the page granularity active/reference check could indeed provide many
> > improvements, but that's only my humble assumption.
>
> Yes, the page granularity active/reference check is essential. To make the
> above "insufficient" result, the only thing I did was to promote
> inactive/not_referenced pages.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f03be320f9ad..c2aefb883c54 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1127,9 +1127,7 @@ static unsigned int __promote_folio_list(struct list_head *folio_list,
> 	VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>
> 	references = folio_check_references(folio, sc);
> -	if (references == FOLIOREF_KEEP ||
> -	    references == FOLIOREF_RECLAIM ||
> -	    references == FOLIOREF_RECLAIM_CLEAN)
> +	if (references == FOLIOREF_KEEP)
> 		goto keep_locked;
>
> 	/* Relocate its contents to another node. */

Thank you for sharing the details :) I think the DAMOS filters based approach could be worth trying, then.
> > > > > If the corner cases are indeed better to be applied in page granularity, I > > agree we need some more efforts since DAMON monitoring r
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
On Wed, 17 Jan 2024 13:11:03 -0800 SeongJae Park wrote: [...] > Hi Honggyu, > > On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim wrote: > > > Hi SeongJae, > > > > Thanks very much for your comments in detail. > > > > On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park wrote: > > [...] > > > To this end, I feel the problem might be able to be simpler, because this > > > patchset is trying to provide two sophisticated operations, while I think > > > a > > > simpler approach might be possible. My humble simpler idea is adding a > > > DAMOS > > > operation for moving pages to a given node (like sys_move_phy_pages > > > RFC[1]), > > > instead of the promote/demote. Because the general pages migration can > > > handle > > > multiple cases including the promote/demote in my humble assumption. [...] > > > In more detail, users could decide which is the appropriate node for > > > promotion > > > or demotion and use the new DAMOS action to do promotion and demotion. > > > Users > > > would be requested to decide which node is the proper promotion/demotion > > > target > > > nodes, but that decision wouldn't be that hard in my opinion. > > > > > > For this, 'struct damos' would need to be updated for such > > > argument-dependent > > > actions, like 'struct damos_filter' is having a union. > > > > That might be a better solution. I will think about it. > > More specifically, I think receiving an address range as the argument might be > more flexible than just a NUMA node. Maybe we can imagine proactively migrating > cold movable pages from normal zones to movable zones, to avoid normal zone > memory pressure.

Yet another crazy idea: finding hot regions in the middle of cold regions and moving them to sit beside other hot pages. As a result, memory is sorted by access temperature even in the same node, and the system gains more spatial locality, which benefits general locality-based algorithms including DAMON's adaptive regions adjustment.

Thanks,
SJ

[...]
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
Hi Honggyu, On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim wrote: > Hi SeongJae, > > Thanks very much for your comments in detail. > > On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park wrote: > > > Thank you so much for these great patches and the above nice test results. I > > believe the test setup and results make sense, and merging a revised > > version of > > this patchset would provide real benefits to the users. > > Glad to hear that! > > > At a high level, I think it might be better to separate DAMON internal changes > > from DAMON external changes. > > I agree. I can't guarantee that I can move all the external changes > inside mm/damon, but will try that as much as possible. > > > For DAMON part changes, I have no big concern other than trivial coding > > style > > level comments. > > Sure. I will fix those. > > > For DAMON-external changes that implement demote_pages() and > > promote_pages(), I'm unsure if the implementation is reusing appropriate > > functions, and if those are placed in the right source files. Especially, I'm > > unsure if vmscan.c is the right place for promotion code. Also I don't > > know if > > there is a good agreement on the promotion/demotion target node decision. > > That > > should be because I'm not that familiar with the areas and the files, but I > > feel this might be because our discussions on the promotion and the demotion > > operations still have room to mature. Because I'm not very > > familiar with the part, I'd like to hear others' comments, too. > > I would also like to hear others' comments, but this might not be needed > if most of the external code can be moved to mm/damon. > > > To this end, I feel the problem might be able to be simpler, because this > > patchset is trying to provide two sophisticated operations, while I think a > > simpler approach might be possible. 
My humble simpler idea is adding a > > DAMOS > > operation for moving pages to a given node (like sys_move_phy_pages RFC[1]), > > instead of the promote/demote. Because the general pages migration can > > handle > > multiple cases including the promote/demote in my humble assumption. > > My initial implementation was similar but I found that it's not accurate > enough due to the nature of inaccuracy of DAMON regions. I saw that > many pages were demoted and promoted back and forth because migration > target regions include both hot and cold pages together. > > So I have implemented the demotion and promotion logics based on the > shrink_folio_list, which contains many corner case handling logics for > reclaim. > > Having the current demotion and promotion logics makes the hot/cold > migration pretty accurate as expected. We made a simple program called > "hot_cold" and it receives 2 arguments for hot size and cold size in MB. > For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB > of cold memory. It basically allocates 2 large blocks of memory with > mmap, then repeat memset for the initial 200MB to make it accessed in an > infinite loop. > > Let's say there are 3 nodes in the system and the first node0 and node1 > are the first tier, and node2 is the second tier. > > $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist > 0-1 > > $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist > 2 > > Here is the result of partitioning hot/cold memory and I put execution > command at the right side of numastat result. I initially ran each > hot_cold program with preferred setting so that they initially allocate > memory on one of node0 or node2, but they gradually migrated based on > their access frequencies. 
>
> $ numastat -c -p hot_cold
> Per-node process memory usage (in MBs)
> PID               Node 0 Node 1 Node 2 Total
> ----------------  ------ ------ ------ -----
> 754 (hot_cold)      1800      0   2000  3800  <- hot_cold 1800 2000
> 1184 (hot_cold)      300      0    500   800  <- hot_cold 300 500
> 1818 (hot_cold)      801      0   3199  4000  <- hot_cold 800 3200
> 30289 (hot_cold)       4      0      5    10  <- hot_cold 3 5
> 30325 (hot_cold)      31      0     51    81  <- hot_cold 30 50
> ----------------  ------ ------ ------ -----
> Total               2938      0   5756  8695
>
> The final node placement result shows that DAMON accurately migrated
> pages by their hotness for multiple processes.

What was the result when the corner cases handling logics were not applied?

And, what are the corner cases handling logic that seemed essential? I show the page granularity active/reference check could indeed provide many improvements, but that's only my humble assumption.
Re: [RFC PATCH 2/4] mm/damon: introduce DAMOS_DEMOTE action for demotion
On Mon, 15 Jan 2024 13:52:50 +0900 Honggyu Kim wrote:

> This patch introduces DAMOS_DEMOTE action, which is similar to
> DAMOS_PAGEOUT, but demotes folios instead of swapping them out.
>
> Since there are some common routines with pageout, many functions have
> similar logics between pageout and demote.
>
> The execution sequence of DAMOS_PAGEOUT and DAMOS_DEMOTE look as follows.
>
> DAMOS_PAGEOUT action
> damo_pa_apply_scheme

Nit. s/damo/damon/

> -> damon_pa_reclaim
> -> reclaim_pages
> -> reclaim_folio_list
> -> shrink_folio_list
>
> DAMOS_DEMOTE action
> damo_pa_apply_scheme

Ditto.

> -> damon_pa_reclaim
> -> demote_pages
> -> do_demote_folio_list
> -> __demote_folio_list
> -> demote_folio_list

I think the implementation of 'demote_pages()' might better be separated. I'm also finding the naming a bit strange, since I usually think of the '__' prefix as being for functions that are used internally. That is, I'd assume __demote_folio_list() is called from demote_folio_list(), but this function works in the opposite way.

> __demote_folio_list() is a minimized version of shrink_folio_list(), but
> it's minified only for demotion.
>
> Signed-off-by: Honggyu Kim
> ---
>  include/linux/damon.h    |  2 +
>  mm/damon/paddr.c         | 17 +---
>  mm/damon/sysfs-schemes.c |  1 +
>  mm/internal.h            |  1 +
>  mm/vmscan.c              | 84
>  5 files changed, 99 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index e00ddf1ed39c..4c0a0fef09c5 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -106,6 +106,7 @@ struct damon_target {
>  * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
>  * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
>  * @DAMOS_STAT:		Do nothing but count the stat.
> + * @DAMOS_DEMOTE:	Do demotion for the current region.

I'd prefer defining DEMOTE before STAT, like we introduced LRU_PRIO/DEPRIO after STAT but defined them before it.
It would help keeping the two different groups of operations separated (STAT is different from other actions since it is not for making real changes but only for collecting statistics and querying monitoring results).

>  * @NR_DAMOS_ACTIONS:	Total number of DAMOS actions
>  *
>  * The support of each action is up to running damon_operations.
> @@ -123,6 +124,7 @@ enum damos_action {
> 	DAMOS_LRU_PRIO,
> 	DAMOS_LRU_DEPRIO,
> 	DAMOS_STAT,	/* Do nothing but only record the stat */
> +	DAMOS_DEMOTE,

Ditto.

> 	NR_DAMOS_ACTIONS,
> };
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 081e2a325778..d3e3f077cd00 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -224,7 +224,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> 	return false;
> }
>
> -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> +static unsigned long damon_pa_reclaim(struct damon_region *r, struct damos *s, bool is_demote)

I understand that reclamation could include both pageout and demotion, but I'm not sure if that is making its purpose clearer or more ambiguous. What about renaming to '..._demote_or_pageout()', like 'damon_pa_mark_accessed_or_deactivate()'? Also, 'is_demote' could be simply 'demote'. I think having a separate function, say, damon_pa_demote(), is also ok, if it makes the code easier to read and does not introduce too many duplicated lines of code. Also, I'd prefer keeping the 80 columns limit[1] by breaking this line.
[1] https://docs.kernel.org/process/coding-style.html?highlight=coding+style#breaking-long-lines-and-strings

> {
> 	unsigned long addr, applied;
> 	LIST_HEAD(folio_list);
> @@ -242,14 +242,17 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> 		folio_test_clear_young(folio);
> 		if (!folio_isolate_lru(folio))
> 			goto put_folio;
> -		if (folio_test_unevictable(folio))
> +		if (folio_test_unevictable(folio) && !is_demote)
> 			folio_putback_lru(folio);
> 		else
> 			list_add(&folio->lru, &folio_list);
> put_folio:
> 		folio_put(folio);
> 	}
> -	applied = reclaim_pages(&folio_list);
> +	if (is_demote)
> +		applied = demote_pages(&folio_list);
> +	else
> +		applied = reclaim_pages(&folio_list);
> 	cond_resched();
> 	return applied * PAGE_SIZE;
> }
> @@ -297,13 +300,15 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> {
> 	switch (scheme->action) {
> 	case DAMOS_PAGEOUT:
> -		return damon_pa_pageout(r, scheme);
> +		return damon_pa_reclaim(r, scheme, false);
> 	case DAMOS_LRU_PRIO:
> 		return damon_pa_mark_accessed(r, scheme);
> 	case DAMOS_LRU_DEPRIO:
Re: [RFC PATCH 1/4] mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios
On Mon, 15 Jan 2024 13:52:49 +0900 Honggyu Kim wrote: > Since we will introduce reclaim_pages like functions such as > demote_pages and promote_pages, most of the code can be shared. > > This is a preparation patch that introduces reclaim_or_migrate_folios() > to cover all the logics, but it provides a handler for the different > actions. > > No functional changes applied. > > Signed-off-by: Honggyu Kim > --- > mm/vmscan.c | 18 -- > 1 file changed, 12 insertions(+), 6 deletions(-)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bba207f41b14..7ca2396ccc3b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2107,15 +2107,16 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> 	return nr_reclaimed;
> }
>
> -unsigned long reclaim_pages(struct list_head *folio_list)
> +static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
> +	unsigned int (*handler)(struct list_head *, struct pglist_data *))

I'm not very sure if extending this function for general migration is the right approach, since we have dedicated functions for the migration. I'd like to hear others' opinions.
> {
> 	int nid;
> -	unsigned int nr_reclaimed = 0;
> +	unsigned int nr_folios = 0;
> 	LIST_HEAD(node_folio_list);
> 	unsigned int noreclaim_flag;
>
> 	if (list_empty(folio_list))
> -		return nr_reclaimed;
> +		return nr_folios;
>
> 	noreclaim_flag = memalloc_noreclaim_save();
>
> @@ -2129,15 +2130,20 @@ unsigned long reclaim_pages(struct list_head *folio_list)
> 			continue;
> 		}
>
> -		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> +		nr_folios += handler(&node_folio_list, NODE_DATA(nid));
> 		nid = folio_nid(lru_to_folio(folio_list));
> 	} while (!list_empty(folio_list));
>
> -	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> +	nr_folios += handler(&node_folio_list, NODE_DATA(nid));
>
> 	memalloc_noreclaim_restore(noreclaim_flag);
>
> -	return nr_reclaimed;
> +	return nr_folios;
> +}
> +
> +unsigned long reclaim_pages(struct list_head *folio_list)
> +{
> +	return reclaim_or_migrate_folios(folio_list, reclaim_folio_list);
> }
>
> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
> --
> 2.34.1
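The overall shape of this refactor — one per-node batching walk with the reclaim-vs-migrate decision pushed into a callback — can be illustrated with a small userspace C analogue. Integer node ids stand in for the kernel's folio lists here, and every name below is invented for illustration:

```c
#include <stddef.h>

/* Handler receives one batch of same-node "pages" and returns how many it
 * handled, standing in for reclaim_folio_list()-style helpers. */
typedef unsigned int (*node_handler_t)(const int *pages, size_t n, int nid);

/*
 * Analogue of the reclaim_or_migrate_folios() pattern from the patch:
 * walk the list, group consecutive entries that live on the same node,
 * and hand each group to the supplied handler.
 */
static unsigned long for_each_node_batch(const int *pages, size_t n,
					 node_handler_t handler)
{
	unsigned long handled = 0;
	size_t start = 0;

	for (size_t i = 1; i <= n; i++) {
		if (i == n || pages[i] != pages[start]) {
			handled += handler(&pages[start], i - start,
					   pages[start]);
			start = i;
		}
	}
	return handled;
}

/* Example handler that reports one unit of work per same-node batch. */
static unsigned int count_batches(const int *pages, size_t n, int nid)
{
	(void)pages;
	(void)n;
	(void)nid;
	return 1;
}

/* Three same-node runs -- {0,0}, {2,2,2}, {0} -- so this returns 3. */
static unsigned long demo_batches(void)
{
	const int pages[] = { 0, 0, 2, 2, 2, 0 };

	return for_each_node_batch(pages, 6, count_batches);
}
```

Whether this indirection is preferable to keeping separate dedicated functions is exactly the open question raised in the review above.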
Re: [RFC PATCH 4/4] mm/damon: introduce DAMOS_PROMOTE action for promotion
On Mon, 15 Jan 2024 13:52:52 +0900 Honggyu Kim wrote:

> From: Hyeongtak Ji
>
> This patch introduces DAMOS_PROMOTE action for paddr mode.
>
> It includes renaming alloc_demote_folio to alloc_migrate_folio to use it
> for promotion as well.
>
> The execution sequence of DAMOS_DEMOTE and DAMOS_PROMOTE look as
> follows for comparison.
>
> DAMOS_DEMOTE action
> damo_pa_apply_scheme
> -> damon_pa_reclaim
> -> demote_pages
> -> do_demote_folio_list
> -> __demote_folio_list
> -> demote_folio_list
>
> DAMOS_PROMOTE action
> damo_pa_apply_scheme
> -> damon_pa_promote
> -> promote_pages
> -> do_promote_folio_list
> -> __promote_folio_list
> -> promote_folio_list
>
> Signed-off-by: Hyeongtak Ji
> Signed-off-by: Honggyu Kim
> ---
>  include/linux/damon.h          |   2 +
>  include/linux/migrate_mode.h   |   1 +
>  include/linux/vm_event_item.h  |   1 +
>  include/trace/events/migrate.h |   3 +-
>  mm/damon/paddr.c               |  29
>  mm/damon/sysfs-schemes.c       |   1 +
>  mm/internal.h                  |   1 +
>  mm/vmscan.c                    | 129 -
>  mm/vmstat.c                    |   1 +
>  9 files changed, 165 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 4c0a0fef09c5..477060bb6718 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -107,6 +107,7 @@ struct damon_target {
>  * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
>  * @DAMOS_STAT:		Do nothing but count the stat.
>  * @DAMOS_DEMOTE:	Do demotion for the current region.
> + * @DAMOS_PROMOTE:	Do promotion if possible, otherwise do nothing.

Like LRU_PRIO is defined before LRU_DEPRIO, what about defining PROMOTE before DEMOTE?

>  * @NR_DAMOS_ACTIONS:	Total number of DAMOS actions
>  *
>  * The support of each action is up to running damon_operations.
> @@ -125,6 +126,7 @@ enum damos_action { > DAMOS_LRU_DEPRIO, > DAMOS_STAT, /* Do nothing but only record the stat */ > DAMOS_DEMOTE, > + DAMOS_PROMOTE, > NR_DAMOS_ACTIONS, > }; > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > index f37cc03f9369..63f75eb9abf3 100644 > --- a/include/linux/migrate_mode.h > +++ b/include/linux/migrate_mode.h > @@ -29,6 +29,7 @@ enum migrate_reason { > MR_CONTIG_RANGE, > MR_LONGTERM_PIN, > MR_DEMOTION, > + MR_PROMOTION, > MR_TYPES > }; > > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h > index 8abfa1240040..63cf920afeaa 100644 > --- a/include/linux/vm_event_item.h > +++ b/include/linux/vm_event_item.h > @@ -44,6 +44,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > PGDEMOTE_KSWAPD, > PGDEMOTE_DIRECT, > PGDEMOTE_KHUGEPAGED, > + PGPROMOTE, > PGSCAN_KSWAPD, > PGSCAN_DIRECT, > PGSCAN_KHUGEPAGED, > diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h > index 0190ef725b43..f0dd569c1e62 100644 > --- a/include/trace/events/migrate.h > +++ b/include/trace/events/migrate.h > @@ -22,7 +22,8 @@ > EM( MR_NUMA_MISPLACED, "numa_misplaced") \ > EM( MR_CONTIG_RANGE,"contig_range") \ > EM( MR_LONGTERM_PIN,"longterm_pin") \ > - EMe(MR_DEMOTION,"demotion") > + EM( MR_DEMOTION,"demotion") \ > + EMe(MR_PROMOTION, "promotion") > > /* > * First define the enums in the above macros to be exported to userspace > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c > index d3e3f077cd00..360ce69d5898 100644 > --- a/mm/damon/paddr.c > +++ b/mm/damon/paddr.c > @@ -257,6 +257,32 @@ static unsigned long damon_pa_reclaim(struct > damon_region *r, struct damos *s, b > return applied * PAGE_SIZE; > } > > +static unsigned long damon_pa_promote(struct damon_region *r, struct damos > *s) > +{ > + unsigned long addr, applied; > + LIST_HEAD(folio_list); > + > + for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) { > + struct folio *folio = damon_get_folio(PHYS_PFN(addr)); > + 
> +	if (!folio)
> +		continue;
> +
> +	if (damos_pa_filter_out(s, folio))
> +		goto put_folio;
> +
> +	if (!folio_isolate_lru(folio))
> +		goto put_folio;
> +
> +	list_add(&folio->lru, &folio_list);
> +put_folio:
> +	folio_put(folio);
> +	}
> +	applied = promote_pages(&folio_list);
> +	cond_resched();
> +	return applied * PAGE_SIZE;
> +}
> +
>  static inline unsigned long damon_pa_mark_accessed_or_deactivate(
>  		struct damon_region *r, struct damos *s, bool mark_accessed)
>  {
> @@ -309,6 +335,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx
Re: [RFC PATCH 3/4] mm/memory-tiers: add next_promotion_node to find promotion target
On Mon, 15 Jan 2024 13:52:51 +0900 Honggyu Kim wrote: > From: Hyeongtak Ji > > This patch adds next_promotion_node that can be used to identify the > appropriate promotion target based on memory tiers. When multiple > promotion target nodes are available, the nearest node is selected based > on numa distance. > > Signed-off-by: Hyeongtak Ji > --- > include/linux/memory-tiers.h | 11 + > mm/memory-tiers.c| 43 > 2 files changed, 54 insertions(+) > > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h > index 1e39d27bee41..0788e435fc50 100644 > --- a/include/linux/memory-tiers.h > +++ b/include/linux/memory-tiers.h > @@ -50,6 +50,7 @@ int mt_set_default_dram_perf(int nid, struct > node_hmem_attrs *perf, > int mt_perf_to_adistance(struct node_hmem_attrs *perf, int *adist); > #ifdef CONFIG_MIGRATION > int next_demotion_node(int node); > +int next_promotion_node(int node); > void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); > bool node_is_toptier(int node); > #else > @@ -58,6 +59,11 @@ static inline int next_demotion_node(int node) > return NUMA_NO_NODE; > } > > +static inline int next_promotion_node(int node) > +{ > + return NUMA_NO_NODE; > +} > + > static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t > *targets) > { > *targets = NODE_MASK_NONE; > @@ -101,6 +107,11 @@ static inline int next_demotion_node(int node) > return NUMA_NO_NODE; > } > > +static inline int next_promotion_node(int node) > +{ > + return NUMA_NO_NODE; > +} > + > static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t > *targets) > { > *targets = NODE_MASK_NONE; > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c > index 8d5291add2bc..0060ee571cf4 100644 > --- a/mm/memory-tiers.c > +++ b/mm/memory-tiers.c > @@ -335,6 +335,49 @@ int next_demotion_node(int node) > return target; > } > > +/* > + * Select a promotion target that is close to the from node among the given > + * two nodes. 
> + * > + * TODO: consider other decision policy as node_distance may not be precise. > + */ > +static int select_promotion_target(int a, int b, int from) > +{ > + if (node_distance(from, a) < node_distance(from, b)) > + return a; > + else > + return b; > +} > + > +/** > + * next_promotion_node() - Get the next node in the promotion path > + * @node: The starting node to lookup the next node > + * > + * Return: node id for next memory node in the promotion path hierarchy > + * from @node; NUMA_NO_NODE if @node is the toptier. > + */ > +int next_promotion_node(int node) > +{ > + int target = NUMA_NO_NODE; > + int nid; > + > + if (node_is_toptier(node)) > + return NUMA_NO_NODE; > + > + rcu_read_lock(); > + for_each_node_state(nid, N_MEMORY) { > + if (node_isset(node, node_demotion[nid].preferred)) { > + if (target == NUMA_NO_NODE) > + target = nid; > + else > + target = select_promotion_target(nid, target, > node); > + } > + } > + rcu_read_unlock(); > + > + return target; > +} > + If this is gonna be used only by DAMON and we don't have a concrete plan for making it used by others, I think implementing this in mm/damon/ might make sense. > static void disable_all_demotion_targets(void) > { > struct memory_tier *memtier; > -- > 2.34.1
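For readers skimming the thread, the nearest-node policy above is easy to model in userspace. Below is a minimal sketch with a mock distance table; the 3-node matrix and the userspace harness are hypothetical, only select_promotion_target() mirrors the patch:

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/* hypothetical stand-in for the kernel's node_distance() table:
 * nodes 0 and 2 are far apart, node 1 sits between them */
static const int mock_distance[3][3] = {
	{ 10, 20, 30 },
	{ 20, 10, 20 },
	{ 30, 20, 10 },
};

static int node_distance(int a, int b)
{
	return mock_distance[a][b];
}

/* mirrors select_promotion_target() in the patch: of two candidate
 * promotion targets, pick whichever is nearer to the source node */
static int select_promotion_target(int a, int b, int from)
{
	if (node_distance(from, a) < node_distance(from, b))
		return a;
	else
		return b;
}
```

Note that on a distance tie the second argument (the already-chosen target in the caller's loop) wins, which is one reason the TODO above says node_distance alone may not be a precise enough policy.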
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
Hello, On Mon, 15 Jan 2024 13:52:48 +0900 Honggyu Kim wrote: > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > posted at [1]. > > It says there is no implementation of the demote/promote DAMOS action > are made. This RFC is about its implementation for physical address > space. > [...] > Evaluation Results > == > [...] > In summary of both results, our evaluation shows that "DAMON 2-tier" > memory management reduces the performance slowdown compared to the > "default" memory policy from 15~17% to 4~5% when the system runs with > high memory pressure on its fast tier DRAM nodes. > > The similar evaluation was done in another machine that has 256GB of > local DRAM and 96GB of CXL memory. The performance slowdown is reduced > from 20~24% for "default" to 5~7% for "DAMON 2-tier". > > Having these DAMOS_DEMOTE and DAMOS_PROMOTE actions can make 2-tier > memory systems run more efficiently under high memory pressures. Thank you so much for these great patches and the nice test results above. I believe the test setup and results make sense, and merging a revised version of this patchset would provide real benefits to the users. At a high level, I think it might be better to separate DAMON internal changes from DAMON external changes. For the DAMON part changes, I have no big concern other than trivial coding style level comments. For the DAMON-external changes implementing demote_pages() and promote_pages(), I'm unsure if the implementation is reusing appropriate functions, and if those are placed in the right source files. Especially, I'm unsure if vmscan.c is the right place for promotion code. Also I don't know if there is a good agreement on the promotion/demotion target node decision. That should be because I'm not that familiar with the areas and the files, but I feel this might be because our discussions on the promotion and demotion operations still have room to mature. 
Because I'm not very familiar with the part, I'd like to hear others' comments, too. To this end, I feel the problem might be made simpler, because this patchset is trying to provide two sophisticated operations, while I think a simpler approach might be possible. My humble simpler idea is adding a DAMOS operation for moving pages to a given node (like the sys_move_phy_pages RFC[1]), instead of the promote/demote. The general page migration can handle multiple cases including promote/demote, in my humble assumption. In more detail, users could decide which is the appropriate node for promotion or demotion and use the new DAMOS action to do promotion and demotion. Users would be requested to decide which nodes are the proper promotion/demotion targets, but that decision wouldn't be that hard in my opinion. For this, 'struct damos' would need to be updated for such argument-dependent actions, like 'struct damos_filter' has a union. In the future, we could extend the operation to the promotion and the demotion after the discussion around promotion and demotion has matured, if required. And assuming DAMON is extended for originating-CPU-aware access monitoring, the new DAMOS action would also cover more use cases such as general NUMA node balancing (extending DAMON for CPU-aware monitoring would be required), and some complex configurations having both CPU affinity and tiered memory requirements. I also think that may well fit with my RFC idea[2] for tiered memory management. Looking forward to opinions from you and others. I admit I may be missing many things, and am more than happy to be enlightened. 
[1] https://lwn.net/Articles/944007/ [2] https://lore.kernel.org/damon/20231112195602.61525-1...@kernel.org/ Thanks, SJ > > Signed-off-by: Honggyu Kim > Signed-off-by: Hyeongtak Ji > Signed-off-by: Rakie Kim > > [1] https://lore.kernel.org/damon/20231112195602.61525-1...@kernel.org > [2] https://github.com/skhynix/hmsdk > [3] https://github.com/redis/redis/tree/7.0.0 > [4] https://github.com/brianfrankcooper/YCSB/tree/0.17.0 > [5] https://dl.acm.org/doi/10.1145/3503222.3507731 > [6] https://dl.acm.org/doi/10.1145/3582016.3582063 > > Honggyu Kim (2): > mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios > mm/damon: introduce DAMOS_DEMOTE action for demotion > > Hyeongtak Ji (2): > mm/memory-tiers: add next_promotion_node to find promotion target > mm/damon: introduce DAMOS_PROMOTE action for promotion > > include/linux/damon.h | 4 + > include/linux/memory-tiers.h | 11 ++ > include/linux/migrate_mode.h | 1 + > include/linux/vm_event_item.h | 1 + > include/trace/events/migrate.h | 3 +- > mm/damon/paddr.c | 46 ++- > mm/damon/sysfs-schemes.c | 2 + > mm/internal.h | 2 + > mm/memory-tiers.c | 43 ++ > mm/vmscan.c| 231 +++-- > mm/vmstat.c| 1 + > 11 files changed, 330 insertions(+), 15
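To make the "argument-dependent action" idea in the reply above concrete, here is a purely hypothetical userspace sketch of how a scheme struct could carry a per-action argument via a union, the way 'struct damos_filter' does. None of these names are the real kernel API; they only illustrate the shape of the proposal:

```c
#include <assert.h>

/* hypothetical action list with one generic "migrate to a user-chosen
 * node" action instead of separate promote/demote actions */
enum damos_action_sketch {
	DAMOS_STAT_SKETCH,
	DAMOS_MIGRATE_SKETCH,
};

/* hypothetical scheme struct: the union holds arguments that only some
 * actions need, here the migration target node id */
struct damos_sketch {
	enum damos_action_sketch action;
	union {
		int target_nid;	/* used only by DAMOS_MIGRATE_SKETCH */
	};
};

/* users pick the target themselves: promotion from a CXL node is just a
 * migrate scheme whose target is a DRAM node, and demotion the reverse */
static struct damos_sketch make_migrate_scheme(int target_nid)
{
	struct damos_sketch s = { .action = DAMOS_MIGRATE_SKETCH };

	s.target_nid = target_nid;
	return s;
}
```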
Re: [PATCH v3] vmscan: add trace events for lru_gen
Hello, On Sun, 24 Sep 2023 23:23:43 +0900 Jaewon Kim wrote: > As the legacy lru provides, the lru_gen needs some trace events for > debugging. > > This commit introduces 2 trace events. > trace_mm_vmscan_lru_gen_scan > trace_mm_vmscan_lru_gen_evict > > Each event is similar to the following legacy events. > trace_mm_vmscan_lru_isolate, > trace_mm_vmscan_lru_shrink_[in]active > > Here's an example > mm_vmscan_lru_gen_scan: isolate_mode=0 classzone=1 order=9 > nr_requested=4096 nr_scanned=431 nr_skipped=0 nr_taken=55 lru=anon > mm_vmscan_lru_gen_evict: nid=0 nr_reclaimed=42 nr_dirty=0 nr_writeback=0 > nr_congested=0 nr_immediate=0 nr_activate_anon=13 nr_activate_file=0 > nr_ref_keep=0 nr_unmap_fail=0 priority=2 > flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC > mm_vmscan_lru_gen_scan: isolate_mode=0 classzone=1 order=9 > nr_requested=4096 nr_scanned=66 nr_skipped=0 nr_taken=64 lru=file > mm_vmscan_lru_gen_evict: nid=0 nr_reclaimed=62 nr_dirty=0 nr_writeback=0 > nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=2 > nr_ref_keep=0 nr_unmap_fail=0 priority=2 > flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC > > Signed-off-by: Jaewon Kim > Reviewed-by: Steven Rostedt (Google) > Reviewed-by: T.J. 
Mercier > --- > v3: change printk format > v2: use condition and make it aligned > v1: introduce trace events > --- > include/trace/events/mmflags.h | 5 ++ > include/trace/events/vmscan.h | 98 ++ > mm/vmscan.c| 17 -- > 3 files changed, 115 insertions(+), 5 deletions(-) > > diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h > index 1478b9dd05fa..44e9b38f83e7 100644 > --- a/include/trace/events/mmflags.h > +++ b/include/trace/events/mmflags.h > @@ -274,6 +274,10 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY, "softdirty" > ) \ > EM (LRU_ACTIVE_FILE, "active_file") \ > EMe(LRU_UNEVICTABLE, "unevictable") > > +#define LRU_GEN_NAMES\ > + EM (LRU_GEN_ANON, "anon") \ > + EMe(LRU_GEN_FILE, "file") > + I found this patchset makes the build fail when !CONFIG_LRU_GEN, like below: In file included from /linux/include/trace/trace_events.h:27, from /linux/include/trace/define_trace.h:102, from /linux/include/trace/events/oom.h:195, from /linux/mm/oom_kill.c:53: /linux/include/trace/events/mmflags.h:278:7: error: ‘LRU_GEN_ANON’ undeclared here (not in a function); did you mean ‘LRU_GEN_PGOFF’? 278 | EM (LRU_GEN_ANON, "anon") \ | ^~~~ Maybe some config checks are needed? Thanks, SJ [...]
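One plausible shape of the config check asked for above would be guarding the name list the same way the LRU_GEN_ANON/LRU_GEN_FILE enums themselves are guarded. This is only a hedged sketch of the idea, not the actual fix that was merged, and the exact placement relative to the EM()/EMe() expansion sites in mmflags.h would need checking:

```c
/* hypothetical guard: only define the lru_gen flag names when the
 * feature (and thus the enums they reference) is configured in, so
 * !CONFIG_LRU_GEN builds never see the undeclared symbols */
#ifdef CONFIG_LRU_GEN
#define LRU_GEN_NAMES						\
	EM (LRU_GEN_ANON, "anon")				\
	EMe(LRU_GEN_FILE, "file")
#endif
```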
[PATCH 3/9] mm/damon/core: use nr_accesses_bp as a source of damos_before_apply tracepoint
damos_before_apply tracepoint is exposing access rate of DAMON regions using nr_accesses field of regions, which was actually used by DAMOS in the past. However, it has changed to use nr_accesses_bp instead. Update the tracepoint to expose the value that DAMOS is really using. Note that it doesn't expose the value as is in the basis point, but after converting it to the natural number by dividing it by 10,000. Therefore this change doesn't make user-visible behavioral differences. Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 19930bb7af9a..23200aabccac 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -36,7 +36,7 @@ TRACE_EVENT_CONDITION(damos_before_apply, __entry->target_idx = target_idx; __entry->start = r->ar.start; __entry->end = r->ar.end; - __entry->nr_accesses = r->nr_accesses; + __entry->nr_accesses = r->nr_accesses_bp / 10000; __entry->age = r->age; __entry->nr_regions = nr_regions; ), -- 2.25.1
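For tracepoint consumers, the conversion described in the patch above is plain integer division, so basis-point values below 10,000 round down to zero on the exposed natural-number scale. A tiny illustration (the helper name is made up for the example):

```c
#include <assert.h>

/* nr_accesses_bp stores the region's access rate in basis points,
 * i.e. the nr_accesses value scaled up by 10,000; dividing converts
 * it back to the natural-number scale the tracepoint always exposed */
static unsigned int bp_to_nr_accesses(unsigned int nr_accesses_bp)
{
	return nr_accesses_bp / 10000;
}
```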
[PATCH RESEND v2 1/2] mm/damon/core: add a tracepoint for damos apply target regions
DAMON provides damon_aggregated tracepoint, which exposes details of each region and its access monitoring results. It is useful for getting whole monitoring results, e.g., for recording purposes. For investigations of DAMOS, DAMON Sysfs interface provides DAMOS statistics and tried_regions directory. But those provide only statistics and snapshots. If the scheme is frequently applied and if the user needs to know every detail of DAMOS behavior, the snapshot-based interface could be insufficient and expensive. As a last resort, userspace users need to record all the monitoring results via damon_aggregated tracepoint and simulate how DAMOS would work. It is unnecessarily complicated. DAMON kernel API users, meanwhile, can do that easily via before_damos_apply() callback field of 'struct damon_callback', though. Add a tracepoint that will be called just after before_damos_apply() callback for more convenient investigations of DAMOS. The tracepoint exposes all details about each region, similar to damon_aggregated tracepoint. Please note that DAMOS is currently not only for memory management but also for efficient, query-like retrieval of monitoring results (when 'stat' action is used). Until now, only statistics or snapshots were supported. Addition of this tracepoint allows efficient full recording of DAMOS-based filtered monitoring results. 
Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 39 mm/damon/core.c | 32 - 2 files changed, 70 insertions(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 0b8d13bde17a..19930bb7af9a 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -9,6 +9,45 @@ #include #include +TRACE_EVENT_CONDITION(damos_before_apply, + + TP_PROTO(unsigned int context_idx, unsigned int scheme_idx, + unsigned int target_idx, struct damon_region *r, + unsigned int nr_regions, bool do_trace), + + TP_ARGS(context_idx, target_idx, scheme_idx, r, nr_regions, do_trace), + + TP_CONDITION(do_trace), + + TP_STRUCT__entry( + __field(unsigned int, context_idx) + __field(unsigned int, scheme_idx) + __field(unsigned long, target_idx) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + __field(unsigned int, age) + __field(unsigned int, nr_regions) + ), + + TP_fast_assign( + __entry->context_idx = context_idx; + __entry->scheme_idx = scheme_idx; + __entry->target_idx = target_idx; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + __entry->age = r->age; + __entry->nr_regions = nr_regions; + ), + + TP_printk("ctx_idx=%u scheme_idx=%u target_idx=%lu nr_regions=%u %lu-%lu: %u %u", + __entry->context_idx, __entry->scheme_idx, + __entry->target_idx, __entry->nr_regions, + __entry->start, __entry->end, + __entry->nr_accesses, __entry->age) +); + TRACE_EVENT(damon_aggregated, TP_PROTO(unsigned int target_id, struct damon_region *r, diff --git a/mm/damon/core.c b/mm/damon/core.c index ca631dd88b33..3ca34a252a3c 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -950,6 +950,33 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, struct timespec64 begin, end; unsigned long sz_applied = 0; int err = 0; + /* +* We plan to support multiple context per kdamond, as DAMON sysfs +* implies with 'nr_contexts' 
file. Nevertheless, only single context +* per kdamond is supported for now. So, we can simply use '0' context +* index here. +*/ + unsigned int cidx = 0; + struct damos *siter;/* schemes iterator */ + unsigned int sidx = 0; + struct damon_target *titer; /* targets iterator */ + unsigned int tidx = 0; + bool do_trace = false; + + /* get indices for trace_damos_before_apply() */ + if (trace_damos_before_apply_enabled()) { + damon_for_each_scheme(siter, c) { + if (siter == s) + break; + sidx++; + } + damon_for_each_target(titer, c) { + if (titer == t) + break; + tidx++; + } + do_trace = true; + } if (c->ops.apply_scheme) { if (quota->esz && quota->cha
Re: [PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
On Mon, 11 Sep 2023 16:51:44 -0400 Steven Rostedt wrote: > On Mon, 11 Sep 2023 20:36:42 + > SeongJae Park wrote: > > > > Then tracing is fully enabled here, and now we enter: > > > > > > if (trace_damos_before_apply_enabled()) { > > > trace_damos_before_apply(cidx, sidx, tidx, r, > > > damon_nr_regions(t)); > > > } > > > > > > Now the trace event is hit with sidx and tidx zero when they should not > > > be. > > > This could confuse you when looking at the report. > > > > Thank you so much for enlightening me with this kind explanation, Steve! > > And > > this all make sense. I will follow your suggestion in the next spin. > > > > > > > > What I suggested was to initialize sidx to zero, > > > > Nit. Initialize to not zero but -1, right? > > Yeah, but I was also thinking of the reset of it too :-p > > sidx = -1; > > if (trace_damos_before_apply_enabled()) { > sidx = 0; Thank you for clarifying, Steve :) Nevertheless, since the variable is unsigned int, I would need to use UINT_MAX instead. 
To make the code easier to understand, I'd prefer to add a third parameter, as you suggested as another option in the original reply, like below: --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -997,6 +997,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, unsigned int sidx = 0; struct damon_target *titer; /* targets iterator */ unsigned int tidx = 0; + bool do_trace = false; /* get indices for trace_damos_before_apply() */ if (trace_damos_before_apply_enabled()) { @@ -1010,6 +1011,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, break; tidx++; } + do_trace = true; } if (c->ops.apply_scheme) { @@ -1036,7 +1038,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, err = c->callback.before_damos_apply(c, t, r, s); if (!err) { trace_damos_before_apply(cidx, sidx, tidx, r, - damon_nr_regions(t)); + damon_nr_regions(t), do_trace); sz_applied = c->ops.apply_scheme(c, t, r, s); } ktime_get_coarse_ts64(&end); Thanks, SJ > > -- Steve > > > > > > > set it in the first trace_*_enabled() check, and ignore calling the > > > tracepoint if it's not >= 0. > >
Re: [PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
Hi Steven, On Mon, 11 Sep 2023 14:19:55 -0400 Steven Rostedt wrote: > On Mon, 11 Sep 2023 04:59:07 + > SeongJae Park wrote: > > > --- a/mm/damon/core.c > > +++ b/mm/damon/core.c > > @@ -950,6 +950,28 @@ static void damos_apply_scheme(struct damon_ctx *c, > > struct damon_target *t, > > struct timespec64 begin, end; > > unsigned long sz_applied = 0; > > int err = 0; > > + /* > > +* We plan to support multiple context per kdamond, as DAMON sysfs > > +* implies with 'nr_contexts' file. Nevertheless, only single context > > +* per kdamond is supported for now. So, we can simply use '0' context > > +* index here. > > +*/ > > + unsigned int cidx = 0; > > + struct damos *siter;/* schemes iterator */ > > + unsigned int sidx = 0; > > + struct damon_target *titer; /* targets iterator */ > > + unsigned int tidx = 0; > > + > > If this loop is only for passing sidx and tidx to the trace point, You're correct. > you can add around it: > > if (trace_damos_before_apply_enabled()) { > > > + damon_for_each_scheme(siter, c) { > > + if (siter == s) > > + break; > > + sidx++; > > + } > > + damon_for_each_target(titer, c) { > > + if (titer == t) > > + break; > > + tidx++; > > + } > > } > > > And then this loop will only be done if that trace event is enabled. Today I learned yet another great feature of the tracing framework. Thank you Steven, I will add that to the next spin of this patchset! > > To prevent races, you may also want to add a third parameter, or initialize > them to -1: > > sidx = -1; > > if (trace_damo_before_apply_enabled()) { > sidx = 0; > [..] > } > > And you can change the TRACE_EVENT() TO TRACE_EVENT_CONDITION(): > > TRACE_EVENT_CONDITION(damos_before_apply, > > TP_PROTO(...), > > TP_ARGS(...), > > TP_CONDITION(sidx >= 0), > > and the trace event will not be called if sidx is less than zero. 
> > Also, this if statement is only done when the trace event is enabled, so > it's equivalent to: > > if (trace_damos_before_apply_enabled()) { > if (sdx >= 0) > trace_damos_before_apply(cidx, sidx, tidx, r, > damon_nr_regions(t)); > } Again, thank you very much for letting me know this awesome feature. However, sidx is supposed to be always >=0 here, since kdamond is running in single thread and hence no race is expected. If it exists, it's a bug. So, I wouldn't make this change. Appreciate again for letting me know this very useful feature, and please let me know if I'm missing something, though! Thanks, SJ > > -- Steve > > > > > > > if (c->ops.apply_scheme) { > > if (quota->esz && quota->charged_sz + sz > quota->esz) { > > @@ -964,8 +986,11 @@ static void damos_apply_scheme(struct damon_ctx *c, > > struct damon_target *t, > > ktime_get_coarse_ts64(&begin); > > if (c->callback.before_damos_apply) > > err = c->callback.before_damos_apply(c, t, r, s); > > - if (!err) > > + if (!err) { > > + trace_damos_before_apply(cidx, sidx, tidx, r, > > + damon_nr_regions(t)); > > sz_applied = c->ops.apply_scheme(c, t, r, s); > > + } > > ktime_get_coarse_ts64(&end); > > quota->total_charged_ns += timespec64_to_ns(&end) - > > timespec64_to_ns(&begin); > > -- >
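The pattern this thread converges on, namely a cheap enabled-check guarding the expensive argument computation, plus a do_trace flag standing in for TP_CONDITION() so a mid-function enable cannot emit half-computed arguments, can be modeled in userspace. All names below are stand-ins, not the real tracing API:

```c
#include <assert.h>
#include <stdbool.h>

static bool tracing_enabled;	/* stands in for trace_*_enabled() */
static int traced_sidx = -1;	/* last value the mock tracepoint saw */

/* models TRACE_EVENT_CONDITION(): the event fires only when do_trace
 * is set, so a never-computed sidx cannot be reported */
static void trace_mock_event(int sidx, bool do_trace)
{
	if (!do_trace)
		return;
	traced_sidx = sidx;
}

static void apply_scheme_mock(void)
{
	int sidx = 0;
	bool do_trace = false;

	if (tracing_enabled) {	/* expensive work only when needed */
		sidx = 42;	/* stands in for the scheme-index walk */
		do_trace = true;
	}
	/* even if tracing flips on right here, do_trace stays false,
	 * so the event cannot fire with the bogus sidx == 0 */
	trace_mock_event(sidx, do_trace);
}
```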
Re: [PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
On Mon, 11 Sep 2023 16:31:27 -0400 Steven Rostedt wrote: > On Mon, 11 Sep 2023 19:05:04 + > SeongJae Park wrote: > > > > Also, this if statement is only done when the trace event is enabled, so > > > it's equivalent to: > > > > > > if (trace_damos_before_apply_enabled()) { > > > if (sdx >= 0) > > > trace_damos_before_apply(cidx, sidx, tidx, r, > > > damon_nr_regions(t)); > > > } > > > > Again, thank you very much for letting me know this awesome feature. > > However, > > sidx is supposed to be always >=0 here, since kdamond is running in single > > thread and hence no race is expected. If it exists, it's a bug. So, I > > wouldn't make this change. Appreciate again for letting me know this very > > useful feature, and please let me know if I'm missing something, though! > > The race isn't with your code, but the enabling of tracing. > > Let's say you enable tracing just as it passed the first: > > if (trace_damos_before_apply_enabled()) { > > damon_for_each_scheme(siter, c) { > if (siter == s) > break; > sidx++; > } > damon_for_each_target(titer, c) { > if (titer == t) > break; > tidx++; > } > > Now, sidx and tidx are zero (when they were not computed, thus, they > shouldn't be zero). > > Then tracing is fully enabled here, and now we enter: > > if (trace_damos_before_apply_enabled()) { > trace_damos_before_apply(cidx, sidx, tidx, r, > damon_nr_regions(t)); > } > > Now the trace event is hit with sidx and tidx zero when they should not be. > This could confuse you when looking at the report. Thank you so much for enlightening me with this kind explanation, Steve! And this all make sense. I will follow your suggestion in the next spin. > > What I suggested was to initialize sidx to zero, Nit. Initialize to not zero but -1, right? > set it in the first trace_*_enabled() check, and ignore calling the > tracepoint if it's not >= 0. > > -- Steve > Thanks, SJ
[PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
DAMON provides damon_aggregated tracepoint, which exposes details of each region and its access monitoring results. It is useful for getting whole monitoring results, e.g., for recording purposes. For investigations of DAMOS, DAMON Sysfs interface provides DAMOS statistics and tried_regions directory. But those provide only statistics and snapshots. If the scheme is frequently applied and if the user needs to know every detail of DAMOS behavior, the snapshot-based interface could be insufficient and expensive. As a last resort, userspace users need to record all the monitoring results via damon_aggregated tracepoint and simulate how DAMOS would work. It is unnecessarily complicated. DAMON kernel API users, meanwhile, can do that easily via before_damos_apply() callback field of 'struct damon_callback', though. Add a tracepoint that will be called just after before_damos_apply() callback for more convenient investigations of DAMOS. The tracepoint exposes all details about each region, similar to damon_aggregated tracepoint. Please note that DAMOS is currently not only for memory management but also for efficient, query-like retrieval of monitoring results (when 'stat' action is used). Until now, only statistics or snapshots were supported. Addition of this tracepoint allows efficient full recording of DAMOS-based filtered monitoring results. 
Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 37 mm/damon/core.c | 27 +- 2 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 0b8d13bde17a..9e7b39495b05 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -9,6 +9,43 @@ #include #include +TRACE_EVENT(damos_before_apply, + + TP_PROTO(unsigned int context_idx, unsigned int scheme_idx, + unsigned int target_idx, struct damon_region *r, + unsigned int nr_regions), + + TP_ARGS(context_idx, target_idx, scheme_idx, r, nr_regions), + + TP_STRUCT__entry( + __field(unsigned int, context_idx) + __field(unsigned int, scheme_idx) + __field(unsigned long, target_idx) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + __field(unsigned int, age) + __field(unsigned int, nr_regions) + ), + + TP_fast_assign( + __entry->context_idx = context_idx; + __entry->scheme_idx = scheme_idx; + __entry->target_idx = target_idx; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + __entry->age = r->age; + __entry->nr_regions = nr_regions; + ), + + TP_printk("ctx_idx=%u scheme_idx=%u target_idx=%lu nr_regions=%u %lu-%lu: %u %u", + __entry->context_idx, __entry->scheme_idx, + __entry->target_idx, __entry->nr_regions, + __entry->start, __entry->end, + __entry->nr_accesses, __entry->age) +); + TRACE_EVENT(damon_aggregated, TP_PROTO(unsigned int target_id, struct damon_region *r, diff --git a/mm/damon/core.c b/mm/damon/core.c index ca631dd88b33..aa7fbcdf7310 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -950,6 +950,28 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, struct timespec64 begin, end; unsigned long sz_applied = 0; int err = 0; + /* +* We plan to support multiple context per kdamond, as DAMON sysfs +* implies with 'nr_contexts' file. 
Nevertheless, only single context +* per kdamond is supported for now. So, we can simply use '0' context +* index here. +*/ + unsigned int cidx = 0; + struct damos *siter;/* schemes iterator */ + unsigned int sidx = 0; + struct damon_target *titer; /* targets iterator */ + unsigned int tidx = 0; + + damon_for_each_scheme(siter, c) { + if (siter == s) + break; + sidx++; + } + damon_for_each_target(titer, c) { + if (titer == t) + break; + tidx++; + } if (c->ops.apply_scheme) { if (quota->esz && quota->charged_sz + sz > quota->esz) { @@ -964,8 +986,11 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, ktime_get_coarse_ts64(&begin); if (c->callback.before_damos_apply) err = c->callback.before_damos_apply(c, t, r, s); - if (!er
[RFC 3/8] mm/damon/core: expose nr_accesses_bp from damos_before_apply tracepoint
damos_before_apply tracepoint is exposing access rate of DAMON regions using nr_accesses, which was actually used by DAMOS in the past. However, it has changed to use nr_accesses_bp instead. Update the tracepoint to expose the value that DAMOS is really using. Note that it doesn't expose the value as is in the basis point, but after converting it to the natural number by dividing it by 10,000. That's for avoiding confusion for old users. Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 9e7b39495b05..6f98198c0104 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -34,7 +34,7 @@ TRACE_EVENT(damos_before_apply, __entry->target_idx = target_idx; __entry->start = r->ar.start; __entry->end = r->ar.end; - __entry->nr_accesses = r->nr_accesses; + __entry->nr_accesses = r->nr_accesses_bp / 10000; __entry->age = r->age; __entry->nr_regions = nr_regions; ), -- 2.25.1
Re: [PATCH v2 00/16] Multigenerational LRU Framework
From: SeongJae Park On Tue, 13 Apr 2021 10:13:24 -0600 Jens Axboe wrote: > On 4/13/21 1:51 AM, SeongJae Park wrote: > > From: SeongJae Park > > > > Hello, > > > > > > Very interesting work, thank you for sharing this :) > > > > On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao wrote: > > > >> What's new in v2 > >> > >> Special thanks to Jens Axboe for reporting a regression in buffered > >> I/O and helping test the fix. > > > > Is the discussion open? If so, could you please give me a link? > > I wasn't on the initial post (or any of the lists it was posted to), but > it's on the google page reclaim list. Not sure if that is public or not. > > tldr is that I was pretty excited about this work, as buffered IO tends > to suck (a lot) for high throughput applications. My test case was > pretty simple: > > Randomly read a fast device, using 4k buffered IO, and watch what > happens when the page cache gets filled up. For this particular test, > we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec > with kswapd using a lot of CPU trying to keep up. That's mainline > behavior. > > The initial posting of this patchset did no better, in fact it did a bit > worse. Performance dropped to the same levels and kswapd was using as > much CPU as before, but on top of that we also got excessive swapping. > Not at a high rate, but 5-10MB/sec continually. > > I had some back and forths with Yu Zhao and tested a few new revisions, > and the current series does much better in this regard. Performance > still dips a bit when page cache fills, but not nearly as much, and > kswapd is using less CPU than before. > > Hope that helps, Appreciate this kind and detailed explanation, Jens! So, my understanding is that v2 of this patchset improved the performance by using frequency (tier) in addition to recency (generation number) for buffered I/O pages. That makes sense to me. If I'm misunderstanding, please let me know. Thanks, SeongJae Park > -- > Jens Axboe >
[PATCH v28 11/13] mm/damon: Add kunit tests
From: SeongJae Park This commit adds kunit based unit tests for the core and the virtual address spaces monitoring primitives of DAMON. Signed-off-by: SeongJae Park Reviewed-by: Brendan Higgins --- mm/damon/Kconfig | 36 + mm/damon/core-test.h | 253 mm/damon/core.c | 7 + mm/damon/dbgfs-test.h | 126 mm/damon/dbgfs.c | 2 + mm/damon/vaddr-test.h | 328 ++ mm/damon/vaddr.c | 7 + 7 files changed, 759 insertions(+) create mode 100644 mm/damon/core-test.h create mode 100644 mm/damon/dbgfs-test.h create mode 100644 mm/damon/vaddr-test.h diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 72f1683ba0ee..455995152697 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,6 +12,18 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_KUNIT_TEST + bool "Test for damon" if !KUNIT_ALL_TESTS + depends on DAMON && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_VADDR bool "Data access monitoring primitives for virtual address spaces" depends on DAMON && MMU @@ -21,6 +33,18 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual address spaces. +config DAMON_VADDR_KUNIT_TEST + bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS + depends on DAMON_VADDR && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON virtual addresses primitives Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_DBGFS bool "DAMON debugfs interface" depends on DAMON_VADDR && DEBUG_FS @@ -30,4 +54,16 @@ config DAMON_DBGFS If unsure, say N. 
+config DAMON_DBGFS_KUNIT_TEST + bool "Test for damon debugfs interface" if !KUNIT_ALL_TESTS + depends on DAMON_DBGFS && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON debugfs interface Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + endmenu diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h new file mode 100644 index ..b815dfbfb5fd --- /dev/null +++ b/mm/damon/core-test.h @@ -0,0 +1,253 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Data Access Monitor Unit Tests + * + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * Author: SeongJae Park + */ + +#ifdef CONFIG_DAMON_KUNIT_TEST + +#ifndef _DAMON_CORE_TEST_H +#define _DAMON_CORE_TEST_H + +#include + +static void damon_test_regions(struct kunit *test) +{ + struct damon_region *r; + struct damon_target *t; + + r = damon_new_region(1, 2); + KUNIT_EXPECT_EQ(test, 1ul, r->ar.start); + KUNIT_EXPECT_EQ(test, 2ul, r->ar.end); + KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses); + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, 1u, damon_nr_regions(t)); + + damon_del_region(r); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_free_target(t); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static void damon_test_target(struct kunit *test) +{ + struct damon_ctx *c = damon_new_ctx(); + struct damon_target *t; + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 42ul, t->id); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_add_target(c, t); + KUNIT_EXPECT_EQ(test, 1u, nr_damon_targets(c)); + + damon_destroy_target(t); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_destroy_ctx(c); +} + +/* + * Test 
kdamond_reset_aggregated() + * + * DAMON checks access to each region and aggregates this information as the + * access frequency of each region. In detail, it increases '->nr_accesses' of + * regions in which an access has been confirmed. 'kdamond_reset_aggregated()' flushes + * the aggregated information ('->nr_accesses' of each region) to the result + * buffer. As a result of the flushing, the '->nr_accesses' of regions are + * initialized to zero. + */ +static voi
[PATCH v28 12/13] mm/damon: Add user space selftests
From: SeongJae Park This commit adds simple user space tests for DAMON. The tests use the kselftest framework. Signed-off-by: SeongJae Park --- tools/testing/selftests/damon/Makefile| 7 ++ .../selftests/damon/_chk_dependency.sh| 28 ++ .../testing/selftests/damon/debugfs_attrs.sh | 98 +++ 3 files changed, 133 insertions(+) create mode 100644 tools/testing/selftests/damon/Makefile create mode 100644 tools/testing/selftests/damon/_chk_dependency.sh create mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile new file mode 100644 index ..8a3f2cd9fec0 --- /dev/null +++ b/tools/testing/selftests/damon/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for damon selftests + +TEST_FILES = _chk_dependency.sh +TEST_PROGS = debugfs_attrs.sh + +include ../lib.mk diff --git a/tools/testing/selftests/damon/_chk_dependency.sh b/tools/testing/selftests/damon/_chk_dependency.sh new file mode 100644 index ..e090836c2bf7 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_dependency.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +DBGFS=/sys/kernel/debug/damon + +if [ $EUID -ne 0 ]; +then + echo "Run as root" + exit $ksft_skip +fi + +if [ ! -d $DBGFS ] +then + echo "$DBGFS not found" + exit $ksft_skip +fi + +for f in attrs target_ids monitor_on +do + if [ ! -f "$DBGFS/$f" ] + then + echo "$f not found" + exit 1 + fi +done diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh new file mode 100755 index ..4a8ab4910ee4 --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -0,0 +1,98 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source ./_chk_dependency.sh + +# Test attrs file +file="$DBGFS/attrs" + +ORIG_CONTENT=$(cat $file) + +echo 1 2 3 4 5 > $file +if [ $?
-ne 0 ] +then + echo "$file write failed" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo 1 2 3 4 > $file +if [ $? -eq 0 ] +then + echo "$file write success (should have failed)" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2 3 4 5" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +# Test target_ids file +file="$DBGFS/target_ids" + +ORIG_CONTENT=$(cat $file) + +echo "1 2 3 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo "1 2 abc 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo abc 2 3 > $file +if [ $? -ne 0 ] +then + echo "$file wrong value write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not cleared" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo > $file +if [ $? -ne 0 ] +then + echo "$file init fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not initialized" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +echo "PASS" -- 2.17.1
[PATCH v28 10/13] Documentation: Add documents for DAMON
From: SeongJae Park This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Signed-off-by: SeongJae Park --- Documentation/admin-guide/mm/damon/guide.rst | 158 + Documentation/admin-guide/mm/damon/index.rst | 15 ++ Documentation/admin-guide/mm/damon/plans.rst | 29 +++ Documentation/admin-guide/mm/damon/start.rst | 114 + Documentation/admin-guide/mm/damon/usage.rst | 112 + Documentation/admin-guide/mm/index.rst | 1 + Documentation/vm/damon/api.rst | 20 ++ Documentation/vm/damon/design.rst| 166 + Documentation/vm/damon/eval.rst | 232 +++ Documentation/vm/damon/faq.rst | 58 + Documentation/vm/damon/index.rst | 31 +++ Documentation/vm/index.rst | 1 + 12 files changed, 937 insertions(+) create mode 100644 Documentation/admin-guide/mm/damon/guide.rst create mode 100644 Documentation/admin-guide/mm/damon/index.rst create mode 100644 Documentation/admin-guide/mm/damon/plans.rst create mode 100644 Documentation/admin-guide/mm/damon/start.rst create mode 100644 Documentation/admin-guide/mm/damon/usage.rst create mode 100644 Documentation/vm/damon/api.rst create mode 100644 Documentation/vm/damon/design.rst create mode 100644 Documentation/vm/damon/eval.rst create mode 100644 Documentation/vm/damon/faq.rst create mode 100644 Documentation/vm/damon/index.rst diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst new file mode 100644 index ..f52dc1669bb1 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/guide.rst @@ -0,0 +1,158 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================== +Optimization Guide +================== + +This document helps you estimate the amount of benefit that you could get +from DAMON-based optimizations, and describes how you could achieve it. It +assumes that you have already read :doc:`start`. + + +Check The Signs +=============== + +No optimization can provide the same extent of benefit in every case. Therefore +you should first estimate how much improvement you could get using DAMON.
If +some of the conditions below match your situation, you could consider using DAMON. + +- *Low IPC and High Cache Miss Ratios.* Low IPC means most of the CPU time is + spent waiting for the completion of time-consuming operations such as memory + access, while high cache miss ratios mean the caches don't help it well. + DAMON is not for cache-level optimization, but for the DRAM level. However, + improving DRAM management will also help this case by reducing the memory + operation latency. +- *Memory Over-commitment and Unknown Users.* If you are doing memory + overcommitment and you cannot control every user of your system, a memory + bank run could happen at any time. You can estimate when it will happen + based on DAMON's monitoring results and act earlier to avoid or deal better + with the crisis. +- *Frequent Memory Pressure.* Frequent memory pressure means your system has + wrong configurations or memory hogs. DAMON will help you find the right + configuration and/or the criminals. +- *Heterogeneous Memory System.* If your system is utilizing memory devices + that are placed between DRAM and traditional hard disks, such as non-volatile + memory or fast SSDs, DAMON could help you utilize the devices more + efficiently. + + +Profile +======= + +If you found some positive signals, you could start by profiling your workloads +using DAMON. Find major workloads on your systems and analyze their data +access patterns to find something that is wrong or can be improved. The DAMON user +space tool (``damo``) will be useful for this. You can get ``damo`` from +https://github.com/awslabs/damo. + +We recommend starting with a working set size distribution check using ``damo +report wss``. If the distribution is non-uniform or quite different from what +you estimated, you could consider `Memory Configuration`_ optimization. + +Then, review the overall access pattern in heatmap form using ``damo report +heats``.
If it shows a simple pattern consisting of a small number of memory +regions with a high contrast of access temperature, you could consider manual +`Program Modification`_. + +If you still want to absorb more benefits, you could develop a `Personalized +DAMON Application`_ for your special case. + +You don't need to take only one approach among the above plans; you could +combine multiple of the above approaches to maximize the benefit. + + +Optimize +======== + +If the profiling result also says it's worth trying some optimization, you +could consider the approaches below. Note that some of them assume +that your systems are configured with swap devices or other types of auxiliary +memory, so that you are not strictly required to accommodate the whole working set +in the main memory. Most
[PATCH v28 13/13] MAINTAINERS: Update for DAMON
From: SeongJae Park This commit updates MAINTAINERS file for DAMON related files. Signed-off-by: SeongJae Park --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 4d68184d3f76..42bbcaec5050 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5025,6 +5025,18 @@ F: net/ax25/ax25_out.c F: net/ax25/ax25_timer.c F: net/ax25/sysctl_net_ax25.c +DATA ACCESS MONITOR +M: SeongJae Park +L: linux...@kvack.org +S: Maintained +F: Documentation/admin-guide/mm/damon/* +F: Documentation/vm/damon/* +F: include/linux/damon.h +F: include/trace/events/damon.h +F: mm/damon/* +F: tools/damon/* +F: tools/testing/selftests/damon/* + DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER L: net...@vger.kernel.org S: Orphan -- 2.17.1
[PATCH v28 08/13] mm/damon/dbgfs: Export kdamond pid to the user space
From: SeongJae Park For CPU usage accounting, knowing the pid of the monitoring thread could be helpful. For example, users could use the cpuacct cgroup with the pid. This commit therefore exports the pid of the currently running monitoring thread to the user space via the 'kdamond_pid' file in the debugfs directory. Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 38 -- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 17c7878cfcb8..67b273472c0b 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -237,6 +237,32 @@ static ssize_t dbgfs_target_ids_write(struct file *file, return ret; } +static ssize_t dbgfs_kdamond_pid_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char *kbuf; + ssize_t len; + + kbuf = kmalloc(count, GFP_KERNEL); + if (!kbuf) + return -ENOMEM; + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) + len = scnprintf(kbuf, count, "%d\n", ctx->kdamond->pid); + else + len = scnprintf(kbuf, count, "none\n"); + mutex_unlock(&ctx->kdamond_lock); + if (!len) + goto out; + len = simple_read_from_buffer(buf, count, ppos, kbuf, len); + +out: + kfree(kbuf); + return len; +} + static int damon_dbgfs_open(struct inode *inode, struct file *file) { file->private_data = inode->i_private; @@ -258,10 +284,18 @@ static const struct file_operations target_ids_fops = { .write = dbgfs_target_ids_write, }; +static const struct file_operations kdamond_pid_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_kdamond_pid_read, +}; + static void dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) { - const char * const file_names[] = {"attrs", "target_ids"}; - const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops}; + const char * const file_names[] = {"attrs", "target_ids", + "kdamond_pid"}; + const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops, + &kdamond_pid_fops}; int i; for (i = 0; i < ARRAY_SIZE(file_names); i++) -- 2.17.1
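As a concrete illustration of the cpuacct idea above, here is a hedged shell sketch. The cgroup mount point and group name are assumptions for illustration, and the commands are guarded so the sketch is a no-op on kernels without DAMON debugfs:

```shell
#!/bin/sh
# Sketch: charge the kdamond thread to a cpuacct cgroup via 'kdamond_pid'.
DBGFS=/sys/kernel/debug/damon
CG=/sys/fs/cgroup/cpuacct/damon		# assumed cgroup-v1 mount point

account_kdamond() {
	if [ ! -f "$DBGFS/kdamond_pid" ]; then
		echo "kdamond_pid not available"
		return 0
	fi
	pid=$(cat "$DBGFS/kdamond_pid")
	if [ "$pid" = "none" ]; then
		echo "no kdamond running"
		return 0
	fi
	mkdir -p "$CG" && echo "$pid" > "$CG/tasks"
	cat "$CG/cpuacct.usage"		# accumulated CPU time of kdamond, in ns
}

account_kdamond
```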
[PATCH v28 09/13] mm/damon/dbgfs: Support multiple contexts
From: SeongJae Park In some use cases, users would want to run multiple monitoring contexts. For example, if a user wants high-precision monitoring and dedicating multiple CPUs to the job is acceptable, the user can split the monitoring target regions into multiple small regions and create one context for each region, because DAMON creates one monitoring thread per context. Or, someone might want to simultaneously monitor different address spaces, e.g., both the virtual address space and the physical address space. DAMON's API allows such usage, but 'damon-dbgfs' does not. Therefore, only kernel space DAMON users can do multi-context monitoring. This commit allows user space DAMON users to use multi-context monitoring by introducing two new 'damon-dbgfs' debugfs files, 'mk_context' and 'rm_context'. Users can create a new monitoring context by writing the desired name of the new context to 'mk_context'. Then, a new directory with the name, having the files for the settings of the context ('attrs', 'target_ids' and 'record'), will be created under the debugfs directory. Writing the name of the context to remove to 'rm_context' will remove the related context and directory. Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 197 ++- 1 file changed, 195 insertions(+), 2 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 67b273472c0b..734bc14f0100 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -18,6 +18,7 @@ static struct damon_ctx **dbgfs_ctxs; static int dbgfs_nr_ctxs; static struct dentry **dbgfs_dirs; +static DEFINE_MUTEX(damon_dbgfs_lock); /* * Returns non-empty string on success, negative error code otherwise. @@ -314,6 +315,186 @@ static struct damon_ctx *dbgfs_new_ctx(void) return ctx; } +static void dbgfs_destroy_ctx(struct damon_ctx *ctx) +{ + damon_destroy_ctx(ctx); +} + +/* + * Make a context of @name and create a debugfs directory for it. + * + * This function should be called while holding damon_dbgfs_lock.
+ * + * Returns 0 on success, negative error code otherwise. + */ +static int dbgfs_mk_context(char *name) +{ + struct dentry *root, **new_dirs, *new_dir; + struct damon_ctx **new_ctxs, *new_ctx; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + new_ctxs = krealloc(dbgfs_ctxs, sizeof(*dbgfs_ctxs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_ctxs) + return -ENOMEM; + dbgfs_ctxs = new_ctxs; + + new_dirs = krealloc(dbgfs_dirs, sizeof(*dbgfs_dirs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_dirs) + return -ENOMEM; + dbgfs_dirs = new_dirs; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + new_dir = debugfs_create_dir(name, root); + dbgfs_dirs[dbgfs_nr_ctxs] = new_dir; + + new_ctx = dbgfs_new_ctx(); + if (!new_ctx) { + debugfs_remove(new_dir); + dbgfs_dirs[dbgfs_nr_ctxs] = NULL; + return -ENOMEM; + } + + dbgfs_ctxs[dbgfs_nr_ctxs] = new_ctx; + dbgfs_fill_ctx_dir(dbgfs_dirs[dbgfs_nr_ctxs], + dbgfs_ctxs[dbgfs_nr_ctxs]); + dbgfs_nr_ctxs++; + + return 0; +} + +static ssize_t dbgfs_mk_context_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + char *kbuf; + char *ctx_name; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + ctx_name = kmalloc(count + 1, GFP_KERNEL); + if (!ctx_name) { + kfree(kbuf); + return -ENOMEM; + } + + /* Trim white space */ + if (sscanf(kbuf, "%s", ctx_name) != 1) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&damon_dbgfs_lock); + err = dbgfs_mk_context(ctx_name); + if (err) + ret = err; + mutex_unlock(&damon_dbgfs_lock); + +out: + kfree(kbuf); + kfree(ctx_name); + return ret; +} + +/* + * Remove a context of @name and its debugfs directory. + * + * This function should be called while holding damon_dbgfs_lock. + * + * Returns 0 on success, negative error code otherwise.
+ */ +static int dbgfs_rm_context(char *name) +{ + struct dentry *root, *dir, **new_dirs; + struct damon_ctx **new_ctxs; + int i, j; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + dir = debugfs_lookup(name, root); + if (!dir) + return -ENOENT; + + new_dirs = kmalloc_array(dbgfs_nr_ctxs - 1, sizeof(*dbgfs_dirs), + GFP_KERNEL); + if (!new_dirs) + return -ENOMEM; + + new_ctxs = kma
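Putting the two new files together, here is a hedged usage sketch of the interface described above. The context name is made up for illustration, and the commands are guarded so the sketch degrades gracefully where damon-dbgfs is absent:

```shell
#!/bin/sh
# Sketch: create, inspect, and remove a second monitoring context.
DBGFS=/sys/kernel/debug/damon

demo_contexts() {
	if [ ! -f "$DBGFS/mk_context" ]; then
		echo "damon-dbgfs multi-context files not available"
		return 0
	fi
	echo my_ctx > "$DBGFS/mk_context"	# creates $DBGFS/my_ctx/
	ls "$DBGFS/my_ctx"			# attrs, target_ids, ...
	echo my_ctx > "$DBGFS/rm_context"	# removes the context again
}

demo_contexts
```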
[PATCH v28 05/13] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space works as below. PTE Accessed-bit Based Access Check --- The implementation uses the PTE Accessed bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb the reclaim logic. DAMON uses the ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction --- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too-huge unmapped areas inside the monitoring target should be removed so as not to waste the time of the adaptive mechanism. For this reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space.
The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spaces, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>

Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 13 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/vaddr.c | 616 ++ 4 files changed, 639 insertions(+) create mode 100644 mm/damon/vaddr.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 0bd5d6913a6c..72cf5ebd35fe 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -246,4 +246,17 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_VADDR + +/* Monitoring primitives for virtual memory address spaces */ +void damon_va_init(struct damon_ctx *ctx); +void damon_va_update(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(void *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_VADDR */ + #endif /* _DAMON_H */ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..8ae080c52950 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,13 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_VADDR + bool "Data access monitoring primitives for virtual address spaces" + depends on DAMON && MMU + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON + that works for virtual address spaces.
+ endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..6ebbd08aed67 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON):= core.o +obj-$(CONFIG_DAMON_VADDR) += vaddr.o diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c new file mode 100644 index ..3bc9dc9f0656 --- /dev/null +++ b/mm/damon/vaddr.c @@ -0,0 +1,616 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Primitives for Virtual Address Spaces + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-va: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the returned task, u
[PATCH v28 03/13] mm/damon: Adaptively adjust regions
From: SeongJae Park Even if the initial monitoring target regions are somehow well constructed to fulfill the assumption (pages in the same region have similar access frequencies), the data access pattern can change dynamically. This will result in low monitoring quality. To keep the assumption as much as possible, DAMON adaptively merges and splits each region based on their access frequency. For each ``aggregation interval``, it compares the access frequencies of adjacent regions and merges those if the frequency difference is small. Then, after it reports and clears the aggregated access frequency of each region, it splits each region into two or three regions if the total number of regions will not exceed the user-specified maximum number of regions after the split. In this way, DAMON provides its best-effort quality and minimal overhead while keeping the upper-bound overhead that users set. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 23 +++-- mm/damon/core.c | 214 +- 2 files changed, 227 insertions(+), 10 deletions(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index 67db309ad61b..0bd5d6913a6c 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -12,6 +12,9 @@ #include #include +/* Minimal region size. Every damon_region is aligned by this. */ +#define DAMON_MIN_REGION PAGE_SIZE + /** * struct damon_addr_range - Represents an address region of [@start, @end). * @start: Start address of the region (inclusive). @@ -85,6 +88,8 @@ struct damon_ctx; * prepared for the next access check. * @check_accesses should check the accesses to each region that made after the * last preparation and update the number of observed accesses of each region. + * It should also return the max number of observed accesses made as a result + * of its update. The value will be used for the regions adjustment threshold. * @reset_aggregated should reset the access monitoring results that aggregated * by @check_accesses.
* @target_valid should check whether the target is still valid for the @@ -95,7 +100,7 @@ struct damon_primitive { void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); - void (*check_accesses)(struct damon_ctx *context); + unsigned int (*check_accesses)(struct damon_ctx *context); void (*reset_aggregated)(struct damon_ctx *context); bool (*target_valid)(void *target); void (*cleanup)(struct damon_ctx *context); @@ -172,7 +177,9 @@ struct damon_callback { * @primitive: Set of monitoring primitives for given use cases. * @callback: Set of callbacks for monitoring events notifications. * - * @region_targets:Head of monitoring targets (&damon_target) list. + * @min_nr_regions:The minimum number of adaptive monitoring regions. + * @max_nr_regions:The maximum number of adaptive monitoring regions. + * @adaptive_targets: Head of monitoring targets (&damon_target) list. */ struct damon_ctx { unsigned long sample_interval; @@ -191,7 +198,9 @@ struct damon_ctx { struct damon_primitive primitive; struct damon_callback callback; - struct list_head region_targets; + unsigned long min_nr_regions; + unsigned long max_nr_regions; + struct list_head adaptive_targets; }; #define damon_next_region(r) \ @@ -207,10 +216,10 @@ struct damon_ctx { list_for_each_entry_safe(r, next, &t->regions_list, list) #define damon_for_each_target(t, ctx) \ - list_for_each_entry(t, &(ctx)->region_targets, list) + list_for_each_entry(t, &(ctx)->adaptive_targets, list) #define damon_for_each_target_safe(t, next, ctx) \ - list_for_each_entry_safe(t, next, &(ctx)->region_targets, list) + list_for_each_entry_safe(t, next, &(ctx)->adaptive_targets, list) #ifdef CONFIG_DAMON @@ -224,11 +233,13 @@ struct damon_target *damon_new_target(unsigned long id); void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); void damon_free_target(struct damon_target *t); void damon_destroy_target(struct damon_target *t); +unsigned int
damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long primitive_upd_int); + unsigned long aggr_int, unsigned long primitive_upd_int, + unsigned long min_nr_reg, unsigned long max_nr_reg); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/core.c b/mm/damon/core.c index 94db494dcf70..b36b6bdd94e2 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -10,8 +10,12 @@ #include
[PATCH v28 06/13] mm/damon: Add a tracepoint
From: SeongJae Park

This commit adds a tracepoint for DAMON. It traces the monitoring results of each region for each aggregation interval. Using this, DAMON can be easily integrated with tracepoint-supporting tools such as perf.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
Reviewed-by: Steven Rostedt (VMware)
---
 include/trace/events/damon.h | 43
 mm/damon/core.c              |  7 +-
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/damon.h

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
new file mode 100644
index ..2f422f4f1fb9
--- /dev/null
+++ b/include/trace/events/damon.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM damon
+
+#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DAMON_H
+
+#include
+#include
+#include
+
+TRACE_EVENT(damon_aggregated,
+
+	TP_PROTO(struct damon_target *t, struct damon_region *r,
+			unsigned int nr_regions),
+
+	TP_ARGS(t, r, nr_regions),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, target_id)
+		__field(unsigned int, nr_regions)
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned int, nr_accesses)
+	),
+
+	TP_fast_assign(
+		__entry->target_id = t->id;
+		__entry->nr_regions = nr_regions;
+		__entry->start = r->ar.start;
+		__entry->end = r->ar.end;
+		__entry->nr_accesses = r->nr_accesses;
+	),
+
+	TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u",
+			__entry->target_id, __entry->nr_regions,
+			__entry->start, __entry->end, __entry->nr_accesses)
+);
+
+#endif /* _TRACE_DAMON_H */
+
+/* This part must be outside protection */
+#include

diff --git a/mm/damon/core.c b/mm/damon/core.c
index b36b6bdd94e2..912112662d0c 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -13,6 +13,9 @@
 #include
 #include

+#define CREATE_TRACE_POINTS
+#include
+
 /* Get a random number in [l, r) */
 #define damon_rand(l, r) (l + prandom_u32_max(r - l))

@@ -388,8 +391,10 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 	damon_for_each_target(t, c) {
 		struct damon_region *r;

-		damon_for_each_region(r, t)
+		damon_for_each_region(r, t) {
+			trace_damon_aggregated(t, r, damon_nr_regions(t));
 			r->nr_accesses = 0;
+		}
 	}
 }
--
2.17.1
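As an illustration of how the tracepoint above could be consumed by a user-space tool, here is a small Python sketch that parses records in the TP_printk format (`target_id=%lu nr_regions=%u %lu-%lu: %u`), e.g. as read from the trace buffer. The parser and its names are illustrative assumptions, not part of the patch.

```python
import re

# Field layout follows the TP_printk format string of damon_aggregated:
#   "target_id=%lu nr_regions=%u %lu-%lu: %u"
LINE_RE = re.compile(
    r"target_id=(?P<target_id>\d+) nr_regions=(?P<nr_regions>\d+) "
    r"(?P<start>\d+)-(?P<end>\d+): (?P<nr_accesses>\d+)"
)

def parse_damon_aggregated(line):
    """Parse one damon_aggregated record into a dict of ints."""
    m = LINE_RE.search(line)
    if not m:
        raise ValueError("not a damon_aggregated record: %r" % line)
    return {k: int(v) for k, v in m.groupdict().items()}

if __name__ == "__main__":
    sample = "target_id=4242 nr_regions=10 4096-8192: 3"
    rec = parse_damon_aggregated(sample)
    # Region size and access count of the sampled record.
    print(rec["target_id"], rec["end"] - rec["start"], rec["nr_accesses"])
    # → 4242 4096 3
```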
[PATCH v28 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

DAMON is designed to be used by kernel space code such as the memory management subsystems, and therefore it provides only a kernel space API. That said, letting user space control DAMON could provide some benefits to them. For example, it will allow user space to analyze their specific workloads and make their own special optimizations.

For such cases, this commit implements a simple DAMON application kernel module, namely 'damon-dbgfs', which merely wraps the DAMON API and exports it to user space via debugfs.

'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and ``monitor_on``, under its debugfs directory, ``<debugfs>/damon/``.

Attributes
----------

Users can read and write the ``sampling interval``, ``aggregation interval``, ``regions update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For example, the below commands set those values to 5 ms, 100 ms, 1,000 ms, 10, and 1000, and check the result::

    # cd <debugfs>/damon
    # echo 5000 100000 1000000 10 1000 > attrs
    # cat attrs
    5000 100000 1000000 10 1000

Target IDs
----------

Some types of address spaces support multiple monitoring targets. For example, the virtual memory address spaces monitoring can have multiple processes as the monitoring targets. Users can set the targets by writing relevant id values of the targets to, and get the ids of the current targets by reading from, the ``target_ids`` file. In case of the virtual address spaces monitoring, the values should be pids of the monitoring target processes. For example, the below commands set processes having pids 42 and 4242 as the monitoring targets and check the result::

    # cd <debugfs>/damon
    # echo 42 4242 > target_ids
    # cat target_ids
    42 4242

Note that setting the target ids doesn't start the monitoring.

Turning On/Off
--------------

Setting the files as described above has no effect unless you explicitly start the monitoring.
You can start, stop, and check the current status of the monitoring by writing to and reading from the ``monitor_on`` file. Writing ``on`` to the file starts the monitoring of the targets with the attributes. Writing ``off`` to the file stops the monitoring. DAMON also stops if all targets are invalidated (in the case of virtual memory monitoring, target processes are invalidated when terminated). The below example commands turn the monitoring on and off, and check its status::

    # cd <debugfs>/damon
    # echo on > monitor_on
    # echo off > monitor_on
    # cat monitor_on
    off

Please note that you cannot write to the above-mentioned debugfs files while the monitoring is turned on. If you write to the files while DAMON is running, an error code such as ``-EBUSY`` will be returned.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h |   3 +
 mm/damon/Kconfig      |   9 +
 mm/damon/Makefile     |   1 +
 mm/damon/core.c       |  47 +
 mm/damon/dbgfs.c      | 386 ++
 5 files changed, 446 insertions(+)
 create mode 100644 mm/damon/dbgfs.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 72cf5ebd35fe..b17e808a9cae 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -237,9 +237,12 @@ unsigned int damon_nr_regions(struct damon_target *t);
 struct damon_ctx *damon_new_ctx(void);
 void damon_destroy_ctx(struct damon_ctx *ctx);
+int damon_set_targets(struct damon_ctx *ctx,
+		unsigned long *ids, ssize_t nr_ids);
 int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
 		unsigned long aggr_int, unsigned long primitive_upd_int,
 		unsigned long min_nr_reg, unsigned long max_nr_reg);
+int damon_nr_running_ctxs(void);
 int damon_start(struct damon_ctx **ctxs, int nr_ctxs);
 int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);

diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index 8ae080c52950..72f1683ba0ee 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -21,4 +21,13 @@ config DAMON_VADDR
	  This builds the default data access monitoring primitives for
	  DAMON that works for virtual
address spaces. +config DAMON_DBGFS + bool "DAMON debugfs interface" + depends on DAMON_VADDR && DEBUG_FS + help + This builds the debugfs interface for DAMON. The user space admins + can use the interface for arbitrary data access monitoring. + + If unsure, say N. + endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 6ebbd08aed67..fed4be3bace3 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_DAMON):= core.o obj-$(CONFIG_DAMON_VADDR) += vaddr.o +obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o diff --git a/mm/damon/core.c b/mm/damon/core.c index 912112662d0c..cad2b4cee39d 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -172,6 +172,39 @@ void damon_destroy_ctx(struct dam
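To make the ``attrs`` file semantics above concrete, below is a hypothetical user-space model of the validation such a write implies: exactly five unsigned integers (sampling, aggregation, and regions-update intervals, plus min/max region counts). The field names and the error behavior for malformed input are illustrative assumptions, not the kernel code.

```python
def parse_attrs(line):
    """Model an 'attrs'-style write: exactly five non-negative integers
    (sample interval, aggregation interval, regions update interval,
    min nr regions, max nr regions).  Raise ValueError otherwise."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    values = [int(f) for f in fields]  # raises ValueError on non-numeric input
    if any(v < 0 for v in values):
        raise ValueError("attrs must be non-negative")
    keys = ("sample_int", "aggr_int", "regions_update_int",
            "min_nr_regions", "max_nr_regions")
    return dict(zip(keys, values))

if __name__ == "__main__":
    # Same shape as the example write in the commit message.
    print(parse_attrs("5000 100000 1000000 10 1000"))
```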
[PATCH v28 01/13] mm: Introduce Data Access MONitor (DAMON)
From: SeongJae Park

DAMON is a data access monitoring framework for the Linux kernel. The core mechanisms of DAMON make it

- accurate (the monitoring output is useful enough for DRAM level performance-centric memory management; it might be inappropriate for CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be applied online), and
- scalable (the upper-bound of the overhead is in a constant range regardless of the size of target workloads).

Using this framework, hence, we can easily write efficient kernel space data access monitoring applications. For example, the kernel's memory management mechanisms can make advanced decisions using this. Experimental data access aware optimization works that incurred high access monitoring overhead could be implemented again on top of this. Due to its simple and flexible interface, providing a user space interface would also be easy. Then, user space users who have some special workloads can write personalized applications for better understanding and optimization of their workloads and systems.

===

Nevertheless, this commit defines and implements only the basic access check part, without the overhead-accuracy handling core logic. The basic access check works as below. The output of DAMON says which memory regions are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting the ``sampling interval`` and ``aggregation interval``. In detail, DAMON checks access to each page per ``sampling interval`` and aggregates the results; in other words, it counts the number of accesses to each region. After each ``aggregation interval`` passes, DAMON calls the callback functions that were previously registered by users, so that users can read the aggregated results, and then clears the results.
This can be described by the below simple pseudo-code::

    init()
    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        if time() % update_interval == 0:
            update()
        sleep(sampling interval)

The target regions are constructed at the beginning of the monitoring and updated after each ``regions_update_interval``, because the target regions could change dynamically (e.g., due to mmap() or memory hotplug). The monitoring overhead of this mechanism will arbitrarily increase as the size of the target workload grows.

The basic monitoring primitives for the actual access check and the dynamic target regions construction aren't in the core part of DAMON. Instead, it allows users to implement their own primitives that are optimized for their use case and configure DAMON to use them. In other words, users cannot use the current version of DAMON without some additional work. Following commits will implement the core mechanisms for the overhead-accuracy control and default primitives implementations.
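The pseudo-code above can be turned into a tiny runnable model. This is only an illustration of the control flow (sampling, aggregation, callback invocation, and reset) with a simulated clock instead of real sleeping; none of the names below exist in the kernel.

```python
def run_monitor(accessed, pages, sampling_interval, aggregation_interval,
                callbacks, total_time):
    """Toy model of the DAMON core loop: check each page once per
    sampling interval, aggregate the counts, and invoke callbacks then
    reset after each aggregation interval (simulated clock)."""
    nr_accesses = {page: 0 for page in pages}
    now = 0
    while now < total_time:
        for page in pages:
            if accessed(page, now):
                nr_accesses[page] += 1
        now += sampling_interval          # stands in for sleep()
        if now % aggregation_interval == 0:
            for callback in callbacks:
                callback(now, dict(nr_accesses))
            for page in pages:            # clear the aggregated results
                nr_accesses[page] = 0

results = []
run_monitor(
    accessed=lambda page, now: page == "hot",   # only "hot" is ever accessed
    pages=["hot", "cold"],
    sampling_interval=5,
    aggregation_interval=100,
    callbacks=[lambda t, counts: results.append((t, counts))],
    total_time=200,
)
# Two aggregations (t=100, t=200); "hot" was seen in all 20 samples of each.
print(results)
```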
Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h | 167 ++
 mm/Kconfig            |   3 +
 mm/Makefile           |   1 +
 mm/damon/Kconfig      |  15 ++
 mm/damon/Makefile     |   3 +
 mm/damon/core.c       | 318 ++
 6 files changed, 507 insertions(+)
 create mode 100644 include/linux/damon.h
 create mode 100644 mm/damon/Kconfig
 create mode 100644 mm/damon/Makefile
 create mode 100644 mm/damon/core.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
new file mode 100644
index ..2f652602b1ea
--- /dev/null
+++ b/include/linux/damon.h
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DAMON api
+ *
+ * Author: SeongJae Park
+ */
+
+#ifndef _DAMON_H_
+#define _DAMON_H_
+
+#include
+#include
+#include
+
+struct damon_ctx;
+
+/**
+ * struct damon_primitive	Monitoring primitives for given use cases.
+ *
+ * @init:			Initialize primitive-internal data structures.
+ * @update:			Update primitive-internal data structures.
+ * @prepare_access_checks:	Prepare next access check of target regions.
+ * @check_accesses:		Check the accesses to target regions.
+ * @reset_aggregated:		Reset aggregated accesses monitoring results.
+ * @target_valid:		Determine if the target is valid.
+ * @cleanup:			Clean up the context.
+ *
+ * DAMON can be extended for various address spaces and usages.  For this,
+ * users should register the low level primitives for their target address
+ * space and usecase via the &damon_ctx.primitive.  Then, the monitoring thread
+ * (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
+ * the monitoring, @update after each
[PATCH v28 04/13] mm/idle_page_tracking: Make PG_idle reusable
From: SeongJae Park

PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page Tracking and the reclaim logic, to work concurrently without interfering with each other. That is, when one of them needs to clear the Accessed bit, it sets PG_young to represent the previous state of the bit. And when one of them needs to read the bit, if the bit is cleared, it further reads PG_young to know whether the other has cleared the bit in the meantime or not.

We could add another page flag and extend the mechanism to use that flag if we need to add another concurrent PTE Accessed bit user subsystem. However, the flag space is limited. Meanwhile, if the new subsystem is mutually exclusive with IDLE_PAGE_TRACKING, or interfering with it is not a real problem, it would be ok to simply reuse the PG_idle flag. However, that's currently impossible, because the flags are dependent on IDLE_PAGE_TRACKING.

To allow such reuse of the flags, this commit separates the PG_young and PG_idle flag logic from IDLE_PAGE_TRACKING and introduces a new kernel config, 'PAGE_IDLE_FLAG'. Hence, a new subsystem will be able to reuse PG_idle without depending on IDLE_PAGE_TRACKING. In the next commit, DAMON's reference implementation of the virtual memory address space monitoring primitives will use it.

Signed-off-by: SeongJae Park
Reviewed-by: Shakeel Butt
---
 include/linux/page-flags.h     |  4 ++--
 include/linux/page_ext.h       |  2 +-
 include/linux/page_idle.h      |  6 +++---
 include/trace/events/mmflags.h |  2 +-
 mm/Kconfig                     |  8
 mm/page_ext.c                  | 12 +++-
 mm/page_idle.c                 | 10 --
 7 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..6be2c1e2fb48 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -131,7 +131,7 @@ enum pageflags {
 #ifdef CONFIG_MEMORY_FAILURE
	PG_hwpoison,		/* hardware poisoned page.
Don't touch */ #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) PG_young, PG_idle, #endif @@ -436,7 +436,7 @@ PAGEFLAG_FALSE(HWPoison) #define __PG_HWPOISON 0 #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) TESTPAGEFLAG(Young, young, PF_ANY) SETPAGEFLAG(Young, young, PF_ANY) TESTCLEARFLAG(Young, young, PF_ANY) diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index aff81ba31bd8..fabb2e1e087f 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -19,7 +19,7 @@ struct page_ext_operations { enum page_ext_flags { PAGE_EXT_OWNER, PAGE_EXT_OWNER_ALLOCATED, -#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, #endif diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 1e894d34bdce..d8a6aecf99cb 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -6,7 +6,7 @@ #include #include -#ifdef CONFIG_IDLE_PAGE_TRACKING +#ifdef CONFIG_PAGE_IDLE_FLAG #ifdef CONFIG_64BIT static inline bool page_is_young(struct page *page) @@ -106,7 +106,7 @@ static inline void clear_page_idle(struct page *page) } #endif /* CONFIG_64BIT */ -#else /* !CONFIG_IDLE_PAGE_TRACKING */ +#else /* !CONFIG_PAGE_IDLE_FLAG */ static inline bool page_is_young(struct page *page) { @@ -135,6 +135,6 @@ static inline void clear_page_idle(struct page *page) { } -#endif /* CONFIG_IDLE_PAGE_TRACKING */ +#endif /* CONFIG_PAGE_IDLE_FLAG */ #endif /* _LINUX_MM_PAGE_IDLE_H */ diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 629c7a0eaff2..ea434bbc2d2b 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -73,7 +73,7 @@ #define IF_HAVE_PG_HWPOISON(flag,string) #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && 
defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 #define IF_HAVE_PG_IDLE(flag,string) ,{1UL << flag, string}
 #else
 #define IF_HAVE_PG_IDLE(flag,string)

diff --git a/mm/Kconfig b/mm/Kconfig
index 04b66c8df24a..7be2bc06b7d8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -770,10 +770,18 @@ config DEFERRED_STRUCT_PAGE_INIT
	  lifetime of the system until these kthreads finish the
	  initialisation.

+config PAGE_IDLE_FLAG
+	bool "Add PG_idle and PG_young flags"
+	help
+	  This feature adds PG_idle and PG_young flags in 'struct page'.  PTE
+	  Accessed bit writers can set the state of the bit in the flags so
+	  that other PTE Accessed bit readers are not disturbed.
+
 config IDLE_PAGE_TRACKING
	bool "Enable idle page tracking"
	dep
[PATCH v28 02/13] mm/damon/core: Implement region-based sampling
From: SeongJae Park

To avoid an unbounded increase of the overhead, DAMON groups adjacent pages that are assumed to have the same access frequencies into a region. As long as the assumption (pages in a region have the same access frequencies) is kept, only one page in the region needs to be checked. Thus, for each ``sampling interval``,

1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page was accessed in the meantime, and
4. increases the access count of the region if so.

Therefore, the monitoring overhead is controllable by adjusting the number of regions. DAMON allows both the underlying primitives and user callbacks to adjust regions for this trade-off. In other words, this commit makes DAMON use not only time-based sampling but also space-based sampling.

This scheme, however, cannot preserve the quality of the output if the assumption is not guaranteed. The next commit will address this problem.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h |  77 ++-
 mm/damon/core.c       | 143 --
 2 files changed, 213 insertions(+), 7 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 2f652602b1ea..67db309ad61b 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -12,6 +12,48 @@
 #include
 #include

+/**
+ * struct damon_addr_range - Represents an address region of [@start, @end).
+ * @start:	Start address of the region (inclusive).
+ * @end:	End address of the region (exclusive).
+ */
+struct damon_addr_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+/**
+ * struct damon_region - Represents a monitoring target region.
+ * @ar:			The address range of the region.
+ * @sampling_addr:	Address of the sample for the next access check.
+ * @nr_accesses:	Access frequency of this region.
+ * @list:		List head for siblings.
+ */
+struct damon_region {
+	struct damon_addr_range ar;
+	unsigned long sampling_addr;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/**
+ * struct damon_target - Represents a monitoring target.
+ * @id:			Unique identifier for this target.
+ * @regions_list:	Head of the monitoring target regions of this target.
+ * @list:		List head for siblings.
+ *
+ * Each monitoring context could have multiple targets.  For example, a context
+ * for virtual memory address spaces could have multiple target processes.  The
+ * @id of each target should be unique among the targets of the context.  For
+ * example, in the virtual address monitoring context, it could be a pidfd or
+ * an address of an mm_struct.
+ */
+struct damon_target {
+	unsigned long id;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
 struct damon_ctx;

 /**
@@ -36,7 +78,7 @@ struct damon_ctx;
  *
  * @init should initialize primitive-internal data structures.  For example,
  * this could be used to construct proper monitoring target regions and link
- * those to @damon_ctx.target.
+ * those to @damon_ctx.adaptive_targets.
  * @update should update the primitive-internal data structures.  For example,
  * this could be used to update monitoring target regions for current status.
  * @prepare_access_checks should manipulate the monitoring regions to be
@@ -130,7 +172,7 @@ struct damon_callback {
  * @primitive:	Set of monitoring primitives for given use cases.
  * @callback:	Set of callbacks for monitoring events notifications.
  *
- * @target:	Pointer to the user-defined monitoring target.
+ * @region_targets:	Head of monitoring targets (&struct damon_target) list.
 */
 struct damon_ctx {
	unsigned long sample_interval;
@@ -149,11 +191,40 @@ struct damon_ctx {
	struct damon_primitive primitive;
	struct damon_callback callback;

-	void *target;
+	struct list_head region_targets;
 };

+#define damon_next_region(r) \
+	(container_of(r->list.next, struct damon_region, list))
+
+#define damon_prev_region(r) \
+	(container_of(r->list.prev, struct damon_region, list))
+
+#define damon_for_each_region(r, t) \
+	list_for_each_entry(r, &t->regions_list, list)
+
+#define damon_for_each_region_safe(r, next, t) \
+	list_for_each_entry_safe(r, next, &t->regions_list, list)
+
+#define damon_for_each_target(t, ctx) \
+	list_for_each_entry(t, &(ctx)->region_targets, list)
+
+#define damon_for_each_target_safe(t, next, ctx) \
+	list_for_each_entry_safe(t, next, &(ctx)->region_targets, list)
+
 #ifdef CONFIG_DAMON

+struct damon_region *damon_new_region(unsigned long start, unsigned long end);
+inline void damon_insert_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next);
+void damon_add_region(stru
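The key property of the region-based sampling described in this patch, that per-interval work depends on the number of regions rather than on the size of the monitored memory, can be demonstrated with a toy model. The names here are illustrative only.

```python
import random

def sample_once(regions):
    """One DAMON-style sampling pass: pick a single page per region and
    check only that page, so the work per pass is len(regions), not the
    total number of pages.  'regions' maps a name to a (start, end) page
    range; returns the number of access checks performed."""
    checks = 0
    for start, end in regions.values():
        _sampling_addr = random.randrange(start, end)  # one page per region
        checks += 1  # an accessed(_sampling_addr) check would happen here
    return checks

if __name__ == "__main__":
    small = {"heap": (0, 1_000), "stack": (1_000, 2_000)}
    huge = {"heap": (0, 10_000_000), "stack": (10_000_000, 20_000_000)}
    # Overhead depends only on the region count, not on workload size.
    print(sample_once(small), sample_once(huge))
    # → 2 2
```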
[PATCH v28 00/13] Introduce Data Access MONitor (DAMON)
From: SeongJae Park

Changes from Previous Version (v27)
===

- Rebase on latest -mm tree (v5.12-rc7-mmots-2021-04-11-20-49)
- dbgfs: Fix wrong failure handlings (Stefan Nuernberger)
- dbgfs: Change return type of 'dbgfs_fill_ctx_dir()' to void (Greg KH)

Introduction

DAMON is a data access monitoring framework for the Linux kernel. The core mechanisms of DAMON, called 'region based sampling' and 'adaptive regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this patchset for the details), make it

- accurate (the monitored information is useful for DRAM level memory management; it might not be appropriate for cache-level accuracy, though),
- light-weight (the monitoring overhead is low enough to be applied online while making no impact on the performance of the target workloads), and
- scalable (the upper-bound of the instrumentation overhead is controllable regardless of the size of target workloads).

Using this framework, therefore, several memory management mechanisms such as reclamation and THP can be optimized to be aware of real data access patterns. Experimental access pattern aware memory management optimization works that incurred high instrumentation overhead will be able to have another try.

Though DAMON is for kernel subsystems, it can be easily exposed to user space by writing a DAMON-wrapper kernel subsystem. Then, user space users who have some special workloads will be able to write personalized tools or applications for deeper understanding and specialized optimizations of their systems.

Long-term Plan
--

DAMON is part of a project called Data Access-aware Operating System (DAOS). As the name implies, I want to improve the performance and efficiency of systems using fine-grained data access patterns. The optimizations are for both kernel and user spaces. I will therefore modify or create kernel subsystems, export some of those to user space, and implement user space libraries / tools. Below shows the layers and components for the project.
---
 Primitives:     PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
 Framework:      DAMON
 Features:       DAMOS, virtual addr, physical addr, ...
 Applications:   DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^  KERNEL SPACE  ^^^^^^^^^^^^^^^^^^^^^

 Raw Interface:  debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...

vvvvvvvvvvvvvvvvvvvvv  USER SPACE  vvvvvvvvvvvvvvvvvvvvvvv
 Library:        (libdamon), ...
 Tools:          DAMO, (perf), ...
---

The components in parentheses or marked as '...' are not implemented yet but are in the future plan. IOW, those are the TODO tasks of the DAOS project. For more detail, please refer to the plans: https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjp...@amazon.com/

Evaluations
===

We evaluated DAMON's overhead, monitoring quality and usefulness using 24 realistic workloads on my QEMU/KVM based virtual machine, running a kernel to which the v24 DAMON patchset is applied.

DAMON is lightweight. It increases system memory usage by 0.39% and slows target workloads down by 1.16%.

DAMON is accurate and useful for memory management optimizations. An experimental DAMON-based operation scheme for THP, namely 'ethp', removes 76.15% of THP memory overheads while preserving 51.25% of THP speedup. Another experimental DAMON-based 'proactive reclamation' implementation, 'prcl', reduces 93.38% of residential sets and 23.63% of system memory footprint while incurring only 1.22% runtime overhead in the best case (parsec3/freqmine).

NOTE that the experimental THP optimization and proactive reclamation are not for production but only for proof of concepts. Please refer to the official document[1] or the "Documentation/admin-guide/mm: Add a document for DAMON" patch in this patchset for the detailed evaluation setup and results.

[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html

Real-world User Story
=====

In summary, DAMON has been used on production systems and proved its usefulness.

DAMON as a profiler
---

We analyzed the characteristics of large scale production systems of our customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs.
From this, we were able to find the interesting things below.

There were obviously different access patterns under the idle workload and the active workload. Under the idle workload, it accessed large memory regions with low frequency, while the active workload accessed small memory regions with high frequency.

DAMON found a 7GB memory region that shows obviously high access frequency under the active workload. We believe this is the performance-effective working set and it needs to be protected.

There was a
Re: [PATCH v2 00/16] Multigenerational LRU Framework
From: SeongJae Park Hello, Very interesting work, thank you for sharing this :) On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao wrote: > What's new in v2 > > Special thanks to Jens Axboe for reporting a regression in buffered > I/O and helping test the fix. Is the discussion open? If so, could you please give me a link? > > This version includes the support of tiers, which represent levels of > usage from file descriptors only. Pages accessed N times via file > descriptors belong to tier order_base_2(N). Each generation contains > at most MAX_NR_TIERS tiers, and they require additional MAX_NR_TIERS-2 > bits in page->flags. In contrast to moving across generations which > requires the lru lock, moving across tiers only involves an atomic > operation on page->flags and therefore has a negligible cost. A > feedback loop modeled after the well-known PID controller monitors the > refault rates across all tiers and decides when to activate pages from > which tiers, on the reclaim path. > > This feedback model has a few advantages over the current feedforward > model: > 1) It has a negligible overhead in the buffered I/O access path >because activations are done in the reclaim path. > 2) It takes mapped pages into account and avoids overprotecting pages >accessed multiple times via file descriptors. > 3) More tiers offer better protection to pages accessed more than >twice when buffered-I/O-intensive workloads are under memory >pressure. > > The fio/io_uring benchmark shows 14% improvement in IOPS when randomly > accessing Samsung PM981a in the buffered I/O mode. Improvement under memory pressure, right? How much pressure? [...] > > Differential scans via page tables > -- > Each differential scan discovers all pages that have been referenced > since the last scan. Specifically, it walks the mm_struct list > associated with an lruvec to scan page tables of processes that have > been scheduled since the last scan. 
Does this mean it scans only the virtual address spaces of processes, and therefore pages in the page cache that are not mmap()-ed will not be scanned?

> The cost of each differential scan
> is roughly proportional to the number of referenced pages it
> discovers. Unless address spaces are extremely sparse, page tables
> usually have better memory locality than the rmap. The end result is
> generally a significant reduction in CPU usage, for workloads using a
> large amount of anon memory.

When and how frequently does it scan?

Thanks,
SeongJae Park

[...]
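The tier assignment Yu describes (pages accessed N times via file descriptors belong to tier order_base_2(N), with at most MAX_NR_TIERS tiers per generation) can be modeled as below. The cap value and the exact capping behavior are my reading of the description, and order_base_2 is modeled after the kernel macro of that name (ceiling of log2).

```python
MAX_NR_TIERS = 4  # illustrative value, not taken from the patchset

def order_base_2(n):
    """Ceiling of log2(n) for n >= 1, like the kernel's order_base_2."""
    return max(n - 1, 0).bit_length()

def tier_of(nr_file_accesses):
    """Tier for a page accessed N times via file descriptors,
    capped at the last tier a generation can hold."""
    return min(order_base_2(nr_file_accesses), MAX_NR_TIERS - 1)

if __name__ == "__main__":
    # accesses: 1 -> tier 0, 2 -> 1, 3 -> 2, 4 -> 2, 100 -> capped at 3
    print([tier_of(n) for n in (1, 2, 3, 4, 100)])
    # → [0, 1, 2, 2, 3]
```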
Re: [PATCH v27 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park On Thu, 8 Apr 2021 13:48:48 + SeongJae Park wrote: > From: SeongJae Park > > DAMON is designed to be used by kernel space code such as the memory > management subsystems, and therefore it provides only kernel space API. > That said, letting the user space control DAMON could provide some > benefits to them. For example, it will allow user space to analyze > their specific workloads and make their own special optimizations. > > For such cases, this commit implements a simple DAMON application kernel > module, namely 'damon-dbgfs', which merely wraps the DAMON api and > exports those to the user space via the debugfs. > [...] > +/* > + * Functions for the initialization > + */ > + > +static int __init damon_dbgfs_init(void) > +{ > + int rc; > + > + dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL); > + if (!dbgfs_ctxs) { > + pr_err("%s: dbgfs ctxs alloc failed\n", __func__); > + return -ENOMEM; > + } > + dbgfs_ctxs[0] = dbgfs_new_ctx(); > + if (!dbgfs_ctxs[0]) { > + pr_err("%s: dbgfs ctx alloc failed\n", __func__); > + return -ENOMEM; My colleague, Stefan found 'dbgfs_ctxs' is not freed here. Similar in below '__damon_dbgfs_init()' failure handling. I will fix these in the next version. Reported-by: Stefan Nuernberger Thanks, SeongJae Park > + } > + dbgfs_nr_ctxs = 1; > + > + rc = __damon_dbgfs_init(); > + if (rc) > + pr_err("%s: dbgfs init failed\n", __func__); > + > + return rc; > +} > + > +module_init(damon_dbgfs_init); > -- > 2.17.1 >
Re: [PATCH v27 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

On Sat, 10 Apr 2021 10:55:01 +0200 Greg KH wrote:
> On Thu, Apr 08, 2021 at 01:48:48PM +0000, SeongJae Park wrote:
> > +static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx)
> > +{
> > +	const char * const file_names[] = {"attrs", "target_ids"};
> > +	const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops};
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(file_names); i++)
> > +		debugfs_create_file(file_names[i], 0600, dir, ctx, fops[i]);
> > +
> > +	return 0;
> > +}
>
> Why do you have a function that can only return 0, actually return
> something?  It should be void, right?

You're right, I will make it return void in the next spin.

Thanks,
SeongJae Park

> thanks,
> greg k-h
>
[PATCH v27 13/13] MAINTAINERS: Update for DAMON
From: SeongJae Park This commit updates MAINTAINERS file for DAMON related files. Signed-off-by: SeongJae Park --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index ad650102f950..0df746019eb9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5003,6 +5003,18 @@ F: net/ax25/ax25_out.c F: net/ax25/ax25_timer.c F: net/ax25/sysctl_net_ax25.c +DATA ACCESS MONITOR +M: SeongJae Park +L: linux...@kvack.org +S: Maintained +F: Documentation/admin-guide/mm/damon/* +F: Documentation/vm/damon/* +F: include/linux/damon.h +F: include/trace/events/damon.h +F: mm/damon/* +F: tools/damon/* +F: tools/testing/selftests/damon/* + DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER L: net...@vger.kernel.org S: Orphan -- 2.17.1
[PATCH v27 12/13] mm/damon: Add user space selftests
From: SeongJae Park This commit adds a simple user space tests for DAMON. The tests are using kselftest framework. Signed-off-by: SeongJae Park --- tools/testing/selftests/damon/Makefile| 7 ++ .../selftests/damon/_chk_dependency.sh| 28 ++ .../testing/selftests/damon/debugfs_attrs.sh | 98 +++ 3 files changed, 133 insertions(+) create mode 100644 tools/testing/selftests/damon/Makefile create mode 100644 tools/testing/selftests/damon/_chk_dependency.sh create mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile new file mode 100644 index ..8a3f2cd9fec0 --- /dev/null +++ b/tools/testing/selftests/damon/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for damon selftests + +TEST_FILES = _chk_dependency.sh +TEST_PROGS = debugfs_attrs.sh + +include ../lib.mk diff --git a/tools/testing/selftests/damon/_chk_dependency.sh b/tools/testing/selftests/damon/_chk_dependency.sh new file mode 100644 index ..e090836c2bf7 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_dependency.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +DBGFS=/sys/kernel/debug/damon + +if [ $EUID -ne 0 ]; +then + echo "Run as root" + exit $ksft_skip +fi + +if [ ! -d $DBGFS ] +then + echo "$DBGFS not found" + exit $ksft_skip +fi + +for f in attrs target_ids monitor_on +do + if [ ! -f "$DBGFS/$f" ] + then + echo "$f not found" + exit 1 + fi +done diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh new file mode 100755 index ..4a8ab4910ee4 --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -0,0 +1,98 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source ./_chk_dependency.sh + +# Test attrs file +file="$DBGFS/attrs" + +ORIG_CONTENT=$(cat $file) + +echo 1 2 3 4 5 > $file +if [ $? 
-ne 0 ] +then + echo "$file write failed" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo 1 2 3 4 > $file +if [ $? -eq 0 ] +then + echo "$file write success (should failed)" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2 3 4 5" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +# Test target_ids file +file="$DBGFS/target_ids" + +ORIG_CONTENT=$(cat $file) + +echo "1 2 3 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo "1 2 abc 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo abc 2 3 > $file +if [ $? -ne 0 ] +then + echo "$file wrong value write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not cleared" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo > $file +if [ $? -ne 0 ] +then + echo "$file init fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not initialized" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +echo "PASS" -- 2.17.1
[PATCH v27 10/13] Documentation: Add documents for DAMON
From: SeongJae Park This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Signed-off-by: SeongJae Park --- Documentation/admin-guide/mm/damon/guide.rst | 158 + Documentation/admin-guide/mm/damon/index.rst | 15 ++ Documentation/admin-guide/mm/damon/plans.rst | 29 +++ Documentation/admin-guide/mm/damon/start.rst | 114 + Documentation/admin-guide/mm/damon/usage.rst | 112 + Documentation/admin-guide/mm/index.rst | 1 + Documentation/vm/damon/api.rst | 20 ++ Documentation/vm/damon/design.rst| 166 + Documentation/vm/damon/eval.rst | 232 +++ Documentation/vm/damon/faq.rst | 58 + Documentation/vm/damon/index.rst | 31 +++ Documentation/vm/index.rst | 1 + 12 files changed, 937 insertions(+) create mode 100644 Documentation/admin-guide/mm/damon/guide.rst create mode 100644 Documentation/admin-guide/mm/damon/index.rst create mode 100644 Documentation/admin-guide/mm/damon/plans.rst create mode 100644 Documentation/admin-guide/mm/damon/start.rst create mode 100644 Documentation/admin-guide/mm/damon/usage.rst create mode 100644 Documentation/vm/damon/api.rst create mode 100644 Documentation/vm/damon/design.rst create mode 100644 Documentation/vm/damon/eval.rst create mode 100644 Documentation/vm/damon/faq.rst create mode 100644 Documentation/vm/damon/index.rst diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst new file mode 100644 index ..f52dc1669bb1 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/guide.rst @@ -0,0 +1,158 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +Optimization Guide +== + +This document helps you estimating the amount of benefit that you could get +from DAMON-based optimizations, and describes how you could achieve it. You +are assumed to already read :doc:`start`. + + +Check The Signs +=== + +No optimization can provide same extent of benefit to every case. Therefore +you should first guess how much improvements you could get using DAMON. 
If +some of below conditions match your situation, you could consider using DAMON. + +- *Low IPC and High Cache Miss Ratios.* Low IPC means most of the CPU time is + spent waiting for the completion of time-consuming operations such as memory + access, while high cache miss ratios mean the caches don't help it well. + DAMON is not for cache level optimization, but DRAM level. However, + improving DRAM management will also help this case by reducing the memory + operation latency. +- *Memory Over-commitment and Unknown Users.* If you are doing memory + overcommitment and you cannot control every user of your system, a memory + bank run could happen at any time. You can estimate when it will happen + based on DAMON's monitoring results and act earlier to avoid or deal better + with the crisis. +- *Frequent Memory Pressure.* Frequent memory pressure means your system has + wrong configurations or memory hogs. DAMON will help you find the right + configuration and/or the criminals. +- *Heterogeneous Memory System.* If your system is utilizing memory devices + that placed between DRAM and traditional hard disks, such as non-volatile + memory or fast SSDs, DAMON could help you utilizing the devices more + efficiently. + + +Profile +=== + +If you found some positive signals, you could start by profiling your workloads +using DAMON. Find major workloads on your systems and analyze their data +access pattern to find something wrong or can be improved. The DAMON user +space tool (``damo``) will be useful for this. You can get ``damo`` from +https://github.com/awslabs/damo. + +We recommend you to start from working set size distribution check using ``damo +report wss``. If the distribution is ununiform or quite different from what +you estimated, you could consider `Memory Configuration`_ optimization. + +Then, review the overall access pattern in heatmap form using ``damo report +heats``. 
If it shows a simple pattern consists of a small number of memory +regions having high contrast of access temperature, you could consider manual +`Program Modification`_. + +If you still want to absorb more benefits, you should develop `Personalized +DAMON Application`_ for your special case. + +You don't need to take only one approach among the above plans, but you could +use multiple of the above approaches to maximize the benefit. + + +Optimize + + +If the profiling result also says it's worth trying some optimization, you +could consider below approaches. Note that some of the below approaches assume +that your systems are configured with swap devices or other types of auxiliary +memory so that you don't strictly required to accommodate the whole working set +in the main memory. Most
[PATCH v27 11/13] mm/damon: Add kunit tests
From: SeongJae Park This commit adds kunit based unit tests for the core and the virtual address spaces monitoring primitives of DAMON. Signed-off-by: SeongJae Park Reviewed-by: Brendan Higgins --- mm/damon/Kconfig | 36 + mm/damon/core-test.h | 253 mm/damon/core.c | 7 + mm/damon/dbgfs-test.h | 126 mm/damon/dbgfs.c | 2 + mm/damon/vaddr-test.h | 328 ++ mm/damon/vaddr.c | 7 + 7 files changed, 759 insertions(+) create mode 100644 mm/damon/core-test.h create mode 100644 mm/damon/dbgfs-test.h create mode 100644 mm/damon/vaddr-test.h diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 72f1683ba0ee..455995152697 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,6 +12,18 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_KUNIT_TEST + bool "Test for damon" if !KUNIT_ALL_TESTS + depends on DAMON && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_VADDR bool "Data access monitoring primitives for virtual address spaces" depends on DAMON && MMU @@ -21,6 +33,18 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual address spaces. +config DAMON_VADDR_KUNIT_TEST + bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS + depends on DAMON_VADDR && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON virtual addresses primitives Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_DBGFS bool "DAMON debugfs interface" depends on DAMON_VADDR && DEBUG_FS @@ -30,4 +54,16 @@ config DAMON_DBGFS If unsure, say N. 
+config DAMON_DBGFS_KUNIT_TEST + bool "Test for damon debugfs interface" if !KUNIT_ALL_TESTS + depends on DAMON_DBGFS && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON debugfs interface Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + endmenu diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h new file mode 100644 index ..b815dfbfb5fd --- /dev/null +++ b/mm/damon/core-test.h @@ -0,0 +1,253 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Data Access Monitor Unit Tests + * + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * Author: SeongJae Park + */ + +#ifdef CONFIG_DAMON_KUNIT_TEST + +#ifndef _DAMON_CORE_TEST_H +#define _DAMON_CORE_TEST_H + +#include + +static void damon_test_regions(struct kunit *test) +{ + struct damon_region *r; + struct damon_target *t; + + r = damon_new_region(1, 2); + KUNIT_EXPECT_EQ(test, 1ul, r->ar.start); + KUNIT_EXPECT_EQ(test, 2ul, r->ar.end); + KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses); + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, 1u, damon_nr_regions(t)); + + damon_del_region(r); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_free_target(t); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static void damon_test_target(struct kunit *test) +{ + struct damon_ctx *c = damon_new_ctx(); + struct damon_target *t; + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 42ul, t->id); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_add_target(c, t); + KUNIT_EXPECT_EQ(test, 1u, nr_damon_targets(c)); + + damon_destroy_target(t); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_destroy_ctx(c); +} + +/* + * Test 
kdamond_reset_aggregated() + * + * DAMON checks access to each region and aggregates this information as the + * access frequency of each region. In detail, it increases '->nr_accesses' of + * regions that an access has confirmed. 'kdamond_reset_aggregated()' flushes + * the aggregated information ('->nr_accesses' of each regions) to the result + * buffer. As a result of the flushing, the '->nr_accesses' of regions are + * initialized to zero. + */ +static voi
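The behavior that the test description above verifies can be modeled in a few lines: aggregation bumps each region's '->nr_accesses', and the reset step flushes the values and zeroes them for the next interval. A hypothetical user-space model (not kernel code; the dict-based region is an assumption for illustration):

```python
def reset_aggregated(regions):
    """Model of kdamond_reset_aggregated(): snapshot each region's
    aggregated nr_accesses, then zero it for the next interval."""
    snapshot = [(r["start"], r["end"], r["nr_accesses"]) for r in regions]
    for r in regions:
        r["nr_accesses"] = 0
    return snapshot
```

The kunit test asserts exactly this post-condition: after the reset, every region's access counter reads zero while the flushed results still carry the old values.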
[PATCH v27 08/13] mm/damon/dbgfs: Export kdamond pid to the user space
From: SeongJae Park

For CPU usage accounting, knowing the pid of the monitoring thread could
be helpful. For example, users could use the cpuacct cgroup with the
pid.

This commit therefore exports the pid of the currently running
monitoring thread to the user space via the 'kdamond_pid' file in the
debugfs directory.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 9af844faffd4..b20c1e7742ce 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -237,6 +237,32 @@ static ssize_t dbgfs_target_ids_write(struct file *file,
 	return ret;
 }

+static ssize_t dbgfs_kdamond_pid_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	ssize_t len;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		len = scnprintf(kbuf, count, "%d\n", ctx->kdamond->pid);
+	else
+		len = scnprintf(kbuf, count, "none\n");
+	mutex_unlock(&ctx->kdamond_lock);
+	if (!len)
+		goto out;
+	len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
+
+out:
+	kfree(kbuf);
+	return len;
+}
+
 static int damon_dbgfs_open(struct inode *inode, struct file *file)
 {
 	file->private_data = inode->i_private;
@@ -258,10 +284,18 @@ static const struct file_operations target_ids_fops = {
 	.write = dbgfs_target_ids_write,
 };

+static const struct file_operations kdamond_pid_fops = {
+	.owner = THIS_MODULE,
+	.open = damon_dbgfs_open,
+	.read = dbgfs_kdamond_pid_read,
+};
+
 static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx)
 {
-	const char * const file_names[] = {"attrs", "target_ids"};
-	const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops};
+	const char * const file_names[] = {"attrs", "target_ids",
+		"kdamond_pid"};
+	const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops,
+		&kdamond_pid_fops};
 	int i;

 	for (i = 0; i < ARRAY_SIZE(file_names); i++)
--
2.17.1
[PATCH v27 09/13] mm/damon/dbgfs: Support multiple contexts
From: SeongJae Park

In some use cases, users would want to run multiple monitoring
contexts. For example, if a user wants high precision monitoring and
dedicating multiple CPUs for the job is acceptable, the user can split
the monitoring target regions into multiple small regions and create
one context for each region, because DAMON creates one monitoring
thread per context. Or, someone might want to simultaneously monitor
different address spaces, e.g., both virtual address space and physical
address space.

DAMON's API allows such usage, but 'damon-dbgfs' does not. Therefore,
only kernel space DAMON users can do multiple contexts monitoring.

This commit allows the user space DAMON users to use multiple contexts
monitoring by introducing two new 'damon-dbgfs' debugfs files,
'mk_context' and 'rm_context'. Users can create a new monitoring
context by writing the desired name of the new context to 'mk_context'.
Then, a new directory with the name and having the files for setting of
the context ('attrs', 'target_ids' and 'record') will be created under
the debugfs directory. Writing the name of the context to remove to
'rm_context' will remove the related context and directory.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 203 ++-
 1 file changed, 201 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index b20c1e7742ce..66ac7e18b1df 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -18,6 +18,7 @@
 static struct damon_ctx **dbgfs_ctxs;
 static int dbgfs_nr_ctxs;
 static struct dentry **dbgfs_dirs;
+static DEFINE_MUTEX(damon_dbgfs_lock);

 /*
  * Returns non-empty string on success, negative error code otherwise.
@@ -316,6 +317,192 @@ static struct damon_ctx *dbgfs_new_ctx(void)
 	return ctx;
 }

+static void dbgfs_destroy_ctx(struct damon_ctx *ctx)
+{
+	damon_destroy_ctx(ctx);
+}
+
+/*
+ * Make a context of @name and create a debugfs directory for it.
+ *
+ * This function should be called while holding damon_dbgfs_lock.
+ *
+ * Returns 0 on success, negative error code otherwise.
+ */
+static int dbgfs_mk_context(char *name)
+{
+	struct dentry *root, **new_dirs, *new_dir;
+	struct damon_ctx **new_ctxs, *new_ctx;
+	int err;
+
+	if (damon_nr_running_ctxs())
+		return -EBUSY;
+
+	new_ctxs = krealloc(dbgfs_ctxs, sizeof(*dbgfs_ctxs) *
+			(dbgfs_nr_ctxs + 1), GFP_KERNEL);
+	if (!new_ctxs)
+		return -ENOMEM;
+
+	new_dirs = krealloc(dbgfs_dirs, sizeof(*dbgfs_dirs) *
+			(dbgfs_nr_ctxs + 1), GFP_KERNEL);
+	if (!new_dirs) {
+		kfree(new_ctxs);
+		return -ENOMEM;
+	}
+
+	dbgfs_ctxs = new_ctxs;
+	dbgfs_dirs = new_dirs;
+
+	root = dbgfs_dirs[0];
+	if (!root)
+		return -ENOENT;
+
+	new_dir = debugfs_create_dir(name, root);
+	dbgfs_dirs[dbgfs_nr_ctxs] = new_dir;
+
+	new_ctx = dbgfs_new_ctx();
+	if (!new_ctx) {
+		debugfs_remove(new_dir);
+		dbgfs_dirs[dbgfs_nr_ctxs] = NULL;
+		return -ENOMEM;
+	}
+	dbgfs_ctxs[dbgfs_nr_ctxs] = new_ctx;
+
+	err = dbgfs_fill_ctx_dir(dbgfs_dirs[dbgfs_nr_ctxs],
+			dbgfs_ctxs[dbgfs_nr_ctxs]);
+	if (err)
+		return err;
+
+	dbgfs_nr_ctxs++;
+	return 0;
+}
+
+static ssize_t dbgfs_mk_context_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	char *kbuf;
+	char *ctx_name;
+	ssize_t ret = count;
+	int err;
+
+	kbuf = user_input_str(buf, count, ppos);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+	ctx_name = kmalloc(count + 1, GFP_KERNEL);
+	if (!ctx_name) {
+		kfree(kbuf);
+		return -ENOMEM;
+	}
+
+	/* Trim white space */
+	if (sscanf(kbuf, "%s", ctx_name) != 1) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(&damon_dbgfs_lock);
+	err = dbgfs_mk_context(ctx_name);
+	if (err)
+		ret = err;
+	mutex_unlock(&damon_dbgfs_lock);
+
+out:
+	kfree(kbuf);
+	kfree(ctx_name);
+	return ret;
+}
+
+/*
+ * Remove a context of @name and its debugfs directory.
+ *
+ * This function should be called while holding damon_dbgfs_lock.
+ *
+ * Return 0 on success, negative error code otherwise.
+ */ +static int dbgfs_rm_context(char *name) +{ + struct dentry *root, *dir, **new_dirs; + struct damon_ctx **new_ctxs; + int i, j; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + dir = debugfs_lookup(name, root); + if (!dir) + return -ENOENT; + + new_dirs = kmalloc_array(dbgfs_nr_ctxs - 1, sizeof(*
[PATCH v27 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park DAMON is designed to be used by kernel space code such as the memory management subsystems, and therefore it provides only kernel space API. That said, letting the user space control DAMON could provide some benefits to them. For example, it will allow user space to analyze their specific workloads and make their own special optimizations. For such cases, this commit implements a simple DAMON application kernel module, namely 'damon-dbgfs', which merely wraps the DAMON api and exports those to the user space via the debugfs. 'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under its debugfs directory, ``/damon/``. Attributes -- Users can read and write the ``sampling interval``, ``aggregation interval``, ``regions update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10, 1000 and check it again:: # cd /damon # echo 5000 10 100 10 1000 > attrs # cat attrs 5000 10 100 10 1000 Target IDs -- Some types of address spaces supports multiple monitoring target. For example, the virtual memory address spaces monitoring can have multiple processes as the monitoring targets. Users can set the targets by writing relevant id values of the targets to, and get the ids of the current targets by reading from the ``target_ids`` file. In case of the virtual address spaces monitoring, the values should be pids of the monitoring target processes. For example, below commands set processes having pids 42 and 4242 as the monitoring targets and check it again:: # cd /damon # echo 42 4242 > target_ids # cat target_ids 42 4242 Note that setting the target ids doesn't start the monitoring. Turning On/Off -- Setting the files as described above doesn't incur effect unless you explicitly start the monitoring. 
You can start, stop, and check the current status of the monitoring by writing to and reading from the ``monitor_on`` file. Writing ``on`` to the file starts the monitoring of the targets with the attributes. Writing ``off`` to the file stops those. DAMON also stops if every targets are invalidated (in case of the virtual memory monitoring, target processes are invalidated when terminated). Below example commands turn on, off, and check the status of DAMON:: # cd /damon # echo on > monitor_on # echo off > monitor_on # cat monitor_on off Please note that you cannot write to the above-mentioned debugfs files while the monitoring is turned on. If you write to the files while DAMON is running, an error code such as ``-EBUSY`` will be returned. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 3 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/core.c | 47 ++ mm/damon/dbgfs.c | 382 ++ 5 files changed, 442 insertions(+) create mode 100644 mm/damon/dbgfs.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 72cf5ebd35fe..b17e808a9cae 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -237,9 +237,12 @@ unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); +int damon_set_targets(struct damon_ctx *ctx, + unsigned long *ids, ssize_t nr_ids); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, unsigned long aggr_int, unsigned long primitive_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg); +int damon_nr_running_ctxs(void); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 8ae080c52950..72f1683ba0ee 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -21,4 +21,13 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual 
address spaces. +config DAMON_DBGFS + bool "DAMON debugfs interface" + depends on DAMON_VADDR && DEBUG_FS + help + This builds the debugfs interface for DAMON. The user space admins + can use the interface for arbitrary data access monitoring. + + If unsure, say N. + endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 6ebbd08aed67..fed4be3bace3 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_DAMON):= core.o obj-$(CONFIG_DAMON_VADDR) += vaddr.o +obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o diff --git a/mm/damon/core.c b/mm/damon/core.c index 912112662d0c..cad2b4cee39d 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -172,6 +172,39 @@ void damon_destroy_ctx(struct dam
[PATCH v27 06/13] mm/damon: Add a tracepoint
From: SeongJae Park This commit adds a tracepoint for DAMON. It traces the monitoring results of each region for each aggregation interval. Using this, DAMON can easily integrated with tracepoints supporting tools such as perf. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster Reviewed-by: Steven Rostedt (VMware) --- include/trace/events/damon.h | 43 mm/damon/core.c | 7 +- 2 files changed, 49 insertions(+), 1 deletion(-) create mode 100644 include/trace/events/damon.h diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h new file mode 100644 index ..2f422f4f1fb9 --- /dev/null +++ b/include/trace/events/damon.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM damon + +#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DAMON_H + +#include +#include +#include + +TRACE_EVENT(damon_aggregated, + + TP_PROTO(struct damon_target *t, struct damon_region *r, + unsigned int nr_regions), + + TP_ARGS(t, r, nr_regions), + + TP_STRUCT__entry( + __field(unsigned long, target_id) + __field(unsigned int, nr_regions) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + ), + + TP_fast_assign( + __entry->target_id = t->id; + __entry->nr_regions = nr_regions; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + ), + + TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u", + __entry->target_id, __entry->nr_regions, + __entry->start, __entry->end, __entry->nr_accesses) +); + +#endif /* _TRACE_DAMON_H */ + +/* This part must be outside protection */ +#include diff --git a/mm/damon/core.c b/mm/damon/core.c index b36b6bdd94e2..912112662d0c 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -13,6 +13,9 @@ #include #include +#define CREATE_TRACE_POINTS +#include + /* Get a random number in [l, r) */ #define damon_rand(l, r) (l + prandom_u32_max(r - l)) @@ -388,8 +391,10 @@ static void 
kdamond_reset_aggregated(struct damon_ctx *c) damon_for_each_target(t, c) { struct damon_region *r; - damon_for_each_region(r, t) + damon_for_each_region(r, t) { + trace_damon_aggregated(t, r, damon_nr_regions(t)); r->nr_accesses = 0; + } } } -- 2.17.1
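Since the TP_printk() format above is `target_id=%lu nr_regions=%u %lu-%lu: %u`, tooling that post-processes the text output of tracepoint consumers such as perf can recover the fields with a simple regular expression. An illustrative Python sketch (the parsing helper is an assumption, not something the patch provides):

```python
import re

# Matches one damon_aggregated event as rendered by TP_printk() above.
_TRACE_RE = re.compile(
    r"target_id=(?P<target_id>\d+) nr_regions=(?P<nr_regions>\d+) "
    r"(?P<start>\d+)-(?P<end>\d+): (?P<nr_accesses>\d+)")

def parse_damon_aggregated(line):
    """Return the event fields as ints, or None if the line
    is not a damon_aggregated record."""
    m = _TRACE_RE.search(line)
    if not m:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}
```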
[PATCH v27 05/13] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space does the works as below. PTE Accessed-bit Based Access Check --- The implementation uses PTE Accessed-bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction --- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too huge unmapped areas inside the monitoring target should be removed to not take the time for the adaptive mechanism. For the reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space. 
The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spaces, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail:: (small mmap()-ed regions and munmap()-ed regions) Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster Reported-by: Guoju Fang --- include/linux/damon.h | 13 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/vaddr.c | 616 ++ 4 files changed, 639 insertions(+) create mode 100644 mm/damon/vaddr.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 0bd5d6913a6c..72cf5ebd35fe 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -246,4 +246,17 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_VADDR + +/* Monitoring primitives for virtual memory address spaces */ +void damon_va_init(struct damon_ctx *ctx); +void damon_va_update(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(void *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_VADDR */ + #endif /* _DAMON_H */ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..8ae080c52950 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,13 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_VADDR + bool "Data access monitoring primitives for virtual address spaces" + depends on DAMON && MMU + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON + that works for virtual address spaces. 
+ endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..6ebbd08aed67 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON):= core.o +obj-$(CONFIG_DAMON_VADDR) += vaddr.o diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c new file mode 100644 index ..3bc9dc9f0656 --- /dev/null +++ b/mm/damon/vaddr.c @@ -0,0 +1,616 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Primitives for Virtual Address Spaces + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-va: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the r
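The three-region construction described in the commit message above (cover every mapped area while excluding only the two largest unmapped gaps) can be sketched as a pure function over sorted, disjoint (start, end) mapped areas. This is a simplified model, not the kernel implementation; inputs with fewer than three areas are collapsed to a single covering region here for brevity:

```python
def three_regions(mapped):
    """Given sorted, disjoint (start, end) mapped areas, drop the two
    largest gaps between them and return three covering regions."""
    if len(mapped) < 3:
        return [(mapped[0][0], mapped[-1][1])]
    # Indices of the two widest gaps between consecutive areas.
    gaps = sorted(range(len(mapped) - 1),
                  key=lambda i: mapped[i + 1][0] - mapped[i][1],
                  reverse=True)[:2]
    cut1, cut2 = sorted(gaps)
    return [(mapped[0][0], mapped[cut1][1]),
            (mapped[cut1 + 1][0], mapped[cut2][1]),
            (mapped[cut2 + 1][0], mapped[-1][1])]
```

In the usual layout the two widest gaps are heap-to-mmap and mmap-to-stack, so the three results roughly correspond to heap, mmap()-ed regions, and stack.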
[PATCH v27 03/13] mm/damon: Adaptively adjust regions
From: SeongJae Park Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), the data access pattern can be dynamically changed. This will result in low monitoring quality. To keep the assumption as much as possible, DAMON adaptively merges and splits each region based on their access frequency. For each ``aggregation interval``, it compares the access frequencies of adjacent regions and merges those if the frequency difference is small. Then, after it reports and clears the aggregated access frequency of each region, it splits each region into two or three regions if the total number of regions will not exceed the user-specified maximum number of regions after the split. In this way, DAMON provides its best-effort quality and minimal overhead while keeping the upper-bound overhead that users set. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 23 +++-- mm/damon/core.c | 214 +- 2 files changed, 227 insertions(+), 10 deletions(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index 67db309ad61b..0bd5d6913a6c 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -12,6 +12,9 @@ #include #include +/* Minimal region size. Every damon_region is aligned by this. */ +#define DAMON_MIN_REGION PAGE_SIZE + /** * struct damon_addr_range - Represents an address region of [@start, @end). * @start: Start address of the region (inclusive). @@ -85,6 +88,8 @@ struct damon_ctx; * prepared for the next access check. * @check_accesses should check the accesses to each region that made after the * last preparation and update the number of observed accesses of each region. + * It should also return max number of observed accesses that made as a result + * of its update. The value will be used for regions adjustment threshold. * @reset_aggregated should reset the access monitoring results that aggregated * by @check_accesses. 
* @target_valid should check whether the target is still valid for the @@ -95,7 +100,7 @@ struct damon_primitive { void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); - void (*check_accesses)(struct damon_ctx *context); + unsigned int (*check_accesses)(struct damon_ctx *context); void (*reset_aggregated)(struct damon_ctx *context); bool (*target_valid)(void *target); void (*cleanup)(struct damon_ctx *context); @@ -172,7 +177,9 @@ struct damon_callback { * @primitive: Set of monitoring primitives for given use cases. * @callback: Set of callbacks for monitoring events notifications. * - * @region_targets:Head of monitoring targets (_target) list. + * @min_nr_regions:The minimum number of adaptive monitoring regions. + * @max_nr_regions:The maximum number of adaptive monitoring regions. + * @adaptive_targets: Head of monitoring targets (_target) list. */ struct damon_ctx { unsigned long sample_interval; @@ -191,7 +198,9 @@ struct damon_ctx { struct damon_primitive primitive; struct damon_callback callback; - struct list_head region_targets; + unsigned long min_nr_regions; + unsigned long max_nr_regions; + struct list_head adaptive_targets; }; #define damon_next_region(r) \ @@ -207,10 +216,10 @@ struct damon_ctx { list_for_each_entry_safe(r, next, >regions_list, list) #define damon_for_each_target(t, ctx) \ - list_for_each_entry(t, &(ctx)->region_targets, list) + list_for_each_entry(t, &(ctx)->adaptive_targets, list) #define damon_for_each_target_safe(t, next, ctx) \ - list_for_each_entry_safe(t, next, &(ctx)->region_targets, list) + list_for_each_entry_safe(t, next, &(ctx)->adaptive_targets, list) #ifdef CONFIG_DAMON @@ -224,11 +233,13 @@ struct damon_target *damon_new_target(unsigned long id); void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); void damon_free_target(struct damon_target *t); void damon_destroy_target(struct damon_target *t); +unsigned int 
damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long primitive_upd_int); + unsigned long aggr_int, unsigned long primitive_upd_int, + unsigned long min_nr_reg, unsigned long max_nr_reg); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/core.c b/mm/damon/core.c index 94db494dcf70..b36b6bdd94e2 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -10,8 +10,12 @@ #include
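One pass of the merge half of the adaptive mechanism can be modeled as: fuse adjacent regions whose access frequencies differ by no more than a threshold, keeping a size-weighted frequency for the merged region. A simplified Python model under that assumption (the in-kernel logic additionally respects the min/max region bounds, which is omitted here):

```python
def merge_adjacent(regions, threshold):
    """One merge pass: fuse neighboring regions whose nr_accesses
    differ by at most 'threshold', size-weighting the result."""
    merged = [dict(regions[0])]
    for r in regions[1:]:
        last = merged[-1]
        if abs(last["nr_accesses"] - r["nr_accesses"]) <= threshold:
            lsz = last["end"] - last["start"]
            rsz = r["end"] - r["start"]
            last["nr_accesses"] = (last["nr_accesses"] * lsz +
                                   r["nr_accesses"] * rsz) // (lsz + rsz)
            last["end"] = r["end"]
        else:
            merged.append(dict(r))
    return merged
```

A small threshold merges only regions with genuinely similar access temperature, which is what keeps the region count (and so the monitoring overhead) bounded without losing contrast between hot and cold areas.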
[PATCH v27 04/13] mm/idle_page_tracking: Make PG_idle reusable
From: SeongJae Park

PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page
Tracking and the reclaim logic, to work concurrently without interfering
with each other. That is, when they need to clear the Accessed bit, they
set PG_young to represent the previous state of the bit. And when they
need to read the bit, if the bit is cleared, they further read PG_young
to know whether the other user has cleared the bit meanwhile or not.

We could add another page flag and extend the mechanism to use the flag
if we need to add another concurrent PTE Accessed bit user subsystem.
However, the space for page flags is limited. Meanwhile, if the new
subsystem would be mutually exclusive with IDLE_PAGE_TRACKING, or if
interfering with it is not a real problem, it would be ok to simply reuse
the PG_idle flag. However, that is currently impossible because the
flags are dependent on IDLE_PAGE_TRACKING.

To allow such reuse of the flags, this commit separates the PG_young and
PG_idle flag logic from IDLE_PAGE_TRACKING and introduces a new kernel
config option, 'PAGE_IDLE_FLAG'. Hence, a new subsystem will be able to
reuse PG_idle without depending on IDLE_PAGE_TRACKING. In the next
commit, DAMON's reference implementation of the virtual memory address
space monitoring primitives will use it.

Signed-off-by: SeongJae Park
Reviewed-by: Shakeel Butt
---
 include/linux/page-flags.h     |  4 ++--
 include/linux/page_ext.h       |  2 +-
 include/linux/page_idle.h      |  6 +++---
 include/trace/events/mmflags.h |  2 +-
 mm/Kconfig                     |  8
 mm/page_ext.c                  | 12 +++-
 mm/page_idle.c                 | 10 --
 7 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..6be2c1e2fb48 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -131,7 +131,7 @@ enum pageflags {
 #ifdef CONFIG_MEMORY_FAILURE
 	PG_hwpoison,		/* hardware poisoned page. Don't touch */
 #endif
-#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 	PG_young,
 	PG_idle,
 #endif
@@ -436,7 +436,7 @@ PAGEFLAG_FALSE(HWPoison)
 #define __PG_HWPOISON 0
 #endif

-#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 TESTPAGEFLAG(Young, young, PF_ANY)
 SETPAGEFLAG(Young, young, PF_ANY)
 TESTCLEARFLAG(Young, young, PF_ANY)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index aff81ba31bd8..fabb2e1e087f 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -19,7 +19,7 @@ struct page_ext_operations {
 enum page_ext_flags {
 	PAGE_EXT_OWNER,
 	PAGE_EXT_OWNER_ALLOCATED,
-#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
 #endif
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
index 1e894d34bdce..d8a6aecf99cb 100644
--- a/include/linux/page_idle.h
+++ b/include/linux/page_idle.h
@@ -6,7 +6,7 @@
 #include
 #include

-#ifdef CONFIG_IDLE_PAGE_TRACKING
+#ifdef CONFIG_PAGE_IDLE_FLAG

 #ifdef CONFIG_64BIT
 static inline bool page_is_young(struct page *page)
@@ -106,7 +106,7 @@ static inline void clear_page_idle(struct page *page)
 }
 #endif /* CONFIG_64BIT */

-#else /* !CONFIG_IDLE_PAGE_TRACKING */
+#else /* !CONFIG_PAGE_IDLE_FLAG */

 static inline bool page_is_young(struct page *page)
 {
@@ -135,6 +135,6 @@ static inline void clear_page_idle(struct page *page)
 {
 }

-#endif /* CONFIG_IDLE_PAGE_TRACKING */
+#endif /* CONFIG_PAGE_IDLE_FLAG */

 #endif /* _LINUX_MM_PAGE_IDLE_H */
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 629c7a0eaff2..ea434bbc2d2b 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -73,7 +73,7 @@
 #define IF_HAVE_PG_HWPOISON(flag,string)
 #endif

-#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 #define IF_HAVE_PG_IDLE(flag,string) ,{1UL << flag, string}
 #else
 #define IF_HAVE_PG_IDLE(flag,string)
diff --git a/mm/Kconfig b/mm/Kconfig
index 56bec147bdff..0616a8b1ff0b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -771,10 +771,18 @@ config DEFERRED_STRUCT_PAGE_INIT
 	  lifetime of the system until these kthreads finish the
 	  initialisation.

+config PAGE_IDLE_FLAG
+	bool "Add PG_idle and PG_young flags"
+	help
+	  This feature adds PG_idle and PG_young flags in 'struct page'.  PTE
+	  Accessed bit writers can set the state of the bit in the flags so
+	  that other PTE Accessed bit readers are not disturbed.
+
 config IDLE_PAGE_TRACKING
 	bool "Enable idle page tracking"
 	dep
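[Editor's note] The PG_young handshake the commit message describes can be modeled in a few lines of user-space C. This is an illustrative sketch, not kernel code: the struct and function names below are invented for the example, and the real implementation manipulates page flags with atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a page's PTE Accessed bit plus the PG_young/PG_idle flags.
 * All names here are hypothetical; the kernel uses atomic page-flag ops. */
struct toy_page {
	bool accessed;	/* the PTE Accessed bit */
	bool young;	/* PG_young: "Accessed was set when someone cleared it" */
	bool idle;	/* PG_idle */
};

/* One Accessed-bit user (e.g. Idle Page Tracking) clears the bit, but
 * records the previous state in PG_young so the other user is not fooled. */
static void clear_accessed(struct toy_page *p)
{
	if (p->accessed) {
		p->accessed = false;
		p->young = true;
	}
	p->idle = true;
}

/* The other user (e.g. reclaim) reads Accessed; if it is clear, PG_young
 * tells whether the page was in fact accessed before someone cleared it. */
static bool was_accessed(struct toy_page *p)
{
	bool seen = p->accessed || p->young;

	p->young = false;	/* consume the recorded state */
	return seen;
}
```

With this protocol, a reader never misses an access merely because the other subsystem cleared the Accessed bit first.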
[PATCH v27 02/13] mm/damon/core: Implement region-based sampling
From: SeongJae Park

To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a region.
As long as the assumption (pages in a region have the same access
frequencies) is kept, only one page in the region is required to be
checked. Thus, for each ``sampling interval``,

1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.

Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON use not only time-based sampling but also space-based
sampling.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. The next commit will address this problem.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h |  77 ++-
 mm/damon/core.c       | 143 --
 2 files changed, 213 insertions(+), 7 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 2f652602b1ea..67db309ad61b 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -12,6 +12,48 @@
 #include
 #include

+/**
+ * struct damon_addr_range - Represents an address region of [@start, @end).
+ * @start:	Start address of the region (inclusive).
+ * @end:	End address of the region (exclusive).
+ */
+struct damon_addr_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+/**
+ * struct damon_region - Represents a monitoring target region.
+ * @ar:			The address range of the region.
+ * @sampling_addr:	Address of the sample for the next access check.
+ * @nr_accesses:	Access frequency of this region.
+ * @list:		List head for siblings.
+ */
+struct damon_region {
+	struct damon_addr_range ar;
+	unsigned long sampling_addr;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/**
+ * struct damon_target - Represents a monitoring target.
+ * @id:			Unique identifier for this target.
+ * @regions_list:	Head of the monitoring target regions of this target.
+ * @list:		List head for siblings.
+ *
+ * Each monitoring context could have multiple targets. For example, a context
+ * for virtual memory address spaces could have multiple target processes. The
+ * @id of each target should be unique among the targets of the context. For
+ * example, in the virtual address monitoring context, it could be a pidfd or
+ * an address of an mm_struct.
+ */
+struct damon_target {
+	unsigned long id;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
 struct damon_ctx;

 /**
@@ -36,7 +78,7 @@ struct damon_ctx;
  *
  * @init should initialize primitive-internal data structures. For example,
  * this could be used to construct proper monitoring target regions and link
- * those to @damon_ctx.target.
+ * those to @damon_ctx.adaptive_targets.
  * @update should update the primitive-internal data structures. For example,
  * this could be used to update monitoring target regions for current status.
  * @prepare_access_checks should manipulate the monitoring regions to be
@@ -130,7 +172,7 @@ struct damon_callback {
  * @primitive:	Set of monitoring primitives for given use cases.
  * @callback:	Set of callbacks for monitoring events notifications.
  *
- * @target:		Pointer to the user-defined monitoring target.
+ * @region_targets:	Head of monitoring targets (&damon_target) list.
 */
 struct damon_ctx {
 	unsigned long sample_interval;
@@ -149,11 +191,40 @@ struct damon_ctx {
 	struct damon_primitive primitive;
 	struct damon_callback callback;

-	void *target;
+	struct list_head region_targets;
 };

+#define damon_next_region(r) \
+	(container_of(r->list.next, struct damon_region, list))
+
+#define damon_prev_region(r) \
+	(container_of(r->list.prev, struct damon_region, list))
+
+#define damon_for_each_region(r, t) \
+	list_for_each_entry(r, &t->regions_list, list)
+
+#define damon_for_each_region_safe(r, next, t) \
+	list_for_each_entry_safe(r, next, &t->regions_list, list)
+
+#define damon_for_each_target(t, ctx) \
+	list_for_each_entry(t, &(ctx)->region_targets, list)
+
+#define damon_for_each_target_safe(t, next, ctx)	\
+	list_for_each_entry_safe(t, next, &(ctx)->region_targets, list)
+
 #ifdef CONFIG_DAMON

+struct damon_region *damon_new_region(unsigned long start, unsigned long end);
+inline void damon_insert_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next);
+void damon_add_region(stru
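[Editor's note] The "one check per region" idea above can be sketched in plain user-space C. Everything below (struct layout, function names) is illustrative only, loosely mirroring struct damon_region and the prepare_access_checks / check_accesses steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-in for struct damon_region: one representative
 * sample address is checked per region per sampling interval. */
struct toy_region {
	unsigned long start, end;	/* address range [start, end) */
	unsigned long sampling_addr;
	unsigned int nr_accesses;
};

/* mirrors prepare_access_checks: pick one sample address in the region */
static void prepare_access_check(struct toy_region *r)
{
	r->sampling_addr = r->start +
		((unsigned long)rand() % (r->end - r->start));
}

/* mirrors check_accesses: one region costs one check, however large it is */
static void check_access(struct toy_region *r, bool sample_was_accessed)
{
	if (sample_was_accessed)
		r->nr_accesses++;
}
```

The monitoring overhead thus scales with the number of regions, not with the number of pages, which is what makes the number of regions the overhead-control knob.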
[PATCH v27 01/13] mm: Introduce Data Access MONitor (DAMON)
From: SeongJae Park

DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it

- accurate (the monitoring output is useful enough for DRAM level
  performance-centric memory management; it might be inappropriate for
  CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
  applied online), and
- scalable (the upper-bound of the overhead is in a constant range
  regardless of the size of target workloads).

Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that previously
incurred high access monitoring overhead could be implemented again on
top of this.

Due to its simple and flexible interface, providing a user space
interface would also be easy. Then, user space users who have some
special workloads can write personalized applications for better
understanding and optimizations of their workloads and systems.

===

Nevertheless, this commit defines and implements only the basic access
check part, without the core logic for handling the overhead-accuracy
trade-off. The basic access check works as below.

The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results; in other words, it counts the number of the
accesses to each region. After each ``aggregation interval`` passes,
DAMON calls callback functions that were previously registered by users,
so that users can read the aggregated results, and then clears the
results.
This can be described in the below simple pseudo-code::

    init()
    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        if time() % update_interval == 0:
            update()
        sleep(sampling interval)

The target regions are constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target
regions could be dynamically changed (e.g., by mmap() or memory hotplug).
The monitoring overhead of this mechanism will arbitrarily increase as
the size of the target workload grows.

The basic monitoring primitives for the actual access check and the
dynamic target regions construction aren't in the core part of DAMON.
Instead, it allows users to implement their own primitives that are
optimized for their use case and to configure DAMON to use those. In
other words, users cannot use the current version of DAMON without some
additional work. Following commits will implement the core mechanisms
for the overhead-accuracy control and the default primitives
implementations.
Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h | 167 ++
 mm/Kconfig            |   3 +
 mm/Makefile           |   1 +
 mm/damon/Kconfig      |  15 ++
 mm/damon/Makefile     |   3 +
 mm/damon/core.c       | 318 ++
 6 files changed, 507 insertions(+)
 create mode 100644 include/linux/damon.h
 create mode 100644 mm/damon/Kconfig
 create mode 100644 mm/damon/Makefile
 create mode 100644 mm/damon/core.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
new file mode 100644
index ..2f652602b1ea
--- /dev/null
+++ b/include/linux/damon.h
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DAMON api
+ *
+ * Author: SeongJae Park
+ */
+
+#ifndef _DAMON_H_
+#define _DAMON_H_
+
+#include
+#include
+#include
+
+struct damon_ctx;
+
+/**
+ * struct damon_primitive - Monitoring primitives for given use cases.
+ *
+ * @init:			Initialize primitive-internal data structures.
+ * @update:			Update primitive-internal data structures.
+ * @prepare_access_checks:	Prepare next access check of target regions.
+ * @check_accesses:		Check the accesses to target regions.
+ * @reset_aggregated:		Reset aggregated accesses monitoring results.
+ * @target_valid:		Determine if the target is valid.
+ * @cleanup:			Clean up the context.
+ *
+ * DAMON can be extended for various address spaces and usages. For this,
+ * users should register the low level primitives for their target address
+ * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread
+ * (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
+ * the monitoring, @update after each
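[Editor's note] The sampling/aggregation loop that the commit message describes can be condensed into a runnable user-space model. The interval value and the "page is accessed on every tick" assumption below are purely illustrative.

```c
#include <assert.h>

#define AGGR_INTERVAL 5	/* aggregation interval, in sampling ticks (made up) */

/* User-space model of the monitoring loop: one iteration per sampling
 * interval; every AGGR_INTERVAL ticks the aggregated count would be
 * handed to user callbacks and then reset. Returns how many times the
 * callbacks would have fired. Assumes the page is accessed every tick. */
static int toy_monitor(int sampling_ticks)
{
	int nr_accesses = 0, callbacks = 0;

	for (int t = 1; t <= sampling_ticks; t++) {
		nr_accesses++;			/* accessed(page) assumed true */
		if (t % AGGR_INTERVAL == 0) {
			callbacks++;		/* callback(..., nr_accesses) */
			nr_accesses = 0;	/* reset aggregated results */
		}
	}
	return callbacks;
}
```

Ten sampling ticks with an aggregation interval of five thus yield two callback invocations, matching the pseudo-code's reset-after-aggregation behavior.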
[PATCH v27 00/13] Introduce Data Access MONitor (DAMON)
From: SeongJae Park

Changes from Previous Version (v26)
===================================

- Rebase on latest -mm tree (v5.12-rc6-mmots-2021-04-06-22-33)
- Check kmalloc() failures in dbgfs init (Greg KH)
- Fix a typo: s/stollen/stolen/ (Stefan Nuernberger)
- Update document for updated user space tool path

Introduction
============

DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it

- accurate (The monitored information is useful for DRAM level memory
  management. It might not be appropriate for cache-level accuracy,
  though.),
- light-weight (The monitoring overhead is low enough to be applied
  online while making no impact on the performance of the target
  workloads.), and
- scalable (The upper-bound of the instrumentation overhead is
  controllable regardless of the size of target workloads.).

Using this framework, therefore, several memory management mechanisms
such as reclamation and THP can be optimized to be aware of real data
access patterns. Experimental access pattern aware memory management
optimization works that previously incurred high instrumentation overhead
will be able to have another try.

Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.

Long-term Plan
--------------

DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space,
and implement user space library / tools.
Below shows the layers and components for the project.

---
Primitives:     PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework:      DAMON
Features:       DAMOS, virtual addr, physical addr, ...
Applications:   DAMON-debugfs, (DARC), ...
^^^ KERNEL SPACE
Raw Interface:  debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvv USER SPACE
Library:        (libdamon), ...
Tools:          DAMO, (perf), ...
---

The components in parentheses or marked as '...' are not implemented yet
but are in the future plan. IOW, those are the TODO tasks of the DAOS
project. For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjp...@amazon.com/

Evaluations
===========

We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that the v24 DAMON patchset is applied to.

DAMON is lightweight. It increases system memory usage by 0.39% and
slows target workloads down by 1.16%.

DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of resident sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).

NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.

Please refer to the official document[1] or the
"Documentation/admin-guide/mm: Add a document for DAMON" patch in this
patchset for detailed evaluation setup and results.

[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html

Real-world User Story
=====================

In summary, DAMON has been used on production systems and proved its
usefulness.

DAMON as a profiler
-------------------

We analyzed characteristics of large scale production systems of our
customers using DAMON.
The systems utilize 70GB DRAM and 36 CPUs. From this, we were able to
find the interesting things below.

There were obviously different access patterns under the idle workload
and the active workload. Under the idle workload, it accessed large
memory regions with low frequency, while the active workload accessed
small memory regions with high frequency.

DAMON found a 7GB memory region showing obviously high access frequency
under the active workload. We believe this is the performance-effective
working set and need to be
Re: [PATCH v26 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

On Tue, 30 Mar 2021 09:59:50 + SeongJae Park wrote:

> From: SeongJae Park
>
> On Tue, 30 Mar 2021 11:22:45 +0200 Greg KH wrote:
>
> > On Tue, Mar 30, 2021 at 09:05:31AM +, sj38.p...@gmail.com wrote:
> > > +static int __init __damon_dbgfs_init(void)
> > > +{
> > > +	struct dentry *dbgfs_root;
> > > +	const char * const file_names[] = {"monitor_on"};
> > > +	const struct file_operations *fops[] = {&monitor_on_fops};
> > > +	int i;
> > > +
> > > +	dbgfs_root = debugfs_create_dir("damon", NULL);
> > > +
> > > +	for (i = 0; i < ARRAY_SIZE(file_names); i++)
> > > +		debugfs_create_file(file_names[i], 0600, dbgfs_root, NULL,
> > > +				fops[i]);
> > > +	dbgfs_fill_ctx_dir(dbgfs_root, dbgfs_ctxs[0]);
> > > +
> > > +	dbgfs_dirs = kmalloc_array(1, sizeof(dbgfs_root), GFP_KERNEL);
> >
> > No error checking for memory allocation failures?
>
> Oops, I will add the check in the next spin.
>
> > > +	dbgfs_dirs[0] = dbgfs_root;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * Functions for the initialization
> > > + */
> > > +
> > > +static int __init damon_dbgfs_init(void)
> > > +{
> > > +	int rc;
> > > +
> > > +	dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL);
> >
> > No error checking?
>
> Will add in the next spin.
>
> > > +	dbgfs_ctxs[0] = dbgfs_new_ctx();
> > > +	if (!dbgfs_ctxs[0])
> > > +		return -ENOMEM;

And, I found that I'm not printing the error for this failure. I guess
this might have made you think the below error message should be printed
inside the callee. I will add 'pr_err()' here and for the above
unchecked failure cases in the next version.

BTW, I forgot to say thank you for your review. Appreciated!

Thanks,
SeongJae Park

> > > +	dbgfs_nr_ctxs = 1;
> > > +
> > > +	rc = __damon_dbgfs_init();
> > > +	if (rc)
> > > +		pr_err("%s: dbgfs init failed\n", __func__);
> >
> > Shouldn't the error be printed out in the function that failed, not in
> > this one?
>
> I thought some other functions (in the future) might want to use
> '__damon_dbgfs_init()' but silently handle its failure. Therefore I
> made the function fail silently but return the error code explicitly.
> Am I missing something?
>
> Thanks,
> SeongJae Park
>
> > thanks,
> >
> > greg k-h
>
Re: [PATCH v26 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

On Tue, 30 Mar 2021 11:22:45 +0200 Greg KH wrote:

> On Tue, Mar 30, 2021 at 09:05:31AM +, sj38.p...@gmail.com wrote:
> > +static int __init __damon_dbgfs_init(void)
> > +{
> > +	struct dentry *dbgfs_root;
> > +	const char * const file_names[] = {"monitor_on"};
> > +	const struct file_operations *fops[] = {&monitor_on_fops};
> > +	int i;
> > +
> > +	dbgfs_root = debugfs_create_dir("damon", NULL);
> > +
> > +	for (i = 0; i < ARRAY_SIZE(file_names); i++)
> > +		debugfs_create_file(file_names[i], 0600, dbgfs_root, NULL,
> > +				fops[i]);
> > +	dbgfs_fill_ctx_dir(dbgfs_root, dbgfs_ctxs[0]);
> > +
> > +	dbgfs_dirs = kmalloc_array(1, sizeof(dbgfs_root), GFP_KERNEL);
>
> No error checking for memory allocation failures?

Oops, I will add the check in the next spin.

> > +	dbgfs_dirs[0] = dbgfs_root;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Functions for the initialization
> > + */
> > +
> > +static int __init damon_dbgfs_init(void)
> > +{
> > +	int rc;
> > +
> > +	dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL);
>
> No error checking?

Will add in the next spin.

> > +	dbgfs_ctxs[0] = dbgfs_new_ctx();
> > +	if (!dbgfs_ctxs[0])
> > +		return -ENOMEM;
> > +	dbgfs_nr_ctxs = 1;
> > +
> > +	rc = __damon_dbgfs_init();
> > +	if (rc)
> > +		pr_err("%s: dbgfs init failed\n", __func__);
>
> Shouldn't the error be printed out in the function that failed, not in
> this one?

I thought some other functions (in the future) might want to use
'__damon_dbgfs_init()' but silently handle its failure. Therefore I made
the function fail silently but return the error code explicitly. Am I
missing something?

Thanks,
SeongJae Park

> thanks,
>
> greg k-h
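[Editor's note] The design point debated above — whether the callee or the top-level caller should report the error — can be shown with a tiny user-space sketch. All names are invented, and an integer flag stands in for pr_err():

```c
#include <assert.h>
#include <errno.h>

/* The callee fails silently: it only returns the error code. */
static int helper_init(int simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;		/* no logging here, on purpose */
	return 0;
}

/* The top-level caller decides whether the failure is worth reporting;
 * setting *logged stands in for a pr_err() call. */
static int toplevel_init(int simulate_failure, int *logged)
{
	int rc = helper_init(simulate_failure);

	if (rc)
		*logged = 1;
	return rc;
}
```

This keeps the helper reusable by future callers that may treat its failure as non-fatal, which is the rationale given in the thread for printing in the caller.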
Re: [PATCH v25 05/13] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park On Thu, 18 Mar 2021 10:08:48 + sj38.p...@gmail.com wrote: > From: SeongJae Park > > This commit introduces a reference implementation of the address space > specific low level primitives for the virtual address space, so that > users of DAMON can easily monitor the data accesses on virtual address > spaces of specific processes by simply configuring the implementation to > be used by DAMON. > > The low level primitives for the fundamental access monitoring are > defined in two parts: > > 1. Identification of the monitoring target address range for the address >space. > 2. Access check of specific address range in the target space. > > The reference implementation for the virtual address space does the > works as below. > > PTE Accessed-bit Based Access Check > --- > > The implementation uses PTE Accessed-bit for basic access checks. That > is, it clears the bit for the next sampling target page and checks > whether it is set again after one sampling period. This could disturb > the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags > to solve the conflict, as Idle page tracking does. > > VMA-based Target Address Range Construction > --- > > Only small parts in the super-huge virtual address space of the > processes are mapped to physical memory and accessed. Thus, tracking > the unmapped address regions is just wasteful. However, because DAMON > can deal with some level of noise using the adaptive regions adjustment > mechanism, tracking every mapping is not strictly required but could > even incur a high overhead in some cases. That said, too huge unmapped > areas inside the monitoring target should be removed to not take the > time for the adaptive mechanism. > > For the reason, this implementation converts the complex mappings to > three distinct regions that cover every mapped area of the address > space. Also, the two gaps between the three regions are the two biggest > unmapped areas in the given address space. 
> The two biggest unmapped
> areas would be the gap between the heap and the uppermost mmap()-ed
> region, and the gap between the lowermost mmap()-ed region and the stack
> in most of the cases. Because these gaps are exceptionally huge in
> usual address spaces, excluding these will be sufficient to make a
> reasonable trade-off. Below shows this in detail::
>
>     <heap>
>     <BIG UNMAPPED REGION 1>
>     <uppermost mmap()-ed region>
>     (small mmap()-ed regions and munmap()-ed regions)
>     <lowermost mmap()-ed region>
>     <BIG UNMAPPED REGION 2>
>     <stack>
>
> Signed-off-by: SeongJae Park
> Reviewed-by: Leonard Foerster
> ---
>  include/linux/damon.h |  13 +
>  mm/damon/Kconfig      |   9 +
>  mm/damon/Makefile     |   1 +
>  mm/damon/vaddr.c      | 579 ++
>  4 files changed, 602 insertions(+)
>  create mode 100644 mm/damon/vaddr.c
>
[...]
> +
> +/*
> + * Update regions for current memory mappings
> + */
> +void damon_va_update(struct damon_ctx *ctx)
> +{
> +	struct damon_addr_range three_regions[3];
> +	struct damon_target *t;
> +
> +	damon_for_each_target(t, ctx) {
> +		if (damon_va_three_regions(t, three_regions))
> +			continue;
> +		damon_va_apply_three_regions(ctx, t, three_regions);
> +	}
> +}
> +
> +static void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm,
> +			unsigned long addr)
> +{
> +	bool referenced = false;
> +	struct page *page = pte_page(*pte);

The 'pte' could be for a special mapping which has no associated 'struct
page'. In that case, 'page' would be invalid. Guoju from Alibaba found
the problem in his GPU setup and reported it via GitHub[1]. I made a
fix[2] and am waiting for his test results. I will squash the fix into
the next version of this patch.
[1] https://github.com/sjp38/linux/pull/3/commits/12eeebc6ffc8b5d2a6aba7a2ec9fb85d3c1663af
[2] https://github.com/sjp38/linux/commit/f1fa22b6375ceb9ae53e9370452de0d62efd4df5

Thanks,
SeongJae Park

> +
> +	if (pte_young(*pte)) {
> +		referenced = true;
> +		*pte = pte_mkold(*pte);
> +	}
> +
> +#ifdef CONFIG_MMU_NOTIFIER
> +	if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
> +		referenced = true;
> +#endif /* CONFIG_MMU_NOTIFIER */
> +
> +	if (referenced)
> +		set_page_young(page);
> +
> +	set_page_idle(page);
> +}
> +
[...]
> +
> +static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
> +{
> +	pte_t *pte = NULL;
> +	pmd_t *pmd = NULL;
> +	spinlock_t *ptl;
> +
> +	if (follow_invalidate_pte(mm, addr, NULL, &pte, &pmd, &ptl))
> +		return;
> +
> +	if (pte) {
> +		damon_ptep_mkold(pte, mm, addr);
> +		pte_unmap_unlock(pte, ptl);
> +	} else {
> +		damon_pmdp_mkold(pmd, mm, addr);
> +		spin_unlock(ptl);
> +	}
> +}
> +
[...]
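[Editor's note] The bug discussed above — touching a struct page for a special mapping that has none — calls for an early bail-out before the page is used. The sketch below is a user-space model of such a guard, not the actual squashed fix; all names are invented.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool young, idle;
};

/* Model of damon_ptep_mkold() with a guard: refuse to mark a page that
 * does not exist (special mappings have no struct page behind them). */
static int toy_mkold(struct toy_page *page, bool *pte_accessed)
{
	if (!page)
		return -EINVAL;		/* special mapping: nothing to mark */

	if (*pte_accessed) {
		*pte_accessed = false;	/* clear the Accessed bit */
		page->young = true;	/* remember that it was set */
	}
	page->idle = true;
	return 0;
}
```

The essential property is that the NULL case returns before any of the page-flag updates run, so a special mapping can never be dereferenced.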
Re: [PATCH v2] mm/shmem: Enclose shmem_mcopy_atomic_pte() with 'CONFIG_USERFAULTFD'
On Tue, 16 Mar 2021 09:16:57 -0700 Axel Rasmussen wrote: > Sorry for the build failure! I sent a new version of my patch with > this same fix on the 10th > (https://lore.kernel.org/patchwork/patch/1392464/), and I believe > Andrew has already included it in his tree. No problem at all, thank you for letting me know! :) FYI, I tested on 'master' of https://github.com/hnaz/linux-mm. Thanks, SeongJae Park [...] Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
[PATCH v2] mm/shmem: Enclose shmem_mcopy_atomic_pte() with 'CONFIG_USERFAULTFD'
From: SeongJae Park

Commit 49eeab03fa0a ("userfaultfd: support minor fault handling for
shmem") introduced shmem_mcopy_atomic_pte(). The function is declared in
'userfaultfd_k.h' when 'CONFIG_USERFAULTFD' is defined, and defined as
'BUG()' if the config is unset. However, the definition of the function
in 'shmem.c' is not protected by the '#ifdef' macro. As a result, the
build fails when the config is not set. This commit fixes the problem.

Fixes: 49eeab03fa0a ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: SeongJae Park
---
Changes from v1
- Remove unnecessary internal code review URL
---
 mm/shmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 547df2b766f7..c0d3abefeb3f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2359,6 +2359,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	return inode;
 }

+#ifdef CONFIG_USERFAULTFD
 int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr, unsigned long src_addr,
@@ -2492,6 +2493,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	shmem_inode_unacct_blocks(inode, 1);
 	goto out;
 }
+#endif

 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
--
2.17.1
[PATCH] mm/shmem: Enclose shmem_mcopy_atomic_pte() with 'CONFIG_USERFAULTFD'
From: SeongJae Park

Commit 49eeab03fa0a ("userfaultfd: support minor fault handling for
shmem") introduced shmem_mcopy_atomic_pte(). The function is declared in
'userfaultfd_k.h' when 'CONFIG_USERFAULTFD' is defined, and defined as
'BUG()' if the config is unset. However, the definition of the function
in 'shmem.c' is not protected by the '#ifdef' macro. As a result, the
build fails when the config is not set. This commit fixes the problem.

Fixes: 49eeab03fa0a ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: SeongJae Park

cr https://code.amazon.com/reviews/CR-47204463
---
 mm/shmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 547df2b766f7..c0d3abefeb3f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2359,6 +2359,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	return inode;
 }

+#ifdef CONFIG_USERFAULTFD
 int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr, unsigned long src_addr,
@@ -2492,6 +2493,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	shmem_inode_unacct_blocks(inode, 1);
 	goto out;
 }
+#endif

 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
--
2.17.1
Re: [PATCH] docs/kokr: make sections on bug reporting match practice
ping? On Mon, 8 Mar 2021 08:57:01 +0100 SeongJae Park wrote: > From: SeongJae Park > > Translate this commit to Korean: > > cf6d6fc27936 ("docs: process/howto.rst: make sections on bug reporting > match practice") > > Signed-off-by: SeongJae Park > --- > Documentation/translations/ko_KR/howto.rst | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/Documentation/translations/ko_KR/howto.rst > b/Documentation/translations/ko_KR/howto.rst > index 787f1e85f8a0..a2bdd564c907 100644 > --- a/Documentation/translations/ko_KR/howto.rst > +++ b/Documentation/translations/ko_KR/howto.rst > @@ -339,14 +339,8 @@ Andrew Morton의 글이 있다. > 버그 보고 > - > > -https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버그를 추적하는 > -곳이다. 사용자들은 발견한 모든 버그들을 보고하기 위하여 이 툴을 사용할 것을 > -권장한다. kernel bugzilla를 사용하는 자세한 방법은 다음을 참조하라. > - > -https://bugzilla.kernel.org/page.cgi?id=faq.html > - > 메인 커널 소스 디렉토리에 있는 'Documentation/admin-guide/reporting-issues.rst' > -파일은 커널 버그라고 생각되는 것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 > +파일은 커널 버그라고 생각되는 것을 어떻게 보고하면 되는지, 그리고 문제를 > 추적하기 위해서 커널 개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 > 있다. > > @@ -362,8 +356,14 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 > 점수를 얻을 수 있는 가장 좋은 방법중의 하나이다. 왜냐하면 많은 사람들은 > 다른 사람들의 버그들을 수정하기 위하여 시간을 낭비하지 않기 때문이다. > > -이미 보고된 버그 리포트들을 가지고 작업하기 위해서 https://bugzilla.kernel.org > -를 참조하라. > +이미 보고된 버그 리포트들을 가지고 작업하기 위해서는 여러분이 관심있는 > +서브시스템을 찾아라. 해당 서브시스템의 버그들이 어디로 리포트 되는지 > +MAINTAINERS 파일을 체크하라; 그건 대부분 메일링 리스트이고, 가끔은 버그 추적 > +시스템이다. 그 장소에 있는 최근 버그 리포트 기록들을 검색하고 여러분이 보기에 > +적합하다 싶은 것을 도와라. 여러분은 버그 리포트를 위해 > +https://bugzilla.kernel.org 를 체크하고자 할 수도 있다; 소수의 커널 > +서브시스템들만이 버그 신고와 추적을 위해 해당 시스템을 실제로 사용하고 있지만, > +전체 커널의 버그들이 그곳에 정리된다. > > > 메일링 리스트들 > -- > 2.17.1 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
[PATCH] docs/kokr: make sections on bug reporting match practice
From: SeongJae Park Translate this commit to Korean: cf6d6fc27936 ("docs: process/howto.rst: make sections on bug reporting match practice") Signed-off-by: SeongJae Park --- Documentation/translations/ko_KR/howto.rst | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index 787f1e85f8a0..a2bdd564c907 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -339,14 +339,8 @@ Andrew Morton의 글이 있다. 버그 보고 - -https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버그를 추적하는 -곳이다. 사용자들은 발견한 모든 버그들을 보고하기 위하여 이 툴을 사용할 것을 -권장한다. kernel bugzilla를 사용하는 자세한 방법은 다음을 참조하라. - -https://bugzilla.kernel.org/page.cgi?id=faq.html - 메인 커널 소스 디렉토리에 있는 'Documentation/admin-guide/reporting-issues.rst' -파일은 커널 버그라고 생각되는 것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 +파일은 커널 버그라고 생각되는 것을 어떻게 보고하면 되는지, 그리고 문제를 추적하기 위해서 커널 개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 있다. @@ -362,8 +356,14 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 점수를 얻을 수 있는 가장 좋은 방법중의 하나이다. 왜냐하면 많은 사람들은 다른 사람들의 버그들을 수정하기 위하여 시간을 낭비하지 않기 때문이다. -이미 보고된 버그 리포트들을 가지고 작업하기 위해서 https://bugzilla.kernel.org -를 참조하라. +이미 보고된 버그 리포트들을 가지고 작업하기 위해서는 여러분이 관심있는 +서브시스템을 찾아라. 해당 서브시스템의 버그들이 어디로 리포트 되는지 +MAINTAINERS 파일을 체크하라; 그건 대부분 메일링 리스트이고, 가끔은 버그 추적 +시스템이다. 그 장소에 있는 최근 버그 리포트 기록들을 검색하고 여러분이 보기에 +적합하다 싶은 것을 도와라. 여러분은 버그 리포트를 위해 +https://bugzilla.kernel.org 를 체크하고자 할 수도 있다; 소수의 커널 +서브시스템들만이 버그 신고와 추적을 위해 해당 시스템을 실제로 사용하고 있지만, +전체 커널의 버그들이 그곳에 정리된다. 메일링 리스트들 -- 2.17.1
Re: [PATCH v24 00/14] Subject: Introduce Data Access MONitor (DAMON)
On Thu, 4 Feb 2021 16:31:36 +0100 SeongJae Park wrote: > From: SeongJae Park [...] > > Introduction > > > DAMON is a data access monitoring framework for the Linux kernel. The core > mechanisms of DAMON called 'region based sampling' and 'adaptive regions > adjustment' (refer to 'mechanisms.rst' in the 11th patch of this patchset for > the detail) make it > > - accurate (The monitored information is useful for DRAM level memory >management. It might not be appropriate for cache-level accuracy, though.), > - light-weight (The monitoring overhead is low enough to be applied online >while making no impact on the performance of the target workloads.), and > - scalable (the upper-bound of the instrumentation overhead is controllable >regardless of the size of target workloads.). > > Using this framework, therefore, several memory management mechanisms such as > reclamation and THP can be optimized to be aware of real data access patterns. > Experimental access-pattern-aware memory management optimization works that > previously incurred high instrumentation overhead will be able to have another try. > > Though DAMON is for kernel subsystems, it can be easily exposed to the user > space by writing a DAMON-wrapper kernel subsystem. Then, user space users who > have some special workloads will be able to write personalized tools or > applications for deeper understanding and specialized optimizations of their > systems. > [...] > > Baseline and Complete Git Trees > === > > The patches are based on v5.10. You can also clone the complete git > tree: > > $ git clone git://github.com/sjp38/linux -b damon/patches/v24 > > The web is also available: > https://github.com/sjp38/linux/releases/tag/damon/patches/v24 > > There are a couple of trees for the entire DAMON patchset series. They include > future features. The first one[1] contains the changes for the latest release, > while the other one[2] contains the changes for the next release.
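To make the 'region based sampling' idea above concrete, here is a toy Python sketch. It is illustrative only, not DAMON's kernel implementation: one random page per region is checked per sampling interval, and the result is taken as representative of the whole region. All names and numbers are made up.

```python
import random

def sample_regions(regions, accessed_pages, nr_intervals, rng, page=4096):
    """Toy region-based sampling: check one random page per region per
    interval instead of every page, and count the hits as nr_accesses."""
    counts = [0] * len(regions)
    for _ in range(nr_intervals):
        for i, (start, end) in enumerate(regions):
            addr = rng.randrange(start, end)
            if addr // page * page in accessed_pages:
                counts[i] += 1
    return counts

# Region 0 is entirely hot and region 1 entirely cold, so the sampled
# counts separate them no matter which pages happen to get picked.
regions = [(0, 16 * 4096), (16 * 4096, 32 * 4096)]
accessed_pages = {p * 4096 for p in range(16)}
print(sample_regions(regions, accessed_pages, 20, random.Random(42)))  # [20, 0]
```

The cost per interval is proportional to the number of regions, not the size of the address space, which is the point of the "scalable" claim above.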
> > [1] https://github.com/sjp38/linux/tree/damon/master > [2] https://github.com/sjp38/linux/tree/damon/next For people who prefer LTS kernels, I decided to maintain two more trees that are respectively based on the latest two LTS kernels and contain backports of the latest 'damon/master' tree, as below. Please use those if you want to test DAMON on an LTS kernel. - For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y - For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y Thanks, SeongJae Park
[PATCH v24 11/14] Documentation: Add documents for DAMON
From: SeongJae Park This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Signed-off-by: SeongJae Park --- Documentation/admin-guide/mm/damon/guide.rst | 159 ++ Documentation/admin-guide/mm/damon/index.rst | 15 + Documentation/admin-guide/mm/damon/plans.rst | 29 ++ Documentation/admin-guide/mm/damon/start.rst | 97 ++ Documentation/admin-guide/mm/damon/usage.rst | 304 +++ Documentation/admin-guide/mm/index.rst | 1 + Documentation/vm/damon/api.rst | 20 ++ Documentation/vm/damon/design.rst| 166 ++ Documentation/vm/damon/eval.rst | 232 ++ Documentation/vm/damon/faq.rst | 58 Documentation/vm/damon/index.rst | 31 ++ Documentation/vm/index.rst | 1 + 12 files changed, 1113 insertions(+) create mode 100644 Documentation/admin-guide/mm/damon/guide.rst create mode 100644 Documentation/admin-guide/mm/damon/index.rst create mode 100644 Documentation/admin-guide/mm/damon/plans.rst create mode 100644 Documentation/admin-guide/mm/damon/start.rst create mode 100644 Documentation/admin-guide/mm/damon/usage.rst create mode 100644 Documentation/vm/damon/api.rst create mode 100644 Documentation/vm/damon/design.rst create mode 100644 Documentation/vm/damon/eval.rst create mode 100644 Documentation/vm/damon/faq.rst create mode 100644 Documentation/vm/damon/index.rst diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst new file mode 100644 index ..49da40bc4ba9 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/guide.rst @@ -0,0 +1,159 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +Optimization Guide +== + +This document helps you estimate the amount of benefit that you could get +from DAMON-based optimizations, and describes how you could achieve it. You +are assumed to have already read :doc:`start`. + + +Check The Signs +=== + +No optimization can provide the same extent of benefit to every case. Therefore +you should first guess how much improvement you could get using DAMON.
If +some of the below conditions match your situation, you could consider using DAMON. + +- *Low IPC and High Cache Miss Ratios.* Low IPC means most of the CPU time is + spent waiting for the completion of time-consuming operations such as memory + access, while high cache miss ratios mean the caches don't help it well. + DAMON is not for cache level optimization, but DRAM level. However, + improving DRAM management will also help this case by reducing the memory + operation latency. +- *Memory Over-commitment and Unknown Users.* If you are doing memory + overcommitment and you cannot control every user of your system, a memory + bank run could happen at any time. You can estimate when it will happen + based on DAMON's monitoring results and act earlier to avoid or deal better + with the crisis. +- *Frequent Memory Pressure.* Frequent memory pressure means your system has + wrong configurations or memory hogs. DAMON will help you find the right + configuration and/or the criminals. +- *Heterogeneous Memory System.* If your system is utilizing memory devices + that are placed between DRAM and traditional hard disks, such as non-volatile + memory or fast SSDs, DAMON could help you utilize the devices more + efficiently. + + +Profile +=== + +If you found some positive signals, you could start by profiling your workloads +using DAMON. Find major workloads on your systems and analyze their data +access patterns to find something wrong or something that can be improved. The DAMON user +space tool (``damo``) will be useful for this. You can get ``damo`` from +the ``tools/damon/`` directory in the DAMON development tree (``damon/master`` +branch of https://github.com/sjp38/linux.git). + +We recommend starting with a working set size distribution check using ``damo +report wss``. If the distribution is non-uniform or quite different from what +you estimated, you could consider `Memory Configuration`_ optimization.
+ +Then, review the overall access pattern in heatmap form using ``damo report +heats``. If it shows a simple pattern consisting of a small number of memory +regions having a high contrast of access temperature, you could consider manual +`Program Modification`_. + +If you still want to absorb more benefits, you could develop a `Personalized +DAMON Application`_ for your special case. + +You don't need to take only one approach among the above plans; you could +use multiple of the above approaches to maximize the benefit. + + +Optimize + + +If the profiling result also says it's worth trying some optimization, you +could consider the below approaches. Note that some of the below approaches assume +that your systems are configured with swap devices or other types of auxiliary +memory so that you don't
[PATCH v24 10/14] mm/damon/dbgfs: Support multiple contexts
From: SeongJae Park In some use cases, users would want to run multiple monitoring contexts. For example, if a user wants a high precision monitoring and dedicating multiple CPUs for the job is ok, because DAMON creates one monitoring thread per context, the user can split the monitoring target regions into multiple small regions and create one context for each region. Or, someone might want to simultaneously monitor different address spaces, e.g., both virtual address space and physical address space. DAMON's API allows such usage, but 'damon-dbgfs' does not. Therefore, only kernel space DAMON users can do multiple-context monitoring. This commit allows the user space DAMON users to do multiple-context monitoring by introducing two new 'damon-dbgfs' debugfs files, 'mk_context' and 'rm_context'. Users can create a new monitoring context by writing the desired name of the new context to 'mk_context'. Then, a new directory with the name and having the files for setting of the context ('attrs', 'target_ids' and 'record') will be created under the debugfs directory. Writing the name of the context to remove to 'rm_context' will remove the related context and directory. Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 215 ++- 1 file changed, 212 insertions(+), 3 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 4b9ac2043e99..68edfd4d3b41 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -29,6 +29,7 @@ struct dbgfs_recorder { static struct damon_ctx **dbgfs_ctxs; static int dbgfs_nr_ctxs; static struct dentry **dbgfs_dirs; +static DEFINE_MUTEX(damon_dbgfs_lock); /* * Returns non-empty string on success, negative error code otherwise.
@@ -495,6 +496,13 @@ static void dbgfs_write_record_header(struct damon_ctx *ctx) dbgfs_write_rbuf(ctx, &recfmt_ver, sizeof(recfmt_ver)); } +static void dbgfs_free_recorder(struct dbgfs_recorder *recorder) +{ + kfree(recorder->rbuf); + kfree(recorder->rfile_path); + kfree(recorder); +} + static unsigned int nr_damon_targets(struct damon_ctx *ctx) { struct damon_target *t; @@ -561,7 +569,7 @@ static struct damon_ctx *dbgfs_new_ctx(void) { struct damon_ctx *ctx; - ctx = damon_new_ctx(DAMON_ADAPTIVE_TARGET); + ctx = damon_new_ctx(); if (!ctx) return NULL; @@ -577,6 +585,195 @@ static struct damon_ctx *dbgfs_new_ctx(void) return ctx; } +static void dbgfs_destroy_ctx(struct damon_ctx *ctx) +{ + dbgfs_free_recorder(ctx->callback.private); + damon_destroy_ctx(ctx); +} + +/* + * Make a context of @name and create a debugfs directory for it. + * + * This function should be called while holding damon_dbgfs_lock. + * + * Returns 0 on success, negative error code otherwise. + */ +static int dbgfs_mk_context(char *name) +{ + struct dentry *root, **new_dirs, *new_dir; + struct damon_ctx **new_ctxs, *new_ctx; + int err; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + new_ctxs = krealloc(dbgfs_ctxs, sizeof(*dbgfs_ctxs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_ctxs) + return -ENOMEM; + + new_dirs = krealloc(dbgfs_dirs, sizeof(*dbgfs_dirs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_dirs) { + kfree(new_ctxs); + return -ENOMEM; + } + + dbgfs_ctxs = new_ctxs; + dbgfs_dirs = new_dirs; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + new_dir = debugfs_create_dir(name, root); + if (IS_ERR(new_dir)) + return PTR_ERR(new_dir); + dbgfs_dirs[dbgfs_nr_ctxs] = new_dir; + + new_ctx = dbgfs_new_ctx(); + if (!new_ctx) { + debugfs_remove(new_dir); + dbgfs_dirs[dbgfs_nr_ctxs] = NULL; + return -ENOMEM; + } + dbgfs_ctxs[dbgfs_nr_ctxs] = new_ctx; + + err = dbgfs_fill_ctx_dir(dbgfs_dirs[dbgfs_nr_ctxs], + dbgfs_ctxs[dbgfs_nr_ctxs]); + if (err) + return err; + + dbgfs_nr_ctxs++; + 
return 0; +} + +static ssize_t dbgfs_mk_context_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + char *kbuf; + char *ctx_name; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + ctx_name = kmalloc(count + 1, GFP_KERNEL); + if (!ctx_name) { + kfree(kbuf); + return -ENOMEM; + } + + /* Trim white space */ + if (sscanf(kbuf, "%s", ctx_name) != 1) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&damon_dbgfs_lock); + err = dbgfs_mk_context(ctx_name); + if (err) +
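For readers skimming the patch, the bookkeeping rule it enforces can be sketched in a few lines of Python. This is an illustrative model only, not kernel code: the class name and dict-based storage are made up, but the checks mirror the patch's `damon_nr_running_ctxs()` guard and error codes.

```python
import errno

class DbgfsCtxRegistry:
    """Toy analogue of damon-dbgfs' mk_context/rm_context bookkeeping:
    contexts may only be created or removed while no monitoring is
    running, mirroring the damon_nr_running_ctxs() check in the patch."""
    def __init__(self):
        self.ctxs = {"root": object()}   # dbgfs_ctxs[0] analogue
        self.nr_running = 0

    def mk_context(self, name):
        if self.nr_running:
            return -errno.EBUSY          # no changes while monitoring runs
        if name in self.ctxs:
            return -errno.EEXIST
        self.ctxs[name] = object()       # krealloc + debugfs_create_dir analogue
        return 0

    def rm_context(self, name):
        if self.nr_running:
            return -errno.EBUSY
        if name == "root" or name not in self.ctxs:
            return -errno.ENOENT
        del self.ctxs[name]              # debugfs_remove + dbgfs_destroy_ctx analogue
        return 0

reg = DbgfsCtxRegistry()
assert reg.mk_context("ctx1") == 0
reg.nr_running = 1                       # monitoring started
print(reg.mk_context("ctx2"))            # refused with -EBUSY
```

The point of the guard is the same in both worlds: resizing the context array while a monitoring thread may be iterating over it would race, so mutation is simply refused while anything runs.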
Re: Please apply "xen-netback: delete NAPI instance when queue fails to initialize" to v4.4.y
On Wed, 24 Feb 2021 18:21:09 +0100 Greg KH wrote: > On Wed, Feb 24, 2021 at 06:03:56PM +0100, SeongJae Park wrote: > > This is a request for merge of upstream commit 4a658527271b ("xen-netback: > > delete NAPI instance when queue fails to initialize") on v4.4.y tree. > > > > If 'xenvif_connect()' fails after successful 'netif_napi_add()', the napi is > > not cleaned up. Because 'create_queues()' frees the queues in its error > > handling code, if the 'xenvif_free()' is called for the vif, use-after-free > > occurs. The upstream commit fixes the problem by cleaning up the napi in the > > 'xenvif_connect()'. > > > > Attaching the original patch below for your convenience. > > The original patch does not apply cleanly. I tested the commit is cleanly applicable with 'git cherry-pick' before posting this. I just tried 'git format-patch ... && git am ...' and confirmed it doesn't work. Sorry, my fault. > > > Tested-by: Markus Boehme > > What was tested? We confirmed the unmodified v4.4.y kernel crashes on a stress test that repeatedly doing netdev attach/detach, while the patch applied version doesn't. > > I backported the patch, but next time, please provide the patch that > will work properly. Thanks, and apology for the inconvenience. I will do the check with posting patch again rather than only 'git cherry-pick' from next time. Thanks, SeongJae Park > > greg k-h
Please apply "xen-netback: delete NAPI instance when queue fails to initialize" to v4.4.y
This is a request for merge of upstream commit 4a658527271b ("xen-netback: delete NAPI instance when queue fails to initialize") on v4.4.y tree. If 'xenvif_connect()' fails after successful 'netif_napi_add()', the napi is not cleaned up. Because 'create_queues()' frees the queues in its error handling code, if the 'xenvif_free()' is called for the vif, use-after-free occurs. The upstream commit fixes the problem by cleaning up the napi in the 'xenvif_connect()'. Attaching the original patch below for your convenience. Tested-by: Markus Boehme Thanks, SeongJae Park >8 === >From 4a658527271bce43afb1cf4feec89afe6716ca59 Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Fri, 15 Jan 2016 14:55:35 + Subject: [PATCH] xen-netback: delete NAPI instance when queue fails to initialize When xenvif_connect() fails it may leave a stale NAPI instance added to the device. Make sure we delete it in the error path. Signed-off-by: David Vrabel Signed-off-by: David S. Miller --- drivers/net/xen-netback/interface.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index e7bd63eb2876..3bba6ceee132 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -615,6 +615,7 @@ int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref, queue->tx_irq = 0; err_unmap: xenvif_unmap_frontend_rings(queue); + netif_napi_del(&queue->napi); err: module_put(THIS_MODULE); return err; -- 2.17.1
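The bug class behind this one-liner is general enough to be worth a sketch. Below is an illustrative Python model (not the driver code): resources acquired before a failure must be unwound in reverse order on the error path, or later teardown operates on stale state, which is exactly how the stale NAPI instance survived `xenvif_connect()` failing.

```python
def connect(steps):
    """Toy unwind-on-error pattern: each step is (name, succeeds). On the
    first failure, undo everything acquired so far, newest first, like
    the netif_napi_del() added to xenvif_connect()'s error path."""
    acquired = []
    for name, ok in steps:
        if not ok:
            for undo in reversed(acquired):
                print("undo", undo)
            return -1
        acquired.append(name)
    return 0

# napi_add succeeds, then binding the tx irq fails: the napi instance
# must be undone here, otherwise freeing the queue later touches it.
print("ret", connect([("napi_add", True), ("bind_tx_irq", False)]))
```

The upstream fix is precisely the "undo" step that was missing for `netif_napi_add()`.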
Re: [PATCH v24 00/14] Subject: Introduce Data Access MONitor (DAMON)
On Thu, 4 Feb 2021 16:31:36 +0100 SeongJae Park wrote: > From: SeongJae Park > [...] > > Introduction > > > DAMON is a data access monitoring framework for the Linux kernel. The core > mechanisms of DAMON called 'region based sampling' and 'adaptive regions > adjustment' (refer to 'mechanisms.rst' in the 11th patch of this patchset for > the detail) make it > > - accurate (The monitored information is useful for DRAM level memory >management. It might not be appropriate for cache-level accuracy, though.), > - light-weight (The monitoring overhead is low enough to be applied online >while making no impact on the performance of the target workloads.), and > - scalable (the upper-bound of the instrumentation overhead is controllable >regardless of the size of target workloads.). > > Using this framework, therefore, several memory management mechanisms such as > reclamation and THP can be optimized to be aware of real data access patterns. > Experimental access-pattern-aware memory management optimization works that > previously incurred high instrumentation overhead will be able to have another try. > > Though DAMON is for kernel subsystems, it can be easily exposed to the user > space by writing a DAMON-wrapper kernel subsystem. Then, user space users who > have some special workloads will be able to write personalized tools or > applications for deeper understanding and specialized optimizations of their > systems. > I realized I didn't introduce a good, intuitive example use case of DAMON for profiling so far, though DAMON is not only for profiling. One straightforward and realistic usage of DAMON as a profiling tool would be recording the monitoring results with the callstack and visualizing them by timeline together. For example, the link below shows such a visualization for a realistic workload, namely 'fft' in the SPLASH-2X benchmark suite.
From that, you can see there are three memory access bursting phases in the workload and 'FFT1DOnce.cons::prop.2()' looks responsible for the first and second hot phase, while 'Transpose()' is responsible for the last one. Now the programmer can take a deep look in the functions and optimize the code (e.g., adding madvise() or mlock() calls). https://damonitor.github.io/temporal/damon_callstack.png We used the approach for 'mlock()'-based optimization of a range of other realistic benchmark workloads. The optimized versions achieved up to about 2.5x performance improvement under memory pressure[1]. Note: I made the uppermost two figures in the above 'fft' visualization (working set size and access frequency of each memory region by time) via the DAMON user space tool[2], while the lowermost one (callstack by time) is made using perf and speedscope[3]. We have no decent and totally automated tool for that yet (will be implemented soon, maybe under perf as a perf-script[4]), but you could reproduce that with the below commands. $ # run the workload $ sudo damo record $(pidof ) & $ sudo perf record -g $(pidof ) $ # after your workload finished (you should also finish perf on your own) $ damo report wss --sortby time --plot wss.pdf $ damo report heats --heatmap freq.pdf $ sudo perf script | speedscope - $ # open wss.pdf and freq.pdf with your favorite pdf viewer [1] https://linuxplumbersconf.org/event/4/contributions/548/attachments/311/590/damon_ksummit19.pdf [2] https://lore.kernel.org/linux-mm/20201215115448.25633-8-sjp...@amazon.com/ [3] https://www.speedscope.app/ [4] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjp...@amazon.com/
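As a rough sketch of what `damo report wss` computes per aggregation snapshot, here is a toy Python version. It is illustrative only: regions are `(start, end, nr_accesses)` tuples and all numbers below are made up, but the idea — sum the sizes of regions that were observed as accessed — is the same.

```python
def wss(snapshot, threshold=1):
    """Working set size of one aggregation snapshot: total size of the
    regions whose nr_accesses meets the threshold."""
    return sum(end - start for start, end, nr_accesses in snapshot
               if nr_accesses >= threshold)

snapshots = [
    [(0, 4096, 0), (4096, 40960, 7)],   # a bursting phase
    [(0, 4096, 0), (4096, 40960, 0)],   # an idle phase
]
print([wss(s) for s in snapshots])      # the WSS collapses in the idle phase
```

Plotting that per-snapshot number over time is what exposes the bursting phases visible in the 'fft' figure above.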
Re: [PATCH v24 11/14] Documentation: Add documents for DAMON
On Thu, 4 Feb 2021 16:31:47 +0100 SeongJae Park wrote: > From: SeongJae Park > > This commit adds documents for DAMON under > `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. > > Signed-off-by: SeongJae Park > --- > Documentation/admin-guide/mm/damon/guide.rst | 159 ++ > Documentation/admin-guide/mm/damon/index.rst | 15 + > Documentation/admin-guide/mm/damon/plans.rst | 29 ++ > Documentation/admin-guide/mm/damon/start.rst | 97 ++ > Documentation/admin-guide/mm/damon/usage.rst | 304 +++ > Documentation/admin-guide/mm/index.rst | 1 + > Documentation/vm/damon/api.rst | 20 ++ > Documentation/vm/damon/design.rst| 166 ++ > Documentation/vm/damon/eval.rst | 232 ++ > Documentation/vm/damon/faq.rst | 58 > Documentation/vm/damon/index.rst | 31 ++ > Documentation/vm/index.rst | 1 + > 12 files changed, 1113 insertions(+) > create mode 100644 Documentation/admin-guide/mm/damon/guide.rst > create mode 100644 Documentation/admin-guide/mm/damon/index.rst > create mode 100644 Documentation/admin-guide/mm/damon/plans.rst > create mode 100644 Documentation/admin-guide/mm/damon/start.rst > create mode 100644 Documentation/admin-guide/mm/damon/usage.rst > create mode 100644 Documentation/vm/damon/api.rst > create mode 100644 Documentation/vm/damon/design.rst > create mode 100644 Documentation/vm/damon/eval.rst > create mode 100644 Documentation/vm/damon/faq.rst > create mode 100644 Documentation/vm/damon/index.rst > [...] > diff --git a/Documentation/admin-guide/mm/damon/usage.rst > b/Documentation/admin-guide/mm/damon/usage.rst > new file mode 100644 > index ..32436cf853c7 > --- /dev/null > +++ b/Documentation/admin-guide/mm/damon/usage.rst > @@ -0,0 +1,304 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +=== > +Detailed Usages > +=== > + > +DAMON provides below three interfaces for different users. > + > +- *DAMON user space tool.* > + This is for privileged people such as system administrators who want a > + just-working human-friendly interface. 
Using this, users can use the > DAMON’s > + major features in a human-friendly way. It may not be highly tuned for > + special cases, though. It supports only virtual address spaces monitoring. > +- *debugfs interface.* > + This is for privileged user space programmers who want more optimized use > of > + DAMON. Using this, users can use DAMON’s major features by reading > + from and writing to special debugfs files. Therefore, you can write and > use > + your personalized DAMON debugfs wrapper programs that reads/writes the > + debugfs files instead of you. The DAMON user space tool is also a > reference > + implementation of such programs. It supports only virtual address spaces > + monitoring. > +- *Kernel Space Programming Interface.* > + This is for kernel space programmers. Using this, users can utilize every > + feature of DAMON most flexibly and efficiently by writing kernel space > + DAMON application programs for you. You can even extend DAMON for various > + address spaces. > + > +This document does not describe the kernel space programming interface in > +detail. For that, please refer to the :doc:`/vm/damon/api`. > + > + > +DAMON User Space Tool > += This version of the patchset doesn't introduce the user space tool source code, so putting the detailed usage here might make no sense. I will remove this section in the next version. If you will review this patch, please skip this section. [...] > + > +debugfs Interface > += But, this section will not be removed. Please review. [...] Thanks, SeongJae Park
Re: [PATCH v24 07/14] mm/damon: Implement a debugfs-based user space interface
On Fri, 5 Feb 2021 16:29:41 +0100 Greg KH wrote: > On Thu, Feb 04, 2021 at 04:31:43PM +0100, SeongJae Park wrote: > > From: SeongJae Park > > > > DAMON is designed to be used by kernel space code such as the memory > > management subsystems, and therefore it provides only kernel space API. > > That said, letting the user space control DAMON could provide some > > benefits to them. For example, it will allow user space to analyze > > their specific workloads and make their own special optimizations. > > > > For such cases, this commit implements a simple DAMON application kernel > > module, namely 'damon-dbgfs', which merely wraps the DAMON api and > > exports those to the user space via the debugfs. > > > > 'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and > > ``monitor_on`` under its debugfs directory, ``/damon/``. [...] > > --- > > include/linux/damon.h | 3 + > > mm/damon/Kconfig | 9 + > > mm/damon/Makefile | 1 + > > mm/damon/core.c | 47 + > > mm/damon/dbgfs.c | 387 ++ > > 5 files changed, 447 insertions(+) > > create mode 100644 mm/damon/dbgfs.c [...] > > diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c > > new file mode 100644 > > index ..db15380737d1 > > --- /dev/null > > +++ b/mm/damon/dbgfs.c [...] > > + > > +static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) > > +{ > > + const char * const file_names[] = {"attrs", "target_ids"}; > > + const struct file_operations *fops[] = {_fops, _ids_fops}; > > + int i; > > + > > + for (i = 0; i < ARRAY_SIZE(file_names); i++) { > > + if (!debugfs_create_file(file_names[i], 0600, dir, > > + ctx, fops[i])) { > > + pr_err("failed to create %s file\n", file_names[i]); > > + return -ENOMEM; > > No need to check the return value of this function, just keep going and > ignore it as there's nothing to do and kernel code should not do > different things based on the output of any debugfs calls. > > Also, this check is totally wrong and doesn't do what you think it is > doing... 
Ok, I will drop the check. > > > +static int __init __damon_dbgfs_init(void) > > +{ > > + struct dentry *dbgfs_root; > > + const char * const file_names[] = {"monitor_on"}; > > + const struct file_operations *fops[] = {_on_fops}; > > + int i; > > + > > + dbgfs_root = debugfs_create_dir("damon", NULL); > > + if (IS_ERR(dbgfs_root)) { > > + pr_err("failed to create the dbgfs dir\n"); > > + return PTR_ERR(dbgfs_root); > > Again, no need to check anything, just pass the result of a debugfs call > back into another one just fine. Ok. > > > + } > > + > > + for (i = 0; i < ARRAY_SIZE(file_names); i++) { > > + if (!debugfs_create_file(file_names[i], 0600, dbgfs_root, > > + NULL, fops[i])) { > > Again, this isn't checking what you think it is, so please don't do it. Got it. I will fix those as you suggested in the next version. Thanks, SeongJae Park > > thanks, > > greg k-h
[PATCH v24 05/14] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space does the work as below. PTE Accessed-bit Based Access Check --- The implementation uses the PTE Accessed-bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction --- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too huge unmapped areas inside the monitoring target should be removed so that the adaptive mechanism does not waste time on them. For this reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space.
The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spaces, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail:: (small mmap()-ed regions and munmap()-ed regions) Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 13 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/vaddr.c | 579 ++ 4 files changed, 602 insertions(+) create mode 100644 mm/damon/vaddr.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 0bd5d6913a6c..72cf5ebd35fe 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -246,4 +246,17 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_VADDR + +/* Monitoring primitives for virtual memory address spaces */ +void damon_va_init(struct damon_ctx *ctx); +void damon_va_update(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(void *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_VADDR */ + #endif /* _DAMON_H */ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..8ae080c52950 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,13 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_VADDR + bool "Data access monitoring primitives for virtual address spaces" + depends on DAMON && MMU + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON + that works for virtual address spaces. 
+ endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..6ebbd08aed67 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON):= core.o +obj-$(CONFIG_DAMON_VADDR) += vaddr.o diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c new file mode 100644 index ..a6bf234daae6 --- /dev/null +++ b/mm/damon/vaddr.c @@ -0,0 +1,579 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Primitives for Virtual Address Spaces + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-va: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the returned task, u
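The VMA-based three-region construction described in this patch's changelog is easy to model in user space. Below is a toy Python sketch (not the kernel's `vaddr.c` code, which also handles edge cases this omits): given at least three sorted mapped ranges, keep the two biggest gaps and merge everything else into three covering regions.

```python
def three_regions(mappings):
    """Build DAMON-style initial regions: drop the two biggest unmapped
    gaps and return three regions covering every mapped area.
    mappings: sorted, non-overlapping (start, end) pairs, len >= 3."""
    gaps = []
    for (s0, e0), (s1, e1) in zip(mappings, mappings[1:]):
        gaps.append((s1 - e0, e0, s1))       # (size, gap_start, gap_end)
    # pick the two biggest gaps, then put them back in address order
    big = sorted(sorted(gaps, reverse=True)[:2], key=lambda g: g[1])
    regions, start = [], mappings[0][0]
    for _, gap_start, gap_end in big:
        regions.append((start, gap_start))
        start = gap_end
    regions.append((start, mappings[-1][1]))
    return regions

# heap | small mmap()-ed areas | stack, with two huge gaps around them
maps = [(0x1000, 0x5000), (0x100000, 0x101000), (0x102000, 0x103000),
        (0x800000, 0x801000)]
print(three_regions(maps))
```

The small gap between the two mmap()-ed areas gets absorbed into the middle region, while the two exceptionally large gaps (heap-to-mmap and mmap-to-stack in the changelog's description) stay excluded.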
[PATCH v24 14/14] MAINTAINERS: Update for DAMON
From: SeongJae Park This commit updates MAINTAINERS file for DAMON related files. Signed-off-by: SeongJae Park --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 281de213ef47..88b2125b0f07 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4872,6 +4872,18 @@ F: net/ax25/ax25_out.c F: net/ax25/ax25_timer.c F: net/ax25/sysctl_net_ax25.c +DATA ACCESS MONITOR +M: SeongJae Park +L: linux...@kvack.org +S: Maintained +F: Documentation/admin-guide/mm/damon/* +F: Documentation/vm/damon/* +F: include/linux/damon.h +F: include/trace/events/damon.h +F: mm/damon/* +F: tools/damon/* +F: tools/testing/selftests/damon/* + DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER L: net...@vger.kernel.org S: Orphan -- 2.17.1
[PATCH v24 13/14] mm/damon: Add user space selftests
From: SeongJae Park This commit adds simple user space tests for DAMON. The tests use the kselftest framework. Signed-off-by: SeongJae Park --- tools/testing/selftests/damon/Makefile| 7 + .../selftests/damon/_chk_dependency.sh| 28 +++ tools/testing/selftests/damon/_chk_record.py | 109 .../testing/selftests/damon/debugfs_attrs.sh | 161 ++ .../testing/selftests/damon/debugfs_record.sh | 50 ++ 5 files changed, 355 insertions(+) create mode 100644 tools/testing/selftests/damon/Makefile create mode 100644 tools/testing/selftests/damon/_chk_dependency.sh create mode 100644 tools/testing/selftests/damon/_chk_record.py create mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh create mode 100755 tools/testing/selftests/damon/debugfs_record.sh diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile new file mode 100644 index ..cfd5393a4639 --- /dev/null +++ b/tools/testing/selftests/damon/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for damon selftests + +TEST_FILES = _chk_dependency.sh _chk_record.py +TEST_PROGS = debugfs_attrs.sh debugfs_record.sh + +include ../lib.mk diff --git a/tools/testing/selftests/damon/_chk_dependency.sh b/tools/testing/selftests/damon/_chk_dependency.sh new file mode 100644 index ..b304b7779976 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_dependency.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +DBGFS=/sys/kernel/debug/damon + +if [ $EUID -ne 0 ]; +then + echo "Run as root" + exit $ksft_skip +fi + +if [ ! -d $DBGFS ] +then + echo "$DBGFS not found" + exit $ksft_skip +fi + +for f in attrs record target_ids monitor_on +do + if [ !
-f "$DBGFS/$f" ] + then + echo "$f not found" + exit 1 + fi +done diff --git a/tools/testing/selftests/damon/_chk_record.py b/tools/testing/selftests/damon/_chk_record.py new file mode 100644 index ..73e128904319 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_record.py @@ -0,0 +1,109 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +"Check whether the DAMON record file is valid" + +import argparse +import struct +import sys + +fmt_version = 0 + +def set_fmt_version(f): +global fmt_version + +mark = f.read(16) +if mark == b'damon_recfmt_ver': +fmt_version = struct.unpack('i', f.read(4))[0] +else: +fmt_version = 0 +f.seek(0) +return fmt_version + +def read_pid(f): +if fmt_version == 1: +pid = struct.unpack('i', f.read(4))[0] +else: +pid = struct.unpack('L', f.read(8))[0] + +def err_percent(val, expected): +return abs(val - expected) / expected * 100 + +def chk_task_info(f): +pid = read_pid(f) +nr_regions = struct.unpack('I', f.read(4))[0] + +if nr_regions > max_nr_regions: +print('too many regions: %d > %d' % (nr_regions, max_nr_regions)) +exit(1) + +nr_gaps = 0 +eaddr = 0 +for r in range(nr_regions): +saddr = struct.unpack('L', f.read(8))[0] +if eaddr and saddr != eaddr: +nr_gaps += 1 +eaddr = struct.unpack('L', f.read(8))[0] +nr_accesses = struct.unpack('I', f.read(4))[0] + +if saddr >= eaddr: +print('wrong region [%d,%d)' % (saddr, eaddr)) +exit(1) + +max_nr_accesses = aint / sint +if nr_accesses > max_nr_accesses: +if err_percent(nr_accesses, max_nr_accesses) > 15: +print('too high nr_access: expected %d but %d' % +(max_nr_accesses, nr_accesses)) +exit(1) +if nr_gaps != 2: +print('number of gaps are not two but %d' % nr_gaps) +exit(1) + +def parse_time_us(bindat): +sec = struct.unpack('l', bindat[0:8])[0] +nsec = struct.unpack('l', bindat[8:16])[0] +return (sec * 10 + nsec) / 1000 + +def main(): +global sint +global aint +global min_nr +global max_nr_regions + +parser = argparse.ArgumentParser() +parser.add_argument('file', metavar='', 
+help='path to the record file') +parser.add_argument('--attrs', metavar='', +default='5000 10 100 10 1000', +help='content of debugfs attrs file') +args = parser.parse_args() +file_path = args.file +attrs = [int(x) for x in args.attrs.split()] +sint, aint, rint, min_nr, max_nr_regions = attrs + +with open(file_path, 'rb') as f: +set_fmt_version(f) +last_aggr_time = None +while True: +timebin = f.read(16) +if len(timebin) != 16: +break + +now = parse_time_us(timebin) +if not last_a
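To make the checker's expectations concrete, here is a small round-trip sketch of the record-file layout as `_chk_record.py` reads it. This is not part of the patch; the field order and struct codes (format marker, version, 16-byte sec/nsec timestamp, pid, region count, then start/end/nr_accesses per region) are assumptions inferred from the checker above.

```python
# Hypothetical round-trip for the DAMON record layout inferred from
# _chk_record.py; the exact on-disk layout is defined by the kernel side.
import io
import struct

def pack_record(pid, regions, sec=0, nsec=0):
    """Pack one snapshot for a single target, using the checker's codes."""
    buf = b'damon_recfmt_ver' + struct.pack('i', 1)   # marker + version
    buf += struct.pack('l', sec) + struct.pack('l', nsec)  # timestamp
    buf += struct.pack('i', pid)                      # fmt_version 1: int pid
    buf += struct.pack('I', len(regions))             # nr_regions
    for start, end, nr_accesses in regions:
        buf += struct.pack('L', start) + struct.pack('L', end)
        buf += struct.pack('I', nr_accesses)
    return buf

def unpack_record(data):
    """Parse the buffer back, mirroring the reads in _chk_record.py."""
    f = io.BytesIO(data)
    assert f.read(16) == b'damon_recfmt_ver'
    version = struct.unpack('i', f.read(4))[0]
    f.read(16)                                        # skip the timestamp
    pid = struct.unpack('i', f.read(4))[0]
    nr_regions = struct.unpack('I', f.read(4))[0]
    regions = []
    for _ in range(nr_regions):
        start = struct.unpack('L', f.read(8))[0]
        end = struct.unpack('L', f.read(8))[0]
        nr_accesses = struct.unpack('I', f.read(4))[0]
        regions.append((start, end, nr_accesses))
    return version, pid, regions

print(unpack_record(pack_record(1234, [(4096, 8192, 3)])))
```

Note that the native `'L'` code is 8 bytes only on LP64 platforms, which is what the checker implicitly assumes as well.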
[PATCH v24 12/14] mm/damon: Add kunit tests
From: SeongJae Park This commit adds kunit based unit tests for the core and the virtual address spaces monitoring primitives of DAMON. Signed-off-by: SeongJae Park Reviewed-by: Brendan Higgins --- mm/damon/Kconfig | 36 + mm/damon/core-test.h | 253 mm/damon/core.c | 7 + mm/damon/dbgfs-test.h | 214 +++ mm/damon/dbgfs.c | 2 + mm/damon/vaddr-test.h | 328 ++ mm/damon/vaddr.c | 7 + 7 files changed, 847 insertions(+) create mode 100644 mm/damon/core-test.h create mode 100644 mm/damon/dbgfs-test.h create mode 100644 mm/damon/vaddr-test.h diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 72f1683ba0ee..455995152697 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,6 +12,18 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_KUNIT_TEST + bool "Test for damon" if !KUNIT_ALL_TESTS + depends on DAMON && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_VADDR bool "Data access monitoring primitives for virtual address spaces" depends on DAMON && MMU @@ -21,6 +33,18 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual address spaces. +config DAMON_VADDR_KUNIT_TEST + bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS + depends on DAMON_VADDR && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON virtual addresses primitives Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_DBGFS bool "DAMON debugfs interface" depends on DAMON_VADDR && DEBUG_FS @@ -30,4 +54,16 @@ config DAMON_DBGFS If unsure, say N. 
+config DAMON_DBGFS_KUNIT_TEST + bool "Test for damon debugfs interface" if !KUNIT_ALL_TESTS + depends on DAMON_DBGFS && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON debugfs interface Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + endmenu diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h new file mode 100644 index ..b815dfbfb5fd --- /dev/null +++ b/mm/damon/core-test.h @@ -0,0 +1,253 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Data Access Monitor Unit Tests + * + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * Author: SeongJae Park + */ + +#ifdef CONFIG_DAMON_KUNIT_TEST + +#ifndef _DAMON_CORE_TEST_H +#define _DAMON_CORE_TEST_H + +#include + +static void damon_test_regions(struct kunit *test) +{ + struct damon_region *r; + struct damon_target *t; + + r = damon_new_region(1, 2); + KUNIT_EXPECT_EQ(test, 1ul, r->ar.start); + KUNIT_EXPECT_EQ(test, 2ul, r->ar.end); + KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses); + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, 1u, damon_nr_regions(t)); + + damon_del_region(r); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_free_target(t); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static void damon_test_target(struct kunit *test) +{ + struct damon_ctx *c = damon_new_ctx(); + struct damon_target *t; + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 42ul, t->id); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_add_target(c, t); + KUNIT_EXPECT_EQ(test, 1u, nr_damon_targets(c)); + + damon_destroy_target(t); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_destroy_ctx(c); +} + +/* + * Test 
kdamond_reset_aggregated()
+ *
+ * DAMON checks access to each region and aggregates this information as the
+ * access frequency of each region.  In detail, it increases '->nr_accesses' of
+ * regions that an access has confirmed.  'kdamond_reset_aggregated()' flushes
+ * the aggregated information ('->nr_accesses' of each region) to the result
+ * buffer.  As a result of the flushing, the '->nr_accesses' of regions are
+ * initialized to zero.
[PATCH v24 09/14] mm/damon/dbgfs: Export kdamond pid to the user space
From: SeongJae Park

For CPU usage accounting, knowing pid of the monitoring thread could be
helpful.  For example, users could use cpuaccount cgroups with the pid.

This commit therefore exports the pid of currently running monitoring
thread to the user space via 'kdamond_pid' file in the debugfs
directory.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index dce4409e5887..4b9ac2043e99 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -358,6 +358,32 @@ static ssize_t dbgfs_target_ids_write(struct file *file,
 	return ret;
 }
 
+static ssize_t dbgfs_kdamond_pid_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	ssize_t len;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		len = scnprintf(kbuf, count, "%d\n", ctx->kdamond->pid);
+	else
+		len = scnprintf(kbuf, count, "none\n");
+	mutex_unlock(&ctx->kdamond_lock);
+	if (!len)
+		goto out;
+	len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
+
+out:
+	kfree(kbuf);
+	return len;
+}
+
 static int damon_dbgfs_open(struct inode *inode, struct file *file)
 {
 	file->private_data = inode->i_private;
@@ -386,11 +412,18 @@ static const struct file_operations target_ids_fops = {
 	.write = dbgfs_target_ids_write,
 };
 
+static const struct file_operations kdamond_pid_fops = {
+	.owner = THIS_MODULE,
+	.open = damon_dbgfs_open,
+	.read = dbgfs_kdamond_pid_read,
+};
+
 static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx)
 {
-	const char * const file_names[] = {"attrs", "record", "target_ids"};
+	const char * const file_names[] = {"attrs", "record", "target_ids",
+		"kdamond_pid"};
 	const struct file_operations *fops[] = {&attrs_fops, &record_fops,
-		&target_ids_fops};
+		&target_ids_fops, &kdamond_pid_fops};
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(file_names); i++) {
-- 
2.17.1
[PATCH v24 08/14] mm/damon/dbgfs: Implement recording feature
From: SeongJae Park

The user space users can control DAMON and get the monitoring results
via the 'damon_aggregated' tracepoint event.  However, dealing with the
tracepoint might be complex for some simple use cases.  This commit
therefore implements 'recording' feature in 'damon-dbgfs'.  The feature
can be used via 'record' file in the '<debugfs>/damon/' directory.

The file allows users to record monitored access patterns in a regular
binary file.  The recorded results are first written in an in-memory
buffer and flushed to a file in batch.  Users can get and set the size
of the buffer and the path to the result file by reading from and
writing to the ``record`` file.  For example, below commands set the
buffer to be 4 KiB and the result to be saved in ``/damon.data``. ::

    # cd <debugfs>/damon
    # echo "4096 /damon.data" > record
    # cat record
    4096 /damon.data

The recording can be disabled by setting the buffer size zero.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 261 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 259 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index db15380737d1..dce4409e5887 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -15,6 +15,17 @@
 #include
 #include
 
+#define MIN_RECORD_BUFFER_LEN	1024
+#define MAX_RECORD_BUFFER_LEN	(4 * 1024 * 1024)
+#define MAX_RFILE_PATH_LEN	256
+
+struct dbgfs_recorder {
+	unsigned char *rbuf;
+	unsigned int rbuf_len;
+	unsigned int rbuf_offset;
+	char *rfile_path;
+};
+
 static struct damon_ctx **dbgfs_ctxs;
 static int dbgfs_nr_ctxs;
 static struct dentry **dbgfs_dirs;
@@ -97,6 +108,116 @@ static ssize_t dbgfs_attrs_write(struct file *file,
 	return ret;
 }
 
+static ssize_t dbgfs_record_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	struct dbgfs_recorder *rec = ctx->callback.private;
+	char record_buf[20 + MAX_RFILE_PATH_LEN];
+	int ret;
+
+	mutex_lock(&ctx->kdamond_lock);
+	ret = scnprintf(record_buf, ARRAY_SIZE(record_buf), "%u %s\n",
+			rec->rbuf_len, rec->rfile_path);
+	mutex_unlock(&ctx->kdamond_lock);
+	return simple_read_from_buffer(buf, count, ppos, record_buf, ret);
+}
+
+/*
+ * dbgfs_set_recording() - Set attributes for the recording.
+ * @ctx:	target kdamond context
+ * @rbuf_len:	length of the result buffer
+ * @rfile_path:	path to the monitor result files
+ *
+ * Setting 'rbuf_len' 0 disables recording.
+ *
+ * This function should not be called while the kdamond is running.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int dbgfs_set_recording(struct damon_ctx *ctx,
+			unsigned int rbuf_len, char *rfile_path)
+{
+	struct dbgfs_recorder *recorder;
+	size_t rfile_path_len;
+
+	if (rbuf_len && (rbuf_len > MAX_RECORD_BUFFER_LEN ||
+			rbuf_len < MIN_RECORD_BUFFER_LEN)) {
+		pr_err("result buffer size (%u) is out of [%d,%d]\n",
+				rbuf_len, MIN_RECORD_BUFFER_LEN,
+				MAX_RECORD_BUFFER_LEN);
+		return -EINVAL;
+	}
+	rfile_path_len = strnlen(rfile_path, MAX_RFILE_PATH_LEN);
+	if (rfile_path_len >= MAX_RFILE_PATH_LEN) {
+		pr_err("too long (>%d) result file path %s\n",
+				MAX_RFILE_PATH_LEN, rfile_path);
+		return -EINVAL;
+	}
+
+	recorder = ctx->callback.private;
+	if (!recorder) {
+		recorder = kzalloc(sizeof(*recorder), GFP_KERNEL);
+		if (!recorder)
+			return -ENOMEM;
+		ctx->callback.private = recorder;
+	}
+
+	recorder->rbuf_len = rbuf_len;
+	kfree(recorder->rbuf);
+	recorder->rbuf = NULL;
+	kfree(recorder->rfile_path);
+	recorder->rfile_path = NULL;
+
+	if (rbuf_len) {
+		recorder->rbuf = kvmalloc(rbuf_len, GFP_KERNEL);
+		if (!recorder->rbuf)
+			return -ENOMEM;
+	}
+	recorder->rfile_path = kmalloc(rfile_path_len + 1, GFP_KERNEL);
+	if (!recorder->rfile_path)
+		return -ENOMEM;
+	strncpy(recorder->rfile_path, rfile_path, rfile_path_len + 1);
+
+	return 0;
+}
+
+static ssize_t dbgfs_record_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	unsigned int rbuf_len;
+	char rfile_path[MAX_RFILE_PATH_LEN];
+	ssize_t ret = count;
+	int err;
+
+	kbuf = user_input_str(buf, count, ppos);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+
+	if (sscanf(kbuf
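To make the accepted inputs concrete, here is a small user-space sketch that mirrors the validation `dbgfs_set_recording()` applies to a `record` write such as `"4096 /damon.data"`. It is an illustration, not part of the patch; only the constants come from the code above.

```python
# Mirror of the kernel-side checks in dbgfs_set_recording() (sketch).
MIN_RECORD_BUFFER_LEN = 1024              # from the patch
MAX_RECORD_BUFFER_LEN = 4 * 1024 * 1024   # from the patch
MAX_RFILE_PATH_LEN = 256                  # from the patch

def record_input_valid(line):
    """Return True if a 'record' file input would pass the kernel checks."""
    fields = line.split()
    if len(fields) != 2:
        return False
    try:
        rbuf_len = int(fields[0])
    except ValueError:
        return False
    rfile_path = fields[1]
    # rbuf_len == 0 disables recording; otherwise it must be in range.
    if rbuf_len and not (MIN_RECORD_BUFFER_LEN <= rbuf_len
                         <= MAX_RECORD_BUFFER_LEN):
        return False
    # The path must fit in MAX_RFILE_PATH_LEN including the trailing NUL.
    if len(rfile_path) >= MAX_RFILE_PATH_LEN:
        return False
    return True

print(record_input_valid("4096 /damon.data"))   # in range -> True
print(record_input_valid("100 /damon.data"))    # below minimum -> False
print(record_input_valid("0 /damon.data"))      # disables recording -> True
```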
[PATCH v24 06/14] mm/damon: Add a tracepoint
From: SeongJae Park

This commit adds a tracepoint for DAMON.  It traces the monitoring
results of each region for each aggregation interval.  Using this,
DAMON can be easily integrated with tracepoints supporting tools such
as perf.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
Reviewed-by: Steven Rostedt (VMware)
---
 include/trace/events/damon.h | 43 ++++++++++++++++++++++++++++++++++++
 mm/damon/core.c              |  7 ++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/damon.h

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
new file mode 100644
index ..2f422f4f1fb9
--- /dev/null
+++ b/include/trace/events/damon.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM damon
+
+#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DAMON_H
+
+#include <linux/damon.h>
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(damon_aggregated,
+
+	TP_PROTO(struct damon_target *t, struct damon_region *r,
+		unsigned int nr_regions),
+
+	TP_ARGS(t, r, nr_regions),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, target_id)
+		__field(unsigned int, nr_regions)
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned int, nr_accesses)
+	),
+
+	TP_fast_assign(
+		__entry->target_id = t->id;
+		__entry->nr_regions = nr_regions;
+		__entry->start = r->ar.start;
+		__entry->end = r->ar.end;
+		__entry->nr_accesses = r->nr_accesses;
+	),
+
+	TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u",
+			__entry->target_id, __entry->nr_regions,
+			__entry->start, __entry->end, __entry->nr_accesses)
+);
+
+#endif /* _TRACE_DAMON_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>

diff --git a/mm/damon/core.c b/mm/damon/core.c
index b36b6bdd94e2..912112662d0c 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -13,6 +13,9 @@
 #include
 #include
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/damon.h>
+
 /* Get a random number in [l, r) */
 #define damon_rand(l, r) (l + prandom_u32_max(r - l))
 
@@ -388,8 +391,10 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 
 	damon_for_each_target(t, c) {
 		struct damon_region *r;
 
-		damon_for_each_region(r, t)
+		damon_for_each_region(r, t) {
+			trace_damon_aggregated(t, r, damon_nr_regions(t));
 			r->nr_accesses = 0;
+		}
 	}
 }
-- 
2.17.1
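For consumers of this tracepoint, the `TP_printk` format above ("target_id=%lu nr_regions=%u %lu-%lu: %u") is what appears in tracefs or `perf script` output. Below is a hedged sketch of parsing such a line in user space; the surrounding trace-line prefix in the sample is made up for illustration.

```python
# Parse the damon_aggregated TP_printk output defined in the patch above.
import re

LINE_RE = re.compile(
    r'target_id=(?P<target_id>\d+) nr_regions=(?P<nr_regions>\d+) '
    r'(?P<start>\d+)-(?P<end>\d+): (?P<nr_accesses>\d+)')

def parse_aggregated(line):
    """Extract the damon_aggregated fields from one trace line, or None."""
    m = LINE_RE.search(line)
    if not m:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}

# Hypothetical trace line; the prefix format depends on the tracer options.
sample = ('kdamond.0-1377 [003] ...: damon_aggregated: '
          'target_id=42 nr_regions=10 4096-16384: 3')
print(parse_aggregated(sample))
```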