Re: [RFC PATCH v4 0/5] DAMON based tiered memory management for CXL memory
Hi Honggyu,

On Mon, 13 May 2024 20:59:15 +0900 Honggyu Kim wrote:

> Hi SeongJae,
>
> Thanks very much for your work! It got delayed due to the priority
> changes in my workplace for building another heterogeneous memory
> allocator.
> https://github.com/skhynix/hmsdk/wiki/hmalloc

No problem at all. We all work on our own schedule and nobody can
chase/push anybody :)

> > On Sun, 12 May 2024 10:54:42 -0700 SeongJae Park wrote:
> >
> > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> > posted at [1].
> >
> > It says no implementation of the demote/promote DAMOS actions has been
> > made. This RFC is about its implementation for physical address space.
> >
> > Changes from RFC v3
> > (https://lore.kernel.org/20240405060858.2818-1-honggyu@sk.com):
>
> This link cannot be opened. I will share the link again here.
> https://lore.kernel.org/all/20240405060858.2818-1-honggyu@sk.com

Thank you for checking the link! It's weird though, since I can open
the link on my Chrome browser.

> > 0. Updated from v3 and posted by SJ on behalf of Honggyu under his
> >    approval.
> > 1. Do not reuse damon_pa_pageout() and drop 'enum migration_mode'.
> > 2. Drop vmstat change.
>
> I haven't checked whether I can collect useful information without
> vmstat, but the changes look good in general except for that.

I was thinking you could use DAMOS stat[1] for the schemes, and I took
the lack of reply to it as an agreement, but maybe I should have made
it clear. Do you think DAMOS stat cannot be used instead? If so, what
would be the limitation of DAMOS stat for your usage?

> > 3. Drop unnecessary page reference check.
>
> I will compare this patch series with my previous v3 patchset and get
> back to you later, maybe next week.

Thank you very much! Unless I get a good enough test setup and results
from it on my own or from others' help, your test result would be the
last requirement for dropping RFC from this patchset.

> Sorry, I will have another break this week.
No problem, I hope you have a nice break. Nobody can chase/push others.
We all do this work voluntarily for our own fun and profit, right? ;)

[1] https://lore.kernel.org/damon/20240405060858.2818-1-honggyu@sk.com

Thanks,
SJ

>
> Thanks,
> Honggyu
[RFC IDEA v2 6/6] drivers/virtio/virtio_balloon: integrate ACMA and ballooning
Let the host effectively inflate the balloon in an access/contiguity-aware
way when the guest kernel is compiled with the specific kernel config.
When the config is enabled and the host requests a balloon size change,
virtio-balloon adjusts ACMA's max-mem parameter instead of allocating
guest pages and putting them into the balloon. As a result, the host can
use the requested amount of guest memory, so from the host's perspective
the ballooning just works, but in a transparent and
access/contiguity-aware way.

Signed-off-by: SeongJae Park
---
 drivers/virtio/virtio_balloon.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/virtio/virtio_balloon.c b/drivers/virtio/virtio_balloon.c
index 1f5b3dd31fcf..a954d75789ae 100644
--- a/drivers/virtio/virtio_balloon.c
+++ b/drivers/virtio/virtio_balloon.c
@@ -472,6 +472,32 @@ static void virtballoon_changed(struct virtio_device *vdev)
 	struct virtio_balloon *vb = vdev->priv;
 	unsigned long flags;
 
+#ifdef CONFIG_ACMA_BALLOON
+	s64 target;
+	u32 num_pages;
+
+	/* Legacy balloon config space is LE, unlike all other devices. */
+	virtio_cread_le(vb->vdev, struct virtio_balloon_config, num_pages,
+			&num_pages);
+
+	/*
+	 * Align up to guest page size to avoid inflating and deflating the
+	 * balloon endlessly.
+	 */
+	target = ALIGN(num_pages, VIRTIO_BALLOON_PAGES_PER_PAGE);
+
+	/*
+	 * If the given new max mem size is larger than the current acma max
+	 * mem size, this is the same as a normal max mem adjustment.
+	 * If the given new max mem size is smaller than the current acma max
+	 * mem size, strong aggressiveness is applied while the memory for
+	 * meeting the new max mem is stolen.
+	 */
+	acma_set_max_mem_aggressive(totalram_pages() - target);
+	return;
+#endif
+
 	spin_lock_irqsave(&vb->stop_update_lock, flags);
 	if (!vb->stop_update) {
 		start_update_balloon_size(vb);
--
2.39.2
[RFC IDEA v2 0/6] mm/damon: introduce Access/Contiguity-aware Memory Auto-scaling (ACMA)
exclusively used for only contiguous memory allocation, or allow
non-contiguous memory allocation to use it under special conditions such
as allowing only movable pages. The second approach improves memory
utilization, but sometimes suffers from pages that are movable by
definition but not easily movable in practice, similar to the memory
block-level page migration for memory hot-unplugging described in the
limitations section. Even setting aside migration reliability and speed,
finding the optimum size of the pool is challenging.

We could use an ACMA-like approach for dynamically allocating a memory
pool for contiguous memory allocation. It would be similar to ACMA but
would not report DAMOS-alloc-ed pages to the host. Instead, the regions
would be used as the contiguous memory allocation pool.

DRAM Power Consumption Saving
-----------------------------

DRAM consumes and emits a huge amount of power and carbon, respectively.
On bare-metal machines, we could scale down memory using ACMA,
hot-unplug completely DAMOS-alloc-ed memory blocks, and power off the
DRAM device if the hardware supports such an operation.

Discussion Points
=================

- Are there better existing alternatives for memory over-commit VM
  systems?
- Is it ok to reuse the page reporting infrastructure from ACMA?
- Is it ok to reuse virtio-balloon's interface for ACMA integration?
- Will access-aware migration make a real benefit?
- Do the future usages of access-aware memory allocation make sense?
SeongJae Park (6):
  mm/damon: implement DAMOS actions for access-aware contiguous memory
    allocation
  mm/damon: add the initial part of access/contiguity-aware memory
    auto-scaling module
  mm/page_reporting: implement a function for reporting specific pfn
    range
  mm/damon/acma: implement scale down feature
  mm/damon/acma: implement scale up feature
  drivers/virtio/virtio_balloon: integrate ACMA and ballooning

 drivers/virtio/virtio_balloon.c |  26 ++
 include/linux/damon.h           |  37 +++
 mm/damon/Kconfig                |  10 +
 mm/damon/Makefile               |   1 +
 mm/damon/acma.c                 | 546
 mm/damon/paddr.c                |  93 ++
 mm/damon/sysfs-schemes.c        |   4 +
 mm/page_reporting.c             |  27 ++
 8 files changed, 744 insertions(+)
 create mode 100644 mm/damon/acma.c

base-commit: 40475439de721986370c9d26f53596e2bd4e1416
--
2.39.2
Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
On Sat, 11 May 2024 13:16:17 -0700 SeongJae Park wrote:

> On Fri, 5 Apr 2024 12:19:07 -0700 SeongJae Park wrote:
>
> > On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim wrote:
> >
> > > This is a preparation patch that introduces migration modes.
> > >
> > > The damon_pa_pageout is renamed to damon_pa_migrate and it receives
> > > an extra argument for migration_mode.
> >
> > I personally think keeping damon_pa_pageout() as is and adding a new
> > function (damon_pa_migrate()) with some duplicated code is also ok,
> > but this approach also looks fine to me. So I have no strong opinion
> > here, but just letting you know I would have no objection to both
> > approaches.
>
> Meanwhile, we added one more piece of logic in damon_pa_pageout() for
> doing the page idleness double check on its own[1]. It makes reusing
> damon_pa_pageout() for multiple reasons a bit complex. I think the
> complexity added a problem in this patch that I also missed before due
> to the complexity. See the below comment inline. Hence now I think it
> would be better to do it the suggested way.
>
> If we use that approach, this patch is no longer necessary, and can
> therefore be dropped.
>
> [1] https://lore.kernel.org/20240426195247.100306-1...@kernel.org

I updated this patchset to address comments on this thread, and posted
it as RFC patchset v4 on behalf of Honggyu under his approval:
https://lore.kernel.org/20240512175447.75943-1...@kernel.org

Thanks,
SJ

[...]
[RFC PATCH v4 3/5] mm/migrate: add MR_DAMON to migrate_reason
From: Honggyu Kim

The current patch series introduces DAMON based migration across NUMA
nodes so it'd be better to have a new migrate_reason in trace events.

Signed-off-by: Honggyu Kim
Reviewed-by: SeongJae Park
Signed-off-by: SeongJae Park
---
 include/linux/migrate_mode.h   | 1 +
 include/trace/events/migrate.h | 3 ++-
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h
index f37cc03f9369..cec36b7e7ced 100644
--- a/include/linux/migrate_mode.h
+++ b/include/linux/migrate_mode.h
@@ -29,6 +29,7 @@ enum migrate_reason {
 	MR_CONTIG_RANGE,
 	MR_LONGTERM_PIN,
 	MR_DEMOTION,
+	MR_DAMON,
 	MR_TYPES
 };

diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h
index 0190ef725b43..cd01dd7b3640 100644
--- a/include/trace/events/migrate.h
+++ b/include/trace/events/migrate.h
@@ -22,7 +22,8 @@
 	EM( MR_NUMA_MISPLACED,	"numa_misplaced")	\
 	EM( MR_CONTIG_RANGE,	"contig_range")		\
 	EM( MR_LONGTERM_PIN,	"longterm_pin")		\
-	EMe(MR_DEMOTION,	"demotion")
+	EM( MR_DEMOTION,	"demotion")		\
+	EMe(MR_DAMON,		"damon")

 /*
  * First define the enums in the above macros to be exported to userspace
--
2.39.2
[RFC PATCH v4 0/5] DAMON based tiered memory management for CXL memory
es the execution time increase. However, the "DAMON tiered" result shows
less slowdown because the DAMOS_MIGRATE_COLD action at the DRAM node
proactively demotes pre-allocated cold memory to the CXL node, and this
freed space at DRAM increases the chance to allocate hot or warm pages
of redis-server to the fast DRAM node. Moreover, the DAMOS_MIGRATE_HOT
action at the CXL node also actively promotes hot pages of redis-server
to the DRAM node. As a result, more memory of redis-server stays in the
DRAM node compared to the "default" memory policy, and this yields the
performance improvement.

The following result of the latest distribution workload shows similar
data.

  2. YCSB latest distribution read only workload
     memory pressure with cold memory on node0 with 512GB of local DRAM.

  =============+================================================+=========
               |     cold memory occupied by mmap and memset    |
               |  0G   440G  450G  460G  470G  480G  490G  500G |
  =============+================================================+=========
  Execution time normalized to DRAM-only values                 | GEOMEAN
  -------------+------------------------------------------------+---------
  DRAM-only    | 1.00     -     -     -     -     -     -     - | 1.00
  CXL-only     | 1.18     -     -     -     -     -     -     - | 1.18
  default      |    -  1.18  1.19  1.18  1.18  1.17  1.19  1.18 | 1.18
  DAMON tiered |    -  1.04  1.04  1.04  1.05  1.04  1.05  1.05 | 1.04
  =============+================================================+=========
  CXL usage of redis-server in GB                               | AVERAGE
  -------------+------------------------------------------------+---------
  DRAM-only    |  0.0     -     -     -     -     -     -     - |  0.0
  CXL-only     | 52.6     -     -     -     -     -     -     - | 52.6
  default      |    -  20.5  27.1  33.2  39.5  45.5  50.4  50.5 | 38.1
  DAMON tiered |    -   0.2   0.4   0.7   1.6   1.2   1.1   3.4 |  1.2
  =============+================================================+=========

In summary of both results, our evaluation shows that "DAMON tiered"
memory management reduces the performance slowdown compared to the
"default" memory policy from 17~18% to 4~5% when the system runs with
high memory pressure on its fast tier DRAM nodes.

Having these DAMOS_MIGRATE_HOT and DAMOS_MIGRATE_COLD actions can make
tiered memory systems run more efficiently under high memory pressure.
Signed-off-by: Honggyu Kim
Signed-off-by: Hyeongtak Ji
Signed-off-by: Rakie Kim
Signed-off-by: SeongJae Park

[1] https://lore.kernel.org/damon/20231112195602.61525-1...@kernel.org
[2] https://lore.kernel.org/damon/20240311204545.47097-1...@kernel.org
[3] https://github.com/skhynix/hmsdk
[4] https://github.com/redis/redis/tree/7.0.0
[5] https://github.com/brianfrankcooper/YCSB/tree/0.17.0
[6] https://dl.acm.org/doi/10.1145/3503222.3507731
[7] https://dl.acm.org/doi/10.1145/3582016.3582063

Honggyu Kim (3):
  mm: make alloc_demote_folio externally invokable for migration
  mm/migrate: add MR_DAMON to migrate_reason
  mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion

Hyeongtak Ji (2):
  mm/damon/sysfs-schemes: add target_nid on sysfs-schemes
  mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion

 include/linux/damon.h          |  15 +++-
 include/linux/migrate_mode.h   |   1 +
 include/trace/events/migrate.h |   3 +-
 mm/damon/core.c                |   5 +-
 mm/damon/dbgfs.c               |   2 +-
 mm/damon/lru_sort.c            |   3 +-
 mm/damon/paddr.c               | 157 +
 mm/damon/reclaim.c             |   3 +-
 mm/damon/sysfs-schemes.c       |  35 +++-
 mm/internal.h                  |   1 +
 mm/vmscan.c                    |   3 +-
 11 files changed, 219 insertions(+), 9 deletions(-)

base-commit: edc60852c99779574e0748bcf766560db67eb423
--
2.39.2
Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
On Fri, 5 Apr 2024 12:19:07 -0700 SeongJae Park wrote:

> On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim wrote:
>
> > This is a preparation patch that introduces migration modes.
> >
> > The damon_pa_pageout is renamed to damon_pa_migrate and it receives
> > an extra argument for migration_mode.
>
> I personally think keeping damon_pa_pageout() as is and adding a new
> function (damon_pa_migrate()) with some duplicated code is also ok,
> but this approach also looks fine to me. So I have no strong opinion
> here, but just letting you know I would have no objection to both
> approaches.

Meanwhile, we added one more piece of logic in damon_pa_pageout() for
doing the page idleness double check on its own[1]. It makes reusing
damon_pa_pageout() for multiple reasons a bit complex. I think the
complexity added a problem in this patch that I also missed before due
to the complexity. See the below comment inline. Hence now I think it
would be better to do it the suggested way.

If we use that approach, this patch is no longer necessary, and can
therefore be dropped.

[1] https://lore.kernel.org/20240426195247.100306-1...@kernel.org

Thanks,
SJ

[...]
> > No functional changes applied.
> >
> > Signed-off-by: Honggyu Kim
> > ---
> >  mm/damon/paddr.c | 18 +++---
> >  1 file changed, 15 insertions(+), 3 deletions(-)
> >
> > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> > index 081e2a325778..277a1c4d833c 100644
> > --- a/mm/damon/paddr.c
> > +++ b/mm/damon/paddr.c
> > @@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> >  	return false;
> >  }
> >
> > -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> > +enum migration_mode {
> > +	MIG_PAGEOUT,
> > +};
>
> To avoid name conflicts, what about renaming to 'damos_migration_mode'
> and 'DAMOS_MIG_PAGEOUT'?
>
> > +
> > +static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
> > +		enum migration_mode mm)
>
> My poor brain got a bit confused by the name. What about calling it
> 'mode'?
>
> >  {
> >  	unsigned long addr, applied;
> >  	LIST_HEAD(folio_list);
> > @@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)

Before this line, damon_pa_pageout() calls folio_clear_referenced() and
folio_test_clear_young() for the folio, because this is pageout code.
The changed function, damon_pa_migrate(), is not only for cold pages but
for general migrations. Hence this should also be handled based on the
migration mode, but it is not handled. I think this problem came from
the increased complexity of this function. Hence I think it is better
to keep damon_pa_pageout() as is and add a new function for migration.

Thanks,
SJ

[...]
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
Hi Honggyu,

On Tue, 9 Apr 2024 18:54:14 +0900 Honggyu Kim wrote:

> On Mon, 8 Apr 2024 10:52:28 -0700 SeongJae Park wrote:
> > On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim wrote:
> > > On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park wrote:
> > > > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:

[...]

> > > I can remove it, but I would like to have more discussion about
> > > this issue. The current implementation allows only a single
> > > migration target with "target_nid", but users might want to provide
> > > fallback migration target nids.
> > >
> > > For example, if more than two CXL nodes exist in the system, users
> > > might want to migrate cold pages to any CXL nodes. In such cases,
> > > we might have to make "target_nid" accept comma separated node IDs.
> > > nodemask can be better but we should provide a way to change the
> > > scanning order.
> > >
> > > I would like to hear how you think about this.
> >
> > Good point. I think we could later extend the sysfs file to receive
> > the comma-separated numbers, or even a mask. For simplicity, adding
> > sysfs files dedicated to the different formats of inputs could also
> > be an option (e.g., target_nids_list, target_nids_mask). But starting
> > from this single node as is now looks ok to me.
>
> If you think we can start from a single node, then I will keep it as
> is. But are you okay if I change the same 'target_nid' to accept
> comma-separated numbers later? Or do you want to introduce another
> knob such as 'target_nids_list'? What about renaming 'target_nid' to
> 'target_nids' in the first place?

I have no strong concern or opinion about this at the moment. Please
feel free to rename it to 'target_nids' if you think that's better.

[...]

> Please note that I will be out of office this week so won't be able to
> answer quickly.

No problem, I hope you take and enjoy your time :)

Thanks,
SJ

[...]
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
On Mon, 8 Apr 2024 21:06:44 +0900 Honggyu Kim wrote:

> On Fri, 5 Apr 2024 12:24:30 -0700 SeongJae Park wrote:
> > On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:

[...]

> > > Here is one of the example usages of this 'migrate_cold' action.
> > >
> > >   $ cd /sys/kernel/mm/damon/admin/kdamonds/
> > >   $ cat contexts//schemes//action
> > >   migrate_cold
> > >   $ echo 2 > contexts//schemes//target_nid
> > >   $ echo commit > state
> > >   $ numactl -p 0 ./hot_cold 500M 600M &
> > >   $ numastat -c -p hot_cold
> > >
> > >   Per-node process memory usage (in MBs)
> > >   PID             Node 0 Node 1 Node 2 Total
> > >   --------------  ------ ------ ------ -----
> > >   701 (hot_cold)     501      0    601   1101
> > >
> > > Since there are some common routines with pageout, many functions
> > > have similar logic between pageout and migrate cold.
> > >
> > > damon_pa_migrate_folio_list() is a minimized version of
> > > shrink_folio_list(), but it's minified only for demotion.
> >
> > MIGRATE_COLD is not only for demotion, right? I think the last two
> > words are better to be removed for reducing unnecessary confusion.
>
> You mean the last two sentences? I will remove them if you feel it's
> confusing.

Yes. My real intended suggestion was 's/only for demotion/only for
migration/', but entirely removing the sentences is also ok for me.

> > > Signed-off-by: Honggyu Kim
> > > Signed-off-by: Hyeongtak Ji
> > > ---
> > >  include/linux/damon.h    |   2 +
> > >  mm/damon/paddr.c         | 146 ++-
> > >  mm/damon/sysfs-schemes.c |   4 ++
> > >  3 files changed, 151 insertions(+), 1 deletion(-)

[...]

> > > --- a/mm/damon/paddr.c
> > > +++ b/mm/damon/paddr.c

[...]

> > > +{
> > > +	unsigned int nr_succeeded;
> > > +	nodemask_t allowed_mask = NODE_MASK_NONE;
> > > +
> >
> > I personally prefer not having empty lines in the middle of variable
> > declarations/definitions. Could we remove this empty line?
>
> I can remove it, but I would like to have more discussion about this
> issue.
> The current implementation allows only a single migration target with
> "target_nid", but users might want to provide fallback migration
> target nids.
>
> For example, if more than two CXL nodes exist in the system, users
> might want to migrate cold pages to any CXL nodes. In such cases, we
> might have to make "target_nid" accept comma separated node IDs.
> nodemask can be better but we should provide a way to change the
> scanning order.
>
> I would like to hear how you think about this.

Good point. I think we could later extend the sysfs file to receive the
comma-separated numbers, or even a mask. For simplicity, adding sysfs
files dedicated to the different formats of inputs could also be an
option (e.g., target_nids_list, target_nids_mask). But starting from
this single node as is now looks ok to me.

[...]

> > > +	/* 'folio_list' is always empty here */
> > > +
> > > +	/* Migrate folios selected for migration */
> > > +	nr_migrated += migrate_folio_list(&migrate_folios, pgdat, target_nid);
> > > +	/* Folios that could not be migrated are still in @migrate_folios */
> > > +	if (!list_empty(&migrate_folios)) {
> > > +		/* Folios which weren't migrated go back on @folio_list */
> > > +		list_splice_init(&migrate_folios, folio_list);
> > > +	}
> >
> > Let's not use braces for a single statement
> > (https://docs.kernel.org/process/coding-style.html#placing-braces-and-spaces).
>
> Hmm.. I know the convention but left it as is because of the comment.
> If I remove the braces, it would have a weird alignment for the two
> comment and statement lines.

I don't really hate such alignment. But if you don't like it, how about
moving the comment out of the if statement? Having one comment for a
one-line if statement looks not bad to me.

> > > +
> > > +	try_to_unmap_flush();
> > > +
> > > +	list_splice(&ret_folios, folio_list);
> >
> > Can't we move the remaining folios in migrate_folios to ret_folios at
> > once?
>
> I will see if it's possible.

Thank you. Not a strict request, though.

[...]
> > > +	nid = folio_nid(lru_to_folio(folio_list));
> > > +	do {
> > > +		struct folio *folio = lru_to_folio(folio_list);
> > > +
> > > +		if (nid == folio_nid(folio)) {
> > > +			folio_clear_active(folio);
> >
> > I think this was necessary for demotion, but now this should be
> > removed since this function is no more for demotion but for migrating
> > random pages, right?
>
> Yeah, it can be removed because we do migration instead of demotion,
> but I need to make sure it doesn't change the performance evaluation
> results.

Yes, please ensure the test results are valid :)

Thanks,
SJ

[...]
Re: [RFC PATCH v3 0/7] DAMON based tiered memory management for CXL memory
Hello Honggyu,

On Fri, 5 Apr 2024 15:08:49 +0900 Honggyu Kim wrote:

> There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously
> posted at [1].
>
> It says no implementation of the demote/promote DAMOS actions has been
> made. This RFC is about its implementation for physical address space.
>
> Changes from RFC v2:
>   1. Rename DAMOS_{PROMOTE,DEMOTE} actions to DAMOS_MIGRATE_{HOT,COLD}.
>   2. Create 'target_nid' to set the migration target node instead of
>      depending on node distance based information.
>   3. Instead of having page level access check in this patch series,
>      delegate the job to a new DAMOS filter type YOUNG[2].
>   4. Introduce vmstat counters "damon_migrate_{hot,cold}".
>   5. Rebase from v6.7 to v6.8.

Thank you for patiently keeping the discussion and making this great
version! I left comments on each patch, but found no special concerns.
The per-page access recheck for MIGRATE_HOT and the vmstat change are
catching my eye, though; I doubt whether those are really needed. It
would be nice if you could answer the comments. Once my comments on
this version are addressed, I would have no reason to object to
dropping the RFC tag from this patchset.

Nonetheless, I see some warnings and errors from checkpatch.pl. I don't
really care about those for RFC patches, so no problem at all. But if
you agree to my opinion about RFC tag dropping, and therefore if you
will send the next version without the RFC tag, please make sure you
also run checkpatch.pl before posting.

Thanks,
SJ

[...]
Re: [RFC PATCH v3 7/7] mm/damon: Add "damon_migrate_{hot,cold}" vmstat
On Fri, 5 Apr 2024 15:08:56 +0900 Honggyu Kim wrote:

> This patch adds "damon_migrate_{hot,cold}" under node specific vmstat
> counters at the following location.
>
>   /sys/devices/system/node/node*/vmstat
>
> The counted values are accumulated to the global vmstat so it also
> introduces the same counter at /proc/vmstat as well.

DAMON provides its own DAMOS stats via the DAMON sysfs interface. Do we
really need this change?

Thanks,
SJ

[...]
Re: [RFC PATCH v3 6/7] mm/damon/paddr: introduce DAMOS_MIGRATE_HOT action for promotion
On Fri, 5 Apr 2024 15:08:55 +0900 Honggyu Kim wrote:

> From: Hyeongtak Ji
>
> This patch introduces DAMOS_MIGRATE_HOT action, which is similar to
> DAMOS_MIGRATE_COLD, but it is targeted to migrate hot pages.

My understanding of our last discussion was that 'HOT/COLD' here is
only for the prioritization score function. If I'm not wrong, this is
not for targeting, but just for prioritizing the migration of hot pages
first under the quota.

> It migrates pages inside the given region to the 'target_nid' NUMA
> node in the sysfs.
>
> Here is one of the example usages of this 'migrate_hot' action.
>
>   $ cd /sys/kernel/mm/damon/admin/kdamonds/
>   $ cat contexts//schemes//action
>   migrate_hot
>   $ echo 0 > contexts//schemes//target_nid
>   $ echo commit > state
>   $ numactl -p 2 ./hot_cold 500M 600M &
>   $ numastat -c -p hot_cold
>
>   Per-node process memory usage (in MBs)
>   PID             Node 0 Node 1 Node 2 Total
>   --------------  ------ ------ ------ -----
>   701 (hot_cold)     501      0    601   1101
>
> Signed-off-by: Hyeongtak Ji
> Signed-off-by: Honggyu Kim
> ---
>  include/linux/damon.h    |  2 ++
>  mm/damon/paddr.c         | 12 ++--
>  mm/damon/sysfs-schemes.c |  4 +++-
>  3 files changed, 15 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index df8671e69a70..934c95a7c042 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
>   * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
>   * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
>   * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_HOT:	Migrate for the given hot region.

As commented on the previous patch, this could be re-phrased a bit.
Also, let's use tabs consistently.

>   * @DAMOS_MIGRATE_COLD:	Migrate for the given cold region.
>   * @DAMOS_STAT:	Do nothing but count the stat.
>   * @NR_DAMOS_ACTIONS:	Total number of DAMOS actions
> @@ -123,6 +124,7 @@ enum damos_action {
> 	DAMOS_NOHUGEPAGE,
> 	DAMOS_LRU_PRIO,
> 	DAMOS_LRU_DEPRIO,
> +	DAMOS_MIGRATE_HOT,
> 	DAMOS_MIGRATE_COLD,
> 	DAMOS_STAT,	/* Do nothing but only record the stat */
> 	NR_DAMOS_ACTIONS,
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index fe217a26f788..fd9d35b5cc83 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -229,6 +229,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
>
>  enum migration_mode {
>  	MIG_PAGEOUT,
> +	MIG_MIGRATE_HOT,
>  	MIG_MIGRATE_COLD,
>  };

It looks like we don't really need both MIG_MIGRATE_HOT and
MIG_MIGRATE_COLD, but just one, say, MIG_MIGRATE, since the code can
know which prioritization score function to use from the DAMOS action?
Also, as I commented on the previous one, I'd prefer having the DAMOS_
prefix.

> @@ -375,8 +376,10 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
>  	if (damos_pa_filter_out(s, folio))
>  		goto put_folio;
>
> -	folio_clear_referenced(folio);
> -	folio_test_clear_young(folio);
> +	if (mm != MIG_MIGRATE_HOT) {
> +		folio_clear_referenced(folio);
> +		folio_test_clear_young(folio);
> +	}

We agreed to do this check via the 'young' page type DAMOS filter, and
to let this code not care about it, right? If I'm not wrong, I think
this should be removed?
>  	if (!folio_isolate_lru(folio))
>  		goto put_folio;
>  	/*
> @@ -394,6 +397,7 @@ static unsigned long damon_pa_migrate(struct damon_region *r, struct damos *s,
>  	case MIG_PAGEOUT:
>  		applied = reclaim_pages(&folio_list);
>  		break;
> +	case MIG_MIGRATE_HOT:
>  	case MIG_MIGRATE_COLD:
>  		applied = damon_pa_migrate_pages(&folio_list, mm,
>  						 s->target_nid);
> @@ -454,6 +458,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
>  		return damon_pa_mark_accessed(r, scheme);
>  	case DAMOS_LRU_DEPRIO:
>  		return damon_pa_deactivate_pages(r, scheme);
> +	case DAMOS_MIGRATE_HOT:
> +		return damon_pa_migrate(r, scheme, MIG_MIGRATE_HOT);
>  	case DAMOS_MIGRATE_COLD:
>  		return damon_pa_migrate(r, scheme, MIG_MIGRATE_COLD);
>  	case DAMOS_STAT:
> @@ -476,6 +482,8 @@ static int damon_pa_scheme_score(struct damon_ctx *context,
>  		return damon_hot_score(context, r, scheme);
>  	case DAMOS_LRU_DEPRIO:
>  		return damon_cold_score(context, r, scheme);
> +	case DAMOS_MIGRATE_HOT:
> +		return damon_hot_score(context, r, scheme);
>  	case DAMOS_MIGRATE_COLD:
>  		return damon_cold_score(context, r, scheme);
>  	default:
> diff
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
On Fri, 5 Apr 2024 16:55:57 +0900 Hyeongtak Ji wrote:

> On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:
>
> ...snip...
>
> > +static unsigned long damon_pa_migrate_pages(struct list_head *folio_list,
> > +					    enum migration_mode mm,
> > +					    int target_nid)
> > +{
> > +	int nid;
> > +	unsigned int nr_migrated = 0;
> > +	LIST_HEAD(node_folio_list);
> > +	unsigned int noreclaim_flag;
> > +
> > +	if (list_empty(folio_list))
> > +		return nr_migrated;
>
> How about checking if `target_nid` is `NUMA_NO_NODE` or not earlier,
>
> > +
> > +	noreclaim_flag = memalloc_noreclaim_save();
> > +
> > +	nid = folio_nid(lru_to_folio(folio_list));
> > +	do {
> > +		struct folio *folio = lru_to_folio(folio_list);
> > +
> > +		if (nid == folio_nid(folio)) {
> > +			folio_clear_active(folio);
> > +			list_move(&folio->lru, &node_folio_list);
> > +			continue;
> > +		}
> > +
> > +		nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > +							   NODE_DATA(nid), mm,
> > +							   target_nid);
> > +		nid = folio_nid(lru_to_folio(folio_list));
> > +	} while (!list_empty(folio_list));
> > +
> > +	nr_migrated += damon_pa_migrate_folio_list(&node_folio_list,
> > +						   NODE_DATA(nid), mm,
> > +						   target_nid);
> > +
> > +	memalloc_noreclaim_restore(noreclaim_flag);
> > +
> > +	return nr_migrated;
> > +}
> > +
> > ...snip...
> >
> > +static unsigned int migrate_folio_list(struct list_head *migrate_folios,
> > +				       struct pglist_data *pgdat,
> > +				       int target_nid)
> > +{
> > +	unsigned int nr_succeeded;
> > +	nodemask_t allowed_mask = NODE_MASK_NONE;
> > +
> > +	struct migration_target_control mtc = {
> > +		/*
> > +		 * Allocate from 'node', or fail quickly and quietly.
> > +		 * When this happens, 'page' will likely just be discarded
> > +		 * instead of migrated.
> > +		 */
> > +		.gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | __GFP_NOWARN |
> > +			__GFP_NOMEMALLOC | GFP_NOWAIT,
> > +		.nid = target_nid,
> > +		.nmask = &allowed_mask
> > +	};
> > +
> > +	if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
> > +		return 0;
>
> instead of here.
Agree. As I replied in the previous reply, I think this check can be
done by the caller (or the caller of the caller) of this function.

> > +
> > +	if (list_empty(migrate_folios))
> > +		return 0;

Same for this.

> > +
> > +	/* Migration ignores all cpuset and mempolicy settings */
> > +	migrate_pages(migrate_folios, alloc_migrate_folio, NULL,
> > +		      (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON,
> > +		      &nr_succeeded);
> > +
> > +	return nr_succeeded;
> > +}
> > +
> > ...snip...
>
> Kind regards,
> Hyeongtak

Thanks,
SJ
Re: [RFC PATCH v3 5/7] mm/damon/paddr: introduce DAMOS_MIGRATE_COLD action for demotion
On Fri, 5 Apr 2024 15:08:54 +0900 Honggyu Kim wrote:

> This patch introduces DAMOS_MIGRATE_COLD action, which is similar to
> DAMOS_PAGEOUT, but migrates folios to the given 'target_nid' in the
> sysfs instead of swapping them out.
>
> The 'target_nid' sysfs knob is created by this patch to inform the
> migration target node ID.

Isn't it created by the previous patch?

> Here is one of the example usages of this 'migrate_cold' action.
>
>   $ cd /sys/kernel/mm/damon/admin/kdamonds/
>   $ cat contexts//schemes//action
>   migrate_cold
>   $ echo 2 > contexts//schemes//target_nid
>   $ echo commit > state
>   $ numactl -p 0 ./hot_cold 500M 600M &
>   $ numastat -c -p hot_cold
>
>   Per-node process memory usage (in MBs)
>   PID             Node 0 Node 1 Node 2 Total
>   --------------  ------ ------ ------ -----
>   701 (hot_cold)     501      0    601   1101
>
> Since there are some common routines with pageout, many functions have
> similar logic between pageout and migrate cold.
>
> damon_pa_migrate_folio_list() is a minimized version of
> shrink_folio_list(), but it's minified only for demotion.

MIGRATE_COLD is not only for demotion, right? I think the last two
words are better to be removed for reducing unnecessary confusion.

> Signed-off-by: Honggyu Kim
> Signed-off-by: Hyeongtak Ji
> ---
>  include/linux/damon.h    |   2 +
>  mm/damon/paddr.c         | 146 ++-
>  mm/damon/sysfs-schemes.c |   4 ++
>  3 files changed, 151 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 24ea33a03d5d..df8671e69a70 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -105,6 +105,7 @@ struct damon_target {
>   * @DAMOS_NOHUGEPAGE:	Call ``madvise()`` for the region with MADV_NOHUGEPAGE.
>   * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
>   * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
> + * @DAMOS_MIGRATE_COLD: Migrate for the given cold region.

Whether it will be for a cold region or not depends on the target
access pattern.
What about 'Migrate the regions in coldest regions first manner.'? Or, simply 'Migrate the regions (prioritize cold)' here, and explain the prioritization under quota in the detailed comments part? Also, let's use tab consistently. > * @DAMOS_STAT: Do nothing but count the stat. > * @NR_DAMOS_ACTIONS:Total number of DAMOS actions > * > @@ -122,6 +123,7 @@ enum damos_action { > DAMOS_NOHUGEPAGE, > DAMOS_LRU_PRIO, > DAMOS_LRU_DEPRIO, > + DAMOS_MIGRATE_COLD, > DAMOS_STAT, /* Do nothing but only record the stat */ > NR_DAMOS_ACTIONS, > }; > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c > index 277a1c4d833c..fe217a26f788 100644 > --- a/mm/damon/paddr.c > +++ b/mm/damon/paddr.c > @@ -12,6 +12,9 @@ > #include > #include > #include > +#include > +#include > +#include > > #include "../internal.h" > #include "ops-common.h" > @@ -226,8 +229,137 @@ static bool damos_pa_filter_out(struct damos *scheme, > struct folio *folio) > > enum migration_mode { > MIG_PAGEOUT, > + MIG_MIGRATE_COLD, > }; > > +static unsigned int migrate_folio_list(struct list_head *migrate_folios, > +struct pglist_data *pgdat, > +int target_nid) To avoid name collisions, I'd prefer having a damon_pa_ prefix. I see this patch is defining damon_pa_migrate_folio_list() below, though. What about __damon_pa_migrate_folio_list()? > +{ > + unsigned int nr_succeeded; > + nodemask_t allowed_mask = NODE_MASK_NONE; > + I personally prefer not having empty lines in the middle of variable declarations/definitions. Could we remove this empty line? > + struct migration_target_control mtc = { > + /* > + * Allocate from 'node', or fail quickly and quietly. > + * When this happens, 'page' will likely just be discarded > + * instead of migrated.
> + */ > + .gfp_mask = (GFP_HIGHUSER_MOVABLE & ~__GFP_RECLAIM) | > __GFP_NOWARN | > + __GFP_NOMEMALLOC | GFP_NOWAIT, > + .nid = target_nid, > + .nmask = &allowed_mask > }; > > + if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE) > + return 0; > + > + if (list_empty(migrate_folios)) > + return 0; Can't these checks be done by the caller? > + > + /* Migration ignores all cpuset and mempolicy settings */ > + migrate_pages(migrate_folios, alloc_migrate_folio, NULL, > + (unsigned long)&mtc, MIGRATE_ASYNC, MR_DAMON, > + &nr_succeeded); > + > + return nr_succeeded; > +} > + > +static unsigned int damon_pa_migrate_folio_list(struct list_head *folio_list, > + struct pglist_data *pgdat, > +
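The two suggestions above (a `damon_pa_`/`__damon_pa_` prefix for the inner helper, and hoisting the cheap guard checks into the caller) can be sketched in plain userspace C. The types and the helper bodies below are simplified stand-ins for the kernel structures, not the actual patch:

```c
#include <stddef.h>

/* Simplified stand-ins for the kernel types (assumed for illustration). */
#define NUMA_NO_NODE (-1)

struct pglist_data { int node_id; };
struct list_head { struct list_head *next, *prev; };

static int list_empty_mock(const struct list_head *head)
{
	return head->next == head;
}

/*
 * Inner helper, named with the suggested __damon_pa_ prefix: it only does
 * the migration work and may assume the caller validated its inputs.
 */
static unsigned int __damon_pa_migrate_folio_list(struct list_head *folios,
						  struct pglist_data *pgdat,
						  int target_nid)
{
	(void)folios;
	(void)pgdat;
	(void)target_nid;
	/* migrate_pages(folios, ..., target_nid, ...) would run here */
	return 1; /* pretend one folio was migrated */
}

/*
 * Caller: performs the guard checks once, before invoking the helper,
 * as suggested in the review.
 */
static unsigned int damon_pa_migrate_folio_list(struct list_head *folios,
						struct pglist_data *pgdat,
						int target_nid)
{
	if (pgdat->node_id == target_nid || target_nid == NUMA_NO_NODE)
		return 0;
	if (list_empty_mock(folios))
		return 0;
	return __damon_pa_migrate_folio_list(folios, pgdat, target_nid);
}
```

With this split, the helper stays a pure mechanism and the early-exit policy lives in exactly one place.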
Re: [RFC PATCH v3 4/7] mm/migrate: add MR_DAMON to migrate_reason
On Fri, 5 Apr 2024 15:08:53 +0900 Honggyu Kim wrote: > The current patch series introduces DAMON based migration across NUMA > nodes so it'd be better to have a new migrate_reason in trace events. > > Signed-off-by: Honggyu Kim Reviewed-by: SeongJae Park Thanks, SJ > --- > include/linux/migrate_mode.h | 1 + > include/trace/events/migrate.h | 3 ++- > 2 files changed, 3 insertions(+), 1 deletion(-) > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > index f37cc03f9369..cec36b7e7ced 100644 > --- a/include/linux/migrate_mode.h > +++ b/include/linux/migrate_mode.h > @@ -29,6 +29,7 @@ enum migrate_reason { > MR_CONTIG_RANGE, > MR_LONGTERM_PIN, > MR_DEMOTION, > + MR_DAMON, > MR_TYPES > }; > > diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h > index 0190ef725b43..cd01dd7b3640 100644 > --- a/include/trace/events/migrate.h > +++ b/include/trace/events/migrate.h > @@ -22,7 +22,8 @@ > EM( MR_NUMA_MISPLACED, "numa_misplaced") \ > EM( MR_CONTIG_RANGE,"contig_range") \ > EM( MR_LONGTERM_PIN,"longterm_pin") \ > - EMe(MR_DEMOTION,"demotion") > + EM( MR_DEMOTION,"demotion") \ > + EMe(MR_DAMON, "damon") > > /* > * First define the enums in the above macros to be exported to userspace > -- > 2.34.1 > >
Re: [RFC PATCH v3 2/7] mm: make alloc_demote_folio externally invokable for migration
On Fri, 5 Apr 2024 15:08:51 +0900 Honggyu Kim wrote: > The alloc_demote_folio can be used out of vmscan.c so it'd be better to > remove static keyword from it. > > This function can also be used for both demotion and promotion so it'd > be better to rename it from alloc_demote_folio to alloc_migrate_folio. I'm not sure if renaming is really needed, but have no strong opinion. > > Signed-off-by: Honggyu Kim I have one more trivial comment below, but find no blocker for me. Reviewed-by: SeongJae Park > --- > mm/internal.h | 1 + > mm/vmscan.c | 10 +++--- > 2 files changed, 8 insertions(+), 3 deletions(-) > > diff --git a/mm/internal.h b/mm/internal.h > index f309a010d50f..c96ff9bc82d0 100644 > --- a/mm/internal.h > +++ b/mm/internal.h > @@ -866,6 +866,7 @@ extern unsigned long __must_check vm_mmap_pgoff(struct > file *, unsigned long, > unsigned long, unsigned long); > > extern void set_pageblock_order(void); > +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private); > unsigned long reclaim_pages(struct list_head *folio_list); > unsigned int reclaim_clean_pages_from_list(struct zone *zone, > struct list_head *folio_list); > diff --git a/mm/vmscan.c b/mm/vmscan.c > index 4255619a1a31..9e456cac03b4 100644 > --- a/mm/vmscan.c > +++ b/mm/vmscan.c > @@ -910,8 +910,7 @@ static void folio_check_dirty_writeback(struct folio > *folio, > mapping->a_ops->is_dirty_writeback(folio, dirty, writeback); > } > > -static struct folio *alloc_demote_folio(struct folio *src, > - unsigned long private) > +struct folio *alloc_migrate_folio(struct folio *src, unsigned long private) > { > struct folio *dst; > nodemask_t *allowed_mask; > @@ -935,6 +934,11 @@ static struct folio *alloc_demote_folio(struct folio > *src, > if (dst) > return dst; > > + /* > + * Allocation failed from the target node so try to allocate from > + * fallback nodes based on allowed_mask. > + * See fallback_alloc() at mm/slab.c.
> + */ I think this might be better as a separate cleanup patch, but given its small size, I have no strong opinion. > mtc->gfp_mask &= ~__GFP_THISNODE; > mtc->nmask = allowed_mask; > > @@ -973,7 +977,7 @@ static unsigned int demote_folio_list(struct list_head > *demote_folios, > node_get_allowed_targets(pgdat, &allowed_mask); > > /* Demotion ignores all cpuset and mempolicy settings */ > - migrate_pages(demote_folios, alloc_demote_folio, NULL, > + migrate_pages(demote_folios, alloc_migrate_folio, NULL, > (unsigned long)&mtc, MIGRATE_ASYNC, MR_DEMOTION, > _succeeded); > > -- > 2.34.1 Thanks, SJ
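The fallback flow that the new comment describes, try the target node first and only then retry from the allowed nodemask, follows this general shape. This is a userspace sketch with mocked allocation, not the kernel's `alloc_migrate_folio()`:

```c
#include <stdbool.h>

#define MAX_NODES 4

/* Mocked per-node availability; stands in for real page allocation. */
static bool node_has_memory[MAX_NODES];

static bool try_alloc_on_node(int nid)
{
	return node_has_memory[nid];
}

/*
 * Two-step policy mirroring the commented code path: a strict target-node
 * attempt (__GFP_THISNODE in the kernel), then a relaxed retry over the
 * allowed fallback nodes, as node_get_allowed_targets() would provide.
 */
static int alloc_migrate_target(int target_nid, const bool *allowed_mask)
{
	if (try_alloc_on_node(target_nid))
		return target_nid;
	for (int nid = 0; nid < MAX_NODES; nid++)
		if (allowed_mask[nid] && try_alloc_on_node(nid))
			return nid;
	return -1; /* allocation failed everywhere */
}
```

The key property, and the reason the comment points at `fallback_alloc()` in mm/slab.c, is that failure on the preferred node degrades gracefully instead of failing the migration outright.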
Re: [RFC PATCH v3 1/7] mm/damon/paddr: refactor DAMOS_PAGEOUT with migration_mode
On Fri, 5 Apr 2024 15:08:50 +0900 Honggyu Kim wrote: > This is a preparation patch that introduces migration modes. > > The damon_pa_pageout is renamed to damon_pa_migrate and it receives an > extra argument for migration_mode. I personally think keeping damon_pa_pageout() as is and adding a new function (damon_pa_migrate()) with some duplicated code is also ok, but this approach also looks fine to me. So I have no strong opinion here, but just letting you know I would have no objection to either approach. > > No functional changes applied. > > Signed-off-by: Honggyu Kim > --- > mm/damon/paddr.c | 18 +++--- > 1 file changed, 15 insertions(+), 3 deletions(-) > > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c > index 081e2a325778..277a1c4d833c 100644 > --- a/mm/damon/paddr.c > +++ b/mm/damon/paddr.c > @@ -224,7 +224,12 @@ static bool damos_pa_filter_out(struct damos *scheme, > struct folio *folio) > return false; > } > > -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos > *s) > +enum migration_mode { > + MIG_PAGEOUT, > +}; To avoid name conflicts, what about renaming to 'damos_migration_mode' and 'DAMOS_MIG_PAGEOUT'? > + > +static unsigned long damon_pa_migrate(struct damon_region *r, struct damos > *s, > + enum migration_mode mm) My poor brain got a bit confused by the name. What about calling it 'mode'? > { > unsigned long addr, applied; > LIST_HEAD(folio_list); > @@ -249,7 +254,14 @@ static unsigned long damon_pa_pageout(struct > damon_region *r, struct damos *s) > put_folio: > folio_put(folio); > } > - applied = reclaim_pages(&folio_list); > + switch (mm) { > + case MIG_PAGEOUT: > + applied = reclaim_pages(&folio_list); > + break; > + default: > + /* Unexpected migration mode.
*/ > + return 0; > + } > cond_resched(); > return applied * PAGE_SIZE; > } > @@ -297,7 +309,7 @@ static unsigned long damon_pa_apply_scheme(struct > damon_ctx *ctx, > { > switch (scheme->action) { > case DAMOS_PAGEOUT: > - return damon_pa_pageout(r, scheme); > + return damon_pa_migrate(r, scheme, MIG_PAGEOUT); > case DAMOS_LRU_PRIO: > return damon_pa_mark_accessed(r, scheme); > case DAMOS_LRU_DEPRIO: > -- > 2.34.1 Thanks, SJ
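The review's naming suggestions (a `damos_` prefix on the enum, `DAMOS_MIG_` on its values, and `mode` instead of `mm` for the parameter) would give the refactored function roughly this shape. This is a simplified, userspace sketch of the suggested naming, not the kernel code; the `4096` stands in for `PAGE_SIZE` and the body for `reclaim_pages()`:

```c
/* Reviewer-suggested names, prefixed to avoid collisions with mm-wide ones. */
enum damos_migration_mode {
	DAMOS_MIG_PAGEOUT,
};

/*
 * 'mode' instead of 'mm' avoids confusion with the ubiquitous
 * 'struct mm_struct *mm' convention in mm/ code.
 */
static unsigned long damon_pa_migrate(unsigned long nr_pages,
				      enum damos_migration_mode mode)
{
	unsigned long applied;

	switch (mode) {
	case DAMOS_MIG_PAGEOUT:
		applied = nr_pages; /* reclaim_pages(&folio_list) in the kernel */
		break;
	default:
		return 0; /* unexpected migration mode */
	}
	return applied * 4096; /* bytes applied; PAGE_SIZE stand-in */
}
```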
Re: [PATCH v9 1/2] memory tier: dax/kmem: introduce an abstract layer for finding, allocating, and putting memory types
Hi Ho-Ren, On Fri, 29 Mar 2024 05:33:52 + "Ho-Ren (Jack) Chuang" wrote: > Since different memory devices require finding, allocating, and putting > memory types, these common steps are abstracted in this patch, > enhancing the scalability and conciseness of the code. > > Signed-off-by: Ho-Ren (Jack) Chuang > Reviewed-by: "Huang, Ying" > --- > drivers/dax/kmem.c | 20 ++-- > include/linux/memory-tiers.h | 13 + > mm/memory-tiers.c| 32 > 3 files changed, 47 insertions(+), 18 deletions(-) > [...] > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h > index 69e781900082..a44c03c2ba3a 100644 > --- a/include/linux/memory-tiers.h > +++ b/include/linux/memory-tiers.h > @@ -48,6 +48,9 @@ int mt_calc_adistance(int node, int *adist); > int mt_set_default_dram_perf(int nid, struct access_coordinate *perf, >const char *source); > int mt_perf_to_adistance(struct access_coordinate *perf, int *adist); > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, > + struct list_head > *memory_types); > +void mt_put_memory_types(struct list_head *memory_types); > #ifdef CONFIG_MIGRATION > int next_demotion_node(int node); > void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); > @@ -136,5 +139,15 @@ static inline int mt_perf_to_adistance(struct > access_coordinate *perf, int *adis > { > return -EIO; > } > + > +struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct > list_head *memory_types) > +{ > + return NULL; > +} > + > +void mt_put_memory_types(struct list_head *memory_types) > +{ > + > +} I found latest mm-unstable tree is failing kunit as below, and 'git bisect' says it happens from this patch. $ ./tools/testing/kunit/kunit.py run --build_dir ../kunit.out/ [11:56:40] Configuring KUnit Kernel ... [11:56:40] Building KUnit Kernel ... 
Populating config with: $ make ARCH=um O=../kunit.out/ olddefconfig Building with: $ make ARCH=um O=../kunit.out/ --jobs=36 ERROR:root:In file included from .../mm/memory.c:71: .../include/linux/memory-tiers.h:143:25: warning: no previous prototype for ‘mt_find_alloc_memory_type’ [-Wmissing-prototypes] 143 | struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types) | ^ .../include/linux/memory-tiers.h:148:6: warning: no previous prototype for ‘mt_put_memory_types’ [-Wmissing-prototypes] 148 | void mt_put_memory_types(struct list_head *memory_types) | ^~~ [...] Maybe we should set these as 'static inline', like below? I confirmed this fixes the kunit error. May I ask your opinion? diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h index a44c03c2ba3a..ee6e53144156 100644 --- a/include/linux/memory-tiers.h +++ b/include/linux/memory-tiers.h @@ -140,12 +140,12 @@ static inline int mt_perf_to_adistance(struct access_coordinate *perf, int *adis return -EIO; } -struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types) +static inline struct memory_dev_type *mt_find_alloc_memory_type(int adist, struct list_head *memory_types) { return NULL; } -void mt_put_memory_types(struct list_head *memory_types) +static inline void mt_put_memory_types(struct list_head *memory_types) { } Thanks, SJ
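The reason `static inline` fixes the warning: a plain function definition in a header becomes an external definition in every translation unit that includes it, which trips `-Wmissing-prototypes` and risks duplicate symbols at link time, while `static inline` gives each translation unit its own private copy. A minimal userspace illustration of the header-stub pattern (the names match the patch, the types are opaque here):

```c
#include <stddef.h>

struct memory_dev_type;
struct list_head;

/*
 * Header-style stubs for a disabled config option: 'static inline' keeps
 * each definition local to the including translation unit, so no external
 * prototype is needed and no duplicate symbols can arise.
 */
static inline struct memory_dev_type *
mt_find_alloc_memory_type(int adist, struct list_head *memory_types)
{
	(void)adist;
	(void)memory_types;
	return NULL; /* feature compiled out */
}

static inline void mt_put_memory_types(struct list_head *memory_types)
{
	(void)memory_types;
}
```

An unused `static inline` is also silently discarded by the compiler, which is why this is the conventional form for CONFIG-off stubs in kernel headers.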
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 25 Mar 2024 15:53:03 -0700 SeongJae Park wrote: > On Mon, 25 Mar 2024 21:01:04 +0900 Honggyu Kim wrote: [...] > > On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park wrote: > > > On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim wrote: [...] > > > > I would like to hear how you think about this. > > So, to summarize my humble opinion, > > 1. I like the idea of having two actions. But I'd like to use names other > than >'promote' and 'demote'. > 2. I still prefer having a filter for the page granularity access re-check. > [...] > > I will join the DAMON Beer/Coffee/Tea Chat tomorrow as scheduled so I > > can talk more about this issue. > > Looking forward to chatting with you :) We met and discussed this topic in the chat series yesterday. Sharing the summary here to keep the discussion open. Honggyu thankfully accepted my humble suggestions on the last reply. Honggyu will post the third version of this patchset soon. The patchset will implement two new DAMOS actions, namely MIGRATE_HOT and MIGRATE_COLD. Those will migrate the DAMOS target regions to a user-specified NUMA node, but will have different prioritization score functions. As the names imply, they will prioritize hotter regions and colder regions, respectively. Honggyu, please feel free to fix if there is anything wrong or missed. And thanks to Honggyu again for patiently keeping this productive discussion and their awesome work. Thanks, SJ [...]
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 25 Mar 2024 21:01:04 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Fri, 22 Mar 2024 09:32:23 -0700 SeongJae Park wrote: > > On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim wrote: [...] > > > > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we > > > > discussed about > > > > this patchset in high level. Sharing the summary here for open > > > > discussion. As > > > > also discussed on the first version of this patchset[2], we want to > > > > make single > > > > action for general page migration with minimum changes, but would like > > > > to keep > > > > page level access re-check. We also agreed the previously proposed > > > > DAMOS > > > > filter-based approach could make sense for the purpose. > > > > > > Thanks very much for the summary. I have been trying to merge promote > > > and demote actions into a single migrate action, but I found an issue > > > regarding damon_pa_scheme_score. It currently calls damon_cold_score() > > > for demote action and damon_hot_score() for promote action, but what > > > should we call when we use a single migrate action? > > > > Good point! This is what I didn't think about when suggesting that. Thank > > you > > for letting me know this gap! I think there could be two approach, off the > > top > > of my head. > > > > The first one would be extending the interface so that the user can select > > the > > score function. This would let flexible usage, but I'm bit concerned if > > this > > could make things unnecessarily complex, and would really useful in many > > general use case. > > I also think this looks complicated and may not be useful for general > users. > > > The second approach would be letting DAMON infer the intention. In this > > case, > > I think we could know the intention is the demotion if the scheme has a youg > > pages exclusion filter. Then, we could use the cold_score(). And vice > > versa. > > To cover a case that there is no filter at all, I think we could have one > > assumption. 
My humble intuition says the new action (migrate) may be used > > more > > for promotion use case. So, in damon_pa_scheme_score(), if the action of > > the > > given scheme is the new one (say, MIGRATE), the function will further check > > if > > the scheme has a filter for excluding young pages. If so, the function will > > use cold_score(). Otherwise, the function will use hot_score(). > > Thanks for suggesting many ideas but I'm afraid that I feel this doesn't > look good. Thinking it again, I think we can think about keep using > DAMOS_PROMOTE and DAMOS_DEMOTE, In other words, keep having a dedicated DAMOS action per intuitive prioritization score function, or couple the prioritization with each action, right? I think this makes sense, and fits well with the documentation. The prioritization mechanism should be different for each action. For example, rarely accessed (colder) memory regions would be prioritized for page-out scheme action. In contrast, the colder regions would be deprioritized for huge page collapse scheme action. Hence, the prioritization mechanisms for each action are implemented in each DAMON operations set, together with the actions. In other words, each DAMOS action should let users intuitively understand what types of regions will be prioritized. We already have such pairs of DAMOS actions such as DAMOS_[NO]HUGEPAGE and DAMOS_LRU_[DE]PRIO. So adding a pair of actions for this case sounds reasonable to me. And I think this is better and simpler than having the inference-based behavior. That said, I'm concerned that 'PROMOTE' and 'DEMOTE' may still sound a bit ambiguous to people who don't know 'demote_folio_list()' and its friends. Meanwhile, the names might sound too detailed about what they do to people who know the functions, making them a bit inflexible. They might also get confused since we don't have 'promote_folio_list()'.
To my humble understanding, what you really want to do is migrating pages to specific address range (or node) prioritizing the pages based on the hotness. What about, say, MIGRATE_{HOT,COLD}? > but I can make them directly call > damon_folio_young() for access check instead of using young filter. > > And we can internally handle the complicated combination such as demote > action sets "young" filter with "matching" true and promote action sets > "young" filter with "matching" false. IMHO, this will make the usage > simpler. I think whether to exclude young/non-young (maybe idle is better
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Fri, 22 Mar 2024 17:27:34 +0900 Honggyu Kim wrote: [...] > OK. It could be a matter of preference and the current filter is already > in the mainline so I won't insist more. Thank you for accepting my humble suggestion. [...] > > I'd prefer improving the documents or > > user-space tool and keep the kernel code simple. > > OK. I will see if there is a way to improve damo tool for this instead > of making changes on the kernel side. Looking forward! [...] > Yeah, I made this thread too much about filter naming discussion rather > than tiered memory support. No problem at all. Thank you for keeping this productive discussion. [...] > Thanks again for your feedback. That's my pleasure :) Thanks, SJ [...]
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Fri, 22 Mar 2024 18:02:23 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park wrote: > > On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim wrote: > > > > > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > > > posted at [1]. > > > > > > It says there is no implementation of the demote/promote DAMOS action > > > are made. This RFC is about its implementation for physical address > > > space. > > > > > > [...] > > Thank you for running the tests again with the new version of the patches > > and > > sharing the results! > > It's a bit late answer, but the result was from the previous evaluation. > I ran it again with RFC v2, but didn't see much difference so just > pasted the same result here. No problem, thank you for clarifying :) [...] > > > Honggyu Kim (3): > > > mm/damon: refactor DAMOS_PAGEOUT with migration_mode > > > mm: make alloc_demote_folio externally invokable for migration > > > mm/damon: introduce DAMOS_DEMOTE action for demotion > > > > > > Hyeongtak Ji (4): > > > mm/memory-tiers: add next_promotion_node to find promotion target > > > mm/damon: introduce DAMOS_PROMOTE action for promotion > > > mm/damon/sysfs-schemes: add target_nid on sysfs-schemes > > > mm/damon/sysfs-schemes: apply target_nid for promote and demote > > > actions > > > > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed > > about > > this patchset in high level. Sharing the summary here for open discussion. > > As > > also discussed on the first version of this patchset[2], we want to make > > single > > action for general page migration with minimum changes, but would like to > > keep > > page level access re-check. We also agreed the previously proposed DAMOS > > filter-based approach could make sense for the purpose. > > Thanks very much for the summary. I have been trying to merge promote > and demote actions into a single migrate action, but I found an issue > regarding damon_pa_scheme_score. 
It currently calls damon_cold_score() > for demote action and damon_hot_score() for promote action, but what > should we call when we use a single migrate action? Good point! This is what I didn't think about when suggesting that. Thank you for letting me know this gap! I think there could be two approaches, off the top of my head. The first one would be extending the interface so that the user can select the score function. This would allow flexible usage, but I'm a bit concerned that this could make things unnecessarily complex, and whether it would really be useful in many general use cases. The second approach would be letting DAMON infer the intention. In this case, I think we could know the intention is the demotion if the scheme has a young-pages exclusion filter. Then, we could use the cold_score(). And vice versa. To cover the case where there is no filter at all, I think we could have one assumption. My humble intuition says the new action (migrate) may be used more for the promotion use case. So, in damon_pa_scheme_score(), if the action of the given scheme is the new one (say, MIGRATE), the function will further check if the scheme has a filter for excluding young pages. If so, the function will use cold_score(). Otherwise, the function will use hot_score(). So I'd rather prefer the second approach. I think it would not be too late to consider the first approach once it turns out that more actions have such ambiguity and need a more general interface for explicitly setting the score function. Thanks, SJ [...]
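SJ's second approach, inferring the score function from the scheme's filters, would look roughly like this. It is a hypothetical, simplified sketch: `hot_score`/`cold_score` only mimic the shape of DAMON's damon_hot_score()/damon_cold_score() (higher score means higher priority), without reproducing their actual weighting:

```c
#include <stdbool.h>

/* Simplified region access pattern, as DAMON tracks per region. */
struct region {
	unsigned int nr_accesses; /* access frequency */
	unsigned int age;         /* how long the pattern persisted */
};

#define MAX_SCORE 100

/* Hotter (frequently accessed, long-lived) regions get higher priority. */
static int hot_score(const struct region *r)
{
	int s = (int)(r->nr_accesses + r->age);
	return s > MAX_SCORE ? MAX_SCORE : s;
}

/* Colder regions get higher priority: invert the hotness. */
static int cold_score(const struct region *r)
{
	return MAX_SCORE - hot_score(r);
}

/*
 * The inference rule from the mail: a MIGRATE scheme that excludes young
 * pages is presumed to demote (cold_score); otherwise it is presumed to
 * promote (hot_score).
 */
static int migrate_scheme_score(const struct region *r,
				bool excludes_young_pages)
{
	return excludes_young_pages ? cold_score(r) : hot_score(r);
}
```

The later MIGRATE_HOT/MIGRATE_COLD decision removes the `excludes_young_pages` inference entirely: each action simply hard-wires one of the two score functions.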
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
Hi Honggyu, On Wed, 20 Mar 2024 16:07:48 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Mon, 18 Mar 2024 12:07:21 -0700 SeongJae Park wrote: > > On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim wrote: > > > > > Hi SeongJae, > > > > > > On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park wrote: > > > > Hi Honggyu, > > > > > > > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim > > > > wrote: > > > > > > > > > Hi SeongJae, > > > > > > > > > > Thanks for the confirmation. I have a few comments on young filter so > > > > > please read the inline comments again. > > > > > > > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park > > > > > wrote: > > > > > > Hi Honggyu, > > > > > > > > > > > > > > -Original Message- > > > > > > > > From: SeongJae Park > > > > > > > > Sent: Tuesday, March 12, 2024 3:33 AM > > > > > > > > To: Honggyu Kim > > > > > > > > Cc: SeongJae Park ; kernel_team > > > > > > > > > > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory > > > > > > > > management for CXL memory > > > > > > > > > > > > > > > > Hi Honggyu, > > > > > > > > > > > > > > > > On Mon, 11 Mar 2024 12:51:12 + "honggyu@sk.com" > > > > > > > > wrote: > > > > > > > > > > > > > > > > > Hi SeongJae, > > > > > > > > > > > > > > > > > > I've tested it again and found that "young" filter has to be > > > > > > > > > set > > > > > > > > > differently as follows. > > > > > > > > > - demote action: set "young" filter with "matching" true > > > > > > > > > - promote action: set "young" filter with "matching" false > > > > > > Thinking it again, I feel like "matching" true or false looks quite > > > vague to me as a general user. > > > > > > Instead, I would like to have more meaningful names for "matching" as > > > follows. > > > > > > - matching "true" can be either (filter) "out" or "skip". > > > - matching "false" can be either (filter) "in" or "apply". > > > > I agree the naming could be done much better. And thank you for the nice > > suggestions. 
I have a few concerns, though. > > I don't think my suggestion is best. I just would like to have more > discussion about it. I also understand my naming sense is far from good :) I'm grateful to have this constructive discussion! > > > Firstly, increasing the number of behavioral concepts. DAMOS filter feature > > has only single behavior: excluding some types of memory from DAMOS action > > target. The "matching" is to provide a flexible way for further specifying > > the > > target to exclude in a bit detail. Without it, we would need non-variant > > for > > each filter type. Compared to the current terms, the new terms feel like > > implying there are two types of behaviors. I think one behavior is easier > > to > > understand than two behaviors, and better match what internal code is doing. > > > > Secondly, ambiguity in "in" and "apply". To me, the terms sound like > > _adding_ > > something more than _excluding_. > > I understood that young filter "matching" "false" means apply action > only to young pages. Do I misunderstood something here? If not, Technically speaking, having a DAMOS filter with 'matching' parameter as 'false' for 'young pages' type means you want DAMOS to "exclude pages that are not young from the scheme's action target". That's the only thing it truly does, and what it tries to guarantee. Whether the action will be applied to young pages or not depends on more factors including additional filters and DAMOS parameters. IOW, that's not what the simple setting promises. Of course, I know you are assuming there is only the single filter. Hence, effectively you're correct. And the sentence may be a better wording for end users. However, it took me a bit of time to understand your assumption and conclude whether
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 18 Mar 2024 22:27:45 +0900 Honggyu Kim wrote: > Hi SeongJae, > > On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park wrote: > > Hi Honggyu, > > > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim wrote: > > > > > Hi SeongJae, > > > > > > Thanks for the confirmation. I have a few comments on young filter so > > > please read the inline comments again. > > > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park wrote: > > > > Hi Honggyu, > > > > > > > > > > -Original Message- > > > > > > From: SeongJae Park > > > > > > Sent: Tuesday, March 12, 2024 3:33 AM > > > > > > To: Honggyu Kim > > > > > > Cc: SeongJae Park ; kernel_team > > > > > > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory > > > > > > management for CXL memory > > > > > > > > > > > > Hi Honggyu, > > > > > > > > > > > > On Mon, 11 Mar 2024 12:51:12 + "honggyu@sk.com" > > > > > > wrote: > > > > > > > > > > > > > Hi SeongJae, > > > > > > > > > > > > > > I've tested it again and found that "young" filter has to be set > > > > > > > differently as follows. > > > > > > > - demote action: set "young" filter with "matching" true > > > > > > > - promote action: set "young" filter with "matching" false > > Thinking it again, I feel like "matching" true or false looks quite > vague to me as a general user. > > Instead, I would like to have more meaningful names for "matching" as > follows. > > - matching "true" can be either (filter) "out" or "skip". > - matching "false" can be either (filter) "in" or "apply". I agree the naming could be done much better. And thank you for the nice suggestions. I have a few concerns, though. Firstly, increasing the number of behavioral concepts. DAMOS filter feature has only single behavior: excluding some types of memory from DAMOS action target. The "matching" is to provide a flexible way for further specifying the target to exclude in a bit detail. Without it, we would need non-variant for each filter type. 
Compared to the current terms, the new terms feel like they imply there are two types of behaviors. I think one behavior is easier to understand than two behaviors, and better matches what the internal code is doing. Secondly, ambiguity in "in" and "apply". To me, the terms sound like _adding_ something more than _excluding_. I think that might confuse people in some cases. Actually, I have used the terms "filter-out" and "filter-in" on this and several other threads. When talking about the "filter-in" scenario, I had to emphasize the fact that it is not adding something but excluding others. I now think that was not a good approach. Finally, "apply" sounds a bit deterministic. I think it could be a bit confusing in some cases such as when using multiple filters in a combined way. For example, if we have two filters for 1) "apply" a memcg and 2) skip anon pages, the given DAMOS action will not be applied to anon pages of the memcg. I think this might be a bit confusing. > > Internally, the type of "matching" can be boolean, but it'd be better > for general users have another ways to set it such as "out"/"in" or > "skip"/"apply" via sysfs interface. I prefer "skip" and "apply" looks > more intuitive, but I don't have strong objection on "out" and "in" as > well. Unfortunately, DAMON sysfs interface is an ABI that we want to keep stable. Of course we could make some changes on it if really required. But I'm unsure if the problem of the current naming and the benefit of the suggested change are big enough to outweigh the stability risk and additional efforts. Also, DAMON sysfs interface is arguably not for _very_ general users. DAMON user-space tool is the one for _more_ general users. To quote the DAMON usage document, - *DAMON user space tool.* `This <https://github.com/awslabs/damo>`_ is for privileged people such as system administrators who want a just-working human-friendly interface. [...]
- *sysfs interface.* :ref:`This ` is for privileged user space programmers who want more optimized use of DAMON. [...] If the concept is that confused, I think we could improve the docum
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Sun, 17 Mar 2024 08:31:44 -0700 SeongJae Park wrote: > Hi Honggyu, > > On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim wrote: > > > Hi SeongJae, > > > > Thanks for the confirmation. I have a few comments on young filter so > > please read the inline comments again. > > > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park wrote: > > > Hi Honggyu, [...] > > Thanks. I see that it works fine, but I would like to have more > > discussion about "young" filter. What I think about filter is that if I > > apply "young" filter "true" for demotion, then the action applies only > > for "young" pages, but the current implementation works opposite. > > > > I understand the function name of internal implementation is > > "damos_pa_filter_out" so the basic action is filtering out, but the > > cgroup filter works in the opposite way for now. > > Does the memcg filter work in the opposite way? I don't think so because > __damos_pa_filter_out() sets 'matches' as 'true' only if the given folio > is > contained in the given memcg. 'young' filter also simply sets 'matches' as > 'true' only if the given folio is young. > > If it works in the opposite way, it's a bug that needs to be fixed. Please let > me know if I'm missing something. I just read the DAMOS filters part of the documentation for DAMON sysfs interface again. I believe it is explaining the meaning of 'matching' as I intended to, as below: You can write ``Y`` or ``N`` to ``matching`` file to filter out pages that does or does not match to the type, respectively. Then, the scheme's action will not be applied to the pages that specified to be filtered out.
But, I found the following example for memcg filter is wrong, as below: For example, below restricts a DAMOS action to be applied to only non-anonymous pages of all memory cgroups except ``/having_care_already``.:: # echo 2 > nr_filters # # filter out anonymous pages echo anon > 0/type echo Y > 0/matching # # further filter out all cgroups except one at '/having_care_already' echo memcg > 1/type echo /having_care_already > 1/memcg_path echo N > 1/matching Specifically, the last line of the commands should write 'Y' instead of 'N' to do what is explained. Without the fix, the action will be applied to only non-anonymous pages of 'having_care_already' memcg. This is definitely wrong. I will fix this soon. I'm unsure if this is what made you believe the memcg DAMOS filter is working in the opposite way, though. Thanks, SJ [...]
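The 'matching' semantics being debated across these mails reduce to a single rule: a page is excluded from the DAMOS action exactly when whether it matches the filter's type equals the filter's 'matching' setting. A tiny sketch of that rule (simplified; the real check lives in damos_pa_filter_out() and its helpers):

```c
#include <stdbool.h>

/*
 * A page is filtered out (excluded from the DAMOS action) iff whether it
 * matches the filter's type equals the filter's 'matching' setting.
 */
static bool damos_filter_out(bool page_matches_type, bool filter_matching)
{
	return page_matches_type == filter_matching;
}
```

This is why the memcg example above needs 'Y': with matching=Y, pages inside the named memcg match and are excluded, leaving all other memcgs as the action target; with matching=N, every page outside the memcg is excluded instead, inverting the intent.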
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
Hi Honggyu, On Sun, 17 Mar 2024 17:36:29 +0900 Honggyu Kim wrote: > Hi SeongJae, > > Thanks for the confirmation. I have a few comments on young filter so > please read the inline comments again. > > On Wed, 12 Mar 2024 08:53:00 -0800 SeongJae Park wrote: > > Hi Honggyu, > > > > > > -Original Message- > > > > From: SeongJae Park > > > > Sent: Tuesday, March 12, 2024 3:33 AM > > > > To: Honggyu Kim > > > > Cc: SeongJae Park ; kernel_team > > > > > > > > Subject: RE: Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory > > > > management for CXL memory > > > > > > > > Hi Honggyu, > > > > > > > > On Mon, 11 Mar 2024 12:51:12 + "honggyu@sk.com" > > > > wrote: > > > > > > > > > Hi SeongJae, > > > > > > > > > > I've tested it again and found that "young" filter has to be set > > > > > differently as follows. > > > > > - demote action: set "young" filter with "matching" true > > > > > - promote action: set "young" filter with "matching" false > > > > > > > > DAMOS filter is basically for filtering "out" memory regions that > > > > matches to > > > > the condition. Hence in your setup, young pages are not filtered out > > > > from > > > > demote action target. > > > > > > I thought young filter true means "young pages ARE filtered out" for > > > demotion. > > > > You're correct. > > Ack. > > > > > > > > That is, you're demoting pages that "not" young. > > > > > > Your explanation here looks opposite to the previous statement. > > > > Again, you're correct. My intention was "non-young pages are not ..." but > > maybe I was out of my mind and mistakenly removed "non-" part. Sorry for > > the > > confusion. > > No problem. I also think it's quite confusing. > > > > > > > > And vice versa, so you're applying promote to non-non-young (young) > > > > pages. > > > > > > Yes, I understand "promote non-non-young pages" means "promote young > > > pages". 
> > > This might be understood as "young pages are NOT filtered out" for > > > promotion > > > but it doesn't mean that "old pages are filtered out" instead. > > > And we just rely hot detection only on DAMOS logics such as access > > > frequency > > > and age. Am I correct? > > > > You're correct. > > Ack. But if it doesn't mean that "old pages are filtered out" instead, It does mean that. Here, filtering is exclusive. Hence, "filter-in a type of pages" means "filter-out pages of other types". At least that's the intention. To quote the documentation (https://docs.kernel.org/mm/damon/design.html#filters),

    Each filter specifies the type of target memory, and whether it should
    exclude the memory of the type (filter-out), or all except the memory of
    the type (filter-in).

> then do we really need this filter for promotion? If not, maybe should > we create another "old" filter for promotion? As of now, the promotion > is mostly done inaccurately, but the accurate migration is done at > demotion level. Is this based on your theory? Or, a real behavior that you're seeing from your setup? If this is a real behavior, I think that should be a bug that needs to be fixed. > To avoid this issue, I feel we should promote only "young" pages after > filtering "old" pages out. > > > > > > > > I understand this is somewhat complex, but what we have for now. > > > > > > Thanks for the explanation. I guess you mean my filter setup is correct. > > > Is it correct? > > > > Again, you're correct. Your filter setup is what I expected to :) > > Thanks. I see that it works fine, but I would like to have more > discussion about "young" filter. What I think about filter is that if I > apply "young" filter "true" for demotion, then the action applies only > for "young" pages, but the current implementation works opposite. 
> > I understand the function name of internal implementation is > "damos_pa_filter_out" so the basic action is filtering out, but the > cgroup filter works in the opposite way for now.

Does memcg filter work in the opposite way? I don't think so because __damos_pa_filter_out() sets 'matches' as 'true' only if the given folio is contained in the given memcg. 'young' filter also simply sets 'matches' as 'true' only if the given folio is young.

If it works in the opposite way, it's a bug that needs to be fixed. Please let me know if I'm missing something.
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
Hello, On Tue, 27 Feb 2024 15:51:20 -0800 SeongJae Park wrote: > On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim wrote: > > > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > > posted at [1]. > > > > It says there is no implementation of the demote/promote DAMOS action > > are made. This RFC is about its implementation for physical address > > space. [...] > Honggyu joined DAMON Beer/Coffee/Tea Chat[1] yesterday, and we discussed > this patchset at a high level. Sharing the summary here for open discussion. > As > also discussed on the first version of this patchset[2], we want to make > single > action for general page migration with minimum changes, but would like to keep > page level access re-check. We also agreed the previously proposed DAMOS > filter-based approach could make sense for the purpose. > > Because I was anyway planning making such DAMOS filter for not only > promotion/demotion but other types of DAMOS action, I will start developing > the > page level access re-check results based DAMOS filter. Once the > implementation > of the prototype is done, I will share the early implementation. Then, > Honggyu > will adjust their implementation based on the filter, and run their tests > again > and share the results.

I just posted an RFC patchset for the page level access re-check DAMOS filter: https://lore.kernel.org/r/20240307030013.47041-1...@kernel.org

I hope it helps you better understand and test the idea.

> > [1] https://lore.kernel.org/damon/20220810225102.124459-1...@kernel.org/ > [2] https://lore.kernel.org/damon/20240118171756.80356-1...@kernel.org

Thanks,
SJ

[...]
Re: [RFC PATCH v2 0/7] DAMON based 2-tier memory management for CXL memory
On Mon, 26 Feb 2024 23:05:46 +0900 Honggyu Kim wrote: > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > posted at [1]. > > It says there is no implementation of the demote/promote DAMOS action > are made. This RFC is about its implementation for physical address > space. > > > Introduction > > > With the advent of CXL/PCIe attached DRAM, which will be called simply > as CXL memory in this cover letter, some systems are becoming more > heterogeneous having memory systems with different latency and bandwidth > characteristics. They are usually handled as different NUMA nodes in > separate memory tiers and CXL memory is used as slow tiers because of > its protocol overhead compared to local DRAM. > > In this kind of systems, we need to be careful placing memory pages on > proper NUMA nodes based on the memory access frequency. Otherwise, some > frequently accessed pages might reside on slow tiers and it makes > performance degradation unexpectedly. Moreover, the memory access > patterns can be changed at runtime. > > To handle this problem, we need a way to monitor the memory access > patterns and migrate pages based on their access temperature. The > DAMON(Data Access MONitor) framework and its DAMOS(DAMON-based Operation > Schemes) can be useful features for monitoring and migrating pages. > DAMOS provides multiple actions based on DAMON monitoring results and it > can be used for proactive reclaim, which means swapping cold pages out > with DAMOS_PAGEOUT action, but it doesn't support migration actions such > as demotion and promotion between tiered memory nodes. > > This series supports two new DAMOS actions; DAMOS_DEMOTE for demotion > from fast tiers and DAMOS_PROMOTE for promotion from slow tiers. This > prevents hot pages from being stuck on slow tiers, which makes > performance degradation and cold pages can be proactively demoted to > slow tiers so that the system can increase the chance to allocate more > hot pages to fast tiers. 
> > The DAMON provides various tuning knobs but we found that the proactive > demotion for cold pages is especially useful when the system is running > out of memory on its fast tier nodes. > > Our evaluation result shows that it reduces the performance slowdown > compared to the default memory policy from 15~17% to 4~5% when the > system runs under high memory pressure on its fast tier DRAM nodes. > > > DAMON configuration > === > > The specific DAMON configuration doesn't have to be in the scope of this > patch series, but some rough idea is better to be shared to explain the > evaluation result. > > The DAMON provides many knobs for fine tuning but its configuration file > is generated by HMSDK[2]. It includes gen_config.py script that > generates a json file with the full config of DAMON knobs and it creates > multiple kdamonds for each NUMA node when the DAMON is enabled so that > it can run hot/cold based migration for tiered memory. I was feeling a bit confused from here since DAMON doesn't receive parameters via a file. To my understanding, the 'configuration file' means the input file for DAMON user-space tool, damo, not DAMON. Just a trivial thing, but making it clear if possible could help readers in my opinion. > > > Evaluation Workload > === > > The performance evaluation is done with redis[3], which is a widely used > in-memory database and the memory access patterns are generated via > YCSB[4]. We have measured two different workloads with zipfian and > latest distributions but their configs are slightly modified to make > memory usage higher and execution time longer for better evaluation. > > The idea of evaluation using these demote and promote actions covers > system-wide memory management rather than partitioning hot/cold pages of > a single workload. The default memory allocation policy creates pages > to the fast tier DRAM node first, then allocates newly created pages to > the slow tier CXL node when the DRAM node has insufficient free space. 
> Once the page allocation is done then those pages never move between > NUMA nodes. It's not true when using numa balancing, but it is not the > scope of this DAMON based 2-tier memory management support. > > If the working set of redis can be fit fully into the DRAM node, then > the redis will access the fast DRAM only. Since the performance of DRAM > only is faster than partially accessing CXL memory in slow tiers, this > environment is not useful to evaluate this patch series. > > To make pages of redis be distributed across fast DRAM node and slow > CXL node to evaluate our demote and promote actions, we pre-allocate > some cold memory externally using mmap and memset before launching > redis-server. We assumed that there are enough amount of cold memory in > datacenters as TMO[5] and TPP[6] papers mentioned. > > The evaluation sequence is as follows. > > 1. Turn on DAMON with DAMOS_DEMOTE action
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
On Thu, 18 Jan 2024 19:40:16 +0900 Hyeongtak Ji wrote: > Hi SeongJae, > > On Wed, 17 Jan 2024 SeongJae Park wrote: > > [...] > >> Let's say there are 3 nodes in the system and the first node0 and node1 > >> are the first tier, and node2 is the second tier.
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist
> >> 0-1
> >>
> >> $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist
> >> 2
> >>
> >> Here is the result of partitioning hot/cold memory and I put execution
> >> command at the right side of numastat result. I initially ran each
> >> hot_cold program with preferred setting so that they initially allocate
> >> memory on one of node0 or node2, but they gradually migrated based on
> >> their access frequencies.
> >>
> >> $ numastat -c -p hot_cold
> >> Per-node process memory usage (in MBs)
> >> PID               Node 0 Node 1 Node 2 Total
> >> ----------------  ------ ------ ------ -----
> >> 754 (hot_cold)      1800      0   2000  3800  <- hot_cold 1800 2000
> >> 1184 (hot_cold)      300      0    500   800  <- hot_cold 300 500
> >> 1818 (hot_cold)      801      0   3199  4000  <- hot_cold 800 3200
> >> 30289 (hot_cold)       4      0      5    10  <- hot_cold 3 5
> >> 30325 (hot_cold)      31      0     51    81  <- hot_cold 30 50
> >> ----------------  ------ ------ ------ -----
> >> Total               2938      0   5756  8695
> >>
> >> The final node placement result shows that DAMON accurately migrated
> >> pages by their hotness for multiple processes.
> >
> > What was the result when the corner cases handling logics were not applied?
>
> This is the result of the same test that Honggyu did, but with insufficient
> corner case handling logic.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID              Node 0 Node 1 Node 2 Total
> ---------------  ------ ------ ------ -----
> 862 (hot_cold)     2256      0   1545  3801  <- hot_cold 1800 2000
> 863 (hot_cold)      403      0    398   801  <- hot_cold 300 500
> 864 (hot_cold)     1520      0   2482  4001  <- hot_cold 800 3200
> 865 (hot_cold)        6      0      3     9  <- hot_cold 3 5
> 866 (hot_cold)       29      0     52    81  <- hot_cold 30 50
> ---------------  ------ ------ ------ -----
> Total              4215      0   4480  8695
>
> As time goes by, DAMON keeps trying to split the hot/cold region, but it does
> not seem to be enough.
>
> $ numastat -c -p hot_cold
>
> Per-node process memory usage (in MBs)
> PID              Node 0 Node 1 Node 2 Total
> ---------------  ------ ------ ------ -----
> 862 (hot_cold)     2022      0   1780  3801  <- hot_cold 1800 2000
> 863 (hot_cold)      351      0    450   801  <- hot_cold 300 500
> 864 (hot_cold)     1134      0   2868  4001  <- hot_cold 800 3200
> 865 (hot_cold)        7      0      2     9  <- hot_cold 3 5
> 866 (hot_cold)       43      0     39    81  <- hot_cold 30 50
> ---------------  ------ ------ ------ -----
> Total              3557      0   5138  8695
>
> > And, what are the corner cases handling logic that seemed essential? I show
> > the page granularity active/reference check could indeed provide many
> > improvements, but that's only my humble assumption.
>
> Yes, the page granularity active/reference check is essential. To make the
> above "insufficient" result, the only thing I did was to promote
> inactive/not_referenced pages.
>
> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index f03be320f9ad..c2aefb883c54 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -1127,9 +1127,7 @@ static unsigned int __promote_folio_list(struct list_head *folio_list,
> 	VM_BUG_ON_FOLIO(folio_test_active(folio), folio);
>
> 	references = folio_check_references(folio, sc);
> -	if (references == FOLIOREF_KEEP ||
> -	    references == FOLIOREF_RECLAIM ||
> -	    references == FOLIOREF_RECLAIM_CLEAN)
> +	if (references == FOLIOREF_KEEP)
> 		goto keep_locked;
>
> 	/* Relocate its contents to another node. */

Thank you for sharing the details :) I think the DAMOS filters based approach could be worth trying, then.
> > > > > If the corner cases are indeed better to be applied in page granularity, I > > agree we need some more efforts since DAMON monitoring r
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
On Wed, 17 Jan 2024 13:11:03 -0800 SeongJae Park wrote: [...] > Hi Honggyu, > > On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim wrote: > > > Hi SeongJae, > > > > Thanks very much for your comments in detail. > > > > On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park wrote: > > [...] > > > To this end, I feel the problem might be able to be simpler, because this > > > patchset is trying to provide two sophisticated operations, while I think > > > a > > > simpler approach might be possible. My humble simpler idea is adding a > > > DAMOS > > > operation for moving pages to a given node (like sys_move_phy_pages > > > RFC[1]), > > > instead of the promote/demote. Because the general pages migration can > > > handle > > > multiple cases including the promote/demote in my humble assumption. [...] > > > In more detail, users could decide which is the appropriate node for > > > promotion > > > or demotion and use the new DAMOS action to do promotion and demotion. > > > Users > > > would be requested to decide which node is the proper promotion/demotion > > > target > > > nodes, but that decision wouldn't be that hard in my opinion. > > > > > > For this, 'struct damos' would need to be updated for such > > > argument-dependent > > > actions, like 'struct damos_filter' is having a union. > > > > That might be a better solution. I will think about it. > > More specifically, I think receiving an address range as the argument might be > more flexible than just a NUMA node. Maybe we can imagine proactively migrating > cold movable pages from normal zones to movable zones, to avoid normal zone > memory pressure.

Yet another crazy idea: finding hot regions in the middle of cold regions and moving them to sit beside other hot pages. As a result, memory is sorted by access temperature even in the same node, and the system gains more spatial locality, which benefits general locality-based algorithms including DAMON's adaptive regions adjustment.

Thanks,
SJ

[...]
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
Hi Honggyu, On Wed, 17 Jan 2024 20:49:25 +0900 Honggyu Kim wrote: > Hi SeongJae, > > Thanks very much for your comments in detail. > > On Tue, 16 Jan 2024 12:31:59 -0800 SeongJae Park wrote: > > > Thank you so much for these great patches and the above nice test results. I > > believe the test setup and results make sense, and merging a revised > > version of > > this patchset would provide real benefits to the users. > > Glad to hear that! > > > At a high level, I think it might be better to separate DAMON internal changes > > from DAMON external changes. > > I agree. I can't guarantee that I can move all the external changes > inside mm/damon, but will try that as much as possible. > > > For DAMON part changes, I have no big concern other than trivial coding > > style > > level comments. > > Sure. I will fix those. > > > For DAMON-external changes that implement demote_pages() and > > promote_pages(), I'm unsure if the implementation is reusing appropriate > > functions, and if those are placed in the right source files. Especially, I'm > > unsure if vmscan.c is the right place for promotion code. Also I don't > > know if > > there is a good agreement on the promotion/demotion target node decision. > > That > > should be because I'm not that familiar with the areas and the files, but I > > feel this might be because our discussions on the promotion and the demotion > > operations still have room to mature. Because I'm not very > > familiar with the part, I'd like to hear others' comments, too. > > I would also like to hear others' comments, but this might not be needed > if most of the external code can be moved to mm/damon. > > > To this end, I feel the problem might be able to be simpler, because this > > patchset is trying to provide two sophisticated operations, while I think a > > simpler approach might be possible. 
My humble simpler idea is adding a > > DAMOS > > operation for moving pages to a given node (like sys_move_phy_pages RFC[1]), > > instead of the promote/demote. Because the general pages migration can > > handle > > multiple cases including the promote/demote in my humble assumption. > > My initial implementation was similar but I found that it's not accurate > enough due to the nature of inaccuracy of DAMON regions. I saw that > many pages were demoted and promoted back and forth because migration > target regions include both hot and cold pages together. > > So I have implemented the demotion and promotion logics based on the > shrink_folio_list, which contains many corner case handling logics for > reclaim. > > Having the current demotion and promotion logics makes the hot/cold > migration pretty accurate as expected. We made a simple program called > "hot_cold" and it receives 2 arguments for hot size and cold size in MB. > For example, "hot_cold 200 500" allocates 200MB of hot memory and 500MB > of cold memory. It basically allocates 2 large blocks of memory with > mmap, then repeat memset for the initial 200MB to make it accessed in an > infinite loop. > > Let's say there are 3 nodes in the system and the first node0 and node1 > are the first tier, and node2 is the second tier. > > $ cat /sys/devices/virtual/memory_tiering/memory_tier4/nodelist > 0-1 > > $ cat /sys/devices/virtual/memory_tiering/memory_tier22/nodelist > 2 > > Here is the result of partitioning hot/cold memory and I put execution > command at the right side of numastat result. I initially ran each > hot_cold program with preferred setting so that they initially allocate > memory on one of node0 or node2, but they gradually migrated based on > their access frequencies. 
>
> $ numastat -c -p hot_cold
> Per-node process memory usage (in MBs)
> PID               Node 0 Node 1 Node 2 Total
> ----------------  ------ ------ ------ -----
> 754 (hot_cold)      1800      0   2000  3800  <- hot_cold 1800 2000
> 1184 (hot_cold)      300      0    500   800  <- hot_cold 300 500
> 1818 (hot_cold)      801      0   3199  4000  <- hot_cold 800 3200
> 30289 (hot_cold)       4      0      5    10  <- hot_cold 3 5
> 30325 (hot_cold)      31      0     51    81  <- hot_cold 30 50
> ----------------  ------ ------ ------ -----
> Total               2938      0   5756  8695
>
> The final node placement result shows that DAMON accurately migrated
> pages by their hotness for multiple processes.

What was the result when the corner cases handling logics were not applied?

And, what are the corner cases handling logic that seemed essential? I show the page granularity active/reference check could indeed provide many improvements, but that's only my humble assumption.
Re: [RFC PATCH 2/4] mm/damon: introduce DAMOS_DEMOTE action for demotion
On Mon, 15 Jan 2024 13:52:50 +0900 Honggyu Kim wrote:

> This patch introduces DAMOS_DEMOTE action, which is similar to
> DAMOS_PAGEOUT, but demotes folios instead of swapping them out.
>
> Since there are some common routines with pageout, many functions have
> similar logics between pageout and demote.
>
> The execution sequence of DAMOS_PAGEOUT and DAMOS_DEMOTE look as follows.
>
> DAMOS_PAGEOUT action
> damo_pa_apply_scheme

Nit. s/damo/damon/

> -> damon_pa_reclaim
> -> reclaim_pages
> -> reclaim_folio_list
> -> shrink_folio_list
>
> DAMOS_DEMOTE action
> damo_pa_apply_scheme

Ditto.

> -> damon_pa_reclaim
> -> demote_pages
> -> do_demote_folio_list
> -> __demote_folio_list
> -> demote_folio_list

I think the implementation of 'demote_pages()' might better be separated. I'm also finding the naming a bit strange, since I usually think of the '__' prefix as being for functions that are used internally. That is, I'd assume __demote_folio_list() is called from demote_folio_list(), but this function works in the opposite way.

> __demote_folio_list() is a minimized version of shrink_folio_list(), but
> it's minified only for demotion.
>
> Signed-off-by: Honggyu Kim
> ---
>  include/linux/damon.h    |  2 +
>  mm/damon/paddr.c         | 17 +---
>  mm/damon/sysfs-schemes.c |  1 +
>  mm/internal.h            |  1 +
>  mm/vmscan.c              | 84
>  5 files changed, 99 insertions(+), 6 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index e00ddf1ed39c..4c0a0fef09c5 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -106,6 +106,7 @@ struct damon_target {
>  * @DAMOS_LRU_PRIO:	Prioritize the region on its LRU lists.
>  * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
>  * @DAMOS_STAT:		Do nothing but count the stat.
> + * @DAMOS_DEMOTE:	Do demotion for the current region.

I'd prefer defining DEMOTE before STAT, like we introduced LRU_PRIO/DEPRIO after STAT but defined them before it.
It would help keeping the two different groups of operations separated (STAT is different from other actions since it is not for making real changes but only for collecting statistics and querying monitoring results).

>  * @NR_DAMOS_ACTIONS:	Total number of DAMOS actions
>  *
>  * The support of each action is up to running damon_operations.
> @@ -123,6 +124,7 @@ enum damos_action {
> 	DAMOS_LRU_PRIO,
> 	DAMOS_LRU_DEPRIO,
> 	DAMOS_STAT,	/* Do nothing but only record the stat */
> +	DAMOS_DEMOTE,

Ditto.

> 	NR_DAMOS_ACTIONS,
> };
>
> diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c
> index 081e2a325778..d3e3f077cd00 100644
> --- a/mm/damon/paddr.c
> +++ b/mm/damon/paddr.c
> @@ -224,7 +224,7 @@ static bool damos_pa_filter_out(struct damos *scheme, struct folio *folio)
> 	return false;
> }
>
> -static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> +static unsigned long damon_pa_reclaim(struct damon_region *r, struct damos *s, bool is_demote)

I understand that reclamation could include both pageout and demotion, but I'm not sure if that is making its purpose clearer or more ambiguous. What about renaming to '..._demote_or_pageout()', like 'damon_pa_mark_accessed_or_deactivate()'? Also, 'is_demote' could be simply 'demote'. I think having a separate function, say, damon_pa_demote(), is also ok, if it makes the code easier to read and does not introduce too many duplicated lines of code. Also, I'd prefer keeping the 80 columns limit[1] by breaking this line.
[1] https://docs.kernel.org/process/coding-style.html?highlight=coding+style#breaking-long-lines-and-strings

> {
> 	unsigned long addr, applied;
> 	LIST_HEAD(folio_list);
> @@ -242,14 +242,17 @@ static unsigned long damon_pa_pageout(struct damon_region *r, struct damos *s)
> 		folio_test_clear_young(folio);
> 		if (!folio_isolate_lru(folio))
> 			goto put_folio;
> -		if (folio_test_unevictable(folio))
> +		if (folio_test_unevictable(folio) && !is_demote)
> 			folio_putback_lru(folio);
> 		else
> 			list_add(&folio->lru, &folio_list);
> put_folio:
> 		folio_put(folio);
> 	}
> -	applied = reclaim_pages(&folio_list);
> +	if (is_demote)
> +		applied = demote_pages(&folio_list);
> +	else
> +		applied = reclaim_pages(&folio_list);
> 	cond_resched();
> 	return applied * PAGE_SIZE;
> }
> @@ -297,13 +300,15 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx *ctx,
> {
> 	switch (scheme->action) {
> 	case DAMOS_PAGEOUT:
> -		return damon_pa_pageout(r, scheme);
> +		return damon_pa_reclaim(r, scheme, false);
> 	case DAMOS_LRU_PRIO:
> 		return damon_pa_mark_accessed(r, scheme);
> 	case DAMOS_LRU_DEPRIO:
Re: [RFC PATCH 1/4] mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios
On Mon, 15 Jan 2024 13:52:49 +0900 Honggyu Kim wrote: > Since we will introduce reclaim_pages like functions such as > demote_pages and promote_pages, most of the code can be shared. > > This is a preparation patch that introduces reclaim_or_migrate_folios() > to cover all the logics, but it provides a handler for the different > actions. > > No functional changes applied. > > Signed-off-by: Honggyu Kim > --- > mm/vmscan.c | 18 -- > 1 file changed, 12 insertions(+), 6 deletions(-)

> diff --git a/mm/vmscan.c b/mm/vmscan.c
> index bba207f41b14..7ca2396ccc3b 100644
> --- a/mm/vmscan.c
> +++ b/mm/vmscan.c
> @@ -2107,15 +2107,16 @@ static unsigned int reclaim_folio_list(struct list_head *folio_list,
> 	return nr_reclaimed;
> }
>
> -unsigned long reclaim_pages(struct list_head *folio_list)
> +static unsigned long reclaim_or_migrate_folios(struct list_head *folio_list,
> +	unsigned int (*handler)(struct list_head *, struct pglist_data *))

I'm not very sure if extending this function for general migration is the right approach, since we have dedicated functions for the migration. I'd like to hear others' opinions.
> {
> 	int nid;
> -	unsigned int nr_reclaimed = 0;
> +	unsigned int nr_folios = 0;
> 	LIST_HEAD(node_folio_list);
> 	unsigned int noreclaim_flag;
>
> 	if (list_empty(folio_list))
> -		return nr_reclaimed;
> +		return nr_folios;
>
> 	noreclaim_flag = memalloc_noreclaim_save();
>
> @@ -2129,15 +2130,20 @@ unsigned long reclaim_pages(struct list_head *folio_list)
> 			continue;
> 		}
>
> -		nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> +		nr_folios += handler(&node_folio_list, NODE_DATA(nid));
> 		nid = folio_nid(lru_to_folio(folio_list));
> 	} while (!list_empty(folio_list));
>
> -	nr_reclaimed += reclaim_folio_list(&node_folio_list, NODE_DATA(nid));
> +	nr_folios += handler(&node_folio_list, NODE_DATA(nid));
>
> 	memalloc_noreclaim_restore(noreclaim_flag);
>
> -	return nr_reclaimed;
> +	return nr_folios;
> +}
> +
> +unsigned long reclaim_pages(struct list_head *folio_list)
> +{
> +	return reclaim_or_migrate_folios(folio_list, reclaim_folio_list);
> }
>
> static unsigned long shrink_list(enum lru_list lru, unsigned long nr_to_scan,
> --
> 2.34.1
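The overall shape of this refactor — one per-node batching walk with the reclaim-vs-migrate decision pushed into a callback — can be illustrated with a small userspace C analogue. Integer node ids stand in for the kernel's folio lists here, and every name below is invented for illustration:

```c
#include <stddef.h>

/* Handler receives one batch of same-node "pages" and returns how many it
 * handled, standing in for reclaim_folio_list()-style helpers. */
typedef unsigned int (*node_handler_t)(const int *pages, size_t n, int nid);

/*
 * Analogue of the reclaim_or_migrate_folios() pattern from the patch:
 * walk the list, group consecutive entries that live on the same node,
 * and hand each group to the supplied handler.
 */
static unsigned long for_each_node_batch(const int *pages, size_t n,
					 node_handler_t handler)
{
	unsigned long handled = 0;
	size_t start = 0;

	for (size_t i = 1; i <= n; i++) {
		if (i == n || pages[i] != pages[start]) {
			handled += handler(&pages[start], i - start,
					   pages[start]);
			start = i;
		}
	}
	return handled;
}

/* Example handler that reports one unit of work per same-node batch. */
static unsigned int count_batches(const int *pages, size_t n, int nid)
{
	(void)pages;
	(void)n;
	(void)nid;
	return 1;
}

/* Three same-node runs -- {0,0}, {2,2,2}, {0} -- so this returns 3. */
static unsigned long demo_batches(void)
{
	const int pages[] = { 0, 0, 2, 2, 2, 0 };

	return for_each_node_batch(pages, 6, count_batches);
}
```

Whether this indirection is preferable to keeping separate dedicated functions is exactly the open question raised in the review above.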
Re: [RFC PATCH 4/4] mm/damon: introduce DAMOS_PROMOTE action for promotion
On Mon, 15 Jan 2024 13:52:52 +0900 Honggyu Kim wrote:

> From: Hyeongtak Ji
>
> This patch introduces DAMOS_PROMOTE action for paddr mode.
>
> It includes renaming alloc_demote_folio to alloc_migrate_folio to use it
> for promotion as well.
>
> The execution sequence of DAMOS_DEMOTE and DAMOS_PROMOTE look as
> follows for comparison.
>
> DAMOS_DEMOTE action
> damo_pa_apply_scheme
> -> damon_pa_reclaim
> -> demote_pages
> -> do_demote_folio_list
> -> __demote_folio_list
> -> demote_folio_list
>
> DAMOS_PROMOTE action
> damo_pa_apply_scheme
> -> damon_pa_promote
> -> promote_pages
> -> do_promote_folio_list
> -> __promote_folio_list
> -> promote_folio_list
>
> Signed-off-by: Hyeongtak Ji
> Signed-off-by: Honggyu Kim
> ---
>  include/linux/damon.h          |   2 +
>  include/linux/migrate_mode.h   |   1 +
>  include/linux/vm_event_item.h  |   1 +
>  include/trace/events/migrate.h |   3 +-
>  mm/damon/paddr.c               |  29
>  mm/damon/sysfs-schemes.c       |   1 +
>  mm/internal.h                  |   1 +
>  mm/vmscan.c                    | 129 -
>  mm/vmstat.c                    |   1 +
>  9 files changed, 165 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/damon.h b/include/linux/damon.h
> index 4c0a0fef09c5..477060bb6718 100644
> --- a/include/linux/damon.h
> +++ b/include/linux/damon.h
> @@ -107,6 +107,7 @@ struct damon_target {
>  * @DAMOS_LRU_DEPRIO:	Deprioritize the region on its LRU lists.
>  * @DAMOS_STAT:		Do nothing but count the stat.
>  * @DAMOS_DEMOTE:	Do demotion for the current region.
> + * @DAMOS_PROMOTE:	Do promotion if possible, otherwise do nothing.

Like LRU_PRIO is defined before LRU_DEPRIO, what about defining PROMOTE before DEMOTE?

>  * @NR_DAMOS_ACTIONS:	Total number of DAMOS actions
>  *
>  * The support of each action is up to running damon_operations.
> @@ -125,6 +126,7 @@ enum damos_action { > DAMOS_LRU_DEPRIO, > DAMOS_STAT, /* Do nothing but only record the stat */ > DAMOS_DEMOTE, > + DAMOS_PROMOTE, > NR_DAMOS_ACTIONS, > }; > > diff --git a/include/linux/migrate_mode.h b/include/linux/migrate_mode.h > index f37cc03f9369..63f75eb9abf3 100644 > --- a/include/linux/migrate_mode.h > +++ b/include/linux/migrate_mode.h > @@ -29,6 +29,7 @@ enum migrate_reason { > MR_CONTIG_RANGE, > MR_LONGTERM_PIN, > MR_DEMOTION, > + MR_PROMOTION, > MR_TYPES > }; > > diff --git a/include/linux/vm_event_item.h b/include/linux/vm_event_item.h > index 8abfa1240040..63cf920afeaa 100644 > --- a/include/linux/vm_event_item.h > +++ b/include/linux/vm_event_item.h > @@ -44,6 +44,7 @@ enum vm_event_item { PGPGIN, PGPGOUT, PSWPIN, PSWPOUT, > PGDEMOTE_KSWAPD, > PGDEMOTE_DIRECT, > PGDEMOTE_KHUGEPAGED, > + PGPROMOTE, > PGSCAN_KSWAPD, > PGSCAN_DIRECT, > PGSCAN_KHUGEPAGED, > diff --git a/include/trace/events/migrate.h b/include/trace/events/migrate.h > index 0190ef725b43..f0dd569c1e62 100644 > --- a/include/trace/events/migrate.h > +++ b/include/trace/events/migrate.h > @@ -22,7 +22,8 @@ > EM( MR_NUMA_MISPLACED, "numa_misplaced") \ > EM( MR_CONTIG_RANGE,"contig_range") \ > EM( MR_LONGTERM_PIN,"longterm_pin") \ > - EMe(MR_DEMOTION,"demotion") > + EM( MR_DEMOTION,"demotion") \ > + EMe(MR_PROMOTION, "promotion") > > /* > * First define the enums in the above macros to be exported to userspace > diff --git a/mm/damon/paddr.c b/mm/damon/paddr.c > index d3e3f077cd00..360ce69d5898 100644 > --- a/mm/damon/paddr.c > +++ b/mm/damon/paddr.c > @@ -257,6 +257,32 @@ static unsigned long damon_pa_reclaim(struct > damon_region *r, struct damos *s, b > return applied * PAGE_SIZE; > } > > +static unsigned long damon_pa_promote(struct damon_region *r, struct damos > *s) > +{ > + unsigned long addr, applied; > + LIST_HEAD(folio_list); > + > + for (addr = r->ar.start; addr < r->ar.end; addr += PAGE_SIZE) { > + struct folio *folio = damon_get_folio(PHYS_PFN(addr)); > + 
> +	if (!folio)
> +		continue;
> +
> +	if (damos_pa_filter_out(s, folio))
> +		goto put_folio;
> +
> +	if (!folio_isolate_lru(folio))
> +		goto put_folio;
> +
> +	list_add(&folio->lru, &folio_list);
> +put_folio:
> +	folio_put(folio);
> +	}
> +	applied = promote_pages(&folio_list);
> +	cond_resched();
> +	return applied * PAGE_SIZE;
> +}
> +
>  static inline unsigned long damon_pa_mark_accessed_or_deactivate(
>  		struct damon_region *r, struct damos *s, bool mark_accessed)
>  {
> @@ -309,6 +335,8 @@ static unsigned long damon_pa_apply_scheme(struct damon_ctx
Re: [RFC PATCH 3/4] mm/memory-tiers: add next_promotion_node to find promotion target
On Mon, 15 Jan 2024 13:52:51 +0900 Honggyu Kim wrote: > From: Hyeongtak Ji > > This patch adds next_promotion_node that can be used to identify the > appropriate promotion target based on memory tiers. When multiple > promotion target nodes are available, the nearest node is selected based > on numa distance. > > Signed-off-by: Hyeongtak Ji > --- > include/linux/memory-tiers.h | 11 + > mm/memory-tiers.c| 43 > 2 files changed, 54 insertions(+) > > diff --git a/include/linux/memory-tiers.h b/include/linux/memory-tiers.h > index 1e39d27bee41..0788e435fc50 100644 > --- a/include/linux/memory-tiers.h > +++ b/include/linux/memory-tiers.h > @@ -50,6 +50,7 @@ int mt_set_default_dram_perf(int nid, struct > node_hmem_attrs *perf, > int mt_perf_to_adistance(struct node_hmem_attrs *perf, int *adist); > #ifdef CONFIG_MIGRATION > int next_demotion_node(int node); > +int next_promotion_node(int node); > void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t *targets); > bool node_is_toptier(int node); > #else > @@ -58,6 +59,11 @@ static inline int next_demotion_node(int node) > return NUMA_NO_NODE; > } > > +static inline int next_promotion_node(int node) > +{ > + return NUMA_NO_NODE; > +} > + > static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t > *targets) > { > *targets = NODE_MASK_NONE; > @@ -101,6 +107,11 @@ static inline int next_demotion_node(int node) > return NUMA_NO_NODE; > } > > +static inline int next_promotion_node(int node) > +{ > + return NUMA_NO_NODE; > +} > + > static inline void node_get_allowed_targets(pg_data_t *pgdat, nodemask_t > *targets) > { > *targets = NODE_MASK_NONE; > diff --git a/mm/memory-tiers.c b/mm/memory-tiers.c > index 8d5291add2bc..0060ee571cf4 100644 > --- a/mm/memory-tiers.c > +++ b/mm/memory-tiers.c > @@ -335,6 +335,49 @@ int next_demotion_node(int node) > return target; > } > > +/* > + * Select a promotion target that is close to the from node among the given > + * two nodes. 
> + * > + * TODO: consider other decision policy as node_distance may not be precise. > + */ > +static int select_promotion_target(int a, int b, int from) > +{ > + if (node_distance(from, a) < node_distance(from, b)) > + return a; > + else > + return b; > +} > + > +/** > + * next_promotion_node() - Get the next node in the promotion path > + * @node: The starting node to lookup the next node > + * > + * Return: node id for next memory node in the promotion path hierarchy > + * from @node; NUMA_NO_NODE if @node is the toptier. > + */ > +int next_promotion_node(int node) > +{ > + int target = NUMA_NO_NODE; > + int nid; > + > + if (node_is_toptier(node)) > + return NUMA_NO_NODE; > + > + rcu_read_lock(); > + for_each_node_state(nid, N_MEMORY) { > + if (node_isset(node, node_demotion[nid].preferred)) { > + if (target == NUMA_NO_NODE) > + target = nid; > + else > + target = select_promotion_target(nid, target, > node); > + } > + } > + rcu_read_unlock(); > + > + return target; > +} > + If this is gonna be used only by DAMON and we don't have a concrete plan for making it used by others, I think implementing this in mm/damon/ might make sense. > static void disable_all_demotion_targets(void) > { > struct memory_tier *memtier; > -- > 2.34.1
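For readers skimming the thread, the nearest-node policy above is easy to model in userspace. Below is a minimal sketch with a mock distance table; the 3-node matrix and the userspace harness are hypothetical, only select_promotion_target() mirrors the patch:

```c
#include <assert.h>

#define NUMA_NO_NODE (-1)

/* hypothetical stand-in for the kernel's node_distance() table:
 * nodes 0 and 2 are far apart, node 1 sits between them */
static const int mock_distance[3][3] = {
	{ 10, 20, 30 },
	{ 20, 10, 20 },
	{ 30, 20, 10 },
};

static int node_distance(int a, int b)
{
	return mock_distance[a][b];
}

/* mirrors select_promotion_target() in the patch: of two candidate
 * promotion targets, pick whichever is nearer to the source node */
static int select_promotion_target(int a, int b, int from)
{
	if (node_distance(from, a) < node_distance(from, b))
		return a;
	else
		return b;
}
```

Note that on a distance tie the second argument (the already-chosen target in the caller's loop) wins, which is one reason the TODO above says node_distance alone may not be a precise enough policy.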
Re: [RFC PATCH 0/4] DAMON based 2-tier memory management for CXL memory
Hello, On Mon, 15 Jan 2024 13:52:48 +0900 Honggyu Kim wrote: > There was an RFC IDEA "DAMOS-based Tiered-Memory Management" previously > posted at [1]. > > It says there is no implementation of the demote/promote DAMOS action > are made. This RFC is about its implementation for physical address > space. > [...] > Evaluation Results > == > [...] > In summary of both results, our evaluation shows that "DAMON 2-tier" > memory management reduces the performance slowdown compared to the > "default" memory policy from 15~17% to 4~5% when the system runs with > high memory pressure on its fast tier DRAM nodes. > > The similar evaluation was done in another machine that has 256GB of > local DRAM and 96GB of CXL memory. The performance slowdown is reduced > from 20~24% for "default" to 5~7% for "DAMON 2-tier". > > Having these DAMOS_DEMOTE and DAMOS_PROMOTE actions can make 2-tier > memory systems run more efficiently under high memory pressures. Thank you so much for these great patches and the nice test results above. I believe the test setup and results make sense, and merging a revised version of this patchset would provide real benefits to the users. At a high level, I think it might be better to separate DAMON internal changes from DAMON external changes. For the DAMON part changes, I have no big concern other than trivial coding style level comments. For the DAMON-external changes implementing demote_pages() and promote_pages(), I'm unsure if the implementation is reusing appropriate functions, and if those are placed in the right source files. Especially, I'm unsure if vmscan.c is the right place for promotion code. Also I don't know if there is a good agreement on the promotion/demotion target node decision. That should be because I'm not that familiar with the areas and the files, but I feel this might be because our discussions on the promotion and demotion operations still have room to mature. 
Because I'm not very familiar with the part, I'd like to hear others' comments, too. To this end, I feel the problem might be made simpler, because this patchset is trying to provide two sophisticated operations, while I think a simpler approach might be possible. My humble simpler idea is adding a DAMOS operation for moving pages to a given node (like the sys_move_phy_pages RFC[1]), instead of the promote/demote. The general page migration can handle multiple cases including promote/demote, in my humble assumption. In more detail, users could decide which is the appropriate node for promotion or demotion and use the new DAMOS action to do promotion and demotion. Users would be requested to decide which nodes are the proper promotion/demotion targets, but that decision wouldn't be that hard in my opinion. For this, 'struct damos' would need to be updated for such argument-dependent actions, like 'struct damos_filter' has a union. In the future, we could extend the operation to the promotion and the demotion after the discussion around promotion and demotion has matured, if required. And assuming DAMON is extended for originating-CPU-aware access monitoring, the new DAMOS action would also cover more use cases such as general NUMA node balancing (extending DAMON for CPU-aware monitoring would be required), and some complex configurations having both CPU affinity and tiered memory requirements. I also think that may well fit with my RFC idea[2] for tiered memory management. Looking forward to opinions from you and others. I admit I may be missing many things, and am more than happy to be enlightened. 
[1] https://lwn.net/Articles/944007/ [2] https://lore.kernel.org/damon/20231112195602.61525-1...@kernel.org/ Thanks, SJ > > Signed-off-by: Honggyu Kim > Signed-off-by: Hyeongtak Ji > Signed-off-by: Rakie Kim > > [1] https://lore.kernel.org/damon/20231112195602.61525-1...@kernel.org > [2] https://github.com/skhynix/hmsdk > [3] https://github.com/redis/redis/tree/7.0.0 > [4] https://github.com/brianfrankcooper/YCSB/tree/0.17.0 > [5] https://dl.acm.org/doi/10.1145/3503222.3507731 > [6] https://dl.acm.org/doi/10.1145/3582016.3582063 > > Honggyu Kim (2): > mm/vmscan: refactor reclaim_pages with reclaim_or_migrate_folios > mm/damon: introduce DAMOS_DEMOTE action for demotion > > Hyeongtak Ji (2): > mm/memory-tiers: add next_promotion_node to find promotion target > mm/damon: introduce DAMOS_PROMOTE action for promotion > > include/linux/damon.h | 4 + > include/linux/memory-tiers.h | 11 ++ > include/linux/migrate_mode.h | 1 + > include/linux/vm_event_item.h | 1 + > include/trace/events/migrate.h | 3 +- > mm/damon/paddr.c | 46 ++- > mm/damon/sysfs-schemes.c | 2 + > mm/internal.h | 2 + > mm/memory-tiers.c | 43 ++ > mm/vmscan.c| 231 +++-- > mm/vmstat.c| 1 + > 11 files changed, 330 insertions(+), 15
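To make the "argument-dependent action" idea in the reply above concrete, here is a purely hypothetical userspace sketch of how a scheme struct could carry a per-action argument via a union, the way 'struct damos_filter' does. None of these names are the real kernel API; they only illustrate the shape of the proposal:

```c
#include <assert.h>

/* hypothetical action list with one generic "migrate to a user-chosen
 * node" action instead of separate promote/demote actions */
enum damos_action_sketch {
	DAMOS_STAT_SKETCH,
	DAMOS_MIGRATE_SKETCH,
};

/* hypothetical scheme struct: the union holds arguments that only some
 * actions need, here the migration target node id */
struct damos_sketch {
	enum damos_action_sketch action;
	union {
		int target_nid;	/* used only by DAMOS_MIGRATE_SKETCH */
	};
};

/* users pick the target themselves: promotion from a CXL node is just a
 * migrate scheme whose target is a DRAM node, and demotion the reverse */
static struct damos_sketch make_migrate_scheme(int target_nid)
{
	struct damos_sketch s = { .action = DAMOS_MIGRATE_SKETCH };

	s.target_nid = target_nid;
	return s;
}
```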
Re: [PATCH v3] vmscan: add trace events for lru_gen
Hello, On Sun, 24 Sep 2023 23:23:43 +0900 Jaewon Kim wrote: > As the legacy lru provides, the lru_gen needs some trace events for > debugging. > > This commit introduces 2 trace events. > trace_mm_vmscan_lru_gen_scan > trace_mm_vmscan_lru_gen_evict > > Each event is similar to the following legacy events. > trace_mm_vmscan_lru_isolate, > trace_mm_vmscan_lru_shrink_[in]active > > Here's an example > mm_vmscan_lru_gen_scan: isolate_mode=0 classzone=1 order=9 > nr_requested=4096 nr_scanned=431 nr_skipped=0 nr_taken=55 lru=anon > mm_vmscan_lru_gen_evict: nid=0 nr_reclaimed=42 nr_dirty=0 nr_writeback=0 > nr_congested=0 nr_immediate=0 nr_activate_anon=13 nr_activate_file=0 > nr_ref_keep=0 nr_unmap_fail=0 priority=2 > flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC > mm_vmscan_lru_gen_scan: isolate_mode=0 classzone=1 order=9 > nr_requested=4096 nr_scanned=66 nr_skipped=0 nr_taken=64 lru=file > mm_vmscan_lru_gen_evict: nid=0 nr_reclaimed=62 nr_dirty=0 nr_writeback=0 > nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=2 > nr_ref_keep=0 nr_unmap_fail=0 priority=2 > flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC > > Signed-off-by: Jaewon Kim > Reviewed-by: Steven Rostedt (Google) > Reviewed-by: T.J. 
Mercier > --- > v3: change printk format > v2: use condition and make it aligned > v1: introduce trace events > --- > include/trace/events/mmflags.h | 5 ++ > include/trace/events/vmscan.h | 98 ++ > mm/vmscan.c| 17 -- > 3 files changed, 115 insertions(+), 5 deletions(-) > > diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h > index 1478b9dd05fa..44e9b38f83e7 100644 > --- a/include/trace/events/mmflags.h > +++ b/include/trace/events/mmflags.h > @@ -274,6 +274,10 @@ IF_HAVE_VM_SOFTDIRTY(VM_SOFTDIRTY, "softdirty" > ) \ > EM (LRU_ACTIVE_FILE, "active_file") \ > EMe(LRU_UNEVICTABLE, "unevictable") > > +#define LRU_GEN_NAMES\ > + EM (LRU_GEN_ANON, "anon") \ > + EMe(LRU_GEN_FILE, "file") > + I found this patchset makes the build fail when !CONFIG_LRU_GEN, like below: In file included from /linux/include/trace/trace_events.h:27, from /linux/include/trace/define_trace.h:102, from /linux/include/trace/events/oom.h:195, from /linux/mm/oom_kill.c:53: /linux/include/trace/events/mmflags.h:278:7: error: ‘LRU_GEN_ANON’ undeclared here (not in a function); did you mean ‘LRU_GEN_PGOFF’? 278 | EM (LRU_GEN_ANON, "anon") \ | ^~~~ Maybe some config checks are needed? Thanks, SJ [...]
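One plausible shape of the config check asked for above would be guarding the name list the same way the LRU_GEN_ANON/LRU_GEN_FILE enums themselves are guarded. This is only a hedged sketch of the idea, not the actual fix that was merged, and the exact placement relative to the EM()/EMe() expansion sites in mmflags.h would need checking:

```c
/* hypothetical guard: only define the lru_gen flag names when the
 * feature (and thus the enums they reference) is configured in, so
 * !CONFIG_LRU_GEN builds never see the undeclared symbols */
#ifdef CONFIG_LRU_GEN
#define LRU_GEN_NAMES						\
	EM (LRU_GEN_ANON, "anon")				\
	EMe(LRU_GEN_FILE, "file")
#endif
```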
[PATCH 3/9] mm/damon/core: use nr_accesses_bp as a source of damos_before_apply tracepoint
damos_before_apply tracepoint is exposing access rate of DAMON regions using nr_accesses field of regions, which was actually used by DAMOS in the past. However, it has changed to use nr_accesses_bp instead. Update the tracepoint to expose the value that DAMOS is really using. Note that it doesn't expose the value as is in the basis point, but after converting it to the natural number by dividing it by 10,000. Therefore this change doesn't make user-visible behavioral differences. Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 19930bb7af9a..23200aabccac 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -36,7 +36,7 @@ TRACE_EVENT_CONDITION(damos_before_apply, __entry->target_idx = target_idx; __entry->start = r->ar.start; __entry->end = r->ar.end; - __entry->nr_accesses = r->nr_accesses; + __entry->nr_accesses = r->nr_accesses_bp / 10000; __entry->age = r->age; __entry->nr_regions = nr_regions; ), -- 2.25.1
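For tracepoint consumers, the conversion described in the patch above is plain integer division, so basis-point values below 10,000 round down to zero on the exposed natural-number scale. A tiny illustration (the helper name is made up for the example):

```c
#include <assert.h>

/* nr_accesses_bp stores the region's access rate in basis points,
 * i.e. the nr_accesses value scaled up by 10,000; dividing converts
 * it back to the natural-number scale the tracepoint always exposed */
static unsigned int bp_to_nr_accesses(unsigned int nr_accesses_bp)
{
	return nr_accesses_bp / 10000;
}
```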
[PATCH RESEND v2 1/2] mm/damon/core: add a tracepoint for damos apply target regions
DAMON provides damon_aggregated tracepoint, which exposes details of each region and its access monitoring results. It is useful for getting whole monitoring results, e.g., for recording purposes. For investigations of DAMOS, DAMON Sysfs interface provides DAMOS statistics and tried_regions directory. But those provide only statistics and snapshots. If the scheme is frequently applied and if the user needs to know every detail of DAMOS behavior, the snapshot-based interface could be insufficient and expensive. As a last resort, userspace users need to record all the monitoring results via damon_aggregated tracepoint and simulate how DAMOS would work. It is unnecessarily complicated. DAMON kernel API users, meanwhile, can do that easily via before_damos_apply() callback field of 'struct damon_callback', though. Add a tracepoint that will be called just after before_damos_apply() callback for more convenient investigations of DAMOS. The tracepoint exposes all details about each region, similar to damon_aggregated tracepoint. Please note that DAMOS is currently not only for memory management but also for efficient, query-like retrieval of monitoring results (when 'stat' action is used). Until now, only statistics or snapshots were supported. Addition of this tracepoint allows efficient full recording of DAMOS-based filtered monitoring results. 
Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 39 mm/damon/core.c | 32 - 2 files changed, 70 insertions(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 0b8d13bde17a..19930bb7af9a 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -9,6 +9,45 @@ #include #include +TRACE_EVENT_CONDITION(damos_before_apply, + + TP_PROTO(unsigned int context_idx, unsigned int scheme_idx, + unsigned int target_idx, struct damon_region *r, + unsigned int nr_regions, bool do_trace), + + TP_ARGS(context_idx, target_idx, scheme_idx, r, nr_regions, do_trace), + + TP_CONDITION(do_trace), + + TP_STRUCT__entry( + __field(unsigned int, context_idx) + __field(unsigned int, scheme_idx) + __field(unsigned long, target_idx) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + __field(unsigned int, age) + __field(unsigned int, nr_regions) + ), + + TP_fast_assign( + __entry->context_idx = context_idx; + __entry->scheme_idx = scheme_idx; + __entry->target_idx = target_idx; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + __entry->age = r->age; + __entry->nr_regions = nr_regions; + ), + + TP_printk("ctx_idx=%u scheme_idx=%u target_idx=%lu nr_regions=%u %lu-%lu: %u %u", + __entry->context_idx, __entry->scheme_idx, + __entry->target_idx, __entry->nr_regions, + __entry->start, __entry->end, + __entry->nr_accesses, __entry->age) +); + TRACE_EVENT(damon_aggregated, TP_PROTO(unsigned int target_id, struct damon_region *r, diff --git a/mm/damon/core.c b/mm/damon/core.c index ca631dd88b33..3ca34a252a3c 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -950,6 +950,33 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, struct timespec64 begin, end; unsigned long sz_applied = 0; int err = 0; + /* +* We plan to support multiple context per kdamond, as DAMON sysfs +* implies with 'nr_contexts' 
file. Nevertheless, only single context +* per kdamond is supported for now. So, we can simply use '0' context +* index here. +*/ + unsigned int cidx = 0; + struct damos *siter;/* schemes iterator */ + unsigned int sidx = 0; + struct damon_target *titer; /* targets iterator */ + unsigned int tidx = 0; + bool do_trace = false; + + /* get indices for trace_damos_before_apply() */ + if (trace_damos_before_apply_enabled()) { + damon_for_each_scheme(siter, c) { + if (siter == s) + break; + sidx++; + } + damon_for_each_target(titer, c) { + if (titer == t) + break; + tidx++; + } + do_trace = true; + } if (c->ops.apply_scheme) { if (quota->esz && quota->cha
Re: [PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
On Mon, 11 Sep 2023 16:51:44 -0400 Steven Rostedt wrote: > On Mon, 11 Sep 2023 20:36:42 + > SeongJae Park wrote: > > > > Then tracing is fully enabled here, and now we enter: > > > > > > if (trace_damos_before_apply_enabled()) { > > > trace_damos_before_apply(cidx, sidx, tidx, r, > > > damon_nr_regions(t)); > > > } > > > > > > Now the trace event is hit with sidx and tidx zero when they should not > > > be. > > > This could confuse you when looking at the report. > > > > Thank you so much for enlightening me with this kind explanation, Steve! > > And > > this all make sense. I will follow your suggestion in the next spin. > > > > > > > > What I suggested was to initialize sidx to zero, > > > > Nit. Initialize to not zero but -1, right? > > Yeah, but I was also thinking of the reset of it too :-p > > sidx = -1; > > if (trace_damos_before_apply_enabled()) { > sidx = 0; Thank you for clarifying, Steve :) Nevertheless, since the variable is unsigned int, I would need to use UINT_MAX instead. 
To make the code easier to understand, I'd prefer to add a third parameter, as you suggested as another option in the original reply, like below: --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -997,6 +997,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, unsigned int sidx = 0; struct damon_target *titer; /* targets iterator */ unsigned int tidx = 0; + bool do_trace = false; /* get indices for trace_damos_before_apply() */ if (trace_damos_before_apply_enabled()) { @@ -1010,6 +1011,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, break; tidx++; } + do_trace = true; } if (c->ops.apply_scheme) { @@ -1036,7 +1038,7 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, err = c->callback.before_damos_apply(c, t, r, s); if (!err) { trace_damos_before_apply(cidx, sidx, tidx, r, - damon_nr_regions(t)); + damon_nr_regions(t), do_trace); sz_applied = c->ops.apply_scheme(c, t, r, s); } ktime_get_coarse_ts64(&end); Thanks, SJ > > -- Steve > > > > > > > set it in the first trace_*_enabled() check, and ignore calling the > > > tracepoint if it's not >= 0. > >
Re: [PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
Hi Steven, On Mon, 11 Sep 2023 14:19:55 -0400 Steven Rostedt wrote: > On Mon, 11 Sep 2023 04:59:07 + > SeongJae Park wrote: > > > --- a/mm/damon/core.c > > +++ b/mm/damon/core.c > > @@ -950,6 +950,28 @@ static void damos_apply_scheme(struct damon_ctx *c, > > struct damon_target *t, > > struct timespec64 begin, end; > > unsigned long sz_applied = 0; > > int err = 0; > > + /* > > +* We plan to support multiple context per kdamond, as DAMON sysfs > > +* implies with 'nr_contexts' file. Nevertheless, only single context > > +* per kdamond is supported for now. So, we can simply use '0' context > > +* index here. > > +*/ > > + unsigned int cidx = 0; > > + struct damos *siter;/* schemes iterator */ > > + unsigned int sidx = 0; > > + struct damon_target *titer; /* targets iterator */ > > + unsigned int tidx = 0; > > + > > If this loop is only for passing sidx and tidx to the trace point, You're correct. > you can add around it: > > if (trace_damos_before_apply_enabled()) { > > > + damon_for_each_scheme(siter, c) { > > + if (siter == s) > > + break; > > + sidx++; > > + } > > + damon_for_each_target(titer, c) { > > + if (titer == t) > > + break; > > + tidx++; > > + } > > } > > > And then this loop will only be done if that trace event is enabled. Today I learned yet another great feature of the tracing framework. Thank you Steven, I will add that to the next spin of this patchset! > > To prevent races, you may also want to add a third parameter, or initialize > them to -1: > > sidx = -1; > > if (trace_damo_before_apply_enabled()) { > sidx = 0; > [..] > } > > And you can change the TRACE_EVENT() TO TRACE_EVENT_CONDITION(): > > TRACE_EVENT_CONDITION(damos_before_apply, > > TP_PROTO(...), > > TP_ARGS(...), > > TP_CONDITION(sidx >= 0), > > and the trace event will not be called if sidx is less than zero. 
> > Also, this if statement is only done when the trace event is enabled, so > it's equivalent to: > > if (trace_damos_before_apply_enabled()) { > if (sdx >= 0) > trace_damos_before_apply(cidx, sidx, tidx, r, > damon_nr_regions(t)); > } Again, thank you very much for letting me know this awesome feature. However, sidx is supposed to be always >=0 here, since kdamond is running in single thread and hence no race is expected. If it exists, it's a bug. So, I wouldn't make this change. Appreciate again for letting me know this very useful feature, and please let me know if I'm missing something, though! Thanks, SJ > > -- Steve > > > > > > > if (c->ops.apply_scheme) { > > if (quota->esz && quota->charged_sz + sz > quota->esz) { > > @@ -964,8 +986,11 @@ static void damos_apply_scheme(struct damon_ctx *c, > > struct damon_target *t, > > ktime_get_coarse_ts64(&begin); > > if (c->callback.before_damos_apply) > > err = c->callback.before_damos_apply(c, t, r, s); > > - if (!err) > > + if (!err) { > > + trace_damos_before_apply(cidx, sidx, tidx, r, > > + damon_nr_regions(t)); > > sz_applied = c->ops.apply_scheme(c, t, r, s); > > + } > > ktime_get_coarse_ts64(&end); > > quota->total_charged_ns += timespec64_to_ns(&end) - > > timespec64_to_ns(&begin); > > -- >
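The pattern this thread converges on, namely a cheap enabled-check guarding the expensive argument computation, plus a do_trace flag standing in for TP_CONDITION() so a mid-function enable cannot emit half-computed arguments, can be modeled in userspace. All names below are stand-ins, not the real tracing API:

```c
#include <assert.h>
#include <stdbool.h>

static bool tracing_enabled;	/* stands in for trace_*_enabled() */
static int traced_sidx = -1;	/* last value the mock tracepoint saw */

/* models TRACE_EVENT_CONDITION(): the event fires only when do_trace
 * is set, so a never-computed sidx cannot be reported */
static void trace_mock_event(int sidx, bool do_trace)
{
	if (!do_trace)
		return;
	traced_sidx = sidx;
}

static void apply_scheme_mock(void)
{
	int sidx = 0;
	bool do_trace = false;

	if (tracing_enabled) {	/* expensive work only when needed */
		sidx = 42;	/* stands in for the scheme-index walk */
		do_trace = true;
	}
	/* even if tracing flips on right here, do_trace stays false,
	 * so the event cannot fire with the bogus sidx == 0 */
	trace_mock_event(sidx, do_trace);
}
```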
Re: [PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
On Mon, 11 Sep 2023 16:31:27 -0400 Steven Rostedt wrote: > On Mon, 11 Sep 2023 19:05:04 + > SeongJae Park wrote: > > > > Also, this if statement is only done when the trace event is enabled, so > > > it's equivalent to: > > > > > > if (trace_damos_before_apply_enabled()) { > > > if (sdx >= 0) > > > trace_damos_before_apply(cidx, sidx, tidx, r, > > > damon_nr_regions(t)); > > > } > > > > Again, thank you very much for letting me know this awesome feature. > > However, > > sidx is supposed to be always >=0 here, since kdamond is running in single > > thread and hence no race is expected. If it exists, it's a bug. So, I > > wouldn't make this change. Appreciate again for letting me know this very > > useful feature, and please let me know if I'm missing something, though! > > The race isn't with your code, but the enabling of tracing. > > Let's say you enable tracing just as it passed the first: > > if (trace_damos_before_apply_enabled()) { > > damon_for_each_scheme(siter, c) { > if (siter == s) > break; > sidx++; > } > damon_for_each_target(titer, c) { > if (titer == t) > break; > tidx++; > } > > Now, sidx and tidx are zero (when they were not computed, thus, they > shouldn't be zero). > > Then tracing is fully enabled here, and now we enter: > > if (trace_damos_before_apply_enabled()) { > trace_damos_before_apply(cidx, sidx, tidx, r, > damon_nr_regions(t)); > } > > Now the trace event is hit with sidx and tidx zero when they should not be. > This could confuse you when looking at the report. Thank you so much for enlightening me with this kind explanation, Steve! And this all make sense. I will follow your suggestion in the next spin. > > What I suggested was to initialize sidx to zero, Nit. Initialize to not zero but -1, right? > set it in the first trace_*_enabled() check, and ignore calling the > tracepoint if it's not >= 0. > > -- Steve > Thanks, SJ
[PATCH 1/2] mm/damon/core: add a tracepoint for damos apply target regions
DAMON provides damon_aggregated tracepoint, which exposes details of each region and its access monitoring results. It is useful for getting whole monitoring results, e.g., for recording purposes. For investigations of DAMOS, DAMON Sysfs interface provides DAMOS statistics and tried_regions directory. But those provide only statistics and snapshots. If the scheme is frequently applied and if the user needs to know every detail of DAMOS behavior, the snapshot-based interface could be insufficient and expensive. As a last resort, userspace users need to record all the monitoring results via damon_aggregated tracepoint and simulate how DAMOS would work. It is unnecessarily complicated. DAMON kernel API users, meanwhile, can do that easily via before_damos_apply() callback field of 'struct damon_callback', though. Add a tracepoint that will be called just after before_damos_apply() callback for more convenient investigations of DAMOS. The tracepoint exposes all details about each region, similar to damon_aggregated tracepoint. Please note that DAMOS is currently not only for memory management but also for efficient, query-like retrieval of monitoring results (when 'stat' action is used). Until now, only statistics or snapshots were supported. Addition of this tracepoint allows efficient full recording of DAMOS-based filtered monitoring results. 
Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 37 mm/damon/core.c | 27 +- 2 files changed, 63 insertions(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 0b8d13bde17a..9e7b39495b05 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -9,6 +9,43 @@ #include #include +TRACE_EVENT(damos_before_apply, + + TP_PROTO(unsigned int context_idx, unsigned int scheme_idx, + unsigned int target_idx, struct damon_region *r, + unsigned int nr_regions), + + TP_ARGS(context_idx, target_idx, scheme_idx, r, nr_regions), + + TP_STRUCT__entry( + __field(unsigned int, context_idx) + __field(unsigned int, scheme_idx) + __field(unsigned long, target_idx) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + __field(unsigned int, age) + __field(unsigned int, nr_regions) + ), + + TP_fast_assign( + __entry->context_idx = context_idx; + __entry->scheme_idx = scheme_idx; + __entry->target_idx = target_idx; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + __entry->age = r->age; + __entry->nr_regions = nr_regions; + ), + + TP_printk("ctx_idx=%u scheme_idx=%u target_idx=%lu nr_regions=%u %lu-%lu: %u %u", + __entry->context_idx, __entry->scheme_idx, + __entry->target_idx, __entry->nr_regions, + __entry->start, __entry->end, + __entry->nr_accesses, __entry->age) +); + TRACE_EVENT(damon_aggregated, TP_PROTO(unsigned int target_id, struct damon_region *r, diff --git a/mm/damon/core.c b/mm/damon/core.c index ca631dd88b33..aa7fbcdf7310 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -950,6 +950,28 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, struct timespec64 begin, end; unsigned long sz_applied = 0; int err = 0; + /* +* We plan to support multiple context per kdamond, as DAMON sysfs +* implies with 'nr_contexts' file. 
Nevertheless, only single context +* per kdamond is supported for now. So, we can simply use '0' context +* index here. +*/ + unsigned int cidx = 0; + struct damos *siter;/* schemes iterator */ + unsigned int sidx = 0; + struct damon_target *titer; /* targets iterator */ + unsigned int tidx = 0; + + damon_for_each_scheme(siter, c) { + if (siter == s) + break; + sidx++; + } + damon_for_each_target(titer, c) { + if (titer == t) + break; + tidx++; + } if (c->ops.apply_scheme) { if (quota->esz && quota->charged_sz + sz > quota->esz) { @@ -964,8 +986,11 @@ static void damos_apply_scheme(struct damon_ctx *c, struct damon_target *t, ktime_get_coarse_ts64(&begin); if (c->callback.before_damos_apply) err = c->callback.before_damos_apply(c, t, r, s); - if (!er
[RFC 3/8] mm/damon/core: expose nr_accesses_bp from damos_before_apply tracepoint
damos_before_apply tracepoint is exposing access rate of DAMON regions using nr_accesses, which was actually used by DAMOS in the past. However, it has changed to use nr_accesses_bp instead. Update the tracepoint to expose the value that DAMOS is really using. Note that it doesn't expose the value as is in the basis point, but after converting it to the natural number by dividing it by 10,000. That's for avoiding confusion for old users. Signed-off-by: SeongJae Park --- include/trace/events/damon.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h index 9e7b39495b05..6f98198c0104 100644 --- a/include/trace/events/damon.h +++ b/include/trace/events/damon.h @@ -34,7 +34,7 @@ TRACE_EVENT(damos_before_apply, __entry->target_idx = target_idx; __entry->start = r->ar.start; __entry->end = r->ar.end; - __entry->nr_accesses = r->nr_accesses; + __entry->nr_accesses = r->nr_accesses_bp / 10000; __entry->age = r->age; __entry->nr_regions = nr_regions; ), -- 2.25.1
Re: [PATCH v2 00/16] Multigenerational LRU Framework
From: SeongJae Park On Tue, 13 Apr 2021 10:13:24 -0600 Jens Axboe wrote: > On 4/13/21 1:51 AM, SeongJae Park wrote: > > From: SeongJae Park > > > > Hello, > > > > > > Very interesting work, thank you for sharing this :) > > > > On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao wrote: > > > >> What's new in v2 > >> > >> Special thanks to Jens Axboe for reporting a regression in buffered > >> I/O and helping test the fix. > > > > Is the discussion open? If so, could you please give me a link? > > I wasn't on the initial post (or any of the lists it was posted to), but > it's on the google page reclaim list. Not sure if that is public or not. > > tldr is that I was pretty excited about this work, as buffered IO tends > to suck (a lot) for high throughput applications. My test case was > pretty simple: > > Randomly read a fast device, using 4k buffered IO, and watch what > happens when the page cache gets filled up. For this particular test, > we'll initially be doing 2.1GB/sec of IO, and then drop to 1.5-1.6GB/sec > with kswapd using a lot of CPU trying to keep up. That's mainline > behavior. > > The initial posting of this patchset did no better, in fact it did a bit > worse. Performance dropped to the same levels and kswapd was using as > much CPU as before, but on top of that we also got excessive swapping. > Not at a high rate, but 5-10MB/sec continually. > > I had some back and forths with Yu Zhao and tested a few new revisions, > and the current series does much better in this regard. Performance > still dips a bit when page cache fills, but not nearly as much, and > kswapd is using less CPU than before. > > Hope that helps, Appreciate this kind and detailed explanation, Jens! So, my understanding is that v2 of this patchset improved the performance by using frequency (tier) in addition to recency (generation number) for buffered I/O pages. That makes sense to me. If I'm misunderstanding, please let me know. Thanks, SeongJae Park > -- > Jens Axboe >
[PATCH v28 11/13] mm/damon: Add kunit tests
From: SeongJae Park This commit adds kunit based unit tests for the core and the virtual address spaces monitoring primitives of DAMON. Signed-off-by: SeongJae Park Reviewed-by: Brendan Higgins --- mm/damon/Kconfig | 36 + mm/damon/core-test.h | 253 mm/damon/core.c | 7 + mm/damon/dbgfs-test.h | 126 mm/damon/dbgfs.c | 2 + mm/damon/vaddr-test.h | 328 ++ mm/damon/vaddr.c | 7 + 7 files changed, 759 insertions(+) create mode 100644 mm/damon/core-test.h create mode 100644 mm/damon/dbgfs-test.h create mode 100644 mm/damon/vaddr-test.h diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 72f1683ba0ee..455995152697 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,6 +12,18 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_KUNIT_TEST + bool "Test for damon" if !KUNIT_ALL_TESTS + depends on DAMON && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_VADDR bool "Data access monitoring primitives for virtual address spaces" depends on DAMON && MMU @@ -21,6 +33,18 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual address spaces. +config DAMON_VADDR_KUNIT_TEST + bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS + depends on DAMON_VADDR && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON virtual addresses primitives Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_DBGFS bool "DAMON debugfs interface" depends on DAMON_VADDR && DEBUG_FS @@ -30,4 +54,16 @@ config DAMON_DBGFS If unsure, say N. 
+config DAMON_DBGFS_KUNIT_TEST + bool "Test for damon debugfs interface" if !KUNIT_ALL_TESTS + depends on DAMON_DBGFS && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON debugfs interface Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + endmenu diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h new file mode 100644 index ..b815dfbfb5fd --- /dev/null +++ b/mm/damon/core-test.h @@ -0,0 +1,253 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Data Access Monitor Unit Tests + * + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * Author: SeongJae Park + */ + +#ifdef CONFIG_DAMON_KUNIT_TEST + +#ifndef _DAMON_CORE_TEST_H +#define _DAMON_CORE_TEST_H + +#include + +static void damon_test_regions(struct kunit *test) +{ + struct damon_region *r; + struct damon_target *t; + + r = damon_new_region(1, 2); + KUNIT_EXPECT_EQ(test, 1ul, r->ar.start); + KUNIT_EXPECT_EQ(test, 2ul, r->ar.end); + KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses); + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, 1u, damon_nr_regions(t)); + + damon_del_region(r); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_free_target(t); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static void damon_test_target(struct kunit *test) +{ + struct damon_ctx *c = damon_new_ctx(); + struct damon_target *t; + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 42ul, t->id); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_add_target(c, t); + KUNIT_EXPECT_EQ(test, 1u, nr_damon_targets(c)); + + damon_destroy_target(t); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_destroy_ctx(c); +} + +/* + * Test 
kdamond_reset_aggregated() + * + * DAMON checks access to each region and aggregates this information as the + * access frequency of each region. In detail, it increases '->nr_accesses' of + * regions in which an access has been confirmed. 'kdamond_reset_aggregated()' flushes + * the aggregated information ('->nr_accesses' of each region) to the result + * buffer. As a result of the flushing, the '->nr_accesses' of regions are + * initialized to zero. + */ +static voi
[PATCH v28 12/13] mm/damon: Add user space selftests
From: SeongJae Park This commit adds simple user space tests for DAMON. The tests use the kselftest framework. Signed-off-by: SeongJae Park --- tools/testing/selftests/damon/Makefile| 7 ++ .../selftests/damon/_chk_dependency.sh| 28 ++ .../testing/selftests/damon/debugfs_attrs.sh | 98 +++ 3 files changed, 133 insertions(+) create mode 100644 tools/testing/selftests/damon/Makefile create mode 100644 tools/testing/selftests/damon/_chk_dependency.sh create mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile new file mode 100644 index ..8a3f2cd9fec0 --- /dev/null +++ b/tools/testing/selftests/damon/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for damon selftests + +TEST_FILES = _chk_dependency.sh +TEST_PROGS = debugfs_attrs.sh + +include ../lib.mk diff --git a/tools/testing/selftests/damon/_chk_dependency.sh b/tools/testing/selftests/damon/_chk_dependency.sh new file mode 100644 index ..e090836c2bf7 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_dependency.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +DBGFS=/sys/kernel/debug/damon + +if [ $EUID -ne 0 ]; +then + echo "Run as root" + exit $ksft_skip +fi + +if [ ! -d $DBGFS ] +then + echo "$DBGFS not found" + exit $ksft_skip +fi + +for f in attrs target_ids monitor_on +do + if [ ! -f "$DBGFS/$f" ] + then + echo "$f not found" + exit 1 + fi +done diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh new file mode 100755 index ..4a8ab4910ee4 --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -0,0 +1,98 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source ./_chk_dependency.sh + +# Test attrs file +file="$DBGFS/attrs" + +ORIG_CONTENT=$(cat $file) + +echo 1 2 3 4 5 > $file +if [ $?
-ne 0 ] +then + echo "$file write failed" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo 1 2 3 4 > $file +if [ $? -eq 0 ] +then + echo "$file write success (should have failed)" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2 3 4 5" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +# Test target_ids file +file="$DBGFS/target_ids" + +ORIG_CONTENT=$(cat $file) + +echo "1 2 3 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo "1 2 abc 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo abc 2 3 > $file +if [ $? -ne 0 ] +then + echo "$file wrong value write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not cleared" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo > $file +if [ $? -ne 0 ] +then + echo "$file init fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not initialized" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +echo "PASS" -- 2.17.1
[PATCH v28 10/13] Documentation: Add documents for DAMON
From: SeongJae Park This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Signed-off-by: SeongJae Park --- Documentation/admin-guide/mm/damon/guide.rst | 158 + Documentation/admin-guide/mm/damon/index.rst | 15 ++ Documentation/admin-guide/mm/damon/plans.rst | 29 +++ Documentation/admin-guide/mm/damon/start.rst | 114 + Documentation/admin-guide/mm/damon/usage.rst | 112 + Documentation/admin-guide/mm/index.rst | 1 + Documentation/vm/damon/api.rst | 20 ++ Documentation/vm/damon/design.rst| 166 + Documentation/vm/damon/eval.rst | 232 +++ Documentation/vm/damon/faq.rst | 58 + Documentation/vm/damon/index.rst | 31 +++ Documentation/vm/index.rst | 1 + 12 files changed, 937 insertions(+) create mode 100644 Documentation/admin-guide/mm/damon/guide.rst create mode 100644 Documentation/admin-guide/mm/damon/index.rst create mode 100644 Documentation/admin-guide/mm/damon/plans.rst create mode 100644 Documentation/admin-guide/mm/damon/start.rst create mode 100644 Documentation/admin-guide/mm/damon/usage.rst create mode 100644 Documentation/vm/damon/api.rst create mode 100644 Documentation/vm/damon/design.rst create mode 100644 Documentation/vm/damon/eval.rst create mode 100644 Documentation/vm/damon/faq.rst create mode 100644 Documentation/vm/damon/index.rst diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst new file mode 100644 index ..f52dc1669bb1 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/guide.rst @@ -0,0 +1,158 @@ +.. SPDX-License-Identifier: GPL-2.0 + +================== +Optimization Guide +================== + +This document helps you estimate the amount of benefit that you could get +from DAMON-based optimizations, and describes how you could achieve it. It +assumes that you have already read :doc:`start`. + + +Check The Signs +=============== + +No optimization can provide the same extent of benefit in every case. Therefore +you should first estimate how much improvement you could get using DAMON.
If +some of the conditions below match your situation, you could consider using DAMON. + +- *Low IPC and High Cache Miss Ratios.* Low IPC means most of the CPU time is + spent waiting for the completion of time-consuming operations such as memory + access, while high cache miss ratios mean the caches don't help it well. + DAMON is not for cache-level optimization, but for the DRAM level. However, + improving DRAM management will also help this case by reducing the memory + operation latency. +- *Memory Over-commitment and Unknown Users.* If you are doing memory + overcommitment and you cannot control every user of your system, a memory + bank run could happen at any time. You can estimate when it will happen + based on DAMON's monitoring results and act earlier to avoid or deal better + with the crisis. +- *Frequent Memory Pressure.* Frequent memory pressure means your system has + wrong configurations or memory hogs. DAMON will help you find the right + configuration and/or the criminals. +- *Heterogeneous Memory System.* If your system is utilizing memory devices + that are placed between DRAM and traditional hard disks, such as non-volatile + memory or fast SSDs, DAMON could help you utilize the devices more + efficiently. + + +Profile +======= + +If you found some positive signals, you could start by profiling your workloads +using DAMON. Find major workloads on your systems and analyze their data +access patterns to find something that is wrong or can be improved. The DAMON user +space tool (``damo``) will be useful for this. You can get ``damo`` from +https://github.com/awslabs/damo. + +We recommend starting with a working set size distribution check using ``damo +report wss``. If the distribution is non-uniform or quite different from what +you estimated, you could consider `Memory Configuration`_ optimization. + +Then, review the overall access pattern in heatmap form using ``damo report +heats``.
If it shows a simple pattern consisting of a small number of memory +regions with a high contrast of access temperature, you could consider manual +`Program Modification`_. + +If you still want to absorb more benefits, you could develop a `Personalized +DAMON Application`_ for your special case. + +You don't need to take only one approach among the above plans; you could +combine multiple of the above approaches to maximize the benefit. + + +Optimize +======== + +If the profiling result also says it's worth trying some optimization, you +could consider the approaches below. Note that some of them assume +that your systems are configured with swap devices or other types of auxiliary +memory, so that you are not strictly required to accommodate the whole working set +in the main memory. Most
[PATCH v28 13/13] MAINTAINERS: Update for DAMON
From: SeongJae Park This commit updates MAINTAINERS file for DAMON related files. Signed-off-by: SeongJae Park --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 4d68184d3f76..42bbcaec5050 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5025,6 +5025,18 @@ F: net/ax25/ax25_out.c F: net/ax25/ax25_timer.c F: net/ax25/sysctl_net_ax25.c +DATA ACCESS MONITOR +M: SeongJae Park +L: linux...@kvack.org +S: Maintained +F: Documentation/admin-guide/mm/damon/* +F: Documentation/vm/damon/* +F: include/linux/damon.h +F: include/trace/events/damon.h +F: mm/damon/* +F: tools/damon/* +F: tools/testing/selftests/damon/* + DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER L: net...@vger.kernel.org S: Orphan -- 2.17.1
[PATCH v28 08/13] mm/damon/dbgfs: Export kdamond pid to the user space
From: SeongJae Park For CPU usage accounting, knowing the pid of the monitoring thread could be helpful. For example, users could use the cpuacct cgroup with the pid. This commit therefore exports the pid of the currently running monitoring thread to the user space via the 'kdamond_pid' file in the debugfs directory. Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 38 -- 1 file changed, 36 insertions(+), 2 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 17c7878cfcb8..67b273472c0b 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -237,6 +237,32 @@ static ssize_t dbgfs_target_ids_write(struct file *file, return ret; } +static ssize_t dbgfs_kdamond_pid_read(struct file *file, + char __user *buf, size_t count, loff_t *ppos) +{ + struct damon_ctx *ctx = file->private_data; + char *kbuf; + ssize_t len; + + kbuf = kmalloc(count, GFP_KERNEL); + if (!kbuf) + return -ENOMEM; + + mutex_lock(&ctx->kdamond_lock); + if (ctx->kdamond) + len = scnprintf(kbuf, count, "%d\n", ctx->kdamond->pid); + else + len = scnprintf(kbuf, count, "none\n"); + mutex_unlock(&ctx->kdamond_lock); + if (!len) + goto out; + len = simple_read_from_buffer(buf, count, ppos, kbuf, len); + +out: + kfree(kbuf); + return len; +} + static int damon_dbgfs_open(struct inode *inode, struct file *file) { file->private_data = inode->i_private; @@ -258,10 +284,18 @@ static const struct file_operations target_ids_fops = { .write = dbgfs_target_ids_write, }; +static const struct file_operations kdamond_pid_fops = { + .owner = THIS_MODULE, + .open = damon_dbgfs_open, + .read = dbgfs_kdamond_pid_read, +}; + static void dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) { - const char * const file_names[] = {"attrs", "target_ids"}; - const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops}; + const char * const file_names[] = {"attrs", "target_ids", + "kdamond_pid"}; + const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops, + &kdamond_pid_fops}; int i; for (i = 0; i < ARRAY_SIZE(file_names); i++) -- 2.17.1
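As a concrete illustration of the cpuacct idea above, here is a hedged shell sketch. The cgroup mount point and group name are assumptions for illustration, and the commands are guarded so the sketch is a no-op on kernels without DAMON debugfs:

```shell
#!/bin/sh
# Sketch: charge the kdamond thread to a cpuacct cgroup via 'kdamond_pid'.
DBGFS=/sys/kernel/debug/damon
CG=/sys/fs/cgroup/cpuacct/damon		# assumed cgroup-v1 mount point

account_kdamond() {
	if [ ! -f "$DBGFS/kdamond_pid" ]; then
		echo "kdamond_pid not available"
		return 0
	fi
	pid=$(cat "$DBGFS/kdamond_pid")
	if [ "$pid" = "none" ]; then
		echo "no kdamond running"
		return 0
	fi
	mkdir -p "$CG" && echo "$pid" > "$CG/tasks"
	cat "$CG/cpuacct.usage"		# accumulated CPU time of kdamond, in ns
}

account_kdamond
```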
[PATCH v28 09/13] mm/damon/dbgfs: Support multiple contexts
From: SeongJae Park In some use cases, users would want to run multiple monitoring contexts. For example, if a user wants high-precision monitoring and dedicating multiple CPUs to the job is acceptable, the user can split the monitoring target regions into multiple small regions and create one context for each region, because DAMON creates one monitoring thread per context. Or, someone might want to simultaneously monitor different address spaces, e.g., both the virtual address space and the physical address space. DAMON's API allows such usage, but 'damon-dbgfs' does not. Therefore, only kernel space DAMON users can do multi-context monitoring. This commit allows user space DAMON users to use multi-context monitoring by introducing two new 'damon-dbgfs' debugfs files, 'mk_context' and 'rm_context'. Users can create a new monitoring context by writing the desired name of the new context to 'mk_context'. Then, a new directory with the name, having the files for the settings of the context ('attrs', 'target_ids' and 'record'), will be created under the debugfs directory. Writing the name of the context to remove to 'rm_context' will remove the related context and directory. Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 197 ++- 1 file changed, 195 insertions(+), 2 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 67b273472c0b..734bc14f0100 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -18,6 +18,7 @@ static struct damon_ctx **dbgfs_ctxs; static int dbgfs_nr_ctxs; static struct dentry **dbgfs_dirs; +static DEFINE_MUTEX(damon_dbgfs_lock); /* * Returns non-empty string on success, negative error code otherwise. @@ -314,6 +315,186 @@ static struct damon_ctx *dbgfs_new_ctx(void) return ctx; } +static void dbgfs_destroy_ctx(struct damon_ctx *ctx) +{ + damon_destroy_ctx(ctx); +} + +/* + * Make a context of @name and create a debugfs directory for it. + * + * This function should be called while holding damon_dbgfs_lock.
+ * + * Returns 0 on success, negative error code otherwise. + */ +static int dbgfs_mk_context(char *name) +{ + struct dentry *root, **new_dirs, *new_dir; + struct damon_ctx **new_ctxs, *new_ctx; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + new_ctxs = krealloc(dbgfs_ctxs, sizeof(*dbgfs_ctxs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_ctxs) + return -ENOMEM; + dbgfs_ctxs = new_ctxs; + + new_dirs = krealloc(dbgfs_dirs, sizeof(*dbgfs_dirs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_dirs) + return -ENOMEM; + dbgfs_dirs = new_dirs; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + new_dir = debugfs_create_dir(name, root); + dbgfs_dirs[dbgfs_nr_ctxs] = new_dir; + + new_ctx = dbgfs_new_ctx(); + if (!new_ctx) { + debugfs_remove(new_dir); + dbgfs_dirs[dbgfs_nr_ctxs] = NULL; + return -ENOMEM; + } + + dbgfs_ctxs[dbgfs_nr_ctxs] = new_ctx; + dbgfs_fill_ctx_dir(dbgfs_dirs[dbgfs_nr_ctxs], + dbgfs_ctxs[dbgfs_nr_ctxs]); + dbgfs_nr_ctxs++; + + return 0; +} + +static ssize_t dbgfs_mk_context_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + char *kbuf; + char *ctx_name; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + ctx_name = kmalloc(count + 1, GFP_KERNEL); + if (!ctx_name) { + kfree(kbuf); + return -ENOMEM; + } + + /* Trim white space */ + if (sscanf(kbuf, "%s", ctx_name) != 1) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&damon_dbgfs_lock); + err = dbgfs_mk_context(ctx_name); + if (err) + ret = err; + mutex_unlock(&damon_dbgfs_lock); + +out: + kfree(kbuf); + kfree(ctx_name); + return ret; +} + +/* + * Remove a context of @name and its debugfs directory. + * + * This function should be called while holding damon_dbgfs_lock. + * + * Returns 0 on success, negative error code otherwise.
+ */ +static int dbgfs_rm_context(char *name) +{ + struct dentry *root, *dir, **new_dirs; + struct damon_ctx **new_ctxs; + int i, j; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + dir = debugfs_lookup(name, root); + if (!dir) + return -ENOENT; + + new_dirs = kmalloc_array(dbgfs_nr_ctxs - 1, sizeof(*dbgfs_dirs), + GFP_KERNEL); + if (!new_dirs) + return -ENOMEM; + + new_ctxs = kma
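Putting the two new files together, here is a hedged usage sketch of the interface described above. The context name is made up for illustration, and the commands are guarded so the sketch degrades gracefully where damon-dbgfs is absent:

```shell
#!/bin/sh
# Sketch: create, inspect, and remove a second monitoring context.
DBGFS=/sys/kernel/debug/damon

demo_contexts() {
	if [ ! -f "$DBGFS/mk_context" ]; then
		echo "damon-dbgfs multi-context files not available"
		return 0
	fi
	echo my_ctx > "$DBGFS/mk_context"	# creates $DBGFS/my_ctx/
	ls "$DBGFS/my_ctx"			# attrs, target_ids, ...
	echo my_ctx > "$DBGFS/rm_context"	# removes the context again
}

demo_contexts
```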
[PATCH v28 05/13] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space works as below. PTE Accessed-bit Based Access Check --- The implementation uses the PTE Accessed bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb the reclaim logic. DAMON uses the ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction --- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too-huge unmapped areas inside the monitoring target should be removed so as not to waste the time of the adaptive mechanism. For this reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space.
The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spaces, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail::

    <heap>
    <BIG UNMAPPED REGION 1>
    <uppermost mmap()-ed region>
    (small mmap()-ed regions and munmap()-ed regions)
    <lowermost mmap()-ed region>
    <BIG UNMAPPED REGION 2>
    <stack>

Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 13 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/vaddr.c | 616 ++ 4 files changed, 639 insertions(+) create mode 100644 mm/damon/vaddr.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 0bd5d6913a6c..72cf5ebd35fe 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -246,4 +246,17 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_VADDR + +/* Monitoring primitives for virtual memory address spaces */ +void damon_va_init(struct damon_ctx *ctx); +void damon_va_update(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(void *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_VADDR */ + #endif /* _DAMON_H */ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..8ae080c52950 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,13 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_VADDR + bool "Data access monitoring primitives for virtual address spaces" + depends on DAMON && MMU + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON + that works for virtual address spaces.
+ endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..6ebbd08aed67 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON):= core.o +obj-$(CONFIG_DAMON_VADDR) += vaddr.o diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c new file mode 100644 index ..3bc9dc9f0656 --- /dev/null +++ b/mm/damon/vaddr.c @@ -0,0 +1,616 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Primitives for Virtual Address Spaces + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-va: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the returned task, u
[PATCH v28 03/13] mm/damon: Adaptively adjust regions
From: SeongJae Park Even if the initial monitoring target regions are somehow well constructed to fulfill the assumption (pages in the same region have similar access frequencies), the data access pattern can change dynamically. This will result in low monitoring quality. To keep the assumption as much as possible, DAMON adaptively merges and splits each region based on their access frequency. For each ``aggregation interval``, it compares the access frequencies of adjacent regions and merges those if the frequency difference is small. Then, after it reports and clears the aggregated access frequency of each region, it splits each region into two or three regions if the total number of regions will not exceed the user-specified maximum number of regions after the split. In this way, DAMON provides its best-effort quality and minimal overhead while keeping the upper-bound overhead that users set. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 23 +++-- mm/damon/core.c | 214 +- 2 files changed, 227 insertions(+), 10 deletions(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index 67db309ad61b..0bd5d6913a6c 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -12,6 +12,9 @@ #include #include +/* Minimal region size. Every damon_region is aligned by this. */ +#define DAMON_MIN_REGION PAGE_SIZE + /** * struct damon_addr_range - Represents an address region of [@start, @end). * @start: Start address of the region (inclusive). @@ -85,6 +88,8 @@ struct damon_ctx; * prepared for the next access check. * @check_accesses should check the accesses to each region that made after the * last preparation and update the number of observed accesses of each region. + * It should also return the max number of observed accesses made as a result + * of its update. The value will be used for the regions adjustment threshold. * @reset_aggregated should reset the access monitoring results that aggregated * by @check_accesses.
* @target_valid should check whether the target is still valid for the @@ -95,7 +100,7 @@ struct damon_primitive { void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); - void (*check_accesses)(struct damon_ctx *context); + unsigned int (*check_accesses)(struct damon_ctx *context); void (*reset_aggregated)(struct damon_ctx *context); bool (*target_valid)(void *target); void (*cleanup)(struct damon_ctx *context); @@ -172,7 +177,9 @@ struct damon_callback { * @primitive: Set of monitoring primitives for given use cases. * @callback: Set of callbacks for monitoring events notifications. * - * @region_targets:Head of monitoring targets (&damon_target) list. + * @min_nr_regions:The minimum number of adaptive monitoring regions. + * @max_nr_regions:The maximum number of adaptive monitoring regions. + * @adaptive_targets: Head of monitoring targets (&damon_target) list. */ struct damon_ctx { unsigned long sample_interval; @@ -191,7 +198,9 @@ struct damon_ctx { struct damon_primitive primitive; struct damon_callback callback; - struct list_head region_targets; + unsigned long min_nr_regions; + unsigned long max_nr_regions; + struct list_head adaptive_targets; }; #define damon_next_region(r) \ @@ -207,10 +216,10 @@ struct damon_ctx { list_for_each_entry_safe(r, next, &t->regions_list, list) #define damon_for_each_target(t, ctx) \ - list_for_each_entry(t, &(ctx)->region_targets, list) + list_for_each_entry(t, &(ctx)->adaptive_targets, list) #define damon_for_each_target_safe(t, next, ctx) \ - list_for_each_entry_safe(t, next, &(ctx)->region_targets, list) + list_for_each_entry_safe(t, next, &(ctx)->adaptive_targets, list) #ifdef CONFIG_DAMON @@ -224,11 +233,13 @@ struct damon_target *damon_new_target(unsigned long id); void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); void damon_free_target(struct damon_target *t); void damon_destroy_target(struct damon_target *t); +unsigned int
damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long primitive_upd_int); + unsigned long aggr_int, unsigned long primitive_upd_int, + unsigned long min_nr_reg, unsigned long max_nr_reg); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/core.c b/mm/damon/core.c index 94db494dcf70..b36b6bdd94e2 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -10,8 +10,12 @@ #include
[PATCH v28 06/13] mm/damon: Add a tracepoint
From: SeongJae Park

This commit adds a tracepoint for DAMON. It traces the monitoring results of each region for each aggregation interval. Using this, DAMON can be easily integrated with tracepoint-supporting tools such as perf.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
Reviewed-by: Steven Rostedt (VMware)
---
 include/trace/events/damon.h | 43
 mm/damon/core.c              |  7 +-
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/damon.h

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
new file mode 100644
index ..2f422f4f1fb9
--- /dev/null
+++ b/include/trace/events/damon.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM damon
+
+#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DAMON_H
+
+#include
+#include
+#include
+
+TRACE_EVENT(damon_aggregated,
+
+	TP_PROTO(struct damon_target *t, struct damon_region *r,
+			unsigned int nr_regions),
+
+	TP_ARGS(t, r, nr_regions),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, target_id)
+		__field(unsigned int, nr_regions)
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned int, nr_accesses)
+	),
+
+	TP_fast_assign(
+		__entry->target_id = t->id;
+		__entry->nr_regions = nr_regions;
+		__entry->start = r->ar.start;
+		__entry->end = r->ar.end;
+		__entry->nr_accesses = r->nr_accesses;
+	),
+
+	TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u",
+			__entry->target_id, __entry->nr_regions,
+			__entry->start, __entry->end, __entry->nr_accesses)
+);
+
+#endif /* _TRACE_DAMON_H */
+
+/* This part must be outside protection */
+#include

diff --git a/mm/damon/core.c b/mm/damon/core.c
index b36b6bdd94e2..912112662d0c 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -13,6 +13,9 @@
 #include
 #include

+#define CREATE_TRACE_POINTS
+#include
+
 /* Get a random number in [l, r) */
 #define damon_rand(l, r) (l + prandom_u32_max(r - l))

@@ -388,8 +391,10 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 	damon_for_each_target(t, c) {
 		struct damon_region *r;

-		damon_for_each_region(r, t)
+		damon_for_each_region(r, t) {
+			trace_damon_aggregated(t, r, damon_nr_regions(t));
 			r->nr_accesses = 0;
+		}
 	}
 }
--
2.17.1
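As an illustration of how the tracepoint above could be consumed by a user-space tool, here is a small Python sketch that parses records in the TP_printk format (`target_id=%lu nr_regions=%u %lu-%lu: %u`), e.g. as read from the trace buffer. The parser and its names are illustrative assumptions, not part of the patch.

```python
import re

# Field layout follows the TP_printk format string of damon_aggregated:
#   "target_id=%lu nr_regions=%u %lu-%lu: %u"
LINE_RE = re.compile(
    r"target_id=(?P<target_id>\d+) nr_regions=(?P<nr_regions>\d+) "
    r"(?P<start>\d+)-(?P<end>\d+): (?P<nr_accesses>\d+)"
)

def parse_damon_aggregated(line):
    """Parse one damon_aggregated record into a dict of ints."""
    m = LINE_RE.search(line)
    if not m:
        raise ValueError("not a damon_aggregated record: %r" % line)
    return {k: int(v) for k, v in m.groupdict().items()}

if __name__ == "__main__":
    sample = "target_id=4242 nr_regions=10 4096-8192: 3"
    rec = parse_damon_aggregated(sample)
    # Region size and access count of the sampled record.
    print(rec["target_id"], rec["end"] - rec["start"], rec["nr_accesses"])
    # → 4242 4096 3
```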
[PATCH v28 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

DAMON is designed to be used by kernel space code such as the memory management subsystems, and therefore it provides only a kernel space API. That said, letting user space control DAMON could provide some benefits to them. For example, it will allow user space to analyze their specific workloads and make their own special optimizations.

For such cases, this commit implements a simple DAMON application kernel module, namely 'damon-dbgfs', which merely wraps the DAMON API and exports it to user space via debugfs.

'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and ``monitor_on``, under its debugfs directory, ``<debugfs>/damon/``.

Attributes
----------

Users can read and write the ``sampling interval``, ``aggregation interval``, ``regions update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For example, the below commands set those values to 5 ms, 100 ms, 1,000 ms, 10, and 1000, and check the result::

    # cd <debugfs>/damon
    # echo 5000 100000 1000000 10 1000 > attrs
    # cat attrs
    5000 100000 1000000 10 1000

Target IDs
----------

Some types of address spaces support multiple monitoring targets. For example, the virtual memory address spaces monitoring can have multiple processes as the monitoring targets. Users can set the targets by writing relevant id values of the targets to, and get the ids of the current targets by reading from, the ``target_ids`` file. In case of the virtual address spaces monitoring, the values should be pids of the monitoring target processes. For example, the below commands set processes having pids 42 and 4242 as the monitoring targets and check the result::

    # cd <debugfs>/damon
    # echo 42 4242 > target_ids
    # cat target_ids
    42 4242

Note that setting the target ids doesn't start the monitoring.

Turning On/Off
--------------

Setting the files as described above has no effect unless you explicitly start the monitoring.
You can start, stop, and check the current status of the monitoring by writing to and reading from the ``monitor_on`` file. Writing ``on`` to the file starts the monitoring of the targets with the attributes. Writing ``off`` to the file stops the monitoring. DAMON also stops if all targets are invalidated (in the case of virtual memory monitoring, target processes are invalidated when terminated). The below example commands turn the monitoring on and off, and check its status::

    # cd <debugfs>/damon
    # echo on > monitor_on
    # echo off > monitor_on
    # cat monitor_on
    off

Please note that you cannot write to the above-mentioned debugfs files while the monitoring is turned on. If you write to the files while DAMON is running, an error code such as ``-EBUSY`` will be returned.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h |   3 +
 mm/damon/Kconfig      |   9 +
 mm/damon/Makefile     |   1 +
 mm/damon/core.c       |  47 +
 mm/damon/dbgfs.c      | 386 ++
 5 files changed, 446 insertions(+)
 create mode 100644 mm/damon/dbgfs.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 72cf5ebd35fe..b17e808a9cae 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -237,9 +237,12 @@ unsigned int damon_nr_regions(struct damon_target *t);
 struct damon_ctx *damon_new_ctx(void);
 void damon_destroy_ctx(struct damon_ctx *ctx);
+int damon_set_targets(struct damon_ctx *ctx,
+		unsigned long *ids, ssize_t nr_ids);
 int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int,
 		unsigned long aggr_int, unsigned long primitive_upd_int,
 		unsigned long min_nr_reg, unsigned long max_nr_reg);
+int damon_nr_running_ctxs(void);
 int damon_start(struct damon_ctx **ctxs, int nr_ctxs);
 int damon_stop(struct damon_ctx **ctxs, int nr_ctxs);

diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig
index 8ae080c52950..72f1683ba0ee 100644
--- a/mm/damon/Kconfig
+++ b/mm/damon/Kconfig
@@ -21,4 +21,13 @@ config DAMON_VADDR
	  This builds the default data access monitoring primitives for
	  DAMON that works for virtual
address spaces. +config DAMON_DBGFS + bool "DAMON debugfs interface" + depends on DAMON_VADDR && DEBUG_FS + help + This builds the debugfs interface for DAMON. The user space admins + can use the interface for arbitrary data access monitoring. + + If unsure, say N. + endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 6ebbd08aed67..fed4be3bace3 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_DAMON):= core.o obj-$(CONFIG_DAMON_VADDR) += vaddr.o +obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o diff --git a/mm/damon/core.c b/mm/damon/core.c index 912112662d0c..cad2b4cee39d 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -172,6 +172,39 @@ void damon_destroy_ctx(struct dam
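To make the ``attrs`` file semantics above concrete, below is a hypothetical user-space model of the validation such a write implies: exactly five unsigned integers (sampling, aggregation, and regions-update intervals, plus min/max region counts). The field names and the error behavior for malformed input are illustrative assumptions, not the kernel code.

```python
def parse_attrs(line):
    """Model an 'attrs'-style write: exactly five non-negative integers
    (sample interval, aggregation interval, regions update interval,
    min nr regions, max nr regions).  Raise ValueError otherwise."""
    fields = line.split()
    if len(fields) != 5:
        raise ValueError("expected 5 fields, got %d" % len(fields))
    values = [int(f) for f in fields]  # raises ValueError on non-numeric input
    if any(v < 0 for v in values):
        raise ValueError("attrs must be non-negative")
    keys = ("sample_int", "aggr_int", "regions_update_int",
            "min_nr_regions", "max_nr_regions")
    return dict(zip(keys, values))

if __name__ == "__main__":
    # Same shape as the example write in the commit message.
    print(parse_attrs("5000 100000 1000000 10 1000"))
```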
[PATCH v28 01/13] mm: Introduce Data Access MONitor (DAMON)
From: SeongJae Park

DAMON is a data access monitoring framework for the Linux kernel. The core mechanisms of DAMON make it

- accurate (the monitoring output is useful enough for DRAM level performance-centric memory management; it might be inappropriate for CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be applied online), and
- scalable (the upper-bound of the overhead is in a constant range regardless of the size of target workloads).

Using this framework, hence, we can easily write efficient kernel space data access monitoring applications. For example, the kernel's memory management mechanisms can make advanced decisions using this. Experimental data access aware optimization works that incurred high access monitoring overhead could be implemented again on top of this. Due to its simple and flexible interface, providing a user space interface would also be easy. Then, user space users who have some special workloads can write personalized applications for better understanding and optimization of their workloads and systems.

===

Nevertheless, this commit defines and implements only the basic access check part, without the overhead-accuracy handling core logic. The basic access check works as below. The output of DAMON says which memory regions are how frequently accessed for a given duration. The resolution of the access frequency is controlled by setting the ``sampling interval`` and ``aggregation interval``. In detail, DAMON checks access to each page per ``sampling interval`` and aggregates the results; in other words, it counts the number of accesses to each region. After each ``aggregation interval`` passes, DAMON calls the callback functions that were previously registered by users, so that users can read the aggregated results, and then clears the results.
This can be described by the below simple pseudo-code::

    init()
    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        if time() % update_interval == 0:
            update()
        sleep(sampling interval)

The target regions are constructed at the beginning of the monitoring and updated after each ``regions_update_interval``, because the target regions could change dynamically (e.g., due to mmap() or memory hotplug). The monitoring overhead of this mechanism will arbitrarily increase as the size of the target workload grows.

The basic monitoring primitives for the actual access check and the dynamic target regions construction aren't in the core part of DAMON. Instead, it allows users to implement their own primitives that are optimized for their use case and configure DAMON to use them. In other words, users cannot use the current version of DAMON without some additional work. Following commits will implement the core mechanisms for the overhead-accuracy control and default primitives implementations.
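The pseudo-code above can be turned into a tiny runnable model. This is only an illustration of the control flow (sampling, aggregation, callback invocation, and reset) with a simulated clock instead of real sleeping; none of the names below exist in the kernel.

```python
def run_monitor(accessed, pages, sampling_interval, aggregation_interval,
                callbacks, total_time):
    """Toy model of the DAMON core loop: check each page once per
    sampling interval, aggregate the counts, and invoke callbacks then
    reset after each aggregation interval (simulated clock)."""
    nr_accesses = {page: 0 for page in pages}
    now = 0
    while now < total_time:
        for page in pages:
            if accessed(page, now):
                nr_accesses[page] += 1
        now += sampling_interval          # stands in for sleep()
        if now % aggregation_interval == 0:
            for callback in callbacks:
                callback(now, dict(nr_accesses))
            for page in pages:            # clear the aggregated results
                nr_accesses[page] = 0

results = []
run_monitor(
    accessed=lambda page, now: page == "hot",   # only "hot" is ever accessed
    pages=["hot", "cold"],
    sampling_interval=5,
    aggregation_interval=100,
    callbacks=[lambda t, counts: results.append((t, counts))],
    total_time=200,
)
# Two aggregations (t=100, t=200); "hot" was seen in all 20 samples of each.
print(results)
```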
Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h | 167 ++
 mm/Kconfig            |   3 +
 mm/Makefile           |   1 +
 mm/damon/Kconfig      |  15 ++
 mm/damon/Makefile     |   3 +
 mm/damon/core.c       | 318 ++
 6 files changed, 507 insertions(+)
 create mode 100644 include/linux/damon.h
 create mode 100644 mm/damon/Kconfig
 create mode 100644 mm/damon/Makefile
 create mode 100644 mm/damon/core.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
new file mode 100644
index ..2f652602b1ea
--- /dev/null
+++ b/include/linux/damon.h
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DAMON api
+ *
+ * Author: SeongJae Park
+ */
+
+#ifndef _DAMON_H_
+#define _DAMON_H_
+
+#include
+#include
+#include
+
+struct damon_ctx;
+
+/**
+ * struct damon_primitive	Monitoring primitives for given use cases.
+ *
+ * @init:			Initialize primitive-internal data structures.
+ * @update:			Update primitive-internal data structures.
+ * @prepare_access_checks:	Prepare next access check of target regions.
+ * @check_accesses:		Check the accesses to target regions.
+ * @reset_aggregated:		Reset aggregated accesses monitoring results.
+ * @target_valid:		Determine if the target is valid.
+ * @cleanup:			Clean up the context.
+ *
+ * DAMON can be extended for various address spaces and usages.  For this,
+ * users should register the low level primitives for their target address
+ * space and usecase via the &damon_ctx.primitive.  Then, the monitoring thread
+ * (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
+ * the monitoring, @update after each
[PATCH v28 04/13] mm/idle_page_tracking: Make PG_idle reusable
From: SeongJae Park

PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page Tracking and the reclaim logic, to work concurrently without interfering with each other. That is, when one of them needs to clear the Accessed bit, it sets PG_young to represent the previous state of the bit. And when one of them needs to read the bit, if the bit is cleared, it further reads PG_young to know whether the other has cleared the bit in the meantime or not.

We could add another page flag and extend the mechanism to use that flag if we need to add another concurrent PTE Accessed bit user subsystem. However, the flag space is limited. Meanwhile, if the new subsystem is mutually exclusive with IDLE_PAGE_TRACKING, or interfering with it is not a real problem, it would be ok to simply reuse the PG_idle flag. However, that's currently impossible, because the flags are dependent on IDLE_PAGE_TRACKING.

To allow such reuse of the flags, this commit separates the PG_young and PG_idle flag logic from IDLE_PAGE_TRACKING and introduces a new kernel config, 'PAGE_IDLE_FLAG'. Hence, a new subsystem will be able to reuse PG_idle without depending on IDLE_PAGE_TRACKING. In the next commit, DAMON's reference implementation of the virtual memory address space monitoring primitives will use it.

Signed-off-by: SeongJae Park
Reviewed-by: Shakeel Butt
---
 include/linux/page-flags.h     |  4 ++--
 include/linux/page_ext.h       |  2 +-
 include/linux/page_idle.h      |  6 +++---
 include/trace/events/mmflags.h |  2 +-
 mm/Kconfig                     |  8
 mm/page_ext.c                  | 12 +++-
 mm/page_idle.c                 | 10 --
 7 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..6be2c1e2fb48 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -131,7 +131,7 @@ enum pageflags {
 #ifdef CONFIG_MEMORY_FAILURE
	PG_hwpoison,		/* hardware poisoned page.
Don't touch */ #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) PG_young, PG_idle, #endif @@ -436,7 +436,7 @@ PAGEFLAG_FALSE(HWPoison) #define __PG_HWPOISON 0 #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT) TESTPAGEFLAG(Young, young, PF_ANY) SETPAGEFLAG(Young, young, PF_ANY) TESTCLEARFLAG(Young, young, PF_ANY) diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index aff81ba31bd8..fabb2e1e087f 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -19,7 +19,7 @@ struct page_ext_operations { enum page_ext_flags { PAGE_EXT_OWNER, PAGE_EXT_OWNER_ALLOCATED, -#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT) +#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT) PAGE_EXT_YOUNG, PAGE_EXT_IDLE, #endif diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h index 1e894d34bdce..d8a6aecf99cb 100644 --- a/include/linux/page_idle.h +++ b/include/linux/page_idle.h @@ -6,7 +6,7 @@ #include #include -#ifdef CONFIG_IDLE_PAGE_TRACKING +#ifdef CONFIG_PAGE_IDLE_FLAG #ifdef CONFIG_64BIT static inline bool page_is_young(struct page *page) @@ -106,7 +106,7 @@ static inline void clear_page_idle(struct page *page) } #endif /* CONFIG_64BIT */ -#else /* !CONFIG_IDLE_PAGE_TRACKING */ +#else /* !CONFIG_PAGE_IDLE_FLAG */ static inline bool page_is_young(struct page *page) { @@ -135,6 +135,6 @@ static inline void clear_page_idle(struct page *page) { } -#endif /* CONFIG_IDLE_PAGE_TRACKING */ +#endif /* CONFIG_PAGE_IDLE_FLAG */ #endif /* _LINUX_MM_PAGE_IDLE_H */ diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h index 629c7a0eaff2..ea434bbc2d2b 100644 --- a/include/trace/events/mmflags.h +++ b/include/trace/events/mmflags.h @@ -73,7 +73,7 @@ #define IF_HAVE_PG_HWPOISON(flag,string) #endif -#if defined(CONFIG_IDLE_PAGE_TRACKING) && 
defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 #define IF_HAVE_PG_IDLE(flag,string) ,{1UL << flag, string}
 #else
 #define IF_HAVE_PG_IDLE(flag,string)

diff --git a/mm/Kconfig b/mm/Kconfig
index 04b66c8df24a..7be2bc06b7d8 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -770,10 +770,18 @@ config DEFERRED_STRUCT_PAGE_INIT
	  lifetime of the system until these kthreads finish the
	  initialisation.

+config PAGE_IDLE_FLAG
+	bool "Add PG_idle and PG_young flags"
+	help
+	  This feature adds PG_idle and PG_young flags in 'struct page'.  PTE
+	  Accessed bit writers can set the state of the bit in the flags so
+	  that other PTE Accessed bit readers are not disturbed.
+
 config IDLE_PAGE_TRACKING
	bool "Enable idle page tracking"
	dep
[PATCH v28 02/13] mm/damon/core: Implement region-based sampling
From: SeongJae Park

To avoid an unbounded increase of the overhead, DAMON groups adjacent pages that are assumed to have the same access frequencies into a region. As long as the assumption (pages in a region have the same access frequencies) is kept, only one page in the region needs to be checked. Thus, for each ``sampling interval``,

1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page was accessed in the meantime, and
4. increases the access count of the region if so.

Therefore, the monitoring overhead is controllable by adjusting the number of regions. DAMON allows both the underlying primitives and user callbacks to adjust regions for this trade-off. In other words, this commit makes DAMON use not only time-based sampling but also space-based sampling.

This scheme, however, cannot preserve the quality of the output if the assumption is not guaranteed. The next commit will address this problem.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h |  77 ++-
 mm/damon/core.c       | 143 --
 2 files changed, 213 insertions(+), 7 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 2f652602b1ea..67db309ad61b 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -12,6 +12,48 @@
 #include
 #include

+/**
+ * struct damon_addr_range - Represents an address region of [@start, @end).
+ * @start:	Start address of the region (inclusive).
+ * @end:	End address of the region (exclusive).
+ */
+struct damon_addr_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+/**
+ * struct damon_region - Represents a monitoring target region.
+ * @ar:			The address range of the region.
+ * @sampling_addr:	Address of the sample for the next access check.
+ * @nr_accesses:	Access frequency of this region.
+ * @list:		List head for siblings.
+ */
+struct damon_region {
+	struct damon_addr_range ar;
+	unsigned long sampling_addr;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/**
+ * struct damon_target - Represents a monitoring target.
+ * @id:			Unique identifier for this target.
+ * @regions_list:	Head of the monitoring target regions of this target.
+ * @list:		List head for siblings.
+ *
+ * Each monitoring context could have multiple targets.  For example, a context
+ * for virtual memory address spaces could have multiple target processes.  The
+ * @id of each target should be unique among the targets of the context.  For
+ * example, in the virtual address monitoring context, it could be a pidfd or
+ * an address of an mm_struct.
+ */
+struct damon_target {
+	unsigned long id;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
 struct damon_ctx;

 /**
@@ -36,7 +78,7 @@ struct damon_ctx;
  *
  * @init should initialize primitive-internal data structures.  For example,
  * this could be used to construct proper monitoring target regions and link
- * those to @damon_ctx.target.
+ * those to @damon_ctx.adaptive_targets.
  * @update should update the primitive-internal data structures.  For example,
  * this could be used to update monitoring target regions for current status.
  * @prepare_access_checks should manipulate the monitoring regions to be
@@ -130,7 +172,7 @@ struct damon_callback {
  * @primitive:	Set of monitoring primitives for given use cases.
  * @callback:	Set of callbacks for monitoring events notifications.
  *
- * @target:	Pointer to the user-defined monitoring target.
+ * @region_targets:	Head of monitoring targets (&struct damon_target) list.
 */
 struct damon_ctx {
	unsigned long sample_interval;
@@ -149,11 +191,40 @@ struct damon_ctx {
	struct damon_primitive primitive;
	struct damon_callback callback;

-	void *target;
+	struct list_head region_targets;
 };

+#define damon_next_region(r) \
+	(container_of(r->list.next, struct damon_region, list))
+
+#define damon_prev_region(r) \
+	(container_of(r->list.prev, struct damon_region, list))
+
+#define damon_for_each_region(r, t) \
+	list_for_each_entry(r, &t->regions_list, list)
+
+#define damon_for_each_region_safe(r, next, t) \
+	list_for_each_entry_safe(r, next, &t->regions_list, list)
+
+#define damon_for_each_target(t, ctx) \
+	list_for_each_entry(t, &(ctx)->region_targets, list)
+
+#define damon_for_each_target_safe(t, next, ctx) \
+	list_for_each_entry_safe(t, next, &(ctx)->region_targets, list)
+
 #ifdef CONFIG_DAMON

+struct damon_region *damon_new_region(unsigned long start, unsigned long end);
+inline void damon_insert_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next);
+void damon_add_region(stru
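The key property of the region-based sampling described in this patch, that per-interval work depends on the number of regions rather than on the size of the monitored memory, can be demonstrated with a toy model. The names here are illustrative only.

```python
import random

def sample_once(regions):
    """One DAMON-style sampling pass: pick a single page per region and
    check only that page, so the work per pass is len(regions), not the
    total number of pages.  'regions' maps a name to a (start, end) page
    range; returns the number of access checks performed."""
    checks = 0
    for start, end in regions.values():
        _sampling_addr = random.randrange(start, end)  # one page per region
        checks += 1  # an accessed(_sampling_addr) check would happen here
    return checks

if __name__ == "__main__":
    small = {"heap": (0, 1_000), "stack": (1_000, 2_000)}
    huge = {"heap": (0, 10_000_000), "stack": (10_000_000, 20_000_000)}
    # Overhead depends only on the region count, not on workload size.
    print(sample_once(small), sample_once(huge))
    # → 2 2
```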
[PATCH v28 00/13] Introduce Data Access MONitor (DAMON)
From: SeongJae Park

Changes from Previous Version (v27)
===

- Rebase on latest -mm tree (v5.12-rc7-mmots-2021-04-11-20-49)
- dbgfs: Fix wrong failure handlings (Stefan Nuernberger)
- dbgfs: Change return type of 'dbgfs_fill_ctx_dir()' to void (Greg KH)

Introduction

DAMON is a data access monitoring framework for the Linux kernel. The core mechanisms of DAMON, called 'region based sampling' and 'adaptive regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this patchset for the details), make it

- accurate (the monitored information is useful for DRAM level memory management; it might not be appropriate for cache-level accuracy, though),
- light-weight (the monitoring overhead is low enough to be applied online while making no impact on the performance of the target workloads), and
- scalable (the upper-bound of the instrumentation overhead is controllable regardless of the size of target workloads).

Using this framework, therefore, several memory management mechanisms such as reclamation and THP can be optimized to be aware of real data access patterns. Experimental access pattern aware memory management optimization works that incurred high instrumentation overhead will be able to have another try.

Though DAMON is for kernel subsystems, it can be easily exposed to user space by writing a DAMON-wrapper kernel subsystem. Then, user space users who have some special workloads will be able to write personalized tools or applications for deeper understanding and specialized optimizations of their systems.

Long-term Plan
--

DAMON is part of a project called Data Access-aware Operating System (DAOS). As the name implies, I want to improve the performance and efficiency of systems using fine-grained data access patterns. The optimizations are for both kernel and user spaces. I will therefore modify or create kernel subsystems, export some of those to user space, and implement user space libraries / tools. Below shows the layers and components for the project.
---
 Primitives:     PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
 Framework:      DAMON
 Features:       DAMOS, virtual addr, physical addr, ...
 Applications:   DAMON-debugfs, (DARC), ...
^^^^^^^^^^^^^^^^^^^^^  KERNEL SPACE  ^^^^^^^^^^^^^^^^^^^^^

 Raw Interface:  debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...

vvvvvvvvvvvvvvvvvvvvv  USER SPACE  vvvvvvvvvvvvvvvvvvvvvvv
 Library:        (libdamon), ...
 Tools:          DAMO, (perf), ...
---

The components in parentheses or marked as '...' are not implemented yet but are in the future plan. IOW, those are the TODO tasks of the DAOS project. For more detail, please refer to the plans: https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjp...@amazon.com/

Evaluations
===

We evaluated DAMON's overhead, monitoring quality and usefulness using 24 realistic workloads on my QEMU/KVM based virtual machine, running a kernel to which the v24 DAMON patchset is applied.

DAMON is lightweight. It increases system memory usage by 0.39% and slows target workloads down by 1.16%.

DAMON is accurate and useful for memory management optimizations. An experimental DAMON-based operation scheme for THP, namely 'ethp', removes 76.15% of THP memory overheads while preserving 51.25% of THP speedup. Another experimental DAMON-based 'proactive reclamation' implementation, 'prcl', reduces 93.38% of residential sets and 23.63% of system memory footprint while incurring only 1.22% runtime overhead in the best case (parsec3/freqmine).

NOTE that the experimental THP optimization and proactive reclamation are not for production but only for proof of concepts. Please refer to the official document[1] or the "Documentation/admin-guide/mm: Add a document for DAMON" patch in this patchset for the detailed evaluation setup and results.

[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html

Real-world User Story
=====

In summary, DAMON has been used on production systems and proved its usefulness.

DAMON as a profiler
---

We analyzed the characteristics of large scale production systems of our customers using DAMON. The systems utilize 70GB DRAM and 36 CPUs.
From this, we were able to find the interesting things below.

There were obviously different access patterns under the idle workload and the active workload. Under the idle workload, it accessed large memory regions with low frequency, while the active workload accessed small memory regions with high frequency.

DAMON found a 7GB memory region that shows obviously high access frequency under the active workload. We believe this is the performance-effective working set and it needs to be protected.

There was a
Re: [PATCH v2 00/16] Multigenerational LRU Framework
From: SeongJae Park Hello, Very interesting work, thank you for sharing this :) On Tue, 13 Apr 2021 00:56:17 -0600 Yu Zhao wrote: > What's new in v2 > > Special thanks to Jens Axboe for reporting a regression in buffered > I/O and helping test the fix. Is the discussion open? If so, could you please give me a link? > > This version includes the support of tiers, which represent levels of > usage from file descriptors only. Pages accessed N times via file > descriptors belong to tier order_base_2(N). Each generation contains > at most MAX_NR_TIERS tiers, and they require additional MAX_NR_TIERS-2 > bits in page->flags. In contrast to moving across generations which > requires the lru lock, moving across tiers only involves an atomic > operation on page->flags and therefore has a negligible cost. A > feedback loop modeled after the well-known PID controller monitors the > refault rates across all tiers and decides when to activate pages from > which tiers, on the reclaim path. > > This feedback model has a few advantages over the current feedforward > model: > 1) It has a negligible overhead in the buffered I/O access path >because activations are done in the reclaim path. > 2) It takes mapped pages into account and avoids overprotecting pages >accessed multiple times via file descriptors. > 3) More tiers offer better protection to pages accessed more than >twice when buffered-I/O-intensive workloads are under memory >pressure. > > The fio/io_uring benchmark shows 14% improvement in IOPS when randomly > accessing Samsung PM981a in the buffered I/O mode. Improvement under memory pressure, right? How much pressure? [...] > > Differential scans via page tables > -- > Each differential scan discovers all pages that have been referenced > since the last scan. Specifically, it walks the mm_struct list > associated with an lruvec to scan page tables of processes that have > been scheduled since the last scan. 
Does this mean it scans only the virtual address spaces of processes, and therefore pages in the page cache that are not mmap()-ed will not be scanned?

> The cost of each differential scan
> is roughly proportional to the number of referenced pages it
> discovers. Unless address spaces are extremely sparse, page tables
> usually have better memory locality than the rmap. The end result is
> generally a significant reduction in CPU usage, for workloads using a
> large amount of anon memory.

When and how frequently does it scan?

Thanks,
SeongJae Park

[...]
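The tier assignment Yu describes (pages accessed N times via file descriptors belong to tier order_base_2(N), with at most MAX_NR_TIERS tiers per generation) can be modeled as below. The cap value and the exact capping behavior are my reading of the description, and order_base_2 is modeled after the kernel macro of that name (ceiling of log2).

```python
MAX_NR_TIERS = 4  # illustrative value, not taken from the patchset

def order_base_2(n):
    """Ceiling of log2(n) for n >= 1, like the kernel's order_base_2."""
    return max(n - 1, 0).bit_length()

def tier_of(nr_file_accesses):
    """Tier for a page accessed N times via file descriptors,
    capped at the last tier a generation can hold."""
    return min(order_base_2(nr_file_accesses), MAX_NR_TIERS - 1)

if __name__ == "__main__":
    # accesses: 1 -> tier 0, 2 -> 1, 3 -> 2, 4 -> 2, 100 -> capped at 3
    print([tier_of(n) for n in (1, 2, 3, 4, 100)])
    # → [0, 1, 2, 2, 3]
```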
Re: [PATCH v27 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park On Thu, 8 Apr 2021 13:48:48 + SeongJae Park wrote: > From: SeongJae Park > > DAMON is designed to be used by kernel space code such as the memory > management subsystems, and therefore it provides only kernel space API. > That said, letting the user space control DAMON could provide some > benefits to them. For example, it will allow user space to analyze > their specific workloads and make their own special optimizations. > > For such cases, this commit implements a simple DAMON application kernel > module, namely 'damon-dbgfs', which merely wraps the DAMON api and > exports those to the user space via the debugfs. > [...] > +/* > + * Functions for the initialization > + */ > + > +static int __init damon_dbgfs_init(void) > +{ > + int rc; > + > + dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL); > + if (!dbgfs_ctxs) { > + pr_err("%s: dbgfs ctxs alloc failed\n", __func__); > + return -ENOMEM; > + } > + dbgfs_ctxs[0] = dbgfs_new_ctx(); > + if (!dbgfs_ctxs[0]) { > + pr_err("%s: dbgfs ctx alloc failed\n", __func__); > + return -ENOMEM; My colleague, Stefan found 'dbgfs_ctxs' is not freed here. Similar in below '__damon_dbgfs_init()' failure handling. I will fix these in the next version. Reported-by: Stefan Nuernberger Thanks, SeongJae Park > + } > + dbgfs_nr_ctxs = 1; > + > + rc = __damon_dbgfs_init(); > + if (rc) > + pr_err("%s: dbgfs init failed\n", __func__); > + > + return rc; > +} > + > +module_init(damon_dbgfs_init); > -- > 2.17.1 >
Re: [PATCH v27 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

On Sat, 10 Apr 2021 10:55:01 +0200 Greg KH wrote:
> On Thu, Apr 08, 2021 at 01:48:48PM +0000, SeongJae Park wrote:
> > +static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx)
> > +{
> > +	const char * const file_names[] = {"attrs", "target_ids"};
> > +	const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops};
> > +	int i;
> > +
> > +	for (i = 0; i < ARRAY_SIZE(file_names); i++)
> > +		debugfs_create_file(file_names[i], 0600, dir, ctx, fops[i]);
> > +
> > +	return 0;
> > +}
>
> Why do you have a function that can only return 0, actually return
> something?  It should be void, right?

You're right, I will make it return void in the next spin.

Thanks,
SeongJae Park

> thanks,
> greg k-h
>
[PATCH v27 13/13] MAINTAINERS: Update for DAMON
From: SeongJae Park This commit updates MAINTAINERS file for DAMON related files. Signed-off-by: SeongJae Park --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index ad650102f950..0df746019eb9 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -5003,6 +5003,18 @@ F: net/ax25/ax25_out.c F: net/ax25/ax25_timer.c F: net/ax25/sysctl_net_ax25.c +DATA ACCESS MONITOR +M: SeongJae Park +L: linux...@kvack.org +S: Maintained +F: Documentation/admin-guide/mm/damon/* +F: Documentation/vm/damon/* +F: include/linux/damon.h +F: include/trace/events/damon.h +F: mm/damon/* +F: tools/damon/* +F: tools/testing/selftests/damon/* + DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER L: net...@vger.kernel.org S: Orphan -- 2.17.1
[PATCH v27 12/13] mm/damon: Add user space selftests
From: SeongJae Park This commit adds a simple user space tests for DAMON. The tests are using kselftest framework. Signed-off-by: SeongJae Park --- tools/testing/selftests/damon/Makefile| 7 ++ .../selftests/damon/_chk_dependency.sh| 28 ++ .../testing/selftests/damon/debugfs_attrs.sh | 98 +++ 3 files changed, 133 insertions(+) create mode 100644 tools/testing/selftests/damon/Makefile create mode 100644 tools/testing/selftests/damon/_chk_dependency.sh create mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile new file mode 100644 index ..8a3f2cd9fec0 --- /dev/null +++ b/tools/testing/selftests/damon/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for damon selftests + +TEST_FILES = _chk_dependency.sh +TEST_PROGS = debugfs_attrs.sh + +include ../lib.mk diff --git a/tools/testing/selftests/damon/_chk_dependency.sh b/tools/testing/selftests/damon/_chk_dependency.sh new file mode 100644 index ..e090836c2bf7 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_dependency.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +DBGFS=/sys/kernel/debug/damon + +if [ $EUID -ne 0 ]; +then + echo "Run as root" + exit $ksft_skip +fi + +if [ ! -d $DBGFS ] +then + echo "$DBGFS not found" + exit $ksft_skip +fi + +for f in attrs target_ids monitor_on +do + if [ ! -f "$DBGFS/$f" ] + then + echo "$f not found" + exit 1 + fi +done diff --git a/tools/testing/selftests/damon/debugfs_attrs.sh b/tools/testing/selftests/damon/debugfs_attrs.sh new file mode 100755 index ..4a8ab4910ee4 --- /dev/null +++ b/tools/testing/selftests/damon/debugfs_attrs.sh @@ -0,0 +1,98 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +source ./_chk_dependency.sh + +# Test attrs file +file="$DBGFS/attrs" + +ORIG_CONTENT=$(cat $file) + +echo 1 2 3 4 5 > $file +if [ $? 
-ne 0 ] +then + echo "$file write failed" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo 1 2 3 4 > $file +if [ $? -eq 0 ] +then + echo "$file write success (should failed)" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2 3 4 5" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +# Test target_ids file +file="$DBGFS/target_ids" + +ORIG_CONTENT=$(cat $file) + +echo "1 2 3 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo "1 2 abc 4" > $file +if [ $? -ne 0 ] +then + echo "$file write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +CONTENT=$(cat $file) +if [ "$CONTENT" != "1 2" ] +then + echo "$file not written" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo abc 2 3 > $file +if [ $? -ne 0 ] +then + echo "$file wrong value write fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not cleared" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo > $file +if [ $? -ne 0 ] +then + echo "$file init fail" + echo $ORIG_CONTENT > $file + exit 1 +fi + +if [ ! -z "$(cat $file)" ] +then + echo "$file not initialized" + echo $ORIG_CONTENT > $file + exit 1 +fi + +echo $ORIG_CONTENT > $file + +echo "PASS" -- 2.17.1
[PATCH v27 10/13] Documentation: Add documents for DAMON
From: SeongJae Park This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Signed-off-by: SeongJae Park --- Documentation/admin-guide/mm/damon/guide.rst | 158 + Documentation/admin-guide/mm/damon/index.rst | 15 ++ Documentation/admin-guide/mm/damon/plans.rst | 29 +++ Documentation/admin-guide/mm/damon/start.rst | 114 + Documentation/admin-guide/mm/damon/usage.rst | 112 + Documentation/admin-guide/mm/index.rst | 1 + Documentation/vm/damon/api.rst | 20 ++ Documentation/vm/damon/design.rst| 166 + Documentation/vm/damon/eval.rst | 232 +++ Documentation/vm/damon/faq.rst | 58 + Documentation/vm/damon/index.rst | 31 +++ Documentation/vm/index.rst | 1 + 12 files changed, 937 insertions(+) create mode 100644 Documentation/admin-guide/mm/damon/guide.rst create mode 100644 Documentation/admin-guide/mm/damon/index.rst create mode 100644 Documentation/admin-guide/mm/damon/plans.rst create mode 100644 Documentation/admin-guide/mm/damon/start.rst create mode 100644 Documentation/admin-guide/mm/damon/usage.rst create mode 100644 Documentation/vm/damon/api.rst create mode 100644 Documentation/vm/damon/design.rst create mode 100644 Documentation/vm/damon/eval.rst create mode 100644 Documentation/vm/damon/faq.rst create mode 100644 Documentation/vm/damon/index.rst diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst new file mode 100644 index ..f52dc1669bb1 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/guide.rst @@ -0,0 +1,158 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +Optimization Guide +== + +This document helps you estimating the amount of benefit that you could get +from DAMON-based optimizations, and describes how you could achieve it. You +are assumed to already read :doc:`start`. + + +Check The Signs +=== + +No optimization can provide same extent of benefit to every case. Therefore +you should first guess how much improvements you could get using DAMON. 
If +some of below conditions match your situation, you could consider using DAMON. + +- *Low IPC and High Cache Miss Ratios.* Low IPC means most of the CPU time is + spent waiting for the completion of time-consuming operations such as memory + access, while high cache miss ratios mean the caches don't help it well. + DAMON is not for cache level optimization, but DRAM level. However, + improving DRAM management will also help this case by reducing the memory + operation latency. +- *Memory Over-commitment and Unknown Users.* If you are doing memory + overcommitment and you cannot control every user of your system, a memory + bank run could happen at any time. You can estimate when it will happen + based on DAMON's monitoring results and act earlier to avoid or deal better + with the crisis. +- *Frequent Memory Pressure.* Frequent memory pressure means your system has + wrong configurations or memory hogs. DAMON will help you find the right + configuration and/or the criminals. +- *Heterogeneous Memory System.* If your system is utilizing memory devices + that placed between DRAM and traditional hard disks, such as non-volatile + memory or fast SSDs, DAMON could help you utilizing the devices more + efficiently. + + +Profile +=== + +If you found some positive signals, you could start by profiling your workloads +using DAMON. Find major workloads on your systems and analyze their data +access pattern to find something wrong or can be improved. The DAMON user +space tool (``damo``) will be useful for this. You can get ``damo`` from +https://github.com/awslabs/damo. + +We recommend you to start from working set size distribution check using ``damo +report wss``. If the distribution is ununiform or quite different from what +you estimated, you could consider `Memory Configuration`_ optimization. + +Then, review the overall access pattern in heatmap form using ``damo report +heats``. 
If it shows a simple pattern consists of a small number of memory +regions having high contrast of access temperature, you could consider manual +`Program Modification`_. + +If you still want to absorb more benefits, you should develop `Personalized +DAMON Application`_ for your special case. + +You don't need to take only one approach among the above plans, but you could +use multiple of the above approaches to maximize the benefit. + + +Optimize + + +If the profiling result also says it's worth trying some optimization, you +could consider below approaches. Note that some of the below approaches assume +that your systems are configured with swap devices or other types of auxiliary +memory so that you don't strictly required to accommodate the whole working set +in the main memory. Most
[PATCH v27 11/13] mm/damon: Add kunit tests
From: SeongJae Park This commit adds kunit based unit tests for the core and the virtual address spaces monitoring primitives of DAMON. Signed-off-by: SeongJae Park Reviewed-by: Brendan Higgins --- mm/damon/Kconfig | 36 + mm/damon/core-test.h | 253 mm/damon/core.c | 7 + mm/damon/dbgfs-test.h | 126 mm/damon/dbgfs.c | 2 + mm/damon/vaddr-test.h | 328 ++ mm/damon/vaddr.c | 7 + 7 files changed, 759 insertions(+) create mode 100644 mm/damon/core-test.h create mode 100644 mm/damon/dbgfs-test.h create mode 100644 mm/damon/vaddr-test.h diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 72f1683ba0ee..455995152697 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,6 +12,18 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_KUNIT_TEST + bool "Test for damon" if !KUNIT_ALL_TESTS + depends on DAMON && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_VADDR bool "Data access monitoring primitives for virtual address spaces" depends on DAMON && MMU @@ -21,6 +33,18 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual address spaces. +config DAMON_VADDR_KUNIT_TEST + bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS + depends on DAMON_VADDR && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON virtual addresses primitives Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_DBGFS bool "DAMON debugfs interface" depends on DAMON_VADDR && DEBUG_FS @@ -30,4 +54,16 @@ config DAMON_DBGFS If unsure, say N. 
+config DAMON_DBGFS_KUNIT_TEST + bool "Test for damon debugfs interface" if !KUNIT_ALL_TESTS + depends on DAMON_DBGFS && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON debugfs interface Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + endmenu diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h new file mode 100644 index ..b815dfbfb5fd --- /dev/null +++ b/mm/damon/core-test.h @@ -0,0 +1,253 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Data Access Monitor Unit Tests + * + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * Author: SeongJae Park + */ + +#ifdef CONFIG_DAMON_KUNIT_TEST + +#ifndef _DAMON_CORE_TEST_H +#define _DAMON_CORE_TEST_H + +#include + +static void damon_test_regions(struct kunit *test) +{ + struct damon_region *r; + struct damon_target *t; + + r = damon_new_region(1, 2); + KUNIT_EXPECT_EQ(test, 1ul, r->ar.start); + KUNIT_EXPECT_EQ(test, 2ul, r->ar.end); + KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses); + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, 1u, damon_nr_regions(t)); + + damon_del_region(r); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_free_target(t); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static void damon_test_target(struct kunit *test) +{ + struct damon_ctx *c = damon_new_ctx(); + struct damon_target *t; + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 42ul, t->id); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_add_target(c, t); + KUNIT_EXPECT_EQ(test, 1u, nr_damon_targets(c)); + + damon_destroy_target(t); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_destroy_ctx(c); +} + +/* + * Test 
kdamond_reset_aggregated() + * + * DAMON checks access to each region and aggregates this information as the + * access frequency of each region. In detail, it increases '->nr_accesses' of + * regions that an access has confirmed. 'kdamond_reset_aggregated()' flushes + * the aggregated information ('->nr_accesses' of each regions) to the result + * buffer. As a result of the flushing, the '->nr_accesses' of regions are + * initialized to zero. + */ +static voi
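The behavior that the test description above verifies can be modeled in a few lines: aggregation bumps each region's '->nr_accesses', and the reset step flushes the values and zeroes them for the next interval. A hypothetical user-space model (not kernel code; the dict-based region is an assumption for illustration):

```python
def reset_aggregated(regions):
    """Model of kdamond_reset_aggregated(): snapshot each region's
    aggregated nr_accesses, then zero it for the next interval."""
    snapshot = [(r["start"], r["end"], r["nr_accesses"]) for r in regions]
    for r in regions:
        r["nr_accesses"] = 0
    return snapshot
```

The kunit test asserts exactly this post-condition: after the reset, every region's access counter reads zero while the flushed results still carry the old values.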
[PATCH v27 08/13] mm/damon/dbgfs: Export kdamond pid to the user space
From: SeongJae Park

For CPU usage accounting, knowing the pid of the monitoring thread could
be helpful. For example, users could use the cpuacct cgroup with the
pid.

This commit therefore exports the pid of the currently running
monitoring thread to the user space via the 'kdamond_pid' file in the
debugfs directory.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 38 --
 1 file changed, 36 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index 9af844faffd4..b20c1e7742ce 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -237,6 +237,32 @@ static ssize_t dbgfs_target_ids_write(struct file *file,
 	return ret;
 }

+static ssize_t dbgfs_kdamond_pid_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	ssize_t len;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		len = scnprintf(kbuf, count, "%d\n", ctx->kdamond->pid);
+	else
+		len = scnprintf(kbuf, count, "none\n");
+	mutex_unlock(&ctx->kdamond_lock);
+	if (!len)
+		goto out;
+	len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
+
+out:
+	kfree(kbuf);
+	return len;
+}
+
 static int damon_dbgfs_open(struct inode *inode, struct file *file)
 {
 	file->private_data = inode->i_private;
@@ -258,10 +284,18 @@ static const struct file_operations target_ids_fops = {
 	.write = dbgfs_target_ids_write,
 };

+static const struct file_operations kdamond_pid_fops = {
+	.owner = THIS_MODULE,
+	.open = damon_dbgfs_open,
+	.read = dbgfs_kdamond_pid_read,
+};
+
 static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx)
 {
-	const char * const file_names[] = {"attrs", "target_ids"};
-	const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops};
+	const char * const file_names[] = {"attrs", "target_ids",
+		"kdamond_pid"};
+	const struct file_operations *fops[] = {&attrs_fops, &target_ids_fops,
+		&kdamond_pid_fops};
 	int i;

 	for (i = 0; i < ARRAY_SIZE(file_names); i++)
--
2.17.1
[PATCH v27 09/13] mm/damon/dbgfs: Support multiple contexts
From: SeongJae Park

In some use cases, users would want to run multiple monitoring
contexts. For example, if a user wants high precision monitoring and
dedicating multiple CPUs for the job is acceptable, the user can split
the monitoring target regions into multiple small regions and create
one context for each region, because DAMON creates one monitoring
thread per context. Or, someone might want to simultaneously monitor
different address spaces, e.g., both virtual address space and physical
address space.

DAMON's API allows such usage, but 'damon-dbgfs' does not. Therefore,
only kernel space DAMON users can do multiple contexts monitoring.

This commit allows the user space DAMON users to use multiple contexts
monitoring by introducing two new 'damon-dbgfs' debugfs files,
'mk_context' and 'rm_context'. Users can create a new monitoring
context by writing the desired name of the new context to 'mk_context'.
Then, a new directory with the name and having the files for setting of
the context ('attrs', 'target_ids' and 'record') will be created under
the debugfs directory. Writing the name of the context to remove to
'rm_context' will remove the related context and directory.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 203 ++-
 1 file changed, 201 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index b20c1e7742ce..66ac7e18b1df 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -18,6 +18,7 @@
 static struct damon_ctx **dbgfs_ctxs;
 static int dbgfs_nr_ctxs;
 static struct dentry **dbgfs_dirs;
+static DEFINE_MUTEX(damon_dbgfs_lock);

 /*
  * Returns non-empty string on success, negative error code otherwise.
@@ -316,6 +317,192 @@ static struct damon_ctx *dbgfs_new_ctx(void)
 	return ctx;
 }

+static void dbgfs_destroy_ctx(struct damon_ctx *ctx)
+{
+	damon_destroy_ctx(ctx);
+}
+
+/*
+ * Make a context of @name and create a debugfs directory for it.
+ *
+ * This function should be called while holding damon_dbgfs_lock.
+ *
+ * Returns 0 on success, negative error code otherwise.
+ */
+static int dbgfs_mk_context(char *name)
+{
+	struct dentry *root, **new_dirs, *new_dir;
+	struct damon_ctx **new_ctxs, *new_ctx;
+	int err;
+
+	if (damon_nr_running_ctxs())
+		return -EBUSY;
+
+	new_ctxs = krealloc(dbgfs_ctxs, sizeof(*dbgfs_ctxs) *
+			(dbgfs_nr_ctxs + 1), GFP_KERNEL);
+	if (!new_ctxs)
+		return -ENOMEM;
+
+	new_dirs = krealloc(dbgfs_dirs, sizeof(*dbgfs_dirs) *
+			(dbgfs_nr_ctxs + 1), GFP_KERNEL);
+	if (!new_dirs) {
+		kfree(new_ctxs);
+		return -ENOMEM;
+	}
+
+	dbgfs_ctxs = new_ctxs;
+	dbgfs_dirs = new_dirs;
+
+	root = dbgfs_dirs[0];
+	if (!root)
+		return -ENOENT;
+
+	new_dir = debugfs_create_dir(name, root);
+	dbgfs_dirs[dbgfs_nr_ctxs] = new_dir;
+
+	new_ctx = dbgfs_new_ctx();
+	if (!new_ctx) {
+		debugfs_remove(new_dir);
+		dbgfs_dirs[dbgfs_nr_ctxs] = NULL;
+		return -ENOMEM;
+	}
+	dbgfs_ctxs[dbgfs_nr_ctxs] = new_ctx;
+
+	err = dbgfs_fill_ctx_dir(dbgfs_dirs[dbgfs_nr_ctxs],
+			dbgfs_ctxs[dbgfs_nr_ctxs]);
+	if (err)
+		return err;
+
+	dbgfs_nr_ctxs++;
+	return 0;
+}
+
+static ssize_t dbgfs_mk_context_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	char *kbuf;
+	char *ctx_name;
+	ssize_t ret = count;
+	int err;
+
+	kbuf = user_input_str(buf, count, ppos);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+	ctx_name = kmalloc(count + 1, GFP_KERNEL);
+	if (!ctx_name) {
+		kfree(kbuf);
+		return -ENOMEM;
+	}
+
+	/* Trim white space */
+	if (sscanf(kbuf, "%s", ctx_name) != 1) {
+		ret = -EINVAL;
+		goto out;
+	}
+
+	mutex_lock(&damon_dbgfs_lock);
+	err = dbgfs_mk_context(ctx_name);
+	if (err)
+		ret = err;
+	mutex_unlock(&damon_dbgfs_lock);
+
+out:
+	kfree(kbuf);
+	kfree(ctx_name);
+	return ret;
+}
+
+/*
+ * Remove a context of @name and its debugfs directory.
+ *
+ * This function should be called while holding damon_dbgfs_lock.
+ *
+ * Return 0 on success, negative error code otherwise.
+ */ +static int dbgfs_rm_context(char *name) +{ + struct dentry *root, *dir, **new_dirs; + struct damon_ctx **new_ctxs; + int i, j; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + dir = debugfs_lookup(name, root); + if (!dir) + return -ENOENT; + + new_dirs = kmalloc_array(dbgfs_nr_ctxs - 1, sizeof(*
[PATCH v27 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park DAMON is designed to be used by kernel space code such as the memory management subsystems, and therefore it provides only kernel space API. That said, letting the user space control DAMON could provide some benefits to them. For example, it will allow user space to analyze their specific workloads and make their own special optimizations. For such cases, this commit implements a simple DAMON application kernel module, namely 'damon-dbgfs', which merely wraps the DAMON api and exports those to the user space via the debugfs. 'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and ``monitor_on`` under its debugfs directory, ``/damon/``. Attributes -- Users can read and write the ``sampling interval``, ``aggregation interval``, ``regions update interval``, and min/max number of monitoring target regions by reading from and writing to the ``attrs`` file. For example, below commands set those values to 5 ms, 100 ms, 1,000 ms, 10, 1000 and check it again:: # cd /damon # echo 5000 10 100 10 1000 > attrs # cat attrs 5000 10 100 10 1000 Target IDs -- Some types of address spaces supports multiple monitoring target. For example, the virtual memory address spaces monitoring can have multiple processes as the monitoring targets. Users can set the targets by writing relevant id values of the targets to, and get the ids of the current targets by reading from the ``target_ids`` file. In case of the virtual address spaces monitoring, the values should be pids of the monitoring target processes. For example, below commands set processes having pids 42 and 4242 as the monitoring targets and check it again:: # cd /damon # echo 42 4242 > target_ids # cat target_ids 42 4242 Note that setting the target ids doesn't start the monitoring. Turning On/Off -- Setting the files as described above doesn't incur effect unless you explicitly start the monitoring. 
You can start, stop, and check the current status of the monitoring by writing to and reading from the ``monitor_on`` file. Writing ``on`` to the file starts the monitoring of the targets with the attributes. Writing ``off`` to the file stops those. DAMON also stops if every targets are invalidated (in case of the virtual memory monitoring, target processes are invalidated when terminated). Below example commands turn on, off, and check the status of DAMON:: # cd /damon # echo on > monitor_on # echo off > monitor_on # cat monitor_on off Please note that you cannot write to the above-mentioned debugfs files while the monitoring is turned on. If you write to the files while DAMON is running, an error code such as ``-EBUSY`` will be returned. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 3 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/core.c | 47 ++ mm/damon/dbgfs.c | 382 ++ 5 files changed, 442 insertions(+) create mode 100644 mm/damon/dbgfs.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 72cf5ebd35fe..b17e808a9cae 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -237,9 +237,12 @@ unsigned int damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); +int damon_set_targets(struct damon_ctx *ctx, + unsigned long *ids, ssize_t nr_ids); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, unsigned long aggr_int, unsigned long primitive_upd_int, unsigned long min_nr_reg, unsigned long max_nr_reg); +int damon_nr_running_ctxs(void); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 8ae080c52950..72f1683ba0ee 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -21,4 +21,13 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual 
address spaces. +config DAMON_DBGFS + bool "DAMON debugfs interface" + depends on DAMON_VADDR && DEBUG_FS + help + This builds the debugfs interface for DAMON. The user space admins + can use the interface for arbitrary data access monitoring. + + If unsure, say N. + endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 6ebbd08aed67..fed4be3bace3 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -2,3 +2,4 @@ obj-$(CONFIG_DAMON):= core.o obj-$(CONFIG_DAMON_VADDR) += vaddr.o +obj-$(CONFIG_DAMON_DBGFS) += dbgfs.o diff --git a/mm/damon/core.c b/mm/damon/core.c index 912112662d0c..cad2b4cee39d 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -172,6 +172,39 @@ void damon_destroy_ctx(struct dam
[PATCH v27 06/13] mm/damon: Add a tracepoint
From: SeongJae Park This commit adds a tracepoint for DAMON. It traces the monitoring results of each region for each aggregation interval. Using this, DAMON can easily integrated with tracepoints supporting tools such as perf. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster Reviewed-by: Steven Rostedt (VMware) --- include/trace/events/damon.h | 43 mm/damon/core.c | 7 +- 2 files changed, 49 insertions(+), 1 deletion(-) create mode 100644 include/trace/events/damon.h diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h new file mode 100644 index ..2f422f4f1fb9 --- /dev/null +++ b/include/trace/events/damon.h @@ -0,0 +1,43 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#undef TRACE_SYSTEM +#define TRACE_SYSTEM damon + +#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ) +#define _TRACE_DAMON_H + +#include +#include +#include + +TRACE_EVENT(damon_aggregated, + + TP_PROTO(struct damon_target *t, struct damon_region *r, + unsigned int nr_regions), + + TP_ARGS(t, r, nr_regions), + + TP_STRUCT__entry( + __field(unsigned long, target_id) + __field(unsigned int, nr_regions) + __field(unsigned long, start) + __field(unsigned long, end) + __field(unsigned int, nr_accesses) + ), + + TP_fast_assign( + __entry->target_id = t->id; + __entry->nr_regions = nr_regions; + __entry->start = r->ar.start; + __entry->end = r->ar.end; + __entry->nr_accesses = r->nr_accesses; + ), + + TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u", + __entry->target_id, __entry->nr_regions, + __entry->start, __entry->end, __entry->nr_accesses) +); + +#endif /* _TRACE_DAMON_H */ + +/* This part must be outside protection */ +#include diff --git a/mm/damon/core.c b/mm/damon/core.c index b36b6bdd94e2..912112662d0c 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -13,6 +13,9 @@ #include #include +#define CREATE_TRACE_POINTS +#include + /* Get a random number in [l, r) */ #define damon_rand(l, r) (l + prandom_u32_max(r - l)) @@ -388,8 +391,10 @@ static void 
kdamond_reset_aggregated(struct damon_ctx *c) damon_for_each_target(t, c) { struct damon_region *r; - damon_for_each_region(r, t) + damon_for_each_region(r, t) { + trace_damon_aggregated(t, r, damon_nr_regions(t)); r->nr_accesses = 0; + } } } -- 2.17.1
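Since the TP_printk() format above is `target_id=%lu nr_regions=%u %lu-%lu: %u`, tooling that post-processes the text output of tracepoint consumers such as perf can recover the fields with a simple regular expression. An illustrative Python sketch (the parsing helper is an assumption, not something the patch provides):

```python
import re

# Matches one damon_aggregated event as rendered by TP_printk() above.
_TRACE_RE = re.compile(
    r"target_id=(?P<target_id>\d+) nr_regions=(?P<nr_regions>\d+) "
    r"(?P<start>\d+)-(?P<end>\d+): (?P<nr_accesses>\d+)")

def parse_damon_aggregated(line):
    """Return the event fields as ints, or None if the line
    is not a damon_aggregated record."""
    m = _TRACE_RE.search(line)
    if not m:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}
```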
[PATCH v27 05/13] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space does the works as below. PTE Accessed-bit Based Access Check --- The implementation uses PTE Accessed-bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction --- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too huge unmapped areas inside the monitoring target should be removed to not take the time for the adaptive mechanism. For the reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space. 
The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spaces, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail:: (small mmap()-ed regions and munmap()-ed regions) Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster Reported-by: Guoju Fang --- include/linux/damon.h | 13 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/vaddr.c | 616 ++ 4 files changed, 639 insertions(+) create mode 100644 mm/damon/vaddr.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 0bd5d6913a6c..72cf5ebd35fe 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -246,4 +246,17 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_VADDR + +/* Monitoring primitives for virtual memory address spaces */ +void damon_va_init(struct damon_ctx *ctx); +void damon_va_update(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(void *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_VADDR */ + #endif /* _DAMON_H */ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..8ae080c52950 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,13 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_VADDR + bool "Data access monitoring primitives for virtual address spaces" + depends on DAMON && MMU + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON + that works for virtual address spaces. 
+ endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..6ebbd08aed67 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON):= core.o +obj-$(CONFIG_DAMON_VADDR) += vaddr.o diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c new file mode 100644 index ..3bc9dc9f0656 --- /dev/null +++ b/mm/damon/vaddr.c @@ -0,0 +1,616 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Primitives for Virtual Address Spaces + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-va: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the r
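The three-region construction described in the commit message above (cover every mapped area while excluding only the two largest unmapped gaps) can be sketched as a pure function over sorted, disjoint (start, end) mapped areas. This is a simplified model, not the kernel implementation; inputs with fewer than three areas are collapsed to a single covering region here for brevity:

```python
def three_regions(mapped):
    """Given sorted, disjoint (start, end) mapped areas, drop the two
    largest gaps between them and return three covering regions."""
    if len(mapped) < 3:
        return [(mapped[0][0], mapped[-1][1])]
    # Indices of the two widest gaps between consecutive areas.
    gaps = sorted(range(len(mapped) - 1),
                  key=lambda i: mapped[i + 1][0] - mapped[i][1],
                  reverse=True)[:2]
    cut1, cut2 = sorted(gaps)
    return [(mapped[0][0], mapped[cut1][1]),
            (mapped[cut1 + 1][0], mapped[cut2][1]),
            (mapped[cut2 + 1][0], mapped[-1][1])]
```

In the usual layout the two widest gaps are heap-to-mmap and mmap-to-stack, so the three results roughly correspond to heap, mmap()-ed regions, and stack.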
[PATCH v27 03/13] mm/damon: Adaptively adjust regions
From: SeongJae Park Even somehow the initial monitoring target regions are well constructed to fulfill the assumption (pages in same region have similar access frequencies), the data access pattern can be dynamically changed. This will result in low monitoring quality. To keep the assumption as much as possible, DAMON adaptively merges and splits each region based on their access frequency. For each ``aggregation interval``, it compares the access frequencies of adjacent regions and merges those if the frequency difference is small. Then, after it reports and clears the aggregated access frequency of each region, it splits each region into two or three regions if the total number of regions will not exceed the user-specified maximum number of regions after the split. In this way, DAMON provides its best-effort quality and minimal overhead while keeping the upper-bound overhead that users set. Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 23 +++-- mm/damon/core.c | 214 +- 2 files changed, 227 insertions(+), 10 deletions(-) diff --git a/include/linux/damon.h b/include/linux/damon.h index 67db309ad61b..0bd5d6913a6c 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -12,6 +12,9 @@ #include #include +/* Minimal region size. Every damon_region is aligned by this. */ +#define DAMON_MIN_REGION PAGE_SIZE + /** * struct damon_addr_range - Represents an address region of [@start, @end). * @start: Start address of the region (inclusive). @@ -85,6 +88,8 @@ struct damon_ctx; * prepared for the next access check. * @check_accesses should check the accesses to each region that made after the * last preparation and update the number of observed accesses of each region. + * It should also return max number of observed accesses that made as a result + * of its update. The value will be used for regions adjustment threshold. * @reset_aggregated should reset the access monitoring results that aggregated * by @check_accesses. 
* @target_valid should check whether the target is still valid for the @@ -95,7 +100,7 @@ struct damon_primitive { void (*init)(struct damon_ctx *context); void (*update)(struct damon_ctx *context); void (*prepare_access_checks)(struct damon_ctx *context); - void (*check_accesses)(struct damon_ctx *context); + unsigned int (*check_accesses)(struct damon_ctx *context); void (*reset_aggregated)(struct damon_ctx *context); bool (*target_valid)(void *target); void (*cleanup)(struct damon_ctx *context); @@ -172,7 +177,9 @@ struct damon_callback { * @primitive: Set of monitoring primitives for given use cases. * @callback: Set of callbacks for monitoring events notifications. * - * @region_targets:Head of monitoring targets (_target) list. + * @min_nr_regions:The minimum number of adaptive monitoring regions. + * @max_nr_regions:The maximum number of adaptive monitoring regions. + * @adaptive_targets: Head of monitoring targets (_target) list. */ struct damon_ctx { unsigned long sample_interval; @@ -191,7 +198,9 @@ struct damon_ctx { struct damon_primitive primitive; struct damon_callback callback; - struct list_head region_targets; + unsigned long min_nr_regions; + unsigned long max_nr_regions; + struct list_head adaptive_targets; }; #define damon_next_region(r) \ @@ -207,10 +216,10 @@ struct damon_ctx { list_for_each_entry_safe(r, next, >regions_list, list) #define damon_for_each_target(t, ctx) \ - list_for_each_entry(t, &(ctx)->region_targets, list) + list_for_each_entry(t, &(ctx)->adaptive_targets, list) #define damon_for_each_target_safe(t, next, ctx) \ - list_for_each_entry_safe(t, next, &(ctx)->region_targets, list) + list_for_each_entry_safe(t, next, &(ctx)->adaptive_targets, list) #ifdef CONFIG_DAMON @@ -224,11 +233,13 @@ struct damon_target *damon_new_target(unsigned long id); void damon_add_target(struct damon_ctx *ctx, struct damon_target *t); void damon_free_target(struct damon_target *t); void damon_destroy_target(struct damon_target *t); +unsigned int 
damon_nr_regions(struct damon_target *t); struct damon_ctx *damon_new_ctx(void); void damon_destroy_ctx(struct damon_ctx *ctx); int damon_set_attrs(struct damon_ctx *ctx, unsigned long sample_int, - unsigned long aggr_int, unsigned long primitive_upd_int); + unsigned long aggr_int, unsigned long primitive_upd_int, + unsigned long min_nr_reg, unsigned long max_nr_reg); int damon_start(struct damon_ctx **ctxs, int nr_ctxs); int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); diff --git a/mm/damon/core.c b/mm/damon/core.c index 94db494dcf70..b36b6bdd94e2 100644 --- a/mm/damon/core.c +++ b/mm/damon/core.c @@ -10,8 +10,12 @@ #include
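One pass of the merge half of the adaptive mechanism can be modeled as: fuse adjacent regions whose access frequencies differ by no more than a threshold, keeping a size-weighted frequency for the merged region. A simplified Python model under that assumption (the in-kernel logic additionally respects the min/max region bounds, which is omitted here):

```python
def merge_adjacent(regions, threshold):
    """One merge pass: fuse neighboring regions whose nr_accesses
    differ by at most 'threshold', size-weighting the result."""
    merged = [dict(regions[0])]
    for r in regions[1:]:
        last = merged[-1]
        if abs(last["nr_accesses"] - r["nr_accesses"]) <= threshold:
            lsz = last["end"] - last["start"]
            rsz = r["end"] - r["start"]
            last["nr_accesses"] = (last["nr_accesses"] * lsz +
                                   r["nr_accesses"] * rsz) // (lsz + rsz)
            last["end"] = r["end"]
        else:
            merged.append(dict(r))
    return merged
```

A small threshold merges only regions with genuinely similar access temperature, which is what keeps the region count (and so the monitoring overhead) bounded without losing contrast between hot and cold areas.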
[PATCH v27 04/13] mm/idle_page_tracking: Make PG_idle reusable
From: SeongJae Park

PG_idle and PG_young allow the two PTE Accessed bit users, Idle Page
Tracking and the reclaim logic, to work concurrently without interfering
with each other. That is, when they need to clear the Accessed bit, they
set PG_young to represent the previous state of the bit. And when they
need to read the bit, if the bit is cleared, they further read PG_young
to know whether the other user has cleared the bit meanwhile or not.

We could add another page flag and extend the mechanism to use the flag
if we need to add another concurrent PTE Accessed bit user subsystem.
However, the space for page flags is limited. Meanwhile, if the new
subsystem would be mutually exclusive with IDLE_PAGE_TRACKING, or if
interfering with it is not a real problem, it would be ok to simply reuse
the PG_idle flag. However, that is currently impossible because the
flags are dependent on IDLE_PAGE_TRACKING.

To allow such reuse of the flags, this commit separates the PG_young and
PG_idle flag logic from IDLE_PAGE_TRACKING and introduces a new kernel
config option, 'PAGE_IDLE_FLAG'. Hence, a new subsystem will be able to
reuse PG_idle without depending on IDLE_PAGE_TRACKING. In the next
commit, DAMON's reference implementation of the virtual memory address
space monitoring primitives will use it.

Signed-off-by: SeongJae Park
Reviewed-by: Shakeel Butt
---
 include/linux/page-flags.h     |  4 ++--
 include/linux/page_ext.h       |  2 +-
 include/linux/page_idle.h      |  6 +++---
 include/trace/events/mmflags.h |  2 +-
 mm/Kconfig                     |  8
 mm/page_ext.c                  | 12 +++-
 mm/page_idle.c                 | 10 --
 7 files changed, 26 insertions(+), 18 deletions(-)

diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index 04a34c08e0a6..6be2c1e2fb48 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -131,7 +131,7 @@ enum pageflags {
 #ifdef CONFIG_MEMORY_FAILURE
 	PG_hwpoison,		/* hardware poisoned page. Don't touch */
 #endif
-#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 	PG_young,
 	PG_idle,
 #endif
@@ -436,7 +436,7 @@ PAGEFLAG_FALSE(HWPoison)
 #define __PG_HWPOISON 0
 #endif

-#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 TESTPAGEFLAG(Young, young, PF_ANY)
 SETPAGEFLAG(Young, young, PF_ANY)
 TESTCLEARFLAG(Young, young, PF_ANY)
diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h
index aff81ba31bd8..fabb2e1e087f 100644
--- a/include/linux/page_ext.h
+++ b/include/linux/page_ext.h
@@ -19,7 +19,7 @@ struct page_ext_operations {
 enum page_ext_flags {
 	PAGE_EXT_OWNER,
 	PAGE_EXT_OWNER_ALLOCATED,
-#if defined(CONFIG_IDLE_PAGE_TRACKING) && !defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && !defined(CONFIG_64BIT)
 	PAGE_EXT_YOUNG,
 	PAGE_EXT_IDLE,
 #endif
diff --git a/include/linux/page_idle.h b/include/linux/page_idle.h
index 1e894d34bdce..d8a6aecf99cb 100644
--- a/include/linux/page_idle.h
+++ b/include/linux/page_idle.h
@@ -6,7 +6,7 @@
 #include
 #include

-#ifdef CONFIG_IDLE_PAGE_TRACKING
+#ifdef CONFIG_PAGE_IDLE_FLAG

 #ifdef CONFIG_64BIT
 static inline bool page_is_young(struct page *page)
@@ -106,7 +106,7 @@ static inline void clear_page_idle(struct page *page)
 }
 #endif /* CONFIG_64BIT */

-#else /* !CONFIG_IDLE_PAGE_TRACKING */
+#else /* !CONFIG_PAGE_IDLE_FLAG */

 static inline bool page_is_young(struct page *page)
 {
@@ -135,6 +135,6 @@ static inline void clear_page_idle(struct page *page)
 {
 }

-#endif /* CONFIG_IDLE_PAGE_TRACKING */
+#endif /* CONFIG_PAGE_IDLE_FLAG */

 #endif /* _LINUX_MM_PAGE_IDLE_H */
diff --git a/include/trace/events/mmflags.h b/include/trace/events/mmflags.h
index 629c7a0eaff2..ea434bbc2d2b 100644
--- a/include/trace/events/mmflags.h
+++ b/include/trace/events/mmflags.h
@@ -73,7 +73,7 @@
 #define IF_HAVE_PG_HWPOISON(flag,string)
 #endif

-#if defined(CONFIG_IDLE_PAGE_TRACKING) && defined(CONFIG_64BIT)
+#if defined(CONFIG_PAGE_IDLE_FLAG) && defined(CONFIG_64BIT)
 #define IF_HAVE_PG_IDLE(flag,string) ,{1UL << flag, string}
 #else
 #define IF_HAVE_PG_IDLE(flag,string)
diff --git a/mm/Kconfig b/mm/Kconfig
index 56bec147bdff..0616a8b1ff0b 100644
--- a/mm/Kconfig
+++ b/mm/Kconfig
@@ -771,10 +771,18 @@ config DEFERRED_STRUCT_PAGE_INIT
 	  lifetime of the system until these kthreads finish the
 	  initialisation.

+config PAGE_IDLE_FLAG
+	bool "Add PG_idle and PG_young flags"
+	help
+	  This feature adds PG_idle and PG_young flags in 'struct page'.  PTE
+	  Accessed bit writers can set the state of the bit in the flags so
+	  that other PTE Accessed bit readers are not disturbed.
+
 config IDLE_PAGE_TRACKING
 	bool "Enable idle page tracking"
 	dep
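[Editor's note] The PG_young handshake the commit message describes can be modeled in a few lines of user-space C. This is an illustrative sketch, not kernel code: the struct and function names below are invented for the example, and the real implementation manipulates page flags with atomic bit operations.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of a page's PTE Accessed bit plus the PG_young/PG_idle flags.
 * All names here are hypothetical; the kernel uses atomic page-flag ops. */
struct toy_page {
	bool accessed;	/* the PTE Accessed bit */
	bool young;	/* PG_young: "Accessed was set when someone cleared it" */
	bool idle;	/* PG_idle */
};

/* One Accessed-bit user (e.g. Idle Page Tracking) clears the bit, but
 * records the previous state in PG_young so the other user is not fooled. */
static void clear_accessed(struct toy_page *p)
{
	if (p->accessed) {
		p->accessed = false;
		p->young = true;
	}
	p->idle = true;
}

/* The other user (e.g. reclaim) reads Accessed; if it is clear, PG_young
 * tells whether the page was in fact accessed before someone cleared it. */
static bool was_accessed(struct toy_page *p)
{
	bool seen = p->accessed || p->young;

	p->young = false;	/* consume the recorded state */
	return seen;
}
```

With this protocol, a reader never misses an access merely because the other subsystem cleared the Accessed bit first.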
[PATCH v27 02/13] mm/damon/core: Implement region-based sampling
From: SeongJae Park

To avoid the unbounded increase of the overhead, DAMON groups adjacent
pages that are assumed to have the same access frequencies into a region.
As long as the assumption (pages in a region have the same access
frequencies) is kept, only one page in the region is required to be
checked. Thus, for each ``sampling interval``,

1. the 'prepare_access_checks' primitive picks one page in each region,
2. waits for one ``sampling interval``,
3. checks whether the page is accessed meanwhile, and
4. increases the access count of the region if so.

Therefore, the monitoring overhead is controllable by adjusting the
number of regions. DAMON allows both the underlying primitives and user
callbacks to adjust regions for the trade-off. In other words, this
commit makes DAMON use not only time-based sampling but also space-based
sampling.

This scheme, however, cannot preserve the quality of the output if the
assumption is not guaranteed. The next commit will address this problem.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h |  77 ++-
 mm/damon/core.c       | 143 --
 2 files changed, 213 insertions(+), 7 deletions(-)

diff --git a/include/linux/damon.h b/include/linux/damon.h
index 2f652602b1ea..67db309ad61b 100644
--- a/include/linux/damon.h
+++ b/include/linux/damon.h
@@ -12,6 +12,48 @@
 #include
 #include

+/**
+ * struct damon_addr_range - Represents an address region of [@start, @end).
+ * @start:	Start address of the region (inclusive).
+ * @end:	End address of the region (exclusive).
+ */
+struct damon_addr_range {
+	unsigned long start;
+	unsigned long end;
+};
+
+/**
+ * struct damon_region - Represents a monitoring target region.
+ * @ar:			The address range of the region.
+ * @sampling_addr:	Address of the sample for the next access check.
+ * @nr_accesses:	Access frequency of this region.
+ * @list:		List head for siblings.
+ */
+struct damon_region {
+	struct damon_addr_range ar;
+	unsigned long sampling_addr;
+	unsigned int nr_accesses;
+	struct list_head list;
+};
+
+/**
+ * struct damon_target - Represents a monitoring target.
+ * @id:			Unique identifier for this target.
+ * @regions_list:	Head of the monitoring target regions of this target.
+ * @list:		List head for siblings.
+ *
+ * Each monitoring context could have multiple targets. For example, a context
+ * for virtual memory address spaces could have multiple target processes. The
+ * @id of each target should be unique among the targets of the context. For
+ * example, in the virtual address monitoring context, it could be a pidfd or
+ * an address of an mm_struct.
+ */
+struct damon_target {
+	unsigned long id;
+	struct list_head regions_list;
+	struct list_head list;
+};
+
 struct damon_ctx;

 /**
@@ -36,7 +78,7 @@ struct damon_ctx;
  *
  * @init should initialize primitive-internal data structures. For example,
  * this could be used to construct proper monitoring target regions and link
- * those to @damon_ctx.target.
+ * those to @damon_ctx.adaptive_targets.
  * @update should update the primitive-internal data structures. For example,
  * this could be used to update monitoring target regions for current status.
  * @prepare_access_checks should manipulate the monitoring regions to be
@@ -130,7 +172,7 @@ struct damon_callback {
  * @primitive:	Set of monitoring primitives for given use cases.
  * @callback:	Set of callbacks for monitoring events notifications.
  *
- * @target:		Pointer to the user-defined monitoring target.
+ * @region_targets:	Head of monitoring targets (&damon_target) list.
 */
 struct damon_ctx {
 	unsigned long sample_interval;
@@ -149,11 +191,40 @@ struct damon_ctx {
 	struct damon_primitive primitive;
 	struct damon_callback callback;

-	void *target;
+	struct list_head region_targets;
 };

+#define damon_next_region(r) \
+	(container_of(r->list.next, struct damon_region, list))
+
+#define damon_prev_region(r) \
+	(container_of(r->list.prev, struct damon_region, list))
+
+#define damon_for_each_region(r, t) \
+	list_for_each_entry(r, &t->regions_list, list)
+
+#define damon_for_each_region_safe(r, next, t) \
+	list_for_each_entry_safe(r, next, &t->regions_list, list)
+
+#define damon_for_each_target(t, ctx) \
+	list_for_each_entry(t, &(ctx)->region_targets, list)
+
+#define damon_for_each_target_safe(t, next, ctx)	\
+	list_for_each_entry_safe(t, next, &(ctx)->region_targets, list)
+
 #ifdef CONFIG_DAMON

+struct damon_region *damon_new_region(unsigned long start, unsigned long end);
+inline void damon_insert_region(struct damon_region *r,
+		struct damon_region *prev, struct damon_region *next);
+void damon_add_region(stru
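[Editor's note] The "one check per region" idea above can be sketched in plain user-space C. Everything below (struct layout, function names) is illustrative only, loosely mirroring struct damon_region and the prepare_access_checks / check_accesses steps.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Illustrative stand-in for struct damon_region: one representative
 * sample address is checked per region per sampling interval. */
struct toy_region {
	unsigned long start, end;	/* address range [start, end) */
	unsigned long sampling_addr;
	unsigned int nr_accesses;
};

/* mirrors prepare_access_checks: pick one sample address in the region */
static void prepare_access_check(struct toy_region *r)
{
	r->sampling_addr = r->start +
		((unsigned long)rand() % (r->end - r->start));
}

/* mirrors check_accesses: one region costs one check, however large it is */
static void check_access(struct toy_region *r, bool sample_was_accessed)
{
	if (sample_was_accessed)
		r->nr_accesses++;
}
```

The monitoring overhead thus scales with the number of regions, not with the number of pages, which is what makes the number of regions the overhead-control knob.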
[PATCH v27 01/13] mm: Introduce Data Access MONitor (DAMON)
From: SeongJae Park

DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON make it

- accurate (the monitoring output is useful enough for DRAM level
  performance-centric memory management; it might be inappropriate for
  CPU cache levels, though),
- light-weight (the monitoring overhead is normally low enough to be
  applied online), and
- scalable (the upper-bound of the overhead is in a constant range
  regardless of the size of target workloads).

Using this framework, hence, we can easily write efficient kernel space
data access monitoring applications. For example, the kernel's memory
management mechanisms can make advanced decisions using this.
Experimental data access aware optimization works that previously
incurred high access monitoring overhead could be implemented again on
top of this.

Due to its simple and flexible interface, providing a user space
interface would also be easy. Then, user space users who have some
special workloads can write personalized applications for better
understanding and optimizations of their workloads and systems.

===

Nevertheless, this commit defines and implements only the basic access
check part, without the core logic for handling the overhead-accuracy
trade-off. The basic access check works as below.

The output of DAMON says what memory regions are how frequently accessed
for a given duration. The resolution of the access frequency is
controlled by setting ``sampling interval`` and ``aggregation interval``.
In detail, DAMON checks access to each page per ``sampling interval`` and
aggregates the results; in other words, it counts the number of the
accesses to each region. After each ``aggregation interval`` passes,
DAMON calls callback functions that were previously registered by users,
so that users can read the aggregated results, and then clears the
results.
This can be described in the below simple pseudo-code::

    init()
    while monitoring_on:
        for page in monitoring_target:
            if accessed(page):
                nr_accesses[page] += 1
        if time() % aggregation_interval == 0:
            for callback in user_registered_callbacks:
                callback(monitoring_target, nr_accesses)
            for page in monitoring_target:
                nr_accesses[page] = 0
        if time() % update_interval == 0:
            update()
        sleep(sampling interval)

The target regions are constructed at the beginning of the monitoring and
updated after each ``regions_update_interval``, because the target
regions could be dynamically changed (e.g., by mmap() or memory hotplug).
The monitoring overhead of this mechanism will arbitrarily increase as
the size of the target workload grows.

The basic monitoring primitives for the actual access check and the
dynamic target regions construction aren't in the core part of DAMON.
Instead, it allows users to implement their own primitives that are
optimized for their use case and to configure DAMON to use those. In
other words, users cannot use the current version of DAMON without some
additional work. Following commits will implement the core mechanisms
for the overhead-accuracy control and the default primitives
implementations.
Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
---
 include/linux/damon.h | 167 ++
 mm/Kconfig            |   3 +
 mm/Makefile           |   1 +
 mm/damon/Kconfig      |  15 ++
 mm/damon/Makefile     |   3 +
 mm/damon/core.c       | 318 ++
 6 files changed, 507 insertions(+)
 create mode 100644 include/linux/damon.h
 create mode 100644 mm/damon/Kconfig
 create mode 100644 mm/damon/Makefile
 create mode 100644 mm/damon/core.c

diff --git a/include/linux/damon.h b/include/linux/damon.h
new file mode 100644
index ..2f652602b1ea
--- /dev/null
+++ b/include/linux/damon.h
@@ -0,0 +1,167 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * DAMON api
+ *
+ * Author: SeongJae Park
+ */
+
+#ifndef _DAMON_H_
+#define _DAMON_H_
+
+#include
+#include
+#include
+
+struct damon_ctx;
+
+/**
+ * struct damon_primitive - Monitoring primitives for given use cases.
+ *
+ * @init:			Initialize primitive-internal data structures.
+ * @update:			Update primitive-internal data structures.
+ * @prepare_access_checks:	Prepare next access check of target regions.
+ * @check_accesses:		Check the accesses to target regions.
+ * @reset_aggregated:		Reset aggregated accesses monitoring results.
+ * @target_valid:		Determine if the target is valid.
+ * @cleanup:			Clean up the context.
+ *
+ * DAMON can be extended for various address spaces and usages. For this,
+ * users should register the low level primitives for their target address
+ * space and usecase via the &damon_ctx.primitive. Then, the monitoring thread
+ * (&damon_ctx.kdamond) calls @init and @prepare_access_checks before starting
+ * the monitoring, @update after each
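[Editor's note] The sampling/aggregation loop that the commit message describes can be condensed into a runnable user-space model. The interval value and the "page is accessed on every tick" assumption below are purely illustrative.

```c
#include <assert.h>

#define AGGR_INTERVAL 5	/* aggregation interval, in sampling ticks (made up) */

/* User-space model of the monitoring loop: one iteration per sampling
 * interval; every AGGR_INTERVAL ticks the aggregated count would be
 * handed to user callbacks and then reset. Returns how many times the
 * callbacks would have fired. Assumes the page is accessed every tick. */
static int toy_monitor(int sampling_ticks)
{
	int nr_accesses = 0, callbacks = 0;

	for (int t = 1; t <= sampling_ticks; t++) {
		nr_accesses++;			/* accessed(page) assumed true */
		if (t % AGGR_INTERVAL == 0) {
			callbacks++;		/* callback(..., nr_accesses) */
			nr_accesses = 0;	/* reset aggregated results */
		}
	}
	return callbacks;
}
```

Ten sampling ticks with an aggregation interval of five thus yield two callback invocations, matching the pseudo-code's reset-after-aggregation behavior.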
[PATCH v27 00/13] Introduce Data Access MONitor (DAMON)
From: SeongJae Park

Changes from Previous Version (v26)
===================================

- Rebase on latest -mm tree (v5.12-rc6-mmots-2021-04-06-22-33)
- Check kmalloc() failures in dbgfs init (Greg KH)
- Fix a typo: s/stollen/stolen/ (Stefan Nuernberger)
- Update document for updated user space tool path

Introduction
============

DAMON is a data access monitoring framework for the Linux kernel. The
core mechanisms of DAMON called 'region based sampling' and 'adaptive
regions adjustment' (refer to 'mechanisms.rst' in the 11th patch of this
patchset for the detail) make it

- accurate (The monitored information is useful for DRAM level memory
  management. It might not be appropriate for cache-level accuracy,
  though.),
- light-weight (The monitoring overhead is low enough to be applied
  online while making no impact on the performance of the target
  workloads.), and
- scalable (The upper-bound of the instrumentation overhead is
  controllable regardless of the size of target workloads.).

Using this framework, therefore, several memory management mechanisms
such as reclamation and THP can be optimized to be aware of real data
access patterns. Experimental access pattern aware memory management
optimization works that previously incurred high instrumentation overhead
will be able to have another try.

Though DAMON is for kernel subsystems, it can be easily exposed to the
user space by writing a DAMON-wrapper kernel subsystem. Then, user space
users who have some special workloads will be able to write personalized
tools or applications for deeper understanding and specialized
optimizations of their systems.

Long-term Plan
--------------

DAMON is a part of a project called Data Access-aware Operating System
(DAOS). As the name implies, I want to improve the performance and
efficiency of systems using fine-grained data access patterns. The
optimizations are for both kernel and user spaces. I will therefore
modify or create kernel subsystems, export some of those to user space,
and implement user space library / tools.
Below shows the layers and components for the project.

---
Primitives:     PTE Accessed bit, PG_idle, rmap, (Intel CMT), ...
Framework:      DAMON
Features:       DAMOS, virtual addr, physical addr, ...
Applications:   DAMON-debugfs, (DARC), ...
^^^ KERNEL SPACE
Raw Interface:  debugfs, (sysfs), (damonfs), tracepoints, (sys_damon), ...
vvv USER SPACE
Library:        (libdamon), ...
Tools:          DAMO, (perf), ...
---

The components in parentheses or marked as '...' are not implemented yet
but are in the future plan. IOW, those are the TODO tasks of the DAOS
project. For more detail, please refer to the plans:
https://lore.kernel.org/linux-mm/20201202082731.24828-1-sjp...@amazon.com/

Evaluations
===========

We evaluated DAMON's overhead, monitoring quality and usefulness using 24
realistic workloads on my QEMU/KVM based virtual machine running a kernel
that the v24 DAMON patchset is applied to.

DAMON is lightweight. It increases system memory usage by 0.39% and
slows target workloads down by 1.16%.

DAMON is accurate and useful for memory management optimizations. An
experimental DAMON-based operation scheme for THP, namely 'ethp', removes
76.15% of THP memory overheads while preserving 51.25% of THP speedup.
Another experimental DAMON-based 'proactive reclamation' implementation,
'prcl', reduces 93.38% of resident sets and 23.63% of system memory
footprint while incurring only 1.22% runtime overhead in the best case
(parsec3/freqmine).

NOTE that the experimental THP optimization and proactive reclamation are
not for production but only for proof of concepts.

Please refer to the official document[1] or the
"Documentation/admin-guide/mm: Add a document for DAMON" patch in this
patchset for detailed evaluation setup and results.

[1] https://damonitor.github.io/doc/html/latest-damon/admin-guide/mm/damon/eval.html

Real-world User Story
=====================

In summary, DAMON has been used on production systems and proved its
usefulness.

DAMON as a profiler
-------------------

We analyzed characteristics of large scale production systems of our
customers using DAMON.
The systems utilize 70GB DRAM and 36 CPUs. From this, we were able to
find the interesting things below.

There were obviously different access patterns under the idle workload
and the active workload. Under the idle workload, it accessed large
memory regions with low frequency, while the active workload accessed
small memory regions with high frequency.

DAMON found a 7GB memory region showing obviously high access frequency
under the active workload. We believe this is the performance-effective
working set and need to be
Re: [PATCH v26 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

On Tue, 30 Mar 2021 09:59:50 + SeongJae Park wrote:

> From: SeongJae Park
>
> On Tue, 30 Mar 2021 11:22:45 +0200 Greg KH wrote:
>
> > On Tue, Mar 30, 2021 at 09:05:31AM +, sj38.p...@gmail.com wrote:
> > > +static int __init __damon_dbgfs_init(void)
> > > +{
> > > +	struct dentry *dbgfs_root;
> > > +	const char * const file_names[] = {"monitor_on"};
> > > +	const struct file_operations *fops[] = {&monitor_on_fops};
> > > +	int i;
> > > +
> > > +	dbgfs_root = debugfs_create_dir("damon", NULL);
> > > +
> > > +	for (i = 0; i < ARRAY_SIZE(file_names); i++)
> > > +		debugfs_create_file(file_names[i], 0600, dbgfs_root, NULL,
> > > +				fops[i]);
> > > +	dbgfs_fill_ctx_dir(dbgfs_root, dbgfs_ctxs[0]);
> > > +
> > > +	dbgfs_dirs = kmalloc_array(1, sizeof(dbgfs_root), GFP_KERNEL);
> >
> > No error checking for memory allocation failures?
>
> Oops, I will add the check in the next spin.
>
> > > +	dbgfs_dirs[0] = dbgfs_root;
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +/*
> > > + * Functions for the initialization
> > > + */
> > > +
> > > +static int __init damon_dbgfs_init(void)
> > > +{
> > > +	int rc;
> > > +
> > > +	dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL);
> >
> > No error checking?
>
> Will add in the next spin.
>
> > > +	dbgfs_ctxs[0] = dbgfs_new_ctx();
> > > +	if (!dbgfs_ctxs[0])
> > > +		return -ENOMEM;

And, I found that I'm not printing the error for this failure. I guess
this might have made you think the below error message should be printed
inside the callee. I will add 'pr_err()' here and for the above
unchecked failure cases in the next version.

BTW, I forgot to say thank you for your review. Appreciated!

Thanks,
SeongJae Park

> > > +	dbgfs_nr_ctxs = 1;
> > > +
> > > +	rc = __damon_dbgfs_init();
> > > +	if (rc)
> > > +		pr_err("%s: dbgfs init failed\n", __func__);
> >
> > Shouldn't the error be printed out in the function that failed, not in
> > this one?
>
> I thought some other functions (in the future) might want to use
> '__damon_dbgfs_init()' but silently handle its failure. Therefore I
> made the function fail silently but return the error code explicitly.
> Am I missing something?
>
> Thanks,
> SeongJae Park
>
> > thanks,
> >
> > greg k-h
>
Re: [PATCH v26 07/13] mm/damon: Implement a debugfs-based user space interface
From: SeongJae Park

On Tue, 30 Mar 2021 11:22:45 +0200 Greg KH wrote:

> On Tue, Mar 30, 2021 at 09:05:31AM +, sj38.p...@gmail.com wrote:
> > +static int __init __damon_dbgfs_init(void)
> > +{
> > +	struct dentry *dbgfs_root;
> > +	const char * const file_names[] = {"monitor_on"};
> > +	const struct file_operations *fops[] = {&monitor_on_fops};
> > +	int i;
> > +
> > +	dbgfs_root = debugfs_create_dir("damon", NULL);
> > +
> > +	for (i = 0; i < ARRAY_SIZE(file_names); i++)
> > +		debugfs_create_file(file_names[i], 0600, dbgfs_root, NULL,
> > +				fops[i]);
> > +	dbgfs_fill_ctx_dir(dbgfs_root, dbgfs_ctxs[0]);
> > +
> > +	dbgfs_dirs = kmalloc_array(1, sizeof(dbgfs_root), GFP_KERNEL);
>
> No error checking for memory allocation failures?

Oops, I will add the check in the next spin.

> > +	dbgfs_dirs[0] = dbgfs_root;
> > +
> > +	return 0;
> > +}
> > +
> > +/*
> > + * Functions for the initialization
> > + */
> > +
> > +static int __init damon_dbgfs_init(void)
> > +{
> > +	int rc;
> > +
> > +	dbgfs_ctxs = kmalloc(sizeof(*dbgfs_ctxs), GFP_KERNEL);
>
> No error checking?

Will add in the next spin.

> > +	dbgfs_ctxs[0] = dbgfs_new_ctx();
> > +	if (!dbgfs_ctxs[0])
> > +		return -ENOMEM;
> > +	dbgfs_nr_ctxs = 1;
> > +
> > +	rc = __damon_dbgfs_init();
> > +	if (rc)
> > +		pr_err("%s: dbgfs init failed\n", __func__);
>
> Shouldn't the error be printed out in the function that failed, not in
> this one?

I thought some other functions (in the future) might want to use
'__damon_dbgfs_init()' but silently handle its failure. Therefore I made
the function fail silently but return the error code explicitly. Am I
missing something?

Thanks,
SeongJae Park

> thanks,
>
> greg k-h
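[Editor's note] The design point debated above — whether the callee or the top-level caller should report the error — can be shown with a tiny user-space sketch. All names are invented, and an integer flag stands in for pr_err():

```c
#include <assert.h>
#include <errno.h>

/* The callee fails silently: it only returns the error code. */
static int helper_init(int simulate_failure)
{
	if (simulate_failure)
		return -ENOMEM;		/* no logging here, on purpose */
	return 0;
}

/* The top-level caller decides whether the failure is worth reporting;
 * setting *logged stands in for a pr_err() call. */
static int toplevel_init(int simulate_failure, int *logged)
{
	int rc = helper_init(simulate_failure);

	if (rc)
		*logged = 1;
	return rc;
}
```

This keeps the helper reusable by future callers that may treat its failure as non-fatal, which is the rationale given in the thread for printing in the caller.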
Re: [PATCH v25 05/13] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park On Thu, 18 Mar 2021 10:08:48 + sj38.p...@gmail.com wrote: > From: SeongJae Park > > This commit introduces a reference implementation of the address space > specific low level primitives for the virtual address space, so that > users of DAMON can easily monitor the data accesses on virtual address > spaces of specific processes by simply configuring the implementation to > be used by DAMON. > > The low level primitives for the fundamental access monitoring are > defined in two parts: > > 1. Identification of the monitoring target address range for the address >space. > 2. Access check of specific address range in the target space. > > The reference implementation for the virtual address space does the > works as below. > > PTE Accessed-bit Based Access Check > --- > > The implementation uses PTE Accessed-bit for basic access checks. That > is, it clears the bit for the next sampling target page and checks > whether it is set again after one sampling period. This could disturb > the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags > to solve the conflict, as Idle page tracking does. > > VMA-based Target Address Range Construction > --- > > Only small parts in the super-huge virtual address space of the > processes are mapped to physical memory and accessed. Thus, tracking > the unmapped address regions is just wasteful. However, because DAMON > can deal with some level of noise using the adaptive regions adjustment > mechanism, tracking every mapping is not strictly required but could > even incur a high overhead in some cases. That said, too huge unmapped > areas inside the monitoring target should be removed to not take the > time for the adaptive mechanism. > > For the reason, this implementation converts the complex mappings to > three distinct regions that cover every mapped area of the address > space. Also, the two gaps between the three regions are the two biggest > unmapped areas in the given address space. 
> The two biggest unmapped
> areas would be the gap between the heap and the uppermost mmap()-ed
> region, and the gap between the lowermost mmap()-ed region and the stack
> in most of the cases. Because these gaps are exceptionally huge in
> usual address spaces, excluding these will be sufficient to make a
> reasonable trade-off. Below shows this in detail::
>
>     <heap>
>     <BIG UNMAPPED REGION 1>
>     <uppermost mmap()-ed region>
>     (small mmap()-ed regions and munmap()-ed regions)
>     <lowermost mmap()-ed region>
>     <BIG UNMAPPED REGION 2>
>     <stack>
>
> Signed-off-by: SeongJae Park
> Reviewed-by: Leonard Foerster
> ---
>  include/linux/damon.h |  13 +
>  mm/damon/Kconfig      |   9 +
>  mm/damon/Makefile     |   1 +
>  mm/damon/vaddr.c      | 579 ++
>  4 files changed, 602 insertions(+)
>  create mode 100644 mm/damon/vaddr.c
>
[...]
> +
> +/*
> + * Update regions for current memory mappings
> + */
> +void damon_va_update(struct damon_ctx *ctx)
> +{
> +	struct damon_addr_range three_regions[3];
> +	struct damon_target *t;
> +
> +	damon_for_each_target(t, ctx) {
> +		if (damon_va_three_regions(t, three_regions))
> +			continue;
> +		damon_va_apply_three_regions(ctx, t, three_regions);
> +	}
> +}
> +
> +static void damon_ptep_mkold(pte_t *pte, struct mm_struct *mm,
> +			unsigned long addr)
> +{
> +	bool referenced = false;
> +	struct page *page = pte_page(*pte);

The 'pte' could be for a special mapping which has no associated 'struct
page'. In that case, 'page' would be invalid. Guoju from Alibaba found
the problem in his GPU setup and reported it via GitHub[1]. I made a
fix[2] and am waiting for his test results. I will squash the fix into
the next version of this patch.
[1] https://github.com/sjp38/linux/pull/3/commits/12eeebc6ffc8b5d2a6aba7a2ec9fb85d3c1663af
[2] https://github.com/sjp38/linux/commit/f1fa22b6375ceb9ae53e9370452de0d62efd4df5

Thanks,
SeongJae Park

> +
> +	if (pte_young(*pte)) {
> +		referenced = true;
> +		*pte = pte_mkold(*pte);
> +	}
> +
> +#ifdef CONFIG_MMU_NOTIFIER
> +	if (mmu_notifier_clear_young(mm, addr, addr + PAGE_SIZE))
> +		referenced = true;
> +#endif /* CONFIG_MMU_NOTIFIER */
> +
> +	if (referenced)
> +		set_page_young(page);
> +
> +	set_page_idle(page);
> +}
> +
[...]
> +
> +static void damon_va_mkold(struct mm_struct *mm, unsigned long addr)
> +{
> +	pte_t *pte = NULL;
> +	pmd_t *pmd = NULL;
> +	spinlock_t *ptl;
> +
> +	if (follow_invalidate_pte(mm, addr, NULL, &pte, &pmd, &ptl))
> +		return;
> +
> +	if (pte) {
> +		damon_ptep_mkold(pte, mm, addr);
> +		pte_unmap_unlock(pte, ptl);
> +	} else {
> +		damon_pmdp_mkold(pmd, mm, addr);
> +		spin_unlock(ptl);
> +	}
> +}
> +
[...]
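[Editor's note] The bug discussed above — touching a struct page for a special mapping that has none — calls for an early bail-out before the page is used. The sketch below is a user-space model of such a guard, not the actual squashed fix; all names are invented.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_page {
	bool young, idle;
};

/* Model of damon_ptep_mkold() with a guard: refuse to mark a page that
 * does not exist (special mappings have no struct page behind them). */
static int toy_mkold(struct toy_page *page, bool *pte_accessed)
{
	if (!page)
		return -EINVAL;		/* special mapping: nothing to mark */

	if (*pte_accessed) {
		*pte_accessed = false;	/* clear the Accessed bit */
		page->young = true;	/* remember that it was set */
	}
	page->idle = true;
	return 0;
}
```

The essential property is that the NULL case returns before any of the page-flag updates run, so a special mapping can never be dereferenced.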
Re: [PATCH v2] mm/shmem: Enclose shmem_mcopy_atomic_pte() with 'CONFIG_USERFAULTFD'
On Tue, 16 Mar 2021 09:16:57 -0700 Axel Rasmussen wrote: > Sorry for the build failure! I sent a new version of my patch with > this same fix on the 10th > (https://lore.kernel.org/patchwork/patch/1392464/), and I believe > Andrew has already included it in his tree. No problem at all, thank you for letting me know! :) FYI, I tested on 'master' of https://github.com/hnaz/linux-mm. Thanks, SeongJae Park [...] Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
[PATCH v2] mm/shmem: Enclose shmem_mcopy_atomic_pte() with 'CONFIG_USERFAULTFD'
From: SeongJae Park

Commit 49eeab03fa0a ("userfaultfd: support minor fault handling for
shmem") introduced shmem_mcopy_atomic_pte(). The function is declared in
'userfaultfd_k.h' when 'CONFIG_USERFAULTFD' is defined, and defined as
'BUG()' if the config is unset. However, the definition of the function
in 'shmem.c' is not protected by the '#ifdef' macro. As a result, the
build fails when the config is not set. This commit fixes the problem.

Fixes: 49eeab03fa0a ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: SeongJae Park
---
Changes from v1
- Remove unnecessary internal code review URL
---
 mm/shmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 547df2b766f7..c0d3abefeb3f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2359,6 +2359,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	return inode;
 }

+#ifdef CONFIG_USERFAULTFD
 int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr, unsigned long src_addr,
@@ -2492,6 +2493,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	shmem_inode_unacct_blocks(inode, 1);
 	goto out;
 }
+#endif

 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
--
2.17.1
[PATCH] mm/shmem: Enclose shmem_mcopy_atomic_pte() with 'CONFIG_USERFAULTFD'
From: SeongJae Park

Commit 49eeab03fa0a ("userfaultfd: support minor fault handling for
shmem") introduced shmem_mcopy_atomic_pte(). The function is declared in
'userfaultfd_k.h' when 'CONFIG_USERFAULTFD' is defined, and defined as
'BUG()' if the config is unset. However, the definition of the function
in 'shmem.c' is not protected by the '#ifdef' macro. As a result, the
build fails when the config is not set. This commit fixes the problem.

Fixes: 49eeab03fa0a ("userfaultfd: support minor fault handling for shmem")
Signed-off-by: SeongJae Park

cr https://code.amazon.com/reviews/CR-47204463
---
 mm/shmem.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/mm/shmem.c b/mm/shmem.c
index 547df2b766f7..c0d3abefeb3f 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -2359,6 +2359,7 @@ static struct inode *shmem_get_inode(struct super_block *sb, const struct inode
 	return inode;
 }

+#ifdef CONFIG_USERFAULTFD
 int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 			   struct vm_area_struct *dst_vma,
 			   unsigned long dst_addr, unsigned long src_addr,
@@ -2492,6 +2493,7 @@ int shmem_mcopy_atomic_pte(struct mm_struct *dst_mm, pmd_t *dst_pmd,
 	shmem_inode_unacct_blocks(inode, 1);
 	goto out;
 }
+#endif

 #ifdef CONFIG_TMPFS
 static const struct inode_operations shmem_symlink_inode_operations;
--
2.17.1
Re: [PATCH] docs/kokr: make sections on bug reporting match practice
ping? On Mon, 8 Mar 2021 08:57:01 +0100 SeongJae Park wrote: > From: SeongJae Park > > Translate this commit to Korean: > > cf6d6fc27936 ("docs: process/howto.rst: make sections on bug reporting > match practice") > > Signed-off-by: SeongJae Park > --- > Documentation/translations/ko_KR/howto.rst | 18 +- > 1 file changed, 9 insertions(+), 9 deletions(-) > > diff --git a/Documentation/translations/ko_KR/howto.rst > b/Documentation/translations/ko_KR/howto.rst > index 787f1e85f8a0..a2bdd564c907 100644 > --- a/Documentation/translations/ko_KR/howto.rst > +++ b/Documentation/translations/ko_KR/howto.rst > @@ -339,14 +339,8 @@ Andrew Morton의 글이 있다. > 버그 보고 > - > > -https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버그를 추적하는 > -곳이다. 사용자들은 발견한 모든 버그들을 보고하기 위하여 이 툴을 사용할 것을 > -권장한다. kernel bugzilla를 사용하는 자세한 방법은 다음을 참조하라. > - > -https://bugzilla.kernel.org/page.cgi?id=faq.html > - > 메인 커널 소스 디렉토리에 있는 'Documentation/admin-guide/reporting-issues.rst' > -파일은 커널 버그라고 생각되는 것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 > +파일은 커널 버그라고 생각되는 것을 어떻게 보고하면 되는지, 그리고 문제를 > 추적하기 위해서 커널 개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 > 있다. > > @@ -362,8 +356,14 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 > 점수를 얻을 수 있는 가장 좋은 방법중의 하나이다. 왜냐하면 많은 사람들은 > 다른 사람들의 버그들을 수정하기 위하여 시간을 낭비하지 않기 때문이다. > > -이미 보고된 버그 리포트들을 가지고 작업하기 위해서 https://bugzilla.kernel.org > -를 참조하라. > +이미 보고된 버그 리포트들을 가지고 작업하기 위해서는 여러분이 관심있는 > +서브시스템을 찾아라. 해당 서브시스템의 버그들이 어디로 리포트 되는지 > +MAINTAINERS 파일을 체크하라; 그건 대부분 메일링 리스트이고, 가끔은 버그 추적 > +시스템이다. 그 장소에 있는 최근 버그 리포트 기록들을 검색하고 여러분이 보기에 > +적합하다 싶은 것을 도와라. 여러분은 버그 리포트를 위해 > +https://bugzilla.kernel.org 를 체크하고자 할 수도 있다; 소수의 커널 > +서브시스템들만이 버그 신고와 추적을 위해 해당 시스템을 실제로 사용하고 있지만, > +전체 커널의 버그들이 그곳에 정리된다. > > > 메일링 리스트들 > -- > 2.17.1 Amazon Development Center Germany GmbH Krausenstr. 38 10117 Berlin Geschaeftsfuehrung: Christian Schlaeger, Jonathan Weiss Eingetragen am Amtsgericht Charlottenburg unter HRB 149173 B Sitz: Berlin Ust-ID: DE 289 237 879
[PATCH] docs/kokr: make sections on bug reporting match practice
From: SeongJae Park Translate this commit to Korean: cf6d6fc27936 ("docs: process/howto.rst: make sections on bug reporting match practice") Signed-off-by: SeongJae Park --- Documentation/translations/ko_KR/howto.rst | 18 +- 1 file changed, 9 insertions(+), 9 deletions(-) diff --git a/Documentation/translations/ko_KR/howto.rst b/Documentation/translations/ko_KR/howto.rst index 787f1e85f8a0..a2bdd564c907 100644 --- a/Documentation/translations/ko_KR/howto.rst +++ b/Documentation/translations/ko_KR/howto.rst @@ -339,14 +339,8 @@ Andrew Morton의 글이 있다. 버그 보고 - -https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버그를 추적하는 -곳이다. 사용자들은 발견한 모든 버그들을 보고하기 위하여 이 툴을 사용할 것을 -권장한다. kernel bugzilla를 사용하는 자세한 방법은 다음을 참조하라. - -https://bugzilla.kernel.org/page.cgi?id=faq.html - 메인 커널 소스 디렉토리에 있는 'Documentation/admin-guide/reporting-issues.rst' -파일은 커널 버그라고 생각되는 것을 보고하는 방법에 관한 좋은 템플릿이며 문제를 +파일은 커널 버그라고 생각되는 것을 어떻게 보고하면 되는지, 그리고 문제를 추적하기 위해서 커널 개발자들이 필요로 하는 정보가 무엇들인지를 상세히 설명하고 있다. @@ -362,8 +356,14 @@ https://bugzilla.kernel.org 는 리눅스 커널 개발자들이 커널의 버 점수를 얻을 수 있는 가장 좋은 방법중의 하나이다. 왜냐하면 많은 사람들은 다른 사람들의 버그들을 수정하기 위하여 시간을 낭비하지 않기 때문이다. -이미 보고된 버그 리포트들을 가지고 작업하기 위해서 https://bugzilla.kernel.org -를 참조하라. +이미 보고된 버그 리포트들을 가지고 작업하기 위해서는 여러분이 관심있는 +서브시스템을 찾아라. 해당 서브시스템의 버그들이 어디로 리포트 되는지 +MAINTAINERS 파일을 체크하라; 그건 대부분 메일링 리스트이고, 가끔은 버그 추적 +시스템이다. 그 장소에 있는 최근 버그 리포트 기록들을 검색하고 여러분이 보기에 +적합하다 싶은 것을 도와라. 여러분은 버그 리포트를 위해 +https://bugzilla.kernel.org 를 체크하고자 할 수도 있다; 소수의 커널 +서브시스템들만이 버그 신고와 추적을 위해 해당 시스템을 실제로 사용하고 있지만, +전체 커널의 버그들이 그곳에 정리된다. 메일링 리스트들 -- 2.17.1
Re: [PATCH v24 00/14] Subject: Introduce Data Access MONitor (DAMON)
On Thu, 4 Feb 2021 16:31:36 +0100 SeongJae Park wrote: > From: SeongJae Park [...] > > Introduction > > > DAMON is a data access monitoring framework for the Linux kernel. The core > mechanisms of DAMON called 'region based sampling' and 'adaptive regions > adjustment' (refer to 'mechanisms.rst' in the 11th patch of this patchset for > the detail) make it > > - accurate (The monitored information is useful for DRAM level memory >management. It might not be appropriate for cache-level accuracy, though.), > - light-weight (The monitoring overhead is low enough to be applied online >while making no impact on the performance of the target workloads.), and > - scalable (the upper-bound of the instrumentation overhead is controllable >regardless of the size of target workloads.). > > Using this framework, therefore, several memory management mechanisms such as > reclamation and THP can be optimized to be aware of real data access patterns. > Experimental access-pattern-aware memory management optimization works that > previously incurred high instrumentation overhead will be able to have another try. > > Though DAMON is for kernel subsystems, it can be easily exposed to the user > space by writing a DAMON-wrapper kernel subsystem. Then, user space users who > have some special workloads will be able to write personalized tools or > applications for deeper understanding and specialized optimizations of their > systems. > [...] > > Baseline and Complete Git Trees > === > > The patches are based on v5.10. You can also clone the complete git > tree: > > $ git clone git://github.com/sjp38/linux -b damon/patches/v24 > > The web is also available: > https://github.com/sjp38/linux/releases/tag/damon/patches/v24 > > There are a couple of trees for the entire DAMON patchset series. They include > future features. The first one[1] contains the changes for the latest release, > while the other one[2] contains the changes for the next release.
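To make the 'region based sampling' idea above concrete, here is a toy Python sketch. It is illustrative only, not DAMON's kernel implementation: one random page per region is checked per sampling interval, and the result is taken as representative of the whole region. All names and numbers are made up.

```python
import random

def sample_regions(regions, accessed_pages, nr_intervals, rng, page=4096):
    """Toy region-based sampling: check one random page per region per
    interval instead of every page, and count the hits as nr_accesses."""
    counts = [0] * len(regions)
    for _ in range(nr_intervals):
        for i, (start, end) in enumerate(regions):
            addr = rng.randrange(start, end)
            if addr // page * page in accessed_pages:
                counts[i] += 1
    return counts

# Region 0 is entirely hot and region 1 entirely cold, so the sampled
# counts separate them no matter which pages happen to get picked.
regions = [(0, 16 * 4096), (16 * 4096, 32 * 4096)]
accessed_pages = {p * 4096 for p in range(16)}
print(sample_regions(regions, accessed_pages, 20, random.Random(42)))  # [20, 0]
```

The cost per interval is proportional to the number of regions, not the size of the address space, which is the point of the "scalable" claim above.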
> > [1] https://github.com/sjp38/linux/tree/damon/master > [2] https://github.com/sjp38/linux/tree/damon/next For people who prefer LTS kernels, I decided to maintain two more trees that are respectively based on the latest two LTS kernels and contain backports of the latest 'damon/master' tree, as below. Please use those if you want to test DAMON on an LTS kernel. - For v5.4.y: https://github.com/sjp38/linux/tree/damon/for-v5.4.y - For v5.10.y: https://github.com/sjp38/linux/tree/damon/for-v5.10.y Thanks, SeongJae Park
[PATCH v24 11/14] Documentation: Add documents for DAMON
From: SeongJae Park This commit adds documents for DAMON under `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. Signed-off-by: SeongJae Park --- Documentation/admin-guide/mm/damon/guide.rst | 159 ++ Documentation/admin-guide/mm/damon/index.rst | 15 + Documentation/admin-guide/mm/damon/plans.rst | 29 ++ Documentation/admin-guide/mm/damon/start.rst | 97 ++ Documentation/admin-guide/mm/damon/usage.rst | 304 +++ Documentation/admin-guide/mm/index.rst | 1 + Documentation/vm/damon/api.rst | 20 ++ Documentation/vm/damon/design.rst| 166 ++ Documentation/vm/damon/eval.rst | 232 ++ Documentation/vm/damon/faq.rst | 58 Documentation/vm/damon/index.rst | 31 ++ Documentation/vm/index.rst | 1 + 12 files changed, 1113 insertions(+) create mode 100644 Documentation/admin-guide/mm/damon/guide.rst create mode 100644 Documentation/admin-guide/mm/damon/index.rst create mode 100644 Documentation/admin-guide/mm/damon/plans.rst create mode 100644 Documentation/admin-guide/mm/damon/start.rst create mode 100644 Documentation/admin-guide/mm/damon/usage.rst create mode 100644 Documentation/vm/damon/api.rst create mode 100644 Documentation/vm/damon/design.rst create mode 100644 Documentation/vm/damon/eval.rst create mode 100644 Documentation/vm/damon/faq.rst create mode 100644 Documentation/vm/damon/index.rst diff --git a/Documentation/admin-guide/mm/damon/guide.rst b/Documentation/admin-guide/mm/damon/guide.rst new file mode 100644 index ..49da40bc4ba9 --- /dev/null +++ b/Documentation/admin-guide/mm/damon/guide.rst @@ -0,0 +1,159 @@ +.. SPDX-License-Identifier: GPL-2.0 + +== +Optimization Guide +== + +This document helps you estimate the amount of benefit that you could get +from DAMON-based optimizations, and describes how you could achieve it. You +are assumed to have already read :doc:`start`. + + +Check The Signs +=== + +No optimization can provide the same extent of benefit to every case. Therefore +you should first guess how much improvement you could get using DAMON.
If +some of the below conditions match your situation, you could consider using DAMON. + +- *Low IPC and High Cache Miss Ratios.* Low IPC means most of the CPU time is + spent waiting for the completion of time-consuming operations such as memory + access, while high cache miss ratios mean the caches don't help it well. + DAMON is not for cache level optimization, but DRAM level. However, + improving DRAM management will also help this case by reducing the memory + operation latency. +- *Memory Over-commitment and Unknown Users.* If you are doing memory + overcommitment and you cannot control every user of your system, a memory + bank run could happen at any time. You can estimate when it will happen + based on DAMON's monitoring results and act earlier to avoid or deal better + with the crisis. +- *Frequent Memory Pressure.* Frequent memory pressure means your system has + wrong configurations or memory hogs. DAMON will help you find the right + configuration and/or the criminals. +- *Heterogeneous Memory System.* If your system is utilizing memory devices + that are placed between DRAM and traditional hard disks, such as non-volatile + memory or fast SSDs, DAMON could help you utilize the devices more + efficiently. + + +Profile +=== + +If you found some positive signals, you could start by profiling your workloads +using DAMON. Find major workloads on your systems and analyze their data +access patterns to find something wrong or something that can be improved. The DAMON user +space tool (``damo``) will be useful for this. You can get ``damo`` from +the ``tools/damon/`` directory in the DAMON development tree (``damon/master`` +branch of https://github.com/sjp38/linux.git). + +We recommend starting with a working set size distribution check using ``damo +report wss``. If the distribution is non-uniform or quite different from what +you estimated, you could consider `Memory Configuration`_ optimization.
+ +Then, review the overall access pattern in heatmap form using ``damo report +heats``. If it shows a simple pattern consisting of a small number of memory +regions having a high contrast of access temperature, you could consider manual +`Program Modification`_. + +If you still want to absorb more benefits, you could develop a `Personalized +DAMON Application`_ for your special case. + +You don't need to take only one approach among the above plans; you could +use multiple of the above approaches to maximize the benefit. + + +Optimize + + +If the profiling result also says it's worth trying some optimization, you +could consider the below approaches. Note that some of the below approaches assume +that your systems are configured with swap devices or other types of auxiliary +memory so that you don't
[PATCH v24 10/14] mm/damon/dbgfs: Support multiple contexts
From: SeongJae Park In some use cases, users would want to run multiple monitoring contexts. For example, if a user wants a high precision monitoring and dedicating multiple CPUs for the job is ok, because DAMON creates one monitoring thread per context, the user can split the monitoring target regions into multiple small regions and create one context for each region. Or, someone might want to simultaneously monitor different address spaces, e.g., both virtual address space and physical address space. DAMON's API allows such usage, but 'damon-dbgfs' does not. Therefore, only kernel space DAMON users can do multiple-context monitoring. This commit allows the user space DAMON users to do multiple-context monitoring by introducing two new 'damon-dbgfs' debugfs files, 'mk_context' and 'rm_context'. Users can create a new monitoring context by writing the desired name of the new context to 'mk_context'. Then, a new directory with the name and having the files for setting of the context ('attrs', 'target_ids' and 'record') will be created under the debugfs directory. Writing the name of the context to remove to 'rm_context' will remove the related context and directory. Signed-off-by: SeongJae Park --- mm/damon/dbgfs.c | 215 ++- 1 file changed, 212 insertions(+), 3 deletions(-) diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c index 4b9ac2043e99..68edfd4d3b41 100644 --- a/mm/damon/dbgfs.c +++ b/mm/damon/dbgfs.c @@ -29,6 +29,7 @@ struct dbgfs_recorder { static struct damon_ctx **dbgfs_ctxs; static int dbgfs_nr_ctxs; static struct dentry **dbgfs_dirs; +static DEFINE_MUTEX(damon_dbgfs_lock); /* * Returns non-empty string on success, negative error code otherwise.
@@ -495,6 +496,13 @@ static void dbgfs_write_record_header(struct damon_ctx *ctx) dbgfs_write_rbuf(ctx, &recfmt_ver, sizeof(recfmt_ver)); } +static void dbgfs_free_recorder(struct dbgfs_recorder *recorder) +{ + kfree(recorder->rbuf); + kfree(recorder->rfile_path); + kfree(recorder); +} + static unsigned int nr_damon_targets(struct damon_ctx *ctx) { struct damon_target *t; @@ -561,7 +569,7 @@ static struct damon_ctx *dbgfs_new_ctx(void) { struct damon_ctx *ctx; - ctx = damon_new_ctx(DAMON_ADAPTIVE_TARGET); + ctx = damon_new_ctx(); if (!ctx) return NULL; @@ -577,6 +585,195 @@ static struct damon_ctx *dbgfs_new_ctx(void) return ctx; } +static void dbgfs_destroy_ctx(struct damon_ctx *ctx) +{ + dbgfs_free_recorder(ctx->callback.private); + damon_destroy_ctx(ctx); +} + +/* + * Make a context of @name and create a debugfs directory for it. + * + * This function should be called while holding damon_dbgfs_lock. + * + * Returns 0 on success, negative error code otherwise. + */ +static int dbgfs_mk_context(char *name) +{ + struct dentry *root, **new_dirs, *new_dir; + struct damon_ctx **new_ctxs, *new_ctx; + int err; + + if (damon_nr_running_ctxs()) + return -EBUSY; + + new_ctxs = krealloc(dbgfs_ctxs, sizeof(*dbgfs_ctxs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_ctxs) + return -ENOMEM; + + new_dirs = krealloc(dbgfs_dirs, sizeof(*dbgfs_dirs) * + (dbgfs_nr_ctxs + 1), GFP_KERNEL); + if (!new_dirs) { + kfree(new_ctxs); + return -ENOMEM; + } + + dbgfs_ctxs = new_ctxs; + dbgfs_dirs = new_dirs; + + root = dbgfs_dirs[0]; + if (!root) + return -ENOENT; + + new_dir = debugfs_create_dir(name, root); + if (IS_ERR(new_dir)) + return PTR_ERR(new_dir); + dbgfs_dirs[dbgfs_nr_ctxs] = new_dir; + + new_ctx = dbgfs_new_ctx(); + if (!new_ctx) { + debugfs_remove(new_dir); + dbgfs_dirs[dbgfs_nr_ctxs] = NULL; + return -ENOMEM; + } + dbgfs_ctxs[dbgfs_nr_ctxs] = new_ctx; + + err = dbgfs_fill_ctx_dir(dbgfs_dirs[dbgfs_nr_ctxs], + dbgfs_ctxs[dbgfs_nr_ctxs]); + if (err) + return err; + + dbgfs_nr_ctxs++; + 
return 0; +} + +static ssize_t dbgfs_mk_context_write(struct file *file, + const char __user *buf, size_t count, loff_t *ppos) +{ + char *kbuf; + char *ctx_name; + ssize_t ret = count; + int err; + + kbuf = user_input_str(buf, count, ppos); + if (IS_ERR(kbuf)) + return PTR_ERR(kbuf); + ctx_name = kmalloc(count + 1, GFP_KERNEL); + if (!ctx_name) { + kfree(kbuf); + return -ENOMEM; + } + + /* Trim white space */ + if (sscanf(kbuf, "%s", ctx_name) != 1) { + ret = -EINVAL; + goto out; + } + + mutex_lock(&damon_dbgfs_lock); + err = dbgfs_mk_context(ctx_name); + if (err) +
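For readers skimming the patch, the bookkeeping rule it enforces can be sketched in a few lines of Python. This is an illustrative model only, not kernel code: the class name and dict-based storage are made up, but the checks mirror the patch's `damon_nr_running_ctxs()` guard and error codes.

```python
import errno

class DbgfsCtxRegistry:
    """Toy analogue of damon-dbgfs' mk_context/rm_context bookkeeping:
    contexts may only be created or removed while no monitoring is
    running, mirroring the damon_nr_running_ctxs() check in the patch."""
    def __init__(self):
        self.ctxs = {"root": object()}   # dbgfs_ctxs[0] analogue
        self.nr_running = 0

    def mk_context(self, name):
        if self.nr_running:
            return -errno.EBUSY          # no changes while monitoring runs
        if name in self.ctxs:
            return -errno.EEXIST
        self.ctxs[name] = object()       # krealloc + debugfs_create_dir analogue
        return 0

    def rm_context(self, name):
        if self.nr_running:
            return -errno.EBUSY
        if name == "root" or name not in self.ctxs:
            return -errno.ENOENT
        del self.ctxs[name]              # debugfs_remove + dbgfs_destroy_ctx analogue
        return 0

reg = DbgfsCtxRegistry()
assert reg.mk_context("ctx1") == 0
reg.nr_running = 1                       # monitoring started
print(reg.mk_context("ctx2"))            # refused with -EBUSY
```

The point of the guard is the same in both worlds: resizing the context array while a monitoring thread may be iterating over it would race, so mutation is simply refused while anything runs.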
Re: Please apply "xen-netback: delete NAPI instance when queue fails to initialize" to v4.4.y
On Wed, 24 Feb 2021 18:21:09 +0100 Greg KH wrote: > On Wed, Feb 24, 2021 at 06:03:56PM +0100, SeongJae Park wrote: > > This is a request for merge of upstream commit 4a658527271b ("xen-netback: > > delete NAPI instance when queue fails to initialize") on v4.4.y tree. > > > > If 'xenvif_connect()' fails after successful 'netif_napi_add()', the napi is > > not cleaned up. Because 'create_queues()' frees the queues in its error > > handling code, if the 'xenvif_free()' is called for the vif, use-after-free > > occurs. The upstream commit fixes the problem by cleaning up the napi in the > > 'xenvif_connect()'. > > > > Attaching the original patch below for your convenience. > > The original patch does not apply cleanly. I tested the commit is cleanly applicable with 'git cherry-pick' before posting this. I just tried 'git format-patch ... && git am ...' and confirmed it doesn't work. Sorry, my fault. > > > Tested-by: Markus Boehme > > What was tested? We confirmed the unmodified v4.4.y kernel crashes on a stress test that repeatedly doing netdev attach/detach, while the patch applied version doesn't. > > I backported the patch, but next time, please provide the patch that > will work properly. Thanks, and apology for the inconvenience. I will do the check with posting patch again rather than only 'git cherry-pick' from next time. Thanks, SeongJae Park > > greg k-h
Please apply "xen-netback: delete NAPI instance when queue fails to initialize" to v4.4.y
This is a request for merge of upstream commit 4a658527271b ("xen-netback: delete NAPI instance when queue fails to initialize") on v4.4.y tree. If 'xenvif_connect()' fails after successful 'netif_napi_add()', the napi is not cleaned up. Because 'create_queues()' frees the queues in its error handling code, if the 'xenvif_free()' is called for the vif, use-after-free occurs. The upstream commit fixes the problem by cleaning up the napi in the 'xenvif_connect()'. Attaching the original patch below for your convenience. Tested-by: Markus Boehme Thanks, SeongJae Park >8 === >From 4a658527271bce43afb1cf4feec89afe6716ca59 Mon Sep 17 00:00:00 2001 From: David Vrabel Date: Fri, 15 Jan 2016 14:55:35 + Subject: [PATCH] xen-netback: delete NAPI instance when queue fails to initialize When xenvif_connect() fails it may leave a stale NAPI instance added to the device. Make sure we delete it in the error path. Signed-off-by: David Vrabel Signed-off-by: David S. Miller --- drivers/net/xen-netback/interface.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/xen-netback/interface.c b/drivers/net/xen-netback/interface.c index e7bd63eb2876..3bba6ceee132 100644 --- a/drivers/net/xen-netback/interface.c +++ b/drivers/net/xen-netback/interface.c @@ -615,6 +615,7 @@ int xenvif_connect(struct xenvif_queue *queue, unsigned long tx_ring_ref, queue->tx_irq = 0; err_unmap: xenvif_unmap_frontend_rings(queue); + netif_napi_del(&queue->napi); err: module_put(THIS_MODULE); return err; -- 2.17.1
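The bug class behind this one-liner is general enough to be worth a sketch. Below is an illustrative Python model (not the driver code): resources acquired before a failure must be unwound in reverse order on the error path, or later teardown operates on stale state, which is exactly how the stale NAPI instance survived `xenvif_connect()` failing.

```python
def connect(steps):
    """Toy unwind-on-error pattern: each step is (name, succeeds). On the
    first failure, undo everything acquired so far, newest first, like
    the netif_napi_del() added to xenvif_connect()'s error path."""
    acquired = []
    for name, ok in steps:
        if not ok:
            for undo in reversed(acquired):
                print("undo", undo)
            return -1
        acquired.append(name)
    return 0

# napi_add succeeds, then binding the tx irq fails: the napi instance
# must be undone here, otherwise freeing the queue later touches it.
print("ret", connect([("napi_add", True), ("bind_tx_irq", False)]))
```

The upstream fix is precisely the "undo" step that was missing for `netif_napi_add()`.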
Re: [PATCH v24 00/14] Subject: Introduce Data Access MONitor (DAMON)
On Thu, 4 Feb 2021 16:31:36 +0100 SeongJae Park wrote: > From: SeongJae Park > [...] > > Introduction > > > DAMON is a data access monitoring framework for the Linux kernel. The core > mechanisms of DAMON called 'region based sampling' and 'adaptive regions > adjustment' (refer to 'mechanisms.rst' in the 11th patch of this patchset for > the detail) make it > > - accurate (The monitored information is useful for DRAM level memory >management. It might not be appropriate for cache-level accuracy, though.), > - light-weight (The monitoring overhead is low enough to be applied online >while making no impact on the performance of the target workloads.), and > - scalable (the upper-bound of the instrumentation overhead is controllable >regardless of the size of target workloads.). > > Using this framework, therefore, several memory management mechanisms such as > reclamation and THP can be optimized to be aware of real data access patterns. > Experimental access-pattern-aware memory management optimization works that > previously incurred high instrumentation overhead will be able to have another try. > > Though DAMON is for kernel subsystems, it can be easily exposed to the user > space by writing a DAMON-wrapper kernel subsystem. Then, user space users who > have some special workloads will be able to write personalized tools or > applications for deeper understanding and specialized optimizations of their > systems. > I realized I didn't introduce a good, intuitive example use case of DAMON for profiling so far, though DAMON is not only for profiling. One straightforward and realistic usage of DAMON as a profiling tool would be recording the monitoring results with the callstack and visualizing them by timeline together. For example, the link below shows such a visualization for a realistic workload, namely 'fft' in the SPLASH-2X benchmark suite.
From that, you can see there are three memory access bursting phases in the workload and 'FFT1DOnce.cons::prop.2()' looks responsible for the first and second hot phase, while 'Transpose()' is responsible for the last one. Now the programmer can take a deep look in the functions and optimize the code (e.g., adding madvise() or mlock() calls). https://damonitor.github.io/temporal/damon_callstack.png We used the approach for 'mlock()'-based optimization of a range of other realistic benchmark workloads. The optimized versions achieved up to about 2.5x performance improvement under memory pressure[1]. Note: I made the uppermost two figures in the above 'fft' visualization (working set size and access frequency of each memory region by time) via the DAMON user space tool[2], while the lowermost one (callstack by time) is made using perf and speedscope[3]. We have no decent and totally automated tool for that yet (will be implemented soon, maybe under perf as a perf-script[4]), but you could reproduce that with the below commands. $ # run the workload $ sudo damo record $(pidof ) & $ sudo perf record -g $(pidof ) $ # after your workload finished (you should also finish perf on your own) $ damo report wss --sortby time --plot wss.pdf $ damo report heats --heatmap freq.pdf $ sudo perf script | speedscope - $ # open wss.pdf and freq.pdf with your favorite pdf viewer [1] https://linuxplumbersconf.org/event/4/contributions/548/attachments/311/590/damon_ksummit19.pdf [2] https://lore.kernel.org/linux-mm/20201215115448.25633-8-sjp...@amazon.com/ [3] https://www.speedscope.app/ [4] https://lore.kernel.org/linux-mm/20210107120729.22328-1-sjp...@amazon.com/
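As a rough sketch of what `damo report wss` computes per aggregation snapshot, here is a toy Python version. It is illustrative only: regions are `(start, end, nr_accesses)` tuples and all numbers below are made up, but the idea — sum the sizes of regions that were observed as accessed — is the same.

```python
def wss(snapshot, threshold=1):
    """Working set size of one aggregation snapshot: total size of the
    regions whose nr_accesses meets the threshold."""
    return sum(end - start for start, end, nr_accesses in snapshot
               if nr_accesses >= threshold)

snapshots = [
    [(0, 4096, 0), (4096, 40960, 7)],   # a bursting phase
    [(0, 4096, 0), (4096, 40960, 0)],   # an idle phase
]
print([wss(s) for s in snapshots])      # the WSS collapses in the idle phase
```

Plotting that per-snapshot number over time is what exposes the bursting phases visible in the 'fft' figure above.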
Re: [PATCH v24 11/14] Documentation: Add documents for DAMON
On Thu, 4 Feb 2021 16:31:47 +0100 SeongJae Park wrote: > From: SeongJae Park > > This commit adds documents for DAMON under > `Documentation/admin-guide/mm/damon/` and `Documentation/vm/damon/`. > > Signed-off-by: SeongJae Park > --- > Documentation/admin-guide/mm/damon/guide.rst | 159 ++ > Documentation/admin-guide/mm/damon/index.rst | 15 + > Documentation/admin-guide/mm/damon/plans.rst | 29 ++ > Documentation/admin-guide/mm/damon/start.rst | 97 ++ > Documentation/admin-guide/mm/damon/usage.rst | 304 +++ > Documentation/admin-guide/mm/index.rst | 1 + > Documentation/vm/damon/api.rst | 20 ++ > Documentation/vm/damon/design.rst| 166 ++ > Documentation/vm/damon/eval.rst | 232 ++ > Documentation/vm/damon/faq.rst | 58 > Documentation/vm/damon/index.rst | 31 ++ > Documentation/vm/index.rst | 1 + > 12 files changed, 1113 insertions(+) > create mode 100644 Documentation/admin-guide/mm/damon/guide.rst > create mode 100644 Documentation/admin-guide/mm/damon/index.rst > create mode 100644 Documentation/admin-guide/mm/damon/plans.rst > create mode 100644 Documentation/admin-guide/mm/damon/start.rst > create mode 100644 Documentation/admin-guide/mm/damon/usage.rst > create mode 100644 Documentation/vm/damon/api.rst > create mode 100644 Documentation/vm/damon/design.rst > create mode 100644 Documentation/vm/damon/eval.rst > create mode 100644 Documentation/vm/damon/faq.rst > create mode 100644 Documentation/vm/damon/index.rst > [...] > diff --git a/Documentation/admin-guide/mm/damon/usage.rst > b/Documentation/admin-guide/mm/damon/usage.rst > new file mode 100644 > index ..32436cf853c7 > --- /dev/null > +++ b/Documentation/admin-guide/mm/damon/usage.rst > @@ -0,0 +1,304 @@ > +.. SPDX-License-Identifier: GPL-2.0 > + > +=== > +Detailed Usages > +=== > + > +DAMON provides below three interfaces for different users. > + > +- *DAMON user space tool.* > + This is for privileged people such as system administrators who want a > + just-working human-friendly interface. 
Using this, users can use the > DAMON’s > + major features in a human-friendly way. It may not be highly tuned for > + special cases, though. It supports only virtual address spaces monitoring. > +- *debugfs interface.* > + This is for privileged user space programmers who want more optimized use > of > + DAMON. Using this, users can use DAMON’s major features by reading > + from and writing to special debugfs files. Therefore, you can write and > use > + your personalized DAMON debugfs wrapper programs that reads/writes the > + debugfs files instead of you. The DAMON user space tool is also a > reference > + implementation of such programs. It supports only virtual address spaces > + monitoring. > +- *Kernel Space Programming Interface.* > + This is for kernel space programmers. Using this, users can utilize every > + feature of DAMON most flexibly and efficiently by writing kernel space > + DAMON application programs for you. You can even extend DAMON for various > + address spaces. > + > +This document does not describe the kernel space programming interface in > +detail. For that, please refer to the :doc:`/vm/damon/api`. > + > + > +DAMON User Space Tool > += This version of the patchset doesn't introduce the user space tool source code, so putting the detailed usage here might make no sense. I will remove this section in the next version. If you will review this patch, please skip this section. [...] > + > +debugfs Interface > += But, this section will not be removed. Please review. [...] Thanks, SeongJae Park
Re: [PATCH v24 07/14] mm/damon: Implement a debugfs-based user space interface
On Fri, 5 Feb 2021 16:29:41 +0100 Greg KH wrote: > On Thu, Feb 04, 2021 at 04:31:43PM +0100, SeongJae Park wrote: > > From: SeongJae Park > > > > DAMON is designed to be used by kernel space code such as the memory > > management subsystems, and therefore it provides only kernel space API. > > That said, letting the user space control DAMON could provide some > > benefits to them. For example, it will allow user space to analyze > > their specific workloads and make their own special optimizations. > > > > For such cases, this commit implements a simple DAMON application kernel > > module, namely 'damon-dbgfs', which merely wraps the DAMON api and > > exports those to the user space via the debugfs. > > > > 'damon-dbgfs' exports three files, ``attrs``, ``target_ids``, and > > ``monitor_on`` under its debugfs directory, ``/damon/``. [...] > > --- > > include/linux/damon.h | 3 + > > mm/damon/Kconfig | 9 + > > mm/damon/Makefile | 1 + > > mm/damon/core.c | 47 + > > mm/damon/dbgfs.c | 387 ++ > > 5 files changed, 447 insertions(+) > > create mode 100644 mm/damon/dbgfs.c [...] > > diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c > > new file mode 100644 > > index ..db15380737d1 > > --- /dev/null > > +++ b/mm/damon/dbgfs.c [...] > > + > > +static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx) > > +{ > > + const char * const file_names[] = {"attrs", "target_ids"}; > > + const struct file_operations *fops[] = {_fops, _ids_fops}; > > + int i; > > + > > + for (i = 0; i < ARRAY_SIZE(file_names); i++) { > > + if (!debugfs_create_file(file_names[i], 0600, dir, > > + ctx, fops[i])) { > > + pr_err("failed to create %s file\n", file_names[i]); > > + return -ENOMEM; > > No need to check the return value of this function, just keep going and > ignore it as there's nothing to do and kernel code should not do > different things based on the output of any debugfs calls. > > Also, this check is totally wrong and doesn't do what you think it is > doing... 
Ok, I will drop the check. > > > +static int __init __damon_dbgfs_init(void) > > +{ > > + struct dentry *dbgfs_root; > > + const char * const file_names[] = {"monitor_on"}; > > + const struct file_operations *fops[] = {_on_fops}; > > + int i; > > + > > + dbgfs_root = debugfs_create_dir("damon", NULL); > > + if (IS_ERR(dbgfs_root)) { > > + pr_err("failed to create the dbgfs dir\n"); > > + return PTR_ERR(dbgfs_root); > > Again, no need to check anything, just pass the result of a debugfs call > back into another one just fine. Ok. > > > + } > > + > > + for (i = 0; i < ARRAY_SIZE(file_names); i++) { > > + if (!debugfs_create_file(file_names[i], 0600, dbgfs_root, > > + NULL, fops[i])) { > > Again, this isn't checking what you think it is, so please don't do it. Got it. I will fix those as you suggested in the next version. Thanks, SeongJae Park > > thanks, > > greg k-h
[PATCH v24 05/14] mm/damon: Implement primitives for the virtual memory address spaces
From: SeongJae Park This commit introduces a reference implementation of the address space specific low level primitives for the virtual address space, so that users of DAMON can easily monitor the data accesses on virtual address spaces of specific processes by simply configuring the implementation to be used by DAMON. The low level primitives for the fundamental access monitoring are defined in two parts: 1. Identification of the monitoring target address range for the address space. 2. Access check of specific address range in the target space. The reference implementation for the virtual address space does the work as below. PTE Accessed-bit Based Access Check --- The implementation uses the PTE Accessed-bit for basic access checks. That is, it clears the bit for the next sampling target page and checks whether it is set again after one sampling period. This could disturb the reclaim logic. DAMON uses ``PG_idle`` and ``PG_young`` page flags to solve the conflict, as Idle page tracking does. VMA-based Target Address Range Construction --- Only small parts in the super-huge virtual address space of the processes are mapped to physical memory and accessed. Thus, tracking the unmapped address regions is just wasteful. However, because DAMON can deal with some level of noise using the adaptive regions adjustment mechanism, tracking every mapping is not strictly required but could even incur a high overhead in some cases. That said, too huge unmapped areas inside the monitoring target should be removed so that the adaptive mechanism does not waste time on them. For this reason, this implementation converts the complex mappings to three distinct regions that cover every mapped area of the address space. Also, the two gaps between the three regions are the two biggest unmapped areas in the given address space.
The two biggest unmapped areas would be the gap between the heap and the uppermost mmap()-ed region, and the gap between the lowermost mmap()-ed region and the stack in most of the cases. Because these gaps are exceptionally huge in usual address spaces, excluding these will be sufficient to make a reasonable trade-off. Below shows this in detail:: (small mmap()-ed regions and munmap()-ed regions) Signed-off-by: SeongJae Park Reviewed-by: Leonard Foerster --- include/linux/damon.h | 13 + mm/damon/Kconfig | 9 + mm/damon/Makefile | 1 + mm/damon/vaddr.c | 579 ++ 4 files changed, 602 insertions(+) create mode 100644 mm/damon/vaddr.c diff --git a/include/linux/damon.h b/include/linux/damon.h index 0bd5d6913a6c..72cf5ebd35fe 100644 --- a/include/linux/damon.h +++ b/include/linux/damon.h @@ -246,4 +246,17 @@ int damon_stop(struct damon_ctx **ctxs, int nr_ctxs); #endif /* CONFIG_DAMON */ +#ifdef CONFIG_DAMON_VADDR + +/* Monitoring primitives for virtual memory address spaces */ +void damon_va_init(struct damon_ctx *ctx); +void damon_va_update(struct damon_ctx *ctx); +void damon_va_prepare_access_checks(struct damon_ctx *ctx); +unsigned int damon_va_check_accesses(struct damon_ctx *ctx); +bool damon_va_target_valid(void *t); +void damon_va_cleanup(struct damon_ctx *ctx); +void damon_va_set_primitives(struct damon_ctx *ctx); + +#endif /* CONFIG_DAMON_VADDR */ + #endif /* _DAMON_H */ diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index d00e99ac1a15..8ae080c52950 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,4 +12,13 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_VADDR + bool "Data access monitoring primitives for virtual address spaces" + depends on DAMON && MMU + select PAGE_EXTENSION if !64BIT + select PAGE_IDLE_FLAG + help + This builds the default data access monitoring primitives for DAMON + that works for virtual address spaces. 
+ endmenu diff --git a/mm/damon/Makefile b/mm/damon/Makefile index 4fd2edb4becf..6ebbd08aed67 100644 --- a/mm/damon/Makefile +++ b/mm/damon/Makefile @@ -1,3 +1,4 @@ # SPDX-License-Identifier: GPL-2.0 obj-$(CONFIG_DAMON):= core.o +obj-$(CONFIG_DAMON_VADDR) += vaddr.o diff --git a/mm/damon/vaddr.c b/mm/damon/vaddr.c new file mode 100644 index ..a6bf234daae6 --- /dev/null +++ b/mm/damon/vaddr.c @@ -0,0 +1,579 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * DAMON Primitives for Virtual Address Spaces + * + * Author: SeongJae Park + */ + +#define pr_fmt(fmt) "damon-va: " fmt + +#include +#include +#include +#include +#include +#include +#include + +/* Get a random number in [l, r) */ +#define damon_rand(l, r) (l + prandom_u32_max(r - l)) + +/* + * 't->id' should be the pointer to the relevant 'struct pid' having reference + * count. Caller must put the returned task, u
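The VMA-based three-region construction described in this patch's changelog is easy to model in user space. Below is a toy Python sketch (not the kernel's `vaddr.c` code, which also handles edge cases this omits): given at least three sorted mapped ranges, keep the two biggest gaps and merge everything else into three covering regions.

```python
def three_regions(mappings):
    """Build DAMON-style initial regions: drop the two biggest unmapped
    gaps and return three regions covering every mapped area.
    mappings: sorted, non-overlapping (start, end) pairs, len >= 3."""
    gaps = []
    for (s0, e0), (s1, e1) in zip(mappings, mappings[1:]):
        gaps.append((s1 - e0, e0, s1))       # (size, gap_start, gap_end)
    # pick the two biggest gaps, then put them back in address order
    big = sorted(sorted(gaps, reverse=True)[:2], key=lambda g: g[1])
    regions, start = [], mappings[0][0]
    for _, gap_start, gap_end in big:
        regions.append((start, gap_start))
        start = gap_end
    regions.append((start, mappings[-1][1]))
    return regions

# heap | small mmap()-ed areas | stack, with two huge gaps around them
maps = [(0x1000, 0x5000), (0x100000, 0x101000), (0x102000, 0x103000),
        (0x800000, 0x801000)]
print(three_regions(maps))
```

The small gap between the two mmap()-ed areas gets absorbed into the middle region, while the two exceptionally large gaps (heap-to-mmap and mmap-to-stack in the changelog's description) stay excluded.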
[PATCH v24 14/14] MAINTAINERS: Update for DAMON
From: SeongJae Park This commit updates MAINTAINERS file for DAMON related files. Signed-off-by: SeongJae Park --- MAINTAINERS | 12 1 file changed, 12 insertions(+) diff --git a/MAINTAINERS b/MAINTAINERS index 281de213ef47..88b2125b0f07 100644 --- a/MAINTAINERS +++ b/MAINTAINERS @@ -4872,6 +4872,18 @@ F: net/ax25/ax25_out.c F: net/ax25/ax25_timer.c F: net/ax25/sysctl_net_ax25.c +DATA ACCESS MONITOR +M: SeongJae Park +L: linux...@kvack.org +S: Maintained +F: Documentation/admin-guide/mm/damon/* +F: Documentation/vm/damon/* +F: include/linux/damon.h +F: include/trace/events/damon.h +F: mm/damon/* +F: tools/damon/* +F: tools/testing/selftests/damon/* + DAVICOM FAST ETHERNET (DMFE) NETWORK DRIVER L: net...@vger.kernel.org S: Orphan -- 2.17.1
[PATCH v24 13/14] mm/damon: Add user space selftests
From: SeongJae Park This commit adds simple user space tests for DAMON. The tests use the kselftest framework. Signed-off-by: SeongJae Park --- tools/testing/selftests/damon/Makefile| 7 + .../selftests/damon/_chk_dependency.sh| 28 +++ tools/testing/selftests/damon/_chk_record.py | 109 .../testing/selftests/damon/debugfs_attrs.sh | 161 ++ .../testing/selftests/damon/debugfs_record.sh | 50 ++ 5 files changed, 355 insertions(+) create mode 100644 tools/testing/selftests/damon/Makefile create mode 100644 tools/testing/selftests/damon/_chk_dependency.sh create mode 100644 tools/testing/selftests/damon/_chk_record.py create mode 100755 tools/testing/selftests/damon/debugfs_attrs.sh create mode 100755 tools/testing/selftests/damon/debugfs_record.sh diff --git a/tools/testing/selftests/damon/Makefile b/tools/testing/selftests/damon/Makefile new file mode 100644 index ..cfd5393a4639 --- /dev/null +++ b/tools/testing/selftests/damon/Makefile @@ -0,0 +1,7 @@ +# SPDX-License-Identifier: GPL-2.0 +# Makefile for damon selftests + +TEST_FILES = _chk_dependency.sh _chk_record.py +TEST_PROGS = debugfs_attrs.sh debugfs_record.sh + +include ../lib.mk diff --git a/tools/testing/selftests/damon/_chk_dependency.sh b/tools/testing/selftests/damon/_chk_dependency.sh new file mode 100644 index ..b304b7779976 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_dependency.sh @@ -0,0 +1,28 @@ +#!/bin/bash +# SPDX-License-Identifier: GPL-2.0 + +# Kselftest framework requirement - SKIP code is 4. +ksft_skip=4 + +DBGFS=/sys/kernel/debug/damon + +if [ $EUID -ne 0 ]; +then + echo "Run as root" + exit $ksft_skip +fi + +if [ ! -d $DBGFS ] +then + echo "$DBGFS not found" + exit $ksft_skip +fi + +for f in attrs record target_ids monitor_on +do + if [ !
-f "$DBGFS/$f" ] + then + echo "$f not found" + exit 1 + fi +done diff --git a/tools/testing/selftests/damon/_chk_record.py b/tools/testing/selftests/damon/_chk_record.py new file mode 100644 index ..73e128904319 --- /dev/null +++ b/tools/testing/selftests/damon/_chk_record.py @@ -0,0 +1,109 @@ +#!/usr/bin/env python3 +# SPDX-License-Identifier: GPL-2.0 + +"Check whether the DAMON record file is valid" + +import argparse +import struct +import sys + +fmt_version = 0 + +def set_fmt_version(f): +global fmt_version + +mark = f.read(16) +if mark == b'damon_recfmt_ver': +fmt_version = struct.unpack('i', f.read(4))[0] +else: +fmt_version = 0 +f.seek(0) +return fmt_version + +def read_pid(f): +if fmt_version == 1: +pid = struct.unpack('i', f.read(4))[0] +else: +pid = struct.unpack('L', f.read(8))[0] + +def err_percent(val, expected): +return abs(val - expected) / expected * 100 + +def chk_task_info(f): +pid = read_pid(f) +nr_regions = struct.unpack('I', f.read(4))[0] + +if nr_regions > max_nr_regions: +print('too many regions: %d > %d' % (nr_regions, max_nr_regions)) +exit(1) + +nr_gaps = 0 +eaddr = 0 +for r in range(nr_regions): +saddr = struct.unpack('L', f.read(8))[0] +if eaddr and saddr != eaddr: +nr_gaps += 1 +eaddr = struct.unpack('L', f.read(8))[0] +nr_accesses = struct.unpack('I', f.read(4))[0] + +if saddr >= eaddr: +print('wrong region [%d,%d)' % (saddr, eaddr)) +exit(1) + +max_nr_accesses = aint / sint +if nr_accesses > max_nr_accesses: +if err_percent(nr_accesses, max_nr_accesses) > 15: +print('too high nr_access: expected %d but %d' % +(max_nr_accesses, nr_accesses)) +exit(1) +if nr_gaps != 2: +print('number of gaps are not two but %d' % nr_gaps) +exit(1) + +def parse_time_us(bindat): +sec = struct.unpack('l', bindat[0:8])[0] +nsec = struct.unpack('l', bindat[8:16])[0] +return (sec * 10 + nsec) / 1000 + +def main(): +global sint +global aint +global min_nr +global max_nr_regions + +parser = argparse.ArgumentParser() +parser.add_argument('file', metavar='', 
+help='path to the record file') +parser.add_argument('--attrs', metavar='', +default='5000 10 100 10 1000', +help='content of debugfs attrs file') +args = parser.parse_args() +file_path = args.file +attrs = [int(x) for x in args.attrs.split()] +sint, aint, rint, min_nr, max_nr_regions = attrs + +with open(file_path, 'rb') as f: +set_fmt_version(f) +last_aggr_time = None +while True: +timebin = f.read(16) +if len(timebin) != 16: +break + +now = parse_time_us(timebin) +if not last_a
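To make the checker's expectations concrete, here is a small round-trip sketch of the record-file layout as `_chk_record.py` reads it. This is not part of the patch; the field order and struct codes (format marker, version, 16-byte sec/nsec timestamp, pid, region count, then start/end/nr_accesses per region) are assumptions inferred from the checker above.

```python
# Hypothetical round-trip for the DAMON record layout inferred from
# _chk_record.py; the exact on-disk layout is defined by the kernel side.
import io
import struct

def pack_record(pid, regions, sec=0, nsec=0):
    """Pack one snapshot for a single target, using the checker's codes."""
    buf = b'damon_recfmt_ver' + struct.pack('i', 1)   # marker + version
    buf += struct.pack('l', sec) + struct.pack('l', nsec)  # timestamp
    buf += struct.pack('i', pid)                      # fmt_version 1: int pid
    buf += struct.pack('I', len(regions))             # nr_regions
    for start, end, nr_accesses in regions:
        buf += struct.pack('L', start) + struct.pack('L', end)
        buf += struct.pack('I', nr_accesses)
    return buf

def unpack_record(data):
    """Parse the buffer back, mirroring the reads in _chk_record.py."""
    f = io.BytesIO(data)
    assert f.read(16) == b'damon_recfmt_ver'
    version = struct.unpack('i', f.read(4))[0]
    f.read(16)                                        # skip the timestamp
    pid = struct.unpack('i', f.read(4))[0]
    nr_regions = struct.unpack('I', f.read(4))[0]
    regions = []
    for _ in range(nr_regions):
        start = struct.unpack('L', f.read(8))[0]
        end = struct.unpack('L', f.read(8))[0]
        nr_accesses = struct.unpack('I', f.read(4))[0]
        regions.append((start, end, nr_accesses))
    return version, pid, regions

print(unpack_record(pack_record(1234, [(4096, 8192, 3)])))
```

Note that the native `'L'` code is 8 bytes only on LP64 platforms, which is what the checker implicitly assumes as well.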
[PATCH v24 12/14] mm/damon: Add kunit tests
From: SeongJae Park This commit adds kunit based unit tests for the core and the virtual address spaces monitoring primitives of DAMON. Signed-off-by: SeongJae Park Reviewed-by: Brendan Higgins --- mm/damon/Kconfig | 36 + mm/damon/core-test.h | 253 mm/damon/core.c | 7 + mm/damon/dbgfs-test.h | 214 +++ mm/damon/dbgfs.c | 2 + mm/damon/vaddr-test.h | 328 ++ mm/damon/vaddr.c | 7 + 7 files changed, 847 insertions(+) create mode 100644 mm/damon/core-test.h create mode 100644 mm/damon/dbgfs-test.h create mode 100644 mm/damon/vaddr-test.h diff --git a/mm/damon/Kconfig b/mm/damon/Kconfig index 72f1683ba0ee..455995152697 100644 --- a/mm/damon/Kconfig +++ b/mm/damon/Kconfig @@ -12,6 +12,18 @@ config DAMON See https://damonitor.github.io/doc/html/latest-damon/index.html for more information. +config DAMON_KUNIT_TEST + bool "Test for damon" if !KUNIT_ALL_TESTS + depends on DAMON && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_VADDR bool "Data access monitoring primitives for virtual address spaces" depends on DAMON && MMU @@ -21,6 +33,18 @@ config DAMON_VADDR This builds the default data access monitoring primitives for DAMON that works for virtual address spaces. +config DAMON_VADDR_KUNIT_TEST + bool "Test for DAMON primitives" if !KUNIT_ALL_TESTS + depends on DAMON_VADDR && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON virtual addresses primitives Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + config DAMON_DBGFS bool "DAMON debugfs interface" depends on DAMON_VADDR && DEBUG_FS @@ -30,4 +54,16 @@ config DAMON_DBGFS If unsure, say N. 
+config DAMON_DBGFS_KUNIT_TEST + bool "Test for damon debugfs interface" if !KUNIT_ALL_TESTS + depends on DAMON_DBGFS && KUNIT=y + default KUNIT_ALL_TESTS + help + This builds the DAMON debugfs interface Kunit test suite. + + For more information on KUnit and unit tests in general, please refer + to the KUnit documentation. + + If unsure, say N. + endmenu diff --git a/mm/damon/core-test.h b/mm/damon/core-test.h new file mode 100644 index ..b815dfbfb5fd --- /dev/null +++ b/mm/damon/core-test.h @@ -0,0 +1,253 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Data Access Monitor Unit Tests + * + * Copyright 2019 Amazon.com, Inc. or its affiliates. All rights reserved. + * + * Author: SeongJae Park + */ + +#ifdef CONFIG_DAMON_KUNIT_TEST + +#ifndef _DAMON_CORE_TEST_H +#define _DAMON_CORE_TEST_H + +#include + +static void damon_test_regions(struct kunit *test) +{ + struct damon_region *r; + struct damon_target *t; + + r = damon_new_region(1, 2); + KUNIT_EXPECT_EQ(test, 1ul, r->ar.start); + KUNIT_EXPECT_EQ(test, 2ul, r->ar.end); + KUNIT_EXPECT_EQ(test, 0u, r->nr_accesses); + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_add_region(r, t); + KUNIT_EXPECT_EQ(test, 1u, damon_nr_regions(t)); + + damon_del_region(r); + KUNIT_EXPECT_EQ(test, 0u, damon_nr_regions(t)); + + damon_free_target(t); +} + +static unsigned int nr_damon_targets(struct damon_ctx *ctx) +{ + struct damon_target *t; + unsigned int nr_targets = 0; + + damon_for_each_target(t, ctx) + nr_targets++; + + return nr_targets; +} + +static void damon_test_target(struct kunit *test) +{ + struct damon_ctx *c = damon_new_ctx(); + struct damon_target *t; + + t = damon_new_target(42); + KUNIT_EXPECT_EQ(test, 42ul, t->id); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_add_target(c, t); + KUNIT_EXPECT_EQ(test, 1u, nr_damon_targets(c)); + + damon_destroy_target(t); + KUNIT_EXPECT_EQ(test, 0u, nr_damon_targets(c)); + + damon_destroy_ctx(c); +} + +/* + * Test 
kdamond_reset_aggregated()
+ *
+ * DAMON checks access to each region and aggregates this information as the
+ * access frequency of each region.  In detail, it increases '->nr_accesses' of
+ * regions that an access has confirmed.  'kdamond_reset_aggregated()' flushes
+ * the aggregated information ('->nr_accesses' of each region) to the result
+ * buffer.  As a result of the flushing, the '->nr_accesses' of regions are
+ * initialized to zero.
[PATCH v24 09/14] mm/damon/dbgfs: Export kdamond pid to the user space
From: SeongJae Park

For CPU usage accounting, knowing pid of the monitoring thread could be
helpful.  For example, users could use cpuaccount cgroups with the pid.

This commit therefore exports the pid of currently running monitoring
thread to the user space via 'kdamond_pid' file in the debugfs
directory.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 37 +++++++++++++++++++++++++++++++++++--
 1 file changed, 35 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index dce4409e5887..4b9ac2043e99 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -358,6 +358,32 @@ static ssize_t dbgfs_target_ids_write(struct file *file,
 	return ret;
 }
 
+static ssize_t dbgfs_kdamond_pid_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	ssize_t len;
+
+	kbuf = kmalloc(count, GFP_KERNEL);
+	if (!kbuf)
+		return -ENOMEM;
+
+	mutex_lock(&ctx->kdamond_lock);
+	if (ctx->kdamond)
+		len = scnprintf(kbuf, count, "%d\n", ctx->kdamond->pid);
+	else
+		len = scnprintf(kbuf, count, "none\n");
+	mutex_unlock(&ctx->kdamond_lock);
+	if (!len)
+		goto out;
+	len = simple_read_from_buffer(buf, count, ppos, kbuf, len);
+
+out:
+	kfree(kbuf);
+	return len;
+}
+
 static int damon_dbgfs_open(struct inode *inode, struct file *file)
 {
 	file->private_data = inode->i_private;
@@ -386,11 +412,18 @@ static const struct file_operations target_ids_fops = {
 	.write = dbgfs_target_ids_write,
 };
 
+static const struct file_operations kdamond_pid_fops = {
+	.owner = THIS_MODULE,
+	.open = damon_dbgfs_open,
+	.read = dbgfs_kdamond_pid_read,
+};
+
 static int dbgfs_fill_ctx_dir(struct dentry *dir, struct damon_ctx *ctx)
 {
-	const char * const file_names[] = {"attrs", "record", "target_ids"};
+	const char * const file_names[] = {"attrs", "record", "target_ids",
+		"kdamond_pid"};
 	const struct file_operations *fops[] = {&attrs_fops, &record_fops,
-		&target_ids_fops};
+		&target_ids_fops, &kdamond_pid_fops};
 	int i;
 
 	for (i = 0; i < ARRAY_SIZE(file_names); i++) {
-- 
2.17.1
[PATCH v24 08/14] mm/damon/dbgfs: Implement recording feature
From: SeongJae Park

The user space users can control DAMON and get the monitoring results
via the 'damon_aggregated' tracepoint event.  However, dealing with the
tracepoint might be complex for some simple use cases.  This commit
therefore implements 'recording' feature in 'damon-dbgfs'.  The feature
can be used via 'record' file in the '<debugfs>/damon/' directory.

The file allows users to record monitored access patterns in a regular
binary file.  The recorded results are first written in an in-memory
buffer and flushed to a file in batch.  Users can get and set the size
of the buffer and the path to the result file by reading from and
writing to the ``record`` file.  For example, below commands set the
buffer to be 4 KiB and the result to be saved in ``/damon.data``. ::

    # cd <debugfs>/damon
    # echo "4096 /damon.data" > record
    # cat record
    4096 /damon.data

The recording can be disabled by setting the buffer size zero.

Signed-off-by: SeongJae Park
---
 mm/damon/dbgfs.c | 261 ++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 259 insertions(+), 2 deletions(-)

diff --git a/mm/damon/dbgfs.c b/mm/damon/dbgfs.c
index db15380737d1..dce4409e5887 100644
--- a/mm/damon/dbgfs.c
+++ b/mm/damon/dbgfs.c
@@ -15,6 +15,17 @@
 #include
 #include
 
+#define MIN_RECORD_BUFFER_LEN	1024
+#define MAX_RECORD_BUFFER_LEN	(4 * 1024 * 1024)
+#define MAX_RFILE_PATH_LEN	256
+
+struct dbgfs_recorder {
+	unsigned char *rbuf;
+	unsigned int rbuf_len;
+	unsigned int rbuf_offset;
+	char *rfile_path;
+};
+
 static struct damon_ctx **dbgfs_ctxs;
 static int dbgfs_nr_ctxs;
 static struct dentry **dbgfs_dirs;
@@ -97,6 +108,116 @@ static ssize_t dbgfs_attrs_write(struct file *file,
 	return ret;
 }
 
+static ssize_t dbgfs_record_read(struct file *file,
+		char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	struct dbgfs_recorder *rec = ctx->callback.private;
+	char record_buf[20 + MAX_RFILE_PATH_LEN];
+	int ret;
+
+	mutex_lock(&ctx->kdamond_lock);
+	ret = scnprintf(record_buf, ARRAY_SIZE(record_buf), "%u %s\n",
+			rec->rbuf_len, rec->rfile_path);
+	mutex_unlock(&ctx->kdamond_lock);
+	return simple_read_from_buffer(buf, count, ppos, record_buf, ret);
+}
+
+/*
+ * dbgfs_set_recording() - Set attributes for the recording.
+ * @ctx:	target kdamond context
+ * @rbuf_len:	length of the result buffer
+ * @rfile_path:	path to the monitor result files
+ *
+ * Setting 'rbuf_len' 0 disables recording.
+ *
+ * This function should not be called while the kdamond is running.
+ *
+ * Return: 0 on success, negative error code otherwise.
+ */
+static int dbgfs_set_recording(struct damon_ctx *ctx,
+			unsigned int rbuf_len, char *rfile_path)
+{
+	struct dbgfs_recorder *recorder;
+	size_t rfile_path_len;
+
+	if (rbuf_len && (rbuf_len > MAX_RECORD_BUFFER_LEN ||
+			rbuf_len < MIN_RECORD_BUFFER_LEN)) {
+		pr_err("result buffer size (%u) is out of [%d,%d]\n",
+				rbuf_len, MIN_RECORD_BUFFER_LEN,
+				MAX_RECORD_BUFFER_LEN);
+		return -EINVAL;
+	}
+	rfile_path_len = strnlen(rfile_path, MAX_RFILE_PATH_LEN);
+	if (rfile_path_len >= MAX_RFILE_PATH_LEN) {
+		pr_err("too long (>%d) result file path %s\n",
+				MAX_RFILE_PATH_LEN, rfile_path);
+		return -EINVAL;
+	}
+
+	recorder = ctx->callback.private;
+	if (!recorder) {
+		recorder = kzalloc(sizeof(*recorder), GFP_KERNEL);
+		if (!recorder)
+			return -ENOMEM;
+		ctx->callback.private = recorder;
+	}
+
+	recorder->rbuf_len = rbuf_len;
+	kfree(recorder->rbuf);
+	recorder->rbuf = NULL;
+	kfree(recorder->rfile_path);
+	recorder->rfile_path = NULL;
+
+	if (rbuf_len) {
+		recorder->rbuf = kvmalloc(rbuf_len, GFP_KERNEL);
+		if (!recorder->rbuf)
+			return -ENOMEM;
+	}
+	recorder->rfile_path = kmalloc(rfile_path_len + 1, GFP_KERNEL);
+	if (!recorder->rfile_path)
+		return -ENOMEM;
+	strncpy(recorder->rfile_path, rfile_path, rfile_path_len + 1);
+
+	return 0;
+}
+
+static ssize_t dbgfs_record_write(struct file *file,
+		const char __user *buf, size_t count, loff_t *ppos)
+{
+	struct damon_ctx *ctx = file->private_data;
+	char *kbuf;
+	unsigned int rbuf_len;
+	char rfile_path[MAX_RFILE_PATH_LEN];
+	ssize_t ret = count;
+	int err;
+
+	kbuf = user_input_str(buf, count, ppos);
+	if (IS_ERR(kbuf))
+		return PTR_ERR(kbuf);
+
+	if (sscanf(kbuf
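To make the accepted inputs concrete, here is a small user-space sketch that mirrors the validation `dbgfs_set_recording()` applies to a `record` write such as `"4096 /damon.data"`. It is an illustration, not part of the patch; only the constants come from the code above.

```python
# Mirror of the kernel-side checks in dbgfs_set_recording() (sketch).
MIN_RECORD_BUFFER_LEN = 1024              # from the patch
MAX_RECORD_BUFFER_LEN = 4 * 1024 * 1024   # from the patch
MAX_RFILE_PATH_LEN = 256                  # from the patch

def record_input_valid(line):
    """Return True if a 'record' file input would pass the kernel checks."""
    fields = line.split()
    if len(fields) != 2:
        return False
    try:
        rbuf_len = int(fields[0])
    except ValueError:
        return False
    rfile_path = fields[1]
    # rbuf_len == 0 disables recording; otherwise it must be in range.
    if rbuf_len and not (MIN_RECORD_BUFFER_LEN <= rbuf_len
                         <= MAX_RECORD_BUFFER_LEN):
        return False
    # The path must fit in MAX_RFILE_PATH_LEN including the trailing NUL.
    if len(rfile_path) >= MAX_RFILE_PATH_LEN:
        return False
    return True

print(record_input_valid("4096 /damon.data"))   # in range -> True
print(record_input_valid("100 /damon.data"))    # below minimum -> False
print(record_input_valid("0 /damon.data"))      # disables recording -> True
```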
[PATCH v24 06/14] mm/damon: Add a tracepoint
From: SeongJae Park

This commit adds a tracepoint for DAMON.  It traces the monitoring
results of each region for each aggregation interval.  Using this,
DAMON can be easily integrated with tracepoints supporting tools such
as perf.

Signed-off-by: SeongJae Park
Reviewed-by: Leonard Foerster
Reviewed-by: Steven Rostedt (VMware)
---
 include/trace/events/damon.h | 43 ++++++++++++++++++++++++++++++++++++
 mm/damon/core.c              |  7 ++++++-
 2 files changed, 49 insertions(+), 1 deletion(-)
 create mode 100644 include/trace/events/damon.h

diff --git a/include/trace/events/damon.h b/include/trace/events/damon.h
new file mode 100644
index ..2f422f4f1fb9
--- /dev/null
+++ b/include/trace/events/damon.h
@@ -0,0 +1,43 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+#undef TRACE_SYSTEM
+#define TRACE_SYSTEM damon
+
+#if !defined(_TRACE_DAMON_H) || defined(TRACE_HEADER_MULTI_READ)
+#define _TRACE_DAMON_H
+
+#include <linux/damon.h>
+#include <linux/types.h>
+#include <linux/tracepoint.h>
+
+TRACE_EVENT(damon_aggregated,
+
+	TP_PROTO(struct damon_target *t, struct damon_region *r,
+		unsigned int nr_regions),
+
+	TP_ARGS(t, r, nr_regions),
+
+	TP_STRUCT__entry(
+		__field(unsigned long, target_id)
+		__field(unsigned int, nr_regions)
+		__field(unsigned long, start)
+		__field(unsigned long, end)
+		__field(unsigned int, nr_accesses)
+	),
+
+	TP_fast_assign(
+		__entry->target_id = t->id;
+		__entry->nr_regions = nr_regions;
+		__entry->start = r->ar.start;
+		__entry->end = r->ar.end;
+		__entry->nr_accesses = r->nr_accesses;
+	),
+
+	TP_printk("target_id=%lu nr_regions=%u %lu-%lu: %u",
+			__entry->target_id, __entry->nr_regions,
+			__entry->start, __entry->end, __entry->nr_accesses)
+);
+
+#endif /* _TRACE_DAMON_H */
+
+/* This part must be outside protection */
+#include <trace/define_trace.h>

diff --git a/mm/damon/core.c b/mm/damon/core.c
index b36b6bdd94e2..912112662d0c 100644
--- a/mm/damon/core.c
+++ b/mm/damon/core.c
@@ -13,6 +13,9 @@
 #include
 #include
 
+#define CREATE_TRACE_POINTS
+#include <trace/events/damon.h>
+
 /* Get a random number in [l, r) */
 #define damon_rand(l, r) (l + prandom_u32_max(r - l))
 
@@ -388,8 +391,10 @@ static void kdamond_reset_aggregated(struct damon_ctx *c)
 
 	damon_for_each_target(t, c) {
 		struct damon_region *r;
 
-		damon_for_each_region(r, t)
+		damon_for_each_region(r, t) {
+			trace_damon_aggregated(t, r, damon_nr_regions(t));
 			r->nr_accesses = 0;
+		}
 	}
 }
-- 
2.17.1
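For consumers of this tracepoint, the `TP_printk` format above ("target_id=%lu nr_regions=%u %lu-%lu: %u") is what appears in tracefs or `perf script` output. Below is a hedged sketch of parsing such a line in user space; the surrounding trace-line prefix in the sample is made up for illustration.

```python
# Parse the damon_aggregated TP_printk output defined in the patch above.
import re

LINE_RE = re.compile(
    r'target_id=(?P<target_id>\d+) nr_regions=(?P<nr_regions>\d+) '
    r'(?P<start>\d+)-(?P<end>\d+): (?P<nr_accesses>\d+)')

def parse_aggregated(line):
    """Extract the damon_aggregated fields from one trace line, or None."""
    m = LINE_RE.search(line)
    if not m:
        return None
    return {k: int(v) for k, v in m.groupdict().items()}

# Hypothetical trace line; the prefix format depends on the tracer options.
sample = ('kdamond.0-1377 [003] ...: damon_aggregated: '
          'target_id=42 nr_regions=10 4096-16384: 3')
print(parse_aggregated(sample))
```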