Re: [PATCH -next v6 0/2] Make memory reclamation measurable
On 2024/3/7 17:26, Michal Hocko wrote:
>> The main reasons for adding static tracepoints are:
>> 1. To subdivide the time spent in the shrinker->count_objects() and
>>    shrinker->scan_objects() functions within the do_shrink_slab function.
>>    Using BPF kprobe, we can only track the time spent in the
>>    do_shrink_slab function.
>> 2. When tracing frequently called functions, static tracepoints (BPF
>>    tp/tracepoint) have lower performance impact compared to dynamic
>>    tracepoints (BPF kprobe).
>
> You can track the time process has been preempted by other means, no? We
> have context switching tracepoints in place. Have you considered that
> option?

Let me think about it...

Thanks
Bixuan Cui
Re: [PATCH -next v6 0/2] Make memory reclamation measurable
On 2024/2/21 15:44, Michal Hocko wrote:
> It would be really helpful to have more details on why we need those
> trace points.
>
> It is my understanding that you would like to have a more fine grained
> numbers for the time duration of different parts of the reclaim process.
> I can imagine this could be useful in some cases but is it useful enough
> and for a wider variety of workloads? Is that worth a dedicated static
> tracepoints? Why an ad-hoc dynamic tracepoints or BPF for a very special
> situation is not sufficient?
>
> In other words, tell us more about the usecases and why is this
> generally useful.

Thank you for your reply. I'm sorry that I forgot to describe the reasons in detail.

Memory reclamation usually occurs under high memory pressure (low free memory) and is performed by kswapd. In embedded systems, CPU resources are limited, and it is common for kswapd and critical processes (which typically require a large amount of memory and trigger memory reclamation) to compete for the CPU. This slows down the critical process and causes visible lag, such as dropped frames or slower startup times in mobile games.

Currently, with kernel trace events or tools like Perfetto, we can only see that kswapd is competing for the CPU and how often memory reclamation is triggered. We do not have detailed metrics about the reclamation itself, such as the duration and amount of each reclamation, or who is releasing memory (super_cache, f2fs, ext4), etc. This makes it impossible to locate the problems above.

So far this patch has helped us solve 2 actual performance problems (kswapd preempting the CPU and causing game delay):
1. The increased memory allocation in the game (across different versions) led to a degradation in kswapd. This was found by calculating the total amount of Reclaim(page) during the game startup phase.
2. The adoption of a different file system in the new system version resulted in a slower reclamation rate. This was discovered through the OBJ_NAME change, for example OBJ_NAME changing from super_cache_scan to ext4_es_scan.

It is also possible to calculate the memory reclamation rate to evaluate the memory performance of different versions.

The main reasons for adding static tracepoints are:
1. To subdivide the time spent in the shrinker->count_objects() and shrinker->scan_objects() functions within the do_shrink_slab function. Using BPF kprobe, we can only track the time spent in the do_shrink_slab function.
2. When tracing frequently called functions, static tracepoints (BPF tp/tracepoint) have lower performance impact compared to dynamic tracepoints (BPF kprobe).

Thanks
Bixuan Cui
Re: [PATCH -next v6 0/2] Make memory reclamation measurable
On 2024/2/21 10:22, Steven Rostedt wrote:
> It's up to the memory management folks to decide on this.
>
> -- Steve

Noted with thanks.

Bixuan Cui
Re: [PATCH -next v6 0/2] Make memory reclamation measurable
ping~

On 2024/1/5 9:36, Bixuan Cui wrote:
> When the system memory is low, kswapd reclaims the memory. The key steps
> of memory reclamation include
> 1.shrink_lruvec
>   * shrink_active_list, moves folios from the active LRU to the inactive LRU
>   * shrink_inactive_list, reclaims folios from the inactive LRU list
> 2.shrink_slab
>   * shrinker->count_objects(), calculates the freeable memory
>   * shrinker->scan_objects(), reclaims the slab memory
>
> The existing tracers in the vmscan are as follows:
>
> --do_try_to_free_pages
>   --shrink_zones
>     --trace_mm_vmscan_node_reclaim_begin (tracer)
>     --shrink_node
>       --shrink_node_memcgs
>         --trace_mm_vmscan_memcg_shrink_begin (tracer)
>         --shrink_lruvec
>           --shrink_list
>             --shrink_active_list
>               --trace_mm_vmscan_lru_shrink_active (tracer)
>             --shrink_inactive_list
>               --trace_mm_vmscan_lru_shrink_inactive (tracer)
>           --shrink_active_list
>         --shrink_slab
>           --do_shrink_slab
>             --shrinker->count_objects()
>             --trace_mm_shrink_slab_start (tracer)
>             --shrinker->scan_objects()
>             --trace_mm_shrink_slab_end (tracer)
>         --trace_mm_vmscan_memcg_shrink_end (tracer)
>     --trace_mm_vmscan_node_reclaim_end (tracer)
>
> If we get the duration and quantity of shrink lru and slab,
> then we can measure the memory recycling, as follows
>
> Measuring memory reclamation with bpf:
>   LRU FILE:
>     CPU COMM    ShrinkActive(us) ShrinkInactive(us) Reclaim(page)
>     7   kswapd0 26               51                 32
>     7   kswapd0 52               47                 13
>   SLAB:
>     CPU COMM    OBJ_NAME                Count_Dur(us) Freeable(page) Scan_Dur(us) Reclaim(page)
>     1   kswapd0 super_cache_scan.cfi_jt 2             341            3225         128
>     7   kswapd0 super_cache_scan.cfi_jt 0             2247           8524         1024
>     7   kswapd0 super_cache_scan.cfi_jt 23670 00
>
> For this, add the new tracer to shrink_active_list/shrink_inactive_list
> and shrinker->count_objects().
>
> Changes:
> v6: * Add Reviewed-by from Steven Rostedt.
> v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
>       replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
>     * Add the explanation for adding new shrink lru events into 'mm:
>       vmscan: add new event to trace shrink lru'
> v4: Add Reviewed-by and Changelog to every patch.
> v3: Swap the positions of 'nid' and 'freeable' to prevent the hole in the
>     trace event.
> v2: Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
>     same time to fix build error.
>
> cuibixuan (2):
>   mm: shrinker: add new event to trace shrink count
>   mm: vmscan: add new event to trace shrink lru
>
>  include/trace/events/vmscan.h | 80 ++-
>  mm/shrinker.c                 |  4 ++
>  mm/vmscan.c                   | 11 +++--
>  3 files changed, 90 insertions(+), 5 deletions(-)
[PATCH -next v6 2/2] mm: vmscan: add new event to trace shrink lru
From: cuibixuan

Page reclaim is an important part of memory reclaim, including:
  * shrink_active_list(), moves folios from the active LRU to the inactive LRU
  * shrink_inactive_list(), reclaims folios from the inactive LRU list

Add the new events to calculate the execution time to better evaluate the
entire memory recycling ratio.

Example of output:
kswapd0-103 [007] . 1098.353020: mm_vmscan_lru_shrink_active_start: nid=0
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_active_end: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=32 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353162: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=21 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=11 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC

Signed-off-by: Bixuan Cui
Reviewed-by: Andrew Morton
Reviewed-by: Steven Rostedt (Google)
---
Changes:
v6: * Add Reviewed-by from Steven Rostedt.
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm:
      vmscan: add new event to trace shrink lru'
v4: * Add Reviewed-by and Changelog to every patch.
v3: * Swap the positions of 'nid' and 'freeable' to prevent the hole in the
      trace event.
v2: * Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
      same time to fix build error (Andrew pointed out).
 include/trace/events/vmscan.h | 31 +--
 mm/vmscan.c                   | 11 ---
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index b99cd28c9815..4793d952c248 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -395,7 +395,34 @@ TRACE_EVENT(mm_vmscan_write_folio,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
+DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+DEFINE_EVENT(mm_vmscan_lru_shrink_start_template, mm_vmscan_lru_shrink_inactive_start,
+	TP_PROTO(int nid),
+	TP_ARGS(nid)
+);
+
+DEFINE_EVENT(mm_vmscan_lru_shrink_start_template, mm_vmscan_lru_shrink_active_start,
+	TP_PROTO(int nid),
+	TP_ARGS(nid)
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_inactive_end,
 
 	TP_PROTO(int nid,
 		unsigned long nr_scanned, unsigned long nr_reclaimed,
@@ -446,7 +473,7 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-TRACE_EVENT(mm_vmscan_lru_shrink_active,
+TRACE_EVENT(mm_vmscan_lru_shrink_active_end,
 
 	TP_PROTO(int nid, unsigned long nr_taken,
 		unsigned long nr_active, unsigned long nr_deactivated,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..a44d9624d60f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1906,6 +1906,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
 
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
+
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
 			return 0;
@@ -1990,7 +1992,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (file)
 		sc->nr.file_taken += nr_taken;
 
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+	trace_mm_vmscan_lru_shrink_inactive_end(pgdat->node_id,
 			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
 	return nr_reclaimed;
 }
@@ -2028,6 +2030,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
+	trace_mm_vmscan_lru_shrink_active_start(pgdat->node_id);
+
 	lru_add_drain();
 
 	spin_lock_irq(&lruvec->lru_lock);
@@ -2107,7 +2111,7 @@ static void shrink_active_list(unsigned long nr_to
[PATCH -next v6 1/2] mm: shrinker: add new event to trace shrink count
From: cuibixuan

do_shrink_slab() calculates the freeable memory through shrinker->count_objects(), and then reclaims the memory through shrinker->scan_objects(). When reclaiming memory, shrinker->count_objects() can itself take a significant amount of time:

	Fun               spend(us)
	ext4_es_count     4302
	ext4_es_scan      12
	super_cache_count 4195
	super_cache_scan  2103

Therefore, adding the trace event to count_objects() can more accurately obtain the time taken for slab memory recycling.

Example of output:
kswapd0-103 [003] . 1098.317942: mm_shrink_count_start: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0
kswapd0-103 [003] . 1098.317951: mm_shrink_count_end: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0 freeable:36

Signed-off-by: Bixuan Cui
Reviewed-by: Steven Rostedt (Google)
---
Changes:
v6: * Add Reviewed-by from Steven Rostedt.
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm:
      vmscan: add new event to trace shrink lru'
v4: * Add Reviewed-by and Changelog to every patch.
v3: * Swap the positions of 'nid' and 'freeable' to prevent the hole in the
      trace event.
v2: * Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
      same time to fix build error (Andrew pointed out).
 include/trace/events/vmscan.h | 49 +++
 mm/shrinker.c                 |  4 +++
 2 files changed, 53 insertions(+)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..b99cd28c9815 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -196,6 +196,55 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
 );
 #endif /* CONFIG_MEMCG */
 
+TRACE_EVENT(mm_shrink_count_start,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc),
+
+	TP_ARGS(shr, sc),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid)
+);
+
+TRACE_EVENT(mm_shrink_count_end,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc, long freeable),
+
+	TP_ARGS(shr, sc, freeable),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(long, freeable)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->freeable = freeable;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d freeable:%ld",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid,
+		__entry->freeable)
+);
+
 TRACE_EVENT(mm_shrink_slab_start,
 	TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
 		long nr_objects_to_shrink, unsigned long cache_items,
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..d0c7bf61db61 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -379,7 +379,11 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 					  : SHRINK_BATCH;
 	long scanned = 0, next_deferred;
 
+	trace_mm_shrink_count_start(shrinker, shrinkctl);
+
 	freeable = shrinker->count_objects(shrinker, shrinkctl);
+
+	trace_mm_shrink_count_end(shrinker, shrinkctl, freeable);
 	if (freeable == 0 || freeable == SHRINK_EMPTY)
 		return freeable;
 
-- 
2.17.1
[PATCH -next v6 0/2] Make memory reclamation measurable
When the system memory is low, kswapd reclaims the memory. The key steps
of memory reclamation include
1.shrink_lruvec
  * shrink_active_list, moves folios from the active LRU to the inactive LRU
  * shrink_inactive_list, reclaims folios from the inactive LRU list
2.shrink_slab
  * shrinker->count_objects(), calculates the freeable memory
  * shrinker->scan_objects(), reclaims the slab memory

The existing tracers in the vmscan are as follows:

--do_try_to_free_pages
  --shrink_zones
    --trace_mm_vmscan_node_reclaim_begin (tracer)
    --shrink_node
      --shrink_node_memcgs
        --trace_mm_vmscan_memcg_shrink_begin (tracer)
        --shrink_lruvec
          --shrink_list
            --shrink_active_list
              --trace_mm_vmscan_lru_shrink_active (tracer)
            --shrink_inactive_list
              --trace_mm_vmscan_lru_shrink_inactive (tracer)
          --shrink_active_list
        --shrink_slab
          --do_shrink_slab
            --shrinker->count_objects()
            --trace_mm_shrink_slab_start (tracer)
            --shrinker->scan_objects()
            --trace_mm_shrink_slab_end (tracer)
        --trace_mm_vmscan_memcg_shrink_end (tracer)
    --trace_mm_vmscan_node_reclaim_end (tracer)

If we get the duration and quantity of shrink lru and slab,
then we can measure the memory recycling, as follows

Measuring memory reclamation with bpf:
  LRU FILE:
    CPU COMM    ShrinkActive(us) ShrinkInactive(us) Reclaim(page)
    7   kswapd0 26               51                 32
    7   kswapd0 52               47                 13
  SLAB:
    CPU COMM    OBJ_NAME                Count_Dur(us) Freeable(page) Scan_Dur(us) Reclaim(page)
    1   kswapd0 super_cache_scan.cfi_jt 2             341            3225         128
    7   kswapd0 super_cache_scan.cfi_jt 0             2247           8524         1024
    7   kswapd0 super_cache_scan.cfi_jt 23670 00

For this, add the new tracer to shrink_active_list/shrink_inactive_list
and shrinker->count_objects().

Changes:
v6: * Add Reviewed-by from Steven Rostedt.
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm:
      vmscan: add new event to trace shrink lru'
v4: Add Reviewed-by and Changelog to every patch.
v3: Swap the positions of 'nid' and 'freeable' to prevent the hole in the
    trace event.
v2: Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
    same time to fix build error.

cuibixuan (2):
  mm: shrinker: add new event to trace shrink count
  mm: vmscan: add new event to trace shrink lru

 include/trace/events/vmscan.h | 80 ++-
 mm/shrinker.c                 |  4 ++
 mm/vmscan.c                   | 11 +++--
 3 files changed, 90 insertions(+), 5 deletions(-)

-- 
2.17.1
[PATCH -next v5 2/2] mm: vmscan: add new event to trace shrink lru
From: cuibixuan

Page reclaim is an important part of memory reclaim, including:
  * shrink_active_list(), moves folios from the active LRU to the inactive LRU
  * shrink_inactive_list(), reclaims folios from the inactive LRU list

Add the new events to calculate the execution time to better evaluate the
entire memory recycling ratio.

Example of output:
kswapd0-103 [007] . 1098.353020: mm_vmscan_lru_shrink_active_start: nid=0
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_active_end: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=32 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353162: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=21 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=11 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC

Signed-off-by: Bixuan Cui
Reviewed-by: Andrew Morton
---
Changes:
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm:
      vmscan: add new event to trace shrink lru'
v4: * Add Reviewed-by and Changelog to every patch.
v3: * Swap the positions of 'nid' and 'freeable' to prevent the hole in the
      trace event.
v2: * Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
      same time to fix build error (Andrew pointed out).
 include/trace/events/vmscan.h | 31 +--
 mm/vmscan.c                   | 11 ---
 2 files changed, 37 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index b99cd28c9815..4793d952c248 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -395,7 +395,34 @@ TRACE_EVENT(mm_vmscan_write_folio,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
+DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+DEFINE_EVENT(mm_vmscan_lru_shrink_start_template, mm_vmscan_lru_shrink_inactive_start,
+	TP_PROTO(int nid),
+	TP_ARGS(nid)
+);
+
+DEFINE_EVENT(mm_vmscan_lru_shrink_start_template, mm_vmscan_lru_shrink_active_start,
+	TP_PROTO(int nid),
+	TP_ARGS(nid)
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_inactive_end,
 
 	TP_PROTO(int nid,
 		unsigned long nr_scanned, unsigned long nr_reclaimed,
@@ -446,7 +473,7 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
 
-TRACE_EVENT(mm_vmscan_lru_shrink_active,
+TRACE_EVENT(mm_vmscan_lru_shrink_active_end,
 
 	TP_PROTO(int nid, unsigned long nr_taken,
 		unsigned long nr_active, unsigned long nr_deactivated,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..a44d9624d60f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1906,6 +1906,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
 
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
+
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
 			return 0;
@@ -1990,7 +1992,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (file)
 		sc->nr.file_taken += nr_taken;
 
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+	trace_mm_vmscan_lru_shrink_inactive_end(pgdat->node_id,
 			nr_scanned, nr_reclaimed, &stat, sc->priority, file);
 	return nr_reclaimed;
 }
@@ -2028,6 +2030,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 
+	trace_mm_vmscan_lru_shrink_active_start(pgdat->node_id);
+
 	lru_add_drain();
 
 	spin_lock_irq(&lruvec->lru_lock);
@@ -2107,7 +2111,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	lru_note_cost(lruvec, file, 0, nr_rotated
[PATCH -next v5 0/2] Make memory reclamation measurable
When the system memory is low, kswapd reclaims the memory. The key steps
of memory reclamation include
1.shrink_lruvec
  * shrink_active_list, moves folios from the active LRU to the inactive LRU
  * shrink_inactive_list, reclaims folios from the inactive LRU list
2.shrink_slab
  * shrinker->count_objects(), calculates the freeable memory
  * shrinker->scan_objects(), reclaims the slab memory

The existing tracers in the vmscan are as follows:

--do_try_to_free_pages
  --shrink_zones
    --trace_mm_vmscan_node_reclaim_begin (tracer)
    --shrink_node
      --shrink_node_memcgs
        --trace_mm_vmscan_memcg_shrink_begin (tracer)
        --shrink_lruvec
          --shrink_list
            --shrink_active_list
              --trace_mm_vmscan_lru_shrink_active (tracer)
            --shrink_inactive_list
              --trace_mm_vmscan_lru_shrink_inactive (tracer)
          --shrink_active_list
        --shrink_slab
          --do_shrink_slab
            --shrinker->count_objects()
            --trace_mm_shrink_slab_start (tracer)
            --shrinker->scan_objects()
            --trace_mm_shrink_slab_end (tracer)
        --trace_mm_vmscan_memcg_shrink_end (tracer)
    --trace_mm_vmscan_node_reclaim_end (tracer)

If we get the duration and quantity of shrink lru and slab,
then we can measure the memory recycling, as follows

Measuring memory reclamation with bpf:
  LRU FILE:
    CPU COMM    ShrinkActive(us) ShrinkInactive(us) Reclaim(page)
    7   kswapd0 26               51                 32
    7   kswapd0 52               47                 13
  SLAB:
    CPU COMM    OBJ_NAME                Count_Dur(us) Freeable(page) Scan_Dur(us) Reclaim(page)
    1   kswapd0 super_cache_scan.cfi_jt 2             341            3225         128
    7   kswapd0 super_cache_scan.cfi_jt 0             2247           8524         1024
    7   kswapd0 super_cache_scan.cfi_jt 23670 00

For this, add the new tracer to shrink_active_list/shrink_inactive_list
and shrinker->count_objects().

Changes:
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm:
      vmscan: add new event to trace shrink lru'
v4: Add Reviewed-by and Changelog to every patch.
v3: Swap the positions of 'nid' and 'freeable' to prevent the hole in the
    trace event.
v2: Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
    same time to fix build error.

cuibixuan (2):
  mm: shrinker: add new event to trace shrink count
  mm: vmscan: add new event to trace shrink lru

 include/trace/events/vmscan.h | 80 ++-
 mm/shrinker.c                 |  4 ++
 mm/vmscan.c                   | 11 +++--
 3 files changed, 90 insertions(+), 5 deletions(-)

-- 
2.17.1
[PATCH -next v5 1/2] mm: shrinker: add new event to trace shrink count
From: cuibixuan

do_shrink_slab() calculates the freeable memory through shrinker->count_objects(), and then reclaims the memory through shrinker->scan_objects(). When reclaiming memory, shrinker->count_objects() can itself take a significant amount of time:

	Fun               spend(us)
	ext4_es_count     4302
	ext4_es_scan      12
	super_cache_count 4195
	super_cache_scan  2103

Therefore, adding the trace event to count_objects() can more accurately obtain the time taken for slab memory recycling.

Example of output:
kswapd0-103 [003] . 1098.317942: mm_shrink_count_start: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0
kswapd0-103 [003] . 1098.317951: mm_shrink_count_end: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0 freeable:36

Signed-off-by: Bixuan Cui
Reviewed-by: Steven Rostedt
---
Changes:
v5: * Use 'DECLARE_EVENT_CLASS(mm_vmscan_lru_shrink_start_template' to
      replace 'TRACE_EVENT(mm_vmscan_lru_shrink_inactive/active_start'
    * Add the explanation for adding new shrink lru events into 'mm:
      vmscan: add new event to trace shrink lru'
v4: * Add Reviewed-by and Changelog to every patch.
v3: * Swap the positions of 'nid' and 'freeable' to prevent the hole in the
      trace event.
v2: * Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
      same time to fix build error (Andrew pointed out).
 include/trace/events/vmscan.h | 49 +++
 mm/shrinker.c                 |  4 +++
 2 files changed, 53 insertions(+)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..b99cd28c9815 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -196,6 +196,55 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
 );
 #endif /* CONFIG_MEMCG */
 
+TRACE_EVENT(mm_shrink_count_start,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc),
+
+	TP_ARGS(shr, sc),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid)
+);
+
+TRACE_EVENT(mm_shrink_count_end,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc, long freeable),
+
+	TP_ARGS(shr, sc, freeable),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(long, freeable)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->freeable = freeable;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d freeable:%ld",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid,
+		__entry->freeable)
+);
+
 TRACE_EVENT(mm_shrink_slab_start,
 	TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
 		long nr_objects_to_shrink, unsigned long cache_items,
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..d0c7bf61db61 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -379,7 +379,11 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 					  : SHRINK_BATCH;
 	long scanned = 0, next_deferred;
 
+	trace_mm_shrink_count_start(shrinker, shrinkctl);
+
 	freeable = shrinker->count_objects(shrinker, shrinkctl);
+
+	trace_mm_shrink_count_end(shrinker, shrinkctl, freeable);
 	if (freeable == 0 || freeable == SHRINK_EMPTY)
 		return freeable;
 
-- 
2.17.1
Re: [PATCH -next v4 2/2] mm: vmscan: add new event to trace shrink lru
On 2023/12/21 1:54, Yu Zhao wrote:
>> Signed-off-by: Bixuan Cui
>> Reviewed-by: Andrew Morton
>> ---
>> v4: Add Reviewed-by and Changlog to every patch.
>
> Where did Andrew provide his Reviewed-by?

Hi, I just wanted to add Reviewed-by to my patch to thank Steven and Andrew for their review. :-)

>> v2: Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the
>> same time to fix build error.
>
> The reason v3 was NAK'ed was not mentioned or fixed in v4. So NAK again.

The build error pointed out by Andrew has been fixed in [mm: vmscan: add new event to trace shrink lru]:

@@ -4524,9 +4528,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty(&list))
 		return scanned;
 retry:
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
 	reclaimed = shrink_folio_list(&list, pgdat, sc, &stat, false);
 	sc->nr_reclaimed += reclaimed;
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+	trace_mm_vmscan_lru_shrink_inactive_end(pgdat->node_id,
 			scanned, reclaimed, &stat, sc->priority,
 			type ? LRU_INACTIVE_FILE : LRU_INACTIVE_ANON);

Are there any other reasons for the NAK? And thank you for your review.

Thanks
Bixuan Cui
[PATCH -next v4 2/2] mm: vmscan: add new event to trace shrink lru
From: cuibixuan

Add a new event to calculate the shrink_inactive_list()/shrink_active_list() execution time.

Example of output:
kswapd0-103 [007] . 1098.353020: mm_vmscan_lru_shrink_active_start: nid=0
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_active_end: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=32 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353162: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=21 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=11 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC

Signed-off-by: Bixuan Cui
Reviewed-by: Andrew Morton
---
v4: Add Reviewed-by and Changelog to every patch.
v2: Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the same time to fix build error.
 include/trace/events/vmscan.h | 38 +--
 mm/vmscan.c                   | 11 +++---
 2 files changed, 44 insertions(+), 5 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index b99cd28c9815..02868bdc5999 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -395,7 +395,24 @@ TRACE_EVENT(mm_vmscan_write_folio,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
-TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
+TRACE_EVENT(mm_vmscan_lru_shrink_inactive_start,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_inactive_end,
 	TP_PROTO(int nid,
 		unsigned long nr_scanned, unsigned long nr_reclaimed,
@@ -446,7 +463,24 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
-TRACE_EVENT(mm_vmscan_lru_shrink_active,
+TRACE_EVENT(mm_vmscan_lru_shrink_active_start,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_active_end,
 	TP_PROTO(int nid, unsigned long nr_taken,
 		unsigned long nr_active, unsigned long nr_deactivated,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..a44d9624d60f 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1906,6 +1906,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
+
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
 			return 0;
@@ -1990,7 +1992,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (file)
 		sc->nr.file_taken += nr_taken;
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+	trace_mm_vmscan_lru_shrink_inactive_end(pgdat->node_id,
 			nr_scanned, nr_reclaimed, , sc->priority, file);
 	return nr_reclaimed;
 }
@@ -2028,6 +2030,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+	trace_mm_vmscan_lru_shrink_active_start(pgdat->node_id);
+
 	lru_add_drain();
 	spin_lock_irq(>lru_lock);
@@ -2107,7 +2111,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	lru_note_cost(lruvec, file, 0, nr_rotated);
 	mem_cgroup_uncharge_list(_active);
 	free_unref_page_list(_active);
-	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
+	trace_mm_vmscan_lru_shrink_active_end(pgdat->node_id, nr_taken, nr_activate,
 			nr_deactivate, nr_rotated, sc->priority, file);
 }
@@ -4524,9 +4528,10 @@ static int evict_folios(struct lruvec *lruvec, struct scan_control *sc, int swap
 	if (list_empty())
 		return scanned;
 retry:
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
[PATCH -next v4 1/2] mm: shrinker: add new event to trace shrink count
From: cuibixuan

do_shrink_slab() calculates the freeable memory through shrinker->count_objects(), and then reclaims the memory through shrinker->scan_objects(). When reclaiming memory, shrinker->count_objects() takes a certain amount of time:

Function            Time spent (us)
ext4_es_count       4302
ext4_es_scan        12
super_cache_count   4195
super_cache_scan    2103

Therefore, adding trace events around count_objects() makes it possible to obtain the time taken for slab memory reclamation more accurately.

Example of output:
kswapd0-103 [003] . 1098.317942: mm_shrink_count_start: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0
kswapd0-103 [003] . 1098.317951: mm_shrink_count_end: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0 freeable:36

Signed-off-by: Bixuan Cui
Reviewed-by: Steven Rostedt
---
v4: Add Reviewed-by and Changelog to every patch.
v3: Swap the positions of 'nid' and 'freeable' to prevent the hole in the trace event.

 include/trace/events/vmscan.h | 49 +++
 mm/shrinker.c                 |  4 +++
 2 files changed, 53 insertions(+)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..b99cd28c9815 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -196,6 +196,55 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
 );
 #endif /* CONFIG_MEMCG */
+TRACE_EVENT(mm_shrink_count_start,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc),
+
+	TP_ARGS(shr, sc),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid)
+);
+
+TRACE_EVENT(mm_shrink_count_end,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc, long freeable),
+
+	TP_ARGS(shr, sc, freeable),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(long, freeable)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->freeable = freeable;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d freeable:%ld",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid,
+		__entry->freeable)
+);
+
 TRACE_EVENT(mm_shrink_slab_start,
 	TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
 		long nr_objects_to_shrink, unsigned long cache_items,
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..d0c7bf61db61 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -379,7 +379,11 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 				  : SHRINK_BATCH;
 	long scanned = 0, next_deferred;
+	trace_mm_shrink_count_start(shrinker, shrinkctl);
+
 	freeable = shrinker->count_objects(shrinker, shrinkctl);
+
+	trace_mm_shrink_count_end(shrinker, shrinkctl, freeable);
 	if (freeable == 0 || freeable == SHRINK_EMPTY)
 		return freeable;
--
2.17.1
[PATCH -next v4 0/2] Make memory reclamation measurable
When the system memory is low, kswapd reclaims the memory. The key steps of memory reclamation include:

1. shrink_lruvec
   * shrink_active_list, moves folios from the active LRU to the inactive LRU
   * shrink_inactive_list, reclaims folios from the inactive LRU list
2. shrink_slab
   * shrinker->count_objects(), calculates the freeable memory
   * shrinker->scan_objects(), reclaims the slab memory

The existing tracers in vmscan are as follows:

--do_try_to_free_pages
  --shrink_zones
    --trace_mm_vmscan_node_reclaim_begin (tracer)
    --shrink_node
      --shrink_node_memcgs
        --trace_mm_vmscan_memcg_shrink_begin (tracer)
        --shrink_lruvec
          --shrink_list
            --shrink_active_list
              --trace_mm_vmscan_lru_shrink_active (tracer)
            --shrink_inactive_list
              --trace_mm_vmscan_lru_shrink_inactive (tracer)
          --shrink_active_list
        --shrink_slab
          --do_shrink_slab
            --shrinker->count_objects()
            --trace_mm_shrink_slab_start (tracer)
            --shrinker->scan_objects()
            --trace_mm_shrink_slab_end (tracer)
        --trace_mm_vmscan_memcg_shrink_end (tracer)
    --trace_mm_vmscan_node_reclaim_end (tracer)

If we get the duration and quantity of the LRU and slab shrinking, we can measure memory reclamation, as follows.

Measuring memory reclamation with bpf:

LRU FILE:
CPU  COMM     ShrinkActive(us)  ShrinkInactive(us)  Reclaim(page)
7    kswapd0  26                51                  32
7    kswapd0  52                47                  13

SLAB:
CPU  COMM     OBJ_NAME                 Count_Dur(us)  Freeable(page)  Scan_Dur(us)  Reclaim(page)
1    kswapd0  super_cache_scan.cfi_jt  2              341             3225          128
7    kswapd0  super_cache_scan.cfi_jt  0              2247            8524          1024
7    kswapd0  super_cache_scan.cfi_jt  2367           0               0             0

For this, add new tracers to shrink_active_list/shrink_inactive_list and shrinker->count_objects().

Changes:
v4: Add Reviewed-by and Changelog to every patch.
v3: Swap the positions of 'nid' and 'freeable' to prevent the hole in the trace event.
v2: Modify trace_mm_vmscan_lru_shrink_inactive() in evict_folios() at the same time to fix build error.

cuibixuan (2):
  mm: shrinker: add new event to trace shrink count
  mm: vmscan: add new event to trace shrink lru

 include/trace/events/vmscan.h | 87 ++-
 mm/shrinker.c                 |  4 ++
 mm/vmscan.c                   | 11 +++--
 3 files changed, 97 insertions(+), 5 deletions(-)

--
2.17.1
Re: [PATCH -next 2/2] mm: vmscan: add new event to trace shrink lru
On 2023/12/13 11:03, Andrew Morton wrote:
> > -TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
> > +TRACE_EVENT(mm_vmscan_lru_shrink_inactive_start,
>
> Current kernels have a call to trace_mm_vmscan_lru_shrink_inactive() in
> evict_folios(), so this renaming broke the build.

Sorry, I did not enable CONFIG_LRU_GEN when compiling and testing. I will double check my patches.

Thanks
Bixuan Cui
[PATCH -next 2/2] mm: vmscan: add new event to trace shrink lru
From: cuibixuan

Add a new event to calculate the shrink_inactive_list()/shrink_active_list() execution time.

Example of output:
kswapd0-103 [007] . 1098.353020: mm_vmscan_lru_shrink_active_start: nid=0
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_active_end: nid=0 nr_taken=32 nr_active=0 nr_deactivated=32 nr_referenced=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353040: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=0 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=32 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_ANON|RECLAIM_WB_ASYNC
kswapd0-103 [007] . 1098.353094: mm_vmscan_lru_shrink_inactive_start: nid=0
kswapd0-103 [007] . 1098.353162: mm_vmscan_lru_shrink_inactive_end: nid=0 nr_scanned=32 nr_reclaimed=21 nr_dirty=0 nr_writeback=0 nr_congested=0 nr_immediate=0 nr_activate_anon=0 nr_activate_file=0 nr_ref_keep=11 nr_unmap_fail=0 priority=6 flags=RECLAIM_WB_FILE|RECLAIM_WB_ASYNC

Signed-off-by: Bixuan Cui
---
 include/trace/events/vmscan.h | 38 +--
 mm/vmscan.c                   |  8 ++--
 2 files changed, 42 insertions(+), 4 deletions(-)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 406faa5591c1..9809d158f968 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -395,7 +395,24 @@ TRACE_EVENT(mm_vmscan_write_folio,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
-TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
+TRACE_EVENT(mm_vmscan_lru_shrink_inactive_start,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_inactive_end,
 	TP_PROTO(int nid,
 		unsigned long nr_scanned, unsigned long nr_reclaimed,
@@ -446,7 +463,24 @@ TRACE_EVENT(mm_vmscan_lru_shrink_inactive,
 		show_reclaim_flags(__entry->reclaim_flags))
 );
-TRACE_EVENT(mm_vmscan_lru_shrink_active,
+TRACE_EVENT(mm_vmscan_lru_shrink_active_start,
+
+	TP_PROTO(int nid),
+
+	TP_ARGS(nid),
+
+	TP_STRUCT__entry(
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->nid = nid;
+	),
+
+	TP_printk("nid=%d", __entry->nid)
+);
+
+TRACE_EVENT(mm_vmscan_lru_shrink_active_end,
 	TP_PROTO(int nid, unsigned long nr_taken,
 		unsigned long nr_active, unsigned long nr_deactivated,
diff --git a/mm/vmscan.c b/mm/vmscan.c
index 4e3b835c6b4a..73e690b3ce68 100644
--- a/mm/vmscan.c
+++ b/mm/vmscan.c
@@ -1906,6 +1906,8 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
 	bool stalled = false;
+	trace_mm_vmscan_lru_shrink_inactive_start(pgdat->node_id);
+
 	while (unlikely(too_many_isolated(pgdat, file, sc))) {
 		if (stalled)
 			return 0;
@@ -1990,7 +1992,7 @@ static unsigned long shrink_inactive_list(unsigned long nr_to_scan,
 	if (file)
 		sc->nr.file_taken += nr_taken;
-	trace_mm_vmscan_lru_shrink_inactive(pgdat->node_id,
+	trace_mm_vmscan_lru_shrink_inactive_end(pgdat->node_id,
 			nr_scanned, nr_reclaimed, , sc->priority, file);
 	return nr_reclaimed;
 }
@@ -2028,6 +2030,8 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	int file = is_file_lru(lru);
 	struct pglist_data *pgdat = lruvec_pgdat(lruvec);
+	trace_mm_vmscan_lru_shrink_active_start(pgdat->node_id);
+
 	lru_add_drain();
 	spin_lock_irq(>lru_lock);
@@ -2107,7 +2111,7 @@ static void shrink_active_list(unsigned long nr_to_scan,
 	lru_note_cost(lruvec, file, 0, nr_rotated);
 	mem_cgroup_uncharge_list(_active);
 	free_unref_page_list(_active);
-	trace_mm_vmscan_lru_shrink_active(pgdat->node_id, nr_taken, nr_activate,
+	trace_mm_vmscan_lru_shrink_active_end(pgdat->node_id, nr_taken, nr_activate,
 			nr_deactivate, nr_rotated, sc->priority, file);
 }
--
2.39.0
[PATCH -next 1/2] mm: shrinker: add new event to trace shrink count
From: cuibixuan

do_shrink_slab() calculates the freeable memory through shrinker->count_objects(), and then reclaims the memory through shrinker->scan_objects(). When reclaiming memory, shrinker->count_objects() takes a certain amount of time:

Function            Time spent (us)
ext4_es_count       4302
ext4_es_scan        12
super_cache_count   4195
super_cache_scan    2103

Therefore, adding trace events around count_objects() makes it possible to obtain the time taken for slab memory reclamation more accurately.

Example of output:
kswapd0-103 [003] . 1098.317942: mm_shrink_count_start: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0
kswapd0-103 [003] . 1098.317951: mm_shrink_count_end: kfree_rcu_shrink_count.cfi_jt+0x0/0x8 c540ff51: nid: 0 freeable:36

Signed-off-by: Bixuan Cui
---
 include/trace/events/vmscan.h | 49 +++
 mm/shrinker.c                 |  4 +++
 2 files changed, 53 insertions(+)

diff --git a/include/trace/events/vmscan.h b/include/trace/events/vmscan.h
index 1a488c30afa5..406faa5591c1 100644
--- a/include/trace/events/vmscan.h
+++ b/include/trace/events/vmscan.h
@@ -196,6 +196,55 @@ DEFINE_EVENT(mm_vmscan_direct_reclaim_end_template, mm_vmscan_memcg_softlimit_re
 );
 #endif /* CONFIG_MEMCG */
+TRACE_EVENT(mm_shrink_count_start,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc),
+
+	TP_ARGS(shr, sc),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(int, nid)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->nid = sc->nid;
+	),
+
+	TP_printk("%pS %p: nid: %d",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid)
+);
+
+TRACE_EVENT(mm_shrink_count_end,
+	TP_PROTO(struct shrinker *shr, struct shrink_control *sc, long freeable),
+
+	TP_ARGS(shr, sc, freeable),
+
+	TP_STRUCT__entry(
+		__field(struct shrinker *, shr)
+		__field(void *, shrink)
+		__field(int, nid)
+		__field(long, freeable)
+	),
+
+	TP_fast_assign(
+		__entry->shr = shr;
+		__entry->shrink = shr->count_objects;
+		__entry->nid = sc->nid;
+		__entry->freeable = freeable;
+	),
+
+	TP_printk("%pS %p: nid: %d freeable:%ld",
+		__entry->shrink,
+		__entry->shr,
+		__entry->nid,
+		__entry->freeable)
+);
+
 TRACE_EVENT(mm_shrink_slab_start,
 	TP_PROTO(struct shrinker *shr, struct shrink_control *sc,
 		long nr_objects_to_shrink, unsigned long cache_items,
diff --git a/mm/shrinker.c b/mm/shrinker.c
index dd91eab43ed3..d0c7bf61db61 100644
--- a/mm/shrinker.c
+++ b/mm/shrinker.c
@@ -379,7 +379,11 @@ static unsigned long do_shrink_slab(struct shrink_control *shrinkctl,
 				  : SHRINK_BATCH;
 	long scanned = 0, next_deferred;
+	trace_mm_shrink_count_start(shrinker, shrinkctl);
+
 	freeable = shrinker->count_objects(shrinker, shrinkctl);
+
+	trace_mm_shrink_count_end(shrinker, shrinkctl, freeable);
 	if (freeable == 0 || freeable == SHRINK_EMPTY)
 		return freeable;
--
2.39.0
[PATCH -next 0/2] Make memory reclamation measurable
From: cuibixuan

When the system memory is low, kswapd reclaims the memory. The key steps of memory reclamation include:

1. shrink_lruvec
   * shrink_active_list, moves folios from the active LRU to the inactive LRU
   * shrink_inactive_list, reclaims folios from the inactive LRU list
2. shrink_slab
   * shrinker->count_objects(), calculates the freeable memory
   * shrinker->scan_objects(), reclaims the slab memory

The existing tracers in vmscan are as follows:

--do_try_to_free_pages
  --shrink_zones
    --trace_mm_vmscan_node_reclaim_begin (tracer)
    --shrink_node
      --shrink_node_memcgs
        --trace_mm_vmscan_memcg_shrink_begin (tracer)
        --shrink_lruvec
          --shrink_list
            --shrink_active_list
              --trace_mm_vmscan_lru_shrink_active (tracer)
            --shrink_inactive_list
              --trace_mm_vmscan_lru_shrink_inactive (tracer)
          --shrink_active_list
        --shrink_slab
          --do_shrink_slab
            --shrinker->count_objects()
            --trace_mm_shrink_slab_start (tracer)
            --shrinker->scan_objects()
            --trace_mm_shrink_slab_end (tracer)
        --trace_mm_vmscan_memcg_shrink_end (tracer)
    --trace_mm_vmscan_node_reclaim_end (tracer)

If we get the duration and quantity of the LRU and slab shrinking, we can measure memory reclamation, as follows.

Measuring memory reclamation with bpf:

LRU FILE:
CPU  COMM     ShrinkActive(us)  ShrinkInactive(us)  Reclaim(page)
7    kswapd0  26                51                  32
7    kswapd0  52                47                  13

SLAB:
CPU  COMM     OBJ_NAME                 Count_Dur(us)  Freeable(page)  Scan_Dur(us)  Reclaim(page)
1    kswapd0  super_cache_scan.cfi_jt  2              341             3225          128
7    kswapd0  super_cache_scan.cfi_jt  0              2247            8524          1024
7    kswapd0  super_cache_scan.cfi_jt  2367           0               0             0

For this, add new tracers to shrink_active_list/shrink_inactive_list and shrinker->count_objects().

cuibixuan (2):
  mm: shrinker: add new event to trace shrink count
  mm: vmscan: add new event to trace shrink lru

 include/trace/events/vmscan.h | 87 ++-
 mm/shrinker.c                 |  4 ++
 mm/vmscan.c                   |  8 +++-
 3 files changed, 95 insertions(+), 4 deletions(-)

--
2.39.0
[PATCH -next] PCI: remove unused variable rdev
Fix the build warning:

drivers/pci/quirks.c: In function ‘quirk_amd_nvme_fixup’:
drivers/pci/quirks.c:312:18: warning: unused variable ‘rdev’ [-Wunused-variable]
  struct pci_dev *rdev;
                  ^~~~

Fixes: 9597624ef606 ('nvme: put some AMD PCIE downstream NVME device to simple suspend/resume path')
Signed-off-by: Bixuan Cui
---
 drivers/pci/quirks.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 2e24dced699a..c86ede081534 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -309,8 +309,6 @@ DECLARE_PCI_FIXUP_FINAL(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8151_0, quirk_nopci
 static void quirk_amd_nvme_fixup(struct pci_dev *dev)
 {
-	struct pci_dev *rdev;
-
 	dev->dev_flags |= PCI_DEV_FLAGS_AMD_NVME_SIMPLE_SUSPEND;
 	pci_info(dev, "AMD simple suspend opt enabled\n");
--
2.17.1
[PATCH -next] nvmem: sprd: Add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this driver when it is built as an external module.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/nvmem/sprd-efuse.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/nvmem/sprd-efuse.c b/drivers/nvmem/sprd-efuse.c
index 59523245db8a..5d394559edf2 100644
--- a/drivers/nvmem/sprd-efuse.c
+++ b/drivers/nvmem/sprd-efuse.c
@@ -425,6 +425,7 @@ static const struct of_device_id sprd_efuse_of_match[] = {
 	{ .compatible = "sprd,ums312-efuse", .data = _data },
 	{ }
 };
+MODULE_DEVICE_TABLE(of, sprd_efuse_of_match);
 static struct platform_driver sprd_efuse_driver = {
 	.probe = sprd_efuse_probe,
[PATCH -next] staging: ralink-gdma: Add missing MODULE_DEVICE_TABLE
This patch adds missing MODULE_DEVICE_TABLE definition which generates correct modalias for automatic loading of this driver when it is built as an external module.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/staging/ralink-gdma/ralink-gdma.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/staging/ralink-gdma/ralink-gdma.c b/drivers/staging/ralink-gdma/ralink-gdma.c
index 3c26b665ee7c..33e28ccf4d85 100644
--- a/drivers/staging/ralink-gdma/ralink-gdma.c
+++ b/drivers/staging/ralink-gdma/ralink-gdma.c
@@ -788,6 +788,7 @@ static const struct of_device_id gdma_of_match_table[] = {
 	{ .compatible = "ralink,rt3883-gdma", .data = _gdma_data },
 	{ },
 };
+MODULE_DEVICE_TABLE(of, gdma_of_match_table);
 static int gdma_dma_probe(struct platform_device *pdev)
 {
[PATCH -next] usb: dwc3: qcom: Remove redundant dev_err call in dwc3_qcom_probe()
There is already an error message within devm_ioremap_resource(), so remove the dev_err() call to avoid a redundant error message.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/usb/dwc3/dwc3-qcom.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c
index e37cc58dfa55..726d5048d87c 100644
--- a/drivers/usb/dwc3/dwc3-qcom.c
+++ b/drivers/usb/dwc3/dwc3-qcom.c
@@ -774,7 +774,6 @@ static int dwc3_qcom_probe(struct platform_device *pdev)
 	qcom->qscratch_base = devm_ioremap_resource(dev, parent_res);
 	if (IS_ERR(qcom->qscratch_base)) {
-		dev_err(dev, "failed to map qscratch, err=%d\n", ret);
 		ret = PTR_ERR(qcom->qscratch_base);
 		goto clk_disable;
 	}
Re: [PATCH] usb: dwc3: qcom: Fixed an issue that the ret value is incorrect in dwc3_qcom_probe()
On 2021/4/9 18:00, Manivannan Sadhasivam wrote:
> But this error message can be removed altogether as devm_ioremap_resource()
> reports it already.

Thank you for your reply. I'll revise it.

Thanks,
Bixuan Cui
[PATCH -next] powerpc/perf/hv-24x7: Make some symbols static
The sparse tool complains as follows:

arch/powerpc/perf/hv-24x7.c:229:1: warning: symbol '__pcpu_scope_hv_24x7_txn_flags' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:230:1: warning: symbol '__pcpu_scope_hv_24x7_txn_err' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:236:1: warning: symbol '__pcpu_scope_hv_24x7_hw' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:244:1: warning: symbol '__pcpu_scope_hv_24x7_reqb' was not declared. Should it be static?
arch/powerpc/perf/hv-24x7.c:245:1: warning: symbol '__pcpu_scope_hv_24x7_resb' was not declared. Should it be static?

These symbols are not used outside of hv-24x7.c, so this commit marks them static.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 arch/powerpc/perf/hv-24x7.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/perf/hv-24x7.c b/arch/powerpc/perf/hv-24x7.c
index e5eb33255066..1816f560a465 100644
--- a/arch/powerpc/perf/hv-24x7.c
+++ b/arch/powerpc/perf/hv-24x7.c
@@ -226,14 +226,14 @@ static struct attribute_group event_long_desc_group = {
 static struct kmem_cache *hv_page_cache;
-DEFINE_PER_CPU(int, hv_24x7_txn_flags);
-DEFINE_PER_CPU(int, hv_24x7_txn_err);
+static DEFINE_PER_CPU(int, hv_24x7_txn_flags);
+static DEFINE_PER_CPU(int, hv_24x7_txn_err);
 struct hv_24x7_hw {
 	struct perf_event *events[255];
 };
-DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
+static DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
 /*
  * request_buffer and result_buffer are not required to be 4k aligned,
@@ -241,8 +241,8 @@ DEFINE_PER_CPU(struct hv_24x7_hw, hv_24x7_hw);
  * the simplest way to ensure that.
  */
 #define H24x7_DATA_BUFFER_SIZE 4096
-DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
-DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
+static DEFINE_PER_CPU(char, hv_24x7_reqb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
+static DEFINE_PER_CPU(char, hv_24x7_resb[H24x7_DATA_BUFFER_SIZE]) __aligned(4096);
 static unsigned int max_num_requests(int interface_version)
 {
[PATCH -next] powerpc/perf: Make symbol 'isa207_pmu_format_attr' static
The sparse tool complains as follows:

arch/powerpc/perf/isa207-common.c:24:18: warning: symbol 'isa207_pmu_format_attr' was not declared. Should it be static?

This symbol is not used outside of isa207-common.c, so this commit marks it static.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 arch/powerpc/perf/isa207-common.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/perf/isa207-common.c b/arch/powerpc/perf/isa207-common.c
index e4f577da33d8..487f9e914b5c 100644
--- a/arch/powerpc/perf/isa207-common.c
+++ b/arch/powerpc/perf/isa207-common.c
@@ -21,7 +21,7 @@ PMU_FORMAT_ATTR(thresh_stop, "config:32-35");
 PMU_FORMAT_ATTR(thresh_start, "config:36-39");
 PMU_FORMAT_ATTR(thresh_cmp,"config:40-49");
-struct attribute *isa207_pmu_format_attr[] = {
+static struct attribute *isa207_pmu_format_attr[] = {
 	_attr_event.attr,
 	_attr_pmcxsel.attr,
 	_attr_mark.attr,
[PATCH -next] powerpc/pseries: Make symbol '__pcpu_scope_hcall_stats' static
The sparse tool complains as follows:

arch/powerpc/platforms/pseries/hvCall_inst.c:29:1: warning: symbol '__pcpu_scope_hcall_stats' was not declared. Should it be static?

This symbol is not used outside of hvCall_inst.c, so this commit marks it static.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 arch/powerpc/platforms/pseries/hvCall_inst.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/hvCall_inst.c b/arch/powerpc/platforms/pseries/hvCall_inst.c
index 2c59b4986ea5..3a50612a78db 100644
--- a/arch/powerpc/platforms/pseries/hvCall_inst.c
+++ b/arch/powerpc/platforms/pseries/hvCall_inst.c
@@ -26,7 +26,7 @@ struct hcall_stats {
 };
 #define HCALL_STAT_ARRAY_SIZE ((MAX_HCALL_OPCODE >> 2) + 1)
-DEFINE_PER_CPU(struct hcall_stats[HCALL_STAT_ARRAY_SIZE], hcall_stats);
+static DEFINE_PER_CPU(struct hcall_stats[HCALL_STAT_ARRAY_SIZE], hcall_stats);
 /*
  * Routines for displaying the statistics in debugfs
[PATCH -next] powerpc/powernv: make symbol 'mpipl_kobj' static
The sparse tool complains as follows:

arch/powerpc/platforms/powernv/opal-core.c:74:16: warning: symbol 'mpipl_kobj' was not declared.

This symbol is not used outside of opal-core.c, so this commit marks it static.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 arch/powerpc/platforms/powernv/opal-core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/powernv/opal-core.c b/arch/powerpc/platforms/powernv/opal-core.c
index 0d9ba70f7251..5b9736bbc2aa 100644
--- a/arch/powerpc/platforms/powernv/opal-core.c
+++ b/arch/powerpc/platforms/powernv/opal-core.c
@@ -71,7 +71,7 @@ static LIST_HEAD(opalcore_list);
 static struct opalcore_config *oc_conf;
 static const struct opal_mpipl_fadump *opalc_metadata;
 static const struct opal_mpipl_fadump *opalc_cpu_metadata;
-struct kobject *mpipl_kobj;
+static struct kobject *mpipl_kobj;
 /*
  * Set crashing CPU's signal to SIGUSR1. if the kernel is triggered
[PATCH] usb: dwc3: qcom: Fixed an issue that the ret value is incorrect in dwc3_qcom_probe()
An error message is printed after devm_ioremap_resource() fails, but at that point ret has not yet been obtained through PTR_ERR(qcom->qscratch_base). Move the dev_err() call down so that the reported ret value is correct.

Fixes: a4333c3a6ba9 ('usb: dwc3: Add Qualcomm DWC3 glue driver')
Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/usb/dwc3/dwc3-qcom.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/dwc3/dwc3-qcom.c b/drivers/usb/dwc3/dwc3-qcom.c
index e37cc58dfa55..4716ca8c753d 100644
--- a/drivers/usb/dwc3/dwc3-qcom.c
+++ b/drivers/usb/dwc3/dwc3-qcom.c
@@ -774,8 +774,8 @@ static int dwc3_qcom_probe(struct platform_device *pdev)
 	qcom->qscratch_base = devm_ioremap_resource(dev, parent_res);
 	if (IS_ERR(qcom->qscratch_base)) {
-		dev_err(dev, "failed to map qscratch, err=%d\n", ret);
 		ret = PTR_ERR(qcom->qscratch_base);
+		dev_err(dev, "failed to map qscratch, err=%d\n", ret);
 		goto clk_disable;
 	}
--
2.17.1
[PATCH] drm/vc4: Fix PM reference leak in vc4_vec_encoder_enable()
pm_runtime_get_sync() increments the PM usage counter even when it fails, so a matching decrement is needed. Change pm_runtime_get_sync() to pm_runtime_resume_and_get() to keep the usage counter balanced.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/gpu/drm/vc4/vc4_vec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/vc4/vc4_vec.c b/drivers/gpu/drm/vc4/vc4_vec.c
index bd5b8eb58b18..924e03050dd4 100644
--- a/drivers/gpu/drm/vc4/vc4_vec.c
+++ b/drivers/gpu/drm/vc4/vc4_vec.c
@@ -403,7 +403,7 @@ static void vc4_vec_encoder_enable(struct drm_encoder *encoder)
 	struct vc4_vec *vec = vc4_vec_encoder->vec;
 	int ret;
-	ret = pm_runtime_get_sync(>pdev->dev);
+	ret = pm_runtime_resume_and_get(>pdev->dev);
 	if (ret < 0) {
 		DRM_ERROR("Failed to retain power domain: %d\n", ret);
 		return;
[PATCH] usb: core: hub: Fix PM reference leak in usb_port_resume()
pm_runtime_get_sync() increments the PM usage counter even when it fails, so a matching decrement is needed. Fix it by replacing it with pm_runtime_resume_and_get() to keep the usage counter balanced.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/usb/core/hub.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/core/hub.c b/drivers/usb/core/hub.c
index 9a83390072da..b2bc4b7c4289 100644
--- a/drivers/usb/core/hub.c
+++ b/drivers/usb/core/hub.c
@@ -3605,7 +3605,7 @@ int usb_port_resume(struct usb_device *udev, pm_message_t msg)
 	u16 portchange, portstatus;
 	if (!test_and_set_bit(port1, hub->child_usage_bits)) {
-		status = pm_runtime_get_sync(_dev->dev);
+		status = pm_runtime_resume_and_get(_dev->dev);
 		if (status < 0) {
 			dev_dbg(>dev, "can't resume usb port, status %d\n",
 				status);
--
2.17.1
[PATCH -next] usb: musb: fix PM reference leak in musb_irq_work()
pm_runtime_get_sync() increments the PM usage counter even when it fails, so a matching decrement is needed. Fix it by replacing it with pm_runtime_resume_and_get() to keep the usage counter balanced.

Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/usb/musb/musb_core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/usb/musb/musb_core.c b/drivers/usb/musb/musb_core.c
index fc0457db62e1..8f09a387b773 100644
--- a/drivers/usb/musb/musb_core.c
+++ b/drivers/usb/musb/musb_core.c
@@ -2070,7 +2070,7 @@ static void musb_irq_work(struct work_struct *data)
 	struct musb *musb = container_of(data, struct musb, irq_work.work);
 	int error;
-	error = pm_runtime_get_sync(musb->controller);
+	error = pm_runtime_resume_and_get(musb->controller);
 	if (error < 0) {
 		dev_err(musb->controller, "Could not enable: %i\n",
 			error);
Re: [PATCH] s390/pci: move ioremap/ioremap_prot/ioremap_wc/ioremap_wt/iounmap to arch/s390/mm/ioremap.c
On 2021/4/6 19:14, Niklas Schnelle wrote:
> and move the have_mio variable out of the PCI only code or use a raw
> "#ifdef CONFIG_PCI". Obviously we don't have any actual users of
> ioremap() that don't depend on CONFIG_PCI but it would make it so that
> ioremap() exists and should actually function without CONFIG_PCI.
> The weird part though is that for anyone using it without CONFIG_PCI it
> would stop working if that is set and the machine doesn't have MIO
> support but would work if it does.

Well, maybe it's better not to change it. Thank you for the explanation.

Thanks,
Bixuan Cui
Re: [PATCH] media: Fix compilation error
On 2021/4/7 21:45, Mikko Perttunen wrote:
> This change was done only very recently, it's in linux-next and submitted for
> 5.13. I missed this one host1x_syncpt_free call in vi.c, but Thierry has
> already applied an equivalent patch on his end so the issue should be
> resolved.

Yes, this build error has been fixed in next-20210407.

Thanks,
Bixuan Cui
[PATCH] ASoC: tegra: fix build warning
The following functions may have no callers, so mark them __maybe_unused
to avoid these warnings:

sound/soc/tegra/tegra30_i2s.c:50:12: warning: ‘tegra30_i2s_runtime_resume’ defined but not used [-Wunused-function]
 static int tegra30_i2s_runtime_resume(struct device *dev)
            ^~
sound/soc/tegra/tegra30_i2s.c:39:12: warning: ‘tegra30_i2s_runtime_suspend’ defined but not used [-Wunused-function]
 static int tegra30_i2s_runtime_suspend(struct device *dev)
            ^~~
sound/soc/tegra/tegra20_i2s.c:48:12: warning: ‘tegra20_i2s_runtime_resume’ defined but not used [-Wunused-function]
 static int tegra20_i2s_runtime_resume(struct device *dev)
            ^~
sound/soc/tegra/tegra20_i2s.c:37:12: warning: ‘tegra20_i2s_runtime_suspend’ defined but not used [-Wunused-function]
 static int tegra20_i2s_runtime_suspend(struct device *dev)
            ^~~
sound/soc/tegra/tegra30_ahub.c:64:12: warning: ‘tegra30_ahub_runtime_resume’ defined but not used [-Wunused-function]
 static int tegra30_ahub_runtime_resume(struct device *dev)
            ^~~
sound/soc/tegra/tegra30_ahub.c:43:12: warning: ‘tegra30_ahub_runtime_suspend’ defined but not used [-Wunused-function]
 static int tegra30_ahub_runtime_suspend(struct device *dev)
            ^~~~

Fixes: 82ef0ae46b86 ('ASoC: tegra: add runtime PM support')
Fixes: be944d42ccc1 ('ASoC: tegra: add tegra30-ahub driver')
Fixes: 4fb0384f3dc6 ('ASoC: tegra: add tegra30-i2s driver')
Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 sound/soc/tegra/tegra20_i2s.c  | 4 ++--
 sound/soc/tegra/tegra30_ahub.c | 4 ++--
 sound/soc/tegra/tegra30_i2s.c  | 4 ++--
 3 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/sound/soc/tegra/tegra20_i2s.c b/sound/soc/tegra/tegra20_i2s.c
index 1b27f81c10fe..b280ebd72591 100644
--- a/sound/soc/tegra/tegra20_i2s.c
+++ b/sound/soc/tegra/tegra20_i2s.c
@@ -34,7 +34,7 @@

 #define DRV_NAME "tegra20-i2s"

-static int tegra20_i2s_runtime_suspend(struct device *dev)
+static __maybe_unused int tegra20_i2s_runtime_suspend(struct device *dev)
 {
 	struct tegra20_i2s *i2s = dev_get_drvdata(dev);

@@ -45,7 +45,7 @@ static int tegra20_i2s_runtime_suspend(struct device *dev)
 	return 0;
 }

-static int tegra20_i2s_runtime_resume(struct device *dev)
+static __maybe_unused int tegra20_i2s_runtime_resume(struct device *dev)
 {
 	struct tegra20_i2s *i2s = dev_get_drvdata(dev);
 	int ret;
diff --git a/sound/soc/tegra/tegra30_ahub.c b/sound/soc/tegra/tegra30_ahub.c
index d1718f3af3cd..4692c70ed933 100644
--- a/sound/soc/tegra/tegra30_ahub.c
+++ b/sound/soc/tegra/tegra30_ahub.c
@@ -40,7 +40,7 @@ static inline void tegra30_audio_write(u32 reg, u32 val)
 	regmap_write(ahub->regmap_ahub, reg, val);
 }

-static int tegra30_ahub_runtime_suspend(struct device *dev)
+static __maybe_unused int tegra30_ahub_runtime_suspend(struct device *dev)
 {
 	regcache_cache_only(ahub->regmap_apbif, true);
 	regcache_cache_only(ahub->regmap_ahub, true);
@@ -61,7 +61,7 @@ static int tegra30_ahub_runtime_suspend(struct device *dev)
  * stopping streams should dynamically adjust the clock as required. However,
  * this is not yet implemented.
  */
-static int tegra30_ahub_runtime_resume(struct device *dev)
+static __maybe_unused int tegra30_ahub_runtime_resume(struct device *dev)
 {
 	int ret;

diff --git a/sound/soc/tegra/tegra30_i2s.c b/sound/soc/tegra/tegra30_i2s.c
index 8730ffa0f691..36344f0a64c1 100644
--- a/sound/soc/tegra/tegra30_i2s.c
+++ b/sound/soc/tegra/tegra30_i2s.c
@@ -36,7 +36,7 @@

 #define DRV_NAME "tegra30-i2s"

-static int tegra30_i2s_runtime_suspend(struct device *dev)
+static __maybe_unused int tegra30_i2s_runtime_suspend(struct device *dev)
 {
 	struct tegra30_i2s *i2s = dev_get_drvdata(dev);

@@ -47,7 +47,7 @@ static int tegra30_i2s_runtime_suspend(struct device *dev)
 	return 0;
 }

-static int tegra30_i2s_runtime_resume(struct device *dev)
+static __maybe_unused int tegra30_i2s_runtime_resume(struct device *dev)
 {
 	struct tegra30_i2s *i2s = dev_get_drvdata(dev);
 	int ret;
--
2.17.1
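For readers unfamiliar with this class of warning, here is a minimal userspace sketch of how it typically arises. The patch does not spell out the cause, so the mechanism shown is an assumption: callbacks are referenced only through a macro that expands to nothing when a config option is off. `FAKE_CONFIG_PM`, `FAKE_SET_RUNTIME_PM_OPS`, `struct fake_pm_ops` and the `demo_*` names are hypothetical; only the `__maybe_unused` definition mirrors the kernel's.

```c
/* Same expansion the kernel uses for __maybe_unused. */
#define __maybe_unused __attribute__((__unused__))

struct fake_pm_ops {
	const char *name;
	int (*runtime_suspend)(void);
	int (*runtime_resume)(void);
};

/* Hypothetical stand-in for a PM ops macro: empty unless FAKE_CONFIG_PM is set. */
#ifdef FAKE_CONFIG_PM
#define FAKE_SET_RUNTIME_PM_OPS(s, r) .runtime_suspend = (s), .runtime_resume = (r),
#else
#define FAKE_SET_RUNTIME_PM_OPS(s, r)
#endif

/* Without the attribute, -Wunused-function fires when the macro is empty. */
static __maybe_unused int demo_runtime_suspend(void)
{
	return 0;
}

static __maybe_unused int demo_runtime_resume(void)
{
	return 0;
}

const struct fake_pm_ops demo_pm_ops = {
	.name = "demo",
	FAKE_SET_RUNTIME_PM_OPS(demo_runtime_suspend, demo_runtime_resume)
};
```

Marking the functions rather than wrapping them in `#ifdef` keeps them compiled and type-checked in every configuration, which is presumably why the patch takes this route.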
[PATCH] media: Fix compilation error
Fix the error:

drivers/staging/media/tegra-video/vi.c:1180:4: error: implicit declaration of function 'host1x_syncpt_free' [-Werror,-Wimplicit-function-declaration]

Fixes: 3028a00c55bf ('gpu: host1x: Cleanup and refcounting for syncpoints')
Reported-by: Hulk Robot
Signed-off-by: Bixuan Cui
---
 drivers/staging/media/tegra-video/vi.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/media/tegra-video/vi.c b/drivers/staging/media/tegra-video/vi.c
index 7e0cb5529b49..df5ca3596470 100644
--- a/drivers/staging/media/tegra-video/vi.c
+++ b/drivers/staging/media/tegra-video/vi.c
@@ -1177,7 +1177,7 @@ static int tegra_channel_host1x_syncpt_init(struct tegra_vi_channel *chan)
 		mw_sp = host1x_syncpt_request(&vi->client, flags);
 		if (!mw_sp) {
 			dev_err(vi->dev, "failed to request memory ack syncpoint\n");
-			host1x_syncpt_free(fs_sp);
+			host1x_syncpt_put(fs_sp);
 			ret = -ENOMEM;
 			goto free_syncpts;
 		}
--
2.17.1
[PATCH] s390/pci: move ioremap/ioremap_prot/ioremap_wc/ioremap_wt/iounmap to arch/s390/mm/ioremap.c
ioremap/iounmap are implemented in arch/s390/pci/pci.c. When CONFIG_PCI is
disabled, the following link errors are reported:

s390x-linux-gnu-ld: drivers/pcmcia/cistpl.o: in function `set_cis_map':
cistpl.c:(.text+0x32a): undefined reference to `ioremap'
s390x-linux-gnu-ld: cistpl.c:(.text+0x360): undefined reference to `iounmap'
s390x-linux-gnu-ld: cistpl.c:(.text+0x384): undefined reference to `iounmap'
s390x-linux-gnu-ld: cistpl.c:(.text+0x396): undefined reference to `ioremap'
s390x-linux-gnu-ld: drivers/pcmcia/cistpl.o: in function `release_cis_mem':
cistpl.c:(.text+0xcb8): undefined reference to `iounmap'

Add arch/s390/mm/ioremap.c and move
ioremap/ioremap_prot/ioremap_wc/ioremap_wt/iounmap to it to fix the errors.

Reported-by: kernel test robot
Signed-off-by: Bixuan Cui
---
 arch/s390/include/asm/io.h |  8 ++---
 arch/s390/mm/Makefile      |  2 +-
 arch/s390/mm/ioremap.c     | 64 +
 arch/s390/pci/pci.c        | 73 ++
 4 files changed, 80 insertions(+), 67 deletions(-)
 create mode 100644 arch/s390/mm/ioremap.c

diff --git a/arch/s390/include/asm/io.h b/arch/s390/include/asm/io.h
index e3882b012bfa..48a55644c34f 100644
--- a/arch/s390/include/asm/io.h
+++ b/arch/s390/include/asm/io.h
@@ -22,6 +22,10 @@ void unxlate_dev_mem_ptr(phys_addr_t phys, void *addr);

 #define IO_SPACE_LIMIT 0

+#define ioremap ioremap
+#define ioremap_wt ioremap_wt
+#define ioremap_wc ioremap_wc
+
 void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot);
 void __iomem *ioremap(phys_addr_t addr, size_t size);
 void __iomem *ioremap_wc(phys_addr_t addr, size_t size);
@@ -51,10 +55,6 @@ static inline void ioport_unmap(void __iomem *p)

 #define pci_iomap_wc pci_iomap_wc
 #define pci_iomap_wc_range pci_iomap_wc_range
-#define ioremap ioremap
-#define ioremap_wt ioremap_wt
-#define ioremap_wc ioremap_wc
-
 #define memcpy_fromio(dst, src, count) zpci_memcpy_fromio(dst, src, count)
 #define memcpy_toio(dst, src, count) zpci_memcpy_toio(dst, src, count)
 #define memset_io(dst, val, count) zpci_memset_io(dst, val, count)
diff --git a/arch/s390/mm/Makefile b/arch/s390/mm/Makefile index cd67e94c16aa..74c22dfb131b 100644 --- a/arch/s390/mm/Makefile +++ b/arch/s390/mm/Makefile @@ -4,7 +4,7 @@ # obj-y := init.o fault.o extmem.o mmap.o vmem.o maccess.o -obj-y += page-states.o pageattr.o pgtable.o pgalloc.o +obj-y += page-states.o pageattr.o pgtable.o pgalloc.o ioremap.o obj-$(CONFIG_CMM) += cmm.o obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o diff --git a/arch/s390/mm/ioremap.c b/arch/s390/mm/ioremap.c new file mode 100644 index ..132e6ddff36f --- /dev/null +++ b/arch/s390/mm/ioremap.c @@ -0,0 +1,64 @@ +// SPDX-License-Identifier: GPL-2.0-only +/* + * Copyright (C) 2021 Huawei Ltd. + * Author: Bixuan Cui + */ +#include +#include +#include +#include + +static void __iomem *__ioremap(phys_addr_t addr, size_t size, pgprot_t prot) +{ + unsigned long offset, vaddr; + struct vm_struct *area; + phys_addr_t last_addr; + + last_addr = addr + size - 1; + if (!size || last_addr < addr) + return NULL; + + offset = addr & ~PAGE_MASK; + addr &= PAGE_MASK; + size = PAGE_ALIGN(size + offset); + area = get_vm_area(size, VM_IOREMAP); + if (!area) + return NULL; + + vaddr = (unsigned long) area->addr; + if (ioremap_page_range(vaddr, vaddr + size, addr, prot)) { + free_vm_area(area); + return NULL; + } + return (void __iomem *) ((unsigned long) area->addr + offset); +} + +void __iomem *ioremap_prot(phys_addr_t addr, size_t size, unsigned long prot) +{ + return __ioremap(addr, size, __pgprot(prot)); +} +EXPORT_SYMBOL(ioremap_prot); + +void __iomem *ioremap(phys_addr_t addr, size_t size) +{ + return __ioremap(addr, size, PAGE_KERNEL); +} +EXPORT_SYMBOL(ioremap); + +void __iomem *ioremap_wc(phys_addr_t addr, size_t size) +{ + return __ioremap(addr, size, pgprot_writecombine(PAGE_KERNEL)); +} +EXPORT_SYMBOL(ioremap_wc); + +void __iomem *ioremap_wt(phys_addr_t addr, size_t size) +{ + return __ioremap(addr, size, pgprot_writethrough(PAGE_KERNEL)); +} +EXPORT_SYMBOL(ioremap_wt); + +void iounmap(volatile void __iomem 
*addr) +{ + vunmap((__force void *) ((unsigned long) addr & PAGE_MASK)); +} +EXPORT_SYMBOL(iounmap); diff --git a/arch/s390/pci/pci.c b/arch/s390/pci/pci.c index dd14641b2d20..be300850df9c 100644 --- a/arch/s390/pci/pci.c +++ b/arch/s390/pci/pci.c @@ -227,65 +227,6 @@ void __iowrite64_copy(void __iomem *to, const void *from, size_t count) zpci_memcpy_toio(to, from, count); } -static void __iomem *__ioremap(phys_addr_t addr, size_t size, pgprot_t prot) -{ - unsigned long offset, vaddr; - struct vm_struct *area; - phys_addr_t last_addr; - - last_a
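The address arithmetic at the top of __ioremap() in the patch above can be tried out in userspace. The sketch below reproduces only the overflow check and the page rounding; the `DEMO_*` macros (with an assumed 4 KiB page size), `struct demo_range` and `demo_ioremap_range()` are hypothetical names, and no mapping is actually created.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed 4 KiB pages; the kernel takes these from its own headers. */
#define DEMO_PAGE_SIZE		UINT64_C(4096)
#define DEMO_PAGE_MASK		(~(DEMO_PAGE_SIZE - 1))
#define DEMO_PAGE_ALIGN(x)	(((x) + DEMO_PAGE_SIZE - 1) & DEMO_PAGE_MASK)

struct demo_range {
	uint64_t base;		/* page-aligned physical start */
	size_t size;		/* page-aligned length to map */
	size_t offset;		/* offset of the request within the first page */
};

/* Returns 0 on success, -1 for an empty or wrapping request. */
int demo_ioremap_range(uint64_t addr, size_t size, struct demo_range *out)
{
	uint64_t last_addr = addr + size - 1;

	/* Same sanity check as the patch: reject size 0 and address wrap. */
	if (!size || last_addr < addr)
		return -1;

	out->offset = addr & ~DEMO_PAGE_MASK;
	out->base = addr & DEMO_PAGE_MASK;
	out->size = DEMO_PAGE_ALIGN(size + out->offset);
	return 0;
}
```

For example, a 0x20-byte request at 0x1ff0 straddles a page boundary, so the rounded range starts at 0x1000 and covers two pages; the caller's pointer would then be the mapped base plus the 0xff0 offset, matching the final return statement of __ioremap().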
[PATCH -next] e1000e: Fix 'defined but not used' warning
Fix the following warning when CONFIG_PM_SLEEP is disabled:

drivers/net/ethernet/intel/e1000e/netdev.c:6926:12: warning: ‘e1000e_pm_prepare’ defined but not used [-Wunused-function]
 static int e1000e_pm_prepare(struct device *dev)
            ^

Signed-off-by: Bixuan Cui
---
 drivers/net/ethernet/intel/e1000e/netdev.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/intel/e1000e/netdev.c b/drivers/net/ethernet/intel/e1000e/netdev.c
index f1c9debd9f3b..d2e4653536c5 100644
--- a/drivers/net/ethernet/intel/e1000e/netdev.c
+++ b/drivers/net/ethernet/intel/e1000e/netdev.c
@@ -6923,7 +6923,7 @@ static int __e1000_resume(struct pci_dev *pdev)
 	return 0;
 }

-static int e1000e_pm_prepare(struct device *dev)
+static __maybe_unused int e1000e_pm_prepare(struct device *dev)
 {
 	return pm_runtime_suspended(dev) &&
 		pm_suspend_via_firmware();
--
2.17.1
[PATCH -next] drm/rockchip: remove unused variable 'old_state'
Fix the warning:

drivers/gpu/drm/rockchip/rockchip_drm_vop.c:882:26: warning: unused variable ‘old_state’ [-Wunused-variable]
  struct drm_plane_state *old_state = drm_atomic_get_old_plane_state(state,

Signed-off-by: Bixuan Cui
---
 drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
index 81c70d7a0471..64469439ddf2 100644
--- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
+++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
@@ -879,8 +879,6 @@ static void vop_plane_atomic_disable(struct drm_plane *plane,
 static void vop_plane_atomic_update(struct drm_plane *plane,
 		struct drm_atomic_state *state)
 {
-	struct drm_plane_state *old_state = drm_atomic_get_old_plane_state(state,
-									   plane);
 	struct drm_plane_state *new_state = drm_atomic_get_new_plane_state(state,
 									   plane);
 	struct drm_crtc *crtc = new_state->crtc;
--
2.17.1
[PATCH v5 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
Add 'perf irq' to trace/measure the hardware interrupts. Now three functions are provided: 1. 'perf irq record ' to record the irq handler events. 2. 'perf irq script' to see a detailed trace of the workload that was recorded. 3. 'perf irq report' to calculate the time consumed by each hardware interrupt processing function. Signed-off-by: Bixuan Cui --- tools/perf/Build | 1 + tools/perf/builtin-irq.c | 260 +++ tools/perf/builtin.h | 1 + tools/perf/perf.c| 1 + 4 files changed, 263 insertions(+) create mode 100644 tools/perf/builtin-irq.c diff --git a/tools/perf/Build b/tools/perf/Build index db61dbe2b543..ad8e3c19bb03 100644 --- a/tools/perf/Build +++ b/tools/perf/Build @@ -25,6 +25,7 @@ perf-y += builtin-data.o perf-y += builtin-version.o perf-y += builtin-c2c.o perf-y += builtin-daemon.o +perf-y += builtin-irq.o perf-$(CONFIG_TRACE) += builtin-trace.o perf-$(CONFIG_LIBELF) += builtin-probe.o diff --git a/tools/perf/builtin-irq.c b/tools/perf/builtin-irq.c new file mode 100644 index ..bf1e6efd85f8 --- /dev/null +++ b/tools/perf/builtin-irq.c @@ -0,0 +1,260 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "builtin.h" +#include "perf.h" +#include "perf-sys.h" + +#include "util/cpumap.h" +#include "util/evlist.h" +#include "util/evsel.h" +#include "util/evsel_fprintf.h" +#include "util/symbol.h" +#include "util/thread.h" +#include "util/header.h" +#include "util/session.h" +#include "util/tool.h" +#include "util/cloexec.h" +#include "util/thread_map.h" +#include "util/color.h" +#include "util/stat.h" +#include "util/string2.h" +#include "util/callchain.h" +#include "util/time-utils.h" + +#include +#include +#include "util/trace-event.h" + +#include "util/debug.h" +#include "util/event.h" + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define IRQ_NAME_LEN 24 +#define MAX_CPUS 4096 + +static const char *cpu_list; +static DECLARE_BITMAP(cpu_bitmap, 
MAX_NR_CPUS); + +struct perf_irq { + struct perf_tool tool; + bool force; + + char irq_name[MAX_CPUS][IRQ_NAME_LEN]; + u32 irq_num[MAX_CPUS]; + u64 irq_entry_time[MAX_CPUS]; + u64 irq_exit_time[MAX_CPUS]; +}; + +typedef int (*irq_handler)(struct perf_tool *tool, + struct evsel *evsel, + struct perf_sample *sample); + +static int perf_report_process_sample(struct perf_tool *tool, +union perf_event *event __maybe_unused, +struct perf_sample *sample, +struct evsel *evsel, +struct machine *machine __maybe_unused) +{ + int err = 0; + + if (evsel->handler != NULL) { + irq_handler f = evsel->handler; + err = f(tool, evsel, sample); + } + + return err; +} + +static void output_report(struct perf_irq *irq, u32 cpu) +{ + int ret, i; + char irq_entry_time[30], irq_exit_time[30], irq_diff[30]; + + timestamp__scnprintf_usec(irq->irq_entry_time[cpu], + irq_entry_time, sizeof(irq_entry_time)); + timestamp__scnprintf_usec(irq->irq_exit_time[cpu], + irq_exit_time, sizeof(irq_exit_time)); + timestamp__scnprintf_usec(irq->irq_exit_time[cpu] - irq->irq_entry_time[cpu], + irq_diff, sizeof(irq_diff)); + + ret = printf(" %s ", irq->irq_name[cpu]); + for (i = 0; i < IRQ_NAME_LEN - ret; i++) + printf(" "); + + printf("| [%04d] | %13s s | %16s s | %16s s\n", + cpu, irq_diff, irq_entry_time, irq_exit_time); +} + +static int report_irq_handler_entry_event(struct perf_tool *tool, + struct evsel *evsel, + struct perf_sample *sample) +{ + int this_cpu = sample->cpu, err = 0; + struct perf_irq *irq = container_of(tool, struct perf_irq, tool); + const char *name = evsel__strval(evsel, sample, "name"); + + irq->irq_entry_time[this_cpu] = sample->time; + + strncpy(irq->irq_name[this_cpu], name, IRQ_NAME_LEN - 1); + irq->irq_name[this_cpu][IRQ_NAME_LEN - 1] = '\0'; + + return err; +} + +static int report_irq_handler_exit_event(struct perf_tool *tool, + struct evsel *evsel __maybe_unused, + struct perf_sample *
[PATCH v5 0/2] perf tools: add 'perf irq' to measure the hardware interrupts
When a hardware interrupt handler runs, interrupts and preemption are
disabled on the current CPU, so the interrupted task is suspended. If the
handler takes a long time (for example, 5 ms), task scheduling performance
suffers.

This series provides the 'perf irq' command to trace hardware irq handlers
and calculate the time they consume.

[verse]
'perf irq' [] {record|report}

DESCRIPTION
---
There are several variants of 'perf irq':

  'perf irq record ' to record the irq handler events
  of an arbitrary workload.

  'perf irq report' to calculate the time consumed by each
  hardware interrupt processing function.

Example usage:
    perf irq record -- sleep 1
    perf irq report

By default it shows the individual irq events, including the irq name,
the CPU that executed the handler, the time consumed, and the entry and
exit times of each hardware irq:

 -------------------------------------------------------------------------------------------
  Irq name          |  CPU   | Time consume us | Handler entry time | Handler exit time
 -------------------------------------------------------------------------------------------
  enp2s0f2-tx-0     | [0006] |      0.000001 s |   6631263.313329 s |   6631263.313330 s
  megasas           | [0013] |      0.000003 s |   6631263.209564 s |   6631263.209567 s
  acpi              | [0016] |      0.000018 s |   6631263.085787 s |   6631263.085805 s

And:
    perf irq --cpu 78 record -- sleep 1
    perf irq --cpu 78 report

 -------------------------------------------------------------------------------------------
  Irq name          |  CPU   | Time consume us | Handler entry time | Handler exit time
 -------------------------------------------------------------------------------------------
  enp134s0f0-TxRx-2 | [0078] |      0.000005 s |    693757.533189 s |    693757.533194 s

Changes in v5:
* Rebase onto the latest commit to resolve conflicts.

Changes in v4:
* Keep pairs of irq entry/exit per cpu;
* Add NUL-termination to the end of irq->irq_name when strncpy is used;
* Delete some unused declarations and parameters;

Changes in v3:
* Delete 'perf irq script' because its function can be implemented
  using 'perf script';
* Add --cpu option for 'perf irq';

Changes in v2:
* Delete "-m", "1024" in __cmd_record();
* Change 'perf irq timeconsume ' to 'perf irq report ';
* Fix an error in tools/perf/Documentation/perf-irq.txt;

Bixuan Cui (2):
  perf tools: add 'perf irq' to measure the hardware interrupts
  perf tools: Add documentation for 'perf irq' command

 tools/perf/Build                      |   1 +
 tools/perf/Documentation/perf-irq.txt |  47 +
 tools/perf/builtin-irq.c              | 260 ++
 tools/perf/builtin.h                  |   1 +
 tools/perf/command-list.txt           |   1 +
 tools/perf/perf.c                     |   1 +
 6 files changed, 311 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-irq.txt
 create mode 100644 tools/perf/builtin-irq.c

--
2.17.1
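The "keep pairs of irq entry/exit per cpu" change can be illustrated with a small standalone sketch. It relies on the premise stated in the cover letter that a handler runs with interrupts disabled on its CPU, so each CPU has at most one handler in flight. The names (`struct demo_irq_times`, `demo_irq_entry()`, `demo_irq_exit()`, `DEMO_MAX_CPUS`) are hypothetical and this is not the code from builtin-irq.c.

```c
#include <stdint.h>

#define DEMO_MAX_CPUS 128

struct demo_irq_times {
	uint64_t entry_ns[DEMO_MAX_CPUS];	/* last entry timestamp per CPU */
	int pending[DEMO_MAX_CPUS];		/* 1 while an entry awaits its exit */
};

/* Record a handler-entry event seen on @cpu at time @now_ns. */
void demo_irq_entry(struct demo_irq_times *t, unsigned int cpu, uint64_t now_ns)
{
	if (cpu >= DEMO_MAX_CPUS)
		return;
	t->entry_ns[cpu] = now_ns;
	t->pending[cpu] = 1;
}

/*
 * Match a handler-exit event with the entry recorded on the same CPU.
 * Returns 0 and stores the duration, or -1 if there is no pending entry.
 */
int demo_irq_exit(struct demo_irq_times *t, unsigned int cpu,
		  uint64_t now_ns, uint64_t *delta_ns)
{
	if (cpu >= DEMO_MAX_CPUS || !t->pending[cpu])
		return -1;
	t->pending[cpu] = 0;
	*delta_ns = now_ns - t->entry_ns[cpu];
	return 0;
}
```

Indexing by CPU means events from different CPUs can interleave in the trace without corrupting each other's entry timestamps, which a single global entry/exit slot cannot guarantee.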
[PATCH v5 2/2] perf tools: Add documentation for 'perf irq' command
Add documentation for 'perf irq' command. Signed-off-by: Bixuan Cui --- tools/perf/Documentation/perf-irq.txt | 47 +++ tools/perf/command-list.txt | 1 + 2 files changed, 48 insertions(+) create mode 100644 tools/perf/Documentation/perf-irq.txt diff --git a/tools/perf/Documentation/perf-irq.txt b/tools/perf/Documentation/perf-irq.txt new file mode 100644 index ..8f1c466c3d6b --- /dev/null +++ b/tools/perf/Documentation/perf-irq.txt @@ -0,0 +1,47 @@ +perf-irq(1) += + +NAME + +perf-irq - Tool to trace/measure hardware interrupts + +SYNOPSIS + +[verse] +'perf irq' [] {record|report} + +DESCRIPTION +--- +There are several variants of 'perf irq': + + 'perf irq record ' to record the irq handler events + of an arbitrary workload. + + 'perf irq report' to calculate the time consumed by each + hardware interrupt processing function. + +Example usage: +perf irq record -- sleep 1 +perf irq report + + By default it shows the individual irq events, including the irq name, + cpu(execute the hardware interrupt processing function), time consumed, + entry time and exit time for the each hardware irq: + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + enp2s0f2-tx-0| [0006] | 0.01 s | 6631263.313329 s | 6631263.313330 s + megasas | [0013] | 0.03 s | 6631263.209564 s | 6631263.209567 s + acpi | [0016] | 0.18 s | 6631263.085787 s | 6631263.085805 s + + +OPTIONS for 'perf irq' + + +--cpus:: + Show just entries with activities for the given CPUs. + +SEE ALSO + +linkperf:perf-record[1] diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt index 825a12e8d694..bf56178d2895 100644 --- a/tools/perf/command-list.txt +++ b/tools/perf/command-list.txt @@ -26,6 +26,7 @@ perf-report mainporcelain common perf-sched mainporcelain common perf-scriptmainporcelain common perf-stat mainporcelain common +perf-irq mainporcelain common perf-test mainporcelain common perf-timechart mainporcelain common perf-top mainporcelain common -- 2.17.1
Re: [PATCH v4 0/2] perf tools: add 'perf irq' to measure the hardware interrupts
On 2021/2/24 20:45, Jiri Olsa wrote:
> hi,
> I can't apply this on later Arnaldo's perf/core,
> what commit/branch is this based on?

Thanks, I'll check it.

>
> thanks,
> jirka
[PATCH v4 0/2] perf tools: add 'perf irq' to measure the hardware interrupts
When a hardware interrupt handler runs, interrupts and preemption are
disabled on the current CPU, so the interrupted task is suspended. If the
handler takes a long time (for example, 5 ms), task scheduling performance
suffers.

This series provides the 'perf irq' command to trace hardware irq handlers
and calculate the time they consume.

[verse]
'perf irq' [] {record|report}

DESCRIPTION
---
There are several variants of 'perf irq':

  'perf irq record ' to record the irq handler events
  of an arbitrary workload.

  'perf irq report' to calculate the time consumed by each
  hardware interrupt processing function.

Example usage:
    perf irq record -- sleep 1
    perf irq report

By default it shows the individual irq events, including the irq name,
the CPU that executed the handler, the time consumed, and the entry and
exit times of each hardware irq:

 -------------------------------------------------------------------------------------------
  Irq name          |  CPU   | Time consume us | Handler entry time | Handler exit time
 -------------------------------------------------------------------------------------------
  enp2s0f2-tx-0     | [0006] |      0.000001 s |   6631263.313329 s |   6631263.313330 s
  megasas           | [0013] |      0.000003 s |   6631263.209564 s |   6631263.209567 s
  acpi              | [0016] |      0.000018 s |   6631263.085787 s |   6631263.085805 s

And:
    perf irq --cpu 78 record -- sleep 1
    perf irq --cpu 78 report

 -------------------------------------------------------------------------------------------
  Irq name          |  CPU   | Time consume us | Handler entry time | Handler exit time
 -------------------------------------------------------------------------------------------
  enp134s0f0-TxRx-2 | [0078] |      0.000005 s |    693757.533189 s |    693757.533194 s

Changes in v4:
* Keep pairs of irq entry/exit per cpu;
* Add NUL-termination to the end of irq->irq_name when strncpy is used;
* Delete some unused declarations and parameters;

Changes in v3:
* Delete 'perf irq script' because its function can be implemented
  using 'perf script';
* Add --cpu option for 'perf irq';

Changes in v2:
* Delete "-m", "1024" in __cmd_record();
* Change 'perf irq timeconsume ' to 'perf irq report ';
* Fix an error in tools/perf/Documentation/perf-irq.txt;

Bixuan Cui (2):
  perf tools: add 'perf irq' to measure the hardware interrupts
  perf tools: Add documentation for 'perf irq' command

 tools/perf/Build                      |   1 +
 tools/perf/Documentation/perf-irq.txt |  47 +
 tools/perf/builtin-irq.c              | 259 ++
 tools/perf/builtin.h                  |   1 +
 tools/perf/command-list.txt           |   1 +
 tools/perf/perf.c                     |   1 +
 6 files changed, 310 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-irq.txt
 create mode 100644 tools/perf/builtin-irq.c

--
2.17.1
[PATCH v4 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
Add 'perf irq' to trace/measure the hardware interrupts. Now three functions are provided: 1. 'perf irq record ' to record the irq handler events. 2. 'perf irq script' to see a detailed trace of the workload that was recorded. 3. 'perf irq report' to calculate the time consumed by each hardware interrupt processing function. Signed-off-by: Bixuan Cui --- tools/perf/Build | 1 + tools/perf/builtin-irq.c | 260 +++ tools/perf/builtin.h | 1 + tools/perf/perf.c| 1 + 4 files changed, 263 insertions(+) create mode 100644 tools/perf/builtin-irq.c diff --git a/tools/perf/Build b/tools/perf/Build index 5f392dbb88fc..d52a1e1d6d8a 100644 --- a/tools/perf/Build +++ b/tools/perf/Build @@ -24,6 +24,7 @@ perf-y += builtin-mem.o perf-y += builtin-data.o perf-y += builtin-version.o perf-y += builtin-c2c.o +perf-y += builtin-irq.o perf-$(CONFIG_TRACE) += builtin-trace.o perf-$(CONFIG_LIBELF) += builtin-probe.o diff --git a/tools/perf/builtin-irq.c b/tools/perf/builtin-irq.c new file mode 100644 index ..bf1e6efd85f8 --- /dev/null +++ b/tools/perf/builtin-irq.c @@ -0,0 +1,260 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "builtin.h" +#include "perf.h" +#include "perf-sys.h" + +#include "util/cpumap.h" +#include "util/evlist.h" +#include "util/evsel.h" +#include "util/evsel_fprintf.h" +#include "util/symbol.h" +#include "util/thread.h" +#include "util/header.h" +#include "util/session.h" +#include "util/tool.h" +#include "util/cloexec.h" +#include "util/thread_map.h" +#include "util/color.h" +#include "util/stat.h" +#include "util/string2.h" +#include "util/callchain.h" +#include "util/time-utils.h" + +#include +#include +#include "util/trace-event.h" + +#include "util/debug.h" +#include "util/event.h" + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define IRQ_NAME_LEN 24 +#define MAX_CPUS 4096 + +static const char *cpu_list; +static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); 
+ +struct perf_irq { + struct perf_tool tool; + bool force; + + char irq_name[MAX_CPUS][IRQ_NAME_LEN]; + u32 irq_num[MAX_CPUS]; + u64 irq_entry_time[MAX_CPUS]; + u64 irq_exit_time[MAX_CPUS]; +}; + +typedef int (*irq_handler)(struct perf_tool *tool, + struct evsel *evsel, + struct perf_sample *sample); + +static int perf_report_process_sample(struct perf_tool *tool, +union perf_event *event __maybe_unused, +struct perf_sample *sample, +struct evsel *evsel, +struct machine *machine __maybe_unused) +{ + int err = 0; + + if (evsel->handler != NULL) { + irq_handler f = evsel->handler; + err = f(tool, evsel, sample); + } + + return err; +} + +static void output_report(struct perf_irq *irq, u32 cpu) +{ + int ret, i; + char irq_entry_time[30], irq_exit_time[30], irq_diff[30]; + + timestamp__scnprintf_usec(irq->irq_entry_time[cpu], + irq_entry_time, sizeof(irq_entry_time)); + timestamp__scnprintf_usec(irq->irq_exit_time[cpu], + irq_exit_time, sizeof(irq_exit_time)); + timestamp__scnprintf_usec(irq->irq_exit_time[cpu] - irq->irq_entry_time[cpu], + irq_diff, sizeof(irq_diff)); + + ret = printf(" %s ", irq->irq_name[cpu]); + for (i = 0; i < IRQ_NAME_LEN - ret; i++) + printf(" "); + + printf("| [%04d] | %13s s | %16s s | %16s s\n", + cpu, irq_diff, irq_entry_time, irq_exit_time); +} + +static int report_irq_handler_entry_event(struct perf_tool *tool, + struct evsel *evsel, + struct perf_sample *sample) +{ + int this_cpu = sample->cpu, err = 0; + struct perf_irq *irq = container_of(tool, struct perf_irq, tool); + const char *name = evsel__strval(evsel, sample, "name"); + + irq->irq_entry_time[this_cpu] = sample->time; + + strncpy(irq->irq_name[this_cpu], name, IRQ_NAME_LEN - 1); + irq->irq_name[this_cpu][IRQ_NAME_LEN - 1] = '\0'; + + return err; +} + +static int report_irq_handler_exit_event(struct perf_tool *tool, + struct evsel *evsel __maybe_unused, + struct perf_sample *
[PATCH v4 2/2] perf tools: Add documentation for 'perf irq' command
Add documentation for 'perf irq' command. Signed-off-by: Bixuan Cui --- tools/perf/Documentation/perf-irq.txt | 47 +++ tools/perf/command-list.txt | 1 + 2 files changed, 48 insertions(+) create mode 100644 tools/perf/Documentation/perf-irq.txt diff --git a/tools/perf/Documentation/perf-irq.txt b/tools/perf/Documentation/perf-irq.txt new file mode 100644 index ..8f1c466c3d6b --- /dev/null +++ b/tools/perf/Documentation/perf-irq.txt @@ -0,0 +1,47 @@ +perf-irq(1) += + +NAME + +perf-irq - Tool to trace/measure hardware interrupts + +SYNOPSIS + +[verse] +'perf irq' [] {record|report} + +DESCRIPTION +--- +There are several variants of 'perf irq': + + 'perf irq record ' to record the irq handler events + of an arbitrary workload. + + 'perf irq report' to calculate the time consumed by each + hardware interrupt processing function. + +Example usage: +perf irq record -- sleep 1 +perf irq report + + By default it shows the individual irq events, including the irq name, + cpu(execute the hardware interrupt processing function), time consumed, + entry time and exit time for the each hardware irq: + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + enp2s0f2-tx-0| [0006] | 0.01 s | 6631263.313329 s | 6631263.313330 s + megasas | [0013] | 0.03 s | 6631263.209564 s | 6631263.209567 s + acpi | [0016] | 0.18 s | 6631263.085787 s | 6631263.085805 s + + +OPTIONS for 'perf irq' + + +--cpus:: + Show just entries with activities for the given CPUs. + +SEE ALSO + +linkperf:perf-record[1] diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt index bc6c585f74fc..c5224ea3ac71 100644 --- a/tools/perf/command-list.txt +++ b/tools/perf/command-list.txt @@ -26,6 +26,7 @@ perf-report mainporcelain common perf-sched mainporcelain common perf-scriptmainporcelain common perf-stat mainporcelain common +perf-irq mainporcelain common perf-test mainporcelain common perf-timechart mainporcelain common perf-top mainporcelain common -- 2.17.1
Re: [PATCH v3 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
On 2021/1/29 21:57, Hagen Paul Pfeifer wrote:
> Idea: why not pre-calc the max IRQ name length and adjust the columns
> dynamically? Time consume us, entry time and exit time can be treated as
> static in (max) length. These overlong, static ASCII lines are no pleasure! ;-)
>
> Furthermore, why not a --format feature?
> *perf irq report --format csv* option? Your human readable ASCII version is
> okay, but often you have thousands of IRQs where just one IRQ is an outlier.
> Piping the output to numpy and matplotlib is often the required post-processing
> step, but parsing the ASCII output is tortuous.
>

Good idea. In fact, I plan to output only part of the (maximum) data.
I'll consider adding a '--format csv' option.

Thanks
Bixuan Cui

> hgn
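Both suggestions from this exchange are easy to prototype. The following is a rough sketch, not part of any posted patch: it computes the widest irq name first and uses it as the column width, and offers a comma-separated variant of the same row. `struct demo_row` and the `demo_*` functions are hypothetical names, and durations are kept as integer microseconds purely to keep the example simple.

```c
#include <stdio.h>
#include <string.h>

struct demo_row {
	const char *name;		/* irq name */
	unsigned int cpu;		/* CPU that ran the handler */
	unsigned long long usecs;	/* handler duration in microseconds */
};

/* Length of the longest irq name, used as the width of the first column. */
size_t demo_max_name_len(const struct demo_row *rows, size_t n)
{
	size_t i, max = 0;

	for (i = 0; i < n; i++) {
		size_t len = strlen(rows[i].name);

		if (len > max)
			max = len;
	}
	return max;
}

/* Human-readable row: the name is left-aligned and padded to @width. */
int demo_format_row(char *buf, size_t len, const struct demo_row *r, int width)
{
	return snprintf(buf, len, "%-*s | [%04u] | %llu us",
			width, r->name, r->cpu, r->usecs);
}

/* Machine-readable row for post-processing tools. */
int demo_format_csv(char *buf, size_t len, const struct demo_row *r)
{
	return snprintf(buf, len, "%s,%u,%llu", r->name, r->cpu, r->usecs);
}
```

The `%-*s` conversion takes the width as an argument, so no manual padding loop is needed once the maximum name length is known.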
Re: [PATCH v3 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
On 2021/1/27 15:37, Namhyung Kim wrote: > Note that strncpy doesn't guarantee the NUL-termination. > You'd better do it by yourself just in case. Thanks, I'll fix these bugs. > > Thanks, > Namhyung
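The point about strncpy() is worth a concrete illustration: when the source is at least as long as the limit, strncpy() fills the buffer without writing a terminator. A minimal sketch of the usual remedy follows; `DEMO_NAME_LEN` and `demo_copy_name()` are hypothetical names chosen for the example.

```c
#include <string.h>

#define DEMO_NAME_LEN 8

/* Copy @src into @dst (capacity DEMO_NAME_LEN), always NUL-terminating. */
void demo_copy_name(char dst[DEMO_NAME_LEN], const char *src)
{
	/* Copy at most DEMO_NAME_LEN - 1 bytes; may leave dst unterminated. */
	strncpy(dst, src, DEMO_NAME_LEN - 1);
	/* Terminate explicitly so that truncated names are still valid strings. */
	dst[DEMO_NAME_LEN - 1] = '\0';
}
```

A long name such as "enp134s0f0-TxRx-2" is truncated to the first seven characters plus the terminator, while a short name such as "acpi" is copied unchanged.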
[PATCH v3 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
Add 'perf irq' to trace/measure the hardware interrupts. Now three functions are provided: 1. 'perf irq record ' to record the irq handler events. 2. 'perf irq script' to see a detailed trace of the workload that was recorded. 3. 'perf irq report' to calculate the time consumed by each hardware interrupt processing function. Signed-off-by: Bixuan Cui --- tools/perf/Build | 1 + tools/perf/builtin-irq.c | 283 +++ tools/perf/builtin.h | 1 + tools/perf/perf.c| 1 + 4 files changed, 286 insertions(+) create mode 100644 tools/perf/builtin-irq.c diff --git a/tools/perf/Build b/tools/perf/Build index 5f392dbb88fc..d52a1e1d6d8a 100644 --- a/tools/perf/Build +++ b/tools/perf/Build @@ -24,6 +24,7 @@ perf-y += builtin-mem.o perf-y += builtin-data.o perf-y += builtin-version.o perf-y += builtin-c2c.o +perf-y += builtin-irq.o perf-$(CONFIG_TRACE) += builtin-trace.o perf-$(CONFIG_LIBELF) += builtin-probe.o diff --git a/tools/perf/builtin-irq.c b/tools/perf/builtin-irq.c new file mode 100644 index ..25ba0669a875 --- /dev/null +++ b/tools/perf/builtin-irq.c @@ -0,0 +1,283 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "builtin.h" +#include "perf.h" +#include "perf-sys.h" + +#include "util/cpumap.h" +#include "util/evlist.h" +#include "util/evsel.h" +#include "util/evsel_fprintf.h" +#include "util/symbol.h" +#include "util/thread.h" +#include "util/header.h" +#include "util/session.h" +#include "util/tool.h" +#include "util/cloexec.h" +#include "util/thread_map.h" +#include "util/color.h" +#include "util/stat.h" +#include "util/string2.h" +#include "util/callchain.h" +#include "util/time-utils.h" + +#include +#include +#include "util/trace-event.h" + +#include "util/debug.h" +#include "util/event.h" + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define IRQ_NAME_LEN 20 +#define MAX_CPUS 4096 + +static const char *cpu_list; +static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); 
+ +struct perf_irq; + +struct perf_irq { + struct perf_tool tool; + bool force; + + u32 irq_entry_irq; + char irq_name[IRQ_NAME_LEN]; + u32 cpu; + u64 irq_entry_time; + u32 irq_entry_pid; + u32 irq_exit_irq; + u64 irq_exit_time; + u32 irq_exit_pid; +}; + +typedef int (*irq_handler)(struct perf_tool *tool, + union perf_event *event, + struct evsel *evsel, + struct perf_sample *sample, + struct machine *machine); + +static int perf_report_process_sample(struct perf_tool *tool, +union perf_event *event, +struct perf_sample *sample, +struct evsel *evsel, +struct machine *machine) +{ + int err = 0; + + if (evsel->handler != NULL) { + irq_handler f = evsel->handler; + err = f(tool, event, evsel, sample, machine); + } + + return err; +} + +static void output_report(struct perf_irq *irq) +{ + int ret, i; + char irq_entry_time[30], irq_exit_time[30], irq_diff[30]; + + /* The entry and exit of the hardware irq function +* exist at the same time. Check it by irq and pid. +*/ + if (irq->irq_entry_pid != irq->irq_exit_pid || + irq->irq_entry_irq != irq->irq_exit_irq) + return; + + timestamp__scnprintf_usec(irq->irq_entry_time, + irq_entry_time, sizeof(irq_entry_time)); + timestamp__scnprintf_usec(irq->irq_exit_time, + irq_exit_time, sizeof(irq_exit_time)); + timestamp__scnprintf_usec(irq->irq_exit_time - irq->irq_entry_time, + irq_diff, sizeof(irq_diff)); + + ret = printf(" %s ", irq->irq_name); + for (i = 0; i < IRQ_NAME_LEN - ret; i++) + printf(" "); + + printf("| [%04d] | %13s s | %16s s | %16s s\n", + irq->cpu, irq_diff, irq_entry_time, irq_exit_time); +} + +static int report_irq_handler_entry_event(struct perf_tool *tool, + union perf_event *event __maybe_unused, + struct evsel *evsel, + struct perf_sample *sample, + struct machine *machine __maybe_unused) +{ + int err = 0; + struct perf_ir
[PATCH v3 2/2] perf tools: Add documentation for 'perf irq' command
Add documentation for 'perf irq' command.

Signed-off-by: Bixuan Cui
---
 tools/perf/Documentation/perf-irq.txt | 47 +++
 tools/perf/command-list.txt           |  1 +
 2 files changed, 48 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-irq.txt

diff --git a/tools/perf/Documentation/perf-irq.txt b/tools/perf/Documentation/perf-irq.txt
new file mode 100644
index ..8f1c466c3d6b
--- /dev/null
+++ b/tools/perf/Documentation/perf-irq.txt
@@ -0,0 +1,47 @@
+perf-irq(1)
+===========
+
+NAME
+----
+perf-irq - Tool to trace/measure hardware interrupts
+
+SYNOPSIS
+--------
+[verse]
+'perf irq' [] {record|report}
+
+DESCRIPTION
+-----------
+There are several variants of 'perf irq':
+
+  'perf irq record ' to record the irq handler events
+  of an arbitrary workload.
+
+  'perf irq report' to calculate the time consumed by each
+  hardware interrupt processing function.
+
+Example usage:
+    perf irq record -- sleep 1
+    perf irq report
+
+  By default it shows the individual irq events, including the irq name,
+  cpu(execute the hardware interrupt processing function), time consumed,
+  entry time and exit time for the each hardware irq:
+
+ ---------------------------------------------------------------------------------------
+ Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
+ ---------------------------------------------------------------------------------------
+ enp2s0f2-tx-0      | [0006] |      0.000001 s |   6631263.313329 s |   6631263.313330 s
+ megasas            | [0013] |      0.000003 s |   6631263.209564 s |   6631263.209567 s
+ acpi               | [0016] |      0.000018 s |   6631263.085787 s |   6631263.085805 s
+
+
+OPTIONS for 'perf irq'
+----------------------
+
+--cpus::
+	Show just entries with activities for the given CPUs.
+
+SEE ALSO
+--------
+linkperf:perf-record[1]
diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt
index bc6c585f74fc..c5224ea3ac71 100644
--- a/tools/perf/command-list.txt
+++ b/tools/perf/command-list.txt
@@ -26,6 +26,7 @@ perf-report			mainporcelain common
 perf-sched			mainporcelain common
 perf-script			mainporcelain common
 perf-stat			mainporcelain common
+perf-irq			mainporcelain common
 perf-test			mainporcelain common
 perf-timechart			mainporcelain common
 perf-top			mainporcelain common
-- 
2.17.1
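The CPU option described in this patch restricts the report to a set of CPUs. The following is only a standalone sketch of how such a filter could work, not the helper perf itself uses: it parses a list such as "0-3,78" into a bitmap and tests membership. The names cpu_filter, cpu_filter_set, cpu_filter_test and cpu_filter_parse are hypothetical; MAX_CPUS mirrors the 4096 limit defined in builtin-irq.c.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

#define MAX_CPUS 4096
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define CPU_WORDS ((MAX_CPUS + BITS_PER_WORD - 1) / BITS_PER_WORD)

/* Hypothetical filter type: one bit per CPU. */
struct cpu_filter {
	unsigned long bits[CPU_WORDS];
};

static void cpu_filter_set(struct cpu_filter *f, unsigned int cpu)
{
	assert(cpu < MAX_CPUS);
	f->bits[cpu / BITS_PER_WORD] |= 1UL << (cpu % BITS_PER_WORD);
}

static bool cpu_filter_test(const struct cpu_filter *f, unsigned int cpu)
{
	if (cpu >= MAX_CPUS)
		return false;
	return (f->bits[cpu / BITS_PER_WORD] >> (cpu % BITS_PER_WORD)) & 1UL;
}

/* Parse a list such as "0-3,78"; returns 0 on success, -1 on bad input. */
static int cpu_filter_parse(struct cpu_filter *f, const char *list)
{
	const char *p = list;

	memset(f, 0, sizeof(*f));
	while (*p) {
		char *end;
		unsigned long lo, hi;

		lo = hi = strtoul(p, &end, 10);
		if (end == p)
			return -1;
		if (*end == '-') {
			/* A range "lo-hi": parse the upper bound. */
			p = end + 1;
			hi = strtoul(p, &end, 10);
			if (end == p)
				return -1;
		}
		if (lo > hi || hi >= MAX_CPUS)
			return -1;
		for (; lo <= hi; lo++)
			cpu_filter_set(f, (unsigned int)lo);
		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;
		p = end;
	}
	return 0;
}
```

A report loop built on this sketch would skip every sample whose CPU fails cpu_filter_test() before printing its row.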
[PATCH v3 0/2] perf tools: add 'perf irq' to measure the hardware interrupts
When a hardware interrupt handler runs, interrupts and preemption are
disabled on the current CPU, so the running task is suspended. If the
handler takes a long time (for example 5 ms), task scheduling performance
suffers.

This patch set provides the 'perf irq' command to trace hardware irq
handlers and calculate the time they consume.

[verse]
'perf irq' [] {record|report}

DESCRIPTION
-----------
There are several variants of 'perf irq':

  'perf irq record ' to record the irq handler events
  of an arbitrary workload.

  'perf irq report' to calculate the time consumed by each
  hardware interrupt processing function.

Example usage:
    perf irq record -- sleep 1
    perf irq report

By default it shows the individual irq events, including the irq name,
the CPU that executed the hardware interrupt handler, the time consumed,
and the entry and exit time of each hardware irq:

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 enp2s0f2-tx-0      | [0006] |      0.000001 s |   6631263.313329 s |   6631263.313330 s
 megasas            | [0013] |      0.000003 s |   6631263.209564 s |   6631263.209567 s
 acpi               | [0016] |      0.000018 s |   6631263.085787 s |   6631263.085805 s

And:
    perf irq --cpu 78 record -- sleep 1
    perf irq --cpu 78 report

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 enp134s0f0-TxRx-2  | [0078] |      0.000005 s |    693757.533189 s |    693757.533194 s

Changes in v3:
* Delete 'perf irq script' because its function can be implemented
  using 'perf script';
* Add --cpu option for 'perf irq';

Changes in v2:
* Delete "-m", "1024" in __cmd_record();
* Change 'perf irq timeconsume ' to 'perf irq report ';
* Fix an error in tools/perf/Documentation/perf-irq.txt;

Bixuan Cui (2):
  perf tools: add 'perf irq' to measure the hardware interrupts
  perf tools: Add documentation for 'perf irq' command

 tools/perf/Build                      |   1 +
 tools/perf/Documentation/perf-irq.txt |  47 +
 tools/perf/builtin-irq.c              | 283 ++
 tools/perf/builtin.h                  |   1 +
 tools/perf/command-list.txt           |   1 +
 tools/perf/perf.c                     |   1 +
 6 files changed, 334 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-irq.txt
 create mode 100644 tools/perf/builtin-irq.c

-- 
2.17.1
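The report described in this cover letter boils down to pairing an irq_handler_entry event with the matching irq_handler_exit event and subtracting the timestamps. A minimal sketch of that pairing follows, assuming nanosecond timestamps and the same "same irq, same pid" check that output_report() performs in the patch. The struct irq_sample type and the irq_duration_ns() helper are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified record of one tracepoint sample. */
struct irq_sample {
	uint32_t irq;		/* interrupt number */
	uint32_t pid;		/* task that was interrupted */
	uint64_t time_ns;	/* event timestamp in nanoseconds */
};

/*
 * Store the handler duration and return true when the exit event
 * matches the entry event (same irq, same pid, exit not before entry).
 */
static bool irq_duration_ns(const struct irq_sample *entry,
			    const struct irq_sample *exit_ev,
			    uint64_t *duration_ns)
{
	assert(entry != NULL && exit_ev != NULL && duration_ns != NULL);

	if (entry->irq != exit_ev->irq || entry->pid != exit_ev->pid)
		return false;
	if (exit_ev->time_ns < entry->time_ns)
		return false;

	*duration_ns = exit_ev->time_ns - entry->time_ns;
	return true;
}
```

For the acpi entry above, the entry time 6631263.085787 s and exit time 6631263.085805 s differ by 18 microseconds, which is the value such a helper would produce for that pair.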
[PATCH v2 2/2] perf tools: Add documentation for 'perf irq' command
Add documentation for 'perf irq' command. Signed-off-by: Bixuan Cui --- tools/perf/Documentation/perf-irq.txt | 58 +++ tools/perf/command-list.txt | 1 + 2 files changed, 59 insertions(+) create mode 100644 tools/perf/Documentation/perf-irq.txt diff --git a/tools/perf/Documentation/perf-irq.txt b/tools/perf/Documentation/perf-irq.txt new file mode 100644 index ..22709b6df62d --- /dev/null +++ b/tools/perf/Documentation/perf-irq.txt @@ -0,0 +1,58 @@ +perf-irq(1) += + +NAME + +perf-irq - Tool to trace/measure hardware interrupts + +SYNOPSIS + +[verse] +'perf irq' {record|report|script} + +DESCRIPTION +--- +There are several variants of 'perf irq': + + 'perf irq record ' to record the irq handler events + of an arbitrary workload. + + 'perf irq script' to see a detailed trace of the workload that + was recorded (aliased to 'perf script' for now). + + 'perf irq report' to calculate the time consumed by each + hardware interrupt processing function. + +Example usage: +perf irq record -- sleep 1 +perf irq report + + By default it shows the individual irq events, including the irq name, + cpu(execute the hardware interrupt processing function), time consumed, + entry time and exit time for the each hardware irq: + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + enp2s0f2-tx-0| [0006] | 0.01 s | 6631263.313329 s | 6631263.313330 s + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + megasas | [0013] | 0.03 s | 6631263.209564 s | 6631263.209567 s + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + acpi | [0016] | 0.18 s | 6631263.085787 s | 6631263.085805 s + + +OPTIONS for 'perf irq report' + + +--cpus:: + Show just entries with activities for the given CPUs. 
+ +SEE ALSO + +linkperf:perf-record[1] diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt index bc6c585f74fc..c5224ea3ac71 100644 --- a/tools/perf/command-list.txt +++ b/tools/perf/command-list.txt @@ -26,6 +26,7 @@ perf-report mainporcelain common perf-sched mainporcelain common perf-scriptmainporcelain common perf-stat mainporcelain common +perf-irq mainporcelain common perf-test mainporcelain common perf-timechart mainporcelain common perf-top mainporcelain common -- 2.17.1
[PATCH v2 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
Add 'perf irq' to trace/measure the hardware interrupts. Now three functions are provided: 1. 'perf irq record ' to record the irq handler events. 2. 'perf irq script' to see a detailed trace of the workload that was recorded. 3. 'perf irq report' to calculate the time consumed by each hardware interrupt processing function. Signed-off-by: Bixuan Cui --- tools/perf/Build | 1 + tools/perf/builtin-irq.c | 287 +++ tools/perf/builtin.h | 1 + tools/perf/perf.c| 1 + 4 files changed, 290 insertions(+) create mode 100644 tools/perf/builtin-irq.c diff --git a/tools/perf/Build b/tools/perf/Build index 5f392dbb88fc..d52a1e1d6d8a 100644 --- a/tools/perf/Build +++ b/tools/perf/Build @@ -24,6 +24,7 @@ perf-y += builtin-mem.o perf-y += builtin-data.o perf-y += builtin-version.o perf-y += builtin-c2c.o +perf-y += builtin-irq.o perf-$(CONFIG_TRACE) += builtin-trace.o perf-$(CONFIG_LIBELF) += builtin-probe.o diff --git a/tools/perf/builtin-irq.c b/tools/perf/builtin-irq.c new file mode 100644 index ..58dd1a488edf --- /dev/null +++ b/tools/perf/builtin-irq.c @@ -0,0 +1,287 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "builtin.h" +#include "perf.h" +#include "perf-sys.h" + +#include "util/cpumap.h" +#include "util/evlist.h" +#include "util/evsel.h" +#include "util/evsel_fprintf.h" +#include "util/symbol.h" +#include "util/thread.h" +#include "util/header.h" +#include "util/session.h" +#include "util/tool.h" +#include "util/cloexec.h" +#include "util/thread_map.h" +#include "util/color.h" +#include "util/stat.h" +#include "util/string2.h" +#include "util/callchain.h" +#include "util/time-utils.h" + +#include +#include +#include "util/trace-event.h" + +#include "util/debug.h" +#include "util/event.h" + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define IRQ_NAME_LEN 20 +#define MAX_CPUS 4096 + +static const char *cpu_list; +static DECLARE_BITMAP(cpu_bitmap, MAX_NR_CPUS); 
+ +struct perf_irq; + +struct perf_irq { + struct perf_tool tool; + bool force; + + u32 irq_entry_irq; + char irq_name[IRQ_NAME_LEN]; + u32 cpu; + u64 irq_entry_time; + u32 irq_entry_pid; + u32 irq_exit_irq; + u64 irq_exit_time; + u32 irq_exit_pid; +}; + +typedef int (*irq_handler)(struct perf_tool *tool, + union perf_event *event, + struct evsel *evsel, + struct perf_sample *sample, + struct machine *machine); + +static int perf_report_process_sample(struct perf_tool *tool, +union perf_event *event, +struct perf_sample *sample, +struct evsel *evsel, +struct machine *machine) +{ + int err = 0; + + if (evsel->handler != NULL) { + irq_handler f = evsel->handler; + err = f(tool, event, evsel, sample, machine); + } + + return err; +} + +static void output_report(struct perf_irq *irq) +{ + int ret, i; + char irq_entry_time[30], irq_exit_time[30], irq_diff[30]; + + /* The entry and exit of the hardware irq function +* exist at the same time. Check it by irq and pid. +*/ + if (irq->irq_entry_pid != irq->irq_exit_pid || + irq->irq_entry_irq != irq->irq_exit_irq) + return; + + timestamp__scnprintf_usec(irq->irq_entry_time, + irq_entry_time, sizeof(irq_entry_time)); + timestamp__scnprintf_usec(irq->irq_exit_time, + irq_exit_time, sizeof(irq_exit_time)); + timestamp__scnprintf_usec(irq->irq_exit_time - irq->irq_entry_time, + irq_diff, sizeof(irq_diff)); + + printf(" ---\n"); + printf(" Irq name | CPU | Time consume us | Handler entry time | Handler exit time \n"); + printf(" ---\n"); + + ret = printf(" %s ", irq->irq_name); + for (i = 0; i < IRQ_NAME_LEN - ret; i++) + printf(" "); + + printf("| [%04d] | %13s s | %16s s
[PATCH v2 0/2] perf tools: add 'perf irq' to measure the hardware interrupts
When a hardware interrupt handler runs, interrupts and preemption are
disabled on the current CPU, so the running task is suspended. If the
handler takes a long time (for example 5 ms), task scheduling performance
suffers.

This patch set provides the 'perf irq' command to trace hardware irq
handlers and calculate the time they consume.

[verse]
'perf irq' {record|report|script}

There are several variants of 'perf irq':

  'perf irq record ' to record the irq handler events
  of an arbitrary workload.

  'perf irq script' to see a detailed trace of the workload that
  was recorded (aliased to 'perf script' for now).

  'perf irq report' to calculate the time consumed by each
  hardware interrupt processing function.

Example usage:
    perf irq record -- sleep 1
    perf irq report

By default it shows the individual irq events, including the irq name,
the CPU that executed the hardware interrupt handler, the time consumed,
and the entry and exit time of each hardware irq:

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 enp2s0f2-tx-0      | [0006] |      0.000001 s |   6631263.313329 s |   6631263.313330 s

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 megasas            | [0013] |      0.000003 s |   6631263.209564 s |   6631263.209567 s

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 acpi               | [0016] |      0.000018 s |   6631263.085787 s |   6631263.085805 s

And:
    perf irq report --cpu 78

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 enp134s0f0-TxRx-2  | [0078] |      0.000005 s |    693757.533189 s |    693757.533194 s

Changes in v2:
* Delete "-m", "1024" in __cmd_record()
* Change 'perf irq timeconsume ' to 'perf irq report '
* Fix an error in tools/perf/Documentation/perf-irq.txt

Bixuan Cui (2):
  perf tools: add 'perf irq' to measure the hardware interrupts
  perf tools: Add documentation for 'perf irq' command

 tools/perf/Build                      |   1 +
 tools/perf/Documentation/perf-irq.txt |  58 ++
 tools/perf/builtin-irq.c              | 287 ++
 tools/perf/builtin.h                  |   1 +
 tools/perf/command-list.txt           |   1 +
 tools/perf/perf.c                     |   1 +
 6 files changed, 349 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-irq.txt
 create mode 100644 tools/perf/builtin-irq.c

-- 
2.17.1
Re: [PATCH 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
On 2021/1/13 3:50, Alexei Budankov wrote:
> Hi Bixuan,
>
> On 12.01.2021 15:55, Bixuan Cui wrote:
>> Add 'perf irq' to trace/measure the hardware interrupts.
>>
>> Now three functions are provided:
>>   1. 'perf irq record ' to record the irq handler events.
>>   2. 'perf irq script' to see a detailed trace of the workload that
>>      was recorded.
>>   3. 'perf irq timeconsume' to calculate the time consumed by each
>>      hardware interrupt processing function.
>>
>> Signed-off-by: Bixuan Cui
> Thanks for the patches. There is still something that could be improved.
>
>> ---
>>  tools/perf/Build         |   1 +
>>  tools/perf/builtin-irq.c | 288 +++
>>  tools/perf/builtin.h     |   1 +
>>  tools/perf/perf.c        |   1 +
>>  4 files changed, 291 insertions(+)
>>  create mode 100644 tools/perf/builtin-irq.c
>>
>
>
>> +
>> +static int __cmd_record(int argc, const char **argv)
>> +{
>> +	unsigned int rec_argc, i, j;
>> +	const char **rec_argv;
>> +	const char * const record_args[] = {
>> +		"record",
>> +		"-a",
> I see it works also like this:
>
> sudo perf record -p PID -c 1 -e irq:irq_handler_entry,irq:irq_handler_exit
> sudo perf record -R -c 1 -e irq:irq_handler_entry,irq:irq_handler_exit -- find /
>
> This -a option jointly with -p option could be made configurable from
> the command line for perf irq mode.

That's true. We can add a series of subcommands for 'perf irq', such as
record, script and report, so I kept 'perf irq record'.

>
>> +		"-R",
>> +		"-m", "1024",
> Do you see data losses with default buffer size of 512KB
> when capturing trace in your specific use case?
>
> If not then this -m could be avoided or made configurable
> if you still need it.

Thank you for your advice, I will delete it.
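The exchange above is about which fixed arguments 'perf irq record' should pass on to 'perf record'. The following is a rough userspace sketch of that argument forwarding, with "-m 1024" left out as agreed. The exact list in record_args is an assumption pieced together from the quoted fragment and the reviewer's command lines, and build_record_argv() is a hypothetical helper, not the function in the patch.

```c
#include <assert.h>
#include <stdlib.h>

/* Assumed fixed arguments; "-m", "1024" is intentionally absent. */
static const char *const record_args[] = {
	"record", "-a", "-R", "-c", "1",
	"-e", "irq:irq_handler_entry",
	"-e", "irq:irq_handler_exit",
};
#define NR_RECORD_ARGS (sizeof(record_args) / sizeof(record_args[0]))

/*
 * Build a NULL-terminated argv: the fixed arguments followed by the
 * caller's arguments, skipping argv[0] (the subcommand name).
 * The caller frees the returned array, not the strings it points to.
 */
static const char **build_record_argv(int argc, const char **argv,
				      int *out_argc)
{
	size_t extra = argc > 1 ? (size_t)(argc - 1) : 0;
	size_t total = NR_RECORD_ARGS + extra;
	const char **rec_argv = calloc(total + 1, sizeof(*rec_argv));
	size_t i;
	int j;

	if (!rec_argv)
		return NULL;

	for (i = 0; i < NR_RECORD_ARGS; i++)
		rec_argv[i] = record_args[i];
	for (j = 1; j < argc; j++)
		rec_argv[i++] = argv[j];

	assert(i == total);
	*out_argc = (int)total;
	return rec_argv;
}
```

Making "-a" or "-p" configurable, as the reviewer suggests, would amount to moving those entries out of the fixed list and appending them from parsed options instead.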
[PATCH 0/2] perf tools: add 'perf irq' to measure the hardware interrupts
When a hardware interrupt handler runs, interrupts and preemption are
disabled on the current CPU, so the running task is suspended. If the
handler takes a long time (for example 5 ms), task scheduling performance
suffers.

This patch set provides the 'perf irq' command to trace hardware irq
handlers and calculate the time they consume.

[verse]
'perf irq' {record|timeconsume|script}

There are several variants of 'perf irq':

  'perf irq record ' to record the irq handler events
  of an arbitrary workload.

  'perf irq script' to see a detailed trace of the workload that
  was recorded (aliased to 'perf script' for now).

  'perf irq timeconsume' to calculate the time consumed by each
  hardware interrupt processing function.

Example usage:
    perf irq record -- sleep 1
    perf irq timeconsume

By default it shows the individual irq events, including the irq name,
the CPU that executed the hardware interrupt handler, the time consumed,
and the entry and exit time of each hardware irq:

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 enp2s0f2-tx-0      | [0006] |      0.000001 s |   6631263.313329 s |   6631263.313330 s

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 megasas            | [0013] |      0.000003 s |   6631263.209564 s |   6631263.209567 s

 ---------------------------------------------------------------------------------------
 Irq name           |  CPU   | Time consume us | Handler entry time | Handler exit time
 ---------------------------------------------------------------------------------------
 acpi               | [0016] |      0.000018 s |   6631263.085787 s |   6631263.085805 s

Bixuan Cui (2):
  perf tools: add 'perf irq' to measure the hardware interrupts
  perf tools: Add documentation for 'perf irq' command

 tools/perf/Build                      |   1 +
 tools/perf/Documentation/perf-irq.txt |  58 ++
 tools/perf/builtin-irq.c              | 288 ++
 tools/perf/builtin.h                  |   1 +
 tools/perf/command-list.txt           |   1 +
 tools/perf/perf.c                     |   1 +
 6 files changed, 350 insertions(+)
 create mode 100644 tools/perf/Documentation/perf-irq.txt
 create mode 100644 tools/perf/builtin-irq.c

-- 
2.17.1
[PATCH 1/2] perf tools: add 'perf irq' to measure the hardware interrupts
Add 'perf irq' to trace/measure the hardware interrupts. Now three functions are provided: 1. 'perf irq record ' to record the irq handler events. 2. 'perf irq script' to see a detailed trace of the workload that was recorded. 3. 'perf irq timeconsume' to calculate the time consumed by each hardware interrupt processing function. Signed-off-by: Bixuan Cui --- tools/perf/Build | 1 + tools/perf/builtin-irq.c | 288 +++ tools/perf/builtin.h | 1 + tools/perf/perf.c| 1 + 4 files changed, 291 insertions(+) create mode 100644 tools/perf/builtin-irq.c diff --git a/tools/perf/Build b/tools/perf/Build index 5f392dbb88fc..d52a1e1d6d8a 100644 --- a/tools/perf/Build +++ b/tools/perf/Build @@ -24,6 +24,7 @@ perf-y += builtin-mem.o perf-y += builtin-data.o perf-y += builtin-version.o perf-y += builtin-c2c.o +perf-y += builtin-irq.o perf-$(CONFIG_TRACE) += builtin-trace.o perf-$(CONFIG_LIBELF) += builtin-probe.o diff --git a/tools/perf/builtin-irq.c b/tools/perf/builtin-irq.c new file mode 100644 index ..3a73e698dedf --- /dev/null +++ b/tools/perf/builtin-irq.c @@ -0,0 +1,288 @@ +// SPDX-License-Identifier: GPL-2.0 +#include "builtin.h" +#include "perf.h" +#include "perf-sys.h" + +#include "util/cpumap.h" +#include "util/evlist.h" +#include "util/evsel.h" +#include "util/evsel_fprintf.h" +#include "util/symbol.h" +#include "util/thread.h" +#include "util/header.h" +#include "util/session.h" +#include "util/tool.h" +#include "util/cloexec.h" +#include "util/thread_map.h" +#include "util/color.h" +#include "util/stat.h" +#include "util/string2.h" +#include "util/callchain.h" +#include "util/time-utils.h" + +#include +#include +#include "util/trace-event.h" + +#include "util/debug.h" +#include "util/event.h" + +#include +#include +#include +#include +#include +#include + +#include +#include +#include +#include +#include +#include +#include +#include + +#include + +#define IRQ_NAME_LEN 20 +#define MAX_CPUS 4096 + +static const char *cpu_list; +static DECLARE_BITMAP(cpu_bitmap, 
MAX_NR_CPUS); + +struct perf_irq; + +struct perf_irq { + struct perf_tool tool; + bool force; + + u32 irq_entry_irq; + char irq_name[IRQ_NAME_LEN]; + u32 cpu; + u64 irq_entry_time; + u32 irq_entry_pid; + u32 irq_exit_irq; + u64 irq_exit_time; + u32 irq_exit_pid; +}; + +typedef int (*irq_handler)(struct perf_tool *tool, + union perf_event *event, + struct evsel *evsel, + struct perf_sample *sample, + struct machine *machine); + +static int perf_timeconsume_process_sample(struct perf_tool *tool, +union perf_event *event, +struct perf_sample *sample, +struct evsel *evsel, +struct machine *machine) +{ + int err = 0; + + if (evsel->handler != NULL) { + irq_handler f = evsel->handler; + err = f(tool, event, evsel, sample, machine); + } + + return err; +} + +static void output_timeconsume(struct perf_irq *irq) +{ + int ret, i; + char irq_entry_time[30], irq_exit_time[30], irq_diff[30]; + + /* The entry and exit of the hardware irq function +* exist at the same time. Check it by irq and pid. +*/ + if (irq->irq_entry_pid != irq->irq_exit_pid || + irq->irq_entry_irq != irq->irq_exit_irq) + return; + + timestamp__scnprintf_usec(irq->irq_entry_time, + irq_entry_time, sizeof(irq_entry_time)); + timestamp__scnprintf_usec(irq->irq_exit_time, + irq_exit_time, sizeof(irq_exit_time)); + timestamp__scnprintf_usec(irq->irq_exit_time - irq->irq_entry_time, + irq_diff, sizeof(irq_diff)); + + printf(" ---\n"); + printf(" Irq name | CPU | Time consume us | Handler entry time | Handler exit time \n"); + printf(" ---\n"); + + ret = printf(" %s ", irq->irq_name); + for (i = 0; i < IRQ_NAME_LEN - ret; i++) + printf(" "); + + printf("| [%04d] | %13s s | %16s s | %1
[PATCH 2/2] perf tools: Add documentation for 'perf irq' command
Add documentation for 'perf irq' command. Signed-off-by: Bixuan Cui --- tools/perf/Documentation/perf-irq.txt | 58 +++ tools/perf/command-list.txt | 1 + 2 files changed, 59 insertions(+) create mode 100644 tools/perf/Documentation/perf-irq.txt diff --git a/tools/perf/Documentation/perf-irq.txt b/tools/perf/Documentation/perf-irq.txt new file mode 100644 index ..8c0e388dad59 --- /dev/null +++ b/tools/perf/Documentation/perf-irq.txt @@ -0,0 +1,58 @@ +perf-irq(1) += + +NAME + +perf-irq - Tool to trace/measure hardware interrupts + +SYNOPSIS + +[verse] +'perf irq' {record|timeconsume|script} + +DESCRIPTION +--- +There are several variants of 'perf irq': + + 'perf irq record ' to record the irq handler events + of an arbitrary workload. + + 'perf irq script' to see a detailed trace of the workload that + was recorded (aliased to 'perf script' for now). + + 'perf irq timeconsume' to calculate the time consumed by each + hardware interrupt processing function. + +Example usage: +perf irq record -- sleep 1 +perf irq timeconsume + + By default it shows the individual irq events, including the irq name, + cpu(execute the hardware interrupt processing function), time consumed, + entry time and exit time for the each hardware irq: + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + enp2s0f2-tx-0| [0006] | 0.01 s | 6631263.313329 s | 6631263.313330 s + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + megasas | [0013] | 0.03 s | 6631263.209564 s | 6631263.209567 s + + --- + Irq name | CPU | Time consume us | Handler entry time | Handler exit time + --- + acpi | [0016] | 0.18 s | 6631263.085787 s | 6631263.085805 s + + +OPTIONS for 'perf irq' + + +--cpus:: + Show just entries with activities for the given CPUs. 
+ +SEE ALSO + +linkperf:perf-record[1] diff --git a/tools/perf/command-list.txt b/tools/perf/command-list.txt index bc6c585f74fc..c5224ea3ac71 100644 --- a/tools/perf/command-list.txt +++ b/tools/perf/command-list.txt @@ -26,6 +26,7 @@ perf-report mainporcelain common perf-sched mainporcelain common perf-scriptmainporcelain common perf-stat mainporcelain common +perf-irq mainporcelain common perf-test mainporcelain common perf-timechart mainporcelain common perf-top mainporcelain common -- 2.17.1
Re: [PATCH v2] net: neterion: vxge: reduce stack usage in VXGE_COMPLETE_VPATH_TX
On 2020/7/21 9:38, David Miller wrote:
> From: Bixuan Cui
> Date: Mon, 20 Jul 2020 09:58:39 +0800
>
>> Fix the warning: [-Werror=-Wframe-larger-than=]
>>
>> drivers/net/ethernet/neterion/vxge/vxge-main.c:
>> In function'VXGE_COMPLETE_VPATH_TX.isra.37':
>> drivers/net/ethernet/neterion/vxge/vxge-main.c:119:1:
>> warning: the frame size of 1056 bytes is larger than 1024 bytes
>>
>> Dropping the NR_SKB_COMPLETED to 16 is appropriate that won't
>> have much impact on performance and functionality.
>>
>> Signed-off-by: Bixuan Cui
>> Signed-off-by: Stephen Hemminger
>> ---
>> v2: Dropping the NR_SKB_COMPLETED to 16.
> Applied.

Thanks.
[PATCH v2] net: neterion: vxge: reduce stack usage in VXGE_COMPLETE_VPATH_TX
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/net/ethernet/neterion/vxge/vxge-main.c:
In function 'VXGE_COMPLETE_VPATH_TX.isra.37':
drivers/net/ethernet/neterion/vxge/vxge-main.c:119:1:
warning: the frame size of 1056 bytes is larger than 1024 bytes

Dropping NR_SKB_COMPLETED to 16 is appropriate and won't have much
impact on performance or functionality.

Signed-off-by: Bixuan Cui
Signed-off-by: Stephen Hemminger
---
v2: Drop NR_SKB_COMPLETED to 16.

 drivers/net/ethernet/neterion/vxge/vxge-main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index b0faa737b817..f905d0fe7d54 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -98,7 +98,7 @@ static inline void VXGE_COMPLETE_VPATH_TX(struct vxge_fifo *fifo)
 {
 	struct sk_buff **skb_ptr = NULL;
 	struct sk_buff **temp;
-#define NR_SKB_COMPLETED 128
+#define NR_SKB_COMPLETED 16
 	struct sk_buff *completed[NR_SKB_COMPLETED];
 	int more;
 
-- 
2.17.1
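The arithmetic behind this change can be sketched as follows. The sketch assumes a 64-bit build with 8-byte pointers, where the completed[] array of 128 pointers takes 1024 bytes on its own, and 16 pointers take 128 bytes. The helpers completed_array_bytes() and passes_needed() are hypothetical and exist only to make the numbers checkable; the second one shows the cost side, namely that a smaller batch needs more passes only when many transmits complete at once.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_buff;	/* opaque stand-in: only pointers to it are stored */

/* Stack bytes used by an on-stack array of nr sk_buff pointers. */
static size_t completed_array_bytes(size_t nr)
{
	assert(nr <= SIZE_MAX / sizeof(struct sk_buff *));
	return nr * sizeof(struct sk_buff *);
}

/* Passes needed to drain `pending` completions, `batch` at a time. */
static unsigned int passes_needed(unsigned int pending, unsigned int batch)
{
	assert(batch > 0);
	return pending / batch + (pending % batch != 0);
}
```

With these helpers, a burst of 128 completions costs 8 passes at a batch size of 16, while the common case of a handful of completions still finishes in a single pass.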
Re: [PATCH] net: neterion: vxge: reduce stack usage in VXGE_COMPLETE_VPATH_TX
On 2020/7/20 1:05, Stephen Hemminger wrote:
> On Thu, 16 Jul 2020 17:32:47 +
> Bixuan Cui wrote:
>
>> Fix the warning: [-Werror=-Wframe-larger-than=]
>>
>> drivers/net/ethernet/neterion/vxge/vxge-main.c:
>> In function'VXGE_COMPLETE_VPATH_TX.isra.37':
>> drivers/net/ethernet/neterion/vxge/vxge-main.c:119:1:
>> warning: the frame size of 1056 bytes is larger than 1024 bytes
>>
>> Signed-off-by: Bixuan Cui
> Dropping the NR_SKB_COMPLETED to 16 won't have much impact
> on performance, and shrink the size.
>
> Doing 16 skb's at a time instead of 128 probably costs
> less than one allocation. Especially since it is unlikely
> that the device completed that many transmits at once.
>

I will send the v2 patch based on your suggestions.

thanks
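The arithmetic behind this suggestion can be checked with a small userspace sketch. On a 64-bit build, where a pointer is 8 bytes, an on-stack array of 128 sk_buff pointers takes 1024 bytes by itself, which is consistent with the reported 1056-byte frame; with 16 entries it takes 128 bytes. The helper name below is hypothetical and struct sk_buff is only forward-declared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

struct sk_buff;	/* opaque here; only pointers to it are stored */

/* Bytes of stack taken by an on-stack array of n sk_buff pointers. */
static size_t completed_array_bytes(size_t n)
{
	/* Guard against overflow in the multiplication below. */
	assert(n <= SIZE_MAX / sizeof(struct sk_buff *));
	return n * sizeof(struct sk_buff *);
}
```

Calling completed_array_bytes(128) and completed_array_bytes(16) shows the eightfold reduction without needing a heap allocation in the transmit completion path.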
Re: [PATCH -next v2] usb: usbtest: reduce stack usage in test_queue
On 2020/7/16 22:26, Greg KH wrote:
>> Reported-by: kbuild test robot
>> Signed-off-by: Bixuan Cui
>> ---
>> drivers/usb/misc/usbtest.c | 10 +-
>> 1 file changed, 9 insertions(+), 1 deletion(-)
> What changed from v1? Always put that below the --- line.
>
> Please fix up and resend a v2.

Thank you, it's my mistake. I will resend a v2.
[PATCH -next v2] usb: usbtest: reduce stack usage in test_queue
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/usb/misc/usbtest.c: In function 'test_queue':
drivers/usb/misc/usbtest.c:2148:1:
warning: the frame size of 1232 bytes is larger than 1024 bytes

Reported-by: kbuild test robot
Acked-by: Alan Stern
Signed-off-by: Bixuan Cui
---
v2: Change MAX_SGLEN to param->sglen.

 drivers/usb/misc/usbtest.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/misc/usbtest.c b/drivers/usb/misc/usbtest.c
index 8b220d56647b..150090ee4ec1 100644
--- a/drivers/usb/misc/usbtest.c
+++ b/drivers/usb/misc/usbtest.c
@@ -2043,7 +2043,7 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	unsigned i;
 	unsigned long packets = 0;
 	int status = 0;
-	struct urb *urbs[MAX_SGLEN];
+	struct urb **urbs;

 	if (!param->sglen || param->iterations > UINT_MAX / param->sglen)
 		return -EINVAL;
@@ -2051,6 +2051,10 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	if (param->sglen > MAX_SGLEN)
 		return -EINVAL;

+	urbs = kcalloc(param->sglen, sizeof(*urbs), GFP_KERNEL);
+	if (!urbs)
+		return -ENOMEM;
+
 	memset(&context, 0, sizeof(context));
 	context.count = param->iterations * param->sglen;
 	context.dev = dev;
@@ -2137,6 +2141,8 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	else if (context.errors >
 			(context.is_iso ? context.packet_count / 10 : 0))
 		status = -EIO;
+
+	kfree(urbs);
 	return status;

 fail:
@@ -2144,6 +2150,8 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 		if (urbs[i])
 			simple_free_urb(urbs[i]);
 	}
+
+	kfree(urbs);
 	return status;
 }

--
2.17.1
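The pattern in this patch (replace a fixed-size on-stack array with a heap array sized by the request, and free it on every exit path) can be sketched in userspace as follows. This is only an illustration, not the usbtest code: calloc() and free() stand in for kcalloc() and kfree(), the function name and the fail_midway parameter are hypothetical, and MAX_SGLEN is assumed to be 128. Unlike the patch, which calls kfree() separately on the success path and on the fail path, the sketch funnels both through one label.

```c
#include <errno.h>
#include <stdlib.h>

#define MAX_SGLEN 128	/* assumed limit, for illustration only */

/*
 * Userspace stand-in for the allocation pattern: the array is sized by
 * the caller's sglen instead of always reserving MAX_SGLEN slots.
 */
static int run_queue_test(unsigned int sglen, int fail_midway)
{
	void **urbs;
	int status = 0;

	if (!sglen || sglen > MAX_SGLEN)
		return -EINVAL;

	urbs = calloc(sglen, sizeof(*urbs));
	if (!urbs)
		return -ENOMEM;

	if (fail_midway) {
		status = -EIO;
		goto out;
	}

	/* ... the requests would be submitted and waited for here ... */

out:
	free(urbs);	/* one free covers both the success and error paths */
	return status;
}
```

Validating sglen before the allocation keeps the early error returns free of any cleanup.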
[PATCH -next v2] media: tuners: reduce stack usage in mxl5005s_reconfigure
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/media/tuners/mxl5005s.c: In function 'mxl5005s_reconfigure':
drivers/media/tuners/mxl5005s.c:3953:1:
warning: the frame size of 1152 bytes is larger than 1024 bytes

Signed-off-by: Bixuan Cui
---
 drivers/media/tuners/mxl5005s.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/media/tuners/mxl5005s.c b/drivers/media/tuners/mxl5005s.c
index 1c07e2225fb3..f6e82a8e7d37 100644
--- a/drivers/media/tuners/mxl5005s.c
+++ b/drivers/media/tuners/mxl5005s.c
@@ -3926,15 +3926,26 @@ static int mxl5005s_reconfigure(struct dvb_frontend *fe, u32 mod_type,
 	u32 bandwidth)
 {
 	struct mxl5005s_state *state = fe->tuner_priv;
-
-	u8 AddrTable[MXL5005S_REG_WRITING_TABLE_LEN_MAX];
-	u8 ByteTable[MXL5005S_REG_WRITING_TABLE_LEN_MAX];
+	u8 *AddrTable;
+	u8 *ByteTable;
 	int TableLen;

 	dprintk(1, "%s(type=%d, bw=%d)\n", __func__, mod_type, bandwidth);

 	mxl5005s_reset(fe);

+	AddrTable = kcalloc(MXL5005S_REG_WRITING_TABLE_LEN_MAX, sizeof(u8),
+			    GFP_KERNEL);
+	if (!AddrTable)
+		return -ENOMEM;
+
+	ByteTable = kcalloc(MXL5005S_REG_WRITING_TABLE_LEN_MAX, sizeof(u8),
+			    GFP_KERNEL);
+	if (!ByteTable) {
+		kfree(AddrTable);
+		return -ENOMEM;
+	}
+
 	/* Tuner initialization stage 0 */
 	MXL_GetMasterControl(ByteTable, MC_SYNTH_RESET);
 	AddrTable[0] = MASTER_CONTROL_ADDR;
@@ -3949,6 +3960,9 @@ static int mxl5005s_reconfigure(struct dvb_frontend *fe, u32 mod_type,

 	mxl5005s_writeregs(fe, AddrTable, ByteTable, TableLen);

+	kfree(AddrTable);
+	kfree(ByteTable);
+
 	return 0;
 }

--
2.17.1
[PATCH -next v2] usb: usbtest: reduce stack usage in test_queue
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/usb/misc/usbtest.c: In function 'test_queue':
drivers/usb/misc/usbtest.c:2148:1:
warning: the frame size of 1232 bytes is larger than 1024 bytes

Reported-by: kbuild test robot
Signed-off-by: Bixuan Cui
---
 drivers/usb/misc/usbtest.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/misc/usbtest.c b/drivers/usb/misc/usbtest.c
index 8b220d56647b..a9b40953d6bc 100644
--- a/drivers/usb/misc/usbtest.c
+++ b/drivers/usb/misc/usbtest.c
@@ -2043,7 +2043,7 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	unsigned i;
 	unsigned long packets = 0;
 	int status = 0;
-	struct urb *urbs[MAX_SGLEN];
+	struct urb **urbs;

 	if (!param->sglen || param->iterations > UINT_MAX / param->sglen)
 		return -EINVAL;
@@ -2051,6 +2051,10 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	if (param->sglen > MAX_SGLEN)
 		return -EINVAL;

+	urbs = kcalloc(MAX_SGLEN, sizeof(*urbs), GFP_KERNEL);
+	if (!urbs)
+		return -ENOMEM;
+
 	memset(&context, 0, sizeof(context));
 	context.count = param->iterations * param->sglen;
 	context.dev = dev;
@@ -2137,6 +2141,8 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	else if (context.errors >
 			(context.is_iso ? context.packet_count / 10 : 0))
 		status = -EIO;
+
+	kfree(urbs);
 	return status;

 fail:
@@ -2144,6 +2150,8 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 		if (urbs[i])
 			simple_free_urb(urbs[i]);
 	}
+
+	kfree(urbs);
 	return status;
 }

--
2.17.1
Re: [PATCH] net: neterion: vxge: reduce stack usage in VXGE_COMPLETE_VPATH_TX
On 2020/7/16 17:46, Joe Perches wrote:
> I doubt this is a good idea.
> Check the callers interrupt status.

Yes, it's not a good idea to allocate memory in an interrupt handler. I will think more carefully when fixing the warning. :)

Thanks.
[PATCH] net: neterion: vxge: reduce stack usage in VXGE_COMPLETE_VPATH_TX
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/net/ethernet/neterion/vxge/vxge-main.c:
In function'VXGE_COMPLETE_VPATH_TX.isra.37':
drivers/net/ethernet/neterion/vxge/vxge-main.c:119:1:
warning: the frame size of 1056 bytes is larger than 1024 bytes

Signed-off-by: Bixuan Cui
---
 drivers/net/ethernet/neterion/vxge/vxge-main.c | 8 ++++++++
 1 file changed, 8 insertions(+)

diff --git a/drivers/net/ethernet/neterion/vxge/vxge-main.c b/drivers/net/ethernet/neterion/vxge/vxge-main.c
index b0faa737b817..97ddfc9debd4 100644
--- a/drivers/net/ethernet/neterion/vxge/vxge-main.c
+++ b/drivers/net/ethernet/neterion/vxge/vxge-main.c
@@ -100,8 +100,14 @@ static inline void VXGE_COMPLETE_VPATH_TX(struct vxge_fifo *fifo)
 	struct sk_buff **temp;
 #define NR_SKB_COMPLETED 128
 	struct sk_buff *completed[NR_SKB_COMPLETED];
+	struct sk_buff **completed;
 	int more;

+	completed = kcalloc(NR_SKB_COMPLETED, sizeof(*completed),
+			    GFP_KERNEL);
+	if (!completed)
+		return;
+
 	do {
 		more = 0;
 		skb_ptr = completed;
@@ -116,6 +122,8 @@ static inline void VXGE_COMPLETE_VPATH_TX(struct vxge_fifo *fifo)
 		for (temp = completed; temp != skb_ptr; temp++)
 			dev_consume_skb_irq(*temp);
 	} while (more);
+
+	free(completed);
 }

 static inline void VXGE_COMPLETE_ALL_TX(struct vxgedev *vdev)
--
2.17.1
[PATCH] media: tuners: reduce stack usage in mxl5005s_reconfigure
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/media/tuners/mxl5005s.c: In function 'mxl5005s_reconfigure':
drivers/media/tuners/mxl5005s.c:3953:1:
warning: the frame size of 1152 bytes is larger than 1024 bytes

Signed-off-by: Bixuan Cui
---
 drivers/media/tuners/mxl5005s.c | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/drivers/media/tuners/mxl5005s.c b/drivers/media/tuners/mxl5005s.c
index 1c07e2225fb3..f6e82a8e7d37 100644
--- a/drivers/media/tuners/mxl5005s.c
+++ b/drivers/media/tuners/mxl5005s.c
@@ -3926,15 +3926,26 @@ static int mxl5005s_reconfigure(struct dvb_frontend *fe, u32 mod_type,
 	u32 bandwidth)
 {
 	struct mxl5005s_state *state = fe->tuner_priv;
-
-	u8 AddrTable[MXL5005S_REG_WRITING_TABLE_LEN_MAX];
-	u8 ByteTable[MXL5005S_REG_WRITING_TABLE_LEN_MAX];
+	u8 *AddrTable;
+	u8 *ByteTable;
 	int TableLen;

 	dprintk(1, "%s(type=%d, bw=%d)\n", __func__, mod_type, bandwidth);

 	mxl5005s_reset(fe);

+	AddrTable = kcalloc(MXL5005S_REG_WRITING_TABLE_LEN_MAX, sizeof(u8),
+			    GFP_KERNEL);
+	if (!AddrTable)
+		return -ENOMEM;
+
+	ByteTable = kcalloc(MXL5005S_REG_WRITING_TABLE_LEN_MAX, sizeof(u8),
+			    GFP_KERNEL);
+	if (!ByteTable) {
+		kfree(AddrTable);
+		return -ENOMEM;
+	}
+
 	/* Tuner initialization stage 0 */
 	MXL_GetMasterControl(ByteTable, MC_SYNTH_RESET);
 	AddrTable[0] = MASTER_CONTROL_ADDR;
@@ -3949,6 +3960,9 @@ static int mxl5005s_reconfigure(struct dvb_frontend *fe, u32 mod_type,

 	mxl5005s_writeregs(fe, AddrTable, ByteTable, TableLen);

+	kfree(AddrTable);
+	kfree(ByteTable);
+
 	return 0;
 }

--
2.17.1
[PATCH] usb: usbtest: reduce stack usage in test_queue
Fix the warning: [-Werror=-Wframe-larger-than=]

drivers/usb/misc/usbtest.c: In function 'test_queue':
drivers/usb/misc/usbtest.c:2148:1:
warning: the frame size of 1232 bytes is larger than 1024 bytes

Reported-by: kbuild test robot
Signed-off-by: Bixuan Cui
---
 drivers/usb/misc/usbtest.c | 10 +++++++++-
 1 file changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/usb/misc/usbtest.c b/drivers/usb/misc/usbtest.c
index 8b220d56647b..a9b40953d6bc 100644
--- a/drivers/usb/misc/usbtest.c
+++ b/drivers/usb/misc/usbtest.c
@@ -2043,7 +2043,7 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	unsigned i;
 	unsigned long packets = 0;
 	int status = 0;
-	struct urb *urbs[MAX_SGLEN];
+	struct urb **urbs;

 	if (!param->sglen || param->iterations > UINT_MAX / param->sglen)
 		return -EINVAL;
@@ -2051,6 +2051,10 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	if (param->sglen > MAX_SGLEN)
 		return -EINVAL;

+	urbs = kcalloc(MAX_SGLEN, sizeof(*urbs), GFP_KERNEL);
+	if (!urbs)
+		return -ENOMEM;
+
 	memset(&context, 0, sizeof(context));
 	context.count = param->iterations * param->sglen;
 	context.dev = dev;
@@ -2137,6 +2141,8 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 	else if (context.errors >
 			(context.is_iso ? context.packet_count / 10 : 0))
 		status = -EIO;
+
+	kfree(urbs);
 	return status;

 fail:
@@ -2144,6 +2150,8 @@ test_queue(struct usbtest_dev *dev, struct usbtest_param_32 *param,
 		if (urbs[i])
 			simple_free_urb(urbs[i]);
 	}
+
+	kfree(urbs);
 	return status;
 }

--
2.17.1
Re: [PATCH v3] mm/percpu: fix 'defined but not used' warning
On 2020/7/15 9:50, Stephen Rothwell wrote:
> I have added this patch to linux-next today.

thanks.
Re: [PATCH v2] mm/percpu: fix 'defined but not used' warning
On 2020/7/15 2:41, Roman Gushchin wrote:
>> Fixes: 26c99879ef01 ("mm: memcg/percpu: account percpu memory to memory
>> cgroups")
> The "fixes" tag is not valid: the patch is in the mm queue, so it doesn't
> have a stable hash. Usually Andrew squashes such fixes into the original patch
> on merging.

Thanks for your advice, I will delete it.
[PATCH v3] mm/percpu: fix 'defined but not used' warning
GCC reports the following warning when CONFIG_MEMCG_KMEM is not set:

mm/percpu-internal.h:145:29: warning: 'pcpu_chunk_type' defined
but not used [-Wunused-function]
 static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 ^~~

Add 'inline' to pcpu_chunk_type(), pcpu_is_memcg_chunk() and
pcpu_chunk_list() to clear the warning.

Acked-by: Roman Gushchin
Suggested-by: Stephen Rothwell
Signed-off-by: Bixuan Cui
---
 mm/percpu-internal.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 7983455842ff..18b768ac7dca 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -129,31 +129,31 @@ static inline int pcpu_chunk_map_bits(struct pcpu_chunk *chunk)
 }

 #ifdef CONFIG_MEMCG_KMEM
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	if (chunk->obj_cgroups)
 		return PCPU_CHUNK_MEMCG;
 	return PCPU_CHUNK_ROOT;
 }

-static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
+static inline bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
 {
 	return chunk_type == PCPU_CHUNK_MEMCG;
 }

 #else
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	return PCPU_CHUNK_ROOT;
 }

-static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
+static inline bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
 {
 	return false;
 }
 #endif

-static struct list_head *pcpu_chunk_list(enum pcpu_chunk_type chunk_type)
+static inline struct list_head *pcpu_chunk_list(enum pcpu_chunk_type chunk_type)
 {
 	return &pcpu_chunk_lists[pcpu_nr_slots *
 				 pcpu_is_memcg_chunk(chunk_type)];
--
2.17.1
[PATCH v2] mm/percpu: fix 'defined but not used' warning
GCC reports the following warning when CONFIG_MEMCG_KMEM is not set:

mm/percpu-internal.h:145:29: warning: 'pcpu_chunk_type' defined
but not used [-Wunused-function]
 static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 ^~~

Add 'inline' to pcpu_chunk_type(), pcpu_is_memcg_chunk() and
pcpu_chunk_list() to clear the warning.

Fixes: 26c99879ef01 ("mm: memcg/percpu: account percpu memory to memory cgroups")
Signed-off-by: Stephen Rothwell
Signed-off-by: Bixuan Cui
---
 mm/percpu-internal.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 7983455842ff..18b768ac7dca 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -129,31 +129,31 @@ static inline int pcpu_chunk_map_bits(struct pcpu_chunk *chunk)
 }

 #ifdef CONFIG_MEMCG_KMEM
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	if (chunk->obj_cgroups)
 		return PCPU_CHUNK_MEMCG;
 	return PCPU_CHUNK_ROOT;
 }

-static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
+static inline bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
 {
 	return chunk_type == PCPU_CHUNK_MEMCG;
 }

 #else
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	return PCPU_CHUNK_ROOT;
 }

-static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
+static inline bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
 {
 	return false;
 }
 #endif

-static struct list_head *pcpu_chunk_list(enum pcpu_chunk_type chunk_type)
+static inline struct list_head *pcpu_chunk_list(enum pcpu_chunk_type chunk_type)
 {
 	return &pcpu_chunk_lists[pcpu_nr_slots *
 				 pcpu_is_memcg_chunk(chunk_type)];
--
2.17.1
Re: [PATCH v2] mm/percpu: fix 'defined but not used' warning
On 2020/7/14 21:34, Bixuan Cui wrote:
> mm/percpu-internal.h:145:29: warning: ‘pcpu_chunk_type’ defined
> but not used [-Wunused-function]

Please ignore this email, I will resend the v2 patch.
[PATCH v2] mm/percpu: fix 'defined but not used' warning
GCC reports the following warning when CONFIG_MEMCG_KMEM is not set:

mm/percpu-internal.h:145:29: warning: ‘pcpu_chunk_type’ defined
but not used [-Wunused-function]
 static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 ^~~

Add 'inline' to pcpu_chunk_type(), pcpu_is_memcg_chunk() and
pcpu_chunk_list() to clear the warning.

Fixes: 26c99879ef01 ("mm: memcg/percpu: account percpu memory to memory cgroups")
Signed-off-by: Stephen Rothwell
Signed-off-by: Bixuan Cui
---
 mm/percpu-internal.h | 10 +++++-----
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 7983455842ff..18b768ac7dca 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -129,31 +129,31 @@ static inline int pcpu_chunk_map_bits(struct pcpu_chunk *chunk)
 }

 #ifdef CONFIG_MEMCG_KMEM
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	if (chunk->obj_cgroups)
 		return PCPU_CHUNK_MEMCG;
 	return PCPU_CHUNK_ROOT;
 }

-static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
+static inline bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
 {
 	return chunk_type == PCPU_CHUNK_MEMCG;
 }

 #else
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static inline enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	return PCPU_CHUNK_ROOT;
 }

-static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
+static inline bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
 {
 	return false;
 }
 #endif

-static struct list_head *pcpu_chunk_list(enum pcpu_chunk_type chunk_type)
+static inline struct list_head *pcpu_chunk_list(enum pcpu_chunk_type chunk_type)
 {
 	return &pcpu_chunk_lists[pcpu_nr_slots *
 				 pcpu_is_memcg_chunk(chunk_type)];
--
2.17.1
Re: [PATCH] mm/percpu: mark pcpu_chunk_type() as __maybe_unused
On 2020/7/14 20:53, Stephen Rothwell wrote:
> Hi Bixuan,
>
> On Tue, 14 Jul 2020 13:41:01 +0000 Bixuan Cui wrote:
>> Gcc report the following warning without CONFIG_MEMCG_KMEM:
>>
>> mm/percpu-internal.h:145:29: warning: ‘pcpu_chunk_type’ defined
>> but not used [-Wunused-function]
>> static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
>> ^~~
>>
>> Mark pcpu_chunk_type() as __maybe_unused to make it clear.
> Given that it is in a header file, it should probably just be "static
> inline" (which will also suppress the warning). As should
> pcpu_is_memcg_chunk() and pcpu_chunk_list(). Also, without them being
> inline, there will be a new copy for each file that
> mm/percpu-internal.h is included in.
>
> And that should be considered a fix for "mm: memcg/percpu: account
> percpu memory to memory cgroups".

Thanks, I will fix it.
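The point about header files can be shown with a small standalone sketch. GCC's -Wunused-function warns about a non-inline static function that is defined but never called, so a plain static helper in a header triggers the warning in every file that includes the header without using it; marking it static inline avoids that. The enum and function names below are hypothetical stand-ins, not the percpu definitions.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the chunk type enum in the header. */
enum chunk_type { CHUNK_ROOT, CHUNK_MEMCG };

/*
 * If this were declared as plain "static bool ...", each file that
 * includes the header without calling it would get a
 * -Wunused-function warning. With "static inline" GCC does not warn
 * when the helper goes unused.
 */
static inline bool is_memcg_chunk(enum chunk_type type)
{
	return type == CHUNK_MEMCG;
}
```

Compared with __maybe_unused, this keeps the header free of attribute annotations and matches how small header helpers are usually written in the kernel.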
[PATCH] mm/percpu: mark pcpu_chunk_type() as __maybe_unused
GCC reports the following warning when CONFIG_MEMCG_KMEM is not set:

mm/percpu-internal.h:145:29: warning: ‘pcpu_chunk_type’ defined
but not used [-Wunused-function]
 static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
 ^~~

Mark pcpu_chunk_type() as __maybe_unused to silence the warning.

Signed-off-by: Bixuan Cui
---
 mm/percpu-internal.h | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/mm/percpu-internal.h b/mm/percpu-internal.h
index 7983455842ff..8a8a230bd957 100644
--- a/mm/percpu-internal.h
+++ b/mm/percpu-internal.h
@@ -129,7 +129,7 @@ static inline int pcpu_chunk_map_bits(struct pcpu_chunk *chunk)
 }

 #ifdef CONFIG_MEMCG_KMEM
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static enum pcpu_chunk_type __maybe_unused pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	if (chunk->obj_cgroups)
 		return PCPU_CHUNK_MEMCG;
@@ -142,7 +142,7 @@ static bool pcpu_is_memcg_chunk(enum pcpu_chunk_type chunk_type)
 }

 #else
-static enum pcpu_chunk_type pcpu_chunk_type(struct pcpu_chunk *chunk)
+static enum pcpu_chunk_type __maybe_unused pcpu_chunk_type(struct pcpu_chunk *chunk)
 {
 	return PCPU_CHUNK_ROOT;
 }
--
2.17.1
Re: [PATCH] kernel/kprobes: add check to avoid memory leaks
On 2017/10/30 12:42, Masami Hiramatsu wrote:
> I don't like this kind of check, since this is obviously caller's bug.
> Why doesn't each caller check this?

Thank you for your answer.

Thanks,
Bixuan Cui
Re: [PATCH] kernel/kprobes: add check to avoid memory leaks
On 2017/10/25 20:29, Bixuan Cui wrote:

I tested again with this patch applied:

insmod testRegKretprobes_004.ko
[ 163.853281] register_kretprobe failed, returned -22
insmod: can't insert 'testRegKretprobes_004.ko': Operation not permitted

Thanks,
Bixuan Cui

> The register_kretprobe(struct kretprobe *rp) creates and initializes
> a hash list for rp->free_instances when register kretprobe every time.
> Then malloc memory for it.
>
> The test case:
> static struct kretprobe rp;
> struct kretprobe *rps[2]={&rp, &rp};
> static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> 	printk(KERN_DEBUG "ret_handler\n");
> 	return 0;
> }
> static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
> {
> 	printk(KERN_DEBUG "entry_handler\n");
> 	return 0;
> }
> static int __init kretprobe_init(void)
> {
> 	int ret;
> 	rp.kp.addr = (kprobe_opcode_t *)kallsyms_lookup_name("do_fork");
> 	rp.handler=ret_handler;
> 	rp.entry_handler=entry_handler;
> 	rp.maxactive = 3;
>
> 	ret = register_kretprobes(rps,2);
>
> Result:
> unreferenced object 0x8010b12ad980 (size 64):
>   comm "insmod", pid 17352, jiffies 4298977824 (age 63065.756s)
>   hex dump (first 32 bytes):
>     00 00 00 00 00 00 00 00 d8 84 12 fc ff 7f ff ff
>     74 65 73 74 52 65 67 4b 72 65 74 70 72 6f 62 65 testRegKretprobe
>   backtrace:
>     [] create_object+0x1e0/0x3f0
>     [] kmemleak_alloc+0x6c/0xf0
>     [] __kmalloc+0x23c/0x2e0
>     [] register_kretprobe+0x12c/0x350
>
> When call register_kretprobes(struct kretprobe **rps, int num) with the
> same rps(num>=2).
> The first time,call INIT_HLIST_HEAD() and kmalloc() to malloc memory for the
> hash list,then save into rp->free_instances.
> The second time,call INIT_HLIST_HEAD() and kmalloc() then create a new
> hash list into rp->free_instances and lost the first rp->free_instances.
> So add check to avoid it.
>
> Reported-and-tested-by: kangwen <kangw...@huawei.com>
> Signed-off-by: Bixuan Cui <cuibix...@huawei.com>
> ---
>  kernel/kprobes.c | 8 +++++++-
>  1 file changed, 7 insertions(+), 1 deletion(-)
>
> diff --git a/kernel/kprobes.c b/kernel/kprobes.c
> index 6301dae..f19f191 100644
> --- a/kernel/kprobes.c
> +++ b/kernel/kprobes.c
> @@ -1890,10 +1890,16 @@ EXPORT_SYMBOL_GPL(register_kretprobe);
>
>  int register_kretprobes(struct kretprobe **rps, int num)
>  {
> -	int ret = 0, i;
> +	int ret = 0, i, j;
>
>  	if (num <= 0)
>  		return -EINVAL;
> +
> +	for (i = 0; i < num-1; i++)
> +		for (j = i+1; j < num; j++)
> +			if (rps[i] == rps[j])
> +				return -EINVAL;
> +
>  	for (i = 0; i < num; i++) {
>  		ret = register_kretprobe(rps[i]);
>  		if (ret < 0) {
> --
> 2.6.2
[PATCH] kernel/kprobes: add check to avoid memory leaks
register_kretprobe(struct kretprobe *rp) creates and initializes a hash
list for rp->free_instances every time a kretprobe is registered, and
then allocates memory for it.

The test case:

static struct kretprobe rp;
struct kretprobe *rps[2]={&rp, &rp};
static int ret_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	printk(KERN_DEBUG "ret_handler\n");
	return 0;
}
static int entry_handler(struct kretprobe_instance *ri, struct pt_regs *regs)
{
	printk(KERN_DEBUG "entry_handler\n");
	return 0;
}
static int __init kretprobe_init(void)
{
	int ret;
	rp.kp.addr = (kprobe_opcode_t *)kallsyms_lookup_name("do_fork");
	rp.handler=ret_handler;
	rp.entry_handler=entry_handler;
	rp.maxactive = 3;

	ret = register_kretprobes(rps,2);

Result:

unreferenced object 0x8010b12ad980 (size 64):
  comm "insmod", pid 17352, jiffies 4298977824 (age 63065.756s)
  hex dump (first 32 bytes):
    00 00 00 00 00 00 00 00 d8 84 12 fc ff 7f ff ff
    74 65 73 74 52 65 67 4b 72 65 74 70 72 6f 62 65 testRegKretprobe
  backtrace:
    [] create_object+0x1e0/0x3f0
    [] kmemleak_alloc+0x6c/0xf0
    [] __kmalloc+0x23c/0x2e0
    [] register_kretprobe+0x12c/0x350

When register_kretprobes(struct kretprobe **rps, int num) is called with
the same kretprobe appearing more than once in rps (num >= 2), the first
registration calls INIT_HLIST_HEAD() and kmalloc() to allocate memory
for the hash list and saves it in rp->free_instances. The second
registration calls INIT_HLIST_HEAD() and kmalloc() again, which creates
a new hash list in rp->free_instances and loses the first one.
Add a check to avoid this.

Reported-and-tested-by: kangwen <kangw...@huawei.com>
Signed-off-by: Bixuan Cui <cuibix...@huawei.com>
---
 kernel/kprobes.c | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/kernel/kprobes.c b/kernel/kprobes.c
index 6301dae..f19f191 100644
--- a/kernel/kprobes.c
+++ b/kernel/kprobes.c
@@ -1890,10 +1890,16 @@ EXPORT_SYMBOL_GPL(register_kretprobe);

 int register_kretprobes(struct kretprobe **rps, int num)
 {
-	int ret = 0, i;
+	int ret = 0, i, j;

 	if (num <= 0)
 		return -EINVAL;
+
+	for (i = 0; i < num-1; i++)
+		for (j = i+1; j < num; j++)
+			if (rps[i] == rps[j])
+				return -EINVAL;
+
 	for (i = 0; i < num; i++) {
 		ret = register_kretprobe(rps[i]);
 		if (ret < 0) {
--
2.6.2
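The check added by this patch is a pairwise comparison of the pointers in the array. A standalone userspace sketch of the same loop, with a hypothetical helper name and generic void pointers in place of struct kretprobe pointers, might look like this:

```c
#include <stdbool.h>
#include <stddef.h>

/* True if any two entries of ptrs[0..num-1] are the same pointer. */
static bool has_duplicate_ptr(void *const *ptrs, int num)
{
	int i, j;

	/* Compare every pair once; O(num^2), fine for small arrays. */
	for (i = 0; i < num - 1; i++)
		for (j = i + 1; j < num; j++)
			if (ptrs[i] == ptrs[j])
				return true;
	return false;
}
```

The quadratic cost is acceptable only because such arrays are expected to be small. As the review on this thread points out, the same test could equally be done by each caller before registering.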
Re: [BUG] Kernel fails to start with CONFIG_ARM64_LSE_ATOMICS and KCOV_INSTRUMENT_ALL enabled
> This is a known issue [1,2] that's being worked on at the moment.
>
> The problem is that the out-of-line LL/SC atomics are built with a
> special ABI, but KCOV inserts calls to other code that does not follow
> this ABI, resulting in register corruption.
>
> When I saw this happen, this resulted in cmpxchg() spuriously failing
> somewhere in the page init code, which looks similar to what you're
> seeing.
>
> The patch at [1] is sufficient to work around this for the time being.

Thank you for your explanation :-D .

Thanks,
Bixuan Cui

>
> Thanks,
> Mark.
>
> [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-September/533105.html
> [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2017-October/537018.html
>
>>
>> After I try kernel v4.13 and it's the same result.
>> I trace the code and find it stop at the end of '__init_single_page(page,
>> pfn, zone, nid)' in memmap_init_zone() of mm/page_alloc.c :
>>
>> static void __meminit __init_single_page(struct page *page, unsigned long pfn,
>> 				unsigned long zone, int nid)
>> {
>> 	set_page_links(page, zone, nid, pfn);
>> 	init_page_count(page);
>> 	page_mapcount_reset(page);
>> 	page_cpupid_reset_last(page);
>>
>> 	INIT_LIST_HEAD(&page->lru);
>>
>> #ifdef WANT_PAGE_VIRTUAL
>> 	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
>> 	if (!is_highmem_idx(zone))
>> 		set_page_address(page, __va(pfn << PAGE_SHIFT));
>> #endif
>> 	// printk("stop here\n");
>> }
>>
>> Anyone can give me advice?
>>
>> Thanks,
>> Bixuan Cui
【BUG】The kernel start fail when enable CONFIG_ARM64_LSE_ATOMICS and KCOV_INSTRUMENT_ALL
Hi,

I tried to boot the kernel (v4.14.0-rc4) under qemu with CONFIG_ARM64_LSE_ATOMICS and KCOV_INSTRUMENT_ALL enabled at the same time (based on arch/arm64/configs/defconfig). It hangs:

qemu-system-aarch64 -kernel Image -m 2048 -smp 8 -initrd qemu-le.rootfs -cpu cortex-a57 -nographic -machine virt,kernel_irqchip=on -append 'console=ttyAMA0 root=/dev/ram rdinit=/sbin/init nohz_full=1 earlycon=pl011,0x900' -rtc base=localtime -device virtio-net-device,netdev=net0 -netdev type=tap,id=net0,script=no,downscript=no,ifname=tap2

[0.00] Booting Linux on physical CPU 0x0
[0.00] Linux version 4.14.0-rc4 #24 SMP PREEMPT Mon Oct 16 11:02:11 CST 2017
[0.00] Boot CPU: AArch64 Processor [411fd070]
[0.00] Machine model: linux,dummy-virt
[0.00] earlycon: pl11 at MMIO 0x0900 (options '')
[0.00] bootconsole [pl11] enabled
[0.00] efi: Getting EFI parameters from FDT:
[0.00] efi: UEFI not found.
[0.00] cma: Reserved 64 MiB at 0xbc00

I also tried kernel v4.13 and got the same result.
I traced the code and found that it stops at the end of '__init_single_page(page, pfn, zone, nid)', called from memmap_init_zone() in mm/page_alloc.c:

static void __meminit __init_single_page(struct page *page, unsigned long pfn,
				unsigned long zone, int nid)
{
	set_page_links(page, zone, nid, pfn);
	init_page_count(page);
	page_mapcount_reset(page);
	page_cpupid_reset_last(page);

	INIT_LIST_HEAD(&page->lru);

#ifdef WANT_PAGE_VIRTUAL
	/* The shift won't overflow because ZONE_NORMAL is below 4G. */
	if (!is_highmem_idx(zone))
		set_page_address(page, __va(pfn << PAGE_SHIFT));
#endif
	// printk("stop here\n");
}

Can anyone give me some advice?

Thanks,
Bixuan Cui
kernel of next-20170602 call trace when run add_key02 in LTP
Hi,

I compiled the kernel (next-20170602) and ran LTP, and found:

/ # ./add_key02
tst_test.c:878: INFO: Timeout per run is 0h 05m 00s
[ 341.183219] BUG: unable to handle kernel NULL pointer dereference at (null)
[ 341.183850] IP: memset+0x10/0x20
[ 341.184550] *pdpt = 35441001 *pde =
[ 341.184550]
[ 341.184550] Oops: 0002 [#2] SMP
[ 341.184550] Modules linked in:
[ 341.184550] CPU: 0 PID: 124 Comm: add_key02 Tainted: G SD W 4.12.0-rc3-next-20170602 #3
[ 341.184550] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011
[ 341.184550] task: f5b9ca00 task.stack: f6514000
[ 341.184550] EIP: memset+0x10/0x20
[ 341.184550] EFLAGS: 0246 CPU: 0
[ 341.184550] EAX: EBX: ECX: 0001 EDX:
[ 341.184550] ESI: EDI: EBP: f6515f24 ESP: f6515f1c
[ 341.184550] DS: 007b ES: 007b FS: 00d8 GS: 00e0 SS: 0068
[ 341.184550] CR0: 80050033 CR2: CR3: 36404920 CR4: 06f0
[ 341.184550] DR0: DR1: DR2: DR3:
[ 341.184550] DR6: DR7:
[ 341.184550] Call Trace:
[ 341.184550] memzero_explicit+0xf/0x20
[ 341.184550] SyS_add_key+0x11f/0x1c0
[ 341.184550] ? change_pid+0x13/0x50
[ 341.184550] do_fast_syscall_32+0x8b/0x130
[ 341.184550] entry_SYSENTER_32+0x4e/0x7c
[ 341.184550] EIP: 0xb772ddc1
[ 341.184550] EFLAGS: 0246 CPU: 0
[ 341.184550] EAX: ffda EBX: 080de341 ECX: 080de346 EDX:
[ 341.184550] ESI: 0001 EDI: fffc EBP: 0808aa97 ESP: bfe3636c
[ 341.184550] DS: 007b ES: 007b FS: GS: 0033 SS: 007b
[ 341.184550] Code: 8a 0e 88 0f 8d b4 26 00 00 00 00 8b 45 f0 83 c4 04 5b 5e 5f 5d c3 90 8d 74 26 00 3e 8d 74 26 00 55 89 e5 57 89 c7 53 89 c3 89 d0 aa 89 d8 5b 5f 5d c3 90 90 90 90 90 90 90 90 3e 8d 74 26 00
[ 341.184550] EIP: memset+0x10/0x20 SS:ESP: 0068:f6515f1c
[ 341.184550] CR2:
[ 341.219144] ---[ end trace e3963c970d107f91 ]---
tst_test.c:928: INFO: If you are running on slow machine, try exporting LTP_TIMEOUT_MUL > 1
tst_test.c:929: BROK: Test killed! (timeout?)

I tried other tags: the kernel at next-20170427 is OK, but next-20170502 fails. Is this a bug?

Thanks,
Cui Bixuan
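The `Oops: 0002` line in the trace carries the x86 page-fault error code. A small sketch of how its low three bits can be read follows; the bit meanings are architectural, while `struct pf_info` and `decode_pf_error()` are names made up for this illustration and are not kernel interfaces.

```c
#include <stdbool.h>

/* Low bits of the x86 page-fault error code. */
#define PF_PROT  (1u << 0) /* 0: page not present, 1: protection violation */
#define PF_WRITE (1u << 1) /* 0: read access,      1: write access */
#define PF_USER  (1u << 2) /* 0: kernel-mode,      1: user-mode access */

/* Hypothetical container for the decoded bits. */
struct pf_info {
	bool protection_violation;
	bool write;
	bool user_mode;
};

/* Split an error code such as 0x0002 into its three low flag bits. */
struct pf_info decode_pf_error(unsigned int code)
{
	struct pf_info info;

	info.protection_violation = (code & PF_PROT) != 0;
	info.write = (code & PF_WRITE) != 0;
	info.user_mode = (code & PF_USER) != 0;
	return info;
}
```

For the value in this report, 0x0002, the sketch yields a kernel-mode write to a not-present page, which fits a memset() through a NULL pointer reached from memzero_explicit().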