Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-19 Thread Guilherme G. Piccoli
Hi Petr, thanks again for the thorough review! I will implement all your
suggestions and submit a V4; it all makes sense to me =)

Cheers,


Guilherme



___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v2] proc/vmcore: fix possible deadlock on concurrent mmap and read

2022-01-19 Thread David Hildenbrand
Lockdep noticed that there is a chance of a deadlock if we have
concurrent mmap, concurrent read, and concurrent addition/removal of a
callback.

As nicely explained by Boqun:

"
Lockdep warned about the above sequences because rw_semaphore is a fair
read-write lock, and the following can cause a deadlock:

TASK 1                       TASK 2                       TASK 3
======                       ======                       ======
down_write(mmap_lock);
                             down_read(vmcore_cb_rwsem)
                                                          down_write(vmcore_cb_rwsem); // blocked
down_read(vmcore_cb_rwsem);  // cannot get the lock because of the fairness
                             down_read(mmap_lock);        // blocked

IOW, a reader can block another reader if a writer is queued between
them and the lock is fair.
"

To fix this, convert to SRCU to make the deadlock impossible. We need
SRCU because our callbacks can sleep. With this change, I cannot trigger
any lockdep warnings.
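For reference, the rwsem-to-SRCU conversion follows the usual pattern; the sketch below is against the in-kernel SRCU API, with names inferred from this thread rather than copied from the final patch (a non-runnable outline, not the actual diff):

```c
/* Sketch only: kernel-side code, not buildable standalone. */
DEFINE_STATIC_SRCU(vmcore_cb_srcu);

/* Read side, e.g. in mmap_vmcore()/read_vmcore(): srcu_read_lock()
 * never blocks, so it cannot take part in the dependency chain that
 * lockdep reported.  SRCU (rather than plain RCU) is needed because
 * the vmcore callbacks may sleep. */
static void walk_vmcore_callbacks(void)
{
	int idx = srcu_read_lock(&vmcore_cb_srcu);
	/* ... iterate the callback list; callbacks may sleep ... */
	srcu_read_unlock(&vmcore_cb_srcu, idx);
}

/* Update side, when unregistering a callback: remove it from the list
 * (under a mutex), then wait for all in-flight readers to finish. */
static void vmcore_cb_unregister(void)
{
	/* list_del_rcu(&cb->next); */
	synchronize_srcu(&vmcore_cb_srcu);
}
```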

[6.386519] ==
[6.387203] WARNING: possible circular locking dependency detected
[6.387965] 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1 Not 
tainted
[6.388899] --
[6.389657] makedumpfile/542 is trying to acquire lock:
[6.390308] 832d2eb8 (vmcore_cb_rwsem){.+.+}-{3:3}, at: 
mmap_vmcore+0x340/0x580
[6.391290]
[6.391290] but task is already holding lock:
[6.391978] 8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: 
vm_mmap_pgoff+0x84/0x150
[6.392898]
[6.392898] which lock already depends on the new lock.
[6.392898]
[6.393866]
[6.393866] the existing dependency chain (in reverse order) is:
[6.394762]
[6.394762] -> #1 (&mm->mmap_lock#2){++++}-{3:3}:
[6.395530]lock_acquire+0xc3/0x1a0
[6.396047]__might_fault+0x4e/0x70
[6.396562]_copy_to_user+0x1f/0x90
[6.397093]__copy_oldmem_page+0x72/0xc0
[6.397663]read_from_oldmem+0x77/0x1e0
[6.398229]read_vmcore+0x2c2/0x310
[6.398742]proc_reg_read+0x47/0xa0
[6.399265]vfs_read+0x101/0x340
[6.399751]__x64_sys_pread64+0x5d/0xa0
[6.400314]do_syscall_64+0x43/0x90
[6.400778]entry_SYSCALL_64_after_hwframe+0x44/0xae
[6.401390]
[6.401390] -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
[6.402063]validate_chain+0x9f4/0x2670
[6.402560]__lock_acquire+0x8f7/0xbc0
[6.403054]lock_acquire+0xc3/0x1a0
[6.403509]down_read+0x4a/0x140
[6.403948]mmap_vmcore+0x340/0x580
[6.404403]proc_reg_mmap+0x3e/0x90
[6.404866]mmap_region+0x504/0x880
[6.405322]do_mmap+0x38a/0x520
[6.405744]vm_mmap_pgoff+0xc1/0x150
[6.406258]ksys_mmap_pgoff+0x178/0x200
[6.406823]do_syscall_64+0x43/0x90
[6.407339]entry_SYSCALL_64_after_hwframe+0x44/0xae
[6.407975]
[6.407975] other info that might help us debug this:
[6.407975]
[6.408945]  Possible unsafe locking scenario:
[6.408945]
[6.409684]        CPU0                    CPU1
[6.410196]        ----                    ----
[6.410703]   lock(&mm->mmap_lock#2);
[6.411121]                                lock(vmcore_cb_rwsem);
[6.411792]                                lock(&mm->mmap_lock#2);
[6.412465]   lock(vmcore_cb_rwsem);
[6.412873]
[6.412873]  *** DEADLOCK ***
[6.412873]
[6.413522] 1 lock held by makedumpfile/542:
[6.414006]  #0: 8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: 
vm_mmap_pgoff+0x84/0x150
[6.414944]
[6.414944] stack backtrace:
[6.415432] CPU: 0 PID: 542 Comm: makedumpfile Not tainted 
5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1
[6.416581] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[6.417272] Call Trace:
[6.417593]  
[6.417882]  dump_stack_lvl+0x5d/0x78
[6.418346]  print_circular_bug+0x5d7/0x5f0
[6.418821]  ? stack_trace_save+0x3a/0x50
[6.419273]  ? save_trace+0x3d/0x330
[6.419681]  check_noncircular+0xd1/0xe0
[6.420217]  validate_chain+0x9f4/0x2670
[6.420715]  ? __lock_acquire+0x8f7/0xbc0
[6.421234]  ? __lock_acquire+0x8f7/0xbc0
[6.421685]  __lock_acquire+0x8f7/0xbc0
[6.422127]  lock_acquire+0xc3/0x1a0
[6.422535]  ? mmap_vmcore+0x340/0x580
[6.422965]  ? lock_is_held_type+0xe2/0x140
[6.423432]  ? mmap_vmcore+0x340/0x580
[6.423893]  down_read+0x4a/0x140
[6.424321]  ? mmap_vmcore+0x340/0x580
[6.424800]  mmap_vmcore+0x340/0x580
[6.425237]  ? vm_area_alloc+0x1c/0x60
[6.425661]  ? trace_kmem_cache_alloc+0x30/0xe0
[6.426174]  ? kmem_cache_alloc+0x1e0/0x2f0
[6.426641]  proc_reg_mmap+0x3e/0x90
[6.427052]  mmap_region+0x504/0x880
[6.427462]  do_mmap+0x38a/0x520
[6.427842]  vm_mmap_pgoff+0xc1/0x150
[6.428260]  ksys_mmap_pgoff+0x178/0x200
[ 

Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-19 Thread Petr Mladek
On Fri 2022-01-14 15:30:46, Guilherme G. Piccoli wrote:
> The panic_print setting allows users to collect more information in a
> panic event, like memory stats, tasks, CPUs backtraces, etc.
> This is a pretty interesting debug mechanism, but currently the print
> event happens *after* kmsg_dump(), meaning that pstore, for example,
> cannot collect a dmesg with the panic_print information.

The above text is OK.

The below commit message is quite hard to follow. The sentences are very
long and complex. I tend to make the same mistakes. I will try to
suggest some improvements.


> This patch changes that in 2 ways:

> (a) First, after a good discussion with Petr in the mailing-list[0],
> he noticed

The above is not needed ;-) It is better to just describe the problem
and the solution in a direct way.

> that one specific setting of panic_print (the console replay,
> bit 5) should not be executed before console proper flushing; hence we
> hereby split the panic_print_sys_info() function in upper and lower
> portions: if the parameter "after_kmsg_dumpers" is passed, only bit 5
> (the console replay thing) is evaluated and the function returns - this
> is the lower portion. Otherwise all other bits are checked and the
> function prints the user required information; this is the upper/earlier
> mode.

The meaning of the boolean parameter might be explained by:

"panic_print_sys_info() allows to print additional information into the
kernel log. In addition, it allows to reply the existing log on the
console first.

The extra information is useful also when generating crash dump or
kmsg_dump. But the replay functionality makes sense only at the end
of panic().

Allow to distinguish the two situations by a boolean parameter."



> (b) With the above change, we can safely insert a panic_print_sys_info()
> call up some lines, in order kmsg_dump() accounts this new information
> and exposes it through pstore or other kmsg dumpers. Notice that this
> new earlier call doesn't set "after_kmsg_dumpers" so we print the
> information set by user in panic_print, except the console replay that,
> if set, will be executed after the console flushing.

This paragraph is too complicated. The important information here is:

 "panic_print_sys_info() is moved above kmsg_dump(). It allows to dump
 the extra information."

The trick with the boolean is already explained elsewhere. It is not
needed to repeat it in this paragraph.


> Also, worth to notice we needed to guard the panic_print_sys_info(false)
> calls against double print - we use kexec_crash_loaded() helper for that
> (see discussion [1] for more details).

This should be explained together with the reason to call it on two
locations, see below.


> In the first version of this patch we discussed the risks in the
> mailing-list [0], and seems this approach is the least terrible,

the above is not needed.

> despite still having risks of polluting the log with the bulk of
> information that panic_print may bring, losing older messages.
> In order to attenuate that, we've added a warning in the
> kernel-parameters.txt so that users enabling this mechanism consider
> to increase the log buffer size via "log_buf_len" as well.

Again, I would describe the problem and the solution in a direct way.

"The additional messages might overwrite the oldest messages when
 the buffer is full. The only reasonable solution is to use large
 enough log buffer. The advice is added into kernel-parameters.txt."



> Finally, another decision was to keep 2 panic_print_sys_info(false)
> calls (instead of just bringing it up some code lines and keep a single
> call) due to the panic notifiers:

The above is too complicated and not important.

> if kdump is not set, currently the
> panic_print information is collected after the notifiers and since
> it's a bit safer this way,

The important information is why it is safer. Honestly, I am still not sure
about this claim. I have checked several notifiers today and they did
not add much stability.


> we decided to keep it as is, only modifying
> the kdump case as per the previous commit [2] (see more details about
> this discussion also in thread [1]).

This does not provide much information. Linking to linux-next is a bad
idea, see below.


> [0] https://lore.kernel.org/lkml/20211230161828.121858-1-gpicc...@igalia.com
> [1] 
> https://lore.kernel.org/lkml/f25672a4-e4dd-29e8-b2db-f92dd9ff9...@igalia.com
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=5613b7538f69

linux-next is regularly rebased and Andrew's patches are always newly
imported from quilt. The commit probably already has another hash.

> 
> Cc: Feng Tang 
> Cc: Petr Mladek 
> Signed-off-by: Guilherme G. Piccoli 
> ---
> 
> 
> V3: Added a guard in the 2nd panic_print_sys_info(false) to prevent
> double print - thanks for catching this Petr!
> 
> I didn't implement your final suggestion Petr, i.e., putting 

Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-19 Thread Guilherme G. Piccoli
Thanks again Petr, for the deep analysis! Much appreciated.
Some comments inline below:


On 19/01/2022 12:48, Petr Mladek wrote:
>[...]
> From my POV, the function of panic notifiers is not well defined. They
> do various things, for example:
> [...] 
> 
> They do more than just provide information. Some are risky. It is not
> easy to disable a particular one.

We are trying to change that here:
https://lore.kernel.org/lkml/20220108153451.195121-1-gpicc...@igalia.com/

Your review/comments are very welcome =)


> [...] 
> It might make sense to allow to call kmsg_dump before panic notifiers
> to reduce the risk of a breakage. But I do not have enough experience
> with them to judge this.
> 
> I can't remember any bug report in this code. I guess that only
> few people use kmsg_dump.

One of the problems with doing that is that RCU and the hung task
detector, for example, have panic notifiers to disable themselves in the
panic scenario - if we run kmsg_dump() _before_ the panic notifiers, we
may end up with intermixed, messy messages. So for me it makes sense to
keep kmsg_dump() after the panic notifiers.


> [...]
> Yes, panic_print_sys_info() increases the risk that the crash dump
> will not succeed. But the change makes sense because:
> 
>   + panic_print_sys_info() does nothing by default. Most users will
> not enable it together with crash dump.
> 
>   + Guilherme uses crash dump only to dump the kernel log. It might
> be more reliable than kmsg_dump. In this case, panic_print_sys_info()
> is the only way to get the extra information.
> 
>   + panic_print_sys_info() might be useful even with full crash dump.
> For example, ftrace messages are not easy to read from the memory
> dump.

The last point is really good - I didn't consider it before, but it
makes a lot of sense: we can now dump a (hopefully small!) ftrace/event
trace buffer to dmesg before kdump, making it pretty easy to read later.
Cheers,


Guilherme



Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-19 Thread Petr Mladek
On Wed 2022-01-19 15:13:18, Baoquan He wrote:
> On 01/14/22 at 03:30pm, Guilherme G. Piccoli wrote:
> > The panic_print setting allows users to collect more information in a
> > panic event, like memory stats, tasks, CPUs backtraces, etc.
> > This is a pretty interesting debug mechanism, but currently the print
> > event happens *after* kmsg_dump(), meaning that pstore, for example,
> > cannot collect a dmesg with the panic_print information.
> .. 
> 
> Thanks for the effort.
>   
> 
> I have some concerns about this patch, and it's still not clear to me
> why the thing done in this patch is needed. Hope you don't mind if I ask
> stupid questions or revisit things that were discussed in an earlier post.
> 
> Firstly, let me add some background information and details about the
> problem with my understanding; please correct me if anything is wrong.
> 
> Background:
> ===
> Currently, in panic(), there are 5 different ways to try best to
> collect information:
> 1) Vmcore dumping
>done via kdump, high weight, almost get all information we want;
>need switch to kdump kernel immediately.

My experience is that it basically always works when it is correctly
configured. And it can be tested before a real crash ever happens.



> 2) Panic notifier
>iterating those registered handlers, light weight, information got
>depends on those panic handlers. Try its best after panic, do not reboot.


From my POV, the function of panic notifiers is not well defined. They
do various things, for example:

  + on_panic_nb() flushes s390 console buffers. It looks like a
complex and risky code. For example, __sclp_vt220_flush_buffer()
takes spin locks and calls timer code.

  + dump_gisb_error() prints some info but it also communicates with
the device, see gisb_read() and gisb_write() in
brcmstb_gisb_arb_decode_addr(). I am not sure how reliable this is.

  + parisc_panic_event() re-enables the soft-power switch. I am not
sure how safe this is.

  + pvpanic_panic_notify() takes spin lock and does in iowrite().


They do more than just provide information. Some are risky. It is not
easy to disable a particular one.


> 3) kmsg_dump
>iterating registered dumpers to collect as much kernel log as possible
>to store. E.g on pstore, light weight, only kernel log, do not reboot,
>log can be checked after system reboot.

From my POV, it provides similar functionality to the crash dump.

It might make sense to allow to call kmsg_dump before panic notifiers
to reduce the risk of a breakage. But I do not have enough experience
with them to judge this.

I can't remember any bug report in this code. I guess that only
few people use kmsg_dump.


> 4)console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>Flush to get the pending logbuf printed out to console.

My experience is that there are many problems with consoles.
There might be deadlocks because of internal locks. Some
serial consoles are incredibly slow. Messages in the console
drivers might cause infinite output. Some drivers are pretty
complex, especially tty, which is often enabled.

console_flush_on_panic() makes it even worse. It resets console_sem.
It means that it tries to use the consoles even when they are
already in use.

Note that printk() tries to show the messages on the console
immediately. console_flush_on_panic() is just the last
resort, used when all safe ways have failed.


> 5)panic_print
>Print out more system info, including all dmesg, task states, mem
>info, etc.

This adds more information to the kernel log. The extra information can
be requested by a kernel command-line parameter, and at the moment it
can be seen only on the console.

It makes sense to allow seeing it also in the crash dump or via kmsg_dump.

> About the change, my question is why not packing
> console_flush_on_panic(CONSOLE_FLUSH_PENDING) into panic_print(), and
> move panic_print() up to be above kmsg_dump().

As explained above. console_flush_on_panic() is a dirty hack.
It really has to be called at the end as the last resort.


> And adding panic_print_sys_info(false) before kdump might not be a good
> idea. Firstly, a panicked system is very unstable; kdump needs to switch
> to the 2nd kernel asap, without any extra action, so that it can survive
> and collect tons of information. The added panic_print_sys_info(false) is
> obviously unnecessary for kdump, and increases the risk during the kernel
> switch.

Yes, panic_print_sys_info() increases the risk that the crash dump
will not succeed. But the change makes sense because:

  + panic_print_sys_info() does nothing by default. Most users will
not enable it together with crash dump.

  + Guilherme uses crash dump only to dump the kernel log. It might
be more reliable than kmsg_dump. In this case, panic_print_sys_info()
is the only way to get the extra information.

  + panic_print_sys_info() might be useful even with full crash dump.
For example, 

Re: [PATCH v1] proc/vmcore: fix false positive lockdep warning

2022-01-19 Thread David Hildenbrand
On 19.01.22 16:15, David Hildenbrand wrote:
> On 19.01.22 16:08, Boqun Feng wrote:
>> Hi,
>>
>> On Wed, Jan 19, 2022 at 12:37:02PM +0100, David Hildenbrand wrote:
>>> Lockdep complains that we do during mmap of the vmcore:
>>> down_write(mmap_lock);
>>> down_read(vmcore_cb_rwsem);
>>> And during read of the vmcore:
>>> down_read(vmcore_cb_rwsem);
>>> down_read(mmap_lock);
>>>
>>> We cannot possibly deadlock when only taking vmcore_cb_rwsem in read
>>> mode, however, it's hard to teach that to lockdep.
>>>
>>
>> Lockdep warned about the above sequences because rw_semaphore is a fair
>> read-write lock, and the following can cause a deadlock:
>>
>> TASK 1                       TASK 2                       TASK 3
>> ======                       ======                       ======
>> down_write(mmap_lock);
>>                              down_read(vmcore_cb_rwsem)
>>                                                           down_write(vmcore_cb_rwsem); // blocked
>> down_read(vmcore_cb_rwsem);  // cannot get the lock because of the fairness
>>                              down_read(mmap_lock);        // blocked
>>  
>> IOW, a reader can block another reader if a writer is queued between
>> them and the lock is fair.
>>
>> So there is a deadlock possibility.
> 
> Task 3 will never take the mmap_lock before doing a
> down_write(vmcore_cb_rwsem).
> 
> How would this happen?

Ah, I get it, nevermind. I'll adjust the patch description.

Thanks!

-- 
Thanks,

David / dhildenb




Re: [PATCH v1] proc/vmcore: fix false positive lockdep warning

2022-01-19 Thread David Hildenbrand
On 19.01.22 16:08, Boqun Feng wrote:
> Hi,
> 
> On Wed, Jan 19, 2022 at 12:37:02PM +0100, David Hildenbrand wrote:
>> Lockdep complains that we do during mmap of the vmcore:
>>  down_write(mmap_lock);
>>  down_read(vmcore_cb_rwsem);
>> And during read of the vmcore:
>>  down_read(vmcore_cb_rwsem);
>>  down_read(mmap_lock);
>>
>> We cannot possibly deadlock when only taking vmcore_cb_rwsem in read
>> mode, however, it's hard to teach that to lockdep.
>>
> 
> Lockdep warned about the above sequences because rw_semaphore is a fair
> read-write lock, and the following can cause a deadlock:
> 
> TASK 1                       TASK 2                       TASK 3
> ======                       ======                       ======
> down_write(mmap_lock);
>                              down_read(vmcore_cb_rwsem)
>                                                           down_write(vmcore_cb_rwsem); // blocked
> down_read(vmcore_cb_rwsem);  // cannot get the lock because of the fairness
>                              down_read(mmap_lock);        // blocked
>   
> IOW, a reader can block another reader if a writer is queued between
> them and the lock is fair.
> 
> So there is a deadlock possibility.

Task 3 will never take the mmap_lock before doing a
down_write(vmcore_cb_rwsem).

How would this happen?


-- 
Thanks,

David / dhildenb




Re: [PATCH v1] proc/vmcore: fix false positive lockdep warning

2022-01-19 Thread Boqun Feng
Hi,

On Wed, Jan 19, 2022 at 12:37:02PM +0100, David Hildenbrand wrote:
> Lockdep complains that we do during mmap of the vmcore:
>   down_write(mmap_lock);
>   down_read(vmcore_cb_rwsem);
> And during read of the vmcore:
>   down_read(vmcore_cb_rwsem);
>   down_read(mmap_lock);
> 
> We cannot possibly deadlock when only taking vmcore_cb_rwsem in read
> mode, however, it's hard to teach that to lockdep.
> 

Lockdep warned about the above sequences because rw_semaphore is a fair
read-write lock, and the following can cause a deadlock:

TASK 1                       TASK 2                       TASK 3
======                       ======                       ======
down_write(mmap_lock);
                             down_read(vmcore_cb_rwsem)
                                                          down_write(vmcore_cb_rwsem); // blocked
down_read(vmcore_cb_rwsem);  // cannot get the lock because of the fairness
                             down_read(mmap_lock);        // blocked

IOW, a reader can block another reader if a writer is queued between
them and the lock is fair.

So there is a deadlock possibility.

Regards,
Boqun


> So instead, convert to SRCU to make lockdep happy. We need SRCU as our
> callbacks can sleep. With this change, I cannot trigger any lockdep
> complaint.
> 
> [6.386519] ==
> [6.387203] WARNING: possible circular locking dependency detected
> [6.387965] 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1 
> Not tainted
> [6.388899] --
> [6.389657] makedumpfile/542 is trying to acquire lock:
> [6.390308] 832d2eb8 (vmcore_cb_rwsem){.+.+}-{3:3}, at: 
> mmap_vmcore+0x340/0x580
> [6.391290]
> [6.391290] but task is already holding lock:
> [6.391978] 8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: 
> vm_mmap_pgoff+0x84/0x150
> [6.392898]
> [6.392898] which lock already depends on the new lock.
> [6.392898]
> [6.393866]
> [6.393866] the existing dependency chain (in reverse order) is:
> [6.394762]
> [6.394762] -> #1 (&mm->mmap_lock#2){++++}-{3:3}:
> [6.395530]lock_acquire+0xc3/0x1a0
> [6.396047]__might_fault+0x4e/0x70
> [6.396562]_copy_to_user+0x1f/0x90
> [6.397093]__copy_oldmem_page+0x72/0xc0
> [6.397663]read_from_oldmem+0x77/0x1e0
> [6.398229]read_vmcore+0x2c2/0x310
> [6.398742]proc_reg_read+0x47/0xa0
> [6.399265]vfs_read+0x101/0x340
> [6.399751]__x64_sys_pread64+0x5d/0xa0
> [6.400314]do_syscall_64+0x43/0x90
> [6.400778]entry_SYSCALL_64_after_hwframe+0x44/0xae
> [6.401390]
> [6.401390] -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
> [6.402063]validate_chain+0x9f4/0x2670
> [6.402560]__lock_acquire+0x8f7/0xbc0
> [6.403054]lock_acquire+0xc3/0x1a0
> [6.403509]down_read+0x4a/0x140
> [6.403948]mmap_vmcore+0x340/0x580
> [6.404403]proc_reg_mmap+0x3e/0x90
> [6.404866]mmap_region+0x504/0x880
> [6.405322]do_mmap+0x38a/0x520
> [6.405744]vm_mmap_pgoff+0xc1/0x150
> [6.406258]ksys_mmap_pgoff+0x178/0x200
> [6.406823]do_syscall_64+0x43/0x90
> [6.407339]entry_SYSCALL_64_after_hwframe+0x44/0xae
> [6.407975]
> [6.407975] other info that might help us debug this:
> [6.407975]
> [6.408945]  Possible unsafe locking scenario:
> [6.408945]
> [6.409684]        CPU0                    CPU1
> [6.410196]        ----                    ----
> [6.410703]   lock(&mm->mmap_lock#2);
> [6.411121]                                lock(vmcore_cb_rwsem);
> [6.411792]                                lock(&mm->mmap_lock#2);
> [6.412465]   lock(vmcore_cb_rwsem);
> [6.412873]
> [6.412873]  *** DEADLOCK ***
> [6.412873]
> [6.413522] 1 lock held by makedumpfile/542:
> [6.414006]  #0: 8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: 
> vm_mmap_pgoff+0x84/0x150
> [6.414944]
> [6.414944] stack backtrace:
> [6.415432] CPU: 0 PID: 542 Comm: makedumpfile Not tainted 
> 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1
> [6.416581] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
> [6.417272] Call Trace:
> [6.417593]  
> [6.417882]  dump_stack_lvl+0x5d/0x78
> [6.418346]  print_circular_bug+0x5d7/0x5f0
> [6.418821]  ? stack_trace_save+0x3a/0x50
> [6.419273]  ? save_trace+0x3d/0x330
> [6.419681]  check_noncircular+0xd1/0xe0
> [6.420217]  validate_chain+0x9f4/0x2670
> [6.420715]  ? __lock_acquire+0x8f7/0xbc0
> [6.421234]  ? __lock_acquire+0x8f7/0xbc0
> [6.421685]  __lock_acquire+0x8f7/0xbc0
> [6.422127]  lock_acquire+0xc3/0x1a0
> [6.422535]  ? mmap_vmcore+0x340/0x580
> [6.422965]  ? lock_is_held_type+0xe2/0x140
> [6.423432]  ? 

Re: [PATCH V3] panic: Move panic_print before kmsg dumpers

2022-01-19 Thread Guilherme G. Piccoli
On 19/01/2022 04:13, Baoquan He wrote:
> [...]
> Thanks for the effort.
>   
> 
> I have some concerns about this patch, and it's still not clear to me
> why the thing done in this patch is needed. Hope you don't mind if I ask
> stupid questions or revisit things that were discussed in an earlier post.
> 
> Firstly, let me add some background information and details about the
> problem with my understanding; please correct me if anything is wrong.
> 

Hi Baoquan - first of all, thanks a lot for your review and detailed
analysis, this is pretty good and much appreciated!


> Background:
> ===
> Currently, in panic(), there are 5 different ways to try best to
> collect information:
> 1) Vmcore dumping
>done via kdump, high weight, almost get all information we want;
>need switch to kdump kernel immediately.
> 2) Panic notifier
>iterating those registered handlers, light weight, information got
>depends on those panic handlers. Try its best after panic, do not reboot.
> 3) kmsg_dump
>iterating registered dumpers to collect as much kernel log as possible
>to store. E.g on pstore, light weight, only kernel log, do not reboot,
>log can be checked after system reboot.
> 4)console_flush_on_panic(CONSOLE_FLUSH_PENDING);
>Flush to get the pending logbuf printed out to console.
> 5)panic_print
>Print out more system info, including all dmesg, task states, mem
>info, etc.
> ===
> 
> 
> The problem encountered:
> ===
> Currently panic_print is done after kmsg_dump. This causes an issue:
> the system info that panic_print puts into the logbuf has no chance to
> be collected by kmsg_dump. E.g. a pstore dumper, registered and handled
> in kmsg_dump() - a lightweight and helpful panic-info collecting
> mechanism - will miss the many helpful messages printed by panic_print.
> ==
> 

OK, I agree with your analysis, it's quite precise - the main problem
resolved by this patch is that some users may want to save the
panic_print information in the pstore-collected logs, and currently this
isn't possible.


> [...] 
> About the change, my question is why not packing
> console_flush_on_panic(CONSOLE_FLUSH_PENDING) into panic_print(), and
> move panic_print() up to be above kmsg_dump(). The pending console
> flusing and replaying all flush are all only called inside panic(), and
> aim to print out more information.
> 

About this point, the idea of not messing with the console flush comes
from a discussion in the V1 of this patch, see here:
https://lore.kernel.org/lkml/YdQzU0KkAVq2BWpk@alley/

Basically, there are many more risks than benefits in moving the console
flush earlier. Hence, we kept the console flushing intact, and even the
replay flag in panic_print - that's the reason we need the new flag in
panic_print_sys_info().


> And adding panic_print_sys_info(false) before kdump might not be a good
> idea. Firstly, a panicked system is very unstable; kdump needs to switch
> to the 2nd kernel asap, without any extra action, so that it can survive
> and collect tons of information. The added panic_print_sys_info(false) is
> obviously unnecessary for kdump, and increases the risk during the kernel
> switch.
> 
> If I am missing anything, please point it out so that I can have a
> complete view of this issue and its fix.
> 

And this is another point, and for that I must apologize since I sent
this change in another patch[0] but forgot to CC the kexec list; so
details are in this thread:
https://lore.kernel.org/lkml/20211109202848.610874-4-gpicc...@igalia.com/

There was a discussion with Dave there, thankfully he was able to see
the patch using some lore/lei search in lkml, but let me give you the
summary here.

So, I agree with you that calling the panic_print handler before kdump
_increases the risk_ of a kdump failure - this is more or less like
"crash_kexec_post_notifiers", right? It's a decision for the user, with
its natural trade-offs and considerations.
But in the panic_print case, before my patch users weren't allowed to
decide - it was just impossible to ever get the panic_print output
before the kdump.

My patch just enables users to make this decision. It's not as if we are
now always dumping the panic_print output - it's still a parameter that
users must add consciously, but now they are entitled to the choice! And
the bonus is that we can have kdump collecting only the dmesg, which is
fast and storage-friendly (since the vmcore, even compressed, is quite
big) but still carries lots of information from the panic_print output.
Of course, there's a trade-off here: printing all that information
before the kdump might make kdump less stable, but this is a decision
that users/sysadmins must take into account.

Finally, Dave suggested that we could make panic_print a panic notifier,
and this idea is pretty good! But then, the 

Re: [PATCH v2 0/5] kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

2022-01-19 Thread Jisheng Zhang
On Wed, Jan 19, 2022 at 05:33:22PM +0800, Baoquan He wrote:
> On 01/19/22 at 09:52am, Alexandre Ghiti wrote:
> > Hi Baoquan,
> > 
> > On Wed, Jan 19, 2022 at 9:11 AM Baoquan He  wrote:
> > >
> > > On 01/18/22 at 10:13pm, Jisheng Zhang wrote:
> > > > On Sun, Jan 16, 2022 at 09:38:47PM +0800, Baoquan He wrote:
> > > > > Hi Jisheng,
> > > >
> > > > Hi Baoquan,
> > > >
> > > > >
> > > > > On 12/07/21 at 12:05am, Jisheng Zhang wrote:
> > > > > > Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE"
> > > > > > by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code
> > > > > > and increase compile coverage.
> > > > >
> > > > > I went through this patchset. You mention the benefits it brings are
> > > > > 1) simplifying the code;
> > > > > 2) increasing compile coverage;
> > > > >
> > > > > For benefit 1), it mainly removes the dummy function in x86, arm and
> > > > > arm64, right?
> > > >
> > > > Another benefit: remove those #ifdef #else #endif usage. Recently, I
> > > > fixed a bug due to lots of "#ifdefs":
> > > > http://lists.infradead.org/pipermail/linux-riscv/2021-December/010607.html
> > >
> > > Glad to know the fix. While sometimes the ifdeffery is necessary. I am
> > > sorry about the one in riscv that you have fixed; it was truly a bug. But
> > > the increased compile coverage you tried to achieve below may cause
> > > issues. Please see my comment below.
> > >
> > > >
> > > > >
> > > > > For benefit 2), increasing compile coverage, could you tell us more
> > > > > about how it is achieved and why it matters? What if people disable
> > > > > CONFIG_KEXEC_CORE on purpose? Please forgive my poor compiling
> > > > > knowledge.
> > > >
> > > > Just my humble opinion, let's compare the code::
> > > >
> > > > #ifdef CONFIG_KEXEC_CORE
> > > >
> > > > code block A;
> > > >
> > > > #endif
> > > >
> > > > If KEXEC_CORE is disabled, code block A won't be compiled at all, the
> > > > preprocessor will remove code block A;
> > > >
> > > > If we convert the code to:
> > > >
> > > > if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
> > > >   code block A;
> > > > }
> > > >
> > > > Even if KEXEC_CORE is disabled, code block A is still compiled.
> > >
> > > This is what I am worried about. Before, if CONFIG_KEXEC_CORE is
> > > unset, the relevant code is not compiled in. I can't see what benefit
> > > is brought by compiling in the unneeded code block. Am I missing
> > > anything?
> > >
> > 
> > This is explained in Documentation/process/coding-style.rst "21)
> > Conditional Compilation".
> 
> Thanks for the pointer, Alex.
> 
> I read that part, but my confusion isn't gone. With the current code,
> CONFIG_KEXEC_CORE is set,
>   - reserve_crashkernel_low() and reserve_crashkernel() compiled in.

Although the code block will be compiled, it will be optimized out.

> CONFIG_KEXEC_CORE is unset,
>   - reserve_crashkernel_low() and reserve_crashkernel() compiled out. 
> 
> After this patch applied, does it have the same effect as the old code?

I compared the .o files, and can confirm they achieve the same effect.

> 
> arch/x86/kernel/setup.c:
> 
> before
> ==
> #ifdef CONFIG_KEXEC_CORE
> static int __init reserve_crashkernel_low(void)
> {
>   ..
> }
> static void __init reserve_crashkernel(void)
> {
>   ..
> }
> #else
> static void __init reserve_crashkernel(void)
> {
> }
> #endif
> 
> after
> ===
> static int __init reserve_crashkernel_low(void)
> {
>   ..
> }
> static void __init reserve_crashkernel(void)
> {
>   ..
>   if (!IS_ENABLED(CONFIG_KEXEC_CORE))
> return;
>   ..
> }
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH v1] proc/vmcore: fix false positive lockdep warning

2022-01-19 Thread David Hildenbrand
Lockdep complains about what we do during mmap of the vmcore:
down_write(mmap_lock);
down_read(vmcore_cb_rwsem);
And during read of the vmcore:
down_read(vmcore_cb_rwsem);
down_read(mmap_lock);

We cannot possibly deadlock when only taking vmcore_cb_rwsem in read
mode; however, it's hard to teach that to lockdep.

So instead, convert to srcu to make lockdep happy. We need srcu as our
callbacks can sleep. With this change, I cannot trigger any lockdep
complaint.

[6.386519] ==
[6.387203] WARNING: possible circular locking dependency detected
[6.387965] 5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1 Not 
tainted
[6.388899] --
[6.389657] makedumpfile/542 is trying to acquire lock:
[6.390308] 832d2eb8 (vmcore_cb_rwsem){.+.+}-{3:3}, at: 
mmap_vmcore+0x340/0x580
[6.391290]
[6.391290] but task is already holding lock:
[6.391978] 8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: 
vm_mmap_pgoff+0x84/0x150
[6.392898]
[6.392898] which lock already depends on the new lock.
[6.392898]
[6.393866]
[6.393866] the existing dependency chain (in reverse order) is:
[6.394762]
[6.394762] -> #1 (&mm->mmap_lock#2){++++}-{3:3}:
[6.395530]lock_acquire+0xc3/0x1a0
[6.396047]__might_fault+0x4e/0x70
[6.396562]_copy_to_user+0x1f/0x90
[6.397093]__copy_oldmem_page+0x72/0xc0
[6.397663]read_from_oldmem+0x77/0x1e0
[6.398229]read_vmcore+0x2c2/0x310
[6.398742]proc_reg_read+0x47/0xa0
[6.399265]vfs_read+0x101/0x340
[6.399751]__x64_sys_pread64+0x5d/0xa0
[6.400314]do_syscall_64+0x43/0x90
[6.400778]entry_SYSCALL_64_after_hwframe+0x44/0xae
[6.401390]
[6.401390] -> #0 (vmcore_cb_rwsem){.+.+}-{3:3}:
[6.402063]validate_chain+0x9f4/0x2670
[6.402560]__lock_acquire+0x8f7/0xbc0
[6.403054]lock_acquire+0xc3/0x1a0
[6.403509]down_read+0x4a/0x140
[6.403948]mmap_vmcore+0x340/0x580
[6.404403]proc_reg_mmap+0x3e/0x90
[6.404866]mmap_region+0x504/0x880
[6.405322]do_mmap+0x38a/0x520
[6.405744]vm_mmap_pgoff+0xc1/0x150
[6.406258]ksys_mmap_pgoff+0x178/0x200
[6.406823]do_syscall_64+0x43/0x90
[6.407339]entry_SYSCALL_64_after_hwframe+0x44/0xae
[6.407975]
[6.407975] other info that might help us debug this:
[6.407975]
[6.408945]  Possible unsafe locking scenario:
[6.408945]
[6.409684]        CPU0                    CPU1
[6.410196]        ----                    ----
[6.410703]   lock(&mm->mmap_lock#2);
[6.411121]                               lock(vmcore_cb_rwsem);
[6.411792]                               lock(&mm->mmap_lock#2);
[6.412465]   lock(vmcore_cb_rwsem);
[6.412873]
[6.412873]  *** DEADLOCK ***
[6.412873]
[6.413522] 1 lock held by makedumpfile/542:
[6.414006]  #0: 8880af226438 (&mm->mmap_lock#2){++++}-{3:3}, at: 
vm_mmap_pgoff+0x84/0x150
[6.414944]
[6.414944] stack backtrace:
[6.415432] CPU: 0 PID: 542 Comm: makedumpfile Not tainted 
5.17.0-0.rc0.20220117git0c947b893d69.68.test.fc36.x86_64 #1
[6.416581] Hardware name: Red Hat KVM, BIOS 0.5.1 01/01/2011
[6.417272] Call Trace:
[6.417593]  
[6.417882]  dump_stack_lvl+0x5d/0x78
[6.418346]  print_circular_bug+0x5d7/0x5f0
[6.418821]  ? stack_trace_save+0x3a/0x50
[6.419273]  ? save_trace+0x3d/0x330
[6.419681]  check_noncircular+0xd1/0xe0
[6.420217]  validate_chain+0x9f4/0x2670
[6.420715]  ? __lock_acquire+0x8f7/0xbc0
[6.421234]  ? __lock_acquire+0x8f7/0xbc0
[6.421685]  __lock_acquire+0x8f7/0xbc0
[6.422127]  lock_acquire+0xc3/0x1a0
[6.422535]  ? mmap_vmcore+0x340/0x580
[6.422965]  ? lock_is_held_type+0xe2/0x140
[6.423432]  ? mmap_vmcore+0x340/0x580
[6.423893]  down_read+0x4a/0x140
[6.424321]  ? mmap_vmcore+0x340/0x580
[6.424800]  mmap_vmcore+0x340/0x580
[6.425237]  ? vm_area_alloc+0x1c/0x60
[6.425661]  ? trace_kmem_cache_alloc+0x30/0xe0
[6.426174]  ? kmem_cache_alloc+0x1e0/0x2f0
[6.426641]  proc_reg_mmap+0x3e/0x90
[6.427052]  mmap_region+0x504/0x880
[6.427462]  do_mmap+0x38a/0x520
[6.427842]  vm_mmap_pgoff+0xc1/0x150
[6.428260]  ksys_mmap_pgoff+0x178/0x200
[6.428701]  do_syscall_64+0x43/0x90
[6.429126]  entry_SYSCALL_64_after_hwframe+0x44/0xae
[6.429745] RIP: 0033:0x7fc7359b8fc7
[6.430157] Code: 00 00 00 89 ef e8 69 b3 ff ff eb e4 e8 c2 64 01 00 66 90 
f3 0f 1e fa 41 89 ca 41 f7 c1 ff 0f 00 00 75 10 b8 09 00 00 00 0f 05 <48> 3d 00 
f0 ff ff 77 21 c3 48 8b 05 21 7e 0e 00 64 c7 00 16 00 00
[6.432147] RSP: 002b:7fff35b4c208 EFLAGS: 0246 ORIG_RAX: 
0009
[6.432970] RAX: ffda RBX: 0001 RCX: 

Re: [PATCH v3 6/6] crash hp: Add x86 crash hotplug support

2022-01-19 Thread Baoquan He
On 01/10/22 at 02:57pm, Eric DeVolder wrote:
> For x86_64, when CPU or memory is hot un/plugged, the crash
> elfcorehdr, which describes the CPUs and memory in the system,
> must also be updated.
> 
> To update the elfcorehdr for x86_64, a new elfcorehdr must be
> generated from the available CPUs and memory. The new elfcorehdr
> is prepared into a buffer, and if no errors occur, it is
> installed over the top of the existing elfcorehdr.
> 
> In the patch 'crash hp: kexec_file changes for crash hotplug support'
> the need to update purgatory due to the change in elfcorehdr was
> eliminated.  As a result, no changes to purgatory or boot_params
> (as the elfcorehdr= kernel command line parameter pointer
> remains unchanged and correct) are needed, just elfcorehdr.
> 
> To accommodate a growing number of resources via hotplug, the
> elfcorehdr segment must be sufficiently large to accommodate
> changes; see the CRASH_HOTPLUG_ELFCOREHDR_SZ config item.
> 
> NOTE that this supports both kexec_load and kexec_file_load. Support
> for kexec_load is made possible by identifying the elfcorehdr segment
> at load time and updating it as previously described. However, it is
> the responsibility of the userspace kexec utility to ensure that:
>  - the elfcorehdr segment is sufficiently large to accommodate
>hotplug changes, ala CRASH_HOTPLUG_ELFCOREHDR_SZ.
>  - provides a purgatory that excludes the elfcorehdr from its list of
>run-time segments to check.
> These changes to the userspace kexec utility are not yet available.
> 
> Signed-off-by: Eric DeVolder 
> ---
>  arch/x86/kernel/crash.c | 138 +++-
>  1 file changed, 137 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
> index 9730c88530fc..d185137b33d4 100644
> --- a/arch/x86/kernel/crash.c
> +++ b/arch/x86/kernel/crash.c
> @@ -25,6 +25,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -265,7 +266,8 @@ static int prepare_elf_headers(struct kimage *image, void 
> **addr,
>   goto out;
>  
>   /* By default prepare 64bit headers */
> - ret =  crash_prepare_elf64_headers(cmem, IS_ENABLED(CONFIG_X86_64), 
> addr, sz);
> + ret =  crash_prepare_elf64_headers(image, cmem,
> + IS_ENABLED(CONFIG_X86_64), addr, sz);
>  
>  out:
>   vfree(cmem);
> @@ -397,7 +399,17 @@ int crash_load_segments(struct kimage *image)
>   image->elf_headers = kbuf.buffer;
>   image->elf_headers_sz = kbuf.bufsz;
>  
> +#ifdef CONFIG_CRASH_HOTPLUG
> + /* Ensure elfcorehdr segment large enough for hotplug changes */
> + kbuf.memsz = CONFIG_CRASH_HOTPLUG_ELFCOREHDR_SZ;

I would define a default value for the size, and meanwhile provide a
Kconfig option to allow users to customize it.

> + /* For marking as usable to crash kernel */
> + image->elf_headers_sz = kbuf.memsz;
> + /* Record the index of the elfcorehdr segment */
> + image->elf_index = image->nr_segments;
> + image->elf_index_valid = true;
> +#else
>   kbuf.memsz = kbuf.bufsz;
> +#endif
>   kbuf.buf_align = ELF_CORE_HEADER_ALIGN;
>   kbuf.mem = KEXEC_BUF_MEM_UNKNOWN;
>   ret = kexec_add_buffer(&kbuf);
> @@ -412,3 +424,127 @@ int crash_load_segments(struct kimage *image)
>   return ret;
>  }
>  #endif /* CONFIG_KEXEC_FILE */
> +
> +#ifdef CONFIG_CRASH_HOTPLUG

These two helper functions should be carved out into a separate patch as
a preparatory one. I am considering how to rearrange and split the
patches, and will reply to the cover letter.

> +void *map_crash_pages(unsigned long paddr, unsigned long size)
> +{
> + /*
> +  * NOTE: The addresses and sizes passed to this routine have
> +  * already been fully aligned on page boundaries. There is no
> +  * need for massaging the address or size.
> +  */
> + void *ptr = NULL;
> +
> + /* NOTE: requires arch_kexec_[un]protect_crashkres() for write access */
> + if (size > 0) {
> + struct page *page = pfn_to_page(paddr >> PAGE_SHIFT);
> +
> + ptr = kmap(page);
> + }
> +
> + return ptr;
> +}
> +
> +void unmap_crash_pages(void **ptr)
> +{
> + if (ptr) {
> + if (*ptr)
> + kunmap(*ptr);
> + *ptr = NULL;
> + }
> +}
> +
> +void arch_crash_hotplug_handler(struct kimage *image,
> + unsigned int hp_action, unsigned long a, unsigned long b)
> +{
> + /*
> +  * To accurately reflect hot un/plug changes, the elfcorehdr (which
> +  * is passed to the crash kernel via the elfcorehdr= parameter)
> +  * must be updated with the new list of CPUs and memories. The new
> +  * elfcorehdr is prepared in a kernel buffer, and if no errors,
> +  * then it is written on top of the existing/old elfcorehdr.
> +  *
> +  * Due to the change to the elfcorehdr, purgatory must explicitly
> +  * exclude the elfcorehdr from the list of segments 

Re: [PATCH v2 0/5] kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

2022-01-19 Thread Baoquan He
On 01/19/22 at 09:52am, Alexandre Ghiti wrote:
> Hi Baoquan,
> 
> On Wed, Jan 19, 2022 at 9:11 AM Baoquan He  wrote:
> >
> > On 01/18/22 at 10:13pm, Jisheng Zhang wrote:
> > > On Sun, Jan 16, 2022 at 09:38:47PM +0800, Baoquan He wrote:
> > > > Hi Jisheng,
> > >
> > > Hi Baoquan,
> > >
> > > >
> > > > On 12/07/21 at 12:05am, Jisheng Zhang wrote:
> > > > > Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE"
> > > > > by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code
> > > > > and increase compile coverage.
> > > >
> > > > I go through this patchset, You mention the benefits it brings are
> > > > 1) simplity the code;
> > > > 2) increase compile coverage;
> > > >
> > > > For benefit 1), it mainly removes the dummy function in x86, arm and
> > > > arm64, right?
> > >
> > > Another benefit: remove those #ifdef #else #endif usage. Recently, I
> > > fixed a bug due to lots of "#ifdefs":
> > > http://lists.infradead.org/pipermail/linux-riscv/2021-December/010607.html
> >
> > Glad to know the fix. While, sometime the ifdeffery is necessary. I am
> > sorry about the one in riscv and you have fixed, it's truly a bug . But,
> > the increasing compile coverage at below you tried to make, it may cause
> > issue. Please see below my comment.
> >
> > >
> > > >
> > > > For benefit 2), increasing compile coverage, could you tell more how it
> > > > achieves and why it matters? What if people disables CONFIG_KEXEC_CORE 
> > > > in
> > > > purpose? Please forgive my poor compiling knowledge.
> > >
> > > Just my humble opinion, let's compare the code::
> > >
> > > #ifdef CONFIG_KEXEC_CORE
> > >
> > > code block A;
> > >
> > > #endif
> > >
> > > If KEXEC_CORE is disabled, code block A won't be compiled at all, the
> > > preprocessor will remove code block A;
> > >
> > > If we convert the code to:
> > >
> > > if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
> > >   code block A;
> > > }
> > >
> > > Even if KEXEC_CORE is disabled, code block A is still compiled.
> >
> > This is what I am worried about. Before, if CONFIG_KEXEC_CORE is
> > unset, those relevant codes are not compiled in. I can't see what
> > benefit is brought in if compiled in the unneeded code block. Do I miss
> > anything?
> >
> 
> This is explained in Documentation/process/coding-style.rst "21)
> Conditional Compilation".

Thanks for the pointer, Alex.

I read that part, but my confusion isn't gone. With the current code,
CONFIG_KEXEC_CORE is set,
  - reserve_crashkernel_low() and reserve_crashkernel() compiled in. 
CONFIG_KEXEC_CORE is unset,
  - reserve_crashkernel_low() and reserve_crashkernel() compiled out. 

After this patch is applied, does it have the same effect as the old code?

arch/x86/kernel/setup.c:

before
==
#ifdef CONFIG_KEXEC_CORE
static int __init reserve_crashkernel_low(void)
{
..
}
static void __init reserve_crashkernel(void)
{
..
}
#else
static void __init reserve_crashkernel(void)
{
}
#endif

after
===
static int __init reserve_crashkernel_low(void)
{
..
}
static void __init reserve_crashkernel(void)
{
..
if (!IS_ENABLED(CONFIG_KEXEC_CORE))
return;
..
}




Re: [PATCH v2 0/5] kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

2022-01-19 Thread Alexandre Ghiti
Hi Baoquan,

On Wed, Jan 19, 2022 at 9:11 AM Baoquan He  wrote:
>
> On 01/18/22 at 10:13pm, Jisheng Zhang wrote:
> > On Sun, Jan 16, 2022 at 09:38:47PM +0800, Baoquan He wrote:
> > > Hi Jisheng,
> >
> > Hi Baoquan,
> >
> > >
> > > On 12/07/21 at 12:05am, Jisheng Zhang wrote:
> > > > Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE"
> > > > by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code
> > > > and increase compile coverage.
> > >
> > > I go through this patchset, You mention the benefits it brings are
> > > 1) simplity the code;
> > > 2) increase compile coverage;
> > >
> > > For benefit 1), it mainly removes the dummy function in x86, arm and
> > > arm64, right?
> >
> > Another benefit: remove those #ifdef #else #endif usage. Recently, I
> > fixed a bug due to lots of "#ifdefs":
> > http://lists.infradead.org/pipermail/linux-riscv/2021-December/010607.html
>
> Glad to know the fix. While, sometime the ifdeffery is necessary. I am
> sorry about the one in riscv and you have fixed, it's truly a bug . But,
> the increasing compile coverage at below you tried to make, it may cause
> issue. Please see below my comment.
>
> >
> > >
> > > For benefit 2), increasing compile coverage, could you tell more how it
> > > achieves and why it matters? What if people disables CONFIG_KEXEC_CORE in
> > > purpose? Please forgive my poor compiling knowledge.
> >
> > Just my humble opinion, let's compare the code::
> >
> > #ifdef CONFIG_KEXEC_CORE
> >
> > code block A;
> >
> > #endif
> >
> > If KEXEC_CORE is disabled, code block A won't be compiled at all, the
> > preprocessor will remove code block A;
> >
> > If we convert the code to:
> >
> > if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
> >   code block A;
> > }
> >
> > Even if KEXEC_CORE is disabled, code block A is still compiled.
>
> This is what I am worried about. Before, if CONFIG_KEXEC_CORE is
> unset, those relevant codes are not compiled in. I can't see what
> benefit is brought in if compiled in the unneeded code block. Do I miss
> anything?
>

This is explained in Documentation/process/coding-style.rst "21)
Conditional Compilation".

Alex

>
> ___
> linux-riscv mailing list
> linux-ri...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-riscv



Re: [PATCH v3 3/6] crash hp: definitions and prototype changes

2022-01-19 Thread Baoquan He
On 01/10/22 at 02:57pm, Eric DeVolder wrote:
> This change adds members to struct kimage to facilitate crash
> hotplug support.
> 
> This change also defines crash hotplug events and associated
> prototypes.
> 
> Signed-off-by: Eric DeVolder 
> ---
>  include/linux/kexec.h | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729..068f853f1c65 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -221,8 +221,9 @@ struct crash_mem {
>  extern int crash_exclude_mem_range(struct crash_mem *mem,
>  unsigned long long mstart,
>  unsigned long long mend);
> -extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
> -void **addr, unsigned long *sz);
> +extern int crash_prepare_elf64_headers(struct kimage *image,
> + struct crash_mem *mem, int kernel_map,
> + void **addr, unsigned long *sz);
>  #endif /* CONFIG_KEXEC_FILE */
>  
>  #ifdef CONFIG_KEXEC_ELF
> @@ -299,6 +300,13 @@ struct kimage {
>  
>   /* Information for loading purgatory */
>   struct purgatory_info purgatory_info;
> +
> +#ifdef CONFIG_CRASH_HOTPLUG
> + bool hotplug_event;
> + int offlinecpu;
> + bool elf_index_valid;
> + int elf_index;

Do we really need elf_index_valid? Can we initialize elf_index to, e.g., '-1',
and then check whether the value is valid?

> +#endif
>  #endif
>  
>  #ifdef CONFIG_IMA_KEXEC
> @@ -315,6 +323,15 @@ struct kimage {
>   unsigned long elf_load_addr;
>  };
>  
> +#ifdef CONFIG_CRASH_HOTPLUG
> +void arch_crash_hotplug_handler(struct kimage *image,
> + unsigned int hp_action, unsigned long a, unsigned long b);
> +#define KEXEC_CRASH_HP_REMOVE_CPU   0
> +#define KEXEC_CRASH_HP_ADD_CPU  1
> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> +#define KEXEC_CRASH_HP_ADD_MEMORY   3
> +#endif /* CONFIG_CRASH_HOTPLUG */
> +
>  /* kexec interface functions */
>  extern void machine_kexec(struct kimage *image);
>  extern int machine_kexec_prepare(struct kimage *image);
> -- 
> 2.27.0
> 




Re: [RFC v2 3/6] crash hp: definitions and prototype changes

2022-01-19 Thread Baoquan He
On 01/19/22 at 04:23pm, Baoquan He wrote:
> On 12/07/21 at 02:52pm, Eric DeVolder wrote:
> > This change adds members to struct kimage to facilitate crash
> > hotplug support.
> > 
> > This change also defines crash hotplug events and associated
> > prototypes.
> > 
> > Signed-off-by: Eric DeVolder 
> > ---
> >  include/linux/kexec.h | 21 +++--
> >  1 file changed, 19 insertions(+), 2 deletions(-)
> > 
> > diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> > index 0c994ae37729..068f853f1c65 100644
> > --- a/include/linux/kexec.h
> > +++ b/include/linux/kexec.h
> > @@ -221,8 +221,9 @@ struct crash_mem {
> >  extern int crash_exclude_mem_range(struct crash_mem *mem,
> >unsigned long long mstart,
> >unsigned long long mend);
> > -extern int crash_prepare_elf64_headers(struct crash_mem *mem, int 
> > kernel_map,
> > -  void **addr, unsigned long *sz);
> > +extern int crash_prepare_elf64_headers(struct kimage *image,
> > +   struct crash_mem *mem, int kernel_map,
> > +   void **addr, unsigned long *sz);
> >  #endif /* CONFIG_KEXEC_FILE */
> >  
> >  #ifdef CONFIG_KEXEC_ELF
> > @@ -299,6 +300,13 @@ struct kimage {
> >  
> > /* Information for loading purgatory */
> > struct purgatory_info purgatory_info;
> > +
> > +#ifdef CONFIG_CRASH_HOTPLUG
> > +   bool hotplug_event;
> > +   int offlinecpu;
> > +   bool elf_index_valid;
> > +   int elf_index;
> 
> Do we really need elf_index_valid? Can we initialize elf_index to, e.g., '-1',
> and then check whether the value is valid?

Sorry, please ignore this one; I will add this comment in the v3 thread.

> 
> > +#endif
> >  #endif
> >  
> >  #ifdef CONFIG_IMA_KEXEC
> > @@ -315,6 +323,15 @@ struct kimage {
> > unsigned long elf_load_addr;
> >  };
> >  
> > +#ifdef CONFIG_CRASH_HOTPLUG
> > +void arch_crash_hotplug_handler(struct kimage *image,
> > +   unsigned int hp_action, unsigned long a, unsigned long b);
> > +#define KEXEC_CRASH_HP_REMOVE_CPU   0
> > +#define KEXEC_CRASH_HP_ADD_CPU  1
> > +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> > +#define KEXEC_CRASH_HP_ADD_MEMORY   3
> > +#endif /* CONFIG_CRASH_HOTPLUG */
> > +
> >  /* kexec interface functions */
> >  extern void machine_kexec(struct kimage *image);
> >  extern int machine_kexec_prepare(struct kimage *image);
> > -- 
> > 2.27.0
> > 
> 




Re: [RFC v2 3/6] crash hp: definitions and prototype changes

2022-01-19 Thread Baoquan He
On 12/07/21 at 02:52pm, Eric DeVolder wrote:
> This change adds members to struct kimage to facilitate crash
> hotplug support.
> 
> This change also defines crash hotplug events and associated
> prototypes.
> 
> Signed-off-by: Eric DeVolder 
> ---
>  include/linux/kexec.h | 21 +++--
>  1 file changed, 19 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/kexec.h b/include/linux/kexec.h
> index 0c994ae37729..068f853f1c65 100644
> --- a/include/linux/kexec.h
> +++ b/include/linux/kexec.h
> @@ -221,8 +221,9 @@ struct crash_mem {
>  extern int crash_exclude_mem_range(struct crash_mem *mem,
>  unsigned long long mstart,
>  unsigned long long mend);
> -extern int crash_prepare_elf64_headers(struct crash_mem *mem, int kernel_map,
> -void **addr, unsigned long *sz);
> +extern int crash_prepare_elf64_headers(struct kimage *image,
> + struct crash_mem *mem, int kernel_map,
> + void **addr, unsigned long *sz);
>  #endif /* CONFIG_KEXEC_FILE */
>  
>  #ifdef CONFIG_KEXEC_ELF
> @@ -299,6 +300,13 @@ struct kimage {
>  
>   /* Information for loading purgatory */
>   struct purgatory_info purgatory_info;
> +
> +#ifdef CONFIG_CRASH_HOTPLUG
> + bool hotplug_event;
> + int offlinecpu;
> + bool elf_index_valid;
> + int elf_index;

Do we really need elf_index_valid? Can we initialize elf_index to, e.g., '-1',
and then check whether the value is valid?

> +#endif
>  #endif
>  
>  #ifdef CONFIG_IMA_KEXEC
> @@ -315,6 +323,15 @@ struct kimage {
>   unsigned long elf_load_addr;
>  };
>  
> +#ifdef CONFIG_CRASH_HOTPLUG
> +void arch_crash_hotplug_handler(struct kimage *image,
> + unsigned int hp_action, unsigned long a, unsigned long b);
> +#define KEXEC_CRASH_HP_REMOVE_CPU   0
> +#define KEXEC_CRASH_HP_ADD_CPU  1
> +#define KEXEC_CRASH_HP_REMOVE_MEMORY 2
> +#define KEXEC_CRASH_HP_ADD_MEMORY   3
> +#endif /* CONFIG_CRASH_HOTPLUG */
> +
>  /* kexec interface functions */
>  extern void machine_kexec(struct kimage *image);
>  extern int machine_kexec_prepare(struct kimage *image);
> -- 
> 2.27.0
> 




Re: [PATCH v2 0/5] kexec: use IS_ENABLED(CONFIG_KEXEC_CORE) instead of #ifdef

2022-01-19 Thread Baoquan He
On 01/18/22 at 10:13pm, Jisheng Zhang wrote:
> On Sun, Jan 16, 2022 at 09:38:47PM +0800, Baoquan He wrote:
> > Hi Jisheng,
> 
> Hi Baoquan,
> 
> > 
> > On 12/07/21 at 12:05am, Jisheng Zhang wrote:
> > > Replace the conditional compilation using "#ifdef CONFIG_KEXEC_CORE"
> > > by a check for "IS_ENABLED(CONFIG_KEXEC_CORE)", to simplify the code
> > > and increase compile coverage.
> > 
> > I go through this patchset, You mention the benefits it brings are
> > 1) simplity the code;
> > 2) increase compile coverage;
> > 
> > For benefit 1), it mainly removes the dummy function in x86, arm and
> > arm64, right?
> 
> Another benefit: remove those #ifdef #else #endif usage. Recently, I
> fixed a bug due to lots of "#ifdefs":
> http://lists.infradead.org/pipermail/linux-riscv/2021-December/010607.html

Glad to know the fix. However, sometimes the ifdeffery is necessary. I am
sorry about the riscv one you have fixed; it's truly a bug. But the
increased compile coverage you are trying to achieve below may cause
issues. Please see my comment below.

> 
> > 
> > For benefit 2), increasing compile coverage, could you tell more how it
> > achieves and why it matters? What if people disables CONFIG_KEXEC_CORE in
> > purpose? Please forgive my poor compiling knowledge.
> 
> Just my humble opinion, let's compare the code::
> 
> #ifdef CONFIG_KEXEC_CORE
> 
> code block A;
> 
> #endif
> 
> If KEXEC_CORE is disabled, code block A won't be compiled at all, the
> preprocessor will remove code block A;
> 
> If we convert the code to:
> 
> if (IS_ENABLED(CONFIG_KEXEC_CORE)) {
>   code block A;
> }
> 
> Even if KEXEC_CORE is disabled, code block A is still compiled.

This is what I am worried about. Before, if CONFIG_KEXEC_CORE is
unset, the relevant code is not compiled in. I can't see what benefit
is brought by compiling in the unneeded code block. Do I miss
anything?

