Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-19 Thread piliu


On 02/20/2020 04:12 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Cascardo,
> 
> Do you have a solution or more detailed information about the failure on your
> kernel? Or could you try this branch? It has an additional patch on top of
> Pingfan's to avoid the false-positive failure that I suspect:
> https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
> 
> Pingfan,
> Do you have the makedumpfile output from when the original failure occurs?
> If you don't and it's hard to get, there's no need. I would just like to
> add it to your patch if available.

I ran the test on a PowerVM system. After hot-removing memory, I saved a raw
vmcore with "cp" and then ran makedumpfile against that copied vmcore, which
produced the following error message:
# makedumpfile -x vmlinux -l -d 31 vmcore vmcore.dump
get_mem_section: Could not validate mem_section.
get_mm_sparsemem: Can't get the address of mem_section.

makedumpfile Failed.

Thanks,
Pingfan
> 
> Thanks,
> Kazu
> 
> -----Original Message-----
>> On 02/12/2020 12:11 PM, piliu wrote:
>>>
>>>
>>> On 02/06/2020 11:46 AM, piliu wrote:
>>>>
>>>>
>>>> On 02/05/2020 05:18 AM, HAGIO KAZUHITO wrote:
>>>>>> -----Original Message-----
>>>>>> On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
>>>>>>> Hi,
>>>>>>>
>>>>>>> Sorry for the late reply; I was away for a long holiday.
>>>>>>>
>>>>>>> I have tested this patch against v4.15 and the latest kernel, each with
>>>>>>> small modifications to recreate the situation we discussed here. Both
>>>>>>> work fine.
>>>>>>>
>>>>>>> The modifications to the two kernels are as follows:
>>>>>>>
>>>>>>> test1. Latest kernel, with extra modifications to expose the problem:
>>>>>>> -1.1. Revert commit 1f503443e7df8dc8366608b4d810ce2d6669827c
>>>>>>> ("mm/sparse.c: reset section's mem_map when fully deactivated"); this
>>>>>>> commit works around the bug.
>>>>>>> -1.2. Revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>>>>>> write correct address of mem_section into vmcoreinfo"). This creates
>>>>>>> the buggy situation we discussed here.
>>>>>>> -1.3. Fix the build breakage caused by reverting
>>>>>>> a0b1280368d1e91ab72f849ef095b4f07a39bbf1.
>>>>>>>
>>>>>>> test2. v4.15, which includes both commit 83e3c48729d9 and a0b1280368d1:
>>>>>>> -2.1. Revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>>>>>> write correct address of mem_section into vmcoreinfo").
>>>>>>>
>>>>>>> So I cannot see any problem with my patch.
>>>>>>> Maybe I misunderstand the discussion, but I cannot see how my original
>>>>>>> patch would break a kernel which has 83e3c48729d9 but not
>>>>>>> a0b1280368d1.
>>>>>>>
>>>>>>> Thanks,
>>>>>>> Pingfan
>>>>>>>
>>>>>>
>>>>>> You also need to test the case where 83e3c48729d9 is not present at all.
>>>>>> Can you test on a 4.4 kernel, for example? As far as I understand, a
>>>>>> vanilla 4.4 kernel would not be dumpable with your patch.
>>>>>
>>>>> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below,
>>>>> it's OK:
>>>>>   - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
>>>>>   - one more vanilla 4.4.0 kernel with a different config from the above
>>>>>
>>>>> So apparently not all vanilla 4.4 kernels are affected by the patch.
>>>>>
>>>> Sorry, due to tight hardware resources in our lab, I have not yet been
>>>> able to test v4.4 on a system with hotplug memory. I am still trying to
>>>> find a machine.
>>>>
>>>> But from the logic of this patch, it only makes the following changes:
>>>> -1. After memory is hot-removed, either mem_section.section_mem_map = NULL
>>>> or mem_section.section_mem_map lacks SECTION_MARKED_PRESENT. In both cases
>>>> we will have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so the
>>>> section will be skipped later.
>>>> -2. If the section was never populated, mem_section.section_mem_map = NULL.
>>>> It gets the same handling as the hot-removed case and is skipped during
>>>> parsing.
>>>>
>>>> And in v4.4, sparse_remove_one_section() just assigns ms->section_mem_map
>>>> = 0, which is consistent with the above changes.
>> Ping. None of us can reproduce this bug with a v4.4 kernel, and
>> furthermore no code analysis has shown that this patch will
>> break makedumpfile on any kernel version.
>>
>> Could this better-to-have patch be accepted?
>>
>> Thanks,
>> Pingfan
>>> Last night, I got a machine to test this scenario. After applying my patch,
>>> makedumpfile still works with the v4.4 kernel.
>>>
>>> Thanks,
>>> Pingfan
>>>
> 
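
The skip logic described in the quoted analysis (items 1 and 2) can be
sketched roughly as below. This is only an illustration, not the code from
the patch or from makedumpfile: the sketch_* names are hypothetical, and the
constant values (in particular the flag mask) are assumptions that may not
match any given kernel or makedumpfile version.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; the real definitions live in the
 * kernel and makedumpfile sources and can differ between versions. */
#define SECTION_MARKED_PRESENT  (1UL << 0)
#define SKETCH_SECTION_MAP_MASK (~0x7UL)	/* hypothetical flag-bit mask */
#define NOT_MEMMAP_ADDR         (0x0UL)

/* Decode one raw section_mem_map value into a mem_map address, or return
 * NOT_MEMMAP_ADDR when the section should be skipped. */
static unsigned long sketch_decode_mem_map(unsigned long section_mem_map)
{
	/* Never populated, or reset to 0 on hot-remove. */
	if (section_mem_map == 0)
		return NOT_MEMMAP_ADDR;
	/* Stale value left behind without the present flag. */
	if (!(section_mem_map & SECTION_MARKED_PRESENT))
		return NOT_MEMMAP_ADDR;
	/* Present: strip the flag bits to get the address part. */
	return section_mem_map & SKETCH_SECTION_MAP_MASK;
}

/* Count how many entries of a raw section table would be skipped. */
static size_t sketch_count_skipped(const unsigned long *raw, size_t n)
{
	size_t skipped = 0;

	assert(raw != NULL || n == 0);
	for (size_t i = 0; i < n; i++)
		if (sketch_decode_mem_map(raw[i]) == NOT_MEMMAP_ADDR)
			skipped++;
	return skipped;
}
```

With these assumptions, a raw value of 0 and a non-zero value without the
present bit are both treated as not-present, while a value with the present
bit set yields its masked address.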


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-19 Thread Thadeu Lima de Souza Cascardo
On Wed, Feb 19, 2020 at 08:12:41PM +0000, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Cascardo,
> 
> Do you have a solution or more detailed information about the failure on your
> kernel? Or could you try this branch? It has an additional patch on top of
> Pingfan's to avoid the false-positive failure that I suspect:
> https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
> 
> Pingfan,
> Do you have the makedumpfile output from when the original failure occurs?
> If you don't and it's hard to get, there's no need. I would just like to
> add it to your patch if available.
> 
> Thanks,
> Kazu

I will try that branch. Sorry that I couldn't work this out earlier. I was
trying to reproduce this today, but ended up in a rabbit hole when qemu+KVM
started failing for unrelated reasons after an upgrade.

I'll try to come up with some new results later in the day tomorrow.

Thanks.
Cascardo.




RE: [PATCH] makedumpfile: cope with not-present mem section

2020-02-19 Thread 萩尾 一仁
Hi Cascardo,

Do you have a solution or more detailed information about the failure on your
kernel? Or could you try this branch? It has an additional patch on top of
Pingfan's to avoid the false-positive failure that I suspect:
https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation

Pingfan,
Do you have the makedumpfile output from when the original failure occurs?
If you don't and it's hard to get, there's no need. I would just like to
add it to your patch if available.

Thanks,
Kazu




[PATCHv3] powerpc/crashkernel: take "mem=" option into account

2020-02-19 Thread Pingfan Liu
The "mem=" option is an easy way to put high pressure on memory during
testing. Hence, once the memory limit is applied, the actual usable memory
rather than the total memory should be considered when reserving memory for
crashkernel. Otherwise the boot may run into OOM issues.

E.g., when passing
crashkernel="2G-4G:384M,4G-16G:512M,16G-64G:1G,64G-128G:2G,128G-:4G" and
mem=5G on a 256G machine, 4G is reserved before this change and 512M after it.
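
To make the arithmetic in this example concrete, here is a small standalone
sketch of the range lookup. It is not the kernel's parse_crashkernel(); the
sketch_* names are hypothetical, the table is hard-coded from the example
above instead of being parsed, and it assumes each range is half-open
(start <= RAM < end).

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MIB (1ULL << 20)
#define SKETCH_GIB (1ULL << 30)

struct sketch_range {
	unsigned long long start;	/* inclusive lower bound of system RAM */
	unsigned long long end;		/* exclusive upper bound; 0 = open-ended */
	unsigned long long size;	/* reservation chosen for this range */
};

/* Table taken from the crashkernel= example in the commit message. */
static const struct sketch_range sketch_ranges[] = {
	{   2 * SKETCH_GIB,   4 * SKETCH_GIB, 384 * SKETCH_MIB },
	{   4 * SKETCH_GIB,  16 * SKETCH_GIB, 512 * SKETCH_MIB },
	{  16 * SKETCH_GIB,  64 * SKETCH_GIB,   1 * SKETCH_GIB },
	{  64 * SKETCH_GIB, 128 * SKETCH_GIB,   2 * SKETCH_GIB },
	{ 128 * SKETCH_GIB,                0,   4 * SKETCH_GIB },
};

/* Return the reservation size for the given amount of system RAM,
 * or 0 if no range matches. */
static unsigned long long
sketch_crash_size(const struct sketch_range *r, size_t n,
		  unsigned long long system_ram)
{
	assert(r != NULL && n > 0);
	for (size_t i = 0; i < n; i++) {
		if (system_ram >= r[i].start &&
		    (r[i].end == 0 || system_ram < r[i].end))
			return r[i].size;
	}
	return 0;
}
```

Feeding the lookup the full 256G selects the 128G- range (4G), while feeding
it the 5G limit selects the 4G-16G range (512M), which is the difference the
commit message describes.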

This issue is powerpc-specific because powerpc gives fadump and kdump
reservation higher priority than "mem=". See the following code:
	if (fadump_reserve_mem() == 0)
		reserve_crashkernel();
	...
	/* Ensure that total memory size is page-aligned. */
	limit = ALIGN(memory_limit ?: memblock_phys_mem_size(), PAGE_SIZE);
	memblock_enforce_memory_limit(limit);

On other arches, "mem=" takes higher priority and is already reflected in
memblock_phys_mem_size() by the time reserve_crashkernel() is called.

Signed-off-by: Pingfan Liu 
To: linuxppc-...@lists.ozlabs.org
Cc: Hari Bathini 
Cc: Michael Ellerman 
Cc: kexec@lists.infradead.org
---
v2 -> v3: improve commit log
 arch/powerpc/kernel/machine_kexec.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/kernel/machine_kexec.c b/arch/powerpc/kernel/machine_kexec.c
index c4ed328..eec96dc 100644
--- a/arch/powerpc/kernel/machine_kexec.c
+++ b/arch/powerpc/kernel/machine_kexec.c
@@ -114,11 +114,12 @@ void machine_kexec(struct kimage *image)
 
 void __init reserve_crashkernel(void)
 {
-	unsigned long long crash_size, crash_base;
+	unsigned long long crash_size, crash_base, total_mem_sz;
 	int ret;
 
+	total_mem_sz = memory_limit ? memory_limit : memblock_phys_mem_size();
 	/* use common parsing */
-	ret = parse_crashkernel(boot_command_line, memblock_phys_mem_size(),
+	ret = parse_crashkernel(boot_command_line, total_mem_sz,
 			&crash_size, &crash_base);
 	if (ret == 0 && crash_size > 0) {
 		crashk_res.start = crash_base;
@@ -185,7 +186,7 @@ void __init reserve_crashkernel(void)
 			"for crashkernel (System RAM: %ldMB)\n",
 			(unsigned long)(crash_size >> 20),
 			(unsigned long)(crashk_res.start >> 20),
-			(unsigned long)(memblock_phys_mem_size() >> 20));
+			(unsigned long)(total_mem_sz >> 20));
 
 	if (!memblock_is_region_memory(crashk_res.start, crash_size) ||
 	    memblock_reserve(crashk_res.start, crash_size)) {
-- 
2.7.5



