RE: [PATCH] makedumpfile: cope with not-present mem section

2020-04-30 Thread 萩尾 一仁
> -Original Message-
> On 04/29/2020 10:27 PM, HAGIO KAZUHITO wrote:
> > Hi Pingfan,
> >
> >> -Original Message-
> >> Hi Kazu and Cascardo,
> >>
> >> I encounter a weird problem when running makedumpfile on a s390 machine.
> >>
> >> Our production kernel uses extreme sparse memory model, and has the
> >> following:
> >>
> >> in mm/sparse.c
> >>
> >> #ifdef CONFIG_SPARSEMEM_EXTREME
> >> struct mem_section **mem_section;
> >> #else
> >> struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
> >> cacheline_internodealigned_in_smp;
> >> #endif
> >>
> >> So in makedumpfile.c, get_mem_section(), it got a failed result when the
> >> first call site to validate_mem_section(), then it should success at the
> >> second call site to validate_mem_section(), which is inside if
> >> (is_sparsemem_extreme()) condition.
> >
> > I think your production kernel should have kernel commit a0b1280368d1
> > ("kdump: write correct address of mem_section into vmcoreinfo"), so the
> > first call should return TRUE and the second one should return FALSE.
> Yes, it is.
> >
> >>
> >> But the actual result is not like expected.
> >>
> >> After introducing
> >> commit e113f1c974c820f9633dc0073eda525d7575f365[PATCH] cope with
> >> not-present mem section
> >>
> >> I got two successful calls to validate_mem_section(), and finally failed
> >> at the condition
> >>ret = symbol_valid ^ pointer_valid;
> >>if (!ret) {
> >>ERRMSG("Could not validate mem_section.\n");
> >>}
> >>
> >>
> >> Do you have any idea?
> >
> > Presumably this will be what I expected that it might be possible.
> > I can apply the patch below this time, what about this?
> > https://github.com/k-hagio/makedumpfile-old/commit/ce883df3864a5744ac0f1eff47de06b5074edb5f.patch
> looks good.

Thanks.

> >
> > or, we can also investigate why the second call returns TRUE, and
> > fix the conditions in the validate_mem_section()..
> This is due to the relaxed condition check after applying my commit
> commit e113f1c974("[PATCH] cope with not-present mem section")
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index ae7336a..607e07f 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned
> long *map_mask)
> map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
> mask = SECTION_MAP_MASK;
> *map_mask = map & ~mask;
> -   if (map == 0x0)
> -   *map_mask |= SECTION_MARKED_PRESENT;
> map &= mask;
> free(mem_section);
> 
> @@ -3453,10 +3451,8 @@ validate_mem_section(unsigned long *mem_sec,
> mem_map = NOT_MEMMAP_ADDR;
> } else {
> mem_map = section_mem_map_addr(section, _mask);
> +   /* for either no mem_map or hot-removed */
> if (!(map_mask & SECTION_MARKED_PRESENT)) {
> -   return FALSE; --> a strict check
> -   }
> -   if (mem_map == 0) {
> mem_map = NOT_MEMMAP_ADDR;
> } else {
> mem_map = sparse_decode_mem_map(mem_map,
> 
> 
> Before my patch, it return FALSE for any non NULL value without
> SECTION_MARKED_PRESENT. But my patch relaxes the restriction and
> consider it as hot-removed mem_section and keeps the parsing on.

Yes, so I meant that we might add some conditions so that the second call
could return FALSE for your vmcore as expected.  But I decided to apply
the patch I wrote before.. and applied:
https://github.com/makedumpfile/makedumpfile/commit/81b79c514ff6fc881f1df4cb04ecb2d7cb22badc

I deferred merging this at that time because it might not be needed
actually and I didn't want to change the behavior if possible.
But it happened.

Thank you for the report.

Kazu

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-04-30 Thread piliu


On 04/29/2020 10:27 PM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Pingfan,
> 
>> -Original Message-
>> Hi Kazu and Cascardo,
>>
>> I encounter a weird problem when running makedumpfile on a s390 machine.
>>
>> Our production kernel uses extreme sparse memory model, and has the
>> following:
>>
>> in mm/sparse.c
>>
>> #ifdef CONFIG_SPARSEMEM_EXTREME
>> struct mem_section **mem_section;
>> #else
>> struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
>> cacheline_internodealigned_in_smp;
>> #endif
>>
>> So in makedumpfile.c, get_mem_section(), it got a failed result when the
>> first call site to validate_mem_section(), then it should success at the
>> second call site to validate_mem_section(), which is inside if
>> (is_sparsemem_extreme()) condition.
> 
> I think your production kernel should have kernel commit a0b1280368d1
> ("kdump: write correct address of mem_section into vmcoreinfo"), so the
> first call should return TRUE and the second one should return FALSE.
Yes, it is.
> 
>>
>> But the actual result is not like expected.
>>
>> After introducing
>> commit e113f1c974c820f9633dc0073eda525d7575f365[PATCH] cope with
>> not-present mem section
>>
>> I got two successful calls to validate_mem_section(), and finally failed
>> at the condition
>>  ret = symbol_valid ^ pointer_valid;
>>  if (!ret) {
>>  ERRMSG("Could not validate mem_section.\n");
>>  }
>>
>>
>> Do you have any idea?
> 
> Presumably this will be what I expected that it might be possible.
> I can apply the patch below this time, what about this?
> https://github.com/k-hagio/makedumpfile-old/commit/ce883df3864a5744ac0f1eff47de06b5074edb5f.patch
looks good.
> 
> or, we can also investigate why the second call returns TRUE, and
> fix the conditions in the validate_mem_section()..
This is due to the relaxed condition check after applying my commit
commit e113f1c974("[PATCH] cope with not-present mem section")

diff --git a/makedumpfile.c b/makedumpfile.c
index ae7336a..607e07f 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned
long *map_mask)
map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
mask = SECTION_MAP_MASK;
*map_mask = map & ~mask;
-   if (map == 0x0)
-   *map_mask |= SECTION_MARKED_PRESENT;
map &= mask;
free(mem_section);

@@ -3453,10 +3451,8 @@ validate_mem_section(unsigned long *mem_sec,
mem_map = NOT_MEMMAP_ADDR;
} else {
mem_map = section_mem_map_addr(section, _mask);
+   /* for either no mem_map or hot-removed */
if (!(map_mask & SECTION_MARKED_PRESENT)) {
-   return FALSE; --> a strict check
-   }
-   if (mem_map == 0) {
mem_map = NOT_MEMMAP_ADDR;
} else {
mem_map = sparse_decode_mem_map(mem_map,


Before my patch, it return FALSE for any non NULL value without
SECTION_MARKED_PRESENT. But my patch relaxes the restriction and
consider it as hot-removed mem_section and keeps the parsing on.

Thanks,
Pingfan
> 
> Thanks,
> Kazu
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-04-29 Thread 萩尾 一仁
Hi Pingfan,

> -Original Message-
> Hi Kazu and Cascardo,
> 
> I encounter a weird problem when running makedumpfile on a s390 machine.
> 
> Our production kernel uses extreme sparse memory model, and has the
> following:
> 
> in mm/sparse.c
> 
> #ifdef CONFIG_SPARSEMEM_EXTREME
> struct mem_section **mem_section;
> #else
> struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
> cacheline_internodealigned_in_smp;
> #endif
> 
> So in makedumpfile.c, get_mem_section(), it got a failed result when the
> first call site to validate_mem_section(), then it should success at the
> second call site to validate_mem_section(), which is inside if
> (is_sparsemem_extreme()) condition.

I think your production kernel should have kernel commit a0b1280368d1
("kdump: write correct address of mem_section into vmcoreinfo"), so the
first call should return TRUE and the second one should return FALSE.

> 
> But the actual result is not like expected.
> 
> After introducing
> commit e113f1c974c820f9633dc0073eda525d7575f365[PATCH] cope with
> not-present mem section
> 
> I got two successful calls to validate_mem_section(), and finally failed
> at the condition
>   ret = symbol_valid ^ pointer_valid;
>   if (!ret) {
>   ERRMSG("Could not validate mem_section.\n");
>   }
> 
> 
> Do you have any idea?

Presumably this will be what I expected that it might be possible.
I can apply the patch below this time, what about this?
https://github.com/k-hagio/makedumpfile-old/commit/ce883df3864a5744ac0f1eff47de06b5074edb5f.patch

or, we can also investigate why the second call returns TRUE, and
fix the conditions in the validate_mem_section()..

Thanks,
Kazu
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-04-29 Thread piliu
Hi Kazu and Cascardo,

I encounter a weird problem when running makedumpfile on a s390 machine.

Our production kernel uses extreme sparse memory model, and has the
following:

in mm/sparse.c

#ifdef CONFIG_SPARSEMEM_EXTREME
struct mem_section **mem_section;
#else
struct mem_section mem_section[NR_SECTION_ROOTS][SECTIONS_PER_ROOT]
cacheline_internodealigned_in_smp;
#endif

So in makedumpfile.c, get_mem_section(), it got a failed result when the
first call site to validate_mem_section(), then it should success at the
second call site to validate_mem_section(), which is inside if
(is_sparsemem_extreme()) condition.

But the actual result is not like expected.

After introducing
commit e113f1c974c820f9633dc0073eda525d7575f365[PATCH] cope with
not-present mem section

I got two successful calls to validate_mem_section(), and finally failed
at the condition
ret = symbol_valid ^ pointer_valid;
if (!ret) {
ERRMSG("Could not validate mem_section.\n");
}


Do you have any idea?

Thanks,
Pingfan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-02-21 Thread 萩尾 一仁
-Original Message-
> Hi, Kazu.
> 
> For now, I would keep Pingfan's patch as is. I decided to test some other
> kernels, like 3.16 and 4.9 without commit
> 83e3c48729d9ebb7af5a31a504f3fd6aff0348c4.
> 
> So, they would be valid for the first iteration and invalid for the second,
> just like Ubuntu's 4.4 kernel. As I couldn't reproduce the problem, I
> investigated further and realized I was testing without your commit for
> makedumpfile: 8f1aafa1643532ece86cba22f2187d0f42fb7ca3 ("PATCH Fix
> validate_mem_section()").
> 
> With that one, all of Ubuntu's 4.4 kernel and Debian's 3.16 and 4.9 kernels
> dump correctly. Without that one, all fail, and I suppose it would be easily
> reproducible.
> 
> So, in effect, we are looking for any entry with the present bit, and then
> checking it is valid kernel address. That seems to work just fine.

OK, good to turn out Pingfan's patch is enough.  Applied.
https://sourceforge.net/p/makedumpfile/code/ci/e113f1c974c820f9633dc0073eda525d7575f365/

(Oops, forgot to add your ack. sorry about that..)

> 
> As you mention, the only case we do the second check is for some downstream
> kernels (though I would argue we should care about those the most), but at

(IMHO, upstream makedumpfile can support downstream kernels but if a patch
to support them has a risk to upstream kernels, we should be careful.)

> least from the Ubuntu side, those should not be around in the field anymore,
> and, by default, those should be the rare exception anyway. So, I agree with
> your follow-up commit in your branch, as it also simplifies the code a lot.
> 
> If you care, feel free to add my Ack to the two patches.
> 
> Acked-by: Thadeu Lima de Souza Cascardo 

Thank you for looking into those.
As for the follow-up commit, I'll test and consider whether to merge it
a little more.  (Any comments or cons are welcome.)

Kazu

> 
> Thanks for your patience.
> Thadeu Cascardo.
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-20 Thread Thadeu Lima de Souza Cascardo
On Thu, Feb 20, 2020 at 03:40:12PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
> -Original Message-
> > On 02/20/2020 04:12 AM, HAGIO KAZUHITO wrote:
> > > Hi Cascardo,
> > >
> > > Do you have any solution or detailed information on the failure on your 
> > > kernel?
> > > or could you try this branch?  It has an additional patch on top of 
> > > Pingfan's
> > > one to avoid the false positive failure that I'm suspecting:
> > > https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
> > >
> > > Pingfan,
> > > Do you have an output of makedumpfile when the original failure occurs?
> > > If you don't and it's hard to get it, no need to do so.  I just would 
> > > like to
> > > add it to your patch if available.
> > I did the test on a PowerVM. After hot removing the memory, save a raw
> > vmcore by "cp", then run makedumpfile against the "cp" vmcore, and it
> > will get the following error message:
> > # makedumpfile -x vmlinux -l -d 31 vmcore vmcore.dump
> > get_mem_section: Could not validate mem_section.
> > get_mm_sparsemem: Can't get the address of mem_section.
> > 
> > makedumpfile Failed.
> 
> Thank you, will add it to your patch.
> 
> and a bit of explanation for the branch above..
> 
> >  But from the logic of this patch, it just does the following changes:
> >  -1. After memory hot-removed, either mem_section.section_mem_map = NULL
> >  or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
> >  have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
> >  be skipped.
> >  -2. If not populated, mem_section.section_mem_map = NULL. It can follow
> >  the same handling of hot-removed, and be skipped during parsing.
> > 
> >  And in v4.4 sparse_remove_one_section() just assigns 
> >  ms->section_mem_map
> >  = 0, which can not be violated by the above changes.
> > >> Ping. As all of us can not reproduce this bug by v4.4 kernel, further
> > >> more, there is no code analysis, which persuades us this patch will
> > >> break the makedumpfile on any kernel version.
> 
> I'm not clear what causes it on the 4.4 kernel because of lack of information.
> But at least your patch relaxes the condition to recognize data of an address
> as a valid mem_section.  So I'm concerned that both of the first and second
> validate_mem_section() may return true by accident or something.  If it is not
> caused by a patch in the 4.4 kernel, for example, just data location causes 
> it,
> it may occur on upstream kernels.  Although we cannot reproduce it so far.
> 
> Whatever causes it, in the first place, upstream kernels with vmcoreinfo don't
> need the second validate_mem_section().  It's almost for some downstream
> kernels and has a risk that causes problem like this.  So I'm thinking that
> it might be safer to change it to the if(!ret) way on the branch above
> with your patch.
> 
> Thanks,
> Kazu
> 

Hi, Kazu.

For now, I would keep Pingfan's patch as is. I decided to test some other
kernels, like 3.16 and 4.9 without commit
83e3c48729d9ebb7af5a31a504f3fd6aff0348c4.

So, they would be valid for the first iteration and invalid for the second,
just like Ubuntu's 4.4 kernel. As I couldn't reproduce the problem, I
investigated further and realized I was testing without your commit for
makedumpfile: 8f1aafa1643532ece86cba22f2187d0f42fb7ca3 ("PATCH Fix
validate_mem_section()").

With that one, all of Ubuntu's 4.4 kernel and Debian's 3.16 and 4.9 kernels
dump correctly. Without that one, all fail, and I suppose it would be easily
reproducible.

So, in effect, we are looking for any entry with the present bit, and then
checking it is valid kernel address. That seems to work just fine.

As you mention, the only case we do the second check is for some downstream
kernels (though I would argue we should care about those the most), but at
least from the Ubuntu side, those should not be around in the field anymore,
and, by default, those should be the rare exception anyway. So, I agree with
your follow-up commit in your branch, as it also simplifies the code a lot.

If you care, feel free to add my Ack to the two patches.

Acked-by: Thadeu Lima de Souza Cascardo 

Thanks for your patience.
Thadeu Cascardo.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-02-20 Thread 萩尾 一仁
-Original Message-
> On 02/20/2020 04:12 AM, HAGIO KAZUHITO wrote:
> > Hi Cascardo,
> >
> > Do you have any solution or detailed information on the failure on your 
> > kernel?
> > or could you try this branch?  It has an additional patch on top of 
> > Pingfan's
> > one to avoid the false positive failure that I'm suspecting:
> > https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
> >
> > Pingfan,
> > Do you have an output of makedumpfile when the original failure occurs?
> > If you don't and it's hard to get it, no need to do so.  I just would like 
> > to
> > add it to your patch if available.
> I did the test on a PowerVM. After hot removing the memory, save a raw
> vmcore by "cp", then run makedumpfile against the "cp" vmcore, and it
> will get the following error message:
> # makedumpfile -x vmlinux -l -d 31 vmcore vmcore.dump
> get_mem_section: Could not validate mem_section.
> get_mm_sparsemem: Can't get the address of mem_section.
> 
> makedumpfile Failed.

Thank you, will add it to your patch.

and a bit of explanation for the branch above..

>  But from the logic of this patch, it just does the following changes:
>  -1. After memory hot-removed, either mem_section.section_mem_map = NULL
>  or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
>  have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
>  be skipped.
>  -2. If not populated, mem_section.section_mem_map = NULL. It can follow
>  the same handling of hot-removed, and be skipped during parsing.
> 
>  And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
>  = 0, which can not be violated by the above changes.
> >> Ping. As all of us can not reproduce this bug by v4.4 kernel, further
> >> more, there is no code analysis, which persuades us this patch will
> >> break the makedumpfile on any kernel version.

I'm not clear what causes it on the 4.4 kernel because of lack of information.
But at least your patch relaxes the condition to recognize data of an address
as a valid mem_section.  So I'm concerned that both of the first and second
validate_mem_section() may return true by accident or something.  If it is not
caused by a patch in the 4.4 kernel, for example, just data location causes it,
it may occur on upstream kernels.  Although we cannot reproduce it so far.

Whatever causes it, in the first place, upstream kernels with vmcoreinfo don't
need the second validate_mem_section().  It's almost for some downstream
kernels and has a risk that causes problem like this.  So I'm thinking that
it might be safer to change it to the if(!ret) way on the branch above
with your patch.

Thanks,
Kazu

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-19 Thread piliu


On 02/20/2020 04:12 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Cascardo,
> 
> Do you have any solution or detailed information on the failure on your 
> kernel?
> or could you try this branch?  It has an additional patch on top of Pingfan's
> one to avoid the false positive failure that I'm suspecting:
> https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
> 
> Pingfan,
> Do you have an output of makedumpfile when the original failure occurs?
> If you don't and it's hard to get it, no need to do so.  I just would like to
> add it to your patch if available.
I did the test on a PowerVM. After hot removing the memory, save a raw
vmcore by "cp", then run makedumpfile against the "cp" vmcore, and it
will get the following error message:
# makedumpfile -x vmlinux -l -d 31 vmcore vmcore.dump
get_mem_section: Could not validate mem_section.
get_mm_sparsemem: Can't get the address of mem_section.

makedumpfile Failed.

Thanks,
Pingfan
> 
> Thanks,
> Kazu
> 
> -Original Message-
>> On 02/12/2020 12:11 PM, piliu wrote:
>>>
>>>
>>> On 02/06/2020 11:46 AM, piliu wrote:


 On 02/05/2020 05:18 AM, HAGIO KAZUHITO wrote:
>> -Original Message-
>> On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
>>> Hi,
>>>
>>> Sorry to reply late due to a long festival.
>>>
>>> I have tested this patch against v4.15 and latest kernel with small
>>> modification to meet the situation we discussed here. Both work fine.
>>>
>>> The below is the modification of two kernel
>>>
>>> test1. latest kernel with two extra modification to expose the problem
>>> -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
>>> (mm/sparse.c: reset section's mem_map when fully deactivated), this
>>> commit work around this bug
>>> -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>> write correct address of mem_section into vmcoreinfo"). This will create
>>> a buggy situation as we discussed here.
>>> -1.3. fix building bug due to revert
>>> a0b1280368d1e91ab72f849ef095b4f07a39bbf1
>>>
>>> test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
>>> -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>> write correct address of mem_section into vmcoreinfo")
>>>
>>> So I can not see any problem with my patch.
>>> Maybe I misunderstand the discussion, but I can not see my original
>>> patch will break the kernel which have 83e3c48729d9 but not 
>>> a0b1280368d1.
>>>
>>> Thanks,
>>> Pingfan
>>>
>>
>> You also need to test the case where 83e3c48729d9 is not present at all. 
>> Can
>> you test on a 4.4 kernel, for example? As far as I understand, a vanilla 
>> 4.4
>> kernel would not be dumpable with your patch.
>
> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, 
> it's OK:
>   - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
>   - one more vanilla 4.4.0 kernel with a different config from the above
>
> So apparently not all vanilla 4.4 kernels are affected by the patch.
>
 Sorry, due to touch hardware resource in our lab, I can not have a test
 on v4.4 on a system with hotplug memory yet. I still try to find some
 resource.

 But from the logic of this patch, it just does the following changes:
 -1. After memory hot-removed, either mem_section.section_mem_map = NULL
 or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
 have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
 be skipped.
 -2. If not populated, mem_section.section_mem_map = NULL. It can follow
 the same handling of hot-removed, and be skipped during parsing.

 And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
 = 0, which can not be violated by the above changes.
>> Ping. As all of us can not reproduce this bug by v4.4 kernel, further
>> more, there is no code analysis, which persuades us this patch will
>> break the makedumpfile on any kernel version.
>>
>> Could this better-to-have patch be accepted?
>>
>> Thanks,
>> Pingfan
>>> Last night, I got a machine to test this scene. After applying my patch
>>> makedumpfile can still work with v4.4 kernel.
>>>
>>> Thanks,
>>> Pingfan
>>>
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-19 Thread Thadeu Lima de Souza Cascardo
On Wed, Feb 19, 2020 at 08:12:41PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Cascardo,
> 
> Do you have any solution or detailed information on the failure on your 
> kernel?
> or could you try this branch?  It has an additional patch on top of Pingfan's
> one to avoid the false positive failure that I'm suspecting:
> https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation
> 
> Pingfan,
> Do you have an output of makedumpfile when the original failure occurs?
> If you don't and it's hard to get it, no need to do so.  I just would like to
> add it to your patch if available.
> 
> Thanks,
> Kazu

Will try the said branch. Sorry that I couldn't work this out before. I was
trying to reproduce this today, but end up in a rabbit hole when qemu+KVM
started failing for unrelated reasons after an upgrade.

I'll try to come up with some new results by tomorrow later in the day.

Thanks.
Cascardo.

> 
> -Original Message-
> > On 02/12/2020 12:11 PM, piliu wrote:
> > >
> > >
> > > On 02/06/2020 11:46 AM, piliu wrote:
> > >>
> > >>
> > >> On 02/05/2020 05:18 AM, HAGIO KAZUHITO wrote:
> >  -Original Message-
> >  On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
> > > Hi,
> > >
> > > Sorry to reply late due to a long festival.
> > >
> > > I have tested this patch against v4.15 and latest kernel with small
> > > modification to meet the situation we discussed here. Both work fine.
> > >
> > > The below is the modification of two kernel
> > >
> > > test1. latest kernel with two extra modification to expose the problem
> > > -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
> > > (mm/sparse.c: reset section's mem_map when fully deactivated), this
> > > commit work around this bug
> > > -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > > write correct address of mem_section into vmcoreinfo"). This will 
> > > create
> > > a buggy situation as we discussed here.
> > > -1.3. fix building bug due to revert
> > > a0b1280368d1e91ab72f849ef095b4f07a39bbf1
> > >
> > > test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
> > > -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > > write correct address of mem_section into vmcoreinfo")
> > >
> > > So I can not see any problem with my patch.
> > > Maybe I misunderstand the discussion, but I can not see my original
> > > patch will break the kernel which have 83e3c48729d9 but not 
> > > a0b1280368d1.
> > >
> > > Thanks,
> > > Pingfan
> > >
> > 
> >  You also need to test the case where 83e3c48729d9 is not present at 
> >  all. Can
> >  you test on a 4.4 kernel, for example? As far as I understand, a 
> >  vanilla 4.4
> >  kernel would not be dumpable with your patch.
> > >>>
> > >>> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, 
> > >>> it's OK:
> > >>>   - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
> > >>>   - one more vanilla 4.4.0 kernel with a different config from the above
> > >>>
> > >>> So apparently not all vanilla 4.4 kernels are affected by the patch.
> > >>>
> > >> Sorry, due to touch hardware resource in our lab, I can not have a test
> > >> on v4.4 on a system with hotplug memory yet. I still try to find some
> > >> resource.
> > >>
> > >> But from the logic of this patch, it just does the following changes:
> > >> -1. After memory hot-removed, either mem_section.section_mem_map = NULL
> > >> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
> > >> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
> > >> be skipped.
> > >> -2. If not populated, mem_section.section_mem_map = NULL. It can follow
> > >> the same handling of hot-removed, and be skipped during parsing.
> > >>
> > >> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
> > >> = 0, which can not be violated by the above changes.
> > Ping. As all of us can not reproduce this bug by v4.4 kernel, further
> > more, there is no code analysis, which persuades us this patch will
> > break the makedumpfile on any kernel version.
> > 
> > Could this better-to-have patch be accepted?
> > 
> > Thanks,
> > Pingfan
> > > Last night, I got a machine to test this scene. After applying my patch
> > > makedumpfile can still work with v4.4 kernel.
> > >
> > > Thanks,
> > > Pingfan
> > >
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-02-19 Thread 萩尾 一仁
Hi Cascardo,

Do you have any solution or detailed information on the failure on your kernel?
or could you try this branch?  It has an additional patch on top of Pingfan's
one to avoid the false positive failure that I'm suspecting:
https://github.com/k-hagio/makedumpfile/tree/modify-mem_section-validation

Pingfan,
Do you have an output of makedumpfile when the original failure occurs?
If you don't and it's hard to get it, no need to do so.  I just would like to
add it to your patch if available.

Thanks,
Kazu

-Original Message-
> On 02/12/2020 12:11 PM, piliu wrote:
> >
> >
> > On 02/06/2020 11:46 AM, piliu wrote:
> >>
> >>
> >> On 02/05/2020 05:18 AM, HAGIO KAZUHITO wrote:
>  -Original Message-
>  On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
> > Hi,
> >
> > Sorry to reply late due to a long festival.
> >
> > I have tested this patch against v4.15 and latest kernel with small
> > modification to meet the situation we discussed here. Both work fine.
> >
> > The below is the modification of two kernel
> >
> > test1. latest kernel with two extra modification to expose the problem
> > -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
> > (mm/sparse.c: reset section's mem_map when fully deactivated), this
> > commit work around this bug
> > -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > write correct address of mem_section into vmcoreinfo"). This will create
> > a buggy situation as we discussed here.
> > -1.3. fix building bug due to revert
> > a0b1280368d1e91ab72f849ef095b4f07a39bbf1
> >
> > test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
> > -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > write correct address of mem_section into vmcoreinfo")
> >
> > So I can not see any problem with my patch.
> > Maybe I misunderstand the discussion, but I can not see my original
> > patch will break the kernel which have 83e3c48729d9 but not 
> > a0b1280368d1.
> >
> > Thanks,
> > Pingfan
> >
> 
>  You also need to test the case where 83e3c48729d9 is not present at all. 
>  Can
>  you test on a 4.4 kernel, for example? As far as I understand, a vanilla 
>  4.4
>  kernel would not be dumpable with your patch.
> >>>
> >>> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, 
> >>> it's OK:
> >>>   - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
> >>>   - one more vanilla 4.4.0 kernel with a different config from the above
> >>>
> >>> So apparently not all vanilla 4.4 kernels are affected by the patch.
> >>>
> >> Sorry, due to touch hardware resource in our lab, I can not have a test
> >> on v4.4 on a system with hotplug memory yet. I still try to find some
> >> resource.
> >>
> >> But from the logic of this patch, it just does the following changes:
> >> -1. After memory hot-removed, either mem_section.section_mem_map = NULL
> >> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
> >> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
> >> be skipped.
> >> -2. If not populated, mem_section.section_mem_map = NULL. It can follow
> >> the same handling of hot-removed, and be skipped during parsing.
> >>
> >> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
> >> = 0, which can not be violated by the above changes.
> Ping. As all of us can not reproduce this bug by v4.4 kernel, further
> more, there is no code analysis, which persuades us this patch will
> break the makedumpfile on any kernel version.
> 
> Could this better-to-have patch be accepted?
> 
> Thanks,
> Pingfan
> > Last night, I got a machine to test this scene. After applying my patch
> > makedumpfile can still work with v4.4 kernel.
> >
> > Thanks,
> > Pingfan
> >

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-18 Thread piliu


On 02/12/2020 12:11 PM, piliu wrote:
> 
> 
> On 02/06/2020 11:46 AM, piliu wrote:
>>
>>
>> On 02/05/2020 05:18 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
 -Original Message-
 On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
> Hi,
>
> Sorry to reply late due to a long festival.
>
> I have tested this patch against v4.15 and latest kernel with small
> modification to meet the situation we discussed here. Both work fine.
>
> The below is the modification of two kernel
>
> test1. latest kernel with two extra modification to expose the problem
> -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
> (mm/sparse.c: reset section's mem_map when fully deactivated), this
> commit work around this bug
> -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> write correct address of mem_section into vmcoreinfo"). This will create
> a buggy situation as we discussed here.
> -1.3. fix building bug due to revert
> a0b1280368d1e91ab72f849ef095b4f07a39bbf1
>
> test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
> -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> write correct address of mem_section into vmcoreinfo")
>
> So I can not see any problem with my patch.
> Maybe I misunderstand the discussion, but I can not see my original
> patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.
>
> Thanks,
> Pingfan
>

 You also need to test the case where 83e3c48729d9 is not present at all. 
 Can
 you test on a 4.4 kernel, for example? As far as I understand, a vanilla 
 4.4
 kernel would not be dumpable with your patch.
>>>
>>> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, it's 
>>> OK:
>>>   - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
>>>   - one more vanilla 4.4.0 kernel with a different config from the above
>>>
>>> So apparently not all vanilla 4.4 kernels are affected by the patch.
>>>
>> Sorry, due to touch hardware resource in our lab, I can not have a test
>> on v4.4 on a system with hotplug memory yet. I still try to find some
>> resource.
>>
>> But from the logic of this patch, it just does the following changes:
>> -1. After memory hot-removed, either mem_section.section_mem_map = NULL
>> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
>> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
>> be skipped.
>> -2. If not populated, mem_section.section_mem_map = NULL. It can follow
>> the same handling of hot-removed, and be skipped during parsing.
>>
>> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
>> = 0, which can not be violated by the above changes.
Ping. As all of us can not reproduce this bug by v4.4 kernel, further
more, there is no code analysis, which persuades us this patch will
break the makedumpfile on any kernel version.

Could this better-to-have patch be accepted?

Thanks,
Pingfan
> Last night, I got a machine to test this scene. After applying my patch
> makedumpfile can still work with v4.4 kernel.
> 
> Thanks,
> Pingfan
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-11 Thread piliu


On 02/06/2020 11:46 AM, piliu wrote:
> 
> 
> On 02/05/2020 05:18 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
>>> -Original Message-
>>> On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
 Hi,

 Sorry to reply late due to a long festival.

 I have tested this patch against v4.15 and latest kernel with small
 modification to meet the situation we discussed here. Both work fine.

 The below is the modification of two kernel

 test1. latest kernel with two extra modification to expose the problem
 -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
 (mm/sparse.c: reset section's mem_map when fully deactivated), this
 commit work around this bug
 -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
 write correct address of mem_section into vmcoreinfo"). This will create
 a buggy situation as we discussed here.
 -1.3. fix building bug due to revert
 a0b1280368d1e91ab72f849ef095b4f07a39bbf1

 test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
 -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
 write correct address of mem_section into vmcoreinfo")

 So I can not see any problem with my patch.
 Maybe I misunderstand the discussion, but I can not see my original
 patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.

 Thanks,
 Pingfan

>>>
>>> You also need to test the case where 83e3c48729d9 is not present at all. Can
>>> you test on a 4.4 kernel, for example? As far as I understand, a vanilla 4.4
>>> kernel would not be dumpable with your patch.
>>
>> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, it's 
>> OK:
>>   - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
>>   - one more vanilla 4.4.0 kernel with a different config from the above
>>
>> So apparently not all vanilla 4.4 kernels are affected by the patch.
>>
> Sorry, due to touch hardware resource in our lab, I can not have a test
> on v4.4 on a system with hotplug memory yet. I still try to find some
> resource.
> 
> But from the logic of this patch, it just does the following changes:
> -1. After memory hot-removed, either mem_section.section_mem_map = NULL
> or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
> have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
> be skipped.
> -2. If not populated, mem_section.section_mem_map = NULL. It can follow
> the same handling of hot-removed, and be skipped during parsing.
> 
> And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
> = 0, which can not be violated by the above changes.
Last night, I got a machine to test this scene. After applying my patch
makedumpfile can still work with v4.4 kernel.

Thanks,
Pingfan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-05 Thread piliu


On 02/05/2020 05:18 AM, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> -Original Message-
>> On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
>>> Hi,
>>>
>>> Sorry to reply late due to a long festival.
>>>
>>> I have tested this patch against v4.15 and latest kernel with small
>>> modification to meet the situation we discussed here. Both work fine.
>>>
>>> The below is the modification of two kernel
>>>
>>> test1. latest kernel with two extra modification to expose the problem
>>> -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
>>> (mm/sparse.c: reset section's mem_map when fully deactivated), this
>>> commit work around this bug
>>> -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>> write correct address of mem_section into vmcoreinfo"). This will create
>>> a buggy situation as we discussed here.
>>> -1.3. fix building bug due to revert
>>> a0b1280368d1e91ab72f849ef095b4f07a39bbf1
>>>
>>> test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
>>> -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
>>> write correct address of mem_section into vmcoreinfo")
>>>
>>> So I can not see any problem with my patch.
>>> Maybe I misunderstand the discussion, but I can not see my original
>>> patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.
>>>
>>> Thanks,
>>> Pingfan
>>>
>>
>> You also need to test the case where 83e3c48729d9 is not present at all. Can
>> you test on a 4.4 kernel, for example? As far as I understand, a vanilla 4.4
>> kernel would not be dumpable with your patch.
> 
> As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, it's 
> OK:
>   - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
>   - one more vanilla 4.4.0 kernel with a different config from the above
> 
> So apparently not all vanilla 4.4 kernels are affected by the patch.
> 
Sorry, due to touch hardware resource in our lab, I can not have a test
on v4.4 on a system with hotplug memory yet. I still try to find some
resource.

But from the logic of this patch, it just does the following changes:
-1. After memory hot-removed, either mem_section.section_mem_map = NULL
or mem_section.section_mem_map without SECTION_MARKED_PRESENT, we will
have mem_maps[section_nr] = mem_map = NOT_MEMMAP_ADDR, so later it will
be skipped.
-2. If not populated, mem_section.section_mem_map = NULL. It can follow
the same handling of hot-removed, and be skipped during parsing.

And in v4.4 sparse_remove_one_section() just assigns ms->section_mem_map
= 0, which can not be violated by the above changes.

Thanks,
Pingfan


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-02-04 Thread 萩尾 一仁
> -Original Message-
> On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
> > Hi,
> >
> > Sorry to reply late due to a long festival.
> >
> > I have tested this patch against v4.15 and latest kernel with small
> > modification to meet the situation we discussed here. Both work fine.
> >
> > The below is the modification of two kernel
> >
> > test1. latest kernel with two extra modification to expose the problem
> > -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
> > (mm/sparse.c: reset section's mem_map when fully deactivated), this
> > commit work around this bug
> > -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > write correct address of mem_section into vmcoreinfo"). This will create
> > a buggy situation as we discussed here.
> > -1.3. fix building bug due to revert
> > a0b1280368d1e91ab72f849ef095b4f07a39bbf1
> >
> > test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
> > -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> > write correct address of mem_section into vmcoreinfo")
> >
> > So I can not see any problem with my patch.
> > Maybe I misunderstand the discussion, but I can not see my original
> > patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.
> >
> > Thanks,
> > Pingfan
> >
> 
> You also need to test the case where 83e3c48729d9 is not present at all. Can
> you test on a 4.4 kernel, for example? As far as I understand, a vanilla 4.4
> kernel would not be dumpable with your patch.

As far as I've tested this patch with SPARSEMEM_EXTREME vmcores below, it's OK:
  - 51 vmcores of vanilla kernels (each from 2.6.36 through 5.5) on hand
  - one more vanilla 4.4.0 kernel with a different config from the above

So apparently not all vanilla 4.4 kernels are affected by the patch.

> 
> Thanks.
> Cascardo.
> 
> > On 01/29/2020 03:33 AM, Thadeu Lima de Souza Cascardo wrote:
> > > On Tue, Jan 28, 2020 at 05:03:12PM +, HAGIO KAZUHITO wrote:
> > >> Hi Cascardo,
> > >>
> > >>> -Original Message-
> > >>> On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo 
> > >>> wrote:
> >  Sorry for taking too long to respond, as I was on vacation.
> > 
> >  The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, 
> >  are
> >  not supported anymore. In a way that it's even hard for me to test 
> >  them.
> > 
> >  However, I managed to test it, and those two lines are definitively 
> >  needed
> >  to dump a VM running such a kernel. Is removing them really needed to 
> >  fix
> >  this issue?
> > 
> >  Otherwise, I would rather keep them.
> > 
> >  Thanks.
> >  Cascardo.
> > >>>
> > >>> By the way, I was too fast in sending this. We really need to keep 
> > >>> those lines
> > >>> as makedumpfile will fail to dump a 4.4 kernel with this patch as is.
> > >>
> > >> Is that Ubuntu 4.4 kernel which has 83e3c48729d9 and not a0b1280368d1?
> > >> Could you elaborate on how it fails?
> > >
> > > No, it doesn't have either, so my guess is it would fail on upstream 4.4 
> > > as
> > > well, so anything that doesn't have 83e3c48729d9.
> > >
> > > That's what I get on that 4.4 kernel (4.4.0-171-generic):
> > >
> > > # ./makedumpfile /proc/vmcore ../dump
> > > get_mem_section: Could not validate mem_section.
> > > get_mm_sparsemem: Can't get the address of mem_section.
> > >
> > > makedumpfile Failed.
> > > #

Thanks for the infomation.
I guess that your 4.4 kernel and machine get a false-positive result (TRUE)
from the second validate_mem_section() with this patch, right?

If we don't have a way to exactly determine whether a mem_section is real
or not, we might have to accept some tradeoff here.  For example, a workaround
which I think of is something like this:

ret = validate_mem_section(SYMBOL(mem_section));
if (!ret && is_sparsemem_extreme()) {
  ...
  ret = validate_mem_section(mem_section_ptr);
  if (!ret)
ERRMSG("Could not determine the valid mem_section.\n");
}

with Pingfan's patch.  This will work for the false-positive fail you hit (if 
so),
but may affect some downstream kernels which have 83e3c48729d9 and do not
have a0b1280368d1.  But at least there is no upstream kernel like that.

Any other solution?

Thanks,
Kazu

> > >
> > > So, now, I have a better grasp of the whole logic, and understand why it 
> > > fails
> > > with this patch.
> > >
> > > So, we need to either interpret the mem_section as a pointer to the array 
> > > of a
> > > pointer to the pointer to the array. The only case the second option is 
> > > valid
> > > is when sparse_extreme is on, so we don't even need to check the second 
> > > case
> > > when it's off.
> > >
> > > Then, we check that interpreting it either way is valid. If it's valid in 
> > > both
> > > interpretations, we can't decide which to use, and will fail. So far, we
> > > haven't seen any case in the field where that would accidentally happen. 
> > 

Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-04 Thread Thadeu Lima de Souza Cascardo
On Tue, Feb 04, 2020 at 02:24:17PM +0800, piliu wrote:
> Hi,
> 
> Sorry to reply late due to a long festival.
> 
> I have tested this patch against v4.15 and latest kernel with small
> modification to meet the situation we discussed here. Both work fine.
> 
> The below is the modification of two kernel
> 
> test1. latest kernel with two extra modification to expose the problem
> -1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
> (mm/sparse.c: reset section's mem_map when fully deactivated), this
> commit work around this bug
> -1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> write correct address of mem_section into vmcoreinfo"). This will create
> a buggy situation as we discussed here.
> -1.3. fix building bug due to revert
> a0b1280368d1e91ab72f849ef095b4f07a39bbf1
> 
> test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
> -2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
> write correct address of mem_section into vmcoreinfo")
> 
> So I can not see any problem with my patch.
> Maybe I misunderstand the discussion, but I can not see my original
> patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.
> 
> Thanks,
> Pingfan
> 

You also need to test the case where 83e3c48729d9 is not present at all. Can
you test on a 4.4 kernel, for example? As far as I understand, a vanilla 4.4
kernel would not be dumpable with your patch.

Thanks.
Cascardo.

> On 01/29/2020 03:33 AM, Thadeu Lima de Souza Cascardo wrote:
> > On Tue, Jan 28, 2020 at 05:03:12PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
> >> Hi Cascardo,
> >>
> >>> -Original Message-
> >>> On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo 
> >>> wrote:
>  Sorry for taking too long to respond, as I was on vacation.
> 
>  The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, 
>  are
>  not supported anymore. In a way that it's even hard for me to test them.
> 
>  However, I managed to test it, and those two lines are definitively 
>  needed
>  to dump a VM running such a kernel. Is removing them really needed to fix
>  this issue?
> 
>  Otherwise, I would rather keep them.
> 
>  Thanks.
>  Cascardo.
> >>>
> >>> By the way, I was too fast in sending this. We really need to keep those 
> >>> lines
> >>> as makedumpfile will fail to dump a 4.4 kernel with this patch as is.
> >>
> >> Is that Ubuntu 4.4 kernel which has 83e3c48729d9 and not a0b1280368d1?
> >> Could you elaborate on how it fails?
> > 
> > No, it doesn't have either, so my guess is it would fail on upstream 4.4 as
> > well, so anything that doesn't have 83e3c48729d9.
> > 
> > That's what I get on that 4.4 kernel (4.4.0-171-generic):
> > 
> > # ./makedumpfile /proc/vmcore ../dump
> > get_mem_section: Could not validate mem_section.
> > get_mm_sparsemem: Can't get the address of mem_section.
> > 
> > makedumpfile Failed.
> > #
> > 
> > So, now, I have a better grasp of the whole logic, and understand why it 
> > fails
> > with this patch.
> > 
> > So, we need to either interpret the mem_section as a pointer to the array 
> > of a
> > pointer to the pointer to the array. The only case the second option is 
> > valid
> > is when sparse_extreme is on, so we don't even need to check the second case
> > when it's off.
> > 
> > Then, we check that interpreting it either way is valid. If it's valid in 
> > both
> > interpretations, we can't decide which to use, and will fail. So far, we
> > haven't seen any case in the field where that would accidentally happen. 
> > But in
> > case it does, we should rather fail to dump and fallback to copying, than
> > creating a bogus compressed dump.
> > 
> > When this patch is applied, a kernel which does not have 83e3c48729d9, and
> > thus, has mem_section as a direct pointer to the array, it so happens that 
> > we
> > don't detect the pointer to pointer to the array case as invalid, thus 
> > failing
> > to dump.
> > 
> > The way we validate is that the mem_maps should either have the PRESENT bit 
> > or
> > be NULL. Now, that assumption may be invalid, and we would need to do the
> > validation some other way. I can test the cases where that assumption is
> > invalid in a 4.4 kernel and see how to fix this in a satisfactory way.
> > 
> > Going through the code once again, I don't see how the second section of the
> > patch would be correct by itself too. I will think it through a little more 
> > and
> > see if I can come up with a solution.
> > 
> > Regards.
> > Cascardo.
> > 
> >>
> >> I'm thinking that Pingfan's thought may help:
>  I think it could be if/else, no need to call twice.
> >>
> >> Thanks,
> >> Kazu
> >>
> >>>
> >>> Cascardo.
> > 
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-02-03 Thread piliu
Hi,

Sorry to reply late due to a long festival.

I have tested this patch against v4.15 and latest kernel with small
modification to meet the situation we discussed here. Both work fine.

The below is the modification of two kernel

test1. latest kernel with two extra modification to expose the problem
-1.1 reverts commit 1f503443e7df8dc8366608b4d810ce2d6669827c
(mm/sparse.c: reset section's mem_map when fully deactivated), this
commit work around this bug
-1.2. reverts commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
write correct address of mem_section into vmcoreinfo"). This will create
a buggy situation as we discussed here.
-1.3. fix building bug due to revert
a0b1280368d1e91ab72f849ef095b4f07a39bbf1

test2. v4.15, which include both commit 83e3c48729d9 and a0b1280368d1.
-2.1. revert commit a0b1280368d1e91ab72f849ef095b4f07a39bbf1 ("kdump:
write correct address of mem_section into vmcoreinfo")

So I can not see any problem with my patch.
Maybe I misunderstand the discussion, but I can not see my original
patch will break the kernel which have 83e3c48729d9 but not a0b1280368d1.

Thanks,
Pingfan

On 01/29/2020 03:33 AM, Thadeu Lima de Souza Cascardo wrote:
> On Tue, Jan 28, 2020 at 05:03:12PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
>> Hi Cascardo,
>>
>>> -Original Message-
>>> On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo 
>>> wrote:
 Sorry for taking too long to respond, as I was on vacation.

 The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are
 not supported anymore. In a way that it's even hard for me to test them.

 However, I managed to test it, and those two lines are definitively needed
 to dump a VM running such a kernel. Is removing them really needed to fix
 this issue?

 Otherwise, I would rather keep them.

 Thanks.
 Cascardo.
>>>
>>> By the way, I was too fast in sending this. We really need to keep those 
>>> lines
>>> as makedumpfile will fail to dump a 4.4 kernel with this patch as is.
>>
>> Is that Ubuntu 4.4 kernel which has 83e3c48729d9 and not a0b1280368d1?
>> Could you elaborate on how it fails?
> 
> No, it doesn't have either, so my guess is it would fail on upstream 4.4 as
> well, so anything that doesn't have 83e3c48729d9.
> 
> That's what I get on that 4.4 kernel (4.4.0-171-generic):
> 
> # ./makedumpfile /proc/vmcore ../dump
> get_mem_section: Could not validate mem_section.
> get_mm_sparsemem: Can't get the address of mem_section.
> 
> makedumpfile Failed.
> #
> 
> So, now, I have a better grasp of the whole logic, and understand why it fails
> with this patch.
> 
> So, we need to either interpret the mem_section as a pointer to the array of a
> pointer to the pointer to the array. The only case the second option is valid
> is when sparse_extreme is on, so we don't even need to check the second case
> when it's off.
> 
> Then, we check that interpreting it either way is valid. If it's valid in both
> interpretations, we can't decide which to use, and will fail. So far, we
> haven't seen any case in the field where that would accidentally happen. But 
> in
> case it does, we should rather fail to dump and fallback to copying, than
> creating a bogus compressed dump.
> 
> When this patch is applied, a kernel which does not have 83e3c48729d9, and
> thus, has mem_section as a direct pointer to the array, it so happens that we
> don't detect the pointer to pointer to the array case as invalid, thus failing
> to dump.
> 
> The way we validate is that the mem_maps should either have the PRESENT bit or
> be NULL. Now, that assumption may be invalid, and we would need to do the
> validation some other way. I can test the cases where that assumption is
> invalid in a 4.4 kernel and see how to fix this in a satisfactory way.
> 
> Going through the code once again, I don't see how the second section of the
> patch would be correct by itself too. I will think it through a little more 
> and
> see if I can come up with a solution.
> 
> Regards.
> Cascardo.
> 
>>
>> I'm thinking that Pingfan's thought may help:
 I think it could be if/else, no need to call twice.
>>
>> Thanks,
>> Kazu
>>
>>>
>>> Cascardo.
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-01-28 Thread Thadeu Lima de Souza Cascardo
On Tue, Jan 28, 2020 at 05:03:12PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Cascardo,
> 
> > -Original Message-
> > On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo 
> > wrote:
> > > Sorry for taking too long to respond, as I was on vacation.
> > >
> > > The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are
> > > not supported anymore. In a way that it's even hard for me to test them.
> > >
> > > However, I managed to test it, and those two lines are definitively needed
> > > to dump a VM running such a kernel. Is removing them really needed to fix
> > > this issue?
> > >
> > > Otherwise, I would rather keep them.
> > >
> > > Thanks.
> > > Cascardo.
> > 
> > By the way, I was too fast in sending this. We really need to keep those 
> > lines
> > as makedumpfile will fail to dump a 4.4 kernel with this patch as is.
> 
> Is that Ubuntu 4.4 kernel which has 83e3c48729d9 and not a0b1280368d1?
> Could you elaborate on how it fails?

No, it doesn't have either, so my guess is it would fail on upstream 4.4 as
well, so anything that doesn't have 83e3c48729d9.

That's what I get on that 4.4 kernel (4.4.0-171-generic):

# ./makedumpfile /proc/vmcore ../dump
get_mem_section: Could not validate mem_section.
get_mm_sparsemem: Can't get the address of mem_section.

makedumpfile Failed.
#

So, now, I have a better grasp of the whole logic, and understand why it fails
with this patch.

So, we need to either interpret the mem_section as a pointer to the array of a
pointer to the pointer to the array. The only case the second option is valid
is when sparse_extreme is on, so we don't even need to check the second case
when it's off.

Then, we check that interpreting it either way is valid. If it's valid in both
interpretations, we can't decide which to use, and will fail. So far, we
haven't seen any case in the field where that would accidentally happen. But in
case it does, we should rather fail to dump and fallback to copying, than
creating a bogus compressed dump.

When this patch is applied, a kernel which does not have 83e3c48729d9, and
thus, has mem_section as a direct pointer to the array, it so happens that we
don't detect the pointer to pointer to the array case as invalid, thus failing
to dump.

The way we validate is that the mem_maps should either have the PRESENT bit or
be NULL. Now, that assumption may be invalid, and we would need to do the
validation some other way. I can test the cases where that assumption is
invalid in a 4.4 kernel and see how to fix this in a satisfactory way.

Going through the code once again, I don't see how the second section of the
patch would be correct by itself too. I will think it through a little more and
see if I can come up with a solution.

Regards.
Cascardo.

> 
> I'm thinking that Pingfan's thought may help:
> >> I think it could be if/else, no need to call twice.
> 
> Thanks,
> Kazu
> 
> > 
> > Cascardo.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-01-28 Thread 萩尾 一仁
Hi Cascardo,

> -Original Message-
> On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo wrote:
> > Sorry for taking too long to respond, as I was on vacation.
> >
> > The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are
> > not supported anymore. In a way that it's even hard for me to test them.
> >
> > However, I managed to test it, and those two lines are definitively needed
> > to dump a VM running such a kernel. Is removing them really needed to fix
> > this issue?
> >
> > Otherwise, I would rather keep them.
> >
> > Thanks.
> > Cascardo.
> 
> By the way, I was too fast in sending this. We really need to keep those lines
> as makedumpfile will fail to dump a 4.4 kernel with this patch as is.

Is that Ubuntu 4.4 kernel which has 83e3c48729d9 and not a0b1280368d1?
Could you elaborate on how it fails?

I'm thinking that Pingfan's thought may help:
>> I think it could be if/else, no need to call twice.

Thanks,
Kazu

> 
> Cascardo.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-01-27 Thread Thadeu Lima de Souza Cascardo
On Mon, Jan 27, 2020 at 02:04:54PM -0300, Thadeu Lima de Souza Cascardo wrote:
> Sorry for taking too long to respond, as I was on vacation.
> 
> The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are
> not supported anymore. In a way that it's even hard for me to test them.
> 
> However, I managed to test it, and those two lines are definitively needed
> to dump a VM running such a kernel. Is removing them really needed to fix
> this issue?
> 
> Otherwise, I would rather keep them.
> 
> Thanks.
> Cascardo.

By the way, I was too fast in sending this. We really need to keep those lines
as makedumpfile will fail to dump a 4.4 kernel with this patch as is.

Cascardo.

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-01-27 Thread Thadeu Lima de Souza Cascardo
On Thu, Jan 23, 2020 at 05:07:50PM +, HAGIO KAZUHITO(萩尾 一仁) wrote:
> Hi Pingfan,
> 
> Sorry, I had been busy, then was looking into its history a bit.
> 
> > -Original Message-
> > On 01/20/2020 04:59 PM, Baoquan He wrote:
> > > On 01/20/20 at 09:33am, David Hildenbrand wrote:
> > >>
> > >>
> > >>> Am 20.01.2020 um 03:25 schrieb Pingfan Liu :
> > >>>
> > >>> After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
> > >>> hotplug"), when hot-removed, section_mem_map is still encoded with 
> > >>> section
> > >>> start pfn, not NULL. This break the current makedumpfile.
> 
> Could you add kernel version which this started and error message,
> if possible, for someone hitting this issue to find the patch.
> 
> > >>>
> > >>> Whatever section_mem_map coding info after hot-removed, it is reliable
> > >>> just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this
> > >>> way.
> > >>>
> > >>> Signed-off-by: Pingfan Liu 
> > >>> To: kexec@lists.infradead.org
> > >>> Cc: Kazuhito Hagio 
> > >>> Cc: Baoquan He 
> > >>> Cc: David Hildenbrand 
> > >>> Cc: Andrew Morton 
> > >>> Cc: Dan Williams 
> > >>> Cc: Oscar Salvador 
> > >>> Cc: Michal Hocko 
> > >>> Cc: Qian Cai 
> > >>> ---
> > >>> makedumpfile.c | 6 +-
> > >>> 1 file changed, 1 insertion(+), 5 deletions(-)
> > >>>
> > >>> diff --git a/makedumpfile.c b/makedumpfile.c
> > >>> index e290fbd..ab40a58 100644
> > >>> --- a/makedumpfile.c
> > >>> +++ b/makedumpfile.c
> > >>> @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned 
> > >>> long *map_mask)
> > >>>map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
> > >>>mask = SECTION_MAP_MASK;
> > >>>*map_mask = map & ~mask;
> > >>> -if (map == 0x0)
> > >>> -*map_mask |= SECTION_MARKED_PRESENT;
> > This should be a trick to let map==0x0 survive the logic
> > if (!(map_mask & SECTION_MARKED_PRESENT)) {
> > return FALSE;
> > }
> > in validate_mem_section().
> > >>
> > >> Why was that added in the first place? This looks like dome compat 
> > >> handling to me. Are we sure we can
> > remove it?
> > The logic introduced by commit 14876c45aef ("[PATCH makedumpfile] handle
> > mem_section as either a pointer or an array")
> > Combining section_mem_map_addr() and the following logic in
> > validate_mem_section()
> > if (mem_map == 0) {
> > mem_map = NOT_MEMMAP_ADDR;
> > }
> > 
> > I reached the same conclusion as Baoquan's.
> > >
> > >
> > > The original code assumes that there are only two kinds of mem_section,
> > > one is empty value, the second is a present one. This removing behaves
> > > the same as the old code, I don't see a problem with it.
> > >
> > > I tried to understand the code, but I don't know why it need call
> > > validate_mem_section() twice and compare the returned value.
> > I think it could be if/else, no need to call twice.
> 
> It looks to me that commit 14876c45aef ("[PATCH makedumpfile] handle 
> mem_section
> as either a pointer or an array") was intended to support kernels which had
>   83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for 
> CONFIG_SPARSEMEM_EXTREME=y")
> and did not have
>   a0b1280368d1 ("kdump: write correct address of mem_section into 
> vmcoreinfo").
> 
> Apparently there were some released Ubuntu kernels like this.
> 
> So, if we need that logic, your patch seems
> - support kernels which section_mem_map is non-NULL for hot-removed memory,
> - but might increase the possiblity of false-positive.
> 
> I think this is a tradeoff between the above two, but the latter would be
> small enough.  And I'm testing the patch and OK with 10 vmcores so far.
> 
> > Cc Thadeu Lima de Souza Cascardo, who contributes the original code, and
> > may have some idea about it.
> 
> So if Cascardo doesn't have any objection, I will accept.

Sorry for taking too long to respond, as I was on vacation.

The kernels that had commit 83e3c48729d9, but not commit a0b1280368d1, are
not supported anymore. In a way that it's even hard for me to test them.

However, I managed to test it, and those two lines are definitively needed
to dump a VM running such a kernel. Is removing them really needed to fix
this issue?

Otherwise, I would rather keep them.

Thanks.
Cascardo.

> 
> Thanks,
> Kazu
> 
> P.S. My email address has been changed to k-hagio...@nec.com.
> Please send email to this address in the future.  Thanks.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-01-23 Thread 萩尾 一仁
Hi Pingfan,

Sorry, I had been busy, then was looking into its history a bit.

> -Original Message-
> On 01/20/2020 04:59 PM, Baoquan He wrote:
> > On 01/20/20 at 09:33am, David Hildenbrand wrote:
> >>
> >>
> >>> Am 20.01.2020 um 03:25 schrieb Pingfan Liu :
> >>>
> >>> After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
> >>> hotplug"), when hot-removed, section_mem_map is still encoded with section
> >>> start pfn, not NULL. This break the current makedumpfile.

Could you add kernel version which this started and error message,
if possible, for someone hitting this issue to find the patch.

> >>>
> >>> Whatever section_mem_map coding info after hot-removed, it is reliable
> >>> just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this
> >>> way.
> >>>
> >>> Signed-off-by: Pingfan Liu 
> >>> To: kexec@lists.infradead.org
> >>> Cc: Kazuhito Hagio 
> >>> Cc: Baoquan He 
> >>> Cc: David Hildenbrand 
> >>> Cc: Andrew Morton 
> >>> Cc: Dan Williams 
> >>> Cc: Oscar Salvador 
> >>> Cc: Michal Hocko 
> >>> Cc: Qian Cai 
> >>> ---
> >>> makedumpfile.c | 6 +-
> >>> 1 file changed, 1 insertion(+), 5 deletions(-)
> >>>
> >>> diff --git a/makedumpfile.c b/makedumpfile.c
> >>> index e290fbd..ab40a58 100644
> >>> --- a/makedumpfile.c
> >>> +++ b/makedumpfile.c
> >>> @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned 
> >>> long *map_mask)
> >>>map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
> >>>mask = SECTION_MAP_MASK;
> >>>*map_mask = map & ~mask;
> >>> -if (map == 0x0)
> >>> -*map_mask |= SECTION_MARKED_PRESENT;
> This should be a trick to let map==0x0 survive the logic
>   if (!(map_mask & SECTION_MARKED_PRESENT)) {
>   return FALSE;
>   }
> in validate_mem_section().
> >>
> >> Why was that added in the first place? This looks like dome compat 
> >> handling to me. Are we sure we can
> remove it?
> The logic introduced by commit 14876c45aef ("[PATCH makedumpfile] handle
> mem_section as either a pointer or an array")
> Combining section_mem_map_addr() and the following logic in
> validate_mem_section()
> if (mem_map == 0) {
>   mem_map = NOT_MEMMAP_ADDR;
> }
> 
> I reached the same conclusion as Baoquan's.
> >
> >
> > The original code assumes that there are only two kinds of mem_section,
> > one is empty value, the second is a present one. This removing behaves
> > the same as the old code, I don't see a problem with it.
> >
> > I tried to understand the code, but I don't know why it need call
> > validate_mem_section() twice and compare the returned value.
> I think it could be if/else, no need to call twice.

It looks to me that commit 14876c45aef ("[PATCH makedumpfile] handle mem_section
as either a pointer or an array") was intended to support kernels which had
  83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for 
CONFIG_SPARSEMEM_EXTREME=y")
and did not have
  a0b1280368d1 ("kdump: write correct address of mem_section into vmcoreinfo").

Apparently there were some released Ubuntu kernels like this.

So, if we need that logic, your patch seems
- support kernels which section_mem_map is non-NULL for hot-removed memory,
- but might increase the possiblity of false-positive.

I think this is a tradeoff between the above two, but the latter would be
small enough.  And I'm testing the patch and OK with 10 vmcores so far.

> Cc Thadeu Lima de Souza Cascardo, who contributes the original code, and
> may have some idea about it.

So if Cascardo doesn't have any objection, I will accept.

Thanks,
Kazu

P.S. My email address has been changed to k-hagio...@nec.com.
Please send email to this address in the future.  Thanks.
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-01-20 Thread piliu


On 01/20/2020 04:59 PM, Baoquan He wrote:
> On 01/20/20 at 09:33am, David Hildenbrand wrote:
>>
>>
>>> Am 20.01.2020 um 03:25 schrieb Pingfan Liu :
>>>
>>> After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
>>> hotplug"), when hot-removed, section_mem_map is still encoded with section
>>> start pfn, not NULL. This break the current makedumpfile.
>>>
>>> Whatever section_mem_map coding info after hot-removed, it is reliable
>>> just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this
>>> way.
>>>
>>> Signed-off-by: Pingfan Liu 
>>> To: kexec@lists.infradead.org
>>> Cc: Kazuhito Hagio 
>>> Cc: Baoquan He 
>>> Cc: David Hildenbrand 
>>> Cc: Andrew Morton 
>>> Cc: Dan Williams 
>>> Cc: Oscar Salvador 
>>> Cc: Michal Hocko 
>>> Cc: Qian Cai 
>>> ---
>>> makedumpfile.c | 6 +-
>>> 1 file changed, 1 insertion(+), 5 deletions(-)
>>>
>>> diff --git a/makedumpfile.c b/makedumpfile.c
>>> index e290fbd..ab40a58 100644
>>> --- a/makedumpfile.c
>>> +++ b/makedumpfile.c
>>> @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned 
>>> long *map_mask)
>>>map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
>>>mask = SECTION_MAP_MASK;
>>>*map_mask = map & ~mask;
>>> -if (map == 0x0)
>>> -*map_mask |= SECTION_MARKED_PRESENT;
This should be a trick to let map==0x0 survive the logic
if (!(map_mask & SECTION_MARKED_PRESENT)) {
return FALSE;
}
in validate_mem_section().
>>
>> Why was that added in the first place? This looks like dome compat handling 
>> to me. Are we sure we can remove it?
The logic introduced by commit 14876c45aef ("[PATCH makedumpfile] handle
mem_section as either a pointer or an array")
Combining section_mem_map_addr() and the following logic in
validate_mem_section()
if (mem_map == 0) {
mem_map = NOT_MEMMAP_ADDR;
}

I reached the same conclusion as Baoquan's.
> 
> 
> The original code assumes that there are only two kinds of mem_section,
> one is empty value, the second is a present one. This removing behaves
> the same as the old code, I don't see a problem with it.
> 
> I tried to understand the code, but I don't know why it need call
> validate_mem_section() twice and compare the returned value.
I think it could be if/else, no need to call twice.
Cc Thadeu Lima de Souza Cascardo, who contributes the original code, and
may have some idea about it.

Thanks,
Pingfan

> 
> 
> ___
> kexec mailing list
> kexec@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/kexec
> 


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-01-20 Thread Baoquan He
On 01/20/20 at 09:33am, David Hildenbrand wrote:
> 
> 
> > Am 20.01.2020 um 03:25 schrieb Pingfan Liu :
> > 
> > After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
> > hotplug"), when hot-removed, section_mem_map is still encoded with section
> > start pfn, not NULL. This break the current makedumpfile.
> > 
> > Whatever section_mem_map coding info after hot-removed, it is reliable
> > just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this
> > way.
> > 
> > Signed-off-by: Pingfan Liu 
> > To: kexec@lists.infradead.org
> > Cc: Kazuhito Hagio 
> > Cc: Baoquan He 
> > Cc: David Hildenbrand 
> > Cc: Andrew Morton 
> > Cc: Dan Williams 
> > Cc: Oscar Salvador 
> > Cc: Michal Hocko 
> > Cc: Qian Cai 
> > ---
> > makedumpfile.c | 6 +-
> > 1 file changed, 1 insertion(+), 5 deletions(-)
> > 
> > diff --git a/makedumpfile.c b/makedumpfile.c
> > index e290fbd..ab40a58 100644
> > --- a/makedumpfile.c
> > +++ b/makedumpfile.c
> > @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned 
> > long *map_mask)
> >map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
> >mask = SECTION_MAP_MASK;
> >*map_mask = map & ~mask;
> > -if (map == 0x0)
> > -*map_mask |= SECTION_MARKED_PRESENT;
> 
> Why was that added in the first place? This looks like dome compat handling 
> to me. Are we sure we can remove it?


The original code assumes that there are only two kinds of mem_section,
one is empty value, the second is a present one. This removing behaves
the same as the old code, I don't see a problem with it.

I tried to understand the code, but I don't know why it need call
validate_mem_section() twice and compare the returned value.


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] makedumpfile: cope with not-present mem section

2020-01-20 Thread David Hildenbrand


> Am 20.01.2020 um 03:25 schrieb Pingfan Liu :
> 
> After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
> hotplug"), when hot-removed, section_mem_map is still encoded with section
> start pfn, not NULL. This break the current makedumpfile.
> 
> Whatever section_mem_map coding info after hot-removed, it is reliable
> just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this
> way.
> 
> Signed-off-by: Pingfan Liu 
> To: kexec@lists.infradead.org
> Cc: Kazuhito Hagio 
> Cc: Baoquan He 
> Cc: David Hildenbrand 
> Cc: Andrew Morton 
> Cc: Dan Williams 
> Cc: Oscar Salvador 
> Cc: Michal Hocko 
> Cc: Qian Cai 
> ---
> makedumpfile.c | 6 +-
> 1 file changed, 1 insertion(+), 5 deletions(-)
> 
> diff --git a/makedumpfile.c b/makedumpfile.c
> index e290fbd..ab40a58 100644
> --- a/makedumpfile.c
> +++ b/makedumpfile.c
> @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned long 
> *map_mask)
>map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
>mask = SECTION_MAP_MASK;
>*map_mask = map & ~mask;
> -if (map == 0x0)
> -*map_mask |= SECTION_MARKED_PRESENT;

Why was that added in the first place? This looks like dome compat handling to 
me. Are we sure we can remove it?


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


[PATCH] makedumpfile: cope with not-present mem section

2020-01-19 Thread Pingfan Liu
After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
hotplug"), when hot-removed, section_mem_map is still encoded with section
start pfn, not NULL. This break the current makedumpfile.

Whatever section_mem_map coding info after hot-removed, it is reliable
just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this
way.

Signed-off-by: Pingfan Liu 
To: kexec@lists.infradead.org
Cc: Kazuhito Hagio 
Cc: Baoquan He 
Cc: David Hildenbrand 
Cc: Andrew Morton 
Cc: Dan Williams 
Cc: Oscar Salvador 
Cc: Michal Hocko 
Cc: Qian Cai 
---
 makedumpfile.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/makedumpfile.c b/makedumpfile.c
index e290fbd..ab40a58 100644
--- a/makedumpfile.c
+++ b/makedumpfile.c
@@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned long 
*map_mask)
map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
mask = SECTION_MAP_MASK;
*map_mask = map & ~mask;
-   if (map == 0x0)
-   *map_mask |= SECTION_MARKED_PRESENT;
map &= mask;
free(mem_section);
 
@@ -3453,10 +3451,8 @@ validate_mem_section(unsigned long *mem_sec,
mem_map = NOT_MEMMAP_ADDR;
} else {
mem_map = section_mem_map_addr(section, _mask);
+   /* for either no mem_map or hot-removed */
if (!(map_mask & SECTION_MARKED_PRESENT)) {
-   return FALSE;
-   }
-   if (mem_map == 0) {
mem_map = NOT_MEMMAP_ADDR;
} else {
mem_map = sparse_decode_mem_map(mem_map,
-- 
2.7.5


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec