Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Matthew Wilcox
On Mon, Feb 18, 2019 at 07:11:55PM +0100, Michal Hocko wrote:
> On Mon 18-02-19 09:57:26, Matthew Wilcox wrote:
> > On Mon, Feb 18, 2019 at 06:05:58PM +0100, Michal Hocko wrote:
> > > + end_pfn = min(start_pfn + nr_pages,
> > > + zone_end_pfn(page_zone(pfn_to_page(start_pfn))));
> > >  
> > >   /* Check the starting page of each pageblock within the range */
> > > - for (; page < end_page; page = next_active_pageblock(page)) {
> > > - if (!is_pageblock_removable_nolock(page))
> > > + for (; start_pfn < end_pfn; start_pfn = next_active_pageblock(start_pfn)) {
> > > + if (!is_pageblock_removable_nolock(start_pfn))
> > 
> > If you have a zone which contains pfns that run from ULONG_MAX-n to ULONG_MAX,
> > end_pfn is going to wrap around to 0 and this loop won't execute.
> 
> Is this a realistic situation to bother?

How insane do you think hardware manufacturers are ... ?  I don't know
of one today, but I wouldn't bet on something like that never existing.

> > I think
> > you should use:
> > 
> > max_pfn = min(start_pfn + nr_pages,
> > zone_end_pfn(page_zone(pfn_to_page(start_pfn)))) - 1;
> > 
> > for (; start_pfn <= max_pfn; ...)
> 
> I do not really care strongly, but we have more places where we do
> start_pfn + nr_pages and then use it as pfn < end_pfn construct. I
> suspect we would need to make a larger audit and make the code
> consistent so unless there are major concerns I would stick with what
> I have for now and leave the rest for the cleanup. Does that sound
> reasonable?

Yes, I think so.  There are a number of other places where we can wrap
around from ULONG_MAX to 0 fairly easily (eg page offsets in a file on
32-bit machines).  I started thinking about this with the XArray and
rapidly convinced myself we have a problem throughout Linux.


Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
On Mon 18-02-19 09:57:26, Matthew Wilcox wrote:
> On Mon, Feb 18, 2019 at 06:05:58PM +0100, Michal Hocko wrote:
> > +   end_pfn = min(start_pfn + nr_pages,
> > +   zone_end_pfn(page_zone(pfn_to_page(start_pfn))));
> >  
> > /* Check the starting page of each pageblock within the range */
> > -   for (; page < end_page; page = next_active_pageblock(page)) {
> > -   if (!is_pageblock_removable_nolock(page))
> > +   for (; start_pfn < end_pfn; start_pfn = next_active_pageblock(start_pfn)) {
> > +   if (!is_pageblock_removable_nolock(start_pfn))
> 
> If you have a zone which contains pfns that run from ULONG_MAX-n to ULONG_MAX,
> end_pfn is going to wrap around to 0 and this loop won't execute.

Is this a realistic situation to bother?

> I think
> you should use:
> 
>   max_pfn = min(start_pfn + nr_pages,
>   zone_end_pfn(page_zone(pfn_to_page(start_pfn)))) - 1;
> 
>   for (; start_pfn <= max_pfn; ...)

I do not really care strongly, but we have more places where we do
start_pfn + nr_pages and then use it as pfn < end_pfn construct. I
suspect we would need to make a larger audit and make the code
consistent so unless there are major concerns I would stick with what
I have for now and leave the rest for the cleanup. Does that sound
reasonable?

-- 
Michal Hocko
SUSE Labs


Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Matthew Wilcox
On Mon, Feb 18, 2019 at 06:05:58PM +0100, Michal Hocko wrote:
> + end_pfn = min(start_pfn + nr_pages,
> + zone_end_pfn(page_zone(pfn_to_page(start_pfn))));
>  
>   /* Check the starting page of each pageblock within the range */
> - for (; page < end_page; page = next_active_pageblock(page)) {
> - if (!is_pageblock_removable_nolock(page))
> + for (; start_pfn < end_pfn; start_pfn = next_active_pageblock(start_pfn)) {
> + if (!is_pageblock_removable_nolock(start_pfn))

If you have a zone which contains pfns that run from ULONG_MAX-n to ULONG_MAX,
end_pfn is going to wrap around to 0 and this loop won't execute.  I think
you should use:

max_pfn = min(start_pfn + nr_pages,
zone_end_pfn(page_zone(pfn_to_page(start_pfn)))) - 1;

for (; start_pfn <= max_pfn; ...)



Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Mike Rapoport
On Mon, Feb 18, 2019 at 06:05:58PM +0100, Michal Hocko wrote:
> On Mon 18-02-19 18:48:14, Mike Rapoport wrote:
> > On Mon, Feb 18, 2019 at 04:22:13PM +0100, Michal Hocko wrote:
> [...]
> > > Thinking about it some more, is it possible that we are overflowing by 1
> > > here?
> > 
> > Looks like that, the end_pfn is actually the first pfn in the next section.
> 
> Thanks for the confirmation. I guess it also explains why nobody has
> noticed this off-by-one. Most people seem to use VMEMMAP SPARSE model
> and we are safe there.
> 
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 124e794867c5..6618b9d3e53a 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -1234,10 +1234,10 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> > >  {
> > >   struct page *page = pfn_to_page(start_pfn);
> > >   unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
> > > - struct page *end_page = pfn_to_page(end_pfn);
> > > + struct page *end_page = pfn_to_page(end_pfn - 1);
> > >  
> > >   /* Check the starting page of each pageblock within the range */
> > > - for (; page < end_page; page = next_active_pageblock(page)) {
> > > + for (; page <= end_page; page = next_active_pageblock(page)) {
> > >   if (!is_pageblock_removable_nolock(page))
> > >   return false;
> > >   cond_resched();
> > 
> > Works with your fix, but I think mine is more intuitive ;-)
> 
> I would rather go and rework this to pfns. What about this instead.
> Slightly larger but arguably clearer code?

Yeah, this is clearer.
 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 124e794867c5..a799a0bdbf34 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1188,11 +1188,13 @@ static inline int pageblock_free(struct page *page)
>   return PageBuddy(page) && page_order(page) >= pageblock_order;
>  }
>  
> -/* Return the start of the next active pageblock after a given page */
> -static struct page *next_active_pageblock(struct page *page)
> +/* Return the pfn of the start of the next active pageblock after a given pfn */
> +static unsigned long next_active_pageblock(unsigned long pfn)
>  {
> + struct page *page = pfn_to_page(pfn);
> +
>   /* Ensure the starting page is pageblock-aligned */
> - BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
> + BUG_ON(pfn & (pageblock_nr_pages - 1));
>  
>   /* If the entire pageblock is free, move to the end of free page */
>   if (pageblock_free(page)) {
> @@ -1200,16 +1202,16 @@ static struct page *next_active_pageblock(struct page *page)
>   /* be careful. we don't have locks, page_order can be changed.*/
>   order = page_order(page);
>   if ((order < MAX_ORDER) && (order >= pageblock_order))
> - return page + (1 << order);
> + return pfn + (1 << order);
>   }
>  
> - return page + pageblock_nr_pages;
> + return pfn + pageblock_nr_pages;
>  }
>  
> -static bool is_pageblock_removable_nolock(struct page *page)
> +static bool is_pageblock_removable_nolock(unsigned long pfn)
>  {
> + struct page *page = pfn_to_page(pfn);
>   struct zone *zone;
> - unsigned long pfn;
>  
>   /*
>* We have to be careful here because we are iterating over memory
> @@ -1232,13 +1234,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
>  /* Checks if this range of memory is likely to be hot-removable. */
>  bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
>  {
> - struct page *page = pfn_to_page(start_pfn);
> - unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
> - struct page *end_page = pfn_to_page(end_pfn);
> + unsigned long end_pfn;
> +
> + end_pfn = min(start_pfn + nr_pages,
> + zone_end_pfn(page_zone(pfn_to_page(start_pfn))));
>  
>   /* Check the starting page of each pageblock within the range */
> - for (; page < end_page; page = next_active_pageblock(page)) {
> - if (!is_pageblock_removable_nolock(page))
> + for (; start_pfn < end_pfn; start_pfn = next_active_pageblock(start_pfn)) {
> + if (!is_pageblock_removable_nolock(start_pfn))
>   return false;
>   cond_resched();
>   }

With this on top the loop even fits into 80-chars ;-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 9cc42f3..9981ca7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1234,13 +1234,13 @@ static bool is_pageblock_removable_nolock(unsigned long pfn)
 /* Checks if this range of memory is likely to be hot-removable. */
 bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 {
-   unsigned long end_pfn;
+   unsigned long end_pfn, pfn;
 
end_pfn = min(start_pfn + nr_pages,
 

Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
On Mon 18-02-19 18:48:14, Mike Rapoport wrote:
> On Mon, Feb 18, 2019 at 04:22:13PM +0100, Michal Hocko wrote:
[...]
> > Thinking about it some more, is it possible that we are overflowing by 1
> > here?
> 
> Looks like that, the end_pfn is actually the first pfn in the next section.

Thanks for the confirmation. I guess it also explains why nobody has
noticed this off-by-one. Most people seem to use VMEMMAP SPARSE model
and we are safe there.

> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 124e794867c5..6618b9d3e53a 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1234,10 +1234,10 @@ bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> >  {
> > struct page *page = pfn_to_page(start_pfn);
> > unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
> > -   struct page *end_page = pfn_to_page(end_pfn);
> > +   struct page *end_page = pfn_to_page(end_pfn - 1);
> >  
> > /* Check the starting page of each pageblock within the range */
> > -   for (; page < end_page; page = next_active_pageblock(page)) {
> > +   for (; page <= end_page; page = next_active_pageblock(page)) {
> > if (!is_pageblock_removable_nolock(page))
> > return false;
> > cond_resched();
> 
> Works with your fix, but I think mine is more intuitive ;-)

I would rather go and rework this to pfns. What about this instead.
Slightly larger but arguably clearer code?

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 124e794867c5..a799a0bdbf34 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1188,11 +1188,13 @@ static inline int pageblock_free(struct page *page)
return PageBuddy(page) && page_order(page) >= pageblock_order;
 }
 
-/* Return the start of the next active pageblock after a given page */
-static struct page *next_active_pageblock(struct page *page)
+/* Return the pfn of the start of the next active pageblock after a given pfn */
+static unsigned long next_active_pageblock(unsigned long pfn)
 {
+   struct page *page = pfn_to_page(pfn);
+
/* Ensure the starting page is pageblock-aligned */
-   BUG_ON(page_to_pfn(page) & (pageblock_nr_pages - 1));
+   BUG_ON(pfn & (pageblock_nr_pages - 1));
 
/* If the entire pageblock is free, move to the end of free page */
if (pageblock_free(page)) {
@@ -1200,16 +1202,16 @@ static struct page *next_active_pageblock(struct page *page)
/* be careful. we don't have locks, page_order can be changed.*/
order = page_order(page);
if ((order < MAX_ORDER) && (order >= pageblock_order))
-   return page + (1 << order);
+   return pfn + (1 << order);
}
 
-   return page + pageblock_nr_pages;
+   return pfn + pageblock_nr_pages;
 }
 
-static bool is_pageblock_removable_nolock(struct page *page)
+static bool is_pageblock_removable_nolock(unsigned long pfn)
 {
+   struct page *page = pfn_to_page(pfn);
struct zone *zone;
-   unsigned long pfn;
 
/*
 * We have to be careful here because we are iterating over memory
@@ -1232,13 +1234,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
 /* Checks if this range of memory is likely to be hot-removable. */
 bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 {
-   struct page *page = pfn_to_page(start_pfn);
-   unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
-   struct page *end_page = pfn_to_page(end_pfn);
+   unsigned long end_pfn;
+
+   end_pfn = min(start_pfn + nr_pages,
+   zone_end_pfn(page_zone(pfn_to_page(start_pfn))));
 
/* Check the starting page of each pageblock within the range */
-   for (; page < end_page; page = next_active_pageblock(page)) {
-   if (!is_pageblock_removable_nolock(page))
+   for (; start_pfn < end_pfn; start_pfn = next_active_pageblock(start_pfn)) {
+   if (!is_pageblock_removable_nolock(start_pfn))
return false;
cond_resched();
}
-- 
Michal Hocko
SUSE Labs


Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Mike Rapoport
On Mon, Feb 18, 2019 at 04:22:13PM +0100, Michal Hocko wrote:
> On Mon 18-02-19 16:20:50, Michal Hocko wrote:
> > On Mon 18-02-19 16:05:15, Mike Rapoport wrote:
> > > On Mon, Feb 18, 2019 at 11:30:13AM +0100, Michal Hocko wrote:
> > > > On Mon 18-02-19 18:01:39, Rong Chen wrote:
> > > > > 
> > > > > On 2/18/19 4:55 PM, Michal Hocko wrote:
> > > > > > [Sorry for an excessive quoting in the previous email]
> > > > > > [Cc Pavel - the full report is 
> > > > > > http://lkml.kernel.org/r/20190218052823.GH29177@shao2-debian[]
> > > > > > 
> > > > > > On Mon 18-02-19 08:08:44, Michal Hocko wrote:
> > > > > > > On Mon 18-02-19 13:28:23, kernel test robot wrote:
> > > > > > [...]
> > > > > > > > [   40.305212] PGD 0 P4D 0
> > > > > > > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > > > > [   40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 
> > > > > > > > 5.0.0-rc4-00149-gefad4e4 #1
> > > > > > > > [   40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> > > > > > > > 1996), BIOS 1.10.2-1 04/01/2014
> > > > > > > > [   40.330813] RIP: 0010:page_mapping+0x12/0x80
> > > > > > > > [   40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 
> > > > > > > > 89 e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 
> > > > > > > > ff a8 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 
> > > > > > > > 44 c3 48 83 38 ff 74 2f 48
> > > > > > > > [   40.356704] RSP: 0018:88801fa87cd8 EFLAGS: 00010202
> > > > > > > > [   40.362714] RAX:  RBX: fffe RCX: 
> > > > > > > > 000a
> > > > > > > > [   40.370798] RDX: fffe RSI: 820b9a20 RDI: 
> > > > > > > > 88801e5c
> > > > > > > > [   40.378830] RBP: 6db6db6db6db6db7 R08: 88801e8bb000 R09: 
> > > > > > > > 01b64d13
> > > > > > > > [   40.386902] R10: 88801fa87cf8 R11: 0001 R12: 
> > > > > > > > 88801e64
> > > > > > > > [   40.395033] R13: 820b9a20 R14: 88801f145258 R15: 
> > > > > > > > 0001
> > > > > > > > [   40.403138] FS:  7fb2079817c0() 
> > > > > > > > GS:88801dd0() knlGS:
> > > > > > > > [   40.412243] CS:  0010 DS:  ES:  CR0: 80050033
> > > > > > > > [   40.418846] CR2: 0006 CR3: 1fa82000 CR4: 
> > > > > > > > 06a0
> > > > > > > > [   40.426951] Call Trace:
> > > > > > > > [   40.429843]  __dump_page+0x14/0x2c0
> > > > > > > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > > > > > > This looks like we are stumbling over an uninitialized struct page again.
> > > > > > > Something this patch should prevent from. Could you try to apply 
> > > > > > > [1]
> > > > > > > which will make __dump_page more robust so that we do not blow up 
> > > > > > > there
> > > > > > > and give some more details in return.
> > > > > > > 
> > > > > > > Btw. is this reproducible all the time?
> > > > > > And forgot to ask whether this is reproducible with pending mmotm
> > > > > > patches in linux-next.
> > > > > 
> > > > > 
> > > > > Do you mean the below patch? I can reproduce the problem too.
> > > > 
> > > > Yes, thanks for the swift response. The patch has just added a debugging
> > > > output
> > > > [0.013697] Early memory node ranges
> > > > [0.013701]   node   0: [mem 0x1000-0x0009efff]
> > > > [0.013706]   node   0: [mem 0x0010-0x1ffd]
> > > > [0.013711] zeroying 0-1
> > > > 
> > > > This is the first pfn.
> > > > 
> > > > [0.013715] zeroying 9f-100
> > > > 
> > > > this is [mem 0x9f000, 0xf] so it fills up the whole hole between the
> > > > above two ranges. This is definitely good.
> > > > 
> > > > [0.013722] zeroying 1ffe0-1ffe0
> > > > 
> > > > this is a single page at 0x1ffe right after the zone end.
> > > > 
> > > > [0.013727] Zeroed struct page in unavailable ranges: 98 pages
> > > > 
> > > > Hmm, so this is getting really interesting. The whole zone range should
> > > > be covered. So this is either some off-by-one or something that I am
> > > > missing right now. Could you apply the following on top please? We
> > > > definitely need to see what pfn this is.
> > > > 
> > > > 
> > > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > > index 124e794867c5..59bcfd934e37 100644
> > > > --- a/mm/memory_hotplug.c
> > > > +++ b/mm/memory_hotplug.c
> > > > @@ -1232,12 +1232,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
> > > >  /* Checks if this range of memory is likely to be hot-removable. */
> > > >  bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> > > >  {
> > > > -   struct page *page = pfn_to_page(start_pfn);
> > > > +   struct page *page = pfn_to_page(start_pfn), *first_page;
> > > > unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
> > > > struct page *end_page = 

Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
On Mon 18-02-19 16:20:50, Michal Hocko wrote:
> On Mon 18-02-19 16:05:15, Mike Rapoport wrote:
> > On Mon, Feb 18, 2019 at 11:30:13AM +0100, Michal Hocko wrote:
> > > On Mon 18-02-19 18:01:39, Rong Chen wrote:
> > > > 
> > > > On 2/18/19 4:55 PM, Michal Hocko wrote:
> > > > > [Sorry for an excessive quoting in the previous email]
> > > > > [Cc Pavel - the full report is 
> > > > > http://lkml.kernel.org/r/20190218052823.GH29177@shao2-debian[]
> > > > > 
> > > > > On Mon 18-02-19 08:08:44, Michal Hocko wrote:
> > > > > > On Mon 18-02-19 13:28:23, kernel test robot wrote:
> > > > > [...]
> > > > > > > [   40.305212] PGD 0 P4D 0
> > > > > > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > > > [   40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 
> > > > > > > 5.0.0-rc4-00149-gefad4e4 #1
> > > > > > > [   40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> > > > > > > 1996), BIOS 1.10.2-1 04/01/2014
> > > > > > > [   40.330813] RIP: 0010:page_mapping+0x12/0x80
> > > > > > > [   40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 
> > > > > > > e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 
> > > > > > > 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 
> > > > > > > 83 38 ff 74 2f 48
> > > > > > > [   40.356704] RSP: 0018:88801fa87cd8 EFLAGS: 00010202
> > > > > > > [   40.362714] RAX:  RBX: fffe RCX: 
> > > > > > > 000a
> > > > > > > [   40.370798] RDX: fffe RSI: 820b9a20 RDI: 
> > > > > > > 88801e5c
> > > > > > > [   40.378830] RBP: 6db6db6db6db6db7 R08: 88801e8bb000 R09: 
> > > > > > > 01b64d13
> > > > > > > [   40.386902] R10: 88801fa87cf8 R11: 0001 R12: 
> > > > > > > 88801e64
> > > > > > > [   40.395033] R13: 820b9a20 R14: 88801f145258 R15: 
> > > > > > > 0001
> > > > > > > [   40.403138] FS:  7fb2079817c0() 
> > > > > > > GS:88801dd0() knlGS:
> > > > > > > [   40.412243] CS:  0010 DS:  ES:  CR0: 80050033
> > > > > > > [   40.418846] CR2: 0006 CR3: 1fa82000 CR4: 
> > > > > > > 06a0
> > > > > > > [   40.426951] Call Trace:
> > > > > > > [   40.429843]  __dump_page+0x14/0x2c0
> > > > > > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > > > > > This looks like we are stumbling over an uninitialized struct page again.
> > > > > > Something this patch should prevent from. Could you try to apply [1]
> > > > > > which will make __dump_page more robust so that we do not blow up 
> > > > > > there
> > > > > > and give some more details in return.
> > > > > > 
> > > > > > Btw. is this reproducible all the time?
> > > > > And forgot to ask whether this is reproducible with pending mmotm
> > > > > patches in linux-next.
> > > > 
> > > > 
> > > > Do you mean the below patch? I can reproduce the problem too.
> > > 
> > > Yes, thanks for the swift response. The patch has just added a debugging
> > > output
> > > [0.013697] Early memory node ranges
> > > [0.013701]   node   0: [mem 0x1000-0x0009efff]
> > > [0.013706]   node   0: [mem 0x0010-0x1ffd]
> > > [0.013711] zeroying 0-1
> > > 
> > > This is the first pfn.
> > > 
> > > [0.013715] zeroying 9f-100
> > > 
> > > this is [mem 0x9f000, 0xf] so it fills up the whole hole between the
> > > above two ranges. This is definitely good.
> > > 
> > > [0.013722] zeroying 1ffe0-1ffe0
> > > 
> > > this is a single page at 0x1ffe right after the zone end.
> > > 
> > > [0.013727] Zeroed struct page in unavailable ranges: 98 pages
> > > 
> > > Hmm, so this is getting really interesting. The whole zone range should
> > > be covered. So this is either some off-by-one or something that I am
> > > missing right now. Could you apply the following on top please? We
> > > definitely need to see what pfn this is.
> > > 
> > > 
> > > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > > index 124e794867c5..59bcfd934e37 100644
> > > --- a/mm/memory_hotplug.c
> > > +++ b/mm/memory_hotplug.c
> > > @@ -1232,12 +1232,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
> > >  /* Checks if this range of memory is likely to be hot-removable. */
> > >  bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> > >  {
> > > - struct page *page = pfn_to_page(start_pfn);
> > > + struct page *page = pfn_to_page(start_pfn), *first_page;
> > > unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
> > >   struct page *end_page = pfn_to_page(end_pfn);
> > > 
> > >   /* Check the starting page of each pageblock within the range */
> > > - for (; page < end_page; page = next_active_pageblock(page)) {
> > > + for (first_page = page; page < end_page; page = next_active_pageblock(page)) {
> > > + if (PagePoisoned(page))
> > > 

Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
On Mon 18-02-19 16:05:15, Mike Rapoport wrote:
> On Mon, Feb 18, 2019 at 11:30:13AM +0100, Michal Hocko wrote:
> > On Mon 18-02-19 18:01:39, Rong Chen wrote:
> > > 
> > > On 2/18/19 4:55 PM, Michal Hocko wrote:
> > > > [Sorry for an excessive quoting in the previous email]
> > > > [Cc Pavel - the full report is 
> > > > http://lkml.kernel.org/r/20190218052823.GH29177@shao2-debian[]
> > > > 
> > > > On Mon 18-02-19 08:08:44, Michal Hocko wrote:
> > > > > On Mon 18-02-19 13:28:23, kernel test robot wrote:
> > > > [...]
> > > > > > [   40.305212] PGD 0 P4D 0
> > > > > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > > [   40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 
> > > > > > 5.0.0-rc4-00149-gefad4e4 #1
> > > > > > [   40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 
> > > > > > 1996), BIOS 1.10.2-1 04/01/2014
> > > > > > [   40.330813] RIP: 0010:page_mapping+0x12/0x80
> > > > > > [   40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 
> > > > > > e8 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 
> > > > > > 01 48 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 
> > > > > > 38 ff 74 2f 48
> > > > > > [   40.356704] RSP: 0018:88801fa87cd8 EFLAGS: 00010202
> > > > > > [   40.362714] RAX:  RBX: fffe RCX: 
> > > > > > 000a
> > > > > > [   40.370798] RDX: fffe RSI: 820b9a20 RDI: 
> > > > > > 88801e5c
> > > > > > [   40.378830] RBP: 6db6db6db6db6db7 R08: 88801e8bb000 R09: 
> > > > > > 01b64d13
> > > > > > [   40.386902] R10: 88801fa87cf8 R11: 0001 R12: 
> > > > > > 88801e64
> > > > > > [   40.395033] R13: 820b9a20 R14: 88801f145258 R15: 
> > > > > > 0001
> > > > > > [   40.403138] FS:  7fb2079817c0() 
> > > > > > GS:88801dd0() knlGS:
> > > > > > [   40.412243] CS:  0010 DS:  ES:  CR0: 80050033
> > > > > > [   40.418846] CR2: 0006 CR3: 1fa82000 CR4: 
> > > > > > 06a0
> > > > > > [   40.426951] Call Trace:
> > > > > > [   40.429843]  __dump_page+0x14/0x2c0
> > > > > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > > > > This looks like we are stumbling over an uninitialized struct page again.
> > > > > Something this patch should prevent from. Could you try to apply [1]
> > > > > which will make __dump_page more robust so that we do not blow up 
> > > > > there
> > > > > and give some more details in return.
> > > > > 
> > > > > Btw. is this reproducible all the time?
> > > > And forgot to ask whether this is reproducible with pending mmotm
> > > > patches in linux-next.
> > > 
> > > 
> > > Do you mean the below patch? I can reproduce the problem too.
> > 
> > Yes, thanks for the swift response. The patch has just added a debugging
> > output
> > [0.013697] Early memory node ranges
> > [0.013701]   node   0: [mem 0x1000-0x0009efff]
> > [0.013706]   node   0: [mem 0x0010-0x1ffd]
> > [0.013711] zeroying 0-1
> > 
> > This is the first pfn.
> > 
> > [0.013715] zeroying 9f-100
> > 
> > this is [mem 0x9f000, 0xf] so it fills up the whole hole between the
> > above two ranges. This is definitely good.
> > 
> > [0.013722] zeroying 1ffe0-1ffe0
> > 
> > this is a single page at 0x1ffe right after the zone end.
> > 
> > [0.013727] Zeroed struct page in unavailable ranges: 98 pages
> > 
> > Hmm, so this is getting really interesting. The whole zone range should
> > be covered. So this is either some off-by-one or something that I am
> > missing right now. Could you apply the following on top please? We
> > definitely need to see what pfn this is.
> > 
> > 
> > diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> > index 124e794867c5..59bcfd934e37 100644
> > --- a/mm/memory_hotplug.c
> > +++ b/mm/memory_hotplug.c
> > @@ -1232,12 +1232,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
> >  /* Checks if this range of memory is likely to be hot-removable. */
> >  bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
> >  {
> > -   struct page *page = pfn_to_page(start_pfn);
> > +   struct page *page = pfn_to_page(start_pfn), *first_page;
> > unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
> > struct page *end_page = pfn_to_page(end_pfn);
> > 
> > /* Check the starting page of each pageblock within the range */
> > -   for (; page < end_page; page = next_active_pageblock(page)) {
> > +   for (first_page = page; page < end_page; page = next_active_pageblock(page)) {
> > +   if (PagePoisoned(page))
> > +   pr_info("Unexpected poisoned page %px pfn:%lx\n", page, start_pfn + page - first_page);
> > if (!is_pageblock_removable_nolock(page))
> > return false;
> > 

Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Mike Rapoport
On Mon, Feb 18, 2019 at 11:30:13AM +0100, Michal Hocko wrote:
> On Mon 18-02-19 18:01:39, Rong Chen wrote:
> > 
> > On 2/18/19 4:55 PM, Michal Hocko wrote:
> > > [Sorry for an excessive quoting in the previous email]
> > > [Cc Pavel - the full report is 
> > > http://lkml.kernel.org/r/20190218052823.GH29177@shao2-debian[]
> > > 
> > > On Mon 18-02-19 08:08:44, Michal Hocko wrote:
> > > > On Mon 18-02-19 13:28:23, kernel test robot wrote:
> > > [...]
> > > > > [   40.305212] PGD 0 P4D 0
> > > > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > [   40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 
> > > > > 5.0.0-rc4-00149-gefad4e4 #1
> > > > > [   40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > > BIOS 1.10.2-1 04/01/2014
> > > > > [   40.330813] RIP: 0010:page_mapping+0x12/0x80
> > > > > [   40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 
> > > > > 5b 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 
> > > > > 0f 45 da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 
> > > > > 74 2f 48
> > > > > [   40.356704] RSP: 0018:88801fa87cd8 EFLAGS: 00010202
> > > > > [   40.362714] RAX:  RBX: fffe RCX: 
> > > > > 000a
> > > > > [   40.370798] RDX: fffe RSI: 820b9a20 RDI: 
> > > > > 88801e5c
> > > > > [   40.378830] RBP: 6db6db6db6db6db7 R08: 88801e8bb000 R09: 
> > > > > 01b64d13
> > > > > [   40.386902] R10: 88801fa87cf8 R11: 0001 R12: 
> > > > > 88801e64
> > > > > [   40.395033] R13: 820b9a20 R14: 88801f145258 R15: 
> > > > > 0001
> > > > > [   40.403138] FS:  7fb2079817c0() GS:88801dd0() 
> > > > > knlGS:
> > > > > [   40.412243] CS:  0010 DS:  ES:  CR0: 80050033
> > > > > [   40.418846] CR2: 0006 CR3: 1fa82000 CR4: 
> > > > > 06a0
> > > > > [   40.426951] Call Trace:
> > > > > [   40.429843]  __dump_page+0x14/0x2c0
> > > > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > > > This looks like we are stumbling over an uninitialized struct page again.
> > > > Something this patch should prevent from. Could you try to apply [1]
> > > > which will make __dump_page more robust so that we do not blow up there
> > > > and give some more details in return.
> > > > 
> > > > Btw. is this reproducible all the time?
> > > And forgot to ask whether this is reproducible with pending mmotm
> > > patches in linux-next.
> > 
> > 
> > Do you mean the below patch? I can reproduce the problem too.
> 
> Yes, thanks for the swift response. The patch has just added a debugging
> output
> [0.013697] Early memory node ranges
> [0.013701]   node   0: [mem 0x1000-0x0009efff]
> [0.013706]   node   0: [mem 0x0010-0x1ffd]
> [0.013711] zeroying 0-1
> 
> This is the first pfn.
> 
> [0.013715] zeroying 9f-100
> 
> this is [mem 0x9f000, 0xf] so it fills up the whole hole between the
> above two ranges. This is definitely good.
> 
> [0.013722] zeroying 1ffe0-1ffe0
> 
> this is a single page at 0x1ffe right after the zone end.
> 
> [0.013727] Zeroed struct page in unavailable ranges: 98 pages
> 
> Hmm, so this is getting really interesting. The whole zone range should
> be covered. So this is either some off-by-one or something that I am
> missing right now. Could you apply the following on top please? We
> definitely need to see what pfn this is.
> 
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index 124e794867c5..59bcfd934e37 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -1232,12 +1232,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
>  /* Checks if this range of memory is likely to be hot-removable. */
>  bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
>  {
> - struct page *page = pfn_to_page(start_pfn);
> + struct page *page = pfn_to_page(start_pfn), *first_page;
>   unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
>   struct page *end_page = pfn_to_page(end_pfn);
> 
>   /* Check the starting page of each pageblock within the range */
> - for (; page < end_page; page = next_active_pageblock(page)) {
> + for (first_page = page; page < end_page; page = next_active_pageblock(page)) {
> + if (PagePoisoned(page))
> + pr_info("Unexpected poisoned page %px pfn:%lx\n", page, start_pfn + page - first_page);
>   if (!is_pageblock_removable_nolock(page))
>   return false;
>   cond_resched();

I've added more prints and somehow end_page gets too big (in brackets is
the pfn):

[   11.183835] ===> start: 88801e24(0), end: 88801e40(8000)
[   11.188457] ===> start: 88801e40(8000), end: 88801e64(1)
[   

Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
On Mon 18-02-19 18:01:39, Rong Chen wrote:
> 
> On 2/18/19 4:55 PM, Michal Hocko wrote:
> > [Sorry for an excessive quoting in the previous email]
> > [Cc Pavel - the full report is 
> > http://lkml.kernel.org/r/20190218052823.GH29177@shao2-debian]
> > 
> > On Mon 18-02-19 08:08:44, Michal Hocko wrote:
> > > On Mon 18-02-19 13:28:23, kernel test robot wrote:
> > [...]
> > > > [   40.305212] PGD 0 P4D 0
> > > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > [   40.313055] CPU: 1 PID: 239 Comm: udevd Not tainted 
> > > > 5.0.0-rc4-00149-gefad4e4 #1
> > > > [   40.321348] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), 
> > > > BIOS 1.10.2-1 04/01/2014
> > > > [   40.330813] RIP: 0010:page_mapping+0x12/0x80
> > > > [   40.335709] Code: 5d c3 48 89 df e8 0e ad 02 00 85 c0 75 da 89 e8 5b 
> > > > 5d c3 0f 1f 44 00 00 53 48 89 fb 48 8b 43 08 48 8d 50 ff a8 01 48 0f 45 
> > > > da <48> 8b 53 08 48 8d 42 ff 83 e2 01 48 0f 44 c3 48 83 38 ff 74 2f 48
> > > > [   40.356704] RSP: 0018:88801fa87cd8 EFLAGS: 00010202
> > > > [   40.362714] RAX:  RBX: fffe RCX: 
> > > > 000a
> > > > [   40.370798] RDX: fffe RSI: 820b9a20 RDI: 
> > > > 88801e5c
> > > > [   40.378830] RBP: 6db6db6db6db6db7 R08: 88801e8bb000 R09: 
> > > > 01b64d13
> > > > [   40.386902] R10: 88801fa87cf8 R11: 0001 R12: 
> > > > 88801e64
> > > > [   40.395033] R13: 820b9a20 R14: 88801f145258 R15: 
> > > > 0001
> > > > [   40.403138] FS:  7fb2079817c0() GS:88801dd0() 
> > > > knlGS:
> > > > [   40.412243] CS:  0010 DS:  ES:  CR0: 80050033
> > > > [   40.418846] CR2: 0006 CR3: 1fa82000 CR4: 
> > > > 06a0
> > > > [   40.426951] Call Trace:
> > > > [   40.429843]  __dump_page+0x14/0x2c0
> > > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > > This looks like we are stumbling over an uninitialized struct page again.
> > > Something this patch should prevent from. Could you try to apply [1]
> > > which will make __dump_page more robust so that we do not blow up there
> > > and give some more details in return.
> > > 
> > > Btw. is this reproducible all the time?
> > And forgot to ask whether this is reproducible with pending mmotm
> > patches in linux-next.
> 
> 
> Do you mean the below patch? I can reproduce the problem too.

Yes, thanks for the swift response. The patch has just added a debugging output:
[0.013697] Early memory node ranges
[0.013701]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[0.013706]   node   0: [mem 0x0000000000100000-0x000000001ffdffff]
[0.013711] zeroying 0-1

This is the first pfn.

[0.013715] zeroying 9f-100

this is [mem 0x9f000, 0xfffff] so it fills up the whole hole between the
above two ranges. This is definitely good.

[0.013722] zeroying 1ffe0-1ffe0

this is a single page at 0x1ffe0 right after the zone end.

[0.013727] Zeroed struct page in unavailable ranges: 98 pages

Hmm, so this is getting really interesting. The whole zone range should
be covered. So this is either some off-by-one or something that I am
missing right now. Could you apply the following on top please? We
definitely need to see what pfn this is.


diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 124e794867c5..59bcfd934e37 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1232,12 +1232,14 @@ static bool is_pageblock_removable_nolock(struct page *page)
 /* Checks if this range of memory is likely to be hot-removable. */
 bool is_mem_section_removable(unsigned long start_pfn, unsigned long nr_pages)
 {
-	struct page *page = pfn_to_page(start_pfn);
+	struct page *page = pfn_to_page(start_pfn), *first_page;
 	unsigned long end_pfn = min(start_pfn + nr_pages, zone_end_pfn(page_zone(page)));
 	struct page *end_page = pfn_to_page(end_pfn);
 
 	/* Check the starting page of each pageblock within the range */
-	for (; page < end_page; page = next_active_pageblock(page)) {
+	for (first_page = page; page < end_page; page = next_active_pageblock(page)) {
+		if (PagePoisoned(page))
+			pr_info("Unexpected poisoned page %px pfn:%lx\n", page, start_pfn + page - first_page);
 		if (!is_pageblock_removable_nolock(page))
 			return false;
 		cond_resched();
-- 
Michal Hocko
SUSE Labs


Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
On Mon 18-02-19 17:11:49, Rong Chen wrote:
> 
> On 2/18/19 5:03 PM, Michal Hocko wrote:
> > On Mon 18-02-19 16:47:26, Rong Chen wrote:
> > > On 2/18/19 3:08 PM, Michal Hocko wrote:
> > > > On Mon 18-02-19 13:28:23, kernel test robot wrote:
> > [...]
> > > > > [   40.305212] PGD 0 P4D 0
> > > > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > > > [...]
> > > > > [   40.426951] Call Trace:
> > > > > [   40.429843]  __dump_page+0x14/0x2c0
> > > > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > > > This looks like we are stumbling over an uninitialized struct page again.
> > > > Something this patch should prevent from. Could you try to apply [1]
> > > > which will make __dump_page more robust so that we do not blow up there
> > > > and give some more details in return.
> > > 
> > > Hi Hocko,
> > > 
> > > I have applied [1] and attached the dmesg file.
> > Thanks, so the log confirms that this is really an uninitialized struct
> > page:
> > [   12.228622] raw:   
> > [   12.231474] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
> > [   12.232135] [ cut here ]
> > [   12.232649] kernel BUG at include/linux/mm.h:1020!
> > 
> > So now, we have to find out what has been left behind. Please see my
> > other email. Also could you give me faddr2line of the
> > is_mem_section_removable offset please? I assume it is
> > is_pageblock_removable_nolock:
> > if (!node_online(page_to_nid(page)))
> > return false;
> 
> 
> faddr2line result:
> 
> is_mem_section_removable+0x24c/0x2c0:
> page_to_nid at include/linux/mm.h:1020
> (inlined by) is_pageblock_removable_nolock at mm/memory_hotplug.c:1221
> (inlined by) is_mem_section_removable at mm/memory_hotplug.c:1241

Thanks, so this indeed points to page_to_nid. Thanks!
-- 
Michal Hocko
SUSE Labs


Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Rong Chen



On 2/18/19 5:03 PM, Michal Hocko wrote:

On Mon 18-02-19 16:47:26, Rong Chen wrote:

On 2/18/19 3:08 PM, Michal Hocko wrote:

On Mon 18-02-19 13:28:23, kernel test robot wrote:

[...]

[   40.305212] PGD 0 P4D 0
[   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
[...]
[   40.426951] Call Trace:
[   40.429843]  __dump_page+0x14/0x2c0
[   40.433947]  is_mem_section_removable+0x24c/0x2c0

This looks like we are stumbling over an uninitialized struct page again.
Something this patch should prevent from. Could you try to apply [1]
which will make __dump_page more robust so that we do not blow up there
and give some more details in return.


Hi Hocko,

I have applied [1] and attached the dmesg file.

Thanks, so the log confirms that this is really an uninitialized struct
page:
[   12.228622] raw:   
[   12.231474] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[   12.232135] [ cut here ]
[   12.232649] kernel BUG at include/linux/mm.h:1020!

So now, we have to find out what has been left behind. Please see my
other email. Also could you give me faddr2line of the
is_mem_section_removable offset please? I assume it is
is_pageblock_removable_nolock:
if (!node_online(page_to_nid(page)))
return false;



faddr2line result:

is_mem_section_removable+0x24c/0x2c0:
page_to_nid at include/linux/mm.h:1020
(inlined by) is_pageblock_removable_nolock at mm/memory_hotplug.c:1221
(inlined by) is_mem_section_removable at mm/memory_hotplug.c:1241

Best Regards,
Rong Chen




Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
On Mon 18-02-19 16:47:26, Rong Chen wrote:
> 
> On 2/18/19 3:08 PM, Michal Hocko wrote:
> > On Mon 18-02-19 13:28:23, kernel test robot wrote:
[...]
> > > [   40.305212] PGD 0 P4D 0
> > > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > > [...]
> > > [   40.426951] Call Trace:
> > > [   40.429843]  __dump_page+0x14/0x2c0
> > > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> > This looks like we are stumbling over an uninitialized struct page again.
> > Something this patch should prevent from. Could you try to apply [1]
> > which will make __dump_page more robust so that we do not blow up there
> > and give some more details in return.
> 
> 
> Hi Hocko,
> 
> I have applied [1] and attached the dmesg file.

Thanks, so the log confirms that this is really an uninitialized struct
page:
[   12.228622] raw:   
[   12.231474] page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
[   12.232135] [ cut here ]
[   12.232649] kernel BUG at include/linux/mm.h:1020!

So now, we have to find out what has been left behind. Please see my
other email. Also could you give me faddr2line of the
is_mem_section_removable offset please? I assume it is 
is_pageblock_removable_nolock:
if (!node_online(page_to_nid(page)))
return false;
-- 
Michal Hocko
SUSE Labs


Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-18 Thread Michal Hocko
[Sorry for an excessive quoting in the previous email]
[Cc Pavel - the full report is 
http://lkml.kernel.org/r/20190218052823.GH29177@shao2-debian]

On Mon 18-02-19 08:08:44, Michal Hocko wrote:
> On Mon 18-02-19 13:28:23, kernel test robot wrote:
[...]
> > [   40.305212] PGD 0 P4D 0 
> > [   40.308255] Oops: 0000 [#1] PREEMPT SMP PTI
> > [...]
> > [   40.426951] Call Trace:
> > [   40.429843]  __dump_page+0x14/0x2c0
> > [   40.433947]  is_mem_section_removable+0x24c/0x2c0
> 
> This looks like we are stumbling over an uninitialized struct page again.
> Something this patch should prevent from. Could you try to apply [1]
> which will make __dump_page more robust so that we do not blow up there
> and give some more details in return.
> 
> Btw. is this reproducible all the time?

And forgot to ask whether this is reproducible with pending mmotm
patches in linux-next.

> I will have a look at the memory layout later today.

[0.059335] No NUMA configuration found
[0.059345] Faking a node at [mem 0x0000000000000000-0x000000001ffdffff]
[0.059399] NODE_DATA(0) allocated [mem 0x1e8c3000-0x1e8c5fff]
[0.073143] Zone ranges:
[0.073175]   DMA32    [mem 0x0000000000001000-0x000000001ffdffff]
[0.073204]   Normal   empty
[0.073212] Movable zone start for each node
[0.073240] Early memory node ranges
[0.073247]   node   0: [mem 0x0000000000001000-0x000000000009efff]
[0.073275]   node   0: [mem 0x0000000000100000-0x000000001ffdffff]
[0.073309] Zeroed struct page in unavailable ranges: 98 pages
[0.073312] Initmem setup node 0 [mem 0x0000000000001000-0x000000001ffdffff]
[0.073343] On node 0 totalpages: 130942
[0.073373]   DMA32 zone: 1792 pages used for memmap
[0.073400]   DMA32 zone: 21 pages reserved
[0.073408]   DMA32 zone: 130942 pages, LIFO batch:31

We have only a single NUMA node with a single ZONE_DMA32. But there is a
hole in the zone and the first range before the hole is not section
aligned. We do zero some unavailable ranges but from the number it is not
clear which range it is and 98. [0x60fff, 0xf) is 96 pages. The
patch below should tell us whether we are covering all we need. If yes
then the hole shouldn't make any difference and the problem must be
somewhere else.

---
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 35fdde041f5c..c60642505e04 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6706,10 +6706,13 @@ void __init zero_resv_unavail(void)
 	pgcnt = 0;
 	for_each_mem_range(i, &memblock.memory, NULL,
 			NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end, NULL) {
-		if (next < start)
+		if (next < start) {
+			pr_info("zeroying %llx-%llx\n", PFN_DOWN(next), PFN_UP(start));
 			pgcnt += zero_pfn_range(PFN_DOWN(next), PFN_UP(start));
+		}
 		next = end;
 	}
+	pr_info("zeroying %llx-%lx\n", PFN_DOWN(next), max_pfn);
 	pgcnt += zero_pfn_range(PFN_DOWN(next), max_pfn);
 
/*
-- 
Michal Hocko
SUSE Labs


Re: [LKP] efad4e475c [ 40.308255] Oops: 0000 [#1] PREEMPT SMP PTI

2019-02-17 Thread Michal Hocko
On Mon 18-02-19 13:28:23, kernel test robot wrote:
> Greetings,
> 
> 0day kernel testing robot got the below dmesg and the first bad commit is
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
> 
> commit efad4e475c312456edb3c789d0996d12ed744c13
> Author: Michal Hocko 
> AuthorDate: Fri Feb 1 14:20:34 2019 -0800
> Commit: Linus Torvalds 
> CommitDate: Fri Feb 1 15:46:23 2019 -0800
> 
> mm, memory_hotplug: is_mem_section_removable do not pass the end of a zone
> 
> Patch series "mm, memory_hotplug: fix uninitialized pages fallouts", v2.
> 
> Mikhail Zaslonko has posted fixes for the two bugs quite some time ago
> [1].  I have pushed back on those fixes because I believed that it is
> much better to plug the problem at the initialization time rather than
> play whack-a-mole all over the hotplug code and find all the places
> which expect the full memory section to be initialized.
> 
> We have ended up with commit 2830bf6f05fb ("mm, memory_hotplug:
> initialize struct pages for the full memory section") merged and cause a
> regression [2][3].  The reason is that there might be memory layouts
> when two NUMA nodes share the same memory section so the merged fix is
> simply incorrect.
> 
> In order to plug this hole we really have to be zone range aware in
> those handlers.  I have split up the original patch into two.  One is
> unchanged (patch 2) and I took a different approach for `removable'
> crash.
> 
> [1] http://lkml.kernel.org/r/20181105150401.97287-2-zaslo...@linux.ibm.com
> [2] https://bugzilla.redhat.com/show_bug.cgi?id=1666948
> [3] http://lkml.kernel.org/r/20190125163938.ga20...@dhcp22.suse.cz
> 
> This patch (of 2):
> 
> Mikhail has reported the following VM_BUG_ON triggered when reading sysfs
> removable state of a memory block:
> 
>  page:03d08300c000 is uninitialized and poisoned
>  page dumped because: VM_BUG_ON_PAGE(PagePoisoned(p))
>  Call Trace:
>is_mem_section_removable+0xb4/0x190
>show_mem_removable+0x9a/0xd8
>dev_attr_show+0x34/0x70
>sysfs_kf_seq_show+0xc8/0x148
>seq_read+0x204/0x480
>__vfs_read+0x32/0x178
>vfs_read+0x82/0x138
>ksys_read+0x5a/0xb0
>system_call+0xdc/0x2d8
>  Last Breaking-Event-Address:
>is_mem_section_removable+0xb4/0x190
>  Kernel panic - not syncing: Fatal exception: panic_on_oops
> 
> The reason is that the memory block spans the zone boundary and we are
> stumbling over an uninitialized struct page.  Fix this by enforcing zone
> range in is_mem_section_removable so that we never run away from a zone.
> 
> Link: http://lkml.kernel.org/r/20190128144506.15603-2-mho...@kernel.org
> Signed-off-by: Michal Hocko 
> Reported-by: Mikhail Zaslonko 
> Debugged-by: Mikhail Zaslonko 
> Tested-by: Gerald Schaefer 
> Tested-by: Mikhail Gavrilov 
> Reviewed-by: Oscar Salvador 
> Cc: Pavel Tatashin 
> Cc: Heiko Carstens 
> Cc: Martin Schwidefsky 
> Signed-off-by: Andrew Morton 
> Signed-off-by: Linus Torvalds 
> 
> 9bcdeb51bd  oom, oom_reaper: do not enqueue same task twice
> efad4e475c  mm, memory_hotplug: is_mem_section_removable do not pass the end 
> of a zone
> f17b5f06cb  Linux 5.0-rc4
> 7a92eb7cc1  Add linux-next specific files for 20190215
> +-----------------------------------------------------+------------+------------+----------+---------------+
> |                                                     | 9bcdeb51bd | efad4e475c | v5.0-rc4 | next-20190215 |
> +-----------------------------------------------------+------------+------------+----------+---------------+
> | boot_successes                                      | 31         | 2          | 21       | 0             |
> | boot_failures                                       | 0          | 11         | 6        | 10            |
> | Oops:#[##]                                          | 0          | 11         |          |               |
> | RIP:page_mapping                                    | 0          | 11         |          |               |
> | WARNING:at_kernel/locking/lockdep.c:#lock_downgrade | 0          | 3          |          |               |
> | RIP:lock_downgrade                                  | 0          | 3          |          |               |
> | Kernel_panic-not_syncing:Fatal_exception            | 0          | 11         | 0        | 10            |
> | BUG:unable_to_handle_kernel                         | 0          | 6          |          |               |
> | BUG:kernel_in_stage                                 | 0          | 0          | 6        |               |
> | kernel_BUG_at_include/linux/mm.h                    | 0          | 0          | 0        | 10            |
> | invalid_opcode:#[##]                                | 0