Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-06-25 Thread Michal Hocko
On Sun 25-06-17 08:14:13, Wei Yang wrote:
> On Mon, May 15, 2017 at 10:58:24AM +0200, Michal Hocko wrote:
> >From: Michal Hocko 
> >
> [...]
> >+void move_pfn_range_to_zone(struct zone *zone,
> >+		unsigned long start_pfn, unsigned long nr_pages)
> >+{
> >+	struct pglist_data *pgdat = zone->zone_pgdat;
> >+	int nid = pgdat->node_id;
> >+	unsigned long flags;
> >+	unsigned long i;
> 
> This is an unused variable:
> 
>   mm/memory_hotplug.c: In function ‘move_pfn_range_to_zone’:
>   mm/memory_hotplug.c:895:16: warning: unused variable ‘i’ [-Wunused-variable]
> 
> Do you suggest I write a patch, or will you fix it in your later rework?

Please send a fix for your
http://lkml.kernel.org/r/20170616092335.5177-2-richard.weiy...@gmail.com
and Andrew will fold it into that patch.

> 
> >+
> >+	if (zone_is_empty(zone))
> >+		init_currently_empty_zone(zone, start_pfn, nr_pages);
> >+
> >+	clear_zone_contiguous(zone);
> >+
> >+	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
> >+	pgdat_resize_lock(pgdat, &flags);
> >+	zone_span_writelock(zone);
> >+	resize_zone_range(zone, start_pfn, nr_pages);
> >+	zone_span_writeunlock(zone);
> >+	resize_pgdat_range(pgdat, start_pfn, nr_pages);
> >+	pgdat_resize_unlock(pgdat, &flags);
> >+
> >+	/*
> >+	 * TODO now we have a visible range of pages which are not associated
> >+	 * with their zone properly. Not nice but set_pfnblock_flags_mask
> >+	 * expects the zone spans the pfn range. All the pages in the range
> >+	 * are reserved so nobody should be touching them so we should be safe
> >+	 */
> >+	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, MEMMAP_HOTPLUG);
> >+	for (i = 0; i < nr_pages; i++) {
> >+		unsigned long pfn = start_pfn + i;
> >+		set_page_links(pfn_to_page(pfn), zone_idx(zone), nid, pfn);
> >+	}
> > 
> >2.11.0
> 
> -- 
> Wei Yang
> Help you, Help me



-- 
Michal Hocko
SUSE Labs
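
For reference, a sketch of how move_pfn_range_to_zone() reads once the two
cleanups agreed in this thread are folded in, i.e. the unused 'i' dropped and
the explicit set_page_links() loop removed (an illustrative sketch based on
the code quoted above, not necessarily the final merged version):

void move_pfn_range_to_zone(struct zone *zone,
		unsigned long start_pfn, unsigned long nr_pages)
{
	struct pglist_data *pgdat = zone->zone_pgdat;
	int nid = pgdat->node_id;
	unsigned long flags;

	if (zone_is_empty(zone))
		init_currently_empty_zone(zone, start_pfn, nr_pages);

	clear_zone_contiguous(zone);

	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
	pgdat_resize_lock(pgdat, &flags);
	zone_span_writelock(zone);
	resize_zone_range(zone, start_pfn, nr_pages);
	zone_span_writeunlock(zone);
	resize_pgdat_range(pgdat, start_pfn, nr_pages);
	pgdat_resize_unlock(pgdat, &flags);

	/*
	 * memmap_init_zone() already links each page with the zone/node via
	 * __init_single_page()->set_page_links(), so no per-page loop is
	 * needed here.
	 */
	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, MEMMAP_HOTPLUG);
}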


Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-06-24 Thread Wei Yang
On Mon, May 15, 2017 at 10:58:24AM +0200, Michal Hocko wrote:
>From: Michal Hocko 
>
[...]
>+void move_pfn_range_to_zone(struct zone *zone,
>+  unsigned long start_pfn, unsigned long nr_pages)
>+{
>+  struct pglist_data *pgdat = zone->zone_pgdat;
>+  int nid = pgdat->node_id;
>+  unsigned long flags;
>+  unsigned long i;

This is an unused variable:

  mm/memory_hotplug.c: In function ‘move_pfn_range_to_zone’:
  mm/memory_hotplug.c:895:16: warning: unused variable ‘i’ [-Wunused-variable]

Do you suggest I write a patch, or will you fix it in your later rework?

>+
>+  if (zone_is_empty(zone))
>+  init_currently_empty_zone(zone, start_pfn, nr_pages);
>+
>+  clear_zone_contiguous(zone);
>+
>+  /* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>+  pgdat_resize_lock(pgdat, &flags);
>+  zone_span_writelock(zone);
>+  resize_zone_range(zone, start_pfn, nr_pages);
>+  zone_span_writeunlock(zone);
>+  resize_pgdat_range(pgdat, start_pfn, nr_pages);
>+  pgdat_resize_unlock(pgdat, &flags);
>+
>+  /*
>+   * TODO now we have a visible range of pages which are not associated
>+   * with their zone properly. Not nice but set_pfnblock_flags_mask
>+   * expects the zone spans the pfn range. All the pages in the range
>+   * are reserved so nobody should be touching them so we should be safe
>+   */
>+  memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, MEMMAP_HOTPLUG);
>+  for (i = 0; i < nr_pages; i++) {
>+  unsigned long pfn = start_pfn + i;
>+  set_page_links(pfn_to_page(pfn), zone_idx(zone), nid, pfn);
>   }
> 
>2.11.0

-- 
Wei Yang
Help you, Help me


signature.asc
Description: PGP signature


Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-06-16 Thread Wei Yang
On Fri, Jun 16, 2017 at 10:45:55AM +0200, Michal Hocko wrote:
>On Fri 16-06-17 16:11:42, Wei Yang wrote:
>> Well, I love this patch a lot. We don't need to put the hotadd memory in one
>> zone and move it to another. This looks great!
>> 
>> On Mon, May 15, 2017 at 10:58:24AM +0200, Michal Hocko wrote:
>> >From: Michal Hocko 
>> >
>> [...]
>> +
>> >+void move_pfn_range_to_zone(struct zone *zone,
>> >+   unsigned long start_pfn, unsigned long nr_pages)
>> >+{
>> >+   struct pglist_data *pgdat = zone->zone_pgdat;
>> >+   int nid = pgdat->node_id;
>> >+   unsigned long flags;
>> >+   unsigned long i;
>> >+
>> >+   if (zone_is_empty(zone))
>> >+   init_currently_empty_zone(zone, start_pfn, nr_pages);
>> >+
>> >+   clear_zone_contiguous(zone);
>> >+
>> >+   /* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>> >+   pgdat_resize_lock(pgdat, &flags);
>> >+   zone_span_writelock(zone);
>> >+   resize_zone_range(zone, start_pfn, nr_pages);
>> >+   zone_span_writeunlock(zone);
>> >+   resize_pgdat_range(pgdat, start_pfn, nr_pages);
>> >+   pgdat_resize_unlock(pgdat, &flags);
>> >+
>> >+   /*
>> >+* TODO now we have a visible range of pages which are not associated
>> >+* with their zone properly. Not nice but set_pfnblock_flags_mask
>> >+* expects the zone spans the pfn range. All the pages in the range
>> >+* are reserved so nobody should be touching them so we should be safe
>> >+*/
>> >+   memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, MEMMAP_HOTPLUG);
>> >+   for (i = 0; i < nr_pages; i++) {
>> >+   unsigned long pfn = start_pfn + i;
>> >+   set_page_links(pfn_to_page(pfn), zone_idx(zone), nid, pfn);
>> >}
>> 
>> memmap_init_zone()->__init_single_page()->set_page_links()
>> 
>> Am I missing something, or is there a reason you call set_page_links() explicitly here?
>
>I guess you are right. Not sure why I've done this explicitly. I've most
>probably just missed that. Could you post a patch that removes the for
>loop?
>

Sure, I will come up with two patches based on your auto-latest branch.

>Thanks!
>-- 
>Michal Hocko
>SUSE Labs

-- 
Wei Yang
Help you, Help me


signature.asc
Description: PGP signature


Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-06-16 Thread Michal Hocko
On Fri 16-06-17 16:11:42, Wei Yang wrote:
> Well, I love this patch a lot. We don't need to put the hotadd memory in one
> zone and move it to another. This looks great!
> 
> On Mon, May 15, 2017 at 10:58:24AM +0200, Michal Hocko wrote:
> >From: Michal Hocko 
> >
> [...]
> +
> >+void move_pfn_range_to_zone(struct zone *zone,
> >+		unsigned long start_pfn, unsigned long nr_pages)
> >+{
> >+	struct pglist_data *pgdat = zone->zone_pgdat;
> >+	int nid = pgdat->node_id;
> >+	unsigned long flags;
> >+	unsigned long i;
> >+
> >+	if (zone_is_empty(zone))
> >+		init_currently_empty_zone(zone, start_pfn, nr_pages);
> >+
> >+	clear_zone_contiguous(zone);
> >+
> >+	/* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
> >+	pgdat_resize_lock(pgdat, &flags);
> >+	zone_span_writelock(zone);
> >+	resize_zone_range(zone, start_pfn, nr_pages);
> >+	zone_span_writeunlock(zone);
> >+	resize_pgdat_range(pgdat, start_pfn, nr_pages);
> >+	pgdat_resize_unlock(pgdat, &flags);
> >+
> >+	/*
> >+	 * TODO now we have a visible range of pages which are not associated
> >+	 * with their zone properly. Not nice but set_pfnblock_flags_mask
> >+	 * expects the zone spans the pfn range. All the pages in the range
> >+	 * are reserved so nobody should be touching them so we should be safe
> >+	 */
> >+	memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, MEMMAP_HOTPLUG);
> >+	for (i = 0; i < nr_pages; i++) {
> >+		unsigned long pfn = start_pfn + i;
> >+		set_page_links(pfn_to_page(pfn), zone_idx(zone), nid, pfn);
> >+	}
> 
> memmap_init_zone()->__init_single_page()->set_page_links()
> 
> Am I missing something, or is there a reason you call set_page_links() explicitly here?

I guess you are right. Not sure why I've done this explicitly. I've most
probably just missed that. Could you post a patch that removes the for
loop?

Thanks!
-- 
Michal Hocko
SUSE Labs


Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-06-16 Thread Wei Yang
Well, I love this patch a lot. We don't need to put the hotadd memory in one
zone and move it to another. This looks great!

On Mon, May 15, 2017 at 10:58:24AM +0200, Michal Hocko wrote:
>From: Michal Hocko 
>
[...]
+
>+void move_pfn_range_to_zone(struct zone *zone,
>+  unsigned long start_pfn, unsigned long nr_pages)
>+{
>+  struct pglist_data *pgdat = zone->zone_pgdat;
>+  int nid = pgdat->node_id;
>+  unsigned long flags;
>+  unsigned long i;
>+
>+  if (zone_is_empty(zone))
>+  init_currently_empty_zone(zone, start_pfn, nr_pages);
>+
>+  clear_zone_contiguous(zone);
>+
>+  /* TODO Huh pgdat is irqsave while zone is not. It used to be like that before */
>+  pgdat_resize_lock(pgdat, &flags);
>+  zone_span_writelock(zone);
>+  resize_zone_range(zone, start_pfn, nr_pages);
>+  zone_span_writeunlock(zone);
>+  resize_pgdat_range(pgdat, start_pfn, nr_pages);
>+  pgdat_resize_unlock(pgdat, &flags);
>+
>+  /*
>+   * TODO now we have a visible range of pages which are not associated
>+   * with their zone properly. Not nice but set_pfnblock_flags_mask
>+   * expects the zone spans the pfn range. All the pages in the range
>+   * are reserved so nobody should be touching them so we should be safe
>+   */
>+  memmap_init_zone(nr_pages, nid, zone_idx(zone), start_pfn, MEMMAP_HOTPLUG);
>+  for (i = 0; i < nr_pages; i++) {
>+  unsigned long pfn = start_pfn + i;
>+  set_page_links(pfn_to_page(pfn), zone_idx(zone), nid, pfn);
>   }

memmap_init_zone()->__init_single_page()->set_page_links()

Am I missing something, or is there a reason you call set_page_links() explicitly here?

-- 
Wei Yang
Help you, Help me


signature.asc
Description: PGP signature
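
The call chain Wei Yang points at: memmap_init_zone() initializes every page
in the range through __init_single_page(), which already does the linking. A
condensed sketch of that helper as it looked in this era (paraphrased from
mm/page_alloc.c, with some of the initialization elided):

static void __meminit __init_single_page(struct page *page, unsigned long pfn,
					 unsigned long zone, int nid)
{
	/* encode zone, node and section into page->flags */
	set_page_links(page, zone, nid, pfn);
	init_page_count(page);
	page_mapcount_reset(page);
	page_cpupid_reset_last(page);
	INIT_LIST_HEAD(&page->lru);
	/* ... */
}

So the extra set_page_links() loop in move_pfn_range_to_zone() repeats work
that memmap_init_zone() has already done.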


Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-06-16 Thread Michal Hocko
[Please try to trim the context you are replying to]

On Fri 16-06-17 12:20:58, Wei Yang wrote:
> On Mon, May 15, 2017 at 10:58:24AM +0200, Michal Hocko wrote:
[...]
> > /*
> >+ * Return true if [start_pfn, start_pfn + nr_pages) range has a non-empty
> >+ * intersection with the given zone
> >+ */
> >+static inline bool zone_intersects(struct zone *zone,
> >+		unsigned long start_pfn, unsigned long nr_pages)
> >+{
> >+	if (zone_is_empty(zone))
> >+		return false;
> >+	if (start_pfn >= zone_end_pfn(zone))
> >+		return false;
> >+
> >+	if (zone->zone_start_pfn <= start_pfn)
> >+		return true;
> >+	if (start_pfn + nr_pages > zone->zone_start_pfn)
> >+		return true;
> >+
> >+	return false;
> >+}
> 
> I think this could be simplified as:
> 
> static inline bool zone_intersects(struct zone *zone,
>   unsigned long start_pfn, unsigned long nr_pages)
> {
>   if (zone_is_empty(zone))
>   return false;
> 
>   if (start_pfn >= zone_end_pfn(zone) ||
>   start_pfn + nr_pages <= zone->zone_start_pfn)
>   return false;
> 
>   return true;
> }

Feel free to send a patch.
-- 
Michal Hocko
SUSE Labs
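
Both versions implement the same half-open interval overlap test, i.e. whether
[start_pfn, start_pfn + nr_pages) overlaps [zone_start_pfn, zone_end_pfn). A
quick user-space harness (stubbed types, not kernel code) to convince oneself
the two predicates agree for all non-degenerate ranges:

#include <stdbool.h>
#include <stdio.h>

struct zone { unsigned long zone_start_pfn, spanned_pages; };

static unsigned long zone_end_pfn(const struct zone *z)
{
	return z->zone_start_pfn + z->spanned_pages;
}

static bool zone_is_empty(const struct zone *z)
{
	return z->spanned_pages == 0;
}

/* the original form, one early return per sub-case */
static bool intersects_v1(const struct zone *z, unsigned long s, unsigned long n)
{
	if (zone_is_empty(z))
		return false;
	if (s >= zone_end_pfn(z))
		return false;
	if (z->zone_start_pfn <= s)
		return true;
	if (s + n > z->zone_start_pfn)
		return true;
	return false;
}

/* the simplified form suggested above */
static bool intersects_v2(const struct zone *z, unsigned long s, unsigned long n)
{
	if (zone_is_empty(z))
		return false;
	if (s >= zone_end_pfn(z) || s + n <= z->zone_start_pfn)
		return false;
	return true;
}

int main(void)
{
	/* brute force small spans; nr_pages == 0 is degenerate and skipped */
	for (unsigned long zs = 0; zs < 8; zs++)
	for (unsigned long zn = 0; zn < 8; zn++)
	for (unsigned long s = 0; s < 8; s++)
	for (unsigned long n = 1; n < 8; n++) {
		struct zone z = { zs, zn };
		if (intersects_v1(&z, s, n) != intersects_v2(&z, s, n)) {
			printf("mismatch: zs=%lu zn=%lu s=%lu n=%lu\n", zs, zn, s, n);
			return 1;
		}
	}
	printf("predicates agree\n");
	return 0;
}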


Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-06-15 Thread Wei Yang
On Mon, May 15, 2017 at 10:58:24AM +0200, Michal Hocko wrote:
>From: Michal Hocko 
>
>The current memory hotplug implementation relies on having all the
>struct pages associated with a zone/node during the physical hotplug phase
>(arch_add_memory->__add_pages->__add_section->__add_zone). In the vast
>majority of cases this means that they are added to ZONE_NORMAL. This
>has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory hotadd
>without sparsemem") and it wasn't a big deal back then because movable
>onlining didn't exist yet.
>
>Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
>onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable
>memory and portion memory") and then things got more complicated. Rather
>than reconsidering the zone association which was no longer needed
>(because the memory hotplug already depended on SPARSEMEM) a convoluted
>semantic of zone shifting has been developed. Only the currently last
>memblock or the one adjacent to the zone_movable can be onlined movable.
>This essentially means that the online type changes as the new memblocks
>are added.
>
>Let's simulate memory hot online manually
>$ echo 0x100000000 > /sys/devices/system/memory/probe
>$ grep . /sys/devices/system/memory/memory32/valid_zones
>Normal Movable
>
>$ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
>$ grep . /sys/devices/system/memory/memory3?/valid_zones
>/sys/devices/system/memory/memory32/valid_zones:Normal
>/sys/devices/system/memory/memory33/valid_zones:Normal Movable
>
>$ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
>$ grep . /sys/devices/system/memory/memory3?/valid_zones
>/sys/devices/system/memory/memory32/valid_zones:Normal
>/sys/devices/system/memory/memory33/valid_zones:Normal
>/sys/devices/system/memory/memory34/valid_zones:Normal Movable
>
>$ echo online_movable > /sys/devices/system/memory/memory34/state
>$ grep . /sys/devices/system/memory/memory3?/valid_zones
>/sys/devices/system/memory/memory32/valid_zones:Normal
>/sys/devices/system/memory/memory33/valid_zones:Normal Movable
>/sys/devices/system/memory/memory34/valid_zones:Movable Normal
>
>This is an awkward semantic because a udev event is sent as soon as the
>block is onlined and a udev handler might want to online it based on
>some policy (e.g. association with a node) but it will inherently race
>with new blocks showing up.
>
>This patch changes the physical online phase to not associate pages
>with any zone at all. All the pages are just marked reserved and wait
>for the onlining phase to be associated with the zone as per the online
>request. There are only two requirements:
>   - existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
>   - ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
>The latter is not an inherent requirement and is subject to change in the
>future; for now it preserves the current behavior and keeps the code
>slightly simpler.
>
>This means that the same physical online steps as above will lead to the
>following state:
>Normal Movable
>
>/sys/devices/system/memory/memory32/valid_zones:Normal Movable
>/sys/devices/system/memory/memory33/valid_zones:Normal Movable
>
>/sys/devices/system/memory/memory32/valid_zones:Normal Movable
>/sys/devices/system/memory/memory33/valid_zones:Normal Movable
>/sys/devices/system/memory/memory34/valid_zones:Normal Movable
>
>/sys/devices/system/memory/memory32/valid_zones:Normal Movable
>/sys/devices/system/memory/memory33/valid_zones:Normal Movable
>/sys/devices/system/memory/memory34/valid_zones:Movable
>
>Implementation:
>The current move_pfn_range is reimplemented to check the above
>requirements (allow_online_pfn_range) and then updates the respective
>zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
>pfn range with the zone/node. __add_pages is updated to not require the
>zone and only initializes sections in the range. This allows us to
>simplify the arch_add_memory code (s390 could get rid of quite some
>code).
>
>devm_memremap_pages is the only user of arch_add_memory which relies
>on the zone association because it hooks into memory hotplug only
>half way. It uses it to associate the new memory with ZONE_DEVICE
>but doesn't allow it to be {on,off}lined via sysfs. This means that this
>particular code path has to call move_pfn_range_to_zone explicitly.
>
>The original zone shifting code is kept in place and will be removed in
>the follow up patch for an easier review.
>
>Please note that this patch also changes the original behavior: offlining
>a memory block adjacent to another zone (Normal vs. Movable) used to allow
>changing its movable type. This will be handled later.
>
>Changes since v1
>- we have to associate the page with the node early (in __add_section),
>  because pfn_to_node depends on struct page containing this
>  information - based on testing by Reza Arbab
>- resize_{zone,pgdat}_range has to check 

Re: [PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-05-19 Thread Vlastimil Babka
On 05/15/2017 10:58 AM, Michal Hocko wrote:
> From: Michal Hocko 
> 
> The current memory hotplug implementation relies on having all the
> struct pages associated with a zone/node during the physical hotplug phase
> (arch_add_memory->__add_pages->__add_section->__add_zone). In the vast
> majority of cases this means that they are added to ZONE_NORMAL. This
> has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory hotadd
> without sparsemem") and it wasn't a big deal back then because movable
> onlining didn't exist yet.
> 
> Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
> onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable
> memory and portion memory") and then things got more complicated. Rather
> than reconsidering the zone association which was no longer needed
> (because the memory hotplug already depended on SPARSEMEM) a convoluted
> semantic of zone shifting has been developed. Only the currently last
> memblock or the one adjacent to the zone_movable can be onlined movable.
> This essentially means that the online type changes as the new memblocks
> are added.
> 
> Let's simulate memory hot online manually
> $ echo 0x100000000 > /sys/devices/system/memory/probe
> $ grep . /sys/devices/system/memory/memory32/valid_zones
> Normal Movable
> 
> $ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
> $ grep . /sys/devices/system/memory/memory3?/valid_zones
> /sys/devices/system/memory/memory32/valid_zones:Normal
> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
> 
> $ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
> $ grep . /sys/devices/system/memory/memory3?/valid_zones
> /sys/devices/system/memory/memory32/valid_zones:Normal
> /sys/devices/system/memory/memory33/valid_zones:Normal
> /sys/devices/system/memory/memory34/valid_zones:Normal Movable
> 
> $ echo online_movable > /sys/devices/system/memory/memory34/state
> $ grep . /sys/devices/system/memory/memory3?/valid_zones
> /sys/devices/system/memory/memory32/valid_zones:Normal
> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
> /sys/devices/system/memory/memory34/valid_zones:Movable Normal
> 
> This is an awkward semantic because a udev event is sent as soon as the
> block is onlined and a udev handler might want to online it based on
> some policy (e.g. association with a node) but it will inherently race
> with new blocks showing up.
> 
> This patch changes the physical online phase to not associate pages
> with any zone at all. All the pages are just marked reserved and wait
> for the onlining phase to be associated with the zone as per the online
> request. There are only two requirements:
>   - existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
>   - ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
> The latter is not an inherent requirement and is subject to change in the
> future; for now it preserves the current behavior and keeps the code
> slightly simpler.
> 
> This means that the same physical online steps as above will lead to the
> following state:
> Normal Movable
> 
> /sys/devices/system/memory/memory32/valid_zones:Normal Movable
> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
> 
> /sys/devices/system/memory/memory32/valid_zones:Normal Movable
> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
> /sys/devices/system/memory/memory34/valid_zones:Normal Movable
> 
> /sys/devices/system/memory/memory32/valid_zones:Normal Movable
> /sys/devices/system/memory/memory33/valid_zones:Normal Movable
> /sys/devices/system/memory/memory34/valid_zones:Movable
> 
> Implementation:
> The current move_pfn_range is reimplemented to check the above
> requirements (allow_online_pfn_range) and then updates the respective
> zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
> pfn range with the zone/node. __add_pages is updated to not require the
> zone and only initializes sections in the range. This allows us to
> simplify the arch_add_memory code (s390 could get rid of quite some
> code).
> 
> devm_memremap_pages is the only user of arch_add_memory which relies
> on the zone association because it hooks into memory hotplug only
> half way. It uses it to associate the new memory with ZONE_DEVICE
> but doesn't allow it to be {on,off}lined via sysfs. This means that this
> particular code path has to call move_pfn_range_to_zone explicitly.
> 
> The original zone shifting code is kept in place and will be removed in
> the follow up patch for an easier review.
> 
> Please note that this patch also changes the original behavior: offlining
> a memory block adjacent to another zone (Normal vs. Movable) used to allow
> changing its movable type. This will be handled later.
> 
> Changes since v1
> - we have to associate the page with the node early (in __add_section),
>   because pfn_to_node depends on struct page containing this
>   

[PATCH 11/14] mm, memory_hotplug: do not associate hotadded memory to zones until online

2017-05-15 Thread Michal Hocko
From: Michal Hocko 

The current memory hotplug implementation relies on having all the
struct pages associated with a zone/node during the physical hotplug phase
(arch_add_memory->__add_pages->__add_section->__add_zone). In the vast
majority of cases this means that they are added to ZONE_NORMAL. This
has been so since 9d99aaa31f59 ("[PATCH] x86_64: Support memory hotadd
without sparsemem") and it wasn't a big deal back then because movable
onlining didn't exist yet.

Much later memory hotplug wanted to (ab)use ZONE_MOVABLE for movable
onlining 511c2aba8f07 ("mm, memory-hotplug: dynamic configure movable
memory and portion memory") and then things got more complicated. Rather
than reconsidering the zone association which was no longer needed
(because the memory hotplug already depended on SPARSEMEM) a convoluted
semantic of zone shifting has been developed. Only the currently last
memblock or the one adjacent to the zone_movable can be onlined movable.
This essentially means that the online type changes as the new memblocks
are added.

Let's simulate memory hot online manually
$ echo 0x100000000 > /sys/devices/system/memory/probe
$ grep . /sys/devices/system/memory/memory32/valid_zones
Normal Movable

$ echo $((0x100000000+(128<<20))) > /sys/devices/system/memory/probe
$ grep . /sys/devices/system/memory/memory3?/valid_zones
/sys/devices/system/memory/memory32/valid_zones:Normal
/sys/devices/system/memory/memory33/valid_zones:Normal Movable

$ echo $((0x100000000+2*(128<<20))) > /sys/devices/system/memory/probe
$ grep . /sys/devices/system/memory/memory3?/valid_zones
/sys/devices/system/memory/memory32/valid_zones:Normal
/sys/devices/system/memory/memory33/valid_zones:Normal
/sys/devices/system/memory/memory34/valid_zones:Normal Movable

$ echo online_movable > /sys/devices/system/memory/memory34/state
$ grep . /sys/devices/system/memory/memory3?/valid_zones
/sys/devices/system/memory/memory32/valid_zones:Normal
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Movable Normal

This is an awkward semantic because a udev event is sent as soon as the
block is onlined and a udev handler might want to online it based on
some policy (e.g. association with a node) but it will inherently race
with new blocks showing up.

This patch changes the physical online phase to not associate pages
with any zone at all. All the pages are just marked reserved and wait
for the onlining phase to be associated with the zone as per the online
request. There are only two requirements:
- existing ZONE_NORMAL and ZONE_MOVABLE cannot overlap
- ZONE_NORMAL precedes ZONE_MOVABLE in physical addresses
The latter is not an inherent requirement and is subject to change in the
future; for now it preserves the current behavior and keeps the code
slightly simpler.

This means that the same physical online steps as above will lead to the
following state:
Normal Movable

/sys/devices/system/memory/memory32/valid_zones:Normal Movable
/sys/devices/system/memory/memory33/valid_zones:Normal Movable

/sys/devices/system/memory/memory32/valid_zones:Normal Movable
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Normal Movable

/sys/devices/system/memory/memory32/valid_zones:Normal Movable
/sys/devices/system/memory/memory33/valid_zones:Normal Movable
/sys/devices/system/memory/memory34/valid_zones:Movable

Implementation:
The current move_pfn_range is reimplemented to check the above
requirements (allow_online_pfn_range) and then updates the respective
zone (move_pfn_range_to_zone), the pgdat and links all the pages in the
pfn range with the zone/node. __add_pages is updated to not require the
zone and only initializes sections in the range. This allows us to
simplify the arch_add_memory code (s390 could get rid of quite some
code).
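
The ordering rules above reduce to pfn comparisons against the node's
existing ZONE_NORMAL and ZONE_MOVABLE spans. An illustrative sketch of the
kind of check allow_online_pfn_range performs (paraphrased, not the patch's
verbatim code):

bool allow_online_pfn_range(int nid, unsigned long pfn,
		unsigned long nr_pages, int online_type)
{
	struct pglist_data *pgdat = NODE_DATA(nid);
	struct zone *movable = &pgdat->node_zones[ZONE_MOVABLE];
	struct zone *normal = &pgdat->node_zones[ZONE_NORMAL];

	if (online_type == MMOP_ONLINE_KERNEL) {
		/* Normal may not grow into a populated Movable zone */
		return zone_is_empty(movable) ||
			pfn + nr_pages <= movable->zone_start_pfn;
	}
	if (online_type == MMOP_ONLINE_MOVABLE) {
		/* Movable must lie wholly above the end of Normal */
		return zone_end_pfn(normal) <= pfn;
	}

	/* MMOP_ONLINE_KEEP inherits whatever zone the range defaults to */
	return true;
}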

devm_memremap_pages is the only user of arch_add_memory which relies
on the zone association because it hooks into memory hotplug only
half way. It uses it to associate the new memory with ZONE_DEVICE
but doesn't allow it to be {on,off}lined via sysfs. This means that this
particular code path has to call move_pfn_range_to_zone explicitly.
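
Concretely, the ZONE_DEVICE path ends up pairing the two calls itself, along
these lines (a sketch of the call pattern only; variable names are
illustrative and the real devm_memremap_pages derives its range from the
resource it maps):

	mem_hotplug_begin();
	rc = arch_add_memory(nid, align_start, align_size, true);
	if (!rc)
		move_pfn_range_to_zone(&NODE_DATA(nid)->node_zones[ZONE_DEVICE],
				align_start >> PAGE_SHIFT,
				align_size >> PAGE_SHIFT);
	mem_hotplug_done();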

The original zone shifting code is kept in place and will be removed in
the follow up patch for an easier review.

Please note that this patch also changes the original behavior: offlining
a memory block adjacent to another zone (Normal vs. Movable) used to allow
changing its movable type. This will be handled later.

Changes since v1
- we have to associate the page with the node early (in __add_section),
  because pfn_to_node depends on struct page containing this
  information - based on testing by Reza Arbab
- resize_{zone,pgdat}_range has to check whether they are populated -
  Reza Arbab
- fix devm_memremap_pages to use pfn rather than physical address -
  Jérôme Glisse
- move_pfn_range has to check