RE: [RFC][PATCH 0/2] x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed

2017-06-15 Thread Izumi, Taku
Dear Baoquan,

> > Our customer reported that Kernel text may be located on non-mirror
> > region (movable zone) when both address range mirroring feature and
> > KASLR are enabled.

   I know your customer :)

> > The address range mirroring feature works as follows.
> > - Physical memory regions whose descriptors in the EFI memory map have
> >   the EFI_MEMORY_MORE_RELIABLE attribute (bit 16) are mirrored.
> > - The feature arranges such mirror regions into the normal zone and other
> >   regions into the movable zone in order to locate kernel code and data
> >   in mirror regions.
> >
> > So we need to restrict the kernel to be located inside a mirror region
> > if one exists.
> >
> > The method is very simple. If EFI is enabled, just iterate over the EFI
> > memory map and pick up mirror regions to process as slot candidates.
> > If EFI is disabled or no mirror region exists, still process the
> > e820 memory map. This won't cost much efficiency: at worst we
> > just go through the whole EFI memory map and find no mirror region.
> >
> > One question:
> > From the code, even though mirror regions exist, they are meaningful only
> > if the kernelcore=mirror kernel option is specified. Not sure if my
> > understanding is correct.

   Your understanding is almost correct.
   The above procedure works only when "kernelcore=mirror" is specified.
   However, if mirrored regions exist, the bootmem allocator tries to
   allocate from mirrored regions regardless of the "kernelcore=mirror" option.

   So, IMHO, kernel text is important, and putting it in a mirrored (more reliable)
   region is reasonable whether or not "kernelcore=mirror" is specified.

   Anyway, thanks for submitting the patch.
   We have an Address Range Mirroring capable machine, so we'll test your patch.

  Sincerely,
  Taku Izumi

> 
> Since you are the author of kernelcore=mirror related code and expert on
> mirror feature, could you help answer above question?
> 
> Thanks
> Baoquan
> >
> > NOTE:
> > I haven't got a machine with EFI mirror regions enabled, so I only tested
> > the e820 map processing case and the case of no mirror region on an EFI
> > machine. So I set this as an RFC patchset, and will post a formal one after
> > the above question is made clear and the mirror test has passed.
> >
> > Baoquan He (2):
> >   x86/boot/KASLR: Adapt process_e820_entry for all kinds of memory map
> >   x86/boot/KASLR: Restrict kernel to be randomized in mirror regions if existed
> >
> >  arch/x86/boot/compressed/kaslr.c | 129 +++
> >  1 file changed, 104 insertions(+), 25 deletions(-)
> >
> > --
> > 2.5.5
> >
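
A minimal sketch of the selection logic discussed in this message, written as user-space C rather than the actual arch/x86/boot/compressed/kaslr.c code. The names `struct mem_desc`, `region_fn`, `find_slot_candidates` and `sum_sizes` are hypothetical and exist only for illustration; the only detail taken from the thread is that EFI_MEMORY_MORE_RELIABLE is attribute bit 16. The sketch follows the policy suggested in the reply: prefer mirrored regions whenever any exist, independent of kernelcore=mirror, and otherwise fall back to the e820 entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Attribute bit 16 of an EFI memory descriptor, as quoted in the thread. */
#define EFI_MEMORY_MORE_RELIABLE (1ULL << 16)

/* Hypothetical, simplified memory map entry (real descriptors carry more). */
struct mem_desc {
    uint64_t start;
    uint64_t size;
    uint64_t attribute;
};

/* Hypothetical callback standing in for the per-region slot bookkeeping. */
typedef void (*region_fn)(uint64_t start, uint64_t size, void *ctx);

/* Example callback: accumulate the total number of candidate bytes. */
static void sum_sizes(uint64_t start, uint64_t size, void *ctx)
{
    (void)start;
    *(uint64_t *)ctx += size;
}

/*
 * Offer mirrored EFI regions as slot candidates. If EFI is unavailable
 * (efi_map == NULL) or no region is mirrored, fall back to every e820 entry.
 * Returns true when mirrored regions were used.
 */
static bool find_slot_candidates(const struct mem_desc *efi_map, size_t efi_n,
                                 const struct mem_desc *e820_map, size_t e820_n,
                                 region_fn add_region, void *ctx)
{
    bool found_mirror = false;

    if (efi_map) {
        for (size_t i = 0; i < efi_n; i++) {
            if (efi_map[i].attribute & EFI_MEMORY_MORE_RELIABLE) {
                add_region(efi_map[i].start, efi_map[i].size, ctx);
                found_mirror = true;
            }
        }
    }
    if (found_mirror)
        return true;

    /* Worst case: the EFI map was walked once and nothing was mirrored. */
    for (size_t i = 0; i < e820_n; i++)
        add_region(e820_map[i].start, e820_map[i].size, ctx);
    return false;
}
```

The fallback costs one extra pass over the EFI map, which matches the "at worst we just go through the whole map" remark in the cover letter.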



RE: [bug discuss] fjes driver call trace warning, "PNP0C02" used in fjes seems like a bug,

2016-06-09 Thread Izumi, Taku
Dear Gao,

> From a SW perspective it looks like an ACPI driver that uses "PNP0C02"
> as its driver ID to perform the driver match in the ACPI table.
> 
> From my understanding this is wrong in principle because that identifier
> must be used to reserve motherboard resources (see par. 4.1.2 of the PCI
> Firmware Specification v3.2).
> 
> Therefore this identifier is used by
> http://lxr.free-electrons.com/source/drivers/pnp/system.c
> to reserve such resources.
> 
> Basically your driver is breaking any other device that
> needs to reserve motherboard resources through system.c
> driver.
> 
> @David Miller, what is your opinion about this?
> I think this driver should be reverted...

 I'm willing to revise my driver if something is wrong with it.
 I can't reproduce this problem. Could you please show me how to reproduce it?

 Sincerely,
 Taku Izumi


RE: [PATCH] net: fjes: fjes_main: Remove create_workqueue

2016-06-02 Thread Izumi, Taku
Dear Bhaktipriya,

Thanks. Looks good to me.

Sincerely,
Taku Izumi

> -----Original Message-----
> From: Bhaktipriya Shridhar [mailto:bhaktipriy...@gmail.com]
> Sent: Thursday, June 02, 2016 6:31 PM
> To: David S. Miller; Izumi, Taku/泉 拓; Florian Westphal; Bhaktipriya Shridhar
> Cc: Tejun Heo; net...@vger.kernel.org; linux-kernel@vger.kernel.org
> Subject: [PATCH] net: fjes: fjes_main: Remove create_workqueue
> 
> alloc_workqueue replaces deprecated create_workqueue().
> 
> The workqueue adapter->txrx_wq has workitem
> &adapter->raise_intr_rxdata_task per adapter. Extended Socket Network
> Device is shared memory based, so someone's transmission denotes other's
> reception.  raise_intr_rxdata_task raises interruption of receivers from
> the sender in order to notify receivers.
> 
> The workqueue adapter->control_wq has workitem
> &adapter->interrupt_watch_task per adapter. interrupt_watch_task is used
> to prevent delay of interrupts.
> 
> Dedicated workqueues have been used in both cases since the workitems
> on the workqueues are involved in normal device operation and require
> forward progress under memory pressure.
> 
> max_active has been set to 0 since there is no need for throttling
> the number of active work items.
> 
> Since network devices  may be used for memory reclaim,
> WQ_MEM_RECLAIM has been set to guarantee forward progress.
> 
> Signed-off-by: Bhaktipriya Shridhar <bhaktipriy...@gmail.com>
> ---
>  drivers/net/fjes/fjes_main.c | 5 +++--
>  1 file changed, 3 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/net/fjes/fjes_main.c b/drivers/net/fjes/fjes_main.c
> index 86c331b..9006877 100644
> --- a/drivers/net/fjes/fjes_main.c
> +++ b/drivers/net/fjes/fjes_main.c
> @@ -1187,8 +1187,9 @@ static int fjes_probe(struct platform_device *plat_dev)
>   adapter->force_reset = false;
>   adapter->open_guard = false;
> 
> - adapter->txrx_wq = create_workqueue(DRV_NAME "/txrx");
> - adapter->control_wq = create_workqueue(DRV_NAME "/control");
> + adapter->txrx_wq = alloc_workqueue(DRV_NAME "/txrx", WQ_MEM_RECLAIM, 0);
> + adapter->control_wq = alloc_workqueue(DRV_NAME "/control",
> +   WQ_MEM_RECLAIM, 0);
> 
>   INIT_WORK(&adapter->tx_stall_task, fjes_tx_stall_task);
>   INIT_WORK(&adapter->raise_intr_rxdata_task,
> --
> 2.1.4
> 



RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option

2015-12-16 Thread Izumi, Taku
Dear Xishi,

 Sorry for the late reply.

> -----Original Message-----
> From: Xishi Qiu [mailto:qiuxi...@huawei.com]
> Sent: Friday, December 11, 2015 6:44 PM
> To: Izumi, Taku/泉 拓
> Cc: Luck, Tony; linux-kernel@vger.kernel.org; linux...@kvack.org;
> a...@linux-foundation.org; Kamezawa, Hiroyuki/亀澤 寛之;
> m...@csn.ul.ie; Hansen, Dave; m...@codeblueprint.co.uk
> Subject: Re: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option
> 
> On 2015/12/11 13:53, Izumi, Taku wrote:
> 
> > Dear Xishi,
> >
> >> Hi Taku,
> >>
> >> Is it possible to rewrite the fallback function in the buddy system
> >> when zone_movable and mirrored_kernelcore are both enabled?
> >
> >   What does "when zone_movable and mirrored_kernelcore are both enabled?" mean?
> >
> >   My patchset just provides a new way to create ZONE_MOVABLE.
> >
> 
> Hi Taku,
> 
> I mean when zone_movable is from kernelcore=mirror, not kernelcore=nn[KMG].

  I'm not quite sure what you mean, but if you want to separate user memory
  so that some is allocated from the mirrored zone and some from the
  non-mirrored zone, I think it is possible to reuse my patchset.

  Sincerely,
  Taku Izumi

> Thanks,
> Xishi Qiu
> 
> >   Sincerely,
> >   Taku Izumi
> >>
> >> It seems like adding a new zone whose name is zone_movable rather than
> >> zone_mirror. The prerequisite is that we won't enable these two
> >> features (movable memory and mirrored memory) at the same time. Thus we can
> >> reuse the code of the movable zone.
> >>
> >
> > --
> > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
> > the body of a message to majord...@vger.kernel.org
> > More majordomo info at  http://vger.kernel.org/majordomo-info.html
> > Please read the FAQ at  http://www.tux.org/lkml/
> >
> > .
> >
> 
> 



RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option

2015-12-10 Thread Izumi, Taku
Dear Xishi,

> Hi Taku,
> 
> Is it possible to rewrite the fallback function in the buddy system
> when zone_movable and mirrored_kernelcore are both enabled?

  What does "when zone_movable and mirrored_kernelcore are both enabled?" mean?
  
  My patchset just provides a new way to create ZONE_MOVABLE.

  Sincerely,
  Taku Izumi
> 
> It seems like adding a new zone whose name is zone_movable rather than
> zone_mirror. The prerequisite is that we won't enable these two
> features (movable memory and mirrored memory) at the same time. Thus we can
> reuse the code of the movable zone.
> 
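
To make "a new way to create ZONE_MOVABLE" concrete, here is a simplified user-space sketch; it is not the mm/page_alloc.c implementation. It assumes a node is described as a list of page-frame ranges, each flagged as mirrored or not. `struct pfn_range`, `struct zone_layout` and `layout_node` are hypothetical names. Mirrored pages are counted toward the kernel (non-movable) zone, non-mirrored pages toward ZONE_MOVABLE, and ZONE_MOVABLE is taken to start at the lowest non-mirrored page frame.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical description of one contiguous range of page frames. */
struct pfn_range {
    uint64_t start_pfn;
    uint64_t end_pfn;   /* exclusive */
    bool mirrored;
};

/* Hypothetical result of arranging one node. */
struct zone_layout {
    uint64_t movable_start_pfn; /* lowest non-mirrored pfn, UINT64_MAX if none */
    uint64_t normal_present;    /* mirrored pages, used for kernel allocations */
    uint64_t movable_present;   /* non-mirrored pages, used for movable pages */
};

static struct zone_layout layout_node(const struct pfn_range *r, size_t n)
{
    struct zone_layout z = { UINT64_MAX, 0, 0 };

    for (size_t i = 0; i < n; i++) {
        uint64_t pages = r[i].end_pfn - r[i].start_pfn;

        if (r[i].mirrored) {
            z.normal_present += pages;
        } else {
            z.movable_present += pages;
            if (r[i].start_pfn < z.movable_start_pfn)
                z.movable_start_pfn = r[i].start_pfn;
        }
    }
    return z;
}
```

With a node laid out as mirrored, non-mirrored, mirrored, the two zones overlap in span but never in present pages: each page frame is present in exactly one of them.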



RE: [PATCH v3 2/2] mm: Introduce kernelcore=mirror option

2015-12-09 Thread Izumi, Taku
Dear Tony, Xishi,

> >> How about adding a comment: if mirrored memory is too small, then the
> >> normal zone is small, so the system may run out of memory (OOM).
> >> The mirrored memory must be at least 1/64 of all memory, because struct
> >> pages usually take 64 bytes per page.
> >
> > 1/64th is the absolute lower bound (for the page structures as you say). I
> > expect people will need to configure 10% or more to run any real workloads.

> >
> > I made the memblock boot time allocator fall back to non-mirrored memory
> > if mirrored memory ran out.  What happens in the run time allocator if the
> > non-movable zones run out of pages? Will we allocate kernel pages from
> > movable memory?
> >
> 
> As far as I know, kernel pages will not be allocated from the movable zone.

 Yes, kernel pages are not allocated from ZONE_MOVABLE.

 In this case, the administrator must review and reconfigure the mirror ratio via
 the "MirrorRequest" EFI variable.
 
  Sincerely,
  Taku Izumi
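
The 1/64 lower bound quoted in this message follows from simple arithmetic, sketched here under the assumption of 4 KiB pages and a 64-byte struct page. Both values are assumptions of this example (they are typical but not guaranteed), and `memmap_bytes` is a hypothetical helper, not a kernel function.

```c
#include <stdint.h>

/* Assumed for illustration: 4 KiB pages and 64 bytes of struct page each. */
#define PAGE_SIZE_BYTES   4096ULL
#define STRUCT_PAGE_BYTES 64ULL

/* Bytes of page structures needed to describe total_bytes of RAM. */
static uint64_t memmap_bytes(uint64_t total_bytes)
{
    uint64_t pages = total_bytes / PAGE_SIZE_BYTES;

    return pages * STRUCT_PAGE_BYTES;
}
```

For a 64 GiB machine this gives 1 GiB of page structures, that is 64/4096 = 1/64 of memory, before any other kernel allocation is counted. This is why the thread treats 1/64 as an absolute floor and expects real workloads to need considerably more mirrored memory.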

>
> 
> 



RE: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option

2015-12-08 Thread Izumi, Taku
Dear Xishi,

 Thanks for reviewing.

> -----Original Message-----
> From: Xishi Qiu [mailto:qiuxi...@huawei.com]
> Sent: Wednesday, December 09, 2015 11:26 AM
> To: Izumi, Taku/泉 拓
> Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; tony.l...@intel.com; 
> Kamezawa, Hiroyuki/亀澤 寛之; m...@csn.ul.ie;
> a...@linux-foundation.org; dave.han...@intel.com; m...@codeblueprint.co.uk
> Subject: Re: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option
> 
> On 2015/11/27 23:04, Taku Izumi wrote:
> 
> > This patch extends existing "kernelcore" option and
> > introduces kernelcore=reliable option. By specifying
> > "reliable" instead of specifying the amount of memory,
> > non-reliable region will be arranged into ZONE_MOVABLE.
> >
> > v1 -> v2:
> >  - Refine so that the following case also can be
> >    handled properly:
> >
> >    Node X:  |MMMMMM------MMMMMM|
> >      (legend) M: mirrored  -: not mirrored
> >
> >    In this case, ZONE_NORMAL and ZONE_MOVABLE are
> >    arranged like below:
> >
> >    Node X:  |------------------|
> >             |ooooooxxxxxxoooooo| ZONE_NORMAL
> >                   |ooooooxxxxxx| ZONE_MOVABLE
> >      (legend) o: present  x: absent
> >
> > Signed-off-by: Taku Izumi <izumi.t...@jp.fujitsu.com>
> > ---
> >  Documentation/kernel-parameters.txt |   9 ++-
> >  mm/page_alloc.c                     | 110 ++--
> >  2 files changed, 112 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt b/Documentation/kernel-parameters.txt
> > index f8aae63..ed44c2c8 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> >
> > keepinitrd  [HW,ARM]
> >
> > -   kernelcore=nn[KMG]  [KNL,X86,IA-64,PPC] This parameter
> > +   kernelcore= Format: nn[KMG] | "reliable"
> > +   [KNL,X86,IA-64,PPC] This parameter
> > specifies the amount of memory usable by the kernel
> > for non-movable allocations.  The requested amount is
> > spread evenly throughout all nodes in the system. The
> > @@ -1711,6 +1712,12 @@ bytes respectively. Such letter suffixes can also be entirely omitted.
> > use the HighMem zone if it exists, and the Normal
> > zone if it does not.
> >
> > +   Instead of specifying the amount of memory (nn[KMG]),
> > +   you can specify "reliable" option. In case "reliable"
> > +   option is specified, reliable memory is used for
> > +   non-movable allocations and remaining memory is used
> > +   for Movable pages.
> > +
> > kgdbdbgp=   [KGDB,HW] kgdb over EHCI usb debug port.
> > Format: <controller#>[,poll interval]
> > The controller # is the number of the ehci usb debug
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index acb0b4e..006a3d8 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -251,6 +251,7 @@ static unsigned long __meminitdata arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> >  static unsigned long __initdata required_kernelcore;
> >  static unsigned long __initdata required_movablecore;
> >  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> > +static bool reliable_kernelcore;
> >
> >  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> >  int movable_zone;
> > @@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> > unsigned long pfn;
> > struct zone *z;
> > unsigned long nr_initialised = 0;
> > +   struct memblock_region *r = NULL, *tmp;
> >
> > if (highest_memmap_pfn < end_pfn - 1)
> > highest_memmap_pfn = end_pfn - 1;
> > @@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size, int nid, unsigned long zone,
> > if (!update_defer_init(pgdat, pfn, end_pfn,
> > &nr_initialised))
> > break;
> > +
> > +   /*
> > +* if not reliable_kernelcore and ZONE_MOVABLE exists,
> > +* range from zone_mova

RE: [PATCH v2 0/2] mm: Introduce kernelcore=reliable option

2015-12-08 Thread Izumi, Taku
Dear Tony,


> >  Which do you think is better?
> >    - change into kernelcore="mirrored"
> >    - keep kernelcore="reliable" and a minimal printk fix
> 
> UEFI came up with the "reliable" wording (as a more generic term ...
> as Andrew said it could cover differences in ECC modes, or some
> alternate memory technology that has lower error rates).
> 
> But I personally like "mirror" more ... it matches current
> implementation. Of course I'll look silly if some future system does
> something other than mirror.
> 

 Okay, I'll change the option name to kernelcore=mirror.

Sincerely,
Taku Izumi


RE: [PATCH v2 0/2] mm: Introduce kernelcore=reliable option

2015-12-08 Thread Izumi, Taku
Dear Tony,

  Thanks for testing!

Dear Andrew,


> > Xeon E7 v3 based systems support Address Range Mirroring,
> > and a UEFI BIOS compliant with UEFI spec 2.5 can report which
> > ranges are reliable (mirrored) via the EFI memory map.
> > The Linux kernel now utilizes this information and allocates
> > boot time memory from reliable regions.
> >
> > My requirement is:
> >   - allocate kernel memory from reliable region
> >   - allocate user memory from non-reliable region
> >
> > In order to meet my requirement, ZONE_MOVABLE is useful.
> > By arranging non-reliable range into ZONE_MOVABLE,
> > reliable memory is only used for kernel allocations.
> >
> > My idea is to extend the existing "kernelcore" option and
> > introduce a kernelcore=reliable option. By specifying
> > "reliable" instead of specifying the amount of memory,
> > non-reliable regions will be arranged into ZONE_MOVABLE.
> 
> It is unfortunate that the kernel presently refers to this memory as
> "mirrored", but this patchset introduces the new term "reliable".  I
> think it would be better if we use "mirrored" throughout.
> Of course, mirroring isn't the only way to get reliable memory.

  Yes, "mirroring" is not the only way.
  So, in my opinion, we should change "mirrored" into "reliable" in order
  to match the terminology of the UEFI 2.5 spec.

> Perhaps if a part of the system memory has ECC correction then this
> also can be accessed using "reliable", in which case your proposed
> naming makes sense.  reliable == mirrored || ecc?

  "reliable" is better.

  But, I'm willing to change "reliable" into "mirrored".

  Otherwise, I'll keep "kernelcore=reliable" and add the following minimal fix as
  a separate patch:

diff  a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -134,7 +134,7 @@ void __init efi_find_mirror(void)
}
}
if (mirror_size)
-   pr_info("Memory: %lldM/%lldM mirrored memory\n",
+   pr_info("Memory: %lldM/%lldM reliable memory\n",
mirror_size>>20, total_size>>20);
 }

 
 Which do you think is better?
   - change into kernelcore="mirrored"
   - keep kernelcore="reliable" and a minimal printk fix

> 
> Secondly, does this patchset mean that kernelcore=reliable and
> kernelcore=100M are exclusive?  Or can the user specify
> "kernelcore=reliable,kernelcore=100M" to use 100M of reliable memory
> for kernelcore?

  No, these are exclusive.
> 
> This is unclear from the documentation and I suggest that this be
> spelled out.

  Thanks. I'll update the documentation.

 Sincerely,
 Taku Izumi
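
The exclusivity between the keyword form and the size form can be illustrated with a small user-space parser. This is a sketch under stated assumptions, not the kernel's command-line code: `struct kernelcore_cfg` and `parse_kernelcore` are hypothetical names, and the keyword used is "mirror", the name this thread eventually settles on (earlier revisions called it "reliable"). A value is accepted either as the keyword or as nn[KMG], never as a combination of both.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result of parsing the value given to kernelcore=. */
struct kernelcore_cfg {
    bool mirror;     /* keyword form was given */
    uint64_t bytes;  /* size form nn[KMG] was given; 0 for the keyword form */
};

/* Returns 0 on success, -1 if the value is neither "mirror" nor nn[KMG]. */
static int parse_kernelcore(const char *val, struct kernelcore_cfg *cfg)
{
    char *end;
    unsigned long long n;

    cfg->mirror = false;
    cfg->bytes = 0;

    if (strcmp(val, "mirror") == 0) {
        cfg->mirror = true;
        return 0;
    }

    n = strtoull(val, &end, 0);
    if (end == val)
        return -1;          /* no digits at all */

    switch (*end) {
    case 'G': case 'g':
        n <<= 10;
        /* fall through */
    case 'M': case 'm':
        n <<= 10;
        /* fall through */
    case 'K': case 'k':
        n <<= 10;
        end++;
        break;
    default:
        break;
    }
    if (*end != '\0')
        return -1;          /* trailing junk, e.g. a second option */

    cfg->bytes = n;
    return 0;
}
```

Rejecting anything after the keyword or the size suffix is one simple way to make the two forms mutually exclusive, as stated in the reply above.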


RE: [PATCH v2 0/2] mm: Introduce kernelcore=reliable option

2015-12-08 Thread Izumi, Taku
Dear Tony,

  Thanks for testing!

Dear Andrew,


> > Xeon E7 v3 based systems supports Address Range Mirroring
> > and UEFI BIOS complied with UEFI spec 2.5 can notify which
> > ranges are reliable (mirrored) via EFI memory map.
> > Now Linux kernel utilize its information and allocates
> > boot time memory from reliable region.
> >
> > My requirement is:
> >   - allocate kernel memory from reliable region
> >   - allocate user memory from non-reliable region
> >
> > In order to meet my requirement, ZONE_MOVABLE is useful.
> > By arranging non-reliable range into ZONE_MOVABLE,
> > reliable memory is only used for kernel allocations.
> >
> > My idea is to extend existing "kernelcore" option and
> > introduces kernelcore=reliable option. By specifying
> > "reliable" instead of specifying the amount of memory,
> > non-reliable region will be arranged into ZONE_MOVABLE.
> 
> It is unfortunate that the kernel presently refers to this memory as
> "mirrored", but this patchset introduces the new term "reliable".  I
> think it would be better if we use "mirrored" throughout.
> Of course, mirroring isn't the only way to get reliable memory.

  YES. "mirroring" is not the only way.
  So, in my opinion, we should change "mirrored" into "reliable" in order
  to match terms of UEFI 2.5 spec.

> Perhaps if a part of the system memory has ECC correction then this
> also can be accessed using "reliable", in which case your proposed
> naming makes sense.  reliable == mirrored || ecc?

  "reliable" is better.

  But, I'm willing to change "reliable" into "mirrored".

  Otherwise, I keep "kernelcore=reliable" and add the following minimal fix as 
  a separate patch:

diff  a/arch/x86/platform/efi/efi.c b/arch/x86/platform/efi/efi.c
--- a/arch/x86/platform/efi/efi.c
+++ b/arch/x86/platform/efi/efi.c
@@ -134,7 +134,7 @@ void __init efi_find_mirror(void)
}
}
if (mirror_size)
-   pr_info("Memory: %lldM/%lldM mirrored memory\n",
+   pr_info("Memory: %lldM/%lldM reliable memory\n",
mirror_size>>20, total_size>>20);
 }

 
 Which do you think is beter ?
   - change into kernelcore="mirrored"
   - keep kernelcore="reliable" and minmal printk fix 

> 
> Secondly, does this patchset mean that kernelcore=reliable and
> kernelcore=100M are exclusive?  Or can the user specify
> "kernelcore=reliable,kernelcore=100M" to use 100M of reliable memory
> for kernelcore?

  No, these are exclusive.
> 
> This is unclear from the documentation and I suggest that this be
> spelled out.

  Thanks. I'll update its document.

 Sincerely,
 Taku Izumi
--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


RE: [PATCH v2 0/2] mm: Introduce kernelcore=reliable option

2015-12-08 Thread Izumi, Taku
Dear Tony,


> >  Which do you think is beter ?
> >- change into kernelcore="mirrored"
> >- keep kernelcore="reliable" and minmal printk fix
> 
> UEFI came up with the "reliable" wording (as a more generic term ...
> as Andrew said
> it could cover differences in ECC modes, or some alternate memory
> technology that
> has lower error rates).
> 
> But I personally like "mirror" more ... it matches current
> implementation. Of course
> I'll look silly if some future system does something other than mirror.
> 

 Okay, I'll change the option name to kernelcore=mirror.

Sincerely,
Taku Izumi


RE: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option

2015-12-08 Thread Izumi, Taku
Dear Xishi,

 Thanks for reviewing.

> -----Original Message-----
> From: Xishi Qiu [mailto:qiuxi...@huawei.com]
> Sent: Wednesday, December 09, 2015 11:26 AM
> To: Izumi, Taku/泉 拓
> Cc: linux-kernel@vger.kernel.org; linux...@kvack.org; tony.l...@intel.com; 
> Kamezawa, Hiroyuki/亀澤 寛之; m...@csn.ul.ie;
> a...@linux-foundation.org; dave.han...@intel.com; m...@codeblueprint.co.uk
> Subject: Re: [PATCH v2 2/2] mm: Introduce kernelcore=reliable option
> 
> On 2015/11/27 23:04, Taku Izumi wrote:
> 
> > This patch extends existing "kernelcore" option and
> > introduces kernelcore=reliable option. By specifying
> > "reliable" instead of specifying the amount of memory,
> > non-reliable region will be arranged into ZONE_MOVABLE.
> >
> > v1 -> v2:
> >  - Refine so that the following case also can be
> >handled properly:
> >
> >  Node X:  |MM--MM|
> >(legend) M: mirrored  -: not mirrored
> >
> >  In this case, ZONE_NORMAL and ZONE_MOVABLE are
> >  arranged like below:
> >
> >  Node X:  |--|
> >   |ooxxoo| ZONE_NORMAL
> > |ooxx| ZONE_MOVABLE
> >(legend) o: present  x: absent
> >
> > Signed-off-by: Taku Izumi <izumi.t...@jp.fujitsu.com>
> > ---
> >  Documentation/kernel-parameters.txt |   9 ++-
> >  mm/page_alloc.c                     | 110 ++--
> >  2 files changed, 112 insertions(+), 7 deletions(-)
> >
> > diff --git a/Documentation/kernel-parameters.txt 
> > b/Documentation/kernel-parameters.txt
> > index f8aae63..ed44c2c8 100644
> > --- a/Documentation/kernel-parameters.txt
> > +++ b/Documentation/kernel-parameters.txt
> > @@ -1695,7 +1695,8 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> >
> > keepinitrd  [HW,ARM]
> >
> > -   kernelcore=nn[KMG]  [KNL,X86,IA-64,PPC] This parameter
> > +   kernelcore= Format: nn[KMG] | "reliable"
> > +   [KNL,X86,IA-64,PPC] This parameter
> > specifies the amount of memory usable by the kernel
> > for non-movable allocations.  The requested amount is
> > spread evenly throughout all nodes in the system. The
> > @@ -1711,6 +1712,12 @@ bytes respectively. Such letter suffixes can also be 
> > entirely omitted.
> > use the HighMem zone if it exists, and the Normal
> > zone if it does not.
> >
> > +   Instead of specifying the amount of memory (nn[KMG]),
> > +   you can specify "reliable" option. In case "reliable"
> > +   option is specified, reliable memory is used for
> > +   non-movable allocations and remaining memory is used
> > +   for Movable pages.
> > +
> > kgdbdbgp=   [KGDB,HW] kgdb over EHCI usb debug port.
> > Format: <Controller#>[,poll interval]
> > The controller # is the number of the ehci usb debug
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index acb0b4e..006a3d8 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -251,6 +251,7 @@ static unsigned long __meminitdata 
> > arch_zone_highest_possible_pfn[MAX_NR_ZONES];
> >  static unsigned long __initdata required_kernelcore;
> >  static unsigned long __initdata required_movablecore;
> >  static unsigned long __meminitdata zone_movable_pfn[MAX_NUMNODES];
> > +static bool reliable_kernelcore;
> >
> >  /* movable_zone is the "real" zone pages in ZONE_MOVABLE are taken from */
> >  int movable_zone;
> > @@ -4472,6 +4473,7 @@ void __meminit memmap_init_zone(unsigned long size, 
> > int nid, unsigned long zone,
> > unsigned long pfn;
> > struct zone *z;
> > unsigned long nr_initialised = 0;
> > +   struct memblock_region *r = NULL, *tmp;
> >
> > if (highest_memmap_pfn < end_pfn - 1)
> > highest_memmap_pfn = end_pfn - 1;
> > @@ -4491,6 +4493,38 @@ void __meminit memmap_init_zone(unsigned long size, 
> > int nid, unsigned long zone,
> > if (!update_defer_init(pgdat, pfn, end_pfn,
> > &nr_initialised))
> > break;
> > +
> > +   /*
> > +* if not reliable_kernelcore and ZONE_MOVABLE exists,
>

RE: [PATCH] fjes: fix inconsistent indenting

2015-11-11 Thread Izumi, Taku
Thanks, Colin.

Signed-off-by: Taku Izumi <izumi.t...@jp.fujitsu.com>

> -----Original Message-----
> From: Colin King [mailto:colin.k...@canonical.com]
> Sent: Thursday, November 12, 2015 12:23 AM
> To: David S. Miller; Izumi, Taku/泉 拓; Markus Elfring; net...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Subject: [PATCH] fjes: fix inconsistent indenting
> 
> From: Colin Ian King <colin.k...@canonical.com>
> 
> minor change, indenting is one tab out.
> 
> Signed-off-by: Colin Ian King <colin.k...@canonical.com>
> ---
>  drivers/net/fjes/fjes_hw.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/net/fjes/fjes_hw.c b/drivers/net/fjes/fjes_hw.c
> index bb8b530..b103adb 100644
> --- a/drivers/net/fjes/fjes_hw.c
> +++ b/drivers/net/fjes/fjes_hw.c
> @@ -599,7 +599,7 @@ int fjes_hw_unregister_buff_addr(struct fjes_hw *hw, int 
> dest_epid)
>   FJES_CMD_REQ_RES_CODE_BUSY) &&
>  (timeout > 0)) {
>   msleep(200 + hw->my_epid * 20);
> - timeout -= (200 + hw->my_epid * 20);
> + timeout -= (200 + hw->my_epid * 20);
> 
>   res_buf->unshare_buffer.length = 0;
>   res_buf->unshare_buffer.code = 0;
> --
> 2.5.0



RE: [PATCH] mm: Introduce kernelcore=reliable option

2015-10-22 Thread Izumi, Taku
 Dear Tony,

> -----Original Message-----
> From: Luck, Tony [mailto:tony.l...@intel.com]
> Sent: Friday, October 23, 2015 8:27 AM
> To: Kamezawa, Hiroyuki/亀澤 寛之; Izumi, Taku/泉 拓; linux-kernel@vger.kernel.org; 
> linux...@kvack.org
> Cc: qiuxi...@huawei.com; m...@csn.ul.ie; a...@linux-foundation.org; Hansen, 
> Dave; m...@codeblueprint.co.uk
> Subject: RE: [PATCH] mm: Introduce kernelcore=reliable option
> 
> > I think /proc/zoneinfo can show detailed numbers per zone. Do we need some 
> > for meminfo ?
> 
> I wrote a little script (attached) to summarize /proc/zoneinfo ... on my 
> system it says
> 
> $ zoneinfo
> Node      Normal     Movable       DMA     DMA32
>    0        0.00   103020.07      8.94   1554.46
>    1     9284.54    89870.43
>    2     9626.33    94050.09
>    3     9602.82    93650.04
> 
> Not sure why I have zero Normal memory free on node0.  The sum of all those
> free counts is 410667.72 MB ... which is close enough to the boot time message
> showing the amount of mirror/total memory:
> 
> [0.00] efi: Memory: 80979/420096M mirrored memory
> 
> but a fair amount of the 80G of mirrored memory seems to have been miscounted
> as Movable instead of Normal. Perhaps this is because I have two blocks of 
> mirrored
> memory on each node and the movable zone code doesn't expect that?

 Do you mean that the OS view of a node's memory is something like the
 following?

 Node X:  |MM--MM|
    (legend) M: mirrored  -: not mirrored

 If so, is this the configuration of a real box?
 Sorry, I don't have a real Address Range Mirroring capable box yet ...
 I assumed the mirrored range would be contiguous at the start of each node.

 Sincerely,
 Taku Izumi
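
To make the |MM--MM| case concrete, here is a small user-space sketch of one way the two zones could be laid out on such a node: mirrored ranges are accounted to the normal zone, everything else to the movable zone, and the spans are allowed to overlap. It is only a model under stated assumptions, not the page allocator's code; struct node_range, struct zone_span and arrange_zones() are hypothetical names, the ranges are assumed to be sorted and non-empty, and the exact span boundaries are a guess.

```c
#include <assert.h>
#include <stddef.h>

/* One physically contiguous piece of a node (hypothetical type). */
struct node_range {
	unsigned long start, end;	/* page frame numbers, [start, end) */
	int mirrored;			/* non-zero if the range is mirrored */
};

/* Resulting zone description (hypothetical type). */
struct zone_span {
	unsigned long start, end;	/* span covered by the zone */
	unsigned long present;		/* pages that actually belong to it */
};

/*
 * Walk the sorted ranges of one node. Mirrored pages count as present in
 * the normal zone, non-mirrored pages as present in the movable zone.
 * Pages inside a zone's span that belong to the other kind are "absent".
 */
static void arrange_zones(const struct node_range *r, size_t n,
			  struct zone_span *normal, struct zone_span *movable)
{
	size_t i;

	assert(r && normal && movable);
	*normal = (struct zone_span){ 0, 0, 0 };
	*movable = (struct zone_span){ 0, 0, 0 };

	for (i = 0; i < n; i++) {
		struct zone_span *z = r[i].mirrored ? normal : movable;

		if (z->present == 0)
			z->start = r[i].start;	/* first range of this kind */
		z->end = r[i].end;
		z->present += r[i].end - r[i].start;
	}
}
```

For a node with two mirrored blocks separated by non-mirrored memory, both zones end up with holes (span minus present pages), which is the situation that code expecting a single leading mirrored block would not handle.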



RE: [PATCH] efi: Fix warning of int-to-pointer-cast on x86 32-bit builds

2015-10-22 Thread Izumi, Taku
Dear Ard,

> > commit-0f96a99 introduces the following warning message:
> >
> >  drivers/firmware/efi/fake_mem.c:186:20: warning: cast to pointer
> >  from integer of different size [-Wint-to-pointer-cast]
> >
> > new_memmap_phy was defined as a u64 value and cast to void *.
> > This causes a warning of int-to-pointer-cast on x86 32-bit
> > environment.
> >
> > This patch changes the type of "new_memmap_phy" variable
> > from "u64" into "phys_addr_t" to avoid it.
> 
> This assumes sizeof(void*) == sizeof(phys_addr_t), which is not always true, 
> e.g., on 32-bit ARM (whose UEFI support is
> in development but not yet merged) with LPAE enabled.
> 
> Could we use unsigned long instead?

  Okay. I'll update my patch.

Sincerely,
Taku Izumi
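
For readers following the warning, the sketch below shows the general idea in user-space C: the 64-bit value is narrowed to a pointer-sized integer first, and only then converted to a pointer. uintptr_t stands in here for the kernel's unsigned long, and phys_to_ptr() is a hypothetical helper, not something from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Convert a 64-bit address value to a pointer without triggering
 * -Wint-to-pointer-cast: cast through uintptr_t, which is pointer-sized
 * on common platforms. Returns NULL when the value does not fit in a
 * pointer, as can happen on a 32-bit build.
 */
static void *phys_to_ptr(uint64_t phys)
{
	assert(sizeof(uintptr_t) <= sizeof(uint64_t));
	if (phys > (uint64_t)UINTPTR_MAX)
		return NULL;
	return (void *)(uintptr_t)phys;
}
```

Casting a u64 directly to void * is what produces the warning on 32-bit x86; the intermediate integer type makes the narrowing explicit, and the range check keeps it from silently truncating.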


RE: [PATCH] mm: Introduce kernelcore=reliable option

2015-10-19 Thread Izumi, Taku
 Hi Xishi,

> On 2015/10/15 21:32, Taku Izumi wrote:
> 
> > Xeon E7 v3 based systems support Address Range Mirroring,
> > and a UEFI BIOS compliant with UEFI spec 2.5 can report which
> > ranges are reliable (mirrored) via the EFI memory map.
> > The Linux kernel now uses this information and allocates
> > boot-time memory from the reliable region.
> >
> > My requirement is:
> >   - allocate kernel memory from reliable region
> >   - allocate user memory from non-reliable region
> >
> > In order to meet my requirement, ZONE_MOVABLE is useful.
> > By arranging non-reliable range into ZONE_MOVABLE,
> > reliable memory is only used for kernel allocations.
> >
> > This patch extends existing "kernelcore" option and
> > introduces kernelcore=reliable option. By specifying
> > "reliable" instead of specifying the amount of memory,
> > non-reliable region will be arranged into ZONE_MOVABLE.
> >
> > Earlier discussion is at:
> >  https://lkml.org/lkml/2015/10/9/24
> >
> 
> Hi Taku,
> 
> If a user doesn't want to waste a lot of memory and configures only
> a small amount of mirrored memory, then the kernelcore is very
> small, right? That means the OS will have a very small normal zone
> and a very large movable zone.

 Right.

> Kernel allocations can only use the unmovable zone. As the
> normal zone is very small, kernel allocations may hit OOM,
> right?

 Right.

> Do you mean that we will reuse the movable zone as a short-term
> solution and create a new zone (mirrored zone) in the future?

 If there is that kind of requirement, I don't oppose
 creating a new zone.

 Sincerely,
 Taku Izumi


RE: [PATCH][RFC] mm: Introduce kernelcore=reliable option

2015-10-13 Thread Izumi, Taku
> > I remember Kame has already suggested this idea. In my opinion,
> > I still think it's better to add a new migratetype or a new zone,
> > so both user and kernel could use mirrored memory.
> 
> A new zone would be more flexible ... and probably the right long
> term solution.  But this looks like a very clever way to try out the
> feature with a minimally invasive patch.

 Yes. I agree that creating a new zone is the right long-term solution.
 I believe this approach using ZONE_MOVABLE is good and reasonable
 as a short-term solution.

Sincerely,
Taku Izumi


RE: [PATCH 2/2] x86, efi: Add "efi_fake_mem" boot option

2015-09-29 Thread Izumi, Taku
 I forgot to re-run git format-patch after rebasing.
 I'll resend the right one.

> -----Original Message-----
> From: kbuild test robot [mailto:l...@intel.com]
> Sent: Wednesday, September 30, 2015 10:37 AM
> To: Izumi, Taku/泉 拓
> Cc: kbuild-...@01.org; linux-kernel@vger.kernel.org; 
> linux-...@vger.kernel.org; x...@kernel.org; matt.flem...@intel.com;
> t...@linutronix.de; mi...@redhat.com; h...@zytor.com; tony.l...@intel.com; 
> qinxi...@huawei.com; Kamezawa, Hiroyuki/亀
> 澤 寛之; ard.biesheu...@linaro.org; Izumi, Taku/泉 拓
> Subject: Re: [PATCH 2/2] x86, efi: Add "efi_fake_mem" boot option
> 
> Hi Taku,
> 
> [auto build test results on v4.3-rc3 -- if it's inappropriate base, please 
> ignore]
> 
> config: i386-allmodconfig (attached as .config)
> reproduce:
>   git checkout afcc94d3f91a00ce97d735a563a8e5d595f45a03
>   # save the attached .config to linux build tree
>   make ARCH=i386
> 
> All error/warnings (new ones prefixed by >>):
> 
> >> drivers/firmware/efi/fake_mem.c:36:25: error: 'CONFIG_EFI_MAX_FAKEMEM' 
> >> undeclared here (not in a function)
> #define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKEMEM
> ^
> >> drivers/firmware/efi/fake_mem.c:42:34: note: in expansion of macro 
> >> 'EFI_MAX_FAKEMEM'
> static struct fake_mem fake_mems[EFI_MAX_FAKEMEM];
>  ^
>drivers/firmware/efi/fake_mem.c: In function 'efi_fake_memmap':
> >> drivers/firmware/efi/fake_mem.c:186:20: warning: cast to pointer from 
> >> integer of different size [-Wint-to-pointer-cast]
>  memmap.phys_map = (void *)new_memmap_phy;
>^
>drivers/firmware/efi/fake_mem.c: At top level:
> >> drivers/firmware/efi/fake_mem.c:42:24: warning: 'fake_mems' defined but 
> >> not used [-Wunused-variable]
> static struct fake_mem fake_mems[EFI_MAX_FAKEMEM];
>^
> 
> vim +/CONFIG_EFI_MAX_FAKEMEM +36 drivers/firmware/efi/fake_mem.c
> 
>     30	#include
>     31	#include
>     32	#include
>     33	#include
>     34	#include
>     35
>   > 36	#define EFI_MAX_FAKEMEM CONFIG_EFI_MAX_FAKEMEM
>     37
>     38	struct fake_mem {
>     39		struct range range;
>     40		u64 attribute;
>     41	};
>   > 42	static struct fake_mem fake_mems[EFI_MAX_FAKEMEM];
>     43	static int nr_fake_mem;
>     44
>     45	static int __init cmp_fake_mem(const void *x1, const void *x2)
>     46	{
>     47		const struct fake_mem *m1 = x1;
>     48		const struct fake_mem *m2 = x2;
>     49
>     50		if (m1->range.start < m2->range.start)
>     51			return -1;
>     52		if (m1->range.start > m2->range.start)
>     53			return 1;
>     54		return 0;
>     55	}
>     56
>     57	void __init efi_fake_memmap(void)
>     58	{
>     59		u64 start, end, m_start, m_end, m_attr;
>     60		int new_nr_map = memmap.nr_map;
>     61		efi_memory_desc_t *md;
>     62		u64 new_memmap_phy;
>     63		void *new_memmap;
>     64		void *old, *new;
>     65		int i;
>     66
>     67		if (!nr_fake_mem || !efi_enabled(EFI_MEMMAP))
>     68			return;
>     69
>     70		/* count up the number of EFI memory descriptor */
>     71		for (old = memmap.map; old < memmap.map_end; old += memmap.desc_size) {
>     72			md = old;
>     73			start = md->phys_addr;
>     74			end = start + (md->num_pages << EFI_PAGE_SHIFT) - 1;
>     75
>     76			for (i = 0; i < nr_fake_mem; i++) {
>     77				/* modifying range */
>     78				m_start = fake_mems[i].range.start;
>     79				m_end = fake_mems[i].range.end;
>     80
>     81				if (m_start <= start) {
>     82					/* split into 2 parts */
>     83					if (start < m_end && m_end < end)
>     84						new_nr_map++;
>     85				}
>     86				if (start < m_start && m_start < end) {
>     87					/

RE: [PATCH 2/2] x86, efi: Add "efi_fake_mem_mirror" boot option

2015-08-26 Thread Izumi, Taku
Dear Matt,

Thank you for reviewing.

I have updated my patchset.
I would be happy if you could review the new one.

Sincerely,
Taku Izumi

> -----Original Message-----
> From: Matt Fleming [mailto:m...@codeblueprint.co.uk]
> Sent: Wednesday, August 26, 2015 8:46 AM
> To: Izumi, Taku/泉 拓
> Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; x...@kernel.org; 
> matt.flem...@intel.com;
> t...@linutronix.de; mi...@redhat.com; h...@zytor.com; tony.l...@intel.com; 
> qiuxi...@huawei.com; Kamezawa, Hiroyuki/亀
> 澤 寛之
> Subject: Re: [PATCH 2/2] x86, efi: Add "efi_fake_mem_mirror" boot option
> 
> On Fri, 21 Aug, at 02:16:00AM, Taku Izumi wrote:
> > This patch introduces new boot option named "efi_fake_mem_mirror".
> > By specifying this parameter, you can mark specific memory as
> > mirrored memory. This is useful for debugging of Address Range
> > Mirroring feature.
> >
> > For example, if you specify "efi_fake_mem_mirror=2G@4G,2G@0x10a000",
> > the original (firmware provided) EFI memmap will be updated so that
> > the specified memory regions have EFI_MEMORY_MORE_RELIABLE attribute:
> >
> >  <original EFI memmap>
> >efi: mem00: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x-0x1000)
> (0MB)
> >efi: mem01: [Loader Data|   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x1000-0x2000)
> (0MB)
> >...
> >efi: mem35: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x47ee6000-0x48014000)
> (1MB)
> >efi: mem36: [Conventional Memory|   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x0001-0x0020a000)
> (129536MB)
> >efi: mem37: [Reserved   |RUN||  |  |  |   |  |  |  |UC] 
> > range=[0x6000-0x9000)
> (768MB)
> >
> >  <updated EFI memmap>
> >efi: mem00: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x-0x1000)
> (0MB)
> >efi: mem01: [Loader Data|   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x1000-0x2000)
> (0MB)
> >...
> >efi: mem35: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x47ee6000-0x48014000)
> (1MB)
> >efi: mem36: [Conventional Memory|   |RELY|  |  |  |   |WB|WT|WC|UC] 
> > range=[0x0001-0x00018000)
> (2048MB)
> >efi: mem37: [Conventional Memory|   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x00018000-0x0010a000)
> (61952MB)
> >efi: mem38: [Conventional Memory|   |RELY|  |  |  |   |WB|WT|WC|UC] 
> > range=[0x0010a000-0x00112000)
> (2048MB)
> >efi: mem39: [Conventional Memory|   ||  |  |  |   |WB|WT|WC|UC] 
> > range=[0x00112000-0x0020a000)
> (63488MB)
> >efi: mem40: [Reserved   |RUN||  |  |  |   |  |  |  |UC] 
> > range=[0x6000-0x9000)
> (768MB)
> >
> > And you will find that the following message is output:
> >
> >efi: Memory: 4096M/131455M mirrored memory
> >
> > Signed-off-by: Taku Izumi <izumi.t...@jp.fujitsu.com>
> > ---
> >  Documentation/kernel-parameters.txt |   8 ++
> >  arch/x86/include/asm/efi.h  |   2 +
> >  arch/x86/kernel/setup.c |   4 +-
> >  arch/x86/platform/efi/efi.c |   2 +-
> >  arch/x86/platform/efi/quirks.c  | 169 
> > 
> >  5 files changed, 183 insertions(+), 2 deletions(-)
> 
> [...]
> 
> > diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> > index 1c7380d..5c785e1 100644
> > --- a/arch/x86/platform/efi/quirks.c
> > +++ b/arch/x86/platform/efi/quirks.c
> > @@ -18,6 +18,10 @@
> >
> 
> The quirks file isn't intended to be used for this kind of feature.
> It's very much a repository for workarounds for quirky firmware, i.e.
> known bugs.
> 
> Instead, how about putting all this into a new fake_mem.c file? Going
> further than that, there's nothing that I can see that looks
> particularly x86-specific, so how about sticking all this in
> drivers/firmware/efi/fake_mem.c so that the arm64 folks can make use
> of it if/when they want to start playing around with
> EFI_MEMORY_MORE_RELIABLE?
> 
> >  static efi_char16_t efi_dummy_name[6] = { 'D', 'U', 'M', 'M', 'Y', 0 };
> >
> > +#define EFI_MAX_FAKE_MIRROR 8
> > +static struct range fake_mirrors[EFI_MAX_FAKE_MIRROR];
> > +static int num_fake_mirror;
> > +
> >  static bool efi_no_storage_paranoia;
> >
> >  

RE: [PATCH 2/2] x86, efi: Add efi_fake_mem_mirror boot option

2015-08-26 Thread Izumi, Taku
Dear Matt,

Thank you for reviewing.

I updated my patchset.
I'm happy if you review new one.

Sincerely,
Taku Izumi

 -Original Message-
 From: Matt Fleming [mailto:m...@codeblueprint.co.uk]
 Sent: Wednesday, August 26, 2015 8:46 AM
 To: Izumi, Taku/泉 拓
 Cc: linux-kernel@vger.kernel.org; linux-...@vger.kernel.org; x...@kernel.org; 
 matt.flem...@intel.com;
 t...@linutronix.de; mi...@redhat.com; h...@zytor.com; tony.l...@intel.com; 
 qiuxi...@huawei.com; Kamezawa, Hiroyuki/亀
 澤 寛之
 Subject: Re: [PATCH 2/2] x86, efi: Add efi_fake_mem_mirror boot option
 
 On Fri, 21 Aug, at 02:16:00AM, Taku Izumi wrote:
  This patch introduces new boot option named efi_fake_mem_mirror.
  By specifying this parameter, you can mark specific memory as
  mirrored memory. This is useful for debugging of Address Range
  Mirroring feature.
 
  For example, if you specify efi_fake_mem_mirror=2G@4G,2G@0x10a000,
  the original (firmware provided) EFI memmap will be updated so that
  the specified memory regions have EFI_MEMORY_MORE_RELIABLE attribute:
 
   original EFI memmap
 efi: mem00: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x-0x1000)
 (0MB)
 efi: mem01: [Loader Data|   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x1000-0x2000)
 (0MB)
 ...
 efi: mem35: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x47ee6000-0x48014000)
 (1MB)
 efi: mem36: [Conventional Memory|   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x0001-0x0020a000)
 (129536MB)
 efi: mem37: [Reserved   |RUN||  |  |  |   |  |  |  |UC] 
  range=[0x6000-0x9000)
 (768MB)
 
   updated EFI memmap
 efi: mem00: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x-0x1000)
 (0MB)
 efi: mem01: [Loader Data|   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x1000-0x2000)
 (0MB)
 ...
 efi: mem35: [Boot Data  |   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x47ee6000-0x48014000)
 (1MB)
 efi: mem36: [Conventional Memory|   |RELY|  |  |  |   |WB|WT|WC|UC] 
  range=[0x0001-0x00018000)
 (2048MB)
 efi: mem37: [Conventional Memory|   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x00018000-0x0010a000)
 (61952MB)
 efi: mem38: [Conventional Memory|   |RELY|  |  |  |   |WB|WT|WC|UC] 
  range=[0x0010a000-0x00112000)
 (2048MB)
 efi: mem39: [Conventional Memory|   ||  |  |  |   |WB|WT|WC|UC] 
  range=[0x00112000-0x0020a000)
 (63488MB)
 efi: mem40: [Reserved   |RUN||  |  |  |   |  |  |  |UC] 
  range=[0x6000-0x9000)
 (768MB)
 
  And you will find that the following message is output:
 
 efi: Memory: 4096M/131455M mirrored memory
 
> Signed-off-by: Taku Izumi izumi.t...@jp.fujitsu.com
> ---
>  Documentation/kernel-parameters.txt |   8 ++
>  arch/x86/include/asm/efi.h          |   2 +
>  arch/x86/kernel/setup.c             |   4 +-
>  arch/x86/platform/efi/efi.c         |   2 +-
>  arch/x86/platform/efi/quirks.c      | 169
>  5 files changed, 183 insertions(+), 2 deletions(-)

[...]

> diff --git a/arch/x86/platform/efi/quirks.c b/arch/x86/platform/efi/quirks.c
> index 1c7380d..5c785e1 100644
> --- a/arch/x86/platform/efi/quirks.c
> +++ b/arch/x86/platform/efi/quirks.c
> @@ -18,6 +18,10 @@

 
 The quirks file isn't intended to be used for this kind of feature.
 It's very much a repository for workarounds for quirky firmware, i.e.
 known bugs.
 
 Instead, how about putting all this into a new fake_mem.c file? Going
 further than that, there's nothing that I can see that looks
 particularly x86-specific, so how about sticking all this in
 drivers/firmware/efi/fake_mem.c so that the arm64 folks can make use
 of it if/when they want to start playing around with
 EFI_MEMORY_MORE_RELIABLE?
 
>  static efi_char16_t efi_dummy_name[6] = { 'D', 'U', 'M', 'M', 'Y', 0 };
>
> +#define EFI_MAX_FAKE_MIRROR 8
> +static struct range fake_mirrors[EFI_MAX_FAKE_MIRROR];
> +static int num_fake_mirror;
> +
>  static bool efi_no_storage_paranoia;
>
>  /*
> @@ -288,3 +292,168 @@ bool efi_poweroff_required(void)
>  {
>  	return !!acpi_gbl_reduced_hardware;
>  }
> +
> +void __init efi_fake_memmap(void)
> +{
> +	efi_memory_desc_t *md;
> +	void *p, *q;
> +	int i;
> +	int nr_map = memmap.nr_map;
> +	u64 start, end, m_start, m_end;
> +	u64 new_memmap_phy;
> +	void *new_memmap;
> +
> +	if (!num_fake_mirror)
> +		return;
> +
> +	/* count up the number of EFI memory descriptor */
> +	for (p = memmap.map; p < memmap.map_end; p += memmap.desc_size) {
> +		md = p;
> +		start = md->phys_addr;
> +		end = start + (md->num_pages << EFI_PAGE_SHIFT) - 1;
> +
> +		for (i = 0; i < num_fake_mirror; i

RE: [RFC PATCH 0/2 shit_A shit_B] workqueue: fix wq_numa bug

2015-01-22 Thread Izumi, Taku

> These patches are un-changelogged, un-compiled, un-booted, un-tested,
> they are just shits, I even hope them un-sent or blocked.
>
> The patches include two -solutions-:
>
> Shit_A:
>   workqueue: reset pool->node and unhash the pool when the node is offline
>   update wq_numa when cpu_present_mask changed
>
>  kernel/workqueue.c | 107
>  1 file changed, 84 insertions(+), 23 deletions(-)
>
>
> Shit_B:
>   workqueue: reset pool->node and unhash the pool when the node is offline
>   workqueue: remove wq_numa_possible_cpumask
>   workqueue: directly update attrs of pools when cpu hot[un]plug
>
>  kernel/workqueue.c | 135
>  1 file changed, 101 insertions(+), 34 deletions(-)
>

  I tried your patchsets.

  linux-3.18.3 + Shit_A:

    Build OK.
    I tried to reproduce the problem that Ishimatsu had reported, but it
    did not occur.
    It seems that your patch fixes this problem.

  linux-3.18.3 + Shit_B:

    Build OK, but I encountered a kernel panic at boot time.

[0.189000] BUG: unable to handle kernel NULL pointer dereference at 0008
[0.189000] IP: [8131ef96] __list_add+0x16/0xc0
[0.189000] PGD 0
[0.189000] Oops:  [#1] SMP
[0.189000] Modules linked in:
[0.189000] CPU: 0 PID: 1 Comm: swapper/0 Not tainted 3.18.3+ #3
[0.189000] Hardware name: FUJITSU PRIMEQUEST2800E/SB, BIOS PRIMEQUEST 2000 Series BIOS Version 01.81 12/03/2014
[0.189000] task: 880869678000 ti: 880869664000 task.ti: 880869664000
[0.189000] RIP: 0010:[8131ef96]  [8131ef96] __list_add+0x16/0xc0
[0.189000] RSP: :880869667be8  EFLAGS: 00010296
[0.189000] RAX: 88087f83cda8 RBX: 88087f83cd80 RCX:
[0.189000] RDX:  RSI: 88086912bb98 RDI: 88087f83cd80
[0.189000] RBP: 880869667c08 R08:  R09: 88087f807480
[0.189000] R10: 810911b6 R11: 810956ac R12:
[0.189000] R13: 88086912bb98 R14: 0400 R15: 0400
[0.189000] FS:  () GS:88087fc0() knlGS:
[0.189000] CS:  0010 DS:  ES:  CR0: 80050033
[0.189000] CR2: 0008 CR3: 01998000 CR4: 001407f0
[0.189000] Stack:
[0.189000]  000a 88086912b800 88087f83cd00 88087f80c000
[0.189000]  880869667c48 810912c8 880869667c28 88087f803f00
[0.189000]  fff4 88086964b760 88086964b6a0 88086964b740
[0.189000] Call Trace:
[0.189000]  [810912c8] alloc_unbound_pwq+0x298/0x3b0
[0.189000]  [81091ce8] apply_workqueue_attrs+0x158/0x4c0
[0.189000]  [81092424] __alloc_workqueue_key+0x174/0x5b0
[0.189000]  [813052a6] ? alloc_cpumask_var_node+0x56/0x80
[0.189000]  [81b21573] init_workqueues+0x33d/0x40f
[0.189000]  [81b21236] ? ftrace_define_fields_workqueue_execute_start+0x6a/0x6a
[0.189000]  [81002144] do_one_initcall+0xd4/0x210
[0.189000]  [81b12f4d] ? native_smp_prepare_cpus+0x34d/0x352
[0.189000]  [81b0026d] kernel_init_freeable+0xf5/0x23c
[0.189000]  [81653370] ? rest_init+0x80/0x80
[0.189000]  [8165337e] kernel_init+0xe/0xf0
[0.189000]  [8166bcfc] ret_from_fork+0x7c/0xb0
[0.189000]  [81653370] ? rest_init+0x80/0x80
[0.189000] Code: ff b8 f4 ff ff ff e9 3b ff ff ff b8 f4 ff ff ff e9 31 ff ff ff 55 48 89 e5 41 55 49 89 f5 41 54 49 89 d4 53 48 89 fb 48 83 ec 08 <4c> 8b 42 08 49 39 f0 75 2e 4d 8b 45 00 4d 39 c4 75 6c 4c 39 e3
[0.189000] RIP  [8131ef96] __list_add+0x16/0xc0
[0.189000]  RSP 880869667be8
[0.189000] CR2: 0008
[0.189000] ---[ end trace 58feee6875cf67cf ]---
[0.189000] Kernel panic - not syncing: Fatal exception
[0.189000] ---[ end Kernel panic - not syncing: Fatal exception
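
One possible reading of the trace, offered as an assumption rather than a diagnosis confirmed in this thread: the faulting instruction bytes `4c 8b 42 08` decode to a 64-bit load from offset 8 of the pointer in %rdx, and the reported fault address is also 8. That is the address a read of `->prev` would touch if it went through a NULL `struct list_head *` on x86-64. The small userspace sketch below only demonstrates that offset arithmetic; `prev_field_address()` is a hypothetical helper, and the struct merely mirrors the kernel's two-pointer layout.

```c
#include <assert.h>	/* only needed for the check in the usage note */
#include <stddef.h>
#include <stdint.h>

/* Same two-pointer layout as the kernel's struct list_head. */
struct list_head {
	struct list_head *next;
	struct list_head *prev;
};

/*
 * Address that a load of ptr->prev would touch, computed without
 * dereferencing ptr, so it is safe to call with a NULL pointer.
 */
static uintptr_t prev_field_address(const struct list_head *ptr)
{
	return (uintptr_t)ptr + offsetof(struct list_head, prev);
}
```

On a 64-bit build, `assert(prev_field_address(NULL) == 8)` holds, which matches the fault address in the oops above. Which list entry was NULL, and why, is not something the trace alone establishes.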

   
  Sincerely,
  Taku Izumi


> Patch 1 of both solutions is the same: reset pool->node and unhash the pool.
> It was suggested by TJ, and I found it a good first step toward fixing the bug.
>
> The other patches handle wq_numa_possible_cpumask, which is where the
> solutions diverge.
>
> Solution_A uses present_mask rather than possible_cpumask. It adds
> wq_numa_notify_cpu_present_set/cleared() for notifications of
> changes to cpu_present_mask. Such notifications do not exist
> right now, so I fake one (wq_numa_check_present_cpumask_changes())
> to imitate them. I hope the memory people add a real one.
>
> Solution_B uses online_mask rather than possible_cpumask.
> This solution removes more coupling between the NUMA code and workqueue;
> it depends only on cpumask_of_node(node).
>
> Patch 2 of Solution_B removes wq_numa_possible_cpumask and adds
> overhead on cpu hot[un]plug; Patch 3 reduces this overhead.