Re: [PATCHv2] mm/sparse: reset section's mem_map when fully deactivated

2020-01-23 Thread Michal Hocko
On Thu 23-01-20 19:10:47, Andrew Morton wrote:
> On Mon, 20 Jan 2020 08:29:39 +0100 Michal Hocko  wrote:
> 
> > On Mon 20-01-20 10:33:14, Pingfan Liu wrote:
> > > After commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug"),
> > > when a mem section is fully deactivated, section_mem_map still records the
> > > section's start pfn, which is not used any more and will be reassigned
> > > during re-added.
> > > 
> > > In analogy with alloc/free pattern, it is better to clear all fields of
> > > section_mem_map.
> > > 
> > > Beside this, it breaks the user space tool "makedumpfile" [1], which makes
> > > assumption that a hot-removed section has mem_map as NULL, instead of
> > > checking directly against SECTION_MARKED_PRESENT bit. (makedumpfile will 
> > > be
> > > better to change the assumption, and need a patch)
> > > 
> > > The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> > > trigger a crash, and save vmcore by makedumpfile
> > 
> > While makedumpfile lives very closely to the kernel and occasional
> > breakage is to be expected I still believe that Fixes: ba72b4c8cf60
> > is due.
> 
> But not a cc:stable?

Well, I wouldn't say this is really critical. makedumpfile will get its
fix... But if people think it would be useful in stable I won't oppose. 
-- 
Michal Hocko
SUSE Labs

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCHv2] mm/sparse: reset section's mem_map when fully deactivated

2020-01-23 Thread Andrew Morton
On Mon, 20 Jan 2020 08:29:39 +0100 Michal Hocko  wrote:

> On Mon 20-01-20 10:33:14, Pingfan Liu wrote:
> > After commit ba72b4c8cf60 ("mm/sparsemem: support sub-section hotplug"),
> > when a mem section is fully deactivated, section_mem_map still records the
> > section's start pfn, which is not used any more and will be reassigned
> > during re-added.
> > 
> > In analogy with alloc/free pattern, it is better to clear all fields of
> > section_mem_map.
> > 
> > Beside this, it breaks the user space tool "makedumpfile" [1], which makes
> > assumption that a hot-removed section has mem_map as NULL, instead of
> > checking directly against SECTION_MARKED_PRESENT bit. (makedumpfile will be
> > better to change the assumption, and need a patch)
> > 
> > The bug can be reproduced on IBM POWERVM by "drmgr -c mem -r -q 5" ,
> > trigger a crash, and save vmcore by makedumpfile
> 
> While makedumpfile lives very closely to the kernel and occasional
> breakage is to be expected I still believe that Fixes: ba72b4c8cf60
> is due.

But not a cc:stable?

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


RE: [PATCH] makedumpfile: cope with not-present mem section

2020-01-23 Thread 萩尾 一仁
Hi Pingfan,

Sorry, I had been busy, then was looking into its history a bit.

> -Original Message-
> On 01/20/2020 04:59 PM, Baoquan He wrote:
> > On 01/20/20 at 09:33am, David Hildenbrand wrote:
> >>
> >>
> >>> Am 20.01.2020 um 03:25 schrieb Pingfan Liu :
> >>>
> >>> After kernel commit ba72b4c8cf60 ("mm/sparsemem: support sub-section
> >>> hotplug"), when hot-removed, section_mem_map is still encoded with section
> >>> start pfn, not NULL. This break the current makedumpfile.

Could you add kernel version which this started and error message,
if possible, for someone hitting this issue to find the patch.

> >>>
> >>> Whatever section_mem_map coding info after hot-removed, it is reliable
> >>> just to work on SECTION_MARKED_PRESENT bit. Fixing makedumpfile by this
> >>> way.
> >>>
> >>> Signed-off-by: Pingfan Liu 
> >>> To: kexec@lists.infradead.org
> >>> Cc: Kazuhito Hagio 
> >>> Cc: Baoquan He 
> >>> Cc: David Hildenbrand 
> >>> Cc: Andrew Morton 
> >>> Cc: Dan Williams 
> >>> Cc: Oscar Salvador 
> >>> Cc: Michal Hocko 
> >>> Cc: Qian Cai 
> >>> ---
> >>> makedumpfile.c | 6 +-
> >>> 1 file changed, 1 insertion(+), 5 deletions(-)
> >>>
> >>> diff --git a/makedumpfile.c b/makedumpfile.c
> >>> index e290fbd..ab40a58 100644
> >>> --- a/makedumpfile.c
> >>> +++ b/makedumpfile.c
> >>> @@ -3406,8 +3406,6 @@ section_mem_map_addr(unsigned long addr, unsigned 
> >>> long *map_mask)
> >>>map = ULONG(mem_section + OFFSET(mem_section.section_mem_map));
> >>>mask = SECTION_MAP_MASK;
> >>>*map_mask = map & ~mask;
> >>> -if (map == 0x0)
> >>> -*map_mask |= SECTION_MARKED_PRESENT;
> This should be a trick to let map==0x0 survive the logic
>   if (!(map_mask & SECTION_MARKED_PRESENT)) {
>   return FALSE;
>   }
> in validate_mem_section().
> >>
> >> Why was that added in the first place? This looks like dome compat 
> >> handling to me. Are we sure we can
> remove it?
> The logic introduced by commit 14876c45aef ("[PATCH makedumpfile] handle
> mem_section as either a pointer or an array")
> Combining section_mem_map_addr() and the following logic in
> validate_mem_section()
> if (mem_map == 0) {
>   mem_map = NOT_MEMMAP_ADDR;
> }
> 
> I reached the same conclusion as Baoquan's.
> >
> >
> > The original code assumes that there are only two kinds of mem_section,
> > one is empty value, the second is a present one. This removing behaves
> > the same as the old code, I don't see a problem with it.
> >
> > I tried to understand the code, but I don't know why it need call
> > validate_mem_section() twice and compare the returned value.
> I think it could be if/else, no need to call twice.

It looks to me that commit 14876c45aef ("[PATCH makedumpfile] handle mem_section
as either a pointer or an array") was intended to support kernels which had
  83e3c48729d9 ("mm/sparsemem: Allocate mem_section at runtime for 
CONFIG_SPARSEMEM_EXTREME=y")
and did not have
  a0b1280368d1 ("kdump: write correct address of mem_section into vmcoreinfo").

Apparently there were some released Ubuntu kernels like this.

So, if we need that logic, your patch seems
- support kernels which section_mem_map is non-NULL for hot-removed memory,
- but might increase the possiblity of false-positive.

I think this is a tradeoff between the above two, but the latter would be
small enough.  And I'm testing the patch and OK with 10 vmcores so far.

> Cc Thadeu Lima de Souza Cascardo, who contributes the original code, and
> may have some idea about it.

So if Cascardo doesn't have any objection, I will accept.

Thanks,
Kazu

P.S. My email address has been changed to k-hagio...@nec.com.
Please send email to this address in the future.  Thanks.
___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [PATCH] ata: ahci: Add shutdown to freeze hardware resources of ahci

2020-01-23 Thread Bjorn Helgaas
On Thu, Jan 23, 2020 at 10:41:57AM +, Prabhakar Kushwaha wrote:
> device_shutdown() called from reboot/power_shutdown expect all
> devices to be shutdown. Same is true for ahci pci driver.
> As no shutdown function was implemented ata subsystem remains
> always alive and DMA/interrupt still active.
> 
> It creates problem during kexec, here "M" bit is cleared to stop
> DMA usage.

Maybe this is obvious to AHCI folks, but I don't know what "M" bit you
are referring to, and I don't see anything in the patch itself that
looks like an "M" bit.  I thought maybe you meant the "Bus Master
Enable" bit (PCI_COMMAND_MASTER), but I don't see an obvious
connection to that either.

> Any further DMA transaction may cause instability and
> the hard-disk may even not get detected for second kernel.
> One of possible case is periodic file system sync.

I think "may cause instability" and "disk may even not get detected"
is too vague and hand-wavy to really add useful information to the
commit log.

> So defining ahci pci driver shutdown to freeze hardware (mask
> interrupt, stop DMA engine and free DMA resources).
> 
> Signed-off-by: Prabhakar Kushwaha 
> ---
>  drivers/ata/ahci.c|  8 
>  drivers/ata/libata-core.c | 21 +
>  include/linux/libata.h|  1 +
>  3 files changed, 30 insertions(+)
> 
> diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
> index 4bfd1b14b390..31fc934740b6 100644
> --- a/drivers/ata/ahci.c
> +++ b/drivers/ata/ahci.c
> @@ -81,6 +81,7 @@ enum board_ids {
>  
>  static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id 
> *ent);
>  static void ahci_remove_one(struct pci_dev *dev);
> +static void ahci_shutdown_one(struct pci_dev *dev);
>  static int ahci_vt8251_hardreset(struct ata_link *link, unsigned int *class,
>unsigned long deadline);
>  static int ahci_avn_hardreset(struct ata_link *link, unsigned int *class,
> @@ -606,6 +607,7 @@ static struct pci_driver ahci_pci_driver = {
>   .id_table   = ahci_pci_tbl,
>   .probe  = ahci_init_one,
>   .remove = ahci_remove_one,
> + .shutdown   = ahci_shutdown_one,
>   .driver = {
>   .pm = _pci_pm_ops,
>   },
> @@ -626,6 +628,7 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy 
> for mobile chipsets");
>  static void ahci_pci_save_initial_config(struct pci_dev *pdev,
>struct ahci_host_priv *hpriv)
>  {
> +

Spurious whitespace change, please remove.

>   if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) {
>   dev_info(>dev, "JMB361 has only one port\n");
>   hpriv->force_port_map = 1;
> @@ -1877,6 +1880,11 @@ static int ahci_init_one(struct pci_dev *pdev, const 
> struct pci_device_id *ent)
>   return 0;
>  }
>  
> +static void ahci_shutdown_one(struct pci_dev *pdev)
> +{
> + ata_pci_shutdown_one(pdev);
> +}
> +
>  static void ahci_remove_one(struct pci_dev *pdev)
>  {
>   pm_runtime_get_noresume(>dev);
> diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
> index 6f4ab5c5b52d..42c8728f6117 100644
> --- a/drivers/ata/libata-core.c
> +++ b/drivers/ata/libata-core.c
> @@ -6767,6 +6767,26 @@ void ata_pci_remove_one(struct pci_dev *pdev)
>   ata_host_detach(host);
>  }
>  
> +void ata_pci_shutdown_one(struct pci_dev *pdev)
> +{
> + struct ata_host *host = pci_get_drvdata(pdev);
> + int i;
> +
> + for (i = 0; i < host->n_ports; i++) {
> + struct ata_port *ap = host->ports[i];
> +
> + ap->pflags |= ATA_PFLAG_FROZEN;
> +
> + /* Disable port interrupts */
> + if (ap->ops->freeze)
> + ap->ops->freeze(ap);
> +
> + /* Stop the port DMA engines */
> + if (ap->ops->port_stop)
> + ap->ops->port_stop(ap);
> + }
> +}
> +
>  /* move to PCI subsystem */
>  int pci_test_config_bits(struct pci_dev *pdev, const struct pci_bits *bits)
>  {
> @@ -7387,6 +7407,7 @@ EXPORT_SYMBOL_GPL(ata_timing_cycle2mode);
>  
>  #ifdef CONFIG_PCI
>  EXPORT_SYMBOL_GPL(pci_test_config_bits);
> +EXPORT_SYMBOL_GPL(ata_pci_shutdown_one);
>  EXPORT_SYMBOL_GPL(ata_pci_remove_one);
>  #ifdef CONFIG_PM
>  EXPORT_SYMBOL_GPL(ata_pci_device_do_suspend);
> diff --git a/include/linux/libata.h b/include/linux/libata.h
> index 2dbde119721d..bff539918d82 100644
> --- a/include/linux/libata.h
> +++ b/include/linux/libata.h
> @@ -1221,6 +1221,7 @@ struct pci_bits {
>  };
>  
>  extern int pci_test_config_bits(struct pci_dev *pdev, const struct pci_bits 
> *bits);
> +extern void ata_pci_shutdown_one(struct pci_dev *pdev);
>  extern void ata_pci_remove_one(struct pci_dev *pdev);
>  
>  #ifdef CONFIG_PM
> -- 
> 2.17.1
> 

___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec


Re: [RFC] printing the final constructed kernel command line

2020-01-23 Thread Eric DeVolder

Bhupesh,

I had a minute to examine this today, and I realized that adding these dbgprint's for the command 
line under the -d [only] may be a bit heavy.


When the -d option is used, *all* of the below is output. What I was asked to provide (for our 
fleet) is a way to record what the kexec/kdump kernel and command line is. I agree this command line 
info belongs in -d as well, but if you would be open to a lighter weight option, we would make use 
of this option.


The -d seems to be most useful when debugging why kexec load failed (eg. memory map, relocations, 
etc). In our situation, we have a loaded kernel, and weeks perhaps months later (as capabilities of 
the fleet change), we would like to be able to query (via dmesg/journalctl/etc capture) that kexec 
kernel and command line; in determining if a change to that kexec-loaded kernel is needed.


Thanks for you consideration,
eric

% journalctl -u kdump
-- Logs begin at Tue 2020-01-21 15:07:15 EST, end at Tue 2020-01-21 15:07:26 
EST. --
Jan 21 15:07:20 vm382 systemd[1]: Starting Crash recovery kernel arming...
Jan 21 15:07:22 vm382 kdumpctl[1382]: /sbin/kexec  -d -p 
--command-line=BOOT_IMAGE=/vmlinuz-4.1.12-124.35.2.el7uek.x86_64 ro rhgb quiet LANG=en_US.UTF-8 
irqpoll nr_cpus=1 reset_devices cgroup_disable=memory mce=off numa=off udev.children-max=2 panic=10 
rootflags=nofail acpi_no_memhotplug transparent_hugepage=never nokaslr novmcoredd 
disable_cpu_apicid=0 --initrd=/boot/initramfs-4.1.12-124.35.2.el7uek.x86_64kdump.img 
/boot/vmlinuz-4.1.12-124.35.2.el7uek.x86_64

Jan 21 15:07:22 vm382 kdumpctl[1382]: Try gzip decompression.
Jan 21 15:07:22 vm382 kdumpctl[1382]: Try LZMA decompression.
Jan 21 15:07:22 vm382 kdumpctl[1382]: lzma_decompress_file: read on 
/boot/vmlinuz-4.1.12-124.35.2.el7uek.x86_64 of 65536 bytes failed

Jan 21 15:07:22 vm382 kdumpctl[1382]: kernel: 0x7f0c45619010 kernel_size: 
0x61c070
Jan 21 15:07:22 vm382 kdumpctl[1382]: MEMORY RANGES
Jan 21 15:07:22 vm382 kdumpctl[1382]: 0100-0009 (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 0010-7d86d017 (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d86d018-7d8a9857 (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d8a9858-7d8aa017 (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d8aa018-7d8b3657 (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d8b3658-7dbfefff (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7dbff000-7dbf (2)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7dc0-7e2f2fff (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7e2f3000-7e31afff (1)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7e31b000-7f39afff (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f39b000-7f3f2fff (1)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f3f3000-7f3fafff (2)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f3fb000-7f3fefff (3)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f3ff000-7f7f (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f80-8fff (1)
Jan 21 15:07:22 vm382 kdumpctl[1382]: ffe0- (1)
Jan 21 15:07:22 vm382 kdumpctl[1382]: 0001-00027fff (0)
Jan 21 15:07:22 vm382 kdumpctl[1382]: -0fff : 
reserved
Jan 21 15:07:22 vm382 kdumpctl[1382]: 1000-0009 : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 000a-000b : PCI 
Bus :00
Jan 21 15:07:22 vm382 kdumpctl[1382]: 000f-000f : 
System ROM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 0010-7d86d017 : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 0200-02764898 : 
Kernel code
Jan 21 15:07:22 vm382 kdumpctl[1382]: 02764899-02cc4ebf : 
Kernel data
Jan 21 15:07:22 vm382 kdumpctl[1382]: 02ec9000-03211fff : 
Kernel bss
Jan 21 15:07:22 vm382 kdumpctl[1382]: 2700-370f : Crash 
kernel
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d86d018-7d8a9857 : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d8a9858-7d8aa017 : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d8aa018-7d8b3657 : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7d8b3658-7dbfefff : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7dbff000-7dbf : ACPI 
Tables
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7dc0-7e2f2fff : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7e2f3000-7e31afff : 
reserved
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7e31b000-7f39afff : 
System RAM
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f39b000-7f3f2fff : 
reserved
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f3f3000-7f3fafff : ACPI 
Tables
Jan 21 15:07:22 vm382 kdumpctl[1382]: 7f3fb000-7f3fefff : ACPI 
Non-volatile Storage

[PATCH] ata: ahci: Add shutdown to freeze hardware resources of ahci

2020-01-23 Thread Prabhakar Kushwaha
device_shutdown() called from reboot/power_shutdown expect all
devices to be shutdown. Same is true for ahci pci driver.
As no shutdown function was implemented ata subsystem remains
always alive and DMA/interrupt still active.

It creates problem during kexec, here "M" bit is cleared to stop
DMA usage. Any further DMA transaction may cause instability and
the hard-disk may even not get detected for second kernel.
One of possible case is periodic file system sync.

So defining ahci pci driver shutdown to freeze hardware (mask
interrupt, stop DMA engine and free DMA resources).

Signed-off-by: Prabhakar Kushwaha 
---
 drivers/ata/ahci.c|  8 
 drivers/ata/libata-core.c | 21 +
 include/linux/libata.h|  1 +
 3 files changed, 30 insertions(+)

diff --git a/drivers/ata/ahci.c b/drivers/ata/ahci.c
index 4bfd1b14b390..31fc934740b6 100644
--- a/drivers/ata/ahci.c
+++ b/drivers/ata/ahci.c
@@ -81,6 +81,7 @@ enum board_ids {
 
 static int ahci_init_one(struct pci_dev *pdev, const struct pci_device_id 
*ent);
 static void ahci_remove_one(struct pci_dev *dev);
+static void ahci_shutdown_one(struct pci_dev *dev);
 static int ahci_vt8251_hardreset(struct ata_link *link, unsigned int *class,
 unsigned long deadline);
 static int ahci_avn_hardreset(struct ata_link *link, unsigned int *class,
@@ -606,6 +607,7 @@ static struct pci_driver ahci_pci_driver = {
.id_table   = ahci_pci_tbl,
.probe  = ahci_init_one,
.remove = ahci_remove_one,
+   .shutdown   = ahci_shutdown_one,
.driver = {
.pm = _pci_pm_ops,
},
@@ -626,6 +628,7 @@ MODULE_PARM_DESC(mobile_lpm_policy, "Default LPM policy for 
mobile chipsets");
 static void ahci_pci_save_initial_config(struct pci_dev *pdev,
 struct ahci_host_priv *hpriv)
 {
+
if (pdev->vendor == PCI_VENDOR_ID_JMICRON && pdev->device == 0x2361) {
dev_info(>dev, "JMB361 has only one port\n");
hpriv->force_port_map = 1;
@@ -1877,6 +1880,11 @@ static int ahci_init_one(struct pci_dev *pdev, const 
struct pci_device_id *ent)
return 0;
 }
 
+static void ahci_shutdown_one(struct pci_dev *pdev)
+{
+   ata_pci_shutdown_one(pdev);
+}
+
 static void ahci_remove_one(struct pci_dev *pdev)
 {
pm_runtime_get_noresume(>dev);
diff --git a/drivers/ata/libata-core.c b/drivers/ata/libata-core.c
index 6f4ab5c5b52d..42c8728f6117 100644
--- a/drivers/ata/libata-core.c
+++ b/drivers/ata/libata-core.c
@@ -6767,6 +6767,26 @@ void ata_pci_remove_one(struct pci_dev *pdev)
ata_host_detach(host);
 }
 
+void ata_pci_shutdown_one(struct pci_dev *pdev)
+{
+   struct ata_host *host = pci_get_drvdata(pdev);
+   int i;
+
+   for (i = 0; i < host->n_ports; i++) {
+   struct ata_port *ap = host->ports[i];
+
+   ap->pflags |= ATA_PFLAG_FROZEN;
+
+   /* Disable port interrupts */
+   if (ap->ops->freeze)
+   ap->ops->freeze(ap);
+
+   /* Stop the port DMA engines */
+   if (ap->ops->port_stop)
+   ap->ops->port_stop(ap);
+   }
+}
+
 /* move to PCI subsystem */
 int pci_test_config_bits(struct pci_dev *pdev, const struct pci_bits *bits)
 {
@@ -7387,6 +7407,7 @@ EXPORT_SYMBOL_GPL(ata_timing_cycle2mode);
 
 #ifdef CONFIG_PCI
 EXPORT_SYMBOL_GPL(pci_test_config_bits);
+EXPORT_SYMBOL_GPL(ata_pci_shutdown_one);
 EXPORT_SYMBOL_GPL(ata_pci_remove_one);
 #ifdef CONFIG_PM
 EXPORT_SYMBOL_GPL(ata_pci_device_do_suspend);
diff --git a/include/linux/libata.h b/include/linux/libata.h
index 2dbde119721d..bff539918d82 100644
--- a/include/linux/libata.h
+++ b/include/linux/libata.h
@@ -1221,6 +1221,7 @@ struct pci_bits {
 };
 
 extern int pci_test_config_bits(struct pci_dev *pdev, const struct pci_bits 
*bits);
+extern void ata_pci_shutdown_one(struct pci_dev *pdev);
 extern void ata_pci_remove_one(struct pci_dev *pdev);
 
 #ifdef CONFIG_PM
-- 
2.17.1


___
kexec mailing list
kexec@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/kexec