Re: [PATCH v5 14/14] memory-hotplug: free node_data when a node is offlined

2012-12-27 Thread Wen Congyang
At 12/26/2012 11:55 AM, Kamezawa Hiroyuki Wrote:
 (2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com

 We call hotadd_new_pgdat() to allocate memory to store node_data. So we
 should free it when removing a node.

 Signed-off-by: Wen Congyang we...@cn.fujitsu.com
 
 I'm sorry but is it safe to remove pgdat ? All zone cache and zonelists are
 properly cleared/rebuilded in synchronous way ? and No threads are visinting
 zone in vmscan.c ?

We have rebuilt zonelists when a zone has no memory after offlining some pages.

Thanks
Wen Congyang

 
 Thanks,
 -Kame
 
 ---
   mm/memory_hotplug.c |   20 +++-
   1 files changed, 19 insertions(+), 1 deletions(-)

 diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
 index f8a1d2f..447fa24 100644
 --- a/mm/memory_hotplug.c
 +++ b/mm/memory_hotplug.c
 @@ -1680,9 +1680,12 @@ static int check_cpu_on_node(void *data)
   /* offline the node if all memory sections of this node are removed */
   static void try_offline_node(int nid)
   {
 +pg_data_t *pgdat = NODE_DATA(nid);
  unsigned long start_pfn = NODE_DATA(nid)-node_start_pfn;
 -unsigned long end_pfn = start_pfn + NODE_DATA(nid)-node_spanned_pages;
 +unsigned long end_pfn = start_pfn + pgdat-node_spanned_pages;
  unsigned long pfn;
 +struct page *pgdat_page = virt_to_page(pgdat);
 +int i;
   
  for (pfn = start_pfn; pfn  end_pfn; pfn += PAGES_PER_SECTION) {
  unsigned long section_nr = pfn_to_section_nr(pfn);
 @@ -1709,6 +1712,21 @@ static void try_offline_node(int nid)
   */
  node_set_offline(nid);
  unregister_one_node(nid);
 +
 +if (!PageSlab(pgdat_page)  !PageCompound(pgdat_page))
 +/* node data is allocated from boot memory */
 +return;
 +
 +/* free waittable in each zone */
 +for (i = 0; i  MAX_NR_ZONES; i++) {
 +struct zone *zone = pgdat-node_zones + i;
 +
 +if (zone-wait_table)
 +vfree(zone-wait_table);
 +}
 +
 +arch_refresh_nodedata(nid, NULL);
 +arch_free_nodedata(pgdat);
   }
   
   int __ref remove_memory(int nid, u64 start, u64 size)

 
 
 

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] ppc/iommu: prevent false TCE leak message

2012-12-27 Thread Thadeu Lima de Souza Cascardo
When a device DMA window includes the address 0, it's reserved in the
TCE bitmap to avoid returning that address to drivers.

When the device is removed, the bitmap is checked for any mappings not
removed by the driver, indicating a possible DMA mapping leak. Since the
reserved address is not cleared, a message is printed, warning of such a
leak.

Check for the reservation, and clear it before checking for any other
standing mappings.

Signed-off-by: Thadeu Lima de Souza Cascardo casca...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/iommu.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 8226c6c..226e9e5 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -717,6 +717,11 @@ void iommu_free_table(struct iommu_table *tbl, const char 
*node_name)
return;
}
 
+   /* In case we have reserved the first bit, we should not emit
+* the warning below. */
+   if (tbl-it_offset == 0)
+   clear_bit(0, tbl-it_map);
+
/* verify that table contains no entries */
/* it_size is in entries, and we're examining 64 at a time */
for (i = 0; i  (tbl-it_size/64); i++) {
-- 
1.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] ppc/EEH: fix crash when adding a device in a slot with DDW

2012-12-27 Thread Thadeu Lima de Souza Cascardo
The DDW code uses a eeh_dev struct from the pci_dev. However, this is
not set until eeh_add_device_late is called.

Since pci_bus_add_devices is called before eeh_add_device_late, the PCI
devices are added to the bus, making drivers' probe hooks to be called.
These will call set_dma_mask, which will call the DDW code, which will
require the eeh_dev struct from pci_dev. This would result in a crash,
due to a NULL dereference.

Calling eeh_add_device_late after pci_bus_add_devices would make the
system BUG, because device files shouldn't be added to devices there
were not added to the system. So, a new function is needed to add such
files only after pci_bus_add_devices have been called.

Cc: sta...@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo casca...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |3 +++
 arch/powerpc/kernel/pci-common.c |7 +--
 arch/powerpc/platforms/pseries/eeh.c |   24 +++-
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index b0ef738..71aac19 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -201,6 +201,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev);
 void __init eeh_addr_cache_build(void);
 void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_tree_late(struct pci_bus *);
+void eeh_add_device_tree_files(struct pci_bus *);
 void eeh_remove_bus_device(struct pci_dev *, int);
 
 /**
@@ -240,6 +241,8 @@ static inline void eeh_add_device_tree_early(struct 
device_node *dn) { }
 
 static inline void eeh_add_device_tree_late(struct pci_bus *bus) { }
 
+static inline void eeh_add_device_tree_files(struct pci_bus *bus) { }
+
 static inline void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe) { }
 
 static inline void eeh_lock(void) { }
diff --git a/arch/powerpc/kernel/pci-common.c b/arch/powerpc/kernel/pci-common.c
index 7f94f76..7b1f14c 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1480,11 +1480,14 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
pcibios_allocate_bus_resources(bus);
pcibios_claim_one_bus(bus);
 
+   /* Fixup EEH */
+   eeh_add_device_tree_late(bus);
+
/* Add new devices to global lists.  Register in proc, sysfs. */
pci_bus_add_devices(bus);
 
-   /* Fixup EEH */
-   eeh_add_device_tree_late(bus);
+   /* Add EEH sysfs files */
+   eeh_add_device_tree_files(bus);
 }
 EXPORT_SYMBOL_GPL(pcibios_finish_adding_to_bus);
 
diff --git a/arch/powerpc/platforms/pseries/eeh.c 
b/arch/powerpc/platforms/pseries/eeh.c
index 9a04322..a667a34 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -788,7 +788,6 @@ static void eeh_add_device_late(struct pci_dev *dev)
dev-dev.archdata.edev = edev;
 
eeh_addr_cache_insert_dev(dev);
-   eeh_sysfs_add_device(dev);
 }
 
 /**
@@ -815,6 +814,29 @@ void eeh_add_device_tree_late(struct pci_bus *bus)
 EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);
 
 /**
+ * eeh_add_device_tree_files - Add EEH sysfs files for the indicated PCI bus
+ * @bus: PCI bus
+ *
+ * This routine must be used to add EEH sysfs files for PCI
+ * devices which are attached to the indicated PCI bus. The PCI bus
+ * is added after system boot through hotplug or dlpar.
+ */
+void eeh_add_device_tree_files(struct pci_bus *bus)
+{
+   struct pci_dev *dev;
+
+   list_for_each_entry(dev, bus-devices, bus_list) {
+   eeh_sysfs_add_device(dev);
+   if (dev-hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+   struct pci_bus *subbus = dev-subordinate;
+   if (subbus)
+   eeh_add_device_tree_files(subbus);
+   }
+   }
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_tree_files);
+
+/**
  * eeh_remove_device - Undo EEH setup for the indicated pci device
  * @dev: pci device to be removed
  * @purge_pe: remove the PE or not
-- 
1.7.1

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] PowerPC documentation: fixed path to the powerpc directory

2012-12-27 Thread Thomas Waldecker
ppc - powerpc

Signed-off-by: Thomas Waldecker thomas.waldec...@gmail.com
---
 Documentation/powerpc/cpu_features.txt | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/Documentation/powerpc/cpu_features.txt 
b/Documentation/powerpc/cpu_features.txt
index ffa4183..ae09df8 100644
--- a/Documentation/powerpc/cpu_features.txt
+++ b/Documentation/powerpc/cpu_features.txt
@@ -11,10 +11,10 @@ split instruction and data caches, and if the CPU supports 
the DOZE and NAP
 sleep modes.
 
 Detection of the feature set is simple. A list of processors can be found in
-arch/ppc/kernel/cputable.c. The PVR register is masked and compared with each
-value in the list. If a match is found, the cpu_features of cur_cpu_spec is
-assigned to the feature bitmask for this processor and a __setup_cpu function
-is called.
+arch/powerpc/kernel/cputable.c. The PVR register is masked and compared with
+each value in the list. If a match is found, the cpu_features of cur_cpu_spec
+is assigned to the feature bitmask for this processor and a __setup_cpu
+function is called.
 
 C code may test 'cur_cpu_spec[smp_processor_id()]-cpu_features' for a
 particular feature bit. This is done in quite a few places, for example
@@ -51,6 +51,6 @@ should be used in the majority of cases.
 
 The END_FTR_SECTION macros are implemented by storing information about this
 code in the '__ftr_fixup' ELF section. When do_cpu_ftr_fixups
-(arch/ppc/kernel/misc.S) is invoked, it will iterate over the records in
+(arch/powerpc/kernel/misc.S) is invoked, it will iterate over the records in
 __ftr_fixup, and if the required feature is not present it will loop writing
 nop's from each BEGIN_FTR_SECTION to END_FTR_SECTION.
-- 
1.8.0.2

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH v5 14/14] memory-hotplug: free node_data when a node is offlined

2012-12-27 Thread Kamezawa Hiroyuki
(2012/12/27 21:16), Wen Congyang wrote:
 At 12/26/2012 11:55 AM, Kamezawa Hiroyuki Wrote:
 (2012/12/24 21:09), Tang Chen wrote:
 From: Wen Congyang we...@cn.fujitsu.com

 We call hotadd_new_pgdat() to allocate memory to store node_data. So we
 should free it when removing a node.

 Signed-off-by: Wen Congyang we...@cn.fujitsu.com

 I'm sorry but is it safe to remove pgdat ? All zone cache and zonelists are
 properly cleared/rebuilded in synchronous way ? and No threads are visinting
 zone in vmscan.c ?
 
 We have rebuilt zonelists when a zone has no memory after offlining some 
 pages.
 

How do you guarantee that the address of pgdat/zone is not on stack of any 
kernel
threads or other kernel objects without reference counting or other syncing 
method ?


Thanks,
-Kame


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH] ppc/EEH: fix crash when adding a device in a slot with DDW

2012-12-27 Thread Gavin Shan
On Thu, Dec 27, 2012 at 02:34:00PM -0200, Thadeu Lima de Souza Cascardo wrote:
The DDW code uses a eeh_dev struct from the pci_dev. However, this is
not set until eeh_add_device_late is called.

Since pci_bus_add_devices is called before eeh_add_device_late, the PCI
devices are added to the bus, making drivers' probe hooks to be called.
These will call set_dma_mask, which will call the DDW code, which will
require the eeh_dev struct from pci_dev. This would result in a crash,
due to a NULL dereference.

Calling eeh_add_device_late after pci_bus_add_devices would make the
system BUG, because device files shouldn't be added to devices there
were not added to the system. So, a new function is needed to add such
files only after pci_bus_add_devices have been called.


Could you please explain for a bit how did you trigger the problem? I'm
not sure you got it while doing PCI hotplug or just saw the issue during
system bootup stage :-)

Cc: sta...@vger.kernel.org
Signed-off-by: Thadeu Lima de Souza Cascardo casca...@linux.vnet.ibm.com
---
 arch/powerpc/include/asm/eeh.h   |3 +++
 arch/powerpc/kernel/pci-common.c |7 +--
 arch/powerpc/platforms/pseries/eeh.c |   24 +++-
 3 files changed, 31 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/eeh.h b/arch/powerpc/include/asm/eeh.h
index b0ef738..71aac19 100644
--- a/arch/powerpc/include/asm/eeh.h
+++ b/arch/powerpc/include/asm/eeh.h
@@ -201,6 +201,7 @@ int eeh_dev_check_failure(struct eeh_dev *edev);
 void __init eeh_addr_cache_build(void);
 void eeh_add_device_tree_early(struct device_node *);
 void eeh_add_device_tree_late(struct pci_bus *);
+void eeh_add_device_tree_files(struct pci_bus *);

Since the function is going to add EEH specific sysfs files, its name would
be something like eeh_add_sysfs_files instead of eeh_add_device_tree_files 
:-)

 void eeh_remove_bus_device(struct pci_dev *, int);

 /**
@@ -240,6 +241,8 @@ static inline void eeh_add_device_tree_early(struct 
device_node *dn) { }

 static inline void eeh_add_device_tree_late(struct pci_bus *bus) { }

+static inline void eeh_add_device_tree_files(struct pci_bus *bus) { }
+

It'd better to rename the function name to eeh_add_sysfs_files mentioned
as above.

 static inline void eeh_remove_bus_device(struct pci_dev *dev, int purge_pe) { 
 }

 static inline void eeh_lock(void) { }
diff --git a/arch/powerpc/kernel/pci-common.c 
b/arch/powerpc/kernel/pci-common.c
index 7f94f76..7b1f14c 100644
--- a/arch/powerpc/kernel/pci-common.c
+++ b/arch/powerpc/kernel/pci-common.c
@@ -1480,11 +1480,14 @@ void pcibios_finish_adding_to_bus(struct pci_bus *bus)
   pcibios_allocate_bus_resources(bus);
   pcibios_claim_one_bus(bus);

+  /* Fixup EEH */
+  eeh_add_device_tree_late(bus);
+
   /* Add new devices to global lists.  Register in proc, sysfs. */
   pci_bus_add_devices(bus);

-  /* Fixup EEH */
-  eeh_add_device_tree_late(bus);
+  /* Add EEH sysfs files */
+  eeh_add_device_tree_files(bus);

The function name would be eeh_add_sysfs_files as above.

 }
 EXPORT_SYMBOL_GPL(pcibios_finish_adding_to_bus);


By the way, arch/powerpc/kernel/of_platform.c::of_pci_phb_probe is also calling
to eeh_add_device_tree_late() as well. Since you have removed part of the logic
from original eeh_add_device_tree_late(), which is add EEH specific sysfs files,
and you put that part of logic to eeh_add_device_tree_files(). So I think you
also need make the similiar change for of_pci_phb_probe() as well :-)

diff --git a/arch/powerpc/platforms/pseries/eeh.c 
b/arch/powerpc/platforms/pseries/eeh.c
index 9a04322..a667a34 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -788,7 +788,6 @@ static void eeh_add_device_late(struct pci_dev *dev)
   dev-dev.archdata.edev = edev;

   eeh_addr_cache_insert_dev(dev);
-  eeh_sysfs_add_device(dev);
 }

 /**
@@ -815,6 +814,29 @@ void eeh_add_device_tree_late(struct pci_bus *bus)
 EXPORT_SYMBOL_GPL(eeh_add_device_tree_late);

 /**
+ * eeh_add_device_tree_files - Add EEH sysfs files for the indicated PCI bus
+ * @bus: PCI bus
+ *
+ * This routine must be used to add EEH sysfs files for PCI
+ * devices which are attached to the indicated PCI bus. The PCI bus
+ * is added after system boot through hotplug or dlpar.
+ */
+void eeh_add_device_tree_files(struct pci_bus *bus)
+{
+  struct pci_dev *dev;
+
+  list_for_each_entry(dev, bus-devices, bus_list) {
+  eeh_sysfs_add_device(dev);
+  if (dev-hdr_type == PCI_HEADER_TYPE_BRIDGE) {
+  struct pci_bus *subbus = dev-subordinate;
+  if (subbus)
+  eeh_add_device_tree_files(subbus);
+  }
+  }
+}
+EXPORT_SYMBOL_GPL(eeh_add_device_tree_files);
+

The function name mentioned as above.

+/**
  * eeh_remove_device - Undo EEH setup for the indicated pci device
  * @dev: pci device to be removed
  * 

Re: [PATCH] ppc/iommu: prevent false TCE leak message

2012-12-27 Thread Gavin Shan
On Thu, Dec 27, 2012 at 02:28:06PM -0200, Thadeu Lima de Souza Cascardo wrote:
When a device DMA window includes the address 0, it's reserved in the
TCE bitmap to avoid returning that address to drivers.

When the device is removed, the bitmap is checked for any mappings not
removed by the driver, indicating a possible DMA mapping leak. Since the
reserved address is not cleared, a message is printed, warning of such a
leak.

Check for the reservation, and clear it before checking for any other
standing mappings.

Signed-off-by: Thadeu Lima de Souza Cascardo casca...@linux.vnet.ibm.com
---
 arch/powerpc/kernel/iommu.c |5 +
 1 files changed, 5 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/iommu.c b/arch/powerpc/kernel/iommu.c
index 8226c6c..226e9e5 100644
--- a/arch/powerpc/kernel/iommu.c
+++ b/arch/powerpc/kernel/iommu.c
@@ -717,6 +717,11 @@ void iommu_free_table(struct iommu_table *tbl, const char 
*node_name)
   return;
   }

+  /* In case we have reserved the first bit, we should not emit
+   * the warning below. */

At least, the comment would look like:

/*
 * xxx
 */

+  if (tbl-it_offset == 0)
+  clear_bit(0, tbl-it_map);
+
   /* verify that table contains no entries */
   /* it_size is in entries, and we're examining 64 at a time */

The comment would be merged as well? :-)

   for (i = 0; i  (tbl-it_size/64); i++) {

Thanks,
Gavin

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev