[PPC] relocation truncated to fit: R_PPC64_REL24 against symbol `.eeh_check_failure' defined in .text section in arch/powerpc/platforms/built-in.o

2012-09-28 Thread Fengguang Wu
Hi,

I got the following build errors with powerpc allyesconfig and some other
configs. How can they be eliminated? I'm running the cross-compile tools
from kernel.org.

drivers/built-in.o: In function `.yenta_interrupt':
yenta_socket.c:(.text+0x1ffba78): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
yenta_socket.c:(.text+0x1ffbb40): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
yenta_socket.c:(.text+0x1ffbcd0): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
drivers/built-in.o: In function `.yenta_interrupt_wrapper':
yenta_socket.c:(.text+0x1ffbe3c): relocation truncated to fit: R_PPC64_REL24 
against symbol `_savegpr0_29' defined in .text.save.restore section in 
arch/powerpc/lib/built-in.o
yenta_socket.c:(.text+0x1ffbea8): relocation truncated to fit: R_PPC64_REL24 
against symbol `_restgpr0_29' defined in .text.save.restore section in 
arch/powerpc/lib/built-in.o
drivers/built-in.o: In function `.yenta_probe_irq.isra.1':
yenta_socket.c:(.text+0x1ffc044): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
yenta_socket.c:(.text+0x1ffc1d0): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
yenta_socket.c:(.text+0x1ffc298): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
yenta_socket.c:(.text+0x1ffc478): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
yenta_socket.c:(.text+0x1ffc608): relocation truncated to fit: R_PPC64_REL24 
against symbol `.eeh_check_failure' defined in .text section in 
arch/powerpc/platforms/built-in.o
yenta_socket.c:(.text+0x1ffc7a0): additional relocation overflows omitted from 
the output

Thanks,
Fengguang
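
For context, a minimal sketch of why the linker gives up here. R_PPC64_REL24
is the relocation used for the PowerPC b/bl branch, whose 24-bit word offset
gives a reach of roughly +/-32 MiB. The failing call sites above sit at about
.text+0x1ffb..., i.e. just under 32 MiB into drivers/built-in.o, so a direct
branch back to a symbol in an earlier object presumably no longer fits. The
helper names rel24_fits() and rel24_field() are made up for illustration and
are not taken from binutils.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Reach of a PowerPC I-form branch (b/bl): a 24-bit field holding a
 * word offset, i.e. a signed 26-bit byte displacement. */
#define REL24_MIN (-(INT64_C(1) << 25))     /* -0x2000000 */
#define REL24_MAX ((INT64_C(1) << 25) - 4)  /* +0x1fffffc */

/* Hypothetical helper: true if a branch at 'site' can reach 'target'
 * directly, false if the relocation would be "truncated to fit". */
bool rel24_fits(uint64_t site, uint64_t target)
{
    int64_t disp = (int64_t)(target - site);

    return (disp & 3) == 0 && disp >= REL24_MIN && disp <= REL24_MAX;
}

/* Hypothetical helper: the displacement bits as they sit in the
 * instruction word (LI field, mask 0x03fffffc). Only valid when the
 * displacement fits. */
uint32_t rel24_field(uint64_t site, uint64_t target)
{
    assert(rel24_fits(site, target));
    return (uint32_t)((target - site) & UINT64_C(0x03fffffc));
}
```

With these definitions, a call from offset 0x2000000 back to offset 0 is the
farthest backward branch that still fits; anything beyond that needs some
form of indirection rather than a direct bl.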
___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall

2012-09-28 Thread Kumar Gala

On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:

 On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
 On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
 On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
 Original process of call:
The mpc85xx_pci_err_probe function completes to been registered
and enabled of EDAC PCI err driver at the latter time stage of
kernel boot in the mpc85xx_edac.c.
 Current process of call:
The mpc85xx_pci_err_probe function completes to been registered
and enabled of EDAC PCI err driver at the first time stage of
kernel boot in the fsl_pci.c.
 So in this case the following error messages appear in the boot log:
   PCI: Probing PCI hardware
   pci :00:00.0: ignoring class b20 (doesn't match header type 01)
   PCIE error(s) detected
   PCIE ERR_DR register: 0x0002
   PCIE ERR_CAP_STAT register: 0x8001
   PCIE ERR_CAP_R0 register: 0x0800
   PCIE ERR_CAP_R1 register: 0x
   PCIE ERR_CAP_R2 register: 0x
   PCIE ERR_CAP_R3 register: 0x
 Because the EDAC PCI err driver is registered and enabled earlier than
 original point of call. But at this point of time, PCI hardware is not
 probed and initialized, and it is in unknowable state.
 So, move enable function into mpc85xx_pci_err_en which is called at the
 middle time stage of kernel boot and after PCI hardware is probed and
 initialized by device_initcall in the fsl_pci.c.
 Signed-off-by: Chunhe Lan chunhe@freescale.com
 ---
 arch/powerpc/sysdev/fsl_pci.c |   12 ++
 arch/powerpc/sysdev/fsl_pci.h |5 
 drivers/edac/mpc85xx_edac.c   |   47 
 
 3 files changed, 50 insertions(+), 14 deletions(-)
 diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
 index 3d6f4d8..a591965 100644
 --- a/arch/powerpc/sysdev/fsl_pci.c
 +++ b/arch/powerpc/sysdev/fsl_pci.c
 @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
return platform_driver_register(&fsl_pci_driver);
 }
 arch_initcall(fsl_pci_init);
 +
 +static int __init fsl_pci_err_en(void)
 +{
 +  struct device_node *np;
 +
 +  for_each_node_by_type(np, "pci")
 +  if (of_match_node(pci_ids, np))
 +  mpc85xx_pci_err_en(np);
 +
 +  return 0;
 +}
 +device_initcall(fsl_pci_err_en);
 
 Why can't you call this from the normal PCIe controller init, instead of 
 searching for the node independently?
 Don't we have this now with mpc85xx_pci_err_probe() ??
 
 What do you mean by this?

I'm saying don't we replace fsl_pci_err_en() with mpc85xx_pci_err_probe()...

I need to look at this more, but it's not clear why mpc85xx_pci_err_en() can't
just be part of mpc85xx_pci_err_probe()

- k
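
To make the ordering question concrete, here is a small user-space sketch of
how initcall levels order the two registrations in the quoted patch. In the
kernel, arch_initcall() is level 3 and device_initcall() is level 6, and
lower levels run first. The table and run_initcalls() below are stand-ins
for illustration only, not kernel code; the entry names mirror the patch but
nothing is actually called.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Initcall levels as numbered in include/linux/init.h. */
enum { LEVEL_ARCH = 3, LEVEL_DEVICE = 6, LEVEL_MAX = 7 };

struct initcall {
    int level;
    const char *name;
};

/* Records the order in which the entries "ran". */
const char *run_order[8];
size_t run_count;

/* Hypothetical stand-in for the kernel's initcall loop: walk the
 * levels in ascending order and record every entry at that level. */
void run_initcalls(const struct initcall *calls, size_t n)
{
    assert(calls != NULL || n == 0);
    run_count = 0;
    for (int level = 0; level <= LEVEL_MAX; level++)
        for (size_t i = 0; i < n; i++)
            if (calls[i].level == level && run_count < 8)
                run_order[run_count++] = calls[i].name;
}
```

Registering fsl_pci_err_en at LEVEL_DEVICE and fsl_pci_init at LEVEL_ARCH in
any order yields fsl_pci_init first, which is the "enable later" behaviour
the patch is after.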


R: Re: PCI device not working

2012-09-28 Thread Davide Viti
Hi Kumar,


It was, can you figure out in u-boot what exact config read on the bus
would return the correct thing.

The fact that when we probe the device at 0001:03 we should get back
something like cfg_data=0xabba1b65


Here follow some details about what is going on inside u-boot; verbosity
increases from [1] to [3]:

 [1] PCI printouts when the board comes up
 [2] output of the "pci [0-3] long" u-boot command
 [3] same as [1] but with debug prints inside
     indirect_read_config_##size() [drivers/pci/pci_indirect.c]

If you are curious about our u-boot board settings, please refer to:
http://www.mail-archive.com/linuxppc-dev@lists.ozlabs.org/msg62007.html

Thanks a lot,
Davide



*
*[1]*
*
PCIE1 used as Root Complex (base addr ffe09000)
   Scanning PCI bus 01
   01  00  1b65  abba  0280  00
cfg_addr:ffe09000  cfg_data:ffe09004  indirect_type:0
PCIE1 on bus 00 - 01

PCIE2 used as Root Complex (base addr ffe0a000)
   Scanning PCI bus 03
   03  00  1b65  abba  0280  00
cfg_addr:ffe0a000  cfg_data:ffe0a004  indirect_type:0
PCIE2 on bus 02 - 03
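
As a cross-check of the numbers: the first config-space dword packs the
vendor ID in the low 16 bits and the device ID in the high 16 bits, so the
"1b65  abba" pair in the scan lines is the same thing as the expected
cfg_data=0xabba1b65. A minimal sketch (the helper names are made up for
illustration, this is not u-boot code):

```c
#include <assert.h>
#include <stdint.h>

/* Low half of config dword 0 (offset 0x00): vendor ID. */
uint16_t pci_vendor_id(uint32_t dword0)
{
    return (uint16_t)(dword0 & 0xffff);
}

/* High half of config dword 0: device ID. */
uint16_t pci_device_id(uint32_t dword0)
{
    return (uint16_t)(dword0 >> 16);
}

/* An all-ones vendor ID conventionally means nothing responded. */
int pci_device_present(uint32_t dword0)
{
    return pci_vendor_id(dword0) != 0xffff;
}

/* Self-check using the value quoted in this thread. */
void pci_id_selfcheck(void)
{
    assert(pci_vendor_id(0xabba1b65u) == 0x1b65);
    assert(pci_device_id(0xabba1b65u) == 0xabba);
    assert(pci_device_present(0xabba1b65u));
}
```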


*
*[2]*
*

= pci 0 long
Scanning PCI devices on bus 0

Found PCI device 00.00.00:
  vendor ID =                   0x1957
  device ID =                   0x0100
  command register =            0x0006
  status register =             0x0010
  revision ID =                 0x11
  class code =                  0x0b (Processor)
  sub class code =              0x20
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x01
  BIST =                        0x00
  base address 0 =              0xfff0
  base address 1 =              0x
  primary bus number =          0x00
  secondary bus number =        0x01
  subordinate bus number =      0x01
  secondary latency timer =     0x00
  IO base =                     0x00
  IO limit =                    0x00
  secondary status =            0x
  memory base =                 0xa000
  memory limit =                0xa000
  prefetch memory base =        0x1001
  prefetch memory limit =       0x0001
  prefetch memory base upper =  0x
  prefetch memory limit upper = 0x
  IO base upper 16 bits =       0x
  IO limit upper 16 bits =      0x
  expansion ROM base address =  0x
  interrupt line =              0x00
  interrupt pin =               0x00
  bridge control =              0x

= pci 1 long
Scanning PCI devices on bus 1

Found PCI device 01.00.00:
  vendor ID =                   0x1b65
  device ID =                   0xabba
  command register =            0x0006
  status register =             0x0010
  revision ID =                 0x01
  class code =                  0x02 (Network controller)
  sub class code =              0x80
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x00
  BIST =                        0x00
  base address 0 =              0xa000
  base address 1 =              0xa001
  base address 2 =              0x
  base address 3 =              0x
  base address 4 =              0x
  base address 5 =              0x
  cardBus CIS pointer =         0x
  sub system vendor ID =        0x
  sub system ID =               0x
  expansion ROM base address =  0x
  interrupt line =              0x00
  interrupt pin =               0x01
  min Grant =                   0x00
  max Latency =                 0x00

= pci 2 long
Scanning PCI devices on bus 2

Found PCI device 02.00.00:
  vendor ID =                   0x1957
  device ID =                   0x0100
  command register =            0x0006
  status register =             0x0010
  revision ID =                 0x11
  class code =                  0x0b (Processor)
  sub class code =              0x20
  programming interface =       0x00
  cache line =                  0x08
  latency time =                0x00
  header type =                 0x01
  BIST =                        0x00
  base address 0 =              0xfff0
  base address 1 =              0x
  primary bus number =          0x00
  secondary bus number =        0x01
  subordinate bus number =      0x01
  secondary latency timer =     0x00
  IO base =                     0x00
  IO limit =                    0x00
  secondary status =            0x
  memory base =                 0xb000
  memory limit =                0xb000
  prefetch memory base =        0x1001
  prefetch memory limit =       0x0001
  prefetch memory base upper =  0x
  prefetch memory limit upper = 0x
  IO base upper 16 bits =       0x
  IO limit upper 16 bits =      0x
  expansion ROM base address =  0x
  interrupt line =

Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

2012-09-28 Thread J. Bruce Fields
On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
 
 On 28.09.2012, at 04:04, Linus Torvalds wrote:
 
  On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf ag...@suse.de wrote:
  
  Below are OOPS excerpts from different rc's I tried. All of them crashed - 
  all the way up to current Linus' master branch. I haven't cross-checked, 
  but I don't remember any such behavior from pre-3.6 releases.
  
  Since you seem to be able to reproduce it easily (and apparently
  reliably), any chance you could just bisect it?
  
  Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
  
git bisect start
git bisect good v3.5
git bisect bad v3.6-rc1
  
  will get you started on your adventure..
 
 Heh, will give it a try :). The thing really does look quite bisectable.
 
 
 It might take a few hours though - the machine isn't exactly fast by today's 
 standards and it's getting late here. But I'll keep you updated.

I doubt it's anything special about that workload, but just for kicks I
tried a git clone -ls (cloning my linux tree to another directory on
the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
anything interesting (just an xfs lockdep warning that looks like this
one jlayton already reported:
http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
)

Any (even partial) bisection results would certainly be useful, thanks.

--b.
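
For anyone following along, bisection is just a binary search over the
commit range: with one known-good and one known-bad endpoint it needs about
log2(N) test builds. A toy sketch of the idea, assuming a linear history and
a hypothetical is_bad() callback (real git bisect also copes with merges and
skipped commits, which this does not model):

```c
#include <assert.h>
#include <stddef.h>

typedef int (*is_bad_fn)(size_t commit);

/* Number of test runs performed by the last first_bad() call. */
size_t bisect_steps;

/* Return the index of the first bad commit in (good, bad], given that
 * 'good' tests good and 'bad' tests bad. */
size_t first_bad(size_t good, size_t bad, is_bad_fn is_bad)
{
    assert(good < bad);
    bisect_steps = 0;
    while (bad - good > 1) {
        size_t mid = good + (bad - good) / 2;

        bisect_steps++;
        if (is_bad(mid))
            bad = mid;      /* regression is at or before mid */
        else
            good = mid;     /* regression is after mid */
    }
    return bad;
}

/* Hypothetical test: pretend commit 700 introduced the crash. */
int crashes_from_700(size_t commit)
{
    return commit >= 700;
}
```

For a range of 1000 commits this converges in at most 10 test runs, which is
why even a slow machine can usually finish a bisect in a few hours.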


Re: [REGRESSION] nfsd crashing with 3.6.0-rc7 on PowerPC

2012-09-28 Thread Alexander Graf

On 28.09.2012, at 17:10, J. Bruce Fields wrote:

 On Fri, Sep 28, 2012 at 04:19:55AM +0200, Alexander Graf wrote:
 
 On 28.09.2012, at 04:04, Linus Torvalds wrote:
 
 On Thu, Sep 27, 2012 at 6:55 PM, Alexander Graf ag...@suse.de wrote:
 
 Below are OOPS excerpts from different rc's I tried. All of them crashed - 
 all the way up to current Linus' master branch. I haven't cross-checked, 
 but I don't remember any such behavior from pre-3.6 releases.
 
 Since you seem to be able to reproduce it easily (and apparently
 reliably), any chance you could just bisect it?
 
 Since I assume v3.5 is fine, and apparently -rc1 is already busted, a simple
 
  git bisect start
  git bisect good v3.5
  git bisect bad v3.6-rc1
 
 will get you started on your adventure..
 
 Heh, will give it a try :). The thing really does look quite bisectable.
 
 
 It might take a few hours though - the machine isn't exactly fast by today's 
 standards and it's getting late here. But I'll keep you updated.
 
 I doubt it's anything special about that workload, but just for kicks I
 tried a git clone -ls (cloning my linux tree to another directory on
 the same nfs filesystem), with server on 3.6.0-rc7, and didn't see
 anything interesting (just an xfs lockdep warning that looks like this
 one jlayton already reported:
 http://oss.sgi.com/archives/xfs/2012-09/msg00088.html
 )
 
 Any (even partial) bisection results would certainly be useful, thanks.

Yeah, still trying. Running the same workload in a PPC VM didn't show any 
badness. Then I tried again to bisect on the machine it broke on, and that 
commit failed even more badly on me than the previous ones, destroying my local 
git tree.

Trying to narrow down now in a slightly more contained environment :).


Alex



Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall

2012-09-28 Thread Scott Wood

On 09/27/2012 05:33:26 PM, Kumar Gala wrote:


On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:

 On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
 On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
 On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
 Original process of call:
The mpc85xx_pci_err_probe function completes to been registered
and enabled of EDAC PCI err driver at the latter time stage of
kernel boot in the mpc85xx_edac.c.
 Current process of call:
The mpc85xx_pci_err_probe function completes to been registered
and enabled of EDAC PCI err driver at the first time stage of
kernel boot in the fsl_pci.c.
 So in this case the following error messages appear in the boot log:
   PCI: Probing PCI hardware
   pci :00:00.0: ignoring class b20 (doesn't match header type 01)
   PCIE error(s) detected
   PCIE ERR_DR register: 0x0002
   PCIE ERR_CAP_STAT register: 0x8001
   PCIE ERR_CAP_R0 register: 0x0800
   PCIE ERR_CAP_R1 register: 0x
   PCIE ERR_CAP_R2 register: 0x
   PCIE ERR_CAP_R3 register: 0x
 Because the EDAC PCI err driver is registered and enabled earlier than
 original point of call. But at this point of time, PCI hardware is not
 probed and initialized, and it is in unknowable state.
 So, move enable function into mpc85xx_pci_err_en which is called at the
 middle time stage of kernel boot and after PCI hardware is probed and
 initialized by device_initcall in the fsl_pci.c.
 Signed-off-by: Chunhe Lan chunhe@freescale.com
 ---
 arch/powerpc/sysdev/fsl_pci.c |   12 ++
 arch/powerpc/sysdev/fsl_pci.h |5 
 drivers/edac/mpc85xx_edac.c   |   47 
 
 3 files changed, 50 insertions(+), 14 deletions(-)
 diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
 index 3d6f4d8..a591965 100644
 --- a/arch/powerpc/sysdev/fsl_pci.c
 +++ b/arch/powerpc/sysdev/fsl_pci.c
 @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
return platform_driver_register(&fsl_pci_driver);
 }
 arch_initcall(fsl_pci_init);
 +
 +static int __init fsl_pci_err_en(void)
 +{
 +  struct device_node *np;
 +
 +  for_each_node_by_type(np, "pci")
 +  if (of_match_node(pci_ids, np))
 +  mpc85xx_pci_err_en(np);
 +
 +  return 0;
 +}
 +device_initcall(fsl_pci_err_en);

 Why can't you call this from the normal PCIe controller init,  
instead of searching for the node independently?

 Don't we have this now with mpc85xx_pci_err_probe() ??

 What do you mean by this?

I'm saying don't we replace fsl_pci_err_en() with  
mpc85xx_pci_err_probe()...


I need to look at this more, but not clear why mpc85xx_pci_err_en()  
can just be part of mpc85xx_pci_err_probe()


OK, I was confused -- I thought the point was to make it happen  
earlier, not later.  The changelog is not clear at all.


Don't we want to be able to capture errors that happen during PCI  
driver initialization, though?


-Scott


Re: [PATCH 2/6] powerpc: Add enable_ppr kernel parameter to enable PPR save/restore

2012-09-28 Thread Ryan Arnold
On Tue, 2012-09-11 at 15:55 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2012-09-10 at 22:42 -0700, Haren Myneni wrote:
  
  Thanks Michael. Yes, we noticed 6% overhead with null syscall test.
  Hence added cmdline option as suggested. I will add this comment in
  the changelog.
  
  Regarding the option name, I thought about various ones such as
  retain_process_ppr, retain_smt_priority, save_ppr and etc. Finally
  added 'enable_ppr' since it enables CPU_FTR (CPU_FTR_HAS_PPR) which
  allows to save/restore PPR value. Sure, I will change this option.
 
 No, that isn't a problem with the name. It's a problem with the polarity
 of the option.
 
 If you need a command line argument to enable the option, then nobody
 will enable it, it's pointless.

In GLIBC (ppc.h) we'll be providing a user space API to change the
thread priority in user state.  We're also interested in using this in
some of the locking constructs if performance tests indicate it's
beneficial.

I have concerns with being able to enable/disable this option at boot
time.  Usually, in GLIBC we'll just do a kernel version check and enable
certain facilities if we're building against a particular kernel that
supports them.

In this case, with a configurable option, GLIBC is going to need the
kernel to export a hwcap bit that tells us whether we need to do the
save/restore ourselves.  Having to check the hwcap, and do the
save/restore in user space will, of course, increase the overhead on our
side.

If no hwcap bit is provided and this is disabled at kernel boot time, no
check is done and the user process assumes it's running under a certain
priority when it is, in fact, not.  I don't care for this option.  We'll
be hitting code paths that are ineffective and unnecessary.

Ryan S. Arnold
Linux Technology Center
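
A rough sketch of the user-space check described above, assuming the kernel
did export a hwcap bit for PPR save/restore. The bit HYPOTHETICAL_HWCAP_PPR
and the helper name are invented for illustration; no such bit is defined
anywhere in this thread. In real code the mask would come from the auxiliary
vector (AT_HWCAP) rather than being passed in by hand.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: placeholder bit, NOT a real AT_HWCAP value. */
#define HYPOTHETICAL_HWCAP_PPR (1UL << 0)

/* True if user space must save/restore the PPR itself because the
 * kernel does not advertise doing so. */
bool user_must_save_ppr(unsigned long hwcap)
{
    return (hwcap & HYPOTHETICAL_HWCAP_PPR) == 0;
}

/* Self-check covering both cases. */
void hwcap_selfcheck(void)
{
    assert(user_must_save_ppr(0UL));
    assert(!user_must_save_ppr(HYPOTHETICAL_HWCAP_PPR));
}
```

The extra branch is cheap on its own; the overhead concern is the
save/restore that has to happen in user space whenever the bit is absent.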




Re: [RFC v9 PATCH 01/21] memory-hotplug: rename remove_memory() to offline_memory()/offline_pages()

2012-09-28 Thread KOSAKI Motohiro
On Thu, Sep 27, 2012 at 11:50 PM, Yasuaki Ishimatsu
isimatu.yasu...@jp.fujitsu.com wrote:
 Hi Chen,


 2012/09/28 11:22, Ni zhan Chen wrote:

 On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

 From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

 remove_memory() only try to offline pages. It is called in two cases:
 1. hot remove a memory device
 2. echo offline >/sys/devices/system/memory/memoryXX/state

 In the 1st case, we should also change memory block's state, and notify
 the userspace that the memory block's state is changed after offlining
 pages.

 So rename remove_memory() to offline_memory()/offline_pages(). And in
 the 1st case, offline_memory() will be used. The function
 offline_memory()
 is not implemented. In the 2nd case, offline_pages() will be used.


 But this time there is not a function associated with add_memory.


 To associate with add_memory() later, we renamed it.

Then, you introduced bisect breakage. It is definitely unacceptable.

NAK.


Re: [RFC v9 PATCH 13/21] memory-hotplug: check page type in get_page_bootmem

2012-09-28 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com

The function get_page_bootmem() may be called more than one time to the same
page. There is no need to set page's type, private if the function is not
the first time called to the page.

Note: the patch is just optimization and does not fix any problem.


Hi Yasuaki,

This patch looks reasonable to me. I have another question related to 
get_page_bootmem(). It comes from the changelog of another Fujitsu 
developer's patch [commit: 04753278769f3], which says:


 1) When the memmap of removing section is allocated on other
 section by bootmem, it should/can be free.
 2) When the memmap of removing section is allocated on the
 same section, it shouldn't be freed. Because the section has to be
 logical memory offlined already and all pages must be isolated against
 page allocater. If it is freed, page allocator may use it which will
 be removed physically soon.

But I don't see how his patch guarantees 2); that is, his patch doesn't 
guarantee that the memmap of the section being removed, which is allocated 
on other section by bootmem, is not freed. I hope you can explain this in 
detail. Thanks in advance. :-)




CC: David Rientjes rient...@google.com
CC: Jiang Liu liu...@gmail.com
CC: Len Brown len.br...@intel.com
CC: Benjamin Herrenschmidt b...@kernel.crashing.org
CC: Paul Mackerras pau...@samba.org
CC: Christoph Lameter c...@linux.com
Cc: Minchan Kim minchan@gmail.com
CC: Andrew Morton a...@linux-foundation.org
CC: KOSAKI Motohiro kosaki.motoh...@jp.fujitsu.com
CC: Wen Congyang we...@cn.fujitsu.com
Signed-off-by: Yasuaki Ishimatsu isimatu.yasu...@jp.fujitsu.com
---
  mm/memory_hotplug.c |   15 +++
  1 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index d736df3..26a5012 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -95,10 +95,17 @@ static void release_memory_resource(struct resource *res)
  static void get_page_bootmem(unsigned long info,  struct page *page,
 unsigned long type)
  {
-   page->lru.next = (struct list_head *) type;
-   SetPagePrivate(page);
-   set_page_private(page, info);
-   atomic_inc(&page->_count);
+   unsigned long page_type;
+
+   page_type = (unsigned long)page->lru.next;
+   if (page_type < MEMORY_HOTPLUG_MIN_BOOTMEM_TYPE ||
+       page_type > MEMORY_HOTPLUG_MAX_BOOTMEM_TYPE){
+       page->lru.next = (struct list_head *)type;
+       SetPagePrivate(page);
+       set_page_private(page, info);
+       atomic_inc(&page->_count);
+   } else
+       atomic_inc(&page->_count);
  }
  
  /* reference to __meminit __free_pages_bootmem is valid
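
The control flow of the patched function is easier to see outside the diff.
The following is a user-space model only: struct fake_page, the two range
constants and their values are stand-ins chosen for illustration and do not
match the kernel's struct page or the real MEMORY_HOTPLUG_* definitions.

```c
#include <assert.h>

/* Assumption: illustrative values, not the kernel's. */
#define MIN_BOOTMEM_TYPE 12UL
#define MAX_BOOTMEM_TYPE 15UL

struct fake_page {
    unsigned long type;   /* models page->lru.next */
    unsigned long priv;   /* models page_private(page) */
    int count;            /* models page->_count */
};

/* The first call tags the page; later calls only take a reference. */
void get_page_bootmem_model(unsigned long info, struct fake_page *page,
                            unsigned long type)
{
    assert(page != 0);
    if (page->type < MIN_BOOTMEM_TYPE || page->type > MAX_BOOTMEM_TYPE) {
        page->type = type;
        page->priv = info;
    }
    page->count++;
}
```

Calling the model twice on the same page leaves the type and private value
from the first call in place and raises the count to 2, which is the
behaviour the changelog describes as an optimization.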




Re: [PATCH 3/3] edac/85xx: Enable the EDAC PCI err driver by device_initcall

2012-09-28 Thread Chunhe Lan

On 09/28/2012 01:35 PM, Scott Wood wrote:

On 09/27/2012 05:33:26 PM, Kumar Gala wrote:


On Sep 27, 2012, at 4:51 PM, Scott Wood wrote:

 On 09/27/2012 04:45:08 PM, Gala Kumar-B11780 wrote:
 On Sep 27, 2012, at 11:09 AM, Scott Wood wrote:
 On 09/27/2012 02:02:03 PM, Chunhe Lan wrote:
 Original process of call:
 The mpc85xx_pci_err_probe function completes to been registered
 and enabled of EDAC PCI err driver at the latter time stage of
 kernel boot in the mpc85xx_edac.c.
 Current process of call:
 The mpc85xx_pci_err_probe function completes to been registered
 and enabled of EDAC PCI err driver at the first time stage of
 kernel boot in the fsl_pci.c.
 So in this case the following error messages appear in the boot log:
   PCI: Probing PCI hardware
   pci :00:00.0: ignoring class b20 (doesn't match header type 01)
   PCIE error(s) detected
   PCIE ERR_DR register: 0x0002
   PCIE ERR_CAP_STAT register: 0x8001
   PCIE ERR_CAP_R0 register: 0x0800
   PCIE ERR_CAP_R1 register: 0x
   PCIE ERR_CAP_R2 register: 0x
   PCIE ERR_CAP_R3 register: 0x
 Because the EDAC PCI err driver is registered and enabled earlier than
 original point of call. But at this point of time, PCI hardware is not
 probed and initialized, and it is in unknowable state.
 So, move enable function into mpc85xx_pci_err_en which is called at the
 middle time stage of kernel boot and after PCI hardware is probed and
 initialized by device_initcall in the fsl_pci.c.
 Signed-off-by: Chunhe Lan chunhe@freescale.com
 ---
 arch/powerpc/sysdev/fsl_pci.c |   12 ++
 arch/powerpc/sysdev/fsl_pci.h |5 
 drivers/edac/mpc85xx_edac.c   |   47 
 
 3 files changed, 50 insertions(+), 14 deletions(-)
 diff --git a/arch/powerpc/sysdev/fsl_pci.c b/arch/powerpc/sysdev/fsl_pci.c
 index 3d6f4d8..a591965 100644
 --- a/arch/powerpc/sysdev/fsl_pci.c
 +++ b/arch/powerpc/sysdev/fsl_pci.c
 @@ -904,4 +904,16 @@ static int __init fsl_pci_init(void)
 return platform_driver_register(&fsl_pci_driver);
 }
 arch_initcall(fsl_pci_init);
 +
 +static int __init fsl_pci_err_en(void)
 +{
 +struct device_node *np;
 +
 +for_each_node_by_type(np, "pci")
 +if (of_match_node(pci_ids, np))
 +mpc85xx_pci_err_en(np);
 +
 +return 0;
 +}
 +device_initcall(fsl_pci_err_en);

 Why can't you call this from the normal PCIe controller init, 
instead of searching for the node independently?

 Don't we have this now with mpc85xx_pci_err_probe() ??

 What do you mean by this?

I'm saying don't we replace fsl_pci_err_en() with 
mpc85xx_pci_err_probe()...


I need to look at this more, but not clear why mpc85xx_pci_err_en() 
can just be part of mpc85xx_pci_err_probe()


OK, I was confused -- I thought the point was to make it happen 
earlier, not later.  The changelog is not clear at all.


Don't we want to be able to capture errors that happen during PCI 
driver initialization, though?

Yes.
When the PCI controller probes a slot that has no device in it, invalid 
address errors occur.
The EDAC driver then prints many error messages. That behaviour is 
expected, but it is ugly.
So this patch enables the EDAC driver later, so that it only detects 
errors from the subsequent PCI operations.


   Thanks,
   Chunhe


-Scott






Re: [RFC v9 PATCH 00/21] memory-hotplug: hot-remove physical memory

2012-09-28 Thread Ni zhan Chen

On 09/05/2012 05:25 PM, we...@cn.fujitsu.com wrote:

From: Wen Congyang we...@cn.fujitsu.com

This patch series aims to support physical memory hot-remove.

The patches can free/remove the following things:

   - acpi_memory_info  : [RFC PATCH 4/19]
   - /sys/firmware/memmap/X/{end, start, type} : [RFC PATCH 8/19]
   - iomem_resource: [RFC PATCH 9/19]
   - mem_section and related sysfs files   : [RFC PATCH 10-11, 13-16/19]
   - page table of removed memory  : [RFC PATCH 12/19]
   - node and related sysfs files  : [RFC PATCH 18-19/19]

If you find lack of function for physical memory hot-remove, please let me
know.


Since the patchset is so big, could you add more detail to the patchset 
changelog describing how this patchset works? That would make it easier 
to review.




How to test this patchset?
1. apply this patchset and build the kernel. MEMORY_HOTPLUG, MEMORY_HOTREMOVE,
ACPI_HOTPLUG_MEMORY must be selected.
2. load the module acpi_memhotplug
3. hotplug the memory device(it depends on your hardware)
You will see the memory device under the directory /sys/bus/acpi/devices/.
Its name is PNP0C80:XX.
4. online/offline pages provided by this memory device
You can write online/offline to /sys/devices/system/memory/memoryX/state to
online/offline pages provided by this memory device
5. hotremove the memory device
You can hotremove the memory device by the hardware, or writing 1 to
/sys/bus/acpi/devices/PNP0C80:XX/eject.

Note: if the memory provided by the memory device is used by the kernel, it
can't be offlined. It is not a bug.
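
Step 4 amounts to writing a short string to a sysfs file. A minimal sketch in
C, assuming the caller passes in the path of the state file (the helper name
set_memory_state() is made up for illustration; the sysfs path in the note
below follows the steps above):

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Write "online" or "offline" to a memory block's state file.
 * Returns 0 on success, -1 on any error. */
int set_memory_state(const char *state_path, const char *state)
{
    FILE *f;
    int ok;

    assert(state_path != NULL && state != NULL);
    if (strcmp(state, "online") != 0 && strcmp(state, "offline") != 0)
        return -1;

    f = fopen(state_path, "w");
    if (f == NULL)
        return -1;
    ok = fputs(state, f) >= 0;
    /* With stdio buffering, a refused write may only show up when
     * the stream is flushed, so check fclose() too. */
    if (fclose(f) != 0)
        ok = 0;
    return ok ? 0 : -1;
}
```

For example, set_memory_state("/sys/devices/system/memory/memory8/state",
"offline") would attempt to offline block 8; it needs root and fails if the
kernel is still using memory in that block.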

Known problems:
1. memory can't be offlined when CONFIG_MEMCG is selected.
For example: there is a memory device on node 1. The address range
is [1G, 1.5G). You will find 4 new directories memory8, memory9, memory10,
and memory11 under the directory /sys/devices/system/memory/.
If CONFIG_MEMCG is selected, we will allocate memory to store page cgroup
when we online pages. When we online memory8, the memory stored page cgroup
is not provided by this memory device. But when we online memory9, the memory
stored page cgroup may be provided by memory8. So we can't offline memory8
now. We should offline the memory in the reversed order.
When the memory device is hotremoved, we will auto offline memory provided
by this memory device. But we don't know which memory is onlined first, so
offlining memory may fail. In such case, you should offline the memory by
hand before hotremoving the memory device.
2. hotremoving memory device may cause a kernel panic
This bug will be fixed by Liu Jiang's patch:
https://lkml.org/lkml/2012/7/3/1
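
The memory8..memory11 numbers in known problem 1 follow from the address
range if one assumes 128 MiB memory blocks (an assumption that is consistent
with the example; the real block size is platform dependent): 1 GiB / 128 MiB
= 8 and 1.5 GiB / 128 MiB = 12, so the half-open range [1G, 1.5G) covers
blocks 8 to 11. A small sketch, which also lists the blocks in descending
order as a candidate offline order, assuming they were onlined in ascending
order (helper names are made up for illustration):

```c
#include <assert.h>
#include <stdint.h>

/* Assumption: 128 MiB per memoryN block, as the example implies. */
#define BLOCK_SIZE (UINT64_C(128) << 20)

/* First and last block index covering the half-open range [start, end). */
unsigned first_block(uint64_t start)
{
    return (unsigned)(start / BLOCK_SIZE);
}

unsigned last_block(uint64_t end)
{
    return (unsigned)((end - 1) / BLOCK_SIZE);
}

/* Fill 'order' with block indices from last to first. Returns the
 * number of entries written (at most 'max'). */
unsigned offline_order(uint64_t start, uint64_t end,
                       unsigned *order, unsigned max)
{
    unsigned n = 0;

    assert(start < end);
    for (unsigned b = last_block(end); n < max; b--) {
        order[n++] = b;
        if (b == first_block(start))
            break;
    }
    return n;
}
```

For the range in the example this yields 11, 10, 9, 8, i.e. memory8 is
offlined last, after the blocks whose page cgroup storage may live in it.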

change log of v9:
  [RFC PATCH v9 8/21]
* add a lock to protect the list map_entries
* add an indicator to firmware_map_entry to remember whether the memory
  is allocated from bootmem
  [RFC PATCH v9 10/21]
* change the macro to inline function
  [RFC PATCH v9 19/21]
* don't offline the node if the cpu on the node is onlined
  [RFC PATCH v9 21/21]
* create new patch: auto offline page_cgroup when onlining memory block
  failed

change log of v8:
  [RFC PATCH v8 17/20]
* Fix problems when one node's range include the other nodes
  [RFC PATCH v8 18/20]
* fix building error when CONFIG_MEMORY_HOTPLUG_SPARSE or CONFIG_HUGETLBFS
  is not defined.
  [RFC PATCH v8 19/20]
* don't offline node when some memory sections are not removed
  [RFC PATCH v8 20/20]
* create new patch: clear hwpoisoned flag when onlining pages

change log of v7:
  [RFC PATCH v7 4/19]
* do not continue if acpi_memory_device_remove_memory() fails.
  [RFC PATCH v7 15/19]
* handle usemap in register_page_bootmem_info_section() too.

change log of v6:
  [RFC PATCH v6 12/19]
* fix building error on other architectures than x86

  [RFC PATCH v6 15-16/19]
* fix building error on other architectures than x86

change log of v5:
  * merge the patchset to clear page table and the patchset to hot remove
memory(from ishimatsu) to one big patchset.

  [RFC PATCH v5 1/19]
* rename remove_memory() to offline_memory()/offline_pages()

  [RFC PATCH v5 2/19]
* new patch: implement offline_memory(). This function offlines pages,
  update memory block's state, and notify the userspace that the memory
  block's state is changed.

  [RFC PATCH v5 4/19]
* offline and remove memory in acpi_memory_disable_device() too.

  [RFC PATCH v5 17/19]
* new patch: add a new function __remove_zone() to revert the things done
  in the function __add_zone().

  [RFC PATCH v5 18/19]
* flush work before resetting node device.

change log of v4:
  * remove memory-hotplug : unify argument of firmware_map_add_early/hotplug
from the patch series, since the patch is a bugfix. It is being discussed
on other thread. But for testing the patch series, the patch is needed.
So I added