from:"Nathan Lynch"

Re: [PATCH 1/6 v5] Kernel DLPAR Infrastructure

2009-10-28 Thread Nathan Lynch

On Thu, 2009-10-29 at 14:08 +1100, Benjamin Herrenschmidt wrote:
 On Wed, 2009-10-28 at 15:53 -0500, Nathan Fontenot wrote:
  +   struct device_node *dn;
  +   struct device_node *first_dn = NULL;
  +   struct device_node *last_dn = NULL;
  +   struct property *property;
  +   struct property *last_property = NULL;
  +   struct cc_workarea *ccwa;
  +   int cc_token;
  +   int rc;
  +
  +   cc_token = rtas_token(ibm,configure-connector);
  +   if (cc_token == RTAS_UNKNOWN_SERVICE)
  +   return NULL;
  +
  +   spin_lock(workarea_lock);
  +
  +   ccwa = (struct cc_workarea *)workarea[0];
  +   ccwa-drc_index = drc_index;
  +   ccwa-zero = 0;
 
 Popping a free page with gfp (or just kmalloc'ing 4K) would avoid the
 need for the lock too.

Not kmalloc -- the alignment of the buffer isn't guaranteed when
slub/slab debug is on, and iirc  the work area needs to be 4K-aligned.
__get_free_page should be fine, I think.


___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

[RFC/PATCH 0/3] checkpoint/restart for powerpc

2009-01-28 Thread Nathan Lynch

This series is a first attempt at enabling checkpoint/restart for
powerpc.  It is based on Oren Laadan's v13 checkpoint/restart series:

http://lkml.org/lkml/2009/1/27/227

Nathan Lynch (3):
  powerpc: bare minimum checkpoint/restart implementation
  powerpc: wire up checkpoint and restart syscalls
  allow checkpoint/restart on powerpc

 arch/powerpc/include/asm/checkpoint_hdr.h |   40 +
 arch/powerpc/include/asm/systbl.h |2 +
 arch/powerpc/include/asm/unistd.h |4 +-
 arch/powerpc/mm/Makefile  |1 +
 arch/powerpc/mm/checkpoint.c  |  261 +
 checkpoint/Kconfig|2 +-
 6 files changed, 308 insertions(+), 2 deletions(-)
 create mode 100644 arch/powerpc/include/asm/checkpoint_hdr.h
 create mode 100644 arch/powerpc/mm/checkpoint.c

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 2/3] powerpc: wire up checkpoint and restart syscalls

2009-01-28 Thread Nathan Lynch

Signed-off-by: Nathan Lynch n...@pobox.com
---
 arch/powerpc/include/asm/systbl.h |2 ++
 arch/powerpc/include/asm/unistd.h |4 +++-
 2 files changed, 5 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/include/asm/systbl.h 
b/arch/powerpc/include/asm/systbl.h
index 803def2..cdb60b4 100644
--- a/arch/powerpc/include/asm/systbl.h
+++ b/arch/powerpc/include/asm/systbl.h
@@ -322,3 +322,5 @@ SYSCALL_SPU(epoll_create1)
 SYSCALL_SPU(dup3)
 SYSCALL_SPU(pipe2)
 SYSCALL(inotify_init1)
+SYSCALL(checkpoint)
+SYSCALL(restart)
diff --git a/arch/powerpc/include/asm/unistd.h 
b/arch/powerpc/include/asm/unistd.h
index e07d0c7..2e333a1 100644
--- a/arch/powerpc/include/asm/unistd.h
+++ b/arch/powerpc/include/asm/unistd.h
@@ -341,10 +341,12 @@
 #define __NR_dup3  316
 #define __NR_pipe2 317
 #define __NR_inotify_init1 318
+#define __NR_checkpoint319
+#define __NR_restart   320
 
 #ifdef __KERNEL__
 
-#define __NR_syscalls  319
+#define __NR_syscalls  321
 
 #define __NR__exit __NR_exit
 #define NR_syscalls__NR_syscalls
-- 
1.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 3/3] allow checkpoint/restart on powerpc

2009-01-28 Thread Nathan Lynch

Signed-off-by: Nathan Lynch n...@pobox.com
---
 checkpoint/Kconfig |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/checkpoint/Kconfig b/checkpoint/Kconfig
index ffaa635..00036aa 100644
--- a/checkpoint/Kconfig
+++ b/checkpoint/Kconfig
@@ -1,7 +1,7 @@
 config CHECKPOINT_RESTART
prompt Enable checkpoint/restart (EXPERIMENTAL)
def_bool n
-   depends on X86_32  EXPERIMENTAL
+   depends on (X86_32 || PPC)  EXPERIMENTAL
help
  Application checkpoint/restart is the ability to save the
  state of a running application so that it can later resume
-- 
1.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

2009-01-28 Thread Nathan Lynch

The only thing of significance here is that the checkpointed task's
pt_regs and fp state are saved and restored (see cr_write_cpu and
cr_read_cpu); the rest of the code consists of dummy implementations
of the APIs the arch needs to provide to the checkpoint/restart core.

What works:
* self and external checkpoint of simple (single thread, one open
  file) 32- and 64-bit processes on a ppc64 kernel

What doesn't work:
* restarting a 32-bit task from a 64-bit task and vice versa

Untested:
* ppc32 (but it builds)

Signed-off-by: Nathan Lynch n...@pobox.com
---
 arch/powerpc/include/asm/checkpoint_hdr.h |   40 +
 arch/powerpc/mm/Makefile  |1 +
 arch/powerpc/mm/checkpoint.c  |  261 +
 3 files changed, 302 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/checkpoint_hdr.h
 create mode 100644 arch/powerpc/mm/checkpoint.c

diff --git a/arch/powerpc/include/asm/checkpoint_hdr.h 
b/arch/powerpc/include/asm/checkpoint_hdr.h
new file mode 100644
index 000..752c53f
--- /dev/null
+++ b/arch/powerpc/include/asm/checkpoint_hdr.h
@@ -0,0 +1,40 @@
+#ifndef __ASM_PPC_CKPT_HDR_H
+#define __ASM_PPC_CKPT_HDR_H
+/*
+ *  Checkpoint/restart - architecture specific headers ppc
+ *
+ *  Copyright (C) 2008 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ */
+
+#include linux/types.h
+#include asm/ptrace.h
+#include asm/mmu.h
+#include asm/processor.h
+
+struct cr_hdr_head_arch {
+   __u32 unimplemented;
+};
+
+struct cr_hdr_thread {
+   __u32 unimplemented;
+};
+
+struct cr_hdr_cpu {
+   struct pt_regs pt_regs;
+   /* relevant fields from thread_struct */
+   double fpr[32][TS_FPRWIDTH];
+   unsigned int fpscr;
+   int fpexc_mode;
+   /* unsigned int align_ctl; this is never updated? */
+   unsigned long dabr;
+};
+
+struct cr_hdr_mm_context {
+   __u32 unimplemented;
+};
+
+#endif /* __ASM_PPC_CKPT_HDR__H */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index e7392b4..8a523a0 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -24,3 +24,4 @@ obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
 obj-$(CONFIG_PPC_MM_SLICES)+= slice.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
+obj-$(CONFIG_CHECKPOINT_RESTART) += checkpoint.o
diff --git a/arch/powerpc/mm/checkpoint.c b/arch/powerpc/mm/checkpoint.c
new file mode 100644
index 000..8cdff24
--- /dev/null
+++ b/arch/powerpc/mm/checkpoint.c
@@ -0,0 +1,261 @@
+/*
+ *  Checkpoint/restart - architecture specific support for powerpc.
+ *  Based on x86 implementation.
+ *
+ *  Copyright (C) 2008 Oren Laadan
+ *  Copyright 2009 IBM Corp.
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ */
+
+#define DEBUG 1 /* for pr_debug */
+
+#include linux/checkpoint.h
+#include linux/checkpoint_hdr.h
+#include linux/kernel.h
+#include asm/processor.h
+
+static void cr_hdr_init(struct cr_hdr *hdr, __s16 type, __s16 len, __u32 
parent)
+{
+   hdr-type = type;
+   hdr-len = len;
+   hdr-parent = parent;
+}
+
+/* dump the thread_struct of a given task */
+int cr_write_thread(struct cr_ctx *ctx, struct task_struct *t)
+{
+   struct cr_hdr_thread *thread_hdr;
+   struct cr_hdr cr_hdr;
+   u32 parent;
+   int ret;
+
+   thread_hdr = cr_hbuf_get(ctx, sizeof(*thread_hdr));
+   if (!thread_hdr)
+   return -ENOMEM;
+
+   parent = task_pid_vnr(t);
+
+   cr_hdr_init(cr_hdr, CR_HDR_THREAD, sizeof(*thread_hdr), parent);
+
+   thread_hdr-unimplemented = 0xdeadbeef;
+
+   ret = cr_write_obj(ctx, cr_hdr, thread_hdr);
+   cr_hbuf_put(ctx, sizeof(*thread_hdr));
+
+   return ret;
+}
+
+/* dump the cpu state and registers of a given task */
+int cr_write_cpu(struct cr_ctx *ctx, struct task_struct *t)
+{
+   struct cr_hdr_cpu *cpu_hdr;
+   struct pt_regs *pt_regs;
+   struct cr_hdr cr_hdr;
+   u32 parent;
+   int ret;
+
+   cpu_hdr = cr_hbuf_get(ctx, sizeof(*cpu_hdr));
+   if (!cpu_hdr)
+   return -ENOMEM;
+
+   parent = task_pid_vnr(t);
+
+   cr_hdr_init(cr_hdr, CR_HDR_CPU, sizeof(*cpu_hdr), parent);
+
+   /* pt_regs: GPRs, MSR, etc */
+   pt_regs = task_pt_regs(t);
+   cpu_hdr-pt_regs = *pt_regs;
+
+   /* FP state */
+   memcpy(cpu_hdr-fpr, t-thread.fpr, sizeof(cpu_hdr-fpr));
+   cpu_hdr-fpscr = t-thread.fpscr.val;
+   cpu_hdr-fpexc_mode = t-thread.fpexc_mode;
+
+   /* Handle DABR for now, dbcr[01] later */
+   cpu_hdr-dabr = t-thread.dabr;
+
+   /* ToDo: Altivec/VSX/SPE state */
+
+   ret = cr_write_obj(ctx, cr_hdr, cpu_hdr);
+   cr_hbuf_put

Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

2009-01-29 Thread Nathan Lynch

Hey Oren, thanks for taking a look.

Oren Laadan wrote:
 
 Nathan Lynch wrote:
  
  What doesn't work:
  * restarting a 32-bit task from a 64-bit task and vice versa
 
 Is there a test to bail if we attempt to checkpoint such tasks ?

No, but I'll add one if it looks too hard to fix for the next round.


  +struct cr_hdr_cpu {
  +   struct pt_regs pt_regs;
 
 It has been suggested (as done in x86/32 code) not to use 'struct pt_regs'
 because it can (and has) changed on x86 and because it only container
 the registers that the kernel trashes, not all usermode registers.
 
 https://lists.linux-foundation.org/pipermail/containers/2008-August/012355.html

Yeah, I considered that discussion, but the situation is different for
powerpc (someone on linuxppc-dev smack me if I'm wrong here :)
pt_regs is part of the ABI, and it encompasses all user mode registers
except for floating point, which are handled separately.


  +   /* relevant fields from thread_struct */
  +   double fpr[32][TS_FPRWIDTH];
 
 Can TS_FPRWIDTH change between sub-archs or kernel versions ?  If so, it
 needs to be stated explicitly.
 
  +   unsigned int fpscr;
  +   int fpexc_mode;
  +   /* unsigned int align_ctl; this is never updated? */
  +   unsigned long dabr;
 
 Are these fields always guarantee to compile to the same number of bytes
 regardless of 32/64 bit choice of compiler (or sub-arch?) ?
 
 In the x86(32/64) architecture we use types with explicit size such as
 __u32 and the like to ensure that it always compiled to the same
 size.

Yeah, I'll have to fix these up.



  +static void cr_hdr_init(struct cr_hdr *hdr, __s16 type, __s16 len, __u32 
  parent)
  +{
  +   hdr-type = type;
  +   hdr-len = len;
  +   hdr-parent = parent;
  +}
  +
 
 This function is rather generic and useful to non-arch-dependent and other
 architectures code. Perhaps put in a separate patch ?

Alright.  By the way, why are cr_hdr-type and cr_hdr-len signed
types?


  +int cr_write_cpu(struct cr_ctx *ctx, struct task_struct *t)
  +{
  +   struct cr_hdr_cpu *cpu_hdr;
  +   struct pt_regs *pt_regs;
  +   struct cr_hdr cr_hdr;
  +   u32 parent;
  +   int ret;
  +
  +   cpu_hdr = cr_hbuf_get(ctx, sizeof(*cpu_hdr));
  +   if (!cpu_hdr)
  +   return -ENOMEM;
  +
  +   parent = task_pid_vnr(t);
  +
  +   cr_hdr_init(cr_hdr, CR_HDR_CPU, sizeof(*cpu_hdr), parent);
  +
  +   /* pt_regs: GPRs, MSR, etc */
  +   pt_regs = task_pt_regs(t);
  +   cpu_hdr-pt_regs = *pt_regs;
  +
  +   /* FP state */
  +   memcpy(cpu_hdr-fpr, t-thread.fpr, sizeof(cpu_hdr-fpr));
 
 As note above, is sizeof(cpu_hdr-fpr) the same on all chips ?

It can differ depending on kernel configuration.


  +/* restart APIs */
  +
 
 The restart APIs belong in a separate file: arch/powerpc/mm/restart.c

Explain why, please?  This isn't a lot of code, and it seems likely
that checkpoint and restart paths will share data structures and tend
to be modified together over time.


  +   pr_debug(%s: unexpected thread_hdr contents: 0x%lx\n,
  +__func__, (unsigned long)thread_hdr-unimplemented);
 
 Given the macro for 'pr_fmt' in include/linux/checkpoint.h, the use of
 __func__ is redunant.

It seems to me that defining your own pr_fmt in a public header like
that is inappropriate, or at least unconventional.  Any file that
happens to include linux/checkpoint.h will have any prior definitions
of pr_fmt overridden, no?


  +   regs = task_pt_regs(current);
  +   *regs = cpu_hdr-pt_regs;
  +
  +   regs-msr = sanitize_msr(regs-msr);
  +
  +   /* FP state */
  +   memcpy(current-thread.fpr, cpu_hdr-fpr, sizeof(current-thread.fpr));
  +   current-thread.fpscr.val = cpu_hdr-fpscr;
  +   current-thread.fpexc_mode = cpu_hdr-fpexc_mode;
  +
  +   /* debug registers */
  +   current-thread.dabr = cpu_hdr-dabr;
 
 I'm unfamiliar with powerpc; is it necessary to sanitize any of the registers
 here ?  For instance, can the user cause harm with specially crafted values
 of some registers ?

I had this in mind with the treatment of MSR, but I'll check on the
others, thanks.


  +int cr_read_mm_context(struct cr_ctx *ctx, struct mm_struct *mm, int 
  rparent)
  +{
  +   struct cr_hdr_mm_context *mm_hdr;
  +   int ret;
  +
  +   mm_hdr = cr_hbuf_get(ctx, sizeof(*mm_hdr));
  +   if (!mm_hdr)
  +   return -ENOMEM;
  +
  +   ret = cr_read_obj_type(ctx, mm_hdr, sizeof(*mm_hdr),
  + CR_HDR_MM_CONTEXT);
  +   if (ret != rparent)
  +   goto out;
 
 Seems like 'ret' isn't set to an error value if the 'goto' executes.

It returns whatever error value cr_read_obj_type() returns.  Hrm.  I
guess if the image is garbage, cr_read_obj_type can potentially return
a non-error value that still isn't the desired value, is that right?

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/1] powerpc: Fix partition migration hang under load

2009-01-29 Thread Nathan Lynch

Brian King wrote:
 
 While testing partition migration with heavy CPU load using
 shared processors, it was observed that sometimes the migration
 would never complete and would appear to hang. Currently, the
 migration code assumes that if H_SUCCESS is returned from the H_JOIN
 then the migration is complete and the processor is waking up on
 the target system. If there was an outstanding PROD to the processor
 when the H_JOIN is called, however, it will return H_SUCCESS on the source
 system

Hmm, did you determine where that outstanding H_PROD is coming from?
AFAICT this is the only code which uses that hcall, and all processors
should have consumed their prods from one migration before another
migration can commence.

Regardless, ACK -- if we were to add another H_PROD call site (or if
there's one I missed) this would be necessary anyway.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

2009-01-30 Thread Nathan Lynch

Oren Laadan wrote:
 
 Nathan Lynch wrote:
  
  Oren Laadan wrote:
  Nathan Lynch wrote:
  + pr_debug(%s: unexpected thread_hdr contents: 0x%lx\n,
  +  __func__, (unsigned long)thread_hdr-unimplemented);
  Given the macro for 'pr_fmt' in include/linux/checkpoint.h, the use of
  __func__ is redunant.
  
  It seems to me that defining your own pr_fmt in a public header like
  that is inappropriate, or at least unconventional.  Any file that
  happens to include linux/checkpoint.h will have any prior definitions
  of pr_fmt overridden, no?
  
 
 Hmmm.. didn't think of it this way. Using the pr_debug() there was yet
 another feedback from LKML, and it seemed reasonable to me. Can you
 think of a case where linux/checkpoint.h will happen to be included
 in checkpoint-related code ?

(Assume you meant included in checkpoint-unrelated code)

I could see checkpoint.h being included by files that don't
exclusively deal with C/R.  If you want a uniform debug statement
format for C/R-related code, that's fine, but this isn't the way to do
it.  See the existing users (almost all in drivers/s390).

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

2009-02-16 Thread Nathan Lynch

Nathan Lynch n...@pobox.com wrote:

 Oren Laadan wrote:
  
  Nathan Lynch wrote:
   
   What doesn't work:
   * restarting a 32-bit task from a 64-bit task and vice versa
  
  Is there a test to bail if we attempt to checkpoint such tasks ?
 
 No, but I'll add one if it looks too hard to fix for the next round.

Unfortunately, adding a check for this is hard.

The point of no return in the restart path is cr_read_mm, which tears
down current's address space.  cr_read_mm runs way before cr_read_cpu,
which is the only restart method I've implemented for powerpc so far.
So, checking for this condition in cr_read_cpu is too late if I want
restart(2) to return an error and leave the caller's memory map
intact.  (And I do want this: restart should be as robust as execve.)

Well okay then, cr_read_head_arch seems to be the right place in the
restart sequence for the architecture code to handle this.  However,
cr_write_head_arch (which produces the buffer that cr_read_head_arch
consumes) is not provided a reference to the task to be checkpointed,
nor can it assume that it's operating on current.  I need a reference
to a task before I can determine whether it's running in 32- or 64-bit
mode, or using the FPU, Altivec, SPE, whatever.

In any case, mixing 32- and 64-bit tasks across restart is something I
eventually want to support, not reject.  But the problem I've outlined
applies to FPU state and vector extensions (VMX, SPE), as well as
sanity-checking debug register (DABR) contents.  We'll need to be able
to error out gracefully from restart when a checkpoint image specifies a
feature unsupported by the current kernel or hardware.  But I don't see
how to do it with the current architecture.  Am I missing something?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC/PATCH] powerpc: provide APIs for validating and updating DABR

2009-02-17 Thread Nathan Lynch

A checkpointed task image may specify a value for the DABR (Data
Access Breakpoint Register).  The restart code needs to validate this
value before making any changes to the current task.

ptrace_set_debugreg encapsulates the bounds checking and platform
dependencies of programming the DABR.  Split this into validate
(debugreg_valid) and update (debugreg_update) functions, and make
them available for use outside of the ptrace code.

Also ptrace_set_debugreg has extern linkage, but no users outside of
ptrace.c.  Make it static.

Signed-off-by: Nathan Lynch n...@pobox.com
---
 arch/powerpc/include/asm/ptrace.h |6 +++
 arch/powerpc/kernel/ptrace.c  |   68 
 2 files changed, 44 insertions(+), 30 deletions(-)

Is something like this okay to carry in the checkpoint/restart patches?


diff --git a/arch/powerpc/include/asm/ptrace.h 
b/arch/powerpc/include/asm/ptrace.h
index c9c678f..1b3a5f0 100644
--- a/arch/powerpc/include/asm/ptrace.h
+++ b/arch/powerpc/include/asm/ptrace.h
@@ -81,6 +81,8 @@ struct pt_regs {
 
 #ifndef __ASSEMBLY__
 
+#include linux/types.h
+
 #define instruction_pointer(regs) ((regs)-nip)
 #define user_stack_pointer(regs) ((regs)-gpr[1])
 #define regs_return_value(regs) ((regs)-gpr[3])
@@ -138,6 +140,10 @@ do {   
  \
 extern void user_enable_single_step(struct task_struct *);
 extern void user_disable_single_step(struct task_struct *);
 
+/* for reprogramming DABR/DAC during restart of a checkpointed task */
+extern bool debugreg_valid(unsigned long val);
+extern void debugreg_update(struct task_struct *task, unsigned long val);
+
 #endif /* __ASSEMBLY__ */
 
 #endif /* __KERNEL__ */
diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
index 3635be6..60259c6 100644
--- a/arch/powerpc/kernel/ptrace.c
+++ b/arch/powerpc/kernel/ptrace.c
@@ -735,22 +735,13 @@ void user_disable_single_step(struct task_struct *task)
clear_tsk_thread_flag(task, TIF_SINGLESTEP);
 }
 
-int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
-  unsigned long data)
+bool debugreg_valid(unsigned long val)
 {
-   /* For ppc64 we support one DABR and no IABR's at the moment (ppc64).
-*  For embedded processors we support one DAC and no IAC's at the
-*  moment.
-*/
-   if (addr  0)
-   return -EINVAL;
-
/* The bottom 3 bits in dabr are flags */
-   if ((data  ~0x7UL) = TASK_SIZE)
-   return -EIO;
+   if ((val  ~0x7UL) = TASK_SIZE)
+   return false;
 
 #ifndef CONFIG_BOOKE
-
/* For processors using DABR (i.e. 970), the bottom 3 bits are flags.
 *  It was assumed, on previous implementations, that 3 bits were
 *  passed together with the data address, fitting the design of the
@@ -764,47 +755,64 @@ int ptrace_set_debugreg(struct task_struct *task, 
unsigned long addr,
 */
 
/* Ensure breakpoint translation bit is set */
-   if (data  !(data  DABR_TRANSLATION))
-   return -EIO;
-
-   /* Move contents to the DABR register */
-   task-thread.dabr = data;
-
-#endif
-#if defined(CONFIG_BOOKE)
-
+   if (val  !(val  DABR_TRANSLATION))
+   return false;
+#else
/* As described above, it was assumed 3 bits were passed with the data
 *  address, but we will assume only the mode bits will be passed
 *  as to not cause alignment restrictions for DAC-based processors.
 */
 
+   /* Read or Write bits must be set */
+   if (!(val  0x3UL))
+   return -EINVAL;
+#endif
+   return true;
+}
+
+void debugreg_update(struct task_struct *task, unsigned long val)
+{
+#ifndef CONFIG_BOOKE
+   task-thread.dabr = val;
+#else
/* DAC's hold the whole address without any mode flags */
-   task-thread.dabr = data  ~0x3UL;
+   task-thread.dabr = val  ~0x3UL;
 
if (task-thread.dabr == 0) {
task-thread.dbcr0 = ~(DBSR_DAC1R | DBSR_DAC1W | DBCR0_IDM);
task-thread.regs-msr = ~MSR_DE;
-   return 0;
}
 
-   /* Read or Write bits must be set */
-
-   if (!(data  0x3UL))
-   return -EINVAL;
-
/* Set the Internal Debugging flag (IDM bit 1) for the DBCR0
   register */
task-thread.dbcr0 = DBCR0_IDM;
 
/* Check for write and read flags and set DBCR0
   accordingly */
-   if (data  0x1UL)
+   if (val  0x1UL)
task-thread.dbcr0 |= DBSR_DAC1R;
-   if (data  0x2UL)
+   if (val  0x2UL)
task-thread.dbcr0 |= DBSR_DAC1W;
 
task-thread.regs-msr |= MSR_DE;
 #endif
+}
+
+static int ptrace_set_debugreg(struct task_struct *task, unsigned long addr,
+  unsigned long data)
+{
+   /* For ppc64 we support one DABR and no IABR's at the moment (ppc64

[PATCH 1/3 v2] powerpc: heckpoint/restart implementation

2009-02-17 Thread Nathan Lynch

On Tue, 17 Feb 2009 01:03:55 -0600
Nathan Lynch n...@pobox.com wrote:

 Nathan Lynch n...@pobox.com wrote:
 
  Oren Laadan wrote:
   
   Nathan Lynch wrote:

What doesn't work:
* restarting a 32-bit task from a 64-bit task and vice versa
   
   Is there a test to bail if we attempt to checkpoint such tasks ?
  
  No, but I'll add one if it looks too hard to fix for the next round.
 
 Unfortunately, adding a check for this is hard.
 
 The point of no return in the restart path is cr_read_mm, which tears
 down current's address space.  cr_read_mm runs way before cr_read_cpu,
 which is the only restart method I've implemented for powerpc so far.
 So, checking for this condition in cr_read_cpu is too late if I want
 restart(2) to return an error and leave the caller's memory map
 intact.  (And I do want this: restart should be as robust as execve.)
 
 Well okay then, cr_read_head_arch seems to be the right place in the
 restart sequence for the architecture code to handle this.  However,
 cr_write_head_arch (which produces the buffer that cr_read_head_arch
 consumes) is not provided a reference to the task to be checkpointed,
 nor can it assume that it's operating on current.  I need a reference
 to a task before I can determine whether it's running in 32- or 64-bit
 mode, or using the FPU, Altivec, SPE, whatever.
 
 In any case, mixing 32- and 64-bit tasks across restart is something I
 eventually want to support, not reject.  But the problem I've outlined
 applies to FPU state and vector extensions (VMX, SPE), as well as
 sanity-checking debug register (DABR) contents.  We'll need to be able
 to error out gracefully from restart when a checkpoint image specifies a
 feature unsupported by the current kernel or hardware.  But I don't see
 how to do it with the current architecture.  Am I missing something?

Anyway, here's what I have coded up in response to all the feedback
(thanks!)  But all the error/compatibility checking I added doesn't
seem that useful unless the above is addressed...

Support for checkpointing and restarting GPRs, FPU state, DABR, and
Altivec state.

The portion of the checkpoint image manipulated by this code begins
with a bitmask of features indicating the various contexts saved.
Fields in image that can vary depending on kernel configuration
(e.g. FP regs due to VSX) have their sizes explicitly recorded, except
for GPRS, so migrating between ppc32 and ppc64 won't work yet.

The restart code ensures that the task is not modified until the
checkpoint image is validated against the current kernel configuration
and hardware features (e.g. can't restart a task using Altivec on
non-Altivec systems).

What works:
* self and external checkpoint of simple (single thread, one open
  file) 32- and 64-bit processes on a ppc64 kernel

What doesn't work:
* restarting a 32-bit task from a 64-bit task and vice versa

Untested:
* ppc32 (but it builds)

Signed-off-by: Nathan Lynch n...@pobox.com
---

This depends on powerpc: provide APIs for validating and updating
DABR which I posted to linuxppc-dev on 17 Feb:

http://patchwork.ozlabs.org/patch/23311/

v2 changelog:
- use feature bitmask in checkpoint image as suggested by Ben
- fail restart if checkpoint image specifies unsupported features
- handle Altivec/VMX and SPE register state
- validate DABR value from checkpoint image
- fail restart on differing FP register set sizes (can happen
  depending on CONFIG_VSX)
- fail restart on 32-/64-bit mismatch between image and restarting
  task
- don't write meaningless data in unimplemented arch callbacks
- kill cr_hdr_init helper


 arch/powerpc/include/asm/checkpoint_hdr.h |   15 +
 arch/powerpc/mm/Makefile  |1 +
 arch/powerpc/mm/checkpoint.c  |  482 +
 3 files changed, 498 insertions(+), 0 deletions(-)
 create mode 100644 arch/powerpc/include/asm/checkpoint_hdr.h
 create mode 100644 arch/powerpc/mm/checkpoint.c

diff --git a/arch/powerpc/include/asm/checkpoint_hdr.h 
b/arch/powerpc/include/asm/checkpoint_hdr.h
new file mode 100644
index 000..9f0d099
--- /dev/null
+++ b/arch/powerpc/include/asm/checkpoint_hdr.h
@@ -0,0 +1,15 @@
+#ifndef __ASM_PPC_CKPT_HDR_H
+#define __ASM_PPC_CKPT_HDR_H
+/*
+ *  Checkpoint/restart - architecture specific headers ppc
+ *
+ *  Copyright (C) 2008 Oren Laadan
+ *
+ *  This file is subject to the terms and conditions of the GNU General Public
+ *  License.  See the file COPYING in the main directory of the Linux
+ *  distribution for more details.
+ */
+
+/* nothing to see here */
+
+#endif /* __ASM_PPC_CKPT_HDR__H */
diff --git a/arch/powerpc/mm/Makefile b/arch/powerpc/mm/Makefile
index e7392b4..8a523a0 100644
--- a/arch/powerpc/mm/Makefile
+++ b/arch/powerpc/mm/Makefile
@@ -24,3 +24,4 @@ obj-$(CONFIG_NEED_MULTIPLE_NODES) += numa.o
 obj-$(CONFIG_PPC_MM_SLICES)+= slice.o
 obj-$(CONFIG_HUGETLB_PAGE) += hugetlbpage.o
 obj-$(CONFIG_PPC_SUBPAGE_PROT) += subpage-prot.o
+obj

Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

2009-02-24 Thread Nathan Lynch

On Tue, 24 Feb 2009 13:58:26 -0600
Serge E. Hallyn se...@us.ibm.com wrote:

 Quoting Nathan Lynch (n...@pobox.com):
  Nathan Lynch n...@pobox.com wrote:
  
   Oren Laadan wrote:

Nathan Lynch wrote:
 
 What doesn't work:
 * restarting a 32-bit task from a 64-bit task and vice versa

Is there a test to bail if we attempt to checkpoint such tasks ?
   
   No, but I'll add one if it looks too hard to fix for the next round.
  
  Unfortunately, adding a check for this is hard.
  
  The point of no return in the restart path is cr_read_mm, which tears
  down current's address space.  cr_read_mm runs way before cr_read_cpu,
  which is the only restart method I've implemented for powerpc so far.
  So, checking for this condition in cr_read_cpu is too late if I want
  restart(2) to return an error and leave the caller's memory map
  intact.  (And I do want this: restart should be as robust as execve.)
  
  Well okay then, cr_read_head_arch seems to be the right place in the
  restart sequence for the architecture code to handle this.  However,
  cr_write_head_arch (which produces the buffer that cr_read_head_arch
  consumes) is not provided a reference to the task to be checkpointed,
  nor can it assume that it's operating on current.  I need a reference
  to a task before I can determine whether it's running in 32- or 64-bit
  mode, or using the FPU, Altivec, SPE, whatever.
  
  In any case, mixing 32- and 64-bit tasks across restart is something I
  eventually want to support, not reject.  But the problem I've outlined
  applies to FPU state and vector extensions (VMX, SPE), as well as
  sanity-checking debug register (DABR) contents.  We'll need to be able
  to error out gracefully from restart when a checkpoint image specifies a
  feature unsupported by the current kernel or hardware.  But I don't see
  how to do it with the current architecture.  Am I missing something?
 
 I suspect I can guess the response to this suggestion, but how about we
 accept that if sys_restart() fails due to something like this, the
 task is lost and can't exit gracefully?

In the short term it might be necessary.  But the restart code should
forcibly kill the task instead of returning an error back up to
userspace in this case.  Once the memory map of the process has been
altered, there is no point in allowing it to continue (and likely dump
a useless core).  Btw, this failure mode seems to apply when
cr_read_files() fails, too...

But in the long term, things need to be more robust (e.g. restart(2)
returns ENOEXEC without messing with current-mm).  I think it's worth
looking at how execve operates... if I understand correctly, it sets up
a new mm_struct disconnected from the current task and activates it at
the last moment.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/3] powerpc: bare minimum checkpoint/restart implementation

2009-03-16 Thread Nathan Lynch

Oren Laadan or...@cs.columbia.edu wrote:
 
 Nathan Lynch wrote:
  Nathan Lynch n...@pobox.com wrote:
  Oren Laadan wrote:
  Nathan Lynch wrote:
  What doesn't work:
  * restarting a 32-bit task from a 64-bit task and vice versa
  Is there a test to bail if we attempt to checkpoint such tasks ?
  No, but I'll add one if it looks too hard to fix for the next round.
  
  Unfortunately, adding a check for this is hard.
  
  The point of no return in the restart path is cr_read_mm, which tears
  down current's address space.  cr_read_mm runs way before cr_read_cpu,
  which is the only restart method I've implemented for powerpc so far.
  So, checking for this condition in cr_read_cpu is too late if I want
  restart(2) to return an error and leave the caller's memory map
  intact.  (And I do want this: restart should be as robust as execve.)
 
 In the case of restarting a container, I think it's ok if a restarting
 tasks dies in an ugly way -- this will be observed and handled by the
 initiating task outside the container, which will gracefully report to
 the caller/user.

How would task exit be observed?  Are all tasks in a restarted
container guaranteed to be children (in the sense that wait(2) would
work) of the initiating task?


 Even if you close this hole, then any other failure later on during
 restart - even a failure to allocate kernel memory due to memory pressure,
 will give that undesired effect that you are trying to avoid.

Kernel memory allocation failure is not the kind of problem I'm trying
to address.  I am trying to address the case of restarting a checkpoint
image that needs features that are not present, where the set of
features used by the checkpoint image can be compared against the set
of features the platform provides.


 That said, any difference in the architecture that may cause restart to
 fail is probably best placed in cr_write_head_arch.

I think I explained in my earlier mail why the current implementation's
cr_write_head_arch doesn't help in this case:

  Well okay then, cr_read_head_arch seems to be the right place in the
  restart sequence for the architecture code to handle this.  However,
  cr_write_head_arch (which produces the buffer that cr_read_head_arch
  consumes) is not provided a reference to the task to be checkpointed,
  nor can it assume that it's operating on current.  I need a reference
  to a task before I can determine whether it's running in 32- or 64-bit
  mode, or using the FPU, Altivec, SPE, whatever.
  
  In any case, mixing 32- and 64-bit tasks across restart is something I
  eventually want to support, not reject.  But the problem I've outlined
  applies to FPU state and vector extensions (VMX, SPE), as well as
  sanity-checking debug register (DABR) contents.  We'll need to be able
  to error out gracefully from restart when a checkpoint image specifies a
  feature unsupported by the current kernel or hardware.  But I don't see
  how to do it with the current architecture.  Am I missing something?
  
 
 More specifically, I envision restart to work like this:
 
 1) user invokes user-land utility (e.g. cr --restart ...
 2) 'cr' will create a new container
 3) 'cr' will start a child in that container
 4) child will create rest of tree (in kernel or in user space - tbd)
 5) each task in that tree will restore itself
 6) 'cr' monitors this process
 7) if all goes well - 'cr' report ok.
 8) if something goes bad, 'cr' notices and notifies caller/user

Again, how would 'cr' obtain exit status for these tasks, and how would
it distinguish failure from normal operation?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] Quieten arch/powerpc in a allmodconfig build.

2009-04-08 Thread Nathan Lynch

Michael Ellerman mich...@ellerman.id.au wrote:

 On Wed, 2009-04-08 at 15:51 +1000, Tony Breeds wrote:
  On Wed, Apr 08, 2009 at 03:08:55PM +1000, Michael Ellerman wrote:
  
   The getter routines in here could really multiplex their return values
   with a negative error code, which I generally prefer, but this works I
   guess.
  
  I was hoping someone would notice and suggest it.  tag you're it!
 
 I meant we /could/ change them, but we could also leave them, it's a bit
 of a coin-flip which is better. Nathan might have an opinion?

I think I had some reason for doing it this way, but I'm drawing a
blank right now.

In the meantime, can someone post the warnings that gcc emits for
cacheinfo.c, as well as the gcc version?  I can't reproduce them with
4.3.2 on Fedora.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] Quieten arch/powerpc in a allmodconfig build.

2009-04-09 Thread Nathan Lynch

Tony Breeds t...@bakeyournoodle.com wrote:

 On Wed, Apr 08, 2009 at 01:47:36PM -0500, Nathan Lynch wrote:
 
  I think I had some reason for doing it this way, but I'm drawing a
  blank right now.
  
  In the meantime, can someone post the warnings that gcc emits for
  cacheinfo.c, as well as the gcc version?  I can't reproduce them with
  4.3.2 on Fedora.
 
 ---
 [t...@thor ~]$ egrep cacheinfo tmp/build.log 
 /scratch/tony/working/arch/powerpc/kernel/cacheinfo.c: In function 
 'associativity_show':
 /scratch/tony/working/arch/powerpc/kernel/cacheinfo.c:562: warning: 
 'associativity' may be used uninitialized in this function
 /scratch/tony/working/arch/powerpc/kernel/cacheinfo.c: In function 
 'size_show':
 /scratch/tony/working/arch/powerpc/kernel/cacheinfo.c:513: warning: 'size_kb' 
 may be used uninitialized in this function

Thanks.

So I think I've convinced myself that the warnings are incorrect and
that uninitialized use is not possible.

But I find it odd that gcc gives warnings for these sites but not others
in the file that use the same idiom (e.g. line_size_show,
nr_sets_show).  I'd guess that inlining is implicated somehow.  Would I
be justified in worrying that this version of gcc is generating
incorrect code?

If not, then I'm fine with the uninitialized_var() changes, but do
please include the warnings and the compiler version in the changelog.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 5/5] proc: export more page flags in /proc/kpageflags

2009-04-28 Thread Nathan Lynch

Wu Fengguang fengguang...@intel.com writes:

 On Wed, Apr 29, 2009 at 05:32:44AM +0800, Andrew Morton wrote:
 On Tue, 28 Apr 2009 09:09:12 +0800
 Wu Fengguang fengguang...@intel.com wrote:
 
  +/*
  + * Kernel flags are exported faithfully to Linus and his fellow hackers.
  + * Otherwise some details are masked to avoid confusing the end user:
  + * - some kernel flags are completely invisible
  + * - some kernel flags are conditionally invisible on their odd usages
  + */
  +#ifdef CONFIG_DEBUG_KERNEL
  +static inline int genuine_linus(void) { return 1; }
 
 Although he's a fine chap, the use of the _linus tag isn't terribly
 clear (to me).  I think what you're saying here is that this enables
 kernel-developer-only features, yes?

 Yes.

 If so, perhaps we could come up with an identifier which expresses that
 more clearly.
 
 But I'd expect that everyone and all distros enable CONFIG_DEBUG_KERNEL
 for _some_ reason, so what's the point?

At the least, it has not always been so...


 Good point! I can confirm my debian has CONFIG_DEBUG_KERNEL=Y!

I can confirm mine does not.

etch-i386:~# uname -a
Linux etch-i386 2.6.18-6-686 #1 SMP Fri Dec 12 16:48:28 UTC 2008 i686 GNU/Linux
etch-i386:~# grep DEBUG_KERNEL /boot/config-2.6.18-6-686 
# CONFIG_DEBUG_KERNEL is not set

For what that's worth.


 It is preferable that we always implement the same interface for all
 Kconfig settings.  If this exposes information which is confusing or
 not useful to end-users then so be it - we should be able to cover that
 in supporting documentation.

 My original patch takes that straightforward manner - and I still like it.
 I would be very glad to move the filtering code from kernel to user space.

 The use of more obscure flags could be discouraged by _not_ documenting
 them. A really curious user is encouraged to refer to the code for the
 exact meaning (and perhaps become a kernel developer ;-)

 Also, as mentioned in the other email, it would be good if we were to
 publish a little userspace app which people can use to access this raw
 data.  We could give that application an `--i-am-a-kernel-developer'
 option!

 OK. I'll include page-types.c in the next take.

  +#else
  +static inline int genuine_linus(void) { return 0; }
  +#endif
 
 This isn't an appropriate use of CONFIG_DEBUG_KERNEL.
 
 DEBUG_KERNEL is a Kconfig-only construct which is use to enable _other_
 debugging features.  The way you've used it here, if the person who is
 configuring the kernel wants to enable any other completely-unrelated
 debug feature, they have to enable DEBUG_KERNEL first.  But when they
 do that, they unexpectedly alter the behaviour of pagemap!
 
 There are two other places where CONFIG_DEBUG_KERNEL affects code
 generation in .c files: arch/parisc/mm/init.c and
 arch/powerpc/kernel/sysfs.c.  These are both wrong, and need slapping ;)

 (add cc to related maintainers)

I assume I was cc'd because I've changed arch/powerpc/kernel/sysfs.c a
couple of times in the last year, but I can't claim to maintain that
code.  I'm pretty sure I haven't touched the code in question in this
discussion.  I've cc'd linuxppc-dev.


 CONFIG_DEBUG_KERNEL being enabled in distro kernels effectively means 

 #ifdef CONFIG_DEBUG_KERNEL == #if 1

 as the following patch demos. Now it becomes obviously silly.

Sure, #if 1 is usually silly.  But if the point is that DEBUG_KERNEL is
not supposed to directly affect code generation, then I see two options
for powerpc:

- remove the #ifdef CONFIG_DEBUG_KERNEL guards from
  arch/powerpc/kernel/sysfs.c, unconditionally enabling the hid/ima
  sysfs attributes, or

- define a new config symbol which governs whether those attributes are
  enabled, and make it depend on DEBUG_KERNEL


 --- a/arch/powerpc/kernel/sysfs.c
 +++ b/arch/powerpc/kernel/sysfs.c
 @@ -212,19 +212,19 @@ static SYSDEV_ATTR(purr, 0600, show_purr, store_purr);
  #endif /* CONFIG_PPC64 */
  
  #ifdef HAS_PPC_PMC_PA6T
  SYSFS_PMCSETUP(pa6t_pmc0, SPRN_PA6T_PMC0);
  SYSFS_PMCSETUP(pa6t_pmc1, SPRN_PA6T_PMC1);
  SYSFS_PMCSETUP(pa6t_pmc2, SPRN_PA6T_PMC2);
  SYSFS_PMCSETUP(pa6t_pmc3, SPRN_PA6T_PMC3);
  SYSFS_PMCSETUP(pa6t_pmc4, SPRN_PA6T_PMC4);
  SYSFS_PMCSETUP(pa6t_pmc5, SPRN_PA6T_PMC5);
 -#ifdef CONFIG_DEBUG_KERNEL
 +#if 1
  SYSFS_PMCSETUP(hid0, SPRN_HID0);
  SYSFS_PMCSETUP(hid1, SPRN_HID1);
  SYSFS_PMCSETUP(hid4, SPRN_HID4);
  SYSFS_PMCSETUP(hid5, SPRN_HID5);
  SYSFS_PMCSETUP(ima0, SPRN_PA6T_IMA0);
  SYSFS_PMCSETUP(ima1, SPRN_PA6T_IMA1);
  SYSFS_PMCSETUP(ima2, SPRN_PA6T_IMA2);
  SYSFS_PMCSETUP(ima3, SPRN_PA6T_IMA3);
  SYSFS_PMCSETUP(ima4, SPRN_PA6T_IMA4);
 @@ -282,19 +282,19 @@ static struct sysdev_attribute classic_pmc_attrs[] = {
  static struct sysdev_attribute pa6t_attrs[] = {
   _SYSDEV_ATTR(mmcr0, 0600, show_mmcr0, store_mmcr0),
   _SYSDEV_ATTR(mmcr1, 0600, show_mmcr1, store_mmcr1),
   _SYSDEV_ATTR(pmc0, 0600, show_pa6t_pmc0, store_pa6t_pmc0),
   _SYSDEV_ATTR(pmc1, 0600, show_pa6t_pmc1, store_pa6t_pmc1),

Re: Kernel oops while duming user core.

2008-01-31 Thread Nathan Lynch

Rune Torgersen wrote:
 Kumar Gala wrote:
  Can you git-bisect to narrow this down further.
 
 Not easilly, as the board port to arch/powerpc was done on 2.6.24-rc7
 and up.
 Is there an somewhat esy way in git to apply the differences from master
 branch to our board branch to a branch created by bisect?
 
 And I don't even know where this started to happen.
 Would trying arch/ppc help any? I have our arch/ppc port in a
 semiworking state for kernels up to 2.6.23

Well, we know this happens on other 32-bit powerpc machines (pmac at
least)... perhaps someone could arrange to bisect on a machine that
works with older powerpc kernels (assuming they have a good repro
case).
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: Kernel oops while duming user core.

2008-01-31 Thread Nathan Lynch

Scott Wood wrote:
 On Thu, Jan 31, 2008 at 11:40:04AM -0600, Rune Torgersen wrote:
  Unable to handle kernel paging request for data at address 0x48024000
  Faulting instruction address: 0xc000f0a0
  Oops: Kernel access of bad area, sig: 11 [#1]
  PREEMPT Innovative Systems ApMax
 
 Does it happen without preempt?
 
  Modules linked in: drv_wd(P) drv_scc devcom drv_pcir tipc drv_ss7
  drv_auxcpu drv_leds(P) drv_ethsw proc_sysinfo(P) i2c_8266(P)
  NIP: c000f0a0 LR: c0011fec CTR: 0080
  REGS: eebe9b70 TRAP: 0300   Tainted: P (2.6.24-test)
 
 Does it happen without the modules?

I doubt the modules are the problem; there was a practically identical
report from someone with an untainted 2.6.24-rc kernel a few weeks ago
(see my first reply to Rune).

 
  MSR: 9032 EE,ME,IR,DR  CR: 24004442  XER: 
  DAR: 48024000, DSISR: 2000
 
 Hmm, this doesn't look like a valid DSISR, so I'm guessing this was a TLB
 miss that got redirected to DataAccess (or is there something that causes
 DSRISR[2] to be set on 8280?  I didn't see anything in the manual...). 
 However, SRR1 in that case seems to indicate a store, which dcbst shouldn't
 generate (except on 8xx, according to the comment in update_mmu_cache).
 
 Do you have a simple test case that we could try to reproduce?  I tried a
 simple core dump on an 8280, and it worked.

Is the crashing program multithreaded?  The first report had firefox
triggering the oops.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: Kernel oops while duming user core.

2008-01-31 Thread Nathan Lynch

Rune Torgersen wrote:
 Hi
 
 I get the following kernel core while a user program I have is dumping
 core.
 Any DIeas at what to look for? (this is runnign 2.6.24, arch/powerpc on
 a 8280)
 When runnign the program on 2.6.18 arch/ppc, the program gets a sig 11
 and dumps core.
 On 2.6.24, I ghet the kernel oops, and then the program hangs sround
 forever and is unkillable.

Hmm, this is the second report of 2.6.24 crashing in
__flush_dcache_icache during a core dump; see:
http://ozlabs.org/pipermail/linuxppc-dev/2007-December/048662.html

Is this easily recreatable?

 
 Unable to handle kernel paging request for data at address 0x48024000
 Faulting instruction address: 0xc000ef88
 Oops: Kernel access of bad area, sig: 11 [#1]
 PREEMPT Innovative Systems ApMax
 Modules linked in: drv_wd(P) drv_scc devcom drv_pcir tipc drv_ss7
 drv_auxcpu drv_leds(P) drv_ethsw proc_sysinfo(P) i2c_8266(P)
 NIP: c000ef88 LR: c0012180 CTR: 0080
 REGS: eebc9b70 TRAP: 0300   Tainted: P (2.6.24)
 MSR: 9032 EE,ME,IR,DR  CR: 24004442  XER: 
 DAR: 48024000, DSISR: 2000
 TASK = eebac3c0[3131] 'armd' THREAD: eebc8000
 GPR00: ee9b7d00 eebc9c20 eebac3c0 48024000 0080 399a4181 48024000
 
 GPR08: 399a4181 ee9b7d00  c200 44004422 10100f38 ee82fc00
 bfff
 GPR16: ef377060 0030 ee9b7d00  eebc9cdc 0011 eebc9cd8
 eeb96480
 GPR24: ee9b7d00 399a4181 48024000 eeb9a370 eeb9a370 399a4181 48024000
 c2733480
 NIP [c000ef88] __flush_dcache_icache+0x14/0x40
 LR [c0012180] update_mmu_cache+0x74/0x114
 Call Trace:
 [eebc9c20] [eebc8000] 0xeebc8000 (unreliable)
 [eebc9c40] [c005d060] handle_mm_fault+0x630/0xbc0
 [eebc9c80] [c005d9e4] get_user_pages+0x3f4/0x4fc
 [eebc9cd0] [c00aa7c4] elf_core_dump+0x9a4/0xc5c
 [eebc9d60] [c00779e4] do_coredump+0x6e0/0x748
 [eebc9e50] [c002a5b0] get_signal_to_deliver+0x40c/0x45c
 [eebc9e80] [c0008ce8] do_signal+0x50/0x294
 [eebc9f40] [c000fb98] do_user_signal+0x74/0xc4
 --- Exception: 300 at 0x10044efc
 LR = 0x10044ec0
 Instruction dump:
 4d820020 7c8903a6 7c001bac 38630020 4200fff8 7c0004ac 4e800020 6000
 54630026 38800080 7c8903a6 7c661b78 7c00186c 38630020 4200fff8
 7c0004ac
 ---[ end trace 97db37eaf213da3c ]---
 note: armd[3131] exited with preempt_count 2


___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 0/5] ehea checkpatch fixups

2008-01-31 Thread Nathan Lynch

Doug Maxey wrote:
 
 A small set of fixups for checkpatch.
 
 Based on upstream pull from a few hours ago.

Er, ehea doesn't even build right now[1], maybe that could get fixed
first?  :-)

[1] http://lkml.org/lkml/2008/1/28/337
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: asm-offsets.c

2008-02-06 Thread Nathan Lynch

Christoph Hellwig wrote:
 On Wed, Feb 06, 2008 at 11:43:41PM -0500, Sean MacLennan wrote:
  I just did a git pull of Josh's tree, and 
  arch/powerpc/kernel/asm-offsets.c does not compile. I have only been 
  glossing over the linuxppc-dev emails, so forgive me if this already 
  came up.
 
 I have the same problem with current powerpc.git.

What are the exact error and config?  I haven't been able to reproduce
this with several defconfigs and toggling CONFIG_HIGH_RES_TIMERS.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: asm-offsets.c

2008-02-06 Thread Nathan Lynch

Sean MacLennan wrote:
 Christoph Hellwig wrote:
  On Thu, Feb 07, 2008 at 12:49:34AM -0600, Nathan Lynch wrote:

  Christoph Hellwig wrote:
  
  On Wed, Feb 06, 2008 at 11:43:41PM -0500, Sean MacLennan wrote:

  I just did a git pull of Josh's tree, and 
  arch/powerpc/kernel/asm-offsets.c does not compile. I have only been 
  glossing over the linuxppc-dev emails, so forgive me if this already 
  came up.
  
  I have the same problem with current powerpc.git.

  What are the exact error and config?  I haven't been able to reproduce
  this with several defconfigs and toggling CONFIG_HIGH_RES_TIMERS.
  
 Same error. I did a search for CLOCK_REALTIME_RES and it is not defined 
 in any *.[chS] file.

CLOCK_REALTIME_RES is defined in include/asm/asm-offsets.h, which is
generated from asm-offsets.c as part of the build.

Your config builds fine here, so please post the actual error that you
are seeing during the build?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/2][OF] Add of_device_is_disabled function

2008-02-24 Thread Nathan Lynch

Josh Boyer wrote:
 On Sat, 23 Feb 2008 15:58:23 -0600
 Josh Boyer [EMAIL PROTECTED] wrote:
 
  IEEE 1275 defined a standard status property to indicate the operational
  status of a device.  The property has four possible values: okay, disabled,
  fail, fail-xxx.  The absence of this property means the operational status
  of the device is unknown or okay.
  
  This adds a function called of_device_is_disabled that checks to see if a
  node has the status property set to disabled.  This can be quite useful
  for devices that may be present but disabled due to pin sharing, etc.
  
 
 Talking with Ben H a bit, he suggested to reverse this API.  Basically,
 create an of_device_is_available that returns 1 if the status property
 is completely missing, or if it's set to okay or ok.  The latter is
 to cope with some broken firmwares.

I agree with Ben's suggestion.  The rtas_pci and eeh code could be
converted to use this, which gives a net savings of a few bytes with
ppc64_defconfig:


diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index 433a0a0..39a752b 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -56,21 +56,6 @@ static inline int config_access_valid(struct pci_dn *dn, int 
where)
return 0;
 }
 
-static int of_device_available(struct device_node * dn)
-{
-const char *status;
-
-status = of_get_property(dn, status, NULL);
-
-if (!status)
-return 1;
-
-if (!strcmp(status, okay))
-return 1;
-
-return 0;
-}
-
 int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
 {
int returnval = -1;
@@ -117,7 +102,7 @@ static int rtas_pci_read_config(struct pci_bus *bus,
for (dn = busdn-child; dn; dn = dn-sibling) {
struct pci_dn *pdn = PCI_DN(dn);
if (pdn  pdn-devfn == devfn
-of_device_available(dn))
+of_device_is_available(dn))
return rtas_read_config(pdn, where, size, val);
}
 
@@ -164,7 +149,7 @@ static int rtas_pci_write_config(struct pci_bus *bus,
for (dn = busdn-child; dn; dn = dn-sibling) {
struct pci_dn *pdn = PCI_DN(dn);
if (pdn  pdn-devfn == devfn
-of_device_available(dn))
+of_device_is_available(dn))
return rtas_write_config(pdn, where, size, val);
}
return PCIBIOS_DEVICE_NOT_FOUND;
diff --git a/arch/powerpc/platforms/pseries/eeh.c 
b/arch/powerpc/platforms/pseries/eeh.c
index 9eb539e..550b2f7 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -945,7 +945,6 @@ static void *early_enable_eeh(struct device_node *dn, void 
*data)
unsigned int rets[3];
struct eeh_early_enable_info *info = data;
int ret;
-   const char *status = of_get_property(dn, status, NULL);
const u32 *class_code = of_get_property(dn, class-code, NULL);
const u32 *vendor_id = of_get_property(dn, vendor-id, NULL);
const u32 *device_id = of_get_property(dn, device-id, NULL);
@@ -959,8 +958,8 @@ static void *early_enable_eeh(struct device_node *dn, void 
*data)
pdn-eeh_freeze_count = 0;
pdn-eeh_false_positives = 0;
 
-   if (status  strncmp(status, ok, 2) != 0)
-   return NULL;/* ignore devices with bad status */
+   if (!of_device_is_available(dn))
+   return NULL;
 
/* Ignore bad nodes. */
if (!class_code || !vendor_id || !device_id)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: copy_from_user problem

2008-02-26 Thread Nathan Lynch

Maynard Johnson wrote:
 
 static long lib_addr;
 module_param(lib_addr, long, 0);
 
Should be unsigned long?

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: copy_from_user problem

2008-02-26 Thread Nathan Lynch

Nathan Lynch wrote:
 Maynard Johnson wrote:
  
  static long lib_addr;
  module_param(lib_addr, long, 0);
  
 Should be unsigned long?

ulong, rather, but that doesn't fix it.

In any case, lib_addr is a user virtual address; doesn't the kernel
need to do get_user_pages or some such to get at arbitrary process
memory?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 3/3] ppc64-specific memory notifier support

2008-02-28 Thread Nathan Lynch

Badari Pulavarty wrote:

 +static struct notifier_block pseries_smp_nb = {

Rename this to pseries_mem_nb?

 + .notifier_call = pseries_memory_notifier,
 +};
 +
 +static int __init pseries_memory_hotplug_init(void)
 +{
 + if (firmware_has_feature(FW_FEATURE_LPAR))
 + pSeries_reconfig_notifier_register(pseries_smp_nb);
 +
 + return 0;
 +}
 +arch_initcall(pseries_memory_hotplug_init);

arch_initcall doesn't seem appropriate.  __initcall should be fine.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 0/3] hotplug memory remove updates

2008-02-28 Thread Nathan Lynch

Badari Pulavarty wrote:
 Hi Paul,
 
 Here are the hotplug memory remove updates for 2.6.25-rc2-mm1.

How have these been tested?  Have you initiated a memory remove
operation from the HMC?  That's the only way to catch some bugs...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 2/2 v2] [POWERPC] Ignore disabled serial ports

2008-03-03 Thread Nathan Lynch

Josh Boyer wrote:
 On Mon, 3 Mar 2008 13:09:25 -0600
 Scott Wood [EMAIL PROTECTED] wrote:
 
  On Sun, Mar 02, 2008 at 10:43:17PM -0600, Josh Boyer wrote:
   On Mon, 3 Mar 2008 04:43:42 +0100
   Arnd Bergmann [EMAIL PROTECTED] wrote:
I wonder whether we should move the check for used-by-rtas into the
of_device_is_available function. I understand that used-by-rtas is
another way of expressing the idea that the kernel is not supposed to
access the specific device. In this case, the device is physically
present, but is not available to the OS.
   
   I'd rather not at the moment.  My intention was to only look at the
   status property for now.  I'd like to avoid this function growing into
   a huge switch statement for $random_firmware's way of flagging
   something as don't touch.
  
  Better that than having the huge list of tests in every driver...
 
 Perhaps.
 
 This isn't set in stone.  I'd rather get what's in the patch in-tree
 now and massage it as we go.  Otherwise this bike shed will wind up
 being rainbow colored yet totally useless because it's a never-ending
 patch rework.

I agree.  Josh's patch is immediately useful to other code as-is.

used-by-rtas is powerpc-specific and doesn't belong in drivers/of IMO.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 3/3] ppc64-specific memory notifier support

2008-03-03 Thread Nathan Lynch

Michael Ellerman wrote:
 On Thu, 2008-02-28 at 08:46 -0800, Badari Pulavarty wrote:
  Hotplug memory notifier for ppc64. This gets invoked by writing
  the device-node that needs to be removed to /proc/ppc64/ofdt.
  We need to adjust the sections and remove sysfs entries by
  calling __remove_pages(). Then call arch specific code to
  get rid of htab mappings for the section of memory.
  
  Signed-off-by: Badari Pulavarty [EMAIL PROTECTED]
  ---
   arch/powerpc/platforms/pseries/Makefile |1 
   arch/powerpc/platforms/pseries/hotplug-memory.c |   98 
  
   2 files changed, 99 insertions(+)
  
  Index: linux-2.6.25-rc2/arch/powerpc/platforms/pseries/hotplug-memory.c
  ===
  --- /dev/null   1970-01-01 00:00:00.0 +
  +++ linux-2.6.25-rc2/arch/powerpc/platforms/pseries/hotplug-memory.c
  2008-02-28 08:20:14.0 -0800
 
  +
  +static struct notifier_block pseries_smp_nb = {
  +   .notifier_call = pseries_memory_notifier,
  +};
  +
  +static int __init pseries_memory_hotplug_init(void)
  +{
  +   if (firmware_has_feature(FW_FEATURE_LPAR))
  +   pSeries_reconfig_notifier_register(pseries_smp_nb);
  +
  +   return 0;
  +}
  +arch_initcall(pseries_memory_hotplug_init);
 
 This is going to fire on non-pseries LPAR platforms, like iSeries and
 PS3. Which is not what you want I think.

Well, the notifier will be registered, yes, but it will never be
called because that path is reachable only from a write to
/proc/ppc64/ofdt, which is not created on non-pseries.

Maybe it should be

machine_device_initcall(pseries, pseries_memory_hotplug_init);

(and pseries_cpu_hotplug_init in hotplug-cpu.c should be changed to
machine_arch_initcall)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 1/3] maple: use platform name in define_machine()

2008-03-12 Thread Nathan Lynch

Prevailing practice for define_machine() in powerpc is to use the
platform name when the platform has only one define_machine()
statement, but maple uses maple_md.  This caused me some
head-scratching when writing some new code that uses
machine_is(maple).

Use maple instead of maple_md.  There should not be any behavioral
change -- fixup_maple_ide() calls machine_is(maple) but the body of
the function is ifdef'd out.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/maple/setup.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/maple/setup.c 
b/arch/powerpc/platforms/maple/setup.c
index 3ce2d73..3b32e07 100644
--- a/arch/powerpc/platforms/maple/setup.c
+++ b/arch/powerpc/platforms/maple/setup.c
@@ -319,7 +319,7 @@ static int __init maple_probe(void)
return 1;
 }
 
-define_machine(maple_md) {
+define_machine(maple) {
.name   = Maple,
.probe  = maple_probe,
.setup_arch = maple_setup_arch,
-- 
1.5.4.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

maple: minor updates

2008-03-12 Thread Nathan Lynch


Please consider the following mostly-trivial patches for 2.6.26.

 arch/powerpc/configs/maple_defconfig |  133 --
 arch/powerpc/platforms/maple/pci.c   |   47 
 arch/powerpc/platforms/maple/setup.c |2 +-
 3 files changed, 129 insertions(+), 53 deletions(-)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 2/3] maple: kill fixup_maple_ide

2008-03-12 Thread Nathan Lynch

This function has been a no-op for about 18 months; it's there in
the history should anyone need to resurrect it.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/maple/pci.c |   47 
 1 files changed, 0 insertions(+), 47 deletions(-)

diff --git a/arch/powerpc/platforms/maple/pci.c 
b/arch/powerpc/platforms/maple/pci.c
index 3ffa0ac..3018552 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -592,50 +592,3 @@ int maple_pci_get_legacy_ide_irq(struct pci_dev *pdev, int 
channel)
}
return irq;
 }
-
-/* XXX: To remove once all firmwares are ok */
-static void fixup_maple_ide(struct pci_dev* dev)
-{
-   if (!machine_is(maple))
-   return;
-
-#if 0 /* Enable this to enable IDE port 0 */
-   {
-   u8 v;
-
-   pci_read_config_byte(dev, 0x40, v);
-   v |= 2;
-   pci_write_config_byte(dev, 0x40, v);
-   }
-#endif
-#if 0 /* fix bus master base */
-   pci_write_config_dword(dev, 0x20, 0xcc01);
-   printk(old ide resource: %lx - %lx \n,
-  dev-resource[4].start, dev-resource[4].end);
-   dev-resource[4].start = 0xcc00;
-   dev-resource[4].end = 0xcc10;
-#endif
-#if 0 /* Enable this to fixup IDE sense/polarity of irqs in IO-APICs */
-   {
-   struct pci_dev *apicdev;
-   u32 v;
-
-   apicdev = pci_get_slot (dev-bus, PCI_DEVFN(5,0));
-   if (apicdev == NULL)
-   printk(IDE Fixup IRQ: Can't find IO-APIC !\n);
-   else {
-   pci_write_config_byte(apicdev, 0xf2, 0x10 + 2*14);
-   pci_read_config_dword(apicdev, 0xf4, v);
-   v = ~0x0022;
-   pci_write_config_dword(apicdev, 0xf4, v);
-   pci_write_config_byte(apicdev, 0xf2, 0x10 + 2*15);
-   pci_read_config_dword(apicdev, 0xf4, v);
-   v = ~0x0022;
-   pci_write_config_dword(apicdev, 0xf4, v);
-   pci_dev_put(apicdev);
-   }
-   }
-#endif
-}
-DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_AMD, PCI_DEVICE_ID_AMD_8111_IDE,
-fixup_maple_ide);
-- 
1.5.4.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 3/3] maple: enable ipr driver in defconfig

2008-03-12 Thread Nathan Lynch

Some machines supported by the maple platform have an Obsidian
controller which can't be used without enabling CONFIG_IPR and the
options on which it depends.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/configs/maple_defconfig |  133 --
 1 files changed, 128 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/configs/maple_defconfig 
b/arch/powerpc/configs/maple_defconfig
index 8b810d0..25be62b 100644
--- a/arch/powerpc/configs/maple_defconfig
+++ b/arch/powerpc/configs/maple_defconfig
@@ -1,7 +1,7 @@
 #
 # Automatically generated make config: don't edit
 # Linux kernel version: 2.6.24-rc4
-# Thu Dec  6 16:48:26 2007
+# Wed Mar 12 16:21:48 2008
 #
 CONFIG_PPC64=y
 
@@ -333,7 +333,7 @@ CONFIG_DEFAULT_TCP_CONG=cubic
 CONFIG_UEVENT_HELPER_PATH=/sbin/hotplug
 CONFIG_STANDALONE=y
 CONFIG_PREVENT_FIRMWARE_BUILD=y
-# CONFIG_FW_LOADER is not set
+CONFIG_FW_LOADER=y
 # CONFIG_DEBUG_DRIVER is not set
 # CONFIG_DEBUG_DEVRES is not set
 # CONFIG_SYS_HYPERVISOR is not set
@@ -374,6 +374,7 @@ CONFIG_BLK_DEV_IDEDISK=y
 CONFIG_BLK_DEV_IDECD=y
 # CONFIG_BLK_DEV_IDETAPE is not set
 # CONFIG_BLK_DEV_IDEFLOPPY is not set
+# CONFIG_BLK_DEV_IDESCSI is not set
 CONFIG_IDE_TASK_IOCTL=y
 CONFIG_IDE_PROC_FS=y
 
@@ -427,10 +428,129 @@ CONFIG_IDE_ARCH_OBSOLETE_INIT=y
 # SCSI device support
 #
 # CONFIG_RAID_ATTRS is not set
-# CONFIG_SCSI is not set
-# CONFIG_SCSI_DMA is not set
+CONFIG_SCSI=y
+CONFIG_SCSI_DMA=y
+# CONFIG_SCSI_TGT is not set
 # CONFIG_SCSI_NETLINK is not set
-# CONFIG_ATA is not set
+# CONFIG_SCSI_PROC_FS is not set
+
+#
+# SCSI support type (disk, tape, CD-ROM)
+#
+CONFIG_BLK_DEV_SD=y
+# CONFIG_CHR_DEV_ST is not set
+# CONFIG_CHR_DEV_OSST is not set
+# CONFIG_BLK_DEV_SR is not set
+CONFIG_CHR_DEV_SG=y
+# CONFIG_CHR_DEV_SCH is not set
+
+#
+# Some SCSI devices (e.g. CD jukebox) support multiple LUNs
+#
+# CONFIG_SCSI_MULTI_LUN is not set
+# CONFIG_SCSI_CONSTANTS is not set
+# CONFIG_SCSI_LOGGING is not set
+# CONFIG_SCSI_SCAN_ASYNC is not set
+CONFIG_SCSI_WAIT_SCAN=m
+
+#
+# SCSI Transports
+#
+# CONFIG_SCSI_SPI_ATTRS is not set
+# CONFIG_SCSI_FC_ATTRS is not set
+# CONFIG_SCSI_ISCSI_ATTRS is not set
+# CONFIG_SCSI_SAS_LIBSAS is not set
+# CONFIG_SCSI_SRP_ATTRS is not set
+CONFIG_SCSI_LOWLEVEL=y
+# CONFIG_ISCSI_TCP is not set
+# CONFIG_BLK_DEV_3W__RAID is not set
+# CONFIG_SCSI_3W_9XXX is not set
+# CONFIG_SCSI_ACARD is not set
+# CONFIG_SCSI_AACRAID is not set
+# CONFIG_SCSI_AIC7XXX is not set
+# CONFIG_SCSI_AIC7XXX_OLD is not set
+# CONFIG_SCSI_AIC79XX is not set
+# CONFIG_SCSI_AIC94XX is not set
+# CONFIG_SCSI_ARCMSR is not set
+# CONFIG_MEGARAID_NEWGEN is not set
+# CONFIG_MEGARAID_LEGACY is not set
+# CONFIG_MEGARAID_SAS is not set
+# CONFIG_SCSI_HPTIOP is not set
+# CONFIG_SCSI_DMX3191D is not set
+# CONFIG_SCSI_EATA is not set
+# CONFIG_SCSI_FUTURE_DOMAIN is not set
+# CONFIG_SCSI_GDTH is not set
+# CONFIG_SCSI_IPS is not set
+# CONFIG_SCSI_INITIO is not set
+# CONFIG_SCSI_INIA100 is not set
+# CONFIG_SCSI_STEX is not set
+# CONFIG_SCSI_SYM53C8XX_2 is not set
+CONFIG_SCSI_IPR=y
+CONFIG_SCSI_IPR_TRACE=y
+CONFIG_SCSI_IPR_DUMP=y
+# CONFIG_SCSI_QLOGIC_1280 is not set
+# CONFIG_SCSI_QLA_FC is not set
+# CONFIG_SCSI_QLA_ISCSI is not set
+# CONFIG_SCSI_LPFC is not set
+# CONFIG_SCSI_DC395x is not set
+# CONFIG_SCSI_DC390T is not set
+# CONFIG_SCSI_DEBUG is not set
+# CONFIG_SCSI_SRP is not set
+CONFIG_ATA=y
+CONFIG_ATA_NONSTANDARD=y
+# CONFIG_SATA_AHCI is not set
+# CONFIG_SATA_SVW is not set
+# CONFIG_ATA_PIIX is not set
+# CONFIG_SATA_MV is not set
+# CONFIG_SATA_NV is not set
+# CONFIG_PDC_ADMA is not set
+# CONFIG_SATA_QSTOR is not set
+# CONFIG_SATA_PROMISE is not set
+# CONFIG_SATA_SX4 is not set
+# CONFIG_SATA_SIL is not set
+# CONFIG_SATA_SIL24 is not set
+# CONFIG_SATA_SIS is not set
+# CONFIG_SATA_ULI is not set
+# CONFIG_SATA_VIA is not set
+# CONFIG_SATA_VITESSE is not set
+# CONFIG_SATA_INIC162X is not set
+# CONFIG_PATA_ALI is not set
+# CONFIG_PATA_AMD is not set
+# CONFIG_PATA_ARTOP is not set
+# CONFIG_PATA_ATIIXP is not set
+# CONFIG_PATA_CMD640_PCI is not set
+# CONFIG_PATA_CMD64X is not set
+# CONFIG_PATA_CS5520 is not set
+# CONFIG_PATA_CS5530 is not set
+# CONFIG_PATA_CYPRESS is not set
+# CONFIG_PATA_EFAR is not set
+# CONFIG_ATA_GENERIC is not set
+# CONFIG_PATA_HPT366 is not set
+# CONFIG_PATA_HPT37X is not set
+# CONFIG_PATA_HPT3X2N is not set
+# CONFIG_PATA_HPT3X3 is not set
+# CONFIG_PATA_IT821X is not set
+# CONFIG_PATA_IT8213 is not set
+# CONFIG_PATA_JMICRON is not set
+# CONFIG_PATA_TRIFLEX is not set
+# CONFIG_PATA_MARVELL is not set
+# CONFIG_PATA_MPIIX is not set
+# CONFIG_PATA_OLDPIIX is not set
+# CONFIG_PATA_NETCELL is not set
+# CONFIG_PATA_NS87410 is not set
+# CONFIG_PATA_NS87415 is not set
+# CONFIG_PATA_OPTI is not set
+# CONFIG_PATA_OPTIDMA is not set
+# CONFIG_PATA_PDC_OLD is not set
+# CONFIG_PATA_RADISYS is not set
+# CONFIG_PATA_RZ1000 is not set
+# CONFIG_PATA_SC1200 is not set
+# CONFIG_PATA_SERVERWORKS

[PATCH] convert pci and eeh code to of_device_is_available

2008-03-13 Thread Nathan Lynch

A couple of places are duplicating the function of
of_device_is_available; convert them to use it.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---

This depends on Josh Boyer's patch [OF] Add of_device_is_available
function:

http://patchwork.ozlabs.org/linuxppc/patch?id=17100

 arch/powerpc/kernel/rtas_pci.c   |   19 ++-
 arch/powerpc/platforms/pseries/eeh.c |5 ++---
 2 files changed, 4 insertions(+), 20 deletions(-)

diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index 433a0a0..39a752b 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -56,21 +56,6 @@ static inline int config_access_valid(struct pci_dn *dn, int 
where)
return 0;
 }
 
-static int of_device_available(struct device_node * dn)
-{
-const char *status;
-
-status = of_get_property(dn, status, NULL);
-
-if (!status)
-return 1;
-
-if (!strcmp(status, okay))
-return 1;
-
-return 0;
-}
-
 int rtas_read_config(struct pci_dn *pdn, int where, int size, u32 *val)
 {
int returnval = -1;
@@ -117,7 +102,7 @@ static int rtas_pci_read_config(struct pci_bus *bus,
for (dn = busdn-child; dn; dn = dn-sibling) {
struct pci_dn *pdn = PCI_DN(dn);
if (pdn  pdn-devfn == devfn
-of_device_available(dn))
+of_device_is_available(dn))
return rtas_read_config(pdn, where, size, val);
}
 
@@ -164,7 +149,7 @@ static int rtas_pci_write_config(struct pci_bus *bus,
for (dn = busdn-child; dn; dn = dn-sibling) {
struct pci_dn *pdn = PCI_DN(dn);
if (pdn  pdn-devfn == devfn
-of_device_available(dn))
+of_device_is_available(dn))
return rtas_write_config(pdn, where, size, val);
}
return PCIBIOS_DEVICE_NOT_FOUND;
diff --git a/arch/powerpc/platforms/pseries/eeh.c 
b/arch/powerpc/platforms/pseries/eeh.c
index 9eb539e..550b2f7 100644
--- a/arch/powerpc/platforms/pseries/eeh.c
+++ b/arch/powerpc/platforms/pseries/eeh.c
@@ -945,7 +945,6 @@ static void *early_enable_eeh(struct device_node *dn, void 
*data)
unsigned int rets[3];
struct eeh_early_enable_info *info = data;
int ret;
-   const char *status = of_get_property(dn, status, NULL);
const u32 *class_code = of_get_property(dn, class-code, NULL);
const u32 *vendor_id = of_get_property(dn, vendor-id, NULL);
const u32 *device_id = of_get_property(dn, device-id, NULL);
@@ -959,8 +958,8 @@ static void *early_enable_eeh(struct device_node *dn, void 
*data)
pdn-eeh_freeze_count = 0;
pdn-eeh_false_positives = 0;
 
-   if (status  strncmp(status, ok, 2) != 0)
-   return NULL;/* ignore devices with bad status */
+   if (!of_device_is_available(dn))
+   return NULL;
 
/* Ignore bad nodes. */
if (!class_code || !vendor_id || !device_id)
-- 
1.5.4.rc5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH v2] Force 4K pages for IO addresses.

2008-03-18 Thread Nathan Lynch

Benjamin Herrenschmidt wrote:
 
 On Mon, 2008-03-17 at 19:34 -0500, Olof Johansson wrote:
  On Mon, Mar 17, 2008 at 02:54:19PM +1100, Tony Breeds wrote:
   Currently HEA requires 4K pages for IO resources.  Just set the pages 
   size to
   IO to 4K.
  
  Well, that's too bad. Why penalize all platforms for it?
  
  I.e.: Nack, we use 64K iopages on pa6t and it works well. No need to
  waste tlb and erat space.
 
 We would have to make that pSeries specific for now I suppose...
 
 We don't have a way to know that there can be an EHEA right ? It may
 not be in the device-tree at boot and dynamically added to the
 partition...

The ibm,drc-names property of the root node should have HEA strings
in it on systems where EHEA can potentially be present.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 2/2] Force 4K IOPages when eHEA is present in the machine.

2008-03-20 Thread Nathan Lynch

Tony Breeds wrote:
 
 +#if defined(CONFIG_PPC_PSERIES)  defined(CONFIG_PPC_64K_PAGES)
 +static int __init scan_dt_for_ehea(unsigned long node, const char *uname,
 +  int depth, void *data)
 +{
 + if (depth != 0)
 + return 0;
 +
 + if (of_flat_dt_search(node, HEA , ibm,drc-names))
 + return 1;
 +
 + return 0;
 +}
 +#endif

Searching ibm,drc-names is necessary, but is it sufficient?  That is,
can there be a system that has an HEA adapter without that property
being present?  Thinking in particular about single-partition systems
that run without an HMC attached.


  static void __init htab_init_page_sizes(void)
  {
   int rc;
 +#ifdef CONFIG_PPC_64K_PAGES
 + int has_ehea = 0;
 +#endif
  
   /* Default to 4K pages only */
   memcpy(mmu_psize_defs, mmu_psize_defaults_old,
  sizeof(mmu_psize_defaults_old));
  
 +#if defined(CONFIG_PPC_PSERIES)  defined(CONFIG_PPC_64K_PAGES)
 + /* Scan to see if this system can have an EHEA, if so we'll
 +  * demote io_psize to 4K */
 + has_ehea = of_scan_flat_dt(scan_dt_for_ehea, NULL);
 +#endif

I'm wondering if some of the ifdef stuff could be avoided if you used
a firmware feature bit to signify the limitation (or the lack of it).
The bit could be set during pSeries_probe.  Just an idea; I don't feel
that strongly about it.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] scanlog_init cleanup, minor fixes

2008-03-23 Thread Nathan Lynch

scanlog_init() could use some love.

* properly return -ENODEV if this system doesn't support scan-log-dump
* don't printk if scan-log-dump not present; only older systems have it
* convert from create_proc_entry() to preferred proc_create()
* allocate zeroed data buffer
* fix potential memory leak of ent-data on failed create_proc_entry()
* simplify control flow

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---

(not urgent, please consider for 2.6.26)

 arch/powerpc/platforms/pseries/scanlog.c |   37 ++---
 1 files changed, 18 insertions(+), 19 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/scanlog.c 
b/arch/powerpc/platforms/pseries/scanlog.c
index 8e1ef16..e5b0ea8 100644
--- a/arch/powerpc/platforms/pseries/scanlog.c
+++ b/arch/powerpc/platforms/pseries/scanlog.c
@@ -195,31 +195,30 @@ const struct file_operations scanlog_fops = {
 static int __init scanlog_init(void)
 {
struct proc_dir_entry *ent;
+   void *data;
+   int err = -ENOMEM;
 
ibm_scan_log_dump = rtas_token(ibm,scan-log-dump);
-   if (ibm_scan_log_dump == RTAS_UNKNOWN_SERVICE) {
-   printk(KERN_ERR scan-log-dump not implemented on this 
system\n);
-   return -EIO;
-   }
+   if (ibm_scan_log_dump == RTAS_UNKNOWN_SERVICE)
+   return -ENODEV;
 
-ent = create_proc_entry(ppc64/rtas/scan-log-dump,  S_IRUSR, NULL);
-   if (ent) {
-   ent-proc_fops = scanlog_fops;
-   /* Ideally we could allocate a buffer  4G */
-   ent-data = kmalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
-   if (!ent-data) {
-   printk(KERN_ERR Failed to allocate a buffer\n);
-   remove_proc_entry(scan-log-dump, ent-parent);
-   return -ENOMEM;
-   }
-   ((unsigned int *)ent-data)[0] = 0;
-   } else {
-   printk(KERN_ERR Failed to create ppc64/scan-log-dump proc 
entry\n);
-   return -EIO;
-   }
+   /* Ideally we could allocate a buffer  4G */
+   data = kzalloc(RTAS_DATA_BUF_SIZE, GFP_KERNEL);
+   if (!data)
+   goto err;
+
+   ent = proc_create(ppc64/rtas/scan-log-dump, S_IRUSR, NULL,
+ scanlog_fops);
+   if (!ent)
+   goto err;
+
+   ent-data = data;
proc_ppc64_scan_log_dump = ent;
 
return 0;
+err:
+   kfree(data);
+   return err;
 }
 
 static void __exit scanlog_cleanup(void)
-- 
1.5.4.rc5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] RTAS - adapt procfs interface

2008-04-01 Thread Nathan Lynch

Jens Osterkamp wrote:
 
 Handling of the proc_dir_entry-count has being changed in 2.6.24-rc5.

Do you know which commit caused the change?


 After this change the default value for pde-count is 1 and not 0 as it
 was in earlier kernels. Therefore, if we want to check wether our procfs
 file is already opened (already in use), we have to check if pde-count
 is not greater than 2 but not 1.
 
 Signed-off-by: Maxim Shchetynin [EMAIL PROTECTED]
 Signed-off-by: Jens Osterkamp [EMAIL PROTECTED]
 
 Index: linux-2.6/arch/powerpc/kernel/rtas_flash.c
 ===
 --- linux-2.6.orig/arch/powerpc/kernel/rtas_flash.c
 +++ linux-2.6/arch/powerpc/kernel/rtas_flash.c
 @@ -356,7 +356,7 @@ static int rtas_excl_open(struct inode *
  
   /* Enforce exclusive open with use count of PDE */
   spin_lock(flash_file_open_lock);
 - if (atomic_read(dp-count)  1) {
 + if (atomic_read(dp-count)  2) {
   spin_unlock(flash_file_open_lock);
   return -EBUSY;
   }

One could argue that the real problem is using the proc_dir_entry's
reference count to enforce exclusive open.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] RTAS - adapt procfs interface

2008-04-01 Thread Nathan Lynch

Nathan Lynch wrote:
 
 One could argue that the real problem is using the proc_dir_entry's
 reference count to enforce exclusive open.


I think this is better... the way these files are used is lame, but
this should preserve the existing behavior.  I haven't yet tested
this, can you?


diff --git a/arch/powerpc/kernel/rtas_flash.c b/arch/powerpc/kernel/rtas_flash.c
index f227659..00bc308 100644
--- a/arch/powerpc/kernel/rtas_flash.c
+++ b/arch/powerpc/kernel/rtas_flash.c
@@ -139,7 +139,7 @@ struct rtas_validate_flash_t
unsigned int update_results;/* Update results token */
 };
 
-static DEFINE_SPINLOCK(flash_file_open_lock);
+static atomic_t open_count = ATOMIC_INIT(0);
 static struct proc_dir_entry *firmware_flash_pde;
 static struct proc_dir_entry *firmware_update_pde;
 static struct proc_dir_entry *validate_pde;
@@ -216,7 +216,7 @@ static int rtas_flash_release(struct inode *inode, struct 
file *file)
uf-flist = NULL;
}
 
-   atomic_dec(dp-count);
+   atomic_dec(open_count);
return 0;
 }
 
@@ -352,26 +352,17 @@ static ssize_t rtas_flash_write(struct file *file, const 
char __user *buffer,
 
 static int rtas_excl_open(struct inode *inode, struct file *file)
 {
-   struct proc_dir_entry *dp = PDE(inode);
-
-   /* Enforce exclusive open with use count of PDE */
-   spin_lock(flash_file_open_lock);
-   if (atomic_read(dp-count)  1) {
-   spin_unlock(flash_file_open_lock);
+   if (atomic_inc_return(open_count)  1) {
+   atomic_dec(open_count);
return -EBUSY;
}
 
-   atomic_inc(dp-count);
-   spin_unlock(flash_file_open_lock);
-   
return 0;
 }
 
 static int rtas_excl_release(struct inode *inode, struct file *file)
 {
-   struct proc_dir_entry *dp = PDE(inode);
-
-   atomic_dec(dp-count);
+   atomic_dec(open_count);
 
return 0;
 }
@@ -580,7 +571,7 @@ static int validate_flash_release(struct inode *inode, 
struct file *file)
}
 
/* The matching atomic_inc was in rtas_excl_open() */
-   atomic_dec(dp-count);
+   atomic_dec(open_count);
 
return 0;
 }
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] RTAS - adapt procfs interface

2008-04-02 Thread Nathan Lynch

Paul Mackerras wrote:
 Nathan Lynch writes:
 
  I think this is better... the way these files are used is lame, but
  this should preserve the existing behavior.  I haven't yet tested
  this, can you?
 
 Looks OK -- can I have a proper patch description and a signed-off-by
 line for this please?

Actually, my patch has the potentially undesirable consequence of
allowing only one of the three flash-related proc files to be open at
any time, whereas the previous behavior enforced exclusive open on a
per-file basis.

If you want something for 2.6.25, I think the patch Jens posted is
of lower risk.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/1 resend][PPC] replace logical by bitand in pci_process_ISA_OF_ranges(), arch/powerpc/kernel/isa-bridge.c

2008-04-07 Thread Nathan Lynch

Roel Kluin wrote:
 replace logical by bit and for ISA_SPACE_MASK

I think Paul has picked this up now:

http://ozlabs.org/pipermail/linuxppc-dev/2008-April/054103.html
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

do section mismatch check on full vmlinux breaks powerpc build

2007-07-24 Thread Nathan Lynch

Hello-

2.6.23-rc1 breaks the build for 64-bit powerpc for me (using
maple_defconfig):

  LD  vmlinux.o
powerpc64-unknown-linux-gnu-ld: dynreloc miscount for
kernel/built-in.o, section .opd
powerpc64-unknown-linux-gnu-ld: can not edit opd Bad value
make: *** [vmlinux.o] Error 1

This is on a i386 host with:
powerpc64-unknown-linux-gnu-gcc (GCC) 4.1.2
GNU ld version 2.16.1

Reverting the following commit fixes it:

commit 741f98fe298a73c9d47ed53703c1279a29718581
Author: Sam Ravnborg [EMAIL PROTECTED]
Date:   Tue Jul 17 10:54:06 2007 +0200

kbuild: do section mismatch check on full vmlinux


However, I see a possibly related binutils patch:
http://article.gmane.org/gmane.comp.gnu.binutils/33650

Will there be a kbuild fix for this or should I update my binutils?

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: do section mismatch check on full vmlinux breaks powerpc build

2007-07-25 Thread Nathan Lynch

Hi Sam-

Sam Ravnborg wrote:
 On Tue, Jul 24, 2007 at 05:41:05PM -0500, Nathan Lynch wrote:
  
  2.6.23-rc1 breaks the build for 64-bit powerpc for me (using
  maple_defconfig):
  
LD  vmlinux.o
  powerpc64-unknown-linux-gnu-ld: dynreloc miscount for
  kernel/built-in.o, section .opd
  powerpc64-unknown-linux-gnu-ld: can not edit opd Bad value
  make: *** [vmlinux.o] Error 1
  
 
 I narrowed it down to the following change to avoid the breakage:
 
 diff --git a/include/linux/mm.h b/include/linux/mm.h
 index c456c3a..2ea222f 100644
 --- a/include/linux/mm.h
 +++ b/include/linux/mm.h
 @@ -1246,7 +1246,7 @@ void drop_slab(void);
  extern int randomize_va_space;
  #endif
  
 -__attribute__((weak)) const char *arch_vma_name(struct vm_area_struct *vma);
 +//__attribute__((weak)) const char *arch_vma_name(struct vm_area_struct 
 *vma);
  
  #endif /* __KERNEL__ */
  #endif /* _LINUX_MM_H */
 
 So seems that something goes a bit fishy when using weak symbols and this 
 trigges 
 a binutils bug.
 
 The above line was introdused in the following commit:
 
 commit f269fdd1829acc5e53bf57b145003e5733133f2b
 Author: David Howells [EMAIL PROTECTED]
 Date:   Wed Sep 27 01:50:23 2006 -0700
 
 [PATCH] NOMMU: move the fallback arch_vma_name() to a sensible place


Thanks for looking into this.  Removing the __attribute__((weak))
from arch_vma_name's declaration in linux/mm.h unbreaks the build for
me.

Maybe it shouldn't matter, but it seems unusual to have the weak
attribute specified at the function's declaration.  I wasn't able to
find any examples of that for other weak functions in the kernel
(e.g. sched_clock).

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 6/6] pseries: eliminate global var

2007-08-08 Thread Nathan Lynch

Linas Vepstas wrote:
 --- linux-2.6.22-git2.orig/include/asm-powerpc/nvram.h2007-07-08 
 18:32:17.0 -0500
 +++ linux-2.6.22-git2/include/asm-powerpc/nvram.h 2007-08-08 
 13:34:27.0 -0500
 @@ -62,14 +62,19 @@ struct nvram_partition {
   unsigned int index;
  };
  
 -
 -extern int nvram_write_error_log(char * buff, int length, unsigned int 
 err_type);
 -extern int nvram_read_error_log(char * buff, int length, unsigned int * 
 err_type);
 -extern int nvram_clear_error_log(void);
  extern struct nvram_partition *nvram_find_partition(int sig, const char 
 *name);
  
 -extern int pSeries_nvram_init(void);
  extern int mmio_nvram_init(void);
 +
 +#ifdef CONFIG_PPC_PSERIES
 +extern int pSeries_nvram_init(void);
 +extern int nvram_write_error_log(char * buff, int length,
 +  unsigned int err_type, unsigned int 
 err_seq);
 +extern int nvram_read_error_log(char * buff, int length,
 +  unsigned int * err_type, unsigned int 
 *err_seq);
 +extern int nvram_clear_error_log(void);
 +#endif /* CONFIG_PPC_PSERIES */

Declarations need not be #ifdef'd.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/6] pseries: avoid excess rtas calls

2007-08-08 Thread Nathan Lynch

Linas Vepstas wrote:
 
 We don't need to look up the rtas event token once per 
 cpu per second. This avoids some misc string ops and 
 rtas calls and provides some minor performance improvement.

It does not avoid any calls to RTAS.  (rtas_token merely looks up
properties under the /rtas node.)

Otherwise, looks okay.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 6/6] pseries: eliminate global var

2007-08-08 Thread Nathan Lynch

Linas Vepstas wrote:
 On Wed, Aug 08, 2007 at 04:57:14PM -0500, Nathan Lynch wrote:
  Linas Vepstas wrote:
   +
   +#ifdef CONFIG_PPC_PSERIES
   +extern int pSeries_nvram_init(void);
   +extern int nvram_write_error_log(char * buff, int length,
   +  unsigned int err_type, unsigned int 
   err_seq);
   +extern int nvram_read_error_log(char * buff, int length,
   +  unsigned int * err_type, unsigned int 
   *err_seq);
   +extern int nvram_clear_error_log(void);
   +#endif /* CONFIG_PPC_PSERIES */
  
  Declarations need not be #ifdef'd.
 
 Ah, I thought that would be cleaner ... should I resubmit,
 the patch, or does it matter that much?

FWIW I'd prefer you drop that hunk and resubmit; it's needless churn
that can only cause build problems.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [RFC/PATCH] remove gratuitous reads from maple pci config space methods

2007-08-08 Thread Nathan Lynch

Benjamin Herrenschmidt wrote:
 On Wed, 2007-08-08 at 19:50 -0500, Nathan Lynch wrote:
  
  Remove the gratuitous reads from u3_agp_write_config,
  u3_ht_write_config, and u4_pcie_write_config.
  
  Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
 
 Acked-by: Benjamin Herrenschmidt [EMAIL PROTECTED]
 ---
 
 Thanks ! Care to fix powermac too ? :-)

Sure, I'll get it tomorrow... looks like pasemi cribbed the powermac
code too :)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [RFC/PATCH] remove gratuitous reads from maple pci config space methods

2007-08-08 Thread Nathan Lynch

David Gibson wrote:
 On Wed, Aug 08, 2007 at 07:50:44PM -0500, Nathan Lynch wrote:
  The maple pci configuration space write methods read the written
  location immediately after the write is performed, presumably in order
  to flush the write.  However, configuration space writes are not
  allowed to be posted, making these reads gratuitous.
 
 It might be worth checking that there isn't a particular reason for
 these.  Just because posting writes are forbidden doesn't mean a
 particular bridge won't screw it up...

Well, I had already checked with Ben, who wrote the code, and my
understanding is that the reads are intended to work around some
misbehaving Apple bridges, but that a sync after the write (implied by
releasing pci_lock in the generic pci code) should suffice for those.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 01/11] rtas_pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/kernel/rtas_pci.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/rtas_pci.c b/arch/powerpc/kernel/rtas_pci.c
index a5de621..21f14e5 100644
--- a/arch/powerpc/kernel/rtas_pci.c
+++ b/arch/powerpc/kernel/rtas_pci.c
@@ -171,8 +171,8 @@ static int rtas_pci_write_config(struct pci_bus *bus,
 }
 
 struct pci_ops rtas_pci_ops = {
-   rtas_pci_read_config,
-   rtas_pci_write_config
+   .read = rtas_pci_read_config,
+   .write = rtas_pci_write_config,
 };
 
 int is_python(struct device_node *dev)
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 02/11] celleb_fake_pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/celleb/pci.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/celleb/pci.c 
b/arch/powerpc/platforms/celleb/pci.c
index e9ac19c..de8038b 100644
--- a/arch/powerpc/platforms/celleb/pci.c
+++ b/arch/powerpc/platforms/celleb/pci.c
@@ -242,8 +242,8 @@ static int celleb_fake_pci_write_config(struct pci_bus *bus,
 }
 
 static struct pci_ops celleb_fake_pci_ops = {
-   celleb_fake_pci_read_config,
-   celleb_fake_pci_write_config
+   .read = celleb_fake_pci_read_config,
+   .write = celleb_fake_pci_write_config,
 };
 
 static inline void celleb_setup_pci_base_addrs(struct pci_controller *hose,
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 03/11] celleb_epci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/celleb/scc_epci.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/celleb/scc_epci.c 
b/arch/powerpc/platforms/celleb/scc_epci.c
index c4b0110..7506acc 100644
--- a/arch/powerpc/platforms/celleb/scc_epci.c
+++ b/arch/powerpc/platforms/celleb/scc_epci.c
@@ -278,8 +278,8 @@ static int celleb_epci_write_config(struct pci_bus *bus,
 }
 
 struct pci_ops celleb_epci_ops = {
-   celleb_epci_read_config,
-   celleb_epci_write_config,
+   .read = celleb_epci_read_config,
+   .write = celleb_epci_write_config,
 };
 
 /* to be moved in FW */
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 04/11] maple pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/maple/pci.c |   12 ++--
 1 files changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/platforms/maple/pci.c 
b/arch/powerpc/platforms/maple/pci.c
index 2542403..6247f94 100644
--- a/arch/powerpc/platforms/maple/pci.c
+++ b/arch/powerpc/platforms/maple/pci.c
@@ -185,8 +185,8 @@ static int u3_agp_write_config(struct pci_bus *bus, 
unsigned int devfn,
 
 static struct pci_ops u3_agp_pci_ops =
 {
-   u3_agp_read_config,
-   u3_agp_write_config
+   .read = u3_agp_read_config,
+   .write = u3_agp_write_config,
 };
 
 static unsigned long u3_ht_cfa0(u8 devfn, u8 off)
@@ -284,8 +284,8 @@ static int u3_ht_write_config(struct pci_bus *bus, unsigned 
int devfn,
 
 static struct pci_ops u3_ht_pci_ops =
 {
-   u3_ht_read_config,
-   u3_ht_write_config
+   .read = u3_ht_read_config,
+   .write = u3_ht_write_config,
 };
 
 static unsigned int u4_pcie_cfa0(unsigned int devfn, unsigned int off)
@@ -392,8 +392,8 @@ static int u4_pcie_write_config(struct pci_bus *bus, 
unsigned int devfn,
 
 static struct pci_ops u4_pcie_pci_ops =
 {
-u4_pcie_read_config,
-u4_pcie_write_config
+   .read = u4_pcie_read_config,
+   .write = u4_pcie_write_config,
 };
 
 static void __init setup_u3_agp(struct pci_controller* hose)
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 05/11] pa_pxp_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/pasemi/pci.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/pasemi/pci.c 
b/arch/powerpc/platforms/pasemi/pci.c
index ab1f5f6..882b571 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -122,8 +122,8 @@ static int pa_pxp_write_config(struct pci_bus *bus, 
unsigned int devfn,
 }
 
 static struct pci_ops pa_pxp_ops = {
-   pa_pxp_read_config,
-   pa_pxp_write_config,
+   .read = pa_pxp_read_config,
+   .write = pa_pxp_write_config,
 };
 
 static void __init setup_pa_pxp(struct pci_controller *hose)
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 06/11] powermac pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/powermac/pci.c |   16 
 1 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/pci.c 
b/arch/powerpc/platforms/powermac/pci.c
index 92586db..69d67ff 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -225,8 +225,8 @@ static int macrisc_write_config(struct pci_bus *bus, 
unsigned int devfn,
 
 static struct pci_ops macrisc_pci_ops =
 {
-   macrisc_read_config,
-   macrisc_write_config
+   .read = macrisc_read_config,
+   .write = macrisc_write_config,
 };
 
 #ifdef CONFIG_PPC32
@@ -280,8 +280,8 @@ chaos_write_config(struct pci_bus *bus, unsigned int devfn, 
int offset,
 
 static struct pci_ops chaos_pci_ops =
 {
-   chaos_read_config,
-   chaos_write_config
+   .read = chaos_read_config,
+   .write = chaos_write_config,
 };
 
 static void __init setup_chaos(struct pci_controller *hose,
@@ -456,8 +456,8 @@ static int u3_ht_write_config(struct pci_bus *bus, unsigned 
int devfn,
 
 static struct pci_ops u3_ht_pci_ops =
 {
-   u3_ht_read_config,
-   u3_ht_write_config
+   .read = u3_ht_read_config,
+   .write = u3_ht_write_config,
 };
 
 #define U4_PCIE_CFA0(devfn, off)   \
@@ -561,8 +561,8 @@ static int u4_pcie_write_config(struct pci_bus *bus, 
unsigned int devfn,
 
 static struct pci_ops u4_pcie_pci_ops =
 {
-   u4_pcie_read_config,
-   u4_pcie_write_config
+   .read = u4_pcie_read_config,
+   .write = u4_pcie_write_config,
 };
 
 #endif /* CONFIG_PPC64 */
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 07/11] null_pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/kernel/pci_32.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/pci_32.c b/arch/powerpc/kernel/pci_32.c
index 04a3109..0e2bee4 100644
--- a/arch/powerpc/kernel/pci_32.c
+++ b/arch/powerpc/kernel/pci_32.c
@@ -1457,8 +1457,8 @@ null_write_config(struct pci_bus *bus, unsigned int 
devfn, int offset,
 
 static struct pci_ops null_pci_ops =
 {
-   null_read_config,
-   null_write_config
+   .read = null_read_config,
+   .write = null_write_config,
 };
 
 /*
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 08/11] efika rtas_pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/52xx/efika.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/platforms/52xx/efika.c 
b/arch/powerpc/platforms/52xx/efika.c
index 4be6e7a..4263158 100644
--- a/arch/powerpc/platforms/52xx/efika.c
+++ b/arch/powerpc/platforms/52xx/efika.c
@@ -78,8 +78,8 @@ static int rtas_write_config(struct pci_bus *bus, unsigned 
int devfn,
 }
 
 static struct pci_ops rtas_pci_ops = {
-   rtas_read_config,
-   rtas_write_config
+   .read = rtas_read_config,
+   .write = rtas_write_config,
 };
 
 
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 09/11] chrp pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/chrp/pci.c |8 
 1 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/chrp/pci.c 
b/arch/powerpc/platforms/chrp/pci.c
index 28d1647..0c6dba9 100644
--- a/arch/powerpc/platforms/chrp/pci.c
+++ b/arch/powerpc/platforms/chrp/pci.c
@@ -86,8 +86,8 @@ int gg2_write_config(struct pci_bus *bus, unsigned int devfn, 
int off,
 
 static struct pci_ops gg2_pci_ops =
 {
-   gg2_read_config,
-   gg2_write_config
+   .read = gg2_read_config,
+   .write = gg2_write_config,
 };
 
 /*
@@ -124,8 +124,8 @@ int rtas_write_config(struct pci_bus *bus, unsigned int 
devfn, int offset,
 
 static struct pci_ops rtas_pci_ops =
 {
-   rtas_read_config,
-   rtas_write_config
+   .read = rtas_read_config,
+   .write = rtas_write_config,
 };
 
 volatile struct Hydra __iomem *Hydra = NULL;
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 11/11] tsi108_direct_pci_ops: use named structure member initializers

2007-08-09 Thread Nathan Lynch

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/sysdev/tsi108_pci.c |4 ++--
 1 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/sysdev/tsi108_pci.c b/arch/powerpc/sysdev/tsi108_pci.c
index 90db8a7..cf0bfbd 100644
--- a/arch/powerpc/sysdev/tsi108_pci.c
+++ b/arch/powerpc/sysdev/tsi108_pci.c
@@ -193,8 +193,8 @@ void tsi108_clear_pci_cfg_error(void)
 }
 
 static struct pci_ops tsi108_direct_pci_ops = {
-   tsi108_direct_read_config,
-   tsi108_direct_write_config
+   .read = tsi108_direct_read_config,
+   .write = tsi108_direct_write_config,
 };
 
 int __init tsi108_setup_pci(struct device_node *dev, u32 cfg_phys, int primary)
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [RFC/PATCH] remove gratuitous reads from maple pci config space methods

2007-08-09 Thread Nathan Lynch

Segher Boessenkool wrote:
 It might be worth checking that there isn't a particular reason for
 these.  Just because posting writes are forbidden doesn't mean a
 particular bridge won't screw it up...
 
 Well, I had already checked with Ben, who wrote the code, and my
 understanding is that the reads are intended to work around some
 misbehaving Apple bridges,
 
 None of the PCI interfaces on the U3 or U4 bridges have that
 problem as far as I know.  I think the workaround was copied
 from code for older Apple bridges?

Okay, then the change should be fine for maple.

 but that a sync after the write (implied by
 releasing pci_lock in the generic pci code) should suffice for those.
 
 I don't see how a sync could help here at all, not more than
 an eieio anyway?

Alright, well, maybe take it up with Ben when I post the patch for
powermac, since that's where it could actually matter.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] remove gratuitous reads from powermac pci config space methods

2007-08-09 Thread Nathan Lynch

The powermac pci configuration space write methods read the written
location immediately after the write is performed, presumably in order
to flush the write.  However, configuration space writes are not
allowed to be posted, making these reads gratuitous.  Furthermore,
this behavior potentially causes us to violate the PCI PM spec when
changing between e.g. D0 and D3 states, because a delay of up to 10ms
may be required before the OS accesses configuration space after the
write which initiates the transition.

Remove the unnecessary reads from macrisc_write_config,
u3_ht_write_config, and u4_pcie_write_config.


Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/powermac/pci.c |9 -
 1 files changed, 0 insertions(+), 9 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/pci.c 
b/arch/powerpc/platforms/powermac/pci.c
index 92586db..bf1f5d1 100644
--- a/arch/powerpc/platforms/powermac/pci.c
+++ b/arch/powerpc/platforms/powermac/pci.c
@@ -209,15 +209,12 @@ static int macrisc_write_config(struct pci_bus *bus, 
unsigned int devfn,
switch (len) {
case 1:
out_8(addr, val);
-   (void) in_8(addr);
break;
case 2:
out_le16(addr, val);
-   (void) in_le16(addr);
break;
default:
out_le32(addr, val);
-   (void) in_le32(addr);
break;
}
return PCIBIOS_SUCCESSFUL;
@@ -440,15 +437,12 @@ static int u3_ht_write_config(struct pci_bus *bus, 
unsigned int devfn,
switch (len) {
case 1:
out_8(addr, val);
-   (void) in_8(addr);
break;
case 2:
out_le16(addr, val);
-   (void) in_le16(addr);
break;
default:
out_le32((u32 __iomem *)addr, val);
-   (void) in_le32(addr);
break;
}
return PCIBIOS_SUCCESSFUL;
@@ -545,15 +539,12 @@ static int u4_pcie_write_config(struct pci_bus *bus, 
unsigned int devfn,
switch (len) {
case 1:
out_8(addr, val);
-   (void) in_8(addr);
break;
case 2:
out_le16(addr, val);
-   (void) in_le16(addr);
break;
default:
out_le32(addr, val);
-   (void) in_le32(addr);
break;
}
return PCIBIOS_SUCCESSFUL;
-- 
1.5.2.4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 2.6.23] ibmebus: Prevent bus_id collisions

2007-08-30 Thread Nathan Lynch

Hi Joachim-

Joachim Fenkes wrote:
 Nathan Lynch [EMAIL PROTECTED] wrote on 29.08.2007 20:12:32:
  Will anything break?
 
 Nope. Userspace programs should not depend on ibmebus' way of naming the 
 devices; especially since some overly long loc_codes tended to be 
 truncated and thus rendered useless. I have tested IBM's DLPAR tools 
 against the changed kernel, and they didn't break.

Okay.


  Also, I dislike this approach of duplicating the firmware device tree
  path in sysfs.
 
 Why? Any specific reasons for your dislike?

struct device's bus_id field is but 20 bytes in size.  Too close for
comfort?


  Are GX/ibmebus devices guaranteed to be children of
  the same node in the OF device tree?  If so, their unit addresses will
  be unique, and therefore suitable values for bus_id.  I believe this
  is what the powerpc vio bus code does.
 
 While there's no such guarantee (as in officially signed document), yes, 
 I expect future GX devices to also appear beneath the OFDT root node. For 
 the existing devices, the unit addresses are already part of the device 
 name, so I save the need to use sprintf() again. Plus, I rather like using 
 the full_name since it also contains a descriptive name as opposed to 
 being just nondescript numbers, helping the layman (ie user) to make sense 
 out of a dev_id.

Okay, but your layman isn't supposed to be relying on any
user-friendly properties of the name :) Hope he doesn't work on a
distro installer.

Anyway, if you're still confident in this approach, I relent.  :)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 08/12] IB/ehca: Replace get_paca()-paca_index by the more portable smp_processor_id()

2007-09-11 Thread Nathan Lynch

Hi,

Joachim Fenkes wrote:
 Signed-off-by: Joachim Fenkes [EMAIL PROTECTED]
 ---
  drivers/infiniband/hw/ehca/ehca_tools.h |   14 +++---
  1 files changed, 7 insertions(+), 7 deletions(-)
 
 diff --git a/drivers/infiniband/hw/ehca/ehca_tools.h 
 b/drivers/infiniband/hw/ehca/ehca_tools.h
 index f9b264b..863f972 100644
 --- a/drivers/infiniband/hw/ehca/ehca_tools.h
 +++ b/drivers/infiniband/hw/ehca/ehca_tools.h
 @@ -73,37 +73,37 @@ extern int ehca_debug_level;
   if (unlikely(ehca_debug_level)) \
   dev_printk(KERN_DEBUG, (ib_dev)-dma_device, \
  PU%04x EHCA_DBG:%s  format \n, \
 -get_paca()-paca_index, __FUNCTION__, \
 +smp_processor_id(), __FUNCTION__, \
  ## arg); \
   } while (0)
  
  #define ehca_info(ib_dev, format, arg...) \
   dev_info((ib_dev)-dma_device, PU%04x EHCA_INFO:%s  format \n, \
 -  get_paca()-paca_index, __FUNCTION__, ## arg)
 +  smp_processor_id(), __FUNCTION__, ## arg)
  
  #define ehca_warn(ib_dev, format, arg...) \
   dev_warn((ib_dev)-dma_device, PU%04x EHCA_WARN:%s  format \n, \
 -  get_paca()-paca_index, __FUNCTION__, ## arg)
 +  smp_processor_id(), __FUNCTION__, ## arg)
  
  #define ehca_err(ib_dev, format, arg...) \
   dev_err((ib_dev)-dma_device, PU%04x EHCA_ERR:%s  format \n, \
 - get_paca()-paca_index, __FUNCTION__, ## arg)
 + smp_processor_id(), __FUNCTION__, ## arg)

I think I see these macros used in preemptible code (e.g. ehca_probe),
where smp_processor_id() will print a warning when
CONFIG_DEBUG_PREEMPT=y.  Probably better to use raw_smp_processor_id.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: Hard hang in hypervisor!?

2007-10-09 Thread Nathan Lynch

Linas Vepstas wrote:
 
 I was futzing with linux-2.6.23-rc8-mm1 in a power6 lpar when,
 for whatever reason, a spinlock locked up. The bizarre thing 
 was that the rest of system locked up as well: an ssh terminal,
 and also an hvc console.
 
 Breaking into the debugger showed 4 cpus, 1 of which was 
 deadlocked in the spinlock, and the other 3 in 
 .pseries_dedicated_idle_sleep
 
 This was, ahhh, unexpected.  What's up with that? Can
 anyone provide any insight?

Sounds consistent with a task trying to double-acquire the lock, or an
interrupt handler attempting to acquire a lock that the current task
holds.  Or maybe even an uninitialized spinlock.  Do you know which
lock it was?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: Module with 36K relocation entries

2007-11-01 Thread Nathan Lynch

Medve Emilian-EMMEDVE1 wrote:
 
 I have module with 36K relocation entries (I know, I know, how could
 that be...) and the count_relocs() function takes ~17 seconds to crunch
 through the relocation table from the .rela.text section. I don't think
 I can reduce the number of entries in the relocation table (can I?) so
 I'm thinking at improving the performance of count_relocs() (currently
 O(n^2)). Does anybody have a simpler idea? Does anyone have any
 constructive suggestion on how to improve the situation?

This seems to come up every few months.  There was a patch submitted
here:

http://ozlabs.org/pipermail/linuxppc-dev/2007-June/037641.html

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: 2.6.24-rc1 freezes on powerbook at first boot stage

2007-11-03 Thread Nathan Lynch

(cc'ing linuxppc-dev, see
http://www.mail-archive.com/[EMAIL PROTECTED]/msg221770.html
for original post and .config)

Elimar Riesebieter wrote:
 On Wed, 24 Oct 2007 the mental interface of
 Elimar Riesebieter told:
 
 [...]
  The kernel is loaded from firmware but freezes at the moment to load
  the radeon framebuffer. I can't get any debug info (don't know
  how?).
 
 Screen dump till freeze:
 
 Using PowerMac machine description
 Total memory = 1024MB; using 2048kB for hash table (at cfe0)
 Linux version 2.6.24-rc1-aragorn ([EMAIL PROTECTED]) (gcc version 4.2.3 
 20071014 (prelelease) (Debian 4.2.2-3)) #1 Wed Oct 24 12:48:27 CEST 2007
 Found UniNorth memory controller  host bridge @ 0xf800 revision: 0xd2
 Mapped at 0xfdfc
 Found a Intrepid mac-io controller, rev: 0, mapped at 0xfdf4
 PowerMac motherboard: PowerBook G4 15
 console [udbg0] enabled
 setup_arch: bootmem
 Found UniNorth PCI host bridge at 0xf000. Firmware bus number: 
 0-1
 Found UniNorth PCI host bridge at 0xf200. Firmware bus number: 
 0-1
 Found UniNorth PCI host bridge at 0xf400. Firmware bus number: 
 0-1
 via-pmu: Server Mode is disabled
 PMU driver v2 initialized for Core99, firmware: 0c
 nvram: Checking bank 0...
 nvram: gen0=741, gen1=740
 nvram: Active bank is: 0
 nvram: OF partition at 0x410
 nvram: XP partition at 0x1020
 nvram: NR partition at 0x1120
 arch: exit
 Zone PFN ranges:
   DMA 0 -   196608
   Normal 196608 -   196608
   HighMem196608 -   262144
 Movable zone start PFN for each node
 early_node_map[1] active PFN ranges
 0:0 -   262144
 Built 1 zonelists in Zone order.  Total pages: 260096
 Kernel command line: root=/[EMAIL PROTECTED]/[EMAIL PROTECTED]/[EMAIL 
 PROTECTED]:5 root=/dev/hda5
 mpic: Setting up MPIC  MPIC 1version 1.2 at 8004, max 4 CPUs
 mpic: ISU size: 64, shift: 6, mask: 3f
 mpic: Initializing for 64 sources
 PID hash table entries: 4096 (order: 12, 16384 bytes)
 GMT Delta read from XPRAM: 0 minutes, DST: off
 clocksource: timebase mult[d9038e4] shift [22] registered
 clockevent: decremeter mult[4b7] shift[16] cpu[0]
 Console: colour dummy device 80x25
 console handover: boot [udbg0] - real [tty0]

Does 2.6.23 (or any earlier kernel) work?

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] Balance alloc/free and ioremap/iounmap in gpio_mdio_probe (powerpc/platforms/pasemi/gpio_mdio.c)

2007-11-04 Thread Nathan Lynch

Hi,

Roel Kluin wrote:
 I think this is how it should be done, but
 please review: it was not tested.
 --
 Balance alloc/free and ioremap/iounmap

It would be more helpful if your changelog identified the objects
which could be leaked.  More comments below.


 -static int __devinit gpio_mdio_probe(struct of_device *ofdev,
 -  const struct of_device_id *match)
 +static int __devinit __gpio_mdio_register_bus(struct of_device *ofdev,
 + const struct of_device_id *match)
  {
   struct device *dev = ofdev-dev;
   struct device_node *np = ofdev-node;
 - struct device_node *gpio_np;
   struct mii_bus *new_bus;
   struct resource res;
   struct gpio_priv *priv;
   const unsigned int *prop;
 - int err = 0;
   int i;
  
 - gpio_np = of_find_compatible_node(NULL, gpio, 1682m-gpio);
 -
 - if (!gpio_np)
 - return -ENODEV;
 -
 - err = of_address_to_resource(gpio_np, 0, res);
 - of_node_put(gpio_np);
 -
 - if (err)
 - return -EINVAL;
 -
 - if (!gpio_regs)
 - gpio_regs = ioremap(res.start, 0x100);
 -
 - if (!gpio_regs)
 - return -EPERM;
 -
   priv = kzalloc(sizeof(struct gpio_priv), GFP_KERNEL);
 - if (priv == NULL)
 + if (unlikely(priv == NULL))
   return -ENOMEM;

Please don't add likely/unlikely to non-hot paths such as this.


   new_bus = kzalloc(sizeof(struct mii_bus), GFP_KERNEL);
  
 - if (new_bus == NULL)
 - return -ENOMEM;
 + if (unlikely(new_bus == NULL))
 + goto free_priv;

again

  
   new_bus-name = pasemi gpio mdio bus,
   new_bus-read = gpio_mdio_read,
   new_bus-write = gpio_mdio_write,
   new_bus-reset = gpio_mdio_reset,
  
   prop = of_get_property(np, reg, NULL);
   new_bus-id = *prop;
   new_bus-priv = priv;
  
   new_bus-phy_mask = 0;
  
   new_bus-irq = kmalloc(sizeof(int)*PHY_MAX_ADDR, GFP_KERNEL);
 + if (unlikely(new_bus-irq == NULL))
 + goto free_new_bus;
 +

again

   for(i = 0; i  PHY_MAX_ADDR; ++i)
   new_bus-irq[i] = irq_create_mapping(NULL, 10);
  
  
   prop = of_get_property(np, mdc-pin, NULL);
   priv-mdc_pin = *prop;
  
   prop = of_get_property(np, mdio-pin, NULL);
   priv-mdio_pin = *prop;
  
   new_bus-dev = dev;
   dev_set_drvdata(dev, new_bus);
  
   err = mdiobus_register(new_bus);
  
 - if (0 != err) {
 + if (unlikely(0 != err)) {

again

   printk(KERN_ERR %s: Cannot register as MDIO bus, err %d\n,
   new_bus-name, err);
   goto bus_register_fail;
   }
  
   return 0;
  
  bus_register_fail:
 + kfree(new_bus-irq);
 +free_new_bus:
   kfree(new_bus);
 +free_priv:
 + kfree(priv);
 +
 + return -ENOMEM;
 +}
 +
 +
 +static int __devinit gpio_mdio_probe(struct of_device *ofdev,
 +  const struct of_device_id *match)
 +{
 + struct device_node *gpio_np;
 + int err;
 +
 + gpio_np = of_find_compatible_node(NULL, gpio, 1682m-gpio);
 +
 + if (!gpio_np)
 + return -ENODEV;
 +
 + err = of_address_to_resource(gpio_np, 0, res);

Hmm, where is res defined?


 + of_node_put(gpio_np);
 +
 + if (err)
 + return -EINVAL;
 +
 + if (!gpio_regs) {
 +

Unneeded newline


 + gpio_regs = ioremap(res.start, 0x100);
 + if (unlikely(!gpio_regs))
 + return -EPERM;
 +
 + err = __gpio_mdio_register_bus(ofdev, match);
 + if (err  0)
 + iounmap(gpio_regs);
 + } else
 + err = __gpio_mdio_register_bus(ofdev, match);
  
   return err;
 +

again

  }
  
  
  static int gpio_mdio_remove(struct of_device *dev)
  {
   struct mii_bus *bus = dev_get_drvdata(dev-dev);
  
   mdiobus_unregister(bus);
 
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC/PATCH] reduce load time for modules with lots of relocs

2007-11-05 Thread Nathan Lynch

Nathan Lynch wrote:
 Medve Emilian-EMMEDVE1 wrote:
  
  I have module with 36K relocation entries (I know, I know, how could
  that be...) and the count_relocs() function takes ~17 seconds to crunch
  through the relocation table from the .rela.text section. I don't think
  I can reduce the number of entries in the relocation table (can I?) so
  I'm thinking at improving the performance of count_relocs() (currently
  O(n^2)). Does anybody have a simpler idea? Does anyone have any
  constructive suggestion on how to improve the situation?
 
 This seems to come up every few months.  There was a patch submitted
 here:
 
 http://ozlabs.org/pipermail/linuxppc-dev/2007-June/037641.html

I think this comes up often enough for us to fix it (IIRC unionfs
people complained about it long ago too), and it's kind of lame to
spend 17 seconds in the kernel without scheduling.  So I dug up some
old patches that should reduce the complexity to O(n logn), using the
sort() function, which IMO is preferable to doing our own hash thing
as that patch from June does.  Only the 64-bit code is tested.  Does
this help your situation?

 arch/powerpc/kernel/module_32.c |   85 +++-
 arch/powerpc/kernel/module_64.c |   80 ++---
 2 files changed, 132 insertions(+), 33 deletions(-)


diff --git a/arch/powerpc/kernel/module_32.c b/arch/powerpc/kernel/module_32.c
index 07a89a3..001b941 100644
--- a/arch/powerpc/kernel/module_32.c
+++ b/arch/powerpc/kernel/module_32.c
@@ -24,6 +24,7 @@
 #include linux/kernel.h
 #include linux/cache.h
 #include linux/bug.h
+#include linux/sort.h
 
 #include setup.h
 
@@ -50,26 +51,64 @@ void module_free(struct module *mod, void *module_region)
table entries. */
 }
 
+static int reloc_cmp(const void *_a, const void *_b)
+{
+   const Elf32_Rela *a = *(Elf32_Rela **)_a, *b = *(Elf32_Rela **)_b;
+
+   if (a-r_info  b-r_info)
+   return -1;
+   else if (a-r_info  b-r_info)
+   return 1;
+   else if (a-r_addend  b-r_addend)
+   return -1;
+   else if (a-r_addend  b-r_addend)
+   return 1;
+
+   return 0;
+}
+
+
 /* Count how many different relocations (different symbol, different
addend) */
 static unsigned int count_relocs(const Elf32_Rela *rela, unsigned int num)
 {
-   unsigned int i, j, ret = 0;
+   unsigned int i, sorted_count = 0;
+   Elf32_Word last_info;
+   Elf32_Sword last_addend;
+   Elf32_Rela **sortbuf = NULL;
+
+   if (num == 0)
+   return 0;
+
+   sortbuf = vmalloc(num * sizeof(*sortbuf));
+
+   if (!sortbuf)
+   return -ENOMEM;
+
+   for (i = 0; i  num; i++)
+   sortbuf[i] = (Elf32_Rela *)rela[i];
+
+   sort(sortbuf, i, sizeof(sortbuf[0]), reloc_cmp, NULL);
+
+   last_info = sortbuf[0]-r_info;
+   last_addend = sortbuf[0]-r_addend;
+   sorted_count = 1;
 
-   /* Sure, this is order(n^2), but it's usually short, and not
-   time critical */
for (i = 0; i  num; i++) {
-   for (j = 0; j  i; j++) {
-   /* If this addend appeared before, it's
-   already been counted */
-   if (ELF32_R_SYM(rela[i].r_info)
-   == ELF32_R_SYM(rela[j].r_info)
-rela[i].r_addend == rela[j].r_addend)
-   break;
-   }
-   if (j == i) ret++;
+   /* If this r_info,r_addend tuple matches the previous
+* entry, don't count it again
+*/
+   if (sortbuf[i]-r_info == last_info 
+   sortbuf[i]-r_addend == last_addend)
+   continue;
+
+   last_info = sortbuf[i]-r_info;
+   last_addend = sortbuf[i]-r_addend;
+   sorted_count++;
}
-   return ret;
+
+   vfree(sortbuf);
+   return sorted_count;
 }
 
 /* Get the potential trampolines size required of the init and
@@ -96,15 +135,19 @@ static unsigned long get_plt_size(const Elf32_Ehdr *hdr,
continue;
 
if (sechdrs[i].sh_type == SHT_RELA) {
+   int count;
DEBUGP(Found relocations in section %u\n, i);
DEBUGP(Ptr: %p.  Number: %u\n,
   (void *)hdr + sechdrs[i].sh_offset,
   sechdrs[i].sh_size / sizeof(Elf32_Rela));
-   ret += count_relocs((void *)hdr
+   count = count_relocs((void *)hdr
 + sechdrs[i].sh_offset,
 sechdrs[i].sh_size
 / sizeof(Elf32_Rela))
* sizeof(struct ppc_plt_entry);
+   if (count  0

[RFC/PATCH] Fix rtas_ibm_suspend_me bugs

2007-11-05 Thread Nathan Lynch

(very rfc for now, no sign-off, needs more testing)

There are a couple of bugs in the rtas_ibm_suspend_me() and
rtas_percpu_suspend_me() functions:

1.  rtas_ibm_suspend_me() uses on_each_cpu() to invoke
rtas_percpu_suspend_me() via IPI:

if (on_each_cpu(rtas_percpu_suspend_me, data, 1, 0))
...

'data' is on the stack, and rtas_ibm_suspend_me() takes no measures to
ensure that all instances of rtas_percpu_suspend_me() are finished
accessing 'data' before returning.  This can result in the IPI'd cpus
accessing random stack data and getting stuck in H_JOIN.

Fix this by moving rtas_suspend_me_data off the stack, and protect it
with a mutex.  Use a completion to ensure that all cpus are done
accessing the data before unlocking.

2.  rtas_percpu_suspend_me passes the Linux logical cpu id to the
H_PROD hypervisor method, when it should be using the platform
interrupt server value for that cpu (hard_smp_processor_id).  In
practice, this probably causes problems only on partitions where
processors have been removed and added in a particular order.

---
 arch/powerpc/kernel/rtas.c |   64 ---
 1 files changed, 47 insertions(+), 17 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 2147807..24faaea 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -19,6 +19,9 @@
 #include linux/init.h
 #include linux/capability.h
 #include linux/delay.h
+#include linux/mutex.h
+#include linux/completion.h
+#include linux/smp.h
 
 #include asm/prom.h
 #include asm/rtas.h
@@ -34,6 +37,7 @@
 #include asm/lmb.h
 #include asm/udbg.h
 #include asm/syscalls.h
+#include asm/atomic.h
 
 struct rtas_t rtas = {
.lock = SPIN_LOCK_UNLOCKED
@@ -41,10 +45,21 @@ struct rtas_t rtas = {
 EXPORT_SYMBOL(rtas);
 
 struct rtas_suspend_me_data {
+   atomic_t working; /* number of cpus accessing rtas_suspend_me_data */
long waiting;
struct rtas_args *args;
+   struct completion done; /* wait on this until working == 0 */
 };
 
+static void rtas_suspend_me_data_init(struct rtas_suspend_me_data *rsmd,
+ struct rtas_args *args)
+{
+   atomic_set(rsmd-working, 0);
+   rsmd-waiting = 1;
+   rsmd-args = args;
+   init_completion(rsmd-done);
+}
+
 DEFINE_SPINLOCK(rtas_data_buf_lock);
 EXPORT_SYMBOL(rtas_data_buf_lock);
 
@@ -671,36 +686,49 @@ static void rtas_percpu_suspend_me(void *info)
 * we set it to 0.
 */
local_irq_save(flags);
+   atomic_inc(data-working);
do {
rc = plpar_hcall_norets(H_JOIN);
smp_rmb();
} while (rc == H_SUCCESS  data-waiting  0);
if (rc == H_SUCCESS)
+   /* join is complete (or there was an error) and this
+* cpu was prodded
+*/
goto out;
 
if (rc == H_CONTINUE) {
+   /* this cpu does the join */
data-waiting = 0;
data-args-args[data-args-nargs] =
rtas_call(ibm_suspend_me_token, 0, 1, NULL);
-   for_each_possible_cpu(i)
-   plpar_hcall_norets(H_PROD,i);
} else {
data-waiting = -EBUSY;
printk(KERN_ERR Error on H_JOIN hypervisor call\n);
}
 
+   /* This cpu did the join or got an error, so we need to prod
+* everyone else.  Extra prods are harmless.
+*/
+   for_each_online_cpu(i)
+   plpar_hcall_norets(H_PROD, get_hard_smp_processor_id(i));
+
 out:
+   if (atomic_dec_return(data-working) == 0)
+   complete(data-done);
local_irq_restore(flags);
return;
 }
 
+static DEFINE_MUTEX(rsm_lock); /* protects rsm_data */
+static struct rtas_suspend_me_data rsm_data;
+
 static int rtas_ibm_suspend_me(struct rtas_args *args)
 {
-   int i;
+   int err;
long state;
long rc;
unsigned long retbuf[PLPAR_HCALL_BUFSIZE];
-   struct rtas_suspend_me_data data;
 
/* Make sure the state is valid */
rc = plpar_hcall(H_VASI_STATE, retbuf,
@@ -721,25 +749,27 @@ static int rtas_ibm_suspend_me(struct rtas_args *args)
return 0;
}
 
-   data.waiting = 1;
-   data.args = args;
+   mutex_lock(rsm_lock);
+
+   rtas_suspend_me_data_init(rsm_data, args);
 
-   /* Call function on all CPUs.  One of us will make the
-* rtas call
+   /* Call function on all CPUs.  One of us (but not necessarily
+* this one) will make the ibm,suspend-me call.
 */
-   if (on_each_cpu(rtas_percpu_suspend_me, data, 1, 0))
-   data.waiting = -EINVAL;
+   if (on_each_cpu(rtas_percpu_suspend_me, rsm_data, 1, 0))
+   rsm_data.waiting = -EINVAL;
+
+   /* Must wait for all IPIs to complete before unlocking */
+   wait_for_completion(rsm_data.done);
 
-   if (data.waiting != 0)
+   if

Re: [RFC/PATCH] reduce load time for modules with lots of relocs

2007-11-06 Thread Nathan Lynch

Medve Emilian wrote:
 Would it be possible to do the sort in place (without the extra
 buffer)? I mean would that upset any other part of the kernel?

I suspect so.  Sounds like you've got the good testcase, maybe you
should try it and see.  :)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [RFC/PATCH] Fix rtas_ibm_suspend_me bugs

2007-11-07 Thread Nathan Lynch

Nathan Lynch wrote:
 (very rfc for now, no sign-off, needs more testing)
 
 There are a couple of bugs in the rtas_ibm_suspend_me() and
 rtas_percpu_suspend_me() functions:
 
 1.  rtas_ibm_suspend_me() uses on_each_cpu() to invoke
 rtas_percpu_suspend_me() via IPI:
 
 if (on_each_cpu(rtas_percpu_suspend_me, data, 1, 0))
 ...
 
 'data' is on the stack, and rtas_ibm_suspend_me() takes no measures to
 ensure that all instances of rtas_percpu_suspend_me() are finished
 accessing 'data' before returning.  This can result in the IPI'd cpus
 accessing random stack data and getting stuck in H_JOIN.

Another possible issue is that H_JOIN requires MSR.EE to be off, but
lazy interrupt disabling could conceivably allow that constraint to be
violated if we end up doing H_JOIN on the cpu which calls on_each_cpu().
At least I think so...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH v2] fix multiple bugs in rtas_ibm_suspend_me code

2007-11-09 Thread Nathan Lynch

There are several issues with the rtas_ibm_suspend_me code, which
enables platform-assisted suspension of an LPAR for migration or
hibernation as covered in PAPR 2.2.

1.) rtas_ibm_suspend_me uses on_each_cpu() to invoke
rtas_percpu_suspend_me on all cpus via IPI:

if (on_each_cpu(rtas_percpu_suspend_me, data, 1, 0))
...

'data' is on the calling task's stack, but rtas_ibm_suspend_me takes
no measures to ensure that all instances of rtas_percpu_suspend_me are
finished accessing 'data' before returning.  This can result in the
IPI'd cpus accessing random stack data and getting stuck in H_JOIN.

This is addressed by using an atomic count of workers and a completion
on the stack.

2.) rtas_percpu_suspend_me is needlessly calling H_JOIN in a loop.
The only event that can cause a cpu to return from H_JOIN is an H_PROD
from another cpu or a NMI/system reset.  Each cpu need call H_JOIN
only once per suspend operation.

Remove the loop and the now unnecessary 'waiting' state variable.

3.) H_JOIN must be called with MSR[EE] off, but lazy interrupt
disabling may cause the caller of rtas_ibm_suspend_me to call H_JOIN
with it on; the local_irq_disable() in on_each_cpu() is not
sufficient.

Fix this by explicitly saving the MSR and clearing the EE bit before
calling H_JOIN.

4.) H_PROD is being called with the Linux logical cpu number as the
parameter, not the platform interrupt server value.  (It's also being
called for all possible cpus, which is harmless, but unnecessary.)

This is fixed by calling H_PROD for each online cpu using
get_hard_smp_processor_id(cpu) for the argument.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]

---
 arch/powerpc/kernel/rtas.c |   94 +---
 1 files changed, 53 insertions(+), 41 deletions(-)

Changes since v1:
- the completion is adequate for ensuring that all IPIs have
completed, so there's no need for a mutex and rtas_suspend_me_data
can be on the stack as before.

- add fix for hard-disabling interrupts before H_JOIN

- remove unnecessary H_JOIN loop and state machinery

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 2147807..3afe771 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -41,8 +41,10 @@ struct rtas_t rtas = {
 EXPORT_SYMBOL(rtas);
 
 struct rtas_suspend_me_data {
-   long waiting;
-   struct rtas_args *args;
+   atomic_t working; /* number of cpus accessing this struct */
+   int token; /* ibm,suspend-me */
+   int error;
+   struct completion *complete; /* wait on this until working == 0 */
 };
 
 DEFINE_SPINLOCK(rtas_data_buf_lock);
@@ -657,50 +659,62 @@ static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE;
 #ifdef CONFIG_PPC_PSERIES
 static void rtas_percpu_suspend_me(void *info)
 {
-   int i;
long rc;
-   long flags;
+   unsigned long msr_save;
+   int cpu;
struct rtas_suspend_me_data *data =
(struct rtas_suspend_me_data *)info;
 
-   /*
-* We use waiting to indicate our state.  As long
-* as it is 0, we are still trying to all join up.
-* If it goes to 0, we have successfully joined up and
-* one thread got H_CONTINUE.  If any error happens,
-* we set it to 0.
-*/
-   local_irq_save(flags);
-   do {
-   rc = plpar_hcall_norets(H_JOIN);
-   smp_rmb();
-   } while (rc == H_SUCCESS  data-waiting  0);
-   if (rc == H_SUCCESS)
-   goto out;
+   atomic_inc(data-working);
+
+   /* really need to ensure MSR.EE is off for H_JOIN */
+   msr_save = mfmsr();
+   mtmsr(msr_save  ~(MSR_EE));
+
+   rc = plpar_hcall_norets(H_JOIN);
+
+   mtmsr(msr_save);
 
-   if (rc == H_CONTINUE) {
-   data-waiting = 0;
-   data-args-args[data-args-nargs] =
-   rtas_call(ibm_suspend_me_token, 0, 1, NULL);
-   for_each_possible_cpu(i)
-   plpar_hcall_norets(H_PROD,i);
+   if (rc == H_SUCCESS) {
+   /* This cpu was prodded and the suspend is complete. */
+   goto out;
+   } else if (rc == H_CONTINUE) {
+   /* All other cpus are in H_JOIN, this cpu does
+* the suspend.
+*/
+   printk(KERN_DEBUG calling ibm,suspend-me on cpu %i\n,
+  smp_processor_id());
+   data-error = rtas_call(data-token, 0, 1, NULL);
+
+   if (data-error)
+   printk(KERN_DEBUG ibm,suspend-me returned %d\n,
+  data-error);
} else {
-   data-waiting = -EBUSY;
-   printk(KERN_ERR Error on H_JOIN hypervisor call\n);
+   printk(KERN_ERR H_JOIN on cpu %i failed with rc = %ld\n,
+  smp_processor_id(), rc);
+   data-error = rc;
}
-
+   /* This cpu did the suspend or got an error; in either case

[PATCH] remove prod_processor()

2007-11-12 Thread Nathan Lynch

prod_processor() is unused, and that's a good thing, since it does not
supply the required proc id parameter to H_PROD.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/pseries/plpar_wrappers.h |5 -
 1 files changed, 0 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/plpar_wrappers.h 
b/arch/powerpc/platforms/pseries/plpar_wrappers.h
index d003c80..d8680b5 100644
--- a/arch/powerpc/platforms/pseries/plpar_wrappers.h
+++ b/arch/powerpc/platforms/pseries/plpar_wrappers.h
@@ -8,11 +8,6 @@ static inline long poll_pending(void)
return plpar_hcall_norets(H_POLL_PENDING);
 }
 
-static inline long prod_processor(void)
-{
-   return plpar_hcall_norets(H_PROD);
-}
-
 static inline long cede_processor(void)
 {
return plpar_hcall_norets(H_CEDE);
-- 
1.5.3.4.206.g58ba4

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH v3] fix multiple bugs in rtas_ibm_suspend_me code

2007-11-13 Thread Nathan Lynch

There are several issues with the rtas_ibm_suspend_me code, which
enables platform-assisted suspension of an LPAR for migration or
hibernation as covered in PAPR 2.2.

1.) rtas_ibm_suspend_me uses on_each_cpu() to invoke
rtas_percpu_suspend_me on all cpus via IPI:

if (on_each_cpu(rtas_percpu_suspend_me, data, 1, 0))
...

'data' is on the calling task's stack, but rtas_ibm_suspend_me takes
no measures to ensure that all instances of rtas_percpu_suspend_me are
finished accessing 'data' before returning.  This can result in the
IPI'd cpus accessing random stack data and getting stuck in H_JOIN.

This is addressed by using an atomic count of workers and a completion
on the stack.

2.) rtas_percpu_suspend_me is needlessly calling H_JOIN in a loop.
The only event that can cause a cpu to return from H_JOIN is an H_PROD
from another cpu or a NMI/system reset.  Each cpu need call H_JOIN
only once per suspend operation.

Remove the loop and the now unnecessary 'waiting' state variable.

3.) H_JOIN must be called with MSR[EE] off, but lazy interrupt
disabling may cause the caller of rtas_ibm_suspend_me to call H_JOIN
with it on; the local_irq_disable() in on_each_cpu() is not
sufficient.

Fix this by explicitly saving the MSR and clearing the EE bit before
calling H_JOIN.

4.) H_PROD is being called with the Linux logical cpu number as the
parameter, not the platform interrupt server value.  (It's also being
called for all possible cpus, which is harmless, but unnecessary.)

This is fixed by calling H_PROD for each online cpu using
get_hard_smp_processor_id(cpu) for the argument.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]

---

Paul Mackerras wrote:

 On a uniprocessor configuration, with this patch I get:

   CC  arch/powerpc/kernel/rtas.o
 /home/paulus/kernel/powerpc/arch/powerpc/kernel/rtas.c: In function 
 ‘rtas_percpu_suspend_me’:
 /home/paulus/kernel/powerpc/arch/powerpc/kernel/rtas.c:702: error: implicit 
 declaration of function ‘get_hard_smp_processor_id
 make[2]: *** [arch/powerpc/kernel/rtas.o] Error 1

 I think you need to #include asm/smp.h in rtas.c.

Rather sloppy of me, sorry.

Changes since v2:
- Add appropriate #includes for APIs used, fixing SMP=n build

Changes since v1:
- the completion is adequate for ensuring that all IPIs have
completed, so there's no need for a mutex and rtas_suspend_me_data
can be on the stack as before.

- add fix for hard-disabling interrupts before H_JOIN

- remove unnecessary H_JOIN loop and state machinery


 arch/powerpc/kernel/rtas.c |   99 ++--
 1 files changed, 58 insertions(+), 41 deletions(-)

diff --git a/arch/powerpc/kernel/rtas.c b/arch/powerpc/kernel/rtas.c
index 2147807..52e95c2 100644
--- a/arch/powerpc/kernel/rtas.c
+++ b/arch/powerpc/kernel/rtas.c
@@ -19,6 +19,9 @@
 #include linux/init.h
 #include linux/capability.h
 #include linux/delay.h
+#include linux/smp.h
+#include linux/completion.h
+#include linux/cpumask.h

 #include asm/prom.h
 #include asm/rtas.h
@@ -34,6 +37,8 @@
 #include asm/lmb.h
 #include asm/udbg.h
 #include asm/syscalls.h
+#include asm/smp.h
+#include asm/atomic.h

 struct rtas_t rtas = {
.lock = SPIN_LOCK_UNLOCKED
@@ -41,8 +46,10 @@ struct rtas_t rtas = {
 EXPORT_SYMBOL(rtas);

 struct rtas_suspend_me_data {
-   long waiting;
-   struct rtas_args *args;
+   atomic_t working; /* number of cpus accessing this struct */
+   int token; /* ibm,suspend-me */
+   int error;
+   struct completion *complete; /* wait on this until working == 0 */
 };

 DEFINE_SPINLOCK(rtas_data_buf_lock);
@@ -657,50 +664,62 @@ static int ibm_suspend_me_token = RTAS_UNKNOWN_SERVICE;
 #ifdef CONFIG_PPC_PSERIES
 static void rtas_percpu_suspend_me(void *info)
 {
-   int i;
long rc;
-   long flags;
+   unsigned long msr_save;
+   int cpu;
struct rtas_suspend_me_data *data =
(struct rtas_suspend_me_data *)info;

-   /*
-* We use waiting to indicate our state.  As long
-* as it is 0, we are still trying to all join up.
-* If it goes to 0, we have successfully joined up and
-* one thread got H_CONTINUE.  If any error happens,
-* we set it to 0.
-*/
-   local_irq_save(flags);
-   do {
-   rc = plpar_hcall_norets(H_JOIN);
-   smp_rmb();
-   } while (rc == H_SUCCESS  data-waiting  0);
-   if (rc == H_SUCCESS)
-   goto out;
+   atomic_inc(data-working);
+
+   /* really need to ensure MSR.EE is off for H_JOIN */
+   msr_save = mfmsr();
+   mtmsr(msr_save  ~(MSR_EE));
+
+   rc = plpar_hcall_norets(H_JOIN);
+
+   mtmsr(msr_save);

-   if (rc == H_CONTINUE) {
-   data-waiting = 0;
-   data-args-args[data-args-nargs] =
-   rtas_call(ibm_suspend_me_token, 0, 1, NULL);
-   for_each_possible_cpu(i)
-   plpar_hcall_norets(H_PROD,i);
+   if (rc

Re: [PATCH v3] fix multiple bugs in rtas_ibm_suspend_me code

2007-11-13 Thread Nathan Lynch

Nathan Lynch wrote:
 
 3.) H_JOIN must be called with MSR[EE] off, but lazy interrupt
 disabling may cause the caller of rtas_ibm_suspend_me to call H_JOIN
 with it on; the local_irq_disable() in on_each_cpu() is not
 sufficient.
 
 Fix this by explicitly saving the MSR and clearing the EE bit before
 calling H_JOIN.

...

 + atomic_inc(data-working);
 +
 + /* really need to ensure MSR.EE is off for H_JOIN */
 + msr_save = mfmsr();
 + mtmsr(msr_save  ~(MSR_EE));
 +
 + rc = plpar_hcall_norets(H_JOIN);
 +
 + mtmsr(msr_save);

BTW, I'm wondering if this is the right way to do this.  I think
there's the possibility that we could enter this routine hard-enabled
and take take an interrupt between the mfmsr and the first mtmsr, but
I haven't worked out all the implications.  Would hard_irq_disable be
better?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] use correct ccr bit for syscall error status

2010-03-12 Thread Nathan Lynch

The powerpc implementations of syscall_get_error and
syscall_set_return_value should use CCR0:S0 (0x1000) for testing
and setting syscall error status.  Fortunately these APIs don't seem
to be used at the moment.

Signed-off-by: Nathan Lynch n...@pobox.com
---
 arch/powerpc/include/asm/syscall.h |6 +++---
 1 files changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/powerpc/include/asm/syscall.h 
b/arch/powerpc/include/asm/syscall.h
index efa7f0b..23913e9 100644
--- a/arch/powerpc/include/asm/syscall.h
+++ b/arch/powerpc/include/asm/syscall.h
@@ -30,7 +30,7 @@ static inline void syscall_rollback(struct task_struct *task,
 static inline long syscall_get_error(struct task_struct *task,
 struct pt_regs *regs)
 {
-   return (regs-ccr  0x1000) ? -regs-gpr[3] : 0;
+   return (regs-ccr  0x1000) ? -regs-gpr[3] : 0;
 }
 
 static inline long syscall_get_return_value(struct task_struct *task,
@@ -44,10 +44,10 @@ static inline void syscall_set_return_value(struct 
task_struct *task,
int error, long val)
 {
if (error) {
-   regs-ccr |= 0x1000L;
+   regs-ccr |= 0x1000L;
regs-gpr[3] = -error;
} else {
-   regs-ccr = ~0x1000L;
+   regs-ccr = ~0x1000L;
regs-gpr[3] = val;
}
 }
-- 
1.6.0.6

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev

Re: [PATCH] [POWERPC] 85xx: Add next-level-cache property

2008-06-02 Thread Nathan Lynch

Segher Boessenkool wrote:

 The PowerPC binding defines an l2-cache property for this (it
 points from CPU node to L2 cache node, from L2 cache node to L3
 cache node, from L3 cache node to L4 cache node, etc.)

 So looking at the PPC binding its not terrible clear about l3-cache  
 being a valid property.

 It isn't.  The property is called l2-cache at every level.

 I believe the discussion w/ePAPR was to create something a bit more  
 generic and clarify/update the PPC binding.

 Nasty.  Sure, l2-cache isn't the nicest name to point to deeper
 cache levels, but introducing a new property with (substantially)
 the same semantics is worse.

The semantics appear to be identical, even.


 There really shouldn't be a new property name until new functionality
 is introduced.  For example, it could allow to describe more than one
 cache at each level (the current binding already allows more than one
 parent for each cache, but only one child; and cache hierarchies like
 that actually exist).

 I'm going to stick with the new binding as we don't use this linking  
 currently.

 Dunno what's the best thing to do here.  If you don't need the
 functionality yet, it might be best to postpone putting either
 property in there.  Sigh, what a mess.

Does existing practice count for anything?  IBM pseries firmware uses
the l2-cache property as described in the PowerPC binding.

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 2/3] [2.6.26] ehea: Add dependency to Kconfig

2008-06-03 Thread Nathan Lynch

Hannes Hering wrote:
 
 On Wednesday 28 May 2008 18:44:05 Nathan Lynch wrote:
  
  Hannes Hering wrote:
   The new ehea memory hot plug implementation depends on MEMORY_HOTPLUG.
   
   Signed-off-by: Hannes Hering [EMAIL PROTECTED]
   ---
   
   diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
   index f90a86b..181cd86 100644
   --- a/drivers/net/Kconfig
   +++ b/drivers/net/Kconfig
   @@ -2440,7 +2440,7 @@ config CHELSIO_T3

config EHEA
 tristate eHEA Ethernet support
   - depends on IBMEBUS  INET  SPARSEMEM
   + depends on IBMEBUS  INET  SPARSEMEM  MEMORY_HOTPLUG
 select INET_LRO
 ---help---
   This driver supports the IBM pSeries eHEA ethernet adapter.
  
  I disagree with this change.
  
  It makes it impossible to build the ehea driver without memory hotplug
  enabled.  Presumably, this commit was intended to work around a build
  break of this sort (with EHEA=m and MEMORY_HOTPLUG=n):
  
  drivers/net/ehea/ehea_qmr.c: In function 'ehea_create_busmap':
  drivers/net/ehea/ehea_qmr.c:635: error: implicit declaration of function 
  'walk_memory_resource'
  
  (some indication of this should have been in the commit message, btw)
  
  I think this was the wrong way to fix the issue.  EHEA=m and
  MEMORY_HOTPLUG=n is a valid configuration for machines I test.
  
  Any thoughts on the following, which makes walk_memory_resource()
  available regardless of MEMORY_HOTPLUG's setting?  I've tested it on a
  JS22 (Power6 blade).

 I agree that the ehea cannot be built without MEMORY_HOTPLUG. The
 problem is the fact that the ppc walk_memory_resource declaration is
 in the scope of MEMORY_HOTPLUG. At the moment I don't have complete
 overview if the move of the code as you propose in your patch has
 any side effects. We probably need to talk to Badari who provided
 the walk_memory_resource code. We can also just throw it onto one of
 our boxes to see what happens. ;)

I would certainly appreciate any additional testing.

You wrote the ehea code that uses walk_memory_resource, so I was
hoping you could speak to whether ehea needs that interface when
CONFIG_MEMORY_HOTPLUG=n.  Or maybe there should be a no-op version of
walk_memory_resource for that case?  And what about the
arch-independent version in kernel/resource.c?  Badari?

It would be nice to get this resolved for 2.6.26 -- this new
dependency causes working 2.6.25 configs to effectively fail (by
deselecting CONFIG_EHEA during make oldconfig).
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 1/2] make walk_memory_resource available with MEMORY_HOTPLUG=n

2008-06-03 Thread Nathan Lynch

The ehea driver was recently changed[1] to use walk_memory_resource() to
detect the system's memory layout.  However, walk_memory_resource() is
available only when memory hotplug is enabled.  So CONFIG_EHEA was
made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
network driver to have such a dependency.

Make the declaration of walk_memory_resource() and its powerpc
implementation (ehea is powerpc-specific) unconditionally available.

[1] 48cfb14f8b89d4d5b3df6c16f08b258686fb12ad
ehea: Add DLPAR memory remove support

[2] fb7b6ca2b6b7c23b52be143bdd5f55a23b9780c8
ehea: Add dependency to Kconfig

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/mm/mem.c  |3 +--
 include/linux/memory_hotplug.h |   16 
 2 files changed, 9 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c
index f67e118..51f82d8 100644
--- a/arch/powerpc/mm/mem.c
+++ b/arch/powerpc/mm/mem.c
@@ -151,6 +151,7 @@ out:
return ret;
 }
 #endif /* CONFIG_MEMORY_HOTREMOVE */
+#endif /* CONFIG_MEMORY_HOTPLUG */
 
 /*
  * walk_memory_resource() needs to make sure there is no holes in a given
@@ -184,8 +185,6 @@ walk_memory_resource(unsigned long start_pfn, unsigned long 
nr_pages, void *arg,
 }
 EXPORT_SYMBOL_GPL(walk_memory_resource);
 
-#endif /* CONFIG_MEMORY_HOTPLUG */
-
 void show_mem(void)
 {
unsigned long total = 0, reserved = 0;
diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index 73e3586..ea9f5ad 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -77,14 +77,6 @@ extern int __add_pages(struct zone *zone, unsigned long 
start_pfn,
 extern int __remove_pages(struct zone *zone, unsigned long start_pfn,
unsigned long nr_pages);
 
-/*
- * Walk through all memory which is registered as resource.
- * arg is (start_pfn, nr_pages, private_arg_pointer)
- */
-extern int walk_memory_resource(unsigned long start_pfn,
-   unsigned long nr_pages, void *arg,
-   int (*func)(unsigned long, unsigned long, void *));
-
 #ifdef CONFIG_NUMA
 extern int memory_add_physaddr_to_nid(u64 start);
 #else
@@ -199,6 +191,14 @@ static inline void register_page_bootmem_info_node(struct 
pglist_data *pgdat)
 
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
+/*
+ * Walk through all memory which is registered as resource.
+ * arg is (start_pfn, nr_pages, private_arg_pointer)
+ */
+extern int walk_memory_resource(unsigned long start_pfn,
+   unsigned long nr_pages, void *arg,
+   int (*func)(unsigned long, unsigned long, void *));
+
 extern int add_memory(int nid, u64 start, u64 size);
 extern int arch_add_memory(int nid, u64 start, u64 size);
 extern int remove_memory(u64 start, u64 size);
-- 
1.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 2/2] ehea: remove dependency on MEMORY_HOTPLUG

2008-06-03 Thread Nathan Lynch

Now that walk_memory_resource() is available regardless of
MEMORY_HOTPLUG's setting, this dependency is not needed.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 drivers/net/Kconfig |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/net/Kconfig b/drivers/net/Kconfig
index dd0ec9e..f4182cf 100644
--- a/drivers/net/Kconfig
+++ b/drivers/net/Kconfig
@@ -2426,7 +2426,7 @@ config CHELSIO_T3
 
 config EHEA
tristate eHEA Ethernet support
-   depends on IBMEBUS  INET  SPARSEMEM  MEMORY_HOTPLUG
+   depends on IBMEBUS  INET  SPARSEMEM
select INET_LRO
---help---
  This driver supports the IBM pSeries eHEA ethernet adapter.
-- 
1.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/2] make walk_memory_resource available with MEMORY_HOTPLUG=n

2008-06-06 Thread Nathan Lynch

Nathan Lynch wrote:
 The ehea driver was recently changed[1] to use walk_memory_resource() to
 detect the system's memory layout.  However, walk_memory_resource() is
 available only when memory hotplug is enabled.  So CONFIG_EHEA was
 made to depend on MEMORY_HOTPLUG [2], but it is inappropriate for a
 network driver to have such a dependency.
 
 Make the declaration of walk_memory_resource() and its powerpc
 implementation (ehea is powerpc-specific) unconditionally available.

Paul, can you take these two patches, or should I send them to akpm?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] Power5,Power6 BSR driver

2008-06-17 Thread Nathan Lynch

Hi, mainly a couple of coding style things, but one minor bug (I
think).

[EMAIL PROTECTED] wrote:
 From: Sonny Rao [EMAIL PROTECTED]
 
 +static int bsr_mmap(struct file *filp, struct vm_area_struct *vma)
 +{
 + unsigned long size   = vma-vm_end - vma-vm_start;
 + struct bsr_dev *dev = filp-private_data;
 +
 + if (size  dev-bsr_len || (size  (PAGE_SIZE-1)))
 + return -EINVAL;
 +
 + vma-vm_flags |= (VM_IO | VM_DONTEXPAND);
 + vma-vm_page_prot = pgprot_noncached(vma-vm_page_prot);
 +
 + if (io_remap_pfn_range(vma, vma-vm_start, dev-bsr_addr  PAGE_SHIFT,
 +size, vma-vm_page_prot))
 + return -EAGAIN;

Indentation is wrong.


 +static void bsr_cleanup_devs(void)
 +{
 + int i;
 + for (i=0 ; i  num_bsr_devs; i++) {

 i = 0

 + struct bsr_dev *cur = bsr_devs + i;
 + if (cur-bsr_device) {
 + cdev_del(cur-bsr_cdev);
 + device_del(cur-bsr_device);
 + }
 + }
 +
 + kfree(bsr_devs);
 +}
 +
 +static int bsr_create_devs(struct device_node *bn)
 +{
 + int reg_len, bsr_stride_len, bsr_bytes_len;
 + const u64 *reg;
 + const u32 *bsr_stride;
 + const u32 *bsr_bytes;
 + unsigned i;
 +
 + reg= of_get_property(bn, reg, reg_len);
 + bsr_stride = of_get_property(bn, ibm,lock-stride, bsr_stride_len);
 + bsr_bytes  = of_get_property(bn, ibm,#lock-bytes, bsr_bytes_len);
 +
 + if (!reg || !bsr_stride || !bsr_bytes ||
 + (bsr_stride_len != bsr_bytes_len) ||
 + (bsr_stride_len/4 != reg_len/16)) {

 bsr_stride_len / 4 != reg_len / 16


 + printk(KERN_ERR bsr of-node has missing/incorrect property\n);
 + return -ENODEV;
 + }

...

 +static int __init bsr_init(void)
 +{
 + struct device_node *np;
 + dev_t bsr_dev = MKDEV(bsr_major, 0);
 + int ret = -ENODEV;
 + int result;
 +
 + np = of_find_compatible_node(NULL, ibm,bsr, ibm,bsr);
 + if (!np)
 + goto out_err;
 +
 + bsr_class = class_create(THIS_MODULE, bsr);
 + if (IS_ERR(bsr_class)) {
 + printk(KERN_ERR class_create() failed for bsr_class\n);
 + goto out_err;

At this point I think you can leak a reference to np.


 + }
 + bsr_class-dev_attrs = bsr_dev_attrs;
 +
 + result = alloc_chrdev_region(bsr_dev, 0, BSR_MAX_DEVS, bsr);
 + bsr_major = MAJOR(bsr_dev);
 + if (result  0) {
 + printk(KERN_ERR alloc_chrdev_region() failed for bsr\n);
 + goto out_err_1;
 + }
 +
 + if ((ret = bsr_create_devs(np))  0)
 + goto out_err_2;
 +
 + of_node_put(np);
 +
 + return 0;
 +
 + out_err_2:
 + unregister_chrdev_region(bsr_dev, BSR_MAX_DEVS);
 +
 + out_err_1:
 + class_destroy(bsr_class);
 + of_node_put(np);
 +
 + out_err:
 +
 + return ret;
 +}
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC/PATCH 0/3] sched: allow arch override of cpu power

2008-06-18 Thread Nathan Lynch

There is an interesting quality of POWER6 cores, which each have 2
hardware threads: assuming one thread on the core is idle, the primary
thread is a little faster than the secondary thread.  To illustrate:

for cpumask in 0x1 0x2 ; do
taskset $cpumask /usr/bin/time -f %e elapsed, %U user, %S sys \
/bin/sh -c i=100 ; while (( i-- )) ; do : ; done
done

17.05 elapsed, 16.83 user, 0.22 sys
17.54 elapsed, 17.32 user, 0.22 sys

(The first result is for a primary thread; the second result for a
secondary thread.)

So it would be nice to have the scheduler slightly prefer primary
threads on POWER6 machines.  These patches, which allow the
architecture to override the scheduler's CPU power calculation, are
one possible approach, but I'm open to others.  Please note: these
seemed to have the desired effect on 2.6.25-rc kernels (2-3%
improvement in a kernbench-like make -j nr_cores), but I'm not
seeing this improvement with 2.6.26-rc kernels for some reason I am
still trying to track down.

Nathan Lynch (3):
  sched: support arch override of sched_group cpu power
  add cpu_power to machdep_calls, override SD_SIBLING_INIT
  adjust cpu power for secondary threads on POWER6

 arch/powerpc/kernel/setup-common.c |7 ++
 arch/powerpc/platforms/pseries/setup.c |   37 
 include/asm-powerpc/cputable.h |3 +-
 include/asm-powerpc/machdep.h  |2 +
 include/asm-powerpc/topology.h |   31 ++
 include/linux/sched.h  |1 +
 kernel/sched.c |   14 
 7 files changed, 94 insertions(+), 1 deletions(-)

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC/PATCH 2/3] add cpu_power to machdep_calls, override SD_SIBLING_INIT

2008-06-18 Thread Nathan Lynch

Add a cpu_power() call to machdep_calls, which will allow platforms to
override the scheduler's default cpu power calculation.  If the
platform does not provide a cpu_power() method, the scheduler's
default value is used.

Copy the default SD_SIBLING_INIT to powerpc's topology.h and add the
SD_ASYM_CPU_POWER flag, which will cause ppc_md.cpu_power() to be
invoked (via arch_cpu_power()) during sched domain initialization.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/kernel/setup-common.c |7 +++
 include/asm-powerpc/machdep.h  |2 ++
 include/asm-powerpc/topology.h |   31 +++
 3 files changed, 40 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/setup-common.c 
b/arch/powerpc/kernel/setup-common.c
index db540ea..609d09e 100644
--- a/arch/powerpc/kernel/setup-common.c
+++ b/arch/powerpc/kernel/setup-common.c
@@ -671,3 +671,10 @@ static int powerpc_debugfs_init(void)
 }
 arch_initcall(powerpc_debugfs_init);
 #endif
+
+unsigned int arch_cpu_power(int cpu, unsigned int default_power)
+{
+   if (ppc_md.cpu_power)
+   return ppc_md.cpu_power(cpu, default_power);
+   return default_power;
+}
diff --git a/include/asm-powerpc/machdep.h b/include/asm-powerpc/machdep.h
index 54ed64d..de6ff6b 100644
--- a/include/asm-powerpc/machdep.h
+++ b/include/asm-powerpc/machdep.h
@@ -260,6 +260,8 @@ struct machdep_calls {
void (*suspend_disable_irqs)(void);
void (*suspend_enable_irqs)(void);
 #endif
+   /* Override scheduler's cpu power calculation */
+   unsigned int (*cpu_power)(int cpu, unsigned int default_power);
 };
 
 extern void power4_idle(void);
diff --git a/include/asm-powerpc/topology.h b/include/asm-powerpc/topology.h
index 100c6fb..4335c15 100644
--- a/include/asm-powerpc/topology.h
+++ b/include/asm-powerpc/topology.h
@@ -72,6 +72,37 @@ static inline int pcibus_to_node(struct pci_bus *bus)
.nr_balance_failed  = 0,\
 }
 
+#define SD_SIBLING_INIT (struct sched_domain) {\
+   .span   = CPU_MASK_NONE,\
+   .parent = NULL, \
+   .child  = NULL, \
+   .groups = NULL, \
+   .min_interval   = 1,\
+   .max_interval   = 2,\
+   .busy_factor= 64,   \
+   .imbalance_pct  = 110,  \
+   .cache_nice_tries   = 0,\
+   .busy_idx   = 0,\
+   .idle_idx   = 0,\
+   .newidle_idx= 0,\
+   .wake_idx   = 0,\
+   .forkexec_idx   = 0,\
+   .flags  = SD_LOAD_BALANCE   \
+   | SD_BALANCE_NEWIDLE\
+   | SD_BALANCE_FORK   \
+   | SD_BALANCE_EXEC   \
+   | SD_WAKE_AFFINE\
+   | SD_WAKE_IDLE  \
+   | SD_SHARE_CPUPOWER \
+   | SD_ASYM_CPU_POWER,\
+   .last_balance   = jiffies,  \
+   .balance_interval   = 1,\
+   .nr_balance_failed  = 0,\
+}
+
+#define arch_cpu_power(cpu, power) arch_cpu_power(cpu, power)
+extern unsigned int arch_cpu_power(int cpu, unsigned int default_power);
+
 extern void __init dump_numa_cpu_topology(void);
 
 extern int sysfs_add_device_to_node(struct sys_device *dev, int nid);
-- 
1.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC/PATCH 1/3] sched: support arch override of sched_group cpu power

2008-06-18 Thread Nathan Lynch

Add a new sched domain flag, SD_ASYM_CPU_POWER, which signifies that
the architecture may override the cpu power for a cpu via a hook in
init_sched_groups_power().  Add a dummy definition of arch_cpu_power()
which conforms with the existing behavior.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 include/linux/sched.h |1 +
 kernel/sched.c|   14 ++
 2 files changed, 15 insertions(+), 0 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index c5d3f84..cfbefca 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -713,6 +713,7 @@ enum cpu_idle_type {
 #define SD_SHARE_PKG_RESOURCES 512 /* Domain members share cpu pkg 
resources */
 #define SD_SERIALIZE   1024/* Only a single load balancing 
instance */
 #define SD_WAKE_IDLE_FAR   2048/* Gain latency sacrificing cache hit */
+#define SD_ASYM_CPU_POWER  4096/* Domain members of unequal power */
 
 #define BALANCE_FOR_MC_POWER   \
(sched_smt_power_savings ? SD_POWERSAVINGS_BALANCE : 0)
diff --git a/kernel/sched.c b/kernel/sched.c
index eaf6751..3fba083 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -6761,6 +6761,13 @@ static void free_sched_groups(const cpumask_t *cpu_map, 
cpumask_t *nodemask)
 }
 #endif
 
+#ifndef arch_cpu_power
+static inline unsigned int arch_cpu_power(int cpu, unsigned int default_power)
+{
+   return default_power;
+}
+#endif
+
 /*
  * Initialize sched groups cpu_power.
  *
@@ -6789,6 +6796,13 @@ static void init_sched_groups_power(int cpu, struct 
sched_domain *sd)
 
sd-groups-__cpu_power = 0;
 
+   if (!child  (sd-flags  SD_ASYM_CPU_POWER)) {
+   unsigned int power = arch_cpu_power(cpu, SCHED_LOAD_SCALE);
+
+   sg_inc_cpu_power(sd-groups, power);
+   return;
+   }
+
/*
 * For perf policy, if the groups in child domain share resources
 * (for example cores sharing some portions of the cache hierarchy
-- 
1.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[RFC/PATCH 3/3] adjust cpu power for secondary threads on POWER6

2008-06-18 Thread Nathan Lynch

On POWER6 processors, cpu-bound programs generally perform better on
the primary thread when the secondary thread is idle than they do on
the secondary thread while the primary thread is idle.  This
difference can be observed by timing a simple shell loop:

for cpumask in 0x1 0x2 ; do
taskset $cpumask /usr/bin/time -f %e elapsed, %U user, %S sys \
/bin/sh -c i=100 ; while (( i-- )) ; do : ; done
done

17.05 elapsed, 16.83 user, 0.22 sys
17.54 elapsed, 17.32 user, 0.22 sys

(The first result is for a primary thread; the second result for a
secondary thread.)

So we want the CPU scheduler to slightly favor primary threads on
POWER6.

Add a new cpu feature bit which indicates the need to override the
scheduler's cpu power calculation.

Implement ppc_md.cpu_power for the pseries platform, and scale
secondary threads' cpu power to 97% of the (default) primary threads'
cpu power.  Allow this percentage to be overriden on the kernel
command line via sec_thread_power_scale=.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/platforms/pseries/setup.c |   37 
 include/asm-powerpc/cputable.h |3 +-
 2 files changed, 39 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index f5d29f5..a1141c0 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -501,6 +501,42 @@ void pSeries_power_off(void)
for (;;);
 }
 
+/* Percentage by which the cpu power of secondary threads is adjusted */
+static unsigned int sec_thread_power_scale = 97;
+static int __init setup_sec_thread_power_scale(char *str)
+{
+   int power;
+
+   if (get_option(str, power)  power  0  power = 100)
+   sec_thread_power_scale = power;
+
+   return 1;
+}
+__setup(sec_thread_power_scale=, setup_sec_thread_power_scale);
+
+static unsigned int pseries_cpu_power(int cpu, unsigned int default_power)
+{
+   struct device_node *np;
+   unsigned int thread, power;
+
+   if (!cpu_has_feature(CPU_FTR_ASYM_POWER))
+   return default_power;
+
+   power = default_power;
+
+   np = of_get_cpu_node(cpu, thread);
+   WARN_ON(!np);
+   if (!np)
+   goto out;
+
+   /* If this isn't a primary thread, scale the power */
+   if (thread != 0)
+   power = default_power * sec_thread_power_scale / 100;
+out:
+   of_node_put(np);
+   return power;
+}
+
 #ifndef CONFIG_PCI
 void pSeries_final_fixup(void) { }
 #endif
@@ -525,4 +561,5 @@ define_machine(pseries) {
.progress   = rtas_progress,
.system_reset_exception = pSeries_system_reset_exception,
.machine_check_exception = pSeries_machine_check_exception,
+   .cpu_power = pseries_cpu_power,
 };
diff --git a/include/asm-powerpc/cputable.h b/include/asm-powerpc/cputable.h
index 1e79673..eb886a9 100644
--- a/include/asm-powerpc/cputable.h
+++ b/include/asm-powerpc/cputable.h
@@ -152,6 +152,7 @@ extern void do_feature_fixups(unsigned long value, void 
*fixup_start,
 #define CPU_FTR_UNIFIED_ID_CACHE   ASM_CONST(0x0100)
 #define CPU_FTR_SPEASM_CONST(0x0200)
 #define CPU_FTR_NEED_PAIRED_STWCX  ASM_CONST(0x0400)
+#define CPU_FTR_ASYM_POWER ASM_CONST(0x0800)
 
 /*
  * Add the 64-bit processor unique features in the top half of the word;
@@ -375,7 +376,7 @@ extern void do_feature_fixups(unsigned long value, void 
*fixup_start,
CPU_FTR_MMCRA | CPU_FTR_SMT | \
CPU_FTR_COHERENT_ICACHE | CPU_FTR_LOCKLESS_TLBIE | \
CPU_FTR_PURR | CPU_FTR_SPURR | CPU_FTR_REAL_LE | \
-   CPU_FTR_DSCR)
+   CPU_FTR_DSCR | CPU_FTR_ASYM_POWER)
 #define CPU_FTRS_CELL  (CPU_FTR_USE_TB | \
CPU_FTR_HPTE_TABLE | CPU_FTR_PPCAS_ARCH_V2 | CPU_FTR_CTRL | \
CPU_FTR_ALTIVEC_COMP | CPU_FTR_MMCRA | CPU_FTR_SMT | \
-- 
1.5.5

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [RFC/PATCH 0/3] sched: allow arch override of cpu power

2008-06-19 Thread Nathan Lynch

Ingo Molnar wrote:
 
 * Nathan Lynch [EMAIL PROTECTED] wrote:
  So it would be nice to have the scheduler slightly prefer primary 
  threads on POWER6 machines.  These patches, which allow the 
  architecture to override the scheduler's CPU power calculation, are 
  one possible approach, but I'm open to others.  Please note: these 
  seemed to have the desired effect on 2.6.25-rc kernels (2-3% 
  improvement in a kernbench-like make -j nr_cores), but I'm not 
  seeing this improvement with 2.6.26-rc kernels for some reason I am 
  still trying to track down.
 
 ok, i guess that discrepancy has to be tracked down before we can think 
 about these patches - but the principle is OK.

Great.  I'll keep trying to figure out what's going on there.


 One problem is that the whole cpu-power balancing code in sched.c is a 
 bit ... unclear and under-documented. So any change to this area should 
 begin at documenting the basics: what do the units mean exactly, how are 
 they used in balancing and what is the desired effect.

 I'd not be surprised if there were a few buglets in this area, SMT is 
 not at the forefront of testing at the moment. There's nothing 
 spectacularly broken in it (i have a HT machine myself), but the 
 concepts have bitrotten a bit. Patches - even if they just add comments 
 - are welcome :-)

Okay, I'll have a look.  Thanks Ingo.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [RFC/PATCH 0/3] sched: allow arch override of cpu power

2008-06-27 Thread Nathan Lynch

Breno Leitao wrote:
 Nathan Lynch wrote:
 There is an interesting quality of POWER6 cores, which each have 2
 hardware threads: assuming one thread on the core is idle, the primary
 thread is a little faster than the secondary thread.  To illustrate:
   
 I found this feature interesting and decided to do some tests.
 After some tests I found that the example you post really runs fast in  
 the first CPU, but a more elaborated application runs slower on the  
 first CPU.
 Here is a small example:

 # taskset 0x1 time -f %e,  %U, %S ./a.out ; taskset 0x2 time -f %e,  
 %U, %S ./a.out
 10.77,  10.72, 0.01
 10.53, 10.48, 0.01

 # taskset 0x2 time -f %e,  %U, %S ./a.out ; taskset 0x1 time -f %e,  
 %U, %S ./a.out
 10.55,  10.50, 0.01
 10.77, 10.72, 0.01

I've been able to duplicate your results, thanks for the testcase.
Guess I'll need to understand what's going on before continuing with
this...
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] Add PPC_FEATURE_PMU_COMPAT

2008-07-03 Thread Nathan Lynch

Beginning with Power6, there is a set of 32 PMU events which is
compatible across POWER processor lines.  PPC_FEATURE_PMU_COMPAT
indicates support for this subset.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/kernel/cputable.c |4 ++--
 include/asm-powerpc/cputable.h |1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 817cea1..a9c90be 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -70,10 +70,10 @@ extern void __restore_cpu_power7(void);
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP)
 #define COMMON_USER_POWER6 (COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_05 |\
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-PPC_FEATURE_TRUE_LE)
+PPC_FEATURE_TRUE_LE | PPC_FEATURE_PMU_COMPAT)
 #define COMMON_USER_POWER7 (COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_06 |\
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-PPC_FEATURE_TRUE_LE)
+PPC_FEATURE_TRUE_LE | PPC_FEATURE_PMU_COMPAT)
 #define COMMON_USER_PA6T   (COMMON_USER_PPC64 | PPC_FEATURE_PA6T |\
 PPC_FEATURE_TRUE_LE | \
 PPC_FEATURE_HAS_ALTIVEC_COMP)
diff --git a/include/asm-powerpc/cputable.h b/include/asm-powerpc/cputable.h
index 4e4491c..a9c1de3 100644
--- a/include/asm-powerpc/cputable.h
+++ b/include/asm-powerpc/cputable.h
@@ -26,6 +26,7 @@
 #define PPC_FEATURE_POWER6_EXT 0x0200
 #define PPC_FEATURE_ARCH_2_06  0x0100
 #define PPC_FEATURE_HAS_VSX0x0080
+#define PPC_FEATURE_PMU_COMPAT 0x0040
 
 #define PPC_FEATURE_TRUE_LE0x0002
 #define PPC_FEATURE_PPC_LE 0x0001
-- 
1.5.5.1

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 1/2] elf loader support for auxvec base platform string

2008-07-03 Thread Nathan Lynch

Some IBM POWER-based platforms have the ability to run in a
mode which mostly appears to the OS as a different processor from the
actual hardware.  For example, a Power6 system may appear to be a
Power5+, which makes the AT_PLATFORM value power5+.

However, some applications (virtual machines, optimized libraries) can
benefit from knowledge of the underlying CPU model.  A new aux vector
entry, AT_BASE_PLATFORM, will denote the actual hardware.  For
example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
will be power5+ and AT_BASE_PLATFORM will be power6.

If the architecture has defined ELF_BASE_PLATFORM, copy that value to
the user stack in the same manner as ELF_PLATFORM.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---

Next patch implements ELF/AT_BASE_PLATFORM for powerpc.

 fs/binfmt_elf.c |   23 +++
 1 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d48ff5f..834c2c4 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -131,6 +131,10 @@ static int padzero(unsigned long elf_bss)
 #define STACK_ALLOC(sp, len) ({ sp -= len ; sp; })
 #endif
 
+#ifndef ELF_BASE_PLATFORM
+#define ELF_BASE_PLATFORM NULL
+#endif
+
 static int
 create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
unsigned long load_addr, unsigned long interp_load_addr)
@@ -142,7 +146,9 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr 
*exec,
elf_addr_t __user *envp;
elf_addr_t __user *sp;
elf_addr_t __user *u_platform;
+   elf_addr_t __user *u_base_platform;
const char *k_platform = ELF_PLATFORM;
+   const char *k_base_platform = ELF_BASE_PLATFORM;
int items;
elf_addr_t *elf_info;
int ei_index = 0;
@@ -172,6 +178,19 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr 
*exec,
return -EFAULT;
}
 
+   /*
+* If this architecture has a base platform capability
+* string, copy it to userspace.
+*/
+   u_base_platform = NULL;
+   if (k_base_platform) {
+   size_t len = strlen(k_base_platform) + 1;
+
+   u_base_platform = (elf_addr_t __user *)STACK_ALLOC(p, len);
+   if (__copy_to_user(u_base_platform, k_base_platform, len))
+   return -EFAULT;
+   }
+
/* Create the ELF interpreter info */
elf_info = (elf_addr_t *)current-mm-saved_auxv;
/* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */
@@ -208,6 +227,10 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr 
*exec,
NEW_AUX_ENT(AT_PLATFORM,
(elf_addr_t)(unsigned long)u_platform);
}
+   if (k_base_platform) {
+   NEW_AUX_ENT(AT_BASE_PLATFORM,
+   (elf_addr_t)(unsigned long)u_base_platform);
+   }
if (bprm-interp_flags  BINPRM_FLAGS_EXECFD) {
NEW_AUX_ENT(AT_EXECFD, bprm-interp_data);
}
-- 
1.5.5.1

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH 2/2] enable AT_BASE_PLATFORM aux vector for powerpc

2008-07-03 Thread Nathan Lynch

Stash the first platform string matched by identify_cpu() in
powerpc_base_platform, and supply that to the ELF loader for the value
of AT_BASE_PLATFORM.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---

Note: I wasn't sure how to pick the index value for AT_BASE_PLATFORM;
I simply searched include/ for all AT_ values and added 1 to the
highest value found.

 arch/powerpc/kernel/cputable.c |   11 +++
 include/asm-powerpc/auxvec.h   |3 +++
 include/asm-powerpc/cputable.h |2 ++
 include/asm-powerpc/elf.h  |8 
 4 files changed, 24 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index a9c90be..f1eb632 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -23,6 +23,9 @@
 struct cpu_spec* cur_cpu_spec = NULL;
 EXPORT_SYMBOL(cur_cpu_spec);
 
+/* The platform string corresponding to the real PVR */
+const char *powerpc_base_platform;
+
 /* NOTE:
  * Unlike ppc32, ppc64 will only call this once for the boot CPU, it's
  * the responsibility of the appropriate CPU save/restore functions to
@@ -1620,6 +1623,14 @@ struct cpu_spec * __init identify_cpu(unsigned long 
offset, unsigned int pvr)
} else
*t = *s;
*PTRRELOC(cur_cpu_spec) = the_cpu_spec;
+
+   /*
+* Set the base platform string once; assumes
+* we're called with real pvr first.
+*/
+   if (powerpc_base_platform == NULL)
+   powerpc_base_platform = t-platform;
+
 #if defined(CONFIG_PPC64) || defined(CONFIG_BOOKE)
/* ppc64 and booke expect identify_cpu to also call
 * setup_cpu for that processor. I will consolidate
diff --git a/include/asm-powerpc/auxvec.h b/include/asm-powerpc/auxvec.h
index 19a099b..507f081 100644
--- a/include/asm-powerpc/auxvec.h
+++ b/include/asm-powerpc/auxvec.h
@@ -16,4 +16,7 @@
  */
 #define AT_SYSINFO_EHDR33
 
+/* The actual hardware platform; may differ from AT_PLATFORM */
+#define AT_BASE_PLATFORM   38
+
 #endif
diff --git a/include/asm-powerpc/cputable.h b/include/asm-powerpc/cputable.h
index a9c1de3..527152f 100644
--- a/include/asm-powerpc/cputable.h
+++ b/include/asm-powerpc/cputable.h
@@ -125,6 +125,8 @@ extern struct cpu_spec *identify_cpu(unsigned long offset, 
unsigned int pvr);
 extern void do_feature_fixups(unsigned long value, void *fixup_start,
  void *fixup_end);
 
+extern const char *powerpc_base_platform;
+
 #endif /* __ASSEMBLY__ */
 
 /* CPU kernel features */
diff --git a/include/asm-powerpc/elf.h b/include/asm-powerpc/elf.h
index 38a5172..ff0c947 100644
--- a/include/asm-powerpc/elf.h
+++ b/include/asm-powerpc/elf.h
@@ -237,6 +237,14 @@ extern int dump_task_altivec(struct task_struct *, 
elf_vrregset_t *vrregs);
 
 #define ELF_PLATFORM   (cur_cpu_spec-platform)
 
+/* While ELF_PLATFORM indicates the ISA supported by the platform, it
+ * may not accurately reflect the underlying behavior of the hardware
+ * (as in the case of running in Power5+ compatibility mode on a
+ * Power6 machine).  ELF_BASE_PLATFORM allows ld.so to load libraries
+ * that are tuned for the real hardware.
+ */
+#define ELF_BASE_PLATFORM (powerpc_base_platform)
+
 #ifdef __powerpc64__
 # define ELF_PLAT_INIT(_r, load_addr)  do {\
_r-gpr[2] = load_addr; \
-- 
1.5.5.1

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/2] elf loader support for auxvec base platform string

2008-07-07 Thread Nathan Lynch

Mikael Pettersson wrote:
 Nathan Lynch writes:
   Some IBM POWER-based platforms have the ability to run in a
   mode which mostly appears to the OS as a different processor from the
   actual hardware.  For example, a Power6 system may appear to be a
   Power5+, which makes the AT_PLATFORM value power5+.
   
   However, some applications (virtual machines, optimized libraries) can
   benefit from knowledge of the underlying CPU model.  A new aux vector
   entry, AT_BASE_PLATFORM, will denote the actual hardware.  For
   example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
   will be power5+ and AT_BASE_PLATFORM will be power6.
 
 Why on earth would you ever want AT_PLATFORM to differ from AT_BASE_PLATFORM?
 In cases that matter you admit that AT_BASE_PLATFORM takes precedence,
 so why involve a fake lame not-quite-the-platform in the first place?
 
 Workaround for buggy software?

My apologies, I did not explain the motivation well.

The idea is that while AT_PLATFORM indicates the instruction set
supported, AT_BASE_PLATFORM indicates the underlying
microarchitecture.  It's not a matter of buggy software, or of one
value taking precedence over the other.
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/2] elf loader support for auxvec base platform string

2008-07-07 Thread Nathan Lynch

Roland McGrath wrote:
  Well, we use strings to represent the platforms already (ie, the actual
  CPU microarchitecture). Fitting those into bits would be annoying, it
 
 Then use dsocaps.
 
  makes sense to have AT_BASE_PLATFORM to be the base variant of
  AT_PLATFORM.
 
 I understand why you think so.  But let's not be too abstract.  The
 purpose of the addition is to drive ld.so's selection of libraries, yes?

That is one purpose.  But there are others (JVMs, performance tools).
dsocaps seems to be an ld.so-specific thing... or am I missing how a
third-party program would use it?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH] Add PPC_FEATURE_PMU_COMPAT

2008-07-07 Thread Nathan Lynch

Kumar Gala wrote:

 On Jul 3, 2008, at 6:20 PM, Nathan Lynch wrote:

 Beginning with Power6, there is a set of 32 PMU events which is
 compatible across POWER processor lines.  PPC_FEATURE_PMU_COMPAT
 indicates support for this subset.

 Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
 ---
 arch/powerpc/kernel/cputable.c |4 ++--
 include/asm-powerpc/cputable.h |1 +
 2 files changed, 3 insertions(+), 2 deletions(-)

 Can you explain what these PMU events are a bit further?

Maynard, can you help out here...?  :)
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

Re: [PATCH 1/2] elf loader support for auxvec base platform string

2008-07-07 Thread Nathan Lynch

Benjamin Herrenschmidt wrote:
 On Thu, 2008-07-03 at 19:19 -0700, Roland McGrath wrote:
  Why not just use ELF_HWCAP for this?  It looks like powerpc only has 3 bits
  left there (keeping it to 32), but 3 is not 0.  If not that, why not use
  dsocaps?  That is, some magic in the vDSO, which glibc already supports on
  all machines where it uses the vDSO.  (For how it works, see the use in
  arch/x86/vdso/vdso32/note.S for CONFIG_XEN.)
 
 Well, we use strings to represent the platforms already (ie, the actual
 CPU microarchitecture). Fitting those into bits would be annoying, it
 makes sense to have AT_BASE_PLATFORM to be the base variant of
 AT_PLATFORM.
 
 _However_ there is a bug in that this patch adds an entry without
 bumping the number of entries in the cached array (ie.
 AT_VECTOR_SIZE_BASE needs to be updated).

Ugh, yes.  I was hoping to work this in such a way that AT_VECTOR_SIZE
(and thus the size of mm_struct) increases only for architectures that
implement AT_BASE_PLATFORM... would it be wrong to account for it in
AT_VECTOR_SIZE_ARCH?
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH v2] Add PPC_FEATURE_PSERIES_PMU_COMPAT

2008-07-08 Thread Nathan Lynch

Background from Maynard Johnson:
As of POWER6, a set of 32 common events is defined that must be
supported on all future POWER processors.  The main impetus for this
compat set is the need to support partition migration, especially from
processor P(n) to processor P(n+1), where performance software that's
running in the new partition may not be knowledgeable about processor
P(n+1).  If a performance tool determines it does not support the
physical processor, but is told (via the PPC_FEATURE_PSERIES_PMU_COMPAT
bit) that the processor supports the notion of the PMU compat set,
then the performance tool can surface just those events to the user
of the tool.

PPC_FEATURE_PSERIES_PMU_COMPAT indicates that the PMU supports at
least this basic subset of events which is compatible across POWER
processor lines.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---

Changes since v1:
- make name of feature bit less generic
- provide more complete changelog

 arch/powerpc/kernel/cputable.c |6 --
 include/asm-powerpc/cputable.h |1 +
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 817cea1..c4eb377 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -70,10 +70,12 @@ extern void __restore_cpu_power7(void);
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP)
 #define COMMON_USER_POWER6 (COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_05 |\
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-PPC_FEATURE_TRUE_LE)
+PPC_FEATURE_TRUE_LE | \
+PPC_FEATURE_PSERIES_PMU_COMPAT)
 #define COMMON_USER_POWER7 (COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_06 |\
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-PPC_FEATURE_TRUE_LE)
+PPC_FEATURE_TRUE_LE | \
+PPC_FEATURE_PSERIES_PMU_COMPAT)
 #define COMMON_USER_PA6T   (COMMON_USER_PPC64 | PPC_FEATURE_PA6T |\
 PPC_FEATURE_TRUE_LE | \
 PPC_FEATURE_HAS_ALTIVEC_COMP)
diff --git a/include/asm-powerpc/cputable.h b/include/asm-powerpc/cputable.h
index 3171ac9..d1492a2 100644
--- a/include/asm-powerpc/cputable.h
+++ b/include/asm-powerpc/cputable.h
@@ -26,6 +26,7 @@
 #define PPC_FEATURE_POWER6_EXT 0x0200
 #define PPC_FEATURE_ARCH_2_06  0x0100
 #define PPC_FEATURE_HAS_VSX0x0080
+#define PPC_FEATURE_PSERIES_PMU_COMPAT 0x0040
 
 #define PPC_FEATURE_TRUE_LE0x0002
 #define PPC_FEATURE_PPC_LE 0x0001
-- 
1.5.6.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] kill useless SMT code in prom_hold_cpus

2008-07-08 Thread Nathan Lynch

I think this code that counts SMT threads and compares against NR_CPUS
is an artifact of pre-powerpc-merge ppc64.  We care about starting
only primary threads in the OF client code.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/kernel/prom_init.c |   39 +++
 1 files changed, 3 insertions(+), 36 deletions(-)

diff --git a/arch/powerpc/kernel/prom_init.c b/arch/powerpc/kernel/prom_init.c
index 1ea8c8d..b1dd86c 100644
--- a/arch/powerpc/kernel/prom_init.c
+++ b/arch/powerpc/kernel/prom_init.c
@@ -205,8 +205,6 @@ static int __initdata mem_reserve_cnt;
 static cell_t __initdata regbuf[1024];
 
 
-#define MAX_CPU_THREADS 2
-
 /*
  * Error results ... some OF calls will return -1 on error, some
  * will return 0, some will return either. To simplify, here are
@@ -1332,10 +1330,6 @@ static void __init prom_hold_cpus(void)
unsigned int reg;
phandle node;
char type[64];
-   int cpuid = 0;
-   unsigned int interrupt_server[MAX_CPU_THREADS];
-   unsigned int cpu_threads, hw_cpu_num;
-   int propsize;
struct prom_t *_prom = RELOC(prom);
unsigned long *spinloop
= (void *) LOW_ADDR(__secondary_hold_spinloop);
@@ -1379,7 +1373,6 @@ static void __init prom_hold_cpus(void)
reg = -1;
prom_getprop(node, reg, reg, sizeof(reg));
 
-   prom_debug(\ncpuid= 0x%x\n, cpuid);
prom_debug(cpu hw idx   = 0x%x\n, reg);
 
/* Init the acknowledge var which will be reset by
@@ -1388,28 +1381,9 @@ static void __init prom_hold_cpus(void)
 */
*acknowledge = (unsigned long)-1;
 
-   propsize = prom_getprop(node, ibm,ppc-interrupt-server#s,
-   interrupt_server,
-   sizeof(interrupt_server));
-   if (propsize  0) {
-   /* no property.  old hardware has no SMT */
-   cpu_threads = 1;
-   interrupt_server[0] = reg; /* fake it with phys id */
-   } else {
-   /* We have a threaded processor */
-   cpu_threads = propsize / sizeof(u32);
-   if (cpu_threads  MAX_CPU_THREADS) {
-   prom_printf(SMT: too many threads!\n
-   SMT: found %x, max is %x\n,
-   cpu_threads, MAX_CPU_THREADS);
-   cpu_threads = 1; /* ToDo: panic? */
-   }
-   }
-
-   hw_cpu_num = interrupt_server[0];
-   if (hw_cpu_num != _prom-cpu) {
+   if (reg != _prom-cpu) {
/* Primary Thread of non-boot cpu */
-   prom_printf(%x : starting cpu hw idx %x... , cpuid, 
reg);
+   prom_printf(starting cpu hw idx %x... , reg);
call_prom(start-cpu, 3, 0, node,
  secondary_hold, reg);
 
@@ -1424,17 +1398,10 @@ static void __init prom_hold_cpus(void)
}
 #ifdef CONFIG_SMP
else
-   prom_printf(%x : boot cpu %x\n, cpuid, reg);
+   prom_printf(boot cpu hw idx %x\n, reg);
 #endif /* CONFIG_SMP */
-
-   /* Reserve cpu #s for secondary threads.   They start later. */
-   cpuid += cpu_threads;
}
 
-   if (cpuid  NR_CPUS)
-   prom_printf(WARNING: maximum CPUs ( __stringify(NR_CPUS)
-   ) exceeded: ignoring extras\n);
-
prom_debug(prom_hold_cpus: end...\n);
 }
 
-- 
1.5.6.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH v3] Add PPC_FEATURE_PSERIES_PERFMON_COMPAT

2008-07-09 Thread Nathan Lynch

Background from Maynard Johnson:
As of POWER6, a set of 32 common events is defined that must be
supported on all future POWER processors.  The main impetus for this
compat set is the need to support partition migration, especially from
processor P(n) to processor P(n+1), where performance software that's
running in the new partition may not be knowledgeable about processor
P(n+1).  If a performance tool determines it does not support the
physical processor, but is told (via the
PPC_FEATURE_PSERIES_PERFMON_COMPAT bit) that the processor supports
the notion of the PMU compat set, then the performance tool can
surface just those events to the user of the tool.

PPC_FEATURE_PSERIES_PERFMON_COMPAT indicates that the PMU supports at
least this basic subset of events which is compatible across POWER
processor lines.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---

Changes since v2:
- Further disambiguated name of feature bit (PMU - PERFMON)

Changes since v1:
- make name of feature bit less generic
- provide more complete changelog

 arch/powerpc/kernel/cputable.c |6 --
 include/asm-powerpc/cputable.h |3 +++
 2 files changed, 7 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index 817cea1..02088b0 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -70,10 +70,12 @@ extern void __restore_cpu_power7(void);
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP)
 #define COMMON_USER_POWER6 (COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_05 |\
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-PPC_FEATURE_TRUE_LE)
+PPC_FEATURE_TRUE_LE | \
+PPC_FEATURE_PSERIES_PERFMON_COMPAT)
 #define COMMON_USER_POWER7 (COMMON_USER_PPC64 | PPC_FEATURE_ARCH_2_06 |\
 PPC_FEATURE_SMT | PPC_FEATURE_ICACHE_SNOOP | \
-PPC_FEATURE_TRUE_LE)
+PPC_FEATURE_TRUE_LE | \
+PPC_FEATURE_PSERIES_PERFMON_COMPAT)
 #define COMMON_USER_PA6T   (COMMON_USER_PPC64 | PPC_FEATURE_PA6T |\
 PPC_FEATURE_TRUE_LE | \
 PPC_FEATURE_HAS_ALTIVEC_COMP)
diff --git a/include/asm-powerpc/cputable.h b/include/asm-powerpc/cputable.h
index 3171ac9..58d4281 100644
--- a/include/asm-powerpc/cputable.h
+++ b/include/asm-powerpc/cputable.h
@@ -27,6 +27,9 @@
 #define PPC_FEATURE_ARCH_2_06  0x0100
 #define PPC_FEATURE_HAS_VSX0x0080
 
+#define PPC_FEATURE_PSERIES_PERFMON_COMPAT \
+   0x0040
+
 #define PPC_FEATURE_TRUE_LE0x0002
 #define PPC_FEATURE_PPC_LE 0x0001
 
-- 
1.5.6.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] elf loader support for auxvec base platform string

2008-07-15 Thread Nathan Lynch

Some IBM POWER-based platforms have the ability to run in a
mode which mostly appears to the OS as a different processor from the
actual hardware.  For example, a Power6 system may appear to be a
Power5+, which makes the AT_PLATFORM value power5+.  This means that
programs are restricted to the ISA supported by Power5+;
Power6-specific instructions are treated as illegal.

However, some applications (virtual machines, optimized libraries) can
benefit from knowledge of the underlying CPU model.  A new aux vector
entry, AT_BASE_PLATFORM, will denote the actual hardware.  For
example, on a Power6 system in Power5+ compatibility mode, AT_PLATFORM
will be power5+ and AT_BASE_PLATFORM will be power6.  The idea is
that AT_PLATFORM indicates the instruction set supported, while
AT_BASE_PLATFORM indicates the underlying microarchitecture.

If the architecture has defined ELF_BASE_PLATFORM, copy that value to
the user stack in the same manner as ELF_PLATFORM.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 fs/binfmt_elf.c|   23 +++
 include/linux/auxvec.h |5 -
 2 files changed, 27 insertions(+), 1 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index d48ff5f..834c2c4 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -131,6 +131,10 @@ static int padzero(unsigned long elf_bss)
 #define STACK_ALLOC(sp, len) ({ sp -= len ; sp; })
 #endif
 
+#ifndef ELF_BASE_PLATFORM
+#define ELF_BASE_PLATFORM NULL
+#endif
+
 static int
 create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
unsigned long load_addr, unsigned long interp_load_addr)
@@ -142,7 +146,9 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr 
*exec,
elf_addr_t __user *envp;
elf_addr_t __user *sp;
elf_addr_t __user *u_platform;
+   elf_addr_t __user *u_base_platform;
const char *k_platform = ELF_PLATFORM;
+   const char *k_base_platform = ELF_BASE_PLATFORM;
int items;
elf_addr_t *elf_info;
int ei_index = 0;
@@ -172,6 +178,19 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr 
*exec,
return -EFAULT;
}
 
+   /*
+* If this architecture has a base platform capability
+* string, copy it to userspace.
+*/
+   u_base_platform = NULL;
+   if (k_base_platform) {
+   size_t len = strlen(k_base_platform) + 1;
+
+   u_base_platform = (elf_addr_t __user *)STACK_ALLOC(p, len);
+   if (__copy_to_user(u_base_platform, k_base_platform, len))
+   return -EFAULT;
+   }
+
/* Create the ELF interpreter info */
elf_info = (elf_addr_t *)current-mm-saved_auxv;
/* update AT_VECTOR_SIZE_BASE if the number of NEW_AUX_ENT() changes */
@@ -208,6 +227,10 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr 
*exec,
NEW_AUX_ENT(AT_PLATFORM,
(elf_addr_t)(unsigned long)u_platform);
}
+   if (k_base_platform) {
+   NEW_AUX_ENT(AT_BASE_PLATFORM,
+   (elf_addr_t)(unsigned long)u_base_platform);
+   }
if (bprm-interp_flags  BINPRM_FLAGS_EXECFD) {
NEW_AUX_ENT(AT_EXECFD, bprm-interp_data);
}
diff --git a/include/linux/auxvec.h b/include/linux/auxvec.h
index ad89545..1adc61d 100644
--- a/include/linux/auxvec.h
+++ b/include/linux/auxvec.h
@@ -26,8 +26,11 @@
 
 #define AT_SECURE 23   /* secure mode boolean */
 
+#define AT_BASE_PLATFORM 38/* string identifying real platform, may
+* differ from AT_PLATFORM. */
+
 #ifdef __KERNEL__
-#define AT_VECTOR_SIZE_BASE (14 + 2) /* NEW_AUX_ENT entries in auxiliary table 
*/
+#define AT_VECTOR_SIZE_BASE (14 + 3) /* NEW_AUX_ENT entries in auxiliary table 
*/
 #endif
 
 #endif /* _LINUX_AUXVEC_H */
-- 
1.5.6.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

[PATCH] enable AT_BASE_PLATFORM aux vector for powerpc

2008-07-15 Thread Nathan Lynch

Stash the first platform string matched by identify_cpu() in
powerpc_base_platform, and supply that to the ELF loader for the value
of AT_BASE_PLATFORM.

Signed-off-by: Nathan Lynch [EMAIL PROTECTED]
---
 arch/powerpc/kernel/cputable.c |   11 +++
 include/asm-powerpc/cputable.h |2 ++
 include/asm-powerpc/elf.h  |8 
 3 files changed, 21 insertions(+), 0 deletions(-)

diff --git a/arch/powerpc/kernel/cputable.c b/arch/powerpc/kernel/cputable.c
index f7f3c21..89d8731 100644
--- a/arch/powerpc/kernel/cputable.c
+++ b/arch/powerpc/kernel/cputable.c
@@ -23,6 +23,9 @@
 struct cpu_spec* cur_cpu_spec = NULL;
 EXPORT_SYMBOL(cur_cpu_spec);
 
+/* The platform string corresponding to the real PVR */
+const char *powerpc_base_platform;
+
 /* NOTE:
  * Unlike ppc32, ppc64 will only call this once for the boot CPU, it's
  * the responsibility of the appropriate CPU save/restore functions to
@@ -1632,6 +1635,14 @@ struct cpu_spec * __init identify_cpu(unsigned long 
offset, unsigned int pvr)
} else
*t = *s;
*PTRRELOC(cur_cpu_spec) = the_cpu_spec;
+
+   /*
+* Set the base platform string once; assumes
+* we're called with real pvr first.
+*/
+   if (powerpc_base_platform == NULL)
+   powerpc_base_platform = t-platform;
+
 #if defined(CONFIG_PPC64) || defined(CONFIG_BOOKE)
/* ppc64 and booke expect identify_cpu to also call
 * setup_cpu for that processor. I will consolidate
diff --git a/include/asm-powerpc/cputable.h b/include/asm-powerpc/cputable.h
index 2a3e907..ef8a248 100644
--- a/include/asm-powerpc/cputable.h
+++ b/include/asm-powerpc/cputable.h
@@ -127,6 +127,8 @@ extern struct cpu_spec *identify_cpu(unsigned long offset, 
unsigned int pvr);
 extern void do_feature_fixups(unsigned long value, void *fixup_start,
  void *fixup_end);
 
+extern const char *powerpc_base_platform;
+
 #endif /* __ASSEMBLY__ */
 
 /* CPU kernel features */
diff --git a/include/asm-powerpc/elf.h b/include/asm-powerpc/elf.h
index 8966467..80d1f39 100644
--- a/include/asm-powerpc/elf.h
+++ b/include/asm-powerpc/elf.h
@@ -217,6 +217,14 @@ typedef elf_vrregset_t elf_fpxregset_t;
 
 #define ELF_PLATFORM   (cur_cpu_spec-platform)
 
+/* While ELF_PLATFORM indicates the ISA supported by the platform, it
+ * may not accurately reflect the underlying behavior of the hardware
+ * (as in the case of running in Power5+ compatibility mode on a
+ * Power6 machine).  ELF_BASE_PLATFORM allows ld.so to load libraries
+ * that are tuned for the real hardware.
+ */
+#define ELF_BASE_PLATFORM (powerpc_base_platform)
+
 #ifdef __powerpc64__
 # define ELF_PLAT_INIT(_r, load_addr)  do {\
_r-gpr[2] = load_addr; \
-- 
1.5.6.2

___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

AT_BASE_PLATFORM (v2)

2008-07-15 Thread Nathan Lynch

Background:
Some IBM POWER-based systems have the ability to run in a
compatibility mode which mostly appears to the OS as a different
processor from the actual hardware.  This feature of the platform is
useful for live partition migration and for backwards compatibility
with old kernels on new hardware.  For example, a Power6 system may
appear to be a Power5+, which makes the AT_PLATFORM value power5+.

Problem:
Booting a system in a compatibility mode means that ld.so may load
libraries that are inappropriately tuned for the real
microarchitecture, and apps that use JIT techniques do not have the
right information for generating tuned code.  While the AT_PLATFORM
auxiliary vector entry correctly indicates the ISA supported, it does
not accurately reflect the underlying microarchitecture in this case,
and there is no good way for userspace to get this information.

Proposed solution:
Add an AT_BASE_PLATFORM auxiliary vector entry which indicates the
microarchitecture.  This entry uses the same string format as
AT_PLATFORM, and is readily usable by ld.so and other applications.

Other solutions that have been suggested but found wanting:

- Use a bit in AT_HWCAP to indicate compat mode -- this is not
expressive enough.  It's not possible to derive the microarchitecture
from the combination of AT_PLATFORM's value and a single bit.

- Use dsocaps -- this seems to be a ld.so-specific interface and not
easily usable by other programs.  ld.so/glibc is not the only program
that can use knowledge of the microarchitecture.

The following two patches:
- add the base support to binfmt_elf.c for AT_BASE_PLATFORM
- implement AT_BASE_PLATFORM for powerpc

Changes since v1:
- increment AT_VECTOR_SIZE_BASE
- define AT_BASE_PLATFORM in generic code instead of powerpc
___
Linuxppc-dev mailing list
Linuxppc-dev@ozlabs.org
https://ozlabs.org/mailman/listinfo/linuxppc-dev

1 2 3 4 5 6 7 8 9 >

1 - 100 of 884 matches

Mail list logo