Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt
On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote:

 I'm not following you.
 
 You have to enforce group/iommu domain assignment whether you have the
 existing uiommu API, or if you change it to your proposed
 ioctl(inherit_iommu) API.
 
 The only change needed to VFIO here should be to make uiommu fd assignment
 happen on the groups instead of on device fds.  That operation fails or
 succeeds according to the group semantics (all-or-none assignment/same
 uiommu).

OK, so I missed the part where you change uiommu to operate on group
fds rather than device fds; my apologies if you actually wrote that
down :-) It might be obvious... bear with me, I just flew back from the
US and I am badly jet-lagged...

So I see what you mean, however...

 I think the question is: do we force 1:1 iommu/group mapping, or do we allow
 arbitrary mapping (satisfying group constraints) as we do today.
 
 I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
 ability and definitely think the uiommu approach is cleaner than the
 ioctl(inherit_iommu) approach.  We considered that approach before but it
 seemed less clean so we went with the explicit uiommu context.

Possibly. The question that interests me the most is which interface
KVM will end up using. I'm also not terribly fond of the (perceived)
discrepancy between using uiommu to create groups and using the group fd
to actually do the mappings, at least if that is still the plan.

If the separate uiommu interface is kept, then anything that wants to be
able to benefit from the ability to put multiple devices (or existing
groups) into such a meta group would need to be explicitly modified to
deal with the uiommu APIs.

I tend to prefer such meta groups as being something you create
statically using a configuration interface, either via sysfs, netlink or
ioctls to a control vfio device driven by a simple command line tool
(which can have the configuration stored in /etc and re-apply it at
boot).

That way, any program capable of exploiting VFIO groups will
automatically be able to exploit those meta groups (or groups of
groups) as well, as long as they are supported on the system.

If we ever have system specific constraints as to how such groups can be
created, then it can all be handled at the level of that configuration
tool without impact on whatever programs know how to exploit them via
the VFIO interfaces.
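Ben's alternative keeps the all-or-none rule aafabbri describes, but applies it once, when the configuration tool creates the meta group. A minimal sketch of such a creation-time check, assuming a hypothetical in-memory model (`struct vfio_group_model` and `meta_group_attach()` are illustrative names, not VFIO interfaces): every group must be idle and either unbound or already bound to the target domain, otherwise nothing is changed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical model of a group; none of these fields exist in VFIO. */
struct vfio_group_model {
	int id;
	int domain;   /* IOMMU domain the group is attached to, -1 = none */
	int in_use;   /* non-zero if a user already has the group open */
};

/*
 * Attach every group in 'groups' to 'domain', or none of them.
 * Returns 0 on success, -1 if any group is busy or bound to a
 * different domain; on failure no group is modified.
 */
int meta_group_attach(struct vfio_group_model *groups, size_t n, int domain)
{
	size_t i;

	/* First pass: validate without side effects. */
	for (i = 0; i < n; i++) {
		if (groups[i].in_use)
			return -1;
		if (groups[i].domain != -1 && groups[i].domain != domain)
			return -1;
	}

	/* Second pass: commit, now that all members are known to be OK. */
	for (i = 0; i < n; i++)
		groups[i].domain = domain;
	return 0;
}
```

Validating everything before committing anything is what gives the all-or-none behaviour; a real implementation would also need locking so that group state cannot change between the two passes.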

   .../...
  
  If we in singleton-group land were building our own groups which were
  sets of devices sharing the IOMMU domains we wanted, I suppose we could do
  away with uiommu fds, but it sounds like the current proposal would create
  20 singleton groups (x86 iommu w/o PCI bridges = all devices are
  partitionable endpoints).  Asking me to ioctl(inherit) them together into a
  blob sounds worse than the current explicit uiommu API.
  
  I'd rather have an API to create super-groups (groups of groups)
  statically and then you can use such groups as normal groups using the
  same interface. That create/management process could be done via a
  simple command line utility or via sysfs banging, whatever...

Cheers,
Ben.

___
Linuxppc-dev mailing list
Linuxppc-dev@lists.ozlabs.org
https://lists.ozlabs.org/listinfo/linuxppc-dev


[PATCH] SPI: fix build with CONFIG_SPI_FSL_ESPI=m

2011-08-23 Thread Jiri Slaby
When spi_fsl_espi is chosen to be built as a module, there is a build
error because we test only CONFIG_SPI_FSL_ESPI in declaration of
struct mpc8xxx_spi in drivers/spi/spi_fsl_lib.h.

We need to add a test for CONFIG_SPI_FSL_ESPI_MODULE too.

The error looks like:
drivers/spi/spi_fsl_espi.c: In function 'fsl_espi_bufs':
drivers/spi/spi_fsl_espi.c:232: error: 'struct mpc8xxx_spi' has no member named 'len'
...

Signed-off-by: Jiri Slaby jsl...@suse.cz
---
 drivers/spi/spi-fsl-lib.h |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/drivers/spi/spi-fsl-lib.h b/drivers/spi/spi-fsl-lib.h
index cbe881b..97968de 100644
--- a/drivers/spi/spi-fsl-lib.h
+++ b/drivers/spi/spi-fsl-lib.h
@@ -28,7 +28,7 @@ struct mpc8xxx_spi {
/* rx & tx bufs from the spi_transfer */
const void *tx;
void *rx;
-#ifdef CONFIG_SPI_FSL_ESPI
+#if defined(CONFIG_SPI_FSL_ESPI) || defined(CONFIG_SPI_FSL_ESPI_MODULE)
int len;
 #endif
 
-- 
1.7.6
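For context on why the one-line change is needed: for a tristate option, the generated autoconf.h defines CONFIG_FOO when the option is =y but CONFIG_FOO_MODULE when it is =m, so a bare #ifdef only sees the built-in case. A minimal standalone sketch of both tests, simulating =m by defining the _MODULE symbol by hand (the helper function names are hypothetical):

```c
#include <assert.h>

/* Simulate CONFIG_SPI_FSL_ESPI=m: autoconf.h would define only this symbol. */
#define CONFIG_SPI_FSL_ESPI_MODULE 1

/* The test used before the patch: misses the =m case. */
int espi_len_present_old(void)
{
#ifdef CONFIG_SPI_FSL_ESPI
	return 1;
#else
	return 0;
#endif
}

/* The test used after the patch: true for both =y and =m. */
int espi_len_present_new(void)
{
#if defined(CONFIG_SPI_FSL_ESPI) || defined(CONFIG_SPI_FSL_ESPI_MODULE)
	return 1;
#else
	return 0;
#endif
}
```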




Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-08-23 Thread Matthieu CASTET
LiuShuo wrote:
 On 2011-08-23 00:19, Scott Wood wrote:
 On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
 Scott Wood wrote:
 To eliminate it we'd need to do an extra data transfer without reissuing
 the command, which Shuo was unable to get to work.

 That's weird because our controller seems quite flexible [1].

 Something like that should work ?

  out_be32(&lbc->fir,
   (FIR_OP_CM2 << FIR_OP0_SHIFT) |
   (FIR_OP_CA << FIR_OP1_SHIFT) |
   (FIR_OP_PA << FIR_OP2_SHIFT) |
   (FIR_OP_WB << FIR_OP3_SHIFT));
 /* refill FCM buffer with next 2k data */

  out_be32(&lbc->fir,
   (FIR_OP_WB << FIR_OP3_SHIFT) |
   (FIR_OP_CM3 << FIR_OP4_SHIFT) |
   (FIR_OP_CW1 << FIR_OP5_SHIFT) |
   (FIR_OP_RS << FIR_OP6_SHIFT));
 Something like that is what I originally suggested, but Shuo said it
 didn't work (even in theory, it requires a CE-don't-care NAND chip,
 since bus atomicity is broken).

 Shuo, what specifically did you try, and what did you see happen?

 -Scott
 First, if we want to read 4K of data with a single command, we can't
 use HW_ECC.
Yes, but as Ivan said, isn't the cost of 2 reads bigger than software ECC?

 Even if we use SW_ECC, we always get lots of weird '0xFF's between the first
 2K and the second 2K of data.
Do you understand where those 0xFFs come from (what's their size)? Doesn't the
controller try to insert the spare area?

Could you detail the sequence you used?

Matthieu


Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-08-23 Thread LiuShuo

On 2011-08-19 00:25, Scott Wood wrote:

On 08/17/2011 09:33 PM, b35...@freescale.com wrote:

From: Liu Shuo b35...@freescale.com

The Freescale FCM controller's buffer RAM is limited to 2K. In order to
support NAND flash chips whose page size is larger than 2K bytes, we
divide each page into multiple 2K pages for the MTD layer driver; in that
case, we force the page size to 2K bytes. In the fsl_elbc driver, we
convert the MTD layer's page address into a real page address in the
flash chip plus a column index. Any column address can be issued with
the UA instruction of the eLBC controller.
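The address conversion described above can be sketched as follows, under the simplifying assumption (ours, not necessarily the patch's) that the 2K chunks of a real page sit back to back in the main area and the spare area is ignored. `subpage_to_chip()` and `struct nand_addr` are hypothetical names:

```c
#include <assert.h>
#include <stdint.h>

#define FCM_BUF_SIZE 2048u /* FCM buffer RAM limit, per the patch description */

/* Hypothetical result type: real page number plus byte column within it. */
struct nand_addr {
	uint32_t page;
	uint32_t column;
};

/*
 * Map an MTD-layer 2K page number onto a chip with larger pages, assuming
 * the 2K chunks of each real page are laid out back to back.
 */
struct nand_addr subpage_to_chip(uint32_t mtd_page, uint32_t chip_page_size)
{
	uint32_t per_page = chip_page_size / FCM_BUF_SIZE; /* 2K chunks per real page */
	struct nand_addr a;

	assert(per_page >= 1);
	a.page = mtd_page / per_page;
	a.column = (mtd_page % per_page) * FCM_BUF_SIZE;
	return a;
}
```

For a 4096-byte chip, MTD page 5 maps to real page 2, column 2048; the column is what would then be issued with UA.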

NOTE: Due to the limitation on the 'Number of Partial Program Cycles in
the Same Page (NOP)', flash chips supported by this workaround have to
meet the following conditions:
1. page size is not greater than 4KB
2.  1) if main area and spare area have independent NOPs:
  main  area NOP >= 3
  spare area NOP >= 2?

How often are the NOPs split like this?


2) if main area and spare area have a common NOP:
  NOP   >= 4

This depends on how the flash is used.  If you treat it as a NOP1 flash
(e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
real 2K chip, you'll need NOP8 for a 4K chip.

The NOP restrictions should be documented in the code itself, not just
in the git changelog.  Maybe print it to the console when this hack is
used, along with the NOP value read from the ID.


We can't read the NOP from the ID on every chip. Some chips don't
give this information (e.g. Micron MT29F4G08BAC).

So it is hard to determine in the code whether probe() should fail.
Maybe we should always print the NOP restrictions when this hack is used,
and let customers choose how to use the flash on their board.

-LiuShuo

If it's less than 4
for 4K or 8 for 8K, also print a message saying not to use jffs2 (does
yaffs2 do similar things?).  If it's less than 2 for 4K or 4 for 8K, the
probe should fail.

-Scott
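Scott's suggested policy, combined with LiuShuo's point that the NOP cannot always be read from the ID, fits in a small decision function. This is a sketch: `check_nop()` and the verdict names are hypothetical, a `nop` of 0 stands for "unknown", and N is the number of emulated 2K pages per real page (2 for 4K, 4 for 8K). The probe fails below N partial programs, and a jffs2 warning is printed below 2N:

```c
#include <assert.h>

enum nop_verdict {
	NOP_UNKNOWN,       /* NOP not available from the ID: just print the restrictions */
	NOP_OK,
	NOP_WARN_NO_JFFS2, /* usable, but warn against jffs2 */
	NOP_FAIL_PROBE,    /* too few partial programs: refuse to probe */
};

/* nop == 0 means "could not be read"; page_size is the real chip page size. */
enum nop_verdict check_nop(unsigned int nop, unsigned int page_size)
{
	unsigned int subpages = page_size / 2048; /* emulated 2K pages per real page */

	if (nop == 0)
		return NOP_UNKNOWN;
	if (nop < subpages)
		return NOP_FAIL_PROBE;
	if (nop < 2 * subpages)
		return NOP_WARN_NO_JFFS2;
	return NOP_OK;
}
```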




Re: linux-next: boot test failure (net tree)

2011-08-23 Thread Jeff Kirsher
On Mon, 2011-08-22 at 21:02 -0700, David Miller wrote:
 From: Arnaud Lacombe lacom...@gmail.com
 Date: Mon, 22 Aug 2011 23:50:02 -0400
 
  Are you implying we need some kind of way to migrate config ?
 
 The issue is that the dependencies for every single ethernet driver
 have changed.  Some dependencies have been dropped (f.e. NETDEV_1)
 and some have been added (f.e. ETHERNET, NET_VENDOR_*).
 
 So right now an automated (non-prompted, default to no on all new
 options) run on an existing config results in all ethernet drivers
 getting disabled because the new dependencies don't get enabled.
 
 This wouldn't be so bad if it was just one or two drivers, but in
 this case it's every single ethernet driver which will have and hit
 this problem.
 

OK, I have a patch which will resolve the issue.  It is the last patch in
the series I am about to send out.  What this patch does is set the
new Kconfig options to Y, so that current defconfigs can still build the
drivers that are currently set to build.

This will fix the issue; I have confirmed this with the x86_64
defconfig.  It would be nice if eventually all configs got updated so
that not all the NET_VENDOR_* options have to be enabled, but
understandably this is the best way to ensure that current defconfigs
will compile all expected drivers.
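David's point is that an automated, non-prompted config run gives every new symbol its default, and a driver is only built when all of its dependencies are enabled. A toy model of that (the struct and helpers are hypothetical; only the ETHERNET and NET_VENDOR_* names come from the thread) shows why setting the new options to Y restores the drivers:

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model of one ethernet driver and its new dependencies. */
struct net_cfg {
	bool ethernet;    /* new ETHERNET option */
	bool net_vendor;  /* new per-vendor NET_VENDOR_* option */
	bool driver;      /* the user's existing choice for the driver itself */
};

/* A non-prompted run keeps old answers and gives new options their default. */
struct net_cfg migrate(bool old_driver_choice, bool new_opts_default)
{
	struct net_cfg c = { new_opts_default, new_opts_default, old_driver_choice };
	return c;
}

/* The driver is built only if it and all of its dependencies are enabled. */
bool driver_built(const struct net_cfg *c)
{
	return c->driver && c->ethernet && c->net_vendor;
}
```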



Re: [PATCH] SPI: fix build with CONFIG_SPI_FSL_ESPI=m

2011-08-23 Thread Jiri Slaby
On 08/23/2011 09:59 AM, Jiri Slaby wrote:
 When spi_fsl_espi is chosen to be built as a module, there is a build
 error because we test only CONFIG_SPI_FSL_ESPI in declaration of
 struct mpc8xxx_spi in drivers/spi/spi_fsl_lib.h.
 
 We need to add a test for CONFIG_SPI_FSL_ESPI_MODULE too.
 
 The error looks like:
 drivers/spi/spi_fsl_espi.c: In function 'fsl_espi_bufs':
 drivers/spi/spi_fsl_espi.c:232: error: 'struct mpc8xxx_spi' has no member named 'len'
 ...
 
 Signed-off-by: Jiri Slaby jsl...@suse.cz
 ---
  drivers/spi/spi-fsl-lib.h |2 +-
  1 files changed, 1 insertions(+), 1 deletions(-)
 
 diff --git a/drivers/spi/spi-fsl-lib.h b/drivers/spi/spi-fsl-lib.h
 index cbe881b..97968de 100644
 --- a/drivers/spi/spi-fsl-lib.h
 +++ b/drivers/spi/spi-fsl-lib.h
 @@ -28,7 +28,7 @@ struct mpc8xxx_spi {
  /* rx & tx bufs from the spi_transfer */
   const void *tx;
   void *rx;
 -#ifdef CONFIG_SPI_FSL_ESPI
 +#if defined(CONFIG_SPI_FSL_ESPI) || defined(CONFIG_SPI_FSL_ESPI_MODULE)
   int len;
  #endif

Oh, and there are still link errors:
ERROR: mpc8xxx_spi_tx_buf_u32 [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_rx_buf_u32 [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_tx_buf_u16 [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_rx_buf_u16 [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_tx_buf_u8 [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_rx_buf_u8 [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: of_mpc8xxx_spi_probe [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_strmode [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_probe [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_remove [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: to_of_pinfo [drivers/spi/spi_fsl_spi.ko] undefined!
ERROR: mpc8xxx_spi_tx_buf_u32 [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: mpc8xxx_spi_rx_buf_u32 [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: of_mpc8xxx_spi_probe [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: mpc8xxx_spi_probe [drivers/spi/spi_fsl_espi.ko] undefined!
ERROR: mpc8xxx_spi_remove [drivers/spi/spi_fsl_espi.ko] undefined!

The functions are not exported...

Should I export all of those, or disallow CONFIG_SPI_FSL_ESPI=m?

thanks,
-- 
js


Re: [PATCH 1/2] [hw-breakpoint] Use generic hw-breakpoint interfaces for new PPC ptrace flags

2011-08-23 Thread K.Prasad
On Tue, Aug 23, 2011 at 03:08:50PM +1000, David Gibson wrote:
 On Fri, Aug 19, 2011 at 01:21:36PM +0530, K.Prasad wrote:
  PPC_PTRACE_GETHWDBGINFO, PPC_PTRACE_SETHWDEBUG and PPC_PTRACE_DELHWDEBUG are
  PowerPC specific ptrace flags that use the watchpoint register. While they are
  targeted primarily towards BookE users, user-space applications such as GDB
  have started using them for BookS too.
  
  This patch enables the use of generic hardware breakpoint interfaces for these
  new flags. The version number of the associated data structures
  ppc_hw_breakpoint and ppc_debug_info is incremented to denote new semantics.
 
 So, the structure itself doesn't seem to have been extended.  I don't
 understand what the semantic difference is - your patch comment needs
 to explain this clearly.


We had a request to extend the structure but thought it was dangerous to
do so. For instance if the user-space used version1 of the structure,
while kernel did a copy_to_user() pertaining to version2, then we'd run
into problems. Unfortunately the ptrace flags weren't designed to accept
a version number as input from the user through the
PPC_PTRACE_GETHWDBGINFO flag (which would have solved this issue).

I'll add a comment about the change in semantics, such as the ability to
accept 'range' breakpoints on BookS.
 
  Apart from the usual benefits of using generic hw-breakpoint interfaces, these
  changes allow debuggers (such as GDB) to use a common set of ptrace flags for
  their watchpoint needs and allow more precise breakpoint specification (length
  of the variable can be specified).
 
 What is the mechanism for implementing the range breakpoint on book3s?
 

The hw-breakpoint interface accepts length as an argument on BookS (any
value <= 8 bytes) and filters out extraneous interrupts arising from
accesses outside the range [addr, addr + len] inside the
hw_breakpoint_handler function.

We put that ability to use here.
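The filtering described above can be sketched as follows. Assumptions (ours, for illustration): the DABR matches any access within one aligned 8-byte doubleword, only the start address of an access is considered, the requested range does not cross a doubleword, and the range is treated as half-open. `dabr_fires()` and `report_hit()` are hypothetical helpers, not the kernel's hw_breakpoint_handler code:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define DABR_GRANULE 8u /* the DABR watches one aligned doubleword */

/* Would the hardware fire? Any access inside the watched doubleword does. */
bool dabr_fires(uint64_t bp_addr, uint64_t access_addr)
{
	const uint64_t mask = ~(uint64_t)(DABR_GRANULE - 1);

	return (access_addr & mask) == (bp_addr & mask);
}

/*
 * Software filter: report the hit only if the access starts inside
 * [bp_addr, bp_addr + len). Everything else is an extraneous interrupt.
 */
bool report_hit(uint64_t bp_addr, uint64_t len, uint64_t access_addr)
{
	return dabr_fires(bp_addr, access_addr) &&
	       access_addr >= bp_addr && access_addr < bp_addr + len;
}
```

With a 4-byte watch at 0x1000, an access at 0x1006 still triggers the hardware but is filtered out, while an access at 0x1002 is reported.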

  [Edjunior: Identified an issue in the patch with the sanity check for version
  numbers]
  
  Tested-by: Edjunior Barbosa Machado emach...@linux.vnet.ibm.com
  Signed-off-by: K.Prasad pra...@linux.vnet.ibm.com
  ---
   Documentation/powerpc/ptrace.txt |   16 ++
   arch/powerpc/kernel/ptrace.c |  104 +++---
   2 files changed, 112 insertions(+), 8 deletions(-)
  
  diff --git a/Documentation/powerpc/ptrace.txt b/Documentation/powerpc/ptrace.txt
  index f4a5499..97301ae 100644
  --- a/Documentation/powerpc/ptrace.txt
  +++ b/Documentation/powerpc/ptrace.txt
  @@ -127,6 +127,22 @@ Some examples of using the structure to:
 p.addr2   = (uint64_t) end_range;
 p.condition_value = 0;
   
  +- set a watchpoint in server processors (BookS) using version 2
  +
  +  p.version = 2;
  +  p.trigger_type= PPC_BREAKPOINT_TRIGGER_RW;
  +  p.addr_mode   = PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE;
  +  or
  +  p.addr_mode   = PPC_BREAKPOINT_MODE_RANGE_EXACT;
  +
  +  p.condition_mode  = PPC_BREAKPOINT_CONDITION_NONE;
  +  p.addr= (uint64_t) begin_range;
  +  /* For PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE addr2 needs to be specified, where
  +   * addr2 - addr <= 8 Bytes.
  +   */
  +  p.addr2   = (uint64_t) end_range;
  +  p.condition_value = 0;
  +
   3. PTRACE_DELHWDEBUG
   
   Takes an integer which identifies an existing breakpoint or watchpoint
  diff --git a/arch/powerpc/kernel/ptrace.c b/arch/powerpc/kernel/ptrace.c
  index 05b7dd2..18d28b6 100644
  --- a/arch/powerpc/kernel/ptrace.c
  +++ b/arch/powerpc/kernel/ptrace.c
  @@ -1339,11 +1339,17 @@ static int set_dac_range(struct task_struct *child,
   static long ppc_set_hwdebug(struct task_struct *child,
   struct ppc_hw_breakpoint *bp_info)
   {
  +#ifdef CONFIG_HAVE_HW_BREAKPOINT
  +   int ret, len = 0;
  +   struct thread_struct *thread = &(child->thread);
  +   struct perf_event *bp;
  +   struct perf_event_attr attr;
  +#endif /* CONFIG_HAVE_HW_BREAKPOINT */
 
 I'm confused.  This compiled before on book3s, and I don't see any
 changes to Makefile or Kconfig in the patch that will result in this
 code compiling when it previously didn't.  Why are these new guards
 added?
 

The code is guarded using the CONFIG_ flags for two reasons.
a) We don't want the code to be included for BookE and other
architectures.
b) In BookS, we're now adding a new ability based on whether
CONFIG_HAVE_HW_BREAKPOINT is defined. Presently this config option is
kept on by default, however there are plans to make this a config-time
option.

   #ifndef CONFIG_PPC_ADV_DEBUG_REGS
  unsigned long dabr;
   #endif
   
  -   if (bp_info->version != 1)
  +   if ((bp_info->version != 1) && (bp_info->version != 2))
  return -ENOTSUPP;
   #ifdef CONFIG_PPC_ADV_DEBUG_REGS
  /*
  @@ -1382,13 +1388,9 @@ static long ppc_set_hwdebug(struct task_struct *child,
   */
  if ((bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_RW) == 0 ||
 

Re: [PATCH 2/2] [PowerPC Book3E] Introduce new ptrace debug feature flag

2011-08-23 Thread K.Prasad
On Tue, Aug 23, 2011 at 03:09:31PM +1000, David Gibson wrote:
 On Fri, Aug 19, 2011 at 01:23:38PM +0530, K.Prasad wrote:
  
  While PPC_PTRACE_SETHWDEBUG ptrace flag in PowerPC accepts
  PPC_BREAKPOINT_MODE_EXACT mode of breakpoint, the same is not intimated to the
  user-space debuggers (like GDB) who may want to use it. Hence we introduce a
  new PPC_DEBUG_FEATURE_DATA_BP_EXACT flag which will be populated on the
  features member of struct ppc_debug_info to advertise support for the
  same on Book3E PowerPC processors.
 
 I thought the idea was that the BP_EXACT mode was the default - if the
 new interface was supported at all, then BP_EXACT was always
 supported.  So, why do you need a new flag?
 

Yes, BP_EXACT was always supported but not advertised through
PPC_PTRACE_GETHWDBGINFO. We're now doing that.
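On the debugger side, advertising the mode means a tool such as GDB can test a bit in the `features` word returned by PPC_PTRACE_GETHWDBGINFO instead of assuming support. A sketch, with a hypothetical stand-in struct and a placeholder bit value (the real PPC_DEBUG_FEATURE_DATA_BP_EXACT value is whatever the patch defines):

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Placeholder bit: NOT the real value of PPC_DEBUG_FEATURE_DATA_BP_EXACT. */
#define HYPOTHETICAL_DATA_BP_EXACT 0x100u

/* Minimal stand-in for the fields of struct ppc_debug_info used here. */
struct debug_info_model {
	uint32_t version;
	uint64_t features;
};

/* Debugger-side check after PPC_PTRACE_GETHWDBGINFO has filled in 'info'. */
bool exact_watchpoints_advertised(const struct debug_info_model *info)
{
	return (info->features & HYPOTHETICAL_DATA_BP_EXACT) != 0;
}
```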

Thanks,
K.Prasad



Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-08-23 Thread LiuShuo

On 2011-08-23 16:14, Matthieu CASTET wrote:

LiuShuo wrote:

On 2011-08-23 00:19, Scott Wood wrote:

On 08/22/2011 11:13 AM, Matthieu CASTET wrote:

Scott Wood wrote:

To eliminate it we'd need to do an extra data transfer without reissuing
the command, which Shuo was unable to get to work.


That's weird because our controller seems quite flexible [1].

Something like that should work ?

  out_be32(&lbc->fir,
   (FIR_OP_CM2 << FIR_OP0_SHIFT) |
   (FIR_OP_CA << FIR_OP1_SHIFT) |
   (FIR_OP_PA << FIR_OP2_SHIFT) |
   (FIR_OP_WB << FIR_OP3_SHIFT));
/* refill FCM buffer with next 2k data */

  out_be32(&lbc->fir,
   (FIR_OP_WB << FIR_OP3_SHIFT) |
   (FIR_OP_CM3 << FIR_OP4_SHIFT) |
   (FIR_OP_CW1 << FIR_OP5_SHIFT) |
   (FIR_OP_RS << FIR_OP6_SHIFT));

Something like that is what I originally suggested, but Shuo said it
didn't work (even in theory, it requires a CE-don't-care NAND chip,
since bus atomicity is broken).

Shuo, what specifically did you try, and what did you see happen?

-Scott

First, if we want to read 4K of data with a single command, we can't
use HW_ECC.

Yes, but as Ivan said, isn't the cost of 2 reads bigger than software ECC?


Even if we use SW_ECC, we always get lots of weird '0xFF's between the first
2K and the second 2K of data.

Do you understand where those 0xFFs come from (what's their size)? Doesn't the
controller try to insert the spare area?
I don't understand. I set FBCR to 2048, so the controller should read the
main area without the spare area.

But their size is nearly the spare area size (give or take a few bytes).
I can't guess the behavior of the controller, so I chose another way.

Could you try it and explain where those 0xFFs come from?

Could you detail the sequence you used?


First half:
  out_be32(&lbc->fbcr, 2048);
  out_be32(&lbc->fir,
   (FIR_OP_CM0 << FIR_OP0_SHIFT) |
   (FIR_OP_CA << FIR_OP1_SHIFT) |
   (FIR_OP_PA << FIR_OP2_SHIFT) |
   (FIR_OP_CM1 << FIR_OP3_SHIFT) |
   (FIR_OP_RBW << FIR_OP4_SHIFT));


Second half:
out_be32(&lbc->fbcr, 2048);
out_be32(&lbc->fir,
   (FIR_OP_RB << FIR_OP0_SHIFT) |
   (FIR_OP_RBW << FIR_OP1_SHIFT));


-Liu Shuo
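The FIR values in these sequences are just 4-bit opcodes packed from the most significant nibble down. A sketch that packs a sequence into the 32-bit value, which makes it easy to compare what each proposal actually writes. The opcode values are recalled from fsl_lbc.h and should be checked against the header before relying on them, and `fir_pack()` is a hypothetical helper:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Opcode values as recalled from fsl_lbc.h; treat them as assumptions. */
enum fir_op {
	FIR_OP_CA  = 0x1, /* issue current column address */
	FIR_OP_PA  = 0x2, /* issue current page address */
	FIR_OP_CM0 = 0x4, /* issue command from FCR[CMD0] */
	FIR_OP_CM1 = 0x5,
	FIR_OP_CM2 = 0x6,
	FIR_OP_CM3 = 0x7,
	FIR_OP_WB  = 0x8, /* write FBCR bytes from the FCM buffer */
	FIR_OP_RB  = 0xA, /* read FBCR bytes into the FCM buffer */
	FIR_OP_RS  = 0xB, /* read 1 or 2 bytes (e.g. status) */
	FIR_OP_CW1 = 0xD, /* wait, then issue FCR[CMD1] */
	FIR_OP_RBW = 0xE, /* wait, then read FBCR bytes */
};

/*
 * Pack up to eight opcodes into a FIR value. OP0 occupies the most
 * significant nibble, i.e. FIR_OPn_SHIFT == 28 - 4 * n.
 */
uint32_t fir_pack(const enum fir_op *ops, size_t n)
{
	uint32_t fir = 0;
	size_t i;

	assert(n <= 8);
	for (i = 0; i < n; i++)
		fir |= (uint32_t)ops[i] << (28 - 4 * i);
	return fir;
}
```

Under these values, LiuShuo's first-half sequence (CM0, CA, PA, CM1, RBW) packs to 0x4125E000 and the second half (RB, RBW) to 0xAE000000; unused trailing slots stay 0, the NOP/end-of-sequence opcode.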


Matthieu






Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-08-23 Thread Matthieu CASTET
LiuShuo wrote:
 On 2011-08-19 00:25, Scott Wood wrote:
 On 08/17/2011 09:33 PM, b35...@freescale.com wrote:
 From: Liu Shuo b35...@freescale.com

 The Freescale FCM controller's buffer RAM is limited to 2K. In order to
 support NAND flash chips whose page size is larger than 2K bytes, we
 divide each page into multiple 2K pages for the MTD layer driver; in that
 case, we force the page size to 2K bytes. In the fsl_elbc driver, we
 convert the MTD layer's page address into a real page address in the
 flash chip plus a column index. Any column address can be issued with
 the UA instruction of the eLBC controller.

 NOTE: Due to the limitation on the 'Number of Partial Program Cycles in
 the Same Page (NOP)', flash chips supported by this workaround have to
 meet the following conditions:
 1. page size is not greater than 4KB
 2.  1) if main area and spare area have independent NOPs:
   main  area NOP >= 3
   spare area NOP >= 2?
 How often are the NOPs split like this?

 2) if main area and spare area have a common NOP:
   NOP   >= 4
 This depends on how the flash is used.  If you treat it as a NOP1 flash
 (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
 NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
 real 2K chip, you'll need NOP8 for a 4K chip.

 The NOP restrictions should be documented in the code itself, not just
 in the git changelog.  Maybe print it to the console when this hack is
 used, along with the NOP value read from the ID.
 
 We can't read the NOP from the ID on every chip. Some chips don't
 give this information (e.g. Micron MT29F4G08BAC).
Doesn't the Micron chip provide it in its ONFI info?

Matthieu

Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-08-23 Thread Matthieu CASTET
LiuShuo wrote:
 On 2011-08-23 16:14, Matthieu CASTET wrote:
 LiuShuo wrote:
 On 2011-08-23 00:19, Scott Wood wrote:
 On 08/22/2011 11:13 AM, Matthieu CASTET wrote:
 Scott Wood wrote:
 To eliminate it we'd need to do an extra data transfer without reissuing
 the command, which Shuo was unable to get to work.

 That's weird because our controller seems quite flexible [1].

 Something like that should work ?

   out_be32(&lbc->fir,
(FIR_OP_CM2 << FIR_OP0_SHIFT) |
(FIR_OP_CA << FIR_OP1_SHIFT) |
(FIR_OP_PA << FIR_OP2_SHIFT) |
(FIR_OP_WB << FIR_OP3_SHIFT));
 /* refill FCM buffer with next 2k data */

   out_be32(&lbc->fir,
(FIR_OP_WB << FIR_OP3_SHIFT) |
(FIR_OP_CM3 << FIR_OP4_SHIFT) |
(FIR_OP_CW1 << FIR_OP5_SHIFT) |
(FIR_OP_RS << FIR_OP6_SHIFT));
 Something like that is what I originally suggested, but Shuo said it
 didn't work (even in theory, it requires a CE-don't-care NAND chip,
 since bus atomicity is broken).

 Shuo, what specifically did you try, and what did you see happen?

 -Scott
 First, if we want to read 4K of data with a single command, we can't
 use HW_ECC.
 Yes, but as Ivan said, isn't the cost of 2 reads bigger than software
 ECC?

 Even if we use SW_ECC, we always get lots of weird '0xFF's between the first
 2K and the second 2K of data.
 Do you understand where those 0xFFs come from (what's their size)? Doesn't
 the controller try to insert the spare area?
 I don't understand. I set FBCR to 2048, so the controller should read the
 main area without the spare area.
 But their size is nearly the spare area size (give or take a few bytes).
 I can't guess the behavior of the controller, so I chose another way.
 
 Could you try it and explain where those 0xFFs come from?
 Could you detail the sequence you used?

 First half:
out_be32(&lbc->fbcr, 2048);
Shouldn't you read 2K+64 here? In the end you want 4K plus the spare area (128)?

out_be32(&lbc->fir,
 (FIR_OP_CM0 << FIR_OP0_SHIFT) |
 (FIR_OP_CA << FIR_OP1_SHIFT) |
 (FIR_OP_PA << FIR_OP2_SHIFT) |
 (FIR_OP_CM1 << FIR_OP3_SHIFT) |
 (FIR_OP_RBW << FIR_OP4_SHIFT));
 
 
 Second half:
  out_be32(&lbc->fbcr, 2048);
  out_be32(&lbc->fir,
 (FIR_OP_RB << FIR_OP0_SHIFT) |
 (FIR_OP_RBW << FIR_OP1_SHIFT));
Why do you issue FIR_OP_RBW?
FIR_OP_RB already fetches the data.

Matthieu

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Joerg Roedel
On Mon, Aug 22, 2011 at 08:52:18PM -0400, aafabbri wrote:
 You have to enforce group/iommu domain assignment whether you have the
 existing uiommu API, or if you change it to your proposed
 ioctl(inherit_iommu) API.
 
 The only change needed to VFIO here should be to make uiommu fd assignment
 happen on the groups instead of on device fds.  That operation fails or
 succeeds according to the group semantics (all-or-none assignment/same
 uiommu).

That makes uiommu basically the same as the meta-groups, right?

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Joerg Roedel
On Tue, Aug 23, 2011 at 02:54:43AM -0400, Benjamin Herrenschmidt wrote:
 Possibly. The question that interests me the most is which interface
 KVM will end up using. I'm also not terribly fond of the (perceived)
 discrepancy between using uiommu to create groups and using the group fd
 to actually do the mappings, at least if that is still the plan.
 
 If the separate uiommu interface is kept, then anything that wants to be
 able to benefit from the ability to put multiple devices (or existing
 groups) into such a meta group would need to be explicitly modified to
 deal with the uiommu APIs.
 
 I tend to prefer such meta groups as being something you create
 statically using a configuration interface, either via sysfs, netlink or
 ioctls to a control vfio device driven by a simple command line tool
 (which can have the configuration stored in /etc and re-apply it at
 boot).

Hmm, I don't think that these groups are static for the system's
run-time. They only exist for the lifetime of a guest by default, at
least on x86. That's why I prefer to do this grouping using VFIO and not
some sysfs interface (which would be the third interface, besides the
ioctls and netlink, that a VFIO user needs to be aware of). Doing this in
the ioctl interface just makes things easier.

Joerg



[PATCH] powerpc: fixup QE_General4 errata

2011-08-23 Thread Joakim Tjernlund
QE_General4 should only round up the divisor iff divisor is > 3.
Rounding up lower divisors makes the error too big, causing USB
on MPC832x to fail.
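The corrected rule is small enough to check in isolation. A sketch of just the errata step, with `qe_brg_fix_divisor()` as a hypothetical helper mirroring the patched condition (the surrounding qe_setbrg() logic is omitted):

```c
#include <assert.h>

/*
 * Errata QE_General4 as patched: when not in divide-by-16 mode, an odd
 * BRG divisor is rounded up to the next even value, but only if it is
 * greater than 3.
 */
unsigned int qe_brg_fix_divisor(unsigned int divisor, int div16)
{
	if (!div16 && (divisor & 1) && (divisor > 3))
		divisor++;
	return divisor;
}
```

With the old condition a divisor of 3 became 4, lowering the BRG output from clk/3 to clk/4, i.e. by 25%.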

Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se
---
 arch/powerpc/sysdev/qe_lib/qe.c |2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)

diff --git a/arch/powerpc/sysdev/qe_lib/qe.c b/arch/powerpc/sysdev/qe_lib/qe.c
index 093e0ae..5399316 100644
--- a/arch/powerpc/sysdev/qe_lib/qe.c
+++ b/arch/powerpc/sysdev/qe_lib/qe.c
@@ -216,7 +216,7 @@ int qe_setbrg(enum qe_clock brg, unsigned int rate, unsigned int multiplier)
/* Errata QE_General4, which affects some MPC832x and MPC836x SOCs, says
   that the BRG divisor must be even if you're not using divide-by-16
   mode. */
-   if (!div16 && (divisor & 1))
+   if (!div16 && (divisor & 1) && (divisor > 3))
divisor++;
 
tempval = ((divisor - 1) << QE_BRGC_DIVISOR_SHIFT) |
-- 
1.7.3.4



[PATCH] usb: Allocate pram dynamically.

2011-08-23 Thread Joakim Tjernlund
MPC832x does not have enough MURAM to do fixed MURAM allocation.
Change to dynamic allocation.
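Why a fixed offset can fail where a dynamic allocation succeeds can be shown with a toy allocator. This is not the kernel's cpm_muram code; the struct, helpers and sizes are hypothetical, and the 64-byte alignment only mirrors the `cpm_muram_alloc(FHCI_PRAM_SIZE, 64)` call in the patch:

```c
#include <assert.h>
#include <stddef.h>

#define MURAM_FAIL ((size_t)-1)

/* Toy MURAM: a bump allocator over 'size' bytes. */
struct muram_model {
	size_t size;  /* total MURAM on this SoC */
	size_t next;  /* first never-allocated byte */
};

/* Fixed placement: only works if the requested window is free and fits. */
size_t muram_alloc_fixed_model(struct muram_model *m, size_t offset, size_t len)
{
	if (offset < m->next || offset + len > m->size)
		return MURAM_FAIL;
	m->next = offset + len;
	return offset;
}

/* Dynamic placement: next free offset rounded up to 'align' (a power of two). */
size_t muram_alloc_model(struct muram_model *m, size_t len, size_t align)
{
	size_t start = (m->next + align - 1) & ~(align - 1);

	if (start + len > m->size)
		return MURAM_FAIL;
	m->next = start + len;
	return start;
}
```

A fixed offset chosen for a part with more MURAM simply falls off the end of a smaller one, while the dynamic request lands wherever space remains.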

Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se
---
 drivers/usb/host/fhci-hcd.c |5 -
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/fhci-hcd.c b/drivers/usb/host/fhci-hcd.c
index c7c8392..98adbe8 100644
--- a/drivers/usb/host/fhci-hcd.c
+++ b/drivers/usb/host/fhci-hcd.c
@@ -622,12 +622,15 @@ static int __devinit of_fhci_probe(struct of_device *ofdev,
goto err_pram;
}
 
-   pram_addr = cpm_muram_alloc_fixed(iprop[2], FHCI_PRAM_SIZE);
+   pram_addr = cpm_muram_alloc(FHCI_PRAM_SIZE, 64);
if (IS_ERR_VALUE(pram_addr)) {
dev_err(dev, "failed to allocate usb pram\n");
ret = -ENOMEM;
goto err_pram;
}
+
+   qe_issue_cmd(QE_ASSIGN_PAGE_TO_DEVICE, QE_CR_SUBBLOCK_USB,
+QE_CR_PROTOCOL_UNSPECIFIED, pram_addr);
fhci-pram = cpm_muram_addr(pram_addr);
 
/* GPIOs and pins */
-- 
1.7.3.4



[PATCH] perf: fix build for PowerPC with uclibc toolchains

2011-08-23 Thread Florian Fainelli
libio.h is not provided by uClibc. In order to be able to test for the
definition of __UCLIBC__, we need to include stdlib.h, which also
includes stddef.h, providing the definition of 'NULL'.
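The header trick relies on <stdlib.h> pulling in the C library's feature header, which is where __UCLIBC__ (or glibc's __GLIBC__) gets defined. A small sketch of the same detection; `libc_flavor()` is a hypothetical helper. As far as I know, uClibc also defines __GLIBC__ for compatibility, so the __UCLIBC__ test has to come first:

```c
#include <assert.h>
#include <stdlib.h> /* pulls in the libc feature macros (__UCLIBC__, __GLIBC__) */

const char *libc_flavor(void)
{
#if defined(__UCLIBC__)
	return "uClibc";   /* checked first: uClibc may also define __GLIBC__ */
#elif defined(__GLIBC__)
	return "glibc";
#else
	return "unknown";  /* e.g. musl, which defines neither */
#endif
}
```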

Signed-off-by: Florian Fainelli flor...@openwrt.org
---
FYI, I submitted the exact same patch for ARM:
https://patchwork.kernel.org/patch/1049152/

diff --git a/tools/perf/arch/powerpc/util/dwarf-regs.c b/tools/perf/arch/powerpc/util/dwarf-regs.c
index 48ae0c5..7cdd61d 100644
--- a/tools/perf/arch/powerpc/util/dwarf-regs.c
+++ b/tools/perf/arch/powerpc/util/dwarf-regs.c
@@ -9,7 +9,10 @@
  * 2 of the License, or (at your option) any later version.
  */
 
+#include <stdlib.h>
+#ifndef __UCLIBC__
 #include <libio.h>
+#endif
 #include <dwarf-regs.h>
 
 
-- 
1.7.4.1



RE: [PATCH] RapidIO: Fix use of non-compatible registers

2011-08-23 Thread Bounine, Alexandre
Andrew Morton a...@linux-foundation.org wrote:

 On Tue, 26 Jul 2011 14:07:26 -0400
 Alexandre Bounine alexandre.boun...@idt.com wrote:
 
  Replace/remove use of RIO v.1.2 registers/bits that are not forward-compatible
  with newer versions of RapidIO specification.
 
  RapidIO specification v. 1.3 removed Write Port CSR, Doorbell CSR,
  Mailbox CSR and Mailbox and Doorbell bits of the PEF CAR.
 
.
 You did a cc:stable but provided no reason (that I can understand) for
 backporting the patch.  Please explain why the problem is sufficiently
 serious to warrant this action.

My reason for this is that the use of register bits removed since RIO v.1.3
affects users of currently available 1.3- and 2.x-compliant devices who may
use not-so-recent kernel versions.

Removing checks for unsupported bits makes the corresponding routines
compatible with all versions of the RapidIO specification. Therefore,
backporting makes stable kernel versions compliant with RIO v.1.3 and later
as well.





Re: [PATCH] usb: Allocate pram dynamically.

2011-08-23 Thread Anton Vorontsov
On Tue, Aug 23, 2011 at 02:38:41PM +0200, Joakim Tjernlund wrote:
 MPC832x does not have enough MURAM to do fixed MURAM allocation.
 Change to dynamic allocation.
 
 Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se

Acked-by: Anton Vorontsov cbouatmai...@gmail.com

Thanks!

p.s. You probably want to send this to Greg KH, + Cc linux-usb
mailing list.

-- 
Anton Vorontsov
Email: cbouatmai...@gmail.com


Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 05:03:53PM -0400, Benjamin Herrenschmidt wrote:
 
  I am in favour of /dev/vfio/$GROUP. If multiple devices should be
  assigned to a guest, there can also be an ioctl to bind a group to an
  address-space of another group (certainly needs some care to not allow
  that both groups belong to different processes).
  
  Btw, a problem we haven't fully talked about yet is
  driver-deassignment. User space can decide to de-assign the device from
  vfio while an fd is open on it. With PCI there is no way to let this fail
  (the .release function returns void, last time I checked). Is this a
  problem, and if so, how do we handle it?
 
 We can treat it as a hard unplug (like a cardbus gone away).
 
 I.e. dispose of the direct mappings (switch to MMIO emulation) and return
 all ff's from reads (and ignore writes).
 
 Then send an unplug event via whatever mechanism the platform provides
 (ACPI hotplug controller on x86 for example, we haven't quite sorted out
 what to do on power for hotplug yet).

Hmm, good idea. But as far as I know the hotplug event needs to reach
the guest _before_ the device is actually unplugged (so that the guest
can unbind its driver first). That brings back the sleep idea
and the timeout in the .release function.

Joerg

-- 
AMD Operating System Research Center

Advanced Micro Devices GmbH Einsteinring 24 85609 Dornach
General Managers: Alberto Bozzo, Andrew Bowd
Registration: Dornach, Landkr. Muenchen; Registerger. Muenchen, HRB Nr. 43632



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Roedel, Joerg
On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote:
 On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:

  I am in favour of /dev/vfio/$GROUP. If multiple devices should be
  assigned to a guest, there can also be an ioctl to bind a group to an
  address-space of another group (certainly needs some care to not allow
  that both groups belong to different processes).
 
 That's an interesting idea.  Maybe an interface similar to the current
 uiommu interface, where you open() the 2nd group fd and pass the fd via
 ioctl to the primary group.  IOMMUs that don't support this would fail
 the attach device callback, which would fail the ioctl to bind them.  It
 will need to be designed so any group can be removed from the super-set
 and the remaining group(s) still work.  This feels like something that
 can be added after we get an initial implementation.

Handling it through fds is a good idea. This makes sure that everything
belongs to one process. I am not yet sure whether we should just bind
plain groups together or create meta-groups. The meta-group approach
seems somewhat cleaner, though.

  Btw, a problem we haven't fully talked about yet is
  driver-deassignment. User space can decide to de-assign the device from
  vfio while an fd is open on it. With PCI there is no way to let this fail
  (the .release function returns void, last time I checked). Is this a
  problem, and if so, how do we handle it?
 
 The current vfio has the same problem, we can't unbind a device from
 vfio while it's attached to a guest.  I think we'd use the same solution
 too; send out a netlink packet for a device removal and have the .remove
 call sleep on a wait_event(, refcnt == 0).  We could also set a timeout
 and SIGBUS the PIDs holding the device if they don't return it
 willingly.  Thanks,

Putting the process to sleep (which would be uninterruptible) seems bad.
The process would sleep until the guest releases the device group, which
can take days or months.
The best option (and the most intrusive :-) ) is, I think, to change the
PCI core to allow unbinding to fail. But this probably further complicates
the path to getting VFIO upstream...
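A bounded version of the "sleep until refcnt == 0" idea discussed above can be sketched in userspace C, with a POSIX condition variable standing in for a kernel wait queue (in the kernel the analogue would be something like `wait_event_timeout()` in the remove path). `struct dev_ref`, `dev_ref_put()` and `dev_wait_released()` are hypothetical names; escalation on timeout (e.g. SIGBUS to the holders) is only signalled through the return value.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <time.h>

/* Hypothetical reference count on an assigned device group. */
struct dev_ref {
    pthread_mutex_t lock;
    pthread_cond_t released;
    int refcnt;
};

void dev_ref_init(struct dev_ref *r, int initial)
{
    assert(r != NULL && initial >= 0);
    pthread_mutex_init(&r->lock, NULL);
    pthread_cond_init(&r->released, NULL);
    r->refcnt = initial;
}

/* Called when a user (e.g. a guest) gives the device back. */
void dev_ref_put(struct dev_ref *r)
{
    pthread_mutex_lock(&r->lock);
    assert(r->refcnt > 0);
    if (--r->refcnt == 0)
        pthread_cond_broadcast(&r->released);
    pthread_mutex_unlock(&r->lock);
}

/* Wait for refcnt to reach zero, giving up after timeout_ms.
 * Returns 0 once released, or ETIMEDOUT if the holders did not let go,
 * which is where a caller could escalate instead of sleeping forever. */
int dev_wait_released(struct dev_ref *r, long timeout_ms)
{
    struct timespec deadline;
    int err = 0;

    clock_gettime(CLOCK_REALTIME, &deadline);
    deadline.tv_sec += timeout_ms / 1000;
    deadline.tv_nsec += (timeout_ms % 1000) * 1000000L;
    if (deadline.tv_nsec >= 1000000000L) {
        deadline.tv_sec += 1;
        deadline.tv_nsec -= 1000000000L;
    }

    pthread_mutex_lock(&r->lock);
    /* Loop to tolerate spurious wakeups; stop on timeout or error. */
    while (r->refcnt > 0 && err == 0)
        err = pthread_cond_timedwait(&r->released, &r->lock, &deadline);
    /* The count may have dropped just as the deadline expired. */
    if (r->refcnt == 0)
        err = 0;
    pthread_mutex_unlock(&r->lock);
    return err;
}
```

The deadline is what addresses the objection above: the unbinding path blocks for at most `timeout_ms` rather than for as long as the guest holds the device.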

Joerg



Re: [PATCH] usb: Allocate pram dynamically.

2011-08-23 Thread Joakim Tjernlund
Anton Vorontsov cbouatmai...@gmail.com wrote on 2011/08/23 15:02:53:

 From: Anton Vorontsov cbouatmai...@gmail.com
 To: Joakim Tjernlund joakim.tjernl...@transmode.se
 Cc: linuxppc-dev@lists.ozlabs.org
 Date: 2011/08/23 15:02
 Subject: Re: [PATCH] usb: Allocate pram dynamically.

 On Tue, Aug 23, 2011 at 02:38:41PM +0200, Joakim Tjernlund wrote:
  MPC832x does not have enough MURAM to do fixed MURAM allocation.
  Change to dynamic allocation.
 
  Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se

 Acked-by: Anton Vorontsov cbouatmai...@gmail.com

 Thanks!

 p.s. You probably want to send this to Greg KH, + Cc linux-usb
 mailing list.

Added linux-usb and Greg KH per Anton's suggestion.

 Jocke

From 587137e365ac1ba7e333a09962b3e4b68c587808 Mon Sep 17 00:00:00 2001
From: Joakim Tjernlund joakim.tjernl...@transmode.se
Date: Tue, 23 Aug 2011 11:04:24 +0200
Subject: [PATCH] usb: Allocate pram dynamically.

MPC832x does not have enough MURAM to do fixed MURAM allocation.
Change to dynamic allocation.

Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se
---
 drivers/usb/host/fhci-hcd.c |    5 ++++-
 1 files changed, 4 insertions(+), 1 deletions(-)

diff --git a/drivers/usb/host/fhci-hcd.c b/drivers/usb/host/fhci-hcd.c
index c7c8392..98adbe8 100644
--- a/drivers/usb/host/fhci-hcd.c
+++ b/drivers/usb/host/fhci-hcd.c
@@ -622,12 +622,15 @@ static int __devinit of_fhci_probe(struct of_device *ofdev,
 		goto err_pram;
 	}
 
-	pram_addr = cpm_muram_alloc_fixed(iprop[2], FHCI_PRAM_SIZE);
+	pram_addr = cpm_muram_alloc(FHCI_PRAM_SIZE, 64);
 	if (IS_ERR_VALUE(pram_addr)) {
 		dev_err(dev, "failed to allocate usb pram\n");
 		ret = -ENOMEM;
 		goto err_pram;
 	}
+
+	qe_issue_cmd(QE_ASSIGN_PAGE_TO_DEVICE, QE_CR_SUBBLOCK_USB,
+		     QE_CR_PROTOCOL_UNSPECIFIED, pram_addr);
 	fhci->pram = cpm_muram_addr(pram_addr);
 
 	/* GPIOs and pins */
--
1.7.3.4
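To illustrate why switching from a fixed to a dynamic allocation helps when on-chip RAM is scarce, here is a toy allocator over a small region. It is a hypothetical model, not the CPM/QE `cpm_muram_*` implementation: `toy_alloc_fixed()` must place a block at a caller-chosen offset and fails if that range is taken or out of bounds, while `toy_alloc()` picks the first free offset with the requested alignment, in the spirit of `cpm_muram_alloc(FHCI_PRAM_SIZE, 64)`. Failures are reported as a negative errno value, loosely mirroring the `IS_ERR_VALUE()` check in the driver.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>

#define TOY_MURAM_SIZE 256 /* hypothetical; far smaller than real MURAM */

static bool toy_used[TOY_MURAM_SIZE]; /* one flag per byte */

static bool toy_range_free(long off, long size)
{
    for (long i = off; i < off + size; i++)
        if (toy_used[i])
            return false;
    return true;
}

static void toy_mark(long off, long size)
{
    for (long i = off; i < off + size; i++)
        toy_used[i] = true;
}

/* Fixed placement: succeeds only if [off, off + size) fits and is free. */
long toy_alloc_fixed(long off, long size)
{
    assert(size > 0);
    if (off < 0 || off + size > TOY_MURAM_SIZE || !toy_range_free(off, size))
        return -ENOMEM;
    toy_mark(off, size);
    return off;
}

/* Dynamic placement: first free offset that is a multiple of align. */
long toy_alloc(long size, long align)
{
    assert(size > 0 && align > 0);
    for (long off = 0; off + size <= TOY_MURAM_SIZE; off += align) {
        if (toy_range_free(off, size)) {
            toy_mark(off, size);
            return off;
        }
    }
    return -ENOMEM;
}
```

With part of the region already in use, a fixed request that overlaps it or runs past the end fails, while an aligned dynamic request still finds space.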



MPC5200 + BestComm support in QEMU

2011-08-23 Thread steve . belanger
Hi, 

I'm Steve, an embedded software developer for Bombardier Transportation
Canada. We use the MPC5200 for most of the onboard computers in our train
control systems. To enhance our software engineering process, we would
like to emulate the MPC5200 processor using QEMU, an open source CPU
emulator which supports MPC5200 CPU emulation.

However, the network interface is handled by the BestComm DMA engine, and
simulating this co-processor seems very difficult with our current level
of knowledge. Has anyone been able to correctly emulate the MPC5200,
including the BestComm DMA engine, using QEMU?

Regards,

Steve Bélanger, ing. / Eng.
Développeur logiciel embarqué / Embedded Software Developer
Bombardier Transportation Canada Inc. 
Train Control and Management System
Saint-Bruno: 450-441-2020 ext.6148

 
  

Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-08-23 Thread Scott Wood
On 08/23/2011 05:02 AM, Matthieu CASTET wrote:
 LiuShuo wrote:
 We can't read the NOP from the ID on every chip. Some chips don't
 give this information (e.g. Micron MT29F4G08BAC).

Are there any 4K+ chips (especially ones with insufficient NOP) that
don't have the info?

This chip is 2K and NOP8.

Is there an easy way (without needing to have every datasheet for every
chip ever made) to determine at runtime which chips supply this information?

 Doesn't the Micron chip provide it through its ONFI info?

This chip doesn't appear to be ONFI.

-Scott


Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 12:38 +1000, David Gibson wrote:
 On Mon, Aug 22, 2011 at 09:45:48AM -0600, Alex Williamson wrote:
  On Mon, 2011-08-22 at 15:55 +1000, David Gibson wrote:
   On Sat, Aug 20, 2011 at 09:51:39AM -0700, Alex Williamson wrote:
We had an extremely productive VFIO BoF on Monday.  Here's my attempt to
capture the plan that I think we agreed to:

We need to address both the description and enforcement of device
groups.  Groups are formed any time the iommu does not have resolution
between a set of devices.  On x86, this typically happens when a
PCI-to-PCI bridge exists between the set of devices and the iommu.  For
Power, partitionable endpoints define a group.  Grouping information
needs to be exposed for both userspace and kernel internal usage.  This
will be a sysfs attribute set up by the iommu drivers.  Perhaps:

# cat /sys/devices/pci:00/:00:19.0/iommu_group
42

(I use a PCI example here, but the attribute should not be PCI specific)
   
   Ok.  Am I correct in thinking these group IDs are representing the
   minimum granularity, and are therefore always static, defined only by
   the connected hardware, not by configuration?
  
  Yes, that's the idea.  An open question I have towards the configuration
  side is whether we might add iommu driver specific options to the
  groups.  For instance on x86 where we typically have B:D.F granularity,
  should we have an option not to trust multi-function devices and use a
  B:D granularity for grouping?
 
 Right.  And likewise I can see a place for configuration parameters
 like the present 'allow_unsafe_irqs'.  But these would be more-or-less
 global options which affected the overall granularity, rather than
 detailed configuration such as explicitly binding some devices into a
 group, yes?

Yes, currently the interrupt remapping support is a global iommu
capability.  I suppose it's possible that this could be an iommu option,
where the iommu driver would not advertise a group if the interrupt
remapping constraint isn't met.
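The B:D.F versus B:D granularity question raised above can be made concrete with a small helper that derives a grouping key from a PCI address string. This is a hypothetical sketch (`pci_group_key()` and its `trust_multifunction` flag are illustrative names, not an iommu driver interface); real grouping also depends on bridges and iommu topology, and this only shows the effect of dropping the function number.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical helper: derive a grouping key from a PCI address of the
 * form "domain:bus:device.function" (hex fields, e.g. "0000:00:19.0").
 * With trust_multifunction false, the function number is ignored, so all
 * functions of one device map to the same key (B:D granularity instead
 * of B:D.F). Returns -1 if the address does not parse. */
long long pci_group_key(const char *addr, bool trust_multifunction)
{
    unsigned int dom, bus, dev, fn;

    assert(addr != NULL);
    if (sscanf(addr, "%x:%x:%x.%x", &dom, &bus, &dev, &fn) != 4)
        return -1;
    if (dom > 0xffff || bus > 0xff || dev > 0x1f || fn > 0x7)
        return -1;
    if (!trust_multifunction)
        fn = 0;
    return ((long long)dom << 16) | (bus << 8) | (dev << 3) | fn;
}
```

For example, with `trust_multifunction` false, `0000:02:00.0` and `0000:02:00.1` produce the same key and would therefore land in one group.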

From there we have a few options.  In the BoF we discussed a model where
binding a device to vfio creates a /dev/vfio$GROUP character device
file.  This group fd provides dma mapping ioctls as well as
ioctls to enumerate and return a device fd for each attached member of
the group (similar to KVM_CREATE_VCPU).  We enforce grouping by
returning an error on open() of the group fd if there are members of the
group not bound to the vfio driver.  Each device fd would then support a
similar set of ioctls and mapping (mmio/pio/config) interface as current
vfio, except for the obvious domain and dma ioctls superseded by the
group fd.
   
   It seems a slightly strange distinction that the group device appears
   when any device in the group is bound to vfio, but only becomes usable
   when all devices are bound.
   
Another valid model might be that /dev/vfio/$GROUP is created for all
groups when the vfio module is loaded.  The group fd would allow open()
and some set of iommu querying and device enumeration ioctls, but would
error on dma mapping and retrieving device fds until all of the group
devices are bound to the vfio driver.
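The always-present group model described above can be sketched as a group whose open() always succeeds, but whose DMA-mapping and device-fd operations are refused until every member is bound to vfio. `struct group_state` and the `group_*()` functions are hypothetical, and the choice of `-EBUSY` for a not-yet-viable group is an assumption; the proposal only says these operations would error.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_MEMBERS 8

/* Hypothetical model of an iommu group and its member devices. */
struct group_state {
    size_t nr_members;
    bool bound_to_vfio[MAX_MEMBERS];
    bool opened;
};

/* A group is usable only when every member is bound to vfio. */
static bool group_viable(const struct group_state *g)
{
    for (size_t i = 0; i < g->nr_members; i++)
        if (!g->bound_to_vfio[i])
            return false;
    return true;
}

/* open() always succeeds, so the group can be queried and enumerated. */
int group_open(struct group_state *g)
{
    assert(g != NULL);
    g->opened = true;
    return 0;
}

/* DMA mapping is refused until the group is viable. */
int group_map_dma(const struct group_state *g)
{
    assert(g != NULL && g->opened);
    return group_viable(g) ? 0 : -EBUSY;
}

/* Same rule for handing out a device fd for member idx; 0 stands in for
 * the new fd a real implementation would return. */
int group_get_device(const struct group_state *g, size_t idx)
{
    assert(g != NULL && g->opened);
    if (idx >= g->nr_members)
        return -EINVAL;
    return group_viable(g) ? 0 : -EBUSY;
}
```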
   
   Which is why I marginally prefer this model, although it's not a big
   deal.
  
  Right, we can also combine models.  Binding a device to vfio
  creates /dev/vfio$GROUP, which only allows a subset of ioctls and no
  device access until all the group devices are also bound.  I think
  the /dev/vfio/$GROUP might help provide an enumeration interface as well
  though, which could be useful.
 
 I'm not entirely sure what you mean here.  But, that's now several
 weak votes in favour of the always-present group devices, and none in
 favour of the created-when-first-device-bound model, so I suggest we
 take the /dev/vfio/$GROUP as our tentative approach.

Yep

In either case, the uiommu interface is removed entirely since dma
mapping is done via the group fd.  As necessary in the future, we can
define a higher performance dma mapping interface for streaming dma
via the group fd.  I expect we'll also include architecture specific
group ioctls to describe features and capabilities of the iommu.  The
group fd will need to prevent concurrent open()s to maintain a 1:1 group
to userspace process ownership model.
   
   A 1:1 group-process correspondence seems wrong to me. But there are
   many ways you could legitimately write the userspace side of the code,
   many of them involving some sort of concurrency.  Implementing that
   concurrency as multiple processes (using explicit shared memory and/or
   other IPC mechanisms to co-ordinate) seems a valid choice that we
   shouldn't arbitrarily prohibit.
   
   Obviously, only one UID may be permitted to have the group open at a
   time, and I think that's enough to prevent them doing any 

Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread aafabbri



On 8/23/11 4:04 AM, Joerg Roedel joerg.roe...@amd.com wrote:

 On Mon, Aug 22, 2011 at 08:52:18PM -0400, aafabbri wrote:
 You have to enforce group/iommu domain assignment whether you have the
 existing uiommu API, or if you change it to your proposed
 ioctl(inherit_iommu) API.
 
 The only change needed to VFIO here should be to make uiommu fd assignment
 happen on the groups instead of on device fds.  That operation fails or
 succeeds according to the group semantics (all-or-none assignment/same
 uiommu).
 
 That makes uiommu basically the same as the meta-groups, right?

Yes, functionality seems the same, thus my suggestion to keep uiommu
explicit.  Is there some need for group-groups besides defining sets of
groups which share IOMMU resources?

I do all this stuff (bringing up sets of devices which may share IOMMU
domain) dynamically from C applications.  I don't really want some static
(boot-time or sysfs fiddling) supergroup config unless there is a good
reason KVM/power needs it.

As you say in your next email, doing it all from ioctls is very easy,
programmatically.

-Aaron Fabbri



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 16:54 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote:
 
  I'm not following you.
  
  You have to enforce group/iommu domain assignment whether you have the
  existing uiommu API, or if you change it to your proposed
  ioctl(inherit_iommu) API.
  
  The only change needed to VFIO here should be to make uiommu fd assignment
  happen on the groups instead of on device fds.  That operation fails or
  succeeds according to the group semantics (all-or-none assignment/same
  uiommu).
 
 Ok, so I missed that part where you change uiommu to operate on group
 fd's rather than device fd's, my apologies if you actually wrote that
 down :-) It might be obvious ... bear with me, I just flew back from the
 US and I am badly jet lagged ...

I missed it too, the model I'm proposing entirely removes the uiommu
concept.

 So I see what you mean, however...
 
  I think the question is: do we force 1:1 iommu/group mapping, or do we allow
  arbitrary mapping (satisfying group constraints) as we do today.
  
  I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
  ability and definitely think the uiommu approach is cleaner than the
  ioctl(inherit_iommu) approach.  We considered that approach before but it
  seemed less clean so we went with the explicit uiommu context.
 
 Possibly; the question that interests me the most is which interface
 KVM will end up using. I'm also not terribly fond of the (perceived)
 discrepancy between using uiommu to create groups but using the group fd
 to actually do the mappings, at least if that is still the plan.

Current code: uiommu creates the domain, we bind a vfio device to that
domain via a SET_UIOMMU_DOMAIN ioctl on the vfio device, then do
mappings via MAP_DMA on the vfio device (affecting all the vfio devices
bound to the domain)

My current proposal: groups are predefined.  groups ~= iommu domain.
The iommu domain would probably be allocated when the first device is
bound to vfio.  As each device is bound, it gets attached to the group.
DMAs are done via an ioctl on the group.

I think group + uiommu leads to effectively reliving most of the
problems with the current code.  The only benefit is the group
assignment to enforce hardware restrictions.  We still have the problem
that uiommu open() = iommu_domain_alloc(), whose properties are
meaningless without attached devices (groups).  Which I think leads to
the same awkward model of attaching groups to define the domain, then we
end up doing mappings via the group to enforce ordering.
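Alex's proposed model (a group roughly equals an iommu domain, the domain is allocated when the first device is bound, and DMA is mapped through the group) can be sketched as a group that creates its domain lazily. All names here (`struct vdomain`, `struct vgroup`, `vgroup_bind_device()`, `vgroup_map()`) are hypothetical, and a mapping counter stands in for real iommu page tables. Because the domain only exists once a device is attached, the "domain without devices" state criticised in the uiommu model never arises.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for an iommu domain. */
struct vdomain {
    int nr_devices;  /* devices attached to this domain */
    int nr_mappings; /* ranges mapped; visible to every attached device */
};

struct vgroup {
    struct vdomain *domain; /* NULL until the first device is bound */
};

/* Binding a device allocates the domain on first use, then attaches. */
int vgroup_bind_device(struct vgroup *g)
{
    assert(g != NULL);
    if (g->domain == NULL) {
        g->domain = calloc(1, sizeof(*g->domain));
        if (g->domain == NULL)
            return -ENOMEM;
    }
    g->domain->nr_devices++;
    return 0;
}

/* DMA mappings go through the group; with no device bound there is no
 * domain yet, so the request is rejected. */
int vgroup_map(struct vgroup *g)
{
    assert(g != NULL);
    if (g->domain == NULL)
        return -ENODEV;
    g->domain->nr_mappings++;
    return 0;
}

void vgroup_release(struct vgroup *g)
{
    assert(g != NULL);
    free(g->domain);
    g->domain = NULL;
}
```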

 If the separate uiommu interface is kept, then anything that wants to be
 able to benefit from the ability to put multiple devices (or existing
 groups) into such a meta group would need to be explicitly modified to
 deal with the uiommu APIs.
 
 I tend to prefer such meta groups as being something you create
 statically using a configuration interface, either via sysfs, netlink or
 ioctl's to a control vfio device driven by a simple command line tool
 (which can have the configuration stored in /etc and re-apply it at
 boot).

I cringe anytime there's a mention of static.  IMHO, we have to
support hotplug.  That means meta groups change dynamically.  Maybe
this supports the idea that we should be able to retrieve a new fd from
the group to do mappings.  Any groups bound together will return the
same fd and the fd will persist so long as any member of the group is
open.
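The idea that groups bound together return the same mapping fd, which persists as long as any member is open, amounts to a reference-counted shared context. A minimal sketch under that reading: `struct map_ctx`, `struct grp`, `grp_open()`, `ctx_bind()` and `grp_close()` are hypothetical names, not VFIO interfaces, and the sketch does not address what happens to mappings already made in the context being dropped.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical shared DMA-mapping context (what the shared fd refers to). */
struct map_ctx {
    int refs; /* number of open groups using this context */
};

struct grp {
    struct map_ctx *ctx;
};

static struct map_ctx *ctx_get(struct map_ctx *c)
{
    c->refs++;
    return c;
}

static void ctx_put(struct map_ctx *c)
{
    if (--c->refs == 0)
        free(c);
}

/* Opening a group gives it a private context. Returns 0, or -1 on OOM. */
int grp_open(struct grp *g)
{
    assert(g != NULL);
    g->ctx = calloc(1, sizeof(*g->ctx));
    if (g->ctx == NULL)
        return -1;
    g->ctx->refs = 1;
    return 0;
}

/* Bind b into a's context: afterwards both refer to the same context.
 * Taking the new reference before dropping the old one keeps this safe
 * even if a and b already share a context. */
void ctx_bind(struct grp *a, struct grp *b)
{
    assert(a != NULL && b != NULL && a->ctx != NULL && b->ctx != NULL);
    struct map_ctx *shared = ctx_get(a->ctx);
    ctx_put(b->ctx);
    b->ctx = shared;
}

/* Closing a group drops its reference; the context survives while any
 * bound group is still open. */
void grp_close(struct grp *g)
{
    assert(g != NULL && g->ctx != NULL);
    ctx_put(g->ctx);
    g->ctx = NULL;
}
```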

 That way, any program capable of exploiting VFIO groups will
 automatically be able to exploit those meta groups (or groups of
 groups) as well as long as they are supported on the system.
 
 If we ever have system specific constraints as to how such groups can be
 created, then it can all be handled at the level of that configuration
 tool without impact on whatever programs know how to exploit them via
 the VFIO interfaces.

I'd prefer to have the constraints be represented in the ioctl to bind
groups.  It works or not and the platform gets to define what it
considers compatible.  Thanks,

Alex



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 15:14 +0200, Roedel, Joerg wrote:
 On Mon, Aug 22, 2011 at 03:17:00PM -0400, Alex Williamson wrote:
  On Mon, 2011-08-22 at 19:25 +0200, Joerg Roedel wrote:
 
   I am in favour of /dev/vfio/$GROUP. If multiple devices should be
   assigned to a guest, there can also be an ioctl to bind a group to an
   address-space of another group (certainly needs some care to not allow
   that both groups belong to different processes).
  
  That's an interesting idea.  Maybe an interface similar to the current
  uiommu interface, where you open() the 2nd group fd and pass the fd via
  ioctl to the primary group.  IOMMUs that don't support this would fail
  the attach device callback, which would fail the ioctl to bind them.  It
  will need to be designed so any group can be removed from the super-set
  and the remaining group(s) still work.  This feels like something that
  can be added after we get an initial implementation.
 
 Handling it through fds is a good idea. This makes sure that everything
 belongs to one process. I am not yet sure whether we should just bind
 plain groups together or create meta-groups. The meta-group approach
 seems somewhat cleaner, though.

I'm leaning towards binding because we need to make it dynamic, but I
don't really have a good picture of the lifecycle of a meta-group.

   Btw, a problem we haven't fully talked about yet is
   driver-deassignment. User space can decide to de-assign the device from
   vfio while an fd is open on it. With PCI there is no way to let this fail
   (the .release function returns void, last time I checked). Is this a
   problem, and if so, how do we handle it?
  
  The current vfio has the same problem, we can't unbind a device from
  vfio while it's attached to a guest.  I think we'd use the same solution
  too; send out a netlink packet for a device removal and have the .remove
  call sleep on a wait_event(, refcnt == 0).  We could also set a timeout
  and SIGBUS the PIDs holding the device if they don't return it
  willingly.  Thanks,
 
 Putting the process to sleep (which would be uninterruptible) seems bad.
 The process would sleep until the guest releases the device group, which
 can take days or months.
 The best option (and the most intrusive :-) ) is, I think, to change the
 PCI core to allow unbinding to fail. But this probably further complicates
 the path to getting VFIO upstream...

Yes, it's not ideal but I think it's sufficient for now and if we later
get support for returning an error from release, we can set a timeout
after notifying the user to make use of that.  Thanks,

Alex



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Aaron Fabbri



On 8/23/11 10:01 AM, Alex Williamson alex.william...@redhat.com wrote:

 On Tue, 2011-08-23 at 16:54 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote:
 
 I'm not following you.
 
 You have to enforce group/iommu domain assignment whether you have the
 existing uiommu API, or if you change it to your proposed
 ioctl(inherit_iommu) API.
 
 The only change needed to VFIO here should be to make uiommu fd assignment
 happen on the groups instead of on device fds.  That operation fails or
 succeeds according to the group semantics (all-or-none assignment/same
 uiommu).
 
 Ok, so I missed that part where you change uiommu to operate on group
 fd's rather than device fd's, my apologies if you actually wrote that
 down :-) It might be obvious ... bear with me, I just flew back from the
 US and I am badly jet lagged ...
 
 I missed it too, the model I'm proposing entirely removes the uiommu
 concept.
 
 So I see what you mean, however...
 
 I think the question is: do we force 1:1 iommu/group mapping, or do we allow
 arbitrary mapping (satisfying group constraints) as we do today.
 
 I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
 ability and definitely think the uiommu approach is cleaner than the
 ioctl(inherit_iommu) approach.  We considered that approach before but it
 seemed less clean so we went with the explicit uiommu context.
 
 Possibly; the question that interests me the most is which interface
 KVM will end up using. I'm also not terribly fond of the (perceived)
 discrepancy between using uiommu to create groups but using the group fd
 to actually do the mappings, at least if that is still the plan.
 
 Current code: uiommu creates the domain, we bind a vfio device to that
 domain via a SET_UIOMMU_DOMAIN ioctl on the vfio device, then do
 mappings via MAP_DMA on the vfio device (affecting all the vfio devices
 bound to the domain)
 
 My current proposal: groups are predefined.  groups ~= iommu domain.

This is my main objection.  I'd rather not lose the ability to have multiple
devices (which are all predefined as singleton groups on x86 w/o PCI
bridges) share IOMMU resources.  Otherwise, 20 devices sharing buffers would
require 20x the IOMMU/ioTLB resources.  KVM doesn't care about this case?

 The iommu domain would probably be allocated when the first device is
 bound to vfio.  As each device is bound, it gets attached to the group.
 DMAs are done via an ioctl on the group.
 
 I think group + uiommu leads to effectively reliving most of the
 problems with the current code.  The only benefit is the group
 assignment to enforce hardware restrictions.  We still have the problem
 that uiommu open() = iommu_domain_alloc(), whose properties are
 meaningless without attached devices (groups).  Which I think leads to
 the same awkward model of attaching groups to define the domain, then we
 end up doing mappings via the group to enforce ordering.

Is there a better way to allow groups to share an IOMMU domain?

Maybe, instead of having an ioctl to allow a group A to inherit the same
iommu domain as group B, we could have an ioctl to fully merge two groups
(could be what Ben was thinking):

A.ioctl(MERGE_TO_GROUP, B)

The group A now goes away and its devices join group B.  If A ever had an
iommu domain assigned (and buffers mapped?) we fail.

Groups cannot get smaller (they are defined as minimum granularity of an
IOMMU, initially).  They can get bigger if you want to share IOMMU
resources, though.
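The MERGE_TO_GROUP semantics proposed above can be sketched using only the rules stated there: A's devices move into B and A is left empty, the merge fails if A already has an iommu domain, and groups only ever grow. `struct mgroup` and `merge_to_group()` are hypothetical, and the specific error codes are assumptions.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_DEVS 16

/* Hypothetical group: a list of device ids and whether a domain exists. */
struct mgroup {
    int devs[MAX_DEVS];
    size_t nr_devs;
    bool has_domain;
};

/* Move every device of a into b; a ends up empty ("goes away").
 * Nothing here ever splits a group, so groups can only get bigger. */
int merge_to_group(struct mgroup *a, struct mgroup *b)
{
    assert(a != NULL && b != NULL);
    if (a == b)
        return -EINVAL;
    if (a->has_domain)
        return -EBUSY; /* A already has a domain (and maybe mappings) */
    if (b->nr_devs + a->nr_devs > MAX_DEVS)
        return -ENOSPC;
    for (size_t i = 0; i < a->nr_devs; i++)
        b->devs[b->nr_devs++] = a->devs[i];
    a->nr_devs = 0;
    return 0;
}
```

Note that only the source group is checked for an existing domain: the target B may already have one, which is what lets new devices join a domain that is in use.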

Any downsides to this approach?

-AF

 
 If the separate uiommu interface is kept, then anything that wants to be
 able to benefit from the ability to put multiple devices (or existing
 groups) into such a meta group would need to be explicitly modified to
 deal with the uiommu APIs.
 
 I tend to prefer such meta groups as being something you create
 statically using a configuration interface, either via sysfs, netlink or
 ioctl's to a control vfio device driven by a simple command line tool
 (which can have the configuration stored in /etc and re-apply it at
 boot).
 
 I cringe anytime there's a mention of static.  IMHO, we have to
 support hotplug.  That means meta groups change dynamically.  Maybe
 this supports the idea that we should be able to retrieve a new fd from
 the group to do mappings.  Any groups bound together will return the
 same fd and the fd will persist so long as any member of the group is
 open.
 
 That way, any program capable of exploiting VFIO groups will
 automatically be able to exploit those meta groups (or groups of
 groups) as well as long as they are supported on the system.
 
 If we ever have system specific constraints as to how such groups can be
 created, then it can all be handled at the level of that configuration
 tool without impact on whatever programs know how to exploit them via
 the VFIO interfaces.
 
 I'd prefer to have the constraints be represented in the ioctl to bind
 groups.  It works or not and 

Re: [PATCH] powerpc: fixup QE_General4 errata

2011-08-23 Thread Timur Tabi
Joakim Tjernlund wrote:
 QE_General4 should only round up the divisor iff the divisor is > 3.
 Rounding up lower divisors makes the error too big, causing USB
 on MPC832x to fail.
 
 Signed-off-by: Joakim Tjernlund joakim.tjernl...@transmode.se

Acked-by: Timur Tabi ti...@freescale.com
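A sketch of the rounding rule described in this patch, which rounds odd divisors up only when they are greater than 3. It assumes the erratum requires an even BRG divisor when divide-by-16 mode is not used (an assumption not stated in this thread). `qe_general4_fixup()` is a hypothetical helper, not the code in the QE library.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical helper: apply the QE_General4 workaround to a BRG divisor.
 * Odd divisors are rounded up to the next even value, but only when the
 * divisor is greater than 3, since rounding a small divisor (e.g. 3 -> 4)
 * changes the resulting clock rate by too large a fraction. */
unsigned int qe_general4_fixup(unsigned int divisor, bool div16)
{
    assert(divisor > 0);
    if (!div16 && (divisor & 1) && divisor > 3)
        divisor++;
    return divisor;
}
```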

-- 
Timur Tabi
Linux kernel developer at Freescale



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 10:33 -0700, Aaron Fabbri wrote:
 
 
 On 8/23/11 10:01 AM, Alex Williamson alex.william...@redhat.com wrote:
 
  On Tue, 2011-08-23 at 16:54 +1000, Benjamin Herrenschmidt wrote:
  On Mon, 2011-08-22 at 17:52 -0700, aafabbri wrote:
  
  I'm not following you.
  
  You have to enforce group/iommu domain assignment whether you have the
  existing uiommu API, or if you change it to your proposed
  ioctl(inherit_iommu) API.
  
  The only change needed to VFIO here should be to make uiommu fd assignment
  happen on the groups instead of on device fds.  That operation fails or
  succeeds according to the group semantics (all-or-none assignment/same
  uiommu).
  
  Ok, so I missed that part where you change uiommu to operate on group
  fd's rather than device fd's, my apologies if you actually wrote that
  down :-) It might be obvious ... bear with me, I just flew back from the
  US and I am badly jet lagged ...
  
  I missed it too, the model I'm proposing entirely removes the uiommu
  concept.
  
  So I see what you mean, however...
  
  I think the question is: do we force 1:1 iommu/group mapping, or do we
  allow arbitrary mapping (satisfying group constraints) as we do today.
  
  I'm saying I'm an existing user who wants the arbitrary iommu/group mapping
  ability and definitely think the uiommu approach is cleaner than the
  ioctl(inherit_iommu) approach.  We considered that approach before but it
  seemed less clean so we went with the explicit uiommu context.
  
  Possibly, the question that interests me the most is what interface will
  KVM end up using. I'm also not terribly fond of the (perceived)
  discrepancy between using uiommu to create groups but using the group fd
  to actually do the mappings, at least if that is still the plan.
  
  Current code: uiommu creates the domain, we bind a vfio device to that
  domain via a SET_UIOMMU_DOMAIN ioctl on the vfio device, then do
  mappings via MAP_DMA on the vfio device (affecting all the vfio devices
  bound to the domain)
  
  My current proposal: groups are predefined.  groups ~= iommu domain.
 
 This is my main objection.  I'd rather not lose the ability to have multiple
 devices (which are all predefined as singleton groups on x86 w/o PCI
 bridges) share IOMMU resources.  Otherwise, 20 devices sharing buffers would
 require 20x the IOMMU/ioTLB resources.  KVM doesn't care about this case?

We do care, I just wasn't prioritizing it as heavily since I think the
typical model is probably closer to 1 device per guest.

  The iommu domain would probably be allocated when the first device is
  bound to vfio.  As each device is bound, it gets attached to the group.
  DMAs are done via an ioctl on the group.
  
  I think group + uiommu leads to effectively reliving most of the
  problems with the current code.  The only benefit is the group
  assignment to enforce hardware restrictions.  We still have the problem
  that uiommu open() = iommu_domain_alloc(), whose properties are
  meaningless without attached devices (groups).  Which I think leads to
  the same awkward model of attaching groups to define the domain, then we
  end up doing mappings via the group to enforce ordering.
 
 Is there a better way to allow groups to share an IOMMU domain?
 
 Maybe, instead of having an ioctl to allow a group A to inherit the same
 iommu domain as group B, we could have an ioctl to fully merge two groups
 (could be what Ben was thinking):
 
 A.ioctl(MERGE_TO_GROUP, B)
 
 The group A now goes away and its devices join group B.  If A ever had an
 iommu domain assigned (and buffers mapped?) we fail.
 
 Groups cannot get smaller (they are defined as minimum granularity of an
 IOMMU, initially).  They can get bigger if you want to share IOMMU
 resources, though.
 
 Any downsides to this approach?

That's sort of the way I'm picturing it.  When groups are bound
together, they effectively form a pool, where all the groups are peers.
When the MERGE/BIND ioctl is called on group A and passed the group B
fd, A can check compatibility of the domain associated with B, unbind
devices from the B domain and attach them to the A domain.  The B domain
would then be freed and it would bump the refcnt on the A domain.  If we
need to remove A from the pool, we call UNMERGE/UNBIND on B with the A
fd; it will remove the A devices from the shared object, disassociate A
from the shared object, re-alloc a domain for A and rebind A devices to
that domain.
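
The merge/unmerge flow described above can be modelled in plain C as one refcounted domain shared by several groups. This is a userspace sketch with hypothetical types (iommu_domain_model, group_model), not VFIO code. Device detach and attach are reduced to comments. Refusing to merge a group that already has mappings follows Aaron's suggestion, not anything settled in the thread.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-ins for an IOMMU domain and a VFIO group. */
struct iommu_domain_model {
	int refcnt;   /* number of groups attached to this domain */
	int mappings; /* DMA mappings currently installed */
};

struct group_model {
	struct iommu_domain_model *dom;
};

static struct iommu_domain_model *domain_alloc(void)
{
	struct iommu_domain_model *d = calloc(1, sizeof(*d));

	if (d)
		d->refcnt = 1;
	return d;
}

static void domain_put(struct iommu_domain_model *d)
{
	assert(d->refcnt > 0);
	if (--d->refcnt == 0)
		free(d);
}

/* MERGE: called on group a with group b; b joins a's domain. */
static int group_merge(struct group_model *a, struct group_model *b)
{
	if (a->dom == b->dom)
		return 0;   /* already in the same pool */
	if (b->dom->mappings)
		return -1;  /* assumed rule: refuse if b has live mappings */
	/* (detach b's devices from its domain, attach them to a's) */
	domain_put(b->dom); /* b's old domain is freed */
	b->dom = a->dom;
	a->dom->refcnt++;
	return 0;
}

/* UNMERGE: give group g a private domain again. */
static int group_unmerge(struct group_model *g)
{
	struct iommu_domain_model *d;

	if (g->dom->refcnt == 1)
		return 0;   /* already alone */
	d = domain_alloc();
	if (!d)
		return -1;
	/* (detach g's devices from the shared domain, attach to d) */
	domain_put(g->dom); /* shared domain survives, refcnt > 0 */
	g->dom = d;
	return 0;
}
```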

This is where it seems like it might be helpful to make a GET_IOMMU_FD
ioctl so that an iommu object is ubiquitous and persistent across the
pool.  Operations on any group fd work on the pool as a whole.  Thanks,

Alex



Re: [PATCH 8/9] arch/powerpc/sysdev/ehv_pic.c: add missing kfree

2011-08-23 Thread Timur Tabi
Julia Lawall wrote:
 At this point, ehv_pic has been allocated but not stored anywhere, so it
 should be freed before leaving the function.

Acked-by: Timur Tabi ti...@freescale.com

FYI, Ashish is no longer with Freescale, so I've taken over maintainership of
ehv_pic.

-- 
Timur Tabi
Linux kernel developer at Freescale



Re: [PATCH 8/9] arch/powerpc/sysdev/ehv_pic.c: add missing kfree

2011-08-23 Thread Timur Tabi
Ben, Kumar, can one of you take a look at my question and help me out?

 wrote:
 On Mon, Aug 8, 2011 at 7:18 AM, Julia Lawall ju...@diku.dk wrote:
 
 diff --git a/arch/powerpc/sysdev/ehv_pic.c b/arch/powerpc/sysdev/ehv_pic.c
 index af1a5df..b6731e4 100644
 --- a/arch/powerpc/sysdev/ehv_pic.c
 +++ b/arch/powerpc/sysdev/ehv_pic.c
 @@ -280,6 +280,7 @@ void __init ehv_pic_init(void)

    if (!ehv_pic->irqhost) {
        of_node_put(np);
 +      kfree(ehv_pic);
        return;
    }
 
 Although the fix is correct, I think there is another bug in this
 function.  'np' is not released when the function finishes
 successfully.   I've looked at other functions that use
 irq_alloc_host(), and most of them do the same thing: they don't call
 of_node_put() on the device node pointer.  The only exception I've
 found is mpc5121_ads_cpld_pic_init().
 
 Ben, Kumar: am I missing something?  irq_alloc_host() calls of_node_get():
 
   host->of_node = of_node_get(of_node);
 
 so doesn't that mean that the caller of irq_alloc_host() should
 release the device node pointer?
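
To make the refcounting question concrete, here is a userspace model of the get/put pairing. It assumes, as Timur reads it, that the lookup returns a counted reference and that the host takes its own reference with of_node_get(). All names are hypothetical stand-ins for device_node, of_find_*(), irq_alloc_host() and of_node_put(). The model illustrates the general rule that every get needs a matching put; it does not settle what ehv_pic_init() should do.

```c
#include <assert.h>

/* Hypothetical stand-in for a refcounted device_node. */
struct node_model {
	int refcount;
};

static struct node_model *node_get(struct node_model *n)
{
	if (n)
		n->refcount++;
	return n;
}

static void node_put(struct node_model *n)
{
	if (n) {
		assert(n->refcount > 0);
		n->refcount--;
	}
}

/* Stands in for a lookup that, like of_find_*(), returns a counted ref. */
static struct node_model *find_node(struct node_model *n)
{
	return node_get(n);
}

/* Stands in for irq_alloc_host(): the host keeps its own reference. */
struct host_model {
	struct node_model *of_node;
};

static void host_init(struct host_model *h, struct node_model *np)
{
	h->of_node = node_get(np);
}

static void init_sequence(struct node_model *dn, struct host_model *h)
{
	struct node_model *np = find_node(dn); /* +1 held by the caller */

	host_init(h, np);                      /* +1 held by the host */
	node_put(np);                          /* caller drops its own ref */
}
```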
 


-- 
Timur Tabi
Linux kernel developer at Freescale



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alex Williamson
On Tue, 2011-08-23 at 07:01 +1000, Benjamin Herrenschmidt wrote:
 On Mon, 2011-08-22 at 09:45 -0600, Alex Williamson wrote:
 
  Yes, that's the idea.  An open question I have towards the configuration
  side is whether we might add iommu driver specific options to the
  groups.  For instance on x86 where we typically have B:D.F granularity,
  should we have an option not to trust multi-function devices and use a
  B:D granularity for grouping?
 
 Or even B or range of busses... if you want to enforce strict isolation
 you really can't trust anything below a bus level :-)
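
To illustrate the granularity options, the sketch below derives a group key from a PCI bus/device/function triple. The 16-bit layout puts the bus in bits 15:8, the device in bits 7:3 and the function in bits 2:0. This is the standard PCI routing ID, and its low byte matches the kernel's PCI_DEVFN(). The enum and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical granularity options for forming isolation groups. */
enum group_granularity {
	GROUP_BY_FUNCTION, /* B:D.F, one group per function */
	GROUP_BY_DEVICE,   /* B:D, all functions of a device together */
	GROUP_BY_BUS,      /* B, everything on the bus together */
};

/* 16-bit PCI routing ID: bus[15:8], device[7:3], function[2:0]. */
static uint16_t pci_bdf(uint8_t bus, uint8_t dev, uint8_t fn)
{
	return (uint16_t)((bus << 8) | ((dev & 0x1f) << 3) | (fn & 0x7));
}

static uint16_t group_key(uint16_t bdf, enum group_granularity g)
{
	switch (g) {
	case GROUP_BY_DEVICE:
		return bdf & 0xfff8; /* drop the function bits */
	case GROUP_BY_BUS:
		return bdf & 0xff00; /* keep only the bus number */
	case GROUP_BY_FUNCTION:
	default:
		return bdf;
	}
}
```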
 
  Right, we can also combine models.  Binding a device to vfio
  creates /dev/vfio/$GROUP, which only allows a subset of ioctls and no
  device access until all the group devices are also bound.  I think
  the /dev/vfio/$GROUP might help provide an enumeration interface as well
  though, which could be useful.
 
 Could be, though in what form? Returning sysfs paths?

I'm at a loss there, please suggest.  I think we need an ioctl that
returns some kind of array of devices within the group and another that
maybe takes an index from that array and returns an fd for that device.
A sysfs path string might be a reasonable array element, but it sounds
like a pain to work with.
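
One possible shape for the two ioctls Alex describes, reduced to plain C. The first is a listing call that reports every device name in the group. The second is an index-based lookup. The structure, the names, and the choice of PCI address strings as array elements are all assumptions. A real implementation would copy the data to user memory and return a new fd.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define DEV_NAME_LEN 32

/* Hypothetical in-kernel view of a group: a fixed list of member devices. */
struct group_devices {
	size_t count;
	const char *names[8]; /* e.g. PCI addresses like "0000:02:00.0" */
};

/*
 * Stand-in for a "list devices" ioctl: copy up to max names into buf and
 * return the total count, so a caller can retry with a larger buffer.
 */
static size_t group_list_devices(const struct group_devices *g,
				 char buf[][DEV_NAME_LEN], size_t max)
{
	size_t i;

	for (i = 0; i < g->count && i < max; i++) {
		strncpy(buf[i], g->names[i], DEV_NAME_LEN - 1);
		buf[i][DEV_NAME_LEN - 1] = '\0';
	}
	return g->count;
}

/*
 * Stand-in for a "get device fd by index" ioctl: it only validates the
 * index and returns the entry; a real driver would hand back a new fd.
 */
static const char *group_get_device(const struct group_devices *g,
				    size_t index)
{
	return index < g->count ? g->names[index] : NULL;
}
```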

  1:1 group-process is probably too strong.  Not allowing concurrent
  open()s on the group file enforces a single userspace entity is
  responsible for that group.  Device fds can be passed to other
  processes, but only retrieved via the group fd.  I suppose we could even
  branch off the dma interface into a different fd, but it seems like we
  would logically want to serialize dma mappings at each iommu group
  anyway.  I'm open to alternatives, this just seemed an easy way to do
  it.  Restricting on UID implies that we require isolated qemu instances
  to run as different UIDs.  I know that's a goal, but I don't know if we
  want to make it an assumption in the group security model.
 
 1:1 process has the advantage of linking to an ->mm, which makes the whole
 mmu notifier business doable. How do you want to track down mappings and
 do the second level translation in the case of explicit map/unmap (like
 on power) if you are not tied to an mm_struct?

Right, I threw away the mmu notifier code that was originally part of
vfio because we can't do anything useful with it yet on x86.  I
definitely don't want to prevent it where it makes sense though.  Maybe
we just record current->mm on open and restrict subsequent opens to the
same.
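
Recording the mm on first open could look roughly like the sketch below. Here mm_model stands in for struct mm_struct, and only its address is compared. The -1 return stands in for an errno such as -EBUSY. This is a hypothetical sketch, not code from vfio.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct mm_struct; only its identity matters. */
struct mm_model {
	int unused;
};

struct group_state {
	const struct mm_model *owner_mm; /* recorded on first open */
	int open_count;
};

/* Returns 0 on success, -1 (think -EBUSY) if another mm owns the group. */
static int group_open(struct group_state *g, const struct mm_model *cur_mm)
{
	if (g->open_count && g->owner_mm != cur_mm)
		return -1;
	g->owner_mm = cur_mm;
	g->open_count++;
	return 0;
}

static void group_release(struct group_state *g)
{
	assert(g->open_count > 0);
	if (--g->open_count == 0)
		g->owner_mm = NULL; /* last close frees the group for anyone */
}
```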

  Yes.  I'm not sure there's a good ROI to prioritize that model.  We have
  to assume 1 device per guest is a typical model and that the iotlb is
  large enough that we might improve thrashing to see both a resource and
  performance benefit from it.  I'm open to suggestions for how we could
  include it though.
 
 Sharing may or may not be possible depending on setups so yes, it's a
 bit tricky.
 
 My preference is to have a static interface (and that's actually where
 your pet netlink might make some sense :-) to create synthetic groups
 made of other groups if the arch allows it. But that might not be the
 best approach. In another email I also proposed an option for a group to
 capture another one...

I already made some comments on this in a different thread, so I won't
repeat here.

   If that's
   not what you're saying, how would the domains - now made up of a
   user's selection of groups, rather than individual devices - be
   configured?
   
Hope that captures it, feel free to jump in with corrections and
suggestions.  Thanks,
   
 
 Another aspect I don't see discussed is how we represent these things to
 the guest.
 
 On Power for example, I have a requirement that a given iommu domain is
 represented by a single dma window property in the device-tree. What
 that means is that that property needs to be either in the node of the
 device itself if there's only one device in the group or in a parent
 node (ie a bridge or host bridge) if there are multiple devices.
 
 Now I do -not- want to go down the path of simulating P2P bridges,
 besides we'll quickly run out of bus numbers if we go there.
 
 For us the most simple and logical approach (which is also what pHyp
 uses and what Linux handles well) is really to expose a given PCI host
 bridge per group to the guest. Believe it or not, it makes things
 easier :-)

I'm all for easier.  Why does exposing the bridge use fewer bus numbers
than emulating a bridge?

On x86, I want to maintain that our default assignment is at the device
level.  A user should be able to pick single or multiple devices from
across several groups and have them all show up as individual,
hotpluggable devices on bus 0 in the guest.  Not surprisingly, we've
also seen cases where users try to attach a bridge to the guest,
assuming they'll get all the devices below the bridge, so I'd be in
favor of making this just work if possible too, though we may have to
prevent hotplug of those.

Given the device 

Re: [PATCH part1 v2 1/9] Add udbg driver using the PS3 gelic Ethernet device

2011-08-23 Thread Geoff Levand
Hi,

We had some questions as to why we have this totally separate driver
from the gelic driver, so I think it is worthwhile to have an
explanation of why in the commit log.   Otherwise, the code looks
OK.

-Geoff

 From: Hector Martin hec...@marcansoft.com
 
 Signed-off-by: Hector Martin hec...@marcansoft.com
 [a.heider: Various cleanups to make checkpatch.pl happy]
 Signed-off-by: Andre Heider a.hei...@gmail.com
 ---
  arch/powerpc/Kconfig.debug  |8 +
  arch/powerpc/include/asm/udbg.h |1 +
  arch/powerpc/kernel/udbg.c  |2 +
  arch/powerpc/platforms/ps3/Kconfig  |   12 ++
  arch/powerpc/platforms/ps3/Makefile |1 +
  arch/powerpc/platforms/ps3/gelic_udbg.c |  273 +++
  drivers/net/ps3_gelic_net.c |3 +
  drivers/net/ps3_gelic_net.h |6 +
  8 files changed, 306 insertions(+), 0 deletions(-)
  create mode 100644 arch/powerpc/platforms/ps3/gelic_udbg.c
 
 diff --git a/arch/powerpc/Kconfig.debug b/arch/powerpc/Kconfig.debug
 index 067cb84..ab2335f 100644
 --- a/arch/powerpc/Kconfig.debug
 +++ b/arch/powerpc/Kconfig.debug
 @@ -258,6 +258,14 @@ config PPC_EARLY_DEBUG_WSP
   depends on PPC_WSP
   select PPC_UDBG_16550
  
 +config PPC_EARLY_DEBUG_PS3GELIC
 + bool "Early debugging through the PS3 Ethernet port"
 + depends on PPC_PS3
 + select PS3GELIC_UDBG
 + help
 +   Select this to enable early debugging for the PlayStation3 via
 +   UDP broadcasts sent out through the Ethernet port.
 +
  endchoice
  
  config PPC_EARLY_DEBUG_HVSI_VTERMNO
 diff --git a/arch/powerpc/include/asm/udbg.h b/arch/powerpc/include/asm/udbg.h
 index 93e05d1..7cf796f 100644
 --- a/arch/powerpc/include/asm/udbg.h
 +++ b/arch/powerpc/include/asm/udbg.h
 @@ -54,6 +54,7 @@ extern void __init udbg_init_40x_realmode(void);
  extern void __init udbg_init_cpm(void);
  extern void __init udbg_init_usbgecko(void);
  extern void __init udbg_init_wsp(void);
 +extern void __init udbg_init_ps3gelic(void);
  
  #endif /* __KERNEL__ */
  #endif /* _ASM_POWERPC_UDBG_H */
 diff --git a/arch/powerpc/kernel/udbg.c b/arch/powerpc/kernel/udbg.c
 index faa82c1..5b3e98e 100644
 --- a/arch/powerpc/kernel/udbg.c
 +++ b/arch/powerpc/kernel/udbg.c
 @@ -67,6 +67,8 @@ void __init udbg_early_init(void)
   udbg_init_usbgecko();
  #elif defined(CONFIG_PPC_EARLY_DEBUG_WSP)
   udbg_init_wsp();
 +#elif defined(CONFIG_PPC_EARLY_DEBUG_PS3GELIC)
 + udbg_init_ps3gelic();
  #endif
  
  #ifdef CONFIG_PPC_EARLY_DEBUG
 diff --git a/arch/powerpc/platforms/ps3/Kconfig b/arch/powerpc/platforms/ps3/Kconfig
 index dfe316b..476d9d9 100644
 --- a/arch/powerpc/platforms/ps3/Kconfig
 +++ b/arch/powerpc/platforms/ps3/Kconfig
 @@ -148,4 +148,16 @@ config PS3_LPM
 profiling support of the Cell processor with programs like
 oprofile and perfmon2, then say Y or M, otherwise say N.
  
 +config PS3GELIC_UDBG
 + bool "PS3 udbg output via UDP broadcasts on Ethernet"
 + depends on PPC_PS3
 + help
 +   Enables udbg early debugging output by sending broadcast UDP
 +   via the Ethernet port (UDP port number 18194).
 +
 +   This driver uses a trivial implementation and is independent
 +   from the main network driver.
 +
 +   If in doubt, say N here.
 +
  endmenu
 diff --git a/arch/powerpc/platforms/ps3/Makefile b/arch/powerpc/platforms/ps3/Makefile
 index ac1bdf8..02b9e63 100644
 --- a/arch/powerpc/platforms/ps3/Makefile
 +++ b/arch/powerpc/platforms/ps3/Makefile
 @@ -2,6 +2,7 @@ obj-y += setup.o mm.o time.o hvcall.o htab.o repository.o
  obj-y += interrupt.o exports.o os-area.o
  obj-y += system-bus.o
  
 +obj-$(CONFIG_PS3GELIC_UDBG) += gelic_udbg.o
  obj-$(CONFIG_SMP) += smp.o
  obj-$(CONFIG_SPU_BASE) += spu.o
  obj-y += device-init.o
 diff --git a/arch/powerpc/platforms/ps3/gelic_udbg.c b/arch/powerpc/platforms/ps3/gelic_udbg.c
 new file mode 100644
 index 000..20b46a1
 --- /dev/null
 +++ b/arch/powerpc/platforms/ps3/gelic_udbg.c
 @@ -0,0 +1,273 @@
 +/*
 + * udbg debug output routine via GELIC UDP broadcasts
 + *
 + * Copyright (C) 2007 Sony Computer Entertainment Inc.
 + * Copyright 2006, 2007 Sony Corporation
 + * Copyright (C) 2010 Hector Martin hec...@marcansoft.com
 + * Copyright (C) 2011 Andre Heider a.hei...@gmail.com
 + *
 + * This program is free software; you can redistribute it and/or
 + * modify it under the terms of the GNU General Public License
 + * as published by the Free Software Foundation; either version 2
 + * of the License, or (at your option) any later version.
 + *
 + */
 +
 +#include <asm/io.h>
 +#include <asm/udbg.h>
 +#include <asm/lv1call.h>
 +
 +#define GELIC_BUS_ID 1
 +#define GELIC_DEVICE_ID 0
 +#define GELIC_DEBUG_PORT 18194
 +#define GELIC_MAX_MESSAGE_SIZE 1000
 +
 +#define GELIC_LV1_GET_MAC_ADDRESS 1
 +#define GELIC_LV1_GET_VLAN_ID 4
 +#define GELIC_LV1_VLAN_TX_ETHERNET_0 2
 +
 +#define GELIC_DESCR_DMA_STAT_MASK 0xf000
 +#define GELIC_DESCR_DMA_CARDOWNED 

Re: [PATCH part1 v2 2/9] ps3: Add helper functions to read highmem info from the repository

2011-08-23 Thread Geoff Levand
On 08/11/2011 12:31 PM, Andre Heider wrote:
 An earlier step in the boot chain can preallocate the highmem region.
 A boot loader doing so will place the region info in the repository.
 Provide helper functions to read the required nodes.
 
 Signed-off-by: Andre Heider a.hei...@gmail.com
 ---
  arch/powerpc/platforms/ps3/platform.h   |3 ++
  arch/powerpc/platforms/ps3/repository.c |   36 +++
  2 files changed, 39 insertions(+), 0 deletions(-)
 
 diff --git a/arch/powerpc/platforms/ps3/platform.h b/arch/powerpc/platforms/ps3/platform.h
 index 9a196a8..d9b4ec0 100644
 --- a/arch/powerpc/platforms/ps3/platform.h
 +++ b/arch/powerpc/platforms/ps3/platform.h
 @@ -187,6 +187,9 @@ int ps3_repository_read_rm_size(unsigned int ppe_id, u64 *rm_size);
  int ps3_repository_read_region_total(u64 *region_total);
  int ps3_repository_read_mm_info(u64 *rm_base, u64 *rm_size,
   u64 *region_total);
 +int ps3_repository_read_highmem_base(u64 *highmem_base);
 +int ps3_repository_read_highmem_size(u64 *highmem_size);
 +int ps3_repository_read_highmem_info(u64 *highmem_base, u64 *highmem_size);

In the general case we could have multiple regions.  If we
add a region arg here we can handle that if needed.
region_index would be {1..}.  ps3_repository_read_highmem_info
could hold how many regions, so:

  int ps3_repository_read_highmem_base(unsigned int region_index, u64 *highmem_base);
  int ps3_repository_read_highmem_size(unsigned int region_index, u64 *highmem_size);
  int ps3_repository_read_highmem_region(unsigned int region_index, u64 *highmem_base, u64 *highmem_size);
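
A userspace model of the indexed interface. It assumes regions are numbered from 1, as suggested, and that a lookup past the last region fails. highmem_repo and read_highmem_region() are hypothetical stand-ins for the repository and for the proposed ps3_repository_read_highmem_region().

```c
#include <assert.h>
#include <stdint.h>

typedef uint64_t u64;

/* Hypothetical stand-in for the repository: regions indexed from 1. */
struct highmem_repo {
	unsigned int count;
	u64 base[4];
	u64 size[4];
};

static int read_highmem_region(const struct highmem_repo *r,
			       unsigned int region_index,
			       u64 *base, u64 *size)
{
	if (region_index < 1 || region_index > r->count)
		return -1; /* no such node */
	*base = r->base[region_index - 1];
	*size = r->size[region_index - 1];
	return 0;
}

/* Walk regions 1, 2, ... until a lookup fails; return the total size. */
static u64 total_highmem(const struct highmem_repo *r)
{
	u64 base, size, total = 0;
	unsigned int i;

	for (i = 1; read_highmem_region(r, i, &base, &size) == 0; i++)
		total += size;
	return total;
}
```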

  
  /* repository pme info */
  
 diff --git a/arch/powerpc/platforms/ps3/repository.c b/arch/powerpc/platforms/ps3/repository.c
 index 5e304c2..9908d61 100644
 --- a/arch/powerpc/platforms/ps3/repository.c
 +++ b/arch/powerpc/platforms/ps3/repository.c
 @@ -778,6 +778,42 @@ int ps3_repository_read_mm_info(u64 *rm_base, u64 *rm_size, u64 *region_total)
   : ps3_repository_read_region_total(region_total);
  }
  
 +int ps3_repository_read_highmem_base(u64 *highmem_base)
 +{
 + return read_node(PS3_LPAR_ID_CURRENT,
 +  make_first_field("bi", 0),
 +  make_field("highmem", 0),
 +  make_field("base", 0),
 +  0,
 +  highmem_base, NULL);
 +}

I think something like this seems better:

int ps3_repository_read_highmem_base(unsigned int region_index,
	u64 *highmem_base)
{
	return read_node(PS3_LPAR_ID_CURRENT,
		make_first_field("highmem", 0),
		make_field("region", region_index),
		make_field("base", 0),
		0,
		highmem_base, NULL);
}

 +
 +int ps3_repository_read_highmem_size(u64 *highmem_size)
 +{
 + return read_node(PS3_LPAR_ID_CURRENT,
 +  make_first_field("bi", 0),
 +  make_field("highmem", 0),
 +  make_field("size", 0),
 +  0,
 +  highmem_size, NULL);
 +}
 +
 +/**
 + * ps3_repository_read_highmem_info - Read high memory info
 + * @highmem_base: High memory base address.
 + * @highmem_size: High memory size.
 + */
 +
 +int ps3_repository_read_highmem_info(u64 *highmem_base, u64 *highmem_size)
 +{
 + int result;
 +
 + *highmem_base = 0;
 + result = ps3_repository_read_highmem_base(highmem_base);
 + return result ? result
 + : ps3_repository_read_highmem_size(highmem_size);


ps3_repository_read_highmem_base(1, highmem_base);
...
 +}
 +
  /**
   * ps3_repository_read_num_spu_reserved - Number of physical spus reserved.
   * @num_spu: Number of physical spus.

-Geoff




Re: [PATCH part1 v2 3/9] ps3: Get lv1 high memory region from the repository

2011-08-23 Thread Geoff Levand
On 08/11/2011 12:31 PM, Andre Heider wrote:
 This lets the bootloader preallocate the high lv1 region and pass its
 location to the kernel through the repository. Thus, it can be used to
 hold the initrd. If the region info doesn't exist, the kernel retains
 the old behavior and attempts to allocate the region itself.
 
 Based on the patch
 [PS3] Get lv1 high memory region from devtree
 from Hector Martin hec...@marcansoft.com
 
 Signed-off-by: Andre Heider a.hei...@gmail.com
 ---
  arch/powerpc/platforms/ps3/mm.c |   46 --
  1 files changed, 43 insertions(+), 3 deletions(-)
 
 diff --git a/arch/powerpc/platforms/ps3/mm.c b/arch/powerpc/platforms/ps3/mm.c
 index c204588..983b719 100644
 --- a/arch/powerpc/platforms/ps3/mm.c
 +++ b/arch/powerpc/platforms/ps3/mm.c
 @@ -78,12 +78,14 @@ enum {
   * @base: base address
   * @size: size in bytes
   * @offset: difference between base and rm.size
 + * @destroy: flag if region should be destroyed upon shutdown
   */
  
  struct mem_region {
   u64 base;
   u64 size;
   unsigned long offset;
 + int destroy;
  };
  
  /**
 @@ -261,6 +263,7 @@ static int ps3_mm_region_create(struct mem_region *r, unsigned long size)
   goto zero_region;
   }
  
 + r->destroy = 1;
   r->offset = r->base - map.rm.size;
   return result;
  
 @@ -279,6 +282,12 @@ static void ps3_mm_region_destroy(struct mem_region *r)
   int result;
  
   DBG("%s:%d: r->base = %llxh\n", __func__, __LINE__, r->base);
 +
 + if (!r->destroy) {
 + DBG("%s:%d: not destroying region\n", __func__, __LINE__);
 + return;
 + }
 +
   if (r->base) {
   result = lv1_release_memory(r->base);
   BUG_ON(result);
 @@ -287,6 +296,29 @@ static void ps3_mm_region_destroy(struct mem_region *r)
   }
  }
  
 +static int ps3_mm_get_repository_highmem(struct mem_region *r)
 +{
 + int result = ps3_repository_read_highmem_info(&r->base, &r->size);
 +
 + if (result)
 + goto zero_region;
 +
 + if (!r->base || !r->size) {
 + result = -1;
 + goto zero_region;
 + }
 +
 + r->offset = r->base - map.rm.size;
 + DBG("%s:%d got high region from repository: %llxh %llxh\n",
 + __func__, __LINE__, r->base, r->size);
 + return 0;
 +
 +zero_region:
 + DBG("%s:%d no high region in repository...\n", __func__, __LINE__);

Three dots implies something more is on its way.  I
don't think we need them.

 + r->size = r->base = r->offset = 0;
 + return result;
 +}
 +
  /**
   * ps3_mm_add_memory - hot add memory
   */
 @@ -303,6 +335,12 @@ static int __init ps3_mm_add_memory(void)
  
   BUG_ON(!mem_init_done);
  
 + if (!map.r1.size) {
 + DBG("%s:%d: no region 1, not adding memory\n",
 + __func__, __LINE__);
 + return 0;
 + }
 +
   start_addr = map.rm.size;
   start_pfn = start_addr >> PAGE_SHIFT;
   nr_pages = (map.r1.size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 @@ -1217,9 +1255,11 @@ void __init ps3_mm_init(void)
   BUG_ON(map.rm.base);
   BUG_ON(!map.rm.size);
  
 -
 - /* arrange to do this in ps3_mm_add_memory */
 - ps3_mm_region_create(&map.r1, map.total - map.rm.size);
 + /* check if we got the highmem region from an earlier boot step */
 + if (ps3_mm_get_repository_highmem(&map.r1)) {
 + /* arrange to do this in ps3_mm_add_memory */
 + ps3_mm_region_create(&map.r1, map.total - map.rm.size);
 + }
  
   /* correct map.total for the real total amount of memory we use */
   map.total = map.rm.size + map.r1.size;
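
For reference, here is how the page arithmetic in ps3_mm_add_memory() above works: a right shift by PAGE_SHIFT gives the start pfn, and the page count is rounded up. The sketch assumes 4 KiB pages purely for the example. The kernel takes PAGE_SHIFT from its configuration, and the SKETCH_* macro names are hypothetical.

```c
#include <assert.h>

/* Assumption for the example only: 4 KiB pages. */
#define SKETCH_PAGE_SHIFT 12
#define SKETCH_PAGE_SIZE (1UL << SKETCH_PAGE_SHIFT)

/* Physical address to page frame number. */
static unsigned long addr_to_pfn(unsigned long addr)
{
	return addr >> SKETCH_PAGE_SHIFT;
}

/* Round a byte count up to a whole number of pages. */
static unsigned long bytes_to_pages(unsigned long size)
{
	return (size + SKETCH_PAGE_SIZE - 1) >> SKETCH_PAGE_SHIFT;
}
```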





Re: [PATCH part1 v2 4/9] Add region 1 memory early

2011-08-23 Thread Geoff Levand
On 08/11/2011 12:31 PM, Andre Heider wrote:
 From: Hector Martin hec...@marcansoft.com
 
 Real mode memory can be limited and runs out quickly as memory is
 allocated during kernel startup.
 Having region1 available sooner fixes this.
 
 Signed-off-by: Hector Martin hec...@marcansoft.com
 [a.heider: Various cleanups to make checkpatch.pl happy]
 Signed-off-by: Andre Heider a.hei...@gmail.com
 ---
  arch/powerpc/platforms/ps3/mm.c |   75 +++
  1 files changed, 13 insertions(+), 62 deletions(-)
 
 diff --git a/arch/powerpc/platforms/ps3/mm.c b/arch/powerpc/platforms/ps3/mm.c
 index 983b719..68b3879 100644
 --- a/arch/powerpc/platforms/ps3/mm.c
 +++ b/arch/powerpc/platforms/ps3/mm.c
 @@ -20,7 +20,6 @@
  
  #include linux/kernel.h
  #include linux/module.h
 -#include linux/memory_hotplug.h
  #include linux/memblock.h
  #include linux/slab.h
  
 @@ -94,10 +93,8 @@ struct mem_region {
   * @vas_id - HV virtual address space id
   * @htab_size: htab size in bytes
   *
 - * The HV virtual address space (vas) allows for hotplug memory regions.
 - * Memory regions can be created and destroyed in the vas at runtime.

This is still true, so we should keep these comments.  We
are only changing the way we use the feature.

   * @rm: real mode (bootmem) region
 - * @r1: hotplug memory region(s)
 + * @r1: high memory region

high memory region(s)

   *
   * ps3 addresses
   * virt_addr: a cpu 'translated' effective address
 @@ -223,10 +220,6 @@ void ps3_mm_vas_destroy(void)
   }
  }
  
 -/**/
 -/* memory hotplug routines */
 -/**/
 -
  /**
   * ps3_mm_region_create - create a memory region in the vas
   * @r: pointer to a struct mem_region to accept initialized values
 @@ -319,57 +312,6 @@ zero_region:
   return result;
  }
  
 -/**
 - * ps3_mm_add_memory - hot add memory
 - */
 -
 -static int __init ps3_mm_add_memory(void)
 -{
 - int result;
 - unsigned long start_addr;
 - unsigned long start_pfn;
 - unsigned long nr_pages;
 -
 - if (!firmware_has_feature(FW_FEATURE_PS3_LV1))
 - return -ENODEV;
 -
 - BUG_ON(!mem_init_done);
 -
 - if (!map.r1.size) {
 - DBG("%s:%d: no region 1, not adding memory\n",
 - __func__, __LINE__);
 - return 0;
 - }
 -
 - start_addr = map.rm.size;
 - start_pfn = start_addr >> PAGE_SHIFT;
 - nr_pages = (map.r1.size + PAGE_SIZE - 1) >> PAGE_SHIFT;
 -
 - DBG("%s:%d: start_addr %lxh, start_pfn %lxh, nr_pages %lxh\n",
 - __func__, __LINE__, start_addr, start_pfn, nr_pages);
 -
 - result = add_memory(0, start_addr, map.r1.size);
 -
 - if (result) {
 - pr_err("%s:%d: add_memory failed: (%d)\n",
 - __func__, __LINE__, result);
 - return result;
 - }
 -
 - memblock_add(start_addr, map.r1.size);
 - memblock_analyze();
 -
 - result = online_pages(start_pfn, nr_pages);
 -
 - if (result)
 - pr_err("%s:%d: online_pages failed: (%d)\n",
 - __func__, __LINE__, result);
 -
 - return result;
 -}
 -
 -device_initcall(ps3_mm_add_memory);
 -
  
 /**/
  /* dma routines */
  
 /**/
 @@ -1256,14 +1198,23 @@ void __init ps3_mm_init(void)
   BUG_ON(!map.rm.size);
  
   /* check if we got the highmem region from an earlier boot step */
 - if (ps3_mm_get_repository_highmem(&map.r1)) {
 - /* arrange to do this in ps3_mm_add_memory */
 + if (ps3_mm_get_repository_highmem(&map.r1))
   ps3_mm_region_create(&map.r1, map.total - map.rm.size);
 - }

This should be folded into patch #3.

  
   /* correct map.total for the real total amount of memory we use */
   map.total = map.rm.size + map.r1.size;
  
 + if (!map.r1.size) {
 + DBG("%s:%d: no region 1, not adding memory\n",
 + __func__, __LINE__);
 + } else {

Remove brackets around a single line conditional.

 + DBG("%s:%d: adding memory: start %llxh, size %llxh\n",
 + __func__, __LINE__, map.rm.size, map.r1.size);
 +
 + memblock_add(map.rm.size, map.r1.size);
 + memblock_analyze();
 + }
 +
   DBG(" <- %s:%d\n", __func__, __LINE__);
  }
  






Re: [PATCH part1 v2 5/9] ps3: MEMORY_HOTPLUG is not a requirement anymore

2011-08-23 Thread Geoff Levand
On 08/11/2011 12:31 PM, Andre Heider wrote:
 Signed-off-by: Andre Heider a.hei...@gmail.com
 ---
  arch/powerpc/platforms/ps3/Kconfig |1 -
  1 files changed, 0 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/platforms/ps3/Kconfig b/arch/powerpc/platforms/ps3/Kconfig
 index 476d9d9..84df5c8 100644
 --- a/arch/powerpc/platforms/ps3/Kconfig
 +++ b/arch/powerpc/platforms/ps3/Kconfig
 @@ -7,7 +7,6 @@ config PPC_PS3
   select USB_OHCI_BIG_ENDIAN_MMIO
   select USB_ARCH_HAS_EHCI
   select USB_EHCI_BIG_ENDIAN_MMIO
 - select MEMORY_HOTPLUG


This one is OK, once the others are fixed up.

-Geoff



Re: [PATCH part1 v2 6/9] ps3: Detect the current lpar

2011-08-23 Thread Geoff Levand
Hi,

On 08/11/2011 12:31 PM, Andre Heider wrote:
 Detect it by reading the ss laid repository node, and make it
 accessible via ps3_get_ss_laid().

I'm wondering now if we even need this.  It is mainly used by your
later patch 8/9 that modifies ps3flash_init() to test if we should
call ps3_system_bus_driver_register().  If we don't use
ps3_get_ss_laid() and just allow ps3_system_bus_driver_register()
to be called, would the device probe fail and have the same result
as the test?

I would prefer to not have ps3_get_ss_laid() if possible.

-Geoff



Re: [PATCH part1 v2 8/9] ps3flash: Refuse to work in lpars other than OtherOS

2011-08-23 Thread Geoff Levand
On 08/11/2011 12:31 PM, Andre Heider wrote:
 The driver implements a character and misc device, meant for the
 axed OtherOS to exchange various settings with GameOS.
 Since Firmware 3.21 there is no support anymore to write these
 settings, so test if we're running in OtherOS, and refuse to load
 if that is not the case.

Please see my comments to the v1 patch regarding this text.

 Signed-off-by: Andre Heider a.hei...@gmail.com
 ---
  arch/powerpc/platforms/ps3/Kconfig |2 +-
  drivers/char/ps3flash.c|7 +++
  2 files changed, 8 insertions(+), 1 deletions(-)
 
 diff --git a/arch/powerpc/platforms/ps3/Kconfig b/arch/powerpc/platforms/ps3/Kconfig
 index 84df5c8..72fdecd 100644
 --- a/arch/powerpc/platforms/ps3/Kconfig
 +++ b/arch/powerpc/platforms/ps3/Kconfig
 @@ -121,7 +121,7 @@ config PS3_FLASH
  
 This support is required to access the PS3 FLASH ROM, which
 contains the boot loader and some boot options.
 -   In general, all users will say Y or M.
 +   In general, all PS3 OtherOS users will say Y or M.
  
 As this driver needs a fixed buffer of 256 KiB of memory, it can
 be disabled on the kernel command line using ps3flash=off, to
 diff --git a/drivers/char/ps3flash.c b/drivers/char/ps3flash.c
 index d0c57c2..fc6d867 100644
 --- a/drivers/char/ps3flash.c
 +++ b/drivers/char/ps3flash.c
 @@ -25,6 +25,7 @@
  
  #include <asm/lv1call.h>
  #include <asm/ps3stor.h>
 +#include <asm/firmware.h>
  
  
  #define DEVICE_NAME  "ps3flash"
 @@ -464,6 +465,12 @@ static struct ps3_system_bus_driver ps3flash = {
  
  static int __init ps3flash_init(void)
  {
 + if (!firmware_has_feature(FW_FEATURE_PS3_LV1))
 + return -ENODEV;
 +
 + if (ps3_get_ss_laid() != PS3_SS_LAID_OTHEROS)
 + return -ENODEV;
 +
   return ps3_system_bus_driver_register(&ps3flash);
  }
  




Re: [Cbe-oss-dev] [PATCH part1 v2 4/9] Add region 1 memory early

2011-08-23 Thread Antonio Ospite
On Tue, 23 Aug 2011 13:53:46 -0700
Geoff Levand ge...@infradead.org wrote:

[...]
   
  +   if (!map.r1.size) {
  +   DBG("%s:%d: no region 1, not adding memory\n",
  +   __func__, __LINE__);
  +   } else {
 
 Remove brackets around a single line conditional.
 
  +   DBG("%s:%d: adding memory: start %llxh, size %llxh\n",
  +   __func__, __LINE__, map.rm.size, map.r1.size);
  +
  +   memblock_add(map.rm.size, map.r1.size);
  +   memblock_analyze();
  +   }
  +

In Documentation/CodingStyle I read that if [only] one branch is a
single statement, braces are OK (and even recommended) for both
branches; I guess this is for style consistency. See Chapter 3,
around line 169 on my copy. The wording of that paragraph could
be made more explicit, I'll try to fix that up.
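
Here is a small example of the rule Antonio refers to. One branch is a single statement and the other is not, so both branches get braces. The function and its parameters are hypothetical.

```c
#include <assert.h>

/*
 * Hypothetical example of the CodingStyle rule: the first branch is a
 * single statement, the second is not, so both branches get braces.
 */
static int account_region(unsigned long size, unsigned long *added,
			  int *skipped)
{
	if (!size) {
		*skipped = 1;
	} else {
		*added += size;
		*skipped = 0;
	}
	return !*skipped;
}
```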

Regards,
   Antonio

-- 
Antonio Ospite
http://ao2.it

PGP public key ID: 0x4553B001

A: Because it messes up the order in which people normally read text.
   See http://en.wikipedia.org/wiki/Posting_style
Q: Why is top-posting such a bad thing?



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt
On Tue, 2011-08-23 at 15:18 +0200, Roedel, Joerg wrote:
 On Mon, Aug 22, 2011 at 05:03:53PM -0400, Benjamin Herrenschmidt wrote:
  
   I am in favour of /dev/vfio/$GROUP. If multiple devices should be
   assigned to a guest, there can also be an ioctl to bind a group to an
   address-space of another group (certainly needs some care to not allow
   that both groups belong to different processes).
   
   Btw, a problem we haven't talked about yet entirely is
   driver-deassignment. User space can decide to de-assign the device from
   vfio while a fd is open on it. With PCI there is no way to let this fail
   (the .release function returns void last time I checked). Is this a
   problem, and if yes, how do we handle that?
  
  We can treat it as a hard unplug (like a cardbus gone away).
  
  I.e. dispose of the direct mappings (switch to MMIO emulation) and return
  all ff's from reads (and ignore writes).
  
  Then send an unplug event via whatever mechanism the platform provides
  (ACPI hotplug controller on x86 for example, we haven't quite sorted out
  what to do on power for hotplug yet).
 
 Hmm, good idea. But as far as I know the hotplug-event needs to be in
 the guest _before_ the device is actually unplugged (so that the guest
 can unbind its driver first). That somehow brings back the sleep-idea
 and the timeout in the .release function.
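The hard-unplug behaviour proposed above (once the direct mappings are gone, reads return all ff's and writes are ignored) can be modelled as below. This is a minimal sketch of the trapped-access path only; struct assigned_bar, BAR_REGS and the accessor names are hypothetical and not part of VFIO.

```c
#include <stdbool.h>
#include <stdint.h>

#define BAR_REGS 16

/* Hypothetical per-device state; not a real VFIO structure. */
struct assigned_bar {
	uint32_t regs[BAR_REGS];  /* emulated register contents */
	bool revoked;             /* set when the host unbinds the device */
};

static uint32_t bar_read32(const struct assigned_bar *bar, unsigned int idx)
{
	/* After revocation (or out of range) behave like a vanished device. */
	if (bar->revoked || idx >= BAR_REGS)
		return 0xffffffffu;
	return bar->regs[idx];
}

static void bar_write32(struct assigned_bar *bar, unsigned int idx,
			uint32_t val)
{
	if (bar->revoked || idx >= BAR_REGS)
		return;           /* writes are silently dropped */
	bar->regs[idx] = val;
}
```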

That's for normal assisted hotplug, but don't we support hard hotplug?
I mean, things like cardbus, thunderbolt (if we ever support that)
etc... will need it and some platforms do support hard hotplug of PCIe
devices.

(That's why drivers should never spin on MMIO waiting for a 1 bit to
clear without a timeout :-)
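A bounded poll in the spirit of that remark, as a userspace sketch: the loop gives up after max_polls reads instead of spinning forever on a register that a removed device reports as all ones. The function name and the iteration-count limit are assumptions; kernel code would use readl() and a time-based timeout.

```c
#include <errno.h>
#include <stdint.h>

/* Wait until the bits in @mask read back as clear, or give up. */
static int wait_bits_clear(const volatile uint32_t *reg, uint32_t mask,
			   unsigned int max_polls)
{
	for (unsigned int i = 0; i < max_polls; i++) {
		if (!(*reg & mask))
			return 0;
		/* A real driver would delay between reads (e.g. udelay()). */
	}
	return -ETIMEDOUT;
}
```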

Cheers,
Ben.




Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt
On Tue, 2011-08-23 at 10:23 -0600, Alex Williamson wrote:
 
 Yeah.  Joerg's idea of binding groups internally (pass the fd of one
 group to another via ioctl) is one option.  The tricky part will be
 implementing it to support hot unplug of any group from the
 supergroup.
 I believe Ben had a suggestion that supergroups could be created in
 sysfs, but I don't know what the mechanism to do that looks like.  It
 would also be an extra management step to dynamically bind and unbind
 groups to the supergroup around hotplug.  Thanks, 

I don't really care that much what the method for creating them is, to
be honest, I just prefer this concept of meta groups or super groups
or synthetic groups (whatever you want to name them) to having a
separate uiommu file descriptor.

The one reason I have a slight preference for creating them statically
using some kind of separate interface (again, I don't care whether it's
sysfs, netlink, etc...) is that it means things like qemu don't have to
care about them.

In general, apps that want to use vfio can just get passed the path to
such a group or the /dev/ path or the group number (whatever we choose as
the way to identify a group), and don't need to know anything about
super groups, how to manipulate them, create them, possible
constraints etc...

Now, libvirt might want to know about that other API in order to provide
control on the creation of these things, but that's a different issue.

By static I mean they persist, they aren't tied to the lifetime of an
fd.
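A sketch of what "apps just get passed a path or a group number" could mean for a program like qemu, assuming the /dev/vfio/$GROUP naming proposed earlier in the thread (group_path() is a hypothetical helper). The program never needs to know whether the node is a plain group or a statically configured super group.

```c
#include <stdio.h>
#include <string.h>

/*
 * Turn a group identifier (a full path, or just a decimal group number)
 * into a device path.  Returns 0 on success, -1 if @id is malformed or
 * @buf is too small.
 */
static int group_path(const char *id, char *buf, size_t len)
{
	int n;

	if (id[0] == '/') {
		/* Already a path: pass it through unchanged. */
		n = snprintf(buf, len, "%s", id);
	} else {
		/* Otherwise accept only a non-empty decimal group number. */
		if (!*id || strspn(id, "0123456789") != strlen(id))
			return -1;
		n = snprintf(buf, len, "/dev/vfio/%s", id);
	}
	return (n < 0 || (size_t)n >= len) ? -1 : 0;
}
```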

Now that's purely a preference on my side because I believe it will make
life easier for actual programs wanting to use vfio to not have to care
about those super-groups, but as I said earlier, I don't actually care
that much :-)

Cheers,
Ben.




Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Benjamin Herrenschmidt

  For us the simplest and most logical approach (which is also what pHyp
  uses and what Linux handles well) is really to expose a given PCI host
  bridge per group to the guest. Believe it or not, it makes things
  easier :-)
 
 I'm all for easier.  Why does exposing the bridge use fewer bus numbers
 than emulating a bridge?

Because a host bridge doesn't look like a PCI-to-PCI bridge at all for
us. It's an entirely separate domain with its own bus number space
(unlike most x86 setups).

In fact we have some problems afaik in qemu today with the concept of
PCI domains, for example, I think qemu has assumptions about a single
shared IO space domain which isn't true for us (each PCI host bridge
provides a distinct IO space domain starting at 0). We'll have to fix
that, but it's not a huge deal.

So for each group we'd expose in the guest an entire separate PCI
domain space with its own IO, MMIO etc... spaces, handed off from a
single device-tree host bridge which doesn't itself appear in the
config space, doesn't need any emulation of any config space etc...
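To make the "separate PCI domain per group" point concrete: because each guest PHB has its own I/O space starting at 0, a guest I/O port is only meaningful together with its domain. A minimal sketch, with hypothetical field names and an assumed per-domain host window:

```c
#include <stdbool.h>
#include <stdint.h>

/* One guest PHB = one PCI domain (hypothetical layout). */
struct guest_phb {
	uint32_t domain;        /* guest PCI domain number */
	uint64_t io_host_base;  /* host address backing this domain's I/O port 0 */
	uint64_t io_size;       /* size of the domain's I/O space */
};

/*
 * Translate I/O port @port in guest domain @domain to a host address.
 * The same port number in two domains lands in two different windows.
 */
static bool io_to_host(const struct guest_phb *phbs, int nphbs,
		       uint32_t domain, uint64_t port, uint64_t *host)
{
	for (int i = 0; i < nphbs; i++) {
		if (phbs[i].domain == domain && port < phbs[i].io_size) {
			*host = phbs[i].io_host_base + port;
			return true;
		}
	}
	return false;
}
```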

 On x86, I want to maintain that our default assignment is at the device
 level.  A user should be able to pick single or multiple devices from
 across several groups and have them all show up as individual,
 hotpluggable devices on bus 0 in the guest.  Not surprisingly, we've
 also seen cases where users try to attach a bridge to the guest,
 assuming they'll get all the devices below the bridge, so I'd be in
 favor of making this just work if possible too, though we may have to
 prevent hotplug of those.

 Given the device requirement on x86 and since everything is a PCI device
 on x86, I'd like to keep a qemu command line something like -device
 vfio,host=00:19.0.  I assume that some of the iommu properties, such as
 dma window size/address, will be query-able through an architecture
 specific (or general if possible) ioctl on the vfio group fd.  I hope
 that will help the specification, but I don't fully understand what all
 remains.  Thanks,
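Parsing the bus:slot.function form in the -device vfio,host=00:19.0 example could look like this. It is a sketch that handles only the short form without a PCI domain prefix; parse_host_addr() and struct pci_addr are illustrative and not qemu's own parser.

```c
#include <stdio.h>

struct pci_addr {
	unsigned int bus, slot, func;
};

/* Parse "bb:ss.f" (hex fields); returns 0 on success, -1 otherwise. */
static int parse_host_addr(const char *s, struct pci_addr *a)
{
	char tail;

	/* Exactly three conversions: a fourth (%c) means trailing junk. */
	if (sscanf(s, "%x:%x.%x%c", &a->bus, &a->slot, &a->func, &tail) != 3)
		return -1;
	/* 8-bit bus, 5-bit slot (device), 3-bit function. */
	if (a->bus > 0xff || a->slot > 0x1f || a->func > 7)
		return -1;
	return 0;
}
```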

Well, for iommu there are a couple of different issues here, but yes,
basically on one side we'll have some kind of ioctl to know what segment
of the device(s)' DMA address space is assigned to the group, and we'll
need to represent that to the guest via a device-tree property in some
kind of parent node of all the devices in that group.
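A sketch of the kind of information such a query might return, and how a mapping request would be checked against it. No such ioctl existed at the time of this thread; struct dma_window and iova_range_ok() are hypothetical names.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-group DMA window, as a guest device-tree property would describe it. */
struct dma_window {
	uint64_t start;   /* first bus address the group's devices may use */
	uint64_t size;    /* length of the window in bytes */
};

/* True if [iova, iova + len) lies entirely inside the window. */
static bool iova_range_ok(const struct dma_window *w, uint64_t iova,
			  uint64_t len)
{
	if (len == 0 || iova < w->start)
		return false;
	/* Compare offsets rather than end addresses to avoid overflow. */
	return iova - w->start <= w->size &&
	       len <= w->size - (iova - w->start);
}
```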

We -might- be able to implement some kind of hotplug of individual
devices of a group under such a PHB (PCI Host Bridge); I don't know for
sure yet, as some of that PAPR stuff is pretty arcane. But basically, for
all intents and purposes, we really want a group to be represented as a
PHB in the guest.

We cannot arbitrarily have individual devices of separate groups be
represented in the guest as siblings on a single simulated PCI bus.
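The constraint can be expressed as a placement rule: every group gets its own guest PHB, and devices end up as siblings only if they share a group. A minimal sketch with a hypothetical fixed-size table (MAX_PHBS and phb_for_group() are illustrative):

```c
#define MAX_PHBS 8

struct phb_table {
	int group[MAX_PHBS];  /* iommu group backing each guest PHB */
	int count;
};

/*
 * Return the guest PHB index for a device in @group, allocating a new
 * PHB the first time a group is seen; -1 if the table is full.
 */
static int phb_for_group(struct phb_table *t, int group)
{
	for (int i = 0; i < t->count; i++)
		if (t->group[i] == group)
			return i;
	if (t->count == MAX_PHBS)
		return -1;
	t->group[t->count] = group;
	return t->count++;
}
```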

Cheers,
Ben.



Re: [Cbe-oss-dev] [PATCH part1 v2 4/9] Add region 1 memory early

2011-08-23 Thread Geoff Levand
Hi Antonio,

On 08/23/2011 03:37 PM, Antonio Ospite wrote:
 Geoff Levand <ge...@infradead.org> wrote:
  +  if (!map.r1.size) {
  +  DBG("%s:%d: no region 1, not adding memory\n",
  +  __func__, __LINE__);
  +  } else {
 
 Remove brackets around a single line conditional.
 
  +  DBG("%s:%d: adding memory: start %llxh, size %llxh\n",
  +  __func__, __LINE__, map.rm.size, map.r1.size);
  +
  +  memblock_add(map.rm.size, map.r1.size);
  +  memblock_analyze();
  +  }
  +
 
 In Documentation/CodingStyle I read that if [only] one branch is a
 single statement, braces are still OK (and even recommended) on both
 branches; I guess this is for style consistency. See Chapter 3, around
 line 169 in my copy. I guess the wording of that paragraph could be
 made more explicit; I'll try to fix that up.

Thanks for the comments.  I don't think it's such an important change;
it's mainly for consistency of style within the PS3 files.

-Geoff



Re: [PATCH v3] mtd/nand : workaround for Freescale FCM to support large-page Nand chip

2011-08-23 Thread LiuShuo

On 2011-08-23 18:02, Matthieu CASTET wrote:

 LiuShuo wrote:

  On 2011-08-19 00:25, Scott Wood wrote:

   On 08/17/2011 09:33 PM, b35...@freescale.com wrote:

    From: Liu Shuo <b35...@freescale.com>

    The Freescale FCM controller's buffer RAM is limited to 2K. To support
    NAND flash chips whose page size is larger than 2K bytes, we divide each
    page into multiple 2K pages for the MTD layer driver, forcing the page
    size to 2K bytes. In the fsl_elbc driver, we convert the MTD layer's
    page address into a real page address in the flash chip plus a column
    index. Any column address can be issued with the UA instruction of the
    elbc controller.

    NOTE: Because of the limit on 'Number of Partial Program Cycles in the
    Same Page (NOP)', flash chips supported by this workaround must meet
    the following conditions:
    1. page size is not greater than 4KB
    2.  1) if main area and spare area have independent NOPs:
          main  area NOP: >= 3
          spare area NOP: >= 2?

   How often are the NOPs split like this?


        2) if main area and spare area have a common NOP:
          NOP           : >= 4

   This depends on how the flash is used.  If you treat it as a NOP1 flash
   (e.g. run ubifs rather than jffs2), then you need NOP2 for a 4K chip and
   NOP4 for an 8K chip.  OTOH, if you would be making full use of NOP4 on a
   real 2K chip, you'll need NOP8 for a 4K chip.

   The NOP restrictions should be documented in the code itself, not just
   in the git changelog.  Maybe print it to the console when this hack is
   used, along with the NOP value read from the ID.

  We can't read the NOP from the ID on every chip. Some chips don't
  give this information (e.g. Micron MT29F4G08BAC).

 Doesn't the Micron chip provide it with ONFI info?

Sorry, I expressed that badly. We can get the NOP info from the
datasheet, but can't get it with the READID command in code.


-LiuShuo

 Matthieu
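The page-address conversion described in the patch can be sketched as follows, assuming the real page size is a multiple of the 2K FCM buffer and ignoring the spare (OOB) area. The structure and function names are hypothetical; the actual logic lives in the fsl_elbc driver.

```c
#include <stdint.h>

#define FCM_BUF_SIZE 2048u   /* FCM buffer RAM limit, per the patch */

/* Where a 2K MTD-layer page lives on the real chip (hypothetical type). */
struct nand_pos {
	uint32_t page;     /* real page address on the chip */
	uint32_t column;   /* byte offset of the 2K chunk within that page */
};

/*
 * Map an MTD-layer page number (in 2K units) to a real page and column.
 * Assumes @real_page_size is a non-zero multiple of FCM_BUF_SIZE.
 */
static struct nand_pos mtd_to_chip(uint32_t mtd_page, uint32_t real_page_size)
{
	uint32_t per_page = real_page_size / FCM_BUF_SIZE; /* 2 for 4K, 4 for 8K */
	struct nand_pos pos;

	pos.page = mtd_page / per_page;
	pos.column = (mtd_page % per_page) * FCM_BUF_SIZE;
	return pos;
}
```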

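Scott's NOP arithmetic, written out: each 2K sub-page is programmed separately, so every partial program the software stack expects per 2K page is spent once for each sub-page of the real page. required_nop() is a hypothetical helper that simply multiplies the two.

```c
#include <stdint.h>

/*
 * NOP budget the real chip needs when it is exposed as 2K pages.
 * @nop_used: partial programs per page the software expects
 *            (1 when the flash is treated as NOP1, e.g. under ubifs).
 */
static uint32_t required_nop(uint32_t nop_used, uint32_t real_page_size)
{
	return nop_used * (real_page_size / 2048u);
}
```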




Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alexander Graf

On 23.08.2011, at 18:41, Benjamin Herrenschmidt wrote:

 On Tue, 2011-08-23 at 10:23 -0600, Alex Williamson wrote:
 
 Yeah.  Joerg's idea of binding groups internally (pass the fd of one
 group to another via ioctl) is one option.  The tricky part will be
 implementing it to support hot unplug of any group from the
 supergroup.
 I believe Ben had a suggestion that supergroups could be created in
 sysfs, but I don't know what the mechanism to do that looks like.  It
 would also be an extra management step to dynamically bind and unbind
 groups to the supergroup around hotplug.  Thanks, 
 
 I don't really care that much what the method for creating them is, to
 be honest, I just prefer this concept of meta groups or super groups
 or synthetic groups (whatever you want to name them) to having a
 separate uiommu file descriptor.
 
 The one reason I have a slight preference for creating them statically
 using some kind of separate interface (again, I don't care whether it's
 sysfs, netlink, etc...) is that it means things like qemu don't have to
 care about them.
 
 In general, apps that want to use vfio can just get passed the path to
 such a group or the /dev/ path or the group number (whatever we choose as
 the way to identify a group), and don't need to know anything about
 super groups, how to manipulate them, create them, possible
 constraints etc...
 
 Now, libvirt might want to know about that other API in order to provide
 control on the creation of these things, but that's a different issue.
 
 By static I mean they persist, they aren't tied to the lifetime of an
 fd.
 
 Now that's purely a preference on my side because I believe it will make
 life easier for actual programs wanting to use vfio to not have to care
 about those super-groups, but as I said earlier, I don't actually care
 that much :-)

Oh I think it's one of the building blocks we need for a sane user space device 
exposure API. If I want to pass user X a few devices that are all behind a 
single IOMMU, I just chown that device node to user X and be done with it.

The user space tool actually using the VFIO interface wouldn't be in 
configuration business then - and it really shouldn't. That's what system 
configuration is there for :).

But I'm fairly sure we managed to persuade Alex that this is the right path at 
the BOF :)


Alex



Re: kvm PCI assignment VFIO ramblings

2011-08-23 Thread Alexander Graf

On 23.08.2011, at 18:51, Benjamin Herrenschmidt wrote:

 
 For us the simplest and most logical approach (which is also what pHyp
 uses and what Linux handles well) is really to expose a given PCI host
 bridge per group to the guest. Believe it or not, it makes things
 easier :-)
 
 I'm all for easier.  Why does exposing the bridge use fewer bus numbers
 than emulating a bridge?
 
 Because a host bridge doesn't look like a PCI-to-PCI bridge at all for
 us. It's an entirely separate domain with its own bus number space
 (unlike most x86 setups).
 
 In fact we have some problems afaik in qemu today with the concept of
 PCI domains, for example, I think qemu has assumptions about a single
 shared IO space domain which isn't true for us (each PCI host bridge
 provides a distinct IO space domain starting at 0). We'll have to fix
 that, but it's not a huge deal.
 
 So for each group we'd expose in the guest an entire separate PCI
 domain space with its own IO, MMIO etc... spaces, handed off from a
 single device-tree host bridge which doesn't itself appear in the
 config space, doesn't need any emulation of any config space etc...
 
 On x86, I want to maintain that our default assignment is at the device
 level.  A user should be able to pick single or multiple devices from
 across several groups and have them all show up as individual,
 hotpluggable devices on bus 0 in the guest.  Not surprisingly, we've
 also seen cases where users try to attach a bridge to the guest,
 assuming they'll get all the devices below the bridge, so I'd be in
 favor of making this just work if possible too, though we may have to
 prevent hotplug of those.
 
 Given the device requirement on x86 and since everything is a PCI device
 on x86, I'd like to keep a qemu command line something like -device
 vfio,host=00:19.0.  I assume that some of the iommu properties, such as
 dma window size/address, will be query-able through an architecture
 specific (or general if possible) ioctl on the vfio group fd.  I hope
 that will help the specification, but I don't fully understand what all
 remains.  Thanks,
 
 Well, for iommu there are a couple of different issues here, but yes,
 basically on one side we'll have some kind of ioctl to know what segment
 of the device(s)' DMA address space is assigned to the group, and we'll
 need to represent that to the guest via a device-tree property in some
 kind of parent node of all the devices in that group.
 
 We -might- be able to implement some kind of hotplug of individual
 devices of a group under such a PHB (PCI Host Bridge); I don't know for
 sure yet, as some of that PAPR stuff is pretty arcane. But basically, for
 all intents and purposes, we really want a group to be represented as a
 PHB in the guest.
 
 We cannot arbitrarily have individual devices of separate groups be
 represented in the guest as siblings on a single simulated PCI bus.

So would it make sense for you to go the same route that we need to go on 
embedded power, with a separate VFIO style interface that simply exports memory 
ranges and irq bindings, but doesn't know anything about PCI? For e500, we'll 
be using something like that to pass through a full PCI bus into the system.
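A sketch of how such a PCI-agnostic interface might describe a device: only MMIO ranges to map and interrupt lines to forward, plus a lookup an emulation path might use to find which range an address falls in. All names and the example addresses in the test are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* One MMIO range the guest must map (hypothetical layout). */
struct pt_region {
	uint64_t phys;   /* host physical base address */
	uint64_t size;   /* length in bytes */
};

/* A passthrough device described without any PCI knowledge. */
struct pt_device {
	const struct pt_region *regions;
	size_t nregions;
	const unsigned int *irqs;   /* host interrupt lines to forward */
	size_t nirqs;
};

/* Index of the region containing host address @pa, or -1 if none. */
static int pt_find_region(const struct pt_device *dev, uint64_t pa)
{
	for (size_t i = 0; i < dev->nregions; i++) {
		const struct pt_region *r = &dev->regions[i];

		if (pa >= r->phys && pa - r->phys < r->size)
			return (int)i;
	}
	return -1;
}
```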


Alex



Re: [PATCH 2/2] [PowerPC Book3E] Introduce new ptrace debug feature flag

2011-08-23 Thread David Gibson
On Tue, Aug 23, 2011 at 02:57:56PM +0530, K.Prasad wrote:
 On Tue, Aug 23, 2011 at 03:09:31PM +1000, David Gibson wrote:
  On Fri, Aug 19, 2011 at 01:23:38PM +0530, K.Prasad wrote:
   
   While the PPC_PTRACE_SETHWDEBUG ptrace flag in PowerPC accepts the
   PPC_BREAKPOINT_MODE_EXACT breakpoint mode, this is not advertised to
   user-space debuggers (like GDB) that may want to use it. Hence we
   introduce a new PPC_DEBUG_FEATURE_DATA_BP_EXACT flag, which is set in
   the features member of struct ppc_debug_info to advertise support for
   it on Book3E PowerPC processors.
  
  I thought the idea was that the BP_EXACT mode was the default - if the
  new interface was supported at all, then BP_EXACT was always
  supported.  So, why do you need a new flag?
  
 
 Yes, BP_EXACT was always supported but not advertised through
 PPC_PTRACE_GETHWDBGINFO. We're now doing that.

I can see that.  But you haven't answered why.
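For context, a debugger would consume the proposed flag roughly like this after filling a struct ppc_debug_info with PPC_PTRACE_GETHWDBGINFO. The sketch uses a userspace mirror of that structure's fields so it builds on any host, and the flag value is a placeholder rather than the bit the patch assigns.

```c
#include <stdbool.h>
#include <stdint.h>

/* Mirror of the fields of struct ppc_debug_info (see the powerpc ptrace docs). */
struct dbg_info_mirror {
	uint32_t version;
	uint32_t num_instruction_bps;
	uint32_t num_data_bps;
	uint32_t num_condition_regs;
	uint32_t data_bp_alignment;
	uint32_t sizeof_condition;
	uint64_t features;
};

/* Placeholder: the real bit is whatever the patch assigns to
 * PPC_DEBUG_FEATURE_DATA_BP_EXACT. */
#define HYP_FEATURE_DATA_BP_EXACT (1ULL << 4)

/* A debugger would fall back to another mechanism if this is false. */
static bool supports_exact_data_bp(const struct dbg_info_mirror *info)
{
	return info->num_data_bps > 0 &&
	       (info->features & HYP_FEATURE_DATA_BP_EXACT) != 0;
}
```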

-- 
David Gibson| I'll have my music baroque, and my code
david AT gibson.dropbear.id.au  | minimalist, thank you.  NOT _the_ _other_
| _way_ _around_!
http://www.ozlabs.org/~dgibson


Re: [PATCH 1/2] [hw-breakpoint] Use generic hw-breakpoint interfaces for new PPC ptrace flags

2011-08-23 Thread David Gibson
On Tue, Aug 23, 2011 at 02:55:13PM +0530, K.Prasad wrote:
 On Tue, Aug 23, 2011 at 03:08:50PM +1000, David Gibson wrote:
  On Fri, Aug 19, 2011 at 01:21:36PM +0530, K.Prasad wrote:
   PPC_PTRACE_GETHWDBGINFO, PPC_PTRACE_SETHWDEBUG and PPC_PTRACE_DELHWDEBUG
   are PowerPC-specific ptrace flags that use the watchpoint register. While
   they are targeted primarily at BookE users, user-space applications such
   as GDB have started using them for BookS too.
   
   This patch enables the use of generic hardware breakpoint interfaces for
   these new flags. The version number of the associated data structures
   ppc_hw_breakpoint and ppc_debug_info is incremented to denote new
   semantics.
  
  So, the structure itself doesn't seem to have been extended.  I don't
  understand what the semantic difference is - your patch comment needs
  to explain this clearly.
 
 
 We had a request to extend the structure but thought it was dangerous to
 do so. For instance, if user space used version 1 of the structure
 while the kernel did a copy_to_user() of version 2, we'd run into
 problems. Unfortunately the ptrace flags weren't designed to accept a
 version number as input from the user through the
 PPC_PTRACE_GETHWDBGINFO flag (which would have solved this issue).

I still don't follow you.
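One conventional way to avoid the mismatch Prasad describes, shown as a sketch: the caller reports the size of its copy of the structure, and the kernel copies at most that many bytes, zero-filling any tail the caller knows about but the kernel does not. The ptrace requests under discussion take no such size; copy_versioned() is a hypothetical helper, with memcpy() standing in for copy_to_user().

```c
#include <stddef.h>
#include <string.h>

/*
 * Copy min(dst_size, src_size) bytes and zero any remaining destination
 * bytes.  Relies on newer structure versions only appending fields.
 * Returns the number of bytes copied from @src.
 */
static size_t copy_versioned(void *dst, size_t dst_size,
			     const void *src, size_t src_size)
{
	size_t n = dst_size < src_size ? dst_size : src_size;

	memcpy(dst, src, n);
	if (dst_size > n)
		memset((char *)dst + n, 0, dst_size - n);
	return n;
}
```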

 I'll add a comment w.r.t change in semantics - such as the ability to
 accept 'range' breakpoints in BookS.
  
   Apart from the usual benefits of using generic hw-breakpoint interfaces,
   these changes allow debuggers (such as GDB) to use a common set of ptrace
   flags for their watchpoint needs and allow more precise breakpoint
   specification (the length of the variable can be specified).
  
  What is the mechanism for implementing the range breakpoint on book3s?
  
 
 The hw-breakpoint interface accepts length as an argument in BookS (any
 value <= 8 bytes) and would filter out extraneous interrupts arising from
 accesses outside the range <addr, addr + len> inside the
 hw_breakpoint_handler function.
 
 We put that ability to use here.

Ah, so in hardware the breakpoints are always 8 bytes long, but you
filter out false hits on a shorter range?  Of course, the utility of
range breakpoints is questionable when length <=8, but the start must
be aligned on an 8-byte boundary.
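The scheme summarized above, as a sketch: the hardware watches one aligned 8-byte granule, a requested range is accepted only if it fits inside that granule, and hits outside the requested bytes are discarded. The constants and function names are illustrative stand-ins for HW_BREAKPOINT_ALIGN and hw_breakpoint_handler(), and the overlap test assumes the access size is known.

```c
#include <stdbool.h>
#include <stdint.h>

#define GRANULE_SIZE 8u                              /* bytes watched by hardware */
#define GRANULE_MASK ((uint64_t)GRANULE_SIZE - 1)

/* A requested [addr, addr + len) is usable only if it fits in one granule. */
static bool range_fits_granule(uint64_t addr, uint64_t len)
{
	if (len == 0 || len > GRANULE_SIZE)
		return false;
	return (addr & GRANULE_MASK) + len <= GRANULE_SIZE;
}

/*
 * The hardware fires for any access to the granule; report a real hit
 * only if [access, access + access_len) overlaps [addr, addr + len).
 */
static bool is_real_hit(uint64_t addr, uint64_t len,
			uint64_t access, uint64_t access_len)
{
	return access < addr + len && addr < access + access_len;
}
```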

[snip]
 if ((unsigned long)bp_info->addr >= TASK_SIZE)
 return -EIO;

   @@ -1398,15 +1400,86 @@ static long ppc_set_hwdebug(struct task_struct *child,
 dabr |= DABR_DATA_READ;
 if (bp_info->trigger_type & PPC_BREAKPOINT_TRIGGER_WRITE)
 dabr |= DABR_DATA_WRITE;
   +#ifdef CONFIG_HAVE_HW_BREAKPOINT
   + if (bp_info->version == 1)
   + goto version_one;
  
  There are several legitimate uses of goto in the kernel, but this is
  definitely not one of them.  You're essentially using it to put the
  old and new versions of the same function in one block.  Nasty.
  
 
 Maybe it's the label that's bothering you here. It might look more
 elegant if it were called something like exit_* or error_* :-)
 
 The goto here helps reduce code and is similar to the error exits we
 use everywhere.

Rubbish, it is not an exception exit at all, it is two separate code
paths for the different versions which would be much clearer as two
different functions.
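The shape suggested here, sketched with placeholder bodies: dispatch on the version once and keep each version's logic in its own function, so there is no goto between the two paths. struct bp_request and the function names are hypothetical, and each path just returns its version number so the dispatch can be checked.

```c
#include <errno.h>

struct bp_request {
	unsigned int version;
};

static long set_hwdebug_v1(const struct bp_request *req)
{
	(void)req;
	/* Version-1 path: program the DABR directly (details elided). */
	return 1;
}

static long set_hwdebug_v2(const struct bp_request *req)
{
	(void)req;
	/* Version-2 path: use the generic hw-breakpoint layer (details elided). */
	return 2;
}

static long set_hwdebug(const struct bp_request *req)
{
	switch (req->version) {
	case 1:
		return set_hwdebug_v1(req);
	case 2:
		return set_hwdebug_v2(req);
	default:
		return -EINVAL;
	}
}
```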

   + if (ptrace_get_breakpoints(child) < 0)
   + return -ESRCH;

   - child->thread.dabr = dabr;
   + bp = thread->ptrace_bps[0];
   + if (!bp_info->addr) {
   + if (bp) {
   + unregister_hw_breakpoint(bp);
   + thread->ptrace_bps[0] = NULL;
   + }
   + ptrace_put_breakpoints(child);
   + return 0;
  
  Why are you making setting a 0 watchpoint remove the existing one (I
  think that's what this does).  I thought there was an explicit del
  breakpoint operation instead.
 
 We had to define the semantics for what writing a 0 to DABR could mean,
 and I think it is intuitive to consider it a deletion request... I
 couldn't think of a case where a DABR with addr=0 and RW=1 would be
 required.

When a user space program maps pages at virtual address 0, which it
can do.
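A sketch of why address 0 should not double as "delete": if a program can map page 0, a watchpoint at 0 is a legitimate request, so "set" must store it like any other address and removal goes through its own operation, as PPC_PTRACE_DELHWDEBUG already provides. The single-slot table and function names are hypothetical.

```c
#include <errno.h>
#include <stdbool.h>

struct bp_slot {
	bool in_use;
	unsigned long addr;
};

static int bp_set(struct bp_slot *s, unsigned long addr)
{
	if (s->in_use)
		return -ENOSPC;
	s->addr = addr;     /* 0 is stored like any other address */
	s->in_use = true;
	return 0;
}

/* Explicit removal, independent of any address value. */
static int bp_del(struct bp_slot *s)
{
	if (!s->in_use)
		return -ENOENT;
	s->in_use = false;
	return 0;
}
```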

   + }
   + /*
   +  * Check if the request is for 'range' breakpoints. We can
   +  * support it if range < 8 bytes.
   +  */
   + if (bp_info->addr_mode == PPC_BREAKPOINT_MODE_RANGE_INCLUSIVE)
   + len = bp_info->addr2 - bp_info->addr;
  
  So you compute the length here, but I don't see you ever test if it is > 8
  and return an error.
  
 
 The hw-breakpoint interfaces would fail if the length was > 8.

Ok.

   + else if (bp_info->addr_mode != PPC_BREAKPOINT_MODE_EXACT) {
   + ptrace_put_breakpoints(child);
   + return -EINVAL;
   + }
   + if (bp) {
   + attr = bp->attr;
   + attr.bp_addr = (unsigned long)bp_info->addr & ~HW_BREAKPOINT_ALIGN;
   + arch_bp_generic_fields(dabr &
   + (DABR_DATA_WRITE |