Re: [PATCH v2 2/8] phy: mvebu-cp110-comphy: fix port check in ->xlate()

2018-12-02 Thread Russell King - ARM Linux
On Sun, Dec 02, 2018 at 08:35:09PM +0100, Miquel Raynal wrote:
> Hi Russell,
> 
> Russell King - ARM Linux  wrote on Fri, 30 Nov
> 2018 19:00:31 +:
> 
> > On Fri, Nov 30, 2018 at 03:47:37PM +0100, Miquel Raynal wrote:
> > > So far the PHY ->xlate() callback was checking if the port was
> > > "invalid" before continuing, meaning that the port has not been used
> > > yet. This check is not correct as there is no opposite call to  
> > > ->xlate() once the PHY is released by the user and the port will  
> > > remain "valid" after the first phy_get()/phy_put() calls. Hence, if
> > > this driver is built as a module, inserted, removed and inserted
> > > again, the PHY will appear busy and the second probe will fail.
> > > 
> > > To fix this, just drop the faulty check and instead verify that the
> > > port number is valid (ie. in the possible range).
> > > 
> > > Signed-off-by: Miquel Raynal 
> > > ---
> > >  drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 4 ++--
> > >  1 file changed, 2 insertions(+), 2 deletions(-)
> > > 
> > > diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c 
> > > b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> > > index 31b9a1c18345..a40b876ff214 100644
> > > --- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> > > +++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> > > @@ -567,9 +567,9 @@ static struct phy *mvebu_comphy_xlate(struct device 
> > > *dev,
> > >   return phy;
> > >  
> > >   lane = phy_get_drvdata(phy);
> > > - if (lane->port >= 0)
> > > - return ERR_PTR(-EBUSY);
> > >   lane->port = args->args[0];
> > > + if (lane->port >= MVEBU_COMPHY_PORTS)
> > > + return ERR_PTR(-EINVAL);  
> > 
> > Shouldn't we validate args->args[0] before doing anything?
> > 
> 
> I don't understand your point, there is a check on args->args[0] as
> we check its value (through lane->port) right after. What do you
> have in mind?

Right, there is already a check on args->args[0] for it being greater
than MVEBU_COMPHY_PORTS and returning an error (and in fact warning
if that is the case).  So in that case, what is the use of the above
additional test you are proposing to add?  The resulting code ends up
looking like this:

if (WARN_ON(args->args[0] >= MVEBU_COMPHY_PORTS))
return ERR_PTR(-EINVAL);
...
lane->port = args->args[0];
+   if (lane->port >= MVEBU_COMPHY_PORTS)
+   return ERR_PTR(-EINVAL);  

which is just silly - the second test can never be evaluated as true,
and therefore is redundant.

In any case, my point was that in your patch, where you assign
lane->port and then validate the lane->port value, this is in
principle the wrong order - the order should always be: validate
first, then make use.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 3.18 14/83] ARM: make lookup_processor_type() non-__init

2018-11-30 Thread Russell King - ARM Linux
On Fri, Nov 30, 2018 at 04:18:43PM +0100, Greg Kroah-Hartman wrote:
> On Fri, Nov 30, 2018 at 04:15:54PM +0100, Greg Kroah-Hartman wrote:
> > On Thu, Nov 29, 2018 at 02:28:18PM +, Russell King - ARM Linux wrote:
> > > Hi,
> > > 
> > > As I've already fed back to Sascha about this, this patch on its own
> > > does not fix anything, and is not a stable kernel candidate without
> > > a patch that makes use of it (iow, the spectre fixes.)  It is a
> > > preparatory patch for mainline commit 383fb3ee8024.
> > > 
> > > Every commit in:
> > > 
> > > $ git rev-list v4.16..383fb3ee8024
> > > 
> > > are the ARM spectre fixes, which are being back-ported by David Long.
> > > 
> > > Please do not cherry-pick commits from within this series for _any_
> > > stable kernel, but please wait for David to send you the back-ported
> > > patches.
> > 
> > Ugh, ok, that's a bunch here.
> 
> Ok, not that bad, only 4:
>   383fb3ee8024 ("ARM: spectre-v2: per-CPU vtables to work around 
> big.Little systems")
>   e209950fdd06 ("ARM: add PROC_VTABLE and PROC_TABLE macros")
>   945aceb1db88 ("ARM: clean up per-processor check_bugs method call")
>   899a42f83667 ("ARM: make lookup_processor_type() non-__init")
> 
> I'll go drop them all from the trees now.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v2 2/8] phy: mvebu-cp110-comphy: fix port check in ->xlate()

2018-11-30 Thread Russell King - ARM Linux
On Fri, Nov 30, 2018 at 03:47:37PM +0100, Miquel Raynal wrote:
> So far the PHY ->xlate() callback was checking if the port was
> "invalid" before continuing, meaning that the port has not been used
> yet. This check is not correct as there is no opposite call to
> ->xlate() once the PHY is released by the user and the port will
> remain "valid" after the first phy_get()/phy_put() calls. Hence, if
> this driver is built as a module, inserted, removed and inserted
> again, the PHY will appear busy and the second probe will fail.
> 
> To fix this, just drop the faulty check and instead verify that the
> port number is valid (ie. in the possible range).
> 
> Signed-off-by: Miquel Raynal 
> ---
>  drivers/phy/marvell/phy-mvebu-cp110-comphy.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c 
> b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> index 31b9a1c18345..a40b876ff214 100644
> --- a/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> +++ b/drivers/phy/marvell/phy-mvebu-cp110-comphy.c
> @@ -567,9 +567,9 @@ static struct phy *mvebu_comphy_xlate(struct device *dev,
>   return phy;
>  
>   lane = phy_get_drvdata(phy);
> - if (lane->port >= 0)
> - return ERR_PTR(-EBUSY);
>   lane->port = args->args[0];
> + if (lane->port >= MVEBU_COMPHY_PORTS)
> + return ERR_PTR(-EINVAL);

Shouldn't we validate args->args[0] before doing anything?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v8] clk: Add (devm_)clk_get_optional() functions

2018-11-30 Thread Russell King - ARM Linux
On Fri, Nov 30, 2018 at 10:25:37AM +, Phil Edworthy wrote:
> Hi Stephen,
> 
> On 30 November 2018 09:09 Stephen Boyd wrote:
> > Quoting Phil Edworthy (2018-11-20 06:14:45)
> > > This adds clk_get_optional() and devm_clk_get_optional() functions to
> > > get optional clocks.
> > > They behave the same as (devm_)clk_get except where there is no clock
> > > producer. In this case, instead of returning -ENOENT, the function
> > > returns NULL. This makes error checking simpler and allows
> > > clk_prepare_enable, etc to be called on the returned reference without
> > > additional checks.
> > 
> > Ok. I guess that works by virtue of how -ENOENT is returned by various
> > functions that are called deeper in the clk_get() path? I'm cautiously
> > optimistic. So cautious, we should probably add a comment to these optional
> > functions that indicate they rely on the functions they call to return 
> > -ENOENT
> > under the various conditions that make a clk optional.
> Yes, it does indeed rely on how clk_get() is implemented.
> Specifically, that if __of_clk_get_by_name() returns -EINVAL, the error is
> superseded by clk_get_sys() returning -ENOENT.
> As you say, a comment may help here.

Each time the question of the optional clk_get() stuff comes up, we go
around the same discussions time and time again.  So far, each time
has ended up flopping.

Yes, clk_get() can only ever return -ENOENT if it falls back to the
non-DT methods, because it assumes that the clk tables are complete
(it can do nothing else.)

I don't think it needs a comment because it's obvious from the code
and also from the implementation point of view.

> > > +static inline struct clk *clk_get_optional(struct device *dev, const
> > > +char *id)
> > 
> > Any kernel doc for this function?
> I took my cue from the surrounding functions, let me know if I have to add it.

I don't see you need to - this is an internal function by way of the
"static inline" you have before it.  It's not an API function.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 2/6] phy: add A3700 COMPHY support

2018-11-29 Thread Russell King - ARM Linux
On Thu, Nov 29, 2018 at 05:12:45PM +0100, Miquel Raynal wrote:
> Hello,
> 
> Miquel Raynal  wrote on Thu, 22 Nov 2018
> 22:04:16 +0100:
> > +static struct phy *mvebu_a3700_comphy_xlate(struct device *dev,
> > +   struct of_phandle_args *args)
> > +{
> > +   struct mvebu_a3700_comphy_lane *lane;
> > +   struct phy *phy;
> > +
> > +   if (WARN_ON(args->args[0] >= MVEBU_A3700_COMPHY_PORTS))
> > +   return ERR_PTR(-EINVAL);
> > +
> > +   phy = of_phy_simple_xlate(dev, args);
> > +   if (IS_ERR(phy))
> > +   return phy;
> > +
> > +   lane = phy_get_drvdata(phy);
> > +   if (lane->port >= 0)
> > +   return ERR_PTR(-EBUSY);
> 
> This is not valid. It works only the first time the consumer gets
> a PHY for this lane. If the consumer put the PHY (module is unloaded)
> and then gets the PHY again (module is re-loaded a second time), the
> port is already set to != -1 and this will exit with an error and
> prevent the module to probe.
> 
> > +
> > +   lane->port = args->args[0];
> 
> I will remove the above check and transform it into a valid "port"
> value right here.

Please note that mvebu_comphy_xlate() in
drivers/phy/marvell/phy-mvebu-cp110-comphy.c does exactly the same.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm64: io: specify asm operand width for __iormb()

2018-11-29 Thread Russell King - ARM Linux
On Thu, Nov 29, 2018 at 09:10:39AM -0700, Nathan Chancellor wrote:
> On Thu, Nov 29, 2018 at 10:49:03AM +, Will Deacon wrote:
> > On Thu, Nov 29, 2018 at 09:03:54AM +, Julien Thierry wrote:
> > >
> > >
> > > On 29/11/18 04:19, Nick Desaulniers wrote:
> > > > Fixes the warning produced from Clang:
> > > > ./include/asm-generic/io.h:711:9: warning: value size does not match
> > > > register size specified by the constraint and modifier
> > > > [-Wasm-operand-widths]
> > > > return readl(addr);
> > > >^
> > > > ./arch/arm64/include/asm/io.h:149:58: note: expanded from macro 'readl'
> > > >   ^
> > > > ./include/asm-generic/io.h:711:9: note: use constraint modifier "w"
> > > > ./arch/arm64/include/asm/io.h:149:50: note: expanded from macro 'readl'
> > > >   ^
> > > > ./arch/arm64/include/asm/io.h:118:25: note: expanded from macro 
> > > > '__iormb'
> > > > asm volatile("eor   %w0, %1, %1\n" \
> > > >  ^
> > >
> > > Why does the "eor %0, %1, %1" become "eor %w0, %1, %1" ?
> > > The variable passed to the inline assembly for %0 is unsigned long, so
> > > always 64-bits wide on arm64. Why is clang trying to use a 32-bit
> > > register for it?
> 
> Sorry, this was my fault, I accidentally added a w during testing to see
> what constraints were valid (given that my assembly knowledge is nearly
> non-existence so forgive the non-sensical experimentation) and I used
> that message rather than the original one. This is the unadulterated one.
> 
> In file included from arch/arm64/kernel/asm-offsets.c:24:
> In file included from ./include/linux/dma-mapping.h:11:
> In file included from ./include/linux/scatterlist.h:9:
> In file included from ./arch/arm64/include/asm/io.h:209:
> ./include/asm-generic/io.h:695:9: warning: value size does not match register 
> size specified by the constraint and modifier [-Wasm-operand-widths]
> return readb(addr);
>^
> ./arch/arm64/include/asm/io.h:147:58: note: expanded from macro 'readb'
> #define readb(c)({ u8  __v = readb_relaxed(c); __iormb(__v); 
> __v; })
>^
> ./include/asm-generic/io.h:695:9: note: use constraint modifier "w"
> ./arch/arm64/include/asm/io.h:147:50: note: expanded from macro 'readb'
> #define readb(c)({ u8  __v = readb_relaxed(c); __iormb(__v); 
> __v; })
>^
> ./arch/arm64/include/asm/io.h:118:24: note: expanded from macro '__iormb'
> asm volatile("eor   %0, %1, %1\n"   \
> ^
> 
> >
> > Yeah, the message above looks bogus to me. I can see %1 being 32-bit for
> > read[bwl], so maybe clang is just getting the diagnostic wrong. If so,
> > I wonder if the following fixes the problem:
> >
> 
> This doesn't appear to work, I get this error:
> 
> In file included from arch/arm64/kernel/asm-offsets.c:24:
> In file included from ./include/linux/dma-mapping.h:11:
> In file included from ./include/linux/scatterlist.h:9:
> In file included from ./arch/arm64/include/asm/io.h:209:
> ./include/asm-generic/io.h:695:9: error: expected expression
> return readb(addr);
>^
> ./arch/arm64/include/asm/io.h:147:50: note: expanded from macro 'readb'
> #define readb(c)({ u8  __v = readb_relaxed(c); __iormb(__v); 
> __v; })
>^
> ./arch/arm64/include/asm/io.h:120:28: note: expanded from macro '__iormb'
>  : "=r" (tmp) : "r" (unsigned long)(v) : "memory"); \
>  ^
> 
> >
> > diff --git a/arch/arm64/include/asm/io.h b/arch/arm64/include/asm/io.h
> > index d42d00d8d5b6..13befec8b64e 100644
> > --- a/arch/arm64/include/asm/io.h
> > +++ b/arch/arm64/include/asm/io.h
> > @@ -117,7 +117,7 @@ static inline u64 __raw_readq(const volatile void 
> > __iomem *addr)
> >  */ \
> > asm volatile("eor   %0, %1, %1\n"   \
> >  "cbnz  %0, ."  \
> > -: "=r" (tmp) : "r" (v) : "memory");\
> > +: "=r" (tmp) : "r" (unsigned long)(v) : "memory"); \

The parens around the passed value are part of the asm() syntax, which
is:

"contraint" (expression)

so should be:

+: "=r" (tmp) : "r" ((unsigned long)(v)) : "memory"); \

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 3.18 14/83] ARM: make lookup_processor_type() non-__init

2018-11-29 Thread Russell King - ARM Linux
Hi,

As I've already fed back to Sascha about this, this patch on its own
does not fix anything, and is not a stable kernel candidate without
a patch that makes use of it (iow, the spectre fixes.)  It is a
preparatory patch for mainline commit 383fb3ee8024.

Every commit in:

$ git rev-list v4.16..383fb3ee8024

are the ARM spectre fixes, which are being back-ported by David Long.

Please do not cherry-pick commits from within this series for _any_
stable kernel, but please wait for David to send you the back-ported
patches.

Thanks.

On Thu, Nov 29, 2018 at 03:11:32PM +0100, Greg Kroah-Hartman wrote:
> 3.18-stable review patch.  If anyone has any objections, please let me know.
> 
> --
> 
> [ Upstream commit 899a42f836678a595f7d2bc36a5a0c2b03d08cbc ]
> 
> Move lookup_processor_type() out of the __init section so it is callable
> from (eg) the secondary startup code during hotplug.
> 
> Reviewed-by: Julien Thierry 
> Signed-off-by: Russell King 
> Signed-off-by: Sasha Levin 
> ---
>  arch/arm/kernel/head-common.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
> index 8733012d231f..7e662bdd5cb3 100644
> --- a/arch/arm/kernel/head-common.S
> +++ b/arch/arm/kernel/head-common.S
> @@ -122,6 +122,9 @@ __mmap_switched_data:
>   .long   init_thread_union + THREAD_START_SP @ sp
>   .size   __mmap_switched_data, . - __mmap_switched_data
>  
> + __FINIT
> + .text
> +
>  /*
>   * This provides a C-API version of __lookup_processor_type
>   */
> @@ -133,9 +136,6 @@ ENTRY(lookup_processor_type)
>   ldmfd   sp!, {r4 - r6, r9, pc}
>  ENDPROC(lookup_processor_type)
>  
> - __FINIT
> - .text
> -
>  /*
>   * Read processor ID register (CP#15, CR0), and look up in the linker-built
>   * supported processor list.  Note that we can't use the absolute addresses
> -- 
> 2.17.1
> 
> 
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] pci: imx6: support kernels built in Thumb-2 mode

2018-11-28 Thread Russell King - ARM Linux
On Wed, Nov 28, 2018 at 02:25:54PM +0100, Stefan Agner wrote:
> Add a fault handler which handles reads in Thumb-2 mode. Install
> the appropriate handler depending on which mode the kernel has
> been built. This avoids an "Unhandled fault: external abort on
> non-linefetch (0x1008) at 0xf0a8" during boot on a device
> with a PCIe switch connected.
> 
> Link: https://lore.kernel.org/linux-pci/20181126161645.8177-1-ste...@agner.ch/
> Signed-off-by: Stefan Agner 
> ---
> FWIW, I found this manual helpful to write the code below:
> http://hermes.wings.cs.wisc.edu/files/Thumb-2SupplementReferenceManual.pdf#page=43=100,0,66
> 
> --
> Stefan
> 
>  drivers/pci/controller/dwc/pci-imx6.c | 37 ++-
>  1 file changed, 36 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/controller/dwc/pci-imx6.c 
> b/drivers/pci/controller/dwc/pci-imx6.c
> index 69f86234f7c0..683deb74d69f 100644
> --- a/drivers/pci/controller/dwc/pci-imx6.c
> +++ b/drivers/pci/controller/dwc/pci-imx6.c
> @@ -29,6 +29,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include "pcie-designware.h"
>  
> @@ -299,6 +300,37 @@ static int imx6q_pcie_abort_handler(unsigned long addr,
>   return 1;
>  }
>  
> +static int imx6q_pcie_abort_handler_thumb2(unsigned long addr,
> + unsigned int fsr, struct pt_regs *regs)
> +{
> + unsigned long pc = instruction_pointer(regs);
> + unsigned long instr = *(unsigned long *)pc;

So what happens if userspace mmap()s the PCIe space (eg, via
/dev/mem), and then accesses it, triggering this fault?  You'll
be attempting to read from userspace here, which will oops the
kernel.  The kernel is not allowed to access userspace by
simply dereferencing a pointer.

> + unsigned long thumb2_instr = __mem_to_opcode_thumb16(instr);
> + int reg = thumb2_instr & 7;
> +
> + if (!__opcode_is_thumb16(instr & 0xUL))
> + return 1;
> +
> + /* Load word/byte and halfword immediate offset */
> + if (((thumb2_instr & 0xe800) == 0x6800) ||
> + ((thumb2_instr & 0xf800) == 0x8800)) {
> + unsigned long val;
> +
> + if (thumb2_instr & 0x1000)
> + val = 0xff;
> + else if (thumb2_instr & 0x8000)
> + val = 0x;
> + else
> + val = 0xUL;
> +
> + regs->uregs[reg] = val;
> + regs->ARM_pc += 2;
> + return 0;
> + }
> +
> + return 1;
> +}
> +
>  static int imx6_pcie_attach_pd(struct device *dev)
>  {
>   struct imx6_pcie *imx6_pcie = dev_get_drvdata(dev);
> @@ -1069,6 +1101,8 @@ static struct platform_driver imx6_pcie_driver = {
>  
>  static int __init imx6_pcie_init(void)
>  {
> + bool thumb2 = IS_ENABLED(CONFIG_THUMB2_KERNEL);
> +
>   /*
>* Since probe() can be deferred we need to make sure that
>* hook_fault_code is not called after __init memory is freed
> @@ -1076,7 +1110,8 @@ static int __init imx6_pcie_init(void)
>* we can install the handler here without risking it
>* accessing some uninitialized driver state.
>*/
> - hook_fault_code(8, imx6q_pcie_abort_handler, SIGBUS, 0,
> + hook_fault_code(8, thumb2 ? imx6q_pcie_abort_handler_thumb2 :
> + imx6q_pcie_abort_handler, SIGBUS, 0,
>   "external abort on non-linefetch");
>  
>   return platform_driver_register(_pcie_driver);
> -- 
> 2.19.1
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH AUTOSEL 4.14 10/21] ARM: make lookup_processor_type() non-__init

2018-11-28 Thread Russell King - ARM Linux
On Wed, Nov 28, 2018 at 09:12:52AM -0500, Sasha Levin wrote:
> On Fri, Nov 23, 2018 at 12:02:54AM +0000, Russell King - ARM Linux wrote:
> >Hi Sasha,
> >
> >We need to keep track of which Spectre patches have been backported
> >and which haven't.  David Long has been doing the backport work,
> >which doesn't include all the patches in my present spectre branch.
> >You've picked up the last five, meaning there's a bunch in the middle
> >of the entire series which haven't yet been considered - for example,
> >the Spectre variant 1.1 patches.
> 
> I'll drop all the ones you pointed out from my branches. Sometimes it's
> difficult to tell if a certain patch is part of the spectre fix.

Given that all the 32-bit ARM Spectre patches are sequential in a
branch, it shouldn't be too difficult as it's possible to list all
the commit IDs using:

$ git rev-list v4.16..383fb3ee8024

where 383fb3ee8024 is the current head.  I'm not anticipating at this
time anything further.

> >The entire series of 32-bit ARM Spectre patches can be listed in
> >mainline via:
> >
> >$ git log v4.16..383fb3ee8024
> >
> >where almost all of those need to be applied to stable kernels, with
> >possible manual back-porting where they don't apply.
> 
> Is it something that will be backported at some point?

Yes, as I mentioned, this is work that David Long is undertaking,
with a separate review process before submitting the backported
patches for inclusion in the appropriate stable tree.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm: always update thread_info->syscall

2018-11-27 Thread Russell King - ARM Linux
On Tue, Nov 27, 2018 at 10:56:20AM +, Russell King - ARM Linux wrote:
> On Tue, Nov 27, 2018 at 08:30:32AM -0200, Rafael David Tinoco wrote:
> > On 11/26/18 9:44 PM, Russell King - ARM Linux wrote:
> > >On Mon, Nov 26, 2018 at 11:41:11PM +, Russell King - ARM Linux wrote:
> > >>On Mon, Nov 26, 2018 at 11:33:03PM +, Russell King - ARM Linux wrote:
> > >>>On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
> > >>>>Right now, only way for task->thread_info->syscall to be updated is if
> > >>>>if _TIF_SYSCALL_WORK is set in current's task thread_info->flags
> > >>>>(similar to what has_syscall_work() checks for arm64).
> > >>>>
> > >>>>This means that "->syscall" will only be updated if we are tracing the
> > >>>>syscalls through ptrace, for example. This is NOT the same behavior as
> > >>>>arm64, when pt_regs->syscallno is updated in the beginning of svc0
> > >>>>handler for *every* syscall entry.
> > >>>
> > >>>So when was it decided that the syscall number will always be required
> > >>>(we need it to know how far back this has to be backported).
> > >>
> > >>PS, I rather object to the fact that the required behaviour seems to
> > >>change, arch maintainers aren't told about it until... some test is
> > >>created at some random point in the future which then fails.
> > >>
> > >>Surely there's a better way to communicate changes in requirements
> > >>than discovery-by-random-bug-report ?
> > >
> > >Final comment for tonight - the commit introducing /proc/*/syscall says:
> > >
> > > This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic 
> > > files.
> > > These use task_current_syscall() to show the task's current system 
> > > call
> > > number and argument registers, stack pointer and PC.  For a task 
> > > blocked
> > > but not in a syscall, the file shows "-1" in place of the syscall 
> > > number,
> > > followed by only the SP and PC.  For a task that's not blocked, it 
> > > shows
> > > "running".
> > >
> > >Please validate that a blocked task does indeed show -1 with your patch
> > >applied.
> > 
> > Will do. This is done in an upper level (collect_syscall <-
> > task_current_syscall <- proc_pid_syscall):
> > 
> > if (!try_get_task_stack(target)) {
> > /* Task has no stack, so the task isn't in a syscall. */
> > *sp = *pc = 0;
> > *callno = -1;
> > return 0;
> > }
> > 
> > I think only missing part for arm was that one, but will confirm, after
> > fixing usage of "r7" for obtaining "scno". Will send a v2 in this thread.
> 
> There's another question - what's the expected behaviour when we
> restart a syscall using the restartblock mechanism?  Is the syscall
> number expected to be __NR_restart_syscall or the original syscall
> number?
> 
> I can't find anywhere that this detail is specified (damn the lack
> of API documentation - I'm tempted to say that we won't implement
> this until it gets documented properly, and that test can continue
> failing until such time that happens.)

Having looked around, it seems that the /proc//syscall interface
was sneaked into the kernel.  The patch series which added it was
sent in 2008 with a covering message that made no mention of this new
interface, instead stating:

  http://lkml.iu.edu/hypermail/linux/kernel/0807.2/0551.html

   Most of these changes move code around with little or no change,
   and they should not break anything or change any behavior.

While that statement is absolutely correct, it doesn't highlight the
fact that the set of patches _also_ include a brand new userspace
interface exposing things like syscall numbers and arguments in /proc.

There appears to be no documentation at all of this interface, so there
is no definition of how it is supposed to work or what it is supposed
to expose beyond what little information is in the original patch:

  http://lkml.iu.edu/hypermail/linux/kernel/0807.2/0577.html

   This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files.
   These use task_current_syscall() to show the task's current system call
   number and argument registers, stack pointer and PC. For a task blocked
   but not in a syscall, the file shows "-1" in place of the syscall number,
   followed by only the SP and PC. For a task that's not blocked, it shows
 

Re: [PATCH] arm: always update thread_info->syscall

2018-11-27 Thread Russell King - ARM Linux
On Tue, Nov 27, 2018 at 08:30:32AM -0200, Rafael David Tinoco wrote:
> On 11/26/18 9:44 PM, Russell King - ARM Linux wrote:
> >On Mon, Nov 26, 2018 at 11:41:11PM +, Russell King - ARM Linux wrote:
> >>On Mon, Nov 26, 2018 at 11:33:03PM +, Russell King - ARM Linux wrote:
> >>>On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
> >>>>Right now, only way for task->thread_info->syscall to be updated is if
> >>>>if _TIF_SYSCALL_WORK is set in current's task thread_info->flags
> >>>>(similar to what has_syscall_work() checks for arm64).
> >>>>
> >>>>This means that "->syscall" will only be updated if we are tracing the
> >>>>syscalls through ptrace, for example. This is NOT the same behavior as
> >>>>arm64, when pt_regs->syscallno is updated in the beginning of svc0
> >>>>handler for *every* syscall entry.
> >>>
> >>>So when was it decided that the syscall number will always be required
> >>>(we need it to know how far back this has to be backported).
> >>
> >>PS, I rather object to the fact that the required behaviour seems to
> >>change, arch maintainers aren't told about it until... some test is
> >>created at some random point in the future which then fails.
> >>
> >>Surely there's a better way to communicate changes in requirements
> >>than discovery-by-random-bug-report ?
> >
> >Final comment for tonight - the commit introducing /proc/*/syscall says:
> >
> > This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files.
> > These use task_current_syscall() to show the task's current system call
> > number and argument registers, stack pointer and PC.  For a task blocked
> > but not in a syscall, the file shows "-1" in place of the syscall 
> > number,
> > followed by only the SP and PC.  For a task that's not blocked, it shows
> > "running".
> >
> >Please validate that a blocked task does indeed show -1 with your patch
> >applied.
> 
> Will do. This is done in an upper level (collect_syscall <-
> task_current_syscall <- proc_pid_syscall):
> 
>   if (!try_get_task_stack(target)) {
>   /* Task has no stack, so the task isn't in a syscall. */
>   *sp = *pc = 0;
>   *callno = -1;
>   return 0;
>   }
> 
> I think only missing part for arm was that one, but will confirm, after
> fixing usage of "r7" for obtaining "scno". Will send a v2 in this thread.

There's another question - what's the expected behaviour when we
restart a syscall using the restartblock mechanism?  Is the syscall
number expected to be __NR_restart_syscall or the original syscall
number?

I can't find anywhere that this detail is specified (damn the lack
of API documentation - I'm tempted to say that we won't implement
this until it gets documented properly, and that test can continue
failing until such time that happens.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm: always update thread_info->syscall

2018-11-26 Thread Russell King - ARM Linux
On Mon, Nov 26, 2018 at 11:41:11PM +, Russell King - ARM Linux wrote:
> On Mon, Nov 26, 2018 at 11:33:03PM +0000, Russell King - ARM Linux wrote:
> > On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
> > > Right now, only way for task->thread_info->syscall to be updated is if
> > > if _TIF_SYSCALL_WORK is set in current's task thread_info->flags
> > > (similar to what has_syscall_work() checks for arm64).
> > > 
> > > This means that "->syscall" will only be updated if we are tracing the
> > > syscalls through ptrace, for example. This is NOT the same behavior as
> > > arm64, when pt_regs->syscallno is updated in the beginning of svc0
> > > handler for *every* syscall entry.
> > 
> > So when was it decided that the syscall number will always be required
> > (we need it to know how far back this has to be backported).
> 
> PS, I rather object to the fact that the required behaviour seems to
> change, arch maintainers aren't told about it until... some test is
> created at some random point in the future which then fails.
> 
> Surely there's a better way to communicate changes in requirements
> than discovery-by-random-bug-report ?

Final comment for tonight - the commit introducing /proc/*/syscall says:

This adds /proc/PID/syscall and /proc/PID/task/TID/syscall magic files.
These use task_current_syscall() to show the task's current system call
number and argument registers, stack pointer and PC.  For a task blocked
but not in a syscall, the file shows "-1" in place of the syscall number,
followed by only the SP and PC.  For a task that's not blocked, it shows
"running".

Please validate that a blocked task does indeed show -1 with your patch
applied.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm: always update thread_info->syscall

2018-11-26 Thread Russell King - ARM Linux
On Mon, Nov 26, 2018 at 11:33:03PM +, Russell King - ARM Linux wrote:
> On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
> > Right now, only way for task->thread_info->syscall to be updated is if
> > if _TIF_SYSCALL_WORK is set in current's task thread_info->flags
> > (similar to what has_syscall_work() checks for arm64).
> > 
> > This means that "->syscall" will only be updated if we are tracing the
> > syscalls through ptrace, for example. This is NOT the same behavior as
> > arm64, when pt_regs->syscallno is updated in the beginning of svc0
> > handler for *every* syscall entry.
> 
> So when was it decided that the syscall number will always be required
> (we need it to know how far back this has to be backported).

PS, I rather object to the fact that the required behaviour seems to
change, arch maintainers aren't told about it until... some test is
created at some random point in the future which then fails.

Surely there's a better way to communicate changes in requirements
than discovery-by-random-bug-report ?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm: always update thread_info->syscall

2018-11-26 Thread Russell King - ARM Linux
On Mon, Nov 26, 2018 at 08:53:35PM -0200, Rafael David Tinoco wrote:
> Right now, only way for task->thread_info->syscall to be updated is if
> if _TIF_SYSCALL_WORK is set in current's task thread_info->flags
> (similar to what has_syscall_work() checks for arm64).
> 
> This means that "->syscall" will only be updated if we are tracing the
> syscalls through ptrace, for example. This is NOT the same behavior as
> arm64, when pt_regs->syscallno is updated in the beginning of svc0
> handler for *every* syscall entry.

So when was it decided that the syscall number will always be required
(we need it to know how far back this has to be backported).

> This patch fixes the issue since this behavior is needed for
> /proc//syscall 1st argument to be correctly updated.
> 
> Link: https://bugs.linaro.org/show_bug.cgi?id=3783
> Cc:  # v4.4 v4.9 v4.14 v4.19
> Signed-off-by: Rafael David Tinoco 
> ---
>  arch/arm/kernel/asm-offsets.c  | 1 +
>  arch/arm/kernel/entry-common.S | 2 ++
>  2 files changed, 3 insertions(+)
> 
> diff --git a/arch/arm/kernel/asm-offsets.c b/arch/arm/kernel/asm-offsets.c
> index 3968d6c22455..bfe68a98e1c6 100644
> --- a/arch/arm/kernel/asm-offsets.c
> +++ b/arch/arm/kernel/asm-offsets.c
> @@ -64,6 +64,7 @@ int main(void)
>DEFINE(TI_USED_CP, offsetof(struct thread_info, used_cp));
>DEFINE(TI_TP_VALUE,offsetof(struct thread_info, tp_value));
>DEFINE(TI_FPSTATE, offsetof(struct thread_info, fpstate));
> +  DEFINE(TI_SYSCALL, offsetof(struct thread_info, syscall));
>  #ifdef CONFIG_VFP
>DEFINE(TI_VFPSTATE,offsetof(struct thread_info, vfpstate));
>  #ifdef CONFIG_SMP
> diff --git a/arch/arm/kernel/entry-common.S b/arch/arm/kernel/entry-common.S
> index 0465d65d23de..557e2add4e83 100644
> --- a/arch/arm/kernel/entry-common.S
> +++ b/arch/arm/kernel/entry-common.S
> @@ -257,6 +257,8 @@ local_restart:
>   tst r10, #_TIF_SYSCALL_WORK @ are we tracing syscalls?
>   bne __sys_trace
>  
> + str r7, [tsk, #TI_SYSCALL]  @ update thread_info->syscall

"scno" is the systemcall number, not "r7".

> +
>   invoke_syscall tbl, scno, r10, __ret_fast_syscall
>  
>   add r1, sp, #S_OFF
> -- 
> 2.20.0.rc1
> 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-25 Thread Russell King - ARM Linux
On Sun, Nov 25, 2018 at 11:11:05AM +, Russell King - ARM Linux wrote:
> On Sat, Nov 24, 2018 at 05:07:17PM -0800, Tony Lindgren wrote:
> > * Russell King - ARM Linux  [181124 20:10]:
> > > On Fri, Nov 23, 2018 at 08:52:15PM +0200, Aaro Koskinen wrote:
> > > > Hi,
> > > > 
> > > > On Fri, Nov 23, 2018 at 02:35:04PM +0200, Peter Ujfalusi wrote:
> > > > > On 22/11/2018 17.12, Russell King - ARM Linux wrote:
> > > > > > I'm also not sure about this:
> > > > > > 
> > > > > > if (cpu_is_omap15xx())
> > > > > > end++;
> > > > > > 
> > > > > > in dma_dest_len() - is that missing from the omap-dma driver?  It 
> > > > > > looks
> > > > > > like a work-around for some problem on OMAP15xx, but I can't make 
> > > > > > sense
> > > > > > about why it's in the UDC driver rather than the legacy DMA driver.
> > > > > 
> > > > > afaik no other legacy drivers were doing similar thing, this must be
> > > > > something which is needed for the omap_udc driver to fix up something?
> > > > 
> > > > Here's the patch that added it: 
> > > > https://marc.info/?l=linux-omap=119634396324221=2
> > > > 
> > > > "Make DMA-OUT behave on the 1510 ... the 1510 CPC register was just
> > > > off-by-one with respect to the 1611 CDAC"
> > > 
> > > ... which suggests that's a problem with the CPC register itself, and
> > > we should fix that in the DMAengine driver rather than the USB gadget
> > > driver.
> > > 
> > > Tony, any input on this?
> > 
> > Yeah that sounds like some hardware work-around for 15xx as described
> > in the DMA_DEST_LAST macro reading CSAC on 15xx instead of CDAC. Seems
> > like it should be done in the dmaengine driver.. My guess is that other
> > dma users never needed to read CSAC register?
> 
> Hmm, reading the OMAP1510 TRM for the CPC register, it seems that
> omap-dma won't handle this particularly well.  The fact that the
> register only updates after the last request in an element or frame
> means that if we try to use this value as the current source /
> destination address before the first transfer, it can be wildly
> wrong.
> 
> Saving the current value at the beginning of a request, and detecting
> if it's changed (like omap_udc) isn't going to work well in the
> generic case.  If, say, the register happens to contain 0x0004, and
> our next transfer is using 32-bit elements in element sync mode
> starting at address with lsb 16 bits as 0, it would mean we'd see
> 0x0004 in this register after the _second_ element has been
> transferred, and we'd assume that the register hasn't yet changed -
> but we would have in reality transferred two elements.
> 
> However, omap-dma.c zeros the CPC register before each transfer,
> which is interesting, because in one place the OMAP1510 TRM says
> that the CPC register is read/write, but in the actual description
> of this register, it says it's read-only.
> 
> What it also means is that, in such a case, after the 2nd element has
> been transferred, where the register contains 0x0004, the address
> we'd be looking for (to calculate the residual) should be 0x0008,
> not 0x0005, so we actually need to be adding the number of bytes
> according to element size.
> 
> Looking at omap-dma.c, someone has added support for the residue
> granularity:
> 
> if (__dma_omap15xx(od->plat->dma_attr))
> od->ddev.residue_granularity =
> DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
> else
> od->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;

I'll point out that this was introduced by:

commit c9bd0946da243a8eb86b44ff613e2c813f9b683b
Author: Janusz Krzysztofik 
Date:   Tue Jun 5 18:59:57 2018 +0200

dmaengine: ti: omap-dma: Fix OMAP1510 incorrect residue_granularity
...
It was verified already before that omap_get_dma_src_pos() from
arch/arm/plat-omap/dma.c didn't work correctly for OMAP1510 - see
commit 1bdd7419910c ("ASoC: OMAP: fix OMAP1510 broken PCM pointer
callback") for details.  Apparently the same applies to its successor,
omap_dma_get_src_pos() from drivers/dma/ti/omap-dma.c.

So, this now presents us with bigger problems - if we fix it now for
omap_udc, should the above commit be reverted, and if we do revert
the above commit, will it regress OMAP1510 audio.

The intention of omap-dma was always to report an accurate residue.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-25 Thread Russell King - ARM Linux
On Sat, Nov 24, 2018 at 05:07:17PM -0800, Tony Lindgren wrote:
> * Russell King - ARM Linux  [181124 20:10]:
> > On Fri, Nov 23, 2018 at 08:52:15PM +0200, Aaro Koskinen wrote:
> > > Hi,
> > > 
> > > On Fri, Nov 23, 2018 at 02:35:04PM +0200, Peter Ujfalusi wrote:
> > > > On 22/11/2018 17.12, Russell King - ARM Linux wrote:
> > > > > I'm also not sure about this:
> > > > > 
> > > > > if (cpu_is_omap15xx())
> > > > > end++;
> > > > > 
> > > > > in dma_dest_len() - is that missing from the omap-dma driver?  It 
> > > > > looks
> > > > > like a work-around for some problem on OMAP15xx, but I can't make 
> > > > > sense
> > > > > about why it's in the UDC driver rather than the legacy DMA driver.
> > > > 
> > > > afaik no other legacy drivers were doing similar thing, this must be
> > > > something which is needed for the omap_udc driver to fix up something?
> > > 
> > > Here's the patch that added it: 
> > > https://marc.info/?l=linux-omap=119634396324221=2
> > > 
> > > "Make DMA-OUT behave on the 1510 ... the 1510 CPC register was just
> > > off-by-one with respect to the 1611 CDAC"
> > 
> > ... which suggests that's a problem with the CPC register itself, and
> > we should fix that in the DMAengine driver rather than the USB gadget
> > driver.
> > 
> > Tony, any input on this?
> 
> Yeah that sounds like some hardware work-around for 15xx as described
> in the DMA_DEST_LAST macro reading CSAC on 15xx instead of CDAC. Seems
> like it should be done in the dmaengine driver.. My guess is that other
> dma users never needed to read CSAC register?

Hmm, reading the OMAP1510 TRM for the CPC register, it seems that
omap-dma won't handle this particularly well.  The fact that the
register only updates after the last request in an element or frame
means that if we try to use this value as the current source /
destination address before the first transfer, it can be wildly
wrong.

Saving the current value at the beginning of a request, and detecting
if it's changed (like omap_udc) isn't going to work well in the
generic case.  If, say, the register happens to contain 0x0004, and
our next transfer is using 32-bit elements in element sync mode
starting at address with lsb 16 bits as 0, it would mean we'd see
0x0004 in this register after the _second_ element has been
transferred, and we'd assume that the register hasn't yet changed -
but we would have in reality transferred two elements.

However, omap-dma.c zeros the CPC register before each transfer,
which is interesting, because in one place the OMAP1510 TRM says
that the CPC register is read/write, but in the actual description
of this register, it says it's read-only.

What it also means is that, in such a case, after the 2nd element has
been transferred, where the register contains 0x0004, the address
we'd be looking for (to calculate the residual) should be 0x0008,
not 0x0005, so we actually need to be adding the number of bytes
according to element size.

Looking at omap-dma.c, someone has added support for the residue
granularity:

if (__dma_omap15xx(od->plat->dma_attr))
od->ddev.residue_granularity =
DMA_RESIDUE_GRANULARITY_DESCRIPTOR;
else
od->ddev.residue_granularity = DMA_RESIDUE_GRANULARITY_BURST;

If OMAP15xx is truely descriptor granularity, then we can't use
omap-dma for omap_udc, because omap_udc needs to know exactly
how many bytes were transferred.

So... hmm, OMAP15xx DMA looks like a complete mess, and the only
way to know what would and wouldn't work is to actually have the
hardware.

I think we're better off leaving omap-udc well alone, and if it's
now broken with DMA, then that's unfortunate - it would require
someone with the hardware to diagnose the problem and fix it.  I
think trying to convert it to dmaengine would be risking way more
problems than its worth.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-24 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 08:52:15PM +0200, Aaro Koskinen wrote:
> Hi,
> 
> On Fri, Nov 23, 2018 at 02:35:04PM +0200, Peter Ujfalusi wrote:
> > On 22/11/2018 17.12, Russell King - ARM Linux wrote:
> > > I'm also not sure about this:
> > > 
> > > if (cpu_is_omap15xx())
> > > end++;
> > > 
> > > in dma_dest_len() - is that missing from the omap-dma driver?  It looks
> > > like a work-around for some problem on OMAP15xx, but I can't make sense
> > > about why it's in the UDC driver rather than the legacy DMA driver.
> > 
> > afaik no other legacy drivers were doing similar thing, this must be
> > something which is needed for the omap_udc driver to fix up something?
> 
> Here's the patch that added it: 
> https://marc.info/?l=linux-omap=119634396324221=2
> 
> "Make DMA-OUT behave on the 1510 ... the 1510 CPC register was just
> off-by-one with respect to the 1611 CDAC"

... which suggests that's a problem with the CPC register itself, and
we should fix that in the DMAengine driver rather than the USB gadget
driver.

Tony, any input on this?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-24 Thread Russell King - ARM Linux
On Sat, Nov 24, 2018 at 09:06:48PM +0200, Aaro Koskinen wrote:
> Hello,
> 
> On Sat, Nov 24, 2018 at 05:48:23PM +, Russell King - ARM Linux wrote:
> > Hmm, there's more questionable stuff in this driver, and the gadget
> > layer.
> 
> [...]
> 
> > So, whatever way I look at this, the code in the removal path both
> > in omap_udc and the gadget removal code higher up looks very wrong
> > and broken to me.
> 
> Yes, week ago I saw omap_udc crashing on both probe failure and
> module removal and sent some fixes for the most obvious failures (see
> https://marc.info/?l=linux-usb=154258778316932=2).

The effect of your patch is basically to replace the release function
with a no-op function.

> Is there any good driver that uses usb_add_gadget_udc_release() correctly?
> Looking at fsl_qe_udc.c and fsl_udc_core.c they should also crash if
> usb_add_gadget_udc_release() fails.

usb_add_gadget_udc_release() itself will call the release function
automatically on error.

The release function should _also_ be called when usb_del_gadget_udc()
is called (and would be guaranteed if the memset() is removed.)

So, moving the cleanup in the remove path into the release function
would solve the problem with omap_udc, and removing the memset()
would solve the problem with the core code.

It does leave a problem if the omap_udc module is removed - the
release function _could_ be called after the module has been removed
which would lead to an oops.  That's presumably why there's a
completion.  One solution to that would be to move the assignment
of udc->done before the call to usb_del_gadget_udc().

However, using a completion for something like this tends to be
frowned upon, but I don't see any other way to ensure correctness
here.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-24 Thread Russell King - ARM Linux
On Sat, Nov 24, 2018 at 02:17:40AM +0200, Aaro Koskinen wrote:
> Hi,
> 
> On Fri, Nov 23, 2018 at 01:45:46PM +0200, Peter Ujfalusi wrote:
> > On 23/11/2018 0.01, Aaro Koskinen wrote:
> > > With that reverted, the DMA works OK (and I can also now confirm that
> > > OMAP_DMA_LCH_2D works). I haven't yet checked if we actually need that
> > > quirk in OMAP UDC,
> > 
> > The omap_udc driver is a bit of a mess, need to check it myself, but for
> > now we can just set the quirk_ep_out_aligned_size and investigate later.
> 
> OK, with quirk_ep_out_aligned_size we get 770/16xx DMA working again,
> but on 15xx the omap_udc DMA still doesn't work (tested today for the
> first time ever, I have no idea if it has ever worked and if so, when?).

Hmm, there's more questionable stuff in this driver, and the gadget
layer.

Fundamental fact of struct device - it's a ref-counted structure and
will only be freed when the last reference is dropped.  dev_unregister()
merely drops the refcount, it doesn't guarantee that it's dropped to
zero (iow, there can be other users).  Only when the refcount drops
to zero is the dev.release function called.  However:

usb_add_gadget_udc_release(..., release)
{
if (release)
gadget->dev.release = release;
else
gadget->dev.release = usb_udc_nop_release;
device_initialize(>dev);
ret = device_add(>dev);
}

At this point, that struct device is registered, so its refcount can
be increased by other users.

void usb_del_gadget_udc(struct usb_gadget *gadget)
{
...
device_unregister(>dev);
memset(>dev, 0x00, sizeof(gadget->dev));
}

That memset() is down-right wrong - the refcount on this struct device
may _not_ be zero at this point, the struct device could well be in
use by another thread.  That memset will trample over the contents of
the structure potentially while someone else is using it, and
_potentially_ before the gadget->dev.release function has been called.

However, that _may_ be a good thing when you read the omap_udc code:

status = usb_add_gadget_udc_release(>dev, >gadget,
omap_udc_release);

During the omap_udc_remove() function:
{
...
usb_del_gadget_udc(>gadget);
if (udc->driver)
return -EBUSY;

udc->done = 

... more dereferences of udc, which is a _global_ variable ...

wait_for_completion();
}

Now, omap_udc_release() does this:

complete(udc->done);
kfree(udc);
udc = NULL;

So, when usb_del_gadget_udc() is called, if device_unregister() within
there drops the last reference count, omap_udc_release() will be called
immediately.  Since udc->done hasn't been setup at that point, that
complete() will fail with a NULL pointer dereference.  If that doesn't
happen, then the kfree() and following set of the global 'udc' variable
to NULL means that all future references to 'udc' after the call to
usb_del_gadget_udc() in omap_udc_remove() will be dereferencing a NULL
pointer.  So one way or the other, this leads to a kernel OOPS.

If, on the other hand, omap_udc_release() was not called in
device_unregister(), the function pointer will be zeroed by the
memset(), which will lead to 'udc' never being freed - in other words,
we leak memory.

What's more is that 'done' is never "completed" so we end up hanging
at the wait_for_completion().

Then there's the pointless:

if (udc->driver)
return -EBUSY;

in omap_udc_remove().  The effect of returning an error is... what
exactly?  It doesn't prevent the device being removed at all, it
doesn't delay it, in fact the whole "remove returns an int" is
nothing but confusion - the return value from all driver remove
methods is completely ignored.

If udc->driver is still set at this point, it basically means that
we skip the rest of the tear down, but the platform device will
still be unbound from the driver, leaving (eg) the transceiver phy
still claimed, the procfs file still registered, the interrupts
still claimed, the memory region still registered, etc.  If omap_udc
is built as a module, the module could even be removed while all
that is still registered.

So, whatever way I look at this, the code in the removal path both
in omap_udc and the gadget removal code higher up looks very wrong
and broken to me.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-23 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 04:16:59PM +, Russell King - ARM Linux wrote:
> Hi Peter,
> 
> Here's the patch, which should now support IN as well as OUT.
> Completely untested, as mentioned before.

Now compile tested...

 drivers/usb/gadget/udc/omap_udc.c | 291 ++
 drivers/usb/gadget/udc/omap_udc.h |   3 +-
 2 files changed, 137 insertions(+), 157 deletions(-)

diff --git a/drivers/usb/gadget/udc/omap_udc.c 
b/drivers/usb/gadget/udc/omap_udc.c
index 3a16431da321..dd85476ec234 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -203,7 +204,7 @@ static int omap_ep_enable(struct usb_ep *_ep,
/* set endpoint to initial state */
ep->dma_channel = 0;
ep->has_dma = 0;
-   ep->lch = -1;
+   ep->dma = NULL;
use_ep(ep, UDC_EP_SEL);
omap_writew(udc->clr_halt, UDC_CTRL);
ep->ackwait = 0;
@@ -468,43 +469,6 @@ static int read_fifo(struct omap_ep *ep, struct omap_req 
*req)
 
 /*-*/
 
-static u16 dma_src_len(struct omap_ep *ep, dma_addr_t start)
-{
-   dma_addr_t  end;
-
-   /* IN-DMA needs this on fault/cancel paths, so 15xx misreports
-* the last transfer's bytecount by more than a FIFO's worth.
-*/
-   if (cpu_is_omap15xx())
-   return 0;
-
-   end = omap_get_dma_src_pos(ep->lch);
-   if (end == ep->dma_counter)
-   return 0;
-
-   end |= start & (0x << 16);
-   if (end < start)
-   end += 0x1;
-   return end - start;
-}
-
-static u16 dma_dest_len(struct omap_ep *ep, dma_addr_t start)
-{
-   dma_addr_t  end;
-
-   end = omap_get_dma_dst_pos(ep->lch);
-   if (end == ep->dma_counter)
-   return 0;
-
-   end |= start & (0x << 16);
-   if (cpu_is_omap15xx())
-   end++;
-   if (end < start)
-   end += 0x1;
-   return end - start;
-}
-
-
 /* Each USB transfer request using DMA maps to one or more DMA transfers.
  * When DMA completion isn't request completion, the UDC continues with
  * the next DMA transfer for that USB transfer.
@@ -512,34 +476,53 @@ static u16 dma_dest_len(struct omap_ep *ep, dma_addr_t 
start)
 
 static void next_in_dma(struct omap_ep *ep, struct omap_req *req)
 {
-   u16 txdma_ctrl, w;
-   unsignedlength = req->req.length - req->req.actual;
-   const int   sync_mode = cpu_is_omap15xx()
-   ? OMAP_DMA_SYNC_FRAME
-   : OMAP_DMA_SYNC_ELEMENT;
-   int dma_trigger = 0;
+   struct dma_async_tx_descriptor *tx;
+   struct dma_chan *dma = ep->dma;
+   dma_cookie_t cookie;
+   unsigned burst, length;
+   u16 txdma_ctrl, w;
+   struct dma_slave_config omap_udc_in_cfg = {
+   .direction = DMA_MEM_TO_DEV,
+   .dst_addr = UDC_DATA_DMA,
+   };
+
+   length = req->req.length - req->req.actual;
 
/* measure length in either bytes or packets */
-   if ((cpu_is_omap16xx() && length <= UDC_TXN_TSC)
-   || (cpu_is_omap15xx() && length < ep->maxpacket)) {
+   if ((cpu_is_omap16xx() && length <= UDC_TXN_TSC) ||
+   (cpu_is_omap15xx() && length < ep->maxpacket)) {
+   omap_udc_in_cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
txdma_ctrl = UDC_TXN_EOT | length;
-   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S8,
-   length, 1, sync_mode, dma_trigger, 0);
+   burst = length;
} else {
-   length = min(length / ep->maxpacket,
-   (unsigned) UDC_TXN_TSC + 1);
+   omap_udc_in_cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTES;
+   length = min_t(unsigned, length / ep->maxpacket,
+  UDC_TXN_TSC + 1);
txdma_ctrl = length;
-   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S16,
-   ep->ep.maxpacket >> 1, length, sync_mode,
-   dma_trigger, 0);
length *= ep->maxpacket;
+   burst = ep->ep.maxpacket >> 1;
}
-   omap_set_dma_src_params(ep->lch, OMAP_DMA_PORT_EMIFF,
-   OMAP_DMA_AMODE_POST_INC, req->req.dma + req->req.actual,
-   0, 0);
 
-   omap_start_dma(ep->lch);
-   ep->dma_counter = omap_get_dma_src_pos(ep->lch);
+   if (!cpu_is_omap15xx())
+   burst = 1;
+
+   omap_udc_in_cfg.dst_m

Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-23 Thread Russell King - ARM Linux
Hi Peter,

Here's the patch, which should now support IN as well as OUT.
Completely untested, as mentioned before.

 drivers/usb/gadget/udc/omap_udc.c | 286 ++
 drivers/usb/gadget/udc/omap_udc.h |   3 +-
 2 files changed, 135 insertions(+), 154 deletions(-)

diff --git a/drivers/usb/gadget/udc/omap_udc.c 
b/drivers/usb/gadget/udc/omap_udc.c
index 3a16431da321..ad6f315e4327 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -28,6 +28,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -203,7 +204,7 @@ static int omap_ep_enable(struct usb_ep *_ep,
/* set endpoint to initial state */
ep->dma_channel = 0;
ep->has_dma = 0;
-   ep->lch = -1;
+   ep->dma = NULL;
use_ep(ep, UDC_EP_SEL);
omap_writew(udc->clr_halt, UDC_CTRL);
ep->ackwait = 0;
@@ -468,43 +469,6 @@ static int read_fifo(struct omap_ep *ep, struct omap_req 
*req)
 
 /*-*/
 
-static u16 dma_src_len(struct omap_ep *ep, dma_addr_t start)
-{
-   dma_addr_t  end;
-
-   /* IN-DMA needs this on fault/cancel paths, so 15xx misreports
-* the last transfer's bytecount by more than a FIFO's worth.
-*/
-   if (cpu_is_omap15xx())
-   return 0;
-
-   end = omap_get_dma_src_pos(ep->lch);
-   if (end == ep->dma_counter)
-   return 0;
-
-   end |= start & (0x << 16);
-   if (end < start)
-   end += 0x1;
-   return end - start;
-}
-
-static u16 dma_dest_len(struct omap_ep *ep, dma_addr_t start)
-{
-   dma_addr_t  end;
-
-   end = omap_get_dma_dst_pos(ep->lch);
-   if (end == ep->dma_counter)
-   return 0;
-
-   end |= start & (0x << 16);
-   if (cpu_is_omap15xx())
-   end++;
-   if (end < start)
-   end += 0x1;
-   return end - start;
-}
-
-
 /* Each USB transfer request using DMA maps to one or more DMA transfers.
  * When DMA completion isn't request completion, the UDC continues with
  * the next DMA transfer for that USB transfer.
@@ -512,34 +476,53 @@ static u16 dma_dest_len(struct omap_ep *ep, dma_addr_t 
start)
 
 static void next_in_dma(struct omap_ep *ep, struct omap_req *req)
 {
-   u16 txdma_ctrl, w;
-   unsignedlength = req->req.length - req->req.actual;
-   const int   sync_mode = cpu_is_omap15xx()
-   ? OMAP_DMA_SYNC_FRAME
-   : OMAP_DMA_SYNC_ELEMENT;
-   int dma_trigger = 0;
+   struct dma_async_tx_descriptor *tx;
+   struct dma_chan *dma = ep->dma;
+   dma_cookie_t cookie;
+   unsigned burst, length;
+   u16 txdma_ctrl, w;
+   struct dma_slave_config omap_udc_in_cfg = {
+   .direction = DMA_MEM_TO_DEV,
+   .dst_addr = UDC_DATA_DMA,
+   };
+
+   length = req->req.length - req->req.actual;
 
/* measure length in either bytes or packets */
-   if ((cpu_is_omap16xx() && length <= UDC_TXN_TSC)
-   || (cpu_is_omap15xx() && length < ep->maxpacket)) {
+   if ((cpu_is_omap16xx() && length <= UDC_TXN_TSC) ||
+   (cpu_is_omap15xx() && length < ep->maxpacket)) {
+   omap_udc_in_cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_1_BYTE;
txdma_ctrl = UDC_TXN_EOT | length;
-   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S8,
-   length, 1, sync_mode, dma_trigger, 0);
+   burst = length;
} else {
-   length = min(length / ep->maxpacket,
-   (unsigned) UDC_TXN_TSC + 1);
+   omap_udc_in_cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTE;
+   length = min_t(unsigned, length / ep->maxpacket,
+  UDC_TXN_TSC + 1);
txdma_ctrl = length;
-   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S16,
-   ep->ep.maxpacket >> 1, length, sync_mode,
-   dma_trigger, 0);
length *= ep->maxpacket;
+   burst = ep->ep.maxpacket >> 1;
}
-   omap_set_dma_src_params(ep->lch, OMAP_DMA_PORT_EMIFF,
-   OMAP_DMA_AMODE_POST_INC, req->req.dma + req->req.actual,
-   0, 0);
 
-   omap_start_dma(ep->lch);
-   ep->dma_counter = omap_get_dma_src_pos(ep->lch);
+   if (!cpu_is_omap15xx())
+   burst = 1;
+
+   omap_udc_in_cfg.dst_maxburst = burst;
+
+   if (WARN_ON(dmaengine_slave_config(dma, _udc_in_cfg)))
+   return;
+
+   tx = dmaengine_prep_slave_single(dma, req->req.dma + req->req.actual,
+length, DMA_MEM_TO_DEV, 0);
+   if (WARN_ON(!tx))
+   return;
+
+   cookie = 

Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-23 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 02:35:04PM +0200, Peter Ujfalusi wrote:
> > Also we can't deal with the omap_set_dma_dest_burst_mode() setting -
> > DMAengine always uses a 64 byte burst, but udc wants a smaller burst
> > setting.  Does this matter?
> 
> The tusb also fiddled with the burst before the conversion, I believe
> what the DMAengine driver is doing should be fine. If not then we fix it
> while converting the omap_udc.

That's good news, so I can ignore that difference.

> > I'm also not sure about this:
> > 
> > if (cpu_is_omap15xx())
> > end++;
> > 
> > in dma_dest_len() - is that missing from the omap-dma driver?  It looks
> > like a work-around for some problem on OMAP15xx, but I can't make sense
> > about why it's in the UDC driver rather than the legacy DMA driver.
> 
> afaik no other legacy drivers were doing similar thing, this must be
> something which is needed for the omap_udc driver to fix up something?
> 
> > 
> > I'm also confused by:
> > 
> > end |= start & (0x << 16);
> > 
> > also in dma_dest_len() - omap_get_dma_dst_pos() returns in the high 16
> > bits the full address:
> > 
> > if (dma_omap1())
> > offset |= (p->dma_read(CDSA, lch) & 0x);
> 
> CDSA is OMAP_DMA_REG_2X16BIT for omap1
> The CPC/CDAC holds the LSB of the _current_ DMA pointer. The code gets
> the MSB of the address from the CDSA registers.
> 
> > 
> > so if the address crosses a 64k physical address boundary, surely orring
> > in the start address is wrong?  The more I look at dma_dest_len(), the
> > more I wonder whether that and dma_src_len() are anywhere near correct,
> > and whether that is a source of breakage for Aaro.
> 
> Hrm, again... the position reporting on OMAP1 is certainly not something
> I would put my life on :o

My feeling is - if the code in plat-omap/dma.c doesn't work, we've got
the same problems in the dmaengine driver, so the reported residue will
be wrong.  Any workarounds need to be within the dmaengine driver, not
in individual drivers.  We can't just go subtracting 1 from the residue
reported by dmaengine.

> > diff --git a/drivers/usb/gadget/udc/omap_udc.c 
> > b/drivers/usb/gadget/udc/omap_udc.c
> > index 3a16431da321..a37e1d2f0f3e 100644
> > --- a/drivers/usb/gadget/udc/omap_udc.c
> > +++ b/drivers/usb/gadget/udc/omap_udc.c
> > @@ -204,6 +204,7 @@ static int omap_ep_enable(struct usb_ep *_ep,
> > ep->dma_channel = 0;
> > ep->has_dma = 0;
> > ep->lch = -1;
> > +   ep->dma = NULL;
> > use_ep(ep, UDC_EP_SEL);
> > omap_writew(udc->clr_halt, UDC_CTRL);
> > ep->ackwait = 0;
> > @@ -576,21 +577,49 @@ static void finish_in_dma(struct omap_ep *ep, struct 
> > omap_req *req, int status)
> >  static void next_out_dma(struct omap_ep *ep, struct omap_req *req)
> >  {
> > unsigned packets = req->req.length - req->req.actual;
> > -   int dma_trigger = 0;
> > +   struct dma_async_tx_descriptor *tx;
> > +   struct dma_chan *dma = ep->dma;
> > +   dma_cookie_t cookie;
> > u16 w;
> >  
> > -   /* set up this DMA transfer, enable the fifo, start */
> > -   packets /= ep->ep.maxpacket;
> > -   packets = min(packets, (unsigned)UDC_RXN_TC + 1);
> > +   packets = min_t(unsigned, packets / ep->ep.maxpacket, UDC_RXN_TC + 1);
> > req->dma_bytes = packets * ep->ep.maxpacket;
> > -   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S16,
> > -   ep->ep.maxpacket >> 1, packets,
> > -   OMAP_DMA_SYNC_ELEMENT,
> > -   dma_trigger, 0);
> > -   omap_set_dma_dest_params(ep->lch, OMAP_DMA_PORT_EMIFF,
> > -   OMAP_DMA_AMODE_POST_INC, req->req.dma + req->req.actual,
> > -   0, 0);
> > -   ep->dma_counter = omap_get_dma_dst_pos(ep->lch);
> > +
> > +   if (dma) {
> > +   struct dma_slave_config cfg = {
> > +   .direction = DMA_DEV_TO_MEM,
> > +   .src_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTE,
> > +   /*
> > +* DMAengine uses frame sync mode, setting maxburst=1
> > +* is equivalent to element sync mode.
> > +*/
> > +   .src_maxburst = 1,
> 
> We should fix the omap-dma driver for slave_sg  instead:
> 
> if (!burst)
>   burst = 1;
> 
> I thought that I already did that.

It isn't in 4.19, and I see no changes between 4.19 and 4.20-rc for
ti/omap-dma.c.

> > +   struct dma_chan *dma;
> > +
> > dma_channel = OMAP_DMA_USB_W2FC_RX0 - 1 + channel;
> > -   status = omap_request_dma(dma_channel,
> > -   ep->ep.name, dma_error, ep, >lch);
> > -   if (status == 0) {
> > +
> > +   dma = __dma_request_channel(, omap_dma_filter_fn,
> > +   (void *)dma_channel);
> 
> dma_request_chan(dev, "ch_name");
> 
> where ch_name: rx0/1/2, tx0/1/2
> 
> and we don't need the omap_dma_filter_fn in here as all taken care via
> the dma_slave_map

Yea, we can 

Re: Sleeping in user_access section

2018-11-23 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 11:57:31AM +, Julien Thierry wrote:
> 
> 
> On 23/11/18 10:50, Russell King - ARM Linux wrote:
> >On Fri, Nov 23, 2018 at 01:57:12AM -0800, h...@zytor.com wrote:
> >>You should never call a sleeping function from a user_access section.
> >>It is intended for very limited regions.
> >
> >So, what happens if the "unsafe" user access causes a page fault that
> >ends up sleeping?
> >
> 
> Thanks for pointing that out.
> 
> On the arm64 side, PAN state is saved in spsr and (if PAN feature is enabled
> in SCTLR) PAN bit gets set (disabling the user accesses). For software PAN
> we follow the same behaviour on exception entry. So upon exception we
> implicitly exit user_access mode and then re-enter it when returning from
> the exception.
> 
> On x86, the EFLAGS.AC bit is also saved upon exception and I think it is
> cleared upon exception entry so there is implicit exit from the user_access
> mode when taking exception/interrupt.
> 
> This however is just how those two architectures happen to behave and
> doesn't seem to be specified as part of the user_access API...
> 
> Which is why I'd like to clarify the semantics of user_access region wrt
> sleeping functions.

Explicitly calling a sleeping function may not be recommended, but
it's a rather fundamental fact of the Linux kernel that if we want
to access userspace, we must be either prepared to sleep, or fail
the access and return an error.

Looking at kernel/exit.c and kernel/compat.c, I see no sign of any
retry mechanism in the current call sites, so any failed user access
would return an error to userspace - which is not the behaviour that
userspace would expect.

It's possible that, when user_access_begin() et.al. are implemented,
access_ok() also comes with the requirement to make sure that the
userspace pages are accessible, but even _that_ would be racy,
because there's no way to pin the pages, so another thread/CPU could
page those user pages back out... leading to a fault.

So, realistically, with how user_access_begin()..user_access_end()
is currently being used, an architecture fundamentally has to be
prepared for threads to sleep within that section of code through
the action of the page fault handling.  I can't see any other
realistic possibility here.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: Sleeping in user_access section

2018-11-23 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 01:57:12AM -0800, h...@zytor.com wrote:
> You should never call a sleeping function from a user_access section.
> It is intended for very limited regions.

So, what happens if the "unsafe" user access causes a page fault that
ends up sleeping?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-22 Thread Russell King - ARM Linux
On Fri, Nov 23, 2018 at 12:24:26AM +0200, Aaro Koskinen wrote:
> Hi,
> 
> On Thu, Nov 22, 2018 at 03:12:36PM +, Russell King - ARM Linux wrote:
> > On Thu, Nov 22, 2018 at 10:29:48AM +0000, Russell King - ARM Linux wrote:
> > > On Tue, Nov 20, 2018 at 11:04:06PM +0200, Aaro Koskinen wrote:
> > > > I had switched to PIO mode in 2015 since the WARNs about legacy DMA
> > > > API were too annoying and flooding the console. And now that I tried
> > > > using DMA again with g_ether, it doesn't work anymore. The device get's
> > > > recognized on host side, but no traffic goes through. Switching back to
> > > > PIO makes it to work again.
> > > 
> > > A solution to that would be to do what the warning message says, and
> > > update the driver to the DMAengine API.
> 
> Fully agreed, but I was busy debugging other more serious issues, and
> just wanted to get a reliable ssh or USB serial access to the device
> without any extra noise, so switching to PIO using a module parameter
> is probably what most users do in such situations.
> 
> > Here's a partial conversion (not even build tested) - it only supports
> > OUT transfers with dmaengine at the moment.
> 
> Thanks, I'll take a closer look and try to do some testing hopefully
> during the weekend.

The patch was more for Peter to take a peek at - there's definitely
some bits missing in the dmaengine driver (like the write to the
LCH_CTRL register) that would need to be fixed somehow.

However, it's worth noting that there is exactly one user of
omap_set_dma_channel_mode(), which is omap-udc, which means any DMA
channel made use of by omap-udc will have the LCH_CTRL register
modified to LCH_P, and it will remain that way even if someone else
subsequently makes use of the same channel.  That's rather suspicious
to me... maybe we can just initialise all LCH_CTRL registers to LCH_P
in the dmaengine driver in that case!  If not, then there's a bug
right there.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH AUTOSEL 4.4 3/8] ARM: make lookup_processor_type() non-__init

2018-11-22 Thread Russell King - ARM Linux
Same comments as for the 4.9 version of these patches, and also applies
to the two 3.18 patches as well.  They aren't fixes, but preparation
for fixes, and should not be backported without the actual Spectre fix
patch (which probably requires manual backport effort.)

On Thu, Nov 22, 2018 at 02:57:10PM -0500, Sasha Levin wrote:
> From: Russell King 
> 
> [ Upstream commit 899a42f836678a595f7d2bc36a5a0c2b03d08cbc ]
> 
> Move lookup_processor_type() out of the __init section so it is callable
> from (eg) the secondary startup code during hotplug.
> 
> Reviewed-by: Julien Thierry 
> Signed-off-by: Russell King 
> Signed-off-by: Sasha Levin 
> ---
>  arch/arm/kernel/head-common.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
> index 8733012d231f..7e662bdd5cb3 100644
> --- a/arch/arm/kernel/head-common.S
> +++ b/arch/arm/kernel/head-common.S
> @@ -122,6 +122,9 @@ __mmap_switched_data:
>   .long   init_thread_union + THREAD_START_SP @ sp
>   .size   __mmap_switched_data, . - __mmap_switched_data
>  
> + __FINIT
> + .text
> +
>  /*
>   * This provides a C-API version of __lookup_processor_type
>   */
> @@ -133,9 +136,6 @@ ENTRY(lookup_processor_type)
>   ldmfd   sp!, {r4 - r6, r9, pc}
>  ENDPROC(lookup_processor_type)
>  
> - __FINIT
> - .text
> -
>  /*
>   * Read processor ID register (CP#15, CR0), and look up in the linker-built
>   * supported processor list.  Note that we can't use the absolute addresses
> -- 
> 2.17.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH AUTOSEL 4.9 10/15] ARM: split out processor lookup

2018-11-22 Thread Russell King - ARM Linux
Sorry, I meant this patch, not patch 11.

On Thu, Nov 22, 2018 at 02:56:16PM -0500, Sasha Levin wrote:
> From: Russell King 
> 
> [ Upstream commit 65987a8553061515b5851b472081aedb9837a391 ]
> 
> Split out the lookup of the processor type and associated error handling
> from the rest of setup_processor() - we will need to use this in the
> secondary CPU bringup path for big.Little Spectre variant 2 mitigation.
> 
> Reviewed-by: Julien Thierry 
> Signed-off-by: Russell King 
> Signed-off-by: Sasha Levin 
> ---
>  arch/arm/include/asm/cputype.h |  1 +
>  arch/arm/kernel/setup.c| 31 +++
>  2 files changed, 20 insertions(+), 12 deletions(-)
> 
> diff --git a/arch/arm/include/asm/cputype.h b/arch/arm/include/asm/cputype.h
> index b62eaeb147aa..65db1376f09a 100644
> --- a/arch/arm/include/asm/cputype.h
> +++ b/arch/arm/include/asm/cputype.h
> @@ -98,6 +98,7 @@
>  #define ARM_CPU_PART_SCORPION0x510002d0
>  
>  extern unsigned int processor_id;
> +struct proc_info_list *lookup_processor(u32 midr);
>  
>  #ifdef CONFIG_CPU_CP15
>  #define read_cpuid(reg)  
> \
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index f4e54503afa9..8d5c3a118abe 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -667,22 +667,29 @@ static void __init smp_build_mpidr_hash(void)
>  }
>  #endif
>  
> -static void __init setup_processor(void)
> +/*
> + * locate processor in the list of supported processor types.  The linker
> + * builds this table for us from the entries in arch/arm/mm/proc-*.S
> + */
> +struct proc_info_list *lookup_processor(u32 midr)
>  {
> - struct proc_info_list *list;
> + struct proc_info_list *list = lookup_processor_type(midr);
>  
> - /*
> -  * locate processor in the list of supported processor
> -  * types.  The linker builds this table for us from the
> -  * entries in arch/arm/mm/proc-*.S
> -  */
> - list = lookup_processor_type(read_cpuid_id());
>   if (!list) {
> - pr_err("CPU configuration botched (ID %08x), unable to 
> continue.\n",
> -read_cpuid_id());
> - while (1);
> + pr_err("CPU%u: configuration botched (ID %08x), CPU halted\n",
> +smp_processor_id(), midr);
> + while (1)
> + /* can't use cpu_relax() here as it may require MMU setup */;
>   }
>  
> + return list;
> +}
> +
> +static void __init setup_processor(void)
> +{
> + unsigned int midr = read_cpuid_id();
> + struct proc_info_list *list = lookup_processor(midr);
> +
>   cpu_name = list->cpu_name;
>   __cpu_architecture = __get_cpu_architecture();
>  
> @@ -700,7 +707,7 @@ static void __init setup_processor(void)
>  #endif
>  
>   pr_info("CPU: %s [%08x] revision %d (ARMv%s), cr=%08lx\n",
> - cpu_name, read_cpuid_id(), read_cpuid_id() & 15,
> + list->cpu_name, midr, midr & 15,
>   proc_arch[cpu_architecture()], get_cr());
>  
>   snprintf(init_utsname()->machine, __NEW_UTS_LEN + 1, "%s%c",
> -- 
> 2.17.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH AUTOSEL 4.9 09/15] ARM: make lookup_processor_type() non-__init

2018-11-22 Thread Russell King - ARM Linux
This, and your patch 11 aren't fixes on their own.  They're part of a
rework for the "ARM: spectre-v2: per-CPU vtables to work around
big.Little systems" patch.  It doesn't make sense to back port these
without that patch, even though they can be applied without.

On Thu, Nov 22, 2018 at 02:56:15PM -0500, Sasha Levin wrote:
> From: Russell King 
> 
> [ Upstream commit 899a42f836678a595f7d2bc36a5a0c2b03d08cbc ]
> 
> Move lookup_processor_type() out of the __init section so it is callable
> from (eg) the secondary startup code during hotplug.
> 
> Reviewed-by: Julien Thierry 
> Signed-off-by: Russell King 
> Signed-off-by: Sasha Levin 
> ---
>  arch/arm/kernel/head-common.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
> index 8733012d231f..7e662bdd5cb3 100644
> --- a/arch/arm/kernel/head-common.S
> +++ b/arch/arm/kernel/head-common.S
> @@ -122,6 +122,9 @@ __mmap_switched_data:
>   .long   init_thread_union + THREAD_START_SP @ sp
>   .size   __mmap_switched_data, . - __mmap_switched_data
>  
> + __FINIT
> + .text
> +
>  /*
>   * This provides a C-API version of __lookup_processor_type
>   */
> @@ -133,9 +136,6 @@ ENTRY(lookup_processor_type)
>   ldmfd   sp!, {r4 - r6, r9, pc}
>  ENDPROC(lookup_processor_type)
>  
> - __FINIT
> - .text
> -
>  /*
>   * Read processor ID register (CP#15, CR0), and look up in the linker-built
>   * supported processor list.  Note that we can't use the absolute addresses
> -- 
> 2.17.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH AUTOSEL 4.14 10/21] ARM: make lookup_processor_type() non-__init

2018-11-22 Thread Russell King - ARM Linux
Hi Sasha,

We need to keep track of which Spectre patches have been backported
and which haven't.  David Long has been doing the backport work,
which doesn't include all the patches in my present spectre branch.
You've picked up the last five, meaning there's a bunch in the middle
of the entire series which haven't yet been considered - for example,
the Spectre variant 1.1 patches.

The entire series of 32-bit ARM Spectre patches can be listed in
mainline via:

$ git log v4.16..383fb3ee8024

where almost all of those need to be applied to stable kernels, with
possible manual back-porting where they don't apply.

On Thu, Nov 22, 2018 at 02:54:41PM -0500, Sasha Levin wrote:
> From: Russell King 
> 
> [ Upstream commit 899a42f836678a595f7d2bc36a5a0c2b03d08cbc ]
> 
> Move lookup_processor_type() out of the __init section so it is callable
> from (eg) the secondary startup code during hotplug.
> 
> Reviewed-by: Julien Thierry 
> Signed-off-by: Russell King 
> Signed-off-by: Sasha Levin 
> ---
>  arch/arm/kernel/head-common.S | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/arm/kernel/head-common.S b/arch/arm/kernel/head-common.S
> index 8733012d231f..7e662bdd5cb3 100644
> --- a/arch/arm/kernel/head-common.S
> +++ b/arch/arm/kernel/head-common.S
> @@ -122,6 +122,9 @@ __mmap_switched_data:
>   .long   init_thread_union + THREAD_START_SP @ sp
>   .size   __mmap_switched_data, . - __mmap_switched_data
>  
> + __FINIT
> + .text
> +
>  /*
>   * This provides a C-API version of __lookup_processor_type
>   */
> @@ -133,9 +136,6 @@ ENTRY(lookup_processor_type)
>   ldmfd   sp!, {r4 - r6, r9, pc}
>  ENDPROC(lookup_processor_type)
>  
> - __FINIT
> - .text
> -
>  /*
>   * Read processor ID register (CP#15, CR0), and look up in the linker-built
>   * supported processor list.  Note that we can't use the absolute addresses
> -- 
> 2.17.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] module: Overwrite st_size instead of st_info

2018-11-22 Thread Russell King - ARM Linux
On Thu, Nov 22, 2018 at 06:40:45PM +0100, Ard Biesheuvel wrote:
> On Thu, 22 Nov 2018 at 17:29, Jessica Yu  wrote:
> >
> > +++ Vincent Whitchurch [22/11/18 13:24 +0100]:
> > >On Thu, Nov 22, 2018 at 12:01:54PM +, Dave Martin wrote:
> > >> On Mon, Nov 19, 2018 at 05:25:12PM +0100, Vincent Whitchurch wrote:
> > >> > st_info is currently overwritten after relocation and used to store the
> > >> > elf_type().  However, we're going to need it fix kallsyms on ARM's
> > >> > Thumb-2 kernels, so preserve st_info and overwrite the st_size field
> > >> > instead.  st_size is neither used by the module core nor by any
> > >> > architecture.
> > >> >
> > >> > Signed-off-by: Vincent Whitchurch 
> > >> > ---
> > >> > v4: Split out to separate patch.  Use st_size instead of st_other.
> > >> > v1-v3: See PATCH 2/2
> > >> >
> > >> >  kernel/module.c | 4 ++--
> > >> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > >> >
> > >> > diff --git a/kernel/module.c b/kernel/module.c
> > >> > index 49a405891587..3d86a38b580c 100644
> > >> > --- a/kernel/module.c
> > >> > +++ b/kernel/module.c
> > >> > @@ -2682,7 +2682,7 @@ static void add_kallsyms(struct module *mod, 
> > >> > const struct load_info *info)
> > >> >
> > >> >/* Set types up while we still have access to sections. */
> > >> >for (i = 0; i < mod->kallsyms->num_symtab; i++)
> > >> > -  mod->kallsyms->symtab[i].st_info
> > >> > +  mod->kallsyms->symtab[i].st_size
> > >> >= elf_type(>kallsyms->symtab[i], info);
> > >> >
> > >> >/* Now populate the cut down core kallsyms for after init. */
> > >> > @@ -4061,7 +4061,7 @@ int module_get_kallsym(unsigned int symnum, 
> > >> > unsigned long *value, char *type,
> > >> >kallsyms = rcu_dereference_sched(mod->kallsyms);
> > >> >if (symnum < kallsyms->num_symtab) {
> > >> >*value = kallsyms->symtab[symnum].st_value;
> > >> > -  *type = kallsyms->symtab[symnum].st_info;
> > >> > +  *type = kallsyms->symtab[symnum].st_size;
> > >> >strlcpy(name, symname(kallsyms, symnum), 
> > >> > KSYM_NAME_LEN);
> > >> >strlcpy(module_name, mod->name, MODULE_NAME_LEN);
> > >> >*exported = is_exported(name, *value, mod);
> > >>
> > >> This is fine if st_size is really unused, but how sure are you of that?
> > >>
> > >> grepping for st_size throws up some hits that appear ELF-related, but
> > >> I've not investigated them in detail.
> > >>
> > >> (The fact that struct stat has an identically named field throws up
> > >> a load of false positives too.)
> > >
> > >$ git describe --tags
> > >v4.20-rc3-93-g92b419289cee
> > >
> > >$ rg -m1 '[\.>]st_size'  --iglob '!**/tools/**' --iglob '!**/vdso*' 
> > >--iglob '!**/scripts/**' --iglob '!**/usr/**' --iglob '!**/samples/**' | 
> > >cat
> > >| kernel/kexec_file.c: if (sym->st_size != size) {
> > >
> > >Symbols in kexec kernel.
> > >
> > >| fs/stat.c:   tmp.st_size = stat->size;
> > >| Documentation/networking/tls.txt:  sendfile(sock, file, , 
> > >stat.st_size);
> > >| net/9p/client.c: ret->st_rdev, ret->st_size, ret->st_blksize,
> > >| net/9p/protocol.c:   >st_rdev, 
> > >>st_size,
> > >| fs/9p/vfs_inode_dotl.c:  i_size_write(inode, stat->st_size);
> > >| fs/hostfs/hostfs_user.c: p->size = buf->st_size;
> > >| arch/powerpc/boot/mktree.c:  nblks = (st.st_size + IMGBLK) / IMGBLK;
> > >| arch/alpha/kernel/osf_sys.c: tmp.st_size = lstat->size;
> > >| arch/x86/ia32/sys_ia32.c:__put_user(stat->size, >st_size) 
> > >||
> > >
> > >Not Elf_Sym.
> > >
> > >| arch/x86/kernel/machine_kexec_64.c:   sym->st_size);
> > >
> > >Symbols in kexec kernel.
> > >
> > >| arch/sparc/boot/piggyback.c: st4(buffer + 12, s.st_size);
> > >| arch/sparc/kernel/sys_sparc32.c: err |= put_user(stat->size, 
> > >>st_size);
> > >| arch/um/os-Linux/file.c: .ust_size= src->st_size,/* 
> > >total size, in bytes */
> > >| arch/um/os-Linux/start_up.c: size = (buf.st_size + UM_KERN_PAGE_SIZE) & 
> > >~(UM_KERN_PAGE_SIZE - 1);
> > >| arch/s390/kernel/compat_linux.c: tmp.st_size = stat->size;
> > >| arch/arm/kernel/sys_oabi-compat.c:   tmp.st_size = stat->size;
> > >| arch/mips/boot/compressed/calc_vmlinuz_load_addr.c:  vmlinux_size = 
> > >(uint64_t)sb.st_size;
> > >| drivers/net/ethernet/marvell/sky2.c: hw->st_idx = 
> > >RING_NEXT(hw->st_idx, hw->st_size);
> > >
> > >Not Elf_Sym.
> >
> > [ added Miroslav to CC, just in case he would like to check :) ]
> >
> > I have just double checked as well, and am fairly certain that the
> > Elf_Sym st_size field is not used to apply module relocations in any
> > arches, and it is not used in the core module loader nor in the module
> > kallsyms code. We'd like to avoid overwriting st_info in any case, to
> > fix kallsyms on Thumb-2 and also so that livepatch won't run into any
> > issues with 

Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-22 Thread Russell King - ARM Linux
On Thu, Nov 22, 2018 at 10:29:48AM +, Russell King - ARM Linux wrote:
> On Tue, Nov 20, 2018 at 11:04:06PM +0200, Aaro Koskinen wrote:
> > I had switched to PIO mode in 2015 since the WARNs about legacy DMA
> > API were too annoying and flooding the console. And now that I tried
> > using DMA again with g_ether, it doesn't work anymore. The device get's
> > recognized on host side, but no traffic goes through. Switching back to
> > PIO makes it to work again.
> 
> A solution to that would be to do what the warning message says, and
> update the driver to the DMAengine API.

Here's a partial conversion (not even build tested) - it only supports
OUT transfers with dmaengine at the moment.

There's at least one thing that doesn't make sense - the driver
apparently can transfer more than req->length bytes, surely if it does
that, it's a serious problem - shouldn't it be noisy about that?

Also we can't deal with the omap_set_dma_dest_burst_mode() setting -
DMAengine always uses a 64 byte burst, but udc wants a smaller burst
setting.  Does this matter?

I've kept the old code for reference (and the driver will fall back if
we can't get a dmaengine channel.)  I haven't been through and checked
that we result in the channel setup largely the same either.

There will be one major difference - UDC uses element sync, where
an element is 16bits, ep->ep.maxpacket/2 in a frame, and "packets"
frames.  DMAengine is using frame sync, with a 16bit element, one
element in a frame, and packets*ep->ep.maxpacket/2 frames.  This
should be functionally equivalent but I'd like confirmation of that.

I'm also not sure about this:

if (cpu_is_omap15xx())
end++;

in dma_dest_len() - is that missing from the omap-dma driver?  It looks
like a work-around for some problem on OMAP15xx, but I can't make sense
about why it's in the UDC driver rather than the legacy DMA driver.

I'm also confused by:

end |= start & (0x << 16);

also in dma_dest_len() - omap_get_dma_dst_pos() returns in the high 16
bits the full address:

if (dma_omap1())
offset |= (p->dma_read(CDSA, lch) & 0x);

so if the address crosses a 64k physical address boundary, surely orring
in the start address is wrong?  The more I look at dma_dest_len(), the
more I wonder whether that and dma_src_len() are anywhere near correct,
and whether that is a source of breakage for Aaro.

As I've already said, I've no way to test this on hardware.

Please review and let me know whether I missed anything on the OUT
handling path.

Fixing the IN path is going to be a bit more head-scratching.

 drivers/usb/gadget/udc/omap_udc.c | 154 +-
 drivers/usb/gadget/udc/omap_udc.h |   2 +
 2 files changed, 120 insertions(+), 36 deletions(-)

diff --git a/drivers/usb/gadget/udc/omap_udc.c 
b/drivers/usb/gadget/udc/omap_udc.c
index 3a16431da321..a37e1d2f0f3e 100644
--- a/drivers/usb/gadget/udc/omap_udc.c
+++ b/drivers/usb/gadget/udc/omap_udc.c
@@ -204,6 +204,7 @@ static int omap_ep_enable(struct usb_ep *_ep,
ep->dma_channel = 0;
ep->has_dma = 0;
ep->lch = -1;
+   ep->dma = NULL;
use_ep(ep, UDC_EP_SEL);
omap_writew(udc->clr_halt, UDC_CTRL);
ep->ackwait = 0;
@@ -576,21 +577,49 @@ static void finish_in_dma(struct omap_ep *ep, struct 
omap_req *req, int status)
 static void next_out_dma(struct omap_ep *ep, struct omap_req *req)
 {
unsigned packets = req->req.length - req->req.actual;
-   int dma_trigger = 0;
+   struct dma_async_tx_descriptor *tx;
+   struct dma_chan *dma = ep->dma;
+   dma_cookie_t cookie;
u16 w;
 
-   /* set up this DMA transfer, enable the fifo, start */
-   packets /= ep->ep.maxpacket;
-   packets = min(packets, (unsigned)UDC_RXN_TC + 1);
+   packets = min_t(unsigned, packets / ep->ep.maxpacket, UDC_RXN_TC + 1);
req->dma_bytes = packets * ep->ep.maxpacket;
-   omap_set_dma_transfer_params(ep->lch, OMAP_DMA_DATA_TYPE_S16,
-   ep->ep.maxpacket >> 1, packets,
-   OMAP_DMA_SYNC_ELEMENT,
-   dma_trigger, 0);
-   omap_set_dma_dest_params(ep->lch, OMAP_DMA_PORT_EMIFF,
-   OMAP_DMA_AMODE_POST_INC, req->req.dma + req->req.actual,
-   0, 0);
-   ep->dma_counter = omap_get_dma_dst_pos(ep->lch);
+
+   if (dma) {
+   struct dma_slave_config cfg = {
+   .direction = DMA_DEV_TO_MEM,
+   .src_addr_width = DMA_SLAVE_BUSWIDTH_2_BYTE,
+   /*
+* DMAengine uses frame sync mode, setting maxburst=1
+* is equivalent to element sync mode.
+*/
+   .src_maxburst = 1,
+

Re: [PATCH] dmaengine: ti: omap-dma: Configure LCH_TYPE for OMAP1

2018-11-22 Thread Russell King - ARM Linux
On Tue, Nov 20, 2018 at 11:04:06PM +0200, Aaro Koskinen wrote:
> I had switched to PIO mode in 2015 since the WARNs about legacy DMA
> API were too annoying and flooding the console. And now that I tried
> using DMA again with g_ether, it doesn't work anymore. The device get's
> recognized on host side, but no traffic goes through. Switching back to
> PIO makes it to work again.

A solution to that would be to do what the warning message says, and
update the driver to the DMAengine API.

The reason it didn't get updated when the DMAengine conversion happened
is because I don't have hardware for it, so had no way to test, and no
one seemed to know that anyone was using it.  Eventually, the WARN_ON()
was added to try and root out any users and generate interest in
updating the drivers.  Obviously that didn't happen, because people
just worked around the warning rather than saying anything.

I'm afraid we're long past the time that I'd be willing to update the
omap_udc driver now as I've dropped most of my knowledge on that as
it's been four years, and Peter has been looking after OMAP DMAengine
issues since.

Sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-16 Thread Russell King - ARM Linux
On Thu, Nov 15, 2018 at 03:12:17PM +1100, Finn Thain wrote:
> The thread went way off-topic when Christoph took the opportunity to 
> suggest the removal of RPC and EBSA110. That doesn't interest me.

I suspect that is the right solution - I've been trying to get 4.19
to boot on my RPC and it's proving very difficult for several reasons,
not limited to the HDD seeming to be throwing the odd disk error, as
well as the kernel being rather large (a 4.19 kernel is 2.7M compressed
as opposed to 2.6.29 being 1.2M compressed for the equivalent
configuration.)

Given that memory on the RPC is not contiguous, that probably means
an uncompressed kernel overflows the size of the first memory bank,
so there's probably no hope for modern kernels to boot on the machine.

The EBSA110 is probably in a similar boat - I don't remember whether it
had 16MB or 32MB as the maximal amount of memory, but memory was getting
tight with some kernels even running a minimalist userspace.

So, it's probably time to say goodbyte to the kernel support for these
platforms.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 03:58:36PM +0100, Geert Uytterhoeven wrote:
> Hi Russell,
> 
> On Wed, Nov 14, 2018 at 3:16 PM Russell King - ARM Linux
>  wrote:
> > On Wed, Nov 14, 2018 at 02:17:09PM +1100, Finn Thain wrote:
> > > So, even assuming that you're right about the limitations of single-timer
> > > platforms in general, removal of arch_gettimeoffset wouldn't require the
> > > removal of any platforms, AFAICT.
> >
> > I haven't proposed removing platforms.
> >
> > I'm just objecting to the idea of removing arch_gettimeoffset(),
> > thereby causing a regression by changing the resolution of
> > gettimeofday() without any sign of equivalent functionality.
> >
> > However, I now see (having searched mailing lists) what you are
> > trying to do - you have _not_ copied me or the mailing lists I'm
> > on with your cover message, so I'm *totally* lacking in the context
> > of your patch series, particularly where you are converting m68k
> > to use clocksources without needing the gettimeoffset() stuff.
> >
> > You have failed to explain that in this thread - probably assuming
> > that I've read your cover message.  I haven't until now, because
> > you never sent it to me or the linux-arm-kernel mailing list.
> >
> > I have found this thread _very_ frustrating, and frankly a waste of
> > my time discussing the finer points because of this lack of context.
> > Please ensure that if you're going to be sending a patch series,
> > that the cover message at least finds its way to the intended
> > audience of your patches, so that everyone has the context they
> > need when looking at (eg) the single patch they may receive.
> >
> > Alternatively, if someone raises a problem with the patch, and you
> > _know_ you haven't done that, then please consider informing them
> > where they can get more context, eg, by providing a link to your
> > archived cover message.  It would help avoid misunderstandings.
> 
> Sorry for the lack of context.
> The real trigger was also not explained in the cover message, and was a
> the threat to remove platforms not using modern timekeeping APIs, cfr.
>  "Removing support for old hardware from the kernel"
> (https://lwn.net/Articles/769468/).

And naturally, because kernel developers are oh so great at
communication, that decision has been communicated to those
affected by it. *NOT*.

Clearly there is a need for much better communication.  We're
better at it when dealing with patches, but not when it comes to
physical meetings.

Maybe when some decision like "we're going to kill stuff still
using the old gettimeoffset API" then _someone_ needs to identify
which platforms are using that and make sure that those maintainers
know about that decision, much the same way that if a patch were
to touch the gettimeoffset API, we'd make damn sure that such a
patch went to those affected by the change.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 02:17:09PM +1100, Finn Thain wrote:
> On Tue, 13 Nov 2018, Russell King - ARM Linux wrote:
> 
> > 
> > A clocksource provides a cycle counter that monotonically changes and 
> > does not wrap between clockevent events.
> > 
> > A clock event is responsible for providing events to the system when 
> > some work is needing to be done, limited by the wrap interval of the 
> > clocksource.
> > 
> > Each time the clock event triggers an interrupt, the clocksource is
> > read to determine how much time has passed, using:
> > 
> > count = (new_value - old_value) & available_bits
> > nanosecs = count * scale >> shift;
> > 
> > If you try to combine the clocksource and clockevent because you only
> > have a single counter, and the counter has the behaviour of:
> > - counting down towards zero
> > - when reaching zero, triggers an interrupt, and reloads with N
> > 
> > then this provides your clockevent, but you can't use this as a clock
> > source, because each time you receive an interrupt and try to read the
> > counter value, it will be approximately the same value.  This means
> > that the above calculation fails to register the correct number of
> > nanoseconds passing.  Hence, this does not work.
> > 
> > Also note where I said above that the clock event device must be able
> > to provide an interrupt _before_ the clocksource wraps - clearly with
> > such a timer, that is utterly impossible.
> > 
> > The simple solution is to not use such a counter as the clocksource,
> > which means you fall back to using the jiffies clocksource, and your
> > timekeeping has jiffy resolution - so 10ms, or possibly 1ms if you
> > want a 1kHz timer interval.  For most applications, that's simply way
> > to coarse, as was realised in the very early days of Linux.
> > 
> > If only there was a way to interpolate between timer interrupts...
> > which is exactly what arch_gettimeoffset() does, and is a historical
> > reminant of the pre-clocksource/clockevent days of the kernel - but
> > it is the only way to get better resolution from this sort of setup.
> > 
> 
> Both of the platforms in question (RPC and EBSA110) have not 
> defined(CONFIG_GENERIC_CLOCKEVENTS) and have not defined any struct 
> clock_event_device, AFAICT.

Correct, because they don't use clockevents.  This is pre-clocksource,
pre-clockevent code, it uses the old timekeeping infrastructure,
which is xtime_update() and providing a gettimeoffset implementation.

> So, even assuming that you're right about the limitations of single-timer 
> platforms in general, removal of arch_gettimeoffset wouldn't require the 
> removal of any platforms, AFAICT.

I haven't proposed removing platforms.

I'm just objecting to the idea of removing arch_gettimeoffset(),
thereby causing a regression by changing the resolution of
gettimeofday() without any sign of equivalent functionality.

However, I now see (having searched mailing lists) what you are
trying to do - you have _not_ copied me or the mailing lists I'm
on with your cover message, so I'm *totally* lacking in the context
of your patch series, particularly where you are converting m68k
to use clocksources without needing the gettimeoffset() stuff.

You have failed to explain that in this thread - probably assuming
that I've read your cover message.  I haven't until now, because
you never sent it to me or the linux-arm-kernel mailing list.

I have found this thread _very_ frustrating, and frankly a waste of
my time discussing the finer points because of this lack of context.
Please ensure that if you're going to be sending a patch series,
that the cover message at least finds its way to the intended
audience of your patches, so that everyone has the context they
need when looking at (eg) the single patch they may receive.

Alternatively, if someone raises a problem with the patch, and you
_know_ you haven't done that, then please consider informing them
where they can get more context, eg, by providing a link to your
archived cover message.  It would help avoid misunderstandings.

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-14 Thread Russell King - ARM Linux
On Mon, Nov 12, 2018 at 03:12:39PM +1100, Finn Thain wrote:
> Implementations of arch_gettimeoffset are generally not re-entrant
> and assume that interrupts have been disabled. Unfortunately this
> pre-condition got broken in v2.6.32.

To me, it looks way worse than what you think.

The original code sequence before conversion to arch_gettimeoffset()
was:

1. lock (inc. disabling IRQs)
2. read gettimeoffset correction
3. add offset to current xtime
4. unlock

This means there is no chance for a timer interrupt to happen between
reading the current offset and adding it to the current kernel time.
If the timer does roll over, then the interrupt will remain pending,
so the whole "read offset and add to xtime" is one atomic block and
we can use the pending interrupt to indicate whether the timer has
rolled over and apply the appropriate offset correction.

With the arch_gettimeoffset() code, that changes:

1. read clocksource (which will be jiffies)
2. compute clocksource delta
3. increment nanoseconds
4. add gettimeoffset correction

If we receive a timer interrupt part-way through this, for example,
between 2 and 3, then jiffies will increment (via do_timer()) and the
interrupt will be cleared.  This means that the check in
ioc_timer_gettimeoffset() for a pending interrupt, for example, will
be false, and we will not know that an interrupt has happened.

So, unless I'm missing something, commit 5cfc8ee0bb51 well and truely
broke the accuracy of timekeeping on these platforms, and will result
in timekeeping spontaneously jumping backwards.  I had been wondering
why NTP had become less stable on EBSA110, but never investigated.

However, that still does not justify blowing away this functionality
without replacement, as Finn has been proposing.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-13 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 02:35:29PM +1300, Michael Schmitz wrote:
> So we'd still have to use jiffies + interpolation from the current timer
> rundown counter as clocksource (since that will be monotonous and won't
> wrap)?
> 
> The way Finn did the clocksource for m68k, the clocksource counter does
> behave as it should (monotonous, doesn't wrap) at least so far as I've
> tested. Using the same timer for clocksource and clock events will degrade
> timekeeping based on timer events to jiffies resolution, as you point out.
> That would already have been the case before, so no gain in resolution.

... and that is where you are wrong.

RPC, for example, has gettimeofday() resolution of 500ns.  Removing
gettimeoffset() will result in a resolution of 1/HZ since there is
no longer any interpolation between interrupts.

> Other timekeeping would have worked at higher resolution before
> (interpolating through arch_gettimeoffset) just the same as now
> (interpolating through clocksource reads). Finn's clocksource read code
> essentially is arch_gettimeoffset under a new name.

Where is this code - all I've seen is code adding IRQ exclusion in
the gettimeoffset method.  If some other solution is being proposed,
then it's no good beating about the bush - show the damn code, and
then that can be considered.

However, what has been said in this thread is basically "remove the
gettimeoffset method", which is _not_ acceptable, it will cause
gettimeofday() on these platforms to lose resolution, which is a
user visible REGRESSION, plain and simple.

If what is actually meant is "we have a replacement for gettimeoffset"
then, and I'm sure we all know this, there needs to be a patch
proposed showing what is being proposed, rather than waffling around
the issue.

> Are you saying that's not possible on arm, because the current timer rundown
> counter can't be read while the timer is running?
> 
> If I were to run a second timer at higher rate for clocksource, but keeping
> the 10 ms timer as clock event (could easily do that by using timer D on
> Atari Falcon) - how would that improve my timekeeping? Clock events still
> only happen 10 ms apart ...

Ah, I think you're talking about something else.

You seem to be talking about what happens when time keeping interrupts
happen.

I'm talking about the resolution of gettimeofday() and the other
interfaces that are used (eg) for packet capture timestamping and
the like - the _user_ visible effects of the timekeeping system.

With the existing implementation, all these have better-than-jiffy
resolution - in the case of RPC, that's 500ns, rather than 10ms
which would be the case without gettimeoffset().  Removing
gettimeoffset() as Finn has proposed without preserving that
resolution is simply completely unacceptable.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-13 Thread Russell King - ARM Linux
On Wed, Nov 14, 2018 at 08:55:37AM +1100, Finn Thain wrote:
> On Tue, 13 Nov 2018, Russell King - ARM Linux wrote:
> 
> > On Tue, Nov 13, 2018 at 02:39:00PM +1100, Finn Thain wrote:
> > > 
> > > You could remove the old arch_gettimeoffset API without dropping any 
> > > platforms.
> > > 
> > > If no-one converts a given platform to the clocksource API it would mean 
> > > that the default 'jiffies' clocksource will get used on that platform.
> > > 
> > > Clock resolution and timer precision would be degraded, but that might 
> > > not 
> > > matter.
> > > 
> > > Anyway, if someone who has this hardware is willing to test a clocksource 
> > > API conversion, they can let me know and I'll attempt that patch.
> > 
> > There's reasons why that's not appropriate - such as not having two
> > separate timers in order to supply a clocksource and separate clock
> > event.
> > 
> > Not all hardware is suited to the clocksource + clockevent idea.
> > 
> 
> Sorry, I don't follow.
> 
> AFAIK, clocksources and clock event devices are orthogonal concepts. There 
> are platforms with !ARCH_USES_GETTIMEOFFSET && !GENERIC_CLOCKEVENTS (and 
> every other combination).
> 
> A clocksource read method just provides a cycle count, and in this sense 
> arch_gettimeoffset() is equivalent to a clocksource.

A clocksource provides a cycle counter that monotonically changes and
does not wrap between clockevent events.

A clock event is responsible for providing events to the system when
some work is needing to be done, limited by the wrap interval of the
clocksource.

Each time the clock event triggers an interrupt, the clocksource is
read to determine how much time has passed, using:

count = (new_value - old_value) & available_bits
nanosecs = count * scale >> shift;

If you try to combine the clocksource and clockevent because you only
have a single counter, and the counter has the behaviour of:
- counting down towards zero
- when reaching zero, triggers an interrupt, and reloads with N

then this provides your clockevent, but you can't use this as a clock
source, because each time you receive an interrupt and try to read the
counter value, it will be approximately the same value.  This means
that the above calculation fails to register the correct number of
nanoseconds passing.  Hence, this does not work.

Also note where I said above that the clock event device must be able
to provide an interrupt _before_ the clocksource wraps - clearly with
such a timer, that is utterly impossible.

The simple solution is to not use such a counter as the clocksource,
which means you fall back to using the jiffies clocksource, and your
timekeeping has jiffy resolution - so 10ms, or possibly 1ms if you
want a 1kHz timer interval.  For most applications, that's simply way
to coarse, as was realised in the very early days of Linux.

If only there was a way to interpolate between timer interrupts...
which is exactly what arch_gettimeoffset() does, and is a historical
reminant of the pre-clocksource/clockevent days of the kernel - but
it is the only way to get better resolution from this sort of setup.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm64: dts: marvell: armada_8k: reserve memory for ATF

2018-11-13 Thread Russell King - ARM Linux
On Tue, Nov 13, 2018 at 02:48:07PM +0100, Emmanuel Vadot wrote:
> Like 4436a371 for 37xx, reserve the PSCI area memory region so kernels
> can call the code there.
> Region address is taken from the ATF code [1] and is 2MiB aligned.
> 
> [1] plat/marvell/a8k/common/include/platform_def.h
> 
> Signed-off-by: Emmanuel Vadot 
> ---
>  arch/arm64/boot/dts/marvell/armada-ap806.dtsi | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/marvell/armada-ap806.dtsi 
> b/arch/arm64/boot/dts/marvell/armada-ap806.dtsi
> index 073610ac0a53..d8a79bb69de1 100644
> --- a/arch/arm64/boot/dts/marvell/armada-ap806.dtsi
> +++ b/arch/arm64/boot/dts/marvell/armada-ap806.dtsi
> @@ -23,6 +23,21 @@
>   spi0 = 
>   };
>  
> + reserved-memory {
> + #address-cells = <2>;
> + #size-cells = <2>;
> + ranges;
> +
> + /*
> +  * The PSCI firmware region depicted below is the default one
> +  * and should be updated by the bootloader.
> +  */
> + psci-area@400 {
> + reg = <0 0x400 0 0x20>;
> + no-map;
> + };
> + };

Does this really apply to everything?  I'm not sure it does.
If I look at a memory dump from the Macchiatobin, I see:

 0400   0007 0005 0040
 0410  0001 1000 0007 0001
 0420  0008  0009 
 0430  000a  fff7 
 0440  fbf7dfc7 fdff f2fe fbfdfffb
 0450  af7ffefb fbfff7ff efbfffe3 5efdbf4d
 0460  ff7e fdffedff ff9f8fff ef9d
 0470  bffc ff7fff7a eff7ff7d 7fffbfff
 0480  ffefefff fedfff7f ffdf ffbdff7f
 0490  f3ff7fdb 76ff ffeffefe ff7f
 04a0  9fff fff7ffdf edff fcffdfff
 04b0  fffbefff fefb ffe7 6fff
 04c0  ffdffdfb fd7fff9e fbefefbf eeefdfeb
 04d0  bbdfbda3 9dfdef92 7b6ffaee fffde3fc
 04e0  ff9efffd ff79ffdb ffaef57f ffaec1fe
 04f0  bff7 fffeffdb eb3f eaedffee
 04000100  fffb ffdf ffdffebf bfefbff3
 04000110   feff f7ff ffeeefff
 04000120  f77f fedf fffe fffefefe
 04000130  fb7d fffd fdff b7ff
 04000140  6fff ffee7fed feff fffbfe6d
 04000150  fd5ff7ff f7bfffdc fdfff3c7 f7ff6f1d

Doesn't look like firmware to me.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] ARM: stm32: debug: add low-level debug support

2018-11-13 Thread Russell King - ARM Linux
On Tue, Nov 13, 2018 at 09:16:16AM +, Bich HEMON wrote:
> 
> On 11/12/18 7:22 PM, Olof Johansson wrote:
> > On Thu, Jul 27, 2017 at 04:50:20PM +, Bich HEMON wrote:
> >> From: Gerald Baeza 
> >>
> >> This adds low-level debug support on USART1 for STM32F4
> >> and STM32F7.
> >> Compiled via 'CONFIG_DEBUG_LL' and 'CONFIG_EARLY_PRINTK'.
> >> Enabled via 'earlyprintk' in bootargs.
> >>
> >> Signed-off-by: Gerald Baeza 
> >> Signed-off-by: Bich Hemon 
> > 
> > Hi,
> > 
> > This had fallen between the chairs it seems. I have applied it to arm-soc
> > next/soc now, for 4.21 merge window.
> > 
> > It ended up being patched up manually to consolidate the version in
> > Russell's patch tracker with this posted version, and I tweaked whitespace
> > a bit. Let me know if I missed something.
> > 
> > 
> > Thanks,
> > 
> > -Olof
> > 
> 
> Hi Olof,
> 
> Please note that this patch has to be abandoned as Ludovic BARRE pushed 
> a new version of this change:
> https://patchwork.codeaurora.org/patch/400563/
> 
> You can find it in Russell's tracker here:
> http://www.arm.linux.org.uk/developer/patches/viewpatch.php?id=8803/1

And I'm not going to merge that because:

1) it's not for me to merge - it doesn't go through my tree, but through
   arm-soc, which Olof and Arnd manage.
2) it's not been on the mailing list as per normal submission process.

Sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC PATCH 01/13] arm: Fix mutual exclusion in arch_gettimeoffset

2018-11-13 Thread Russell King - ARM Linux
On Tue, Nov 13, 2018 at 02:39:00PM +1100, Finn Thain wrote:
> On Mon, 12 Nov 2018, Christoph Hellwig wrote:
> 
> > On Mon, Nov 12, 2018 at 03:12:39PM +1100, Finn Thain wrote:
> > > Implementations of arch_gettimeoffset are generally not re-entrant and 
> > > assume that interrupts have been disabled. Unfortunately this 
> > > pre-condition got broken in v2.6.32.
> > > 
> > > Fixes: 5cfc8ee0bb51 ("ARM: convert arm to arch_gettimeoffset()")
> > > Signed-off-by: Finn Thain 
> > 
> > This looks like a sensible fix for -stable backporting.  But with m68k
> > converted over it might be time for these two arm platforms to either
> > be converted to the clocksource API or to be dropped..
> > 
> 
> You could remove the old arch_gettimeoffset API without dropping any 
> platforms.
> 
> If no-one converts a given platform to the clocksource API it would mean 
> that the default 'jiffies' clocksource will get used on that platform.
> 
> Clock resolution and timer precision would be degraded, but that might not 
> matter.
> 
> Anyway, if someone who has this hardware is willing to test a clocksource 
> API conversion, they can let me know and I'll attempt that patch.

There's reasons why that's not appropriate - such as not having two
separate timers in order to supply a clocksource and separate clock
event.

Not all hardware is suited to the clocksource + clockevent idea.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v6 5/9] dt-bindings: ARM: document marvell,ecc-enable binding

2018-11-09 Thread Russell King - ARM Linux
On Fri, Nov 09, 2018 at 12:40:06PM +0100, Arnd Bergmann wrote:
> On Fri, Nov 9, 2018 at 8:04 AM Chris Packham
>  wrote:
> >
> > Add documentation for the marvell,ecc-enable and marvell,ecc-disable
> > properties which can be used to enable/disable ECC on the Marvell aurora
> > cache.
> >
> > Signed-off-by: Chris Packham 
> > ---
> 
> Why do you need both enable and disable? Wouldn't one of them be enough here?

It isn't an "on when ecc-enable is present, off when not" because the
current behaviour is to preserve these bits in the control register.

If we were to implement it as "if no ecc-enable property, turn off
ECC" then that would drastically change the behaviour - systems which
were configured for ECC suddenly lose ECC support.

Since we don't know which have it and which don't, we can't implement
the option like that.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: linux-4.20-rc1/arch/arm/vfp/vfpmodule.c:576: possible cut'n'paste error

2018-11-06 Thread Russell King - ARM Linux
On Tue, Nov 06, 2018 at 04:29:54PM +, Julien Thierry wrote:
> Hi Russel, David,
> 
> On 06/11/18 16:20, Russell King - ARM Linux wrote:
> >On Mon, Nov 05, 2018 at 01:53:13PM +, David Binderman wrote:
> >>Hello there,
> >>
> >>2nd try. Plain text might help.
> >
> >Yep, Linux kernel development generally doesn't like wasteful html
> >emails, sorry.
> >
> >>linux-4.20-rc1/arch/arm/vfp/vfpmodule.c:576]: (warning) Redundant 
> >>assignment of 'ufp_exc->fpinst2' to itself.
> >>
> >>Source code is
> >>
> >>        ufp_exc->fpexc = hwstate->fpexc;
> >>        ufp_exc->fpinst = hwstate->fpinst;
> >>        ufp_exc->fpinst2 = ufp_exc->fpinst2;
> >
> >Thanks for the report - it most certainly is a bug introduced by
> >Julien's patches, but I don't get your warning here.  Which compiler
> >produces that warning?
> >
> 
> Hmmm, silly typo/copy-paste from my end. Last line should be:
> 
>   ufp_exc->fpinst2 = hwstate->fpinst2;
> 
> Sorry about that.
> 
> >Julien - unfortunately, I've just asked Linus to take another fix
> >for Spectre, so we're going to have to wait a bit before I can
> >submit something for this.
> >
> 
> It is just a one line fix, should I submit a patch to the LAKML or patch
> system?

Both I guess.

> Otherwise let me know when is it best for you to receive the fix then to
> send to Linus.

As mentioned, I'll have to wait a while, it's not fair to send a pull
request for one patch followed immediately by another pull request for
another patch on top of the same pull request.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: linux-4.20-rc1/arch/arm/vfp/vfpmodule.c:576: possible cut'n'paste error

2018-11-06 Thread Russell King - ARM Linux
On Tue, Nov 06, 2018 at 04:33:26PM +, David Binderman wrote:
> hello there Russell,
> 
> > linux-4.20-rc1/arch/arm/vfp/vfpmodule.c:576]: (warning) Redundant 
> > assignment of >'ufp_exc->fpinst2' to itself.
> 
> >Thanks for the report - it most certainly is a bug introduced by
> >Julien's patches, but I don't get your warning here.  Which compiler
> >produces that warning?
> 
> Not a compiler, cppcheck, a static analyser for C and C++ code.
> 
> Interestingly, more of the same in file 
> linux-4.20-rc1/arch/arm/mach-pxa/pxa3xx.c
> 
> [linux-4.20-rc1/arch/arm/mach-pxa/pxa3xx.c:84]: (warning) Redundant 
> assignment of 'ASCR' to itself.
> [linux-4.20-rc1/arch/arm/mach-pxa/pxa3xx.c:85]: (warning) Redundant 
> assignment of 'ARSR' to itself.
> [linux-4.20-rc1/arch/arm/mach-pxa/pxa3xx.c:120]: (warning) Redundant 
> assignment of 'ASCR' to itself.
> [linux-4.20-rc1/arch/arm/mach-pxa/pxa3xx.c:121]: (warning) Redundant 
> assignment of 'ARSR' to itself.
> 
> I don't know if these four are also worth fixing.

There are cases where this can be false positives.  In the case of a
status register with write-1-to-clear bits for example.  These ones
look very much like that.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: linux-4.20-rc1/arch/arm/vfp/vfpmodule.c:576: possible cut'n'paste error

2018-11-06 Thread Russell King - ARM Linux
On Mon, Nov 05, 2018 at 01:53:13PM +, David Binderman wrote:
> Hello there,
> 
> 2nd try. Plain text might help.

Yep, Linux kernel development generally doesn't like wasteful html
emails, sorry.

> linux-4.20-rc1/arch/arm/vfp/vfpmodule.c:576]: (warning) Redundant assignment 
> of 'ufp_exc->fpinst2' to itself.
> 
> Source code is
> 
>        ufp_exc->fpexc = hwstate->fpexc;
>        ufp_exc->fpinst = hwstate->fpinst;
>        ufp_exc->fpinst2 = ufp_exc->fpinst2;

Thanks for the report - it most certainly is a bug introduced by
Julien's patches, but I don't get your warning here.  Which compiler
produces that warning?

Julien - unfortunately, I've just asked Linus to take another fix
for Spectre, so we're going to have to wait a bit before I can
submit something for this.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v2] ARM:kexec:offline panic_smp_self_stop CPU

2018-11-02 Thread Russell King - ARM Linux
On Fri, Nov 02, 2018 at 10:31:27AM +0800, wangyufen wrote:
> In case panic() and panic() called at the same time on different CPUS.
> For example:
> CPU 0:
>   panic()
>  __crash_kexec
>machine_crash_shutdown
>  crash_smp_send_stop
>machine_kexec
>  BUG_ON(num_online_cpus() > 1);
> 
> CPU 1:
>   panic()
> local_irq_disable
> panic_smp_self_stop
> 
> If CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop(), kdump
> fails. CPU1 can't receive the ipi irq, CPU1 will be always online.
> To fix this problem, this patch split out the panic_smp_self_stop()
> and add set_cpu_online(smp_processor_id(), false).

Looks fine now, please send it to the patch system (details in my
signature.)  Thanks.

> 
> Signed-off-by: Yufen Wang 
> ---
>  arch/arm/kernel/smp.c | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/arch/arm/kernel/smp.c b/arch/arm/kernel/smp.c
> index 9000d8b..d7b86e4 100644
> --- a/arch/arm/kernel/smp.c
> +++ b/arch/arm/kernel/smp.c
> @@ -682,6 +682,21 @@ void smp_send_stop(void)
>   pr_warn("SMP: failed to stop secondary CPUs\n");
>  }
>  
> +/* In case panic() and panic() called at the same time on CPU1 and CPU2,
> + * and CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop()
> + * CPU1 can't receive the ipi irqs from CPU2, CPU1 will be always online,
> + * kdump fails. So split out the panic_smp_self_stop() and add
> + * set_cpu_online(smp_processor_id(), false).
> + */
> +void panic_smp_self_stop(void)
> +{
> + pr_debug("CPU %u will stop doing anything useful since another CPU has 
> paniced\n",
> +  smp_processor_id());
> + set_cpu_online(smp_processor_id(), false);
> + while (1)
> + cpu_relax();
> +}
> +
>  /*
>   * not supported here
>   */
> -- 
> 2.7.4
> 
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] ARM:kexec:offline panic_smp_self_stop CPU

2018-11-01 Thread Russell King - ARM Linux
On Thu, Nov 01, 2018 at 07:20:49PM +0800, Wang Yufen wrote:
> From: Yufen Wang 
> 
> In case panic() and panic() called at the same time on different CPUS.
> For example:
> CPU 0:
>   panic()
>  __crash_kexec
>machine_crash_shutdown
>  crash_smp_send_stop
>machine_kexec
>  BUG_ON(num_online_cpus() > 1);
> 
> CPU 1:
>   panic()
> local_irq_disable
> panic_smp_self_stop
> 
> If CPU 1 calls panic_smp_self_stop() before crash_smp_send_stop(), kdump
> fails. CPU1 can't receive the ipi irq, CPU1 will be always online.
> I changed BUG_ON to WARN in kexec crash as arm64 does, kdump also fails.
> Because num_online_cpus() > 1, can't disable the L2 in _soft_restart.
> To fix this problem, this patch split out the panic_smp_self_stop()
> and add set_cpu_online(smp_processor_id(), false).

Thanks.

I think this may as well go into arch/arm/kernel/smp.c - it won't be
required for single-CPU systems, since there aren't "other" CPUs.

It's probably also worth a comment above the function as to why we
have this.

> 
> Signed-off-by: Yufen Wang 
> ---
>  arch/arm/kernel/setup.c | 10 ++
>  1 file changed, 10 insertions(+)
> 
> diff --git a/arch/arm/kernel/setup.c b/arch/arm/kernel/setup.c
> index 31940bd..151861f 100644
> --- a/arch/arm/kernel/setup.c
> +++ b/arch/arm/kernel/setup.c
> @@ -602,6 +602,16 @@ static void __init smp_build_mpidr_hash(void)
>  }
>  #endif
>  
> +void panic_smp_self_stop(void)
> +{
> + printk(KERN_DEBUG "CPU %u will stop doing anything useful since another 
> CPU has paniced\n",
> + smp_processor_id());
> + set_cpu_online(smp_processor_id(), false);
> + while (1)
> + cpu_relax();
> +
> +}
> +
>  static void __init setup_processor(void)
>  {
>   struct proc_info_list *list;
> -- 
> 2.7.4
> 
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 4/6] of/fdt: Populate phys_initrd_start/phys_initrd_size from FDT

2018-10-30 Thread Russell King - ARM Linux
On Mon, Oct 29, 2018 at 04:52:04PM -0700, Florian Fainelli wrote:
> If the architecture implements ARCH_HAS_PHYS_INITRD, make the FDT
> scanning code populate the physical address of the start of the FDT and
> its size.
> 
> Signed-off-by: Florian Fainelli 
> ---
>  arch/arm/mm/init.c | 2 +-
>  drivers/of/fdt.c   | 4 
>  2 files changed, 5 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c
> index 8f364aa24172..517e95cfb5d2 100644
> --- a/arch/arm/mm/init.c
> +++ b/arch/arm/mm/init.c
> @@ -237,7 +237,7 @@ static void __init arm_initrd_init(void)
>   phys_addr_t start;
>   unsigned long size;
>  
> - /* FDT scan will populate initrd_start */
> + /* FDT scan will populate initrd_start and phys_initrd_start */
>   if (initrd_start && !phys_initrd_size) {
>   phys_initrd_start = __virt_to_phys(initrd_start);
>   phys_initrd_size = initrd_end - initrd_start;

We should be able to delete the whole if () { } block and comment as
a result of this series - it was added by Rob in 65939301acdb to
unify the DT initrd code.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm: mm: fault: check ADFSR in case of abort

2018-10-29 Thread Russell King - ARM Linux
On Mon, Oct 29, 2018 at 03:54:36PM +, Mark Rutland wrote:
> On Mon, Oct 29, 2018 at 02:20:51PM +, Wiebe, Wladislav (Nokia - DE/Ulm) 
> wrote:
> > When running into situations like:
> > "Unhandled fault: synchronous external abort (0x210) at 0xXXX"
> > or
> > "Unhandled prefetch abort: synchronous external abort (0x210) at 0xXXX"
> > it is useful to know the content of ADFSR (Auxiliary Data Fault Status
> > Register) to indicate an ECC double-bit error in L1 or L2 cache.
> > 
> > Refer to:
> > Cortex-A15 Technical Reference Manual, Revision: r2p1
> > [6.4.8. Error Correction Code]
> > 
> > Signed-off-by: Wladislav Wiebe 
> > ---
> >  arch/arm/mm/fault.c | 18 ++
> >  1 file changed, 18 insertions(+)
> > 
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index 3232afb6fdc0..5e240deb6ed6 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -547,6 +547,22 @@ hook_fault_code(int nr, int (*fn)(unsigned long, 
> > unsigned int, struct pt_regs *)
> > fsr_info[nr].name = name;
> >  }
> >  
> > +/*
> > + * Check for ECC double-bit errors in Auxiliary Data Fault Status Register
> > + */
> > +static void check_adfsr_for_ecc(void)
> > +{
> > +   u32 adfsr = 0;
> > +
> > +   asm("mrc p15, 0, %0, c5, c1, 0" : "=r" (adfsr));
> > +
> > +   if (adfsr & (BIT(31) | BIT(23))) {
> > +   pr_alert("ADFSR status 0x%x indicates that an L1 or L2 cache\n"
> > +"ECC double-bit error occurred at some time.\n",
> > + adfsr);
> > +   }
> > +}
> > +
> >  /*
> >   * Dispatch a data abort to the relevant handler.
> >   */
> > @@ -559,6 +575,7 @@ do_DataAbort(unsigned long addr, unsigned int fsr, 
> > struct pt_regs *regs)
> > if (!inf->fn(addr, fsr & ~FSR_LNX_PF, regs))
> > return;
> >  
> > +   check_adfsr_for_ecc();
> > pr_alert("Unhandled fault: %s (0x%03x) at 0x%08lx\n",
> > inf->name, fsr, addr);
> > show_pte(current->mm, addr);
> > @@ -593,6 +610,7 @@ do_PrefetchAbort(unsigned long addr, unsigned int ifsr, 
> > struct pt_regs *regs)
> > if (!inf->fn(addr, ifsr | FSR_LNX_PF, regs))
> > return;
> >  
> > +   check_adfsr_for_ecc();
> > pr_alert("Unhandled prefetch abort: %s (0x%03x) at 0x%08lx\n",
> > inf->name, ifsr, addr);
> 
> IIUC at this point the task is preemptible (and interruptible),

It may be preemptable, but isn't necessarily so.  It depends whether the
called FSR specific function enabled interrupts or not.

So, it would be better to read the ADFSR before calling the FSR specific
function to guarantee that we read the values that correspond with _this_
fault.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm: mm: fault: check ADFSR in case of abort

2018-10-29 Thread Russell King - ARM Linux
On Mon, Oct 29, 2018 at 02:20:51PM +, Wiebe, Wladislav (Nokia - DE/Ulm) 
wrote:
> When running into situations like:
> "Unhandled fault: synchronous external abort (0x210) at 0xXXX"
> or
> "Unhandled prefetch abort: synchronous external abort (0x210) at 0xXXX"
> it is useful to know the content of ADFSR (Auxiliary Data Fault Status
> Register) to indicate an ECC double-bit error in L1 or L2 cache.
> 
> Refer to:
> Cortex-A15 Technical Reference Manual, Revision: r2p1
> [6.4.8. Error Correction Code]

This is CPU independent code, and so must only access registers that are
present on all CPUs which may run that code.

Here's the extract from the ARM ARM for the ADFSR and AIFSR:

  The position of these registers is architecturally-defined, but the
  content and use of the registers is IMPLEMENTATION DEFINED. An
  implementation can use these registers to return additional fault
  status information. An example use of these registers is to return
  more information for diagnosing parity errors.

So by testing bits in this register, you are making use of
implementation defined values.

It also goes on to say:

  These registers are not implemented in architecture versions before
  ARMv7.

So before ARMv7, we have to take note of the unimplemented CP15 rules:

2. In an allocated CP15 primary register, accesses to all unallocated
   encodings are UNPREDICTABLE for accesses at PL1 or higher.  This
   means that any MCR or MRC access from PL1 or higher with a
   combination of , ,  and  values not shown in,
   or referenced from, Full list of VMSA CP15 registers, by coprocessor
   register number on page B3-1481, that would access an allocated
   CP15 primary register, is UNPREDICTABLE. As indicated by rule 1, for
   the ARMv7-Aarchitecture, the allocated CP15 primary registers are:
   • in any VMSA implementation, c0-c3, c5-c11, c13, and c15
   ...

So I'd prefer if we didn't attempt to read this register on CPUs where
this isn't explicitly implemented.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] ARM: mm: Facilitate debugging CONFIG_KUSER_HELPERS disabled

2018-10-25 Thread Russell King - ARM Linux
On Fri, Oct 26, 2018 at 12:00:09AM +0530, Souptick Joarder wrote:
> On Thu, Oct 25, 2018 at 11:40 PM Florian Fainelli  
> wrote:
> >
> > Some software such as perf makes unconditional use of the special
> > [vectors] page which is only provided when CONFIG_KUSER_HELPERS is
> > enabled in the kernel.
> >
> > Facilitate the debugging of such situations by printing a debug message
> > to the kernel log showing the task name and the faulting address.
> >
> > Suggested-by: Russell King 
> > Signed-off-by: Florian Fainelli 
> > ---
> >  arch/arm/mm/fault.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> > index f4ea4c62c613..f17471fbc1c4 100644
> > --- a/arch/arm/mm/fault.c
> > +++ b/arch/arm/mm/fault.c
> > @@ -173,6 +173,11 @@ __do_user_fault(struct task_struct *tsk, unsigned long 
> > addr,
> > show_regs(regs);
> > }
> >  #endif
> > +#ifndef CONFIG_KUSER_HELPERS
> 
> Just have one doubt, if the condition is "#ifdef CONFIG_KUSER_HELPER"
> as commit message suggests the scenario is valid when CONFIG_KUSER_HELPER
> is enabled ? No ?
> 
> > +   if ((sig == SIGSEGV) && ((addr & PAGE_MASK) == 0x))
> > +   printk(KERN_DEBUG "%s: CONFIG_KUSER_HELPERS disabled at 
> > 0x%08lx\n",
> > +  tsk->comm, addr);
> > +#endif

The idea is to print a message when we get a SEGV _and_ the faulting
address is in the vectors page _and_ CONFIG_KUSER_HELPERS is _not_ set
(which makes the page inaccessible.)  The message points users to
enable it and/or why the application has failed.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 2/2] perf tests: Add a test for the ARM 32-bit [vectors] page

2018-10-25 Thread Russell King - ARM Linux
On Thu, Oct 25, 2018 at 10:53:12AM -0700, Florian Fainelli wrote:
> Something like this below? It does not have to be an alternative
> solution, I would find it useful for perf to make sure the vectors page
> is present in the virtual address space by having an explicit test. perf
> maintains, what do you think?

Yep, I'd swap the two tests merely because the check for 'sig' will be
slightly cheaper than the mask-and-test of addr, but it probably doesn't
make that much difference.

> 
> diff --git a/arch/arm/mm/fault.c b/arch/arm/mm/fault.c
> index f4ea4c62c613..b045b48d368d 100644
> --- a/arch/arm/mm/fault.c
> +++ b/arch/arm/mm/fault.c
> @@ -173,6 +173,11 @@ __do_user_fault(struct task_struct *tsk, unsigned
> long addr,
> show_regs(regs);
> }
>  #endif
> +#ifndef CONFIG_KUSER_HELPERS
> +   if (((addr & PAGE_MASK) == 0x) && (sig == SIGSEGV))
> +   printk(KERN_DEBUG "%s: CONFIG_KUSER_HELPERS disabled at
> 0x%08lx\n",
> +  tsk->comm, addr);
> +#endif
> 
> tsk->thread.address = addr;
> tsk->thread.error_code = fsr;
> 
> 
> > 
> > BTW, I note that the help text for CONFIG_KUSER_HELPERS needs to be
> > updated.
> > 
> 
> 
> -- 
> Florian

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 2/2] perf tests: Add a test for the ARM 32-bit [vectors] page

2018-10-25 Thread Russell King - ARM Linux
On Thu, Oct 25, 2018 at 10:19:54AM -0700, Florian Fainelli wrote:
> On 10/24/18 7:10 PM, Andrew Lunn wrote:
> > On Wed, Oct 24, 2018 at 05:09:05PM -0700, Florian Fainelli wrote:
> >> perf on ARM requires CONFIG_KUSER_HELPERS to be turned on to allow some
> >> independance with respect to the ARM CPU being used. Add a test which
> >> tries to locate the [vectors] page, created when CONFIG_KUSER_HELPERS is
> >> turned on to help asses the system's health.
> > 
> > Hi Florian
> > 
> > I've suffered the pain from missing CONFIG_KUSER_HELPERS. The
> > segfaults give little clue as to what is going wrong,and gdb is not
> > much use either.
> 
> If you have a working backtrace, you can typically see the virtual
> address being in 0x_ which gives a bit of a clue, but yes, this
> is not particularly helpful.
> 
> > 
> > What i don't see here is any clue to CONFIG_KUSER_HELPERS. If the test
> > fails, could you print a message suggesting CONFIG_KUSER_HELPERS on
> > ARM?
> 
> Sure, sounds reasonable, thanks for taking a look.

An alternative would be rather than adding this to user programs, to
have the kernel detect it and print a warning itself.

BTW, I note that the help text for CONFIG_KUSER_HELPERS needs to be
updated.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v3 1/3] of/fdt: Absorb ARM64's __early_init_dt_declare_initrd()

2018-10-25 Thread Russell King - ARM Linux
On Thu, Oct 25, 2018 at 08:25:04AM -0500, Rob Herring wrote:
> On Wed, Oct 24, 2018 at 7:17 PM Florian Fainelli  wrote:
> >
> > ARM64 is the only architecture that requires a re-definition of
> > __early_init_dt_declare_initrd(), absorb its custom implemention in
> > drivers/of/fdt.c.
> >
> > Suggested-by: Rob Herring  
> You forgot a shift key. :)
> 
> > Signed-off-by: Florian Fainelli 
> > ---
> >  drivers/of/fdt.c | 5 +
> >  1 file changed, 5 insertions(+)
> >
> > diff --git a/drivers/of/fdt.c b/drivers/of/fdt.c
> > index 800ad252cf9c..7d316f008f22 100644
> > --- a/drivers/of/fdt.c
> > +++ b/drivers/of/fdt.c
> > @@ -896,9 +896,14 @@ const void * __init of_flat_dt_match_machine(const 
> > void *default_match,
> >  static void __early_init_dt_declare_initrd(unsigned long start,
> >unsigned long end)
> >  {
> > +#if IS_ENABLED(CONFIG_ARM64)
> 
> C code, not preprocessor please.
> 
> > +   initrd_start = start;
> > +   initrd_end = end;
> 
> Thinking some more about this, perhaps it is better to just add the
> *_phys variants now along side the VA variants and set them here. Then
> the arm64 code can override the initrd_start, initrd_end, and
> initrd_below_start_ok values.

Please, let's not make the age old mistake of inventing new symbols
for stuff that already exists:

$ grep phys_initrd_start arch/ -rl
arch/arm/mm/init.c
arch/nds32/mm/init.c
arch/unicore32/mm/init.c
$ grep initrd_start_phys arch/ -rl
$

Please use the "phys_initrd_start" and "phys_initrd_end" naming,
which already exist on some architectures rather than inventing a new
set of symbols for the same thing and then forcing arches to change.
We could then get rid of:

/* FDT scan will populate initrd_start */
if (initrd_start && !phys_initrd_size) {
phys_initrd_start = __virt_to_phys(initrd_start);
phys_initrd_size = initrd_end - initrd_start;
}

initrd_start = initrd_end = 0;

in ARM, which exists purely to cope with DT.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

2018-10-25 Thread Russell King - ARM Linux
On Thu, Oct 25, 2018 at 09:37:59AM -0300, Rafael David Tinoco wrote:
> Is it okay to propose using only MAX_PHYSMEM_BITS for zsmalloc (like
> it was before commit 02390b87) instead, and make sure *at least* ARM
> 32/64 and x86/x64, for now, have it defined outside sparsemem headers
> as well ?

It looks to me like this has been broken on ARM for quite some time,
predating that commit.  The original was:

#ifndef MAX_PHYSMEM_BITS
#ifdef CONFIG_HIGHMEM64G
#define MAX_PHYSMEM_BITS 36
#else /* !CONFIG_HIGHMEM64G */
#define MAX_PHYSMEM_BITS BITS_PER_LONG
#endif
#endif
#define _PFN_BITS  (MAX_PHYSMEM_BITS - PAGE_SHIFT)

On ARM, CONFIG_HIGHMEM64G is never defined (it's an x86 private symbol)
which means that the above sets MAX_PHYSMEM_BITS to 32 on non-sparsemem
ARM LPAE platforms.  So commit 02390b87 hasn't really changed anything
as far as ARM LPAE is concerned - and this looks to be a bug that goes
all the way back to when zsmalloc.c was moved out of staging in 2014.

Digging further back, it seems this brokenness was introduced with:

commit 6e00ec00b1a76a199b8c0acae401757b795daf57
Author: Seth Jennings 
Date:   Mon Mar 5 11:33:22 2012 -0600

staging: zsmalloc: calculate MAX_PHYSMEM_BITS if not defined

This patch provides a way to determine or "set a
reasonable value for" MAX_PHYSMEM_BITS in the case that
it is not defined (i.e. !SPARSEMEM)

Signed-off-by: Seth Jennings 
Acked-by: Nitin Gupta 
Signed-off-by: Greg Kroah-Hartman 

which, at the time, realised the problem with SPARSEMEM, but decided
that in the absense of SPARSEMEM, that MAX_PHYSMEM_BITS shall be
BITS_PER_LONG which seems absurd (see below.)

> This way I can WARN_ONCE(), instead of BUG(), when specific
> arch does not define it - enforcing behavior - showing BITS_PER_LONG
> is being used instead of MAX_PHYSMEM_BITS (warning, at least once, for
> the possibility of an overflow, like the issue showed in here).

Assuming that the maximum number of physical memory bits are
BITS_PER_LONG in the absense of MAX_POSSIBLE_PHYSMEM_BITS is a nonsense
- we have had the potential for PAE systems for a long time, and to
introduce new code that makes this assumption was plainly wrong.

We know when there's the potential for PAE, and thus more than
BITS_PER_LONG bits of physical memory address, through
CONFIG_PHYS_ADDR_T_64BIT.  So if we have the situation where
MAX_POSSIBLE_PHYSMEM_BITS (or the older case of MAX_PHYSMEM_BITS) not
being defined, but CONFIG_PHYS_ADDR_T_64BIT set, we should've been
erroring or something based on not knowing how many physical memory
bits are possible - it would be more than BITS_PER_LONG but less
than some unknown number of bits.

This is why I think any fallback here to BITS_PER_LONG is wrong.

What I suggested is to not fall back to BITS_PER_LONG in any case, but
always define MAX_PHYSMEM_BITS.  However, I now see that won't work for
x86 because MAX_PHYSMEM_BITS is not a constant anymore.

So I suggest everything that uses zsmalloc.c should instead define
MAX_POSSIBLE_PHYSMEM_BITS.

Note that there should _also_ be some protection in zsmalloc.c against
MAX_POSSIBLE_PHYSMEM_BITS being too large:

#define OBJ_INDEX_BITS  (BITS_PER_LONG - _PFN_BITS - OBJ_TAG_BITS)
#define OBJ_TAG_BITS 1
#define _PFN_BITS   (MAX_POSSIBLE_PHYSMEM_BITS - PAGE_SHIFT)

which means there's an implicit limitation on _PFN_BITS being less than
BITS_PER_LONG - OBJ_TAG_BITS (where, if it's equal to this, and hence
OBJ_INDEX_BITS will be zero.)  This imples that MAX_POSSIBLE_PHYSMEM_BITS
must be smaller than BITS_PER_LONG + PAGE_SHIFT - OBJ_TAG_BITS, or
43 bits on a 32 bit system.  If you want to guarantee a minimum number
of objects, then that limitation needs to be reduced further.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] mm/zsmalloc.c: check encoded object value overflow for PAE

2018-10-25 Thread Russell King - ARM Linux
On Wed, Oct 24, 2018 at 10:27:44PM -0300, Rafael David Tinoco wrote:
> On 32-bit systems, zsmalloc uses HIGHMEM and, when PAE is enabled, the
> physical frame number might be so big that zsmalloc obj encoding (to
> location) will break IF architecture does not re-define
> MAX_PHYSMEM_BITS, causing:

I think there's a deeper problem here - a misunderstanding of what
MAX_PHYSMEM_BITS is.

MAX_PHYSMEM_BITS is a definition for sparsemem, and is only visible
when sparsemem is enabled.  When sparsemem is disabled, asm/sparsemem.h
is not included (and should not be included) which means there is no
MAX_PHYSMEM_BITS definition.

I don't think zsmalloc.c should be (ab)using MAX_PHYSMEM_BITS, and
your description above makes it sound like you expect it to always be
defined.

If we want to have a definition for this, we shouldn't be playing
fragile games like:

#ifndef MAX_POSSIBLE_PHYSMEM_BITS
#ifdef MAX_PHYSMEM_BITS
#define MAX_POSSIBLE_PHYSMEM_BITS MAX_PHYSMEM_BITS
#else
/*
 * If this definition of MAX_PHYSMEM_BITS is used, OBJ_INDEX_BITS will just
 * be PAGE_SHIFT
 */
#define MAX_POSSIBLE_PHYSMEM_BITS BITS_PER_LONG
#endif
#endif

but instead insist that MAX_PHYSMEM_BITS is defined _everywhere_.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] i2c: cadence: Implement timeout

2018-10-24 Thread Russell King - ARM Linux
On Wed, Oct 24, 2018 at 04:20:03PM +0530, shubhrajyoti.da...@gmail.com wrote:
> From: Shubhrajyoti Datta 
> 
> In some cases we are waiting in a loop. Replace the infinite wait with
> the  timeout.
> 
> Signed-off-by: Shubhrajyoti Datta 
> ---
>  drivers/i2c/busses/i2c-cadence.c | 30 ++
>  1 file changed, 26 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/i2c/busses/i2c-cadence.c 
> b/drivers/i2c/busses/i2c-cadence.c
> index b136057..9c38278 100644
> --- a/drivers/i2c/busses/i2c-cadence.c
> +++ b/drivers/i2c/busses/i2c-cadence.c
> @@ -209,6 +209,7 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
>   struct cdns_i2c *id = ptr;
>   /* Signal completion only after everything is updated */
>   int done_flag = 0;
> + unsigned int timeout;
>   irqreturn_t status = IRQ_NONE;
>  
>   isr_status = cdns_i2c_readreg(CDNS_I2C_ISR_OFFSET);
> @@ -235,6 +236,7 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
>   ((isr_status & CDNS_I2C_IXR_COMP) ||
>(isr_status & CDNS_I2C_IXR_DATA))) {
>   /* Read data if receive data valid is set */
> + timeout = 1000;
>   while (cdns_i2c_readreg(CDNS_I2C_SR_OFFSET) &
>  CDNS_I2C_SR_RXDV) {
>   /*
> @@ -253,6 +255,16 @@ static irqreturn_t cdns_i2c_isr(int irq, void *ptr)
>  
>   if (cdns_is_holdquirk(id, hold_quirk))
>   break;
> + timeout--;
> + if (timeout)
> + mdelay(1);
> + else
> + break;
> + }
> + if (!timeout) {
> + id->err_status = -ETIMEDOUT;
> + complete(>xfer_done);
> + return IRQ_HANDLED;

Good kernel programming principle: Always check for the success
condition when exiting due to timeout rather than the fact that we
timed out.

Also, is this _really_ a loop that needs a timeout condition?  Looking
at the original code, it looks like the purpose of the loop is to read
more than one byte, and you are introducing a 1ms delay between the
read of each byte.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v3 0/7] include: add setbits32/clrbits32/clrsetbits32/setbits64/clrbits64/clrsetbits64

2018-10-24 Thread Russell King - ARM Linux
On Wed, Oct 24, 2018 at 07:35:46AM +, Corentin Labbe wrote:
> This patchset adds a new set of functions which are open-coded in lot of
> place.
> Basicly the pattern is always the same, "read, modify a bit, write"
> some driver and the powerpc arch already have thoses pattern them as 
> functions. (like ahci_sunxi.c or dwmac-meson8b)

The advantage of them being open-coded is that it's _obvious_ to the
reviewer that there is a read-modify-write going on which, in a multi-
threaded environment, may need some locking (so it should trigger a
review of the locking around that code.)

With it hidden inside a helper which has no locking itself, it becomes
much easier to pass over in review, which means that races are much
more likely to go unspotted - and that is bad news.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v2 1/3] ARM: socfpga: Clean unused functions

2018-10-23 Thread Russell King - ARM Linux
On Tue, Oct 23, 2018 at 10:47:05AM +0200, Clément Péron wrote:
> Hi Dinh / Russell,
> 
> Could you have a look at these patchs please ?

Nothing to do with me - it's up to the socfpga maintainer and arm-soc
maintainers.

> Thanks,
> Clement
> 
> On Tue, 9 Oct 2018 at 13:20, Clément Péron  wrote:
> >
> > These functions are unused externally, removed them and declare
> > the one used locally as static.
> >
> > Signed-off-by: Clément Péron 
> > ---
> >  arch/arm/mach-socfpga/core.h| 2 --
> >  arch/arm/mach-socfpga/socfpga.c | 2 +-
> >  2 files changed, 1 insertion(+), 3 deletions(-)
> >
> > diff --git a/arch/arm/mach-socfpga/core.h b/arch/arm/mach-socfpga/core.h
> > index 65e1817d8afe..92cae0a9213f 100644
> > --- a/arch/arm/mach-socfpga/core.h
> > +++ b/arch/arm/mach-socfpga/core.h
> > @@ -34,8 +34,6 @@
> >
> >  #define RSTMGR_MPUMODRST_CPU1  0x2 /* CPU1 Reset */
> >
> > -extern void socfpga_init_clocks(void);
> > -extern void socfpga_sysmgr_init(void);
> >  void socfpga_init_l2_ecc(void);
> >  void socfpga_init_ocram_ecc(void);
> >  void socfpga_init_arria10_l2_ecc(void);
> > diff --git a/arch/arm/mach-socfpga/socfpga.c 
> > b/arch/arm/mach-socfpga/socfpga.c
> > index dde14f7bf2c3..5fb6f79059a8 100644
> > --- a/arch/arm/mach-socfpga/socfpga.c
> > +++ b/arch/arm/mach-socfpga/socfpga.c
> > @@ -32,7 +32,7 @@ void __iomem *rst_manager_base_addr;
> >  void __iomem *sdr_ctl_base_addr;
> >  unsigned long socfpga_cpu1start_addr;
> >
> > -void __init socfpga_sysmgr_init(void)
> > +static void __init socfpga_sysmgr_init(void)
> >  {
> > struct device_node *np;
> >
> > --
> > 2.17.1
> >

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] arm: kernel: add support for detecting armv8 cpu cache information

2018-10-18 Thread Russell King - ARM Linux
On Thu, Oct 18, 2018 at 02:16:47PM +0800, Teng Fei Fan wrote:
> This patch adds support for cacheinfo on 32bit ARMv8 platform.
> Add support for detecting cpu cache information cpu cache information
> via sysfs for 32bit armv8 platform. And export to sysfs then userspace
> can get from /sys/devices/system/cpu/cpuX/cache.

You don't explain why this is needed.

We don't do this for previous 32-bit CPUs, so why should we do this
for ARMv8 running on a 32-bit kernel?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/3] driver core: add probe_err log helper

2018-10-17 Thread Russell King - ARM Linux
On Tue, Oct 16, 2018 at 04:55:00PM +0300, Andy Shevchenko wrote:
> On Tue, Oct 16, 2018 at 3:55 PM Andrzej Hajda  wrote:
> > On 16.10.2018 13:29, Andrzej Hajda wrote:
> > > On 16.10.2018 13:01, Andy Shevchenko wrote:
> > >> On Tue, Oct 16, 2018 at 10:22 AM Andrzej Hajda  
> > >> wrote:
> > >>> During probe every time driver gets resource it should usually check 
> > >>> for error
> > >>> printk some message if it is not -EPROBE_DEFER and return the error. 
> > >>> This
> > >>> pattern is simple but requires adding few lines after any resource 
> > >>> acquisition
> > >>> code, as a result it is often omited or implemented only partially.
> > >>> probe_err helps to replace such code seqences with simple call, so code:
> > >>> if (err != -EPROBE_DEFER)
> > >>> dev_err(dev, ...);
> > >>> return err;
> > >>> becomes:
> > >>> return probe_err(dev, err, ...);
> 
> > >>> +   va_start(args, fmt);
> > >>> +
> > >>> +   vaf.fmt = fmt;
> > >>> +   vaf.va = 
> > >>> +
> > >>> +   __dev_printk(KERN_ERR, dev, );
> 
> > >> It would be nice to print an error code as well, wouldn't it?
> > > Hmm, on probe fail error is printed anyway (with exception of
> > > EPROBE_DEFER, ENODEV and ENXIO):
> > > "probe of %s failed with error %d\n"
> > > On the other side currently some drivers prints the error code anyway
> > > via dev_err or similar, so I guess during conversion to probe_err it
> > > should be removed then.
> > >
> > > If we add error code to probe_err is it OK to report it this way?
> > > dev_err(dev, "%V, %d\n", , err);
> >
> > Ups, I forgot that message passed to probe_err will contain already
> > newline character.
> 
> You may consider not to pass it.

It's normal to pass the '\n', so by doing this, we create the situation
where this function becomes the exception to the norm.  That's not a
good idea - we will see people forget that appending '\n' should not
be done for this particular function.

While we could add a checkpatch rule, that's hassle (extra rework).  In
any case, I think the message would be much better formatted if we did:

dev_err(dev, "error %d: %V", err, );

which means we end up with (eg):

error -5: request_irq failed for irq 9

rather than:

request_irq failed for irq 9, -5

which is more confusing.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] [ALTERNATIVE] ARM: fix copypage functions for clang

2018-10-17 Thread Russell King - ARM Linux
On Wed, Oct 17, 2018 at 11:04:17AM +0200, Arnd Bergmann wrote:
> The constraints were originally added in commit 9a40ac86152c ("ARM:
> 6164/1: Add kto and kfrom to input operands list.") as a gcc-4.5
> workaround. Another workaround for the same problem was added in commit
> 9c695203a7dd ("compiler-gcc.h: gcc-4.5 needs noclone and noinline
> on __naked functions") and should have obsoleted the first one.

That is an incorrect statement - please read the discussion back then,
particularly Mikael Pettersson's reply:

"I've tested and verified that this bit enables a gcc-4.5 compiled kernel
to boot on TS-119 (Kirkwood) when combined with my fix for __naked.
With neither or only one of the patches applied, the kernel oopses hard
in copy_user_page() as it tries to start /sbin/init."

That is very clear that it is not "one or the other" patch, and it's
certainly not true that one patch obsoletes the other.

Mikael is also very clear in the effects that are going on - to re-quote
what I've already quoted (and clearly you missed):

"- the asm() bodies of these __naked functions have inadequate input
  parameter constraints, in particular they fail to declare any
  dependencies on the functions' formal parameters; gcc-4.5 sees this
  and skips the parameter setup before calling these functions, causing
  runtime crashes"

This description makes it clear that it's not the naked function that
is wrong, but the function that _calls_ the naked function - stating
that GCC fails to setup the parameters _for_ _the_ _called_ _naked_
_function_.

So, there are two issues here:

1. gcc-4.5 has been observed to clone and inline naked functions, which
   you claim has been fixed.
2. gcc-4.5 fails to setup parameters for naked functions, which we have
   no idea whether it's been fixed.

> I've used this on my randconfig build setup, and it makes all
> configurations build without warnings, but I have not done
> any runtime testing on it.

Since the problem has always been a runtime issue, a build-only test is
insufficient.

Sorry, but no, this is way too risky in its current form.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list

2018-10-16 Thread Russell King - ARM Linux
On Tue, Oct 16, 2018 at 10:00:19AM +0200, Linus Walleij wrote:
> On Tue, Oct 16, 2018 at 12:16 AM Stefan Agner  wrote:
> 
> > When functions incoming parameters are not in input operands list gcc
> > 4.5 does not load the parameters into registers before calling this
> > function but the inline assembly assumes valid addresses inside this
> > function. This breaks the code because r0 and r1 are invalid when
> > execution enters v4wb_copy_user_page ()
> >
> > Also the constant needs to be used as third input operand so account
> > for that as well.
> >
> > This fixes copypage-fa.c what has previously done before for the other
> > copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> > and kfrom to input operands list.").
> >
> > Signed-off-by: Stefan Agner 
> 
> Please add:
> Cc: sta...@vger.kernel.org

It's not obvious yet whether this is right - it contradicts the GCC
manual, but then we have evidence that it's required for some GCC
versions where GCC may clone the function, or if the function is
used within the same file.

> I am on deep waters with ARM assembly, admittedly. So I wanted to
> ask: OpenWRT has this cache patch:
> https://github.com/openwrt/openwrt/blob/master/target/linux/gemini/patches-4.14/0001-cache-patch-from-OpenWRT.patch
> I do not know why (sorry).
> 
> Do you think that patch is actually a hack to hide the problem
> fixed with this patch? (OK maybe stupid question but...)

No, it looks to me like a hack to make DMA cache handling "more
efficient" by cleaning/invalidating the entire cache when dealing
with large streaming buffers.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 2/2] ARM: copypage: do not use naked functions

2018-10-16 Thread Russell King - ARM Linux
On Mon, Oct 15, 2018 at 07:27:43PM -0400, Nicolas Pitre wrote:
> It's hard to see what that commit was actually fixing, but the operands 
> usage is wrong as explained already. Maybe the generated code has been 
> OK for all those years but that is due to luck rather than correctness.
...
> No idea. Maybe Russell remembers?
> Maybe digging into the mailing list archive might tell.

I found this as a reply to the patch by Mikael Pettersson:

I've tested and verified that this bit enables a gcc-4.5 compiled kernel
to boot on TS-119 (Kirkwood) when combined with my fix for __naked.
With neither or only one of the patches applied, the kernel oopses hard
in copy_user_page() as it tries to start /sbin/init.
...
- the asm() bodies of these __naked functions have inadequate input
  parameter constraints, in particular they fail to declare any
  dependencies on the functions' formal parameters; gcc-4.5 sees this
  and skips the parameter setup before calling these functions, causing
  runtime crashes; Khem's patch (this one) fixes that
  (copypage-xscale.c already had correct asm() constraints so it works
  with only the __naked fix, these other copypage-*.c files need both
  patches to work)

So, while wrong to the GCC manual, it's fixing a bug that is present
with gcc-4.5 and who-knows what other GCC versions.  Reverting the
commit has the chance to cause regressions with GCC.

It looks like any change here needs to be validated on a range of
GCC versions, because there are versions of GCC known not to follow
it's manual!

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list

2018-10-15 Thread Russell King - ARM Linux
On Tue, Oct 16, 2018 at 12:52:58AM +0200, Stefan Agner wrote:
> On 16.10.2018 00:46, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
> >> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> >> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> >> >> When functions incoming parameters are not in input operands list gcc
> >> >> 4.5 does not load the parameters into registers before calling this
> >> >> function but the inline assembly assumes valid addresses inside this
> >> >> function. This breaks the code because r0 and r1 are invalid when
> >> >> execution enters v4wb_copy_user_page ()
> >> >
> >> > NAK.  Naked functions must never be inlined.  Please add a "noinline"
> >> > attribute to the function rather than making things more complex.
> >> >
> >>
> >> To be honest, I did not put much thought into this commit since it is
> >> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
> >> kfrom to input operands list.") has been done to the other copypage
> >> implementations...
> >>
> >> [adding Khem]
> >>
> >> > The GCC manual states:
> >> >
> >> > `naked'
> >> >  Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> >> >  and SPU ports to indicate that the specified function does not
> >> >  need prologue/epilogue sequences generated by the compiler.  It is
> >> >  up to the programmer to provide these sequences. The only
> >> >   
> >> >  statements that can be safely included in naked functions are
> >> >  ^
> >> >  `asm' statements that do not have operands.  All other statements,
> >> >  ^^^
> >> >  including declarations of local variables, `if' statements, and so
> >> >  forth, should be avoided.  Naked functions should be used to
> >> >  implement the body of an assembly function, while allowing the
> >> >  compiler to construct the requisite function declaration for the
> >> >  assembler.
> >> >
> >> > The 'I' attribute is fine here because it is a constant that is not
> >> > allowed to be in a register (and hence has no code generation side
> >> > effects.)
> >> >
> >> > Adding operands for the input parameters, however, isn't going to
> >> > work around the fact that _this_ assembly is written to be out of
> >> > line and so it must never be inlined by the compiler.
> >>
> >> I briefly looked at a disassembled version after applying both patches,
> >> it indeed leads to inlining. However, the code seems to be working
> >> (thanks to asm volatile?)...
> > 
> > Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
> > Clang inlining these static functions that are only used through
> > function pointers?
> 
> I only looked at copypage-xscale.c (the mc_copy_user_page() case)...

The two I mention are different from the rest, because they are used
from other functions within the same file.  The rest are all used
through function pointers and should, therefore, never be inlined.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 2/2] ARM: copypage: do not use naked functions

2018-10-15 Thread Russell King - ARM Linux
On Mon, Oct 15, 2018 at 06:54:49PM -0400, Nicolas Pitre wrote:
> On Mon, 15 Oct 2018, Russell King - ARM Linux wrote:
> 
> > On Mon, Oct 15, 2018 at 06:35:33PM -0400, Nicolas Pitre wrote:
> > > On Tue, 16 Oct 2018, Stefan Agner wrote:
> > > 
> > > > GCC documentation says naked functions should only use basic ASM
> > > > syntax. The extended ASM or mixture of basic ASM and "C" code is
> > > > not guaranteed. Currently it seems to work though.
> > > > 
> > > > Furthermore with Clang using parameters in extended asm in a
> > > > naked function is not supported:
> > > >   arch/arm/mm/copypage-v4wb.c:47:9: error: parameter references not
> > > >   allowed in naked functions
> > > > : "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> > > >^
> > > > 
> > > > Use a regular function to be more portable. Also use volatile asm
> > > > to avoid unsolicited optimizations.
> > > > 
> > > > Tested with qemu versatileab machine and versatile_defconfig and
> > > > qemu mainstone machine using pxa_defconfig compiled with GCC 7.2.1
> > > > and Clang 7.0.
> > > > 
> > > > Link: https://github.com/ClangBuiltLinux/linux/issues/90
> > > > Reported-by: Joel Stanley 
> > > > Signed-off-by: Stefan Agner 
> > > > ---
> > > >  arch/arm/mm/copypage-fa.c   | 17 +++--
> > > >  arch/arm/mm/copypage-feroceon.c | 17 +++--
> > > >  arch/arm/mm/copypage-v4mc.c | 14 +-
> > > >  arch/arm/mm/copypage-v4wb.c | 17 +++--
> > > >  arch/arm/mm/copypage-v4wt.c | 17 +++--
> > > >  arch/arm/mm/copypage-xsc3.c | 17 +++--
> > > >  arch/arm/mm/copypage-xscale.c   | 13 -
> > > >  7 files changed, 72 insertions(+), 40 deletions(-)
> > > > 
> > > > diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> > > > index ec6501308c60..33ccd396bf99 100644
> > > > --- a/arch/arm/mm/copypage-fa.c
> > > > +++ b/arch/arm/mm/copypage-fa.c
> > > > @@ -17,11 +17,16 @@
> > > >  /*
> > > >   * Faraday optimised copy_user_page
> > > >   */
> > > > -static void __naked
> > > > -fa_copy_user_page(void *kto, const void *kfrom)
> > > > +static void fa_copy_user_page(void *kto, const void *kfrom)
> > > >  {
> > > > -   asm("\
> > > > -   stmfd   sp!, {r4, lr}   @ 2\n\
> > > > +   register void *r0 asm("r0") = kto;
> > > > +   register const void *r1 asm("r1") = kfrom;
> > > > +
> > > > +   asm(
> > > > +   __asmeq("%0", "r0")
> > > > +   __asmeq("%1", "r1")
> > > > +   "\
> > > > +   stmfd   sp!, {r4}   @ 2\n\
> > > > mov r2, %2  @ 1\n\
> > > >  1: ldmia   r1!, {r3, r4, ip, lr}   @ 4\n\
> > > > stmia   r0, {r3, r4, ip, lr}@ 4\n\
> > > > @@ -34,9 +39,9 @@ fa_copy_user_page(void *kto, const void *kfrom)
> > > > subsr2, r2, #1  @ 1\n\
> > > > bne 1b  @ 1\n\
> > > > mcr p15, 0, r2, c7, c10, 4  @ 1   drain WB\n\
> > > > -   ldmfd   sp!, {r4, pc}   @ 3"
> > > > +   ldmfd   sp!, {r4}   @ 3"
> > > > :
> > > > -   : "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
> > > > +   : "r" (r0), "r" (r1), "I" (PAGE_SIZE / 32));
> > > 
> > > This is still wrong as you list r0 and r1 in the input operand list 
> > > where they must remain constant but the code does modify them. You 
> > > should list them in the output operand list with the "&" attribute. Also 
> > > r2 should be listed in the clobbered list.
> > 
> > Either we keep these as naked functions (and, if Clang wants to
> > try to inline naked functions which makes no sense, also mark them
> > as noinline) or we make them proper functions and also add (eg) r4
> > to the clobber list and get rid of the stacking of that register
> > along with LR/PC.
> 
> Yes, indeed.
> 
> I'd say: remove the naked stuff, and let the compiler do the 
> prologue/epilogue itself (or inline it for that matter). And don't force 
> pointers and counter into particular registers. This way r0-r3 could be 
> used as temporaries since they're probably already clobbered by the call 
> to kmap_atomic() anyway. That is likely to be better than forcing ip/lr 
> as temporaryes.

That doesn't work for the general case - which is where the functions
are called via function pointers, and so are never inlined.  For these,
the current code is optimal, and I suspect the compiler will do worse
with it.

For the two instances (v4wb and mc) that don't follow that pattern,
you may be right, but I'd want to see the result of the changes.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list

2018-10-15 Thread Russell King - ARM Linux
On Tue, Oct 16, 2018 at 12:39:54AM +0200, Stefan Agner wrote:
> On 16.10.2018 00:23, Russell King - ARM Linux wrote:
> > On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> >> When functions incoming parameters are not in input operands list gcc
> >> 4.5 does not load the parameters into registers before calling this
> >> function but the inline assembly assumes valid addresses inside this
> >> function. This breaks the code because r0 and r1 are invalid when
> >> execution enters v4wb_copy_user_page ()
> > 
> > NAK.  Naked functions must never be inlined.  Please add a "noinline"
> > attribute to the function rather than making things more complex.
> > 
> 
> To be honest, I did not put much thought into this commit since it is
> just doing to copypage-fa.c what 9a40ac86152c ("ARM: 6164/1: Add kto and
> kfrom to input operands list.") has been done to the other copypage
> implementations...
> 
> [adding Khem]
> 
> > The GCC manual states:
> > 
> > `naked'
> >  Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
> >  and SPU ports to indicate that the specified function does not
> >  need prologue/epilogue sequences generated by the compiler.  It is
> >  up to the programmer to provide these sequences. The only
> >   
> >  statements that can be safely included in naked functions are
> >  ^
> >  `asm' statements that do not have operands.  All other statements,
> >  ^^^
> >  including declarations of local variables, `if' statements, and so
> >  forth, should be avoided.  Naked functions should be used to
> >  implement the body of an assembly function, while allowing the
> >  compiler to construct the requisite function declaration for the
> >  assembler.
> > 
> > The 'I' attribute is fine here because it is a constant that is not
> > allowed to be in a register (and hence has no code generation side
> > effects.)
> > 
> > Adding operands for the input parameters, however, isn't going to
> > work around the fact that _this_ assembly is written to be out of
> > line and so it must never be inlined by the compiler.
> 
> I briefly looked at a disassembled version after applying both patches,
> it indeed leads to inlining. However, the code seems to be working
> (thanks to asm volatile?)...

Apart from v4wb_copy_user_page() and mc_copy_user_page(), how is
Clang inlining these static functions that are only used through
function pointers?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 2/2] ARM: copypage: do not use naked functions

2018-10-15 Thread Russell King - ARM Linux
On Mon, Oct 15, 2018 at 06:35:33PM -0400, Nicolas Pitre wrote:
> On Tue, 16 Oct 2018, Stefan Agner wrote:
> 
> > GCC documentation says naked functions should only use basic ASM
> > syntax. The extended ASM or mixture of basic ASM and "C" code is
> > not guaranteed. Currently it seems to work though.
> > 
> > Furthermore with Clang using parameters in extended asm in a
> > naked function is not supported:
> >   arch/arm/mm/copypage-v4wb.c:47:9: error: parameter references not
> >   allowed in naked functions
> > : "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 64));
> >^
> > 
> > Use a regular function to be more portable. Also use volatile asm
> > to avoid unsolicited optimizations.
> > 
> > Tested with qemu versatileab machine and versatile_defconfig and
> > qemu mainstone machine using pxa_defconfig compiled with GCC 7.2.1
> > and Clang 7.0.
> > 
> > Link: https://github.com/ClangBuiltLinux/linux/issues/90
> > Reported-by: Joel Stanley 
> > Signed-off-by: Stefan Agner 
> > ---
> >  arch/arm/mm/copypage-fa.c   | 17 +++--
> >  arch/arm/mm/copypage-feroceon.c | 17 +++--
> >  arch/arm/mm/copypage-v4mc.c | 14 +-
> >  arch/arm/mm/copypage-v4wb.c | 17 +++--
> >  arch/arm/mm/copypage-v4wt.c | 17 +++--
> >  arch/arm/mm/copypage-xsc3.c | 17 +++--
> >  arch/arm/mm/copypage-xscale.c   | 13 -
> >  7 files changed, 72 insertions(+), 40 deletions(-)
> > 
> > diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> > index ec6501308c60..33ccd396bf99 100644
> > --- a/arch/arm/mm/copypage-fa.c
> > +++ b/arch/arm/mm/copypage-fa.c
> > @@ -17,11 +17,16 @@
> >  /*
> >   * Faraday optimised copy_user_page
> >   */
> > -static void __naked
> > -fa_copy_user_page(void *kto, const void *kfrom)
> > +static void fa_copy_user_page(void *kto, const void *kfrom)
> >  {
> > -   asm("\
> > -   stmfd   sp!, {r4, lr}   @ 2\n\
> > +   register void *r0 asm("r0") = kto;
> > +   register const void *r1 asm("r1") = kfrom;
> > +
> > +   asm(
> > +   __asmeq("%0", "r0")
> > +   __asmeq("%1", "r1")
> > +   "\
> > +   stmfd   sp!, {r4}   @ 2\n\
> > mov r2, %2  @ 1\n\
> >  1: ldmia   r1!, {r3, r4, ip, lr}   @ 4\n\
> > stmia   r0, {r3, r4, ip, lr}@ 4\n\
> > @@ -34,9 +39,9 @@ fa_copy_user_page(void *kto, const void *kfrom)
> > subsr2, r2, #1  @ 1\n\
> > bne 1b  @ 1\n\
> > mcr p15, 0, r2, c7, c10, 4  @ 1   drain WB\n\
> > -   ldmfd   sp!, {r4, pc}   @ 3"
> > +   ldmfd   sp!, {r4}   @ 3"
> > :
> > -   : "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
> > +   : "r" (r0), "r" (r1), "I" (PAGE_SIZE / 32));
> 
> This is still wrong as you list r0 and r1 in the input operand list 
> where they must remain constant but the code does modify them. You 
> should list them in the output operand list with the "&" attribute. Also 
> r2 should be listed in the clobbered list.

Either we keep these as naked functions (and, if Clang wants to
try to inline naked functions which makes no sense, also mark them
as noinline) or we make them proper functions and also add (eg) r4
to the clobber list and get rid of the stacking of that register
along with LR/PC.

Having this half-way house which will generate worse code is not
acceptable.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 1/2] ARM: copypage-fa: add kto and kfrom to input operands list

2018-10-15 Thread Russell King - ARM Linux
On Tue, Oct 16, 2018 at 12:16:29AM +0200, Stefan Agner wrote:
> When functions incoming parameters are not in input operands list gcc
> 4.5 does not load the parameters into registers before calling this
> function but the inline assembly assumes valid addresses inside this
> function. This breaks the code because r0 and r1 are invalid when
> execution enters v4wb_copy_user_page ()

NAK.  Naked functions must never be inlined.  Please add a "noinline"
attribute to the function rather than making things more complex.

The GCC manual states:

`naked'
 Use this attribute on the ARM, AVR, MCORE, MSP430, NDS32, RL78, RX
 and SPU ports to indicate that the specified function does not
 need prologue/epilogue sequences generated by the compiler.  It is
 up to the programmer to provide these sequences. The only
  
 statements that can be safely included in naked functions are
 ^
 `asm' statements that do not have operands.  All other statements,
 ^^^
 including declarations of local variables, `if' statements, and so
 forth, should be avoided.  Naked functions should be used to
 implement the body of an assembly function, while allowing the
 compiler to construct the requisite function declaration for the
 assembler.

The 'I' attribute is fine here because it is a constant that is not
allowed to be in a register (and hence has no code generation side
effects.)

Adding operands for the input parameters, however, isn't going to
work around the fact that _this_ assembly is written to be out of
line and so it must never be inlined by the compiler.

> Also the constant needs to be used as third input operand so account
> for that as well.
> 
> This fixes copypage-fa.c what has previously done before for the other
> copypage implementations in commit 9a40ac86152c ("ARM: 6164/1: Add kto
> and kfrom to input operands list.").
> 
> Signed-off-by: Stefan Agner 
> ---
>  arch/arm/mm/copypage-fa.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/arm/mm/copypage-fa.c b/arch/arm/mm/copypage-fa.c
> index d130a5ece5d5..ec6501308c60 100644
> --- a/arch/arm/mm/copypage-fa.c
> +++ b/arch/arm/mm/copypage-fa.c
> @@ -22,7 +22,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>  {
>   asm("\
>   stmfd   sp!, {r4, lr}   @ 2\n\
> - mov r2, %0  @ 1\n\
> + mov r2, %2  @ 1\n\
>  1:   ldmia   r1!, {r3, r4, ip, lr}   @ 4\n\
>   stmia   r0, {r3, r4, ip, lr}@ 4\n\
>   mcr p15, 0, r0, c7, c14, 1  @ 1   clean and invalidate D 
> line\n\
> @@ -36,7 +36,7 @@ fa_copy_user_page(void *kto, const void *kfrom)
>   mcr p15, 0, r2, c7, c10, 4  @ 1   drain WB\n\
>   ldmfd   sp!, {r4, pc}   @ 3"
>   :
> - : "I" (PAGE_SIZE / 32));
> + : "r" (kto), "r" (kfrom), "I" (PAGE_SIZE / 32));
>  }
>  
>  void fa_copy_user_highpage(struct page *to, struct page *from,
> -- 
> 2.19.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v7] regulator: fixed: Convert to use GPIO descriptor only

2018-10-12 Thread Russell King - ARM Linux
On Fri, Oct 12, 2018 at 11:43:13AM +, Marcel Ziswiler wrote:
> I don't think it is that fictitious as it makes it crystal clear that
> there is something shared with all its pros and cons. E.g. what happens
> if one of them regulators wants to turn off while the other one still
> needs power? The regular regulator dependency tree would nicely make
> this all clear.

If you're introducing a regulator that doesn't exist in reality
just to be able to share a GPIO line that is wired to several
real regulators, then it _is_ ficticious.  You're not describing
the hardware, you're describing something else to work around the
shortcomings of the implementation that can't cope with how stuff
is wired up in the real world.  You're making the DT description
fit the software implementation, rather than the software
implementation fit the real world hardware.

Having a single GPIO that controls multiple separate regulators
which have entirely separate supplies of their own is very common
in electronics.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v7] regulator: fixed: Convert to use GPIO descriptor only

2018-10-12 Thread Russell King - ARM Linux
On Fri, Oct 12, 2018 at 11:39:15AM +0100, Jon Hunter wrote:
> We had the same situation for Tegra124 Jetson TK1 but I don't think that
> adding a pseudo intermediate regulator is cleaner. If the GPIO controls
> more than one regulator, I don't see why is it necessary to change the
> DT. There are several other people reporting the same problem with
> various different boards. So this does seem to be a common usage.

Given that DT describes the hardware, not the software implementation,
it must not change just because we move from GPIO numbers to GPIO
descriptors.

The existing DT description is reasonable, and introducing ficticious
regulators in DT to work around the implementation is not reasonable.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [Question] directory for SoC-related DT binding

2018-10-10 Thread Russell King - ARM Linux
On Wed, Oct 10, 2018 at 02:04:14PM +0200, Stefan Wahren wrote:
> Hi,
> 
> Am 10.10.2018 um 13:19 schrieb Rob Herring:
> > On Wed, Oct 10, 2018 at 6:08 AM Masahiro Yamada
> >  wrote:
> >> Hi,
> >>
> >>
> >> I see a bunch of vendor (or SoC) names in
> >> Documentation/device/bindings/arm/
> >>
> >> ./Documentation/devicetree/bindings/arm/altera
> >> ./Documentation/devicetree/bindings/arm/amlogic
> > Yeah, it's kind of a mixture of board/soc bindings mostly with some
> > ARM architecture, ARM, Ltd. IP, and SoC system reg bindings.
> >
> > Eventually, I'd like to not split board bindings by arch and maybe we
> > should move all the system/misc reg bindings out.
> >
> > [,,,]
> >
> >> I also see some vendor names in
> >> Documentation/device/bindings/soc/
> >>
> >> ./Documentation/devicetree/bindings/soc/bcm
> >> ./Documentation/devicetree/bindings/soc/dove
> >> ./Documentation/devicetree/bindings/soc/fsl
> >> ./Documentation/devicetree/bindings/soc/mediatek
> >> ./Documentation/devicetree/bindings/soc/qcom
> >> ./Documentation/devicetree/bindings/soc/rockchip
> >> ./Documentation/devicetree/bindings/soc/ti
> >> ./Documentation/devicetree/bindings/soc/xilinx
> >> ./Documentation/devicetree/bindings/soc/zte
> > This I believe is mostly SoC system reg bindings though there's
> > probably a few other things.
> >
> >> Confusingly, I see bcm, mediatek, rockchip
> >> in both locations.
> >>
> >> Is there any rule to choose one than the other?
> > Top-level SoC/board bindings in arm/ and anything else elsewhere ideally.
> 
> in case of Documentation/devicetree/bindings/soc/bcm the directory
> contains SoC / board bindings, cpu-enable and a firmware binding.

I think you're confused there...

$ ls -1 Documentation/devicetree/bindings/soc/bcm/
brcm,bcm2835-vchiq.txt
raspberrypi,bcm2835-power.txt

Doesn't look like SoC/board bindings to me...

whereas:

$ ls -1 Documentation/devicetree/bindings/arm/bcm/
brcm,bcm11351-cpu-method.txt
brcm,bcm11351.txt
brcm,bcm21664.txt
brcm,bcm23550-cpu-method.txt
brcm,bcm23550.txt
brcm,bcm2835.txt
brcm,bcm4708.txt
brcm,bcm63138.txt
brcm,brcmstb.txt
brcm,cygnus.txt
brcm,hr2.txt
brcm,ns2.txt
brcm,nsp-cpu-method.txt
brcm,nsp.txt
brcm,stingray.txt
brcm,vulcan-soc.txt
raspberrypi,bcm2835-firmware.txt

does fit with your description, except for the directory path...

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [Question] directory for SoC-related DT binding

2018-10-10 Thread Russell King - ARM Linux
On Wed, Oct 10, 2018 at 08:07:52PM +0900, Masahiro Yamada wrote:
> Hi,
> 
> 
> I see a bunch of vendor (or SoC) names in
> Documentation/device/bindings/arm/
> 
> ./Documentation/devicetree/bindings/arm/altera
> ./Documentation/devicetree/bindings/arm/amlogic
> ./Documentation/devicetree/bindings/arm/apm
> ./Documentation/devicetree/bindings/arm/bcm
> ./Documentation/devicetree/bindings/arm/calxeda
> ./Documentation/devicetree/bindings/arm/freescale
> ./Documentation/devicetree/bindings/arm/hisilicon
> ./Documentation/devicetree/bindings/arm/keystone
> ./Documentation/devicetree/bindings/arm/marvell
> ./Documentation/devicetree/bindings/arm/mediatek
> ./Documentation/devicetree/bindings/arm/mrvl
> ./Documentation/devicetree/bindings/arm/msm
> ./Documentation/devicetree/bindings/arm/npcm
> ./Documentation/devicetree/bindings/arm/nxp
> ./Documentation/devicetree/bindings/arm/omap
> ./Documentation/devicetree/bindings/arm/rockchip
> ./Documentation/devicetree/bindings/arm/samsung
> ./Documentation/devicetree/bindings/arm/stm32
> ./Documentation/devicetree/bindings/arm/sunxi
> ./Documentation/devicetree/bindings/arm/tegra
> ./Documentation/devicetree/bindings/arm/ti
> ./Documentation/devicetree/bindings/arm/uniphier
> ./Documentation/devicetree/bindings/arm/ux500
> ./Documentation/devicetree/bindings/arm/vt8500

These are for arch/arm/mach-* and are the DT root descriptions for the
various boards and SoCs.

> I also see some vendor names in
> Documentation/device/bindings/soc/
> 
> ./Documentation/devicetree/bindings/soc/bcm
> ./Documentation/devicetree/bindings/soc/dove
> ./Documentation/devicetree/bindings/soc/fsl
> ./Documentation/devicetree/bindings/soc/mediatek
> ./Documentation/devicetree/bindings/soc/qcom
> ./Documentation/devicetree/bindings/soc/rockchip
> ./Documentation/devicetree/bindings/soc/ti
> ./Documentation/devicetree/bindings/soc/xilinx
> ./Documentation/devicetree/bindings/soc/zte

These are for individual drivers in drivers/soc/

> Confusingly, I see bcm, mediatek, rockchip
> in both locations.

Correct, because one is for the SoC and board as a whole, the other is
for some subset of the SoC handled via drivers/soc - for example, with
Broadcom, there's a Raspberry PI power domain driver in drivers/soc,
which is described by

 Documentation/devicetree/bindings/soc/bcm/raspberrypi,bcm2835-power.txt

whereas

 Documentation/devicetree/bindings/arm/bcm/brcm,bcm2835.txt

describes the bindings for the root DT node BCM2835 as used on the
Raspberry Pi amongst other boards.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] Doc: lockdep: add information about performance impact

2018-10-10 Thread Russell King - ARM Linux
On Tue, Oct 09, 2018 at 04:58:18PM +0100, Will Deacon wrote:
> On Tue, Oct 09, 2018 at 05:43:59PM +0200, Peter Zijlstra wrote:
> > On Tue, Oct 09, 2018 at 05:39:27PM +0200, Lukasz Luba wrote:
> > > This patch add some warning related to performance drop.
> > > It should be mentioned that this is not for free
> > > and the platfrom resources (cache, bus interconnect, etc.)
> > > will be used more frequently.
> > 
> > To me this reads a bit like: water is wet.
> > 
> > Is this really needed?
> 
> I don't think so -- this is a debug option under "kernel hacking". Surely
> the perf hit comes with the territory.

Indeed it does - since adding debug code means additional instructions,
more cache pressure and, therefore, slower execution.

I've already said to people... turn on all debug options for development,
and when you move to performance evaluation and optimisation for the
production version, evaluate and turn off debug options.

Unfortunately, turning off PROVE_LOCKING in the defconfigs is going
to have a negative impact on the automated build/boot testing that
systems like kernelci.org do for us - these are primarily based
around the defconfigs, which means we're going to end up building
with PROVE_LOCKING disabled.  This means kernelci.org will be less
likely to catch locking issues.

Forcing PROVE_LOCKING on also doesn't make sense - there may be (rare)
bugs, eg race conditions, uncovered by having that disabled.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] traps:Recover undefined user instruction on ARM

2018-10-05 Thread Russell King - ARM Linux
On Fri, Oct 05, 2018 at 10:15:57AM +0530, Manjeet Pawar wrote:
> From: Rohit Thapliyal  
> 
> During user undefined instruction exception, the arm exception
> handler currently results in application crash through SIGILL.
> The bad instruction can be due to ddr/hardware issue.
> For such cases, exception trap handler could try to recover the corrupted
> text by clearing pagetable entry of undefined instruction pc and trying to 
> fetch
> the instruction opcode by generating major page fault.
> Resulting in loading the page with correct instruction from mapped file.
> If there is no error in root filesystem i.e. the opcode is intact
> in file, then filemap fault shall be able to recover
> the instruction and continue execution normally instead of crashing.

For the reasons others have pointed out in previous replies, I will not
merge this patch or anything similar to this for mainline.  The only
sane solution to the above problem is to fix the hardware or the
hardware timings so that DDR access is reliable.

Sorry.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC 1/2] reboot: Make restart_handler_list a blocking notifier chain.

2018-10-05 Thread Russell King - ARM Linux
On Fri, Oct 05, 2018 at 10:27:48AM +0200, Nicolas Cavallari wrote:
> On 04/10/2018 18:49, Russell King - ARM Linux wrote:
> > This isn't going to work.
> > 
> > For example, sysrq processing (which can happen in IRQ context) calls
> > emergency_restart() for the reboot sysrq.  That calls through to
> > machine_restart(), which then calls do_kernel_restart().
> >
> > If do_kernel_restart() sleeps, then we're trying to sleep in IRQ
> > context, and that's a no no.  I'm afraid you can't just add an irq
> > enable and change the notifier list to be atomic - and, as you're
> > making the change in generic code, this affects everyone, not just the
> > architecture that happens to be in front of you (so if it were merged,
> > you're likely to get a lot of bug reports!)
> 
> I don't get it.
> 
> Many implementations of machine_restart() sleeps or do an infinite loop.
> Almost half of the restart_handler users sleeps or do an infinite loop.

Infinite loops are not a problem when shutting down or rebooting -
they're only "infinite" in the sense that control never returns but
that is the case anyway when the restart or shutdown takes effect.

> I understand that sleeping in IRQ context is bad, but the kernel does it
> anyway. And it seems nobody have noticed any problem with this.

That is incorrect - there are reports of this failing as I mentioned
in my email.

Also note that there is a big difference between sleeping in atomic
context (iow, sleeping with spinlocks held, attempting to sleep with
IRQs disabled) and sleeping in IRQ context (iow, sleeping in an
interrupt handler).  You can "work around" the former with your code,
but in doing so you end up _breaking_ the latter case for every
situation.

I've pointed out some of the issues that make it unreliable in my
initial email (quoted below).

> > It gets worse, because (eg) a panic() or IRQ can happen with any locks
> > held - and if the I2C device's locks are held when one of those events
> > happen, you have a deadlock situation (hence you won't reboot!)
> > 
> > I suppose a good first step would be for us to have our own
> > machine_emergency_restart() on ARM, to separate the atomic paths
> > from the non-atomic paths, so that those who need to talk to an I2C,
> > that can happen from the non-atomic path (which means things like
> > /sbin/reboot will work) but other stuff (eg, restart on panic, sysrq,
> > soft-watchdog) will fail.
> 
> That would fix my use case, but not the existing code ?

There is no fix possible for a blocking I2C transfer in IRQ context,
it will always be unreliable for the reasons I've explained above.

All those existing cases where drivers are doing I2C transfers using
the I2C host driver for power down or restart are broken and
unreliable.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [RFC 1/2] reboot: Make restart_handler_list a blocking notifier chain.

2018-10-04 Thread Russell King - ARM Linux
On Thu, Oct 04, 2018 at 06:23:38PM +0200, Nicolas Cavallari wrote:
> Many users of restart_handlers are sleeping in their callbacks.  Some
> are doing infinite loops or calling driver code that may sleep or
> perform operation on slow busses, like i2c.
> 
> This is not allowed in an atomic notifier chain, which is what
> restart_handler_list currently is, so use a blocking notifier chain
> instead.

This isn't going to work.

For example, sysrq processing (which can happen in IRQ context) calls
emergency_restart() for the reboot sysrq.  That calls through to
machine_restart(), which then calls do_kernel_restart().

If do_kernel_restart() sleeps, then we're trying to sleep in IRQ
context, and that's a no no.  I'm afraid you can't just add an irq
enable and change the notifier list to be atomic - and, as you're
making the change in generic code, this affects everyone, not just the
architecture that happens to be in front of you (so if it were merged,
you're likely to get a lot of bug reports!)

It gets worse, because (eg) a panic() or IRQ can happen with any locks
held - and if the I2C device's locks are held when one of those events
happen, you have a deadlock situation (hence you won't reboot!)

I suppose a good first step would be for us to have our own
machine_emergency_restart() on ARM, to separate the atomic paths
from the non-atomic paths, so that those who need to talk to an I2C,
that can happen from the non-atomic path (which means things like
/sbin/reboot will work) but other stuff (eg, restart on panic, sysrq,
soft-watchdog) will fail.

This issue as come up recently surrounding PMIC issues, but the
discussions there appear to have completely dried up...

However, my conclusion is that having an I2C driver deal with reboot
is possible for the process-induced reboot cases, but it's never going
to work reliably for the emergency case.

If you want the emergency case to work, then you need to work out some
way to talk on the I2C bus without involving any locks and with the I2C
bus possibly mid-transfer - which is not an easy problem to solve.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH v2] mm: Introduce new function vm_insert_kmem_page

2018-10-04 Thread Russell King - ARM Linux
On Thu, Oct 04, 2018 at 05:45:13PM +0530, Souptick Joarder wrote:
> On Thu, Oct 4, 2018 at 3:45 AM Russell King - ARM Linux
>  wrote:
> >
> > On Wed, Oct 03, 2018 at 01:00:03PM -0700, Matthew Wilcox wrote:
> > > On Thu, Oct 04, 2018 at 12:28:54AM +0530, Souptick Joarder wrote:
> > > > These are the approaches which could have been taken to handle
> > > > this scenario -
> > > >
> > > > *  Replace vm_insert_page with vmf_insert_page and then write few
> > > >extra lines of code to convert VM_FAULT_CODE to errno which
> > > >makes driver users more complex ( also the reverse mapping errno to
> > > >VM_FAULT_CODE have been cleaned up as part of vm_fault_t migration ,
> > > >not preferred to introduce anything similar again)
> > > >
> > > > *  Maintain both vm_insert_page and vmf_insert_page and use it in
> > > >respective places. But it won't gurantee that vm_insert_page will
> > > >never be used in #PF context.
> > > >
> > > > *  Introduce a similar API like vm_insert_page, convert all non #PF
> > > >consumer to use it and finally remove vm_insert_page by converting
> > > >it to vmf_insert_page.
> > > >
> > > > And the 3rd approach was taken by introducing vm_insert_kmem_page().
> > > >
> > > > In short, vmf_insert_page will be used in page fault handlers
> > > > context and vm_insert_kmem_page will be used to map kernel
> > > > memory to user vma outside page fault handlers context.
> > >
> > > As far as I can tell, vm_insert_kmem_page() is line-for-line identical
> > > with vm_insert_page().  Seriously, here's a diff I just did:
> > >
> > > -static int insert_page(struct vm_area_struct *vma, unsigned long addr,
> > > -   struct page *page, pgprot_t prot)
> > > +static int insert_kmem_page(struct vm_area_struct *vma, unsigned long 
> > > addr,
> > > +   struct page *page, pgprot_t prot)
> > > -   /* Ok, finally just insert the thing.. */
> > > -int vm_insert_page(struct vm_area_struct *vma, unsigned long addr,
> > > +int vm_insert_kmem_page(struct vm_area_struct *vma, unsigned long addr,
> > > -   return insert_page(vma, addr, page, vma->vm_page_prot);
> > > +   return insert_kmem_page(vma, addr, page, vma->vm_page_prot);
> > > -EXPORT_SYMBOL(vm_insert_page);
> > > +EXPORT_SYMBOL(vm_insert_kmem_page);
> > >
> > > What on earth are you trying to do?
> 
> >
> > Reading the commit log, it seems that the intention is to split out
> > vm_insert_page() used outside of page-fault handling with the use
> > within page-fault handling, so that different return codes can be
> > used.
> >
> > I don't see that justifies the code duplication - can't
> > vm_insert_page() and vm_insert_kmem_page() use the same mechanics
> > to do their job, and just translate the error code from the most-
> > specific to the least-specific error code?  Do we really need two
> > copies of the same code just to return different error codes.
> 
> Sorry about it.
> can I take below approach in a patch series ->
> 
> create a wrapper function vm_insert_kmem_page using vm_insert_page.
> Convert all the non #PF users to use it.
> Then make vm_insert_page static and convert inline vmf_insert_page to caller.

I'm confused, what are you trying to do?

It seems that we already have:

vm_insert_page() - returns an errno
vmf_insert_page() - returns a VM_FAULT_* code

>From what I _think_ you're saying, you're trying to provide
vm_insert_kmem_page() as a direct replacement for the existing
vm_insert_page(), and then make vm_insert_page() behave as per
vmf_insert_page(), so we end up with:

vm_insert_kmem_page() - returns an errno
vm_insert_page() - returns a VM_FAULT_* code
vmf_insert_page() - returns a VM_FAULT_* code and is identical to
  vm_insert_page()

Given that the documentation for vm_insert_page() says:

 * Usually this function is called from f_op->mmap() handler
 * under mm->mmap_sem write-lock, so it can change vma->vm_flags.
 * Caller must set VM_MIXEDMAP on vma if it wants to call this
 * function from other places, for example from page-fault handler.

this says that the "usual" use method for vm_insert_page() is
_outside_ of page fault handling - if it is used _inside_ page fault
handling, then it states that additional fixups are required on the
VMA.  So I don't get why your patch commentry seems to be saying that
users of vm_insert_page() outside of page fault handling all need to
be patched - isn't this the use case that this function is defined
to be handling?

If you're going to be changing the semantics, doesn't this create a
flag day where we could get new users of vm_insert_page() using the
_existing_ semantics merged after you've changed its semantics (eg,
the return code)?

Maybe I don't understand fully what you're trying to achieve here.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH 3/3] ARM: socfpga: Turn on ARM errata for L2 cache

2018-10-04 Thread Russell King - ARM Linux
On Thu, Oct 04, 2018 at 10:54:15AM +0200, Clément Péron wrote:
> From: Dinh Nguyen 
> 
> Turn on these ARM and PL310 errata for SoCFPGA:
> 
> ARM_ERRATA_754322
> ARM_ERRATA_764369
> ARM_ERRATA_775420
> 
> PL310_ERRATA_588369
> PL310_ERRATA_727915
> PL310_ERRATA_753970
> PL310_ERRATA_769419
> 
> Fixes: 387798b37c8d ("ARM: initial multiplatform support")

What was broken in the referred to commit that caused this regression?
If it's not a regression, why does it have a Fixes: line?

> Signed-off-by: Dinh Nguyen 
> Signed-off-by: Clément Péron 
> ---
>  arch/arm/mach-socfpga/Kconfig | 7 +++
>  1 file changed, 7 insertions(+)
> 
> diff --git a/arch/arm/mach-socfpga/Kconfig b/arch/arm/mach-socfpga/Kconfig
> index 4adb901dd5eb..a04660918d71 100644
> --- a/arch/arm/mach-socfpga/Kconfig
> +++ b/arch/arm/mach-socfpga/Kconfig
> @@ -11,6 +11,13 @@ menuconfig ARCH_SOCFPGA
>   select HAVE_ARM_TWD if SMP
>   select MFD_SYSCON
>   select PCI_DOMAINS if PCI
> + select ARM_ERRATA_754322
> + select ARM_ERRATA_764369 if SMP
> + select ARM_ERRATA_775420
> + select PL310_ERRATA_588369
> + select PL310_ERRATA_727915
> + select PL310_ERRATA_753970 if PL310
> + select PL310_ERRATA_769419
>  
>  if ARCH_SOCFPGA
>  config SOCFPGA_SUSPEND
> -- 
> 2.17.1
> 

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: RESEND and REBASE arm+arm64+aarch32 vdso rewrite

2018-10-02 Thread Russell King - ARM Linux
On Mon, Oct 01, 2018 at 01:44:52PM -0700, Mark Salyzyn wrote:
> Despite the gain of 0.4% for screen-on battery life, where Android has a mix
> of 64 and 32 bit applications, thus still relevant _today_ on 64 bit
> architectures (providing vDSO32 for 32-bit applications).

I don't think the issue is what you think it is.  0.4% gain is
equivalent to almost (but not quite) 1 minute extra for a lifetime of
4 hours.  Is that really noticable, and is it worth the churn from
merging this series?

Given that the gain is so marginal, I can see why people find it
difficult to get excited about this series to spend the time reviewing
it.

What I'm saying is that the reason that people should look at this
series hasn't been "sold" particularly well.  How does it look from
the system performance point of view - is there a speed-up there
that's more significant?

In any case, I suspect that if you compare the battery life from
kernels from two years ago with modern kernels, you'll see a
degredation over that period just because of the progressive
increase in complexity, and especially things such as the Spectre
work-arounds.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 12.1Mbps down 622kbps up
According to speedtest.net: 11.9Mbps down 500kbps up


Re: [PATCH] ARM: makefile: pass -march=armv4 to assembler even on CPU32v3

2018-10-01 Thread Russell King - ARM Linux
On Mon, Oct 01, 2018 at 08:10:26PM +0200, Ard Biesheuvel wrote:
> On 1 October 2018 at 19:56, Russell King - ARM Linux
>  wrote:
> > On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote:
> >> Per the discussion about half-way down in [1], the kernel doesn't
> >> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA
> >> hardware that benefits from ARMv3 code generation.
> >
> > The issue is to do with the half-word stores in the ARMv4 ISA, which
> > need to be avoided on StrongARM RiscPC - the bus from the processor
> > card (which was designed for ARM610 and ARM710) does not support
> > anything except 8-bit and 32-bit accesses, so the 16-bit load/store
> > instructions don't work correctly.
> >
> > Obviously, the reason for having the compiler use ARMv3 is to avoid
> > those instructions which we have no other way to prevent - however,
> > the use of ARMv3 with the assembler ensures that ldrh/strh are not
> > accidentally used.
> >
> > We could argue that the ARMv3 assembly files are now stable, so the
> > chances of ldrh/strh being introduced is low, which would make this
> > change tolerable, but the commit message needs to spell out that
> > we lose this protection.
> >
> >> Such a consideration,
> >> then, only applies to the compiler but not to the assembler. This commit
> >> passes -march=armv4 to the assembler in those cases, so that code
> >> written for ARMv4 will continue to compile and run fine, without needing
> >> module-specific asflags-y overrides.
> >
> > Note that "code written for ARMv4" will not be usable on this platform
> > if it makes use of ldrh/strh, so depending on which instructions the
> > assembler is complaining about, it could very well be a real "you're
> > doing something wrong" case.
> >
> > The side effect of this patch is that such cases will now be hidden
> > rather than evaluated on a case-by-case basis.
> >
> 
> Thanks for the insight.
> 
> So Arnd's suggestion to switch to armv3-m would actually be feasible
> then? The code in question does not use ldrh only umull, and so it
> should build with armv3m as well. And we will even generate some
> better code for RiscPC if we apply it to both the cflags and the
> asflags.

Yep.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH] ARM: makefile: pass -march=armv4 to assembler even on CPU32v3

2018-10-01 Thread Russell King - ARM Linux
On Sun, Sep 30, 2018 at 04:49:04AM +0200, Jason A. Donenfeld wrote:
> Per the discussion about half-way down in [1], the kernel doesn't
> actually support the ARMv3 ISA, but selects it for some ARMv4 ISA
> hardware that benefits from ARMv3 code generation.

The issue is to do with the half-word stores in the ARMv4 ISA, which
need to be avoided on StrongARM RiscPC - the bus from the processor
card (which was designed for ARM610 and ARM710) does not support
anything except 8-bit and 32-bit accesses, so the 16-bit load/store
instructions don't work correctly.

Obviously, the reason for having the compiler use ARMv3 is to avoid
those instructions which we have no other way to prevent - however,
the use of ARMv3 with the assembler ensures that ldrh/strh are not
accidentally used.

We could argue that the ARMv3 assembly files are now stable, so the
chances of ldrh/strh being introduced is low, which would make this
change tolerable, but the commit message needs to spell out that
we lose this protection.

> Such a consideration,
> then, only applies to the compiler but not to the assembler. This commit
> passes -march=armv4 to the assembler in those cases, so that code
> written for ARMv4 will continue to compile and run fine, without needing
> module-specific asflags-y overrides.

Note that "code written for ARMv4" will not be usable on this platform
if it makes use of ldrh/strh, so depending on which instructions the
assembler is complaining about, it could very well be a real "you're
doing something wrong" case.

The side effect of this patch is that such cases will now be hidden
rather than evaluated on a case-by-case basis.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH] ARM: disable ARMv6 for Clang older than 8.0

2018-10-01 Thread Russell King - ARM Linux
On Sun, Sep 30, 2018 at 04:48:20PM -0700, Joe Perches wrote:
> On Mon, 2018-10-01 at 00:22 +0200, Stefan Agner wrote:
> > The kernel passes the ArmV6K architecture to the compiler when
> > using the multi platform selection and enabling ARMv6. Clang
> > older than version 8.0 emit assembly with an non-existing CPU,
> > which then makes the assembler fail. Prevent the user from
> > selecting ARMv6 when using Clang before 8.0.
> > 
> > Signed-off-by: Stefan Agner 
> > Link: https://github.com/ClangBuiltLinux/linux/issues/55
> > ---
> >  arch/arm/Kconfig | 1 +
> >  1 file changed, 1 insertion(+)
> > 
> > diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> > index e8cd55a5b04c..8da160757381 100644
> > --- a/arch/arm/Kconfig
> > +++ b/arch/arm/Kconfig
> > @@ -671,6 +671,7 @@ config ARCH_MULTI_V4_V5
> >  
> >  config ARCH_MULTI_V6
> > bool "ARMv6 based platforms (ARM11)"
> > +   depends on !CC_IS_CLANG || CLANG_VERSION>=8
> > select ARCH_MULTI_V6_V7
> > select CPU_V6K
> >  
> 
> Perhaps it'd be better to avoid this in selection
> criteria in Kconfig and instead add this to
> include/linux/compiler_clang.h
> ---
>  include/linux/compiler-clang.h | 4 
>  1 file changed, 4 insertions(+)
> 
> diff --git a/include/linux/compiler-clang.h b/include/linux/compiler-clang.h
> index b1ce500fe8b3..90fd16c85359 100644
> --- a/include/linux/compiler-clang.h
> +++ b/include/linux/compiler-clang.h
> @@ -3,6 +3,10 @@
>  #error "Please don't include  directly, include 
>  instead."
>  #endif
>  
> +#if defined(CONFIG_ARCH_MULTI_V6) && CLANG_VERSION < 8
> +# error Sorry, your compiler is too old - please upgrade it.
> +#endif

(a) does it makes sense to have arch CONFIG_* in linux/*.h?  It could
instead go in arch/arm/kernel/asm-offsets.c, where we have the other
tests similar to this for GCC.

(b) do we get far enough that the #error will be generated before the
reported assembler error happens?  IOW, would the assembler error
mask this #error?

I think it needs confirmation that a #error for this problem results
in the desired behaviour.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [BUG] sleep in atomic in 8250 runtime PM code path

2018-09-29 Thread Russell King - ARM Linux
On Sat, Sep 29, 2018 at 01:20:36PM +0800, Jisheng Zhang wrote:
> Hi,
> 
> Recently I found I could trigger sleep in atomic bug on berlin after commit
> d76c74387e1c ("serial: 8250_dw: Fix runtime PM handling"). The path looks 
> like:
> 
> dw8250_probe => serial850_register_8250_port => uart_add_one_port=>
> register_console => console_unlock => univ8250_console_write =>
> serial8250_console_write => serial8250_rpm_get => pm_runtime_get_sync
> 
> The irq is disabled by printk_safe_enter_irqsave() in console_unlock, but
> pm_runtime_get_sync can't be called in atomic context...
> 
> I guess the reason why we didn't notice it is due to the fact that
> only OMAP and DW sets UART_CAP_RPM currently, and DW set the flag in
> May 2018.
> 
> Per my understanding, the bug sits in the 8250 core driver rather than
> 8250_dw.c.

(Adding Tony and Sebastian, presumably CAP_RPM comes from OMAP since
that is the only other user, and this same bug is present there too.)

Correct.  printk() can be called from atomic contexts (consider what
happens when an oops or similar occurs - we can be in any context,
holding any locks etc.)  Plain printk() can also be used from within
spinlocked irqs-off regions.

This means the console's write function may be called in these contexts.
Since pm_runtime_get_sync() is may sleep, it means that its use in the
console path is _fundamentally_ wrong, and will lead to exactly this
problem.

I don't see a way around that other than to avoid RPM on consoles.
(which makes the presence of the RPM code in serial8250_console_write()
completely unnecessary.)

When I rewrote the serial drivers and created serial_core & 8250, this
is something that I realised, and I arranged the PM support at the time
to always maintain the console in active state (this is prior to RPM).

While I'm looking at commit d74d5d1b7288 ("tty: serial: 8250_core: add
run time pm"):

+static void serial8250_rpm_get_tx(struct uart_8250_port *p)
+{
+   unsigned char rpm_active;
+
+   if (!(p->capabilities & UART_CAP_RPM))
+   return;
+
+   rpm_active = xchg(>rpm_tx_active, 1);
+   if (rpm_active)
+   return;
+   pm_runtime_get_sync(p->port.dev);
+}

is particularly "interesting" - if this is called from sections of
code that allow it to be called concurrently from different contexts,
then we could have:

rpm_tx_active   thread 0thread 1
0
xchg(, 1)
1
xchg(, 1)
... goes on to use port ...
pm_runtime_get_sync()

In other words, the port can be used _before_ pm_runtime_get_sync() is
called.

If, on the other hand, this can't race, then considering the
serial8250_rpm_put_tx() path as well, what stops this race from
happening:

rpm_tx_active   thread 0thread 1
1
serial8250_rpm_get_tx()
serial8250_rpm_put_tx()
xchg(, 1)
1
xchg(, 0)
0
pm_runtime_put_autosuspend()

Now to the real point about the above - if _neither_ race is possible,
then what is the point of the more expensive xchg() here rather than
simple test-and-assignment of rpm_tx_active?  Either these paths can't
race with each other and xchg() is unnecessary, or they can and they
_could_ fail as shown above.  My suspicion is that xchg() is an attempt
to reduce the likelyhood of one of these races being hit.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 14.8Mbps down 650kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [BISECTED] Regression: Solidrun Clearfog Base won't boot since "PCI: mvebu: Only remap I/O space if configured"

2018-09-24 Thread Russell King - ARM Linux
On Mon, Sep 24, 2018 at 12:26:14PM +0200, Thomas Petazzoni wrote:
> Hello,
> 
> On Mon, 24 Sep 2018 11:12:13 +0100, Russell King - ARM Linux wrote:
> 
> > > Thanks for the testing. I'll wait for Russell to say if he is happy
> > > (or not) with the addition of pci_unmap_io() in the ARM code, if that's
> > > the case, I'll send a proper patch to fix the issue.  
> > 
> > I'd prefer not to provide an unmapping API, because it gives the
> > impression that it's okay to unmap the IO space mapping, and we'll
> > end up with drivers that decide to call it in their cleanup path.
> > IO space isn't expected to ever go away - eg, on a PC, it's always
> > present.
> 
> But being able to unmap it would also be needed to be able to remove
> PCI host controller drivers, and therefore compile them as module, and
> make them more like any other drivers.
> 
> I'm not sure why we need to guarantee that the I/O space is always
> mapped:
> 
>  - It isn't mapped before the PCI controller driver does the mapping.
> 
>  - There is no reason for it to be accessed when the PCI controller
>driver is not initialized: PCI devices can only be probed and
>initialized when the PCI controller driver is probed/initialized.

There are historic reasons.  PCI provides ISA IO space, and when you
have a machine with ISA peripherals present, the PCI IO space must
never be unmapped - if it is, ISA drivers will oops the kernel.  There
is no way for a vanishing PCI controller to cause ISA drivers to be
unbound.

If you have a host controller that does unmap PCI IO space and you have
ISA peripherals with drivers present, unbinding the PCI host controller
will remove the IO space mapping, and next time an ISA peripheral
touches IO space, the kernel will oops.

> All other drivers, including on ARM, use pci_remap_iospace(), which
> does provide the pci_unmap_iospace() counter part.

... which has been created in PCI land just to deal with PCI without
regard for the above issue.

However, there's another issue I missed - if you _do_ have ISA
peripherals, you likely want the IO space setup from very early on,
and you won't be using the new fangled PCI host driver support anyway.
That uses pci_map_io_early() rather than pci_ioremap_io() or
pci_remap_io().

> But to me, the general direction is that the ARM-specific
> pci_remap_io() API is fading away, and its replacement already provides
> an unmapping capability. So why not add the same unmapping capability
> to pci_remap_io() ?

Yes, that would be a good longer term plan - we don't need three
different ways to map PCI IO space, but it is development.

> But we have a regression and we need to fix it. Do you suggest to not
> use the new pci_host_probe() API ?

Well, arguably, the patch that caused the regression is the buggy patch,
_not_ the lack of unmapping API for pci_ioremap_io().  Trying to address
a regression with further development means that _that_ development
needs thought and review, which is a slower process.

I do understand the desire to keep moving forward and never take a step
backwards, but sometimes backwards steps are the best way to resolve a
regression.  But I also do appreciate that a simple revert in this case
is not possible.

I'll accept your patch on the condition that the ARM private
pci_ioremap_io() will go away in the very near future (please _try_
to get agreement on that before this patch is merged.)

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [BISECTED] Regression: Solidrun Clearfog Base won't boot since "PCI: mvebu: Only remap I/O space if configured"

2018-09-24 Thread Russell King - ARM Linux
On Thu, Sep 13, 2018 at 10:42:41AM +0200, Thomas Petazzoni wrote:
> Hello,
> 
> On Thu, 13 Sep 2018 10:20:45 +0200, Jan Kundrát wrote:
> > On čtvrtek 13. září 2018 9:45:15 CEST, Thomas Petazzoni wrote:
> > > What about something like the below. I tested it, including the error
> > > case by forcing an -EPROBE_DEFER. The new pci_unmap_io() is modeled
> > > after pci_unmap_iospace(). Actually, I would prefer to use
> > > pci_remap_iospace() and pci_unmap_iospace() but for now this API
> > > doesn't allow overloading the memory type used for the mapping.  
> > 
> > Thanks for providing this fix so quickly, Thomas. I can confirm that this 
> > patch -- tested on top of 54eda9df17f3215b9ed16629ee71ea07413efdaf ("Merge 
> > tag 'pci-v4.19-fixes-1' of 
> > git://git.kernel.org/pub/scm/linux/kernel/git/helgaas/pci"). Disclaimer: I 
> > have zero familiarity with Linux' PCI code.
> > 
> > Tested-by: Jan Kundrát 
> 
> Thanks for the testing. I'll wait for Russell to say if he is happy
> (or not) with the addition of pci_unmap_io() in the ARM code, if that's
> the case, I'll send a proper patch to fix the issue.

I'd prefer not to provide an unmapping API, because it gives the
impression that it's okay to unmap the IO space mapping, and we'll
end up with drivers that decide to call it in their cleanup path.
IO space isn't expected to ever go away - eg, on a PC, it's always
present.

I've been toying with the idea of making pci_map_io() ignore
subsequent attempts to overwrite the mapping with an identical
mapping, and WARN() if there is an attempt to overwrite an existing
mapping with other physical address, but I'm not entirely happy
with that either (which is why I haven't responded sooner.)

At the moment, I don't have a way forward that I'm happy with for
this.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH] ARM: add serial.h and set BASE_BAUD to 0

2018-09-21 Thread Russell King - ARM Linux
On Thu, Sep 20, 2018 at 10:13:57PM +0200, Rafał Miłecki wrote:
> From: Rafał Miłecki 
> 
> For years arm has been using serial.h from asm-generic which sets
> BASE_BAUD value to the (1843200 / 16). This is incorrect as:
> 1) This value obviously isn't correct for all devices
> 2) There are no device specific serial.h with CONFIG_ARCH_MULTIPLATFORM

This goes back a long way.  The BASE_BAUD definition is the base baud
for 8250 compatible UARTs _only_, no others.

However, as of 182ead3e418a ("earlycon: Remove hardcoded port->uartclk
initialization in of_setup_earlycon"), port->uartclk is no longer
initialised using BASE_BAUD.  As acknowledged in 814453adea7d
("earlycon: Initialize port->uartclk based on clock-frequency property")
the initialisation using BASE_BAUD was bogus, and there is now the
clock-frequency DT property which should be present to set this up.

Now, setting BASE_BAUD to zero as per your patch will break the
8250 serial driver - this relies on BASE_BAUD being set to the
current value, and yes, ARM hardware that uses this will break.  So
this is not a solution.

The only solution is that BASE_BAUD must not be abused - it must not
be used by non-8250 hardware, and thankfully there are already
solutions in the kernel (such as clock-frequency) to allow the clock
rate to be specified.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [BISECTED] Regression: Solidrun Clearfog Base won't boot since "PCI: mvebu: Only remap I/O space if configured"

2018-09-12 Thread Russell King - ARM Linux
On Wed, Sep 12, 2018 at 09:49:41PM +0300, Baruch Siach wrote:
> I reproduced the same Oops on Clearfog Base without any taint:
> 
> [1.476401] Internal error: Oops - undefined instruction: 0 [#1] SMP ARM
...
> [1.855954] Code: e2844004 e5972000 e352 0aee (e7f001f2)

That is a BUG().  Please turn on verbose bug reporting to get more
information about the cause.

There are two possibilities:

BUG_ON(addr >= end);

and

BUG_ON(!pte_none(*pte));

It's probably the latter - the region is probably already mapped, that
being the PCI IO region.

The original driver was setup to call pci_ioremap_io() as the very
last thing - and as the driver is non-removable, we were guaranteed
to never tear down this mapping (which is sensible, it's published
to userspace.)

However, the current code calls pci_ioremap_io() much earlier, in
a path where probe failures can occur.  This breaks pci_ioremap_io()'s
requirements - it must not be called more than once.  So:

ee1604381a37 ("PCI: mvebu: Only remap I/O space if configured")

is basically incorrect - pci_ioremap_io() needs to move back to a
place where it is only called in a path which will never fail.
However, looking at the generic host bits, I'm not sure such a place
exists in the new effort to make stuff more generic.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH 1/3] ARM: dts: NSP: Enable SFP on bcm958625hr

2018-08-27 Thread Russell King - ARM Linux
On Mon, Aug 27, 2018 at 01:03:42PM -0700, Florian Fainelli wrote:
> Enable the SFP connected to port 5 of the switch and wire up all GPIOs
> to the SFP cage. Because of a hardware limitation of the i2c controller
> on the iProc SoCs which prevents large i2c (> 256 bytes) transactions to
> work, we use the i2c-gpio interface instead, which does not have that
> limitation. This allows us to read the SFP module EEPROM, which would
> not be possible otherwise since it exceeds that size during a single
> read transfer.

We shouldn't exceed 256 bytes, since 256 bytes is the "page" size
of the EEPROM.  The most that we read in one block is either
ETH_MODULE_SFF_8079_LEN or (ETH_MODULE_SFF_8472_LEN - 
ETH_MODULE_SFF_8079_LEN), both of which result in no more than 256
byte reads.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH] ARM: drop experimental mark for ARM stack unwinding

2018-08-11 Thread Russell King - ARM Linux
On Sat, Aug 11, 2018 at 12:31:27PM +0200, Stefan Agner wrote:
> ARM stack unwinding is upstream since 2009 and has been proven
> working well. At this time it is the preferred stack unwinding
> support since it also supports Thumb 2. Do not scare people
> and drop the EXPERIMENTAL mark.

Actually, there are still cases where we're missing the annotations -
I've seen them in various oops dumps, but I forget where they are.
What I do remember is that they didn't seem trivial to fix at the
time, so it's still recommended that frame pointers are used if you
want to guarantee getting a debuggable oops dump.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH v3 7/8] net: phy: Add support to configure clock in Broadcom iProc mdio mux

2018-08-01 Thread Russell King - ARM Linux
On Wed, Aug 01, 2018 at 10:07:12PM +0200, Andrew Lunn wrote:
> You might want to consider adding clk_optional_get() and
> devm_clk_optional_get().

I think there's attempts to add such APIs but I don't think it's
trivial - it seems to require a _lot_ of discussion.

I think part of that is because of the quirky use of error codes.
If you look at clk_get(), it calls __of_clk_get_by_name() which
returns:

 -ENOENT if DT is disabled
 -ENOENT if the device has no DT node
 -EPROBE_DEFER if the lookup in DT succeeds but there's no registered
   clock
 -EINVAL if the device has a DT node but the lookup of the name
   failed (in otherwords, the optional clock was omitted)
 -ENOENT if the clocks = property has not enough clocks for the
clock-names property
 -ENOMEM if we fail to allocate the clk
 -ENOENT if __clk_get() fails

or any other error code returned via of_clk_provider's ->get() method.

The use of -EINVAL, one of the most common error codes, makes it
difficult to be sure that the clock is not specified in DT.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH 6/6] arm64: enable RapidIO menu in Kconfig

2018-08-01 Thread Russell King - ARM Linux
On Tue, Jul 31, 2018 at 04:01:27PM -0400, Alex Bounine wrote:
> On 2018-07-31 02:18 PM, Russell King - ARM Linux wrote:
> >On Tue, Jul 31, 2018 at 01:59:27PM -0400, Alex Bounine wrote:
> >>On 2018-07-31 11:52 AM, Russell King - ARM Linux wrote:
> >>>On Tue, Jul 31, 2018 at 08:54:14AM -0400, Alex Bounine wrote:
> >>>>On 2018-07-31 04:41 AM, Will Deacon wrote:
> >>>>>On Mon, Jul 30, 2018 at 06:50:34PM -0400, Alexei Colin wrote:
> >>>>>>Platforms with a PCI bus will be offered the RapidIO menu since they may
> >>>>>>be want support for a RapidIO PCI device. Platforms without a PCI bus
> >>>>>>that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
> >>>>>>in the platform-/machine-specific "config ARCH_*" Kconfig entry.
> >>>>>>
> >>>>>>Tested that kernel builds for arm64 with RapidIO subsystem and
> >>>>>>switch drivers enabled, also that the modules load successfully
> >>>>>>on a custom Aarch64 Qemu model.
> >>>>>>
> >>>>>>Cc: Andrew Morton 
> >>>>>>Cc: Russell King 
> >>>>>>Cc: John Paul Walters 
> >>>>>>Cc: linux-arm-ker...@lists.infradead.org
> >>>>>>Cc: linux-kernel@vger.kernel.org,
> >>>>>>Signed-off-by: Alexei Colin 
> >>>>>>---
> >>>>>>  arch/arm64/Kconfig | 2 ++
> >>>>>>  1 file changed, 2 insertions(+)
> >>>>>
> >>>>>Thanks, this looks much cleaner than before:
> >>>>>
> >>>>>Acked-by: Will Deacon 
> >>>>>
> >>>>>The only thing I'm not sure about is why we don't just select HAS_RAPIDIO
> >>>>>unconditionally in the arm64 Kconfig. Does selecting only that option
> >>>>>actually pull in new code to the build?
> >>>>>
> >>>>HAS_RAPIDIO option is intended for SOCs that have built in SRIO 
> >>>>controllers,
> >>>>like TI KeyStoneII or FPGAs. Because RapidIO subsystem core is required
> >>>>during RapidIO port driver initialization, having separate option allows 
> >>>>us
> >>>>to control available build options for RapidIO core and port driver (bool
> >>>>vs. tristate) and disable module option if port driver is configured as
> >>>>built-in.
> >>>
> >>>Your explanation doesn't make much sense to me.
> >>>
> >>>RAPIDIO is the bus-level support, right?  So drivers that depend on
> >>>the bus-level support should depend on RAPIDIO, and so, if RAPIDIO
> >>>is configured as a module, they will also be allowed to be disabled
> >>>or a module, but not built-in if tristate.  If it is boolean, and
> >>>causes the driver to be built-in to the kernel, then you need to use
> >>>"RAPIDIO=y" so that it's dependency is only satisfied when the core
> >>>is built-in.
> >>>
> >>
> >>RapidIO host controllers (on local bus like SoC internal or PCIe) are
> >>serviced by MPORT device drivers that are subsystem specific and communicate
> >>with RapidIO core using set of callbacks. Depending on HW architecture these
> >>drivers may be defined as built-in or module.
> >
> >Why does hardware architecture define whether something has to be built
> >in or can be modular?
> >
> >It is the case today that (eg) on-SoC hardware _can_ be built as a module
> >if desired - just because it's on the SoC does not mean it has to be
> >built in to the kernel.  Why is RapidIO any different?
> >
> 
> Not HW architecture - legacy can be blamed as well. Freescale's FSL_RIO
> driver still exist as built-in so far. Intent of this patch set is to allow
> RapidIO support in more architectures without reworking old stuff.
> I do not think that anyone will be updating FSL_RIO driver soon.

Sorry, but I'm even more confused.  If it's not hardware architecture,
then why did you say previously "Depending on HW architecture these
drivers may be defined as built-in or module." ?

If it's that the driver isn't written to be a module, then that is not
"HW architecture".  Are you just trying to muddy the water?

> Also we cannot dictate to developers of future drivers which method to use -
> they may have built-in option only for the first release.

How is that any different from all the other drivers that we have?

If a driver can

Re: [PATCH 6/6] arm64: enable RapidIO menu in Kconfig

2018-07-31 Thread Russell King - ARM Linux
On Tue, Jul 31, 2018 at 01:59:27PM -0400, Alex Bounine wrote:
> On 2018-07-31 11:52 AM, Russell King - ARM Linux wrote:
> >On Tue, Jul 31, 2018 at 08:54:14AM -0400, Alex Bounine wrote:
> >>On 2018-07-31 04:41 AM, Will Deacon wrote:
> >>>On Mon, Jul 30, 2018 at 06:50:34PM -0400, Alexei Colin wrote:
> >>>>Platforms with a PCI bus will be offered the RapidIO menu since they may
> >>>>be want support for a RapidIO PCI device. Platforms without a PCI bus
> >>>>that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
> >>>>in the platform-/machine-specific "config ARCH_*" Kconfig entry.
> >>>>
> >>>>Tested that kernel builds for arm64 with RapidIO subsystem and
> >>>>switch drivers enabled, also that the modules load successfully
> >>>>on a custom Aarch64 Qemu model.
> >>>>
> >>>>Cc: Andrew Morton 
> >>>>Cc: Russell King 
> >>>>Cc: John Paul Walters 
> >>>>Cc: linux-arm-ker...@lists.infradead.org
> >>>>Cc: linux-kernel@vger.kernel.org,
> >>>>Signed-off-by: Alexei Colin 
> >>>>---
> >>>>  arch/arm64/Kconfig | 2 ++
> >>>>  1 file changed, 2 insertions(+)
> >>>
> >>>Thanks, this looks much cleaner than before:
> >>>
> >>>Acked-by: Will Deacon 
> >>>
> >>>The only thing I'm not sure about is why we don't just select HAS_RAPIDIO
> >>>unconditionally in the arm64 Kconfig. Does selecting only that option
> >>>actually pull in new code to the build?
> >>>
> >>HAS_RAPIDIO option is intended for SOCs that have built in SRIO controllers,
> >>like TI KeyStoneII or FPGAs. Because RapidIO subsystem core is required
> >>during RapidIO port driver initialization, having separate option allows us
> >>to control available build options for RapidIO core and port driver (bool
> >>vs. tristate) and disable module option if port driver is configured as
> >>built-in.
> >
> >Your explanation doesn't make much sense to me.
> >
> >RAPIDIO is the bus-level support, right?  So drivers that depend on
> >the bus-level support should depend on RAPIDIO, and so, if RAPIDIO
> >is configured as a module, they will also be allowed to be disabled
> >or a module, but not built-in if tristate.  If it is boolean, and
> >causes the driver to be built-in to the kernel, then you need to use
> >"RAPIDIO=y" so that it's dependency is only satisfied when the core
> >is built-in.
> >
> 
> RapidIO host controllers (on local bus like SoC internal or PCIe) are
> serviced by MPORT device drivers that are subsystem specific and communicate
> with RapidIO core using set of callbacks. Depending on HW architecture these
> drivers may be defined as built-in or module.

Why does hardware architecture define whether something has to be built
in or can be modular?

It is the case today that (eg) on-SoC hardware _can_ be built as a module
if desired - just because it's on the SoC does not mean it has to be
built in to the kernel.  Why is RapidIO any different?

> Built-in driver will require presence of the RapidIO core during its
> initialization time and therefore we have to set CONFIG_RAPIDIO=y.
> Also we have PCIe based host controllers and their drivers are OK to be
> built as module like for any other PCI device.
> 
> Based on requirements and available resources/HW_configuration the platform
> can have on-chip host controller enabled or disabled (or simply not
> implemented in case of FPGA). This leads us to combination of on-chip and
> PCIe host controllers. For example, if PCIe bus is required for other
> devices, we may have to use PCIe-to_SRIO bridge because all available
> on-chip SerDes lines are assigned to PCIe.
> 
> If we use CONFIG_RAPIDIO as a single indicator of possible configuration, we
> can make visible config menu entry for on-chip controller that is not
> available on given platform due to HW setup. For example on KeystoneII or
> MPC85xx/86xx. The option HAS_RAPIDIO tells us that given platform really
> uses on-chip RapidIO host controller. This is why FSL_RIO depends on
> HAS_RAPIDIO.
> 
> Also having PCIe enabled in any form is not a good option to control support
> for on-chip controller.

I'm not saying that - can you please read my suggestion below and
respond to that.  I'm giving you technical feedback, but it seems
all I'm getting back is "this is how we're doing it" rather than a
constructive discussion.

For example, can you point out why my idea I present below would not
work?

> For 

Re: [PATCH 6/6] arm64: enable RapidIO menu in Kconfig

2018-07-31 Thread Russell King - ARM Linux
On Tue, Jul 31, 2018 at 08:54:14AM -0400, Alex Bounine wrote:
> On 2018-07-31 04:41 AM, Will Deacon wrote:
> >On Mon, Jul 30, 2018 at 06:50:34PM -0400, Alexei Colin wrote:
> >>Platforms with a PCI bus will be offered the RapidIO menu since they may
> >>be want support for a RapidIO PCI device. Platforms without a PCI bus
> >>that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
> >>in the platform-/machine-specific "config ARCH_*" Kconfig entry.
> >>
> >>Tested that kernel builds for arm64 with RapidIO subsystem and
> >>switch drivers enabled, also that the modules load successfully
> >>on a custom Aarch64 Qemu model.
> >>
> >>Cc: Andrew Morton 
> >>Cc: Russell King 
> >>Cc: John Paul Walters 
> >>Cc: linux-arm-ker...@lists.infradead.org
> >>Cc: linux-kernel@vger.kernel.org,
> >>Signed-off-by: Alexei Colin 
> >>---
> >>  arch/arm64/Kconfig | 2 ++
> >>  1 file changed, 2 insertions(+)
> >
> >Thanks, this looks much cleaner than before:
> >
> >Acked-by: Will Deacon 
> >
> >The only thing I'm not sure about is why we don't just select HAS_RAPIDIO
> >unconditionally in the arm64 Kconfig. Does selecting only that option
> >actually pull in new code to the build?
> >
> HAS_RAPIDIO option is intended for SOCs that have built in SRIO controllers,
> like TI KeyStoneII or FPGAs. Because RapidIO subsystem core is required
> during RapidIO port driver initialization, having separate option allows us
> to control available build options for RapidIO core and port driver (bool
> vs. tristate) and disable module option if port driver is configured as
> built-in.

Your explanation doesn't make much sense to me.

RAPIDIO is the bus-level support, right?  So drivers that depend on
the bus-level support should depend on RAPIDIO, and so, if RAPIDIO
is configured as a module, they will also be allowed to be disabled
or a module, but not built-in if tristate.  If it is boolean, and
causes the driver to be built-in to the kernel, then you need to use
"RAPIDIO=y" so that it's dependency is only satisfied when the core
is built-in.

HAS_RAPIDIO gives the impression that it defines whether or not
the rapidio core code is allowable or not - it doesn't suggest that
it has anything to do with drivers.  However, reading the PowerPC
Kconfig files, it seems to be used that way.  That's confusing, and
ought to be fixed.  From what I can tell, it's only used for FSL_RIO,
so I suggest that gets converted to:

config HAS_RAPIDIO
bool PCI

config RAPIDIO
tristate "RapidIO support"
depends on HAS_RAPIDIO

config HAS_FSL_RIO
bool
select HAS_RAPIDIO

config FSL_RIO
bool "Freescale Embedded SRIO Controller support"
depends on RAPIDIO = y && HAS_FSL_RIO

This frees up HAS_RAPIDIO to operate as one would expect - to define
whether or not RAPIDIO should be offered.  This also allows:

config ARM
select HAS_RAPIDIO if PCI

to be added to arch/arm/Kconfig if appropriate.  However, I'm not yet
convinced that _just because_ we have PCI does not mean that RAPIDIO
should be offered.  I stated a series of questions about that last
Tuesday in response to an individual patch adding rapidio to arch/arm,
and that email seems to have been ignored - at least as far as the
questions go.

Please ensure that you respond to your reviewers questions, otherwise
you will start receiving plain NAKs to your patches instead (since
it becomes a waste of time for reviewers to put any further effort
in to explain why they don't like the patch.)

Thanks.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH 5/6] arm: enable RapidIO menu in Kconfig

2018-07-31 Thread Russell King - ARM Linux
On Tue, Jul 31, 2018 at 08:43:02AM -0400, Alex Bounine wrote:
> On 2018-07-31 08:04 AM, Christoph Hellwig wrote:
> >On Mon, Jul 30, 2018 at 06:50:33PM -0400, Alexei Colin wrote:
> >>diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> >>index afe350e5e3d9..602a61324890 100644
> >>--- a/arch/arm/Kconfig
> >>+++ b/arch/arm/Kconfig
> >>@@ -1278,6 +1278,8 @@ config PCI_HOST_ITE8152
> >>  source "drivers/pci/Kconfig"
> >>+source "drivers/rapidio/Kconfig"
> >
> >Please include this from drivers/Kconfig instead of enabling it
> >for a completely random set of architectures.
> >
> This is not a random set of architectures but only ones that implement
> support for RapidIO as system bus.
> 
> On some platforms RapidIO can be the only system bus available replacing
> PCI/PCIe.
> 
> As it is done now, RapidIO is configured in "Bus Options" (x86/PPC) or "Bus
> Support" (ARMs) sub-menu and from system configuration option it should be
> kept this way.
> 
> Current location of RAPIDIO configuration option is familiar to users of
> PowerPC and x86 platforms, and is similarly available in some ARM
> manufacturers kernel code trees.

... which is why you have a HAS_RAPIDIO thing and select it from
arch/*/Kconfig as I suggested in my original review of your patch
set.

As I've also said, I'm unable to review what you've sent this time
around because I don't have all your patches.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH] PCI: let pci_request_irq properly deal with threaded interrupts

2018-07-31 Thread Russell King - ARM Linux
On Tue, Jul 31, 2018 at 12:50:21AM +0200, Thomas Gleixner wrote:
> On Mon, 30 Jul 2018, Bjorn Helgaas wrote:
> 
> > [+cc maintainers of possibly erroneous callers of request_threaded_irq()]
> > 
> > On Mon, Jul 30, 2018 at 04:30:28PM -0500, Bjorn Helgaas wrote:
> > > [+cc Thomas, Christoph, LKML]
> > > 
> > > On Mon, Jul 30, 2018 at 12:03:42AM +0200, Heiner Kallweit wrote:
> > > > If we have a threaded interrupt with the handler being NULL, then
> > > > request_threaded_irq() -> __setup_irq() will complain and bail out
> > > > if the IRQF_ONESHOT flag isn't set. Therefore check for the handler
> > > > being NULL and set IRQF_ONESHOT in this case.
> > > > 
> > > > This change is needed to migrate the mei_me driver to
> > > > pci_alloc_irq_vectors() and pci_request_irq().
> > > > 
> > > > Signed-off-by: Heiner Kallweit 
> > > 
> > > I'd like an ack from Thomas because this requirement about IRQF_ONESHOT
> > > usage isn't mentioned in the request_threaded_irq() function doc or
> > > Documentation/
> > 
> > Possibly these other request_threaded_irq() callers are similarly
> > broken?  I can't tell for sure about tda998x_create(), but all the
> > others certainly call request_threaded_irq() with "handler == NULL"
> > and irqflags that do not contain IRQF_ONESHOT:
> > 
> >   tda998x_create()
> > request_threaded_irq(client->irq, NULL, ..., irqd_get_trigger_type(), 
> > ...)
> > (I can't tell what irqd_get_trigger_type() does)
> 
> It reads the trigger type back from the irq chip (level/edge/polarity) but
> does not return with the ONESHOT bit set.

What tree are you (Bjorn) using there?  Looking in my git history,
request_threaded_irq() was added in January 2014, as:

+   irqf_trigger =
+   irqd_get_trigger_type(irq_get_irq_data(client->irq));
+   ret = request_threaded_irq(client->irq, NULL,
+  tda998x_irq_thread,
+  irqf_trigger | IRQF_ONESHOT,
+  "tda998x", priv);

and updated more recently for the CEC support in ae81553c30ef
("drm/i2c: tda998x: allow interrupt to be shared") to be:

-   irqf_trigger =
+   irq_flags =
irqd_get_trigger_type(irq_get_irq_data(client->irq));
+   irq_flags |= IRQF_SHARED | IRQF_ONESHOT;
ret = request_threaded_irq(client->irq, NULL,
-  tda998x_irq_thread,
-  irqf_trigger | IRQF_ONESHOT,
+  tda998x_irq_thread, irq_flags,
   "tda998x", priv);

In all cases, IRQF_ONESHOT has been set for this threaded interrupt.

I've also checked drm-next, and it doesn't show anything like what
Bjorn quoted.

Confused.  Bjorn, can you double check that what you think are
erroneous request_threaded_irq() calls are actually as per what
you quoted please?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH][RESEND 2] component: enhance handling of devres group for master

2018-07-26 Thread Russell King - ARM Linux
On Wed, Jul 25, 2018 at 11:06:28PM -0700, bgosw...@codeaurora.org wrote:
> From: Banajit Goswami 
> 
> The devres group opened for a master is left open-ended (without
> devres_group_close) even after bind() is complete. Similarly, while
> releasing the devres resources for master, the most recently opened
> devres group is selected, and released without identifying the
> targeted group. As the devres group opened before master bind was
> never closed, there may have unintended consequences of releasing
> devres resources that were allocated after master bind() function
> was complete.

... as intended, because this layers on top of the driver model.
You are aware that the driver model leaves its implicit group open
as well?

I think you need to explain why you need this - what is the problem
scenario that you are encountering.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: [PATCH] arm: enable RapidIO config options in Kconfig

2018-07-24 Thread Russell King - ARM Linux
On Tue, Jul 24, 2018 at 09:41:26AM -0400, Alexei Colin wrote:
> ARM SoCs with a PCI bus offer the RapiodIO config menu; SoCs with
> RapidIO IP blocks but without a PCI bus, need to add "select
> HAS_RAPIDIO" to the Kconfig entry for that SoC (e.g. ARCH_*).
> 
> HAS_RAPIDIO was chosen over HAVE_RAPIDIO to be consistent with
> other architectures which already define this flag (powerpc).

Is there any reason we can't have the RAPIDIO and HAS_RAPIDIO config
blocks in drivers/rapidio/Kconfig (maybe without the || PCI - see
below), and then have architecture Kconfig files do:

config (ARM or PPC)
select HAS_RAPIDIO if PCI

or in the case of MIPS:

config MIPS
...
select HAS_RAPIDIO

That would avoid duplicating almost identical RAPIDIO and HAS_RAPIDIO
config blocks in architecture Kconfig files.

Why should rapidio be available just because we have PCI selected?  If
it behaves as a PCI add-in card, then why isn't this available for any
architecture with PCI?

What about build coverage - shouldn't RAPIDIO be selectable for (eg)
other architectures via COMPILE_TEST if it's supposed to be generic?

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


Re: Enquiry on unbalanced memory throughput for dual-Cortex A9 core.

2018-07-20 Thread Russell King - ARM Linux
On Fri, Jul 20, 2018 at 08:49:47AM +, Ooi, Tzy Way wrote:
> Hi Russell,
> 
> I am trying the memory write operation with the LM benchmark test. I
> tried to execute the memory write operation here
> 
> twice to get both Cortex A9 core processor to work on each processes.
> Both processors is going to perform write operation at almost the same
> time to the memory.
> 
> As shown in the pictures below, the memory throughput from one of the
> cores is about double the throughput of another core. i.e. 377MB/s VS
> 728MB/s
> 
> [cid:image001.png@01D42049.5A7D0070]
> 
> I have tested this operation across few dual cores Cortex A9 boards and
> all the board is having the same result. The test is tested on kernel
> version 4.9 and newest Linux kernel version 4.18.0-rc2

Here's how 4.14 behaves on an iMX6D SoC (also dual core Cortex A9):

$ taskset -c 0 ./bw_mem -N 1000 1M fwr & taskset -c 1 ./bw_mem -N 1000 1M fwr
[1] 21799
1.00 521.10
1.00 497.27
[1]+  Donetaskset -c 0 ./bw_mem -N 1000 1M fwr
$ taskset -c 0 ./bw_mem -N 1000 1M fwr & taskset -c 1 ./bw_mem -N 1000 1M fwr
[1] 21803
1.00 520.83
1.00 496.44

which shows some asymmetry but nowhere near yours.

I'm using taskset to force each to be locked to a particular CPU - you'll
see why further down.  Even without it, I get similar results to those I
mention above.

Now, playing around with this, so we can identify which bw_mem output is
which:

$ taskset -c 0 ./bw_mem -N 1000 1M fwr & c1=$(taskset -c 1 ./bw_mem -N 1000 1M 
fwr 2>&1); echo "c1: $c1"
[1] 21876
1.00 521.92
c1: 1.00 496.69
$ taskset -c 1 ./bw_mem -N 1000 1M fwr & c1=$(taskset -c 0 ./bw_mem -N 1000 1M 
fwr 2>&1); echo "c0: $c1"
[1] 21881
c0: 1.00 521.83
1.00 496.20

CPU0 is always the slightly faster of the two.  If we use /usr/bin/time
to time these:

CPU0:
6.10user 0.25system 0:06.56elapsed 96%CPU (0avgtext+0avgdata 1664maxresident)k
0inputs+0outputs (0major+407minor)pagefaults 0swaps

CPU1:
6.36user 0.24system 0:06.77elapsed 97%CPU (0avgtext+0avgdata 1600maxresident)k
0inputs+0outputs (0major+399minor)pagefaults 0swaps

So, CPU1 takes slightly longer in userspace, has less resident pages and
less minor faults which is rather odd.  Repeatedly running just one
instance gives different results each time... disabling virtual address
space randomisation solves that:

  echo 0 >/proc/sys/kernel/randomize_va_space

which then gives me:

CPU0: 1.00 520.20
6.18user 0.20system 0:06.59elapsed 96%CPU (0avgtext+0avgdata 1700maxresident)k
0inputs+0outputs (0major+403minor)pagefaults 0swaps
CPU1: 1.00 496.61
6.46user 0.14system 0:06.77elapsed 97%CPU (0avgtext+0avgdata 1700maxresident)k
0inputs+0outputs (0major+403minor)pagefaults 0swaps

CPU0: 1.00 521.10
6.13user 0.21system 0:06.57elapsed 96%CPU (0avgtext+0avgdata 1700maxresident)k
0inputs+0outputs (0major+403minor)pagefaults 0swaps
CPU1: 1.00 498.01
6.40user 0.18system 0:06.75elapsed 97%CPU (0avgtext+0avgdata 1700maxresident)k
0inputs+0outputs (0major+403minor)pagefaults 0swaps

which is rather more stable as far as resource usage goes between the
two CPUs, but still an asymmetry in the reported bandwidths and times.
So, this has ruled out differences in VA layout.

Now for the interesting bit... it's important to understand what and
how stuff is being measured.  Looking at the bw_mem.c and associated
source code, it measures the performance against the wall clock, which
includes everything that the system is doing on each particular CPU.
So, if a CPU is interrupted by another thread wanting to run, it'll
affect the results.  Hence, it's best to run on an otherwise quiet
system, eg, without an init daemon (eg, booted with init=/bin/sh on
the kernel command line - but note there won't be any job control,
so ^C won't work!)

However, continuing on...

If I run bw_mem on just one CPU:

CPU1: 1.00 2617.31
5.74user 0.18system 0:06.03elapsed 98%CPU (0avgtext+0avgdata 1700maxresident)k
0inputs+0outputs (0major+403minor)pagefaults 0swaps

Same number of iterations, same memory size, but notice that it appears
to be a lot faster reported by bw_mem, but the time taken is about the
same.  cpufreq comes to mind, but that's disabled on this system.

So, it brings up a rather obvious question: what exactly is bw_mem
measuring, and is it measuring it correctly?

$ /usr/bin/time taskset -c 1 ./bw_mem -P 1 -N 1000 1M fwr
1.00 2601.26
5.80user 0.16system 0:06.06elapsed 98%CPU (0avgtext+0avgdata 1700maxresident)k
0inputs+0outputs (0major+403minor)pagefaults 0swaps
$ /usr/bin/time ./bw_mem -P 2 -N 1000 1M fwr
^CCommand terminated by signal 2
5.54user 0.13system 1:12.20elapsed 7%CPU (0avgtext+0avgdata 1696maxresident)k
0inputs+0outputs (0major+365minor)pagefaults 0swaps

so requesting a parallelism of 2 results in the program never seemingly
ending in a reasonable period of time, which suggests a bug somewhere.
Are we sure that bw_mem is actually working as intended?

Maybe if Larry is reading 

Re: [PATCH] ARM64: smp: Fix cpu_up() racing with sys_reboot

2018-07-19 Thread Russell King - ARM Linux
On Thu, Jul 19, 2018 at 03:18:46PM -0700, Venkata Narendra Kumar Gutta wrote:
> Nothing stops a process from hotplugging in a CPU concurrently
> with a sys_reboot() call. In such a situation we could have
> ipi_cpu_stop() mark a cpu as 'offline' and _cpu_up() ignore the
> fact that the CPU is not really offline and call the
> CPU_UP_PREPARE notifier. When this happens stop_machine code will
> complain that the cpu thread already exists and BUG_ON().
> 
> CPU0  CPU1
> 
>  sys_reboot()
>  kernel_restart()
>  machine_restart()
>  machine_shutdown()
>  smp_send_stop()
>  ...   ipi_cpu_stop()
>  set_cpu_online(1, false)
>local_irq_disable()
>while(1)
> 
>  cpu_up()
>  _cpu_up()
>  if (!cpu_online(1))
>  __cpu_notify(CPU_UP_PREPARE...)
> 
>  cpu_stop_cpu_callback()
>  BUG_ON(stopper->thread)
> 
> This is easily reproducible by hotplugging in and out in a tight
> loop while also rebooting.
> 
> Since the CPU is not really offline and hasn't gone through the
> proper steps to be marked as such, let's mark the CPU as inactive.
> This is just as easily testable as online and avoids any possibility
> of _cpu_up() trying to bring the CPU back online when it never was
> offline to begin with. Based on the similar patchset by for arm
> targets 040c163( "ARM: smp: Fix cpu_up() racing with sys_reboot)"

You really need to use 12 digit commit IDs to refer to commits to
avoid:

$ git log 040c163
fatal: ambiguous argument '040c163': unknown revision or path not in the 
working tree.

because 7 digits are just too short and given the size of the kernel,
result in conflicting references.

I can find no trace of "040c163" touching arch/arm/kernel/smp.c,
nor can I find a commit with "racing with sys_reboot" touching the
same file.

The commit you reference doesn't seem to be in mainline.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


  1   2   3   4   5   6   7   8   9   10   >