Re: [v3,1/1] powerpc/pseries: fix EEH recovery of some IOV devices

2018-07-31 Thread Michael Ellerman
On Mon, 2018-07-30 at 01:59:14 UTC, Sam Bobroff wrote:
> EEH recovery currently fails on pSeries for some IOV capable PCI
> devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
> certain device tree properties for the device. (Found on an IOV
> capable device using the ipr driver.)
> 
> Recovery fails in pci_enable_resources() at the check on r->parent,
> because r->flags is set and r->parent is not.  This state is due to
> sriov_init() setting the start, end and flags members of the IOV BARs
> but the parent not being set later in
> pseries_pci_fixup_iov_resources(), because the
> "ibm,open-sriov-vf-bar-info" property is missing.
> 
> Correct this by zeroing the resource flags for IOV BARs when they
> can't be configured (this is the same method used by sriov_init() and
> __pci_read_base()).
> 
> VFs cleared this way can't be enabled later, because that requires
> another device tree property, "ibm,number-of-configurable-vfs" as well
> as support for the RTAS function "ibm_map_pes". These are all part of
> hypervisor support for IOV and it seems unlikely that a hypervisor
> would ever partially, but not fully, support it. (None are currently
> provided by QEMU/KVM.)
> 
> Signed-off-by: Sam Bobroff 
> Reviewed-by: Bryant G. Ly 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b87b9cf4935325c98522823caeddd3

cheers
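
A minimal userspace sketch of the approach described in the commit message above, assuming a simplified stand-in for the kernel's struct resource. The names mock_resource, fixup_iov_bars() and can_enable() and the flag value are hypothetical and are not kernel API; the sketch only mirrors the shape of the "flags set but no parent" check that the message attributes to pci_enable_resources().

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_IORESOURCE_MEM 0x200UL	/* illustrative flag value (assumption) */
#define MOCK_NUM_IOV_BARS   6

/* Simplified stand-in for the kernel's struct resource (assumption). */
struct mock_resource {
	unsigned long start, end, flags;
	struct mock_resource *parent;
};

/*
 * If the hypervisor did not describe the IOV BARs (have_bar_info is false),
 * zero the flags so later code treats the BARs as unused.
 */
static void fixup_iov_bars(struct mock_resource *bars, size_t n,
			   bool have_bar_info, struct mock_resource *root)
{
	assert(bars != NULL);
	for (size_t i = 0; i < n; i++) {
		if (have_bar_info)
			bars[i].parent = root;	/* BAR is configured */
		else
			bars[i].flags = 0;	/* BAR cannot be configured */
	}
}

/* Mirrors the failing check: a BAR with flags set must have a parent. */
static bool can_enable(const struct mock_resource *bars, size_t n)
{
	for (size_t i = 0; i < n; i++) {
		if (!bars[i].flags)
			continue;		/* unused BAR, skipped */
		if (!bars[i].parent)
			return false;		/* claimed but never inserted */
	}
	return true;
}
```

With flags set and no parent, can_enable() fails; after fixup_iov_bars() runs without BAR info, the zeroed BARs are skipped and the check passes.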


Re: Adds __init annotation at mmu_init_secondary func

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-26 at 12:52:50 UTC, Alexey Spirkov wrote:
> The mmu_init_secondary() function is called during the initialization
> sequence but lacks the __init annotation, so modpost generates a warning.
> Some build systems are sensitive to this kind of warning.
> 
> Signed-off-by: Alexey Spirkov 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/f7e2a152231f4a0308cc8f9c2296ba

cheers


Re: [v8, 1/2] powernv:opal-sensor-groups: Add support to enable sensor groups

2018-07-31 Thread Michael Ellerman
On Tue, 2018-07-24 at 09:13:08 UTC, Shilpasri G Bhat wrote:
> Adds support to enable/disable a sensor group at runtime. This
> can be used to select the sensor groups that need to be copied to
> main memory by OCC. Sensor groups like power, temperature, current,
> voltage, frequency, utilization can be enabled/disabled at runtime.
> 
> Signed-off-by: Shilpasri G Bhat 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/04baaf28f40c68c35a413cd9d0db71

cheers


Re: powerpc/mm: Don't report PUDs as memory leaks when using kmemleak

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-19 at 14:33:16 UTC, Michael Ellerman wrote:
> Paul Menzel reported that kmemleak was producing reports such as:
> 
>   unreferenced object 0xc000f8b8 (size 16384):
>     comm "init", pid 1, jiffies 4294937416 (age 312.240s)
>     hex dump (first 32 bytes):
>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>       00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
>     backtrace:
>       [] __pud_alloc+0x80/0x190
>       [<87f2e8a3>] move_page_tables+0xbac/0xdc0
>       [<091e51c2>] shift_arg_pages+0xc0/0x210
>       [] setup_arg_pages+0x22c/0x2a0
>       [<60871529>] load_elf_binary+0x41c/0x1648
>       [] search_binary_handler.part.11+0xbc/0x280
>       [<34e0cdd7>] __do_execve_file.isra.13+0x73c/0x940
>       [<5f953a6e>] sys_execve+0x58/0x70
>       [<9700a858>] system_call+0x5c/0x70
> 
> Indicating that a PUD was being leaked.
> 
> However what's really happening is that kmemleak is not able to
> recognise the references from the PGD to the PUD, because they are not
> fully qualified pointers.
> 
> We can confirm that in xmon, eg:
> 
> Find the task struct for pid 1 "init":
>   0:mon> P
>      task_struct     ->thread.ksp    PID   PPID S  P CMD
>   c001fe7c c001fe803960  1  0 S 13 systemd
> 
> Dump virtual address 0 to find the PGD:
>   0:mon> dv 0 c001fe7c
>   pgd  @ 0xc000f8b01000
> 
> Dump the memory of the PGD:
>   0:mon> d c000f8b01000
>   c000f8b01000 f8b9   ||
>   c000f8b01010    ||
>   c000f8b01020    ||
>   c000f8b01030  f8b8  ||
> 
> 
> There we can see the reference to our supposedly leaked PUD. But
> because it's missing the leading 0xc, kmemleak won't recognise it.
> 
> We can confirm it's still in use by translating an address that is
> mapped via it:
>   0:mon> dv 7fff9400 c001fe7c
>   pgd  @ 0xc000f8b01000
>   pgdp @ 0xc000f8b01038 = 0xf8b8 <--
>   pudp @ 0xc000f8b81ff8 = 0x037c4000
>   pmdp @ 0xc37c5ca0 = 0xfbd89000
>   ptep @ 0xc000fbd89000 = 0xc081d5ce0386
>   Maps physical address = 0x0001d5ce
>   Flags = Accessed Dirty Read Write
> 
> The fix is fairly simple. We need to tell kmemleak to ignore PUD
> allocations and never report them as leaks. We can also tell it not to
> scan the PGD, because it will never find pointers in there. However it
> will still notice if we allocate a PGD and then leak it.
> 
> Reported-by: Paul Menzel 
> Signed-off-by: Michael Ellerman 
> Tested-by: Paul Menzel  on IBM S822LC

Applied to powerpc next.

https://git.kernel.org/powerpc/c/a984506c542e26b31cbb446438f843

cheers
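
The reason a conservative scanner misses the PGD-to-PUD reference can be sketched in plain C. This is an illustration under stated assumptions, not kmemleak's implementation: tracked_object, word_references() and block_references() are hypothetical names, and the addresses are illustrative because the values in the quoted report are elided.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative top bits of a kernel linear-map address (assumption). */
#define KERNEL_LINEAR_BASE 0xc000000000000000ULL

struct tracked_object {
	uint64_t start;	/* virtual address of the allocation */
	uint64_t size;
};

/* A word counts as a reference only if it points inside the object. */
static bool word_references(uint64_t word, const struct tracked_object *obj)
{
	return word >= obj->start && word - obj->start < obj->size;
}

/* Scan a block of memory for any word that references obj. */
static bool block_references(const uint64_t *words, size_t n,
			     const struct tracked_object *obj)
{
	assert(words != NULL && obj != NULL);
	for (size_t i = 0; i < n; i++)
		if (word_references(words[i], obj))
			return true;
	return false;
}
```

A PGD slot holding the PUD address without its leading 0xc does not fall inside the tracked object's virtual range, so the scan reports no reference; the same slot with the top bits restored would match.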


Re: [v3,16/16] powerpc: split asm/tlbflush.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:21 UTC, Christophe Leroy wrote:
> Split asm/tlbflush.h into:
> asm/nohash/tlbflush.h
> asm/book3s/32/tlbflush.h
> asm/book3s/64/tlbflush.h (already existing)
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/405cb4024e52b137685213b377ea3f

cheers


Re: [v3, 15/16] powerpc: remove unnecessary inclusion of asm/tlbflush.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:19 UTC, Christophe Leroy wrote:
> asm/tlbflush.h is only needed for:
> - using functions xxx_flush_tlb_xxx()
> - using MMU_NO_CONTEXT
> - including asm-generic/pgtable.h
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/45ef5992e06dcc3a4c7d34d2305228

cheers


Re: [v3,12/16] powerpc/44x: remove page.h from mmu-44x.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:13 UTC, Christophe Leroy wrote:
> mmu-44x.h doesn't need asm/page.h if PAGE_SHIFT is replaced by
> CONFIG_PPC_XX_PAGES
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/7bc396958cafba126078ad92480a42

cheers


Re: [v3, 11/16] powerpc/nohash: fix hash related comments in pgtable.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:11 UTC, Christophe Leroy wrote:
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/0c295d0e9c9f7a592f230bbcf51655

cheers


Re: [v3,10/16] powerpc: fix includes in asm/processor.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:09 UTC, Christophe Leroy wrote:
> Remove superfluous includes and add missing ones
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/62b8426578c414c918468ab4cc7517

cheers


Re: [v3,09/16] powerpc/book3s: Remove PPC_PIN_SIZE

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:07 UTC, Christophe Leroy wrote:
> PPC_PIN_SIZE is specific to the 44x and is defined in mmu.h
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/6b622669119e20a53a1983cd41115e

cheers


Re: [v3,08/16] powerpc: declare set_breakpoint() static

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:05 UTC, Christophe Leroy wrote:
> set_breakpoint() is only used in process.c so make it static
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/b5ac51d747122f8858bdcb3fc7a5c7

cheers


Re: [v3,07/16] powerpc: remove superflous inclusions of asm/fixmap.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:03 UTC, Christophe Leroy wrote:
> Files not using fixmap consts or functions don't need asm/fixmap.h
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/e8cb7a55eb8dcf65838d0911dc7ba0

cheers


Re: [v3,06/16] powerpc: clean inclusions of asm/feature-fixups.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:25:01 UTC, Christophe Leroy wrote:
> files not using feature fixup don't need asm/feature-fixups.h
> files using feature fixup need asm/feature-fixups.h
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/2c86cd188f8a5631f3d75a1dea14d2

cheers


Re: [v3,05/16] powerpc: clean the inclusion of stringify.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:24:59 UTC, Christophe Leroy wrote:
> Only include linux/stringify.h in files using __stringify()
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/5c35a02c545a7bbe77f3a1ae337d9e

cheers


Re: [v3, 04/16] powerpc: move ASM_CONST and stringify_in_c() into asm-const.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:24:57 UTC, Christophe Leroy wrote:
> This patch moves ASM_CONST() and stringify_in_c() into
> dedicated asm-const.h, then cleans all related inclusions.
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/ec0c464cdbf38bf6ddabec8bfa595b

cheers


Re: [v3,03/16] powerpc/405: move PPC405_ERR77 in asm-405.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:24:55 UTC, Christophe Leroy wrote:
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/36a7eeaff7d06cef253c8df6dfe363

cheers


Re: [v3, 02/16] powerpc: remove unneeded inclusions of cpu_has_feature.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:24:53 UTC, Christophe Leroy wrote:
> Files not using cpu_has_feature() don't need cpu_has_feature.h
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/8c58259bba43084eb5876aeefa574e

cheers


Re: [v3,01/16] powerpc: remove kdump.h from page.h

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 16:24:51 UTC, Christophe Leroy wrote:
> page.h doesn't need kdump.h
> 
> Signed-off-by: Christophe Leroy 

Applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/db0a2b633da4216b767d7aed95ffe3

cheers


Re: [v4, 1/2] powernv/cpuidle: Parse dt idle properties into global structure

2018-07-31 Thread Michael Ellerman
On Thu, 2018-07-05 at 11:40:21 UTC, Akshay Adiga wrote:
> Device-tree parsing happens twice, once while deciding idle state to be
> used for hotplug and once during cpuidle init. Hence, parsing the device
> tree and caching it will reduce code duplication. Parsing code has been
> moved to pnv_parse_cpuidle_dt() from pnv_probe_idle_states(). In addition
> to the properties in the device tree the number of available states is
> also required.
> 
> Signed-off-by: Akshay Adiga 
> Reviewed-by: Nicholas Piggin 
> Reviewed-by: Gautham R. Shenoy 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/9c7b185ab2fe313b4426bf55da3624

cheers
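
The "parse once, cache in a global structure" idea from the commit message can be sketched as follows. This is a userspace illustration only: mock_parse_dt(), get_idle_states() and the state fields are hypothetical stand-ins, and the latency values are made up rather than read from any device tree.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_STATES 8

struct mock_idle_state {
	unsigned int latency_ns;
	unsigned int residency_ns;
};

static struct mock_idle_state cached_states[MOCK_MAX_STATES];
static int nr_cached_states;
static bool dt_parsed;
static int parse_calls;	/* counts how often the expensive parse ran */

/* Stand-in for the device tree walk; the values are made up. */
static int mock_parse_dt(struct mock_idle_state *out, int max)
{
	assert(out != NULL && max >= 2);
	parse_calls++;
	out[0] = (struct mock_idle_state){ .latency_ns = 1000, .residency_ns = 10000 };
	out[1] = (struct mock_idle_state){ .latency_ns = 5000, .residency_ns = 50000 };
	return 2;
}

/* Both consumers (hotplug and cpuidle init) would call this accessor. */
static int get_idle_states(const struct mock_idle_state **states)
{
	if (!dt_parsed) {
		nr_cached_states = mock_parse_dt(cached_states, MOCK_MAX_STATES);
		dt_parsed = true;
	}
	*states = cached_states;
	return nr_cached_states;
}
```

Calling get_idle_states() from two places returns the same cached array and runs the parse only once, which is the duplication the patch removes.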


Re: [v4,01/11] macintosh/via-pmu: Fix section mismatch warning

2018-07-31 Thread Michael Ellerman
On Mon, 2018-07-02 at 08:21:18 UTC, Finn Thain wrote:
> The pmu_init() function has the __init qualifier, but the ops struct
> that holds a pointer to it does not. This causes a build warning.
> The driver works fine because the pointer is only dereferenced early.
> 
> The function is so small that there's negligible benefit from using
> the __init qualifier. Remove it to fix the warning, consistent with
> the other ADB drivers.
> 
> Tested-by: Stan Johnson 
> Signed-off-by: Finn Thain 
> Reviewed-by: Geert Uytterhoeven 

Series applied to powerpc next, thanks.

https://git.kernel.org/powerpc/c/73f4447d43484224d7abfba0d9468d

cheers


RE: [PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization

2018-07-31 Thread Y.b. Lu
Hi David,

> -Original Message-
> From: David Miller [mailto:da...@davemloft.net]
> Sent: Tuesday, July 31, 2018 12:26 AM
> To: Y.b. Lu 
> Cc: net...@vger.kernel.org; Madalin-cristian Bucur
> ; richardcoch...@gmail.com; robh...@kernel.org;
> shawn...@kernel.org; devicet...@vger.kernel.org;
> linuxppc-dev@lists.ozlabs.org; linux-arm-ker...@lists.infradead.org;
> linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 3/3] ptp_qoriq: convert to use module parameters for
> initialization
> 
> From: Yangbo Lu 
> Date: Mon, 30 Jul 2018 18:01:54 +0800
> 
> > +static unsigned int cksel = DEFAULT_CKSEL;
> > +module_param(cksel, uint, 0644);
> > +MODULE_PARM_DESC(cksel, "Select reference clock");
> > +
> > +static unsigned int clk_src;
> > +module_param(clk_src, uint, 0644);
> > +MODULE_PARM_DESC(clk_src, "Reference clock frequency (if clocks property not provided in dts)");
> > +
> > +static unsigned int tmr_prsc = 2;
> > +module_param(tmr_prsc, uint, 0644);
> > +MODULE_PARM_DESC(tmr_prsc, "Output clock division/prescale factor");
> > +
> > +static unsigned int tmr_fiper1 = 10;
> > +module_param(tmr_fiper1, uint, 0644);
> > +MODULE_PARM_DESC(tmr_fiper1, "Desired fixed interval pulse period (ns)");
> > +
> > +static unsigned int tmr_fiper2 = 10;
> > +module_param(tmr_fiper2, uint, 0644);
> > +MODULE_PARM_DESC(tmr_fiper2, "Desired fixed interval pulse period (ns)");
> 
> Sorry, there is no way I am ever applying something like this.  Module
> parameters are to be avoided at all costs.
> 
> And you don't need it here, you have DTS, please use it.
> 
> You are required to support the existing DTS cases, in order to avoid breaking
> things, anyways.
[Y.b. Lu] I get your point. Will drop module_param method.
Thanks a lot for your suggestion.


RE: [PATCH 3/3] ptp_qoriq: convert to use module parameters for initialization

2018-07-31 Thread Y.b. Lu
Hi Richard,

> -Original Message-
> From: Richard Cochran [mailto:richardcoch...@gmail.com]
> Sent: Monday, July 30, 2018 10:31 PM
> To: Y.b. Lu 
> Cc: net...@vger.kernel.org; Madalin-cristian Bucur
> ; Rob Herring ; Shawn Guo
> ; David S . Miller ;
> devicet...@vger.kernel.org; linuxppc-dev@lists.ozlabs.org;
> linux-arm-ker...@lists.infradead.org; linux-ker...@vger.kernel.org
> Subject: Re: [PATCH 3/3] ptp_qoriq: convert to use module parameters for
> initialization
> 
> On Mon, Jul 30, 2018 at 06:01:54PM +0800, Yangbo Lu wrote:
> > The ptp_qoriq driver initialized the 1588 timer with the
> > configurations provided by the properties of device tree node. For
> > example,
> >
> >   fsl,tclk-period = <5>;
> >   fsl,tmr-prsc= <2>;
> >   fsl,tmr-add = <0xaaab>;
> >   fsl,tmr-fiper1  = <5>;
> >   fsl,tmr-fiper2  = <0>;
> >   fsl,max-adj = <4>;
> >
> > These things actually were runtime configurations which were not
> > proper to be put into dts.
> 
> That is debatable.  While I agree that the dts isn't ideal for these, still
> it is the lesser of two or more evils.

[Y.b. Lu] Ok. You're right indeed :)

> 
> > This patch is to convert
> > to use module parameters for 1588 timer initialization, and to support
> > initial register values calculation.
> 
> It is hard for me to understand how using module parameters improves the
> situation.

[Y.b. Lu] Actually, I wasn't sure whether module_param would be accepted as a
replacement for dts. Before sending the patches I thought rejection was the
most likely outcome. I just wanted your suggestions, and confirmation of
whether there is a better approach than dts.

Since we should keep the dts, I will drop the module_param.
Could I add a function that calculates a set of default register values to
initialize the ptp timer when the driver fails to get the required properties
from the dts?
I think this would be useful. The ptp timer on new platforms (see the two dts
patches in this patchset; many platforms are affected) would then work without
these dts properties. If users want specific settings, they can set the dts
properties.


> 
> > If the parameters are not provided, the driver will calculate register
> > values with a set of default parameters. With this patch, those dts
> > properties are no longer needed for new platform to support 1588
> > timer, and many QorIQ DPAA platforms (some P series and T series
> > platforms of PowerPC, and some LS series platforms of ARM64) could use
> > this driver for their fman ptp timer with default module parameters.
> > However, this patch didn't remove the dts method. Because there were
> > still many old platforms using the dts method. We need to clean up
> > their dts files, verify module parameters on them, and convert them to
> > the new method gradually in case of breaking any function.
> 
> In addition, like it or not, because the dts is an ABI, you must continue
> support of the dts values as a legacy option.

[Y.b. Lu] I get your point now. The dts should be kept :)

> 
> Thanks,
> Richard


Re: [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references

2018-07-31 Thread Nicholas Piggin
On Tue, 31 Jul 2018 21:42:22 +1000
Michael Ellerman  wrote:

> Nicholas Piggin  writes:
> > On Fri, 27 Jul 2018 08:38:35 -0700
> > Matthew Wilcox  wrote:  
> >> On Sat, Jul 28, 2018 at 12:29:06AM +1000, Nicholas Piggin wrote:  
> >> > On Fri, 27 Jul 2018 06:41:56 -0700
> >> > Matthew Wilcox  wrote:  
> >> > > On Fri, Jul 27, 2018 at 09:48:17PM +1000, Nicholas Piggin wrote:
> >> > > > The page table fragment allocator uses the main page refcount racily
> >> > > > with respect to speculative references. A customer observed a BUG due
> >> > > > to page table page refcount underflow in the fragment allocator. This
> >> > > > can be caused by the fragment allocator set_page_count stomping on a
> >> > > > speculative reference, and then the speculative failure handler
> >> > > > decrements the new reference, and the underflow eventually pops when
> >> > > > the page tables are freed.  
> >> > > 
> >> > > Oof.  Can't you fix this instead by using page_ref_add() instead of
> >> > > set_page_count()?
> >> > 
> >> > It's ugly doing it that way. The problem is we have a page table
> >> > destructor and that would be missed if the spec ref was the last
> >> > put. In practice with RCU page table freeing maybe you can say
> >> > there will be no spec ref there (unless something changes), but
> >> > still it just seems much simpler doing this and avoiding any
> >> > complexity or relying on other synchronization.
> >> 
> >> I don't want to rely on the speculative reference not happening by the
> >> time the page table is torn down; that's way too black-magic for me.
> >> Another possibility would be to use, say, the top 16 bits of the
> >> atomic for your counter and call the dtor once the atomic is below 64k.
> >> I'm also thinking about overhauling the dtor system so it's not tied to
> >> compound pages; anyone with a bit in page_type would be able to use it.
> >> That way you'd always get your dtor called, even if the speculative
> >> reference was the last one.  
> >
> > Yeah we could look at doing either of those if necessary.
> >  
> >> > > > Any objection to the struct page change to grab the arch specific
> >> > > > page table page word for powerpc to use? If not, then this should
> >> > > > go via powerpc tree because it's inconsequential for core mm.  
> >> > > 
> >> > > I want (eventually) to get to the point where every struct page carries
> >> > > a pointer to the struct mm that it belongs to.  It's good for debugging
> >> > > as well as handling memory errors in page tables.
> >> > 
> >> > That doesn't seem like it should be a problem, there's some spare
> >> > words there for arch independent users.
> >> 
> >> Could you take one of the spare words instead then?  My intent was to
> >> just take the 'x86 pgds only' comment off that member.  _pt_pad_2 looks
> >> ideal because it'll be initialised to 0 and you'll return it to 0 by
> >> the time you're done.  
> >
> > It doesn't matter for powerpc where the atomic_t goes, so I'm fine with
> > moving it. But could you juggle the fields with your patch instead? I
> > thought it would be nice to use this field, which has already been
> > tested on x86 not to overlap with any other data, for a
> > bug fix that'll have to be widely backported.  
> 
> Can we come to a conclusion on this one?
> 
> As far as backporting goes pt_mm is new in 4.18-rc so the patch will
> need to be manually backported anyway. But I agree with Nick we'd rather
> use a slot that is known to be free for arch use.

Let's go with that for now. I'd really rather not fix this obscure
bug by introducing something even worse. I'll volunteer to change
the powerpc page table cache code if we can't find any more space in
the struct page.

So what does mapping get used for by page table pages? 4c21e2f2441
("[PATCH] mm: split page table lock") adds that page->mapping = NULL
in pte_lock_deinit, but I don't see why because page->mapping is
never used anywhere else by that patch. Maybe a previous version
of that patch used mapping rather than private?

Thanks,
Nick
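
The race discussed in this thread can be replayed as a single-threaded interleaving in userspace C. This is a sketch under stated assumptions: mock_page, spec_get(), spec_put() and the two frags_init_*() helpers are hypothetical names, and C11 atomics stand in for the kernel's page refcount primitives.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct mock_page {
	atomic_int refcount;	/* main refcount, shared with speculative users */
	atomic_int pt_frag;	/* separate fragment count (hypothetical field) */
};

/* Speculative reference: take a ref only if the count is non-zero. */
static bool spec_get(struct mock_page *p)
{
	int old;

	assert(p != NULL);
	old = atomic_load(&p->refcount);
	while (old != 0) {
		/* On failure, old is refreshed and the loop re-checks it. */
		if (atomic_compare_exchange_weak(&p->refcount, &old, old + 1))
			return true;
	}
	return false;
}

/* Speculative user backs off and drops its reference. */
static void spec_put(struct mock_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}

/* Racy: overwrites any speculative reference currently held. */
static void frags_init_racy(struct mock_page *p, int nfrags)
{
	atomic_store(&p->refcount, nfrags);
}

/* Safer: fragment accounting never touches the main refcount. */
static void frags_init_separate(struct mock_page *p, int nfrags)
{
	atomic_store(&p->pt_frag, nfrags);
}
```

Replaying "speculative get, allocator init, speculative put" with frags_init_racy() leaves the main count one short of the number of fragments, which is the underflow that pops when the last fragment is freed; with frags_init_separate() the main count returns to its original value.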


Re: [PATCH 00/16] Finally remove PSERIES from exception naming

2018-07-31 Thread Nicholas Piggin
On Thu, 26 Jul 2018 23:07:01 +1000
Michael Ellerman  wrote:

> This finally annoyed me enough to do something about it, I foolishly started
> pulling the string and now here I am.
> 
> I think the end result is sufficiently more readable to justify the churn. I
> particularly like that we now have EXCEPTION_PROLOG_0/1/2.
> 
> It will cause some pain for backports, but Nick plans to rewrite the exception
> vectors entirely at some point so this will be trivial in comparison.

Yeah I have no problem with this series. I'll get onto that rewrite
again soon.

Acked-by: Nicholas Piggin 

> 
> cheers
> 
> 
> Michael Ellerman (16):
>   powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_PROLOG_PSERIES()
>   powerpc/64s: Move SET_SCRATCH0() into EXCEPTION_RELON_PROLOG_PSERIES()
>   powerpc/64s: Rename STD_EXCEPTION_PSERIES to STD_EXCEPTION
>   powerpc/64s: Rename STD_EXCEPTION_PSERIES_OOL to STD_EXCEPTION_OOL
>   powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES to STD_RELON_EXCEPTION
>   powerpc/64s: Rename STD_RELON_EXCEPTION_PSERIES_OOL to
> STD_RELON_EXCEPTION_OOL
>   powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES_1 to EXCEPTION_PROLOG_2
>   powerpc/64s: Remove PSERIES from the NORI macros
>   powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES_1
>   powerpc/64s: Rename EXCEPTION_RELON_PROLOG_PSERIES
>   powerpc/64s: Rename EXCEPTION_PROLOG_PSERIES to EXCEPTION_PROLOG
>   powerpc/64s: Drop _MASKABLE_EXCEPTION_PSERIES()
>   powerpc/64s: Drop _MASKABLE_RELON_EXCEPTION_PSERIES()
>   powerpc/64s: Remove PSERIES naming from the MASKABLE macros
>   powerpc/64s: Drop unused loc parameter to MASKABLE_EXCEPTION macros
>   powerpc/64s: Don't use __MASKABLE_EXCEPTION unnecessarily
> 
>  arch/powerpc/include/asm/exception-64s.h | 117 ++-
>  arch/powerpc/include/asm/head-64.h   |  16 ++---
>  arch/powerpc/kernel/exceptions-64s.S |  32 -
>  3 files changed, 72 insertions(+), 93 deletions(-)
> 



Re: [PATCH] powerpc/64s: Make unrecoverable SLB miss less confusing

2018-07-31 Thread Nicholas Piggin
On Thu, 26 Jul 2018 23:01:51 +1000
Michael Ellerman  wrote:

> If we take an SLB miss while MSR[RI]=0 we can't recover and have to
> oops. Currently this is reported by faking up a 0x4100 exception, eg:
> 
>   Unrecoverable exception 4100 at 0
>   Oops: Unrecoverable exception, sig: 6 [#1]
>   ...
>   CPU: 0 PID: 1262 Comm: sh Not tainted 4.18.0-rc3-gcc-7.3.1-00098-g7fc2229fb2ab-dirty #9
>   NIP:   LR: c000b9e4 CTR: 7fff8bb971b0
>   REGS: c000ee02bbb0 TRAP: 4100
>   ...
>   LR [c000b9e4] system_call+0x5c/0x70
> 
> The 0x4100 value was chosen back in 2004 as part of the fix for the
> "mega bug" - "ppc64: Fix SLB reload bug". Back then it was obvious
> that 0x4100 was not a real trap value, as the highest actual trap was
> less than 0x2000.
> 
> Since then however the architecture has changed and now we have
> "virtual mode" or "relon" exceptions, in which exceptions can be
> delivered with the MMU on starting at 0x4000.
> 
> At a glance 0x4100 looks like a virtual mode 0x100 exception, aka
> system reset exception. A close reading of the architecture will show
> that system reset exceptions can't be delivered in virtual mode, and
> so 0x4100 is not a valid trap number. But that's not immediately
> obvious. There's also nothing about 0x4100 that suggests SLB miss.
> 
> So to make things a bit less confusing switch to a fake but unique and
> hopefully more helpful numbering. For data SLB misses we report a
> 0x390 trap and for instruction we report 0x490. Compared to 0x380 and
> 0x480 for the actual data & instruction SLB exceptions.
> 
> Also add a C handler that prints a more explicit message. The end
> result is something like:
> 
>   Oops: Unrecoverable SLB miss (MSR[RI]=0), sig: 6 [#3]

This is all good, but allow me to nitpick. Our unrecoverable
exception messages (and other messages too, but those in particular) are
becoming a bit ad-hoc and messy.

It would be nice to go the other way eventually and consolidate them
into one. Would be nice to have a common function that takes regs and
returns the string of the corresponding exception name that makes
these more readable.

>   ...
>   CPU: 0 PID: 1262 Comm: sh Not tainted 4.18.0-rc3-gcc-7.3.1-00098-g7fc2229fb2ab-dirty #9
>   NIP:   LR: c000b9e4 CTR: 
>   REGS: c000f19a3bb0 TRAP: 0490

Unless I'm mistaken, the fake trap number was only because the code
couldn't distinguish between 380 and 480. Now that you do, I think you
can just use them directly rather than 390/490.

Thanks,
Nick

>   ...
>   LR [c000b9e4] system_call+0x5c/0x70
> 
> Signed-off-by: Michael Ellerman 
> ---
>  arch/powerpc/include/asm/asm-prototypes.h | 1 +
>  arch/powerpc/kernel/exceptions-64s.S  | 7 +--
>  arch/powerpc/kernel/traps.c   | 6 ++
>  3 files changed, 12 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/powerpc/include/asm/asm-prototypes.h b/arch/powerpc/include/asm/asm-prototypes.h
> index 7841b8a60657..ffba4a6ee619 100644
> --- a/arch/powerpc/include/asm/asm-prototypes.h
> +++ b/arch/powerpc/include/asm/asm-prototypes.h
> @@ -74,6 +74,7 @@ void facility_unavailable_exception(struct pt_regs *regs);
>  void TAUException(struct pt_regs *regs);
>  void altivec_assist_exception(struct pt_regs *regs);
>  void unrecoverable_exception(struct pt_regs *regs);
> +void unrecoverable_slb_miss(struct pt_regs *regs);
>  void kernel_bad_stack(struct pt_regs *regs);
>  void system_reset_exception(struct pt_regs *regs);
>  void machine_check_exception(struct pt_regs *regs);
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index a6fa85916273..8e1396433eb4 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -743,11 +743,14 @@ END_MMU_FTR_SECTION_IFCLR(MMU_FTR_TYPE_RADIX)
>   b   .
>  
>  EXC_COMMON_BEGIN(unrecov_slb)
> - EXCEPTION_PROLOG_COMMON(0x4100, PACA_EXSLB)
> + EXCEPTION_PROLOG_COMMON(0x390, PACA_EXSLB)
>   RECONCILE_IRQ_STATE(r10, r11)
>   bl  save_nvgprs
> + beq cr6, 1f // cr6.eq is set for a data SLB miss ...
> + li  r10, 0x490  // else fix trap number for instruction SLB miss
> + std r10, _TRAP(r1)
> 1:	addi	r3,r1,STACK_FRAME_OVERHEAD
> - bl  unrecoverable_exception
> + bl  unrecoverable_slb_miss
>   b   1b
>  
>  EXC_COMMON_BEGIN(large_addr_slb)
> diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
> index 0e17dcb48720..0b1724a0b001 100644
> --- a/arch/powerpc/kernel/traps.c
> +++ b/arch/powerpc/kernel/traps.c
> @@ -2061,6 +2061,12 @@ void unrecoverable_exception(struct pt_regs *regs)
>  }
>  NOKPROBE_SYMBOL(unrecoverable_exception);
>  
> +void unrecoverable_slb_miss(struct pt_regs *regs)
> +{
> + die("Unrecoverable SLB miss (MSR[RI]=0)", regs, SIGABRT);
> +}
> +NOKPROBE_SYMBOL(unrecoverable_slb_miss);
> +
>  #if defined(CONFIG_BOOKE_WDT) || defined(CONFIG_40x)
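
Nick's suggestion of a common helper that maps an exception to a readable name could look roughly like the following sketch. It is an illustration under assumptions: trap_to_name() and the table are hypothetical, the helper takes a bare trap number rather than a struct pt_regs, and only a handful of well-known 64-bit Book3S vectors are listed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct trap_name {
	unsigned int trap;
	const char *name;
};

/* Small, deliberately incomplete table of exception vectors. */
static const struct trap_name trap_names[] = {
	{ 0x100, "System Reset" },
	{ 0x200, "Machine Check" },
	{ 0x300, "Data Storage" },
	{ 0x380, "Data SLB miss" },
	{ 0x400, "Instruction Storage" },
	{ 0x480, "Instruction SLB miss" },
	{ 0x900, "Decrementer" },
};

/* Return a readable name for a trap number, or "Unknown". */
static const char *trap_to_name(unsigned int trap)
{
	const size_t n = sizeof(trap_names) / sizeof(trap_names[0]);

	for (size_t i = 0; i < n; i++)
		if (trap_names[i].trap == trap)
			return trap_names[i].name;
	return "Unknown";
}
```

With such a table the oops path could report 0x380 and 0x480 directly, and a made-up value such as 0x4100 would simply print as unknown.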

Re: [PATCH] powerpc/64s: Make rfi_flush_fallback a little more robust

2018-07-31 Thread Nicholas Piggin
On Thu, 26 Jul 2018 22:42:44 +1000
Michael Ellerman  wrote:

> Because rfi_flush_fallback runs immediately before the return to
> userspace it currently runs with the user r1 (stack pointer). This
> means if we oops in there we will report a bad kernel stack pointer in
> the exception entry path, eg:
> 
>   Bad kernel stack pointer 77150e40 at c00023b4
>   Oops: Bad kernel stack pointer, sig: 6 [#1]
>   LE SMP NR_CPUS=32 NUMA PowerNV
>   Modules linked in:
>   CPU: 0 PID: 1246 Comm: klogd Not tainted 4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3 #7
>   NIP:  c00023b4 LR: 10053e00 CTR: 0040
>   REGS: c000fffe7d40 TRAP: 4100   Not tainted  (4.18.0-rc2-gcc-7.3.1-00175-g0443f8a69ba3)
>   MSR:  92803031   CR: 44000442  XER: 2000
>   CFAR: c000bac8 IRQMASK: c000f1e66a80
>   GPR00: 0200 77150e40 7fff93a99900 0020
>   ...
>   NIP [c00023b4] rfi_flush_fallback+0x34/0x80
>   LR [10053e00] 0x10053e00
> 
> Although the NIP tells us where we were, and the TRAP number tells us
> what happened, it would still be nicer if we could report the actual
> exception rather than barfing about the stack pointer.
> 
> We can do that fairly simply by loading the kernel stack pointer on
> entry and restoring the user value before returning. That way we see a
> regular oops such as:
> 
>   Unrecoverable exception 4100 at c000239c
>   Oops: Unrecoverable exception, sig: 6 [#1]
>   LE SMP NR_CPUS=32 NUMA PowerNV
>   Modules linked in:
>   CPU: 0 PID: 1251 Comm: klogd Not tainted 4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty #40
>   NIP:  c000239c LR: 10053e00 CTR: 0040
>   REGS: c000f1e17bb0 TRAP: 4100   Not tainted  (4.18.0-rc3-gcc-7.3.1-00097-g4ebfcac65acd-dirty)
>   MSR:  92803031   CR: 44000442  XER: 2000
>   CFAR: c000bac8 IRQMASK: 0
>   ...
>   NIP [c000239c] rfi_flush_fallback+0x3c/0x80
>   LR [10053e00] 0x10053e00
>   Call Trace:
>   [c000f1e17e30] [c000b9e4] system_call+0x5c/0x70 (unreliable)
> 
> Note this shouldn't make the kernel stack pointer vulnerable to a
> meltdown attack, because it should be flushed from the cache before we
> return to userspace. The user r1 value will be in the cache, because
> we load it in the return path, but that is harmless.
> 
> Signed-off-by: Michael Ellerman 

Yeah that's a lot nicer, thanks.

Reviewed-by: Nicholas Piggin 

> ---
>  arch/powerpc/kernel/exceptions-64s.S | 6 ++
>  1 file changed, 6 insertions(+)
> 
> diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S
> index f7cc12aa3dc6..a6fa85916273 100644
> --- a/arch/powerpc/kernel/exceptions-64s.S
> +++ b/arch/powerpc/kernel/exceptions-64s.S
> @@ -1518,6 +1518,8 @@ TRAMP_REAL_BEGIN(stf_barrier_fallback)
>  TRAMP_REAL_BEGIN(rfi_flush_fallback)
>   SET_SCRATCH0(r13);
>   GET_PACA(r13);
> + std r1,PACA_EXRFI+EX_R12(r13)
> + ld  r1,PACAKSAVE(r13)
>   std r9,PACA_EXRFI+EX_R9(r13)
>   std r10,PACA_EXRFI+EX_R10(r13)
>   std r11,PACA_EXRFI+EX_R11(r13)
> @@ -1552,12 +1554,15 @@ TRAMP_REAL_BEGIN(rfi_flush_fallback)
>   ld  r9,PACA_EXRFI+EX_R9(r13)
>   ld  r10,PACA_EXRFI+EX_R10(r13)
>   ld  r11,PACA_EXRFI+EX_R11(r13)
> + ld  r1,PACA_EXRFI+EX_R12(r13)
>   GET_SCRATCH0(r13);
>   rfid
>  
>  TRAMP_REAL_BEGIN(hrfi_flush_fallback)
>   SET_SCRATCH0(r13);
>   GET_PACA(r13);
> + std r1,PACA_EXRFI+EX_R12(r13)
> + ld  r1,PACAKSAVE(r13)
>   std r9,PACA_EXRFI+EX_R9(r13)
>   std r10,PACA_EXRFI+EX_R10(r13)
>   std r11,PACA_EXRFI+EX_R11(r13)
> @@ -1592,6 +1597,7 @@ TRAMP_REAL_BEGIN(hrfi_flush_fallback)
>   ld  r9,PACA_EXRFI+EX_R9(r13)
>   ld  r10,PACA_EXRFI+EX_R10(r13)
>   ld  r11,PACA_EXRFI+EX_R11(r13)
> + ld  r1,PACA_EXRFI+EX_R12(r13)
>   GET_SCRATCH0(r13);
>   hrfid
>  



Re: [resend] [PATCH 0/3] powerpc/pseries: use H_BLOCK_REMOVE

2018-07-31 Thread Nicholas Piggin
On Fri, 27 Jul 2018 15:51:30 +0200
Laurent Dufour  wrote:

> [Resending so everyone is getting the cover letter]
> 
> On very large systems we can see a soft lockup fire when a process is exiting:
> 
> watchdog: BUG: soft lockup - CPU#851 stuck for 21s! [forkoff:215523]
> Modules linked in: pseries_rng rng_core xfs raid10 vmx_crypto btrfs libcrc32c 
> xor zstd_decompress zstd_compress xxhash lzo_compress raid6_pq crc32c_vpmsum 
> lpfc crc_t10dif crct10dif_generic crct10dif_common dm_multipath scsi_dh_rdac 
> scsi_dh_alua autofs4
> CPU: 851 PID: 215523 Comm: forkoff Not tainted 4.17.0 #1
> NIP:  c00b995c LR: c00b8f64 CTR: aa18
> REGS: c6b0645b7610 TRAP: 0901   Not tainted  (4.17.0)
> MSR:  80010280b033   CR: 22042082  
> XER: 
> CFAR: 006cf8f0 SOFTE: 0 
> GPR00: 0010 c6b0645b7890 c0f99200  
> GPR04: 8e01a5a4de58 400249cf1bfd5480 8e01a5a4de50 400249cf1bfd5480 
> GPR08: 8e01a5a4de48 400249cf1bfd5480 8e01a5a4de40 400249cf1bfd5480 
> GPR12:  c0001e690800 
> NIP [c00b995c] plpar_hcall9+0x44/0x7c
> LR [c00b8f64] pSeries_lpar_flush_hash_range+0x324/0x3d0
> Call Trace:
> [c6b0645b7890] [8e01a5a4dd20] 0x8e01a5a4dd20 (unreliable)
> [c6b0645b7a00] [c006d5b0] flush_hash_range+0x60/0x110
> [c6b0645b7a50] [c0072a2c] __flush_tlb_pending+0x4c/0xd0
> [c6b0645b7a80] [c02eaf44] unmap_page_range+0x984/0xbd0
> [c6b0645b7bc0] [c02eb594] unmap_vmas+0x84/0x100
> [c6b0645b7c10] [c02f8afc] exit_mmap+0xac/0x1f0
> [c6b0645b7cd0] [c00f2638] mmput+0x98/0x1b0
> [c6b0645b7d00] [c00fc9d0] do_exit+0x330/0xc00
> [c6b0645b7dc0] [c00fd384] do_group_exit+0x64/0x100
> [c6b0645b7e00] [c00fd44c] sys_exit_group+0x2c/0x30
> [c6b0645b7e30] [c000b960] system_call+0x58/0x6c
> Instruction dump:
> 6000 f8810028 7ca42b78 7cc53378 7ce63b78 7d074378 7d284b78 7d495378 
> e9410060 e9610068 e9810070 4422 <7d806378> e9810028 f88c f8ac0008
> 
> This happens when PTEs are removed by calling the hypervisor with the
> H_BULK_REMOVE hcall. That call processes up to 4 PTEs but issues a
> tlbie for each PTE it processes. This can lead to a long time spent in
> the hypervisor (sometimes up to 4s) and to a soft lockup being raised,
> because the scheduler is not called in zap_pte_range().
> 
> Since Power7, the hypervisor has provided a newer hcall, H_BLOCK_REMOVE,
> which can process up to 8 PTEs with a single tlbie. By limiting the
> number of tlbies generated, this reduces the time spent invalidating
> the PTEs.

Oh that's a nice feature. I must have an ancient PAPR because I don't
have it. It could be a good project for someone to implement it in KVM
too.

> 
> This hcall requires that the pages are "all within the same naturally
> aligned 8 page virtual address block".
> 
> With this patch series applied, I couldn't see any soft lockup raised on
> the victim LPAR I was running the test on.
> 
> This series is covering both normal pages and huge pages.

Really nice, thanks for working on the problem.

Thanks,
Nick
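
The constraint quoted above, that all pages handed to H_BLOCK_REMOVE lie
within the same naturally aligned 8-page virtual address block, can be
illustrated with a small check. This is only a sketch: the function name and
the use of plain virtual page numbers are assumptions made for illustration
and are not taken from the patch series.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define BLOCK_PAGES 8u	/* pages per naturally aligned block, per the cover letter */

/*
 * Hypothetical helper: return true when every virtual page number in
 * vpns[] falls in the same naturally aligned 8-page block, i.e. all of
 * them agree once the low three bits are masked off.
 */
bool same_aligned_block(const uint64_t *vpns, size_t count)
{
	uint64_t block;
	size_t i;

	if (count == 0 || count > BLOCK_PAGES)
		return false;

	block = vpns[0] & ~(uint64_t)(BLOCK_PAGES - 1);

	for (i = 1; i < count; i++) {
		if ((vpns[i] & ~(uint64_t)(BLOCK_PAGES - 1)) != block)
			return false;
	}
	return true;
}
```

Pages 16, 17 and 23 share a block (16..23), while pages 15 and 16 straddle a
block boundary and would have to be sent in separate calls.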


Re: [PATCH] powerpc/64s/radix: Fix missing global invalidations when removing copro

2018-07-31 Thread Nicholas Piggin
On Tue, 31 Jul 2018 15:24:52 +0200
Frederic Barrat  wrote:

> With the optimizations for TLB invalidation from commit 0cef77c7798a
> ("powerpc/64s/radix: flush remote CPUs out of single-threaded
> mm_cpumask"), the scope of a TLBI (global vs. local) can now be
> influenced by the value of the 'copros' counter of the memory context.
> 
> When calling mm_context_remove_copro(), the 'copros' counter is
> decremented first before flushing. It may have the unintended side
> effect of sending local TLBIs when we explicitly need global
> invalidations in this case. Thus breaking any nMMU user in a bad and
> unpredictable way.
> 
> Fix it by flushing first, before updating the 'copros' counter, so
> that invalidations will be global.
> 
> Fixes: 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of 
> single-threaded mm_cpumask")
> Signed-off-by: Frederic Barrat 

Thanks for catching this, looks good to me.

Reviewed-by: Nicholas Piggin 
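
The ordering problem described in the commit message can be shown with a toy
model. Everything below is a simplified stand-in, not kernel code: the struct,
the function names, and the rule "global if copros > 0" are assumptions that
capture only the dependency between the counter and the flush scope.

```c
#include <stdbool.h>

/* Toy stand-in for the memory context; only the field discussed here. */
struct toy_mm {
	int copros;		/* number of coprocessors attached */
	bool last_flush_global;	/* scope used by the most recent flush */
};

/* The flush scope depends on the counter value at the time of the flush. */
void toy_flush(struct toy_mm *mm)
{
	mm->last_flush_global = (mm->copros > 0);
}

/* Ordering described as broken: decrement first, then flush. */
void toy_remove_copro_buggy(struct toy_mm *mm)
{
	mm->copros--;
	toy_flush(mm);
}

/* Ordering described as the fix: flush first, then decrement. */
void toy_remove_copro_fixed(struct toy_mm *mm)
{
	toy_flush(mm);
	mm->copros--;
}
```

With a single coprocessor attached, the first ordering ends up issuing a local
flush, while the second still sees the counter at 1 and flushes globally.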


Re: Infinite looping observed in __offline_pages

2018-07-31 Thread Rashmica



On 26/07/18 04:11, John Allen wrote:
> Hi All,
>
> Under heavy stress and constant memory hot add/remove, I have observed
> the following loop to occasionally loop infinitely:
>
> mm/memory_hotplug.c:__offline_pages
>
> repeat:
>    /* start memory hot removal */
>    ret = -EINTR;
>    if (signal_pending(current))
>    goto failed_removal;
>
>    cond_resched();
>    lru_add_drain_all();
>    drain_all_pages(zone);
>
>    pfn = scan_movable_pages(start_pfn, end_pfn);
>    if (pfn) { /* We have movable pages */
>    ret = do_migrate_range(pfn, end_pfn);
>    goto repeat;
>    }
>

What is CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE set to for you?

I have also observed this when hot removing and adding memory. However, I
have only seen this when my kernel has
CONFIG_MEMORY_HOTPLUG_DEFAULT_ONLINE=n (when it is set to online
automatically I do not have this issue), so I assumed that I wasn't
onlining the memory properly...

> What appears to be happening in this case is that do_migrate_range
> returns a failure code which is being ignored. The failure is stemming
> from migrate_pages returning "1" which I'm guessing is the result of
> us hitting the following case:
>
> mm/migrate.c: migrate_pages
>
> default:
>     /*
>  * Permanent failure (-EBUSY, -ENOSYS, etc.):
>  * unlike -EAGAIN case, the failed page is
>  * removed from migration page list and not
>  * retried in the next outer loop.
>  */
>     nr_failed++;
>     break;
> }
>
> Does a failure in do_migrate_range indicate that the range is
> unmigratable and the loop in __offline_pages should terminate and goto
> failed_removal? Or should we allow a certain number of retries before we
> give up on migrating the range?
>
> This issue was observed on a ppc64le lpar on a 4.18-rc6 kernel.
>
> -John
>
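
The second option raised in the question, allowing a bounded number of
retries before giving up, could look roughly like the sketch below. It is a
userspace model only: the callback type, the retry limit, and the choice of
-EBUSY are assumptions for illustration, and it says nothing about how the
kernel loop was or should be changed.

```c
#include <errno.h>

#define MAX_MIGRATE_RETRIES 5	/* arbitrary limit chosen for illustration */

/* Hypothetical callback: returns 0 on success, non-zero on failure. */
typedef int (*migrate_fn)(void *ctx);

/*
 * Retry a migration step, but give up with -EBUSY after
 * MAX_MIGRATE_RETRIES failures instead of looping forever.
 */
int offline_with_retry_limit(migrate_fn migrate, void *ctx)
{
	int failures = 0;

	while (failures < MAX_MIGRATE_RETRIES) {
		if (migrate(ctx) == 0)
			return 0;
		failures++;
	}
	return -EBUSY;
}

/* Example callbacks; ctx points at an int that counts the calls made. */
int always_fails(void *ctx)
{
	int *calls = ctx;

	(*calls)++;
	return 1;
}

int succeeds_on_third_call(void *ctx)
{
	int *calls = ctx;

	return ++(*calls) < 3;
}
```

A permanently failing range now terminates after five attempts, while a range
that becomes migratable on the third attempt still succeeds.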



Re: [PATCH v2 2/2] selftests/powerpc: Add more version checks to alignment_handler test

2018-07-31 Thread Andrew Donnellan

On 31/07/18 22:08, Michael Ellerman wrote:

The alignment_handler is documented to only work on Power8/Power9, but
we can make it run on older CPUs by guarding more of the tests with
feature checks.

Signed-off-by: Michael Ellerman 


Looks good to me.

Reviewed-by: Andrew Donnellan 


---
  .../powerpc/alignment/alignment_handler.c  | 67 +++---
  1 file changed, 59 insertions(+), 8 deletions(-)

v2: Don't incorrectly duplicate any of the tests, as noticed by @ajd.

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c 
b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0eddd16af49f..169a8b9719fb 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -49,6 +49,8 @@
  #include 
  #include 
  
+#include 

+
  #include "utils.h"
  
  int bufsize;

@@ -289,6 +291,7 @@ int test_alignment_handler_vsx_206(void)
int rc = 0;
  
  	SKIP_IF(!can_open_fb0());

+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
  
  	printf("VSX: 2.06B\n");

LOAD_VSX_XFORM_TEST(lxvd2x);
@@ -306,6 +309,7 @@ int test_alignment_handler_vsx_207(void)
int rc = 0;
  
  	SKIP_IF(!can_open_fb0());

+   SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_2_07));
  
  	printf("VSX: 2.07B\n");

LOAD_VSX_XFORM_TEST(lxsspx);
@@ -380,7 +384,6 @@ int test_alignment_handler_integer(void)
LOAD_DFORM_TEST(ldu);
LOAD_XFORM_TEST(ldx);
LOAD_XFORM_TEST(ldux);
-   LOAD_XFORM_TEST(ldbrx);
LOAD_DFORM_TEST(lmw);
STORE_DFORM_TEST(stb);
STORE_XFORM_TEST(stbx);
@@ -400,8 +403,23 @@ int test_alignment_handler_integer(void)
STORE_XFORM_TEST(stdx);
STORE_DFORM_TEST(stdu);
STORE_XFORM_TEST(stdux);
-   STORE_XFORM_TEST(stdbrx);
STORE_DFORM_TEST(stmw);
+
+   return rc;
+}
+
+int test_alignment_handler_integer_206(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+   printf("Integer: 2.06\n");
+
+   LOAD_XFORM_TEST(ldbrx);
+   STORE_XFORM_TEST(stdbrx);
+
return rc;
  }
  
@@ -410,6 +428,7 @@ int test_alignment_handler_vmx(void)

int rc = 0;
  
  	SKIP_IF(!can_open_fb0());

+   SKIP_IF(!have_hwcap(PPC_FEATURE_HAS_ALTIVEC));
  
  	printf("VMX\n");

LOAD_VMX_XFORM_TEST(lvx);
@@ -441,20 +460,14 @@ int test_alignment_handler_fp(void)
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);
-   LOAD_FLOAT_DFORM_TEST(lfdp);
-   LOAD_FLOAT_XFORM_TEST(lfdpx);
LOAD_FLOAT_DFORM_TEST(lfdu);
LOAD_FLOAT_XFORM_TEST(lfdux);
LOAD_FLOAT_DFORM_TEST(lfs);
LOAD_FLOAT_XFORM_TEST(lfsx);
LOAD_FLOAT_DFORM_TEST(lfsu);
LOAD_FLOAT_XFORM_TEST(lfsux);
-   LOAD_FLOAT_XFORM_TEST(lfiwzx);
-   LOAD_FLOAT_XFORM_TEST(lfiwax);
STORE_FLOAT_DFORM_TEST(stfd);
STORE_FLOAT_XFORM_TEST(stfdx);
-   STORE_FLOAT_DFORM_TEST(stfdp);
-   STORE_FLOAT_XFORM_TEST(stfdpx);
STORE_FLOAT_DFORM_TEST(stfdu);
STORE_FLOAT_XFORM_TEST(stfdux);
STORE_FLOAT_DFORM_TEST(stfs);
@@ -466,6 +479,38 @@ int test_alignment_handler_fp(void)
return rc;
  }
  
+int test_alignment_handler_fp_205(void)

+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_05));
+
+   printf("Floating point: 2.05\n");
+
+   LOAD_FLOAT_DFORM_TEST(lfdp);
+   LOAD_FLOAT_XFORM_TEST(lfdpx);
+   LOAD_FLOAT_XFORM_TEST(lfiwax);
+   STORE_FLOAT_DFORM_TEST(stfdp);
+   STORE_FLOAT_XFORM_TEST(stfdpx);
+
+   return rc;
+}
+
+int test_alignment_handler_fp_206(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+   printf("Floating point: 2.06\n");
+
+   LOAD_FLOAT_XFORM_TEST(lfiwzx);
+
+   return rc;
+}
+
  void usage(char *prog)
  {
printf("Usage: %s [options]\n", prog);
@@ -513,9 +558,15 @@ int main(int argc, char *argv[])
   "test_alignment_handler_vsx_300");
rc |= test_harness(test_alignment_handler_integer,
   "test_alignment_handler_integer");
+   rc |= test_harness(test_alignment_handler_integer_206,
+  "test_alignment_handler_integer_206");
rc |= test_harness(test_alignment_handler_vmx,
   "test_alignment_handler_vmx");
rc |= test_harness(test_alignment_handler_fp,
   "test_alignment_handler_fp");
+   rc |= test_harness(test_alignment_handler_fp_205,
+  "test_alignment_handler_fp_205");
+   rc |= test_harness(test_alignment_handler_fp_206,
+  "test_alignment_handler_fp_206");
return rc;
  }



--
Andrew Donnellan  OzLabs, ADL Canberra
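
The feature checks added by the patch boil down to testing bits in the
auxiliary vector's hardware capability word. The sketch below shows that idea
in isolation. It does not reproduce the selftest's have_hwcap() helper; the
function names and the SKETCH_ constants are assumptions, with bit values
intended to mirror the powerpc definitions in asm/cputable.h (check them
against that header before relying on them).

```c
#include <stdbool.h>
#include <sys/auxv.h>

/*
 * Bit values intended to mirror PPC_FEATURE_* from asm/cputable.h;
 * repeated here only so the sketch builds on other architectures.
 */
#define SKETCH_PPC_FEATURE_HAS_ALTIVEC	0x10000000UL
#define SKETCH_PPC_FEATURE_ARCH_2_05	0x00001000UL
#define SKETCH_PPC_FEATURE_ARCH_2_06	0x00000100UL

/* Pure helper: true when every bit in feature is set in hwcap. */
bool hwcap_has(unsigned long hwcap, unsigned long feature)
{
	return (hwcap & feature) == feature;
}

/* Query the running system; the result is only meaningful on powerpc. */
bool running_cpu_has(unsigned long feature)
{
	return hwcap_has(getauxval(AT_HWCAP), feature);
}
```

A test guarded this way is skipped on a CPU that reports ISA 2.05 but not
2.06, which is the behaviour the new SKIP_IF() lines aim for.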

Re: [PATCH v5 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-31 Thread Luiz Capitulino
On Tue, 31 Jul 2018 06:01:44 +
Alexandre Ghiti  wrote:

> [CC linux-mm for inclusion in -mm tree] 
> 
> In order to reduce copy/paste of functions across architectures and then
> make riscv hugetlb port (and future ports) simpler and smaller, this
> patchset intends to factorize the numerous hugetlb primitives that are
> defined across all the architectures.

[...]

>  15 files changed, 139 insertions(+), 382 deletions(-)

I imagine you're mostly interested in non-x86 review at this point, but
as this series is looking amazing:

Reviewed-by: Luiz Capitulino 


Re: [PATCH 1/4] treewide: convert ISO_8859-1 text comments to utf-8

2018-07-31 Thread Rob Herring
On Tue, Jul 24, 2018 at 01:13:25PM +0200, Arnd Bergmann wrote:
> Almost all files in the kernel are either plain text or UTF-8
> encoded. A couple however are ISO_8859-1, usually just a few
> characters in C comments, for historic reasons.
> 
> This converts them all to UTF-8 for consistency.
> 
> Signed-off-by: Arnd Bergmann 
> ---
>  .../devicetree/bindings/net/nfc/pn544.txt |   2 +-
>  arch/arm/boot/dts/sun4i-a10-inet97fv2.dts |   2 +-
>  arch/arm/crypto/sha256_glue.c |   2 +-
>  arch/arm/crypto/sha256_neon_glue.c|   4 +-
>  drivers/crypto/vmx/ghashp8-ppc.pl |  12 +-
>  drivers/iio/dac/ltc2632.c |   2 +-
>  drivers/power/reset/ltc2952-poweroff.c|   4 +-
>  kernel/events/callchain.c |   2 +-
>  net/netfilter/ipvs/Kconfig|   8 +-
>  net/netfilter/ipvs/ip_vs_mh.c |   4 +-
>  tools/power/cpupower/po/de.po |  44 +++
>  tools/power/cpupower/po/fr.po | 120 +-
>  12 files changed, 103 insertions(+), 103 deletions(-)
> 
> diff --git a/Documentation/devicetree/bindings/net/nfc/pn544.txt 
> b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> index 538a86f7b2b0..72593f056b75 100644
> --- a/Documentation/devicetree/bindings/net/nfc/pn544.txt
> +++ b/Documentation/devicetree/bindings/net/nfc/pn544.txt
> @@ -2,7 +2,7 @@
>  
>  Required properties:
>  - compatible: Should be "nxp,pn544-i2c".
> -- clock-frequency: I�C work frequency.
> +- clock-frequency: I²C work frequency.

I'd prefer just plain ASCII 'I2C' here, but either way:

Acked-by: Rob Herring 

Rob


[PATCH] powerpc/selftests: Wait all threads to join

2018-07-31 Thread Breno Leitao
Test tm-tmspr might exit before all threads stop executing, because it just
waits for the very last thread to join before proceeding/exiting.

This patch makes sure that all threads that were created will join before
proceeding/exiting.

This patch also guarantees that the number of threads being created is equal
to thread_num.

Signed-off-by: Breno Leitao 
---
 tools/testing/selftests/powerpc/tm/tm-tmspr.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/tools/testing/selftests/powerpc/tm/tm-tmspr.c 
b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
index 2bda81c7bf23..df1d7d4b1c89 100644
--- a/tools/testing/selftests/powerpc/tm/tm-tmspr.c
+++ b/tools/testing/selftests/powerpc/tm/tm-tmspr.c
@@ -98,7 +98,7 @@ void texasr(void *in)
 
 int test_tmspr()
 {
-   pthread_t   thread;
+   pthread_t   *thread;
int thread_num;
unsigned long   i;
 
@@ -107,21 +107,28 @@ int test_tmspr()
/* To cause some context switching */
thread_num = 10 * sysconf(_SC_NPROCESSORS_ONLN);
 
+   thread = malloc(thread_num * sizeof(pthread_t));
+   if (thread == NULL)
+   return EXIT_FAILURE;
+
/* Test TFIAR and TFHAR */
-   for (i = 0 ; i < thread_num ; i += 2){
-   if (pthread_create(&thread, NULL, (void*)tfiar_tfhar, (void *)i))
+   for (i = 0; i < thread_num; i += 2) {
+   if (pthread_create(&thread[i], NULL, (void *)tfiar_tfhar,
+  (void *)i))
return EXIT_FAILURE;
}
-   if (pthread_join(thread, NULL) != 0)
-   return EXIT_FAILURE;
-
/* Test TEXASR */
-   for (i = 0 ; i < thread_num ; i++){
-   if (pthread_create(&thread, NULL, (void*)texasr, (void *)i))
+   for (i = 1; i < thread_num; i += 2) {
+   if (pthread_create(&thread[i], NULL, (void *)texasr, (void *)i))
return EXIT_FAILURE;
}
-   if (pthread_join(thread, NULL) != 0)
-   return EXIT_FAILURE;
+
+   for (i = 0; i < thread_num; i++) {
+   if (pthread_join(thread[i], NULL) != 0)
+   return EXIT_FAILURE;
+   }
+
+   free(thread);
 
if (passed)
return 0;
-- 
2.16.3



ssb: Remove SSB_WARN_ON, SSB_BUG_ON and SSB_DEBUG

2018-07-31 Thread Michael Büsch
Use the standard WARN_ON instead.
If a small kernel is desired, WARN_ON can be disabled globally.

Also remove SSB_DEBUG. Besides WARN_ON it only adds a tiny debug check.
Include this check unconditionally.

Signed-off-by: Michael Buesch 
---

CC-ing Mips and PPC maintainers due to changes in defconfig



 arch/mips/configs/bcm47xx_defconfig |  1 -
 arch/powerpc/configs/wii_defconfig  |  1 -
 drivers/ssb/Kconfig |  9 ---
 drivers/ssb/driver_chipcommon.c |  8 +++---
 drivers/ssb/driver_chipcommon_pmu.c | 10 
 drivers/ssb/driver_gpio.c   |  4 +--
 drivers/ssb/driver_pcicore.c|  6 ++---
 drivers/ssb/embedded.c  | 10 
 drivers/ssb/host_soc.c  | 12 -
 drivers/ssb/main.c  | 38 +
 drivers/ssb/pci.c   | 19 +--
 drivers/ssb/pcmcia.c| 14 +--
 drivers/ssb/scan.c  |  4 +--
 drivers/ssb/sdio.c  | 12 -
 drivers/ssb/ssb_private.h   |  9 ---
 include/linux/ssb/ssb.h |  2 --
 16 files changed, 63 insertions(+), 96 deletions(-)

diff --git a/arch/mips/configs/bcm47xx_defconfig 
b/arch/mips/configs/bcm47xx_defconfig
index fad8e964f14c..ba800a892384 100644
--- a/arch/mips/configs/bcm47xx_defconfig
+++ b/arch/mips/configs/bcm47xx_defconfig
@@ -66,7 +66,6 @@ CONFIG_HW_RANDOM=y
 CONFIG_GPIO_SYSFS=y
 CONFIG_WATCHDOG=y
 CONFIG_BCM47XX_WDT=y
-CONFIG_SSB_DEBUG=y
 CONFIG_SSB_DRIVER_GIGE=y
 CONFIG_BCMA_DRIVER_GMAC_CMN=y
 CONFIG_USB=y
diff --git a/arch/powerpc/configs/wii_defconfig 
b/arch/powerpc/configs/wii_defconfig
index 10940533da71..f5c366b02828 100644
--- a/arch/powerpc/configs/wii_defconfig
+++ b/arch/powerpc/configs/wii_defconfig
@@ -78,7 +78,6 @@ CONFIG_GPIO_HLWD=y
 CONFIG_POWER_RESET=y
 CONFIG_POWER_RESET_GPIO=y
 # CONFIG_HWMON is not set
-CONFIG_SSB_DEBUG=y
 CONFIG_FB=y
 # CONFIG_VGA_CONSOLE is not set
 CONFIG_FRAMEBUFFER_CONSOLE=y
diff --git a/drivers/ssb/Kconfig b/drivers/ssb/Kconfig
index 6c438c819eb9..df30e1323252 100644
--- a/drivers/ssb/Kconfig
+++ b/drivers/ssb/Kconfig
@@ -89,15 +89,6 @@ config SSB_HOST_SOC
 
  If unsure, say N
 
-config SSB_DEBUG
-   bool "SSB debugging"
-   depends on SSB
-   help
- This turns on additional runtime checks and debugging
- messages. Turn this on for SSB troubleshooting.
-
- If unsure, say N
-
 config SSB_SERIAL
bool
depends on SSB
diff --git a/drivers/ssb/driver_chipcommon.c b/drivers/ssb/driver_chipcommon.c
index 48050c6fd847..99a4656d113d 100644
--- a/drivers/ssb/driver_chipcommon.c
+++ b/drivers/ssb/driver_chipcommon.c
@@ -56,7 +56,7 @@ void ssb_chipco_set_clockmode(struct ssb_chipcommon *cc,
 
if (cc->capabilities & SSB_CHIPCO_CAP_PMU)
return; /* PMU controls clockmode, separated function needed */
-   SSB_WARN_ON(ccdev->id.revision >= 20);
+   WARN_ON(ccdev->id.revision >= 20);
 
/* chipcommon cores prior to rev6 don't support dynamic clock control */
if (ccdev->id.revision < 6)
@@ -111,7 +111,7 @@ void ssb_chipco_set_clockmode(struct ssb_chipcommon *cc,
}
break;
default:
-   SSB_WARN_ON(1);
+   WARN_ON(1);
}
 }
 
@@ -164,7 +164,7 @@ static int chipco_pctl_clockfreqlimit(struct ssb_chipcommon 
*cc, int get_max)
divisor = 32;
break;
default:
-   SSB_WARN_ON(1);
+   WARN_ON(1);
}
} else if (cc->dev->id.revision < 10) {
switch (clocksrc) {
@@ -277,7 +277,7 @@ static void calc_fast_powerup_delay(struct ssb_chipcommon 
*cc)
minfreq = chipco_pctl_clockfreqlimit(cc, 0);
pll_on_delay = chipco_read32(cc, SSB_CHIPCO_PLLONDELAY);
tmp = (((pll_on_delay + 2) * 100) + (minfreq - 1)) / minfreq;
-   SSB_WARN_ON(tmp & ~0x);
+   WARN_ON(tmp & ~0x);
 
cc->fast_pwrup_delay = tmp;
 }
diff --git a/drivers/ssb/driver_chipcommon_pmu.c 
b/drivers/ssb/driver_chipcommon_pmu.c
index e28682a53cdf..0f60e90ded26 100644
--- a/drivers/ssb/driver_chipcommon_pmu.c
+++ b/drivers/ssb/driver_chipcommon_pmu.c
@@ -128,7 +128,7 @@ static void ssb_pmu0_pllinit_r0(struct ssb_chipcommon *cc,
  ~(1 << SSB_PMURES_5354_BB_PLL_PU));
break;
default:
-   SSB_WARN_ON(1);
+   WARN_ON(1);
}
for (i = 1500; i; i--) {
tmp = chipco_read32(cc, SSB_CHIPCO_CLKCTLST);
@@ -265,7 +265,7 @@ static void ssb_pmu1_pllinit_r0(struct ssb_chipcommon *cc,
buffer_strength = 0x22;
break;
default:
-   SSB_WARN_ON(1);
+   WARN_ON(1);
}
for (i = 1500; i; i--) {
tmp = chipco_read32(cc, SSB_CHIPCO_CLKCTLST);
@@ -501,7 +501,7 @@ static 

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-31 Thread Benjamin Herrenschmidt
On Tue, 2018-07-31 at 10:30 -0700, Christoph Hellwig wrote:
> > However the question people raise is that DMA API is already full of
> > arch-specific tricks the likes of which are outlined in your post linked
> > above. How is this one much worse?
> 
> None of these warts is visible to the driver, they are all handled in
> the architecture (possibly on a per-bus basis).
> 
> So for virtio we really need to decide if it has one set of behavior
> as specified in the virtio spec, or if it behaves exactly as if it
> was on a PCI bus, or in fact probably both as you lined up.  But no
> magic arch specific behavior inbetween.

The only arch specific behaviour is needed in the case where it doesn't
behave like PCI. In this case, the PCI DMA ops are not suitable, but in
our secure VMs, we still need to make it use swiotlb in order to bounce
through non-secure pages.

It would be nice if "real PCI" was the default but it's not, VMs are
created in "legacy" mode all the time and we don't know at VM creation
time whether it will become a secure VM or not. The way our secure VMs
work is that they start as a normal VM, load a secure "payload" and
call the Ultravisor to "become" secure.

So we're in a bit of a bind here. We need that one-liner optional arch
hook to make virtio use swiotlb in that "IOMMU bypass" case.

Ben.



Re: [PATCH] powerpc/mobility: Fix node detach/rename problem

2018-07-31 Thread Tyrel Datwyler
On 07/30/2018 11:42 PM, Michael Ellerman wrote:
> Tyrel Datwyler  writes:
>> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>>> During LPAR migration, the content of the device tree/sysfs may
>>> be updated including deletion and replacement of nodes in the
>>> tree.  When nodes are added to the internal node structures, they
>>> are appended in FIFO order to a list of nodes maintained by the
>>> OF code APIs.  When nodes are removed from the device tree, they
>>> are marked OF_DETACHED, but not actually deleted from the system
>>> to allow for pointers cached elsewhere in the kernel.  The order
>>> and content of the entries in the list of nodes is not altered,
>>> though.
>>>
>>> During LPAR migration some common nodes are deleted and re-added
>>> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
>>> node lists, the of_attach_node function checks to make sure that
>>> the name + ibm,phandle of the to-be-added data is unique.  As the
>>> previous copy of a re-added node is not modified beyond the addition
>>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>>> notice to the console, (3) renames the to-be-added node to avoid
>>> filename collisions within a directory, and (4) adds entries to
>>> the sysfs/kernfs.
>>
>> So, this patch actually just band-aids over the real problem. This is
>> a long-standing problem with several PFO drivers leaking references.
>> The issue here is that, during the device tree update that follows a
>> migration, the ibm,platform-facilities node and friends below are
>> always deleted and re-added on the destination LPAR, and subsequently
>> the leaked references prevent the device nodes from ever actually
>> being properly cleaned up after detach. This leads to the issue you
>> are observing.

So, it was the end of the day, and I kind of glossed over the issue Michael was 
trying to address with an issue that I remembered had been around for 
a while.

> 
> Leaking references shouldn't affect the node being detached from the
> tree though.

No, but it does prevent it from being freed from sysfs which leads to the sysfs 
entry renaming that happens when another node with the same name is attached.

> 
> See of_detach_node() calling __of_detach_node(), none of that depends on
> the refcount.
> 
> It's only the actual freeing of the node, in of_node_release() that is
> prevented by leaked reference counts.

Right, but if we did refcount correctly there is the new problem that the node 
is freed and now the phandle cache points at freed memory in the case where no 
invalidation is done.

> 
> So I agree we need to do a better job with the reference counting, but I
> don't see how it is causing the problem here

Now, in regards to the phandle caching, somehow I missed that code going into 
OF earlier this year. That should have had at least some discussion from our 
side, given that it is built on the dtc compiler assumption that phandles are 
allocated as 1..n, monotonically increasing. That gives a nice fixed size with 
O(1) lookup time. Phyp doesn't guarantee any such nicely ordered phandles. From 
what I've seen, there is a subset of phandles that decrease monotonically from 
(-1), and then there are a bunch of random values for cpus and IOAs. Thinking 
about it, that might not be a big deal, as we just end up in the case where, 
when two phandles collide, one makes it into the cache and is optimized while 
the other doesn't. On another note, they clearly hit a similar issue during 
overlay attach and remove, and as Rob pointed out, their solution is a full 
cache invalidation followed by rescanning the whole tree to rebuild it. Seeing 
as our dynamic lifecycle is node by node, this definitely adds some overhead.

-Tyrel

> 
> cheers
> 
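
The cache behaviour discussed in this message (a fixed-size table indexed by
masking the phandle, collisions where only one node is cached, and a stale
entry if nothing is invalidated on detach) can be modelled in a few lines.
This is a simplified userspace sketch, not the OF implementation: the table
size, struct, and all function names are assumptions, and the per-entry
invalidation shown is just one possible alternative to a full rebuild.

```c
#include <stddef.h>
#include <stdint.h>

#define TOY_CACHE_ENTRIES 8u	/* power of two, so a mask can index the table */
#define TOY_CACHE_MASK    (TOY_CACHE_ENTRIES - 1)

struct toy_node {
	uint32_t phandle;
};

struct toy_node *toy_cache[TOY_CACHE_ENTRIES];

/* Cache a node in the slot picked by its phandle; a colliding entry is replaced. */
void toy_cache_insert(struct toy_node *np)
{
	toy_cache[np->phandle & TOY_CACHE_MASK] = np;
}

/* Return the cached node only if the slot holds exactly this phandle. */
struct toy_node *toy_cache_lookup(uint32_t phandle)
{
	struct toy_node *np = toy_cache[phandle & TOY_CACHE_MASK];

	return (np && np->phandle == phandle) ? np : NULL;
}

/* Drop a single entry so a later lookup cannot return a detached node. */
void toy_cache_invalidate(uint32_t phandle)
{
	struct toy_node **slot = &toy_cache[phandle & TOY_CACHE_MASK];

	if (*slot && (*slot)->phandle == phandle)
		*slot = NULL;
}
```

Phandles 0xfffffff7 and 0x7 map to the same slot, so only one of them is
served from the cache; the other simply misses and would take the slow path.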



Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)

2018-07-31 Thread Michael Bringmann
On 07/31/2018 02:18 PM, Frank Rowand wrote:
> On 07/31/18 07:17, Rob Herring wrote:
>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman  
>> wrote:
>>>
>>> Hi Rob/Frank,
>>>
>>> I think we might have a problem with the phandle_cache not interacting
>>> well with of_detach_node():
>>
>> Probably needs a similar fix as this commit did for overlays:
>>
>> commit b9952b5218added5577e4a3443969bc20884cea9
>> Author: Frank Rowand 
>> Date:   Thu Jul 12 14:00:07 2018 -0700
>>
>> of: overlay: update phandle cache on overlay apply and remove
>>
>> A comment in the review of the patch adding the phandle cache said that
>> the cache would have to be updated when modules are applied and removed.
>> This patch implements the cache updates.
>>
>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
>> of_find_node_by_phandle()")
>> Reported-by: Alan Tull 
>> Suggested-by: Alan Tull 
>> Signed-off-by: Frank Rowand 
>> Signed-off-by: Rob Herring 
> 
> Agreed.  Sorry about missing the of_detach_node() case.
> 
> 
>> Really what we need here is an "invalidate phandle" function rather
>> than free and re-allocate the whole damn cache.
> 
> The big hammer approach was chosen to avoid the race conditions that
> would otherwise occur.  OF does not have a locking strategy that
> would be able to protect against the races.
> 
> We could maybe implement a slightly smaller hammer by (1) disabling
> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
> the cache.  That is an off the cuff thought - I would have to look
> a little bit more carefully to make sure it would work.
> 
> But I don't see a need to add the complexity of the smaller hammer
> or the bigger hammer of proper locking _unless_ we start seeing that
> the cache is being freed and re-allocated frequently.  For overlays
> I don't expect the high frequency because it happens on a per overlay
> removal basis (not per node removal basis).  For of_detach_node() the
> event _is_ on a per node removal basis.  Michael, do you expect node
> removals to be a frequent event with low latency being important?  If
> so, a rough guess on what the frequency would be?

I am only seeing node removals during startup of the kernel after an
LPAR migration event.  For the tests that I have run between a couple
of P8 platforms, I see about 15 node removals split between 8 l2-caches,
4 hardware-specific property clusters (e.g. ibm,platform-facilities),
and 3 CPUs.

> 
> -Frank

Michael
> 
> 
>> Rob
>>
>>>
>>> Michael Bringmann  writes:
 See below.

 On 07/30/2018 01:31 AM, Michael Ellerman wrote:
> Michael Bringmann  writes:
>
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.
>
> That hasn't been true for several years. The data structure is an n-ary
> tree. What kernel version are you working on?

 Sorry for an error in my description.  I oversimplified based on the
 name of a search iterator.  Let me try to provide a better explanation
 of the problem, here.

 This is the problem.  The PPC mobility code receives RTAS requests to
 delete nodes with platform-/hardware-specific attributes when restarting
 the kernel after a migration.  My example is for migration between a
 P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 
 'ibm,random-v1',
 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
 or others.

 The mobility.c code calls 'of_detach_node' for the nodes and their 
 children.
 This makes calls to detach the properties and to try to remove the 
 associated
 sysfs/kernfs files.

 Then new copies of the same nodes are next provided by the PHYP, local
 copies are built, and a pointer to the 'struct device_node' is passed to
 of_attach_node.  Before the call to of_attach_node, the phandle is 
 initialized
 to 0 when the data structure is alloced.  During the call to 
 of_attach_node,
 it calls __of_attach_node which pulls the actual name and phandle from just
 created sub-properties named something like 'name' and 'ibm,phandle'.

 This is all fine for the first migration.  The problem occurs with the
 second and subsequent migrations when the PHYP on the new system wants to
 replace the same set of nodes again, referenced with the same names and
 phandle values.

>
>> When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>
> Something is 

Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)

2018-07-31 Thread Frank Rowand
On 07/31/18 12:18, Frank Rowand wrote:
> On 07/31/18 07:17, Rob Herring wrote:
>> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman  
>> wrote:
>>>
>>> Hi Rob/Frank,
>>>
>>> I think we might have a problem with the phandle_cache not interacting
>>> well with of_detach_node():
>>
>> Probably needs a similar fix as this commit did for overlays:
>>
>> commit b9952b5218added5577e4a3443969bc20884cea9
>> Author: Frank Rowand 
>> Date:   Thu Jul 12 14:00:07 2018 -0700
>>
>> of: overlay: update phandle cache on overlay apply and remove
>>
>> A comment in the review of the patch adding the phandle cache said that
>> the cache would have to be updated when modules are applied and removed.
>> This patch implements the cache updates.
>>
>> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
>> of_find_node_by_phandle()")
>> Reported-by: Alan Tull 
>> Suggested-by: Alan Tull 
>> Signed-off-by: Frank Rowand 
>> Signed-off-by: Rob Herring 
> 
> Agreed.  Sorry about missing the of_detach_node() case.
> 
> 
>> Really what we need here is an "invalidate phandle" function rather
>> than free and re-allocate the whole damn cache.
> 
> The big hammer approach was chosen to avoid the race conditions that
> would otherwise occur.  OF does not have a locking strategy that
> would be able to protect against the races.
> 
> We could maybe implement a slightly smaller hammer by (1) disabling
> the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
> the cache.  That is an off the cuff thought - I would have to look
> a little bit more carefully to make sure it would work.
> 
> But I don't see a need to add the complexity of the smaller hammer
> or the bigger hammer of proper locking _unless_ we start seeing that
> the cache is being freed and re-allocated frequently.  For overlays
> I don't expect the high frequency because it happens on a per overlay
> removal basis (not per node removal basis).


>  For of_detach_node() the
> event _is_ on a per node removal basis.  Michael, do you expect node
> removals to be a frequent event with low latency being important?  If
> so, a rough guess on what the frequency would be?

I have not looked at how of_detach_node() is used, so it might not be
very different than overlays.  If a group of of_detach_node() calls
are made from a common code location, then the sequence could possibly
be:

   of_free_phandle_cache()

   multiple calls of of_detach_node()

   of_populate_phandle_cache()

-Frank
> 
> -Frank
> 
> 
>> Rob
>>
>>>
>>> Michael Bringmann  writes:
 See below.

 On 07/30/2018 01:31 AM, Michael Ellerman wrote:
> Michael Bringmann  writes:
>
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.
>
> That hasn't been true for several years. The data structure is an n-ary
> tree. What kernel version are you working on?

 Sorry for an error in my description.  I oversimplified based on the
 name of a search iterator.  Let me try to provide a better explanation
 of the problem, here.

 This is the problem.  The PPC mobility code receives RTAS requests to
 delete nodes with platform-/hardware-specific attributes when restarting
 the kernel after a migration.  My example is for migration between a
 P8 Alpine and a P8 Brazos.  Nodes to be deleted may include
 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities',
 'ibm,sym-encryption-v1', or others.

 The mobility.c code calls 'of_detach_node' for the nodes and their
 children.  This makes calls to detach the properties and to try to
 remove the associated sysfs/kernfs files.

 Then new copies of the same nodes are next provided by the PHYP, local
 copies are built, and a pointer to the 'struct device_node' is passed to
 of_attach_node.  Before the call to of_attach_node, the phandle is
 initialized to 0 when the data structure is allocated.  During the call
 to of_attach_node, it calls __of_attach_node which pulls the actual name
 and phandle from just created sub-properties named something like 'name'
 and 'ibm,phandle'.

 This is all fine for the first migration.  The problem occurs with the
 second and subsequent migrations when the PHYP on the new system wants to
 replace the same set of nodes again, referenced with the same names and
 phandle values.

>
>> When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is 

Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)

2018-07-31 Thread Frank Rowand
On 07/31/18 07:17, Rob Herring wrote:
> On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman  wrote:
>>
>> Hi Rob/Frank,
>>
>> I think we might have a problem with the phandle_cache not interacting
>> well with of_detach_node():
> 
> Probably needs a similar fix as this commit did for overlays:
> 
> commit b9952b5218added5577e4a3443969bc20884cea9
> Author: Frank Rowand 
> Date:   Thu Jul 12 14:00:07 2018 -0700
> 
> of: overlay: update phandle cache on overlay apply and remove
> 
> A comment in the review of the patch adding the phandle cache said that
> the cache would have to be updated when modules are applied and removed.
> This patch implements the cache updates.
> 
> Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
> of_find_node_by_phandle()")
> Reported-by: Alan Tull 
> Suggested-by: Alan Tull 
> Signed-off-by: Frank Rowand 
> Signed-off-by: Rob Herring 

Agreed.  Sorry about missing the of_detach_node() case.


> Really what we need here is an "invalidate phandle" function rather
> than free and re-allocate the whole damn cache.

The big hammer approach was chosen to avoid the race conditions that
would otherwise occur.  OF does not have a locking strategy that
would be able to protect against the races.

We could maybe implement a slightly smaller hammer by (1) disabling
the cache, (2) invalidate a phandle entry in the cache, (3) re-enable
the cache.  That is an off the cuff thought - I would have to look
a little bit more carefully to make sure it would work.

But I don't see a need to add the complexity of the smaller hammer
or the bigger hammer of proper locking _unless_ we start seeing that
the cache is being freed and re-allocated frequently.  For overlays
I don't expect the high frequency because it happens on a per overlay
removal basis (not per node removal basis).  For of_detach_node() the
event _is_ on a per node removal basis.  Michael, do you expect node
removals to be a frequent event with low latency being important?  If
so, a rough guess on what the frequency would be?

-Frank


> Rob
> 
>>
>> Michael Bringmann  writes:
>>> See below.
>>>
>>> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
 Michael Bringmann  writes:

> During LPAR migration, the content of the device tree/sysfs may
> be updated including deletion and replacement of nodes in the
> tree.  When nodes are added to the internal node structures, they
> are appended in FIFO order to a list of nodes maintained by the
> OF code APIs.

 That hasn't been true for several years. The data structure is an n-ary
 tree. What kernel version are you working on?
>>>
>>> Sorry for an error in my description.  I oversimplified based on the
>>> name of a search iterator.  Let me try to provide a better explanation
>>> of the problem, here.
>>>
>>> This is the problem.  The PPC mobility code receives RTAS requests to
>>> delete nodes with platform-/hardware-specific attributes when restarting
>>> the kernel after a migration.  My example is for migration between a
>>> P8 Alpine and a P8 Brazos.  Nodes to be deleted may include
>>> 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities',
>>> 'ibm,sym-encryption-v1', or others.
>>>
>>> The mobility.c code calls 'of_detach_node' for the nodes and their children.
>>> This makes calls to detach the properties and to try to remove the
>>> associated sysfs/kernfs files.
>>>
>>> Then new copies of the same nodes are next provided by the PHYP, local
>>> copies are built, and a pointer to the 'struct device_node' is passed to
>>> of_attach_node.  Before the call to of_attach_node, the phandle is
>>> initialized to 0 when the data structure is allocated.  During the call
>>> to of_attach_node, it calls __of_attach_node which pulls the actual name
>>> and phandle from just created sub-properties named something like 'name'
>>> and 'ibm,phandle'.
>>>
>>> This is all fine for the first migration.  The problem occurs with the
>>> second and subsequent migrations when the PHYP on the new system wants to
>>> replace the same set of nodes again, referenced with the same names and
>>> phandle values.
>>>

> When nodes are removed from the device tree, they
> are marked OF_DETACHED, but not actually deleted from the system
> to allow for pointers cached elsewhere in the kernel.  The order
> and content of the entries in the list of nodes is not altered,
> though.

 Something is going wrong if this is actually happening.

 When the node is detached it should be *detached* from the tree of all
 nodes, so it should not be discoverable other than by having an existing
 pointer to it.
>>> On the second and subsequent migrations, the PHYP tells the system
>>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
>>> nodes by its known set of phandle values -- the same 

Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)

2018-07-31 Thread Michael Bringmann
I applied your suggestion with a couple of modifications,
and it looks to have worked for the first 2 migration events.
I am not seeing the errors from repeated migrations.  The
revised patch tested is,

diff --git a/drivers/of/base.c b/drivers/of/base.c
index 466e3c8..8bf64e5 100644
--- a/drivers/of/base.c
+++ b/drivers/of/base.c
@@ -1096,8 +1096,14 @@ struct device_node *of_find_node_by_phandle(phandle handle)
 
 	if (phandle_cache) {
 		if (phandle_cache[masked_handle] &&
-		    handle == phandle_cache[masked_handle]->phandle)
+		    handle == phandle_cache[masked_handle]->phandle) {
 			np = phandle_cache[masked_handle];
+
+			if (of_node_check_flag(np, OF_DETACHED)) {
+				np = NULL;
+				phandle_cache[masked_handle] = NULL;
+			}
+		}
 	}
 
 	if (!np) {

During a conference call this morning, Tyrel expressed concerns
about the use of the phandle_cache on powerpc, at all.  I will
try another build with that feature disabled, but without this
patch.
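
For anyone following along, the effect of the check can be modelled
outside the kernel.  This is only a sketch under stated assumptions:
toy_node, toy_cache, toy_scan and toy_find are invented names, the
"tree" is a flat array of live nodes, and there is no locking.  It shows
a cached entry for a detached node being dropped at lookup time so that
the slow path can find the replacement node with the same phandle:

```c
#include <assert.h>
#include <stddef.h>

/* Toy stand-in for struct device_node; not the kernel structure. */
struct toy_node {
	unsigned int phandle;
	int detached;		/* models the OF_DETACHED flag */
};

#define TOY_CACHE_SIZE 4u	/* power of two so masking works */

static struct toy_node *toy_cache[TOY_CACHE_SIZE];

/* Slow path: scan the live nodes for an attached node with this phandle. */
static struct toy_node *toy_scan(struct toy_node **live, size_t n,
				 unsigned int handle)
{
	size_t i;

	assert(live != NULL);
	for (i = 0; i < n; i++)
		if (live[i] && !live[i]->detached &&
		    live[i]->phandle == handle)
			return live[i];
	return NULL;
}

/* Cached lookup that refuses to return a detached node. */
static struct toy_node *toy_find(struct toy_node **live, size_t n,
				 unsigned int handle)
{
	unsigned int slot = handle & (TOY_CACHE_SIZE - 1);
	struct toy_node *np = NULL;

	if (!handle)
		return NULL;

	if (toy_cache[slot] && toy_cache[slot]->phandle == handle) {
		np = toy_cache[slot];
		if (np->detached) {
			/* Stale entry: forget it and fall back to the scan. */
			np = NULL;
			toy_cache[slot] = NULL;
		}
	}

	if (!np) {
		np = toy_scan(live, n, handle);
		if (np)
			toy_cache[slot] = np;	/* remember the live node */
	}
	return np;
}
```

Without the detached check, the second lookup in a sequence of "find,
detach, attach replacement, find" would keep returning the old node,
which matches the symptom seen on the second migration.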

Michael


On 07/31/2018 01:34 AM, Michael Ellerman wrote:
> Hi Rob/Frank,
> 
> I think we might have a problem with the phandle_cache not interacting
> well with of_detach_node():
> 
> Michael Bringmann  writes:
>> See below.
>>
>> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>>> Michael Bringmann  writes:
>>>
 During LPAR migration, the content of the device tree/sysfs may
 be updated including deletion and replacement of nodes in the
 tree.  When nodes are added to the internal node structures, they
 are appended in FIFO order to a list of nodes maintained by the
 OF code APIs.
>>>
>>> That hasn't been true for several years. The data structure is an n-ary
>>> tree. What kernel version are you working on?
>>
>> Sorry for an error in my description.  I oversimplified based on the
>> name of a search iterator.  Let me try to provide a better explanation
>> of the problem, here.
>>
>> This is the problem.  The PPC mobility code receives RTAS requests to
>> delete nodes with platform-/hardware-specific attributes when restarting
>> the kernel after a migration.  My example is for migration between a
>> P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
>> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
>> or others.
>>
>> The mobility.c code calls 'of_detach_node' for the nodes and their children.
>> This makes calls to detach the properties and to try to remove the associated
>> sysfs/kernfs files.
>>
>> Then new copies of the same nodes are next provided by the PHYP, local
>> copies are built, and a pointer to the 'struct device_node' is passed to
>> of_attach_node.  Before the call to of_attach_node, the phandle is
>> initialized to 0 when the data structure is allocated.  During the call
>> to of_attach_node, it calls __of_attach_node which pulls the actual name
>> and phandle from just created sub-properties named something like 'name'
>> and 'ibm,phandle'.
>>
>> This is all fine for the first migration.  The problem occurs with the
>> second and subsequent migrations when the PHYP on the new system wants to
>> replace the same set of nodes again, referenced with the same names and
>> phandle values.
>>
>>>
 When nodes are removed from the device tree, they
 are marked OF_DETACHED, but not actually deleted from the system
 to allow for pointers cached elsewhere in the kernel.  The order
 and content of the entries in the list of nodes is not altered,
 though.
>>>
>>> Something is going wrong if this is actually happening.
>>>
>>> When the node is detached it should be *detached* from the tree of all
>>> nodes, so it should not be discoverable other than by having an existing
>>> pointer to it.
>> On the second and subsequent migrations, the PHYP tells the system
>> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
>> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
>> nodes by its known set of phandle values -- the same handles used
>> by the PHYP on the source system are known on the target system.
>> The mobility.c code calls of_find_node_by_phandle() with these values
>> and ends up locating the first instance of each node that was added
>> during the original boot, instead of the second instance of each node
>> created after the first migration.  The detach during the second
>> migration fails with errors like,
>>
>> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
>> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
>> [ 4565.030733] CPU: 

Re: [RESEND PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-31 Thread Alex Bounine




On 2018-07-31 11:59 AM, Russell King - ARM Linux wrote:
> For the thread associated with this patch set, a review of a previous
> patch for ARM posted last Tuesday on this subject asked a series of
> questions about the PCI-nature of this.  The review has not been
> responded to.

We are dealing with this now. It is more appropriate to do it now than
before we had the reworked set.

> If it is inappropriate to offer RapidIO for any architecture that
> happens to have PCI, then it is inappropriate to offer it for any
> ARM machine that happens to have PCI.

It is completely appropriate to use RapidIO on any architecture that has
PCI/PCIe using existing PCIe-to-SRIO host bridges. This works well with
Marvell and NVIDIA boards.

The confusion here is caused by the fact that there are ARM and non-ARM
devices that offer on-chip RapidIO host controllers as well as PCIe,
e.g. TI Keystone/KeystoneII, FSL 85xx/86xx, Xilinx and Altera FPGAs with
ARM cores, Cavium on MIPS. In most cases external buses are configurable
and we have to address possible combinations.

I already posted some explanation in response to your earlier comment.

> In light of the lack of explanation on this point so far, I'm naking
> the ARM part of this series for now.

Explanations posted.

> I also think that the HAS_RAPIDIO thing is misleading and needs
> sorting out (as I've mentioned in other emails, including the one
> I refer to above) before rapidio becomes available more widely.

Highly likely it is used right now in a base station of a mobile operator
near you :)

On Tue, Jul 31, 2018 at 10:29:48AM -0400, Alexei Colin wrote:

Resending the patchset from prior submission:
https://lkml.org/lkml/2018/7/30/911

The only change is that the Cc tags in all patches now include the mailing
lists for all affected architectures, and patch 1/6 (which adds the menu
item to RapidIO subsystem Kconfig) is CCed to all maintainers who are
getting this cover letter. The cover letter has been updated with
explanations to points raised in the feedback.



The top-level Kconfig entry for RapidIO subsystem is currently
duplicated in several architecture-specific Kconfig files. This set of
patches does two things:

1. Move the Kconfig menu definition into the RapidIO subsystem and
remove the duplicate definitions from arch Kconfig files.

2. Enable RapidIO Kconfig menu entry for arm and arm64 architectures,
where it was not enabled before. I tested that subsystem and drivers
build successfully for both architectures, and tested that the modules
load on a custom arm64 Qemu model.

For all architectures, RapidIO menu should be offered when either:
(1) The platform has a PCI bus (which host a RapidIO module on the bus).
(2) The platform has a RapidIO IP block (connected to a system bus, e.g.
AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
'config ARCH_*' menu entry for the SoCs that offer the IP block.

Prior to this patchset, different architectures used different criteria:
* powerpc: (1) and (2)
* mips: (1) and (2) after recent commit into next that added (2):
   https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
   fc5d988878942e9b42a4de5204bdd452f3f1ce47
   491ec1553e0075f345fbe476a93775eabcbc40b6
* x86: (1)
* arm,arm64: none (RapidIO menus never offered)

This set of architectures are the ones that implement support for
RapidIO as system bus. On some platforms RapidIO can be the only system
bus available replacing PCI/PCIe.  As it is done now, RapidIO is
configured in "Bus Options" (x86/PPC) or "Bus Support" (ARMs) sub-menu
and from system configuration option it should be kept this way.
Current location of RAPIDIO configuration option is familiar to users of
PowerPC and x86 platforms, and is similarly available in some ARM
manufacturers kernel code trees. (Alex Bounine)

HAS_RAPIDIO is not enabled unconditionally, because HAS_RAPIDIO option
is intended for SOCs that have built in SRIO controllers, like TI
KeyStoneII or FPGAs. Because RapidIO subsystem core is required during
RapidIO port driver initialization, having separate option allows us to
control available build options for RapidIO core and port driver (bool
vs.  tristate) and disable module option if port driver is configured as
built-in. (Alex Bounine)

Responses to feedback from prior submission (thanks for the reviews!):
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html

Changelog:
   * Moved Kconfig entry into RapidIO subsystem instead of duplicating

In the current patchset, I took the approach of adding '|| PCI' to the
depends in the subsystem. I did try the alternative approach mentioned
in the reviews for v1 of this patch, where the subsystem Kconfig does
not add a '|| PCI' and each per-architecture Kconfig has to add a
'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
'select HAS_RAPIDIO'. This works too but requires each architecture's
Kconfig 

Re: [RFC 0/4] Virtio uses DMA API for all devices

2018-07-31 Thread Christoph Hellwig
On Mon, Jul 30, 2018 at 04:26:32PM +0300, Michael S. Tsirkin wrote:
> Real hardware would reuse parts of the interface but by necessity it
> needs to behave slightly differently on some platforms.  However for
> some platforms (such as x86) a PV virtio driver will by luck work with a
> PCI device backend without changes. As these platforms and drivers are
> widely deployed, some people will deploy hardware like that.  Should be
> a non issue as by definition it's transparent to guests.

On some x86.  As soon as you have an iommu or strange PCI root ports
things are going to start breaking apart.

> > And that very much excludes arch-specific (or
> > Xen-specific) overrides.
> 
> We already committed to a xen specific hack but generally I prefer
> devices that describe how they work instead of platforms magically
> guessing, yes.

For legacy reasons I guess we'll have to keep it, but we really need
to avoid adding more junk than this.

> However the question people raise is that DMA API is already full of
> arch-specific tricks the likes of which are outlined in your post linked
> above. How is this one much worse?

None of these warts is visible to the driver, they are all handled in
the architecture (possibly on a per-bus basis).

So for virtio we really need to decide if it has one set of behavior
as specified in the virtio spec, or if it behaves exactly as if it
was on a PCI bus, or in fact probably both as you lined up.  But no
magic arch specific behavior in between.


Re: [RESEND PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-31 Thread Russell King - ARM Linux
For the thread associated with this patch set, a review of a previous
patch for ARM posted last Tuesday on this subject asked a series of
questions about the PCI-nature of this.  The review has not been
responded to.

If it is inappropriate to offer RapidIO for any architecture that
happens to have PCI, then it is inappropriate to offer it for any
ARM machine that happens to have PCI.

In light of the lack of explanation on this point so far, I'm naking
the ARM part of this series for now.

I also think that the HAS_RAPIDIO thing is misleading and needs
sorting out (as I've mentioned in other emails, including the one
I refer to above) before rapidio becomes available more widely.

On Tue, Jul 31, 2018 at 10:29:48AM -0400, Alexei Colin wrote:
> Resending the patchset from prior submission:
> https://lkml.org/lkml/2018/7/30/911
> 
> The only change is that the Cc tags in all patches now include the mailing
> lists for all affected architectures, and patch 1/6 (which adds the menu
> item to RapidIO subsystem Kconfig) is CCed to all maintainers who are
> getting this cover letter. The cover letter has been updated with
> explanations to points raised in the feedback.
> 
> 
> 
> The top-level Kconfig entry for RapidIO subsystem is currently
> duplicated in several architecture-specific Kconfig files. This set of
> patches does two things:
> 
> 1. Move the Kconfig menu definition into the RapidIO subsystem and
> remove the duplicate definitions from arch Kconfig files.
> 
> 2. Enable RapidIO Kconfig menu entry for arm and arm64 architectures,
> where it was not enabled before. I tested that subsystem and drivers
> build successfully for both architectures, and tested that the modules
> load on a custom arm64 Qemu model.
> 
> For all architectures, RapidIO menu should be offered when either:
> (1) The platform has a PCI bus (which host a RapidIO module on the bus).
> (2) The platform has a RapidIO IP block (connected to a system bus, e.g.
> AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
> 'config ARCH_*' menu entry for the SoCs that offer the IP block.
> 
> Prior to this patchset, different architectures used different criteria:
> * powerpc: (1) and (2)
> * mips: (1) and (2) after recent commit into next that added (2):
>   https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
>   fc5d988878942e9b42a4de5204bdd452f3f1ce47
>   491ec1553e0075f345fbe476a93775eabcbc40b6
> * x86: (1)
> * arm,arm64: none (RapidIO menus never offered)
> 
> This set of architectures are the ones that implement support for
> RapidIO as system bus. On some platforms RapidIO can be the only system
> bus available replacing PCI/PCIe.  As it is done now, RapidIO is
> configured in "Bus Options" (x86/PPC) or "Bus Support" (ARMs) sub-menu
> and from system configuration option it should be kept this way.
> Current location of RAPIDIO configuration option is familiar to users of
> PowerPC and x86 platforms, and is similarly available in some ARM
> manufacturers kernel code trees. (Alex Bounine)
> 
> HAS_RAPIDIO is not enabled unconditionally, because HAS_RAPIDIO option
> is intended for SOCs that have built in SRIO controllers, like TI
> KeyStoneII or FPGAs. Because RapidIO subsystem core is required during
> RapidIO port driver initialization, having separate option allows us to
> control available build options for RapidIO core and port driver (bool
> vs.  tristate) and disable module option if port driver is configured as
> built-in. (Alex Bounine)
> 
> Responses to feedback from prior submission (thanks for the reviews!):
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
> http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html
> 
> Changelog:
>   * Moved Kconfig entry into RapidIO subsystem instead of duplicating
> 
> In the current patchset, I took the approach of adding '|| PCI' to the
> depends in the subsystem. I did try the alternative approach mentioned
> in the reviews for v1 of this patch, where the subsystem Kconfig does
> not add a '|| PCI' and each per-architecture Kconfig has to add a
> 'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
> 'select HAS_RAPIDIO'. This works too but requires each architecture's
> Kconfig to add the line for RapidIO (whereas current approach does not
> require that involvement) and also may create a false impression that
> the dependency on PCI is strict.
> 
> We appreciate the suggestion for also selecting the RapidIO subsystem for
> compilation with COMPILE_TEST, but hope to address it in a separate
> patchset, localized to the subsystem, since it will need to change
> depends on all drivers, not just on the top level, and since this
> patch now spans multiple architectures.
> 
> Alexei Colin (6):
>   rapidio: define top Kconfig menu in driver subtree
>   x86: factor out RapidIO Kconfig menu
>   powerpc: factor out RapidIO Kconfig menu entry
>   mips: factor out RapidIO Kconfig entry
>   arm: 

Re: [RESEND PATCH 5/6] arm: enable RapidIO menu in Kconfig

2018-07-31 Thread Russell King - ARM Linux
On Tue, Jul 31, 2018 at 10:29:53AM -0400, Alexei Colin wrote:
> Platforms with a PCI bus will be offered the RapidIO menu since they may
> want support for a RapidIO PCI device. Platforms without a PCI bus
> that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
> in the platform-/machine-specific "config ARCH_*" Kconfig entry.
> 
> Tested that kernel builds for ARM with the RapidIO subsystem and switch
> drivers enabled.
> 
> Cc: Andrew Morton 
> Cc: Russell King 
> Cc: John Paul Walters 
> Cc: linux-ker...@vger.kernel.org
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-m...@linux-mips.org
> Cc: linux-arm-ker...@lists.infradead.org
> Signed-off-by: Alexei Colin 
> ---
>  arch/arm/Kconfig | 2 ++
>  1 file changed, 2 insertions(+)
> 
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index afe350e5e3d9..602a61324890 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -1278,6 +1278,8 @@ config PCI_HOST_ITE8152
>  
>  source "drivers/pci/Kconfig"
>  
> +source "drivers/rapidio/Kconfig"
> +
>  source "drivers/pcmcia/Kconfig"

Why not place the new Kconfig file after pcmcia?  That way, it is in
a consistent position wrt architectures such as powerpc, and it is
also in alphabetical order.

-- 
RMK's Patch system: http://www.armlinux.org.uk/developer/patches/
FTTC broadband for 0.8mile line in suburbia: sync at 13.8Mbps down 630kbps up
According to speedtest.net: 13Mbps down 490kbps up


[PATCH] powerpc/pasemi: Search for PCI root bus by compatible property

2018-07-31 Thread Christian Zigotzky

Just for info: I tested it on my Nemo board today and it works.

-- Christian

On 31 July 2018 at 2:04PM, Michael Ellerman wrote:

Michael Ellerman  writes:

Darren Stevens  writes:


Pasemi arch code finds the root of the PCI-e bus by searching the
device-tree for a node called 'pxp'. But the root bus has a
compatible property of 'pasemi,rootbus' so search for that instead.

Signed-off-by: Darren Stevens 
---

This works on the Amigaone X1000, I don't know if this method of
finding the pci bus was there because of earlier firmwares.

Does anyone have another pasemi board they can test this on?

The last time I plugged mine in it popped the power supply and took out
power to half the office :) - I haven't had a chance to try it since.

Actually, I remembered I have a device tree lying around from an electra.

It has:

   [I] home:pxp@0,8000(7)(I)> lsprop name compatible
   name "pxp"
   compatible   "pasemi,rootbus"
"pa-pxp"


So it looks like the patch would work fine on it at least.
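
To make the difference between the two lookups concrete, here is a small
userspace sketch.  It is not the kernel's of_find_compatible_node(): the
sketch_* names and the flat node array are invented for illustration, and
it compares strings exactly with strcmp() as a simplification.  It matches
on any entry of a node's compatible list instead of on the node name:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Minimal stand-in for a device-tree node; not struct device_node. */
struct sketch_node {
	const char *name;
	const char *const *compatible;	/* NULL-terminated list, or NULL */
};

/* Return 1 if any entry of the node's compatible list equals compat. */
static int sketch_is_compatible(const struct sketch_node *np,
				const char *compat)
{
	size_t i;

	if (!np->compatible)
		return 0;
	for (i = 0; np->compatible[i]; i++)
		if (!strcmp(np->compatible[i], compat))
			return 1;
	return 0;
}

/* Return the first node whose compatible list contains compat. */
static const struct sketch_node *
sketch_find_compatible(const struct sketch_node *nodes, size_t n,
		       const char *compat)
{
	size_t i;

	assert(compat != NULL);
	for (i = 0; i < n; i++)
		if (sketch_is_compatible(&nodes[i], compat))
			return &nodes[i];
	return NULL;
}
```

With the electra properties quoted above, a search for "pasemi,rootbus"
would find the node named "pxp", and so would a search for "pa-pxp", so
the lookup no longer depends on what the firmware calls the node.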

cheers


diff --git a/arch/powerpc/platforms/pasemi/pci.c b/arch/powerpc/platforms/pasemi/pci.c
index c7c8607..be62380 100644
--- a/arch/powerpc/platforms/pasemi/pci.c
+++ b/arch/powerpc/platforms/pasemi/pci.c
@@ -216,6 +216,7 @@ static int __init pas_add_bridge(struct device_node *dev)
  void __init pas_pci_init(void)
  {
 struct device_node *np, *root;
+   int res;
  
     root = of_find_node_by_path("/");
     if (!root) {
@@ -226,11 +227,11 @@ void __init pas_pci_init(void)
  
 pci_set_flags(PCI_SCAN_ALL_PCIE_DEVS);
  
-   for (np = NULL; (np = of_get_next_child(root, np)) != NULL;)
-   if (np->name && !strcmp(np->name, "pxp") && !pas_add_bridge(np))
-   of_node_get(np);
-
-   of_node_put(root);
+   np = of_find_compatible_node(root, NULL, "pasemi,rootbus");
+   if (np) {
+   res = pas_add_bridge(np);
+   of_node_put(np);
+   }
  }
  
  void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)





Re: [RESEND PATCH 1/6] rapidio: define top Kconfig menu in driver subtree

2018-07-31 Thread Russell King - ARM Linux
On Tue, Jul 31, 2018 at 10:29:49AM -0400, Alexei Colin wrote:
> The top-level Kconfig entry for RapidIO subsystem is currently
> duplicated in several architecture-specific Kconfigs. This
> commit re-defines it in the driver subtree, and subsequent
> commits, one per architecture, will remove the duplicated
> definitions from respective per-architecture Kconfigs.
> 
> Cc: Andrew Morton 
> Cc: John Paul Walters 
> Cc: Catalin Marinas 
> Cc: Russell King 
> Cc: Arnd Bergmann 
> Cc: Will Deacon 
> Cc: Ralf Baechle 
> Cc: Paul Burton 
> Cc: Alexander Sverdlin 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Thomas Gleixner 
> Cc: Peter Anvin 
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-m...@linux-mips.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Alexei Colin 
> ---
>  drivers/rapidio/Kconfig | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/rapidio/Kconfig b/drivers/rapidio/Kconfig
> index d6d2f20c4597..98e301847584 100644
> --- a/drivers/rapidio/Kconfig
> +++ b/drivers/rapidio/Kconfig
> @@ -1,6 +1,21 @@
>  #
>  # RapidIO configuration
>  #
> +
> +config HAS_RAPIDIO
> + bool
> + default n

There's no need to specify this default - the default default defaults to
'n' anyway, so "default n" just respecifies what's already the default.
(next time I'll try to add more "default"s into that! ;) )



Re: [RESEND PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-31 Thread Alex Bounine

Acked-by: Alexandre Bounine 


On 2018-07-31 10:29 AM, Alexei Colin wrote:

Resending the patchset from prior submission:
https://lkml.org/lkml/2018/7/30/911

The only change is that the Cc tags in all patches now include the mailing
lists for all affected architectures, and patch 1/6 (which adds the menu
item to RapidIO subsystem Kconfig) is CCed to all maintainers who are
getting this cover letter. The cover letter has been updated with
explanations to points raised in the feedback.



The top-level Kconfig entry for RapidIO subsystem is currently
duplicated in several architecture-specific Kconfig files. This set of
patches does two things:

1. Move the Kconfig menu definition into the RapidIO subsystem and
remove the duplicate definitions from arch Kconfig files.

2. Enable RapidIO Kconfig menu entry for arm and arm64 architectures,
where it was not enabled before. I tested that subsystem and drivers
build successfully for both architectures, and tested that the modules
load on a custom arm64 Qemu model.

For all architectures, RapidIO menu should be offered when either:
(1) The platform has a PCI bus (which host a RapidIO module on the bus).
(2) The platform has a RapidIO IP block (connected to a system bus, e.g.
AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
'config ARCH_*' menu entry for the SoCs that offer the IP block.

Prior to this patchset, different architectures used different criteria:
* powerpc: (1) and (2)
* mips: (1) and (2) after recent commit into next that added (2):
   https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
   fc5d988878942e9b42a4de5204bdd452f3f1ce47
   491ec1553e0075f345fbe476a93775eabcbc40b6
* x86: (1)
* arm,arm64: none (RapidIO menus never offered)

This set of architectures are the ones that implement support for
RapidIO as system bus. On some platforms RapidIO can be the only system
bus available replacing PCI/PCIe.  As it is done now, RapidIO is
configured in "Bus Options" (x86/PPC) or "Bus Support" (ARMs) sub-menu
and from system configuration option it should be kept this way.
Current location of RAPIDIO configuration option is familiar to users of
PowerPC and x86 platforms, and is similarly available in some ARM
manufacturers kernel code trees. (Alex Bounine)

HAS_RAPIDIO is not enabled unconditionally, because HAS_RAPIDIO option
is intended for SOCs that have built in SRIO controllers, like TI
KeyStoneII or FPGAs. Because RapidIO subsystem core is required during
RapidIO port driver initialization, having separate option allows us to
control available build options for RapidIO core and port driver (bool
vs.  tristate) and disable module option if port driver is configured as
built-in. (Alex Bounine)

Responses to feedback from prior submission (thanks for the reviews!):
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html

Changelog:
   * Moved Kconfig entry into RapidIO subsystem instead of duplicating

In the current patchset, I took the approach of adding '|| PCI' to the
depends in the subsystem. I did try the alternative approach mentioned
in the reviews for v1 of this patch, where the subsystem Kconfig does
not add a '|| PCI' and each per-architecture Kconfig has to add a
'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
'select HAS_RAPIDIO'. This works too but requires each architecture's
Kconfig to add the line for RapidIO (whereas current approach does not
require that involvement) and also may create a false impression that
the dependency on PCI is strict.

We appreciate the suggestion for also selecting the RapidIO subsystem for
compilation with COMPILE_TEST, but hope to address it in a separate
patchset, localized to the subsystem, since it will need to change
depends on all drivers, not just on the top level, and since this
patch now spans multiple architectures.

Alexei Colin (6):
   rapidio: define top Kconfig menu in driver subtree
   x86: factor out RapidIO Kconfig menu
   powerpc: factor out RapidIO Kconfig menu entry
   mips: factor out RapidIO Kconfig entry
   arm: enable RapidIO menu in Kconfig
   arm64: enable RapidIO menu in Kconfig

  arch/arm/Kconfig|  2 ++
  arch/arm64/Kconfig  |  2 ++
  arch/mips/Kconfig   | 11 ---
  arch/powerpc/Kconfig| 13 +
  arch/x86/Kconfig|  8 
  drivers/rapidio/Kconfig | 15 +++
  6 files changed, 20 insertions(+), 31 deletions(-)



Re: [RESEND PATCH 1/6] rapidio: define top Kconfig menu in driver subtree

2018-07-31 Thread Alexander Sverdlin
Hi!

On 31/07/18 16:29, Alexei Colin wrote:
> The top-level Kconfig entry for RapidIO subsystem is currently
> duplicated in several architecture-specific Kconfigs. This
> commit re-defines it in the driver subtree, and subsequent
> commits, one per architecture, will remove the duplicated
> definitions from respective per-architecture Kconfigs.
> 
> Cc: Andrew Morton 
> Cc: John Paul Walters 
> Cc: Catalin Marinas 
> Cc: Russell King 
> Cc: Arnd Bergmann 
> Cc: Will Deacon 
> Cc: Ralf Baechle 
> Cc: Paul Burton 
> Cc: Alexander Sverdlin 
> Cc: Benjamin Herrenschmidt 
> Cc: Paul Mackerras 
> Cc: Thomas Gleixner 
> Cc: Peter Anvin 
> Cc: x...@kernel.org
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-m...@linux-mips.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Alexei Colin 

Reviewed-by: Alexander Sverdlin 

> ---
>  drivers/rapidio/Kconfig | 15 +++
>  1 file changed, 15 insertions(+)
> 
> diff --git a/drivers/rapidio/Kconfig b/drivers/rapidio/Kconfig
> index d6d2f20c4597..98e301847584 100644
> --- a/drivers/rapidio/Kconfig
> +++ b/drivers/rapidio/Kconfig
> @@ -1,6 +1,21 @@
>  #
>  # RapidIO configuration
>  #
> +
> +config HAS_RAPIDIO
> + bool
> + default n
> +
> +config RAPIDIO
> + tristate "RapidIO support"
> + depends on HAS_RAPIDIO || PCI
> + help
> +   This feature enables support for RapidIO high-performance
> +   packet-switched interconnect.
> +
> +   If you say Y here, the kernel will include drivers and
> +   infrastructure code to support RapidIO interconnect devices.
> +
>  source "drivers/rapidio/devices/Kconfig"
>  
>  config RAPIDIO_DISC_TIMEOUT

-- 
Best regards,
Alexander Sverdlin.


[PATCH v3 5/9] powerpc/traps: Print signal name for unhandled signals

2018-07-31 Thread Murilo Opsfelder Araujo
This adds a human-readable name in the unhandled signal message.

Before this patch, a page fault looked like:

  pandafault[6303]: unhandled signal 11 at 17d0 nip 161c lr 
7fff93c55100 code 2 in pandafault[1000+1]

After this patch, a page fault looks like:

  pandafault[6352]: segfault (11) at 13a2a09f8 nip 13a2a086c lr 7fffb63e5100 
code 2 in pandafault[13a2a+1]

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 39 +++--
 1 file changed, 37 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 1c4f06fca370..e71f12bca146 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -96,6 +96,41 @@ EXPORT_SYMBOL(__debugger_fault_handler);
 #define TM_DEBUG(x...) do { } while(0)
 #endif
 
+static const char *signames[SIGRTMIN + 1] = {
+   "UNKNOWN",
+   "SIGHUP",   // 1
+   "SIGINT",   // 2
+   "SIGQUIT",  // 3
+   "illegal instruction",  // 4 = SIGILL
+   "unhandled trap",   // 5 = SIGTRAP
+   "SIGABRT",  // 6 = SIGIOT
+   "bus error",// 7 = SIGBUS
+   "floating point exception", // 8 = SIGFPE
+   "SIGKILL",  // 9
+   "SIGUSR1",  // 10
+   "segfault", // 11 = SIGSEGV
+   "SIGUSR2",  // 12
+   "SIGPIPE",  // 13
+   "SIGALRM",  // 14
+   "SIGTERM",  // 15
+   "SIGSTKFLT",// 16
+   "SIGCHLD",  // 17
+   "SIGCONT",  // 18
+   "SIGSTOP",  // 19
+   "SIGTSTP",  // 20
+   "SIGTTIN",  // 21
+   "SIGTTOU",  // 22
+   "SIGURG",   // 23
+   "SIGXCPU",  // 24
+   "SIGXFSZ",  // 25
+   "SIGVTALRM",// 26
+   "SIGPROF",  // 27
+   "SIGWINCH", // 28
+   "SIGIO",// 29 = SIGPOLL = SIGLOST
+   "SIGPWR",   // 30
+   "SIGSYS",   // 31 = SIGUNUSED
+};
+
 /*
  * Trap & Exception support
  */
@@ -314,8 +349,8 @@ static void show_signal_msg(int signr, struct pt_regs 
*regs, int code,
if (!unhandled_signal(current, signr))
return;
 
-   pr_info("%s[%d]: unhandled signal %d at %lx nip %lx lr %lx code %x",
-   current->comm, current->pid, signr,
+   pr_info("%s[%d]: %s (%d) at %lx nip %lx lr %lx code %x",
+   current->comm, current->pid, signames[signr], signr,
addr, regs->nip, regs->link, code);
 
print_vma_addr(KERN_CONT " in ", regs->nip);
-- 
2.17.1
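
The patch indexes the name table directly with the signal number. As a
minimal userspace sketch of the same lookup with a bounds check added, the
following may help; sig_names and sig_desc are hypothetical names used only
for illustration, the table is deliberately partial, and none of this is
the code from the patch.

```c
#include <stddef.h>

/* Hypothetical, partial table: only the trap-related signals are named. */
static const char *const sig_names[] = {
    [4]  = "illegal instruction",      /* SIGILL */
    [5]  = "unhandled trap",           /* SIGTRAP */
    [7]  = "bus error",                /* SIGBUS */
    [8]  = "floating point exception", /* SIGFPE */
    [11] = "segfault",                 /* SIGSEGV */
};

#define SIG_NAMES_LEN (sizeof(sig_names) / sizeof(sig_names[0]))

/* Return a printable description; never NULL and never out of bounds. */
const char *sig_desc(int signr)
{
    if (signr < 0 || (size_t)signr >= SIG_NAMES_LEN || !sig_names[signr])
        return "unknown signal";
    return sig_names[signr];
}
```

A caller would pass sig_desc(signr) where the patch passes signames[signr],
so a real-time or otherwise unnamed signal number prints a fallback string
instead of reading past the table.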



[PATCH v3 9/9] powerpc/traps: Add line prefix in show_instructions()

2018-07-31 Thread Murilo Opsfelder Araujo
Remove "Instruction dump:" line by adding a prefix to display current->comm
and current->pid, along with the instructions dump.

The prefix can serve as a glue that links the instructions dump to its
originator, allowing messages to be interleaved in the logs.

Before this patch, a page fault looked like:

  pandafault[10524]: segfault (11) at 17d0 nip 161c lr 7fffbd295100 
code 2 in pandafault[1000+1]
  Instruction dump:
  4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
  392988d0 f93f0020 e93f0020 39400048 <9949> 3920 7d234b78 383f0040

After this patch, it looks like:

  pandafault[10850]: segfault (11) at 17d0 nip 161c lr 7fff9f3e5100 
code 2 in pandafault[1000+1]
  pandafault[10850]: code: 4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 
f821ffc1 7c3f0b78 3d22fffe
  pandafault[10850]: code: 392988d0 f93f0020 e93f0020 39400048 <9949> 
3920 7d234b78 383f0040

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/process.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e78799a8855a..d12143e7d8f9 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1265,16 +1265,19 @@ static int instructions_to_print = 16;
 void show_instructions(struct pt_regs *regs)
 {
int i;
+   const char *prefix = KERN_INFO "%s[%d]: code: ";
unsigned long pc = regs->nip - (instructions_to_print * 3 / 4 *
sizeof(int));
 
-   printk("Instruction dump:");
+   printk(prefix, current->comm, current->pid);
 
for (i = 0; i < instructions_to_print; i++) {
int instr;
 
-   if (!(i % 8))
+   if (!(i % 8) && (i > 0)) {
pr_cont("\n");
+   printk(prefix, current->comm, current->pid);
+   }
 
 #if !defined(CONFIG_BOOKE)
/* If executing with the IMMU off, adjust pc rather
-- 
2.17.1
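
The per-line prefix described above can be sketched in userspace as
follows. This is only an illustration under stated assumptions:
format_insn_dump and dump_start are hypothetical helpers, the output goes
to a buffer rather than the kernel log, and the <> marking of the faulting
word follows the sample output rather than the exact kernel code.

```c
#include <stddef.h>
#include <stdio.h>

/* First address to dump: three quarters of the window precede nip,
 * mirroring the expression used in show_instructions(). */
unsigned long dump_start(unsigned long nip, int count)
{
    return nip - (unsigned long)(count * 3 / 4) * sizeof(unsigned int);
}

/*
 * Format count words into out, per_line words per line, starting every
 * line with "comm[pid]: code:". The word at fault_idx is wrapped in <>.
 * Returns the number of bytes written (without the NUL), or -1 if out is
 * too small.
 */
int format_insn_dump(char *out, size_t outsz, const char *comm, int pid,
                     const unsigned int *insns, int count,
                     int per_line, int fault_idx)
{
    size_t used = 0;

    for (int i = 0; i < count; i++) {
        int n;

        if (i % per_line == 0) {
            /* End the previous line (if any), then repeat the prefix. */
            n = snprintf(out + used, outsz - used, "%s%s[%d]: code:",
                         i ? "\n" : "", comm, pid);
            if (n < 0 || (size_t)n >= outsz - used)
                return -1;
            used += (size_t)n;
        }

        if (i == fault_idx)
            n = snprintf(out + used, outsz - used, " <%08x>", insns[i]);
        else
            n = snprintf(out + used, outsz - used, " %08x", insns[i]);
        if (n < 0 || (size_t)n >= outsz - used)
            return -1;
        used += (size_t)n;
    }
    return (int)used;
}
```

With a window of 16 words, dump_start() places the first word 12
instructions before nip, so the faulting instruction would land at index
12 of the dump.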



[PATCH v3 8/9] powerpc/traps: Show instructions on exceptions

2018-07-31 Thread Murilo Opsfelder Araujo
Call show_instructions() in arch/powerpc/kernel/traps.c to dump the
instructions at the faulting location, which is useful for debugging.

Before this patch, an unhandled signal message looked like:

  pandafault[10524]: segfault (11) at 17d0 nip 161c lr 7fffbd295100 
code 2 in pandafault[1000+1]

After this patch, it looks like:

  pandafault[10524]: segfault (11) at 17d0 nip 161c lr 7fffbd295100 
code 2 in pandafault[1000+1]
  Instruction dump:
  4bfffeec 4bfffee8 3c401002 38427f00 fbe1fff8 f821ffc1 7c3f0b78 3d22fffe
  392988d0 f93f0020 e93f0020 39400048 <9949> 3920 7d234b78 383f0040

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index e71f12bca146..b27f3f71d745 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -70,6 +70,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #if defined(CONFIG_DEBUGGER) || defined(CONFIG_KEXEC_CORE)
 int (*__debugger)(struct pt_regs *regs) __read_mostly;
@@ -356,6 +357,8 @@ static void show_signal_msg(int signr, struct pt_regs 
*regs, int code,
print_vma_addr(KERN_CONT " in ", regs->nip);
 
pr_cont("\n");
+
+   show_instructions(regs);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1



[PATCH v3 7/9] powerpc: Add stacktrace.h header

2018-07-31 Thread Murilo Opsfelder Araujo
Move show_instructions() declaration to
arch/powerpc/include/asm/stacktrace.h and include asm/stacktrace.h in
arch/powerpc/kernel/process.c, which contains the implementation.

This allows show_instructions() to be called from, for example,
show_signal_msg().

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/include/asm/stacktrace.h | 13 +
 arch/powerpc/kernel/process.c |  3 ++-
 2 files changed, 15 insertions(+), 1 deletion(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

diff --git a/arch/powerpc/include/asm/stacktrace.h 
b/arch/powerpc/include/asm/stacktrace.h
new file mode 100644
index ..217ebc52ff97
--- /dev/null
+++ b/arch/powerpc/include/asm/stacktrace.h
@@ -0,0 +1,13 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+/*
+ * Stack trace functions.
+ *
+ * Copyright 2018, Murilo Opsfelder Araujo, IBM Corporation.
+ */
+
+#ifndef _ASM_POWERPC_STACKTRACE_H
+#define _ASM_POWERPC_STACKTRACE_H
+
+void show_instructions(struct pt_regs *regs);
+
+#endif /* _ASM_POWERPC_STACKTRACE_H */
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 50094c44bf79..e78799a8855a 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -52,6 +52,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1261,7 +1262,7 @@ struct task_struct *__switch_to(struct task_struct *prev,
 
 static int instructions_to_print = 16;
 
-static void show_instructions(struct pt_regs *regs)
+void show_instructions(struct pt_regs *regs)
 {
int i;
unsigned long pc = regs->nip - (instructions_to_print * 3 / 4 *
-- 
2.17.1



[PATCH v3 6/9] powerpc: Do not call __kernel_text_address() in show_instructions()

2018-07-31 Thread Murilo Opsfelder Araujo
Modify show_instructions() not to call __kernel_text_address(), allowing
userspace instruction dump.  probe_kernel_address(), which returns -EFAULT
if something goes wrong, is still being called.

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/process.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index e9533b4d2f08..50094c44bf79 100644
--- a/arch/powerpc/kernel/process.c
+++ b/arch/powerpc/kernel/process.c
@@ -1283,8 +1283,7 @@ static void show_instructions(struct pt_regs *regs)
pc = (unsigned long)phys_to_virt(pc);
 #endif
 
-   if (!__kernel_text_address(pc) ||
-probe_kernel_address((unsigned int __user *)pc, instr)) {
+   if (probe_kernel_address((unsigned int __user *)pc, instr)) {
pr_cont(" ");
} else {
if (regs->nip == pc)
-- 
2.17.1



[PATCH v3 4/9] powerpc/traps: Print VMA for unhandled signals

2018-07-31 Thread Murilo Opsfelder Araujo
This adds the VMA address to the message printed for unhandled signals,
similar to what other architectures, like x86, print.

Before this patch, a page fault looked like:

  pandafault[61470]: unhandled signal 11 at 17d0 nip 161c lr 
7fff8d185100 code 2

After this patch, a page fault looks like:

  pandafault[6303]: unhandled signal 11 at 17d0 nip 161c lr 
7fff93c55100 code 2 in pandafault[1000+1]

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 6 +-
 1 file changed, 5 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index fd4e0648a2d2..1c4f06fca370 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -314,9 +314,13 @@ static void show_signal_msg(int signr, struct pt_regs 
*regs, int code,
if (!unhandled_signal(current, signr))
return;
 
-   pr_info("%s[%d]: unhandled signal %d at %lx nip %lx lr %lx code %x\n",
+   pr_info("%s[%d]: unhandled signal %d at %lx nip %lx lr %lx code %x",
current->comm, current->pid, signr,
addr, regs->nip, regs->link, code);
+
+   print_vma_addr(KERN_CONT " in ", regs->nip);
+
+   pr_cont("\n");
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1
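
As a rough userspace sketch of the " in name[start+size]" suffix shown in
the sample output, consider the following. struct mapping and
format_vma_suffix are hypothetical stand-ins, not the kernel's
vm_area_struct or print_vma_addr(), and the addresses used in the note
below are made up for illustration.

```c
#include <stdio.h>

/* Hypothetical description of one mapping. */
struct mapping {
    const char *name;
    unsigned long start;
    unsigned long end; /* one past the last byte */
};

/*
 * Write " in name[start+size]" into out when addr lies inside the
 * mapping. Returns the snprintf() result, or 0 (and an empty string)
 * when addr is not covered.
 */
int format_vma_suffix(char *out, size_t outsz, const struct mapping *m,
                      unsigned long addr)
{
    if (addr < m->start || addr >= m->end) {
        if (outsz)
            out[0] = '\0';
        return 0;
    }
    return snprintf(out, outsz, " in %s[%lx+%lx]",
                    m->name, m->start, m->end - m->start);
}
```

For a mapping named "pandafault" covering 0x10000000 to 0x10010000, an
address of 0x1000161c would yield " in pandafault[10000000+10000]".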



[PATCH v3 3/9] powerpc/traps: Use %lx format in show_signal_msg()

2018-07-31 Thread Murilo Opsfelder Araujo
Use %lx format to print registers.  This avoids having two different
formats and avoids checking for MSR_64BIT, improving readability of the
function.

Even though we could have used %px, which is functionally equivalent to %lx
as per Documentation/core-api/printk-formats.rst, it is not semantically
correct because the data printed are not pointers.  And using %px requires
casting data to (void *).

Besides that, %lx matches the format used in show_regs().

Before this patch:

  pandafault[4808]: unhandled signal 11 at 1718 nip 
1574 lr 7fff935e7a6c code 2

After this patch:

  pandafault[4732]: unhandled signal 11 at 1718 nip 1574 lr 
7fff86697a6c code 2

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 11 +++
 1 file changed, 3 insertions(+), 8 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 4faab4705774..fd4e0648a2d2 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -311,17 +311,12 @@ static bool show_unhandled_signals_ratelimited(void)
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
unsigned long addr)
 {
-   const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %08lx nip %08lx lr %08lx code %x\n";
-   const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %016lx nip %016lx lr %016lx code %x\n";
-
if (!unhandled_signal(current, signr))
return;
 
-   printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
+   pr_info("%s[%d]: unhandled signal %d at %lx nip %lx lr %lx code %x\n",
+   current->comm, current->pid, signr,
+   addr, regs->nip, regs->link, code);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-- 
2.17.1
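
To make the effect of dropping the zero-padded formats concrete, here is a
small userspace sketch that renders one value with each of the three
format strings involved. The helper names are hypothetical; only the
format strings themselves come from the patch.

```c
#include <stdio.h>

/* Old 32-bit format: always eight hex digits. */
void fmt_padded32(char *buf, size_t n, unsigned long v)
{
    snprintf(buf, n, "%08lx", v);
}

/* Old 64-bit format: always sixteen hex digits. */
void fmt_padded64(char *buf, size_t n, unsigned long v)
{
    snprintf(buf, n, "%016lx", v);
}

/* New format: no padding, so leading zeros disappear. */
void fmt_plain(char *buf, size_t n, unsigned long v)
{
    snprintf(buf, n, "%lx", v);
}
```

This matches the cover letter's 32-bit example, where "lr 0fe31ef4" before
the series becomes "lr fe31ef4" after it.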



[PATCH v3 2/9] powerpc/traps: Return early in show_signal_msg()

2018-07-31 Thread Murilo Opsfelder Araujo
Modify the logic of show_signal_msg() to return early, if possible.
Replace printk_ratelimited() by printk() and a default rate limit burst to
limit displaying unhandled signals messages.

The main reason for this change is to improve the readability of the
function.  The conditions to display the message were coupled together in
one single `if` statement.

Splitting the rate limit check out of show_signal_msg() makes it easier
for the caller to decide whether it wants to respect a printk rate limit
or not.

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 21 +++--
 1 file changed, 15 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index cbd3dc365193..4faab4705774 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -301,6 +301,13 @@ void user_single_step_siginfo(struct task_struct *tsk,
info->si_addr = (void __user *)regs->nip;
 }
 
+static bool show_unhandled_signals_ratelimited(void)
+{
+   static DEFINE_RATELIMIT_STATE(rs, DEFAULT_RATELIMIT_INTERVAL,
+ DEFAULT_RATELIMIT_BURST);
+   return show_unhandled_signals && __ratelimit(&rs);
+}
+
 static void show_signal_msg(int signr, struct pt_regs *regs, int code,
unsigned long addr)
 {
@@ -309,11 +316,12 @@ static void show_signal_msg(int signr, struct pt_regs 
*regs, int code,
const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
"at %016lx nip %016lx lr %016lx code %x\n";
 
-   if (show_unhandled_signals && unhandled_signal(current, signr)) {
-   printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
-   }
+   if (!unhandled_signal(current, signr))
+   return;
+
+   printk(regs->msr & MSR_64BIT ? fmt64 : fmt32,
+  current->comm, current->pid, signr,
+  addr, regs->nip, regs->link, code);
 }
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
@@ -326,7 +334,8 @@ void _exception_pkey(int signr, struct pt_regs *regs, int 
code,
return;
}
 
-   show_signal_msg(signr, regs, code, addr);
+   if (show_unhandled_signals_ratelimited())
+   show_signal_msg(signr, regs, code, addr);
 
if (arch_irqs_disabled() && !arch_irq_disabled_regs(regs))
local_irq_enable();
-- 
2.17.1
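
The caller-side gating introduced here can be illustrated with a
simplified userspace model of an interval/burst limiter. This is a sketch
under stated assumptions, not the kernel's __ratelimit() implementation:
struct rl_state, rl_allow and show_unhandled_allowed are hypothetical
names, and time is passed in explicitly so the behaviour is deterministic.

```c
#include <stdbool.h>

/* Hypothetical stand-in for a rate-limit state. */
struct rl_state {
    long interval; /* window length, in caller-defined time units */
    int burst;     /* messages allowed per window */
    long begin;    /* start of the current window */
    int printed;   /* messages emitted in the current window */
    bool started;  /* false until the first call */
};

/* Return true if the caller may print at time now. */
bool rl_allow(struct rl_state *rs, long now)
{
    if (!rs->started || now - rs->begin >= rs->interval) {
        /* Open a new window and reset the budget. */
        rs->begin = now;
        rs->printed = 0;
        rs->started = true;
    }
    if (rs->printed >= rs->burst)
        return false;
    rs->printed++;
    return true;
}

/* Same shape as the patch: check the switch first, then the limiter. */
bool show_unhandled_allowed(bool show_unhandled_signals,
                            struct rl_state *rs, long now)
{
    return show_unhandled_signals && rl_allow(rs, now);
}
```

Because the switch is tested first, a disabled show_unhandled_signals
never consumes any of the burst budget, which is also true of the
short-circuit expression in the patch.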



[PATCH v3 1/9] powerpc/traps: Print unhandled signals in a separate function

2018-07-31 Thread Murilo Opsfelder Araujo
Isolate the logic of printing unhandled signals out of _exception_pkey().
No functional change, only code rearrangement.

Signed-off-by: Murilo Opsfelder Araujo 
---
 arch/powerpc/kernel/traps.c | 26 --
 1 file changed, 16 insertions(+), 10 deletions(-)

diff --git a/arch/powerpc/kernel/traps.c b/arch/powerpc/kernel/traps.c
index 0e17dcb48720..cbd3dc365193 100644
--- a/arch/powerpc/kernel/traps.c
+++ b/arch/powerpc/kernel/traps.c
@@ -301,26 +301,32 @@ void user_single_step_siginfo(struct task_struct *tsk,
info->si_addr = (void __user *)regs->nip;
 }
 
+static void show_signal_msg(int signr, struct pt_regs *regs, int code,
+   unsigned long addr)
+{
+   const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
+   "at %08lx nip %08lx lr %08lx code %x\n";
+   const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
+   "at %016lx nip %016lx lr %016lx code %x\n";
+
+   if (show_unhandled_signals && unhandled_signal(current, signr)) {
+   printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
+  current->comm, current->pid, signr,
+  addr, regs->nip, regs->link, code);
+   }
+}
 
 void _exception_pkey(int signr, struct pt_regs *regs, int code,
-   unsigned long addr, int key)
+unsigned long addr, int key)
 {
siginfo_t info;
-   const char fmt32[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %08lx nip %08lx lr %08lx code %x\n";
-   const char fmt64[] = KERN_INFO "%s[%d]: unhandled signal %d " \
-   "at %016lx nip %016lx lr %016lx code %x\n";
 
if (!user_mode(regs)) {
die("Exception in kernel mode", regs, signr);
return;
}
 
-   if (show_unhandled_signals && unhandled_signal(current, signr)) {
-   printk_ratelimited(regs->msr & MSR_64BIT ? fmt64 : fmt32,
-  current->comm, current->pid, signr,
-  addr, regs->nip, regs->link, code);
-   }
+   show_signal_msg(signr, regs, code, addr);
 
if (arch_irqs_disabled() && !arch_irq_disabled_regs(regs))
local_irq_enable();
-- 
2.17.1



[PATCH v3 0/9] powerpc: Modernize unhandled signals message

2018-07-31 Thread Murilo Opsfelder Araujo
Hi, everyone.

This series was inspired by the need to modernize and display more
informative messages about unhandled signals.

The "unhandled signal NN" message is not very informative.  We thought it
would be helpful to add a human-readable message describing what the signal
number means, print the VMA address, and dump the instructions.

Before this series:

  pandafault32[4724]: unhandled signal 11 at 15e4 nip 1444 lr 0fe31ef4 
code 2

  pandafault64[4725]: unhandled signal 11 at 1718 nip 
1574 lr 7fff7faa7a6c code 2

After this series:

  pandafault32[4753]: segfault (11) at 15e4 nip 1444 lr fe31ef4 code 2 
in pandafault32[1000+1]
  pandafault32[4753]: code: 4b3c 6000 6042 4b30 9421ffd0 
93e1002c 7c3f0b78 3d201000
  pandafault32[4753]: code: 392905e4 913f0008 813f0008 39400048 <9949> 
3920 7d234b78 397f0030

  pandafault64[4754]: segfault (11) at 1718 nip 1574 lr 7fffb0007a6c 
code 2 in pandafault64[1000+1]
  pandafault64[4754]: code: e8010010 7c0803a6 4bfffef4 4bfffef0 fbe1fff8 
f821ffb1 7c3f0b78 3d22fffe
  pandafault64[4754]: code: 39298818 f93f0030 e93f0030 39400048 <9949> 
3920 7d234b78 383f0050

Link to v2:

  https://lore.kernel.org/lkml/20180727145811.12334-1-muri...@linux.ibm.com/

v2..v3:

  - Dropped patch 3
  - Updated patch 4 to use %lx

Cheers!

Murilo Opsfelder Araujo (9):
  powerpc/traps: Print unhandled signals in a separate function
  powerpc/traps: Return early in show_signal_msg()
  powerpc/traps: Use %lx format in show_signal_msg()
  powerpc/traps: Print VMA for unhandled signals
  powerpc/traps: Print signal name for unhandled signals
  powerpc: Do not call __kernel_text_address() in show_instructions()
  powerpc: Add stacktrace.h header
  powerpc/traps: Show instructions on exceptions
  powerpc/traps: Add line prefix in show_instructions()

 arch/powerpc/include/asm/stacktrace.h | 13 +
 arch/powerpc/kernel/process.c | 13 +++--
 arch/powerpc/kernel/traps.c   | 72 +++
 3 files changed, 83 insertions(+), 15 deletions(-)
 create mode 100644 arch/powerpc/include/asm/stacktrace.h

-- 
2.17.1



[RESEND PATCH 5/6] arm: enable RapidIO menu in Kconfig

2018-07-31 Thread Alexei Colin
Platforms with a PCI bus will be offered the RapidIO menu since they may
want support for a RapidIO PCI device. Platforms without a PCI bus
that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
in the platform-/machine-specific "config ARCH_*" Kconfig entry.

Tested that kernel builds for ARM with the RapidIO subsystem and switch
drivers enabled.

Cc: Andrew Morton 
Cc: Russell King 
Cc: John Paul Walters 
Cc: linux-ker...@vger.kernel.org
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@linux-mips.org
Cc: linux-arm-ker...@lists.infradead.org
Signed-off-by: Alexei Colin 
---
 arch/arm/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index afe350e5e3d9..602a61324890 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -1278,6 +1278,8 @@ config PCI_HOST_ITE8152
 
 source "drivers/pci/Kconfig"
 
+source "drivers/rapidio/Kconfig"
+
 source "drivers/pcmcia/Kconfig"
 
 endmenu
-- 
2.18.0



[RESEND PATCH 6/6] arm64: enable RapidIO menu in Kconfig

2018-07-31 Thread Alexei Colin
Platforms with a PCI bus will be offered the RapidIO menu since they may
want support for a RapidIO PCI device. Platforms without a PCI bus
that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
in the platform-/machine-specific "config ARCH_*" Kconfig entry.

Tested that kernel builds for arm64 with RapidIO subsystem and
switch drivers enabled, also that the modules load successfully
on a custom Aarch64 Qemu model.

Cc: Andrew Morton 
Cc: Russell King 
Cc: John Paul Walters 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@linux-mips.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Alexei Colin 
---
 arch/arm64/Kconfig | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index a8f0c74e6f7f..5e8cf90505ec 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -308,6 +308,8 @@ config PCI_SYSCALL
 
 source "drivers/pci/Kconfig"
 
+source "drivers/rapidio/Kconfig"
+
 endmenu
 
 menu "Kernel Features"
-- 
2.18.0



[RESEND PATCH 4/6] mips: factor out RapidIO Kconfig entry

2018-07-31 Thread Alexei Colin
The menu entry is now defined in the rapidio subtree.

Platforms with a PCI bus will be offered the RapidIO menu since they may
want support for a RapidIO PCI device. Platforms without a PCI bus
that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
in the platform-/machine-specific "config ARCH_*" Kconfig entry.

Cc: Andrew Morton 
Cc: Russell King 
Cc: Alexander Sverdlin 
Cc: John Paul Walters 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@linux-mips.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Alexei Colin 
---
 arch/mips/Kconfig | 11 ---
 1 file changed, 11 deletions(-)

diff --git a/arch/mips/Kconfig b/arch/mips/Kconfig
index 10256056647c..92b9262ee731 100644
--- a/arch/mips/Kconfig
+++ b/arch/mips/Kconfig
@@ -3106,17 +3106,6 @@ config ZONE_DMA32
 
 source "drivers/pcmcia/Kconfig"
 
-config HAS_RAPIDIO
-   bool
-   default n
-
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on HAS_RAPIDIO || PCI
-   help
- If you say Y here, the kernel will include drivers and
- infrastructure code to support RapidIO interconnect devices.
-
 source "drivers/rapidio/Kconfig"
 
 endmenu
-- 
2.18.0



[RESEND PATCH 3/6] powerpc: factor out RapidIO Kconfig menu entry

2018-07-31 Thread Alexei Colin
The menu entry is now defined in the rapidio subtree.  Also, re-order
the bus menu so that the platform-specific RapidIO controller appears
after the entry for the RapidIO subsystem.

Platforms with a PCI bus will be offered the RapidIO menu since they may
want support for a RapidIO PCI device. Platforms without a PCI bus
that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
in the platform-/machine-specific "config ARCH_*" Kconfig entry.

Cc: Andrew Morton 
Cc: Russell King 
Cc: John Paul Walters 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@linux-mips.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Alexei Colin 
---
 arch/powerpc/Kconfig | 13 +
 1 file changed, 1 insertion(+), 12 deletions(-)

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 25d005af0a5b..17ea8a5f90a0 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -993,16 +993,7 @@ source "drivers/pci/Kconfig"
 
 source "drivers/pcmcia/Kconfig"
 
-config HAS_RAPIDIO
-   bool
-   default n
-
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on HAS_RAPIDIO || PCI
-   help
- If you say Y here, the kernel will include drivers and
- infrastructure code to support RapidIO interconnect devices.
+source "drivers/rapidio/Kconfig"
 
 config FSL_RIO
bool "Freescale Embedded SRIO Controller support"
@@ -1012,8 +1003,6 @@ config FSL_RIO
  Include support for RapidIO controller on Freescale embedded
  processors (MPC8548, MPC8641, etc).
 
-source "drivers/rapidio/Kconfig"
-
 endmenu
 
 config NONSTATIC_KERNEL
-- 
2.18.0



[RESEND PATCH 2/6] x86: factor out RapidIO Kconfig menu

2018-07-31 Thread Alexei Colin
The menu entry is now defined in the rapidio subtree.

Cc: Andrew Morton 
Cc: Russell King 
Cc: John Paul Walters 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@linux-mips.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Alexei Colin 
---
 arch/x86/Kconfig | 8 
 1 file changed, 8 deletions(-)

diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 27fce7e84357..6f057000e486 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -2828,14 +2828,6 @@ config AMD_NB
 
 source "drivers/pcmcia/Kconfig"
 
-config RAPIDIO
-   tristate "RapidIO support"
-   depends on PCI
-   default n
-   help
- If enabled this option will include drivers and the core
- infrastructure code to support RapidIO interconnect devices.
-
 source "drivers/rapidio/Kconfig"
 
 config X86_SYSFB
-- 
2.18.0



[RESEND PATCH 1/6] rapidio: define top Kconfig menu in driver subtree

2018-07-31 Thread Alexei Colin
The top-level Kconfig entry for RapidIO subsystem is currently
duplicated in several architecture-specific Kconfigs. This
commit re-defines it in the driver subtree, and subsequent
commits, one per architecture, will remove the duplicated
definitions from respective per-architecture Kconfigs.

Cc: Andrew Morton 
Cc: John Paul Walters 
Cc: Catalin Marinas 
Cc: Russell King 
Cc: Arnd Bergmann 
Cc: Will Deacon 
Cc: Ralf Baechle 
Cc: Paul Burton 
Cc: Alexander Sverdlin 
Cc: Benjamin Herrenschmidt 
Cc: Paul Mackerras 
Cc: Thomas Gleixner 
Cc: Peter Anvin 
Cc: x...@kernel.org
Cc: linuxppc-dev@lists.ozlabs.org
Cc: linux-m...@linux-mips.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-ker...@vger.kernel.org
Signed-off-by: Alexei Colin 
---
 drivers/rapidio/Kconfig | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/rapidio/Kconfig b/drivers/rapidio/Kconfig
index d6d2f20c4597..98e301847584 100644
--- a/drivers/rapidio/Kconfig
+++ b/drivers/rapidio/Kconfig
@@ -1,6 +1,21 @@
 #
 # RapidIO configuration
 #
+
+config HAS_RAPIDIO
+   bool
+   default n
+
+config RAPIDIO
+   tristate "RapidIO support"
+   depends on HAS_RAPIDIO || PCI
+   help
+ This feature enables support for RapidIO high-performance
+ packet-switched interconnect.
+
+ If you say Y here, the kernel will include drivers and
+ infrastructure code to support RapidIO interconnect devices.
+
 source "drivers/rapidio/devices/Kconfig"
 
 config RAPIDIO_DISC_TIMEOUT
-- 
2.18.0



[RESEND PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-31 Thread Alexei Colin
Resending the patchset from prior submission:
https://lkml.org/lkml/2018/7/30/911

The only changes are that the Cc tags in all patches now include the
mailing lists for all affected architectures, and that patch 1/6 (which
adds the menu item to the RapidIO subsystem Kconfig) is Cc'ed to all
maintainers who are getting this cover letter. The cover letter has been
updated with explanations of the points raised in the feedback.



The top-level Kconfig entry for RapidIO subsystem is currently
duplicated in several architecture-specific Kconfig files. This set of
patches does two things:

1. Move the Kconfig menu definition into the RapidIO subsystem and
remove the duplicate definitions from arch Kconfig files.

2. Enable RapidIO Kconfig menu entry for arm and arm64 architectures,
where it was not enabled before. I tested that subsystem and drivers
build successfully for both architectures, and tested that the modules
load on a custom arm64 Qemu model.

For all architectures, RapidIO menu should be offered when either:
(1) The platform has a PCI bus (which may host a RapidIO module on the bus).
(2) The platform has a RapidIO IP block (connected to a system bus, e.g.
AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
'config ARCH_*' menu entry for the SoCs that offer the IP block.

Prior to this patchset, different architectures used different criteria:
* powerpc: (1) and (2)
* mips: (1) and (2) after recent commit into next that added (2):
  https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
  fc5d988878942e9b42a4de5204bdd452f3f1ce47
  491ec1553e0075f345fbe476a93775eabcbc40b6
* x86: (1)
* arm,arm64: none (RapidIO menus never offered)

This set of architectures are the ones that implement support for
RapidIO as system bus. On some platforms RapidIO can be the only system
bus available replacing PCI/PCIe.  As it is done now, RapidIO is
configured in "Bus Options" (x86/PPC) or "Bus Support" (ARMs) sub-menu
and from system configuration option it should be kept this way.
Current location of RAPIDIO configuration option is familiar to users of
PowerPC and x86 platforms, and is similarly available in some ARM
manufacturers kernel code trees. (Alex Bounine)

HAS_RAPIDIO is not enabled unconditionally, because HAS_RAPIDIO option
is intended for SOCs that have built in SRIO controllers, like TI
KeyStoneII or FPGAs. Because RapidIO subsystem core is required during
RapidIO port driver initialization, having separate option allows us to
control available build options for RapidIO core and port driver (bool
vs.  tristate) and disable module option if port driver is configured as
built-in. (Alex Bounine)

Responses to feedback from prior submission (thanks for the reviews!):
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html

Changelog:
  * Moved Kconfig entry into RapidIO subsystem instead of duplicating

In the current patchset, I took the approach of adding '|| PCI' to the
depends in the subsystem. I did try the alternative approach mentioned
in the reviews for v1 of this patch, where the subsystem Kconfig does
not add a '|| PCI' and each per-architecture Kconfig has to add a
'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
'select HAS_RAPIDIO'. This works too but requires each architecture's
Kconfig to add the line for RapidIO (whereas current approach does not
require that involvement) and also may create a false impression that
the dependency on PCI is strict.

We appreciate the suggestion for also selecting the RapidIO subsystem for
compilation with COMPILE_TEST, but hope to address it in a separate
patchset, localized to the subsystem, since it will need to change
depends on all drivers, not just on the top level, and since this
patch now spans multiple architectures.

Alexei Colin (6):
  rapidio: define top Kconfig menu in driver subtree
  x86: factor out RapidIO Kconfig menu
  powerpc: factor out RapidIO Kconfig menu entry
  mips: factor out RapidIO Kconfig entry
  arm: enable RapidIO menu in Kconfig
  arm64: enable RapidIO menu in Kconfig

 arch/arm/Kconfig|  2 ++
 arch/arm64/Kconfig  |  2 ++
 arch/mips/Kconfig   | 11 ---
 arch/powerpc/Kconfig| 13 +
 arch/x86/Kconfig|  8 
 drivers/rapidio/Kconfig | 15 +++
 6 files changed, 20 insertions(+), 31 deletions(-)

-- 
2.18.0



Re: [RFC PATCH kernel 0/5] powerpc/P9/vfio: Pass through NVIDIA Tesla V100

2018-07-31 Thread Alex Williamson
On Tue, 31 Jul 2018 14:03:35 +1000
Alexey Kardashevskiy  wrote:

> On 31/07/2018 02:29, Alex Williamson wrote:
> > On Mon, 30 Jul 2018 18:58:49 +1000
> > Alexey Kardashevskiy  wrote:
> >> After some local discussions, it was pointed out that force disabling
> >> nvlinks won't bring us much: for an nvlink to work, both sides need to
> >> enable it, so malicious guests cannot penetrate good ones (or a host)
> >> unless a good guest enables the link, which won't happen with a well
> >> behaving guest. And if two guests become malicious, they can still only
> >> harm each other, which they can also do in other ways such as the network.
> >> This is different from PCIe: once a PCIe link is unavoidably enabled, a well
> >> behaving device cannot firewall itself from peers, as it is now up to the
> >> upstream bridge(s) to decide the routing; with nvlink2, a GPU still
> >> has means to protect itself, just like a guest can run "firewalld" for
> >> the network.
> >>
> >> Although it would be a nice feature to have an extra barrier between
> >> GPUs, is inability to block the links in hypervisor still a blocker for
> >> V100 pass through?  
> > 
> > How is the NVLink configured by the guest, is it 'on'/'off' or are
> > specific routes configured?   
> 
> The GPU-GPU links need not be blocked and need to be enabled
> (==trained) by a driver in the guest. There are no routes between GPUs
> in the NVLink fabric; these are direct links with just a switch on each
> side, and both switches need to be on for a link to work.

Ok, but there is at least the possibility of multiple direct links per
GPU, the very first diagram I find of NVlink shows 8 interconnected
GPUs:

https://www.nvidia.com/en-us/data-center/nvlink/

So if each switch enables one direct, point to point link, how does the
guest know which links to open for which peer device?  And of course
since we can't see the spec, a security audit is at best hearsay :-\
 
> The GPU-CPU links - the GPU bit is the same switch, the CPU NVlink state
> is controlled via the emulated PCI bridges which I pass through together
> with the GPU.

So there's a special emulated switch, is that how the guest knows which
GPUs it can enable NVLinks to?

> > If the former, then isn't a non-malicious
> > guest still susceptible to a malicious guest?  
> 
> A non-malicious guest needs to turn its switch on for a link to a GPU
> which belongs to a malicious guest.

Actual security, or obfuscation, will we ever know...

> > If the latter, how is
> > routing configured by the guest given that the guest view of the
> > topology doesn't match physical hardware?  Are these routes
> > deconfigured by device reset?  Are they part of the save/restore
> > state?  Thanks,  

Still curious what happens to these routes on reset.  Can a later user
of a GPU inherit a device where the links are already enabled?  Thanks,

Alex


Re: [PATCH 0/6] rapidio: move Kconfig menu definition to subsystem

2018-07-31 Thread Alex Bounine

Acked-by: Alexandre Bounine 

On 2018-07-30 06:50 PM, Alexei Colin wrote:

The top-level Kconfig entry for RapidIO subsystem is currently
duplicated in several architecture-specific Kconfig files. This set of
patches does two things:

1. Move the Kconfig menu definition into the RapidIO subsystem and
remove the duplicate definitions from arch Kconfig files.

2. Enable RapidIO Kconfig menu entry for arm and arm64 architectures,
where it was not enabled before. I tested that subsystem and drivers
build successfully for both architectures, and tested that the modules
load on a custom arm64 Qemu model.

For all architectures, RapidIO menu should be offered when either:
(1) The platform has a PCI bus (which can host a RapidIO module on the bus).
(2) The platform has a RapidIO IP block (connected to a system bus, e.g.
AXI on ARM). In this case, 'select HAS_RAPIDIO' should be added to the
'config ARCH_*' menu entry for the SoCs that offer the IP block.

Prior to this patchset, different architectures used different criteria:
* powerpc: (1) and (2)
* mips: (1) and (2) after recent commit into next that added (2):
   https://www.linux-mips.org/archives/linux-mips/2018-07/msg00596.html
   fc5d988878942e9b42a4de5204bdd452f3f1ce47
   491ec1553e0075f345fbe476a93775eabcbc40b6
* x86: (1)
* arm,arm64: none (RapidIO menus never offered)

Responses to feedback from prior submission (thanks for the reviews!):
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593347.html
http://lists.infradead.org/pipermail/linux-arm-kernel/2018-July/593349.html

Changelog:
   * Moved Kconfig entry into RapidIO subsystem instead of duplicating

In the current patchset, I took the approach of adding '|| PCI' to the
depends in the subsystem. I did try the alternative approach mentioned
in the reviews for v1 of this patch, where the subsystem Kconfig does
not add a '|| PCI' and each per-architecture Kconfig has to add a
'select HAS_RAPIDIO if PCI' and SoCs with IP blocks have to also add
'select HAS_RAPIDIO'. This works too but requires each architecture's
Kconfig to add the line for RapidIO (whereas current approach does not
require that involvement) and also may create a false impression that
the dependency on PCI is strict.

We appreciate the suggestion for also selecting the RapidIO subsystem for
compilation with COMPILE_TEST, but hope to address it in a separate
patchset, localized to the subsystem, since it will need to change
depends on all drivers, not just on the top level, and since this
patch now spans multiple architectures.


Alexei Colin (6):
   rapidio: define top Kconfig menu in driver subtree
   x86: factor out RapidIO Kconfig menu
   powerpc: factor out RapidIO Kconfig menu entry
   mips: factor out RapidIO Kconfig entry
   arm: enable RapidIO menu in Kconfig
   arm64: enable RapidIO menu in Kconfig

  arch/arm/Kconfig|  2 ++
  arch/arm64/Kconfig  |  2 ++
  arch/mips/Kconfig   | 11 ---
  arch/powerpc/Kconfig| 13 +
  arch/x86/Kconfig|  8 
  drivers/rapidio/Kconfig | 15 +++
  6 files changed, 20 insertions(+), 31 deletions(-)



Re: phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)

2018-07-31 Thread Rob Herring
On Tue, Jul 31, 2018 at 12:34 AM Michael Ellerman  wrote:
>
> Hi Rob/Frank,
>
> I think we might have a problem with the phandle_cache not interacting
> well with of_detach_node():

Probably needs a similar fix as this commit did for overlays:

commit b9952b5218added5577e4a3443969bc20884cea9
Author: Frank Rowand 
Date:   Thu Jul 12 14:00:07 2018 -0700

of: overlay: update phandle cache on overlay apply and remove

A comment in the review of the patch adding the phandle cache said that
the cache would have to be updated when modules are applied and removed.
This patch implements the cache updates.

Fixes: 0b3ce78e90fc ("of: cache phandle nodes to reduce cost of
of_find_node_by_phandle()")
Reported-by: Alan Tull 
Suggested-by: Alan Tull 
Signed-off-by: Frank Rowand 
Signed-off-by: Rob Herring 


Really what we need here is an "invalidate phandle" function rather
than free and re-allocate the whole damn cache.
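
To make the "invalidate phandle" idea concrete, here is a rough userspace
sketch of a direct-mapped cache with a per-entry invalidate. It is not the
kernel's implementation: struct node, cache_insert(), cache_lookup() and
cache_invalidate() are hypothetical names, and locking is left out entirely.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the kernel's struct device_node. */
struct node {
    uint32_t phandle;
    const char *name;
};

#define CACHE_ENTRIES 16u               /* must be a power of two */
#define CACHE_MASK    (CACHE_ENTRIES - 1u)

static struct node *cache[CACHE_ENTRIES];

/* Remember a node so later lookups by phandle can skip the tree walk. */
void cache_insert(struct node *np)
{
    if (np && np->phandle)
        cache[np->phandle & CACHE_MASK] = np;
}

/* Return the cached node for this phandle, or NULL on a miss. */
struct node *cache_lookup(uint32_t phandle)
{
    struct node *np = cache[phandle & CACHE_MASK];

    return (np && np->phandle == phandle) ? np : NULL;
}

/* Drop one entry, but only if the slot still refers to this phandle. */
void cache_invalidate(uint32_t phandle)
{
    struct node **slot = &cache[phandle & CACHE_MASK];

    if (*slot && (*slot)->phandle == phandle)
        *slot = NULL;
}
```

With something along these lines, a detach path would call
cache_invalidate() for the node being removed, so that a later lookup of the
same phandle misses the cache and finds the newly attached node instead of
the stale one.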

Rob

>
> Michael Bringmann  writes:
> > See below.
> >
> > On 07/30/2018 01:31 AM, Michael Ellerman wrote:
> >> Michael Bringmann  writes:
> >>
> >>> During LPAR migration, the content of the device tree/sysfs may
> >>> be updated including deletion and replacement of nodes in the
> >>> tree.  When nodes are added to the internal node structures, they
> >>> are appended in FIFO order to a list of nodes maintained by the
> >>> OF code APIs.
> >>
> >> That hasn't been true for several years. The data structure is an n-ary
> >> tree. What kernel version are you working on?
> >
> > Sorry for an error in my description.  I oversimplified based on the
> > name of a search iterator.  Let me try to provide a better explanation
> > of the problem, here.
> >
> > This is the problem.  The PPC mobility code receives RTAS requests to
> > delete nodes with platform-/hardware-specific attributes when restarting
> > the kernel after a migration.  My example is for migration between a
> > P8 Alpine and a P8 Brazos.  Nodes to be deleted may include
> > 'ibm,random-v1', 'ibm,compression-v1', 'ibm,platform-facilities',
> > 'ibm,sym-encryption-v1', or others.
> >
> > The mobility.c code calls 'of_detach_node' for the nodes and their children.
> > This makes calls to detach the properties and to try to remove the
> > associated sysfs/kernfs files.
> >
> > Then new copies of the same nodes are next provided by the PHYP, local
> > copies are built, and a pointer to the 'struct device_node' is passed to
> > of_attach_node.  Before the call to of_attach_node, the phandle is
> > initialized to 0 when the data structure is allocated.  During the call
> > to of_attach_node, it calls __of_attach_node which pulls the actual name
> > and phandle from just created sub-properties named something like 'name'
> > and 'ibm,phandle'.
> >
> > This is all fine for the first migration.  The problem occurs with the
> > second and subsequent migrations when the PHYP on the new system wants to
> > replace the same set of nodes again, referenced with the same names and
> > phandle values.
> >
> >>
> >>> When nodes are removed from the device tree, they
> >>> are marked OF_DETACHED, but not actually deleted from the system
> >>> to allow for pointers cached elsewhere in the kernel.  The order
> >>> and content of the entries in the list of nodes is not altered,
> >>> though.
> >>
> >> Something is going wrong if this is actually happening.
> >>
> >> When the node is detached it should be *detached* from the tree of all
> >> nodes, so it should not be discoverable other than by having an existing
> >> pointer to it.
> > On the second and subsequent migrations, the PHYP tells the system
> > to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
> > 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
> > nodes by its known set of phandle values -- the same handles used
> > by the PHYP on the source system are known on the target system.
> > The mobility.c code calls of_find_node_by_phandle() with these values
> > and ends up locating the first instance of each node that was added
> > during the original boot, instead of the second instance of each node
> > created after the first migration.  The detach during the second
> > migration fails with errors like,
> >
> > [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 
> > __of_detach_node+0x8/0xa0
> > [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag 
> > inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc 
> > xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod 
> > ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> > [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 
> > 4.18.0-rc1-wi107836-v05-120+ #201
> > [ 4565.030737] NIP:  c07c1ea8 LR: c07c1fb4 CTR: 
> > 00655170
> > [ 4565.030741] REGS: c003f302b690 TRAP: 0700   Tainted: GW  
> > 

[PATCH] powerpc/selftests: Avoid background process/threads

2018-07-31 Thread Breno Leitao
The current tm-unavailable test runs for a long period (>120 seconds), and if it
is interrupted, e.g. by pressing CTRL-C (SIGINT), the foreground process
(harness) dies but the child process and threads continue to execute (now with
PPID = 1).

In this case, you'd think the test is gone, but two threads are still
executing in the background: one of the threads ('pong') consumes 100% of the
CPU and the other ('ping') prints an output message to STDOUT from time to
time, which is annoying.

This patch simply gets the child process to be SIGTERMed when the parent dies.
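
For readers unfamiliar with PR_SET_PDEATHSIG, the mechanism can be sketched
on its own as below. This is an illustration rather than the patch's code:
the helper names die_with_parent() and parent_death_signal() are made up,
and the getppid() comparison is an extra precaution (the parent could exit
before prctl() runs) that the patch does not include.

```c
#include <signal.h>
#include <sys/prctl.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Ask the kernel to send SIGTERM to the calling process when its parent
 * exits. Returns 0 on success, -1 on failure.
 */
int die_with_parent(pid_t expected_parent)
{
    if (prctl(PR_SET_PDEATHSIG, SIGTERM) == -1)
        return -1;

    /* The parent may have exited before prctl() took effect. */
    if (getppid() != expected_parent)
        return -1;

    return 0;
}

/* Read back the parent-death signal currently set for this process. */
int parent_death_signal(void)
{
    int sig = 0;

    if (prctl(PR_GET_PDEATHSIG, &sig) == -1)
        return -1;
    return sig;
}
```

A child would typically record the parent's PID before fork() and call
die_with_parent() with it as its first action after the fork.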

Signed-off-by: Breno Leitao 
Signed-off-by: Gustavo Romero 
---
 tools/testing/selftests/powerpc/tm/tm-unavailable.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/tools/testing/selftests/powerpc/tm/tm-unavailable.c b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
index 156c8e750259..c42f8b60063c 100644
--- a/tools/testing/selftests/powerpc/tm/tm-unavailable.c
+++ b/tools/testing/selftests/powerpc/tm/tm-unavailable.c
@@ -23,6 +23,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include "tm.h"
 
@@ -342,6 +344,9 @@ int tm_unavailable_test(void)
 
SKIP_IF(!have_htm());
 
+   /* Send me SIGTERM if PPID is dead (as SIGINTed) */
+   prctl(PR_SET_PDEATHSIG, SIGTERM);
+
/* Set only CPU 0 in the mask. Both threads will be bound to CPU 0. */
CPU_ZERO();
CPU_SET(0, );
-- 
2.16.3



[PATCH] powerpc/fadump: handle crash memory ranges array overflow

2018-07-31 Thread Hari Bathini
Crash memory ranges is an array of memory ranges of the crashing kernel
to be exported as a dump via /proc/vmcore file. The size of the array
is set based on INIT_MEMBLOCK_REGIONS, which works alright in most cases
where memblock memory regions count is less than INIT_MEMBLOCK_REGIONS
value. But this count can grow beyond INIT_MEMBLOCK_REGIONS value since
commit 142b45a72e22 ("memblock: Add array resizing support").

On large memory systems with a few DLPAR operations, the memblock memory
regions count could be larger than INIT_MEMBLOCK_REGIONS value. On such
systems, registering fadump results in a crash or other system failures
like below:

  task: c7f39a290010 ti: cb738000 task.ti: cb738000
  NIP: c0047df4 LR: c00f9e58 CTR: c010f180
  REGS: cb73b570 TRAP: 0300   Tainted: G  L   X  (4.4.140+)
  MSR: 80009033   CR: 22004484  XER: 2000
  CFAR: c0008500 DAR: 07a45000 DSISR: 4000 SOFTE: 0
  GPR00: c00f9e58 cb73b7f0 c0f09a00 001a
  GPR04: c7f3bf774c90 0004 c0eb9a00 0800
  GPR08: 0804 07a45000 c0fa9a00 c7ffb169ca20
  GPR12: 22004482 cfa12c00 c7f3a0ea97a8 
  GPR16: c7f3a0ea9a50 cb73bd60 0118 0001fe80
  GPR20: 0118  c0b8c980 00d0
  GPR24: 07ffb0b1 c7ffb169c980  c0b8c980
  GPR28: 0004 c7ffb169c980 001a c7ffb169c980
  NIP [c0047df4] smp_send_reschedule+0x24/0x80
  LR [c00f9e58] resched_curr+0x138/0x160
  Call Trace:
  [cb73b7f0] [c00f9e58] resched_curr+0x138/0x160 (unreliable)
  [cb73b820] [c00fb538] check_preempt_curr+0xc8/0xf0
  [cb73b850] [c00fb598] ttwu_do_wakeup+0x38/0x150
  [cb73b890] [c00fc9c4] try_to_wake_up+0x224/0x4d0
  [cb73b900] [c011ef34] __wake_up_common+0x94/0x100
  [cb73b960] [c034a78c] ep_poll_callback+0xac/0x1c0
  [cb73b9b0] [c011ef34] __wake_up_common+0x94/0x100
  [cb73ba10] [c011f810] __wake_up_sync_key+0x70/0xa0
  [cb73ba60] [c067c3e8] sock_def_readable+0x58/0xa0
  [cb73ba90] [c07848ac] unix_stream_sendmsg+0x2dc/0x4c0
  [cb73bb70] [c0675a38] sock_sendmsg+0x68/0xa0
  [cb73bba0] [c067673c] ___sys_sendmsg+0x2cc/0x2e0
  [cb73bd30] [c0677dbc] __sys_sendmsg+0x5c/0xc0
  [cb73bdd0] [c06789bc] SyS_socketcall+0x36c/0x3f0
  [cb73be30] [c0009488] system_call+0x3c/0x100
  Instruction dump:
  4e800020 6000 6042 3c4c00ec 38421c30 7c0802a6 f8010010 6000
  3d42000a e92ab420 2fa9 4dde0020  2fa9 419e0044 7c0802a6
  ---[ end trace a6d1dd4bab5f8253 ]---

as array index overflow is not checked for while setting up crash memory
ranges causing memory corruption. To resolve this issue, resize crash
memory ranges array on hitting array size limit.

But without a hard limit on the number of crash memory ranges, there is
a possibility of program headers count overflow in the /proc/vmcore ELF
file while exporting each of these memory ranges as PT_LOAD segments. To
reduce the likelihood of such a scenario, fold adjacent memory ranges to
minimize the total number of crash memory ranges.
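
As an illustration of the two ideas above (grow the array when it is full,
and fold a range into its predecessor when the two are adjacent), here is a
small userspace sketch. It is not the code from this patch: struct
range_list and add_range() are hypothetical, and realloc() stands in for
whatever allocator the kernel code uses.

```c
#include <stdlib.h>

struct mem_range {
    unsigned long long base;
    unsigned long long size;
};

struct range_list {
    struct mem_range *r;
    int count;      /* entries in use */
    int max;        /* entries allocated */
};

/*
 * Append [base, end) to the list. If it starts exactly where the last
 * range ends, extend that range instead of using a new slot. Grows the
 * array when full. Returns 0 on success, -1 if allocation fails.
 */
int add_range(struct range_list *l, unsigned long long base,
              unsigned long long end)
{
    if (base >= end)
        return 0;       /* nothing to add */

    if (l->count > 0) {
        struct mem_range *last = &l->r[l->count - 1];

        if (last->base + last->size == base) {
            last->size += end - base;   /* fold into the previous range */
            return 0;
        }
    }

    if (l->count == l->max) {
        int new_max = l->max ? l->max * 2 : 4;
        struct mem_range *tmp = realloc(l->r, new_max * sizeof(*tmp));

        if (!tmp)
            return -1;  /* old array is still valid */
        l->r = tmp;
        l->max = new_max;
    }

    l->r[l->count].base = base;
    l->r[l->count].size = end - base;
    l->count++;
    return 0;
}
```

Folding only helps when ranges arrive in ascending order, which is the
assumption made here; out-of-order input would simply consume one slot per
range.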

Fixes: 2df173d9e85d ("fadump: Initialize elfcore header and add PT_LOAD program headers.")
Cc: sta...@vger.kernel.org
Cc: Mahesh Salgaonkar 
Signed-off-by: Hari Bathini 
---
 arch/powerpc/include/asm/fadump.h |2 +
 arch/powerpc/kernel/fadump.c  |   63 ++---
 2 files changed, 59 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/fadump.h b/arch/powerpc/include/asm/fadump.h
index 5a23010..ff708b3 100644
--- a/arch/powerpc/include/asm/fadump.h
+++ b/arch/powerpc/include/asm/fadump.h
@@ -196,7 +196,7 @@ struct fadump_crash_info_header {
 };
 
 /* Crash memory ranges */
-#define INIT_CRASHMEM_RANGES   (INIT_MEMBLOCK_REGIONS + 2)
+#define INIT_CRASHMEM_RANGES   INIT_MEMBLOCK_REGIONS
 
 struct fad_crash_memory_ranges {
unsigned long long  base;
diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c
index 07e8396..1c1df4f 100644
--- a/arch/powerpc/kernel/fadump.c
+++ b/arch/powerpc/kernel/fadump.c
@@ -47,7 +47,9 @@ static struct fadump_mem_struct fdm;
 static const struct fadump_mem_struct *fdm_active;
 
 static DEFINE_MUTEX(fadump_mutex);
-struct fad_crash_memory_ranges crash_memory_ranges[INIT_CRASHMEM_RANGES];
+struct fad_crash_memory_ranges init_crash_memory_ranges[INIT_CRASHMEM_RANGES];
+int max_crash_mem_ranges = INIT_CRASHMEM_RANGES;
+struct fad_crash_memory_ranges *crash_memory_ranges = init_crash_memory_ranges;
 int crash_mem_ranges;
 
 /* Scan the Firmware Assisted dump configuration details. */
@@ -871,14 

[RFC PATCH v3] powerpc/64s: Move idle code to powernv C code

2018-07-31 Thread Nicholas Piggin
Reimplement Book3S idle code in C, in the powernv platform code.
Assembly stubs are used to save and restore the stack frame and
non-volatile GPRs before going to idle, but these are small and
mostly agnostic to microarchitecture implementation details.

The optimisation where EC=ESL=0 idle modes did not have to save
GPRs or mtmsrd L=0 is restored, because it's simple to do.

Idle wakeup no longer uses the ->cpu_restore call to reinit SPRs,
but saves and restores them all explicitly. This can easily be
extended to tracking the set of system-wide SPRs that do not have
to be saved each time.

Moving the HMI, SPR, OPAL, locking, etc. to C is the only real
way this stuff will cope with non-trivial new CPU implementation
details, firmware changes, etc., without becoming unmaintainable.

Since RFC v1:
- Now tested and working with POWER9 hash and radix.
- KVM support added. This took a bit of work to untangle and might
  still have some issues, but POWER9 seems to work including hash on
  radix with dependent threads mode.
- This snowballed a bit because of KVM and other details making it
  not feasible to leave POWER7/8 code alone. That's only half done
  at the moment.
- So far this trades about 800 lines of asm for 500 of C. With POWER7/8
  support done it might be another hundred or so lines of C.

Since RFC v2:
- Fixed deep state SLB reloading
- Now tested and working with POWER8.
- Accounted for most feedback.

Thanks,
Nick

---
 include/asm/book3s/64/mmu-hash.h |1 
 include/asm/cpuidle.h|   21 
 include/asm/paca.h   |   40 -
 include/asm/processor.h  |9 
 include/asm/reg.h|7 
 kernel/asm-offsets.c |   18 
 kernel/dt_cpu_ftrs.c |   21 
 kernel/exceptions-64s.S  |   17 
 kernel/idle_book3s.S |  998 +++
 kernel/setup-common.c|4 
 kvm/book3s_hv_rmhandlers.S   |   94 ++-
 mm/slb.c |   15 
 platforms/powernv/idle.c |  839 ++--
 platforms/powernv/subcore.c  |2 
 xmon/xmon.c  |   25 
 15 files changed, 902 insertions(+), 1209 deletions(-)

diff --git a/arch/powerpc/include/asm/book3s/64/mmu-hash.h b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
index 50ed64fba4ae..b68a4fe446d6 100644
--- a/arch/powerpc/include/asm/book3s/64/mmu-hash.h
+++ b/arch/powerpc/include/asm/book3s/64/mmu-hash.h
@@ -487,6 +487,7 @@ extern void hpte_init_native(void);
 
 extern void slb_initialize(void);
 extern void slb_flush_and_rebolt(void);
+extern void slb_shadow_reload(void);
 
 extern void slb_vmalloc_update(void);
 extern void slb_set_size(u16 size);
diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h
index e210a83eb196..9b5c7ec908f2 100644
--- a/arch/powerpc/include/asm/cpuidle.h
+++ b/arch/powerpc/include/asm/cpuidle.h
@@ -27,10 +27,11 @@
  * the THREAD_WINKLE_BITS are set, which indicate which threads have not
  * yet woken from the winkle state.
  */
-#define PNV_CORE_IDLE_LOCK_BIT 0x1000
+#define NR_PNV_CORE_IDLE_LOCK_BIT  28
+#define PNV_CORE_IDLE_LOCK_BIT (1ULL << NR_PNV_CORE_IDLE_LOCK_BIT)
 
+#define PNV_CORE_IDLE_WINKLE_COUNT_SHIFT   16
 #define PNV_CORE_IDLE_WINKLE_COUNT 0x0001
-#define PNV_CORE_IDLE_WINKLE_COUNT_ALL_BIT 0x0008
 #define PNV_CORE_IDLE_WINKLE_COUNT_BITS0x000F
 #define PNV_CORE_IDLE_THREAD_WINKLE_BITS_SHIFT 8
 #define PNV_CORE_IDLE_THREAD_WINKLE_BITS   0xFF00
@@ -68,22 +69,6 @@
 #define ERR_DEEP_STATE_ESL_MISMATCH-2
 
 #ifndef __ASSEMBLY__
-/* Additional SPRs that need to be saved/restored during stop */
-struct stop_sprs {
-   u64 pid;
-   u64 ldbar;
-   u64 fscr;
-   u64 hfscr;
-   u64 mmcr1;
-   u64 mmcr2;
-   u64 mmcra;
-};
-
-extern u32 pnv_fastsleep_workaround_at_entry[];
-extern u32 pnv_fastsleep_workaround_at_exit[];
-
-extern u64 pnv_first_deep_stop_state;
-
 unsigned long pnv_cpu_offline(unsigned int cpu);
 int validate_psscr_val_mask(u64 *psscr_val, u64 *psscr_mask, u32 flags);
 static inline void report_invalid_psscr_val(u64 psscr_val, int err)
diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h
index 4e9cede5a7e7..d2cee5ebaaa1 100644
--- a/arch/powerpc/include/asm/paca.h
+++ b/arch/powerpc/include/asm/paca.h
@@ -168,7 +168,6 @@ struct paca_struct {
u8 irq_happened;/* irq happened while soft-disabled */
u8 io_sync; /* writel() needs spin_unlock sync */
u8 irq_work_pending;/* IRQ_WORK interrupt while 
soft-disable */
-   u8 nap_state_lost;  /* NV GPR values lost in power7_idle */
 #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE
u8 pmcregs_in_use;  /* pseries puts this in lppaca */
 #endif
@@ -178,23 +177,30 @@ struct paca_struct {
 #endif
 
 #ifdef 

[PATCH] powerpc/64s/radix: Fix missing global invalidations when removing copro

2018-07-31 Thread Frederic Barrat
With the optimizations for TLB invalidation from commit 0cef77c7798a
("powerpc/64s/radix: flush remote CPUs out of single-threaded
mm_cpumask"), the scope of a TLBI (global vs. local) can now be
influenced by the value of the 'copros' counter of the memory context.

When calling mm_context_remove_copro(), the 'copros' counter is
decremented first before flushing. It may have the unintended side
effect of sending local TLBIs when we explicitly need global
invalidations in this case, thus breaking any nMMU user in a bad and
unpredictable way.

Fix it by flushing first, before updating the 'copros' counter, so
that invalidations will be global.
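
The ordering problem can be shown with a deliberately simplified userspace
model. Everything here is hypothetical (struct ctx, flush_all(),
remove_copro_old(), remove_copro_new()); in particular the real scope
decision also looks at 'active_cpus', which this sketch ignores.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct ctx {
    atomic_int copros;      /* number of attached coprocessors */
    int global_flushes;     /* flushes that reached the nMMU */
    int local_flushes;      /* flushes that stayed on this CPU */
};

/* Model of the scope decision: any attached copro forces a global flush. */
void flush_all(struct ctx *c)
{
    if (atomic_load(&c->copros) > 0)
        c->global_flushes++;
    else
        c->local_flushes++;
}

/* Decrement unless the result would be negative; return old value - 1. */
int dec_if_positive(atomic_int *v)
{
    int old = atomic_load(v);

    while (old > 0 && !atomic_compare_exchange_weak(v, &old, old - 1))
        ;   /* 'old' was refreshed by the failed exchange; retry */
    return old - 1;
}

/* Ordering before the fix: the counter drops first, so the flush is local. */
void remove_copro_old(struct ctx *c)
{
    if (dec_if_positive(&c->copros) == 0)
        flush_all(c);
}

/* Ordering after the fix: flush while the counter is still non-zero. */
void remove_copro_new(struct ctx *c)
{
    flush_all(c);
    dec_if_positive(&c->copros);
}
```

Starting from copros == 1, the old ordering records a local flush and the
new ordering records a global one, which is the behaviour the fix is after.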

Fixes: 0cef77c7798a ("powerpc/64s/radix: flush remote CPUs out of single-threaded mm_cpumask")
Signed-off-by: Frederic Barrat 
---
 arch/powerpc/include/asm/mmu_context.h | 33 --
 1 file changed, 21 insertions(+), 12 deletions(-)

diff --git a/arch/powerpc/include/asm/mmu_context.h b/arch/powerpc/include/asm/mmu_context.h
index 79d570cbf332..b2f89b621b15 100644
--- a/arch/powerpc/include/asm/mmu_context.h
+++ b/arch/powerpc/include/asm/mmu_context.h
@@ -143,24 +143,33 @@ static inline void mm_context_remove_copro(struct mm_struct *mm)
 {
int c;
 
-   c = atomic_dec_if_positive(&mm->context.copros);
-
-   /* Detect imbalance between add and remove */
-   WARN_ON(c < 0);
-
/*
-* Need to broadcast a global flush of the full mm before
-* decrementing active_cpus count, as the next TLBI may be
-* local and the nMMU and/or PSL need to be cleaned up.
-* Should be rare enough so that it's acceptable.
+* When removing the last copro, we need to broadcast a global
+* flush of the full mm, as the next TLBI may be local and the
+* nMMU and/or PSL need to be cleaned up.
+*
+* Both the 'copros' and 'active_cpus' counts are looked at in
+* flush_all_mm() to determine the scope (local/global) of the
+* TLBIs, so we need to flush first before decrementing
+* 'copros'. If this API is used by several callers for the
+* same context, it can lead to over-flushing. It's hopefully
+* not common enough to be a problem.
 *
 * Skip on hash, as we don't know how to do the proper flush
 * for the time being. Invalidations will remain global if
-* used on hash.
+* used on hash. Note that we can't drop 'copros' either, as
+* it could make some invalidations local with no flush
+* in-between.
 */
-   if (c == 0 && radix_enabled()) {
+   if (radix_enabled()) {
flush_all_mm(mm);
-   dec_mm_active_cpus(mm);
+
+   c = atomic_dec_if_positive(&mm->context.copros);
+   /* Detect imbalance between add and remove */
+   WARN_ON(c < 0);
+
+   if (c == 0)
+   dec_mm_active_cpus(mm);
}
 }
 #else
-- 
2.17.1



Re: [PATCH 08/20] powerpc/dma: remove the unused dma_nommu_ops export

2018-07-31 Thread Christoph Hellwig
It turns out cxl actually uses it.  So for now skip this patch,
although random code in drivers messing with dma ops will need to
be sorted out sooner or later.


[PATCH v2 2/2] selftests/powerpc: Add more version checks to alignment_handler test

2018-07-31 Thread Michael Ellerman
The alignment_handler is documented to only work on Power8/Power9, but
we can make it run on older CPUs by guarding more of the tests with
feature checks.
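
For context, a feature check of this kind can be written in userspace with
getauxval(). The sketch below is an assumption about how such a helper might
look, not the selftest's own have_hwcap()/have_hwcap2(); the names
hwcap_has() and hwcap2_has() are made up.

```c
#include <stdbool.h>
#include <sys/auxv.h>

/* True if every bit in 'mask' is set in the AT_HWCAP auxiliary vector. */
bool hwcap_has(unsigned long mask)
{
    return (getauxval(AT_HWCAP) & mask) == mask;
}

/* Same check against AT_HWCAP2, where newer feature bits live. */
bool hwcap2_has(unsigned long mask)
{
    return (getauxval(AT_HWCAP2) & mask) == mask;
}
```

On powerpc the masks would be constants such as PPC_FEATURE_ARCH_2_06 or
PPC_FEATURE2_ARCH_2_07, as used in the patch below; a test then skips itself
when the corresponding check returns false.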

Signed-off-by: Michael Ellerman 
---
 .../powerpc/alignment/alignment_handler.c  | 67 +++---
 1 file changed, 59 insertions(+), 8 deletions(-)

v2: Don't incorrectly duplicate any of the tests, as noticed by @ajd.

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0eddd16af49f..169a8b9719fb 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -49,6 +49,8 @@
 #include 
 #include 
 
+#include 
+
 #include "utils.h"
 
 int bufsize;
@@ -289,6 +291,7 @@ int test_alignment_handler_vsx_206(void)
int rc = 0;
 
SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
 
printf("VSX: 2.06B\n");
LOAD_VSX_XFORM_TEST(lxvd2x);
@@ -306,6 +309,7 @@ int test_alignment_handler_vsx_207(void)
int rc = 0;
 
SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_2_07));
 
printf("VSX: 2.07B\n");
LOAD_VSX_XFORM_TEST(lxsspx);
@@ -380,7 +384,6 @@ int test_alignment_handler_integer(void)
LOAD_DFORM_TEST(ldu);
LOAD_XFORM_TEST(ldx);
LOAD_XFORM_TEST(ldux);
-   LOAD_XFORM_TEST(ldbrx);
LOAD_DFORM_TEST(lmw);
STORE_DFORM_TEST(stb);
STORE_XFORM_TEST(stbx);
@@ -400,8 +403,23 @@ int test_alignment_handler_integer(void)
STORE_XFORM_TEST(stdx);
STORE_DFORM_TEST(stdu);
STORE_XFORM_TEST(stdux);
-   STORE_XFORM_TEST(stdbrx);
STORE_DFORM_TEST(stmw);
+
+   return rc;
+}
+
+int test_alignment_handler_integer_206(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+   printf("Integer: 2.06\n");
+
+   LOAD_XFORM_TEST(ldbrx);
+   STORE_XFORM_TEST(stdbrx);
+
return rc;
 }
 
@@ -410,6 +428,7 @@ int test_alignment_handler_vmx(void)
int rc = 0;
 
SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_HAS_ALTIVEC));
 
printf("VMX\n");
LOAD_VMX_XFORM_TEST(lvx);
@@ -441,20 +460,14 @@ int test_alignment_handler_fp(void)
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);
-   LOAD_FLOAT_DFORM_TEST(lfdp);
-   LOAD_FLOAT_XFORM_TEST(lfdpx);
LOAD_FLOAT_DFORM_TEST(lfdu);
LOAD_FLOAT_XFORM_TEST(lfdux);
LOAD_FLOAT_DFORM_TEST(lfs);
LOAD_FLOAT_XFORM_TEST(lfsx);
LOAD_FLOAT_DFORM_TEST(lfsu);
LOAD_FLOAT_XFORM_TEST(lfsux);
-   LOAD_FLOAT_XFORM_TEST(lfiwzx);
-   LOAD_FLOAT_XFORM_TEST(lfiwax);
STORE_FLOAT_DFORM_TEST(stfd);
STORE_FLOAT_XFORM_TEST(stfdx);
-   STORE_FLOAT_DFORM_TEST(stfdp);
-   STORE_FLOAT_XFORM_TEST(stfdpx);
STORE_FLOAT_DFORM_TEST(stfdu);
STORE_FLOAT_XFORM_TEST(stfdux);
STORE_FLOAT_DFORM_TEST(stfs);
@@ -466,6 +479,38 @@ int test_alignment_handler_fp(void)
return rc;
 }
 
+int test_alignment_handler_fp_205(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_05));
+
+   printf("Floating point: 2.05\n");
+
+   LOAD_FLOAT_DFORM_TEST(lfdp);
+   LOAD_FLOAT_XFORM_TEST(lfdpx);
+   LOAD_FLOAT_XFORM_TEST(lfiwax);
+   STORE_FLOAT_DFORM_TEST(stfdp);
+   STORE_FLOAT_XFORM_TEST(stfdpx);
+
+   return rc;
+}
+
+int test_alignment_handler_fp_206(void)
+{
+   int rc = 0;
+
+   SKIP_IF(!can_open_fb0());
+   SKIP_IF(!have_hwcap(PPC_FEATURE_ARCH_2_06));
+
+   printf("Floating point: 2.06\n");
+
+   LOAD_FLOAT_XFORM_TEST(lfiwzx);
+
+   return rc;
+}
+
 void usage(char *prog)
 {
printf("Usage: %s [options]\n", prog);
@@ -513,9 +558,15 @@ int main(int argc, char *argv[])
   "test_alignment_handler_vsx_300");
rc |= test_harness(test_alignment_handler_integer,
   "test_alignment_handler_integer");
+   rc |= test_harness(test_alignment_handler_integer_206,
+  "test_alignment_handler_integer_206");
rc |= test_harness(test_alignment_handler_vmx,
   "test_alignment_handler_vmx");
rc |= test_harness(test_alignment_handler_fp,
   "test_alignment_handler_fp");
+   rc |= test_harness(test_alignment_handler_fp_205,
+  "test_alignment_handler_fp_205");
+   rc |= test_harness(test_alignment_handler_fp_206,
+  "test_alignment_handler_fp_206");
return rc;
 }
-- 
2.14.1



[PATCH v2 1/2] selftests/powerpc: Skip earlier in alignment_handler test

2018-07-31 Thread Michael Ellerman
Currently the alignment_handler test prints "Can't open /dev/fb0"
about 80 times per run, which is a little annoying.

Refactor it to check earlier whether it can open /dev/fb0 and skip if
not. This results in each test printing something like:

  test: test_alignment_handler_vsx_206
  tags: git_version:v4.18-rc3-134-gfb21a48904aa
  [SKIP] Test skipped on line 291
  skip: test_alignment_handler_vsx_206

Signed-off-by: Michael Ellerman 
Acked-by: Andrew Donnellan 
---
 .../powerpc/alignment/alignment_handler.c  | 40 +++---
 1 file changed, 35 insertions(+), 5 deletions(-)

v2: Unchanged.

diff --git a/tools/testing/selftests/powerpc/alignment/alignment_handler.c b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
index 0f2698f9fd6d..0eddd16af49f 100644
--- a/tools/testing/selftests/powerpc/alignment/alignment_handler.c
+++ b/tools/testing/selftests/powerpc/alignment/alignment_handler.c
@@ -40,6 +40,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -191,7 +192,7 @@ int test_memcmp(void *s1, void *s2, int n, int offset, char 
*test_name)
  */
 int do_test(char *test_name, void (*test_func)(char *, char *))
 {
-   int offset, width, fd, rc = 0, r;
+   int offset, width, fd, rc, r;
void *mem0, *mem1, *ci0, *ci1;
 
printf("\tDoing %s:\t", test_name);
@@ -199,8 +200,8 @@ int do_test(char *test_name, void (*test_func)(char *, char 
*))
fd = open("/dev/fb0", O_RDWR);
if (fd < 0) {
printf("\n");
-   perror("Can't open /dev/fb0");
-   SKIP_IF(1);
+   perror("Can't open /dev/fb0 now?");
+   return 1;
}
 
ci0 = mmap(NULL, bufsize, PROT_WRITE, MAP_SHARED,
@@ -226,6 +227,7 @@ int do_test(char *test_name, void (*test_func)(char *, char 
*))
return rc;
}
 
+   rc = 0;
/* offset = 0 no alignment fault, so skip */
for (offset = 1; offset < 16; offset++) {
width = 16; /* vsx == 16 bytes */
@@ -244,32 +246,50 @@ int do_test(char *test_name, void (*test_func)(char *, 
char *))
r |= test_memcpy(mem1, mem0, width, offset, test_func);
if (r && !debug) {
printf("FAILED: Got signal");
+   rc = 1;
break;
}
 
r |= test_memcmp(mem1, ci1, width, offset, test_name);
-   rc |= r;
if (r && !debug) {
printf("FAILED: Wrong Data");
+   rc = 1;
break;
}
}
-   if (!r)
+
+   if (rc == 0)
printf("PASSED");
+
printf("\n");
 
munmap(ci0, bufsize);
munmap(ci1, bufsize);
free(mem0);
free(mem1);
+   close(fd);
 
return rc;
 }
 
+static bool can_open_fb0(void)
+{
+   int fd;
+
+   fd = open("/dev/fb0", O_RDWR);
+   if (fd < 0)
+   return false;
+
+   close(fd);
+   return true;
+}
+
 int test_alignment_handler_vsx_206(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("VSX: 2.06B\n");
LOAD_VSX_XFORM_TEST(lxvd2x);
LOAD_VSX_XFORM_TEST(lxvw4x);
@@ -285,6 +305,8 @@ int test_alignment_handler_vsx_207(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("VSX: 2.07B\n");
LOAD_VSX_XFORM_TEST(lxsspx);
LOAD_VSX_XFORM_TEST(lxsiwax);
@@ -298,6 +320,8 @@ int test_alignment_handler_vsx_300(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
SKIP_IF(!have_hwcap2(PPC_FEATURE2_ARCH_3_00));
printf("VSX: 3.00B\n");
LOAD_VMX_DFORM_TEST(lxsd);
@@ -328,6 +352,8 @@ int test_alignment_handler_integer(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("Integer\n");
LOAD_DFORM_TEST(lbz);
LOAD_DFORM_TEST(lbzu);
@@ -383,6 +409,8 @@ int test_alignment_handler_vmx(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("VMX\n");
LOAD_VMX_XFORM_TEST(lvx);
 
@@ -408,6 +436,8 @@ int test_alignment_handler_fp(void)
 {
int rc = 0;
 
+   SKIP_IF(!can_open_fb0());
+
printf("Floating point\n");
LOAD_FLOAT_DFORM_TEST(lfd);
LOAD_FLOAT_XFORM_TEST(lfdx);
-- 
2.14.1



Re: [PATCH] powerpc/pasemi: Seach for PCI root bus by compatible property

2018-07-31 Thread Michael Ellerman
Michael Ellerman  writes:
> Darren Stevens  writes:
>
>> Pasemi arch code finds the root of the PCI-e bus by searching the
>> device-tree for a node called 'pxp'. But the root bus has a 
>> compatible property of 'pasemi,rootbus' so search for that instead.
>>
>> Signed-off-by: Darren Stevens 
>> ---
>>
>> This works on the Amigaone X1000, I don't know if this method of
>> finding the pci bus was there because of earlier firmwares.
>
> Does anyone have another pasemi board they can test this on?
>
> The last time I plugged mine in it popped the power supply and took out
> power to half the office :) - I haven't had a chance to try it since.

Actually, I remembered I have a device tree lying around from an electra.

It has:

  [I] home:pxp@0,8000(7)(I)> lsprop name compatible
  name "pxp"
  compatible   "pasemi,rootbus"
   "pa-pxp"


So it looks like the patch would work fine on it at least.
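
For anyone unfamiliar with the property format: a "compatible" value is a
list of NUL-terminated strings stored back to back, and a match on any
entry counts. Below is a minimal userspace sketch of that matching rule,
not the kernel's of_find_compatible_node() implementation. The helper
compat_list_has() and the electra_compat buffer are made-up names for
illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: return true if 'want' appears in a "compatible"
 * property value, i.e. a buffer of 'len' bytes holding NUL-terminated
 * strings stored back to back.
 */
static bool compat_list_has(const char *prop, size_t len, const char *want)
{
	size_t want_len = strlen(want);
	size_t off = 0;

	assert(prop != NULL);

	while (off < len) {
		/* Bound the scan so a missing final NUL cannot overrun. */
		const char *cur = prop + off;
		const char *end = memchr(cur, '\0', len - off);
		size_t n = end ? (size_t)(end - cur) : len - off;

		if (n == want_len && memcmp(cur, want, n) == 0)
			return true;
		off += n + 1;	/* step over the string and its NUL */
	}
	return false;
}

/* Mirrors the electra node above: "pasemi,rootbus", "pa-pxp" */
static const char electra_compat[] = "pasemi,rootbus\0pa-pxp";
```

With that buffer, a lookup for "pasemi,rootbus" succeeds while a lookup
for the node name "pxp" does not, which is why searching by compatible
and searching by name are not interchangeable in general.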

cheers

>> diff --git a/arch/powerpc/platforms/pasemi/pci.c 
>> b/arch/powerpc/platforms/pasemi/pci.c
>> index c7c8607..be62380 100644
>> --- a/arch/powerpc/platforms/pasemi/pci.c
>> +++ b/arch/powerpc/platforms/pasemi/pci.c
>> @@ -216,6 +216,7 @@ static int __init pas_add_bridge(struct device_node *dev)
>>  void __init pas_pci_init(void)
>>  {
>> struct device_node *np, *root;
>> +   int res;
>>  
>> root = of_find_node_by_path("/");
>> if (!root) {
>> @@ -226,11 +227,11 @@ void __init pas_pci_init(void)
>>  
>> pci_set_flags(PCI_SCAN_ALL_PCIE_DEVS);
>>  
>> -   for (np = NULL; (np = of_get_next_child(root, np)) != NULL;)
>> -   if (np->name && !strcmp(np->name, "pxp") && !pas_add_bridge(np))
>> -   of_node_get(np);
>> -
>> -   of_node_put(root);
>> +   np = of_find_compatible_node(root, NULL, "pasemi,rootbus");
>> +   if (np) {
>> +   res = pas_add_bridge(np);
>> +   of_node_put(np);
>> +   }
>>  }
>>  
>>  void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)


Re: [PATCH] powerpc/pasemi: Seach for PCI root bus by compatible property

2018-07-31 Thread Michael Ellerman
Darren Stevens  writes:

> Pasemi arch code finds the root of the PCI-e bus by searching the
> device-tree for a node called 'pxp'. But the root bus has a 
> compatible property of 'pasemi,rootbus' so search for that instead.
>
> Signed-off-by: Darren Stevens 
> ---
>
> This works on the Amigaone X1000, I don't know if this method of
> finding the pci bus was there because of earlier firmwares.

Does anyone have another pasemi board they can test this on?

The last time I plugged mine in it popped the power supply and took out
power to half the office :) - I haven't had a chance to try it since.

cheers

> diff --git a/arch/powerpc/platforms/pasemi/pci.c 
> b/arch/powerpc/platforms/pasemi/pci.c
> index c7c8607..be62380 100644
> --- a/arch/powerpc/platforms/pasemi/pci.c
> +++ b/arch/powerpc/platforms/pasemi/pci.c
> @@ -216,6 +216,7 @@ static int __init pas_add_bridge(struct device_node *dev)
>  void __init pas_pci_init(void)
>  {
> struct device_node *np, *root;
> +   int res;
>  
> root = of_find_node_by_path("/");
> if (!root) {
> @@ -226,11 +227,11 @@ void __init pas_pci_init(void)
>  
> pci_set_flags(PCI_SCAN_ALL_PCIE_DEVS);
>  
> -   for (np = NULL; (np = of_get_next_child(root, np)) != NULL;)
> -   if (np->name && !strcmp(np->name, "pxp") && !pas_add_bridge(np))
> -   of_node_get(np);
> -
> -   of_node_put(root);
> +   np = of_find_compatible_node(root, NULL, "pasemi,rootbus");
> +   if (np) {
> +   res = pas_add_bridge(np);
> +   of_node_put(np);
> +   }
>  }
>  
>  void __iomem *pasemi_pci_getcfgaddr(struct pci_dev *dev, int offset)


Re: [PATCH resend] powerpc/64s: fix page table fragment refcount race vs speculative references

2018-07-31 Thread Michael Ellerman
Nicholas Piggin  writes:
> On Fri, 27 Jul 2018 08:38:35 -0700
> Matthew Wilcox  wrote:
>> On Sat, Jul 28, 2018 at 12:29:06AM +1000, Nicholas Piggin wrote:
>> > On Fri, 27 Jul 2018 06:41:56 -0700
>> > Matthew Wilcox  wrote:
>> > > On Fri, Jul 27, 2018 at 09:48:17PM +1000, Nicholas Piggin wrote:  
>> > > > The page table fragment allocator uses the main page refcount racily
>> > > > with respect to speculative references. A customer observed a BUG due
>> > > > to page table page refcount underflow in the fragment allocator. This
>> > > > can be caused by the fragment allocator set_page_count stomping on a
>> > > > speculative reference, and then the speculative failure handler
>> > > > decrements the new reference, and the underflow eventually pops when
>> > > > the page tables are freed.
>> > > 
>> > > Oof.  Can't you fix this instead by using page_ref_add() instead of
>> > > set_page_count()?  
>> > 
>> > It's ugly doing it that way. The problem is we have a page table
>> > destructor and that would be missed if the spec ref was the last
>> > put. In practice with RCU page table freeing maybe you can say
>> > there will be no spec ref there (unless something changes), but
>> > still it just seems much simpler doing this and avoiding any
>> > complexity or relying on other synchronization.  
>> 
>> I don't want to rely on the speculative reference not happening by the
>> time the page table is torn down; that's way too black-magic for me.
>> Another possibility would be to use, say, the top 16 bits of the
>> atomic for your counter and call the dtor once the atomic is below 64k.
>> I'm also thinking about overhauling the dtor system so it's not tied to
>> compound pages; anyone with a bit in page_type would be able to use it.
>> That way you'd always get your dtor called, even if the speculative
>> reference was the last one.
>
> Yeah we could look at doing either of those if necessary.
>
>> > > > Any objection to the struct page change to grab the arch specific
>> > > > page table page word for powerpc to use? If not, then this should
>> > > > go via powerpc tree because it's inconsequential for core mm.
>> > > 
>> > > I want (eventually) to get to the point where every struct page carries
>> > > a pointer to the struct mm that it belongs to.  It's good for debugging
>> > > as well as handling memory errors in page tables.  
>> > 
>> > That doesn't seem like it should be a problem, there's some spare
>> > words there for arch independent users.  
>> 
>> Could you take one of the spare words instead then?  My intent was to
>> just take the 'x86 pgds only' comment off that member.  _pt_pad_2 looks
>> ideal because it'll be initialised to 0 and you'll return it to 0 by
>> the time you're done.
>
> It doesn't matter for powerpc where the atomic_t goes, so I'm fine with
> moving it. But could you juggle the fields with your patch instead? I
> thought it would be nice to use this field, which has already been
> tested on x86 not to overlap with any other data, for a bug fix
> that'll have to be widely backported.

Can we come to a conclusion on this one?

As far as backporting goes pt_mm is new in 4.18-rc so the patch will
need to be manually backported anyway. But I agree with Nick we'd rather
use a slot that is known to be free for arch use.

cheers
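
For reference, the alternative Matthew floated above (keeping the
fragment count in the top 16 bits of the atomic and treating "below 64k"
as "no fragments left") could look roughly like the userspace sketch
below. This is not what the patch under discussion does. It uses C11
atomics instead of the kernel's atomic_t, and struct fake_page,
frag_get(), frag_put(), spec_get() and spec_put() are all hypothetical
names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

#define FRAG_SHIFT	16
#define FRAG_ONE	(1u << FRAG_SHIFT)	/* one fragment reference */

/* Stand-in for the page refcount; hypothetical, not struct page. */
struct fake_page {
	atomic_uint refcount;
};

/*
 * Take 'nr' fragment references with an add rather than a store, so
 * that speculative references held in the low bits are preserved.
 */
static void frag_get(struct fake_page *p, unsigned int nr)
{
	atomic_fetch_add(&p->refcount, nr * FRAG_ONE);
}

/*
 * Drop one fragment reference. Returns true when no fragment
 * references remain (value below 64k), i.e. when a destructor
 * would be due.
 */
static bool frag_put(struct fake_page *p)
{
	unsigned int old = atomic_fetch_sub(&p->refcount, FRAG_ONE);

	assert(old >= FRAG_ONE);	/* underflow would be a bug */
	return (old - FRAG_ONE) < FRAG_ONE;
}

/* Speculative references live in the low 16 bits. */
static void spec_get(struct fake_page *p)
{
	atomic_fetch_add(&p->refcount, 1);
}

static void spec_put(struct fake_page *p)
{
	atomic_fetch_sub(&p->refcount, 1);
}
```

The sketch only shows the counting. Who frees the page when a
speculative reference outlives the last fragment is exactly the open
question in the thread and is not addressed here.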


Re: [PATCH v3] PCI: Data corruption happening due to race condition

2018-07-31 Thread Michael Ellerman
Bjorn Helgaas  writes:
> On Thu, Jul 19, 2018 at 02:18:09PM +1000, Benjamin Herrenschmidt wrote:
>> On Wed, 2018-07-18 at 18:29 -0500, Bjorn Helgaas wrote:
>> > [+cc Paul, Michael, linuxppc-dev]
>> > 
>> 
>>/...
>> 
>> > > Debugging revealed a race condition between the pcie core driver
>> > > setting the is_added bit (in pci_bus_add_device()) and the nvme
>> > > driver reset work-queue setting the is_busmaster bit (via
>> > > pci_set_master()). Because the two fields are not updated
>> > > atomically, the is_added bit can be cleared.
>> > > 
>> > > The fix moves the is_added bit to a separate private flag
>> > > variable and uses atomic functions to set and retrieve the
>> > > device addition state. As is_added no longer shares a memory
>> > > location with is_busmaster, the race condition is avoided.
>> > 
>> > Really nice bit of debugging!
>> 
>> Indeed. However I'm not fan of the solution. Shouldn't we instead have
>> some locking for the content of pci_dev ? I've always been wary of us
>> having other similar races in there.
>> 
>> As for the powerpc bits, I'm probably the one who wrote them, however,
>> I'm on vacation this week and right now, no bandwidth to context switch
>> all that back in :-) So give me a few days and/or ping me next week.
>
> OK, here's a ping :)
>
> Some powerpc cleanup would be ideal, but I'd like to fix the race for
> v4.19, so I'm fine with this patch as-is.  But I'd definitely want
> your ack before inserting the ugly #include path in the powerpc code.

Sorry, the patch didn't hit linuxppc so I forgot about it.

I'm OK with the patch, the include is a bit gross, but I guess it's
fine.

I have a change to pseries/setup.c queued that might collide, though
it's just an addition of another include so it's a trivial fixup.

Acked-by: Michael Ellerman 


In terms of longer term clean up, do you have a sketch of what you'd
like to see?

cheers
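
To make the race and the shape of the fix concrete, here is a rough
userspace sketch under stated assumptions. It is not the PCI core code:
struct fake_dev, FAKE_DEV_ADDED and the two helpers are invented for
illustration, and C11 atomics stand in for the kernel's atomic bit
operations.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for the relevant parts of struct pci_dev. */
struct fake_dev {
	/*
	 * Adjacent bit-fields share one storage unit. Writing either
	 * one is a read-modify-write of the whole unit, so two
	 * unlocked writers can lose each other's update.
	 */
	unsigned int is_busmaster:1;
	unsigned int other_flag:1;

	/* Separate word, only ever touched with atomic operations. */
	atomic_ulong priv_flags;
};

#define FAKE_DEV_ADDED	0	/* bit number; the name is illustrative */

static void fake_dev_assign_added(struct fake_dev *dev, bool added)
{
	if (added)
		atomic_fetch_or(&dev->priv_flags, 1UL << FAKE_DEV_ADDED);
	else
		atomic_fetch_and(&dev->priv_flags, ~(1UL << FAKE_DEV_ADDED));
}

static bool fake_dev_is_added(struct fake_dev *dev)
{
	return atomic_load(&dev->priv_flags) & (1UL << FAKE_DEV_ADDED);
}
```

Because the "added" state no longer lives in the same storage unit as
is_busmaster, a plain store to is_busmaster cannot overwrite it.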


Re: [PATCH v5 09/11] hugetlb: Introduce generic version of huge_ptep_set_wrprotect

2018-07-31 Thread Alexandre Ghiti



On 07/31/2018 12:06 PM, Michael Ellerman wrote:
> Alexandre Ghiti  writes:
>
>> arm, ia64, mips, sh, x86 architectures use the same version
>> of huge_ptep_set_wrprotect, so move this generic implementation into
>> asm-generic/hugetlb.h.
>> Note: powerpc uses twice for book3s/32 and nohash/32 the same version as
>> the above architectures, but the modification was not straightforward
>> and hence has not been done.
>
> Do you remember what the problem was there?
>
> It looks like you should just be able to drop them like the others. I
> assume there's some header spaghetti that causes problems though?

Yes, the header spaghetti frightened me a bit. Maybe I should have tried
harder: I can try to remove them and find the right defconfigs to
compile both to begin with. And to guarantee the functionality is
preserved, can I use the testsuite of libhugetlbfs with qemu ?

Alex

> cheers



Signed-off-by: Alexandre Ghiti 
Reviewed-by: Mike Kravetz 
---
  arch/arm/include/asm/hugetlb-3level.h| 6 --
  arch/arm64/include/asm/hugetlb.h | 1 +
  arch/ia64/include/asm/hugetlb.h  | 6 --
  arch/mips/include/asm/hugetlb.h  | 6 --
  arch/parisc/include/asm/hugetlb.h| 1 +
  arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++
  arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
  arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++
  arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
  arch/sh/include/asm/hugetlb.h| 6 --
  arch/sparc/include/asm/hugetlb.h | 1 +
  arch/x86/include/asm/hugetlb.h   | 6 --
  include/asm-generic/hugetlb.h| 8 
  13 files changed, 17 insertions(+), 30 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h 
b/arch/arm/include/asm/hugetlb-3level.h
index b897541520ef..8247cd6a2ac6 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
  }
  
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 3e7f6e69b28d..f4f69ae5466e 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -48,6 +48,7 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct 
*vma,
  #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
  extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
  extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
  #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index cbe296271030..49d1f7949f3a 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -27,12 +27,6 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
  {
  }
  
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty)
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 6ff2531cfb1d..3dcf5debf8c4 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -63,12 +63,6 @@ static inline int huge_pte_none(pte_t pte)
return !val || (val == (unsigned long)invalid_pte_table);
  }
  
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr,
 pte_t *ptep, pte_t pte,
diff --git a/arch/parisc/include/asm/hugetlb.h 
b/arch/parisc/include/asm/hugetlb.h
index fb7e0fd858a3..9c3950ca2974 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -39,6 +39,7 @@ static inline void huge_ptep_clear_flush(struct 
vm_area_struct *vma,
  {
  }
  
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
  void huge_ptep_set_wrprotect(struct mm_struct *mm,
  

Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-31 Thread Bjorn Helgaas
On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.
> 
> Signed-off-by: Christoph Hellwig 

Applied with acks from Thomas and Michael to pci/misc for v4.19, thanks!

> ---
>  arch/powerpc/kernel/dma.c | 3 ---
>  arch/sh/drivers/pci/pci.c | 2 --
>  arch/x86/kernel/pci-dma.c | 3 ---
>  drivers/pci/pci-driver.c  | 2 +-
>  4 files changed, 1 insertion(+), 9 deletions(-)
> 
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 155170d70324..dbfc7056d7df 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
>  
>  static int __init dma_init(void)
>  {
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
>  #ifdef CONFIG_IBMVIO
>   dma_debug_add_bus(&vio_bus_type);
>  #endif
> diff --git a/arch/sh/drivers/pci/pci.c b/arch/sh/drivers/pci/pci.c
> index e5b7437ab4af..8256626bc53c 100644
> --- a/arch/sh/drivers/pci/pci.c
> +++ b/arch/sh/drivers/pci/pci.c
> @@ -160,8 +160,6 @@ static int __init pcibios_init(void)
>   for (hose = hose_head; hose; hose = hose->next)
>   pcibios_scanbus(hose);
>  
> - dma_debug_add_bus(&pci_bus_type);
> -
>   pci_initialized = 1;
>  
>   return 0;
> diff --git a/arch/x86/kernel/pci-dma.c b/arch/x86/kernel/pci-dma.c
> index ab5d9dd668d2..43f58632f123 100644
> --- a/arch/x86/kernel/pci-dma.c
> +++ b/arch/x86/kernel/pci-dma.c
> @@ -155,9 +155,6 @@ static int __init pci_iommu_init(void)
>  {
>   struct iommu_table_entry *p;
>  
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
>   x86_init.iommu.iommu_init();
>  
>   for (p = __iommu_table; p < __iommu_table_end; p++) {
> diff --git a/drivers/pci/pci-driver.c b/drivers/pci/pci-driver.c
> index 6792292b5fc7..bef17c3fca67 100644
> --- a/drivers/pci/pci-driver.c
> +++ b/drivers/pci/pci-driver.c
> @@ -1668,7 +1668,7 @@ static int __init pci_driver_init(void)
>   if (ret)
>   return ret;
>  #endif
> -
> + dma_debug_add_bus(&pci_bus_type);
>   return 0;
>  }
>  postcore_initcall(pci_driver_init);
> -- 
> 2.18.0
> 


Re: [PATCH] powerpc: do not redefined NEED_DMA_MAP_STATE

2018-07-31 Thread Michael Ellerman
Christoph Hellwig  writes:

> kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
> from PPC64 and NOT_COHERENT_CACHE instead.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/Kconfig   | 3 ---
>  arch/powerpc/platforms/Kconfig.cputype | 2 ++
>  2 files changed, 2 insertions(+), 3 deletions(-)

Thanks.

I did this instead:

commit 870771ae76010c5e42ee8e0278f5823e46e96e3f (HEAD -> next-test)
Author: Christoph Hellwig 
AuthorDate: Mon Jul 30 09:37:21 2018 +0200
Commit: Michael Ellerman 
CommitDate: Tue Jul 31 20:43:57 2018 +1000

powerpc: Do not redefine NEED_DMA_MAP_STATE

kernel/dma/Kconfig already defines NEED_DMA_MAP_STATE, just select it
from CONFIG_PPC using the same condition as an if guard.

Signed-off-by: Christoph Hellwig 
[mpe: Move it under PPC]
Signed-off-by: Michael Ellerman 

diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
index 5eb4d969afbf..ee38fce075ee 100644
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -226,6 +226,7 @@ config PPC
select IRQ_DOMAIN
select IRQ_FORCED_THREADING
select MODULES_USE_ELF_RELA
+   select NEED_DMA_MAP_STATE   if PPC64 || NOT_COHERENT_CACHE
select NEED_SG_DMA_LENGTH
select NO_BOOTMEM
select OF
@@ -885,9 +886,6 @@ config ZONE_DMA
bool
default y
 
-config NEED_DMA_MAP_STATE
-   def_bool (PPC64 || NOT_COHERENT_CACHE)
-
 config GENERIC_ISA_DMA
bool
depends on ISA_DMA_API


cheers


Re: [PATCH v5 09/11] hugetlb: Introduce generic version of huge_ptep_set_wrprotect

2018-07-31 Thread Michael Ellerman
Alexandre Ghiti  writes:

> arm, ia64, mips, sh, x86 architectures use the same version
> of huge_ptep_set_wrprotect, so move this generic implementation into
> asm-generic/hugetlb.h.
> Note: powerpc uses twice for book3s/32 and nohash/32 the same version as
> the above architectures, but the modification was not straightforward
> and hence has not been done.

Do you remember what the problem was there?

It looks like you should just be able to drop them like the others. I
assume there's some header spaghetti that causes problems though?

cheers
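
For context, the pattern the series relies on is a generic fallback
guarded by a __HAVE_ARCH_* macro. The sketch below builds in userspace
with mock types. pte_t, struct mm_struct, FAKE_PTE_WRITE and the body of
ptep_set_wrprotect() here are placeholders, not the real kernel
definitions. Only the guard macro name and the one-line wrapper come
from the patch.

```c
#include <assert.h>

/* Mock types so the sketch builds in userspace; all hypothetical. */
typedef struct { unsigned long val; } pte_t;
struct mm_struct { int unused; };
#define FAKE_PTE_WRITE	0x1UL

static inline void ptep_set_wrprotect(struct mm_struct *mm,
				      unsigned long addr, pte_t *ptep)
{
	(void)mm;
	(void)addr;
	ptep->val &= ~FAKE_PTE_WRITE;	/* clear the mock write bit */
}

/*
 * Generic fallback, as a shared header might provide it. An
 * architecture with its own version defines the guard macro before
 * this point and the fallback is compiled out.
 */
#ifndef __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
					   unsigned long addr, pte_t *ptep)
{
	ptep_set_wrprotect(mm, addr, ptep);
}
#endif
```

Dropping the powerpc copies would then just mean not defining the guard
for book3s/32 and nohash/32, provided the generic header is reachable
from where they are used.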


> Signed-off-by: Alexandre Ghiti 
> Reviewed-by: Mike Kravetz 
> ---
>  arch/arm/include/asm/hugetlb-3level.h| 6 --
>  arch/arm64/include/asm/hugetlb.h | 1 +
>  arch/ia64/include/asm/hugetlb.h  | 6 --
>  arch/mips/include/asm/hugetlb.h  | 6 --
>  arch/parisc/include/asm/hugetlb.h| 1 +
>  arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++
>  arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
>  arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++
>  arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
>  arch/sh/include/asm/hugetlb.h| 6 --
>  arch/sparc/include/asm/hugetlb.h | 1 +
>  arch/x86/include/asm/hugetlb.h   | 6 --
>  include/asm-generic/hugetlb.h| 8 
>  13 files changed, 17 insertions(+), 30 deletions(-)
>
> diff --git a/arch/arm/include/asm/hugetlb-3level.h 
> b/arch/arm/include/asm/hugetlb-3level.h
> index b897541520ef..8247cd6a2ac6 100644
> --- a/arch/arm/include/asm/hugetlb-3level.h
> +++ b/arch/arm/include/asm/hugetlb-3level.h
> @@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
>   return retval;
>  }
>  
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> -unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep,
>pte_t pte, int dirty)
> diff --git a/arch/arm64/include/asm/hugetlb.h 
> b/arch/arm64/include/asm/hugetlb.h
> index 3e7f6e69b28d..f4f69ae5466e 100644
> --- a/arch/arm64/include/asm/hugetlb.h
> +++ b/arch/arm64/include/asm/hugetlb.h
> @@ -48,6 +48,7 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct 
> *vma,
>  #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
>  extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
>unsigned long addr, pte_t *ptep);
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>  extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
>   unsigned long addr, pte_t *ptep);
>  #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
> diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
> index cbe296271030..49d1f7949f3a 100644
> --- a/arch/ia64/include/asm/hugetlb.h
> +++ b/arch/ia64/include/asm/hugetlb.h
> @@ -27,12 +27,6 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> -unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr, pte_t *ptep,
>pte_t pte, int dirty)
> diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
> index 6ff2531cfb1d..3dcf5debf8c4 100644
> --- a/arch/mips/include/asm/hugetlb.h
> +++ b/arch/mips/include/asm/hugetlb.h
> @@ -63,12 +63,6 @@ static inline int huge_pte_none(pte_t pte)
>   return !val || (val == (unsigned long)invalid_pte_table);
>  }
>  
> -static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
> -unsigned long addr, pte_t *ptep)
> -{
> - ptep_set_wrprotect(mm, addr, ptep);
> -}
> -
>  static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
>unsigned long addr,
>pte_t *ptep, pte_t pte,
> diff --git a/arch/parisc/include/asm/hugetlb.h 
> b/arch/parisc/include/asm/hugetlb.h
> index fb7e0fd858a3..9c3950ca2974 100644
> --- a/arch/parisc/include/asm/hugetlb.h
> +++ b/arch/parisc/include/asm/hugetlb.h
> @@ -39,6 +39,7 @@ static inline void huge_ptep_clear_flush(struct 
> vm_area_struct *vma,
>  {
>  }
>  
> +#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
>  void huge_ptep_set_wrprotect(struct mm_struct *mm,
>  unsigned long addr, pte_t *ptep);
>  
> diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h 
> b/arch/powerpc/include/asm/book3s/32/pgtable.h
> index 

RE: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-31 Thread Alastair D'Silva
> -Original Message-
> From: Michael Ellerman 
> Sent: Tuesday, 31 July 2018 7:32 PM
> To: Murilo Opsfelder Araujo ; LEROY Christophe
> 
> Cc: linux-ker...@vger.kernel.org; Alastair D'Silva ;
> Andrew Donnellan ; Balbir Singh
> ; Benjamin Herrenschmidt
> ; Cyril Bur ; Eric W .
> Biederman ; Joe Perches ;
> Michael Neuling ; Nicholas Piggin
> ; Paul Mackerras ; Simon Guo
> ; Sukadev Bhattiprolu
> ; Tobin C . Harding ; linuxppc-
> d...@lists.ozlabs.org
> Subject: Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in
> show_signal_msg()
> 
> Murilo Opsfelder Araujo  writes:
> > On Mon, Jul 30, 2018 at 06:30:47PM +0200, LEROY Christophe wrote:
> >> Murilo Opsfelder Araujo  a écrit :
> >> > On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:
> >> > > Murilo Opsfelder Araujo  a écrit :
> >> > >
> >> > > > Simplify the message format by using REG_FMT as the register
> >> > > > format.  This avoids having two different formats and avoids
> >> > > > checking for MSR_64BIT.
> >> > >
> >> > > Are you sure it is what we want ?
> >> >
> >> > Yes.
> >> >
> >> > > Won't it change the behaviour for a 32 bits app running on a 64bits
> >> > > kernel ?
> >> >
> >> > In fact, this changes how many zeroes are prefixed when displaying
> >> > the registers (%016lx vs. %08lx format).  For example, 32-bits
> >> > userspace, 64-bits kernel:
> >>
> >> Indeed that's what I suspected. What is the real benefit of this change ?
> >> Why not keep the current format for 32bits userspace ? All those
> >> leading zeroes are pointless to me.
> >
> > One of the benefits is simplifying the code by removing some checks.
> > Another is deduplicating almost identical format strings in favor of a
> > unified one.
> >
> > After reading Joe's comment [1], %px seems to be the format we're
> > looking for.
> > An extract from Documentation/core-api/printk-formats.rst:
> >
> >   "%px is functionally equivalent to %lx (or %lu). %px is preferred because
> >   it is more uniquely grep'able."
> >
> > So I guess we don't need to worry about the format (%016lx vs. %08lx),
> > let's just use %px, as per the guideline.
> 
> I don't think I like %px.

Me neither, semantically, it's for pointers, and the data being displayed is 
not a pointer.

> It makes the format string cleaner, but it means we have to cast everything
> to void * which is ugly as heck.
> 
> I actually don't think the leading zeroes are helpful at all in the signal
> message, ie. we should just use %lx there.
> 
> They are useful in show_regs() because we want everything to line up.
> 
> So I think I'll drop patch 3 and use 0x%lx in show_signal_msg(), meaning we
> end up with, eg:
> 
>   [   73.414535] segv[3759]: segfault (11) at 0x0 nip 0x1420 lr 0xfe61854 code 0x1 in segv[1000+1]
>   [   73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4b30 9421ffd0 93e1002c 7c3f0b78
>   [   73.414665] segv[3759]: code: 3920 913f001c 813f001c 3941 <9149> 3920 7d234b78 397f0030

Or better yet, "%#lx" - the hash adds the appropriate prefix in the right case 
for the format.
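
A quick userspace comparison of the three formats mentioned in this
thread, as a sketch only. It exercises the C library's snprintf(), not
the kernel's own vsnprintf(), which is a separate implementation and may
treat a zero value differently. fmt_reg() is a made-up helper.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: format one register value three ways. */
static void fmt_reg(unsigned long val, char *padded, char *plain,
		    char *alt, size_t len)
{
	snprintf(padded, len, "%016lx", val);	/* fixed width, zero padded */
	snprintf(plain, len, "0x%lx", val);	/* literal prefix, no padding */
	snprintf(alt, len, "%#lx", val);	/* '#' flag supplies the prefix */
}
```

In the C library the '#' flag only adds "0x" to nonzero results, so an
address of zero comes out as "0" with "%#lx" but as "0x0" with "0x%lx".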

-- 
Alastair D'Silva           mob: 0423 762 819
skype: alastair_dsilva     msn: alast...@d-silva.org
blog: http://alastair.d-silva.org    Twitter: @EvilDeece





Re: powerpc: 32BIT vs. 64BIT (PPC32 vs. PPC64)

2018-07-31 Thread Michael Ellerman
Masahiro Yamada  writes:
> 2018-07-07 23:59 GMT+09:00 Randy Dunlap :
>> On 07/07/2018 05:13 AM, Nicholas Piggin wrote:
>>> On Fri, 6 Jul 2018 21:58:29 -0700
>>> Randy Dunlap  wrote:
>>>
>>>> On 07/06/2018 06:45 PM, Benjamin Herrenschmidt wrote:
>>>>> On Thu, 2018-07-05 at 14:30 -0700, Randy Dunlap wrote:
>>>>>> Hi,
>>>>>>
>>>>>> Is there a good way (or a shortcut) to do something like:
>>>>>>
>>>>>> $ make ARCH=powerpc O=PPC32 [other_options] allmodconfig
>>>>>>   to get a PPC32/32BIT allmodconfig
>>>>>>
>>>>>> and also be able to do:
>>>>>>
>>>>>> $ make ARCH=powerpc O=PPC64 [other_options] allmodconfig
>>>>>>   to get a PPC64/64BIT allmodconfig?
>>>>>
>>>>> Hrm... O= is for the separate build dir, so there must be something
>>>>> else.
>>>>>
>>>>> You mean having ARCH= aliases like ppc/ppc32 and ppc64 ?
>>>>
>>>> Yes.
>>>>
>>>>> That would be a matter of overriding some .config defaults I suppose, I
>>>>> don't know how this is done on other archs.
>>>>>
>>>>> I see the aliasing trick in the Makefile but that's about it.
>>>>>
>>>>>> Note that arch/x86, arch/sh, and arch/sparc have ways to do
>>>>>> some flavor(s) of this (from Documentation/kbuild/kbuild.txt;
>>>>>> sh and sparc based on a recent "fix" patch from me):
>>>>>
>>>>> I fail to see what you are actually talking about here ... sorry. Do
>>>>> you have concrete examples on x86 or sparc ? From what I can tell the
>>>>> "i386" or "sparc32/sparc64" aliases just change SRCARCH in Makefile and
>>>>> 32 vs 64-bit is just a Kconfig option...
>>>>
>>>> Yes, your summary is mostly correct.
>>>>
>>>> I'm just looking for a way to do cross-compile builds that are close to
>>>> ppc32 allmodconfig and ppc64 allmodconfig.
>>>
>>> Would there a problem with adding ARCH=ppc32 / ppc64 matching? This
>>> seems to work...
>>
>> Yes, this mostly works and is similar to a patch (my patch) on my test 
>> machine.
>> And they both work for allmodconfig, which is my primary build target.
>>
>> And they both have one little quirk that is confusing when the build target
>> is defconfig:
>>
>> When ARCH=ppc32, the terminal output (stdout) is: (using O=PPC32)
>>
>> make[1]: Entering directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>>   GEN ./Makefile
>> *** Default configuration is based on 'ppc64_defconfig'   < NOTE <
>> #
>> # configuration written to .config
>> #
>> make[1]: Leaving directory '/home/rdunlap/lnx/lnx-418-rc3/PPC32'
>>
>
>
> Maybe, we can set one of ppc32 defconfigs to KBUILD_DEFCONFIG
> if ARCH is ppc32 ?

We could, but as I said in another reply I'd rather we didn't play
tricks with ARCH.

I've merged a patch to add three new allmodconfig targets for ppc32,
ppc64le and ppc64_book3e:

  https://patchwork.ozlabs.org/project/linuxppc-dev/list/?series=54457&state=*


cheers


Re: [PATCH 3/6] powerpc: factor out RapidIO Kconfig menu entry

2018-07-31 Thread Michael Ellerman
Alexei Colin  writes:

> The menu entry is now defined in the rapidio subtree.  Also, re-order
> the bus menu so that the platform-specific RapidIO controller appears
> after the entry for the RapidIO subsystem.
>
> Platforms with a PCI bus will be offered the RapidIO menu since they may
> want support for a RapidIO PCI device. Platforms without a PCI bus
> that might include a RapidIO IP block will need to "select HAS_RAPIDIO"
> in the platform-/machine-specific "config ARCH_*" Kconfig entry.
>
> Cc: Andrew Morton 
> Cc: John Paul Walters 
> Cc: linuxppc-dev@lists.ozlabs.org
> Cc: linux-ker...@vger.kernel.org
> Signed-off-by: Alexei Colin 
> ---
>  arch/powerpc/Kconfig | 13 +
>  1 file changed, 1 insertion(+), 12 deletions(-)

Looks good.

Acked-by: Michael Ellerman  (powerpc)

cheers

> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig
> index 25d005af0a5b..17ea8a5f90a0 100644
> --- a/arch/powerpc/Kconfig
> +++ b/arch/powerpc/Kconfig
> @@ -993,16 +993,7 @@ source "drivers/pci/Kconfig"
>  
>  source "drivers/pcmcia/Kconfig"
>  
> -config HAS_RAPIDIO
> - bool
> - default n
> -
> -config RAPIDIO
> - tristate "RapidIO support"
> - depends on HAS_RAPIDIO || PCI
> - help
> -   If you say Y here, the kernel will include drivers and
> -   infrastructure code to support RapidIO interconnect devices.
> +source "drivers/rapidio/Kconfig"
>  
>  config FSL_RIO
>   bool "Freescale Embedded SRIO Controller support"
> @@ -1012,8 +1003,6 @@ config FSL_RIO
> Include support for RapidIO controller on Freescale embedded
> processors (MPC8548, MPC8641, etc).
>  
> -source "drivers/rapidio/Kconfig"
> -
>  endmenu
>  
>  config NONSTATIC_KERNEL
> -- 
> 2.18.0


Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-31 Thread Michael Ellerman
Christoph Hellwig  writes:

> There is nothing arch specific about PCI or dma-debug, so move this
> call to common code just after registering the bus type.
>
> Signed-off-by: Christoph Hellwig 
> ---
>  arch/powerpc/kernel/dma.c | 3 ---
>  arch/sh/drivers/pci/pci.c | 2 --
>  arch/x86/kernel/pci-dma.c | 3 ---
>  drivers/pci/pci-driver.c  | 2 +-
>  4 files changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/arch/powerpc/kernel/dma.c b/arch/powerpc/kernel/dma.c
> index 155170d70324..dbfc7056d7df 100644
> --- a/arch/powerpc/kernel/dma.c
> +++ b/arch/powerpc/kernel/dma.c
> @@ -357,9 +357,6 @@ EXPORT_SYMBOL_GPL(dma_get_required_mask);
>  
>  static int __init dma_init(void)
>  {
> -#ifdef CONFIG_PCI
> - dma_debug_add_bus(&pci_bus_type);
> -#endif
>  #ifdef CONFIG_IBMVIO
>   dma_debug_add_bus(&vio_bus_type);
>  #endif

Acked-by: Michael Ellerman  (powerpc)

cheers


Re: [PATCH v2 04/10] powerpc/traps: Use REG_FMT in show_signal_msg()

2018-07-31 Thread Michael Ellerman
Murilo Opsfelder Araujo  writes:
> On Mon, Jul 30, 2018 at 06:30:47PM +0200, LEROY Christophe wrote:
>> Murilo Opsfelder Araujo  a écrit :
>> > On Fri, Jul 27, 2018 at 06:40:23PM +0200, LEROY Christophe wrote:
>> > > Murilo Opsfelder Araujo  a écrit :
>> > >
>> > > > Simplify the message format by using REG_FMT as the register format.
>> > > > This avoids having two different formats and avoids checking for
>> > > > MSR_64BIT.
>> > >
>> > > Are you sure it is what we want ?
>> >
>> > Yes.
>> >
>> > > Won't it change the behaviour for a 32 bits app running on a 64bits kernel ?
>> >
>> > In fact, this changes how many zeroes are prefixed when displaying the
>> > registers (%016lx vs. %08lx format).  For example, 32-bits userspace,
>> > 64-bits kernel:
>>
>> Indeed that's what I suspected. What is the real benefit of this change ?
>> Why not keep the current format for 32bits userspace ? All those leading
>> zeroes are pointless to me.
>
> One of the benefits is simplifying the code by removing some checks.  Another
> is deduplicating almost identical format strings in favor of a unified one.
>
> After reading Joe's comment [1], %px seems to be the format we're looking for.
> An extract from Documentation/core-api/printk-formats.rst:
>
>   "%px is functionally equivalent to %lx (or %lu). %px is preferred because it
>   is more uniquely grep'able."
>
> So I guess we don't need to worry about the format (%016lx vs. %08lx), let's
> just use %px, as per the guideline.

I don't think I like %px.

It makes the format string cleaner, but it means we have to cast
everything to void * which is ugly as heck.

I actually don't think the leading zeroes are helpful at all in the
signal message, ie. we should just use %lx there.

They are useful in show_regs() because we want everything to line up.

So I think I'll drop patch 3 and use 0x%lx in show_signal_msg(), meaning
we end up with, eg:

  [   73.414535] segv[3759]: segfault (11) at 0x0 nip 0x1420 lr 0xfe61854 code 0x1 in segv[1000+1]
  [   73.414641] segv[3759]: code: 4e800421 80010014 38210010 7c0803a6 4b30 9421ffd0 93e1002c 7c3f0b78
  [   73.414665] segv[3759]: code: 3920 913f001c 813f001c 3941 <9149> 3920 7d234b78 397f0030


I'll do that unless anyone screams loudly, because it would be nice to
get this into 4.19.

cheers
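
For illustration only (not part of the patch): the three register formats
discussed above can be compared in a small userspace sketch. The helper
format_reg() and the enum reg_style are hypothetical names made up for this
example; the kernel's REG_FMT macro and show_signal_msg() are not reproduced
here.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical selector for the three formats mentioned in the thread. */
enum reg_style {
	REG_PAD16,	/* "%016lx": zero-padded to 16 hex digits */
	REG_PAD8,	/* "%08lx":  zero-padded to 8 hex digits  */
	REG_PLAIN	/* "0x%lx":  no padding, 0x prefix        */
};

/* Format one register value into buf; returns snprintf()'s result. */
static int format_reg(char *buf, size_t len, enum reg_style style,
		      unsigned long val)
{
	assert(buf && len > 0);

	switch (style) {
	case REG_PAD16:
		return snprintf(buf, len, "%016lx", val);
	case REG_PAD8:
		return snprintf(buf, len, "%08lx", val);
	default:
		return snprintf(buf, len, "0x%lx", val);
	}
}
```

With a nip value of 0x1420, the padded styles produce 16 and 8 characters
respectively, while the plain style produces the short "0x1420" form seen in
the sample output above.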


Re: [PATCH v5 00/11] hugetlb: Factorize hugetlb architecture primitives

2018-07-31 Thread Catalin Marinas
On Tue, Jul 31, 2018 at 06:01:44AM +, Alexandre Ghiti wrote:
> Alexandre Ghiti (11):
>   hugetlb: Harmonize hugetlb.h arch specific defines with pgtable.h
>   hugetlb: Introduce generic version of hugetlb_free_pgd_range
>   hugetlb: Introduce generic version of set_huge_pte_at
>   hugetlb: Introduce generic version of huge_ptep_get_and_clear
>   hugetlb: Introduce generic version of huge_ptep_clear_flush
>   hugetlb: Introduce generic version of huge_pte_none
>   hugetlb: Introduce generic version of huge_pte_wrprotect
>   hugetlb: Introduce generic version of prepare_hugepage_range
>   hugetlb: Introduce generic version of huge_ptep_set_wrprotect
>   hugetlb: Introduce generic version of huge_ptep_set_access_flags
>   hugetlb: Introduce generic version of huge_ptep_get
[...]
>  arch/arm64/include/asm/hugetlb.h | 39 +++-

For the arm64 bits in this series:

Acked-by: Catalin Marinas 


Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-31 Thread Christoph Hellwig
On Mon, Jul 30, 2018 at 04:17:13PM -0500, Bjorn Helgaas wrote:
> [+cc Joerg]
> 
> On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> > There is nothing arch specific about PCI or dma-debug, so move this
> > call to common code just after registering the bus type.
> 
> I assume that previously, even if the user set CONFIG_DMA_API_DEBUG=y
> we only got PCI DMA debug on powerpc, sh, and x86.  And after this
> patch, we'll get PCI DMA debug on *all* arches?

Yes.  Note that this only covers the actual bus related part, that
is warning about outstanding dma mappings on unload.  The rest of the
dma api debugging already is entirely generic.


Re: [PATCH] PCI: call dma_debug_add_bus for pci_bus_type in common code

2018-07-31 Thread Joerg Roedel
On Mon, Jul 30, 2018 at 04:17:13PM -0500, Bjorn Helgaas wrote:
> [+cc Joerg]
> 
> On Mon, Jul 30, 2018 at 09:38:42AM +0200, Christoph Hellwig wrote:
> > There is nothing arch specific about PCI or dma-debug, so move this
> > call to common code just after registering the bus type.
> 
> I assume that previously, even if the user set CONFIG_DMA_API_DEBUG=y
> we only got PCI DMA debug on powerpc, sh, and x86.  And after this
> patch, we'll get PCI DMA debug on *all* arches?
> 
> If that's true, I'll add a comment to that effect to the commitlog
> since that new functionality might be of interest to other arches.

There should be implicit support for dma-debug for all arches that use
the generic dma_ops code. The dma_debug_add_bus() function just adds the
reporting of pending dma-allocations on driver-unload for a device. 

Regards,

Joerg



Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively

2018-07-31 Thread Anshuman Khandual
On 07/30/2018 02:55 PM, Christoph Hellwig wrote:
>> +const struct dma_map_ops virtio_direct_dma_ops;
> 
> This belongs into a header if it is non-static.  If you only
> use it in this file anyway please mark it static and avoid a forward
> declaration.

Sure, will make it static, move the definition up in the file to avoid
forward declaration.
 
> 
>> +
>>  int virtio_finalize_features(struct virtio_device *dev)
>>  {
>>  int ret = dev->config->finalize_features(dev);
>> @@ -174,6 +176,9 @@ int virtio_finalize_features(struct virtio_device *dev)
>>  if (ret)
>>  return ret;
>>  
>> +if (virtio_has_iommu_quirk(dev))
>> +set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
> 
> This needs a big fat comment explaining what is going on here.

Sure, will do. Also talk about the XEN domain exception as well once
that goes into this conditional statement.

> 
> Also not new, but I find the existence of virtio_has_iommu_quirk and its
> name horribly confusing.  It might be better to open code it here once
> only a single caller is left.

Sure will do. There is one definition in the tools directory which can
be removed and then this will be the only one left.



Re: [PATCH v3 1/1] powerpc/pseries: fix EEH recovery of some IOV devices

2018-07-31 Thread Michael Ellerman
Bjorn Helgaas  writes:
> On Mon, Jul 30, 2018 at 11:59:14AM +1000, Sam Bobroff wrote:
>> EEH recovery currently fails on pSeries for some IOV capable PCI
>> devices, if CONFIG_PCI_IOV is on and the hypervisor doesn't provide
>> certain device tree properties for the device. (Found on an IOV
>> capable device using the ipr driver.)
>> 
>> Recovery fails in pci_enable_resources() at the check on r->parent,
>> because r->flags is set and r->parent is not.  This state is due to
>> sriov_init() setting the start, end and flags members of the IOV BARs
>> but the parent not being set later in
>> pseries_pci_fixup_iov_resources(), because the
>> "ibm,open-sriov-vf-bar-info" property is missing.
>> 
>> Correct this by zeroing the resource flags for IOV BARs when they
>> can't be configured (this is the same method used by sriov_init() and
>> __pci_read_base()).
>> 
>> VFs cleared this way can't be enabled later, because that requires
>> another device tree property, "ibm,number-of-configurable-vfs" as well
>> as support for the RTAS function "ibm_map_pes". These are all part of
>> hypervisor support for IOV and it seems unlikely that a hypervisor
>> would ever partially, but not fully, support it. (None are currently
>> provided by QEMU/KVM.)
>> 
>> Signed-off-by: Sam Bobroff 
>
> Michael, I assume you'll take this, since it only touches powerpc.
> Let me know if you need anything from me.

Yeah I'll take it, thanks.

cheers


Re: [PATCH] powerpc/mobility: Fix node detach/rename problem

2018-07-31 Thread Michael Ellerman
Tyrel Datwyler  writes:
> On 07/29/2018 06:11 AM, Michael Bringmann wrote:
>> During LPAR migration, the content of the device tree/sysfs may
>> be updated including deletion and replacement of nodes in the
>> tree.  When nodes are added to the internal node structures, they
>> are appended in FIFO order to a list of nodes maintained by the
>> OF code APIs.  When nodes are removed from the device tree, they
>> are marked OF_DETACHED, but not actually deleted from the system
>> to allow for pointers cached elsewhere in the kernel.  The order
>> and content of the entries in the list of nodes is not altered,
>> though.
>> 
>> During LPAR migration some common nodes are deleted and re-added
>> e.g. "ibm,platform-facilities".  If a node is re-added to the OF
>> node lists, the of_attach_node function checks to make sure that
>> the name + ibm,phandle of the to-be-added data is unique.  As the
>> previous copy of a re-added node is not modified beyond the addition
>> of a bit flag, the code (1) finds the old copy, (2) prints a WARNING
>> notice to the console, (3) renames the to-be-added node to avoid
>> filename collisions within a directory, and (4) adds entries to
>> the sysfs/kernfs.
>
> So, this patch actually just band aids over the real problem. This is
> a long standing problem with several PFO drivers leaking references.
> The issue here is that, during the device tree update that follows a
> migration, the ibm,platform-facilities node and friends below are
> always deleted and re-added on the destination LPAR, and subsequently
> the leaked references prevent the device nodes from ever actually
> being properly cleaned up after detach, leading to the issue you are
> observing.

Leaking references shouldn't affect the node being detached from the
tree though.

See of_detach_node() calling __of_detach_node(), none of that depends on
the refcount.

It's only the actual freeing of the node, in of_node_release() that is
prevented by leaked reference counts.

So I agree we need to do a better job with the reference counting, but I
don't see how it is causing the problem here.

cheers
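
The distinction drawn above, that detaching a node does not depend on its
refcount while freeing it does, can be sketched with a toy model. This is a
minimal userspace illustration and not the kernel's implementation: struct
toy_node, toy_detach(), toy_put() and toy_is_child() are all hypothetical
names, and a "freed" flag stands in for the actual release of memory.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a device tree node; not the kernel's struct device_node. */
struct toy_node {
	struct toy_node *parent;
	struct toy_node *child;		/* first child */
	struct toy_node *sibling;	/* next sibling */
	int refcount;
	bool detached;
	bool freed;			/* set instead of really freeing memory */
};

/* Unlink np from its parent's child list. The refcount is never consulted. */
static void toy_detach(struct toy_node *np)
{
	struct toy_node **pp;

	assert(np && np->parent);
	for (pp = &np->parent->child; *pp; pp = &(*pp)->sibling) {
		if (*pp == np) {
			*pp = np->sibling;
			break;
		}
	}
	np->sibling = NULL;
	np->detached = true;
}

/* Drop one reference; the node is only "freed" when the count reaches zero. */
static void toy_put(struct toy_node *np)
{
	assert(np && np->refcount > 0);
	if (--np->refcount == 0)
		np->freed = true;
}

/* Walk the parent's child list; a detached node is no longer found here. */
static bool toy_is_child(const struct toy_node *parent,
			 const struct toy_node *np)
{
	const struct toy_node *c;

	for (c = parent->child; c; c = c->sibling)
		if (c == np)
			return true;
	return false;
}
```

In this model a node with a leaked reference still disappears from the tree
walk after toy_detach(); the leak only keeps "freed" from ever being set.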


Re: [RFC 2/4] virtio: Override device's DMA OPS with virtio_direct_dma_ops selectively

2018-07-31 Thread Anshuman Khandual
On 07/30/2018 03:00 PM, Christoph Hellwig wrote:
>>> +
>>> +   if (xen_domain())
>>> +   goto skip_override;
>>> +
>>> +   if (virtio_has_iommu_quirk(dev))
>>> +   set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);
>>> +
>>> + skip_override:
>>> +
>>
>> I prefer normal if scoping as opposed to goto spaghetti pls.
>> Better yet move vring_use_dma_api here and use it.
>> Less of a chance something will break.
> 
> I agree about avoid pointless gotos here, but we can do things
> perfectly well without either gotos or a confusing helper here
> if we structure it right. E.g.:
> 
>   // suitably detailed comment here
>   if (!xen_domain() &&
>   !virtio_has_feature(vdev, VIRTIO_F_IOMMU_PLATFORM))
>   set_dma_ops(dev->dev.parent, &virtio_direct_dma_ops);

I had updated this patch calling vring_use_dma_api() as a helper
as suggested by Michael but yes we can have the above condition
with a comment block. I will change this patch accordingly.

> 
> and while we're at it - modifying dma ops for the parent looks very
> dangerous.  I don't think we can do that, as it could break iommu
> setup interactions.  IFF we set a specific dma map ops it has to be
> on the virtio device itself, of which we have full control.

I understand your concern. At present virtio core calls parent's DMA
ops callbacks when device has VIRTIO_F_IOMMU_PLATFORM flag set. Most
likely those DMA OPS are architecture specific ones which can really
configure IOMMU. Most probably all devices and their parents share
the same DMA ops callback. IIUC as long as the entire system has a
single DMA ops structure, it should be okay. But I may be missing
other implications. I tried changing virtio core so that it always
calls the device's DMA ops instead of its parent's DMA ops, it hit the
following WARN_ON for devices without IOMMU flag and hit both the
WARN_ON and BUG_ON for devices with the IOMMU flag.

static inline void *dma_alloc_attrs(struct device *dev, size_t size,
   dma_addr_t *dma_handle, gfp_t flag,
   unsigned long attrs)
{
const struct dma_map_ops *ops = get_dma_ops(dev);
void *cpu_addr;

BUG_ON(!ops);
WARN_ON_ONCE(dev && !dev->coherent_dma_mask);



Seems like virtio device's DMA ops and coherent_dma_mask was never
set correctly assuming that virtio core always called parent's DMA
OPS all the time. We may have to change virtio device init to fix
this. Any thoughts ?
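
As a rough illustration of the two checks quoted above, the following
userspace sketch reports which of them a given device would trip. It is only
a model under stated assumptions: struct toy_device, struct toy_dma_ops and
toy_dma_alloc_checks() are hypothetical names, and the real dma_alloc_attrs()
would stop at the BUG_ON() rather than go on to evaluate the WARN_ON_ONCE().

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Toy stand-ins; these are not the kernel's struct device or dma_map_ops. */
struct toy_dma_ops {
	int placeholder;
};

struct toy_device {
	const struct toy_dma_ops *dma_ops;	/* what get_dma_ops() would return */
	uint64_t coherent_dma_mask;
};

#define TOY_WOULD_BUG	0x1u	/* models BUG_ON(!ops) */
#define TOY_WOULD_WARN	0x2u	/* models WARN_ON_ONCE(dev && !dev->coherent_dma_mask) */

/* Report which of the two checks this device would trip. */
static unsigned int toy_dma_alloc_checks(const struct toy_device *dev)
{
	unsigned int tripped = 0;

	assert(dev);
	if (!dev->dma_ops)
		tripped |= TOY_WOULD_BUG;
	if (!dev->coherent_dma_mask)
		tripped |= TOY_WOULD_WARN;
	return tripped;
}
```

A device whose DMA ops and coherent mask were never initialised trips both
checks in this model; setting only the ops leaves the warning, and setting
both clears it.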



phandle_cache vs of_detach_node (was Re: [PATCH] powerpc/mobility: Fix node detach/rename problem)

2018-07-31 Thread Michael Ellerman
Hi Rob/Frank,

I think we might have a problem with the phandle_cache not interacting
well with of_detach_node():

Michael Bringmann  writes:
> See below.
>
> On 07/30/2018 01:31 AM, Michael Ellerman wrote:
>> Michael Bringmann  writes:
>> 
>>> During LPAR migration, the content of the device tree/sysfs may
>>> be updated including deletion and replacement of nodes in the
>>> tree.  When nodes are added to the internal node structures, they
>>> are appended in FIFO order to a list of nodes maintained by the
>>> OF code APIs.
>> 
>> That hasn't been true for several years. The data structure is an n-ary
>> tree. What kernel version are you working on?
>
> Sorry for an error in my description.  I oversimplified based on the
> name of a search iterator.  Let me try to provide a better explanation
> of the problem, here.
>
> This is the problem.  The PPC mobility code receives RTAS requests to
> delete nodes with platform-/hardware-specific attributes when restarting
> the kernel after a migration.  My example is for migration between a
> P8 Alpine and a P8 Brazos.   Nodes to be deleted may include 'ibm,random-v1',
> 'ibm,compression-v1', 'ibm,platform-facilities', 'ibm,sym-encryption-v1',
> or others.
>
> The mobility.c code calls 'of_detach_node' for the nodes and their children.
> This makes calls to detach the properties and to try to remove the associated
> sysfs/kernfs files.
>
> Then new copies of the same nodes are next provided by the PHYP, local
> copies are built, and a pointer to the 'struct device_node' is passed to
> of_attach_node.  Before the call to of_attach_node, the phandle is initialized
> to 0 when the data structure is alloced.  During the call to of_attach_node,
> it calls __of_attach_node which pulls the actual name and phandle from just
> created sub-properties named something like 'name' and 'ibm,phandle'.
>
> This is all fine for the first migration.  The problem occurs with the
> second and subsequent migrations when the PHYP on the new system wants to
> replace the same set of nodes again, referenced with the same names and
> phandle values.
>
>> 
>>> When nodes are removed from the device tree, they
>>> are marked OF_DETACHED, but not actually deleted from the system
>>> to allow for pointers cached elsewhere in the kernel.  The order
>>> and content of the entries in the list of nodes is not altered,
>>> though.
>> 
>> Something is going wrong if this is actually happening.
>> 
>> When the node is detached it should be *detached* from the tree of all
>> nodes, so it should not be discoverable other than by having an existing
>> pointer to it.
> On the second and subsequent migrations, the PHYP tells the system
> to again delete the nodes 'ibm,platform-facilities', 'ibm,random-v1',
> 'ibm,compression-v1', 'ibm,sym-encryption-v1'.  It specifies these
> nodes by its known set of phandle values -- the same handles used
> by the PHYP on the source system are known on the target system.
> The mobility.c code calls of_find_node_by_phandle() with these values
> and ends up locating the first instance of each node that was added
> during the original boot, instead of the second instance of each node
> created after the first migration.  The detach during the second
> migration fails with errors like,
>
> [ 4565.030704] WARNING: CPU: 3 PID: 4787 at drivers/of/dynamic.c:252 __of_detach_node+0x8/0xa0
> [ 4565.030708] Modules linked in: nfsv3 nfs_acl nfs tcp_diag udp_diag inet_diag unix_diag af_packet_diag netlink_diag lockd grace fscache sunrpc xts vmx_crypto sg pseries_rng binfmt_misc ip_tables xfs libcrc32c sd_mod ibmveth ibmvscsi scsi_transport_srp dm_mirror dm_region_hash dm_log dm_mod
> [ 4565.030733] CPU: 3 PID: 4787 Comm: drmgr Tainted: GW 4.18.0-rc1-wi107836-v05-120+ #201
> [ 4565.030737] NIP:  c07c1ea8 LR: c07c1fb4 CTR: 
> 00655170
> [ 4565.030741] REGS: c003f302b690 TRAP: 0700   Tainted: GW
>   (4.18.0-rc1-wi107836-v05-120+)
> [ 4565.030745] MSR:  80010282b033  
>  CR: 22288822  XER: 000a
> [ 4565.030757] CFAR: c07c1fb0 IRQMASK: 1
> [ 4565.030757] GPR00: c07c1fa4 c003f302b910 c114bf00 
> c0038e68
> [ 4565.030757] GPR04: 0001  80c008e0b4b8 
> 
> [ 4565.030757] GPR08:  0001 8003 
> 2843
> [ 4565.030757] GPR12: 8800 c0001ec9ae00 4000 
> 
> [ 4565.030757] GPR16:  0008  
> f6ff
> [ 4565.030757] GPR20: 0007  c003e9f1f034 
> 0001
> [ 4565.030757] GPR24:    
> 
> [ 4565.030757] GPR28: c1549d28 c1134828 c0038e68 
> c003f302b930
> [ 4565.030804] NIP [c07c1ea8] __of_detach_node+0x8/0xa0
> [ 4565.030808] LR [c07c1fb4] 
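
The behaviour described in this thread, where a lookup by phandle returns
the node that was already detached rather than its replacement, is what one
would expect if a phandle-indexed cache is not updated on detach. The sketch
below models that suspicion in userspace. It is an assumption-laden
illustration, not the kernel's phandle_cache code: struct toy_node,
toy_cache and all toy_* functions are hypothetical names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_CACHE_SIZE 8	/* must be a power of two */

/* Toy node carrying only what the example needs. */
struct toy_node {
	unsigned int phandle;
	bool detached;
};

static struct toy_node *toy_cache[TOY_CACHE_SIZE];

/* Remember np under its phandle. */
static void toy_cache_insert(struct toy_node *np)
{
	assert(np && np->phandle != 0);
	toy_cache[np->phandle & (TOY_CACHE_SIZE - 1)] = np;
}

/* Cache-only lookup: returns whatever pointer is cached for this phandle. */
static struct toy_node *toy_find_by_phandle(unsigned int phandle)
{
	struct toy_node *np = toy_cache[phandle & (TOY_CACHE_SIZE - 1)];

	return (np && np->phandle == phandle) ? np : NULL;
}

/* Detach without touching the cache: the stale entry stays visible. */
static void toy_detach_keep_cache(struct toy_node *np)
{
	np->detached = true;
}

/* Detach and drop the matching cache entry so lookups stop returning it. */
static void toy_detach_invalidate(struct toy_node *np)
{
	unsigned int slot = np->phandle & (TOY_CACHE_SIZE - 1);

	np->detached = true;
	if (toy_cache[slot] == np)
		toy_cache[slot] = NULL;
}
```

In this model, a lookup after toy_detach_keep_cache() still hands back the
detached node, which mirrors the second-migration failure described above;
invalidating the slot on detach lets the re-added node be found instead.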

[PATCH v5 11/11] hugetlb: Introduce generic version of huge_ptep_get

2018-07-31 Thread Alexandre Ghiti
ia64, mips, parisc, powerpc, sh, sparc, x86 architectures use the
same version of huge_ptep_get, so move this generic implementation into
asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h | 1 +
 arch/arm64/include/asm/hugetlb.h  | 1 +
 arch/ia64/include/asm/hugetlb.h   | 5 -
 arch/mips/include/asm/hugetlb.h   | 5 -
 arch/parisc/include/asm/hugetlb.h | 5 -
 arch/powerpc/include/asm/hugetlb.h| 5 -
 arch/sh/include/asm/hugetlb.h | 5 -
 arch/sparc/include/asm/hugetlb.h  | 5 -
 arch/x86/include/asm/hugetlb.h| 5 -
 include/asm-generic/hugetlb.h | 7 +++
 10 files changed, 9 insertions(+), 35 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index 54e4b097b1f5..0d9f3918fa7e 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -29,6 +29,7 @@
  * ptes.
  * (The valid bit is automatically cleared by set_pte_at for PROT_NONE ptes).
  */
+#define __HAVE_ARCH_HUGE_PTEP_GET
 static inline pte_t huge_ptep_get(pte_t *ptep)
 {
pte_t retval = *ptep;
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 80887abcef7f..fb6609875455 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -20,6 +20,7 @@
 
 #include 
 
+#define __HAVE_ARCH_HUGE_PTEP_GET
 static inline pte_t huge_ptep_get(pte_t *ptep)
 {
return READ_ONCE(*ptep);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index e9b42750fdf5..36cc0396b214 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -27,11 +27,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 120adc3b2ffd..425bb6fc3bda 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -82,11 +82,6 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
return changed;
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 165b4e5a6f32..7cb595dcb7d7 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -48,11 +48,6 @@ int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty);
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 658bf7136a3c..33a2d9e3ea9e 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -142,11 +142,6 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index c87195ae0cfa..6f025fe18146 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -32,11 +32,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/sparc/include/asm/hugetlb.h b/arch/sparc/include/asm/hugetlb.h
index 028a1465fbe7..3963f80d1cb3 100644
--- a/arch/sparc/include/asm/hugetlb.h
+++ b/arch/sparc/include/asm/hugetlb.h
@@ -53,11 +53,6 @@ static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
return changed;
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 }
diff --git a/arch/x86/include/asm/hugetlb.h b/arch/x86/include/asm/hugetlb.h
index 574d42eb081e..7469d321f072 100644
--- a/arch/x86/include/asm/hugetlb.h
+++ b/arch/x86/include/asm/hugetlb.h
@@ -13,11 +13,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
return 0;
 }
 
-static inline pte_t huge_ptep_get(pte_t *ptep)
-{
-   return *ptep;
-}
-
 static inline void arch_clear_hugepage_flags(struct 
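
The pattern this series relies on can be sketched as follows. The contents
of include/asm-generic/hugetlb.h are not visible in this excerpt, so the
exact placement of the guard is an assumption inferred from the
"#define __HAVE_ARCH_HUGE_PTEP_GET" lines added for arm and arm64; the
function body is the one removed from the other architectures. The pte_t
below is a toy type for userspace, not the kernel's.

```c
#include <assert.h>

/* Toy pte_t so the example builds outside the kernel. */
typedef struct {
	unsigned long val;
} pte_t;

/*
 * An architecture that needs its own version would define
 * __HAVE_ARCH_HUGE_PTEP_GET before this point and provide huge_ptep_get()
 * itself. Nothing defines it here, so the generic fallback is compiled.
 */
#ifndef __HAVE_ARCH_HUGE_PTEP_GET
static inline pte_t huge_ptep_get(pte_t *ptep)
{
	assert(ptep);
	return *ptep;
}
#endif
```

Architectures with special needs (here arm and arm64) keep their own
definition by setting the macro, and every other architecture picks up the
shared one without carrying a private copy.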

[PATCH v5 10/11] hugetlb: Introduce generic version of huge_ptep_set_access_flags

2018-07-31 Thread Alexandre Ghiti
arm, ia64, sh, x86 architectures use the same version
of huge_ptep_set_access_flags, so move this generic implementation
into asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h | 7 ---
 arch/arm64/include/asm/hugetlb.h  | 1 +
 arch/ia64/include/asm/hugetlb.h   | 7 ---
 arch/mips/include/asm/hugetlb.h   | 1 +
 arch/parisc/include/asm/hugetlb.h | 1 +
 arch/powerpc/include/asm/hugetlb.h| 1 +
 arch/sh/include/asm/hugetlb.h | 7 ---
 arch/sparc/include/asm/hugetlb.h  | 1 +
 arch/x86/include/asm/hugetlb.h| 7 ---
 include/asm-generic/hugetlb.h | 9 +
 10 files changed, 14 insertions(+), 28 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index 8247cd6a2ac6..54e4b097b1f5 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,11 +37,4 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
 }
 
-static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep,
-pte_t pte, int dirty)
-{
-   return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
-}
-
 #endif /* _ASM_ARM_HUGETLB_3LEVEL_H */
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index f4f69ae5466e..80887abcef7f 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -42,6 +42,7 @@ extern pte_t arch_make_huge_pte(pte_t entry, struct vm_area_struct *vma,
 #define __HAVE_ARCH_HUGE_SET_HUGE_PTE_AT
 extern void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
pte_t *ptep, pte_t pte);
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 49d1f7949f3a..e9b42750fdf5 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -27,13 +27,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-unsigned long addr, pte_t *ptep,
-pte_t pte, int dirty)
-{
-   return ptep_set_access_flags(vma, addr, ptep, pte, dirty);
-}
-
 static inline pte_t huge_ptep_get(pte_t *ptep)
 {
return *ptep;
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 3dcf5debf8c4..120adc3b2ffd 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -63,6 +63,7 @@ static inline int huge_pte_none(pte_t pte)
return !val || (val == (unsigned long)invalid_pte_table);
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr,
 pte_t *ptep, pte_t pte,
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 9c3950ca2974..165b4e5a6f32 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -43,6 +43,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep);
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty);
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 69c14ecac133..658bf7136a3c 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -137,6 +137,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
flush_hugetlb_page(vma, addr);
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_ACCESS_FLAGS
 extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
  unsigned long addr, pte_t *ptep,
  pte_t pte, int dirty);
diff --git a/arch/sh/include/asm/hugetlb.h b/arch/sh/include/asm/hugetlb.h
index 8df4004977b9..c87195ae0cfa 100644
--- a/arch/sh/include/asm/hugetlb.h
+++ b/arch/sh/include/asm/hugetlb.h
@@ -32,13 +32,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
-   

[PATCH v5 09/11] hugetlb: Introduce generic version of huge_ptep_set_wrprotect

2018-07-31 Thread Alexandre Ghiti
arm, ia64, mips, sh, x86 architectures use the same version
of huge_ptep_set_wrprotect, so move this generic implementation into
asm-generic/hugetlb.h.
Note: powerpc uses twice for book3s/32 and nohash/32 the same version as
the above architectures, but the modification was not straightforward
and hence has not been done.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb-3level.h| 6 --
 arch/arm64/include/asm/hugetlb.h | 1 +
 arch/ia64/include/asm/hugetlb.h  | 6 --
 arch/mips/include/asm/hugetlb.h  | 6 --
 arch/parisc/include/asm/hugetlb.h| 1 +
 arch/powerpc/include/asm/book3s/32/pgtable.h | 2 ++
 arch/powerpc/include/asm/book3s/64/pgtable.h | 1 +
 arch/powerpc/include/asm/nohash/32/pgtable.h | 2 ++
 arch/powerpc/include/asm/nohash/64/pgtable.h | 1 +
 arch/sh/include/asm/hugetlb.h| 6 --
 arch/sparc/include/asm/hugetlb.h | 1 +
 arch/x86/include/asm/hugetlb.h   | 6 --
 include/asm-generic/hugetlb.h| 8 
 13 files changed, 17 insertions(+), 30 deletions(-)

diff --git a/arch/arm/include/asm/hugetlb-3level.h b/arch/arm/include/asm/hugetlb-3level.h
index b897541520ef..8247cd6a2ac6 100644
--- a/arch/arm/include/asm/hugetlb-3level.h
+++ b/arch/arm/include/asm/hugetlb-3level.h
@@ -37,12 +37,6 @@ static inline pte_t huge_ptep_get(pte_t *ptep)
return retval;
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty)
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 3e7f6e69b28d..f4f69ae5466e 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -48,6 +48,7 @@ extern int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 extern pte_t huge_ptep_get_and_clear(struct mm_struct *mm,
 unsigned long addr, pte_t *ptep);
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 extern void huge_ptep_set_wrprotect(struct mm_struct *mm,
unsigned long addr, pte_t *ptep);
 #define __HAVE_ARCH_HUGE_PTEP_CLEAR_FLUSH
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index cbe296271030..49d1f7949f3a 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -27,12 +27,6 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr, pte_t *ptep,
 pte_t pte, int dirty)
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index 6ff2531cfb1d..3dcf5debf8c4 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -63,12 +63,6 @@ static inline int huge_pte_none(pte_t pte)
return !val || (val == (unsigned long)invalid_pte_table);
 }
 
-static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,
-  unsigned long addr, pte_t *ptep)
-{
-   ptep_set_wrprotect(mm, addr, ptep);
-}
-
 static inline int huge_ptep_set_access_flags(struct vm_area_struct *vma,
 unsigned long addr,
 pte_t *ptep, pte_t pte,
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index fb7e0fd858a3..9c3950ca2974 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -39,6 +39,7 @@ static inline void huge_ptep_clear_flush(struct vm_area_struct *vma,
 {
 }
 
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 void huge_ptep_set_wrprotect(struct mm_struct *mm,
   unsigned long addr, pte_t *ptep);
 
diff --git a/arch/powerpc/include/asm/book3s/32/pgtable.h b/arch/powerpc/include/asm/book3s/32/pgtable.h
index 02f5acd7ccc4..d2cd1d0226e9 100644
--- a/arch/powerpc/include/asm/book3s/32/pgtable.h
+++ b/arch/powerpc/include/asm/book3s/32/pgtable.h
@@ -228,6 +228,8 @@ static inline void ptep_set_wrprotect(struct mm_struct *mm, unsigned long addr,
 {
pte_update(ptep, (_PAGE_RW | _PAGE_HWWRITE), _PAGE_RO);
 }
+
+#define __HAVE_ARCH_HUGE_PTEP_SET_WRPROTECT
 static inline void huge_ptep_set_wrprotect(struct mm_struct *mm,

[PATCH v5 08/11] hugetlb: Introduce generic version of prepare_hugepage_range

2018-07-31 Thread Alexandre Ghiti
arm, arm64, powerpc, sparc, x86 architectures use the same version of
prepare_hugepage_range, so move this generic implementation into
asm-generic/hugetlb.h.

Signed-off-by: Alexandre Ghiti 
Reviewed-by: Mike Kravetz 
---
 arch/arm/include/asm/hugetlb.h     | 11 -----------
 arch/arm64/include/asm/hugetlb.h   | 11 -----------
 arch/ia64/include/asm/hugetlb.h    |  1 +
 arch/mips/include/asm/hugetlb.h    |  1 +
 arch/parisc/include/asm/hugetlb.h  |  1 +
 arch/powerpc/include/asm/hugetlb.h | 15 ---------------
 arch/sh/include/asm/hugetlb.h      |  1 +
 arch/sparc/include/asm/hugetlb.h   | 16 ----------------
 arch/x86/include/asm/hugetlb.h     | 15 ---------------
 include/asm-generic/hugetlb.h      | 15 +++++++++++++++
 10 files changed, 19 insertions(+), 68 deletions(-)
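
For readers skimming the series, the override pattern this patch relies on
can be sketched in user space roughly as follows. This is only an
illustration, not the code added to asm-generic/hugetlb.h: struct
hstate_sketch, SKETCH_PAGE_SHIFT, sketch_huge_page_mask() and
sketch_prepare_hugepage_range() are hypothetical stand-ins for the kernel's
struct hstate, PAGE_SHIFT, huge_page_mask() and prepare_hugepage_range(),
and a 4 KiB base page is assumed.

```c
#include <errno.h>

/* Hypothetical stand-in for struct hstate: only the page order is modelled. */
struct hstate_sketch {
	unsigned int order;	/* log2(huge page size / base page size) */
};

#define SKETCH_PAGE_SHIFT 12U	/* assumption: 4 KiB base pages */

/* Mask that clears the offset-within-huge-page bits. */
static unsigned long sketch_huge_page_mask(const struct hstate_sketch *h)
{
	return ~((1UL << (h->order + SKETCH_PAGE_SHIFT)) - 1);
}

/*
 * An architecture that needs its own check defines
 * __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE and supplies the function itself;
 * every other architecture falls through to this generic alignment check.
 */
#ifndef __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
static int sketch_prepare_hugepage_range(const struct hstate_sketch *h,
					 unsigned long addr, unsigned long len)
{
	/* Both the length and the start address must be huge-page aligned. */
	if (len & ~sketch_huge_page_mask(h))
		return -EINVAL;
	if (addr & ~sketch_huge_page_mask(h))
		return -EINVAL;
	return 0;
}
#endif
```

With order 9 (2 MiB huge pages under the 4 KiB assumption), a 2 MiB aligned
address and length pass the check, while a 4 KiB offset in either one is
rejected with -EINVAL.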

diff --git a/arch/arm/include/asm/hugetlb.h b/arch/arm/include/asm/hugetlb.h
index 9ca14227eeb7..3fcef21ff2c2 100644
--- a/arch/arm/include/asm/hugetlb.h
+++ b/arch/arm/include/asm/hugetlb.h
@@ -33,17 +33,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline int prepare_hugepage_range(struct file *file,
-					 unsigned long addr, unsigned long len)
-{
-	struct hstate *h = hstate_file(file);
-	if (len & ~huge_page_mask(h))
-		return -EINVAL;
-	if (addr & ~huge_page_mask(h))
-		return -EINVAL;
-	return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 	clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/arm64/include/asm/hugetlb.h b/arch/arm64/include/asm/hugetlb.h
index 1fd64ebf0cd7..3e7f6e69b28d 100644
--- a/arch/arm64/include/asm/hugetlb.h
+++ b/arch/arm64/include/asm/hugetlb.h
@@ -31,17 +31,6 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
 	return 0;
 }
 
-static inline int prepare_hugepage_range(struct file *file,
-					 unsigned long addr, unsigned long len)
-{
-	struct hstate *h = hstate_file(file);
-	if (len & ~huge_page_mask(h))
-		return -EINVAL;
-	if (addr & ~huge_page_mask(h))
-		return -EINVAL;
-	return 0;
-}
-
 static inline void arch_clear_hugepage_flags(struct page *page)
 {
 	clear_bit(PG_dcache_clean, &page->flags);
diff --git a/arch/ia64/include/asm/hugetlb.h b/arch/ia64/include/asm/hugetlb.h
index 82fe3d7a38d9..cbe296271030 100644
--- a/arch/ia64/include/asm/hugetlb.h
+++ b/arch/ia64/include/asm/hugetlb.h
@@ -9,6 +9,7 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
unsigned long end, unsigned long floor,
unsigned long ceiling);
 
+#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 int prepare_hugepage_range(struct file *file,
unsigned long addr, unsigned long len);
 
diff --git a/arch/mips/include/asm/hugetlb.h b/arch/mips/include/asm/hugetlb.h
index b3d6bb53ee6e..6ff2531cfb1d 100644
--- a/arch/mips/include/asm/hugetlb.h
+++ b/arch/mips/include/asm/hugetlb.h
@@ -18,6 +18,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
return 0;
 }
 
+#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 static inline int prepare_hugepage_range(struct file *file,
 unsigned long addr,
 unsigned long len)
diff --git a/arch/parisc/include/asm/hugetlb.h b/arch/parisc/include/asm/hugetlb.h
index 5a102d7251e4..fb7e0fd858a3 100644
--- a/arch/parisc/include/asm/hugetlb.h
+++ b/arch/parisc/include/asm/hugetlb.h
@@ -22,6 +22,7 @@ static inline int is_hugepage_only_range(struct mm_struct *mm,
  * If the arch doesn't supply something else, assume that hugepage
  * size aligned regions are ok without further preparation.
  */
+#define __HAVE_ARCH_PREPARE_HUGEPAGE_RANGE
 static inline int prepare_hugepage_range(struct file *file,
unsigned long addr, unsigned long len)
 {
diff --git a/arch/powerpc/include/asm/hugetlb.h b/arch/powerpc/include/asm/hugetlb.h
index 7123599089c6..69c14ecac133 100644
--- a/arch/powerpc/include/asm/hugetlb.h
+++ b/arch/powerpc/include/asm/hugetlb.h
@@ -117,21 +117,6 @@ void hugetlb_free_pgd_range(struct mmu_gather *tlb, unsigned long addr,
 			    unsigned long end, unsigned long floor,
 			    unsigned long ceiling);
 
-/*
- * If the arch doesn't supply something else, assume that hugepage
- * size aligned regions are ok without further preparation.
- */
-static inline int prepare_hugepage_range(struct file *file,
-			unsigned long addr, unsigned long len)
-{
-	struct hstate *h = hstate_file(file);
-	if (len & ~huge_page_mask(h))
-		return -EINVAL;
-	if (addr & ~huge_page_mask(h))
-		return -EINVAL;
-	return 0;
-}
-
 #define __HAVE_ARCH_HUGE_PTEP_GET_AND_CLEAR
 static inline pte_t huge_ptep_get_and_clear(struct 
