Re: powerpc: Convert power off logic to pm_power_off topic branch
On 11/03/2014 08:47 PM, Michael Ellerman wrote: Hi Guenter, I've put the pm_power_off patch in a topic branch: https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=topic/pm-power-off Hi Michael, Excellent. Right now all I can do is to wait for Rafael. He was not happy with earlier versions of the series and did not yet comment on the most recent version. Since we are already at 3.18-rc3 without making real progress, it may well be that the series is going to miss 3.19. Guenter ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: powerpc: Replace __get_cpu_var uses
On Tue, 2014-10-21 at 15:23 -0500, Christoph Lameter wrote: > This still has not been merged and now powerpc is the only arch that does > not have this change. Sorry about missing linuxppc-dev before. Hi Christoph, I've put this in a topic branch, with the fixups I described last week: https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=topic/get-cpu-var I'll pull this into my next when I open it. cheers
powerpc: Convert power off logic to pm_power_off topic branch
Hi Guenter, I've put the pm_power_off patch in a topic branch: https://git.kernel.org/cgit/linux/kernel/git/mpe/linux.git/log/?h=topic/pm-power-off I'll pull this into my next when I open it. Let me know if there's any issue with it. cheers
[git pull] Please pull mpe.git for-linus branch (for powerpc)
Hi Linus,

Some more powerpc fixes if you please.

cheers

The following changes since commit d506aa68c23db708ad45ca8c17f0d7f5d7029a37:

  Merge branch 'for-linus' of git://git.kernel.dk/linux-block (2014-10-29 11:57:10 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/mpe/linux.git for-linus

for you to fetch changes up to 10ccaf178b2b961d8bca252d647ed7ed8aae2a20:

  powerpc: use device_online/offline() instead of cpu_up/down() (2014-11-02 10:55:56 +1100)

Anton Blanchard (1):
      powerpc: do_notify_resume can be called with bad thread_info flags argument

Benjamin Herrenschmidt (1):
      powerpc/powernv: Properly fix LPC debugfs endianness

Dan Streetman (1):
      powerpc: use device_online/offline() instead of cpu_up/down()

Fabian Frederick (1):
      powerpc: Fix section mismatch warning

Hari Bathini (1):
      powerpc/fadump: Fix endianess issues in firmware assisted dump handling

 arch/powerpc/include/asm/fadump.h         |  52 +++---
 arch/powerpc/kernel/entry_64.S            |   6 ++
 arch/powerpc/kernel/fadump.c              | 114 +++---
 arch/powerpc/mm/init_32.c                 |   2 +-
 arch/powerpc/platforms/powernv/opal-lpc.c |  59
 arch/powerpc/platforms/pseries/dlpar.c    |   4 +-
 arch/powerpc/platforms/pseries/lpar.c     |  14 +++-
 7 files changed, 163 insertions(+), 88 deletions(-)
Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
On Mon, 2014-11-03 at 09:21 -0800, Alexei Starovoitov wrote: > On Mon, Nov 3, 2014 at 9:06 AM, David Miller wrote: > > From: Denis Kirjanov > > Date: Thu, 30 Oct 2014 09:12:15 +0300 > > > >> Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load > >> skb->pkt_type field. > >> > >> Before: > >> [ 88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS > >> [ 88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS > >> > >> After: > >> [ 80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS > >> [ 80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS > >> > >> CC: Alexei Starovoitov > >> CC: Michael Ellerman > >> Cc: Matt Evans > >> Signed-off-by: Denis Kirjanov > >> > >> v2: Added test rusults > > > > So, can I apply this now? > > I think this question is more towards ppc folks, > since both Daniel and myself said before that it looks ok. Yeah sorry, as I said I don't really know enough about BPF to ack it. > Philippe just tested the previous version of this patch on ppc64le... > I'm guessing that Matt (original author of bpf jit for ppc) is not replying, > because he has no objections. Actually that might be because he works at ARM now :) If you can CC Philippe on future BPF patches for powerpc that would probably help. cheers ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/2] irqdomain: add support for creating a continuous mapping
On Mon, 2014-11-03 at 17:18 +0100, Johannes Thumshirn wrote: > A MSI device may have multiple interrupts. That means that the > interrupts numbers should be continuos so that pdev->irq refers to the > first interrupt, pdev->irq + 1 to the second and so on. > This patch adds support for continuous allocation of virqs for a range > of hwirqs. The function is based on irq_create_mapping() but due to the > number argument there is very little in common now. > > Signed-off-by: Sebastian Andrzej Siewior > Signed-off-by: Johannes Thumshirn > --- > include/linux/irqdomain.h | 10 -- > kernel/irq/irqdomain.c| 85 > --- > 2 files changed, 73 insertions(+), 22 deletions(-) This is changing core kernel code and thus LKML should be CCed, as well as Ben Herrenschmidt who is the maintainer of kernel/irq/irqdomain.c. Also please respond to feedback in http://patchwork.ozlabs.org/patch/322497/ Is it really necessary for the virqs to be contiguous? How is the availability of multiple MSIs communicated to the driver? Is there an example of a driver that currently uses multiple MSIs? > /** > - * irq_create_mapping() - Map a hardware interrupt into linux irq space > + * irq_create_mapping_block() - Map multiple hardware interrupts > * @domain: domain owning this hardware interrupt or NULL for default domain > * @hwirq: hardware irq number in that domain space > + * @num: number of interrupts > + * > + * Maps a hwirq to a newly allocated virq. If num is greater than 1 then num > + * hwirqs (hwirq ... hwirq + num - 1) will be mapped and virq will be > + * continuous. > + * Returns the first linux virq number. > * > - * Only one mapping per hardware interrupt is permitted. Returns a linux > - * irq number. > * If the sense/trigger is to be specified, set_irq_type() should be called > * on the number returned from that call. 
> */ > -unsigned int irq_create_mapping(struct irq_domain *domain, > +unsigned int irq_create_mapping_block(struct irq_domain *domain, > irq_hw_number_t hwirq) > { Where is the num parameter? How does this even build? > unsigned int hint; > int virq; > + int node; > + int i; > > - pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq); > + pr_debug("%s(0x%p, 0x%lx, %d)\n", __func__, domain, hwirq, num); > > /* Look for default domain if nececssary */ > - if (domain == NULL) > + if (!domain && num == 1) > domain = irq_default_domain; > + > if (domain == NULL) { > WARN(1, "%s(, %lx) called with NULL domain\n", __func__, hwirq); > return 0; > @@ -403,35 +437,46 @@ unsigned int irq_create_mapping(struct irq_domain > *domain, > pr_debug("-> using domain @%p\n", domain); > > /* Check if mapping already exists */ > - virq = irq_find_mapping(domain, hwirq); > - if (virq) { > - pr_debug("-> existing mapping on virq %d\n", virq); > - return virq; > + for (i = 0; i < num; i++) { > + virq = irq_find_mapping(domain, hwirq + i); > + if (virq != NO_IRQ) { Please don't introduce new uses of NO_IRQ. irq_find_mapping() returns zero on failure. Some architectures (e.g. ARM) define NO_IRQ as something other than zero, which will cause this to break. > + if (i == 0) > + return irq_check_continuous_mapping(domain, > + hwirq, num); > + pr_err("irq: hwirq %ld has no mapping but hwirq %ld " > + "maps to virq %d. This can't be a block\n", > + hwirq, hwirq + i, virq); > + return -EINVAL; > + } > } Explain how you'd get into this error state, and how you'd avoid doing so. 
> + node = of_node_to_nid(domain->of_node); > + > /* Allocate a virtual interrupt number */ > hint = hwirq % nr_irqs; > if (hint == 0) > hint++; > - virq = irq_alloc_desc_from(hint, of_node_to_nid(domain->of_node)); > - if (virq <= 0) > - virq = irq_alloc_desc_from(1, of_node_to_nid(domain->of_node)); > + virq = irq_alloc_desc_from(hint, node); > + if (virq <= 0 && hint != 1) > + virq = irq_alloc_desc_from(1, node); Factoring out node is irrelevant to, and obscures, what you're actually doing here, which is adding a check for hint. Why is a hint value of 1 special? You're also still allocating only one virq, unlike in http://patchwork.ozlabs.org/patch/322497/ -Scott
Re: FSL MSI Mapping
On Wed, 2014-10-29 at 07:54 +0100, Johannes Thumshirn wrote: > On Tue, Oct 28, 2014 at 02:10:18PM -0500, Scott Wood wrote: > > On Tue, 2014-10-28 at 18:06 +0100, Johannes Thumshirn wrote: > > > Hi, > > > > > > I got notified about your patch to support multiple MSI Vectors on > > > Freescale > > > PowerPC platforms. Is there any reason why it wasn't applied until now? I > > > couldn't find anything about it in the list archives. > > > > > > I think it would be a real benefit for all to have multiple MSI vectors on > > > PowerPCs. > > > > Could you provide a patchwork link to the patch you're talking about? > > Unfortunately I couldn't find it in the ppc patchwork, but here are the links > to > the patch series in the list archives: > https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-February/115484.html > https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-February/115485.html > https://lists.ozlabs.org/pipermail/linuxppc-dev/2014-February/115486.html They were in patchwork with changes requested in patch 1/2: http://patchwork.ozlabs.org/patch/322497/ http://patchwork.ozlabs.org/patch/322334/ -Scott
Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
From: Alexei Starovoitov Date: Mon, 3 Nov 2014 09:21:03 -0800 > On Mon, Nov 3, 2014 at 9:06 AM, David Miller wrote: >> From: Denis Kirjanov >> Date: Thu, 30 Oct 2014 09:12:15 +0300 >> >>> Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load >>> skb->pkt_type field. >>> >>> Before: >>> [ 88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS >>> [ 88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS >>> >>> After: >>> [ 80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS >>> [ 80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS >>> >>> CC: Alexei Starovoitov >>> CC: Michael Ellerman >>> Cc: Matt Evans >>> Signed-off-by: Denis Kirjanov >>> >>> v2: Added test rusults >> >> So, can I apply this now? > > I think this question is more towards ppc folks, > since both Daniel and myself said before that it looks ok. > Philippe just tested the previous version of this patch on ppc64le... > I'm guessing that Matt (original author of bpf jit for ppc) is not replying, > because he has no objections. > Either way the addition is tiny and contained, so can go in now. Ok, I have applied this to net-next, thanks everyone. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 1/3] ipr: Convert to generic DMA API
Thanks, I've applied patch 3 to drivers-for-3.18, and patches 1 and 2 to core-for-3.19.
Re: [PATCH] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
On 11/3/14, Philippe Bergheaud wrote: > Denis Kirjanov wrote: >> Any feedback from PPC folks? > > I have reviewed the patch and it looks fine to me. > I have tested successfuly on ppc64le. > I could not test it on ppc64. Nice, I've tested it on PPC64be > > Philippe > >> On 10/26/14, Denis Kirjanov wrote: >> >>>Cc: Matt Evans >>>Signed-off-by: Denis Kirjanov >>>--- >>> arch/powerpc/include/asm/ppc-opcode.h | 1 + >>> arch/powerpc/net/bpf_jit.h| 7 +++ >>> arch/powerpc/net/bpf_jit_comp.c | 5 + >>> 3 files changed, 13 insertions(+) >>> >>>diff --git a/arch/powerpc/include/asm/ppc-opcode.h >>>b/arch/powerpc/include/asm/ppc-opcode.h >>>index 6f85362..1a52877 100644 >>>--- a/arch/powerpc/include/asm/ppc-opcode.h >>>+++ b/arch/powerpc/include/asm/ppc-opcode.h >>>@@ -204,6 +204,7 @@ >>> #define PPC_INST_ERATSX_DOT 0x7c000127 >>> >>> /* Misc instructions for BPF compiler */ >>>+#define PPC_INST_LBZ0x8800 >>> #define PPC_INST_LD 0xe800 >>> #define PPC_INST_LHZ0xa000 >>> #define PPC_INST_LHBRX 0x7c00062c >>>diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h >>>index 9aee27c..c406aa9 100644 >>>--- a/arch/powerpc/net/bpf_jit.h >>>+++ b/arch/powerpc/net/bpf_jit.h >>>@@ -87,6 +87,9 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh); >>> #define PPC_STD(r, base, i) EMIT(PPC_INST_STD | ___PPC_RS(r) |\ >>> ___PPC_RA(base) | ((i) & 0xfffc)) >>> >>>+ >>>+#define PPC_LBZ(r, base, i) EMIT(PPC_INST_LBZ | ___PPC_RT(r) |\ >>>+ ___PPC_RA(base) | IMM_L(i)) >>> #define PPC_LD(r, base, i) EMIT(PPC_INST_LD | ___PPC_RT(r) | \ >>> ___PPC_RA(base) | IMM_L(i)) >>> #define PPC_LWZ(r, base, i) EMIT(PPC_INST_LWZ | ___PPC_RT(r) |\ >>>@@ -96,6 +99,10 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh); >>> #define PPC_LHBRX(r, base, b) EMIT(PPC_INST_LHBRX | ___PPC_RT(r) | >>> \ >>> ___PPC_RA(base) | ___PPC_RB(b)) >>> /* Convenience helpers for the above with 'far' offsets: */ >>>+#define PPC_LBZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LBZ(r, base, >>> i); >>> \ >>>+else { PPC_ADDIS(r, base, IMM_HA(i));\ 
>>>+PPC_LBZ(r, r, IMM_L(i)); } } while(0) >>>+ >>> #define PPC_LD_OFFS(r, base, i) do { if ((i) < 32768) PPC_LD(r, base, >>> i); >>> \ >>> else { PPC_ADDIS(r, base, IMM_HA(i));\ >>> PPC_LD(r, r, IMM_L(i)); } } while(0) >>>diff --git a/arch/powerpc/net/bpf_jit_comp.c >>>b/arch/powerpc/net/bpf_jit_comp.c >>>index cbae2df..d110e28 100644 >>>--- a/arch/powerpc/net/bpf_jit_comp.c >>>+++ b/arch/powerpc/net/bpf_jit_comp.c >>>@@ -407,6 +407,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, >>> u32 >>>*image, >>> PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff, >>> queue_mapping)); >>> break; >>>+case BPF_ANC | SKF_AD_PKTTYPE: >>>+PPC_LBZ_OFFS(r_A, r_skb, PKT_TYPE_OFFSET()); >>>+PPC_ANDI(r_A, r_A, PKT_TYPE_MAX); >>>+PPC_SRWI(r_A, r_A, 5); >>>+break; >>> case BPF_ANC | SKF_AD_CPU: >>> #ifdef CONFIG_SMP >>> /* >>>-- >>>2.1.0 >>> >>> >> >> ___ >> Linuxppc-dev mailing list >> Linuxppc-dev@lists.ozlabs.org >> https://lists.ozlabs.org/listinfo/linuxppc-dev > > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH 0/2] Support multiple MSI interrupts on FSL-MPIC
On Mon, 2014-11-03 at 17:18 +0100, Johannes Thumshirn wrote: > This series adds support for multiple MSI interrupts on FSL's MPIC. The patch > series was originally done by Sebastian Andrzej Siewior > > and re-applied and tested by me. > > I didn't know whether it is OK to put my name on it or not. So if Sebastian > has > a problem with it I'll of course change it immediately. It was just convenient > to > use my name in the git commit but I don't want to do any kind of copyright > infringement. You should set the Author field to Sebastian, which will result in a "From:" line at the top of the message body when sending patches via e-mail. -Scott
Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
On Mon, Nov 3, 2014 at 9:06 AM, David Miller wrote: > From: Denis Kirjanov > Date: Thu, 30 Oct 2014 09:12:15 +0300 > >> Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load >> skb->pkt_type field. >> >> Before: >> [ 88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS >> [ 88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS >> >> After: >> [ 80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS >> [ 80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS >> >> CC: Alexei Starovoitov >> CC: Michael Ellerman >> Cc: Matt Evans >> Signed-off-by: Denis Kirjanov >> >> v2: Added test rusults > > So, can I apply this now? I think this question is more towards ppc folks, since both Daniel and myself said before that it looks ok. Philippe just tested the previous version of this patch on ppc64le... I'm guessing that Matt (original author of bpf jit for ppc) is not replying, because he has no objections. Either way the addition is tiny and contained, so can go in now. ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH v2] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
From: Denis Kirjanov Date: Thu, 30 Oct 2014 09:12:15 +0300 > Add BPF extension SKF_AD_PKTTYPE to ppc JIT to load > skb->pkt_type field. > > Before: > [ 88.262622] test_bpf: #11 LD_IND_NET 86 97 99 PASS > [ 88.265740] test_bpf: #12 LD_PKTTYPE 109 107 PASS > > After: > [ 80.605964] test_bpf: #11 LD_IND_NET 44 40 39 PASS > [ 80.607370] test_bpf: #12 LD_PKTTYPE 9 9 PASS > > CC: Alexei Starovoitov > CC: Michael Ellerman > Cc: Matt Evans > Signed-off-by: Denis Kirjanov > > v2: Added test results So, can I apply this now?
[PATCH 0/2] Support multiple MSI interrupts on FSL-MPIC
This series adds support for multiple MSI interrupts on FSL's MPIC. The patch series was originally done by Sebastian Andrzej Siewior and re-applied and tested by me.

I didn't know whether it is OK to put my name on it or not. So if Sebastian has a problem with it I'll of course change it immediately. It was just convenient to use my name in the git commit, but I don't want to do any kind of copyright infringement.

Johannes Thumshirn (2):
  irqdomain: add support for creating a continuous mapping
  powerpc: msi: fsl: add support for multiple MSI interrupts

 arch/powerpc/kernel/msi.c              |  4 --
 arch/powerpc/platforms/cell/axon_msi.c |  3 ++
 arch/powerpc/platforms/powernv/pci.c   |  3 ++
 arch/powerpc/platforms/pseries/msi.c   |  3 ++
 arch/powerpc/sysdev/fsl_msi.c          | 22 ++---
 arch/powerpc/sysdev/mpic_pasemi_msi.c  |  3 ++
 arch/powerpc/sysdev/mpic_u3msi.c       |  3 ++
 arch/powerpc/sysdev/ppc4xx_msi.c       |  2 +
 include/linux/irqdomain.h              | 10 +++-
 kernel/irq/irqdomain.c                 | 85 ++
 10 files changed, 106 insertions(+), 32 deletions(-)

-- 
1.9.1
[PATCH 2/2] powerpc: msi: fsl: add support for multiple MSI interrupts
This patch pushes the check for nvec > 1 && MSI into the check function of each MSI driver except for FSL's MSI where the functionality is added. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Johannes Thumshirn --- arch/powerpc/kernel/msi.c | 4 arch/powerpc/platforms/cell/axon_msi.c | 3 +++ arch/powerpc/platforms/powernv/pci.c | 3 +++ arch/powerpc/platforms/pseries/msi.c | 3 +++ arch/powerpc/sysdev/fsl_msi.c | 22 -- arch/powerpc/sysdev/mpic_pasemi_msi.c | 3 +++ arch/powerpc/sysdev/mpic_u3msi.c | 3 +++ arch/powerpc/sysdev/ppc4xx_msi.c | 2 ++ 8 files changed, 33 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/kernel/msi.c b/arch/powerpc/kernel/msi.c index 71bd161..80ee2f4 100644 --- a/arch/powerpc/kernel/msi.c +++ b/arch/powerpc/kernel/msi.c @@ -20,10 +20,6 @@ int arch_setup_msi_irqs(struct pci_dev *dev, int nvec, int type) return -ENOSYS; } - /* PowerPC doesn't support multiple MSI yet */ - if (type == PCI_CAP_ID_MSI && nvec > 1) - return 1; - return ppc_md.setup_msi_irqs(dev, nvec, type); } diff --git a/arch/powerpc/platforms/cell/axon_msi.c b/arch/powerpc/platforms/cell/axon_msi.c index 862b327..537a70e 100644 --- a/arch/powerpc/platforms/cell/axon_msi.c +++ b/arch/powerpc/platforms/cell/axon_msi.c @@ -250,6 +250,9 @@ static int setup_msi_msg_address(struct pci_dev *dev, struct msi_msg *msg) of_node_put(dn); + if (type == PCI_CAP_ID_MSI && nvec > 1) + return 1; + return 0; } diff --git a/arch/powerpc/platforms/powernv/pci.c b/arch/powerpc/platforms/powernv/pci.c index b2187d0..de33ec0 100644 --- a/arch/powerpc/platforms/powernv/pci.c +++ b/arch/powerpc/platforms/powernv/pci.c @@ -63,6 +63,9 @@ static int pnv_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) if (pdn && pdn->force_32bit_msi && !phb->msi32_support) return -ENODEV; + if (type == PCI_CAP_ID_MSI && nvec > 1) + return 1; + list_for_each_entry(entry, &pdev->msi_list, list) { if (!entry->msi_attrib.is_64 && !phb->msi32_support) { pr_warn("%s: Supports only 64-bit MSIs\n", diff 
--git a/arch/powerpc/platforms/pseries/msi.c b/arch/powerpc/platforms/pseries/msi.c index 8ab5add..544e924 100644 --- a/arch/powerpc/platforms/pseries/msi.c +++ b/arch/powerpc/platforms/pseries/msi.c @@ -383,6 +383,9 @@ static int rtas_setup_msi_irqs(struct pci_dev *pdev, int nvec_in, int type) int nvec = nvec_in; int use_32bit_msi_hack = 0; + if (type == PCI_CAP_ID_MSI && nvec > 1) + return 1; + if (type == PCI_CAP_ID_MSIX) rc = check_req_msix(pdev, nvec); else diff --git a/arch/powerpc/sysdev/fsl_msi.c b/arch/powerpc/sysdev/fsl_msi.c index de40b48..454e8b1 100644 --- a/arch/powerpc/sysdev/fsl_msi.c +++ b/arch/powerpc/sysdev/fsl_msi.c @@ -131,13 +131,19 @@ static void fsl_teardown_msi_irqs(struct pci_dev *pdev) struct fsl_msi *msi_data; list_for_each_entry(entry, &pdev->msi_list, list) { + int num; + int i; + if (entry->irq == NO_IRQ) continue; msi_data = irq_get_chip_data(entry->irq); irq_set_msi_desc(entry->irq, NULL); + num = 1 << entry->msi_attrib.multiple; msi_bitmap_free_hwirqs(&msi_data->bitmap, - virq_to_hw(entry->irq), 1); - irq_dispose_mapping(entry->irq); + virq_to_hw(entry->irq), num); + + for (i = 0; i < num; i++) + irq_dispose_mapping(entry->irq + i); } return; @@ -180,6 +186,7 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) struct msi_desc *entry; struct msi_msg msg; struct fsl_msi *msi_data; + int i; if (type == PCI_CAP_ID_MSIX) pr_debug("fslmsi: MSI-X untested, trying anyway.\n"); @@ -219,7 +226,8 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) if (phandle && (phandle != msi_data->phandle)) continue; - hwirq = msi_bitmap_alloc_hwirqs(&msi_data->bitmap, 1); + hwirq = msi_bitmap_alloc_hwirqs(&msi_data->bitmap, + nvec); if (hwirq >= 0) break; } @@ -230,16 +238,18 @@ static int fsl_setup_msi_irqs(struct pci_dev *pdev, int nvec, int type) goto out_free; } - virq = irq_create_mapping(msi_data->irqhost, hwirq); + virq = irq_create_mapping_block(msi_data->irqhost, hwirq, nvec); if
[PATCH 1/2] irqdomain: add support for creating a continuous mapping
A MSI device may have multiple interrupts. That means that the interrupts numbers should be continuos so that pdev->irq refers to the first interrupt, pdev->irq + 1 to the second and so on. This patch adds support for continuous allocation of virqs for a range of hwirqs. The function is based on irq_create_mapping() but due to the number argument there is very little in common now. Signed-off-by: Sebastian Andrzej Siewior Signed-off-by: Johannes Thumshirn --- include/linux/irqdomain.h | 10 -- kernel/irq/irqdomain.c| 85 --- 2 files changed, 73 insertions(+), 22 deletions(-) diff --git a/include/linux/irqdomain.h b/include/linux/irqdomain.h index b0f9d16..75662f3 100644 --- a/include/linux/irqdomain.h +++ b/include/linux/irqdomain.h @@ -175,8 +175,14 @@ extern void irq_domain_associate_many(struct irq_domain *domain, extern void irq_domain_disassociate(struct irq_domain *domain, unsigned int irq); -extern unsigned int irq_create_mapping(struct irq_domain *host, - irq_hw_number_t hwirq); +extern unsigned int irq_create_mapping_block(struct irq_domain *host, + irq_hw_number_t hwirq, unsigned int num); +static inline unsigned int irq_create_mapping(struct irq_domain *host, + irq_hw_number_t hwirq) +{ + return irq_create_mapping_block(host, hwirq, 1); +} + extern void irq_dispose_mapping(unsigned int virq); /** diff --git a/kernel/irq/irqdomain.c b/kernel/irq/irqdomain.c index 6534ff6..fba488f 100644 --- a/kernel/irq/irqdomain.c +++ b/kernel/irq/irqdomain.c @@ -375,27 +375,61 @@ unsigned int irq_create_direct_mapping(struct irq_domain *domain) } EXPORT_SYMBOL_GPL(irq_create_direct_mapping); +static int irq_check_continuous_mapping(struct irq_domain *domain, + irq_hw_number_t hwirq, unsigned int num) +{ + int virq; + int i; + + virq = irq_find_mapping(domain, hwirq); + + for (i = 1; i < num; i++) { + unsigned int next; + + next = irq_find_mapping(domain, hwirq + i); + if (next == virq + i) + continue; + + pr_err("irq: invalid partial mapping. 
First hwirq %lu maps to " + "%d and\n", hwirq, virq); + pr_err("irq: +%d hwirq (%lu) maps to %d but should be %d.\n", + i, hwirq + i, next, virq + i); + return -EINVAL; + } + + pr_debug("-> existing mapping on virq %d\n", virq); + return virq; +} + + /** - * irq_create_mapping() - Map a hardware interrupt into linux irq space + * irq_create_mapping_block() - Map multiple hardware interrupts * @domain: domain owning this hardware interrupt or NULL for default domain * @hwirq: hardware irq number in that domain space + * @num: number of interrupts + * + * Maps a hwirq to a newly allocated virq. If num is greater than 1 then num + * hwirqs (hwirq ... hwirq + num - 1) will be mapped and virq will be + * continuous. + * Returns the first linux virq number. * - * Only one mapping per hardware interrupt is permitted. Returns a linux - * irq number. * If the sense/trigger is to be specified, set_irq_type() should be called * on the number returned from that call. */ -unsigned int irq_create_mapping(struct irq_domain *domain, +unsigned int irq_create_mapping_block(struct irq_domain *domain, irq_hw_number_t hwirq) { unsigned int hint; int virq; + int node; + int i; - pr_debug("irq_create_mapping(0x%p, 0x%lx)\n", domain, hwirq); + pr_debug("%s(0x%p, 0x%lx, %d)\n", __func__, domain, hwirq, num); /* Look for default domain if nececssary */ - if (domain == NULL) + if (!domain && num == 1) domain = irq_default_domain; + if (domain == NULL) { WARN(1, "%s(, %lx) called with NULL domain\n", __func__, hwirq); return 0; @@ -403,35 +437,46 @@ unsigned int irq_create_mapping(struct irq_domain *domain, pr_debug("-> using domain @%p\n", domain); /* Check if mapping already exists */ - virq = irq_find_mapping(domain, hwirq); - if (virq) { - pr_debug("-> existing mapping on virq %d\n", virq); - return virq; + for (i = 0; i < num; i++) { + virq = irq_find_mapping(domain, hwirq + i); + if (virq != NO_IRQ) { + if (i == 0) + return irq_check_continuous_mapping(domain, + hwirq, num); + 
pr_err("irq: hwirq %ld has no mapping but hwirq %ld " + "maps to virq %d. This can't be a block\n", + hwirq, hwirq + i, virq); +
[PATCH 3/4] powernv: cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per-core idle states. A core enters these states only when all of its threads enter either the particular idle state or a deeper one. Tasks like the fastsleep hardware bug workaround and the hypervisor core state save have to be done only by the last thread of the core entering a deep idle state, and similarly tasks like timebase resync and hypervisor core register restore have to be done only by the first thread waking up from these states.

The current idle state management has no way to distinguish the first/last thread of the core waking/entering idle states, so tasks like timebase resync are done by all the threads. This is not only suboptimal, but can cause functional issues when subcores and KVM are involved.

This patch adds the necessary infrastructure to track idle states of threads in a per-core structure. It uses this info to perform tasks like the fastsleep workaround and timebase resync only once per core.

Signed-off-by: Shreyas B. Prabhu
Originally-by: Preeti U. Murthy
Cc: Benjamin Herrenschmidt
Cc: Paul Mackerras
Cc: Michael Ellerman
Cc: Rafael J.
Wysocki Cc: linux...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/cpuidle.h | 14 ++ arch/powerpc/include/asm/opal.h| 2 + arch/powerpc/include/asm/paca.h| 4 + arch/powerpc/kernel/asm-offsets.c | 4 + arch/powerpc/kernel/exceptions-64s.S | 20 ++- arch/powerpc/kernel/idle_power7.S | 183 +++-- arch/powerpc/platforms/powernv/opal-wrappers.S | 37 + arch/powerpc/platforms/powernv/setup.c | 52 ++- arch/powerpc/platforms/powernv/smp.c | 3 +- drivers/cpuidle/cpuidle-powernv.c | 3 +- 10 files changed, 267 insertions(+), 55 deletions(-) create mode 100644 arch/powerpc/include/asm/cpuidle.h diff --git a/arch/powerpc/include/asm/cpuidle.h b/arch/powerpc/include/asm/cpuidle.h new file mode 100644 index 000..8c82850 --- /dev/null +++ b/arch/powerpc/include/asm/cpuidle.h @@ -0,0 +1,14 @@ +#ifndef _ASM_POWERPC_CPUIDLE_H +#define _ASM_POWERPC_CPUIDLE_H + +#ifdef CONFIG_PPC_POWERNV +/* Used in powernv idle state management */ +#define PNV_THREAD_RUNNING 0 +#define PNV_THREAD_NAP 1 +#define PNV_THREAD_SLEEP2 +#define PNV_THREAD_WINKLE 3 +#define PNV_CORE_IDLE_LOCK_BIT 0x100 +#define PNV_CORE_IDLE_THREAD_BITS 0x0FF +#endif + +#endif diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index f8b95c0..bef7fbc 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -152,6 +152,7 @@ struct opal_sg_list { #define OPAL_PCI_ERR_INJECT96 #define OPAL_PCI_EEH_FREEZE_SET97 #define OPAL_HANDLE_HMI98 +#define OPAL_CONFIG_CPU_IDLE_STATE 99 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION102 @@ -162,6 +163,7 @@ struct opal_sg_list { */ #define OPAL_PM_NAP_ENABLED0x0001 #define OPAL_PM_SLEEP_ENABLED 0x0002 +#define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 #ifndef __ASSEMBLY__ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index a5139ea..85aeedb 100644 --- a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -158,6 +158,10 @@ struct paca_struct 
{ * early exception handler for use by high level C handler */ struct opal_machine_check_event *opal_mc_evt; + + /* Per-core mask tracking idle threads and a lock bit-[L][] */ + u32 *core_idle_state_ptr; + u8 thread_idle_state; /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */ #endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/kernel/asm-offsets.c b/arch/powerpc/kernel/asm-offsets.c index 9d7dede..50f299e 100644 --- a/arch/powerpc/kernel/asm-offsets.c +++ b/arch/powerpc/kernel/asm-offsets.c @@ -731,6 +731,10 @@ int main(void) DEFINE(OPAL_MC_SRR0, offsetof(struct opal_machine_check_event, srr0)); DEFINE(OPAL_MC_SRR1, offsetof(struct opal_machine_check_event, srr1)); DEFINE(PACA_OPAL_MC_EVT, offsetof(struct paca_struct, opal_mc_evt)); + DEFINE(PACA_CORE_IDLE_STATE_PTR, + offsetof(struct paca_struct, core_idle_state_ptr)); + DEFINE(PACA_THREAD_IDLE_STATE, + offsetof(struct paca_struct, thread_idle_state)); #endif return 0; diff --git a/arch/powerpc/kernel/exceptions-64s.S b/arch/powerpc/kernel/exceptions-64s.S index 72e783e..3311c8d 100644 --- a/arch/powerpc/kernel/exceptions-64s.S +++ b/arch/powerpc/kernel/exceptions-64s.S @@ -15,6 +15,7 @@ #include #include #include +#include /* * We layout physic
[PATCH 4/4] powernv: powerpc: Add winkle support for offline cpus
Winkle is a deep idle state supported in POWER8 chips. A core enters winkle when all the threads of the core enter winkle. In this state the power supply to the entire chiplet, i.e. the core, its private L2 and its private L3, is turned off. As a result it gives higher power savings compared to sleep, but entering winkle results in a total hypervisor state loss. Hence the hypervisor context has to be preserved before entering winkle and restored upon wake up.

The Power-on Reset Engine (PORE) is a dedicated engine which is responsible for powering on the chiplet during wake up. It can be programmed to restore the register contents of a few specific registers. This patch uses PORE to restore register state wherever possible and uses the stack to save and restore the rest of the necessary registers.

With hypervisor state restore, things fall under three categories: per-core state, per-subcore state and per-thread state. To manage this, extend the infrastructure introduced for sleep. Mainly we add a paca variable subcore_sibling_mask. Using this and the core_idle_state we can distinguish the first thread in core and subcore.

Signed-off-by: Shreyas B.
Prabhu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/opal.h| 3 + arch/powerpc/include/asm/paca.h| 2 + arch/powerpc/include/asm/ppc-opcode.h | 2 + arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h | 2 + arch/powerpc/kernel/asm-offsets.c | 2 + arch/powerpc/kernel/cpu_setup_power.S | 4 + arch/powerpc/kernel/exceptions-64s.S | 10 ++ arch/powerpc/kernel/idle_power7.S | 161 ++--- arch/powerpc/platforms/powernv/opal-wrappers.S | 2 + arch/powerpc/platforms/powernv/setup.c | 73 +++ arch/powerpc/platforms/powernv/smp.c | 4 +- arch/powerpc/platforms/powernv/subcore.c | 34 ++ arch/powerpc/platforms/powernv/subcore.h | 1 + 14 files changed, 285 insertions(+), 16 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index bef7fbc..f0ca2d9 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -153,6 +153,7 @@ struct opal_sg_list { #define OPAL_PCI_EEH_FREEZE_SET97 #define OPAL_HANDLE_HMI98 #define OPAL_CONFIG_CPU_IDLE_STATE 99 +#define OPAL_SLW_SET_REG 100 #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION102 @@ -163,6 +164,7 @@ struct opal_sg_list { */ #define OPAL_PM_NAP_ENABLED0x0001 #define OPAL_PM_SLEEP_ENABLED 0x0002 +#define OPAL_PM_WINKLE_ENABLED 0x0004 #define OPAL_PM_SLEEP_ENABLED_ER1 0x0008 #ifndef __ASSEMBLY__ @@ -972,6 +974,7 @@ int64_t opal_sensor_read(uint32_t sensor_hndl, int token, __be32 *sensor_data); int64_t opal_handle_hmi(void); int64_t opal_register_dump_region(uint32_t id, uint64_t start, uint64_t end); int64_t opal_unregister_dump_region(uint32_t id); +int64_t opal_slw_set_reg(uint64_t cpu_pir, uint64_t sprn, uint64_t val); int64_t opal_pci_set_phb_cxl_mode(uint64_t phb_id, uint64_t mode, uint64_t pe_number); /* Internal functions */ diff --git a/arch/powerpc/include/asm/paca.h b/arch/powerpc/include/asm/paca.h index 85aeedb..c2e51b7 100644 --- 
a/arch/powerpc/include/asm/paca.h +++ b/arch/powerpc/include/asm/paca.h @@ -162,6 +162,8 @@ struct paca_struct { /* Per-core mask tracking idle threads and a lock bit-[L][] */ u32 *core_idle_state_ptr; u8 thread_idle_state; /* ~Idle[0]/Nap[1]/Sleep[2]/Winkle[3] */ + /* Mask to denote subcore sibling threads */ + u8 subcore_sibling_mask; #endif #ifdef CONFIG_PPC_BOOK3S_64 /* Exclusive emergency stack pointer for machine check exception. */ diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 6f85362..5155be7 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -194,6 +194,7 @@ #define PPC_INST_NAP 0x4c000364 #define PPC_INST_SLEEP 0x4c0003a4 +#define PPC_INST_WINKLE0x4c0003e4 /* A2 specific instructions */ #define PPC_INST_ERATWE0x7c0001a6 @@ -374,6 +375,7 @@ #define PPC_NAPstringify_in_c(.long PPC_INST_NAP) #define PPC_SLEEP stringify_in_c(.long PPC_INST_SLEEP) +#define PPC_WINKLE stringify_in_c(.long PPC_INST_WINKLE) /* BHRB instructions */ #define PPC_CLRBHRBstringify_in_c(.long PPC_INST_CLRBHRB) diff --git a/arch/powerpc/include/asm/processor.h b/arch/powerpc/include/asm/processor.h index dda7ac4..c076842 100644 --- a/arch/powerpc/include/asm/processor.h +++ b/arch
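The per-core mask plus lock bit that the paca fields above track is what lets the last thread entering a deep idle state and the first thread waking be identified. A minimal C sketch of that idea — the bit layout and helper names are illustrative only, not the kernel's actual code or locking:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: bits 0-7 one per thread, set while that thread
 * is idle; a separate lock bit serializes sibling updates. */
#define IDLE_THREAD_BITS 0xffu
#define IDLE_LOCK_BIT    (1u << 8)

/* Returns 1 if this thread is the last one of the core to go idle.
 * In the kernel this runs with the lock bit held; single-threaded here. */
static int core_enter_idle(uint32_t *core_idle_state, unsigned int thread)
{
    *core_idle_state |= (1u << thread);
    return (*core_idle_state & IDLE_THREAD_BITS) == IDLE_THREAD_BITS;
}

/* Returns 1 if this thread is the first one of the core to wake up:
 * all thread bits were still set when it started clearing its own. */
static int core_exit_idle(uint32_t *core_idle_state, unsigned int thread)
{
    int first = (*core_idle_state & IDLE_THREAD_BITS) == IDLE_THREAD_BITS;
    *core_idle_state &= ~(1u << thread);
    return first;
}
```

With this shape, per-core work (timebase resync, hypervisor register restore) runs only when the helper returns 1, instead of on every thread.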
[PATCH 2/4] powerpc/powernv: Enable Offline CPUs to enter deep idle states
From: "Preeti U. Murthy" The secondary threads should enter deep idle states so as to gain maximum power savings when the entire core is offline. To do so, the offline path must be made aware of the deepest available idle state. Hence probe the device tree for the possible idle states in the powernv core code and expose the deepest idle state through flags. Since the device tree is probed by the cpuidle driver as well, move the parameters required to discover the idle states into a place common to both the driver and the powernv core code. Another point is that the fastsleep idle state may require workarounds in the kernel to function properly. This workaround is introduced in the subsequent patches. However, neither the cpuidle driver nor the hotplug path need be bothered about this workaround; it will be taken care of by the core powernv code. Originally-by: Srivatsa S. Bhat Signed-off-by: Preeti U. Murthy Signed-off-by: Shreyas B. Prabhu Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Rafael J. Wysocki Cc: linux...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/opal.h | 8 ++ arch/powerpc/platforms/powernv/powernv.h | 2 ++ arch/powerpc/platforms/powernv/setup.c | 49 arch/powerpc/platforms/powernv/smp.c | 7 - drivers/cpuidle/cpuidle-powernv.c| 9 ++ 5 files changed, 68 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/opal.h b/arch/powerpc/include/asm/opal.h index 9124b0e..f8b95c0 100644 --- a/arch/powerpc/include/asm/opal.h +++ b/arch/powerpc/include/asm/opal.h @@ -155,6 +155,14 @@ struct opal_sg_list { #define OPAL_REGISTER_DUMP_REGION 101 #define OPAL_UNREGISTER_DUMP_REGION 102 +/* Device tree flags */ + +/* Flags set in power-mgmt nodes in device tree if + * respective idle states are supported in the platform. 
+ */ +#define OPAL_PM_NAP_ENABLED0x0001 +#define OPAL_PM_SLEEP_ENABLED 0x0002 + #ifndef __ASSEMBLY__ #include diff --git a/arch/powerpc/platforms/powernv/powernv.h b/arch/powerpc/platforms/powernv/powernv.h index 6c8e2d1..604c48e 100644 --- a/arch/powerpc/platforms/powernv/powernv.h +++ b/arch/powerpc/platforms/powernv/powernv.h @@ -29,6 +29,8 @@ static inline u64 pnv_pci_dma_get_required_mask(struct pci_dev *pdev) } #endif +extern u32 pnv_get_supported_cpuidle_states(void); + extern void pnv_lpc_init(void); bool cpu_core_split_required(void); diff --git a/arch/powerpc/platforms/powernv/setup.c b/arch/powerpc/platforms/powernv/setup.c index 3f9546d..34c6665 100644 --- a/arch/powerpc/platforms/powernv/setup.c +++ b/arch/powerpc/platforms/powernv/setup.c @@ -290,6 +290,55 @@ static void __init pnv_setup_machdep_rtas(void) } #endif /* CONFIG_PPC_POWERNV_RTAS */ +static u32 supported_cpuidle_states; + +u32 pnv_get_supported_cpuidle_states(void) +{ + return supported_cpuidle_states; +} + +static int __init pnv_init_idle_states(void) +{ + struct device_node *power_mgt; + int dt_idle_states; + const __be32 *idle_state_flags; + u32 len_flags, flags; + int i; + + supported_cpuidle_states = 0; + + if (cpuidle_disable != IDLE_NO_OVERRIDE) + return 0; + + if (!firmware_has_feature(FW_FEATURE_OPALv3)) + return 0; + + power_mgt = of_find_node_by_path("/ibm,opal/power-mgt"); + if (!power_mgt) { + pr_warn("opal: PowerMgmt Node not found\n"); + return 0; + } + + idle_state_flags = of_get_property(power_mgt, + "ibm,cpu-idle-state-flags", &len_flags); + if (!idle_state_flags) { + pr_warn("DT-PowerMgmt: missing ibm,cpu-idle-state-flags\n"); + return 0; + } + + dt_idle_states = len_flags / sizeof(u32); + + for (i = 0; i < dt_idle_states; i++) { + flags = be32_to_cpu(idle_state_flags[i]); + supported_cpuidle_states |= flags; + } + + return 0; +} + +subsys_initcall(pnv_init_idle_states); + + static int __init pnv_probe(void) { unsigned long root = of_get_flat_dt_root(); diff --git 
a/arch/powerpc/platforms/powernv/smp.c b/arch/powerpc/platforms/powernv/smp.c index 4753958..3dc4cec 100644 --- a/arch/powerpc/platforms/powernv/smp.c +++ b/arch/powerpc/platforms/powernv/smp.c @@ -149,6 +149,7 @@ static int pnv_smp_cpu_disable(void) static void pnv_smp_cpu_kill_self(void) { unsigned int cpu; + u32 idle_states; /* Standard hot unplug procedure */ local_irq_disable(); @@ -159,13 +160,17 @@ static void pnv_smp_cpu_kill_self(void) generic_set_cpu_dead(cpu); smp_wmb(); + idle_states = pnv_get_supported_cpuidle_states(); /* We don't want to take decrementer interrupts while we are offline, * so clear LPCR:PECE1. We keep PECE2 enabled. */ mtspr(SPRN_LPCR, mfsp
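The heart of pnv_init_idle_states() above is a loop that ORs together the big-endian flag words of the ibm,cpu-idle-state-flags property. A small C sketch of that loop, using ntohl as a stand-in for be32_to_cpu (the setup and function name are illustrative):

```c
#include <assert.h>
#include <stdint.h>
#include <arpa/inet.h> /* ntohl/htonl stand in for be32_to_cpu/cpu_to_be32 */

/* Flag values from the patch's opal.h additions */
#define OPAL_PM_NAP_ENABLED   0x00000001
#define OPAL_PM_SLEEP_ENABLED 0x00000002

/* The property is an array of big-endian u32s, one per idle state;
 * the supported-states mask exposed to the offline path is their union. */
static uint32_t supported_states(const uint32_t *be_flags, int dt_idle_states)
{
    uint32_t supported = 0;
    for (int i = 0; i < dt_idle_states; i++)
        supported |= ntohl(be_flags[i]);
    return supported;
}
```

The offline path (pnv_smp_cpu_kill_self) then only needs to test bits of this mask to pick the deepest state it may enter.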
[PATCH 1/4] powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode
From: Paul Mackerras Currently, when going idle, we set the flag indicating that we are in nap mode (paca->kvm_hstate.hwthread_state) and then execute the nap (or sleep or rvwinkle) instruction, all with the MMU on. This is bad for two reasons: (a) the architecture specifies that those instructions must be executed with the MMU off, and in fact with only the SF, HV, ME and possibly RI bits set, and (b) this introduces a race, because as soon as we set the flag, another thread can switch the MMU to a guest context. If the race is lost, this thread will typically start looping on relocation-on ISIs at 0xc...4400. This fixes it by setting the MSR as required by the architecture before setting the flag or executing the nap/sleep/rvwinkle instruction. [ shre...@linux.vnet.ibm.com: Edited to handle LE ] Signed-off-by: Paul Mackerras Signed-off-by: Shreyas B. Prabhu Cc: Benjamin Herrenschmidt Cc: Michael Ellerman Cc: linuxppc-dev@lists.ozlabs.org --- arch/powerpc/include/asm/reg.h| 2 ++ arch/powerpc/kernel/idle_power7.S | 18 +- 2 files changed, 19 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/reg.h b/arch/powerpc/include/asm/reg.h index c998279..a68ee15 100644 --- a/arch/powerpc/include/asm/reg.h +++ b/arch/powerpc/include/asm/reg.h @@ -118,8 +118,10 @@ #define __MSR (MSR_ME | MSR_RI | MSR_IR | MSR_DR | MSR_ISF |MSR_HV) #ifdef __BIG_ENDIAN__ #define MSR_ __MSR +#define MSR_IDLE (MSR_ME | MSR_SF | MSR_HV) #else #define MSR_ (__MSR | MSR_LE) +#define MSR_IDLE (MSR_ME | MSR_SF | MSR_HV | MSR_LE) #endif #define MSR_KERNEL (MSR_ | MSR_64BIT) #define MSR_USER32 (MSR_ | MSR_PR | MSR_EE) diff --git a/arch/powerpc/kernel/idle_power7.S b/arch/powerpc/kernel/idle_power7.S index c0754bb..283c603 100644 --- a/arch/powerpc/kernel/idle_power7.S +++ b/arch/powerpc/kernel/idle_power7.S @@ -101,7 +101,23 @@ _GLOBAL(power7_powersave_common) std r9,_MSR(r1) std r1,PACAR1(r13) -_GLOBAL(power7_enter_nap_mode) + /* +* Go to real mode to do the nap, as required by the 
architecture. +* Also, we need to be in real mode before setting hwthread_state, +* because as soon as we do that, another thread can switch +* the MMU context to the guest. +*/ + LOAD_REG_IMMEDIATE(r5, MSR_IDLE) + li r6, MSR_RI + andcr6, r9, r6 + LOAD_REG_ADDR(r7, power7_enter_nap_mode) + mtmsrd r6, 1 /* clear RI before setting SRR0/1 */ + mtspr SPRN_SRR0, r7 + mtspr SPRN_SRR1, r5 + rfid + + .globl power7_enter_nap_mode +power7_enter_nap_mode: #ifdef CONFIG_KVM_BOOK3S_HV_POSSIBLE /* Tell KVM we're napping */ li r4,KVM_HWTHREAD_IN_NAP -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
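The MSR_IDLE value the patch adds keeps only SF, HV and ME set, so the rfid lands in real mode (IR/DR clear) with RI clear, as the architecture requires for nap/sleep/rvwinkle. A C sketch checking that composition, using the MSR bit positions as I recall them from reg.h — the hex constant is my own arithmetic, not quoted from the patch:

```c
#include <assert.h>
#include <stdint.h>

/* MSR bit positions (per arch/powerpc/include/asm/reg.h) */
#define MSR_SF (1ULL << 63) /* 64-bit mode */
#define MSR_HV (1ULL << 60) /* hypervisor state */
#define MSR_ME (1ULL << 12) /* machine check enable */
#define MSR_IR (1ULL << 5)  /* instruction relocation (MMU on for fetch) */
#define MSR_DR (1ULL << 4)  /* data relocation (MMU on for data) */
#define MSR_RI (1ULL << 1)  /* recoverable interrupt */

/* Big-endian MSR_IDLE from the patch: only SF, HV and ME survive,
 * so both relocation bits (the MMU) and RI are off after the rfid. */
#define MSR_IDLE (MSR_ME | MSR_SF | MSR_HV)
```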
[PATCH 0/4] powernv: cpuidle: Redesign idle states management
Deep idle states like sleep and winkle are per-core idle states. A core enters these states only when all the threads enter either the particular idle state or a deeper one. There are tasks like the fastsleep hardware bug workaround and hypervisor core state save which have to be done only by the last thread of the core entering a deep idle state, and similarly tasks like timebase resync and hypervisor core register restore that have to be done only by the first thread waking up from these states. The current idle state management does not have a way to distinguish the first/last thread of the core waking/entering idle states. Tasks like timebase resync are done for all the threads. This is not only suboptimal, but can cause functionality issues when subcores are involved. Winkle is a deeper idle state than fastsleep. In this state the power supply to the chiplet, i.e. core, private L2 and private L3, is turned off. This results in a total hypervisor state loss. This patch set adds support for winkle and provides a way to track the idle states of the threads of the core and use it for idle state management of the sleep and winkle idle states. TODO: - Handle the case where a thread enters nap and wakes up with supervisor/hypervisor state loss. This can only happen due to a bug in the hardware or the kernel. One way to handle this would be to restore the state, switch to the kernel process context and trigger a panic or a warning. Cc: Benjamin Herrenschmidt Cc: Paul Mackerras Cc: Michael Ellerman Cc: Rafael J. Wysocki Cc: linux...@vger.kernel.org Cc: linuxppc-dev@lists.ozlabs.org Cc: Vaidyanathan Srinivasan Cc: Preeti U Murthy Paul Mackerras (1): powerpc: powernv: Switch off MMU before entering nap/sleep/rvwinkle mode Preeti U. Murthy (1): powerpc/powernv: Enable Offline CPUs to enter deep idle states Shreyas B. 
Prabhu (2): powernv: cpuidle: Redesign idle states management powernv: powerpc: Add winkle support for offline cpus arch/powerpc/include/asm/cpuidle.h | 14 ++ arch/powerpc/include/asm/opal.h| 13 + arch/powerpc/include/asm/paca.h| 6 + arch/powerpc/include/asm/ppc-opcode.h | 2 + arch/powerpc/include/asm/processor.h | 1 + arch/powerpc/include/asm/reg.h | 2 + arch/powerpc/kernel/asm-offsets.c | 6 + arch/powerpc/kernel/cpu_setup_power.S | 4 + arch/powerpc/kernel/exceptions-64s.S | 30 ++- arch/powerpc/kernel/idle_power7.S | 326 + arch/powerpc/platforms/powernv/opal-wrappers.S | 39 +++ arch/powerpc/platforms/powernv/powernv.h | 2 + arch/powerpc/platforms/powernv/setup.c | 170 + arch/powerpc/platforms/powernv/smp.c | 10 +- arch/powerpc/platforms/powernv/subcore.c | 35 +++ arch/powerpc/platforms/powernv/subcore.h | 1 + drivers/cpuidle/cpuidle-powernv.c | 10 +- 17 files changed, 611 insertions(+), 60 deletions(-) create mode 100644 arch/powerpc/include/asm/cpuidle.h -- 1.9.3 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
Re: [PATCH] powerpc: use device_online/offline() instead of cpu_up/down()
On 10/31/2014 02:41 PM, Dan Streetman wrote: > In powerpc pseries platform dlpar operations, Use device_online() and > device_offline() instead of cpu_up() and cpu_down(). > > Calling cpu_up/down directly does not update the cpu device offline > field, which is used to online/offline a cpu from sysfs. Calling > device_online/offline instead keeps the sysfs cpu online value correct. > The hotplug lock, which is required to be held when calling > device_online/offline, is already held when dlpar_online/offline_cpu > are called, since they are called only from cpu_probe|release_store. > > This patch fixes errors on PowerVM systems that have cpu(s) added/removed > using dlpar operations; without this patch, the > /sys/devices/system/cpu/cpuN/online nodes do not correctly show the > online state of added/removed cpus. > > Signed-off-by: Dan Streetman > Cc: Nathan Fontenot Acked-by: Nathan Fontenot > --- > > Previous discussion for this: > https://lkml.org/lkml/2014/10/29/839 > > arch/powerpc/platforms/pseries/dlpar.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/arch/powerpc/platforms/pseries/dlpar.c > b/arch/powerpc/platforms/pseries/dlpar.c > index 6ad83bd..c22bb1b 100644 > --- a/arch/powerpc/platforms/pseries/dlpar.c > +++ b/arch/powerpc/platforms/pseries/dlpar.c > @@ -382,7 +382,7 @@ static int dlpar_online_cpu(struct device_node *dn) > BUG_ON(get_cpu_current_state(cpu) > != CPU_STATE_OFFLINE); > cpu_maps_update_done(); > - rc = cpu_up(cpu); > + rc = device_online(get_cpu_device(cpu)); > if (rc) > goto out; > cpu_maps_update_begin(); > @@ -467,7 +467,7 @@ static int dlpar_offline_cpu(struct device_node *dn) > if (get_cpu_current_state(cpu) == CPU_STATE_ONLINE) { > set_preferred_offline_state(cpu, > CPU_STATE_OFFLINE); > cpu_maps_update_done(); > - rc = cpu_down(cpu); > + rc = device_offline(get_cpu_device(cpu)); > if (rc) > goto out; > cpu_maps_update_begin(); > ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org 
https://lists.ozlabs.org/listinfo/linuxppc-dev
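The mismatch this patch fixes can be modeled in a few lines: cpu_up() changes the cpu's actual state but not the sysfs-visible offline field, while device_online() updates both. A toy model — the names are illustrative, not kernel APIs:

```c
#include <assert.h>

/* Two pieces of state per cpu: whether it is actually running, and
 * what /sys/devices/system/cpu/cpuN/online would report. */
struct cpu_dev { int running; int sysfs_online; };

static void model_cpu_up(struct cpu_dev *c)
{
    c->running = 1;          /* cpu state changes...               */
}                            /* ...but sysfs_online stays stale    */

static void model_device_online(struct cpu_dev *c)
{
    model_cpu_up(c);         /* device_online() ends up in cpu_up()... */
    c->sysfs_online = 1;     /* ...and also updates the offline field  */
}
```

Calling the first helper alone leaves the two fields disagreeing, which is exactly the stale /sys/devices/system/cpu/cpuN/online value seen after dlpar add/remove on PowerVM.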
Re: [PATCH] PPC: bpf_jit_comp: add SKF_AD_PKTTYPE instruction
Denis Kirjanov wrote: Any feedback from PPC folks? I have reviewed the patch and it looks fine to me. I have tested it successfully on ppc64le. I could not test it on ppc64. Philippe On 10/26/14, Denis Kirjanov wrote: Cc: Matt Evans Signed-off-by: Denis Kirjanov --- arch/powerpc/include/asm/ppc-opcode.h | 1 + arch/powerpc/net/bpf_jit.h| 7 +++ arch/powerpc/net/bpf_jit_comp.c | 5 + 3 files changed, 13 insertions(+) diff --git a/arch/powerpc/include/asm/ppc-opcode.h b/arch/powerpc/include/asm/ppc-opcode.h index 6f85362..1a52877 100644 --- a/arch/powerpc/include/asm/ppc-opcode.h +++ b/arch/powerpc/include/asm/ppc-opcode.h @@ -204,6 +204,7 @@ #define PPC_INST_ERATSX_DOT 0x7c000127 /* Misc instructions for BPF compiler */ +#define PPC_INST_LBZ 0x88000000 #define PPC_INST_LD 0xe8000000 #define PPC_INST_LHZ 0xa0000000 #define PPC_INST_LHBRX 0x7c00062c diff --git a/arch/powerpc/net/bpf_jit.h b/arch/powerpc/net/bpf_jit.h index 9aee27c..c406aa9 100644 --- a/arch/powerpc/net/bpf_jit.h +++ b/arch/powerpc/net/bpf_jit.h @@ -87,6 +87,9 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh); #define PPC_STD(r, base, i) EMIT(PPC_INST_STD | ___PPC_RS(r) |\ ___PPC_RA(base) | ((i) & 0xfffc)) + +#define PPC_LBZ(r, base, i) EMIT(PPC_INST_LBZ | ___PPC_RT(r) |\ +___PPC_RA(base) | IMM_L(i)) #define PPC_LD(r, base, i) EMIT(PPC_INST_LD | ___PPC_RT(r) | \ ___PPC_RA(base) | IMM_L(i)) #define PPC_LWZ(r, base, i) EMIT(PPC_INST_LWZ | ___PPC_RT(r) |\ @@ -96,6 +99,10 @@ DECLARE_LOAD_FUNC(sk_load_byte_msh); #define PPC_LHBRX(r, base, b) EMIT(PPC_INST_LHBRX | ___PPC_RT(r) | \ ___PPC_RA(base) | ___PPC_RB(b)) /* Convenience helpers for the above with 'far' offsets: */ +#define PPC_LBZ_OFFS(r, base, i) do { if ((i) < 32768) PPC_LBZ(r, base, i); \ + else { PPC_ADDIS(r, base, IMM_HA(i));\ + PPC_LBZ(r, r, IMM_L(i)); } } while(0) + #define PPC_LD_OFFS(r, base, i) do { if ((i) < 32768) PPC_LD(r, base, i); \ else { PPC_ADDIS(r, base, IMM_HA(i));\ PPC_LD(r, r, IMM_L(i)); } } while(0) diff --git a/arch/powerpc/net/bpf_jit_comp.c 
b/arch/powerpc/net/bpf_jit_comp.c index cbae2df..d110e28 100644 --- a/arch/powerpc/net/bpf_jit_comp.c +++ b/arch/powerpc/net/bpf_jit_comp.c @@ -407,6 +407,11 @@ static int bpf_jit_build_body(struct bpf_prog *fp, u32 *image, PPC_LHZ_OFFS(r_A, r_skb, offsetof(struct sk_buff, queue_mapping)); break; + case BPF_ANC | SKF_AD_PKTTYPE: + PPC_LBZ_OFFS(r_A, r_skb, PKT_TYPE_OFFSET()); + PPC_ANDI(r_A, r_A, PKT_TYPE_MAX); + PPC_SRWI(r_A, r_A, 5); + break; case BPF_ANC | SKF_AD_CPU: #ifdef CONFIG_SMP /* -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
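The three JITed instructions for SKF_AD_PKTTYPE amount to a zero-extending byte load, a mask, and a shift. A C sketch of the equivalent computation — the byte offset and bit placement of the pkt_type bitfield in sk_buff are compiler/endian dependent, and 7 << 5 here simply matches the andi-then-srwi-by-5 sequence in the patch:

```c
#include <assert.h>
#include <stdint.h>

/* Mask matching the patch's PPC_ANDI(r_A, r_A, PKT_TYPE_MAX):
 * a 3-bit field assumed to sit in bits 5-7 of its byte. */
#define PKT_TYPE_MAX (7 << 5)

/* C equivalent of the three emitted instructions */
static uint32_t extract_pkt_type(uint8_t skb_byte)
{
    uint32_t v = skb_byte;   /* PPC_LBZ_OFFS: zero-extending byte load */
    v &= PKT_TYPE_MAX;       /* PPC_ANDI: keep only the pkt_type bits */
    return v >> 5;           /* PPC_SRWI: shift down to the range 0..7 */
}
```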
[PATCH 1/2] powerpc/mm: Check for matching hpte without taking hpte lock
With a smaller hash page table config, we end up replacing hash page table slots frequently. In such a config we will often find that the hpte does not match, and we can do that check without holding the hpte lock. We do need to recheck the hpte again after taking the lock. Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/mm/hash_native_64.c | 24 +++- 1 file changed, 15 insertions(+), 9 deletions(-) diff --git a/arch/powerpc/mm/hash_native_64.c b/arch/powerpc/mm/hash_native_64.c index ae4962a06476..83c6bb12be14 100644 --- a/arch/powerpc/mm/hash_native_64.c +++ b/arch/powerpc/mm/hash_native_64.c @@ -294,8 +294,6 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp, DBG_LOW("update(vpn=%016lx, avpnv=%016lx, group=%lx, newpp=%lx)", vpn, want_v & HPTE_V_AVPN, slot, newpp); - native_lock_hpte(hptep); - hpte_v = be64_to_cpu(hptep->v); /* * We need to invalidate the TLB always because hpte_remove doesn't do @@ -308,16 +306,24 @@ static long native_hpte_updatepp(unsigned long slot, unsigned long newpp, DBG_LOW(" -> miss\n"); ret = -1; } else { - DBG_LOW(" -> hit\n"); - /* Update the HPTE */ - hptep->r = cpu_to_be64((be64_to_cpu(hptep->r) & ~(HPTE_R_PP | HPTE_R_N)) | - (newpp & (HPTE_R_PP | HPTE_R_N | HPTE_R_C))); + native_lock_hpte(hptep); + /* recheck with locks held */ + hpte_v = be64_to_cpu(hptep->v); + if (unlikely(!HPTE_V_COMPARE(hpte_v, want_v) || +!(hpte_v & HPTE_V_VALID))) { + ret = -1; + } else { + DBG_LOW(" -> hit\n"); + /* Update the HPTE */ + hptep->r = cpu_to_be64((be64_to_cpu(hptep->r) & + ~(HPTE_R_PP | HPTE_R_N)) | + (newpp & (HPTE_R_PP | HPTE_R_N | +HPTE_R_C))); + } + native_unlock_hpte(hptep); } - native_unlock_hpte(hptep); - /* Ensure it is out of the tlb too. */ tlbie(vpn, bpsize, apsize, ssize, local); - return ret; } -- 2.1.0 ___ Linuxppc-dev mailing list Linuxppc-dev@lists.ozlabs.org https://lists.ozlabs.org/listinfo/linuxppc-dev
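The patch is an instance of the classic check-then-lock-then-recheck pattern: do the cheap, usually-failing match test without the lock, and only take the lock (and re-validate) on a hit. A simplified C sketch, with a pthread mutex standing in for the hpte lock bit — types and names are stand-ins, not the kernel code:

```c
#include <assert.h>
#include <pthread.h>
#include <stdint.h>

static pthread_mutex_t hpte_lock = PTHREAD_MUTEX_INITIALIZER;

/* Simplified shape of the reworked native_hpte_updatepp() */
static int update_if_match(uint64_t *entry, uint64_t want, uint64_t newval)
{
    if (*entry != want)                 /* unlocked fast path: common miss */
        return -1;
    pthread_mutex_lock(&hpte_lock);
    if (*entry != want) {               /* recheck with the lock held */
        pthread_mutex_unlock(&hpte_lock);
        return -1;
    }
    *entry = newval;                    /* hit: modify under the lock */
    pthread_mutex_unlock(&hpte_lock);
    return 0;
}
```

The recheck is what keeps the pattern correct: between the unlocked test and taking the lock, another thread may have replaced the entry, so a hit must be confirmed before the update.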
[PATCH 2/2] powerpc/mm: don't do tlbie for updatepp request with NO HPTE fault
updatepp gets called for a nohpte fault when we find from the linux page table that the translation was hashed before. In that case we are sure that there is no existing translation, hence we can avoid doing tlbie. Performance numbers: We use random_access_bench written by Anton. Kernel with THP disabled and a smaller hash page table size. Without the fix:
86.60% random_access_b [kernel.kallsyms] [k] .native_hpte_updatepp
2.10% random_access_b random_access_bench [.] doit
1.99% random_access_b [kernel.kallsyms] [k] .do_raw_spin_lock
1.85% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
1.26% random_access_b [kernel.kallsyms] [k] .native_flush_hash_range
1.18% random_access_b [kernel.kallsyms] [k] .__delay
0.69% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
0.37% random_access_b [kernel.kallsyms] [k] .clear_user_page
0.34% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
0.32% random_access_b [kernel.kallsyms] [k] fast_exception_return
0.30% random_access_b [kernel.kallsyms] [k] .hash_page_mm
With Fix:
27.54% random_access_b random_access_bench [.] doit
22.90% random_access_b [kernel.kallsyms] [k] .native_hpte_insert
5.76% random_access_b [kernel.kallsyms] [k] .native_hpte_remove
5.20% random_access_b [kernel.kallsyms] [k] fast_exception_return
5.12% random_access_b [kernel.kallsyms] [k] .__hash_page_64K
4.80% random_access_b [kernel.kallsyms] [k] .hash_page_mm
3.31% random_access_b [kernel.kallsyms] [k] data_access_common
1.84% random_access_b [kernel.kallsyms] [k] .trace_hardirqs_on_caller
Signed-off-by: Aneesh Kumar K.V --- arch/powerpc/include/asm/machdep.h| 2 +- arch/powerpc/include/asm/mmu-hash64.h | 22 ++--- arch/powerpc/include/asm/tlbflush.h | 2 +- arch/powerpc/kernel/exceptions-64s.S | 2 ++ arch/powerpc/mm/hash_low_64.S | 15 ++-- arch/powerpc/mm/hash_native_64.c | 15 arch/powerpc/mm/hash_utils_64.c | 40 +++ arch/powerpc/mm/hugepage-hash64.c | 6 ++--- arch/powerpc/mm/hugetlbpage-hash64.c | 6 ++--- arch/powerpc/platforms/cell/beat_htab.c | 4 ++-- arch/powerpc/platforms/cell/spu_base.c| 5 ++-- arch/powerpc/platforms/cell/spufs/fault.c | 2 +- arch/powerpc/platforms/ps3/htab.c | 2 +- arch/powerpc/platforms/pseries/lpar.c | 2 +- drivers/misc/cxl/fault.c | 8 +-- 15 files changed, 82 insertions(+), 51 deletions(-) diff --git a/arch/powerpc/include/asm/machdep.h b/arch/powerpc/include/asm/machdep.h index 307347f8ddbd..7b44bdf0c313 100644 --- a/arch/powerpc/include/asm/machdep.h +++ b/arch/powerpc/include/asm/machdep.h @@ -42,7 +42,7 @@ struct machdep_calls { unsigned long newpp, unsigned long vpn, int bpsize, int apsize, -int ssize, int local); +int ssize, unsigned long flags); void(*hpte_updateboltedpp)(unsigned long newpp, unsigned long ea, int psize, int ssize); diff --git a/arch/powerpc/include/asm/mmu-hash64.h b/arch/powerpc/include/asm/mmu-hash64.h index aeebc94b2bce..4f13c3ed7acf 100644 --- a/arch/powerpc/include/asm/mmu-hash64.h +++ b/arch/powerpc/include/asm/mmu-hash64.h @@ -316,27 +316,33 @@ static inline unsigned long hpt_hash(unsigned long vpn, return hash & 0x7fUL; } +#define HPTE_LOCAL_UPDATE 0x1 
+#define HPTE_NOHPTE_UPDATE 0x2 + extern int __hash_page_4K(unsigned long ea, unsigned long access, unsigned long vsid, pte_t *ptep, unsigned long trap, - unsigned int local, int ssize, int subpage_prot); + unsigned long flags, int ssize, int subpage_prot); extern int __hash_page_64K(unsigned long ea, unsigned long access, unsigned long vsid, pte_t *ptep, unsigned long trap, - unsigned int local, int ssize); + unsigned long flags, int ssize); struct mm_struct; unsigned int hash_page_do_lazy_icache(unsigned int pp, pte_t pte, int trap); -extern int hash_page_mm(struct mm_struct *mm, unsigned long ea, unsigned long access, unsigned long trap); -extern int hash_page(unsigned long ea, unsigned long access, unsigned long trap); +extern int hash_page_mm(struct mm_struct *mm, unsigned l
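The two flag bits introduced above replace the old 'int local' argument. A sketch of how a backend could use HPTE_NOHPTE_UPDATE to skip the tlbie — the helper and its decision logic are illustrative, not the kernel code:

```c
#include <assert.h>

/* Flag bits carried in the widened 'unsigned long flags' argument */
#define HPTE_LOCAL_UPDATE  0x1 /* invalidation may stay CPU-local */
#define HPTE_NOHPTE_UPDATE 0x2 /* fault path saw no prior hashed translation */

/* When the Linux PTE shows the translation was never valid in the
 * hash table for this fault, no stale entry can be cached in the TLB,
 * so the (expensive, broadcast) tlbie can be skipped entirely. */
static int need_tlbie(unsigned long flags)
{
    return !(flags & HPTE_NOHPTE_UPDATE);
}
```

Skipping the tlbie on this path is what moves .native_hpte_updatepp from ~87% of samples to off the profile in the numbers above.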