date:20180313

Re: [PATCH] arm64: dts: qcom: rename qcom,pcie devices to pcie

2018-03-13 Thread Bjorn Andersson

On Tue 13 Mar 05:13 PDT 2018, Arnd Bergmann wrote:

> The node name for a PCIe host bridge must be "pcie" as required by
> the binging. dtc now warns about it:
> 

"binding", I presume...

> arch/arm64/boot/dts/qcom/apq8096-db820c.dtb: Warning (pci_bridge): 
> /soc/agnoc@0/qcom,pcie@61: node name is not "pci" or "pcie"
> arch/arm64/boot/dts/qcom/apq8096-db820c.dtb: Warning (pci_device_bus_num): 
> Failed prerequisite 'pci_bridge'
> arch/arm64/boot/dts/qcom/msm8996-mtp.dtb: Warning (pci_bridge): 
> /soc/agnoc@0/qcom,pcie@61: node name is not "pci" or "pcie"
> arch/arm64/boot/dts/qcom/msm8996-mtp.dtb: Warning (pci_device_bus_num): 
> Failed prerequisite 'pci_bridge'
> 
> This renames the nodes as appropriate.
> 

Reviewed-by: Bjorn Andersson 

Regards,
Bjorn

RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()

2018-03-13 Thread Dexuan Cui

> From: Dexuan Cui
> Sent: Wednesday, March 7, 2018 13:40
> To: Lorenzo Pieralisi 
> Cc: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan
> ; Stephen Hemminger ;
> o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; linux-
> ker...@vger.kernel.org; driverdev-de...@linuxdriverproject.org; Haiyang
> Zhang ; vkuzn...@redhat.com;
> marcelo.ce...@canonical.com; Michael Kelley (EOSG)
> ; sta...@vger.kernel.org; Jack
> Morgenstein 
> Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> 
> > From: Lorenzo Pieralisi 
> > Sent: Wednesday, March 7, 2018 04:35
> > On Tue, Mar 06, 2018 at 06:21:56PM +, Dexuan Cui wrote:
> > > 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> > > Hyper-V VM with SR-IOV. This is because when we reach
> > hv_compose_msi_msg()
> > > by request_irq()  -> request_threaded_irq() -> 
> > > __setup_irq()->irq_startup()
> > >  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> > > disabled in __setup_irq().
> > >
> > > Fix this by polling the channel.
> > >
> > > 2. If the host is ejecting the VF device before we reach
> > > hv_compose_msi_msg(), in a UP VM, we can hang in
> hv_compose_msi_msg()
> > > forever, because at this time the host doesn't respond to the
> > > CREATE_INTERRUPT request. This issue also happens to old kernels like
> > > v4.14, v4.13, etc.
> >
> > If you are fixing a problem you should report what commit you are fixing
> > with a Fixes: tag and add a CC: sta...@vger.kernel.org to the commit log
> > to send it to stable kernels to which it should be applied; mentioning
> > kernel versions in the commit log is useless and should be omitted.
> 
> Hi Lorenzo,
> Thanks for your comments!
> This patch does have a "Cc: sta...@vger.kernel.org" in the sign-off area. :-)
> 
> Here the patch is made to resolve 2 issues:
> #1 is triggered by the x86 global reservation mode (4900be8360) patch.
> 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c
> should be fixed.
> 
> #2 is a longstanding issue since the first day the pci-hyperv driver was
> accepted into the kernel.
> 
> So IMO actually we don't really need to add a Fixes: tag, which is usually
> used to specify a specific commit that introduces a bug that is being fixed.
> 
> > Side note: you should not have sta...@vger.kernel.org in the email
> > addresses CC list you are sending the patches to (you mark patches for
> > stable by adding an appropriate CC tag in the commit log).
> 
> Sorry, I didn't know this, but actually I didn't add sta...@vger.kernel.org
> manually. Instead I used "git send-email" to send this patchset, and it told
> me "The Cc list above has been expanded by additional addresses found
> in the patch commit message."
> 
> I didn't find a way to disable this behavior of "git send-email" by checking
> its manual and googling it. This is strange.
> 
> > Here:
> >
> > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst
> >
> > Last but not least, most of the patches in this series do not justify
> > sending them to stable kernels at all so you should remove the
> > corresponding tag from the patches.
> 
> I hope at least these 2 patches can go into the stable kernels:
> [PATCH v3 3/6] PCI: hv: serialize the present/eject work items
> [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> Especially the second one, which fixes a real hang issue for UP virtual
> machines running v4.15 and newer.
> And,  IMO the patches are small enough (<100 lines) , but definitely
> the maintainers make the final call.
> 
> >
> > Thanks,
> > Lorenzo
> >
> > > Fix this by polling the channel for the PCI_EJECT message and
> > > hpdev->state, and by checking the PCI vendor ID.
> > >
> > > Note: actually the above issues also happen to a SMP VM, if
> > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> > >
> > > Signed-off-by: Dexuan Cui 
> > > Tested-by: Adrian Suhov 
> > > Tested-by: Chris Valean 
> > > Cc: sta...@vger.kernel.org
> > > Cc: Stephen Hemminger 
> > > Cc: K. Y. Srinivasan 
> > > Cc: Vitaly Kuznetsov 
> > > Cc: Jack Morgenstein 
> > > ---
> > >  drivers/pci/host/pci-hyperv.c | 58
> 
> 
> Thanks,
> -- Dexuan

Hi Lorenzo, Bjorn, and all,
Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd 
the patchset.

Should I send a v4 that just removes the "CC: sta...@vger.kernel.org" tag
for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be 
easier if you just remove the tags if

[bug, bisected] pfifo_fast causes packet reordering

2018-03-13 Thread Jakob Unterwurzacher

During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on 
Linux v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of 
packets are delivered out-of-order.


We have tracked the problem down to the driver interface level, and it 
seems that the driver's net_device_ops.ndo_start_xmit() function gets 
the packets handed over in the wrong order.


This behavior was not observed on Linux v4.15 and I have bisected the 
problem down to this patch:



commit c5ad119fb6c09b0297446be05bd66602fa564758
Author: John Fastabend 
Date:   Thu Dec 7 09:58:19 2017 -0800

   net: sched: pfifo_fast use skb_array

   This converts the pfifo_fast qdisc to use the skb_array data structure
   and set the lockless qdisc bit. pfifo_fast is the first qdisc to support
   the lockless bit that can be a child of a qdisc requiring locking. So
   we add logic to clear the lock bit on initialization in these cases when
   the qdisc graft operation occurs.

   This also removes the logic used to pick the next band to dequeue from
   and instead just checks a per priority array for packets from top priority
   to lowest. This might need to be a bit more clever but seems to work
   for now.

   Signed-off-by: John Fastabend 
   Signed-off-by: David S. Miller 


The patch does not revert cleanly, but moving to one commit earlier 
makes the problem go away.


Selecting the "fq" scheduler instead of "pfifo_fast" makes the problem 
go away as well.


Is this an unintended side-effect of the patch or is there something the 
driver has to do to request in-order delivery?


Thanks,
Jakob

Re: [PATCH 18/31] perf vendor events arm64: Add armv8-recommended.json

2018-03-13 Thread Arnaldo Carvalho de Melo

Em Tue, Mar 13, 2018 at 03:26:18PM +0100, Ingo Molnar escreveu:
> * Arnaldo Carvalho de Melo <a...@kernel.org> wrote:
> > From: John Garry <john.ga...@huawei.com>

> > Add JSON for ARMv8 IMPLEMENTATION DEFINED recommended events.

> > The JSON is copied from ARMv8 architecture reference manual, available
> > here:

> > https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf

> > Signed-off-by: Shaokun Zhang <zhangshao...@hisilicon.com>
> > Cc: Alexander Shishkin <alexander.shish...@linux.intel.com>
> > Cc: Andi Kleen <a...@linux.intel.com>
> > Cc: Ganapatrao Kulkarni <ganapatrao.kulka...@cavium.com>
> > Cc: Jiri Olsa <jo...@redhat.com>
> > Cc: Namhyung Kim <namhy...@kernel.org>
> > Cc: Peter Zijlstra <pet...@infradead.org>
> > Cc: Shaokun Zhang <zhangshao...@hisilicon.com>
> > Cc: Will Deacon <will.dea...@arm.com>
> > Cc: William Cohen <wco...@redhat.com>
> > Cc: linux-arm-ker...@lists.infradead.org
> > Cc: linux...@huawei.com
> > Link: 
> > http://lkml.kernel.org/r/1520506716-197429-9-git-send-email-john.ga...@huawei.com
> > Signed-off-by: John Garry <john.ga...@huawei.com>
> > Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com>
 
> That's not a valid SOB chain, author != first-Signed-off-by.

I removed that cset for now, can you please check if the
perf-core-for-mingo-4.17-20180313-2 tag is allright?

- Arnaldo

Re: [PATCH v4 09/24] fpga: dfl-pci: add enumeration for feature devices

2018-03-13 Thread Alan Tull

On Tue, Feb 13, 2018 at 3:24 AM, Wu Hao  wrote:

Hi Hao,

Thanks again for splitting the pci part of the code from enumeration
and everything else.

One thing that may need to be fixed below, so with that fixed, adding my ack.

> The Device Feature List (DFL) is implemented in MMIO, and features
> are linked via the DFLs. This patch enables pcie driver to prepare
> enumeration information (e.g locations of all device feature lists
> in MMIO) and use common APIs provided by the Device Feature List
> framework to enumerate each feature device linked.
>
> Signed-off-by: Tim Whisonant 
> Signed-off-by: Enno Luebbers 
> Signed-off-by: Shiva Rao 
> Signed-off-by: Christopher Rauer 
> Signed-off-by: Zhang Yi 
> Signed-off-by: Xiao Guangrong 
> Signed-off-by: Wu Hao 
Acked-by: Alan Tull 

> ---
> v3: split from another patch
> use common functions from DFL framework for enumeration.
> v4: rebase
> ---
>  drivers/fpga/dfl-pci.c | 199 
> -
>  1 file changed, 197 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c
> index d91ea42..8ce8a94 100644
> --- a/drivers/fpga/dfl-pci.c
> +++ b/drivers/fpga/dfl-pci.c
> @@ -22,9 +22,52 @@
>  #include 
>  #include 
>
> +#include "dfl.h"
> +
>  #define DRV_VERSION"0.8"
>  #define DRV_NAME   "dfl-pci"
>
> +struct cci_drvdata {
> +   struct fpga_cdev *cdev; /* container device */
> +   struct list_head regions; /* list of pci bar mapping region */
> +};
> +
> +/* pci bar mapping info */
> +struct cci_region {
> +   int bar;
> +   void __iomem *ioaddr;   /* pointer to mapped bar region */
> +   struct list_head node;
> +};
> +
> +static void __iomem *cci_pci_ioremap_bar(struct pci_dev *pcidev, int bar)
> +{
> +   struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
> +   struct cci_region *region;
> +
> +   list_for_each_entry(region, >regions, node)
> +   if (region->bar == bar) {
> +   dev_dbg(>dev, "BAR %d region exists\n", bar);
> +   return region->ioaddr;
> +   }
> +
> +   region = devm_kzalloc(>dev, sizeof(*region), GFP_KERNEL);
> +   if (!region)
> +   return NULL;
> +
> +   region->bar = bar;
> +   region->ioaddr = pci_ioremap_bar(pcidev, bar);
> +   if (!region->ioaddr) {
> +   dev_err(>dev, "can't ioremap memory from BAR %d.\n",
> +   bar);
> +   devm_kfree(>dev, region);
> +   return NULL;
> +   }
> +
> +   list_add(>node, >regions);
> +
> +   return region->ioaddr;
> +}
> +
>  /* PCI Device ID */
>  #define PCIE_DEVICE_ID_PF_INT_5_X  0xBCBD
>  #define PCIE_DEVICE_ID_PF_INT_6_X  0xBCC0
> @@ -45,6 +88,143 @@ static struct pci_device_id cci_pcie_id_tbl[] = {
>  };
>  MODULE_DEVICE_TABLE(pci, cci_pcie_id_tbl);
>
> +static int cci_init_drvdata(struct pci_dev *pcidev)
> +{
> +   struct cci_drvdata *drvdata;
> +
> +   drvdata = devm_kzalloc(>dev, sizeof(*drvdata), GFP_KERNEL);
> +   if (!drvdata)
> +   return -ENOMEM;
> +
> +   INIT_LIST_HEAD(>regions);
> +
> +   pci_set_drvdata(pcidev, drvdata);
> +
> +   return 0;
> +}
> +
> +static void cci_pci_release_regions(struct pci_dev *pcidev)
> +{
> +   struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
> +   struct cci_region *tmp, *region;
> +
> +   list_for_each_entry_safe(region, tmp, >regions, node) {
> +   list_del(>node);
> +   if (region->ioaddr)
> +   pci_iounmap(pcidev, region->ioaddr);
> +   devm_kfree(>dev, region);
> +   }
> +}
> +
> +static void cci_remove_drvdata(struct pci_dev *pcidev)
> +{
> +   struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
> +
> +   cci_pci_release_regions(pcidev);

Would it make sense to call fpga_enum_info_free here?  I understand
fpga_enum_info_alloc uses devm, but it does a get_device which needs
to be put.

> +   pci_set_drvdata(pcidev, NULL);
> +   devm_kfree(>dev, drvdata);
> +}
> +
> +static void cci_remove_feature_devs(struct pci_dev *pcidev)
> +{
> +   struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
> +
> +   /* remove all children feature devices */
> +   fpga_remove_feature_devs(drvdata->cdev);
> +}
> +
> +/* enumerate feature devices under pci device */
> +static int cci_enumerate_feature_devs(struct pci_dev *pcidev)
> +{
> +   struct cci_drvdata *drvdata = pci_get_drvdata(pcidev);
> +   struct fpga_cdev *cdev;
> +   struct fpga_enum_info *info;
> +   resource_size_t start, len;
> +   void __iomem *base;
> +   int port_num, bar, i, ret = 0;
> +   u32 offset;
> +   u64 v;
> +
> +   /*

Re: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()

2018-03-13 Thread Lorenzo Pieralisi

On Tue, Mar 13, 2018 at 06:23:39PM +, Dexuan Cui wrote:
> > From: Dexuan Cui
> > Sent: Wednesday, March 7, 2018 13:40
> > To: Lorenzo Pieralisi 
> > Cc: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan
> > ; Stephen Hemminger ;
> > o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; linux-
> > ker...@vger.kernel.org; driverdev-de...@linuxdriverproject.org; Haiyang
> > Zhang ; vkuzn...@redhat.com;
> > marcelo.ce...@canonical.com; Michael Kelley (EOSG)
> > ; sta...@vger.kernel.org; Jack
> > Morgenstein 
> > Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in 
> > hv_compose_msi_msg()
> > 
> > > From: Lorenzo Pieralisi 
> > > Sent: Wednesday, March 7, 2018 04:35
> > > On Tue, Mar 06, 2018 at 06:21:56PM +, Dexuan Cui wrote:
> > > > 1. With the patch "x86/vector/msi: Switch to global reservation mode"
> > > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU
> > > > Hyper-V VM with SR-IOV. This is because when we reach
> > > hv_compose_msi_msg()
> > > > by request_irq()  -> request_threaded_irq() -> 
> > > > __setup_irq()->irq_startup()
> > > >  -> __irq_startup() -> irq_domain_activate_irq() -> ... ->
> > > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is
> > > > disabled in __setup_irq().
> > > >
> > > > Fix this by polling the channel.
> > > >
> > > > 2. If the host is ejecting the VF device before we reach
> > > > hv_compose_msi_msg(), in a UP VM, we can hang in
> > hv_compose_msi_msg()
> > > > forever, because at this time the host doesn't respond to the
> > > > CREATE_INTERRUPT request. This issue also happens to old kernels like
> > > > v4.14, v4.13, etc.
> > >
> > > If you are fixing a problem you should report what commit you are fixing
> > > with a Fixes: tag and add a CC: sta...@vger.kernel.org to the commit log
> > > to send it to stable kernels to which it should be applied; mentioning
> > > kernel versions in the commit log is useless and should be omitted.
> > 
> > Hi Lorenzo,
> > Thanks for your comments!
> > This patch does have a "Cc: sta...@vger.kernel.org" in the sign-off area. 
> > :-)
> > 
> > Here the patch is made to resolve 2 issues:
> > #1 is triggered by the x86 global reservation mode (4900be8360) patch.
> > 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c
> > should be fixed.
> > 
> > #2 is a longstanding issue since the first day the pci-hyperv driver was
> > accepted into the kernel.
> > 
> > So IMO actually we don't really need to add a Fixes: tag, which is usually
> > used to specify a specific commit that introduces a bug that is being fixed.
> > 
> > > Side note: you should not have sta...@vger.kernel.org in the email
> > > addresses CC list you are sending the patches to (you mark patches for
> > > stable by adding an appropriate CC tag in the commit log).
> > 
> > Sorry, I didn't know this, but actually I didn't add sta...@vger.kernel.org
> > manually. Instead I used "git send-email" to send this patchset, and it told
> > me "The Cc list above has been expanded by additional addresses found
> > in the patch commit message."
> > 
> > I didn't find a way to disable this behavior of "git send-email" by checking
> > its manual and googling it. This is strange.
> > 
> > > Here:
> > >
> > > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst
> > >
> > > Last but not least, most of the patches in this series do not justify
> > > sending them to stable kernels at all so you should remove the
> > > corresponding tag from the patches.
> > 
> > I hope at least these 2 patches can go into the stable kernels:
> > [PATCH v3 3/6] PCI: hv: serialize the present/eject work items
> > [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> > Especially the second one, which fixes a real hang issue for UP virtual
> > machines running v4.15 and newer.
> > And,  IMO the patches are small enough (<100 lines) , but definitely
> > the maintainers make the final call.
> > 
> > >
> > > Thanks,
> > > Lorenzo
> > >
> > > > Fix this by polling the channel for the PCI_EJECT message and
> > > > hpdev->state, and by checking the PCI vendor ID.
> > > >
> > > > Note: actually the above issues also happen to a SMP VM, if
> > > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true.
> > > >
> > > > Signed-off-by: Dexuan Cui 
> > > > Tested-by: Adrian Suhov 
> > > > Tested-by: Chris Valean 
> > > > Cc: sta...@vger.kernel.org
> > > > Cc: Stephen Hemminger 
> > > > Cc: K. Y. Srinivasan 
> > > > Cc: Vitaly Kuznetsov 
> > > > Cc: Jack Morgenstein 
> > > > ---
> > > >  drivers/pci/host/pci-hyperv.c | 58
> > 
> > 
> > Thanks,
> > -- Dexuan
> 
> Hi Lorenzo,

[PATCH RFC 8/8] powerpc/64: barrier_nospec: Add commandline trigger

2018-03-13 Thread Michal Suchanek

Copypasta from rfi implementation

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/setup_64.c | 8 
 1 file changed, 8 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 4b67b7b877d9..257f0e6be107 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -840,6 +840,14 @@ static int __init handle_no_pti(char *p)
 }
 early_param("nopti", handle_no_pti);
 
+static int __init handle_no_nospec(char *p)
+{
+   pr_info("barrier_nospec: disabled on command line.");
+   no_nospec = true;
+   return 0;
+}
+early_param("no_nospec", handle_no_nospec);
+
 static void do_nothing(void *unused)
 {
/*
-- 
2.13.6

[PATCH RFC 7/8] powerpc/64s: barrier_nospec: Add hcall triggerr

2018-03-13 Thread Michal Suchanek

Copypasta from rfi implementation

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/platforms/pseries/setup.c | 38 ++
 1 file changed, 25 insertions(+), 13 deletions(-)

diff --git a/arch/powerpc/platforms/pseries/setup.c 
b/arch/powerpc/platforms/pseries/setup.c
index 1a527625acf7..b779ddb8e250 100644
--- a/arch/powerpc/platforms/pseries/setup.c
+++ b/arch/powerpc/platforms/pseries/setup.c
@@ -459,38 +459,50 @@ static void __init find_and_init_phbs(void)
of_pci_check_probe_only();
 }
 
-static void pseries_setup_rfi_flush(void)
+static void pseries_setup_rfi_nospec(void)
 {
struct h_cpu_char_result result;
-   enum l1d_flush_type types;
-   bool enable;
+   enum l1d_flush_type flush_types;
+   enum spec_barrier_type barrier_type;
+   bool flush_enable;
+   bool barrier_enable;
long rc;
 
/* Enable by default */
-   enable = true;
+   flush_enable = true;
+   barrier_enable = true;
+   /* no fallback if the firmware does not tell us */
+   barrier_type = SPEC_BARRIER_NONE;
 
rc = plpar_get_cpu_characteristics();
if (rc == H_SUCCESS) {
-   types = L1D_FLUSH_NONE;
+   flush_types = L1D_FLUSH_NONE;
 
if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2)
-   types |= L1D_FLUSH_MTTRIG;
+   flush_types |= L1D_FLUSH_MTTRIG;
if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30)
-   types |= L1D_FLUSH_ORI;
+   flush_types |= L1D_FLUSH_ORI;
+   if (result.character & H_CPU_CHAR_SPEC_BAR_ORI31)
+   barrier_type |= SPEC_BARRIER_ORI;
 
/* Use fallback if nothing set in hcall */
-   if (types == L1D_FLUSH_NONE)
-   types = L1D_FLUSH_FALLBACK;
+   if (flush_types == L1D_FLUSH_NONE)
+   flush_types = L1D_FLUSH_FALLBACK;
 
if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) ||
(!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)))
-   enable = false;
+   flush_enable = false;
+
+   if ((!(result.behaviour & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR)) ||
+   (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY)))
+   barrier_enable = false;
} else {
/* Default to fallback if case hcall is not available */
-   types = L1D_FLUSH_FALLBACK;
+   flush_types = L1D_FLUSH_FALLBACK;
}
 
-   setup_rfi_flush(types, enable);
+   setup_barrier_nospec(barrier_type, barrier_enable);
+   setup_rfi_flush(flush_types, flush_enable);
 }
 
 #ifdef CONFIG_PCI_IOV
@@ -666,7 +678,7 @@ static void __init pSeries_setup_arch(void)
 
fwnmi_init();
 
-   pseries_setup_rfi_flush();
+   pseries_setup_rfi_nospec();
 
/* By default, only probe PCI (can be overridden by rtas_pci) */
pci_add_flags(PCI_PROBE_ONLY);
-- 
2.13.6

[PATCH v4] dmaengine: pl330: flush before wait, and add dev burst support.

2018-03-13 Thread Frank Mori Hess

Do DMAFLUSHP _before_ the first DMAWFP to ensure controller
and peripheral are in agreement about dma request state before first
transfer.  Add support for burst transfers to/from peripherals. In the new
scheme, the controller does as many burst transfers as it can then
transfers the remaining dregs with either single transfers for
peripherals, or with a reduced size burst for memory-to-memory transfers.

Signed-off-by: Frank Mori Hess 
Tested-by: Frank Mori Hess 
---

I tested dma transfers to peripherals with v3 patch and designware serial 
port (drivers/tty/serial/8250/8250_dw.c) and a GPIB interface
(https://github.com/fmhess/fmh_gpib_core).  I tested memory-to-memory
transfers using the dmatest module.

v3 of this patch should be the same as v2 except with checkpatch.pl
warnings and errors cleaned up.

v4 addresses cosmetic complaints about v3, should be functionally unchanged.

 drivers/dma/pl330.c | 209 +++-
 1 file changed, 159 insertions(+), 50 deletions(-)

diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c
index d7327fd5f445..819a578e317f 100644
--- a/drivers/dma/pl330.c
+++ b/drivers/dma/pl330.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "dmaengine.h"
 #define PL330_MAX_CHAN 8
@@ -1094,51 +1095,96 @@ static inline int _ldst_memtomem(unsigned dry_run, u8 
buf[],
return off;
 }
 
-static inline int _ldst_devtomem(struct pl330_dmac *pl330, unsigned dry_run,
-u8 buf[], const struct _xfer_spec *pxs,
-int cyc)
+static u32 _emit_load(unsigned int dry_run, u8 buf[],
+   enum pl330_cond cond, enum dma_transfer_direction direction,
+   u8 peri)
 {
int off = 0;
-   enum pl330_cond cond;
 
-   if (pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP)
-   cond = BURST;
-   else
-   cond = SINGLE;
+   switch (direction) {
+   case DMA_MEM_TO_MEM:
+   /* fall through */
+   case DMA_MEM_TO_DEV:
+   off += _emit_LD(dry_run, [off], cond);
+   break;
 
-   while (cyc--) {
-   off += _emit_WFP(dry_run, [off], cond, pxs->desc->peri);
-   off += _emit_LDP(dry_run, [off], cond, pxs->desc->peri);
-   off += _emit_ST(dry_run, [off], ALWAYS);
+   case DMA_DEV_TO_MEM:
+   if (cond == ALWAYS) {
+   off += _emit_LDP(dry_run, [off], SINGLE,
+   peri);
+   off += _emit_LDP(dry_run, [off], BURST,
+   peri);
+   } else {
+   off += _emit_LDP(dry_run, [off], cond,
+   peri);
+   }
+   break;
 
-   if (!(pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP))
-   off += _emit_FLUSHP(dry_run, [off],
-   pxs->desc->peri);
+   default:
+   /* this code should be unreachable */
+   WARN_ON(1);
+   break;
}
 
return off;
 }
 
-static inline int _ldst_memtodev(struct pl330_dmac *pl330,
+static inline u32 _emit_store(unsigned int dry_run, u8 buf[],
+   enum pl330_cond cond, enum dma_transfer_direction direction,
+   u8 peri)
+{
+   int off = 0;
+
+   switch (direction) {
+   case DMA_MEM_TO_MEM:
+   /* fall through */
+   case DMA_DEV_TO_MEM:
+   off += _emit_ST(dry_run, [off], cond);
+   break;
+
+   case DMA_MEM_TO_DEV:
+   if (cond == ALWAYS) {
+   off += _emit_STP(dry_run, [off], SINGLE,
+   peri);
+   off += _emit_STP(dry_run, [off], BURST,
+   peri);
+   } else {
+   off += _emit_STP(dry_run, [off], cond,
+   peri);
+   }
+   break;
+
+   default:
+   /* this code should be unreachable */
+   WARN_ON(1);
+   break;
+   }
+
+   return off;
+}
+
+static inline int _ldst_peripheral(struct pl330_dmac *pl330,
 unsigned dry_run, u8 buf[],
-const struct _xfer_spec *pxs, int cyc)
+const struct _xfer_spec *pxs, int cyc,
+enum pl330_cond cond)
 {
int off = 0;
-   enum pl330_cond cond;
 
if (pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP)
cond = BURST;
-   else
-   cond = SINGLE;
 
+   /*
+* do FLUSHP at beginning to clear any stale dma requests before the
+* first WFP.
+*/
+   if (!(pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP))
+   off += _emit_FLUSHP(dry_run, [off], pxs->desc->peri);

[PATCH RFC 3/8] powerpc/64: Use barrier_nospec in syscall entry

2018-03-13 Thread Michal Suchanek

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/entry_64.S | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S
index 2cb5109a7ea3..7bfc4cf48af2 100644
--- a/arch/powerpc/kernel/entry_64.S
+++ b/arch/powerpc/kernel/entry_64.S
@@ -36,6 +36,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #ifdef CONFIG_PPC_BOOK3S
 #include 
@@ -159,6 +160,7 @@ system_call:/* label this so stack 
traces look sane */
andi.   r11,r10,_TIF_SYSCALL_DOTRACE
bne .Lsyscall_dotrace   /* does not return */
cmpldi  0,r0,NR_syscalls
+   barrier_nospec
bge-.Lsyscall_enosys
 
 .Lsyscall:
@@ -319,6 +321,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR)
ld  r10,TI_FLAGS(r10)
 
cmpldi  r0,NR_syscalls
+   barrier_nospec
blt+.Lsyscall
 
/* Return code is already in r3 thanks to do_syscall_trace_enter() */
-- 
2.13.6

[PATCH RFC 6/8] powerpc/64: barrier_nospec: Add debugfs trigger

2018-03-13 Thread Michal Suchanek

Copypasta from rfi implementation

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/kernel/setup_64.c | 35 +++
 1 file changed, 35 insertions(+)

diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index d1d9f047161e..4b67b7b877d9 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -955,6 +955,41 @@ static __init int rfi_flush_debugfs_init(void)
return 0;
 }
 device_initcall(rfi_flush_debugfs_init);
+
+static int barrier_nospec_set(void *data, u64 val)
+{
+   switch (val) {
+   case 0:
+   case 1:
+   break;
+   default:
+   return -EINVAL;
+   }
+
+   if (!!val == !!barrier_nospec_enabled)
+   return 0;
+
+   barrier_nospec_enable(!!val);
+
+   return 0;
+}
+
+static int barrier_nospec_get(void *data, u64 *val)
+{
+   *val = barrier_nospec_enabled ? 1 : 0;
+   return 0;
+}
+
+DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec,
+   barrier_nospec_get, barrier_nospec_set, "%llu\n");
+
+static __init int barrier_nospec_debugfs_init(void)
+{
+   debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL,
+   _barrier_nospec);
+   return 0;
+}
+device_initcall(barrier_nospec_debugfs_init);
 #endif
 
 ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, 
char *buf)
-- 
2.13.6

[PATCH RFC 0/8] powerpc barrier_nospec

2018-03-13 Thread Michal Suchanek

Hello,

this is patchset adding barrier_nospec on powerpc. It is based on the
out-of-tree gmb() patch and the existing rfi patches.

I do not have the tests for the Spectre/Meltdown issues available so this is
untested.

Feedback on the general approach as well as actual effectivity is welcome.

Thanks

Michal


Michal Suchanek (8):
  powerpc: Add barrier_nospec
  powerpc: Use barrier_nospec in copy_from_user
  powerpc/64: Use barrier_nospec in syscall entry
  powerpc/64s: Add support for ori barrier_nospec
  powerpc/64: Patch barrier_nospec in modules
  powerpc/64: barrier_nospec: Add debugfs trigger
  powerpc/64s: barrier_nospec: Add hcall triggerr
  powerpc/64: barrier_nospec: Add commandline trigger

 arch/powerpc/include/asm/barrier.h|  9 
 arch/powerpc/include/asm/feature-fixups.h |  9 
 arch/powerpc/include/asm/setup.h  | 11 +
 arch/powerpc/include/asm/uaccess.h| 11 -
 arch/powerpc/kernel/entry_64.S|  3 ++
 arch/powerpc/kernel/module.c  |  6 +++
 arch/powerpc/kernel/setup_64.c| 72 +++
 arch/powerpc/kernel/vmlinux.lds.S |  7 +++
 arch/powerpc/lib/feature-fixups.c | 38 
 arch/powerpc/platforms/pseries/setup.c| 38 ++--
 10 files changed, 190 insertions(+), 14 deletions(-)

-- 
2.13.6

Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

2018-03-13 Thread Julia Lawall



On Tue, 13 Mar 2018, Matthew Wilcox wrote:

> On Tue, Mar 13, 2018 at 06:19:51PM +0100, Julia Lawall wrote:
> > On Thu, 8 Mar 2018, Matthew Wilcox wrote:
> > > On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
> > > > Thanks.  So it's OK to replace kmalloc and kzalloc, even though they
> > > > didn't previously consider vmalloc and even though kmalloc doesn't zero?
> > >
> > > We'll also need to replace the corresponding places where those structs
> > > are freed with kvfree().  Can coccinelle handle that too?
> >
> > Is the use of vmalloc a necessary part of the design?  Or could there be a
> > non vmalloc versions for call sites that are already ok with that?
>
> We can also add kmalloc_struct() along with kmalloc_ab_c that won't fall
> back to vmalloc but just return NULL.

It could be safer than being sure to find all of the relevant kfrees.

julia

[PATCH] spi: bcm-qspi: fIX some error handling paths

2018-03-13 Thread Christophe JAILLET

For some reason, commit c0368e4db4a3 ("spi: bcm-qspi: Fix use after free
in bcm_qspi_probe() in error path") has updated some gotos, but not all of
them.

This looks spurious, so fix it.

Fixes: fa236a7ef240 ("spi: bcm-qspi: Add Broadcom MSPI driver")
Signed-off-by: Christophe JAILLET 
---
I've added the same Fixes: tag as in commit c0368e4db4a3 even if the patch
would not cleanly apply on top of it.
At least commit 4e3b2d236fe0 ("spi: bcm-qspi: Add BSPI spi-nor flash
controller driver") would also be needed.

Also the label names could be improved. But this goes beyond the scope of
this patch.
---
 drivers/spi/spi-bcm-qspi.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/spi/spi-bcm-qspi.c b/drivers/spi/spi-bcm-qspi.c
index a172ab299e80..1596d35498c5 100644
--- a/drivers/spi/spi-bcm-qspi.c
+++ b/drivers/spi/spi-bcm-qspi.c
@@ -1247,7 +1247,7 @@ int bcm_qspi_probe(struct platform_device *pdev,
qspi->base[MSPI]  = devm_ioremap_resource(dev, res);
if (IS_ERR(qspi->base[MSPI])) {
ret = PTR_ERR(qspi->base[MSPI]);
-   goto qspi_probe_err;
+   goto qspi_resource_err;
}
} else {
goto qspi_resource_err;
@@ -1258,7 +1258,7 @@ int bcm_qspi_probe(struct platform_device *pdev,
qspi->base[BSPI]  = devm_ioremap_resource(dev, res);
if (IS_ERR(qspi->base[BSPI])) {
ret = PTR_ERR(qspi->base[BSPI]);
-   goto qspi_probe_err;
+   goto qspi_resource_err;
}
qspi->bspi_mode = true;
} else {
-- 
2.14.1

Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct

2018-03-13 Thread Matthew Wilcox

On Tue, Mar 13, 2018 at 06:19:51PM +0100, Julia Lawall wrote:
> On Thu, 8 Mar 2018, Matthew Wilcox wrote:
> > On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote:
> > > Thanks.  So it's OK to replace kmalloc and kzalloc, even though they
> > > didn't previously consider vmalloc and even though kmalloc doesn't zero?
> >
> > We'll also need to replace the corresponding places where those structs
> > are freed with kvfree().  Can coccinelle handle that too?
> 
> Is the use of vmalloc a necessary part of the design?  Or could there be a
> non vmalloc versions for call sites that are already ok with that?

We can also add kmalloc_struct() along with kmalloc_ab_c that won't fall
back to vmalloc but just return NULL.

[PATCH v3 0/3] Bluetooth: hci_qca: Add serdev support

2018-03-13 Thread Thierry Escande

Hi,

This patchset enables the Qualcomm BT controller QCA6174 node in the
device tree of the db820c board. This allows the bluetooth chipset to
be probed and registered against the hci layer by using the serdev
framework.

This patchset also contains the documentation for the compatible
string "qcom,qca6174-bt" related to this chipset.

v3:
- Address comments for patch #3 (details in patch)

v2:
- Fix author email

Thierry Escande (3):
  arm64: dts: apq8096-db820c: enable bluetooth node
  dt-bindings: net: bluetooth: Add qualcomm-bluetooth
  Bluetooth: hci_qca: Add serdev support

 .../devicetree/bindings/net/qualcomm-bluetooth.txt |  34 +++
 arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi  |  14 +++
 .../boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi|  17 
 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi   |  32 ++
 arch/arm64/boot/dts/qcom/msm8996.dtsi  |  10 ++
 drivers/bluetooth/Kconfig  |   2 +-
 drivers/bluetooth/hci_qca.c| 109 -
 7 files changed, 215 insertions(+), 3 deletions(-)
 create mode 100644 Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt

-- 
2.14.1

[PATCH v3 3/3] Bluetooth: hci_qca: Add serdev support

2018-03-13 Thread Thierry Escande

Add support for Qualcomm serial slave devices. Probe the serial device,
retrieve its maximum speed and register a new hci uart device.

Signed-off-by: Thierry Escande 
---

v3:
- Remove redundant call to gpiod_set_value() after devm_gpiod_get()
- Check returned values for clk_set_rate() and clk_prepare_enable()
- Use clk_disable_unprepare()

 drivers/bluetooth/Kconfig   |   2 +-
 drivers/bluetooth/hci_qca.c | 109 +++-
 2 files changed, 108 insertions(+), 3 deletions(-)

diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig
index 07e55cd8f8c8..c2a6a7ebd14b 100644
--- a/drivers/bluetooth/Kconfig
+++ b/drivers/bluetooth/Kconfig
@@ -195,7 +195,7 @@ config BT_HCIUART_BCM
 
 config BT_HCIUART_QCA
bool "Qualcomm Atheros protocol support"
-   depends on BT_HCIUART
+   depends on BT_HCIUART_SERDEV
select BT_HCIUART_H4
select BT_QCA
help
diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 05ec530b8a3a..aa0886d5324f 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -29,7 +29,11 @@
  */
 
 #include 
+#include 
 #include 
+#include 
+#include 
+#include 
 
 #include 
 #include 
@@ -50,6 +54,9 @@
 #define IBS_TX_IDLE_TIMEOUT_MS 2000
 #define BAUDRATE_SETTLE_TIMEOUT_MS 300
 
+/* divclk4 rate */
+#define DIVCLK4_RATE_32KHZ 32768
+
 /* HCI_IBS transmit side sleep protocol states */
 enum tx_ibs_states {
HCI_IBS_TX_ASLEEP,
@@ -111,6 +118,12 @@ struct qca_data {
u64 votes_off;
 };
 
+struct qca_serdev {
+   struct hci_uart  serdev_hu;
+   struct gpio_desc *bt_en;
+   struct clk   *divclk4;
+};
+
 static void __serial_clock_on(struct tty_struct *tty)
 {
/* TODO: Some chipset requires to enable UART clock on client
@@ -386,6 +399,7 @@ static void hci_ibs_wake_retrans_timeout(struct timer_list 
*t)
 /* Initialize protocol */
 static int qca_open(struct hci_uart *hu)
 {
+   struct qca_serdev *qcadev;
struct qca_data *qca;
 
BT_DBG("hu %p qca_open", hu);
@@ -444,6 +458,13 @@ static int qca_open(struct hci_uart *hu)
timer_setup(>tx_idle_timer, hci_ibs_tx_idle_timeout, 0);
qca->tx_idle_delay = IBS_TX_IDLE_TIMEOUT_MS;
 
+   if (hu->serdev) {
+   serdev_device_open(hu->serdev);
+
+   qcadev = serdev_device_get_drvdata(hu->serdev);
+   gpiod_set_value(qcadev->bt_en, 1);
+   }
+
BT_DBG("HCI_UART_QCA open, tx_idle_delay=%u, wake_retrans=%u",
   qca->tx_idle_delay, qca->wake_retrans);
 
@@ -512,6 +533,7 @@ static int qca_flush(struct hci_uart *hu)
 /* Close protocol */
 static int qca_close(struct hci_uart *hu)
 {
+   struct qca_serdev *qcadev;
struct qca_data *qca = hu->priv;
 
BT_DBG("hu %p qca close", hu);
@@ -525,6 +547,13 @@ static int qca_close(struct hci_uart *hu)
destroy_workqueue(qca->workqueue);
qca->hu = NULL;
 
+   if (hu->serdev) {
+   serdev_device_close(hu->serdev);
+
+   qcadev = serdev_device_get_drvdata(hu->serdev);
+   gpiod_set_value(qcadev->bt_en, 0);
+   }
+
kfree_skb(qca->rx_skb);
 
hu->priv = NULL;
@@ -885,6 +914,14 @@ static int qca_set_baudrate(struct hci_dev *hdev, uint8_t 
baudrate)
return 0;
 }
 
+static inline void host_set_baudrate(struct hci_uart *hu, unsigned int speed)
+{
+   if (hu->serdev)
+   serdev_device_set_baudrate(hu->serdev, speed);
+   else
+   hci_uart_set_baudrate(hu, speed);
+}
+
 static int qca_setup(struct hci_uart *hu)
 {
struct hci_dev *hdev = hu->hdev;
@@ -905,7 +942,7 @@ static int qca_setup(struct hci_uart *hu)
speed = hu->proto->init_speed;
 
if (speed)
-   hci_uart_set_baudrate(hu, speed);
+   host_set_baudrate(hu, speed);
 
/* Setup user speed if needed */
speed = 0;
@@ -924,7 +961,7 @@ static int qca_setup(struct hci_uart *hu)
   ret);
return ret;
}
-   hci_uart_set_baudrate(hu, speed);
+   host_set_baudrate(hu, speed);
}
 
/* Setup patch / NVM configurations */
@@ -958,12 +995,80 @@ static struct hci_uart_proto qca_proto = {
.dequeue= qca_dequeue,
 };
 
+static int qca_serdev_probe(struct serdev_device *serdev)
+{
+   struct qca_serdev *qcadev;
+   int err;
+
+   qcadev = devm_kzalloc(>dev, sizeof(*qcadev), GFP_KERNEL);
+   if (!qcadev)
+   return -ENOMEM;
+
+   qcadev->serdev_hu.serdev = serdev;
+   serdev_device_set_drvdata(serdev, qcadev);
+
+   qcadev->bt_en = devm_gpiod_get(>dev, "bt-disable-n",
+  GPIOD_OUT_LOW);
+   if (IS_ERR(qcadev->bt_en)) {
+   dev_err(>dev, "failed to acquire bt-disable-n gpio\n");
+   return

RE: [PATCH 2/3] dt-bindings: clock: Add bindings for ZynqMP clock driver

2018-03-13 Thread Jolly Shah

Hi Rob,

> -Original Message-
> From: Rob Herring [mailto:r...@kernel.org]
> Sent: Wednesday, March 07, 2018 5:20 PM
> To: Jolly Shah 
> Cc: mturque...@baylibre.com; sb...@codeaurora.org;
> michal.si...@xilinx.com; mark.rutl...@arm.com; linux-...@vger.kernel.org;
> devicet...@vger.kernel.org; Shubhrajyoti Datta ; linux-
> ker...@vger.kernel.org; Rajan Vaja ; linux-arm-
> ker...@lists.infradead.org
> Subject: Re: [PATCH 2/3] dt-bindings: clock: Add bindings for ZynqMP clock
> driver
> 
> On Wed, Mar 7, 2018 at 4:47 PM, Jolly Shah  wrote:
> > Hi Rob,
> >
> >
> >> -Original Message-
> >> From: Rob Herring [mailto:r...@kernel.org]
> >> Sent: Monday, March 05, 2018 5:46 PM
> >> To: Jolly Shah 
> >> Cc: mturque...@baylibre.com; sb...@codeaurora.org;
> >> michal.si...@xilinx.com; mark.rutl...@arm.com;
> >> linux-...@vger.kernel.org; devicet...@vger.kernel.org; Shubhrajyoti
> >> Datta ; linux- ker...@vger.kernel.org; Jolly
> >> Shah ; Rajan Vaja ;
> >> linux-arm-ker...@lists.infradead.org
> >> Subject: Re: [PATCH 2/3] dt-bindings: clock: Add bindings for ZynqMP
> >> clock driver
> >>
> >> On Wed, Feb 28, 2018 at 02:27:40PM -0800, Jolly Shah wrote:
> >> > Add documentation to describe Xilinx ZynqMP clock driver bindings.
> >> >
> >> > Signed-off-by: Jolly Shah 
> >> > Signed-off-by: Rajan Vaja 
> >> > Signed-off-by: Shubhrajyoti Datta 
> >> > ---
> 
> >> > +95 dpll_post_src
> >> > +96 vpll_int
> >> > +97 vpll_pre_src
> >> > +98 vpll_half
> >> > +99 vpll_int_mux
> >> > +100vpll_post_src
> >> > +101can0_mio
> >> > +102can1_mio
> >> > +
> >> > +Example:
> >> > +
> >> > +clk: clk {
> >> > +   #clock-cells = <1>;
> >> > +   compatible = "xlnx,zynqmp-clk";
> >>
> >> How do you control the clocks?
> >
> > Clocks are controlled by a dedicated platform management controller. Above
> clock ids are used to identify clocks between master and PMU.
> 
> What is the interface to the "platform management controller"? Because you
> have no registers, I'm guessing a firmware interface? If so, then just define 
> the
> firmware node as a clock provider.

Yes it is firmware interface. Along with clocks, firmware interface also 
controls power and pinctrl operations as major.
I am not sure if I understand you correctly. Do you suggest to register clocks 
through Firmware driver or just use firmware DT node as clock provider and 
clock driver DT node can reference clocks from FW node to register same?
 
> 
> Rob

[PATCH v3 2/3] dt-bindings: net: bluetooth: Add qualcomm-bluetooth

2018-03-13 Thread Thierry Escande

Add binding document for serial bluetooth chips using Qualcomm protocol.

Signed-off-by: Thierry Escande 
---
 .../devicetree/bindings/net/qualcomm-bluetooth.txt | 34 ++
 1 file changed, 34 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt

diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt 
b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
new file mode 100644
index ..288cf062e906
--- /dev/null
+++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt
@@ -0,0 +1,34 @@
+Qualcomm Bluetooth Chips
+-
+
+This documents the binding structure and common properties for serial
+attached Qualcomm devices.
+
+Serial attached Qualcomm devices shall be a child node of the host UART
+device the slave device is attached to.
+
+Required properties:
+
+ - compatible: should contain one of the following:
+   * "qcom,qca6174-bt"
+
+Optional properties:
+
+ - bt-disable-n-gpios: GPIO specifier, used to enable chip during probe
+
+Example:
+
+serial@757 {
+   pinctrl-names = "default", "sleep";
+   pinctrl-0 = <_uart1_default>;
+   pinctrl-1 = <_uart1_sleep>;
+
+   bluetooth {
+   compatible = "qcom,qca6174-bt";
+
+   bt-disable-n-gpios = <_gpios 19 GPIO_ACTIVE_HIGH>;
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_en_pin_a>, <_pin_a>;
+   };
+};
-- 
2.14.1

Re: [PATCH v4 4/6] ipc: Clamp msgmni and shmmni to the real IPCMNI limit

2018-03-13 Thread Waiman Long

On 03/13/2018 02:17 PM, Eric W. Biederman wrote:
> Waiman Long  writes:
>
>> A user can write arbitrary integer values to msgmni and shmmni sysctl
>> parameters without getting error, but the actual limit is really
>> IPCMNI (32k). This can mislead users as they think they can get a
>> value that is not real.
>>
>> Enforcing the limit by failing the sysctl parameter write, however,
>> can break existing user applications.
> Which applications examples please.
>
> I am seeing this patchset late but it looks like a whole lot of changes
> to avoid a theoretical possibility.
>
> Changes that have an impact on more than just the ipc code you are
> patching.
>
> That makes me feel very uncomfortable with these changes.
>
> Eric

This patchset is constructed to address a customer request that there is
no easy way to find out the actual usable range of a sysctl parameter.
In this particular case, the customer wants to use more than 32k shared
memory segments. They can put in a large value into shmmni, but the
application didn't work properly because shmmni was internally clamped
to 32k without any visible sign that a smaller limit has been imposed.

Out of a concern that there might be customers out there setting those
sysctl parameters outside of the allowable range without knowing it,
just enforcing the right limits may have the undesirable consequence of
breaking their existing setup scripts. I don't have concrete example of
what customers are doing that, but it  won't look good if we wait until
the complaints come in.

The new code won't affect existing code unless the necessary flag is
set. So would you mind elaborating what other impact do you see that
will affect other non-IPC code in an undesirable way?

Cheers,
Longman

[RT PATCH 1/2] Revert "block: blk-mq: Use swait"

2018-03-13 Thread Sebastian Andrzej Siewior

This reverts commit "block: blk-mq: Use swait". The issue remains but
will be fixed differently.

Signed-off-by: Sebastian Andrzej Siewior 
---
 block/blk-core.c   | 6 +++---
 block/blk-mq.c | 8 
 include/linux/blkdev.h | 2 +-
 3 files changed, 8 insertions(+), 8 deletions(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index ff1258ca236c..f9ac6f169c67 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -799,7 +799,7 @@ int blk_queue_enter(struct request_queue *q, bool nowait)
 */
smp_rmb();
 
-   ret = swait_event_interruptible(q->mq_freeze_wq,
+   ret = wait_event_interruptible(q->mq_freeze_wq,
!atomic_read(>mq_freeze_depth) ||
blk_queue_dying(q));
if (blk_queue_dying(q))
@@ -819,7 +819,7 @@ static void blk_queue_usage_counter_release(struct 
percpu_ref *ref)
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   swake_up_all(>mq_freeze_wq);
+   wake_up_all(>mq_freeze_wq);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -895,7 +895,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
q->bypass_depth = 1;
__set_bit(QUEUE_FLAG_BYPASS, >queue_flags);
 
-   init_swait_queue_head(>mq_freeze_wq);
+   init_waitqueue_head(>mq_freeze_wq);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
diff --git a/block/blk-mq.c b/block/blk-mq.c
index bbe43d32f71a..c5bd467dd97b 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -132,14 +132,14 @@ EXPORT_SYMBOL_GPL(blk_freeze_queue_start);
 
 void blk_mq_freeze_queue_wait(struct request_queue *q)
 {
-   swait_event(q->mq_freeze_wq, percpu_ref_is_zero(>q_usage_counter));
+   wait_event(q->mq_freeze_wq, percpu_ref_is_zero(>q_usage_counter));
 }
 EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait);
 
 int blk_mq_freeze_queue_wait_timeout(struct request_queue *q,
 unsigned long timeout)
 {
-   return swait_event_timeout(q->mq_freeze_wq,
+   return wait_event_timeout(q->mq_freeze_wq,
percpu_ref_is_zero(>q_usage_counter),
timeout);
 }
@@ -182,7 +182,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q)
WARN_ON_ONCE(freeze_depth < 0);
if (!freeze_depth) {
percpu_ref_reinit(>q_usage_counter);
-   swake_up_all(>mq_freeze_wq);
+   wake_up_all(>mq_freeze_wq);
}
 }
 EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue);
@@ -263,7 +263,7 @@ void blk_mq_wake_waiters(struct request_queue *q)
 * dying, we need to ensure that processes currently waiting on
 * the queue are notified as well.
 */
-   swake_up_all(>mq_freeze_wq);
+   wake_up_all(>mq_freeze_wq);
 }
 
 bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx)
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 6f278f1fd634..b68752bfb645 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -598,7 +598,7 @@ struct request_queue {
struct throtl_data *td;
 #endif
struct rcu_head rcu_head;
-   struct swait_queue_head mq_freeze_wq;
+   wait_queue_head_t   mq_freeze_wq;
struct percpu_ref   q_usage_counter;
struct list_headall_q_node;
 
-- 
2.16.2

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Logan Gunthorpe




On 12/03/18 09:28 PM, Sinan Kaya wrote:

Maybe, dev parameter should also be struct pci_dev so that you can get rid of
all to_pci_dev() calls in this code including find_parent_pci_dev() function.


No, this was mentioned in v2. find_parent_pci_dev is necessary because 
the calling drivers aren't likely to have a bunch of struct pci_dev's 
for all the devices they are working with lying around. It's a much 
nicer from an API stand point to take struct devs and not worth it just 
to have a PCI API only taking struct pci_devs.


Logan

Re: [GIT PULL] bcm2835-dt-next-2018-03-13

2018-03-13 Thread Florian Fainelli

On 03/13/2018 10:30 AM, Eric Anholt wrote:
> Hi Florian,
> 
> The following changes since commit bcc76c4014dce4e3834dbd5b7f6593cbcfbfebe0:
> 
>   ARM: dts: bcm2835-rpi-zero-w: Enable OTG mode (2018-02-26 15:02:23 -0800)
> 
> are available in the Git repository at:
> 
>   git://github.com/anholt/linux tags/bcm2835-dt-next-2018-03-13
> 
> for you to fetch changes up to ae67cfc4f6abb379e5bbe9a5616f00b23bbecb01:
> 
>   ARM: bcm2835: Add the DPI hardware to the device tree. (2018-03-12 11:44:29 
> -0700)
> 
> 
> This pull request brings in one last patch to add the BCM2835 DPI
> controller to the device tree.
> 
> 
> Eric Anholt (1):
>   ARM: bcm2835: Add the DPI hardware to the device tree.

Merged, thanks Eric!
-- 
Florian

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Logan Gunthorpe




On 13/03/18 11:49 AM, Sinan Kaya wrote:

And there's also the ACS problem which means if you want to use P2P on the root 
ports you'll have to disable ACS on the entire system. (Or preferably, the 
IOMMU groups need to get more sophisticated to allow for dynamic changes).



Do you think you can keep a pointer to the parent bridge instead of querying it
via get_upstream_bridge_port() here so that we can reuse your
pci_p2pdma_disable_acs() in the future.


Keep a pointer where? pci_p2pdma_disable_acs() and 
pci_p2pdma_add_client() are used in completely different cases on 
completely different devices. There really is no overlap and no obvious 
place to store the port pointer (except in the struct pci_dev itself, in 
which case why not just call the function again).



+int pci_p2pdma_disable_acs(struct pci_dev *pdev)
+{
+   struct pci_dev *up;
+   int pos;
+   u16 ctrl;
+
+   up = get_upstream_bridge_port(pdev);
+   if (!up)
+   return 0;

Some future device would only deal with pci_p2pdma_add_client(() for 
whitelisting
instead of changing all of your code.


That's a problem for whoever writes the future code.


We should at least limit the usage of get_upstream_bridge_port() family of 
functions
to probe time only.


Why?


We can tell iommu to do one to one translation by passing iommu.passthrough=1 
to kernel
command line to have identical behavior to your switch case.


Well, someone will need to write code for all available IOMMUs to 
support this. That's a very big project.


Logan

Re: [PATCH v3 3/3] arm64: dts: renesas: Add LVDS decoder to R-Car V3M Eagle

2018-03-13 Thread Simon Horman

On Tue, Mar 13, 2018 at 03:30:25PM +0100, Jacopo Mondi wrote:
> The R-Car V3M Eagle board includes a transparent THC63LVD1024 LVDS
> decoder, connected to the on-chip LVDS encoder output on one side
> and to HDMI encoder ADV7511w on the other one.
> 
> As the decoder does not need any configuration it has been so-far
> omitted from DTS. Now that a driver is available, describe it in DT
> as well.
> 
> Signed-off-by: Jacopo Mondi 

I have marked this as deferred pending acceptance of the new binding.
Please repost or ping me once that has happened.

Re: [3/3] irqchip/gic-v3: Bounds check redistributor accesses

2018-03-13 Thread Nishanth Menon

Marc,

On Tue, Mar 13, 2018 at 9:21 AM, Marc Zyngier  wrote:
> Hi Lokesh,
>
> On 13/03/18 13:38, Lokesh Vutla wrote:
>> Hi All,
>>
>> On Wednesday 11 October 2017 03:11 PM, Punit Agrawal wrote:
>>> The kernel crashes while iterating over a redistributor that is
>>> in-correctly sized by the platform firmware or doesn't contain the last
>>> record.
>>>
>>> Prevent the crash by checking accesses against the size of the region
>>> provided by the firmware. While we are at it, warn the user about
>>> incorrect region size.
>>>
>>> Signed-off-by: Punit Agrawal 
>>> Cc: Marc Zyngier 
>>
>> Sorry to bring up an old thread. Just wanted to check what is the status
>> on this series.
>
> So far, I wasn't inclined to merge it, as it only allowed you to detect
> a broken system, as opposed to help with a working one.

Is'nt that a good reason to have it? Why not help an error in dtb with
a debug helper than an obtuse crash to debug painfully?

>
>> This will also be useful when we try to boot linux + hypervisor with
>> less number of cores than the SoC supports. For example:
>> - SoC has 4 cores and Linux tries to boot with 2 cores.
>> - then a type-2 hypervisor gets installed.
>> - Hypervisor tries to boot a VM with linux on core 1.
>>
>> Now the VM boot will fail while it iterates over all the GICR regions
>> till GICR_TYPER is found. Hypervisor will trap any accesses to GICR
>> regions of any invalid cpus(cpu 2, cpu 3 in this case).
>
> It you're passing the redistributors to a guest, you're doing something
> terribly wrong. You're putting the guest in a position to do a DoS on
> the hypervisor (disabling its timer interrupt, for example). Not the
> greatest move. There is a number of other gotchas with this approach
> (virtual interrupts, distributor virtualization...).
>
>> If the $patch is not the right approach, can you suggest on how to
>> handle the above scenario?
>
> The proper way to handle this is to virtualize the distributor and
> redistributor by trap/emulate. The only thing you can safely pass to a
> guest is the CPU interface, either as system registers or in its MMIO
> form (if you have the GICv2 compatibility interface).
>

Dumb question: Would'nt a trap emulate be real expensive with hyp
context in v8 (all the context save/restore we'd have to do? (in the
context of discussion, GICV2 compat is disabled).


-- 
---
Regards,
Nishanth Menon

Re: [PATCH v2 1/5] iommu/amd - Add debugfs support

2018-03-13 Thread Gary R Hook


On 03/13/2018 12:16 PM, Andy Shevchenko wrote:

On Fri, Mar 9, 2018 at 2:50 AM, Gary R Hook  wrote:


+   default n


Redundant


Roger that.


+#include 
+#include 
+#include 


Keep in order?


What order would that be? These few needed files are listed in the same
order as which they appear in amd_iommu.c. I'm gonna need a preference
spelled out, please (and a rationale, so I may better understand).


+#include "amd_iommu_proto.h"
+#include "amd_iommu_types.h"



+/* DebugFS helpers */
+#defineOBUFP   (obuf + oboff)
+#defineOBUFLEN obuflen
+#defineOBUFSPC (OBUFLEN - oboff)
+#defineOSCNPRINTF(fmt, ...) \
+   scnprintf(OBUFP, OBUFSPC, fmt, ## __VA_ARGS__)


I don't see any advantages of this. Other way around, they will simple
makes things hard to read an understand in place.


I used this technique in the CCP driver code (where it was accepted), in 
an effort to do the opposite of what you claim: make the code more 
readable. Given the 80 column limit, a large number of arguments, and 
very long statements, IMO something needs to give. I don't find the use 
of #defines to be obfuscating.


I'm not trying to argue, but rather simply state the perspective / 
reasoning I used to create a source file I feel is manageable. I have 17 
more iommu patches built upon this strategy, and this seems to be 
advantageous for all of them.






+   for (i = start ; i <= end ; i++)


Missed {}


Wasn't sure about the M.O. given that the body of this loop is a single 
if statement. And I don't see anywhere in

https://www.kernel.org/doc/html/latest/process/coding-style.html
in section 3.1 where curly braces are called for in this situation. May 
I ask for clarification on the style rule, please?





+   if ((amd_iommu_dev_table[i].data[0] ^ 0x3)
+   || amd_iommu_dev_table[i].data[1])
+   n++;
+   return n;
+}



+
+static ssize_t amd_iommu_debugfs_dtecount_read(struct file *filp,
+ char __user *ubuf,
+ size_t count, loff_t *offp)
+{
+   struct amd_iommu *iommu = filp->private_data;



+   unsigned int obuflen = 512;


Sounds like way too much.


I can tune these up.




+   if (!iommu)
+   return 0;


When this possible?


It was intended as a sanity check, but if this happens, much worse has 
already gone wrong. I'll remove.





+   obuf = kmalloc(OBUFLEN, GFP_KERNEL);
+   if (!obuf)
+   return -ENOMEM;
+
+   n = amd_iommu_count_valid_dtes(0, 0x);
+   oboff += OSCNPRINTF("%d\n", n);



+   return ret;
+}




@@ -89,6 +89,7 @@
  #define ACPI_DEVFLAG_ATSDIS 0x1000

  #define LOOP_TIMEOUT   10
+
  /*
   * ACPI table definitions
   *


Doesn't belong to the patch.


I'm sorry, I don't understand. The added blank line doesn't belong to 
the patch?





+#endif
+
+


Extra unneeded line.


Thanks,

Re: [PATCH v2 5/5] iommu/amd - Add a debugfs entry to specify a IOMMU device table entry

2018-03-13 Thread Gary R Hook


On 03/13/2018 12:20 PM, Andy Shevchenko wrote:



+   oboff += OSCNPRINTF("%02x:%02x:%x (%u / %04x)\n",
+   PCI_BUS_NUM(amd_iommu_devid),
+   PCI_SLOT(amd_iommu_devid),
+   PCI_FUNC(amd_iommu_devid),


Perhaps at some point we will have an extension to %p to print PCI BDFs.


But until then  ;-)


+   if (strnchr(obuf, OBUFLEN, ':'))
+   {


Style


D'oh!


+   } else if (obuf[0] == '0' && obuf[1] == 'x') {
+   n = sscanf(obuf, "%x", _iommu_devid);
+   } else {
+   n = sscanf(obuf, "%d", _iommu_devid);
+   }


kstrtoint() ?


I see various mechanisms for this sort of thing, and simply chose one.
Am happy to use whatever is preferred.

Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver

2018-03-13 Thread Jae Hyun Yoo


Hi Stef,

Thanks for sharing your time to test it.

That is expected result in v2. Previously in v1, it used delayed 
creation on core temperature group so it was okay if hwmon driver is 
registered when client CPU is powered down, but in v2, the driver should 
check resolved cores at probing time to prevent the delayed creation on 
core temperature group and indexing gap which breaks hwmon subsystem's 
common rule, so I added peci_detect() into peci_new_device() in PECI 
core driver to check online status of the client CPU when registering a 
new device.


You may need to use dynamic dtoverlay for loading/unloading a PECI hwmon 
driver according to the current client CPU power state. It means a PECI 
hwmon driver can be registered only when the client CPU is powered on. 
This design will be kept in v3 as well.


Thanks,
Jae

On 3/13/2018 2:32 AM, Stef van Os wrote:

Hi Jae,

I tried version 1 and 2 of your PECI patch on our (AST2500 / Xeon E5 v4) 
system. The V1 patchset works as expected (reading back temperature 0 
until PECI is up), but the hwmon driver probe fails with version 2. It 
communicates with the Xeon and assumes during kernel boot of the Aspeed 
that PECI to the Xeon's is already up and running, but our system 
enables the main Xeon supplies from AST2500 userspace.


If I load the hwmon driver as a module to load later on, the driver does 
not call probe like e.g. a I2C driver on the I2C bus does. Am I using V2 
wrongly?


BR,
Stef

On 02/21/2018 05:16 PM, Jae Hyun Yoo wrote:

This commit adds a generic PECI hwmon client driver implementation.

Signed-off-by: Jae Hyun Yoo 
---
  drivers/hwmon/Kconfig  |  10 +
  drivers/hwmon/Makefile |   1 +
  drivers/hwmon/peci-hwmon.c | 928 
+

  3 files changed, 939 insertions(+)
  create mode 100644 drivers/hwmon/peci-hwmon.c

diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig
index ef23553ff5cb..f22e0c31f597 100644
--- a/drivers/hwmon/Kconfig
+++ b/drivers/hwmon/Kconfig
@@ -1246,6 +1246,16 @@ config SENSORS_NCT7904
    This driver can also be built as a module.  If so, the module
    will be called nct7904.
+config SENSORS_PECI_HWMON
+    tristate "PECI hwmon support"
+    depends on PECI
+    help
+  If you say yes here you get support for the generic PECI hwmon
+  driver.
+
+  This driver can also be built as a module.  If so, the module
+  will be called peci-hwmon.
+
  config SENSORS_NSA320
  tristate "ZyXEL NSA320 and compatible fan speed and temperature 
sensors"

  depends on GPIOLIB && OF
diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile
index f814b4ace138..946f54b168e5 100644
--- a/drivers/hwmon/Makefile
+++ b/drivers/hwmon/Makefile
@@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802)    += nct7802.o
  obj-$(CONFIG_SENSORS_NCT7904)    += nct7904.o
  obj-$(CONFIG_SENSORS_NSA320)    += nsa320-hwmon.o
  obj-$(CONFIG_SENSORS_NTC_THERMISTOR)    += ntc_thermistor.o
+obj-$(CONFIG_SENSORS_PECI_HWMON)    += peci-hwmon.o
  obj-$(CONFIG_SENSORS_PC87360)    += pc87360.o
  obj-$(CONFIG_SENSORS_PC87427)    += pc87427.o
  obj-$(CONFIG_SENSORS_PCF8591)    += pcf8591.o
diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c
new file mode 100644
index ..edd27744adcb
--- /dev/null
+++ b/drivers/hwmon/peci-hwmon.c
@@ -0,0 +1,928 @@
+// SPDX-License-Identifier: GPL-2.0
+// Copyright (c) 2018 Intel Corporation
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define DIMM_SLOT_NUMS_MAX    12  /* Max DIMM numbers (channel ranks 
x 2) */
+#define CORE_NUMS_MAX 28  /* Max core numbers (max on SKX 
Platinum) */

+#define TEMP_TYPE_PECI    6   /* Sensor type 6: Intel PECI */
+
+#define CORE_TEMP_ATTRS   5
+#define DIMM_TEMP_ATTRS   2
+#define ATTR_NAME_LEN 24
+
+#define DEFAULT_ATTR_GRP_NUMS 5
+
+#define UPDATE_INTERVAL_MIN   HZ
+#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000)
+
+enum sign {
+    POS,
+    NEG
+};
+
+struct temp_data {
+    bool valid;
+    s32  value;
+    unsigned long last_updated;
+};
+
+struct temp_group {
+    struct temp_data tjmax;
+    struct temp_data tcontrol;
+    struct temp_data tthrottle;
+    struct temp_data dts_margin;
+    struct temp_data die;
+    struct temp_data core[CORE_NUMS_MAX];
+    struct temp_data dimm[DIMM_SLOT_NUMS_MAX];
+};
+
+struct core_temp_group {
+    struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS];
+    char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN];
+    struct attribute *attrs[CORE_TEMP_ATTRS + 1];
+    struct attribute_group attr_group;
+};
+
+struct dimm_temp_group {
+    struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS];
+    char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN];
+    struct attribute *attrs[DIMM_TEMP_ATTRS + 1];
+    struct attribute_group attr_group;
+};
+
+struct peci_hwmon {
+    struct peci_client *client;
+    struct device *dev;
+    struct device

Re: [PATCH v4 4/6] ipc: Clamp msgmni and shmmni to the real IPCMNI limit

2018-03-13 Thread Eric W. Biederman

Waiman Long  writes:

> A user can write arbitrary integer values to msgmni and shmmni sysctl
> parameters without getting error, but the actual limit is really
> IPCMNI (32k). This can mislead users as they think they can get a
> value that is not real.
>
> Enforcing the limit by failing the sysctl parameter write, however,
> can break existing user applications.

Which applications examples please.

I am seeing this patchset late but it looks like a whole lot of changes
to avoid a theoretical possibility.

Changes that have an impact on more than just the ipc code you are
patching.

That makes me feel very uncomfortable with these changes.

Eric


> Instead, the range clamping flag
> is set to enforce the limit without failing existing user code. Users
> can easily figure out if the sysctl parameter value is out of range
> by either reading back the parameter value or checking the kernel
> ring buffer for warning.
>
> Signed-off-by: Waiman Long 
> ---
>  ipc/ipc_sysctl.c | 9 +++--
>  1 file changed, 7 insertions(+), 2 deletions(-)
>
> diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c
> index 8ad93c2..1955dd4 100644
> --- a/ipc/ipc_sysctl.c
> +++ b/ipc/ipc_sysctl.c
> @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, 
> int write,
>  static int zero;
>  static int one = 1;
>  static int int_max = INT_MAX;
> +static int ipc_mni = IPCMNI;
>  
>  static struct ctl_table ipc_kern_table[] = {
>   {
> @@ -120,7 +121,10 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, 
> int write,
>   .data   = _ipc_ns.shm_ctlmni,
>   .maxlen = sizeof(init_ipc_ns.shm_ctlmni),
>   .mode   = 0644,
> - .proc_handler   = proc_ipc_dointvec,
> + .proc_handler   = proc_ipc_dointvec_minmax,
> + .extra1 = ,
> + .extra2 = _mni,
> + .flags  = CTL_FLAGS_CLAMP_RANGE,
>   },
>   {
>   .procname   = "shm_rmid_forced",
> @@ -147,7 +151,8 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, 
> int write,
>   .mode   = 0644,
>   .proc_handler   = proc_ipc_dointvec_minmax,
>   .extra1 = ,
> - .extra2 = _max,
> + .extra2 = _mni,
> + .flags  = CTL_FLAGS_CLAMP_RANGE,
>   },
>   {
>   .procname   = "auto_msgmni",

Re: [RFC PATCH linux-next] evm: evm_hmac can be static

2018-03-13 Thread Mimi Zohar

On Wed, 2018-03-14 at 01:42 +0800, kbuild test robot wrote:
> Fixes: c49fc264be98 ("evm: Move evm_hmac and evm_hash from evm_main.c to 
> evm_crypto.c")
> Signed-off-by: Fengguang Wu 

Thanks!

> ---
>  evm_crypto.c |4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/security/integrity/evm/evm_crypto.c 
> b/security/integrity/evm/evm_crypto.c
> index fdde9cb..a46fba3 100644
> --- a/security/integrity/evm/evm_crypto.c
> +++ b/security/integrity/evm/evm_crypto.c
> @@ -37,8 +37,8 @@ static DEFINE_MUTEX(mutex);
> 
>  static unsigned long evm_set_key_flags;
> 
> -char * const evm_hmac = "hmac(sha1)";
> -char * const evm_hash = "sha1";
> +static char * const evm_hmac = "hmac(sha1)";
> +static char * const evm_hash = "sha1";
> 
>  /**
>   * evm_set_key() - set EVM HMAC key from the kernel

Re: [PATCH 3/3] Bluetooth: hci_qca: Add serdev support

2018-03-13 Thread Loic Poulain

On 13 March 2018 at 17:03, Thierry Escande  wrote:
> From: Thierry Escande 
>
> Add support for Qualcomm serial slave devices. Probe the serial device,
> retrieve its maximum speed and register a new hci uart device.
>
> Signed-off-by: Thierry Escande 
> ---
>  drivers/bluetooth/Kconfig   |   2 +-
>  drivers/bluetooth/hci_qca.c | 102 
> +++-
>  2 files changed, 101 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig
> index 07e55cd8f8c8..c2a6a7ebd14b 100644
> --- a/drivers/bluetooth/Kconfig
> +++ b/drivers/bluetooth/Kconfig
> @@ -195,7 +195,7 @@ config BT_HCIUART_BCM
>
>  config BT_HCIUART_QCA
> bool "Qualcomm Atheros protocol support"
> -   depends on BT_HCIUART
> +   depends on BT_HCIUART_SERDEV

Driver depends on both BT_HCIUART and BT_HCIUART_SERDEV.

> @@ -386,6 +399,7 @@ static void hci_ibs_wake_retrans_timeout(struct 
> timer_list *t)
>  /* Initialize protocol */
>  static int qca_open(struct hci_uart *hu)
>  {
> +   struct qca_serdev *qcadev;
> struct qca_data *qca;
>
> BT_DBG("hu %p qca_open", hu);
> @@ -444,6 +458,13 @@ static int qca_open(struct hci_uart *hu)
> timer_setup(>tx_idle_timer, hci_ibs_tx_idle_timeout, 0);
> qca->tx_idle_delay = IBS_TX_IDLE_TIMEOUT_MS;
>
> +   if (hu->serdev) {
> +   serdev_device_open(hu->serdev);
> +
> +   qcadev = serdev_device_get_drvdata(hu->serdev);
> +   gpiod_set_value(qcadev->bt_en, 1);

I suggest to replace gpiod_set_value calls with the _cansleep version.
You always set gpio value from sleepable context and this should avoid
any further issue with different gpio controller.

Regards,
Loic

Re: [PATCH v2 1/3] KVM: X86: Provides userspace with a capability to not intercept MWAIT

2018-03-13 Thread Jim Mattson

Is there a need for a new API for yielding MONITOR/MWAIT to the guest?
Why not just tie this to the guest CPUID.01H:ECX[MWAIT] being set?

On Mon, Mar 12, 2018 at 4:53 AM, Wanpeng Li  wrote:
> From: Wanpeng Li 
>
> Allowing a guest to execute MWAIT without interception enables a guest
> to put a (physical) CPU into a power saving state, where it takes
> longer to return from than what may be desired by the host.
>
> Don't give a guest that power over a host by default. (Especially,
> since nothing prevents a guest from using MWAIT even when it is not
> advertised via CPUID.)
>
> Cc: Paolo Bonzini 
> Cc: Radim Krčmář 
> Cc: Jan H. Schönherr 
> Signed-off-by: Wanpeng Li 
> ---
>  Documentation/virtual/kvm/api.txt | 23 ++-
>  arch/x86/include/asm/kvm_host.h   |  2 ++
>  arch/x86/kvm/svm.c|  2 +-
>  arch/x86/kvm/vmx.c|  9 +
>  arch/x86/kvm/x86.c| 24 
>  arch/x86/kvm/x86.h| 10 +-
>  include/uapi/linux/kvm.h  |  2 +-
>  tools/include/uapi/linux/kvm.h|  2 +-
>  8 files changed, 49 insertions(+), 25 deletions(-)
>
> diff --git a/Documentation/virtual/kvm/api.txt 
> b/Documentation/virtual/kvm/api.txt
> index 98de506..76e5a15 100644
> --- a/Documentation/virtual/kvm/api.txt
> +++ b/Documentation/virtual/kvm/api.txt
> @@ -4358,6 +4358,20 @@ enables QEMU to build error log and branch to guest 
> kernel registered
>  machine check handling routine. Without this capability KVM will
>  branch to guests' 0x200 interrupt vector.
>
> +7.13 KVM_CAP_X86_DISABLE_EXITS
> +
> +Architectures: x86
> +Parameters: args[0] defines which exits are disabled
> +Returns: 0 on success, -EINVAL when args[0] contains invalid exits
> +
> +Valid exits in args[0] are
> +
> +#define KVM_X86_DISABLE_EXITS_MWAIT(1 << 0)
> +
> +Enabling this capability on a VM provides userspace with a way to no
> +longer intercepts some instructions for improved latency in some
> +workloads.
> +
>  8. Other capabilities.
>  --
>
> @@ -4470,15 +4484,6 @@ reserved.
>  Both registers and addresses are 64-bits wide.
>  It will be possible to run 64-bit or 32-bit guest code.
>
> -8.8 KVM_CAP_X86_GUEST_MWAIT
> -
> -Architectures: x86
> -
> -This capability indicates that guest using memory monotoring instructions
> -(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit.  As such 
> time
> -spent while virtual CPU is halted in this way will then be accounted for as
> -guest running time on the host (as opposed to e.g. HLT).
> -
>  8.9 KVM_CAP_ARM_USER_IRQ
>
>  Architectures: arm, arm64
> diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
> index 0395c35..e107171 100644
> --- a/arch/x86/include/asm/kvm_host.h
> +++ b/arch/x86/include/asm/kvm_host.h
> @@ -811,6 +811,8 @@ struct kvm_arch {
>
> gpa_t wall_clock;
>
> +   bool mwait_in_guest;
> +
> bool ept_identity_pagetable_done;
> gpa_t ept_identity_map_addr;
>
> diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
> index be9c839..321b3fd 100644
> --- a/arch/x86/kvm/svm.c
> +++ b/arch/x86/kvm/svm.c
> @@ -1390,7 +1390,7 @@ static void init_vmcb(struct vcpu_svm *svm)
> set_intercept(svm, INTERCEPT_XSETBV);
> set_intercept(svm, INTERCEPT_RSM);
>
> -   if (!kvm_mwait_in_guest()) {
> +   if (!kvm_mwait_in_guest(svm->vcpu.kvm)) {
> set_intercept(svm, INTERCEPT_MONITOR);
> set_intercept(svm, INTERCEPT_MWAIT);
> }
> diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
> index 6cefd7b..2302ae2 100644
> --- a/arch/x86/kvm/vmx.c
> +++ b/arch/x86/kvm/vmx.c
> @@ -3733,13 +3733,11 @@ static __init int setup_vmcs_config(struct 
> vmcs_config *vmcs_conf)
>   CPU_BASED_UNCOND_IO_EXITING |
>   CPU_BASED_MOV_DR_EXITING |
>   CPU_BASED_USE_TSC_OFFSETING |
> + CPU_BASED_MWAIT_EXITING |
> + CPU_BASED_MONITOR_EXITING |
>   CPU_BASED_INVLPG_EXITING |
>   CPU_BASED_RDPMC_EXITING;
>
> -   if (!kvm_mwait_in_guest())
> -   min |= CPU_BASED_MWAIT_EXITING |
> -   CPU_BASED_MONITOR_EXITING;
> -
> opt = CPU_BASED_TPR_SHADOW |
>   CPU_BASED_USE_MSR_BITMAPS |
>   CPU_BASED_ACTIVATE_SECONDARY_CONTROLS;
> @@ -5531,6 +5529,9 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx)
> exec_control |= CPU_BASED_CR3_STORE_EXITING |
> CPU_BASED_CR3_LOAD_EXITING  |
> CPU_BASED_INVLPG_EXITING;
> +   if (kvm_mwait_in_guest(vmx->vcpu.kvm))
> +   exec_control &= ~(CPU_BASED_MWAIT_EXITING |
> +   CPU_BASED_MONITOR_EXITING);
> return exec_control;
>  }
>
> diff --git

Re: [PATCH v4 2/2] ARM: ftrace: Add MODULE_PLTS support

2018-03-13 Thread Alexander Sverdlin

Hi!

On 13/03/18 18:51, Ard Biesheuvel wrote:
>>> if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
>>> module_plt_alloc_fixed();
>> Do you consider this a legal C code if without module-plts.o the function 
>> would not exist at all?
>> That's too much relying on optimizer I think...
>>
> Yes, we rely on that in many different places in the kernel.

https://www.kernel.org/doc/Documentation/process/coding-style.rst:
"However, this approach still allows the C compiler to see the code
inside the block, and check it for correctness (syntax, types, symbol
references, etc).  Thus, you still have to use an #ifdef if the code inside the
block references symbols that will not exist if the condition is not met."

But we can of course ignore it.

-- 
Best regards,
Alexander Sverdlin.

[v6 1/2] mm: disable interrupts while initializing deferred pages

2018-03-13 Thread Pavel Tatashin

Vlastimil Babka reported about a window issue during which when deferred
pages are initialized, and the current version of on-demand initialization
is finished, allocations may fail.  While this is highly unlikely scenario,
since this kind of allocation request must be large, and must come from
interrupt handler, we still want to cover it.

We solve this by initializing deferred pages with interrupts disabled, and
holding node_size_lock spin lock while pages in the node are being
initialized. The on-demand deferred page initialization that comes later
will use the same lock, and thus synchronize with deferred_init_memmap().

It is unlikely for threads that initialize deferred pages to be
interrupted.  They run soon after smp_init(), but before modules are
initialized, and long before user space programs. This is why there is no
adverse effect of having these threads running with interrupts disabled.

Signed-off-by: Pavel Tatashin 
---
 include/linux/memory_hotplug.h | 53 ++
 include/linux/mmzone.h |  5 ++--
 mm/page_alloc.c| 19 ---
 3 files changed, 42 insertions(+), 35 deletions(-)

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index aba5f86eb038..2b0265265c28 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -51,24 +51,6 @@ enum {
MMOP_ONLINE_MOVABLE,
 };
 
-/*
- * pgdat resizing functions
- */
-static inline
-void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
-{
-   spin_lock_irqsave(>node_size_lock, *flags);
-}
-static inline
-void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
-{
-   spin_unlock_irqrestore(>node_size_lock, *flags);
-}
-static inline
-void pgdat_resize_init(struct pglist_data *pgdat)
-{
-   spin_lock_init(>node_size_lock);
-}
 /*
  * Zone resizing functions
  *
@@ -246,13 +228,6 @@ extern void clear_zone_contiguous(struct zone *zone);
___page;\
  })
 
-/*
- * Stub functions for when hotplug is off
- */
-static inline void pgdat_resize_lock(struct pglist_data *p, unsigned long *f) 
{}
-static inline void pgdat_resize_unlock(struct pglist_data *p, unsigned long 
*f) {}
-static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
-
 static inline unsigned zone_span_seqbegin(struct zone *zone)
 {
return 0;
@@ -293,6 +268,34 @@ static inline bool movable_node_is_enabled(void)
 }
 #endif /* ! CONFIG_MEMORY_HOTPLUG */
 
+#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
+/*
+ * pgdat resizing functions
+ */
+static inline
+void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags)
+{
+   spin_lock_irqsave(>node_size_lock, *flags);
+}
+static inline
+void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags)
+{
+   spin_unlock_irqrestore(>node_size_lock, *flags);
+}
+static inline
+void pgdat_resize_init(struct pglist_data *pgdat)
+{
+   spin_lock_init(>node_size_lock);
+}
+#else /* !(CONFIG_MEMORY_HOTPLUG || CONFIG_DEFERRED_STRUCT_PAGE_INIT) */
+/*
+ * Stub functions for when hotplug is off
+ */
+static inline void pgdat_resize_lock(struct pglist_data *p, unsigned long *f) 
{}
+static inline void pgdat_resize_unlock(struct pglist_data *p, unsigned long 
*f) {}
+static inline void pgdat_resize_init(struct pglist_data *pgdat) {}
+#endif /* !(CONFIG_MEMORY_HOTPLUG || CONFIG_DEFERRED_STRUCT_PAGE_INIT) */
+
 #ifdef CONFIG_MEMORY_HOTREMOVE
 
 extern bool is_mem_section_removable(unsigned long pfn, unsigned long 
nr_pages);
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 7522a6987595..d14168da66a7 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -633,14 +633,15 @@ typedef struct pglist_data {
 #ifndef CONFIG_NO_BOOTMEM
struct bootmem_data *bdata;
 #endif
-#ifdef CONFIG_MEMORY_HOTPLUG
+#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT)
/*
 * Must be held any time you expect node_start_pfn, node_present_pages
 * or node_spanned_pages stay constant.  Holding this will also
 * guarantee that any pfn_valid() stays that way.
 *
 * pgdat_resize_lock() and pgdat_resize_unlock() are provided to
-* manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG.
+* manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG
+* or CONFIG_DEFERRED_STRUCT_PAGE_INIT.
 *
 * Nests above zone->lock and zone->span_seqlock
 */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 3d974cb2a1a1..cada509e2176 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1506,7 +1506,7 @@ static void __init deferred_free_pages(int nid, int zid, 
unsigned long pfn,
} else if (!(pfn & nr_pgmask)) {
deferred_free_range(pfn - nr_free, nr_free);

[v6 0/2] initialize pages on demand during boot

2018-03-13 Thread Pavel Tatashin

Change log:
v5 - v6
- Fixed issues found by Andrew Morton: replaced cond_resched() with
  touch_nmi_watchdog(), instead of simply deleting it.
- Removed useless pgdata_resize_lock_irq(), as regular
  pgdata_resize_lock() does exactly what is needed.
- Included fixes to comments by Andrew from
  mm-initialize-pages-on-demand-during-boot-v5-fix.patch.

v4 - v5
- Fix issue reported by Vlasimil Babka:
  > I've noticed that this function first disables the
  > on-demand initialization, and then runs the kthreads.
  > Doesn't that leave a window where allocations can fail? The
  > chances are probably small, but I think it would be better
  > to avoid it completely, rare failures suck.
  >
  > Fixing that probably means rethinking the whole
  > synchronization more dramatically though :/
- Introduced a new patch that uses node resize lock to synchronize
  on-demand deferred page initialization, and regular deferred page
  initialization.

v3 - v4
- Fix !CONFIG_NUMA issue.

v2 - v3
Andrew Morton's comments:
- Moved read of pgdat->first_deferred_pfn into
  deferred_zone_grow_lock, thus got rid of READ_ONCE()/WRITE_ONCE()
- Replaced spin_lock() with spin_lock_irqsave() in
  deferred_grow_zone
- Updated comments for deferred_zone_grow_lock
- Updated comment before deferred_grow_zone() explaining return
  value, and also noinline specifier.
- Fixed comment before _deferred_grow_zone().

v1 - v2
Added Tested-by: Masayoshi Mizuma

This change helps for three reasons:

1. Insufficient amount of reserved memory due to arguments provided by
user. User may request some buffers, increased hash tables sizes etc.
Currently, machine panics during boot if it can't allocate memory due
to insufficient amount of reserved memory. With this change, it will
be able to grow zone before deferred pages are initialized.

One observed example is described in the linked discussion [1] Mel
Gorman writes:

"
Yasuaki Ishimatsu reported a premature OOM when trace_buf_size=100m was
specified on a machine with many CPUs. The kernel tried to allocate 38.4GB
but only 16GB was available due to deferred memory initialisation.
"

The allocations in the above scenario happen per-cpu in smp_init(),
and before deferred pages are initialized. So, there is no way to
predict how much memory we should put aside to boot successfully with
deferred page initialization feature compiled in.

2. The second reason is future proof. The kernel memory requirements
may change, and we do not want to constantly update
reset_deferred_meminit() to satisfy the new requirements. In addition,
this function is currently in common code, but potentially would need
to be split into arch specific variants, as more arches will start
taking advantage of deferred page initialization feature.

3. On demand initialization of reserved pages guarantees that we will
initialize only as many pages early in boot using only one thread as
needed, the rest are going to be efficiently initialized in parallel.

[1] https://www.spinics.net/lists/linux-mm/msg139087.html

Pavel Tatashin (2):
  mm: disable interrupts while initializing deferred pages
  mm: initialize pages on demand during boot

 include/linux/memblock.h   |  10 --
 include/linux/memory_hotplug.h |  53 ++-
 include/linux/mmzone.h |   5 +-
 mm/memblock.c  |  23 -
 mm/page_alloc.c| 202 +++--
 5 files changed, 186 insertions(+), 107 deletions(-)

-- 
2.16.2

[v6 2/2] mm: initialize pages on demand during boot

2018-03-13 Thread Pavel Tatashin

Deferred page initialization allows the boot cpu to initialize a small
subset of the system's pages early in boot, with other cpus doing the rest
later on.

It is, however, problematic to know how many pages the kernel needs during
boot.  Different modules and kernel parameters may change the requirement,
so the boot cpu either initializes too many pages or runs out of memory.

To fix that, initialize early pages on demand.  This ensures the kernel
does the minimum amount of work to initialize pages during boot and leaves
the rest to be divided in the multithreaded initialization path
(deferred_init_memmap).

The on-demand code is permanently disabled using static branching once
deferred pages are initialized.  After the static branch is changed to
false, the overhead is up-to two branch-always instructions if the zone
watermark check fails or if rmqueue fails.

Sergey Senozhatsky noticed that while deferred pages currently make sense
only on NUMA machines (we start one thread per latency node), CONFIG_NUMA
is not a requirement for CONFIG_DEFERRED_STRUCT_PAGE_INIT, so that is also
must be addressed in the patch.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Daniel Jordan 
Reviewed-by: Steven Sistare 
Reviewed-by: Andrew Morton 
Tested-by: Masayoshi Mizuma 
Acked-by: Mel Gorman 
---
 include/linux/memblock.h |  10 ---
 mm/memblock.c|  23 --
 mm/page_alloc.c  | 183 +--
 3 files changed, 144 insertions(+), 72 deletions(-)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index 8be5077efb5f..6c305afd95ab 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -417,21 +417,11 @@ static inline void early_memtest(phys_addr_t start, 
phys_addr_t end)
 {
 }
 #endif
-
-extern unsigned long memblock_reserved_memory_within(phys_addr_t start_addr,
-   phys_addr_t end_addr);
 #else
 static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align)
 {
return 0;
 }
-
-static inline unsigned long memblock_reserved_memory_within(phys_addr_t 
start_addr,
-   phys_addr_t end_addr)
-{
-   return 0;
-}
-
 #endif /* CONFIG_HAVE_MEMBLOCK */
 
 #endif /* __KERNEL__ */
diff --git a/mm/memblock.c b/mm/memblock.c
index b6ba6b7adadc..80a12c64b203 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1778,29 +1778,6 @@ static void __init_memblock memblock_dump(struct 
memblock_type *type)
}
 }
 
-extern unsigned long __init_memblock
-memblock_reserved_memory_within(phys_addr_t start_addr, phys_addr_t end_addr)
-{
-   struct memblock_region *rgn;
-   unsigned long size = 0;
-   int idx;
-
-   for_each_memblock_type(idx, (), rgn) {
-   phys_addr_t start, end;
-
-   if (rgn->base + rgn->size < start_addr)
-   continue;
-   if (rgn->base > end_addr)
-   continue;
-
-   start = rgn->base;
-   end = start + rgn->size;
-   size += end - start;
-   }
-
-   return size;
-}
-
 void __init_memblock __memblock_dump_all(void)
 {
pr_info("MEMBLOCK configuration:\n");
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index cada509e2176..529e2dce7d16 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -292,40 +292,6 @@ EXPORT_SYMBOL(nr_online_nodes);
 int page_group_by_mobility_disabled __read_mostly;
 
 #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT
-
-/*
- * Determine how many pages need to be initialized during early boot
- * (non-deferred initialization).
- * The value of first_deferred_pfn will be set later, once non-deferred pages
- * are initialized, but for now set it ULONG_MAX.
- */
-static inline void reset_deferred_meminit(pg_data_t *pgdat)
-{
-   phys_addr_t start_addr, end_addr;
-   unsigned long max_pgcnt;
-   unsigned long reserved;
-
-   /*
-* Initialise at least 2G of a node but also take into account that
-* two large system hashes that can take up 1GB for 0.25TB/node.
-*/
-   max_pgcnt = max(2UL << (30 - PAGE_SHIFT),
-   (pgdat->node_spanned_pages >> 8));
-
-   /*
-* Compensate the all the memblock reservations (e.g. crash kernel)
-* from the initial estimation to make sure we will initialize enough
-* memory to boot.
-*/
-   start_addr = PFN_PHYS(pgdat->node_start_pfn);
-   end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt);
-   reserved = memblock_reserved_memory_within(start_addr, end_addr);
-   max_pgcnt += PHYS_PFN(reserved);
-
-   pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages);
-   pgdat->first_deferred_pfn = ULONG_MAX;
-}
-
 /* Returns true if the struct page for the pfn is uninitialised */
 static inline bool __meminit

Re: [PATCH v4 2/2] ARM: ftrace: Add MODULE_PLTS support

2018-03-13 Thread Ard Biesheuvel

On 13 March 2018 at 18:24, Alexander Sverdlin
 wrote:
> Hi!
>
> On 13/03/18 18:51, Ard Biesheuvel wrote:
 if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
 module_plt_alloc_fixed();
>>> Do you consider this a legal C code if without module-plts.o the function 
>>> would not exist at all?
>>> That's too much relying on optimizer I think...
>>>
>> Yes, we rely on that in many different places in the kernel.
>
> https://www.kernel.org/doc/Documentation/process/coding-style.rst:
> "However, this approach still allows the C compiler to see the code
> inside the block, and check it for correctness (syntax, types, symbol
> references, etc).  Thus, you still have to use an #ifdef if the code inside 
> the
> block references symbols that will not exist if the condition is not met."
>

"will not exist" is ambiguous here. It is rather common to declare
symbols, but only define them conditionally, and use IS_ENABLED() to
refer to them. As the documentation says, this gets rid of #ifdefs,
making the code always visible to the compiler which is a good thing.

Re: [PATCH v4 2/2] ARM: ftrace: Add MODULE_PLTS support

2018-03-13 Thread Alexander Sverdlin

Hi!

On 13/03/18 18:39, Ard Biesheuvel wrote:
>>  u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr 
>> val)
>>  {
>> struct mod_plt_sec *pltsec = !in_init(mod, loc) ? 
>> >arch.core :
>>   
>> >arch.init;
>> +   struct plt_entries *plt;
>> +   int idx;
>>
>> -   struct plt_entries *plt = (struct plt_entries 
>> *)pltsec->plt->sh_addr;
^^^ (*)

>> -   int idx = 0;
>> +   /* cache the address, ELF header is available only during module 
>> load */
>> +   if (!pltsec->plt_ent)
>> +   pltsec->plt_ent = (struct plt_entries 
>> *)pltsec->plt->sh_addr;
>> +   plt = pltsec->plt_ent;
>> +

(*)

> Where is plt_ent ever used?
 Above is exactly the place it's used.
 I need to cache it because after the module load is finished the ELF 
 header is freed,
 pltsec->plt pointer (*) is not valid any more.
 With the above modification it's possible to call the function during the 
 whole life
 time of the module.

>>> Right, ok. That's a problem.
>>>
>>> This means that you are relying on get_module_plt() being called at
>>> least once at module load time, which is not guaranteed.
>> This is indeed guaranteed. For FTRACE use case. If it's being called from 
>> FTRACE in
>> run time, this would mean there were long calls in this module section, 
>> which in
>> turn means, get_module_plt() was called at least once for this module and 
>> this
>> section.
>>
>> This doesn't hold in general, though.
>>
>> In any case, if you insist, I can try to rework the whole stuff implementing 
>> module_finalize().
>>
> I think it would be much better to use the module_finalize() hook for
> this, given that it is only called once already, and all the data you
> need is still available.

No problem, but some kind of (*) block would still be required, because
get_module_plt() has to work after module_finalize() as well as *before* it.
So before module_finalize() we would have to dereference pltsec->plt->sh_addr 
conditionally.
 
> Note that ARM already has such a function, so you'll need to add a
> call there, i.e.,
> 
> if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS))
> module_plt_alloc_fixed();
> 
> or something like that. The CONFIG_FTRACE dependency can be kept local
> to module-plts.c
> 

-- 
Best regards,
Alexander Sverdlin.

Re: [PATCH] debugobjects: avoid another unused variable warning.

2018-03-13 Thread Yang Shi




On 3/13/18 6:18 AM, Arnd Bergmann wrote:

debug_objects_maxchecked is only updated in __debug_check_no_obj_freed(),
and only read in debug_objects_maxchecked, unfortunately both of
these are optional and depend on different Kconfig symbols. When both
CONFIG_DEBUG_OBJECTS_FREE and CONFIG_DEBUG_FS are disabled, we get
this warning:

lib/debugobjects.c:56:14: error: 'debug_objects_maxchecked' defined but not 
used [-Werror=unused-variable]

Rather than trying to add more complex #ifdef protections, this marks
the variable as __maybe_unused so it can be silently dropped when
usused.


Thanks for catching this.

Acked-by: Yang Shi 



Fixes: bd9dcd046509 ("debugobjects: Export max loops counter")
Signed-off-by: Arnd Bergmann 
---
  lib/debugobjects.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/lib/debugobjects.c b/lib/debugobjects.c
index 105ecfc47d8c..994be4805cec 100644
--- a/lib/debugobjects.c
+++ b/lib/debugobjects.c
@@ -53,7 +53,7 @@ static intobj_nr_tofree;
  static struct kmem_cache  *obj_cache;
  
  static int			debug_objects_maxchain __read_mostly;

-static int debug_objects_maxchecked __read_mostly;
+static int __maybe_unused  debug_objects_maxchecked __read_mostly;
  static intdebug_objects_fixups __read_mostly;
  static intdebug_objects_warnings __read_mostly;
  static intdebug_objects_enabled __read_mostly

Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds

2018-03-13 Thread Marc Zyngier

On 13/03/18 17:51, Mark Rutland wrote:
> On Tue, Mar 13, 2018 at 05:21:00PM +, Marc Zyngier wrote:
>> As kexec and kdump are getting used a bit more intensively, I've been
>> made aware of a number of shortcomings.
>>
>> The main gripe is from folks trying to launch a kdump kernel from
>> within an interrupt handler. If using EOImode==1, things work as
>> expected. If using EOImode==0 (such as in a guest), the secondary
>> kernel hangs as the previous interrupt hasn't been EOI'd, and the
>> active priority is still set. The first two patches are addressing
>> this situation for both GICv2 and GICv3 by reseting the APRs to their
>> default value.
> 
> As a more general thing, if irqchip drivers have state that needs to be
> reset in their init code, can we live all this irqchip reset to the
> crashdump kernel, and kill machine_kexec_mask_interrupts() entirely?

We could, once we know for sure that all the potential irqchips have
been fixed. Or we could just remove it immediately, and see what breaks.

> That would avoid some work (including pointer chasing on potentially
> corrupt memory) in the kernel that crashed, making it more likely that
> we get to the crashkernel intact...

Seems perfectly sensible to me.

M.
-- 
Jazz is not dead. It just smells funny...

Re: [bug, bisected] pfifo_fast causes packet reordering

2018-03-13 Thread Dave Taht

On Tue, Mar 13, 2018 at 11:24 AM, Jakob Unterwurzacher
 wrote:
> During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on Linux
> v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of packets are
> delivered out-of-order.
>
> We have tracked the problem down to the driver interface level, and it seems
> that the driver's net_device_ops.ndo_start_xmit() function gets the packets
> handed over in the wrong order.
>
> This behavior was not observed on Linux v4.15 and I have bisected the
> problem down to this patch:
>
>> commit c5ad119fb6c09b0297446be05bd66602fa564758
>> Author: John Fastabend 
>> Date:   Thu Dec 7 09:58:19 2017 -0800
>>
>>net: sched: pfifo_fast use skb_array
>>
>>This converts the pfifo_fast qdisc to use the skb_array data structure
>>and set the lockless qdisc bit. pfifo_fast is the first qdisc to
>> support
>>the lockless bit that can be a child of a qdisc requiring locking. So
>>we add logic to clear the lock bit on initialization in these cases
>> when
>>the qdisc graft operation occurs.
>>
>>This also removes the logic used to pick the next band to dequeue from
>>and instead just checks a per priority array for packets from top
>> priority
>>to lowest. This might need to be a bit more clever but seems to work
>>for now.
>>
>>Signed-off-by: John Fastabend 
>>Signed-off-by: David S. Miller 
>
>
> The patch does not revert cleanly, but moving to one commit earlier makes
> the problem go away.
>
> Selecting the "fq" scheduler instead of "pfifo_fast" makes the problem go
> away as well.

I am of course, a fan of obsoleting pfifo_fast. There's no good reason
for it anymore.

>
> Is this an unintended side-effect of the patch or is there something the
> driver has to do to request in-order delivery?
>
> Thanks,
> Jakob



-- 

Dave Täht
CEO, TekLibre, LLC
http://www.teklibre.com
Tel: 1-669-226-2619

[PATCH RFC 4/8] powerpc/64s: Add support for ori barrier_nospec

2018-03-13 Thread Michal Suchanek

Copypasta from rfi implementation

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/include/asm/barrier.h|  4 ++--
 arch/powerpc/include/asm/feature-fixups.h |  9 +
 arch/powerpc/include/asm/setup.h  |  8 
 arch/powerpc/kernel/setup_64.c| 29 +
 arch/powerpc/kernel/vmlinux.lds.S |  7 +++
 arch/powerpc/lib/feature-fixups.c | 27 +++
 6 files changed, 82 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/include/asm/barrier.h 
b/arch/powerpc/include/asm/barrier.h
index 8e47b3abe405..4079a95e84c2 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -75,9 +75,9 @@ do {  
\
___p1;  \
 })
 
-/* TODO: add patching so this can be disabled */
 /* Prevent speculative execution past this barrier. */
-#define barrier_nospec_asm ori 31,31,0
+#define barrier_nospec_asm SPEC_BARRIER_FIXUP_SECTION; \
+   nop
 #ifdef __ASSEMBLY__
 #define barrier_nospec barrier_nospec_asm
 #else
diff --git a/arch/powerpc/include/asm/feature-fixups.h 
b/arch/powerpc/include/asm/feature-fixups.h
index 1e82eb3caabd..9d3382618ffd 100644
--- a/arch/powerpc/include/asm/feature-fixups.h
+++ b/arch/powerpc/include/asm/feature-fixups.h
@@ -195,11 +195,20 @@ label##3: \
FTR_ENTRY_OFFSET 951b-952b; \
.popsection;
 
+#define SPEC_BARRIER_FIXUP_SECTION \
+953:   \
+   .pushsection __spec_barrier_fixup,"a";  \
+   .align 2;   \
+954:   \
+   FTR_ENTRY_OFFSET 953b-954b; \
+   .popsection;
+
 
 #ifndef __ASSEMBLY__
 #include 
 
 extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup;
+extern long __start___spec_barrier_fixup, __stop___spec_barrier_fixup;
 
 void apply_feature_fixups(void);
 void setup_feature_keys(void);
diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 469b7fdc9be4..486d02e4a310 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -49,8 +49,16 @@ enum l1d_flush_type {
L1D_FLUSH_MTTRIG= 0x8,
 };
 
+/* These are bit flags */
+enum spec_barrier_type {
+   SPEC_BARRIER_NONE   = 0x1,
+   SPEC_BARRIER_ORI= 0x2,
+};
+
 void __init setup_rfi_flush(enum l1d_flush_type, bool enable);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
+void __init setup_barrier_nospec(enum spec_barrier_type, bool enable);
+void do_barrier_nospec_fixups(enum spec_barrier_type type);
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index c388cc3357fa..09f21a954bfc 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -815,6 +815,10 @@ static enum l1d_flush_type enabled_flush_types;
 static void *l1d_flush_fallback_area;
 static bool no_rfi_flush;
 bool rfi_flush;
+enum spec_barrier_type powerpc_barrier_nospec;
+static enum spec_barrier_type barrier_nospec_type;
+static bool no_nospec;
+bool barrier_nospec_enabled;
 
 static int __init handle_no_rfi_flush(char *p)
 {
@@ -899,6 +903,31 @@ void __init setup_rfi_flush(enum l1d_flush_type types, 
bool enable)
rfi_flush_enable(enable);
 }
 
+void barrier_nospec_enable(bool enable)
+{
+   barrier_nospec_enabled = enable;
+
+   if (enable) {
+   powerpc_barrier_nospec = barrier_nospec_type;
+   do_barrier_nospec_fixups(powerpc_barrier_nospec);
+   on_each_cpu(do_nothing, NULL, 1);
+   } else {
+   powerpc_barrier_nospec = SPEC_BARRIER_NONE;
+   do_barrier_nospec_fixups(powerpc_barrier_nospec);
+   }
+}
+
+void __init setup_barrier_nospec(enum spec_barrier_type type, bool enable)
+{
+   if (type & SPEC_BARRIER_ORI)
+   pr_info("barrier_nospec: Using ori type flush\n");
+
+   barrier_nospec_type = type;
+
+   if (!no_nospec)
+   barrier_nospec_enable(enable);
+}
+
 #ifdef CONFIG_DEBUG_FS
 static int rfi_flush_set(void *data, u64 val)
 {
diff --git a/arch/powerpc/kernel/vmlinux.lds.S 
b/arch/powerpc/kernel/vmlinux.lds.S
index c8af90ff49f0..744b58ff77f1 100644
--- a/arch/powerpc/kernel/vmlinux.lds.S
+++ b/arch/powerpc/kernel/vmlinux.lds.S
@@ -139,6 +139,13 @@ SECTIONS
*(__rfi_flush_fixup)
__stop___rfi_flush_fixup = .;
}
+
+   . = ALIGN(8);
+   __spec_barrier_fixup : AT(ADDR(__spec_barrier_fixup) - LOAD_OFFSET) {
+   __start___spec_barrier_fixup = .;
+   *(__spec_barrier_fixup)
+

[PATCH RFC 5/8] powerpc/64: Patch barrier_nospec in modules

2018-03-13 Thread Michal Suchanek

Copypasta from lwsync patching.

Note that unlike RFI which is patched only in kernel the nospec state
reflects settings at the time the module was loaded.

Iterating all modules and re-patching every time the settings change is
not implemented.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/include/asm/setup.h  |  5 -
 arch/powerpc/kernel/module.c  |  6 ++
 arch/powerpc/kernel/setup_64.c|  4 ++--
 arch/powerpc/lib/feature-fixups.c | 17 ++---
 4 files changed, 26 insertions(+), 6 deletions(-)

diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h
index 486d02e4a310..7e3a41248810 100644
--- a/arch/powerpc/include/asm/setup.h
+++ b/arch/powerpc/include/asm/setup.h
@@ -58,7 +58,10 @@ enum spec_barrier_type {
 void __init setup_rfi_flush(enum l1d_flush_type, bool enable);
 void do_rfi_flush_fixups(enum l1d_flush_type types);
 void __init setup_barrier_nospec(enum spec_barrier_type, bool enable);
-void do_barrier_nospec_fixups(enum spec_barrier_type type);
+void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type);
+void do_barrier_nospec_fixups(enum spec_barrier_type type,
+ void *start, void *end);
+extern enum spec_barrier_type powerpc_barrier_nospec;
 
 #endif /* !__ASSEMBLY__ */
 
diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c
index 3f7ba0f5bf29..7b6d0ec06a21 100644
--- a/arch/powerpc/kernel/module.c
+++ b/arch/powerpc/kernel/module.c
@@ -72,6 +72,12 @@ int module_finalize(const Elf_Ehdr *hdr,
do_feature_fixups(powerpc_firmware_features,
  (void *)sect->sh_addr,
  (void *)sect->sh_addr + sect->sh_size);
+
+   sect = find_section(hdr, sechdrs, "__spec_barrier_fixup");
+   if (sect != NULL)
+   do_barrier_nospec_fixups(powerpc_barrier_nospec,
+ (void *)sect->sh_addr,
+ (void *)sect->sh_addr + sect->sh_size);
 #endif
 
sect = find_section(hdr, sechdrs, "__lwsync_fixup");
diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c
index 09f21a954bfc..d1d9f047161e 100644
--- a/arch/powerpc/kernel/setup_64.c
+++ b/arch/powerpc/kernel/setup_64.c
@@ -909,11 +909,11 @@ void barrier_nospec_enable(bool enable)
 
if (enable) {
powerpc_barrier_nospec = barrier_nospec_type;
-   do_barrier_nospec_fixups(powerpc_barrier_nospec);
+   do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec);
on_each_cpu(do_nothing, NULL, 1);
} else {
powerpc_barrier_nospec = SPEC_BARRIER_NONE;
-   do_barrier_nospec_fixups(powerpc_barrier_nospec);
+   do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec);
}
 }
 
diff --git a/arch/powerpc/lib/feature-fixups.c 
b/arch/powerpc/lib/feature-fixups.c
index 000e153184ad..b59ebc2215e8 100644
--- a/arch/powerpc/lib/feature-fixups.c
+++ b/arch/powerpc/lib/feature-fixups.c
@@ -156,14 +156,15 @@ void do_rfi_flush_fixups(enum l1d_flush_type types)
printk(KERN_DEBUG "rfi-flush: patched %d locations\n", i);
 }
 
-void do_barrier_nospec_fixups(enum spec_barrier_type type)
+void do_barrier_nospec_fixups(enum spec_barrier_type type,
+ void *fixup_start, void *fixup_end)
 {
unsigned int instr, *dest;
long *start, *end;
int i;
 
-   start = PTRRELOC(&__start___spec_barrier_fixup),
-   end = PTRRELOC(&__stop___spec_barrier_fixup);
+   start = fixup_start;
+   end = fixup_end;
 
instr = 0x6000; /* nop */
 
@@ -182,6 +183,16 @@ void do_barrier_nospec_fixups(enum spec_barrier_type type)
printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i);
 }
 
+void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type)
+{
+   void *start, *end;
+
+   start = PTRRELOC(&__start___spec_barrier_fixup),
+   end = PTRRELOC(&__stop___spec_barrier_fixup);
+
+   do_barrier_nospec_fixups(type, start, end);
+}
+
 #endif /* CONFIG_PPC_BOOK3S_64 */
 
 void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end)
-- 
2.13.6

Re: [PATCH 00/11] RISC-V: Resolve the issue of loadable module on 64-bit

2018-03-13 Thread Palmer Dabbelt


On Tue, 13 Mar 2018 01:35:05 PDT (-0700), z...@andestech.com wrote:

These patches resolve the some issues of loadable module.
  - symbol out of ranges
  - unknown relocation types

The reference of external variable and function symbols
cannot exceed 32-bit offset ranges in kernel module.
The module only can work on the 32-bit OS or the 64-bit
OS with sv32 virtual addressing.

These patches will generate the .got, .got.plt and
.plt sections during loading module, let it can refer
to the symbol which locate more than 32-bit offset.
These sections depend on the relocation types:
 - R_RISCV_GOT_HI20
 - R_RISCV_CALL_PLT

These patches also support more relocation types
 - R_RISCV_CALL
 - R_RISCV_HI20
 - R_RISCV_LO12_I
 - R_RISCV_LO12_S
 - R_RISCV_RVC_BRANCH
 - R_RISCV_RVC_JUMP
 - R_RISCV_ALIGN
 - R_RISCV_ADD32
 - R_RISCV_SUB32

Zong Li (11):
  RISC-V: Add sections of PLT and GOT for kernel module
  RISC-V: Add section of GOT.PLT for kernel module
  RISC-V: Support GOT_HI20/CALL_PLT relocation type in kernel module
  RISC-V: Support CALL relocation type in kernel module
  RISC-V: Support HI20/LO12_I/LO12_S relocation type in kernel module
  RISC-V: Support RVC_BRANCH/JUMP relocation type in kernel modulewq
  RISC-V: Support ALIGN relocation type in kernel module
  RISC-V: Support ADD32 relocation type in kernel module
  RISC-V: Support SUB32 relocation type in kernel module
  RISC-V: Enable module support in defconfig
  RISC-V: Add definition of relocation types

 arch/riscv/Kconfig  |   5 ++
 arch/riscv/Makefile |   3 +
 arch/riscv/configs/defconfig|   2 +
 arch/riscv/include/asm/module.h | 112 +++
 arch/riscv/include/uapi/asm/elf.h   |  24 +
 arch/riscv/kernel/Makefile  |   1 +
 arch/riscv/kernel/module-sections.c | 156 
 arch/riscv/kernel/module.c  | 175 ++--
 arch/riscv/kernel/module.lds|   8 ++
 9 files changed, 480 insertions(+), 6 deletions(-)
 create mode 100644 arch/riscv/include/asm/module.h
 create mode 100644 arch/riscv/kernel/module-sections.c
 create mode 100644 arch/riscv/kernel/module.lds


This is the second set of patches that turn on modules, and it has the same
R_RISCV_ALIGN problem as the other one

   http://lists.infradead.org/pipermail/linux-riscv/2018-February/81.html

It looks like this one uses shared libraries for modules instead of static
objects.  I think using shared objects is the right thing to do, as it'll allow
us to place modules anywhere in the address space by having multiple GOTs and
PLTs.  That's kind of complicated, though, so we can start with something
simpler like this.

[PATCH RFC 1/8] powerpc: Add barrier_nospec

2018-03-13 Thread Michal Suchanek

Copypasta from original gmb() and rfi implementation

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/include/asm/barrier.h | 9 +
 1 file changed, 9 insertions(+)

diff --git a/arch/powerpc/include/asm/barrier.h 
b/arch/powerpc/include/asm/barrier.h
index 10daa1d56e0a..8e47b3abe405 100644
--- a/arch/powerpc/include/asm/barrier.h
+++ b/arch/powerpc/include/asm/barrier.h
@@ -75,6 +75,15 @@ do { 
\
___p1;  \
 })
 
+/* TODO: add patching so this can be disabled */
+/* Prevent speculative execution past this barrier. */
+#define barrier_nospec_asm ori 31,31,0
+#ifdef __ASSEMBLY__
+#define barrier_nospec barrier_nospec_asm
+#else
+#define barrier_nospec() __asm__ __volatile__ 
(stringify_in_c(barrier_nospec_asm) : : :)
+#endif
+
 #include 
 
 #endif /* _ASM_POWERPC_BARRIER_H */
-- 
2.13.6

[PATCH RFC 2/8] powerpc: Use barrier_nospec in copy_from_user

2018-03-13 Thread Michal Suchanek

Coopypasta from x86.

Signed-off-by: Michal Suchanek 
---
 arch/powerpc/include/asm/uaccess.h | 11 ++-
 1 file changed, 10 insertions(+), 1 deletion(-)

diff --git a/arch/powerpc/include/asm/uaccess.h 
b/arch/powerpc/include/asm/uaccess.h
index 51bfeb8777f0..af9b0e731f46 100644
--- a/arch/powerpc/include/asm/uaccess.h
+++ b/arch/powerpc/include/asm/uaccess.h
@@ -248,6 +248,7 @@ do {
\
__chk_user_ptr(ptr);\
if (!is_kernel_addr((unsigned long)__gu_addr))  \
might_fault();  \
+   barrier_nospec();   \
__get_user_size(__gu_val, __gu_addr, (size), __gu_err); \
(x) = (__typeof__(*(ptr)))__gu_val; \
__gu_err;   \
@@ -258,8 +259,10 @@ do {   
\
long __gu_err = -EFAULT;\
unsigned long  __gu_val = 0;\
const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \
+   int can_access = access_ok(VERIFY_READ, __gu_addr, (size)); \
might_fault();  \
-   if (access_ok(VERIFY_READ, __gu_addr, (size)))  \
+   barrier_nospec();   \
+   if (can_access) \
__get_user_size(__gu_val, __gu_addr, (size), __gu_err); \
(x) = (__force __typeof__(*(ptr)))__gu_val; 
\
__gu_err;   \
@@ -271,6 +274,7 @@ do {
\
unsigned long __gu_val; \
const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \
__chk_user_ptr(ptr);\
+   barrier_nospec();   \
__get_user_size(__gu_val, __gu_addr, (size), __gu_err); \
(x) = (__force __typeof__(*(ptr)))__gu_val; \
__gu_err;   \
@@ -298,15 +302,19 @@ static inline unsigned long raw_copy_from_user(void *to,
 
switch (n) {
case 1:
+   barrier_nospec();
__get_user_size(*(u8 *)to, from, 1, ret);
break;
case 2:
+   barrier_nospec();
__get_user_size(*(u16 *)to, from, 2, ret);
break;
case 4:
+   barrier_nospec();
__get_user_size(*(u32 *)to, from, 4, ret);
break;
case 8:
+   barrier_nospec();
__get_user_size(*(u64 *)to, from, 8, ret);
break;
}
@@ -314,6 +322,7 @@ static inline unsigned long raw_copy_from_user(void *to,
return 0;
}
 
+   barrier_nospec();
return __copy_tofrom_user((__force void __user *)to, from, n);
 }
 
-- 
2.13.6

[PATCH v3 1/3] arm64: dts: apq8096-db820c: enable bluetooth node

2018-03-13 Thread Thierry Escande

Add a new serial node for the Qualcomm BT controller QCA6174. This
allows automatic probing and hci registration through the serdev
framework instead of relying on the userspace helpers.

Signed-off-by: Thierry Escande 
---
 arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi  | 14 ++
 .../boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi| 17 
 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi   | 32 ++
 arch/arm64/boot/dts/qcom/msm8996.dtsi  | 10 +++
 4 files changed, 73 insertions(+)

diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi 
b/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi
index 24552f19b3fa..172165d84669 100644
--- a/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi
@@ -36,4 +36,18 @@
drive-strength = <2>;   /* 2 MA */
};
};
+
+   blsp1_uart1_default: blsp1_uart1_default {
+   function = "blsp_uart2";
+   pins = "gpio41", "gpio42", "gpio43", "gpio44";
+   drive-strength = <16>;
+   bias-disable;
+   };
+
+   blsp1_uart1_sleep: blsp1_uart1_sleep {
+   function = "gpio";
+   pins = "gpio41", "gpio42", "gpio43", "gpio44";
+   drive-strength = <2>;
+   bias-disable;
+   };
 };
diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi 
b/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi
index 59b29ddfb6e9..f8d2a3b10b1f 100644
--- a/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi
@@ -26,6 +26,23 @@
};
};
 
+   divclk4_pin_a: divclk4 {
+   pins = "gpio18";
+   function = "func2";
+
+   bias-disable;
+   power-source = ;
+   };
+
+   bt_en_pin_a: bt-en-active {
+   pins = "gpio19";
+   function = "normal";
+
+   output-low;
+   power-source = ;
+   qcom,drive-strength = ;
+   };
+
usb3_vbus_det_gpio: pm8996_gpio22 {
pinconf {
pins = "gpio22";
diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi 
b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
index 1c8f1b86472d..b05d6bc0b856 100644
--- a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
+++ b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi
@@ -23,6 +23,7 @@
aliases {
serial0 = _uart1;
serial1 = _uart2;
+   serial2 = _uart1;
i2c0= _i2c2;
i2c1= _i2c1;
i2c2= _i2c0;
@@ -34,7 +35,38 @@
stdout-path = "serial0:115200n8";
};
 
+   clocks {
+   divclk4: divclk4 {
+   compatible = "fixed-clock";
+   #clock-cells = <0>;
+   clock-frequency = <32768>;
+   clock-output-names = "divclk4";
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_pin_a>;
+   };
+   };
+
soc {
+   serial@757 {
+   label = "BT-UART";
+   status = "okay";
+   pinctrl-names = "default", "sleep";
+   pinctrl-0 = <_uart1_default>;
+   pinctrl-1 = <_uart1_sleep>;
+
+   bluetooth {
+   compatible = "qcom,qca6174-bt";
+
+   bt-disable-n-gpios = <_gpios 19 
GPIO_ACTIVE_HIGH>;
+
+   pinctrl-names = "default";
+   pinctrl-0 = <_en_pin_a>;
+
+   clocks = <>;
+   };
+   };
+
serial@75b {
label = "LS-UART1";
status = "okay";
diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi 
b/arch/arm64/boot/dts/qcom/msm8996.dtsi
index 0a6f7952bbb1..2d54a86a027f 100644
--- a/arch/arm64/boot/dts/qcom/msm8996.dtsi
+++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi
@@ -408,6 +408,16 @@
#clock-cells = <1>;
};
 
+   blsp1_uart1: serial@757 {
+   compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm";
+   reg = <0x0757 0x1000>;
+   interrupts = ;
+   clocks = < GCC_BLSP1_UART2_APPS_CLK>,
+< GCC_BLSP1_AHB_CLK>;
+   clock-names = "core", "iface";
+   status = "disabled";
+   };
+
blsp1_spi0: spi@7575000 {
compatible = "qcom,spi-qup-v2.2.1";
reg = <0x07575000 0x600>;
-- 
2.14.1

[RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context

2018-03-13 Thread Sebastian Andrzej Siewior

| BUG: sleeping function called from invalid context at 
kernel/locking/rtmutex.c:914
| in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6
| 5 locks held by kworker/u257:6/255:
|  #0:  ("events_unbound"){.+.+.+}, at: [] 
process_one_work+0x171/0x5e0
|  #1:  ((>work)){+.+.+.}, at: [] 
process_one_work+0x171/0x5e0
|  #2:  (>scan_mutex){+.+.+.}, at: [] 
__scsi_add_device+0xa3/0x130 [scsi_mod]
|  #3:  (>tag_list_lock){+.+...}, at: [] 
blk_mq_init_queue+0x96a/0xa50
|  #4:  (rcu_read_lock_sched){..}, at: [] 
percpu_ref_kill_and_confirm+0x1d/0x120
| Preemption disabled at:[] 
blk_mq_freeze_queue_start+0x56/0x70
|
| CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1
| Workqueue: events_unbound async_run_entry_fn
|  0003 8800bc29f998 815b3a12 
|   8800bc29f9b8 8109aa16 8800bc29fa28
|  8800bc5d1bc8 8800bc29f9e8 815b8dd4 8800
| Call Trace:
|  [] dump_stack+0x4f/0x7c
|  [] __might_sleep+0x116/0x190
|  [] rt_spin_lock+0x24/0x60
|  [] __wake_up+0x29/0x60
|  [] blk_mq_usage_counter_release+0x1e/0x20
|  [] percpu_ref_kill_and_confirm+0x106/0x120
|  [] blk_mq_freeze_queue_start+0x56/0x70
|  [] blk_mq_update_tag_set_depth+0x40/0xd0
|  [] blk_mq_init_queue+0x98c/0xa50
|  [] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod]
|  [] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod]
|  [] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod]
|  [] __scsi_add_device+0x126/0x130 [scsi_mod]
|  [] ata_scsi_scan_host+0xaf/0x200 [libata]
|  [] async_port_probe+0x46/0x60 [libata]
|  [] async_run_entry_fn+0x3b/0xf0
|  [] process_one_work+0x201/0x5e0

percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in
a rcu-sched region. swait based wake queue can't be used due to
wake_up_all() usage and disabled interrupts in !RT configs (as reported
by Corey Minyard).
Uses work_queue() to invoke wake_up_all() in process context.

Signed-off-by: Sebastian Andrzej Siewior 
---
 block/blk-core.c   | 13 -
 include/linux/blkdev.h |  2 ++
 2 files changed, 14 insertions(+), 1 deletion(-)

diff --git a/block/blk-core.c b/block/blk-core.c
index f9ac6f169c67..4db4051724ea 100644
--- a/block/blk-core.c
+++ b/block/blk-core.c
@@ -814,12 +814,20 @@ void blk_queue_exit(struct request_queue *q)
percpu_ref_put(>q_usage_counter);
 }
 
+static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
+{
+   struct request_queue *q =
+   container_of(sev, struct request_queue, mq_pcpu_wake);
+
+   wake_up_all(>mq_freeze_wq);
+}
+
 static void blk_queue_usage_counter_release(struct percpu_ref *ref)
 {
struct request_queue *q =
container_of(ref, struct request_queue, q_usage_counter);
 
-   wake_up_all(>mq_freeze_wq);
+   swork_queue(>mq_pcpu_wake);
 }
 
 static void blk_rq_timed_out_timer(unsigned long data)
@@ -896,6 +904,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, 
int node_id)
__set_bit(QUEUE_FLAG_BYPASS, >queue_flags);
 
init_waitqueue_head(>mq_freeze_wq);
+   INIT_SWORK(>mq_pcpu_wake, blk_queue_usage_counter_release_swork);
 
/*
 * Init percpu_ref in atomic mode so that it's faster to shutdown.
@@ -3623,6 +3632,8 @@ int __init blk_dev_init(void)
if (!kblockd_workqueue)
panic("Failed to create kblockd\n");
 
+   BUG_ON(swork_get());
+
request_cachep = kmem_cache_create("blkdev_requests",
sizeof(struct request), 0, SLAB_PANIC, NULL);
 
diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index b68752bfb645..49b53ad6d2d6 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 struct module;
 struct scsi_ioctl_command;
@@ -599,6 +600,7 @@ struct request_queue {
 #endif
struct rcu_head rcu_head;
wait_queue_head_t   mq_freeze_wq;
+   struct swork_event  mq_pcpu_wake;
struct percpu_ref   q_usage_counter;
struct list_headall_q_node;
 
-- 
2.16.2

[GIT PULL] auxdisplay for v4.16-rc6

2018-03-13 Thread Miguel Ojeda

Hi Linus,

Please pull these small changes to silence a few warnings in auxdisplay.

This is my first pull request, so if there is anything wrong, please let
me know!

Also, note that for the moment my PGP/GPG key is only signed by 2 other
kernel maintainers and that we did it over email/Skype. I used GitHub
as well for the same reason.

Cheers,
Miguel Ojeda

The following changes since commit 0c8efd610b58cb23cefdfa12015799079aef94ae:

  Linux 4.16-rc5 (2018-03-11 17:25:09 -0700)

are available in the git repository at:

  https://github.com/ojeda/linux.git tags/auxdisplay-for-linus-v4.16-rc6

for you to fetch changes up to 26a2c54d03bd514fb3d3520706f911b3ca56b011:

  auxdisplay: img-ascii-lcd: Silence 2 uninitialized warnings (2018-03-13 
18:16:38 +0100)


Silence a few warnings in auxdisplay.

  - A couple of uninitialized warnings reported by the build service.

  - A doc comment warning under W=1.

  - Three fall through comments not recognized under W=1.


Miguel Ojeda (3):
  auxdisplay: panel: Change comments to silence fallthrough warnings
  auxdisplay: img-ascii-lcd: Fix doc comment to silence warnings
  auxdisplay: img-ascii-lcd: Silence 2 uninitialized warnings

 drivers/auxdisplay/img-ascii-lcd.c | 6 +++---
 drivers/auxdisplay/panel.c | 6 +++---
 2 files changed, 6 insertions(+), 6 deletions(-)

Re: [PATCH v4 1/6] sysctl: Add flags to support min/max range clamping

2018-03-13 Thread Waiman Long

On 03/13/2018 01:46 PM, Eric W. Biederman wrote:
> Waiman Long  writes:
>
>> When minimum/maximum values are specified for a sysctl parameter in
>> the ctl_table structure with proc_dointvec_minmax() handler, update
>> to that parameter will fail with error if the given value is outside
>> of the required range.
>>
>> There are use cases where it may be better to clamp the value of
>> the sysctl parameter to the given range without failing the update,
>> especially if the users are not aware of the actual range limits.
>> Reading the value back after the update will now be a good practice
>> to see if the provided value exceeds the range limits.
> What use cases?  Who will break.  Examples would be good.

See my response to the 4/6 patch.

>> To provide this less restrictive form of range checking, a new flags
>> field is added to the ctl_table structure.
>>
>> When the CTL_FLAGS_CLAMP_RANGE flag is set in the ctl_table
>> entry, any update from the userspace will be clamped to the given
>> range without error if either the proc_dointvec_minmax() or the
>> proc_douintvec_minmax() handlers is used.
> As this is constructed it will increase the size of ctl_table for
> everyone.  Better would be to add a new function that behaves similary
> but differently than to burden struct ctl_table for the rest of time.

If you have concern about increasing the size of the ctl_table, I can
revert it back to use unsigned short for flag which will fit into the
hole left by the 16-bit mode field. I thought increasing the size of
ctl_table a bit isn't a big problem, I may be wrong on that.

Adding the new flags field will afford us the ability to better
customize the sysctl parameters to different needs that may arise in the
future.

Cheers,
Longman

[PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register

2018-03-13 Thread kan . liang

From: Kan Liang 

The number of CHAs is miscalculated on multi PCI domain systems on
Skylake server.

(From Kroening, Gary:

For systems with a single PCI segment, it is sufficient to look for the
bus number to change in order to determine that all of the CHa's have
been counted for a single socket.
However, for multi PCI segment systems, each socket is given a new
segment and the bus number does NOT change.  So looking only for the
bus number to change ends up counting all of the CHa's on all sockets
in the system.  This leads to writing CPU MSRs beyond a valid range and
causes an error in ivbep_uncore_msr_init_box().)

To determine the number of CHAs, it should read bits 27:0 in the CAPID6
register located at Device 30, Function 3, Offset 0x9C. These 28 bits
form a bit vector of available LLC slices and the CHAs that manage those
slices.

Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore
support")
Reported-by: Kroening, Gary 
Signed-off-by: Kan Liang 
---

Changes since V1:
 - add missed pci_dev_put()
 - Drop ugly casting by using hweight32()
 - Add comments for macros.

 arch/x86/events/intel/uncore_snbep.c | 31 +--
 1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c 
b/arch/x86/events/intel/uncore_snbep.c
index 6d8044a..8970f71 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[] = {
NULL,
 };
 
+/*
+ * To determine the number of CHAs, it should read bits 27:0 in the CAPID6
+ * register which located at Device 30, Function 3, Offset 0x9C. PCI ID 0x2083.
+ */
+#define SKX_CAPID6 0x9c
+#define SKX_CHA_BIT_MASK   GENMASK(27, 0)
+
 static int skx_count_chabox(void)
 {
-   struct pci_dev *chabox_dev = NULL;
-   int bus, count = 0;
+   struct pci_dev *dev = NULL;
+   u32 val = 0;
 
-   while (1) {
-   chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d, 
chabox_dev);
-   if (!chabox_dev)
-   break;
-   if (count == 0)
-   bus = chabox_dev->bus->number;
-   if (bus != chabox_dev->bus->number)
-   break;
-   count++;
-   }
+   dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev);
+   if (!dev)
+   goto out;
 
-   pci_dev_put(chabox_dev);
-   return count;
+   pci_read_config_dword(dev, SKX_CAPID6, );
+   val &= SKX_CHA_BIT_MASK;
+out:
+   pci_dev_put(dev);
+   return hweight32(val);
 }
 
 void skx_uncore_cpu_init(void)
-- 
2.7.4

Re: [v5 1/2] mm: disable interrupts while initializing deferred pages

2018-03-13 Thread Andrew Morton

On Tue, 13 Mar 2018 12:04:30 -0400 Pavel Tatashin  
wrote:

> > 
> > > --- a/mm/page_alloc.c
> > > +++ b/mm/page_alloc.c
> > > @@ -1506,7 +1506,6 @@ static void __init deferred_free_pages(int nid, int 
> > > zid, unsigned long pfn,
> > >   } else if (!(pfn & nr_pgmask)) {
> > >   deferred_free_range(pfn - nr_free, nr_free);
> > >   nr_free = 1;
> > > - cond_resched();
> > >   } else {
> > >   nr_free++;
> > 
> > And how can we simply remove these cond_resched()s?  I assume this is
> > being done because interrupts are now disabled?  But those were there
> > for a reason, weren't they?
> 
> We must remove cond_resched() because we can't sleep anymore. They were
> added to fight NMI timeouts, so I will replace them with
> touch_nmi_watchdog() in a follow-up fix.

This makes no sense.  Any code section where we can add cond_resched()
was never subject to NMI timeouts because that code cannot be running with
disabled interrupts.

RE: [PATCH v5 1/4] dt-bindings: firmware: Add bindings for ZynqMP firmware

2018-03-13 Thread Jolly Shah



> -Original Message-
> From: Sudeep Holla [mailto:sudeep.ho...@arm.com]
> Sent: Tuesday, March 13, 2018 3:16 AM
> To: Jolly Shah 
> Cc: Sudeep Holla ; gre...@linuxfoundation.org;
> m...@codeblueprint.co.uk; hkallwe...@gmail.com; michal.si...@xilinx.com;
> robh...@kernel.org; mark.rutl...@arm.com; ard.biesheu...@linaro.org;
> mi...@kernel.org; keesc...@chromium.org; dmitry.torok...@gmail.com;
> Rajan Vaja ; linux-arm-ker...@lists.infradead.org; linux-
> ker...@vger.kernel.org; devicet...@vger.kernel.org
> Subject: Re: [PATCH v5 1/4] dt-bindings: firmware: Add bindings for ZynqMP
> firmware
> 
> 
> 
> On 12/03/18 23:07, Jolly Shah wrote:
> > Hi Sudeep,
> 
>  Do you foresee using SMC/HVC for this firmware even on future
> platforms?
>  If not, I suggest to keep the protocol part separate from the transport 
>  i.e.
>  smc/hvc via ATF. It could be replaced with mailbox or some h/w
>  mechanism in future ?
> 
> >>>
> >>> We have PSCI and EEMI interfaces exposed to linux from ATF. PSCI is
> >>> an EEMI client. We do not have current plans to switch to mailbox as
> >>> it will require 2 communication channels to PMU as PSCI is through ATF.
> >>>
> >>
> >> OK, but I just saw some bindings that has mailbox interface, honestly
> >> it's getting too confusing with multiple series on the same thing
> >> floating and hence I requested to put it together as one series.
> >
> > Mailbox binding is used for power management driver. Mailbox is only
> > used for PMU->APU communication. APU->PMU communication is always
> > through EEMI firmware interface which is using SMC/HVC.
> >
> 
> Ah OK, is it because there's no non-secure mailbox or to avoid races, all non-
> secure EEMI is channeled through SMC ?
> 

2 reasons:
1> Avoid multiple EEMI communication channels as PSCI is through ATF.
2> We have some secure operations handled in ATF because of memory constraints 
on PMU

> --
> Regards,
> Sudeep

Re: [PATCH 2/2] hwmon: (ucd9000) Add debugfs attributes to provide mfr_status

2018-03-13 Thread Eddie James




On 03/10/2018 10:50 AM, Guenter Roeck wrote:

On 03/09/2018 11:19 AM, Eddie James wrote:

From: Christopher Bostic 

Expose the gpiN_fault fields of mfr_status as individual debugfs
attributes. This provides a way for users to be easily notified of gpi
faults. Also provide the whole mfr_status register in debugfs.

Signed-off-by: Christopher Bostic 
Signed-off-by: Andrew Jeffery 
Signed-off-by: Eddie James 
---
  drivers/hwmon/pmbus/ucd9000.c | 172 
+-

  1 file changed, 171 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/pmbus/ucd9000.c 
b/drivers/hwmon/pmbus/ucd9000.c

index e3a507f..297da0e 100644
--- a/drivers/hwmon/pmbus/ucd9000.c
+++ b/drivers/hwmon/pmbus/ucd9000.c
@@ -19,6 +19,7 @@
   * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
   */
  +#include 
  #include 
  #include 
  #include 
@@ -36,6 +37,7 @@
  #define UCD9000_NUM_PAGES    0xd6
  #define UCD9000_FAN_CONFIG_INDEX    0xe7
  #define UCD9000_FAN_CONFIG    0xe8
+#define UCD9000_MFR_STATUS    0xf3
  #define UCD9000_GPIO_SELECT    0xfa
  #define UCD9000_GPIO_CONFIG    0xfb
  #define UCD9000_DEVICE_ID    0xfd
@@ -63,13 +65,22 @@
  #define UCD901XX_NUM_GPIOS    26
  #define UCD90910_NUM_GPIOS    26
  +#define UCD9000_DEBUGFS_NAME_LEN    24
+#define UCD9000_GPI_COUNT    8
+
  struct ucd9000_data {
  u8 fan_data[UCD9000_NUM_FAN][I2C_SMBUS_BLOCK_MAX];
  struct pmbus_driver_info info;
  struct gpio_chip gpio;
+    struct dentry *debugfs;
  };
  #define to_ucd9000_data(_info) container_of(_info, struct 
ucd9000_data, info)

  +struct ucd9000_debugfs_entry {
+    struct i2c_client *client;
+    u8 index;
+};
+
  static int ucd9000_get_fan_config(struct i2c_client *client, int fan)
  {
  int fan_config = 0;
@@ -328,6 +339,156 @@ static int ucd9000_gpio_direction_output(struct 
gpio_chip *gc,

    val);
  }
  +#if IS_ENABLED(CONFIG_DEBUG_FS)
+static int ucd9000_get_mfr_status(struct i2c_client *client, u8 
*buffer)

+{
+    int ret = pmbus_set_page(client, 0);
+
+    if (ret < 0)
+    return ret;
+
+    /*
+ * With the ucd90120 and ucd90124 devices, this command 
[MFR_STATUS]
+ * is 2 bytes long (bits 0-15).  With the ucd90240 this command 
is 5

+ * bytes long.  With all other devices, it is 4 bytes long.
+ */
+    return i2c_smbus_read_block_data(client, UCD9000_MFR_STATUS, 
buffer);

+}
+
+static int ucd9000_debugfs_show_mfr_status_bit(void *data, u64 *val)
+{
+    struct ucd9000_debugfs_entry *entry = data;
+    struct i2c_client *client = entry->client;
+    u8 buffer[4];
+    int ret;
+
+    /*
+ * This attribute is only created for devices that return 4 
bytes for

+ * status_mfr, so it's safe to call with 4-byte buffer.
+ */
+    ret = ucd9000_get_mfr_status(client, buffer);
+    if (ret < 0) {
+    dev_err(>dev, "Failed to read mfr status. rc:%d\n",
+    ret);
+
+    return ret;
+    }
+
+    /*
+ * Attribute only created for devices with gpi fault bits at bits
+ * 16-23, which is the second byte of the response.
+ */
+    *val = !!(buffer[1] & BIT(entry->index));
+
+    return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_bit,
+ ucd9000_debugfs_show_mfr_status_bit, NULL, "%1lld\n");
+
+static int ucd9000_debugfs_show_mfr_status_word2(void *data, u64 *val)
+{
+    struct i2c_client *client = data;
+    __be16 buffer;
+    int ret;
+
+    ret = ucd9000_get_mfr_status(client, (u8 *));
+    if (ret < 0) {
+    dev_err(>dev, "Failed to read mfr status. rc:%d\n",
+    ret);
+
+    return ret;
+    }
+
+    *val = be16_to_cpu(buffer);
+
+    return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word2,
+ ucd9000_debugfs_show_mfr_status_word2, NULL,
+ "%04llx\n");
+
+static int ucd9000_debugfs_show_mfr_status_word4(void *data, u64 *val)
+{
+    struct i2c_client *client = data;
+    __be32 buffer;
+    int ret;
+
+    ret = ucd9000_get_mfr_status(client, (u8 *));
+    if (ret < 0) {
+    dev_err(>dev, "Failed to read mfr status. rc:%d\n",
+    ret);
+
+    return ret;
+    }
+
+    *val = be32_to_cpu(buffer);
+
+    return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word4,
+ ucd9000_debugfs_show_mfr_status_word4, NULL,
+ "%08llx\n");
+
+static int ucd9000_init_debugfs(struct i2c_client *client,
+    const struct i2c_device_id *mid,
+    struct ucd9000_data *data)
+{
+    struct dentry *debugfs;
+    struct ucd9000_debugfs_entry *entries;
+    int i;
+    char name[UCD9000_DEBUGFS_NAME_LEN];
+
+    debugfs = pmbus_get_debugfs_dir(client);
+    if (!debugfs)
+    return -ENOENT;
+
+    data->debugfs = debugfs_create_dir(client->name, debugfs);
+    if (!data->debugfs)
+    return -ENOENT;
+
+    /*
+ * Of

[PATCH] hv_netvsc: Make sure out channel is fully opened on send

2018-03-13 Thread Mohammed Gamal

Dring high network traffic changes to network interface parameters
such as number of channels or MTU can cause a kernel panic with a NULL
pointer dereference. This is due to netvsc_device_remove() being
called and deallocating the channel ring buffers, which can then be
accessed by netvsc_send_pkt() before they're allocated on calling
netvsc_device_add()

The patch fixes this problem by checking the channel state and returning
ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent()
which may access the uninitialized ring buffer.

Signed-off-by: Mohammed Gamal 
---
 drivers/net/hyperv/netvsc.c | 5 +++--
 1 file changed, 3 insertions(+), 2 deletions(-)

diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c
index 0265d70..44a8358 100644
--- a/drivers/net/hyperv/netvsc.c
+++ b/drivers/net/hyperv/netvsc.c
@@ -757,7 +757,7 @@ static inline int netvsc_send_pkt(
struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx);
u64 req_id;
int ret;
-   u32 ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
+   u32 ring_avail;
 
nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT;
if (skb)
@@ -773,7 +773,7 @@ static inline int netvsc_send_pkt(
 
req_id = (ulong)skb;
 
-   if (out_channel->rescind)
+   if (out_channel->rescind || out_channel->state != CHANNEL_OPENED_STATE)
return -ENODEV;
 
if (packet->page_buf_cnt) {
@@ -791,6 +791,7 @@ static inline int netvsc_send_pkt(
   
VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED);
}
 
+   ring_avail = hv_ringbuf_avail_percent(_channel->outbound);
if (ret == 0) {
atomic_inc_return(>queue_sends);
 
-- 
1.8.3.1

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Sinan Kaya

On 3/13/2018 2:44 PM, Logan Gunthorpe wrote:
>> Do you think you can keep a pointer to the parent bridge instead of querying 
>> it
>> via get_upstream_bridge_port() here so that we can reuse your
>> pci_p2pdma_disable_acs() in the future.
> 
> Keep a pointer where? pci_p2pdma_disable_acs() and pci_p2pdma_add_client() 
> are used in completely different cases on completely different devices. There 
> really is no overlap and no obvious place to store the port pointer (except 
> in the struct pci_dev itself, in which case why not just call the function 
> again).

I was thinking of this for the pci_p2pdma_add_client() case for the
parent pointer.

+struct pci_p2pdma_client {
+   struct list_head list;
+   struct pci_dev *client;
+   struct pci_dev *provider;
+};

You are right second code seems to be there to disable ACS altogether
when this kernel configuration is selected.

It doesn't have any relationship to the client code.

But then, Why bother searching for the switch at all?

Even if the switch is there, there is no guarantee that it is currently
being used for P2P.

It seems that we are going with the assumption that enabling this config
option implies you want P2P, then we can simplify this code as well.

-- 
Sinan Kaya
Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm 
Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux 
Foundation Collaborative Project.

[PATCH net 2/2] vmxnet3: use correct flag to indicate LRO feature

2018-03-13 Thread Ronak Doshi

'Commit 45dac1d6ea04 ("vmxnet3: Changes for vmxnet3 adapter version 2
(fwd)")' introduced a flag "lro" in structure vmxnet3_adapter which is
used to indicate whether LRO is enabled or not. However, the patch
did not set the flag and hence it was never exercised.

So, when LRO is enabled, it resulted in poor TCP performance due to
delayed acks. This issue is seen with packets which are larger than
the mss getting a delayed ack rather than an immediate ack, thus
resulting in high latency.

This patch removes the lro flag and directly uses device features
against NETIF_F_LRO to check if lro is enabled.

Reported-by: Rachel Lunnon 
Signed-off-by: Ronak Doshi 
Acked-by: Shrikrishna Khare 
---
 drivers/net/vmxnet3/vmxnet3_drv.c | 3 ++-
 drivers/net/vmxnet3/vmxnet3_int.h | 5 ++---
 2 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c 
b/drivers/net/vmxnet3/vmxnet3_drv.c
index 052eef2f729f..86c4d6e4dfaa 100644
--- a/drivers/net/vmxnet3/vmxnet3_drv.c
+++ b/drivers/net/vmxnet3/vmxnet3_drv.c
@@ -1473,7 +1473,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq,
vmxnet3_rx_csum(adapter, skb,
(union Vmxnet3_GenericDesc *)rcd);
skb->protocol = eth_type_trans(skb, adapter->netdev);
-   if (!rcd->tcp || !adapter->lro)
+   if (!rcd->tcp ||
+   !(adapter->netdev->features & NETIF_F_LRO))
goto not_lro;
 
if (segCnt != 0 && mss != 0) {
diff --git a/drivers/net/vmxnet3/vmxnet3_int.h 
b/drivers/net/vmxnet3/vmxnet3_int.h
index b94fdfd0b6f1..99387a4a20a8 100644
--- a/drivers/net/vmxnet3/vmxnet3_int.h
+++ b/drivers/net/vmxnet3/vmxnet3_int.h
@@ -69,10 +69,10 @@
 /*
  * Version numbers
  */
-#define VMXNET3_DRIVER_VERSION_STRING   "1.4.12.0-k"
+#define VMXNET3_DRIVER_VERSION_STRING   "1.4.13.0-k"
 
 /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */
-#define VMXNET3_DRIVER_VERSION_NUM  0x01040c00
+#define VMXNET3_DRIVER_VERSION_NUM  0x01040d00
 
 #if defined(CONFIG_PCI_MSI)
/* RSS only makes sense if MSI-X is supported. */
@@ -343,7 +343,6 @@ struct vmxnet3_adapter {
u8  version;
 
boolrxcsum;
-   boollro;
 
 #ifdef VMXNET3_RSS
struct UPT1_RSSConf *rss_conf;
-- 
2.11.0

[PATCH] iio/adc/nau7802: Improve unlocking of a mutex in nau7802_read_raw()

2018-03-13 Thread SF Markus Elfring

From: Markus Elfring 
Date: Tue, 13 Mar 2018 20:52:26 +0100

* Add a jump target so that a call of the function "mutex_unlock" is stored
  only once in this function implementation.

* Replace two calls by goto statements.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 drivers/iio/adc/nau7802.c | 15 +--
 1 file changed, 5 insertions(+), 10 deletions(-)

diff --git a/drivers/iio/adc/nau7802.c b/drivers/iio/adc/nau7802.c
index 8997e74a8847..68d06a492760 100644
--- a/drivers/iio/adc/nau7802.c
+++ b/drivers/iio/adc/nau7802.c
@@ -303,10 +303,8 @@ static int nau7802_read_raw(struct iio_dev *indio_dev,
 *   - Channel 2 is value 1 in the CHS register
 */
ret = i2c_smbus_read_byte_data(st->client, NAU7802_REG_CTRL2);
-   if (ret < 0) {
-   mutex_unlock(>lock);
-   return ret;
-   }
+   if (ret < 0)
+   goto unlock;
 
if (((ret & NAU7802_CTRL2_CHS_BIT) && !chan->channel) ||
(!(ret & NAU7802_CTRL2_CHS_BIT) &&
@@ -316,18 +314,15 @@ static int nau7802_read_raw(struct iio_dev *indio_dev,
NAU7802_REG_CTRL2,
NAU7802_CTRL2_CHS(chan->channel) |
NAU7802_CTRL2_CRS(st->sample_rate));
-
-   if (ret < 0) {
-   mutex_unlock(>lock);
-   return ret;
-   }
+   if (ret < 0)
+   goto unlock;
}
 
if (st->client->irq)
ret = nau7802_read_irq(indio_dev, chan, val);
else
ret = nau7802_read_poll(indio_dev, chan, val);
-
+unlock:
mutex_unlock(>lock);
return ret;
 
-- 
2.16.2

Re: [PATCH 2/2] hwmon: (ucd9000) Add debugfs attributes to provide mfr_status

2018-03-13 Thread Eddie James




On 03/13/2018 02:29 PM, Guenter Roeck wrote:

On Tue, Mar 13, 2018 at 02:01:51PM -0500, Eddie James wrote:


On 03/10/2018 10:50 AM, Guenter Roeck wrote:

On 03/09/2018 11:19 AM, Eddie James wrote:

From: Christopher Bostic 

Expose the gpiN_fault fields of mfr_status as individual debugfs
attributes. This provides a way for users to be easily notified of gpi
faults. Also provide the whole mfr_status register in debugfs.

Signed-off-by: Christopher Bostic 
Signed-off-by: Andrew Jeffery 
Signed-off-by: Eddie James 
---
   drivers/hwmon/pmbus/ucd9000.c | 172
+-
   1 file changed, 171 insertions(+), 1 deletion(-)

diff --git a/drivers/hwmon/pmbus/ucd9000.c
b/drivers/hwmon/pmbus/ucd9000.c
index e3a507f..297da0e 100644
--- a/drivers/hwmon/pmbus/ucd9000.c
+++ b/drivers/hwmon/pmbus/ucd9000.c
@@ -19,6 +19,7 @@
    * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA.
    */
   +#include 
   #include 
   #include 
   #include 
@@ -36,6 +37,7 @@
   #define UCD9000_NUM_PAGES    0xd6
   #define UCD9000_FAN_CONFIG_INDEX    0xe7
   #define UCD9000_FAN_CONFIG    0xe8
+#define UCD9000_MFR_STATUS    0xf3
   #define UCD9000_GPIO_SELECT    0xfa
   #define UCD9000_GPIO_CONFIG    0xfb
   #define UCD9000_DEVICE_ID    0xfd
@@ -63,13 +65,22 @@
   #define UCD901XX_NUM_GPIOS    26
   #define UCD90910_NUM_GPIOS    26
   +#define UCD9000_DEBUGFS_NAME_LEN    24
+#define UCD9000_GPI_COUNT    8
+
   struct ucd9000_data {
   u8 fan_data[UCD9000_NUM_FAN][I2C_SMBUS_BLOCK_MAX];
   struct pmbus_driver_info info;
   struct gpio_chip gpio;
+    struct dentry *debugfs;
   };
   #define to_ucd9000_data(_info) container_of(_info, struct
ucd9000_data, info)
   +struct ucd9000_debugfs_entry {
+    struct i2c_client *client;
+    u8 index;
+};
+
   static int ucd9000_get_fan_config(struct i2c_client *client, int fan)
   {
   int fan_config = 0;
@@ -328,6 +339,156 @@ static int ucd9000_gpio_direction_output(struct
gpio_chip *gc,
     val);
   }
   +#if IS_ENABLED(CONFIG_DEBUG_FS)
+static int ucd9000_get_mfr_status(struct i2c_client *client, u8
*buffer)
+{
+    int ret = pmbus_set_page(client, 0);
+
+    if (ret < 0)
+    return ret;
+
+    /*
+ * With the ucd90120 and ucd90124 devices, this command
[MFR_STATUS]
+ * is 2 bytes long (bits 0-15).  With the ucd90240 this command is
5
+ * bytes long.  With all other devices, it is 4 bytes long.
+ */
+    return i2c_smbus_read_block_data(client, UCD9000_MFR_STATUS,
buffer);
+}
+
+static int ucd9000_debugfs_show_mfr_status_bit(void *data, u64 *val)
+{
+    struct ucd9000_debugfs_entry *entry = data;
+    struct i2c_client *client = entry->client;
+    u8 buffer[4];
+    int ret;
+
+    /*
+ * This attribute is only created for devices that return 4 bytes
for
+ * status_mfr, so it's safe to call with 4-byte buffer.
+ */
+    ret = ucd9000_get_mfr_status(client, buffer);
+    if (ret < 0) {
+    dev_err(>dev, "Failed to read mfr status. rc:%d\n",
+    ret);
+
+    return ret;
+    }
+
+    /*
+ * Attribute only created for devices with gpi fault bits at bits
+ * 16-23, which is the second byte of the response.
+ */
+    *val = !!(buffer[1] & BIT(entry->index));
+
+    return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_bit,
+ ucd9000_debugfs_show_mfr_status_bit, NULL, "%1lld\n");
+
+static int ucd9000_debugfs_show_mfr_status_word2(void *data, u64 *val)
+{
+    struct i2c_client *client = data;
+    __be16 buffer;
+    int ret;
+
+    ret = ucd9000_get_mfr_status(client, (u8 *));
+    if (ret < 0) {
+    dev_err(>dev, "Failed to read mfr status. rc:%d\n",
+    ret);
+
+    return ret;
+    }
+
+    *val = be16_to_cpu(buffer);
+
+    return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word2,
+ ucd9000_debugfs_show_mfr_status_word2, NULL,
+ "%04llx\n");
+
+static int ucd9000_debugfs_show_mfr_status_word4(void *data, u64 *val)
+{
+    struct i2c_client *client = data;
+    __be32 buffer;
+    int ret;
+
+    ret = ucd9000_get_mfr_status(client, (u8 *));
+    if (ret < 0) {
+    dev_err(>dev, "Failed to read mfr status. rc:%d\n",
+    ret);
+
+    return ret;
+    }
+
+    *val = be32_to_cpu(buffer);
+
+    return 0;
+}
+DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word4,
+ ucd9000_debugfs_show_mfr_status_word4, NULL,
+ "%08llx\n");
+
+static int ucd9000_init_debugfs(struct i2c_client *client,
+    const struct i2c_device_id *mid,
+    struct ucd9000_data *data)
+{
+    struct dentry *debugfs;
+    struct ucd9000_debugfs_entry *entries;
+    int i;
+    char name[UCD9000_DEBUGFS_NAME_LEN];
+
+    debugfs = pmbus_get_debugfs_dir(client);
+    if (!debugfs)
+    return -ENOENT;
+
+

Re: [tip:locking/core 9/11] include/asm-generic/atomic-instrumented.h:288:24: sparse: cast truncates bits from constant value (100 becomes 0)

2018-03-13 Thread Luc Van Oostenryck

On Tue, Mar 13, 2018 at 07:08:06PM +0100, Peter Zijlstra wrote:
> On Tue, Mar 13, 2018 at 07:00:17PM +0100, Luc Van Oostenryck wrote:
> > The issue here is that sparse has a whole class of warnings that are
> > given very early (here at expansion of constant expressions), before
> > eliminating code from branches that are never taken (which, surprise,
> > need itself to have constant expressions already expanded).
> > 
> > It's often annoying like the case here.
> > OTOH, I don't think it's always a bad thing. Sometimes we want to
> > have warnings even from code we know will not be executed (in this
> > config but maybe it will in another one).
> 
> Is that really a valid concern with all the automated randconfig
> building going on today?

I don't think so, for the kernel at least. For other uses it may.
But don't take me wrongly: I don't want to defend those warnings here,
I just want to say that the situation is not totally black & white.


One easy-short-term solution that wouldn't make things ugly would be
to use a mask instead of a cast:

 static __always_inline unsigned long
 cmpxchg_size(volatile void *ptr, unsigned long old, unsigned long new, 
int size)
 {
switch (size) {
case 1:
-   return arch_cmpxchg((u8 *)ptr, (u8)old, (u8)new);
+   return arch_cmpxchg((u8 *)ptr, old & 0xff, new & 0xff);
case 2:
-   return arch_cmpxchg((u16 *)ptr, (u16)old, (u16)new);
+   return arch_cmpxchg((u16 *)ptr, old & 0x, new & 
0x);


-- Luc

Re: [PATCH RFC 1/8] powerpc: Add barrier_nospec

2018-03-13 Thread Peter Zijlstra

On Tue, Mar 13, 2018 at 07:32:59PM +0100, Michal Suchanek wrote:
> Copypasta from original gmb() and rfi implementation

Actual real changelogs would be very welcome. Seeing as I've not seen
these mythical RFI patches, this leaves one quite puzzled :-)

[PATCH] media: staging/imx: fill vb2_v4l2_buffer sequence entry

2018-03-13 Thread Peter Seiderer

Signed-off-by: Peter Seiderer 
---
 drivers/staging/media/imx/imx-media-csi.c | 5 +
 1 file changed, 5 insertions(+)

diff --git a/drivers/staging/media/imx/imx-media-csi.c 
b/drivers/staging/media/imx/imx-media-csi.c
index 5a195f80a24d..3a6a645b9dce 100644
--- a/drivers/staging/media/imx/imx-media-csi.c
+++ b/drivers/staging/media/imx/imx-media-csi.c
@@ -111,6 +111,7 @@ struct csi_priv {
struct v4l2_ctrl_handler ctrl_hdlr;
 
int stream_count; /* streaming counter */
+   __u32 frame_sequence; /* frame sequence counter */
bool last_eof;   /* waiting for last EOF at stream off */
bool nfb4eof;/* NFB4EOF encountered during streaming */
struct completion last_eof_comp;
@@ -234,8 +235,11 @@ static void csi_vb2_buf_done(struct csi_priv *priv)
struct vb2_buffer *vb;
dma_addr_t phys;
 
+   priv->frame_sequence++;
+
done = priv->active_vb2_buf[priv->ipu_buf_num];
if (done) {
+   done->vbuf.sequence = priv->frame_sequence;
vb = >vbuf.vb2_buf;
vb->timestamp = ktime_get_ns();
vb2_buffer_done(vb, priv->nfb4eof ?
@@ -543,6 +547,7 @@ static int csi_idmac_start(struct csi_priv *priv)
 
/* init EOF completion waitq */
init_completion(>last_eof_comp);
+   priv->frame_sequence = 0;
priv->last_eof = false;
priv->nfb4eof = false;
 
-- 
2.16.2

Re: [PATCH] net: dsa: drop some VLAs in switch.c

2018-03-13 Thread Florian Fainelli

On 03/13/2018 12:58 PM, Vivien Didelot wrote:
> Hi Salvatore,
> 
> Salvatore Mesoraca  writes:
> 
>> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid
>> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports.
>>
>> [1] https://lkml.org/lkml/2018/3/7/621
>>
>> Signed-off-by: Salvatore Mesoraca 
> 
> NAK.
> 
> We are in the process to remove hardcoded limits such as DSA_MAX_PORTS
> and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports.

Then this means that we need to allocate a bitmap from the heap, which
sounds a bit superfluous and could theoretically fail... not sure which
way is better, but bumping the size to DSA_MAX_PORTS definitively does
help people working on enabling -Wvla.
-- 
Florian

Re: [PATCH] net: dsa: drop some VLAs in switch.c

2018-03-13 Thread Vivien Didelot

Hi Salvatore,

Salvatore Mesoraca  writes:

> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid
> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports.
>
> [1] https://lkml.org/lkml/2018/3/7/621
>
> Signed-off-by: Salvatore Mesoraca 

NAK.

We are in the process to remove hardcoded limits such as DSA_MAX_PORTS
and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports.

Thanks,

Vivien

Re: [RESEND PATCH] rsi: Remove stack VLA usage

2018-03-13 Thread Tobin C. Harding

On Sun, Mar 11, 2018 at 09:06:10PM -0500, Larry Finger wrote:
> On 03/11/2018 08:43 PM, Tobin C. Harding wrote:
> >The kernel would like to have all stack VLA usage removed[1].  rsi uses
> >a VLA based on 'blksize'.  Elsewhere in the SDIO code maximum block size
> >is defined using a magic number.  We can use a pre-processor defined
> >constant and declare the array to maximum size.  We add a check before
> >accessing the array in case of programmer error.
> >
> >[1]: https://lkml.org/lkml/2018/3/7/621
> >
> >Signed-off-by: Tobin C. Harding 
> >---
> >
> >RESEND: add wireless mailing list to CC's (requested by Kalle)
> >
> >  drivers/net/wireless/rsi/rsi_91x_hal.c  | 13 +++--
> >  drivers/net/wireless/rsi/rsi_91x_sdio.c |  9 +++--
> >  2 files changed, 14 insertions(+), 8 deletions(-)
> >
> >diff --git a/drivers/net/wireless/rsi/rsi_91x_hal.c 
> >b/drivers/net/wireless/rsi/rsi_91x_hal.c
> >index 1176de646942..839ebdd602df 100644
> >--- a/drivers/net/wireless/rsi/rsi_91x_hal.c
> >+++ b/drivers/net/wireless/rsi/rsi_91x_hal.c
> >@@ -641,7 +641,7 @@ static int ping_pong_write(struct rsi_hw *adapter, u8 
> >cmd, u8 *addr, u32 size)
> > u32 cmd_addr;
> > u16 cmd_resp, cmd_req;
> > u8 *str;
> >-int status;
> >+int status, ret;
> > if (cmd == PING_WRITE) {
> > cmd_addr = PING_BUFFER_ADDRESS;
> >@@ -655,12 +655,13 @@ static int ping_pong_write(struct rsi_hw *adapter, u8 
> >cmd, u8 *addr, u32 size)
> > str = "PONG_VALID";
> > }
> >-status = hif_ops->load_data_master_write(adapter, cmd_addr, size,
> >+ret = hif_ops->load_data_master_write(adapter, cmd_addr, size,
> > block_size, addr);
> >-if (status) {
> >-rsi_dbg(ERR_ZONE, "%s: Unable to write blk at addr %0x\n",
> >-__func__, *addr);
> >-return status;
> >+if (ret) {
> >+if (ret != -EINVAL)
> >+rsi_dbg(ERR_ZONE, "%s: Unable to write blk at addr 
> >%0x\n",
> >+__func__, *addr);
> >+return ret;
> > }
> > status = bl_cmd(adapter, cmd_req, cmd_resp, str);
> >diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c 
> >b/drivers/net/wireless/rsi/rsi_91x_sdio.c
> >index b0cf41195051..b766578b591a 100644
> >--- a/drivers/net/wireless/rsi/rsi_91x_sdio.c
> >+++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c
> >@@ -20,6 +20,8 @@
> >  #include "rsi_common.h"
> >  #include "rsi_hal.h"
> >+#define RSI_MAX_BLOCK_SIZE 256
> >+
> >  /**
> >   * rsi_sdio_set_cmd52_arg() - This function prepares cmd 52 read/write arg.
> >   * @rw: Read/write
> >@@ -362,7 +364,7 @@ static int rsi_setblocklength(struct rsi_hw *adapter, 
> >u32 length)
> > rsi_dbg(INIT_ZONE, "%s: Setting the block length\n", __func__);
> > status = sdio_set_block_size(dev->pfunction, length);
> >-dev->pfunction->max_blksize = 256;
> >+dev->pfunction->max_blksize = RSI_MAX_BLOCK_SIZE;
> > adapter->block_size = dev->pfunction->max_blksize;
> > rsi_dbg(INFO_ZONE,
> >@@ -567,9 +569,12 @@ static int rsi_sdio_load_data_master_write(struct 
> >rsi_hw *adapter,
> >  {
> > u32 num_blocks, offset, i;
> > u16 msb_address, lsb_address;
> >-u8 temp_buf[block_size];
> >+u8 temp_buf[RSI_MAX_BLOCK_SIZE];
> > int status;
> >+if (block_size > RSI_MAX_BLOCK_SIZE)
> >+return -EINVAL;
> >+
> > num_blocks = instructions_sz / block_size;
> > msb_address = base_address >> 16;
> 
> I am not giving this patch a negative review, but my solution to the same
> problem has been to change the on-stack array into a u8 pointer, use
> kmalloc() to assign the space, and then free that space at the end. That way
> large stack allocations are avoided, with a minimum of changes.

Your idea is better Larry, have you got a patch done already or do you
want me to knock one up?

thanks,
Tobin.

Re: perf-core build fails on powerpc

2018-03-13 Thread Sukadev Bhattiprolu

John Garry [john.ga...@huawei.com] wrote:
> On 13/03/2018 19:17, Sukadev Bhattiprolu wrote:
> > 
> > 
> > Building perf on Powerpc seems broken when using Arnaldo's perf/core branch
> > with HEAD as:
> > 
> > 1b442ed ("perf test: Fix exit code for record+probe_libc_inet_pton.sh")
> > 
> > It maybe related to this commit:
> > 
> > commit d596299
> > Author: John Garry 
> > Date:   Thu Mar 8 18:58:29 2018 +0800
> > 
> > perf vendor events: Add support for pmu events vendor subdirectory
> > 
> > Reverting this hunk from tools/perf/pmu-events/jevents.c, seems to fix the
> > problem for me.
> > 

Hi John,

I have an xfs file system which seems to have d_type == DT_UNKNOWN for all
entries in 'tools/perf/pmu-events/arch/power8'! readdir(3) says ->d_type
may not be supported by all file systems.

Not relying on ->d_type seems to fix it:

@@ -873,26 +879,26 @@ static int is_leaf_dir(const char *fpath)
return 0;
 
while ((dir = readdir(d)) != NULL) {
-   if (dir->d_type == DT_DIR && dir->d_name[0] != '.') {
-   res = 0;
-   break;
-   } else if (dir->d_type == DT_UNKNOWN) {
-   char path[PATH_MAX];
-   struct stat st;
+   char path[PATH_MAX];
+   struct stat st;
 
-   sprintf(path, "%s/%s", fpath, dir->d_name);
-   if (stat(path, ))
-   break;
+   if (strcmp(dir->d_name, ".") == 0 ||
+   strcmp(dir->d_name, "..") == 0)
+   continue;
 
-   if (S_ISDIR(st.st_mode)) {
-   res = 0;
-   break;
-   }
+   sprintf(path, "%s/%s", fpath, dir->d_name);
+   if (stat(path, ))
+   break;
+
+   if (S_ISDIR(st.st_mode)) {
+   res = 0;
+   break;

Re: linux-next: Tree for Mar 13

2018-03-13 Thread Stephen Rothwell

Hi Mike,

On Tue, 13 Mar 2018 11:57:40 -0400 Mike Snitzer  wrote:
>
> I had to revert the following commits to get this kernel to build
> (otherwise I got macro expansion errors, using RHEL7
> gcc-4.8.5-28.el7.x86_64):

Yeah, I reported that earlier (216 of my 244 builds failed - gcc 4.6.3)

> beb7eb2 kernelh-skip-single-eval-logic-on-literals-in-min-max-v3
> 19ff7e5 kernelh-skip-single-eval-logic-on-literals-in-min-max-v2
> c7c133f kernel.h: skip single-eval logic on literals in min()/max()
> 
> (it wasn't until I reverted commit c7c133f that the kernel build worked)

I will revert those from linux-next today unless someone sends a fix.

-- 
Cheers,
Stephen Rothwell


pgpwRHWBWLjIO.pgp
Description: OpenPGP digital signature

Re: [Intel-gfx] [PATCH] [v2] drm/i915/pmu: avoid -Wmaybe-uninitialized warning

2018-03-13 Thread Arnd Bergmann

On Tue, Mar 13, 2018 at 6:46 PM, Tvrtko Ursulin
 wrote:
>
> On 13/03/2018 16:19, Arnd Bergmann wrote:
>>
>> The conditional spinlock confuses gcc into thinking the 'flags' value
>> might contain uninitialized data:
>>
>> drivers/gpu/drm/i915/i915_pmu.c: In function '__i915_pmu_event_read':
>> arch/x86/include/asm/paravirt_types.h:573:3: error: 'flags' may be used
>> uninitialized in this function [-Werror=maybe-uninitialized]
>
>
> Hm, how does paravirt_types.h comes into the picture?

spin_unlock_irqrestore() calls arch_local_irq_restore()

>> The code is correct, but it's easy to see how the compiler gets confused
>> here. This avoids the problem by pulling the lock outside of the function
>> into its only caller.
>
>
> Is it specific gcc version, specific options, or specific kernel config that
> this happens?

Not gcc version specific (same result with gcc-4.9 through 8, didn't test
earlier versions that are currently broken).

> Strange that it hasn't been seen so far.

It seems to be a relatively rare 'randconfig' combination. Looking at
the preprocessed sources, I find:

static u64 get_rc6(struct drm_i915_private *i915, bool locked)
{

 unsigned long flags;
 u64 val;

 if (intel_runtime_pm_get_if_in_use(i915)) {
  val = __get_rc6(i915);
  intel_runtime_pm_put(i915);
  if (!locked)
   do { do { ({ unsigned long __dummy; typeof(flags) __dummy2;
(void)(&__dummy == &__dummy2); 1; }); do { do { do { ({ unsigned long
__dummy; typeof(flags) __dummy2; (void)(&__dummy == &__dummy2); 1; });
flags = arch_local_irq_save(); } while (0); trace_hardirqs_off(); }
while (0); do { __asm__ __volatile__("": : :"memory"); do { (void)0;
(void)(spinlock_check(>pmu.lock)); } while (0); } while (0); }
while (0); } while (0); } while (0);

  if (val >= i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur) {
   i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = 0;
   i915->pmu.sample[__I915_SAMPLE_RC6].cur = val;
  } else {
   val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur;
  }
  if (!locked)
   spin_unlock_irqrestore(>pmu.lock, flags);
 } else {
  struct pci_dev *pdev = i915->drm.pdev;
  struct device *kdev = >dev;
  unsigned long flags2;
# 455 "/git/arm-soc/drivers/gpu/drm/i915/i915_pmu.c"
  if (!locked)
   do { do { ({ unsigned long __dummy; typeof(flags) __dummy2;
(void)(&__dummy == &__dummy2); 1; }); do { do { do { ({ unsigned long
__dummy; typeof(flags) __dummy2; (void)(&__dummy == &__dummy2); 1; });
flags = arch_local_irq_save(); } while (0); trace_hardirqs_off(); }
while (0); do { __asm__ __volatile__("": : :"memory"); do { (void)0;
(void)(spinlock_check(>pmu.lock)); } while (0); } while (0); }
while (0); } while (0); } while (0);

  do { do { ({ unsigned long __dummy; typeof(flags2) __dummy2;
(void)(&__dummy == &__dummy2); 1; }); do { do { do { ({ unsigned long
__dummy; typeof(flags2) __dummy2; (void)(&__dummy == &__dummy2); 1;
}); flags2 = arch_local_irq_save(); } while (0); trace_hardirqs_off();
} while (0); do { __asm__ __volatile__("": : :"memory"); do { (void)0;
(void)(spinlock_check(>power.lock)); } while (0); } while (0); }
while (0); } while (0); } while (0);

  if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur)
   i915->pmu.suspended_jiffies_last =
  kdev->power.suspended_jiffies;

  val = kdev->power.suspended_jiffies -
i915->pmu.suspended_jiffies_last;
  val += jiffies - kdev->power.accounting_timestamp;

  spin_unlock_irqrestore(>power.lock, flags2);

  val = jiffies_to_nsecs(val);
  val += i915->pmu.sample[__I915_SAMPLE_RC6].cur;
  i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = val;

  if (!locked)
   spin_unlock_irqrestore(>pmu.lock, flags);
 }
  return val;
}

so it seems that the spin_lock_irqsave() is completely inlined through
a macro while the unlock is not, and the lock contains a memory barrier
(among other things) that might tell the compiler that the state of the
'locked' flag could changed underneath it.

It could also be the problem that arch_local_irq_restore() uses
__builtin_expect() in  PVOP_TEST_NULL(op) when
CONFIG_PARAVIRT_DEBUG is enabled, see

static inline __attribute__((unused))
__attribute__((no_instrument_function))
__attribute__((no_instrument_function)) void
arch_local_irq_restore(unsigned long f)
{
 ({ unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;; do {
if (__builtin_expect(!!(pv_irq_ops.restore_fl.func == ((void *)0)),
0)) do { do { asm volatile("1:\t" ".byte 0x0f, 0x0b" "\n"
".pushsection __bug_table,\"aw\"\n" "2:\t" ".long " "1b" "\t#
bug_entry::bug_addr\n" "\t" ".long " "%c0" "\t# bug_entry::file\n"
"\t.word %c1" "\t# bug_entry::line\n" "\t.word %c2" "\t#
bug_entry::flags\n" "\t.org 2b+%c3\n" ".popsection" : : "i"
("/git/arm-soc/arch/x86/include/asm/paravirt.h"), "i" (783), "i" (0),
"i" (sizeof(struct bug_entry))); } while (0); do { ; asm volatile("");
__builtin_unreachable(); } while (0); } while (0); } while (0); asm
volatile("" "771:\n\t" "999:\n\t" ".pushsection
.discard.retpoline_safe\n\t" " "

Re: [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context

2018-03-13 Thread Peter Zijlstra

On Tue, Mar 13, 2018 at 07:42:41PM +0100, Sebastian Andrzej Siewior wrote:
> +static void blk_queue_usage_counter_release_swork(struct swork_event *sev)
> +{
> + struct request_queue *q =
> + container_of(sev, struct request_queue, mq_pcpu_wake);
> +
> + wake_up_all(>mq_freeze_wq);
> +}
> +
>  static void blk_queue_usage_counter_release(struct percpu_ref *ref)
>  {
>   struct request_queue *q =
>   container_of(ref, struct request_queue, q_usage_counter);
>  
> - wake_up_all(>mq_freeze_wq);
> + swork_queue(>mq_pcpu_wake);
>  }

Depending on if we expect there to actually be wakeups, you could do
something like:

if (wq_has_sleepers(>mq_freeze_wq))
swork_queue(>mq_pcpu_wake));

avoiding the whole workqueue thing in the case there wasn't anybody
waiting for it. But since I don't know this code, I can't say if it
makes sense or not. Tejun?

Re: [v5 1/2] mm: disable interrupts while initializing deferred pages

2018-03-13 Thread Andrew Morton

On Tue, 13 Mar 2018 15:45:46 -0400 Pavel Tatashin  
wrote:

> > > 
> > > We must remove cond_resched() because we can't sleep anymore. They were
> > > added to fight NMI timeouts, so I will replace them with
> > > touch_nmi_watchdog() in a follow-up fix.
> > 
> > This makes no sense.  Any code section where we can add cond_resched()
> > was never subject to NMI timeouts because that code cannot be running with
> > disabled interrupts.
> > 
> 
> Hi Andrew,
> 
> I was talking about this patch:
> 
> 9b6e63cbf85b89b2dbffa4955dbf2df8250e5375
> mm, page_alloc: add scheduling point to memmap_init_zone
> 
> Which adds cond_resched() to memmap_init_zone() to avoid NMI timeouts.
> 
> memmap_init_zone() is used both, early in boot, when non-deferred struct
> pages are initialized, but also may be used later, during memory hotplug.
> 
> As I understand, the later case could cause the timeout on non-preemptible
> kernels.
> 
> My understanding, is that the same logic was used here when cond_resched()s
> were added.
> 
> Please correct me if I am wrong.

Yes, the message is a bit confusing and the terminology is perhaps
vague.  And it's been a while since I played with this stuff, so from
(dated) memory:

Soft lockup: kernel has run for too long without rescheduling
Hard lockup: kernel has run for too long with interrupts disabled

Both of these are detected by the NMI watchdog handler.

9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
point.  Replacing that with touch_nmi_watchdog() won't work (I think). 
Presumably calling touch_softlockup_watchdog() will "work", in that it
suppresses the warning.  But it won't fix the thing which the warning
is actually warning about: starvation of the CPU scheduler.  That's
what the cond_resched() does.

I'm not sure what to suggest, really.  Your changelog isn't the best:
"Vlastimil Babka reported about a window issue during which when
deferred pages are initialized, and the current version of on-demand
initialization is finished, allocations may fail".  Well...  where is
ths mysterious window?  Without such detail it's hard for others to
suggest alternative approaches.

Re: [PATCH v2] usb: musb: Fix external abort in musb_remove on omap2430

2018-03-13 Thread Bin Liu

Hi,

On Mon, Mar 12, 2018 at 11:30:39PM +0100, Merlijn Wajer wrote:
> Hi Bin,
> 
> On 09/03/18 15:11, Bin Liu wrote:
> > Hi,
> > 
> > On Thu, Mar 08, 2018 at 11:19:48PM +0100, Merlijn Wajer wrote:
> >> This fixes an oops on unbind / module unload (on the musb omap2430
> >> platform).
> >>
> >> musb_remove function now calls musb_platform_exit before disabling
> >> runtime pm.
> >>
> >> Signed-off-by: Merlijn Wajer 
> > 
> > Applied. Thanks.
> 
> Thank you. Is there any chance that this patch and the vbus patch could
> end up in the 4.16-rc series? I was waiting for another rc to see if the
> vbus patch would land, but it didn't land with 4.16-rc5.

Those two patches are already in the upstream tree, hopefully they will
be in the v4.16-rc.

Regards,
-Bin.

Re: [RESEND] rsi: Remove stack VLA usage

2018-03-13 Thread tcharding

On Mon, Mar 12, 2018 at 09:46:06AM +, Kalle Valo wrote:
> tcharding  wrote:
> 
> > The kernel would like to have all stack VLA usage removed[1].  rsi uses
> > a VLA based on 'blksize'.  Elsewhere in the SDIO code maximum block size
> > is defined using a magic number.  We can use a pre-processor defined
> > constant and declare the array to maximum size.  We add a check before
> > accessing the array in case of programmer error.
> > 
> > [1]: https://lkml.org/lkml/2018/3/7/621
> > 
> > Signed-off-by: Tobin C. Harding 
> 
> Tobin, your name in patchwork.kernel.org is just "tcharding" then it should be
> "Tobin C. Harding". Patchwork is braindead in a way as it takes the name from
> it's database instead of the From header of the patch in question.
> 
> I can fix that manually but it would be helpful if you could register to
> patchwork and fix your name during registration. You have only one chance to
> fix your name (another braindead feature!) so be careful :)

Hi Kalle,

I logged into my patchwork account but I don't see any way to set the
name.  Within 'profile' there is only 'change password' and 'link
email'.  I thought I could unregister then re-register but I can't see
how to do that either.  Is there a maintainer of patchwork.kernel.org
who I can email to manually remove me from the system?

thanks,
Tobin.

[PATCH] crypto: arm,arm64 - Fix random regeneration of S_shipped

2018-03-13 Thread Leonard Crestez

The decision to rebuild .S_shipped is made based on the relative
timestamps of .S_shipped and .pl files but git makes this essentially
random. This means that the perl script might run anyway (usually at
most once per checkout), defeating the whole purpose of _shipped.

Fix by skipping the rule unless explicit make variables are provided:
REGENERATE_ARM_CRYPTO or REGENERATE_ARM64_CRYPTO.

This can produce nasty occasional build failures downstream, for example
for toolchains with broken perl. The solution is minimally intrusive to
make it easier to push into stable.

Another report on a similar issue here: https://lkml.org/lkml/2018/3/8/1379

Signed-off-by: Leonard Crestez 
Cc: 
---
 arch/arm/crypto/Makefile   | 2 ++
 arch/arm64/crypto/Makefile | 2 ++
 2 files changed, 4 insertions(+)

Not clear if this needs to go through crypto or arm but all commits in these
directories start with "crypto:".

My problems were only on arm64 because of a yocto toolchain which ships a 
version
of perl which fails on "use integer;".

CC stable because this can cause trouble for downstream packagers.

diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile
index 30ef8e2..c9919c2 100644
--- a/arch/arm/crypto/Makefile
+++ b/arch/arm/crypto/Makefile
@@ -47,20 +47,22 @@ sha256-arm-y:= sha256-core.o sha256_glue.o 
$(sha256-arm-neon-y)
 sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o
 sha512-arm-y   := sha512-core.o sha512-glue.o $(sha512-arm-neon-y)
 sha1-arm-ce-y  := sha1-ce-core.o sha1-ce-glue.o
 sha2-arm-ce-y  := sha2-ce-core.o sha2-ce-glue.o
 aes-arm-ce-y   := aes-ce-core.o aes-ce-glue.o
 ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o
 crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o
 crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o
 chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o
 
+ifdef REGENERATE_ARM_CRYPTO
 quiet_cmd_perl = PERL$@
   cmd_perl = $(PERL) $(<) > $(@)
 
 $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl
$(call cmd,perl)
 
 $(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl
$(call cmd,perl)
+endif
 
 .PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S
diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile
index cee9b8d9..dfe651b 100644
--- a/arch/arm64/crypto/Makefile
+++ b/arch/arm64/crypto/Makefile
@@ -60,20 +60,22 @@ obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o
 aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o
 
 AFLAGS_aes-ce.o:= -DINTERLEAVE=4
 AFLAGS_aes-neon.o  := -DINTERLEAVE=4
 
 CFLAGS_aes-glue-ce.o   := -DUSE_V8_CRYPTO_EXTENSIONS
 
 $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE
$(call if_changed_rule,cc_o_c)
 
+ifdef REGENERATE_ARM64_CRYPTO
 quiet_cmd_perlasm = PERLASM $@
   cmd_perlasm = $(PERL) $(<) void $(@)
 
 $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl
$(call cmd,perlasm)
 
 $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl
$(call cmd,perlasm)
+endif
 
 .PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S
-- 
2.7.4

Re: [3/3] irqchip/gic-v3: Bounds check redistributor accesses

2018-03-13 Thread Marc Zyngier

On Tue, 13 Mar 2018 18:49:44 +,
Nishanth Menon wrote:
> 
> Marc,
> 
> On Tue, Mar 13, 2018 at 9:21 AM, Marc Zyngier  wrote:
> > Hi Lokesh,
> >
> > On 13/03/18 13:38, Lokesh Vutla wrote:
> >> Hi All,
> >>
> >> On Wednesday 11 October 2017 03:11 PM, Punit Agrawal wrote:
> >>> The kernel crashes while iterating over a redistributor that is
> >>> in-correctly sized by the platform firmware or doesn't contain the last
> >>> record.
> >>>
> >>> Prevent the crash by checking accesses against the size of the region
> >>> provided by the firmware. While we are at it, warn the user about
> >>> incorrect region size.
> >>>
> >>> Signed-off-by: Punit Agrawal 
> >>> Cc: Marc Zyngier 
> >>
> >> Sorry to bring up an old thread. Just wanted to check what is the status
> >> on this series.
> >
> > So far, I wasn't inclined to merge it, as it only allowed you to detect
> > a broken system, as opposed to help with a working one.
> 
> Is'nt that a good reason to have it? Why not help an error in dtb with
> a debug helper than an obtuse crash to debug painfully?

The kernel is not in the business of validating DTBs. And we both know
that if we start papering over firmware bugs, these bugs become ABI,
and we live with them forever. If you want to validate DTBs, you
should use a tool that is actually validating it against your HW, and
not use the kernel as the validation tool.

> 
> >
> >> This will also be useful when we try to boot linux + hypervisor with
> >> less number of cores than the SoC supports. For example:
> >> - SoC has 4 cores and Linux tries to boot with 2 cores.
> >> - then a type-2 hypervisor gets installed.
> >> - Hypervisor tries to boot a VM with linux on core 1.
> >>
> >> Now the VM boot will fail while it iterates over all the GICR regions
> >> till GICR_TYPER is found. Hypervisor will trap any accesses to GICR
> >> regions of any invalid cpus(cpu 2, cpu 3 in this case).
> >
> > It you're passing the redistributors to a guest, you're doing something
> > terribly wrong. You're putting the guest in a position to do a DoS on
> > the hypervisor (disabling its timer interrupt, for example). Not the
> > greatest move. There is a number of other gotchas with this approach
> > (virtual interrupts, distributor virtualization...).
> >
> >> If the $patch is not the right approach, can you suggest on how to
> >> handle the above scenario?
> >
> > The proper way to handle this is to virtualize the distributor and
> > redistributor by trap/emulate. The only thing you can safely pass to a
> > guest is the CPU interface, either as system registers or in its MMIO
> > form (if you have the GICv2 compatibility interface).
> >
> 
> Dumb question: Would'nt a trap emulate be real expensive with hyp
> context in v8 (all the context save/restore we'd have to do? (in the
> context of discussion, GICV2 compat is disabled).

How often do you hit the distributor and redistributors? Why do you
need to context switch a distributor that you emulate (hint: you
don't). I suggest you look at how real life hypervisor works (there's
a pretty good one in the tree).

Thanks,

M.

-- 
Jazz is not dead, it just smell funny.

Re: [PATCH 3/8] gpio: pci-idio-16: Implement get_multiple callback

2018-03-13 Thread William Breathitt Gray

On Tue, Mar 13, 2018 at 06:04:33PM +0200, Andy Shevchenko wrote:
>On Mon, Mar 12, 2018 at 10:49 PM, William Breathitt Gray
> wrote:
>
>> +static int idio_16_gpio_get_multiple(struct gpio_chip *chip,
>> +   unsigned long *mask, unsigned long *bits)
>> +{
>
>
>> +   /* clear bits array to a clean slate */
>> +   for (i = 0; i < chip->ngpio; i += BITS_PER_LONG)
>> +   bits[i / BITS_PER_LONG] = 0;
>
>bitmap_clear() or bitmap_zero()

bitmap_zero() would be perfect for this situation. I'll submit a version
2 of this patchset with this change for the various drivers herein.

>
>> +   /* get bits are evaluated a gpio register size at a time */
>> +   for (i = 0; i < chip->ngpio; i += gpio_reg_size) {
>> +   bit_word_offset = i % BITS_PER_LONG;
>> +   bits_mask = mask[BIT_WORD(i)] & (reg_mask << 
>> bit_word_offset);
>> +   if (!bits_mask) {
>> +   /* no get bits in this register so skip to next one 
>> */
>> +   continue;
>> +   }
>
>for_each_set_bit() ?
>
>> +   /* store acquired bits */
>> +   bits[BIT_WORD(i)] |= port_state << bit_word_offset;
>
>__set_bit()
>__clear_bit()

I'm not sure for_each_set_bit() and __set_bit()/__clear_bit() would be
good for this particular situation since I'm working with the entire
register word (8 bits for this specific device) at a time.

Since I know each register word is 8-bits, I can make 8-bit jumps and
skips rather than evaluating each bit individually. Similarly, it seems
more efficient to store the input 8 bits at a time rather than breaking
it up to individual bits.

However, I do see how all these inline bitwise operations could make it
difficult to follow the code, so perhaps I can break it up across more
lines so that the logic of the loop becomes easier to read.

William Breathitt Gray

>
>We have bitmap API for *ages*. Is it too hard to check?
>
>-- 
>With Best Regards,
>Andy Shevchenko

Re: [2/3] mwifiex: support sysfs initiated device coredump

2018-03-13 Thread Marcel Holtmann

Hi Arend,

> Since commit 3c47d19ff4dc ("drivers: base: add coredump driver ops")
> it is possible to initiate a device coredump from user-space. This
> patch adds support for it adding the .coredump() driver callback.
> As there is no longer a need to initiate it through debugfs remove
> that code.
> 
> Signed-off-by: Arend van Spriel 
 
 Based on the discussion I assume this is ok to take to w-d-next. If that's 
 not
 the case, please let me know ASAP.
>>> 
>>> It is up to the mwifiex maintainers to decide, I guess. The ABI
>>> documentation need to be revised and change the callback to void
>>> return type. I am not sure what the best approach is. 1) apply this
>>> and fix return type later, or 2) fix return type and resubmit this.
>>> What is your opinion?
>> 
>> I guess the callback change will go through Greg's tree? Then I suspect
>> it's easier that you submit the callback change to Greg first and wait
>> for it to trickle down to wireless-drivers-next (after the next merge
>> window) and then I can apply the driver patches. Otherwise there might
>> be a conflict between my and Greg's tree.
> 
> That was my assessment, but unfortunately Marcel already applied the btmrvl 
> patch before I could reply. So how do I move from here? Option 1) revert 
> brmrvl and fix callback return type, or 2) apply mwifiex patch and fix 
> callback return type later for both drivers.

I can take the patch back out of bluetooth-next if needed. It is your call.

Regards

Marcel

Re: [2/3] mwifiex: support sysfs initiated device coredump

2018-03-13 Thread Arend van Spriel


On 3/13/2018 9:19 PM, Marcel Holtmann wrote:

Hi Arend,


Since commit 3c47d19ff4dc ("drivers: base: add coredump driver ops")
it is possible to initiate a device coredump from user-space. This
patch adds support for it adding the .coredump() driver callback.
As there is no longer a need to initiate it through debugfs remove
that code.

Signed-off-by: Arend van Spriel 


Based on the discussion I assume this is ok to take to w-d-next. If that's not
the case, please let me know ASAP.


It is up to the mwifiex maintainers to decide, I guess. The ABI
documentation need to be revised and change the callback to void
return type. I am not sure what the best approach is. 1) apply this
and fix return type later, or 2) fix return type and resubmit this.
What is your opinion?


I guess the callback change will go through Greg's tree? Then I suspect
it's easier that you submit the callback change to Greg first and wait
for it to trickle down to wireless-drivers-next (after the next merge
window) and then I can apply the driver patches. Otherwise there might
be a conflict between my and Greg's tree.


That was my assessment, but unfortunately Marcel already applied the btmrvl 
patch before I could reply. So how do I move from here? Option 1) revert brmrvl 
and fix callback return type, or 2) apply mwifiex patch and fix callback return 
type later for both drivers.


I can take the patch back out of bluetooth-next if needed. It is your call.


Thanks, Marcel

Let's go for that. Please revert/remove the patch.

Regards,
Arend

Re: [PATCH v2 1/5] iommu/amd - Add debugfs support

2018-03-13 Thread Andy Shevchenko

On Tue, Mar 13, 2018 at 8:54 PM, Gary R Hook  wrote:
> On 03/13/2018 12:16 PM, Andy Shevchenko wrote:
>> On Fri, Mar 9, 2018 at 2:50 AM, Gary R Hook  wrote:

>>> +#include 
>>> +#include 
>>> +#include 
>>
>>
>> Keep in order?

> What order would that be? These few needed files are listed in the same
> order as which they appear in amd_iommu.c. I'm gonna need a preference
> spelled out, please (and a rationale, so I may better understand).

To increase readability and avoid potential header duplication (here
is can bus protocol implementation where the problem exists for real,
even in new code!)

>>> +/* DebugFS helpers */
>>> +#defineOBUFP   (obuf + oboff)
>>> +#defineOBUFLEN obuflen
>>> +#defineOBUFSPC (OBUFLEN - oboff)
>>> +#defineOSCNPRINTF(fmt, ...) \
>>> +   scnprintf(OBUFP, OBUFSPC, fmt, ## __VA_ARGS__)
>>
>>
>> I don't see any advantages of this. Other way around, they will simple
>> makes things hard to read an understand in place.
>
>
> I used this technique in the CCP driver code (where it was accepted), in an
> effort to do the opposite of what you claim: make the code more readable.
> Given the 80 column limit, a large number of arguments, and very long
> statements, IMO something needs to give. I don't find the use of #defines to
> be obfuscating.
>
> I'm not trying to argue, but rather simply state the perspective / reasoning
> I used to create a source file I feel is manageable. I have 17 more iommu
> patches built upon this strategy, and this seems to be advantageous for all
> of them.

It's fine to me as long as it's fine to maintainer, but honestly
speaking I would avoid such code as much as possible. Imagine that
your "advantage" basically becomes disadvantage to everyone else who
is not familiar with the code.

Each time I see macro in the code, I would need to at least step on
it, run cscope, read, and come back. And at this point of time I
already forgot what this code is doing, it does use sNprintf() or
sCNprintf() or whatever wrapper on top of either...

>>> +   for (i = start ; i <= end ; i++)

>> Missed {}

> Wasn't sure about the M.O. given that the body of this loop is a single if
> statement. And I don't see anywhere in
> https://www.kernel.org/doc/html/latest/process/coding-style.html
> in section 3.1 where curly braces are called for in this situation. May I
> ask for clarification on the style rule, please?

You can do nothing, though I'm guided by the end of section 3.0
(though it tells only about 'if' case).

>>> @@ -89,6 +89,7 @@
>>>   #define ACPI_DEVFLAG_ATSDIS 0x1000
>>>
>>>   #define LOOP_TIMEOUT   10
>>> +
>>>   /*
>>>* ACPI table definitions
>>>*

>> Doesn't belong to the patch.

> I'm sorry, I don't understand. The added blank line doesn't belong to the
> patch?

Correct.

-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH ghak21 V2 3/4] audit: add refused symlink to audit_names

2018-03-13 Thread Paul Moore

On Tue, Mar 13, 2018 at 6:52 AM, Richard Guy Briggs  wrote:
> On 2018-03-13 11:38, Steve Grubb wrote:
>> On Tue, 13 Mar 2018 06:11:08 -0400
>> Richard Guy Briggs  wrote:
>>
>> > On 2018-03-13 09:35, Steve Grubb wrote:
>> > > On Mon, 12 Mar 2018 11:52:56 -0400
>> > > Richard Guy Briggs  wrote:
>> > >
>> > > > On 2018-03-12 11:53, Paul Moore wrote:
>> > > > > On Mon, Mar 12, 2018 at 11:26 AM, Richard Guy Briggs
>> > > > >  wrote:
>> > > > > > On 2018-03-12 11:12, Paul Moore wrote:
>> > > > > >> On Mon, Mar 12, 2018 at 2:31 AM, Richard Guy Briggs
>> > > > > >>  wrote:
>> > > > > >> > Audit link denied events for symlinks had duplicate PATH
>> > > > > >> > records rather than just updating the existing PATH record.
>> > > > > >> > Update the symlink's PATH record with the current dentry
>> > > > > >> > and inode information.
>> > > > > >> >
>> > > > > >> > See: https://github.com/linux-audit/audit-kernel/issues/21
>> > > > > >> > Signed-off-by: Richard Guy Briggs 
>> > > > > >> > ---
>> > > > > >> >  fs/namei.c | 1 +
>> > > > > >> >  1 file changed, 1 insertion(+)
>> > > > > >>
>> > > > > >> Why didn't you include this in patch 4/4 like I asked during
>> > > > > >> the previous review?
>> > > > > >
>> > > > > > Please see the last comment of:
>> > > > > > https://www.redhat.com/archives/linux-audit/2018-March/msg00070.html
>> > > > >
>> > > > > Yes, I just saw that ... I hadn't seen your replies on the v1
>> > > > > patches until I had finished reviewing v2.  I just replied to
>> > > > > that mail in the v1 thread, but basically you need to figure
>> > > > > out what is necessary here and let us know.  If I have to
>> > > > > figure it out it likely isn't going to get done with enough
>> > > > > soak time prior to the upcoming merge window.
>> > > >
>> > > > Steve?  I was hoping you could chime in here.
>> > >
>> > > If the CWD record will always be the same as the PARENT record,
>> > > then we do not need the parent record. Duplicate information is
>> > > bad. Like all the duplicate SYSCALL information.
>> >
>> > The CWD record could be different from the PARENT record, since I
>> > could have SYMLINK=/tmp/test/symlink, CWD=/tmp, PARENT=/tmp/test.
>> > Does the parent record even matter since it might not be a directory
>> > operation like creat, unlink or rename?
>>
>> There's 2 issues. One is creating the path if what we have is relative.
>> In this situation CWD should be enough. But if the question is whether
>> the PARENT directory should be included...what if the PARENT
>> permissions do not allow the successful name resolution? In that case
>> we might only get a PARENT record no? In that case we would need it.
>
> I think in the case of symlink creation, normal file create code path
> would be in effect, and would properly log parent and symlink source
> file paths (if a rule to log it was in effect) which is not something
> that would trigger a symlink link denied error.  Symlink link denied
> happens only when trying to actually follow the link before
> resolving the target path of a read/write/exec of the symlink target.
>
> If the parent permissions of the link's target don't allow successful
> name resolution then the symlink link denied condition isn't met, but
> rather any other rule that applies to the target path.

I'm guessing you are in the process of tracking all this down, but if
not, lets get to a point where we can answer this definitively and not
guess :)

-- 
paul moore
www.paul-moore.com

RE: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register

2018-03-13 Thread Kroening, Gary

I've tested this patch on the same set of hubless (single-segment) and scalable 
(segment-per-socket) configurations as for Kan's version 1.

As far as we can tell this will also work for Cascade Lake, but will need 
revisiting for Ice Lake.

Thanks.
Gary

> -Original Message-
> From: kan.li...@linux.intel.com [mailto:kan.li...@linux.intel.com]
> Sent: Tuesday, March 13, 2018 1:52 PM
> To: mi...@redhat.com; h...@zytor.com; t...@linutronix.de;
> pet...@infradead.org; andy.shevche...@gmail.com
> Cc: Kroening, Gary; Travis, Mike; Banman, Andrew; Sivanich, Dimitri;
> Anderson, Russ; x...@kernel.org; linux-kernel@vger.kernel.org; Kan Liang
> Subject: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from
> CAPID6 register
> 
> From: Kan Liang 
> 
> The number of CHAs is miscalculated on multi PCI domain systems on
> Skylake server.
> 
> (From Kroening, Gary:
> 
> For systems with a single PCI segment, it is sufficient to look for the
> bus number to change in order to determine that all of the CHa's have
> been counted for a single socket.
> However, for multi PCI segment systems, each socket is given a new
> segment and the bus number does NOT change.  So looking only for the
> bus number to change ends up counting all of the CHa's on all sockets
> in the system.  This leads to writing CPU MSRs beyond a valid range and
> causes an error in ivbep_uncore_msr_init_box().)
> 
> To determine the number of CHAs, it should read bits 27:0 in the CAPID6
> register located at Device 30, Function 3, Offset 0x9C. These 28 bits
> form a bit vector of available LLC slices and the CHAs that manage those
> slices.
> 
> Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore
> support")
> Reported-by: Kroening, Gary 
> Signed-off-by: Kan Liang 
> ---
> 
> Changes since V1:
>  - add missed pci_dev_put()
>  - Drop ugly casting by using hweight32()
>  - Add comments for macros.
> 
>  arch/x86/events/intel/uncore_snbep.c | 31 +--
>  1 file changed, 17 insertions(+), 14 deletions(-)
> 
> diff --git a/arch/x86/events/intel/uncore_snbep.c
> b/arch/x86/events/intel/uncore_snbep.c
> index 6d8044a..8970f71 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[]
> = {
>   NULL,
>  };
> 
> +/*
> + * To determine the number of CHAs, it should read bits 27:0 in the
> CAPID6
> + * register which located at Device 30, Function 3, Offset 0x9C. PCI ID
> 0x2083.
> + */
> +#define SKX_CAPID6   0x9c
> +#define SKX_CHA_BIT_MASK GENMASK(27, 0)
> +
>  static int skx_count_chabox(void)
>  {
> - struct pci_dev *chabox_dev = NULL;
> - int bus, count = 0;
> + struct pci_dev *dev = NULL;
> + u32 val = 0;
> 
> - while (1) {
> - chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d,
> chabox_dev);
> - if (!chabox_dev)
> - break;
> - if (count == 0)
> - bus = chabox_dev->bus->number;
> - if (bus != chabox_dev->bus->number)
> - break;
> - count++;
> - }
> + dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev);
> + if (!dev)
> + goto out;
> 
> - pci_dev_put(chabox_dev);
> - return count;
> + pci_read_config_dword(dev, SKX_CAPID6, );
> + val &= SKX_CHA_BIT_MASK;
> +out:
> + pci_dev_put(dev);
> + return hweight32(val);
>  }
> 
>  void skx_uncore_cpu_init(void)
> --
> 2.7.4

Re: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register

2018-03-13 Thread Andy Shevchenko

On Tue, Mar 13, 2018 at 8:51 PM,   wrote:
> From: Kan Liang 
>
> The number of CHAs is miscalculated on multi PCI domain systems on
> Skylake server.
>
> (From Kroening, Gary:
>
> For systems with a single PCI segment, it is sufficient to look for the
> bus number to change in order to determine that all of the CHa's have
> been counted for a single socket.
> However, for multi PCI segment systems, each socket is given a new
> segment and the bus number does NOT change.  So looking only for the
> bus number to change ends up counting all of the CHa's on all sockets
> in the system.  This leads to writing CPU MSRs beyond a valid range and
> causes an error in ivbep_uncore_msr_init_box().)
>
> To determine the number of CHAs, it should read bits 27:0 in the CAPID6
> register located at Device 30, Function 3, Offset 0x9C. These 28 bits
> form a bit vector of available LLC slices and the CHAs that manage those
> slices.
>

FWIW,
Reviewed-by: Andy Shevchenko 

> Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore
> support")
> Reported-by: Kroening, Gary 
> Signed-off-by: Kan Liang 
> ---
>
> Changes since V1:
>  - add missed pci_dev_put()
>  - Drop ugly casting by using hweight32()
>  - Add comments for macros.
>
>  arch/x86/events/intel/uncore_snbep.c | 31 +--
>  1 file changed, 17 insertions(+), 14 deletions(-)
>
> diff --git a/arch/x86/events/intel/uncore_snbep.c 
> b/arch/x86/events/intel/uncore_snbep.c
> index 6d8044a..8970f71 100644
> --- a/arch/x86/events/intel/uncore_snbep.c
> +++ b/arch/x86/events/intel/uncore_snbep.c
> @@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[] = {
> NULL,
>  };
>
> +/*
> + * To determine the number of CHAs, it should read bits 27:0 in the CAPID6
> + * register which located at Device 30, Function 3, Offset 0x9C. PCI ID 
> 0x2083.
> + */
> +#define SKX_CAPID6 0x9c
> +#define SKX_CHA_BIT_MASK   GENMASK(27, 0)
> +
>  static int skx_count_chabox(void)
>  {
> -   struct pci_dev *chabox_dev = NULL;
> -   int bus, count = 0;
> +   struct pci_dev *dev = NULL;
> +   u32 val = 0;
>
> -   while (1) {
> -   chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d, 
> chabox_dev);
> -   if (!chabox_dev)
> -   break;
> -   if (count == 0)
> -   bus = chabox_dev->bus->number;
> -   if (bus != chabox_dev->bus->number)
> -   break;
> -   count++;
> -   }
> +   dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev);
> +   if (!dev)
> +   goto out;
>
> -   pci_dev_put(chabox_dev);
> -   return count;
> +   pci_read_config_dword(dev, SKX_CAPID6, );
> +   val &= SKX_CHA_BIT_MASK;
> +out:
> +   pci_dev_put(dev);
> +   return hweight32(val);
>  }
>
>  void skx_uncore_cpu_init(void)
> --
> 2.7.4
>



-- 
With Best Regards,
Andy Shevchenko

Re: [PATCH v2] workqueue: use put_device() instead of kfree()

2018-03-13 Thread Tejun Heo

On Tue, Mar 06, 2018 at 03:35:43PM +0530, Arvind Yadav wrote:
> Never directly free @dev after calling device_register(), even
> if it returned an error! Always use put_device() to give up the
> reference initialized in this function instead.
> 
> Signed-off-by: Arvind Yadav 

Applied to wq/for-4.16-fixes.

Thanks.

-- 
tejun

Re: linux-next: build warning after merge of the net-next tree

2018-03-13 Thread Gustavo A. R. Silva

On 03/13/2018 10:33 AM, David Miller wrote:

From: "Gustavo A. R. Silva" 
Date: Tue, 13 Mar 2018 06:46:24 -0500

Hi Stephen,

On 03/13/2018 01:11 AM, Stephen Rothwell wrote:

Hi all,
After merging the net-next tree, today's linux-next build (sparc
defconfig) produced this warning:
net/core/pktgen.c: In function 'pktgen_if_write':
net/core/pktgen.c:1710:1: warning: the frame size of 1048 bytes is
larger than 1024 bytes [-Wframe-larger-than=]
   }
   ^
Introduced by commit
35951393bbff ("pktgen: Remove VLA usage")

Thanks for the report.

David:

If this code is not going to be executed very often [1], then I think
it is safe to use dynamic memory allocation instead, as this is not
going to impact the performance.

What do you think?

[1] https://lkml.org/lkml/2018/3/9/630

Sure, that works.

It is only invoked when pktgen configuration changes are made.

OK. I'll send a new patch for this.

Thanks
--
Gustavo

Re: [PATCH v4 4/6] ipc: Clamp msgmni and shmmni to the real IPCMNI limit

2018-03-13 Thread Eric W. Biederman

Waiman Long  writes:

> On 03/13/2018 02:17 PM, Eric W. Biederman wrote:
>> Waiman Long  writes:
>>
>>> A user can write arbitrary integer values to msgmni and shmmni sysctl
>>> parameters without getting error, but the actual limit is really
>>> IPCMNI (32k). This can mislead users as they think they can get a
>>> value that is not real.
>>>
>>> Enforcing the limit by failing the sysctl parameter write, however,
>>> can break existing user applications.
>> Which applications examples please.
>>
>> I am seeing this patchset late but it looks like a whole lot of changes
>> to avoid a theoretical possibility.
>>
>> Changes that have an impact on more than just the ipc code you are
>> patching.
>>
>> That makes me feel very uncomfortable with these changes.
>>
>> Eric
>
> This patchset is constructed to address a customer request that there is
> no easy way to find out the actual usable range of a sysctl parameter.
> In this particular case, the customer wants to use more than 32k shared
> memory segments. They can put in a large value into shmmni, but the
> application didn't work properly because shmmni was internally clamped
> to 32k without any visible sign that a smaller limit has been imposed.
>
> Out of a concern that there might be customers out there setting those
> sysctl parameters outside of the allowable range without knowing it,
> just enforcing the right limits may have the undesirable consequence of
> breaking their existing setup scripts. I don't have concrete example of
> what customers are doing that, but it  won't look good if we wait until
> the complaints come in.
>
> The new code won't affect existing code unless the necessary flag is
> set. So would you mind elaborating what other impact do you see that
> will affect other non-IPC code in an undesirable way?

The increase in size of struct ctl_table.  Every caller is affected.
Plus it increases everyone's cognitive load to figure out what is
this flags field as they fill out ctl_table.

Just introducing a proc_dointvec_minmax_clamped follows the existing
pattern and it makes it easier for everyone who both read the code.

It strikes me as quite peculiar that the response to bug report where
the complaint is an error is not given, is to continue the current
behavior without giving an error.

Arguably the simplest fix here would be to kill IPCMNI entirely.  Assign
the shmids from a sequence counter.  And place those structures in a
rbtree indexed by shmni.  There are 32bit fields but I don't think we
must keep the low 16bits for an index into an array and the high 16bits
as the actual sequence number.

Except for the checkpoint/restart case which is aguably much too
specific about how these ids are assigned that would give much more
freedom and allow people the number of shm segments that they actually
want to use.

For a further complication I don't expect you can get away with changing
the size or the fields in struct ctl_table in the kernel your customers
are running.

So please use a new function not flags it will simplify everyone's life.
If you can please actually fix this so you can have more shmids that
would be the really classy thing todo.

Eric

Re: [PATCH] libata: add refcounting to ata_host

2018-03-13 Thread Tejun Heo

On Fri, Mar 09, 2018 at 08:34:41AM +, Taras Kondratiuk wrote:
> After commit 9a6d6a2ddabb ("ata: make ata port as parent device of scsi
> host") manual driver unbind/remove causes use-after-free.
> 
> Unbind unconditionally invokes devres_release_all() which calls
> ata_host_release() and frees ata_host/ata_port memory while it is still
> being referenced as a parent of SCSI host. When SCSI host is finally
> released scsi_host_dev_release() calls put_device(parent) and accesses
> freed ata_port memory.
> 
> Add reference counting to make sure that ata_host lives long enough.
> 
> Bug report: https://lkml.org/lkml/2017/11/1/945
> Fixes: 9a6d6a2ddabb ("ata: make ata port as parent device of scsi host")
> Cc: Tejun Heo 
> Cc: Lin Ming 
> Cc: linux-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Taras Kondratiuk 

Applied to libata/for-4.17.

Thanks.

-- 
tejun

[PATCH] regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'

2018-03-13 Thread Christophe JAILLET

Re-order error handling code and gotos to avoid leaks in error handling
paths.

Fixes: 9f946099fe19 ("regulator: gpio: fix parsing of gpio list")
Signed-off-by: Christophe JAILLET 
---
 drivers/regulator/gpio-regulator.c | 16 +++-
 1 file changed, 7 insertions(+), 9 deletions(-)

diff --git a/drivers/regulator/gpio-regulator.c 
b/drivers/regulator/gpio-regulator.c
index 0fce06acfaec..a2eb50719c7b 100644
--- a/drivers/regulator/gpio-regulator.c
+++ b/drivers/regulator/gpio-regulator.c
@@ -271,8 +271,7 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
drvdata->desc.name = kstrdup(config->supply_name, GFP_KERNEL);
if (drvdata->desc.name == NULL) {
dev_err(>dev, "Failed to allocate supply name\n");
-   ret = -ENOMEM;
-   goto err;
+   return -ENOMEM;
}
 
if (config->nr_gpios != 0) {
@@ -292,7 +291,7 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
dev_err(>dev,
"Could not obtain regulator setting 
GPIOs: %d\n",
ret);
-   goto err_memstate;
+   goto err_memgpio;
}
}
 
@@ -303,7 +302,7 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
if (drvdata->states == NULL) {
dev_err(>dev, "Failed to allocate state data\n");
ret = -ENOMEM;
-   goto err_memgpio;
+   goto err_stategpio;
}
drvdata->nr_states = config->nr_states;
 
@@ -324,7 +323,7 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
default:
dev_err(>dev, "No regulator type set\n");
ret = -EINVAL;
-   goto err_memgpio;
+   goto err_memstate;
}
 
/* build initial state from gpio init data. */
@@ -361,22 +360,21 @@ static int gpio_regulator_probe(struct platform_device 
*pdev)
if (IS_ERR(drvdata->dev)) {
ret = PTR_ERR(drvdata->dev);
dev_err(>dev, "Failed to register regulator: %d\n", ret);
-   goto err_stategpio;
+   goto err_memstate;
}
 
platform_set_drvdata(pdev, drvdata);
 
return 0;
 
-err_stategpio:
-   gpio_free_array(drvdata->gpios, drvdata->nr_gpios);
 err_memstate:
kfree(drvdata->states);
+err_stategpio:
+   gpio_free_array(drvdata->gpios, drvdata->nr_gpios);
 err_memgpio:
kfree(drvdata->gpios);
 err_name:
kfree(drvdata->desc.name);
-err:
return ret;
 }
 
-- 
2.14.1

Re: [PATCH 1/8] Uprobe: Export vaddr <-> offset conversion functions

2018-03-13 Thread Jerome Glisse

On Tue, Mar 13, 2018 at 06:25:56PM +0530, Ravi Bangoria wrote:
> No functionality changes.
> 
> Signed-off-by: Ravi Bangoria 

Reviewed-by: Jérôme Glisse 

> ---
>  include/linux/mm.h  | 12 
>  kernel/events/uprobes.c | 10 --
>  2 files changed, 12 insertions(+), 10 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index ad06d42..95909f2 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2274,6 +2274,18 @@ struct vm_unmapped_area_info {
>   return unmapped_area(info);
>  }
>  
> +static inline unsigned long
> +offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
> +{
> + return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> +}
> +
> +static inline loff_t
> +vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
> +{
> + return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
> +}
> +
>  /* truncate.c */
>  extern void truncate_inode_pages(struct address_space *, loff_t);
>  extern void truncate_inode_pages_range(struct address_space *,
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index ce6848e..bd6f230 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -130,16 +130,6 @@ static bool valid_vma(struct vm_area_struct *vma, bool 
> is_register)
>   return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC;
>  }
>  
> -static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t 
> offset)
> -{
> - return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
> -}
> -
> -static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long 
> vaddr)
> -{
> - return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
> -}
> -
>  /**
>   * __replace_page - replace page in vma by new page.
>   * based on replace_page in mm/ksm.c
> -- 
> 1.8.3.1
>

[PATCH] clk: rockchip: Add 1.6GHz PLL rate

2018-03-13 Thread Derek Basehore

We need this rate to generate 100, 200, and 228.57MHz from the same
PLL. 228.57MHz is useful for a pixel clock when the VPLL is used for
and external display.

Signed-off-by: Derek Basehore 
---
 drivers/clk/rockchip/clk-rk3399.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/clk/rockchip/clk-rk3399.c 
b/drivers/clk/rockchip/clk-rk3399.c
index 6847120b61cd..3e57c6eef93d 100644
--- a/drivers/clk/rockchip/clk-rk3399.c
+++ b/drivers/clk/rockchip/clk-rk3399.c
@@ -57,6 +57,7 @@ static struct rockchip_pll_rate_table rk3399_pll_rates[] = {
RK3036_PLL_RATE(165600, 1, 69, 1, 1, 1, 0),
RK3036_PLL_RATE(163200, 1, 68, 1, 1, 1, 0),
RK3036_PLL_RATE(160800, 1, 67, 1, 1, 1, 0),
+   RK3036_PLL_RATE(16, 3, 200, 1, 1, 1, 0),
RK3036_PLL_RATE(158400, 1, 66, 1, 1, 1, 0),
RK3036_PLL_RATE(156000, 1, 65, 1, 1, 1, 0),
RK3036_PLL_RATE(153600, 1, 64, 1, 1, 1, 0),
-- 
2.16.2.660.g709887971b-goog

Re: [PATCH 2/8] mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()

2018-03-13 Thread Jerome Glisse

On Tue, Mar 13, 2018 at 06:25:57PM +0530, Ravi Bangoria wrote:
> No functionality changes.
> 
> Signed-off-by: Ravi Bangoria 

Doing this with coccinelle would have been nicer but this is small
enough to review without too much fatigue :)

Reviewed-by: Jérôme Glisse 

> ---
>  include/linux/mm.h  |  4 ++--
>  kernel/events/uprobes.c | 14 +++---
>  2 files changed, 9 insertions(+), 9 deletions(-)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 95909f2..d7ee526 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2275,13 +2275,13 @@ struct vm_unmapped_area_info {
>  }
>  
>  static inline unsigned long
> -offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
> +vma_offset_to_vaddr(struct vm_area_struct *vma, loff_t offset)
>  {
>   return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT);
>  }
>  
>  static inline loff_t
> -vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
> +vma_vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr)
>  {
>   return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start);
>  }
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index bd6f230..535fd39 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -748,7 +748,7 @@ static inline struct map_info *free_map_info(struct 
> map_info *info)
>   curr = info;
>  
>   info->mm = vma->vm_mm;
> - info->vaddr = offset_to_vaddr(vma, offset);
> + info->vaddr = vma_offset_to_vaddr(vma, offset);
>   }
>   i_mmap_unlock_read(mapping);
>  
> @@ -807,7 +807,7 @@ static inline struct map_info *free_map_info(struct 
> map_info *info)
>   goto unlock;
>  
>   if (vma->vm_start > info->vaddr ||
> - vaddr_to_offset(vma, info->vaddr) != uprobe->offset)
> + vma_vaddr_to_offset(vma, info->vaddr) != uprobe->offset)
>   goto unlock;
>  
>   if (is_register) {
> @@ -977,7 +977,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct 
> mm_struct *mm)
>   uprobe->offset >= offset + vma->vm_end - vma->vm_start)
>   continue;
>  
> - vaddr = offset_to_vaddr(vma, uprobe->offset);
> + vaddr = vma_offset_to_vaddr(vma, uprobe->offset);
>   err |= remove_breakpoint(uprobe, mm, vaddr);
>   }
>   up_read(>mmap_sem);
> @@ -1023,7 +1023,7 @@ static void build_probe_list(struct inode *inode,
>   struct uprobe *u;
>  
>   INIT_LIST_HEAD(head);
> - min = vaddr_to_offset(vma, start);
> + min = vma_vaddr_to_offset(vma, start);
>   max = min + (end - start) - 1;
>  
>   spin_lock(_treelock);
> @@ -1076,7 +1076,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
>   list_for_each_entry_safe(uprobe, u, _list, pending_list) {
>   if (!fatal_signal_pending(current) &&
>   filter_chain(uprobe, UPROBE_FILTER_MMAP, vma->vm_mm)) {
> - unsigned long vaddr = offset_to_vaddr(vma, 
> uprobe->offset);
> + unsigned long vaddr = vma_offset_to_vaddr(vma, 
> uprobe->offset);
>   install_breakpoint(uprobe, vma->vm_mm, vma, vaddr);
>   }
>   put_uprobe(uprobe);
> @@ -1095,7 +1095,7 @@ int uprobe_mmap(struct vm_area_struct *vma)
>  
>   inode = file_inode(vma->vm_file);
>  
> - min = vaddr_to_offset(vma, start);
> + min = vma_vaddr_to_offset(vma, start);
>   max = min + (end - start) - 1;
>  
>   spin_lock(_treelock);
> @@ -1730,7 +1730,7 @@ static struct uprobe *find_active_uprobe(unsigned long 
> bp_vaddr, int *is_swbp)
>   if (vma && vma->vm_start <= bp_vaddr) {
>   if (valid_vma(vma, false)) {
>   struct inode *inode = file_inode(vma->vm_file);
> - loff_t offset = vaddr_to_offset(vma, bp_vaddr);
> + loff_t offset = vma_vaddr_to_offset(vma, bp_vaddr);
>  
>   uprobe = find_uprobe(inode, offset);
>   }
> -- 
> 1.8.3.1
>

Re: [PATCH 3/8] Uprobe: Rename map_info to uprobe_map_info

2018-03-13 Thread Jerome Glisse

On Tue, Mar 13, 2018 at 06:25:58PM +0530, Ravi Bangoria wrote:
> map_info is very generic name, rename it to uprobe_map_info.
> Renaming will help to export this structure outside of the
> file.
> 
> Also rename free_map_info() to uprobe_free_map_info() and
> build_map_info() to uprobe_build_map_info().
> 
> No functionality changes.
> 
> Signed-off-by: Ravi Bangoria 

Same coccinelle comments as previously.

Reviewed-by: Jérôme Glisse 

> ---
>  kernel/events/uprobes.c | 32 +---
>  1 file changed, 17 insertions(+), 15 deletions(-)
> 
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 535fd39..081b88c1 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -695,27 +695,29 @@ static void delete_uprobe(struct uprobe *uprobe)
>   put_uprobe(uprobe);
>  }
>  
> -struct map_info {
> - struct map_info *next;
> +struct uprobe_map_info {
> + struct uprobe_map_info *next;
>   struct mm_struct *mm;
>   unsigned long vaddr;
>  };
>  
> -static inline struct map_info *free_map_info(struct map_info *info)
> +static inline struct uprobe_map_info *
> +uprobe_free_map_info(struct uprobe_map_info *info)
>  {
> - struct map_info *next = info->next;
> + struct uprobe_map_info *next = info->next;
>   kfree(info);
>   return next;
>  }
>  
> -static struct map_info *
> -build_map_info(struct address_space *mapping, loff_t offset, bool 
> is_register)
> +static struct uprobe_map_info *
> +uprobe_build_map_info(struct address_space *mapping, loff_t offset,
> +   bool is_register)
>  {
>   unsigned long pgoff = offset >> PAGE_SHIFT;
>   struct vm_area_struct *vma;
> - struct map_info *curr = NULL;
> - struct map_info *prev = NULL;
> - struct map_info *info;
> + struct uprobe_map_info *curr = NULL;
> + struct uprobe_map_info *prev = NULL;
> + struct uprobe_map_info *info;
>   int more = 0;
>  
>   again:
> @@ -729,7 +731,7 @@ static inline struct map_info *free_map_info(struct 
> map_info *info)
>* Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion 
> through
>* reclaim. This is optimistic, no harm done if it 
> fails.
>*/
> - prev = kmalloc(sizeof(struct map_info),
> + prev = kmalloc(sizeof(struct uprobe_map_info),
>   GFP_NOWAIT | __GFP_NOMEMALLOC | 
> __GFP_NOWARN);
>   if (prev)
>   prev->next = NULL;
> @@ -762,7 +764,7 @@ static inline struct map_info *free_map_info(struct 
> map_info *info)
>   }
>  
>   do {
> - info = kmalloc(sizeof(struct map_info), GFP_KERNEL);
> + info = kmalloc(sizeof(struct uprobe_map_info), GFP_KERNEL);
>   if (!info) {
>   curr = ERR_PTR(-ENOMEM);
>   goto out;
> @@ -774,7 +776,7 @@ static inline struct map_info *free_map_info(struct 
> map_info *info)
>   goto again;
>   out:
>   while (prev)
> - prev = free_map_info(prev);
> + prev = uprobe_free_map_info(prev);
>   return curr;
>  }
>  
> @@ -782,11 +784,11 @@ static inline struct map_info *free_map_info(struct 
> map_info *info)
>  register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new)
>  {
>   bool is_register = !!new;
> - struct map_info *info;
> + struct uprobe_map_info *info;
>   int err = 0;
>  
>   percpu_down_write(_mmap_sem);
> - info = build_map_info(uprobe->inode->i_mapping,
> + info = uprobe_build_map_info(uprobe->inode->i_mapping,
>   uprobe->offset, is_register);
>   if (IS_ERR(info)) {
>   err = PTR_ERR(info);
> @@ -825,7 +827,7 @@ static inline struct map_info *free_map_info(struct 
> map_info *info)
>   up_write(>mmap_sem);
>   free:
>   mmput(mm);
> - info = free_map_info(info);
> + info = uprobe_free_map_info(info);
>   }
>   out:
>   percpu_up_write(_mmap_sem);
> -- 
> 1.8.3.1
>

Re: [PATCH 4/8] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()

2018-03-13 Thread Jerome Glisse

On Tue, Mar 13, 2018 at 06:25:59PM +0530, Ravi Bangoria wrote:
> These exported data structure and functions will be used by other
> files in later patches.
> 
> No functionality changes.
> 
> Signed-off-by: Ravi Bangoria 

Reviewed-by: Jérôme Glisse 

> ---
>  include/linux/uprobes.h |  9 +
>  kernel/events/uprobes.c | 14 +++---
>  2 files changed, 12 insertions(+), 11 deletions(-)
> 
> diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h
> index 0a294e9..7bd2760 100644
> --- a/include/linux/uprobes.h
> +++ b/include/linux/uprobes.h
> @@ -109,12 +109,19 @@ enum rp_check {
>   RP_CHECK_RET,
>  };
>  
> +struct address_space;
>  struct xol_area;
>  
>  struct uprobes_state {
>   struct xol_area *xol_area;
>  };
>  
> +struct uprobe_map_info {
> + struct uprobe_map_info *next;
> + struct mm_struct *mm;
> + unsigned long vaddr;
> +};
> +
>  extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned 
> long vaddr);
>  extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, 
> unsigned long vaddr);
>  extern bool is_swbp_insn(uprobe_opcode_t *insn);
> @@ -149,6 +156,8 @@ struct uprobes_state {
>  extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs 
> *regs);
>  extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr,
>void *src, unsigned long len);
> +extern struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info 
> *info);
> +extern struct uprobe_map_info *uprobe_build_map_info(struct address_space 
> *mapping, loff_t offset, bool is_register);
>  #else /* !CONFIG_UPROBES */
>  struct uprobes_state {
>  };
> diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c
> index 081b88c1..e7830b8 100644
> --- a/kernel/events/uprobes.c
> +++ b/kernel/events/uprobes.c
> @@ -695,23 +695,15 @@ static void delete_uprobe(struct uprobe *uprobe)
>   put_uprobe(uprobe);
>  }
>  
> -struct uprobe_map_info {
> - struct uprobe_map_info *next;
> - struct mm_struct *mm;
> - unsigned long vaddr;
> -};
> -
> -static inline struct uprobe_map_info *
> -uprobe_free_map_info(struct uprobe_map_info *info)
> +struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info)
>  {
>   struct uprobe_map_info *next = info->next;
>   kfree(info);
>   return next;
>  }
>  
> -static struct uprobe_map_info *
> -uprobe_build_map_info(struct address_space *mapping, loff_t offset,
> -   bool is_register)
> +struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping,
> +   loff_t offset, bool is_register)
>  {
>   unsigned long pgoff = offset >> PAGE_SHIFT;
>   struct vm_area_struct *vma;
> -- 
> 1.8.3.1
>

Re: [v5 1/2] mm: disable interrupts while initializing deferred pages

2018-03-13 Thread Pavel Tatashin


> Soft lockup: kernel has run for too long without rescheduling
> Hard lockup: kernel has run for too long with interrupts disabled
> 
> Both of these are detected by the NMI watchdog handler.
> 
> 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling
> point.  Replacing that with touch_nmi_watchdog() won't work (I think). 
> Presumably calling touch_softlockup_watchdog() will "work", in that it
> suppresses the warning.  But it won't fix the thing which the warning
> is actually warning about: starvation of the CPU scheduler.  That's
> what the cond_resched() does.

But, unlike memmap_init_zone(), which can be used after boot, here we do
not worry about kernel running for too long.  This is because we are
booting, and no user programs are running.

So, it is acceptable to have a long uninterruptible span, as long
as we making a useful progress. BTW, the boot CPU still has
interrupts enabled during this span.

Comment in: include/linux/nmi.h, states:

 * If the architecture supports the NMI watchdog, touch_nmi_watchdog()
 * may be used to reset the timeout - for code which intentionally
 * disables interrupts for a long time. This call is stateless.

Which is exactly what we are trying to do here, now that these threads
run with interrupts disabled.

Before, where they were running with interrupts enabled, and
cond_resched() was enough to satisfy soft lockups.

> 
> I'm not sure what to suggest, really.  Your changelog isn't the best:
> "Vlastimil Babka reported about a window issue during which when
> deferred pages are initialized, and the current version of on-demand
> initialization is finished, allocations may fail".  Well...  where is
> ths mysterious window?  Without such detail it's hard for others to
> suggest alternative approaches.

Here is hopefully a better description of the problem:

Currently, during boot we preinitialize some number of struct pages to satisfy 
all boot allocations. Even if these allocations happen when we initialize the 
reset of deferred pages in page_alloc_init_late(). The problem is that we do 
not know how much kernel will need, and it also depends on various options.

So, with this work, we are changing this behavior to initialize struct pages 
on-demand, only when allocations happen.

During boot, when we try to allocate memory, the on-demand struct page 
initialization code takes care of it. But, once the deferred pages are 
initializing in:

page_alloc_init_late()
   for_each_node_state(nid, N_MEMORY)
  kthread_run(deferred_init_memmap())

We cannot use on-demand initialization, as these threads resize pgdat.

This whole thing is to take care of this time.

My first version of on-demand deferred page initialization would simply fail to 
allocate memory during this period of time. But, this new version waits for 
threads to finish initializing deferred memory, and successfully perform the 
allocation.

Because interrupt handler would wait for pgdat resize lock.

Thank you,
Pavel

Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory

2018-03-13 Thread Logan Gunthorpe

On 13/03/18 01:53 PM, Sinan Kaya wrote:
> I agree disabling globally would be bad. Somebody can always say I have
> ten switches on my system. I want to do peer-to-peer on one switch only. Now,
> this change weakened security for the other switches that I had no intention
> with doing P2P.
> 
> Isn't this a problem?

Well, if it's a problem for someone they'll have to solve it. We're
targeting JBOFs that have no use for ACS / IOMMU groups at all.

> Can we specify the BDF of the downstream device we want P2P with during boot 
> via
> kernel command line?

That's a painful configuration burden. And then things might stop
working if you change your topology at all and now have to change boot
parameters.

Logan

Re: dcache: remove trylock loops (was Re: [BUG] lock_parent() breakage when used from shrink_dentry_list())

2018-03-13 Thread John Ogness

On 2018-03-12, Al Viro  wrote:
>> If someone else has grabbed a reference, it shouldn't be added to the
>> lru list. Only decremented.
>> 
>> if (entry->d_lockref.count == 1)
>
> Nah, better handle that in retain_dentry() itself.  See updated
> #work.dcache.
>
> + if (unlikely(dentry->d_lockref.count != 1)) {
> + dentry->d_lockref.count--;
> + } else if (likely(!retain_dentry(dentry))) {
> + __dentry_kill(dentry);
> + return parent;
> + }

Although the updated version is correct (and saves on lines of code), I
find putting the deref and lru_add code in the "true" case of
retain_dentry() to be pretty tricky. IMHO the code is easier to
understand if it looks like this:

if (unlikely(dentry->d_lockref.count != 1)) {
dentry->d_lockref.count--;
} else if (likely(!retain_dentry(dentry))) {
__dentry_kill(dentry);
return parent;
} else {
dentry->d_lockref.count--;
dentry_lru_add(dentry);
}

This is what your version is doing, but that final else is hiding in the
retain_dentry() "true" case.

My suggestion is to revert 7479f57fecd2a4837b5c79ce1cf0dcf284db54be (and
then fixup dput() to deref before calling dentry_lru_add()).

> FWIW, there's another trylock loop on dentries - one in
> autofs get_next_positive_dentry().  Any plans re dealing
> with that one?

I will need to dig into it a bit deeper (I am unfamiliar with autofs),
but it looks like it is trying to do basically the same thing as the
ascend loop in d_walk().

> I'd spent the last couple of weeks (when not being too sick
> for any work) going through dcache.c and related code; hopefully
> this time I will get the documentation into postable shape ;-/

Thank you for all your help in getting these changes cleaned and
correctly implemented so quickly.

I've reviewed your latest trylock loop removal patches and found only 1
minor issue. I'll post that separately.

John Ogness

Re: [RESEND RFC] translate_pid API

2018-03-13 Thread Jann Horn

On Mon, Mar 12, 2018 at 10:18 AM,   wrote:
> Resending the RFC with participants of previous discussions
> in the list.
>
> Following patch which is a variation of a solution discussed
> in https://lwn.net/Articles/736330/ provides the users of
> pid namespace, the functionality of pid translation between
> namespaces using a namespace identifier. The topic of
> pid translation has been discussed in the community few times
> but there has always been a resistance to adding new solution
> for this problem.

> I will outline the planned usecase of pid namespace by oracle
> database and explain why any of the existing solution cannot
> be used to solve their problem.
>
> Consider a system in which several PID namespaces with multiple
> nested levels exists in parallel with monitor processes managing
> all the namespaces. PID translation is required for controlling
> and accessing information about the processes by the monitors
> and other processes down the hierarchy of namespaces. Controlling
> primarily involves sending signals or using ptrace by a process in
> parent namespace on any of the processes in its child namespace.
> Accessing information deals with the reading /proc//* files
> of processes in child namespace. None of the processes have
> root/CAP_SYS_ADMIN privileges.

How are you dealing with PID reuse?

[...]
> diff --git a/fs/nsfs.c b/fs/nsfs.c
> index 36b0772..c635465 100644
> --- a/fs/nsfs.c
> +++ b/fs/nsfs.c
> @@ -222,8 +222,13 @@ int ns_get_name(char *buf, size_t size, struct 
> task_struct *task,
> const char *name;
> ns = ns_ops->get(task);
> if (ns) {
> -   name = ns_ops->real_ns_name ? : ns_ops->name;
> -   res = snprintf(buf, size, "%s:[%u]", name, ns->inum);
> +   if (!strcmp(ns_ops->name, "pidns_id")) {

Wouldn't it be cleaner to check for "ns_ops==_id_operations"?

> +   res = snprintf(buf, size, "[%llu]",
> +  (unsigned long long)ns->ns_id);
> +   } else {
> +   name = ns_ops->real_ns_name ? : ns_ops->name;
> +   res = snprintf(buf, size, "%s:[%u]", name, ns->inum);
> +   }
> ns_ops->put(ns);
> }
> return res;
[...]
> diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h
> index 49538b1..11d1d57 100644
> --- a/include/linux/pid_namespace.h
> +++ b/include/linux/pid_namespace.h
> @@ -11,6 +11,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>
>
>  struct fs_pin;
> @@ -44,6 +45,8 @@ struct pid_namespace {
> kgid_t pid_gid;
> int hide_pid;
> int reboot; /* group exit code if this pidns was rebooted */
> +   struct hlist_bl_node node;
> +   atomic_t lookups_pending;
> struct ns_common ns;
>  } __randomize_layout;
>
[...]
> diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c
> index 0b53eef..ff83aa8 100644
> --- a/kernel/pid_namespace.c
> +++ b/kernel/pid_namespace.c
[...]
> @@ -159,6 +201,30 @@ static void delayed_free_pidns(struct rcu_head *p)
>
>  static void destroy_pid_namespace(struct pid_namespace *ns)
>  {
> +   struct pid_namespace *ph;
> +   struct hlist_bl_head *head;
> +   struct hlist_bl_node *dup_node;
> +
> +   /*
> +* Remove the namespace structure from hash table so
> +* now new lookups can start on it.

s/now new/no new/

[...]
> @@ -474,9 +551,116 @@ static struct user_namespace *pidns_owner(struct 
> ns_common *ns)
> .get_parent = pidns_get_parent,
>  };
>
> +/*
> + * translate_pid - convert pid in source pid-ns into target pid-ns.
> + * @pid: pid for translation
> + * @source: pid-ns id
> + * @target: pid-ns id
> + *
> + * Return pid in @target pid-ns, zero if task have no pid there,
> + * or -ESRCH of task with @pid is not found in @source pid-ns.

s/of/if/

> + */
> +SYSCALL_DEFINE3(translate_pid, pid_t, pid, u64, source,
> +   u64, target)
> +{
> +   struct pid_namespace *source_ns = NULL, *target_ns = NULL;
> +   struct pid *struct_pid;
> +   struct pid_namespace *ph;
> +   struct hlist_bl_head *shead = NULL;
> +   struct hlist_bl_head *thead = NULL;
> +   struct hlist_bl_node *dup_node;
> +   pid_t result;
> +
> +   if (!source) {
> +   source_ns = _pid_ns;
> +   } else {
> +   shead = pid_ns_hash_head(pid_ns_hash, source);
> +   hlist_bl_lock(shead);
> +   hlist_bl_for_each_entry(ph, dup_node, shead, node) {
> +   if (source == ph->ns.ns_id) {
> +   source_ns = ph;
> +   break;
> +   }
> +   }
> +   if (!source_ns) {
> +   hlist_bl_unlock(shead);
> +   return -EINVAL;
> +   }
> +   }
> +   if

Re: linux-next: Tree for Mar 13

2018-03-13 Thread Stephen Rothwell

Hi Dominik,

On Tue, 13 Mar 2018 18:52:05 +0100 Dominik Brodowski 
 wrote:
>
> would it be possible to have
> 
>https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git 
> syscalls-next
> 
> added to -next? This will probably only be for one or two release cycles,
> but it might easen the development process of that patch series.

Added from today.  I put it near the start since Linus said he may take
(some of) it this release.

Thanks for adding your subsystem tree as a participant of linux-next.  As
you may know, this is not a judgement of your code.  The purpose of
linux-next is for integration testing and to lower the impact of
conflicts between subsystems in the next merge window. 

You will need to ensure that the patches/commits in your tree/series have
been:
 * submitted under GPL v2 (or later) and include the Contributor's
Signed-off-by,
 * posted to the relevant mailing list,
 * reviewed by you (or another maintainer of your subsystem tree),
 * successfully unit tested, and 
 * destined for the current or next Linux merge window.

Basically, this should be just what you would send to Linus (or ask him
to fetch).  It is allowed to be rebased if you deem it necessary.

-- 
Cheers,
Stephen Rothwell 
s...@canb.auug.org.au

pgpPZcLtBfF2O.pgp
Description: OpenPGP digital signature

Re: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register

2018-03-13 Thread Liang, Kan




On 3/13/2018 4:24 PM, Kroening, Gary wrote:

I've tested this patch on the same set of hubless (single-segment) and scalable 
(segment-per-socket) configurations as for Kan's version 1.

As far as we can tell this will also work for Cascade Lake, but will need 
revisiting for Ice Lake.


For Ice lake, there will be another method to query the number of CHAs.
This patch is only for Skylake server.

Thanks,
Kan



Thanks.
Gary


-Original Message-
From: kan.li...@linux.intel.com [mailto:kan.li...@linux.intel.com]
Sent: Tuesday, March 13, 2018 1:52 PM
To: mi...@redhat.com; h...@zytor.com; t...@linutronix.de;
pet...@infradead.org; andy.shevche...@gmail.com
Cc: Kroening, Gary; Travis, Mike; Banman, Andrew; Sivanich, Dimitri;
Anderson, Russ; x...@kernel.org; linux-kernel@vger.kernel.org; Kan Liang
Subject: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from
CAPID6 register

From: Kan Liang 

The number of CHAs is miscalculated on multi PCI domain systems on
Skylake server.

(From Kroening, Gary:

For systems with a single PCI segment, it is sufficient to look for the
bus number to change in order to determine that all of the CHa's have
been counted for a single socket.
However, for multi PCI segment systems, each socket is given a new
segment and the bus number does NOT change.  So looking only for the
bus number to change ends up counting all of the CHa's on all sockets
in the system.  This leads to writing CPU MSRs beyond a valid range and
causes an error in ivbep_uncore_msr_init_box().)

To determine the number of CHAs, it should read bits 27:0 in the CAPID6
register located at Device 30, Function 3, Offset 0x9C. These 28 bits
form a bit vector of available LLC slices and the CHAs that manage those
slices.

Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore
support")
Reported-by: Kroening, Gary 
Signed-off-by: Kan Liang 
---

Changes since V1:
  - add missed pci_dev_put()
  - Drop ugly casting by using hweight32()
  - Add comments for macros.

  arch/x86/events/intel/uncore_snbep.c | 31 +--
  1 file changed, 17 insertions(+), 14 deletions(-)

diff --git a/arch/x86/events/intel/uncore_snbep.c
b/arch/x86/events/intel/uncore_snbep.c
index 6d8044a..8970f71 100644
--- a/arch/x86/events/intel/uncore_snbep.c
+++ b/arch/x86/events/intel/uncore_snbep.c
@@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[]
= {
NULL,
  };

+/*
+ * To determine the number of CHAs, it should read bits 27:0 in the
CAPID6
+ * register which located at Device 30, Function 3, Offset 0x9C. PCI ID
0x2083.
+ */
+#define SKX_CAPID6 0x9c
+#define SKX_CHA_BIT_MASK   GENMASK(27, 0)
+
  static int skx_count_chabox(void)
  {
-   struct pci_dev *chabox_dev = NULL;
-   int bus, count = 0;
+   struct pci_dev *dev = NULL;
+   u32 val = 0;

-   while (1) {
-   chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d,
chabox_dev);
-   if (!chabox_dev)
-   break;
-   if (count == 0)
-   bus = chabox_dev->bus->number;
-   if (bus != chabox_dev->bus->number)
-   break;
-   count++;
-   }
+   dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev);
+   if (!dev)
+   goto out;

-   pci_dev_put(chabox_dev);
-   return count;
+   pci_read_config_dword(dev, SKX_CAPID6, );
+   val &= SKX_CHA_BIT_MASK;
+out:
+   pci_dev_put(dev);
+   return hweight32(val);
  }

  void skx_uncore_cpu_init(void)
--
2.7.4

[PATCH v3 2/2] kbuild: Don't mess with the .cache.mk when root

2018-03-13 Thread Douglas Anderson

As pointed out by Masahiro Yamada people often run "sudo make install"
or "sudo make modules_install".  In theory, that could cause a cache
file (or the directory containing it) to be created by root.  After
doing this then subsequent invocations to make would yell with a whole
bunch of warnings like:

  /bin/sh: ./.cache.mk: Permission denied

These yells would be mostly harmless (we'd just go on running without
the cache), but they're ugly.

The above situation would be fairly unlikely because it should only
happen if someone does "sudo make install" before doing a normal
"make", which is an invalid thing to do.  If you did a normal "make"
before the "sudo make install" then all the cache files and
directories should have already been created by a normal user and,
even if the superuser needed to add to the existing files, we wouldn't
end up with a permission problem.

The above situation would also not have been catastrophic because
presumably if the user was able to run "sudo make install" then they
should also be able to run "sudo make clean" to clean things up.

However, even though the problem described is relatively minor, it
probably makes sense to add a heuristic to avoid creating cache files
when we're running as root.  This heuristic doesn't add a ton of
overhead and it might save someone time tracking down strange
"Permission denied" messages.  We'll consider this heuristic good
enough because the problem really shouldn't be that serious.

Fixes: 3298b690b21c ("kbuild: Add a cache for generated variables")
Suggested-by: Masahiro Yamada 
Signed-off-by: Douglas Anderson 
---

Changes in v3:
- Use "uid 0" as the heuristic instead of install
- Do the checking in the main Makefile instead of Kbuild.include

Changes in v2: None

 Makefile   | 6 ++
 scripts/Kbuild.include | 2 ++
 2 files changed, 8 insertions(+)

diff --git a/Makefile b/Makefile
index f1e61470640b..c4d2131831ba 100644
--- a/Makefile
+++ b/Makefile
@@ -268,6 +268,12 @@ __build_one_by_one:
 
 else
 
+# Don't create Makefile caches if running as root since they can't be deleted
+# easily; in the real world we might be root when doing "sudo make install"
+ifeq ($(shell id -u),0)
+export KBUILD_NOCACHE := 1
+endif
+
 # We need some generic definitions (do not try to remake the file).
 scripts/Kbuild.include: ;
 include scripts/Kbuild.include
diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include
index 065324a8046f..581f93f20119 100644
--- a/scripts/Kbuild.include
+++ b/scripts/Kbuild.include
@@ -137,12 +137,14 @@ __sanitize-opt = $(subst $$,_,$(subst 
$(right_paren),_,$(subst $(left_paren),_,$
 define __run-and-store
 ifeq ($(origin $(1)),undefined)
   $$(eval $(1) := $$(shell $$(2)))
+ifneq ($(KBUILD_NOCACHE),1)
 ifeq ($(create-cache-dir),1)
   $$(shell mkdir -p $(dir $(make-cache)))
   $$(eval create-cache-dir :=)
 endif
   $$(shell echo '$(1) := $$($(1))' >> $(make-cache))
 endif
+endif
 endef
 __shell-cached = $(eval $(call __run-and-store,$(1)))$($(1))
 shell-cached = $(call __shell-cached,__cached_$(call __sanitize-opt,$(1)),$(1))
-- 
2.16.2.660.g709887971b-goog

RE: Dell Inc. XPS 13 9343/0TM99H fails to boot v4.16-rc5

2018-03-13 Thread Mario.Limonciello

> -Original Message-
> From: Dominik Brodowski [mailto:li...@dominikbrodowski.net]
> Sent: Tuesday, March 13, 2018 1:51 PM
> To: Limonciello, Mario 
> Cc: dvh...@infradead.org; platform-driver-...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: Re: Dell Inc. XPS 13 9343/0TM99H fails to boot v4.16-rc5
> 
> On Mon, Mar 12, 2018 at 10:42:01PM +, mario.limoncie...@dell.com wrote:
> >
> >
> > > -Original Message-
> > > From: Dominik Brodowski [mailto:li...@dominikbrodowski.net]
> > > Sent: Tuesday, March 13, 2018 2:54 AM
> > > To: dvh...@infradead.org; Limonciello, Mario 
> > > Cc: platform-driver-...@vger.kernel.org; linux-kernel@vger.kernel.org
> > > Subject: Dell Inc. XPS 13 9343/0TM99H fails to boot v4.16-rc5
> > >
> > > Mario,
> > >
> > > unfortunately, my Dell Inc. XPS 13 9343/0TM99H, BIOS A11 12/08/2016 fails 
> > > to
> > > boot v4.16-rc5. More exactly, I could bisect it down to commit 25d47027e10
> > > ("platform/x86: dell-smbios: Link all dell-smbios-* modules together").
> > > Usually, I have enabled
> > >
> > > CONFIG_SENSORS_DELL_SMM=y
> > > CONFIG_DELL_SMBIOS=y
> > > CONFIG_DELL_SMBIOS_WMI=y
> > > CONFIG_DELL_SMBIOS_SMM=y
> > > CONFIG_DELL_LAPTOP=y
> > > CONFIG_DELL_WMI=y
> > > CONFIG_DELL_WMI_DESCRIPTOR=y
> > > # CONFIG_DELL_WMI_AIO is not set
> > > # CONFIG_DELL_WMI_LED is not set
> > > # CONFIG_DELL_SMO8800 is not set
> > > # CONFIG_DELL_RBTN is not set
> > > # CONFIG_DELL_RBU is not set
> > >
> > > For v4.16-rc5 to work, I need to manually disable DELL_SMBIOS_WMI:
> > >
> > > -CONFIG_DELL_SMBIOS_WMI=y
> > > +# CONFIG_DELL_SMBIOS_WMI is not set
> > >
> > > Any ideas?
> > >
> > Dominick,
> >
> > Interesting.  Can you please change CONFIG_DELL_SMBIOS to a module
> > and see if that behavior persists?  If it does, can you please blacklist it 
> > on
> > the kernel command line and try to load it manually and share any
> > backtrace?
> 
> Mario,
> 
> building and running it as a *module* works flawlessly. But that was
> actually expected after a 'grep "initcall"' in drivers/platform/x86:

Dominck,

Thanks for your checking and explanation of the situation.

> 
> As Darren pointed out, DELL_SMBIOS_WMI depends on ACPI_WMI, so probably
> ACPI_WMI needs to be initialized first. However, the all-in-one
> dell-smbios.o is run as subsys_initcall(), same as wmi.o
> (subsys_initcall_sync() there).
> 
> If both are built-ins, that means that dell-smbios.o is run first, and wmi.o
> second. Changing dell-smbios.o to run at the later fs_initcall() level
> instead lets me boot the kernel. HOWEVER:
> 
> 1) Is there a reason why both the core and the dell-smbios-smm driver have
>to run already at subsys_initcall() time? They did so previous to your
>patch. Is it OK to defer these parts opf the all-in-one dell-smbios.o
>to fs_initcall(), or even to the default device_initcall()?

As long as they're ready before dell-laptop's initialization which uses
late_initcall that should be fine.

Am I correct to presume you're going to propose a patch you can test and
confirm your hypothesis rather than Darren reverting my patch to bring
them together?

> 
> 2) dell-smbios-wmi depends on (well, selects) DELL_WMI_DESCRIPTOR. The
>dell-smbios-wmi is running at the default device_initcall() time, but
>(AFAICS) probably later than the initialization of dell-smbios-wmi.o.
>May I presume that this poses no additional problem?
> 

I "think" that should actually be fine.  The driver will use deferred probing
until it's ready meaning that dell-smbios-wmi should wait until 
dell-wmi-descriptor
is done with it's probing routine.

1 2 3 4 5 6 7 8 9 10 >

1 - 100 of 3138 matches

Mail list logo