Re: [PATCH] arm64: dts: qcom: rename qcom,pcie devices to pcie
On Tue 13 Mar 05:13 PDT 2018, Arnd Bergmann wrote: > The node name for a PCIe host bridge must be "pcie" as required by > the binging. dtc now warns about it: > "binding", I presume... > arch/arm64/boot/dts/qcom/apq8096-db820c.dtb: Warning (pci_bridge): > /soc/agnoc@0/qcom,pcie@61: node name is not "pci" or "pcie" > arch/arm64/boot/dts/qcom/apq8096-db820c.dtb: Warning (pci_device_bus_num): > Failed prerequisite 'pci_bridge' > arch/arm64/boot/dts/qcom/msm8996-mtp.dtb: Warning (pci_bridge): > /soc/agnoc@0/qcom,pcie@61: node name is not "pci" or "pcie" > arch/arm64/boot/dts/qcom/msm8996-mtp.dtb: Warning (pci_device_bus_num): > Failed prerequisite 'pci_bridge' > > This renames the nodes as appropriate. > Reviewed-by: Bjorn AnderssonRegards, Bjorn
RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
> From: Dexuan Cui > Sent: Wednesday, March 7, 2018 13:40 > To: Lorenzo Pieralisi> Cc: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > ; Stephen Hemminger ; > o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; linux- > ker...@vger.kernel.org; driverdev-de...@linuxdriverproject.org; Haiyang > Zhang ; vkuzn...@redhat.com; > marcelo.ce...@canonical.com; Michael Kelley (EOSG) > ; sta...@vger.kernel.org; Jack > Morgenstein > Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > > From: Lorenzo Pieralisi > > Sent: Wednesday, March 7, 2018 04:35 > > On Tue, Mar 06, 2018 at 06:21:56PM +, Dexuan Cui wrote: > > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > > > Hyper-V VM with SR-IOV. This is because when we reach > > hv_compose_msi_msg() > > > by request_irq() -> request_threaded_irq() -> > > > __setup_irq()->irq_startup() > > > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > > > disabled in __setup_irq(). > > > > > > Fix this by polling the channel. > > > > > > 2. If the host is ejecting the VF device before we reach > > > hv_compose_msi_msg(), in a UP VM, we can hang in > hv_compose_msi_msg() > > > forever, because at this time the host doesn't respond to the > > > CREATE_INTERRUPT request. This issue also happens to old kernels like > > > v4.14, v4.13, etc. > > > > If you are fixing a problem you should report what commit you are fixing > > with a Fixes: tag and add a CC: sta...@vger.kernel.org to the commit log > > to send it to stable kernels to which it should be applied; mentioning > > kernel versions in the commit log is useless and should be omitted. > > Hi Lorenzo, > Thanks for your comments! > This patch does have a "Cc: sta...@vger.kernel.org" in the sign-off area. :-) > > Here the patch is made to resolve 2 issues: > #1 is triggered by the x86 global reservation mode (4900be8360) patch. > 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c > should be fixed. > > #2 is a longstanding issue since the first day the pci-hyperv driver was > accepted into the kernel. > > So IMO actually we don't really need to add a Fixes: tag, which is usually > used to specify a specific commit that introduces a bug that is being fixed. > > > Side note: you should not have sta...@vger.kernel.org in the email > > addresses CC list you are sending the patches to (you mark patches for > > stable by adding an appropriate CC tag in the commit log). > > Sorry, I didn't know this, but actually I didn't add sta...@vger.kernel.org > manually. Instead I used "git send-email" to send this patchset, and it told > me "The Cc list above has been expanded by additional addresses found > in the patch commit message." > > I didn't find a way to disable this behavior of "git send-email" by checking > its manual and googling it. This is strange. > > > Here: > > > > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst > > > > Last but not least, most of the patches in this series do not justify > > sending them to stable kernels at all so you should remove the > > corresponding tag from the patches. > > I hope at least these 2 patches can go into the stable kernels: > [PATCH v3 3/6] PCI: hv: serialize the present/eject work items > [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > Especially the second one, which fixes a real hang issue for UP virtual > machines running v4.15 and newer. > And, IMO the patches are small enough (<100 lines) , but definitely > the maintainers make the final call. > > > > > Thanks, > > Lorenzo > > > > > Fix this by polling the channel for the PCI_EJECT message and > > > hpdev->state, and by checking the PCI vendor ID. > > > > > > Note: actually the above issues also happen to a SMP VM, if > > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > > > > > Signed-off-by: Dexuan Cui > > > Tested-by: Adrian Suhov > > > Tested-by: Chris Valean > > > Cc: sta...@vger.kernel.org > > > Cc: Stephen Hemminger > > > Cc: K. Y. Srinivasan > > > Cc: Vitaly Kuznetsov > > > Cc: Jack Morgenstein > > > --- > > > drivers/pci/host/pci-hyperv.c | 58 > > > Thanks, > -- Dexuan Hi Lorenzo, Bjorn, and all, Do you need more ACKs? Currently Michael and Haiyang reviewed and ack'd the patchset. Should I send a v4 that just removes the "CC: sta...@vger.kernel.org" tag for patches 1, 2, 4 and 5? I tend to avoid a v4 as I supppose it would be easier if you just remove the tags if
[bug, bisected] pfifo_fast causes packet reordering
During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on Linux v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of packets are delivered out-of-order. We have tracked the problem down to the driver interface level, and it seems that the driver's net_device_ops.ndo_start_xmit() function gets the packets handed over in the wrong order. This behavior was not observed on Linux v4.15 and I have bisected the problem down to this patch: commit c5ad119fb6c09b0297446be05bd66602fa564758 Author: John FastabendDate: Thu Dec 7 09:58:19 2017 -0800 net: sched: pfifo_fast use skb_array This converts the pfifo_fast qdisc to use the skb_array data structure and set the lockless qdisc bit. pfifo_fast is the first qdisc to support the lockless bit that can be a child of a qdisc requiring locking. So we add logic to clear the lock bit on initialization in these cases when the qdisc graft operation occurs. This also removes the logic used to pick the next band to dequeue from and instead just checks a per priority array for packets from top priority to lowest. This might need to be a bit more clever but seems to work for now. Signed-off-by: John Fastabend Signed-off-by: David S. Miller The patch does not revert cleanly, but moving to one commit earlier makes the problem go away. Selecting the "fq" scheduler instead of "pfifo_fast" makes the problem go away as well. Is this an unintended side-effect of the patch or is there something the driver has to do to request in-order delivery? Thanks, Jakob
Re: [PATCH 18/31] perf vendor events arm64: Add armv8-recommended.json
Em Tue, Mar 13, 2018 at 03:26:18PM +0100, Ingo Molnar escreveu: > * Arnaldo Carvalho de Melo <a...@kernel.org> wrote: > > From: John Garry <john.ga...@huawei.com> > > Add JSON for ARMv8 IMPLEMENTATION DEFINED recommended events. > > The JSON is copied from ARMv8 architecture reference manual, available > > here: > > https://static.docs.arm.com/ddi0487/ca/DDI0487C_a_armv8_arm.pdf > > Signed-off-by: Shaokun Zhang <zhangshao...@hisilicon.com> > > Cc: Alexander Shishkin <alexander.shish...@linux.intel.com> > > Cc: Andi Kleen <a...@linux.intel.com> > > Cc: Ganapatrao Kulkarni <ganapatrao.kulka...@cavium.com> > > Cc: Jiri Olsa <jo...@redhat.com> > > Cc: Namhyung Kim <namhy...@kernel.org> > > Cc: Peter Zijlstra <pet...@infradead.org> > > Cc: Shaokun Zhang <zhangshao...@hisilicon.com> > > Cc: Will Deacon <will.dea...@arm.com> > > Cc: William Cohen <wco...@redhat.com> > > Cc: linux-arm-ker...@lists.infradead.org > > Cc: linux...@huawei.com > > Link: > > http://lkml.kernel.org/r/1520506716-197429-9-git-send-email-john.ga...@huawei.com > > Signed-off-by: John Garry <john.ga...@huawei.com> > > Signed-off-by: Arnaldo Carvalho de Melo <a...@redhat.com> > That's not a valid SOB chain, author != first-Signed-off-by. I removed that cset for now, can you please check if the perf-core-for-mingo-4.17-20180313-2 tag is allright? - Arnaldo
Re: [PATCH v4 09/24] fpga: dfl-pci: add enumeration for feature devices
On Tue, Feb 13, 2018 at 3:24 AM, Wu Haowrote: Hi Hao, Thanks again for splitting the pci part of the code from enumeration and everything else. One thing that may need to be fixed below, so with that fixed, adding my ack. > The Device Feature List (DFL) is implemented in MMIO, and features > are linked via the DFLs. This patch enables pcie driver to prepare > enumeration information (e.g locations of all device feature lists > in MMIO) and use common APIs provided by the Device Feature List > framework to enumerate each feature device linked. > > Signed-off-by: Tim Whisonant > Signed-off-by: Enno Luebbers > Signed-off-by: Shiva Rao > Signed-off-by: Christopher Rauer > Signed-off-by: Zhang Yi > Signed-off-by: Xiao Guangrong > Signed-off-by: Wu Hao Acked-by: Alan Tull > --- > v3: split from another patch > use common functions from DFL framework for enumeration. > v4: rebase > --- > drivers/fpga/dfl-pci.c | 199 > - > 1 file changed, 197 insertions(+), 2 deletions(-) > > diff --git a/drivers/fpga/dfl-pci.c b/drivers/fpga/dfl-pci.c > index d91ea42..8ce8a94 100644 > --- a/drivers/fpga/dfl-pci.c > +++ b/drivers/fpga/dfl-pci.c > @@ -22,9 +22,52 @@ > #include > #include > > +#include "dfl.h" > + > #define DRV_VERSION"0.8" > #define DRV_NAME "dfl-pci" > > +struct cci_drvdata { > + struct fpga_cdev *cdev; /* container device */ > + struct list_head regions; /* list of pci bar mapping region */ > +}; > + > +/* pci bar mapping info */ > +struct cci_region { > + int bar; > + void __iomem *ioaddr; /* pointer to mapped bar region */ > + struct list_head node; > +}; > + > +static void __iomem *cci_pci_ioremap_bar(struct pci_dev *pcidev, int bar) > +{ > + struct cci_drvdata *drvdata = pci_get_drvdata(pcidev); > + struct cci_region *region; > + > + list_for_each_entry(region, >regions, node) > + if (region->bar == bar) { > + dev_dbg(>dev, "BAR %d region exists\n", bar); > + return region->ioaddr; > + } > + > + region = devm_kzalloc(>dev, sizeof(*region), GFP_KERNEL); > + if (!region) > + return NULL; > + > + region->bar = bar; > + region->ioaddr = pci_ioremap_bar(pcidev, bar); > + if (!region->ioaddr) { > + dev_err(>dev, "can't ioremap memory from BAR %d.\n", > + bar); > + devm_kfree(>dev, region); > + return NULL; > + } > + > + list_add(>node, >regions); > + > + return region->ioaddr; > +} > + > /* PCI Device ID */ > #define PCIE_DEVICE_ID_PF_INT_5_X 0xBCBD > #define PCIE_DEVICE_ID_PF_INT_6_X 0xBCC0 > @@ -45,6 +88,143 @@ static struct pci_device_id cci_pcie_id_tbl[] = { > }; > MODULE_DEVICE_TABLE(pci, cci_pcie_id_tbl); > > +static int cci_init_drvdata(struct pci_dev *pcidev) > +{ > + struct cci_drvdata *drvdata; > + > + drvdata = devm_kzalloc(>dev, sizeof(*drvdata), GFP_KERNEL); > + if (!drvdata) > + return -ENOMEM; > + > + INIT_LIST_HEAD(>regions); > + > + pci_set_drvdata(pcidev, drvdata); > + > + return 0; > +} > + > +static void cci_pci_release_regions(struct pci_dev *pcidev) > +{ > + struct cci_drvdata *drvdata = pci_get_drvdata(pcidev); > + struct cci_region *tmp, *region; > + > + list_for_each_entry_safe(region, tmp, >regions, node) { > + list_del(>node); > + if (region->ioaddr) > + pci_iounmap(pcidev, region->ioaddr); > + devm_kfree(>dev, region); > + } > +} > + > +static void cci_remove_drvdata(struct pci_dev *pcidev) > +{ > + struct cci_drvdata *drvdata = pci_get_drvdata(pcidev); > + > + cci_pci_release_regions(pcidev); Would it make sense to call fpga_enum_info_free here? I understand fpga_enum_info_alloc uses devm, but it does a get_device which needs to be put. > + pci_set_drvdata(pcidev, NULL); > + devm_kfree(>dev, drvdata); > +} > + > +static void cci_remove_feature_devs(struct pci_dev *pcidev) > +{ > + struct cci_drvdata *drvdata = pci_get_drvdata(pcidev); > + > + /* remove all children feature devices */ > + fpga_remove_feature_devs(drvdata->cdev); > +} > + > +/* enumerate feature devices under pci device */ > +static int cci_enumerate_feature_devs(struct pci_dev *pcidev) > +{ > + struct cci_drvdata *drvdata = pci_get_drvdata(pcidev); > + struct fpga_cdev *cdev; > + struct fpga_enum_info *info; > + resource_size_t start, len; > + void __iomem *base; > + int port_num, bar, i, ret = 0; > + u32 offset; > + u64 v; > + > + /*
Re: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg()
On Tue, Mar 13, 2018 at 06:23:39PM +, Dexuan Cui wrote: > > From: Dexuan Cui > > Sent: Wednesday, March 7, 2018 13:40 > > To: Lorenzo Pieralisi> > Cc: bhelg...@google.com; linux-...@vger.kernel.org; KY Srinivasan > > ; Stephen Hemminger ; > > o...@aepfle.de; a...@canonical.com; jasow...@redhat.com; linux- > > ker...@vger.kernel.org; driverdev-de...@linuxdriverproject.org; Haiyang > > Zhang ; vkuzn...@redhat.com; > > marcelo.ce...@canonical.com; Michael Kelley (EOSG) > > ; sta...@vger.kernel.org; Jack > > Morgenstein > > Subject: RE: [PATCH v3 6/6] PCI: hv: fix 2 hang issues in > > hv_compose_msi_msg() > > > > > From: Lorenzo Pieralisi > > > Sent: Wednesday, March 7, 2018 04:35 > > > On Tue, Mar 06, 2018 at 06:21:56PM +, Dexuan Cui wrote: > > > > 1. With the patch "x86/vector/msi: Switch to global reservation mode" > > > > (4900be8360), the recent v4.15 and newer kernels always hang for 1-vCPU > > > > Hyper-V VM with SR-IOV. This is because when we reach > > > hv_compose_msi_msg() > > > > by request_irq() -> request_threaded_irq() -> > > > > __setup_irq()->irq_startup() > > > > -> __irq_startup() -> irq_domain_activate_irq() -> ... -> > > > > msi_domain_activate() -> ... -> hv_compose_msi_msg(), local irq is > > > > disabled in __setup_irq(). > > > > > > > > Fix this by polling the channel. > > > > > > > > 2. If the host is ejecting the VF device before we reach > > > > hv_compose_msi_msg(), in a UP VM, we can hang in > > hv_compose_msi_msg() > > > > forever, because at this time the host doesn't respond to the > > > > CREATE_INTERRUPT request. This issue also happens to old kernels like > > > > v4.14, v4.13, etc. > > > > > > If you are fixing a problem you should report what commit you are fixing > > > with a Fixes: tag and add a CC: sta...@vger.kernel.org to the commit log > > > to send it to stable kernels to which it should be applied; mentioning > > > kernel versions in the commit log is useless and should be omitted. > > > > Hi Lorenzo, > > Thanks for your comments! > > This patch does have a "Cc: sta...@vger.kernel.org" in the sign-off area. > > :-) > > > > Here the patch is made to resolve 2 issues: > > #1 is triggered by the x86 global reservation mode (4900be8360) patch. > > 4900be8360 in itself is good. It's just that drivers/pci/host/pci-hyperv.c > > should be fixed. > > > > #2 is a longstanding issue since the first day the pci-hyperv driver was > > accepted into the kernel. > > > > So IMO actually we don't really need to add a Fixes: tag, which is usually > > used to specify a specific commit that introduces a bug that is being fixed. > > > > > Side note: you should not have sta...@vger.kernel.org in the email > > > addresses CC list you are sending the patches to (you mark patches for > > > stable by adding an appropriate CC tag in the commit log). > > > > Sorry, I didn't know this, but actually I didn't add sta...@vger.kernel.org > > manually. Instead I used "git send-email" to send this patchset, and it told > > me "The Cc list above has been expanded by additional addresses found > > in the patch commit message." > > > > I didn't find a way to disable this behavior of "git send-email" by checking > > its manual and googling it. This is strange. > > > > > Here: > > > > > > git.kernel.org/.../Documentation/process/stable-kernel-rules.rst > > > > > > Last but not least, most of the patches in this series do not justify > > > sending them to stable kernels at all so you should remove the > > > corresponding tag from the patches. > > > > I hope at least these 2 patches can go into the stable kernels: > > [PATCH v3 3/6] PCI: hv: serialize the present/eject work items > > [PATCH v3 6/6] PCI: hv: fix 2 hang issues in hv_compose_msi_msg() > > Especially the second one, which fixes a real hang issue for UP virtual > > machines running v4.15 and newer. > > And, IMO the patches are small enough (<100 lines) , but definitely > > the maintainers make the final call. > > > > > > > > Thanks, > > > Lorenzo > > > > > > > Fix this by polling the channel for the PCI_EJECT message and > > > > hpdev->state, and by checking the PCI vendor ID. > > > > > > > > Note: actually the above issues also happen to a SMP VM, if > > > > "hbus->hdev->channel->target_cpu == smp_processor_id()" is true. > > > > > > > > Signed-off-by: Dexuan Cui > > > > Tested-by: Adrian Suhov > > > > Tested-by: Chris Valean > > > > Cc: sta...@vger.kernel.org > > > > Cc: Stephen Hemminger > > > > Cc: K. Y. Srinivasan > > > > Cc: Vitaly Kuznetsov > > > > Cc: Jack Morgenstein > > > > --- > > > > drivers/pci/host/pci-hyperv.c | 58 > > > > > > Thanks, > > -- Dexuan > > Hi Lorenzo,
[PATCH RFC 8/8] powerpc/64: barrier_nospec: Add commandline trigger
Copypasta from rfi implementation Signed-off-by: Michal Suchanek--- arch/powerpc/kernel/setup_64.c | 8 1 file changed, 8 insertions(+) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 4b67b7b877d9..257f0e6be107 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -840,6 +840,14 @@ static int __init handle_no_pti(char *p) } early_param("nopti", handle_no_pti); +static int __init handle_no_nospec(char *p) +{ + pr_info("barrier_nospec: disabled on command line."); + no_nospec = true; + return 0; +} +early_param("no_nospec", handle_no_nospec); + static void do_nothing(void *unused) { /* -- 2.13.6
[PATCH RFC 7/8] powerpc/64s: barrier_nospec: Add hcall triggerr
Copypasta from rfi implementation Signed-off-by: Michal Suchanek--- arch/powerpc/platforms/pseries/setup.c | 38 ++ 1 file changed, 25 insertions(+), 13 deletions(-) diff --git a/arch/powerpc/platforms/pseries/setup.c b/arch/powerpc/platforms/pseries/setup.c index 1a527625acf7..b779ddb8e250 100644 --- a/arch/powerpc/platforms/pseries/setup.c +++ b/arch/powerpc/platforms/pseries/setup.c @@ -459,38 +459,50 @@ static void __init find_and_init_phbs(void) of_pci_check_probe_only(); } -static void pseries_setup_rfi_flush(void) +static void pseries_setup_rfi_nospec(void) { struct h_cpu_char_result result; - enum l1d_flush_type types; - bool enable; + enum l1d_flush_type flush_types; + enum spec_barrier_type barrier_type; + bool flush_enable; + bool barrier_enable; long rc; /* Enable by default */ - enable = true; + flush_enable = true; + barrier_enable = true; + /* no fallback if the firmware does not tell us */ + barrier_type = SPEC_BARRIER_NONE; rc = plpar_get_cpu_characteristics(); if (rc == H_SUCCESS) { - types = L1D_FLUSH_NONE; + flush_types = L1D_FLUSH_NONE; if (result.character & H_CPU_CHAR_L1D_FLUSH_TRIG2) - types |= L1D_FLUSH_MTTRIG; + flush_types |= L1D_FLUSH_MTTRIG; if (result.character & H_CPU_CHAR_L1D_FLUSH_ORI30) - types |= L1D_FLUSH_ORI; + flush_types |= L1D_FLUSH_ORI; + if (result.character & H_CPU_CHAR_SPEC_BAR_ORI31) + barrier_type |= SPEC_BARRIER_ORI; /* Use fallback if nothing set in hcall */ - if (types == L1D_FLUSH_NONE) - types = L1D_FLUSH_FALLBACK; + if (flush_types == L1D_FLUSH_NONE) + flush_types = L1D_FLUSH_FALLBACK; if ((!(result.behaviour & H_CPU_BEHAV_L1D_FLUSH_PR)) || (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) - enable = false; + flush_enable = false; + + if ((!(result.behaviour & H_CPU_BEHAV_BNDS_CHK_SPEC_BAR)) || + (!(result.behaviour & H_CPU_BEHAV_FAVOUR_SECURITY))) + barrier_enable = false; } else { /* Default to fallback if case hcall is not available */ - types = L1D_FLUSH_FALLBACK; + flush_types = L1D_FLUSH_FALLBACK; } - setup_rfi_flush(types, enable); + setup_barrier_nospec(barrier_type, barrier_enable); + setup_rfi_flush(flush_types, flush_enable); } #ifdef CONFIG_PCI_IOV @@ -666,7 +678,7 @@ static void __init pSeries_setup_arch(void) fwnmi_init(); - pseries_setup_rfi_flush(); + pseries_setup_rfi_nospec(); /* By default, only probe PCI (can be overridden by rtas_pci) */ pci_add_flags(PCI_PROBE_ONLY); -- 2.13.6
[PATCH v4] dmaengine: pl330: flush before wait, and add dev burst support.
Do DMAFLUSHP _before_ the first DMAWFP to ensure controller and peripheral are in agreement about dma request state before first transfer. Add support for burst transfers to/from peripherals. In the new scheme, the controller does as many burst transfers as it can then transfers the remaining dregs with either single transfers for peripherals, or with a reduced size burst for memory-to-memory transfers. Signed-off-by: Frank Mori HessTested-by: Frank Mori Hess --- I tested dma transfers to peripherals with v3 patch and designware serial port (drivers/tty/serial/8250/8250_dw.c) and a GPIB interface (https://github.com/fmhess/fmh_gpib_core). I tested memory-to-memory transfers using the dmatest module. v3 of this patch should be the same as v2 except with checkpatch.pl warnings and errors cleaned up. v4 addresses cosmetic complaints about v3, should be functionally unchanged. drivers/dma/pl330.c | 209 +++- 1 file changed, 159 insertions(+), 50 deletions(-) diff --git a/drivers/dma/pl330.c b/drivers/dma/pl330.c index d7327fd5f445..819a578e317f 100644 --- a/drivers/dma/pl330.c +++ b/drivers/dma/pl330.c @@ -27,6 +27,7 @@ #include #include #include +#include #include "dmaengine.h" #define PL330_MAX_CHAN 8 @@ -1094,51 +1095,96 @@ static inline int _ldst_memtomem(unsigned dry_run, u8 buf[], return off; } -static inline int _ldst_devtomem(struct pl330_dmac *pl330, unsigned dry_run, -u8 buf[], const struct _xfer_spec *pxs, -int cyc) +static u32 _emit_load(unsigned int dry_run, u8 buf[], + enum pl330_cond cond, enum dma_transfer_direction direction, + u8 peri) { int off = 0; - enum pl330_cond cond; - if (pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP) - cond = BURST; - else - cond = SINGLE; + switch (direction) { + case DMA_MEM_TO_MEM: + /* fall through */ + case DMA_MEM_TO_DEV: + off += _emit_LD(dry_run, [off], cond); + break; - while (cyc--) { - off += _emit_WFP(dry_run, [off], cond, pxs->desc->peri); - off += _emit_LDP(dry_run, [off], cond, pxs->desc->peri); - off += _emit_ST(dry_run, [off], ALWAYS); + case DMA_DEV_TO_MEM: + if (cond == ALWAYS) { + off += _emit_LDP(dry_run, [off], SINGLE, + peri); + off += _emit_LDP(dry_run, [off], BURST, + peri); + } else { + off += _emit_LDP(dry_run, [off], cond, + peri); + } + break; - if (!(pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP)) - off += _emit_FLUSHP(dry_run, [off], - pxs->desc->peri); + default: + /* this code should be unreachable */ + WARN_ON(1); + break; } return off; } -static inline int _ldst_memtodev(struct pl330_dmac *pl330, +static inline u32 _emit_store(unsigned int dry_run, u8 buf[], + enum pl330_cond cond, enum dma_transfer_direction direction, + u8 peri) +{ + int off = 0; + + switch (direction) { + case DMA_MEM_TO_MEM: + /* fall through */ + case DMA_DEV_TO_MEM: + off += _emit_ST(dry_run, [off], cond); + break; + + case DMA_MEM_TO_DEV: + if (cond == ALWAYS) { + off += _emit_STP(dry_run, [off], SINGLE, + peri); + off += _emit_STP(dry_run, [off], BURST, + peri); + } else { + off += _emit_STP(dry_run, [off], cond, + peri); + } + break; + + default: + /* this code should be unreachable */ + WARN_ON(1); + break; + } + + return off; +} + +static inline int _ldst_peripheral(struct pl330_dmac *pl330, unsigned dry_run, u8 buf[], -const struct _xfer_spec *pxs, int cyc) +const struct _xfer_spec *pxs, int cyc, +enum pl330_cond cond) { int off = 0; - enum pl330_cond cond; if (pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP) cond = BURST; - else - cond = SINGLE; + /* +* do FLUSHP at beginning to clear any stale dma requests before the +* first WFP. +*/ + if (!(pl330->quirks & PL330_QUIRK_BROKEN_NO_FLUSHP)) + off += _emit_FLUSHP(dry_run, [off], pxs->desc->peri);
[PATCH RFC 3/8] powerpc/64: Use barrier_nospec in syscall entry
Signed-off-by: Michal Suchanek--- arch/powerpc/kernel/entry_64.S | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/powerpc/kernel/entry_64.S b/arch/powerpc/kernel/entry_64.S index 2cb5109a7ea3..7bfc4cf48af2 100644 --- a/arch/powerpc/kernel/entry_64.S +++ b/arch/powerpc/kernel/entry_64.S @@ -36,6 +36,7 @@ #include #include #include +#include #include #ifdef CONFIG_PPC_BOOK3S #include @@ -159,6 +160,7 @@ system_call:/* label this so stack traces look sane */ andi. r11,r10,_TIF_SYSCALL_DOTRACE bne .Lsyscall_dotrace /* does not return */ cmpldi 0,r0,NR_syscalls + barrier_nospec bge-.Lsyscall_enosys .Lsyscall: @@ -319,6 +321,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_HAS_PPR) ld r10,TI_FLAGS(r10) cmpldi r0,NR_syscalls + barrier_nospec blt+.Lsyscall /* Return code is already in r3 thanks to do_syscall_trace_enter() */ -- 2.13.6
[PATCH RFC 6/8] powerpc/64: barrier_nospec: Add debugfs trigger
Copypasta from rfi implementation Signed-off-by: Michal Suchanek--- arch/powerpc/kernel/setup_64.c | 35 +++ 1 file changed, 35 insertions(+) diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index d1d9f047161e..4b67b7b877d9 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -955,6 +955,41 @@ static __init int rfi_flush_debugfs_init(void) return 0; } device_initcall(rfi_flush_debugfs_init); + +static int barrier_nospec_set(void *data, u64 val) +{ + switch (val) { + case 0: + case 1: + break; + default: + return -EINVAL; + } + + if (!!val == !!barrier_nospec_enabled) + return 0; + + barrier_nospec_enable(!!val); + + return 0; +} + +static int barrier_nospec_get(void *data, u64 *val) +{ + *val = barrier_nospec_enabled ? 1 : 0; + return 0; +} + +DEFINE_SIMPLE_ATTRIBUTE(fops_barrier_nospec, + barrier_nospec_get, barrier_nospec_set, "%llu\n"); + +static __init int barrier_nospec_debugfs_init(void) +{ + debugfs_create_file("barrier_nospec", 0600, powerpc_debugfs_root, NULL, + _barrier_nospec); + return 0; +} +device_initcall(barrier_nospec_debugfs_init); #endif ssize_t cpu_show_meltdown(struct device *dev, struct device_attribute *attr, char *buf) -- 2.13.6
[PATCH RFC 0/8] powerpc barrier_nospec
Hello, this is patchset adding barrier_nospec on powerpc. It is based on the out-of-tree gmb() patch and the existing rfi patches. I do not have the tests for the Spectre/Meltdown issues available so this is untested. Feedback on the general approach as well as actual effectivity is welcome. Thanks Michal Michal Suchanek (8): powerpc: Add barrier_nospec powerpc: Use barrier_nospec in copy_from_user powerpc/64: Use barrier_nospec in syscall entry powerpc/64s: Add support for ori barrier_nospec powerpc/64: Patch barrier_nospec in modules powerpc/64: barrier_nospec: Add debugfs trigger powerpc/64s: barrier_nospec: Add hcall triggerr powerpc/64: barrier_nospec: Add commandline trigger arch/powerpc/include/asm/barrier.h| 9 arch/powerpc/include/asm/feature-fixups.h | 9 arch/powerpc/include/asm/setup.h | 11 + arch/powerpc/include/asm/uaccess.h| 11 - arch/powerpc/kernel/entry_64.S| 3 ++ arch/powerpc/kernel/module.c | 6 +++ arch/powerpc/kernel/setup_64.c| 72 +++ arch/powerpc/kernel/vmlinux.lds.S | 7 +++ arch/powerpc/lib/feature-fixups.c | 38 arch/powerpc/platforms/pseries/setup.c| 38 ++-- 10 files changed, 190 insertions(+), 14 deletions(-) -- 2.13.6
Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct
On Tue, 13 Mar 2018, Matthew Wilcox wrote: > On Tue, Mar 13, 2018 at 06:19:51PM +0100, Julia Lawall wrote: > > On Thu, 8 Mar 2018, Matthew Wilcox wrote: > > > On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote: > > > > Thanks. So it's OK to replace kmalloc and kzalloc, even though they > > > > didn't previously consider vmalloc and even though kmalloc doesn't zero? > > > > > > We'll also need to replace the corresponding places where those structs > > > are freed with kvfree(). Can coccinelle handle that too? > > > > Is the use of vmalloc a necessary part of the design? Or could there be a > > non vmalloc versions for call sites that are already ok with that? > > We can also add kmalloc_struct() along with kmalloc_ab_c that won't fall > back to vmalloc but just return NULL. It could be safer than being sure to find all of the relevant kfrees. julia
[PATCH] spi: bcm-qspi: fIX some error handling paths
For some reason, commit c0368e4db4a3 ("spi: bcm-qspi: Fix use after free in bcm_qspi_probe() in error path") has updated some gotos, but not all of them. This looks spurious, so fix it. Fixes: fa236a7ef240 ("spi: bcm-qspi: Add Broadcom MSPI driver") Signed-off-by: Christophe JAILLET--- I've added the same Fixes: tag as in commit c0368e4db4a3 even if the patch would not cleanly apply on top of it. At least commit 4e3b2d236fe0 ("spi: bcm-qspi: Add BSPI spi-nor flash controller driver") would also be needed. Also the label names could be improved. But this goes beyond the scope of this patch. --- drivers/spi/spi-bcm-qspi.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/spi/spi-bcm-qspi.c b/drivers/spi/spi-bcm-qspi.c index a172ab299e80..1596d35498c5 100644 --- a/drivers/spi/spi-bcm-qspi.c +++ b/drivers/spi/spi-bcm-qspi.c @@ -1247,7 +1247,7 @@ int bcm_qspi_probe(struct platform_device *pdev, qspi->base[MSPI] = devm_ioremap_resource(dev, res); if (IS_ERR(qspi->base[MSPI])) { ret = PTR_ERR(qspi->base[MSPI]); - goto qspi_probe_err; + goto qspi_resource_err; } } else { goto qspi_resource_err; @@ -1258,7 +1258,7 @@ int bcm_qspi_probe(struct platform_device *pdev, qspi->base[BSPI] = devm_ioremap_resource(dev, res); if (IS_ERR(qspi->base[BSPI])) { ret = PTR_ERR(qspi->base[BSPI]); - goto qspi_probe_err; + goto qspi_resource_err; } qspi->bspi_mode = true; } else { -- 2.14.1
Re: [PATCH 2/2] mm: Add kvmalloc_ab_c and kvzalloc_struct
On Tue, Mar 13, 2018 at 06:19:51PM +0100, Julia Lawall wrote: > On Thu, 8 Mar 2018, Matthew Wilcox wrote: > > On Thu, Mar 08, 2018 at 07:24:47AM +0100, Julia Lawall wrote: > > > Thanks. So it's OK to replace kmalloc and kzalloc, even though they > > > didn't previously consider vmalloc and even though kmalloc doesn't zero? > > > > We'll also need to replace the corresponding places where those structs > > are freed with kvfree(). Can coccinelle handle that too? > > Is the use of vmalloc a necessary part of the design? Or could there be a > non vmalloc versions for call sites that are already ok with that? We can also add kmalloc_struct() along with kmalloc_ab_c that won't fall back to vmalloc but just return NULL.
[PATCH v3 0/3] Bluetooth: hci_qca: Add serdev support
Hi, This patchset enables the Qualcomm BT controller QCA6174 node in the device tree of the db820c board. This allows the bluetooth chipset to be probed and registered against the hci layer by using the serdev framework. This patchset also contains the documentation for the compatible string "qcom,qca6174-bt" related to this chipset. v3: - Address comments for patch #3 (details in patch) v2: - Fix author email Thierry Escande (3): arm64: dts: apq8096-db820c: enable bluetooth node dt-bindings: net: bluetooth: Add qualcomm-bluetooth Bluetooth: hci_qca: Add serdev support .../devicetree/bindings/net/qualcomm-bluetooth.txt | 34 +++ arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi | 14 +++ .../boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi| 17 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi | 32 ++ arch/arm64/boot/dts/qcom/msm8996.dtsi | 10 ++ drivers/bluetooth/Kconfig | 2 +- drivers/bluetooth/hci_qca.c| 109 - 7 files changed, 215 insertions(+), 3 deletions(-) create mode 100644 Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt -- 2.14.1
[PATCH v3 3/3] Bluetooth: hci_qca: Add serdev support
Add support for Qualcomm serial slave devices. Probe the serial device, retrieve its maximum speed and register a new hci uart device. Signed-off-by: Thierry Escande--- v3: - Remove redundant call to gpiod_set_value() after devm_gpiod_get() - Check returned values for clk_set_rate() and clk_prepare_enable() - Use clk_disable_unprepare() drivers/bluetooth/Kconfig | 2 +- drivers/bluetooth/hci_qca.c | 109 +++- 2 files changed, 108 insertions(+), 3 deletions(-) diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig index 07e55cd8f8c8..c2a6a7ebd14b 100644 --- a/drivers/bluetooth/Kconfig +++ b/drivers/bluetooth/Kconfig @@ -195,7 +195,7 @@ config BT_HCIUART_BCM config BT_HCIUART_QCA bool "Qualcomm Atheros protocol support" - depends on BT_HCIUART + depends on BT_HCIUART_SERDEV select BT_HCIUART_H4 select BT_QCA help diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c index 05ec530b8a3a..aa0886d5324f 100644 --- a/drivers/bluetooth/hci_qca.c +++ b/drivers/bluetooth/hci_qca.c @@ -29,7 +29,11 @@ */ #include +#include #include +#include +#include +#include #include #include @@ -50,6 +54,9 @@ #define IBS_TX_IDLE_TIMEOUT_MS 2000 #define BAUDRATE_SETTLE_TIMEOUT_MS 300 +/* divclk4 rate */ +#define DIVCLK4_RATE_32KHZ 32768 + /* HCI_IBS transmit side sleep protocol states */ enum tx_ibs_states { HCI_IBS_TX_ASLEEP, @@ -111,6 +118,12 @@ struct qca_data { u64 votes_off; }; +struct qca_serdev { + struct hci_uart serdev_hu; + struct gpio_desc *bt_en; + struct clk *divclk4; +}; + static void __serial_clock_on(struct tty_struct *tty) { /* TODO: Some chipset requires to enable UART clock on client @@ -386,6 +399,7 @@ static void hci_ibs_wake_retrans_timeout(struct timer_list *t) /* Initialize protocol */ static int qca_open(struct hci_uart *hu) { + struct qca_serdev *qcadev; struct qca_data *qca; BT_DBG("hu %p qca_open", hu); @@ -444,6 +458,13 @@ static int qca_open(struct hci_uart *hu) timer_setup(>tx_idle_timer, hci_ibs_tx_idle_timeout, 0); qca->tx_idle_delay = IBS_TX_IDLE_TIMEOUT_MS; + if (hu->serdev) { + serdev_device_open(hu->serdev); + + qcadev = serdev_device_get_drvdata(hu->serdev); + gpiod_set_value(qcadev->bt_en, 1); + } + BT_DBG("HCI_UART_QCA open, tx_idle_delay=%u, wake_retrans=%u", qca->tx_idle_delay, qca->wake_retrans); @@ -512,6 +533,7 @@ static int qca_flush(struct hci_uart *hu) /* Close protocol */ static int qca_close(struct hci_uart *hu) { + struct qca_serdev *qcadev; struct qca_data *qca = hu->priv; BT_DBG("hu %p qca close", hu); @@ -525,6 +547,13 @@ static int qca_close(struct hci_uart *hu) destroy_workqueue(qca->workqueue); qca->hu = NULL; + if (hu->serdev) { + serdev_device_close(hu->serdev); + + qcadev = serdev_device_get_drvdata(hu->serdev); + gpiod_set_value(qcadev->bt_en, 0); + } + kfree_skb(qca->rx_skb); hu->priv = NULL; @@ -885,6 +914,14 @@ static int qca_set_baudrate(struct hci_dev *hdev, uint8_t baudrate) return 0; } +static inline void host_set_baudrate(struct hci_uart *hu, unsigned int speed) +{ + if (hu->serdev) + serdev_device_set_baudrate(hu->serdev, speed); + else + hci_uart_set_baudrate(hu, speed); +} + static int qca_setup(struct hci_uart *hu) { struct hci_dev *hdev = hu->hdev; @@ -905,7 +942,7 @@ static int qca_setup(struct hci_uart *hu) speed = hu->proto->init_speed; if (speed) - hci_uart_set_baudrate(hu, speed); + host_set_baudrate(hu, speed); /* Setup user speed if needed */ speed = 0; @@ -924,7 +961,7 @@ static int qca_setup(struct hci_uart *hu) ret); return ret; } - hci_uart_set_baudrate(hu, speed); + host_set_baudrate(hu, speed); } /* Setup patch / NVM configurations */ @@ -958,12 +995,80 @@ static struct hci_uart_proto qca_proto = { .dequeue= qca_dequeue, }; +static int qca_serdev_probe(struct serdev_device *serdev) +{ + struct qca_serdev *qcadev; + int err; + + qcadev = devm_kzalloc(>dev, sizeof(*qcadev), GFP_KERNEL); + if (!qcadev) + return -ENOMEM; + + qcadev->serdev_hu.serdev = serdev; + serdev_device_set_drvdata(serdev, qcadev); + + qcadev->bt_en = devm_gpiod_get(>dev, "bt-disable-n", + GPIOD_OUT_LOW); + if (IS_ERR(qcadev->bt_en)) { + dev_err(>dev, "failed to acquire bt-disable-n gpio\n"); + return
RE: [PATCH 2/3] dt-bindings: clock: Add bindings for ZynqMP clock driver
Hi Rob, > -Original Message- > From: Rob Herring [mailto:r...@kernel.org] > Sent: Wednesday, March 07, 2018 5:20 PM > To: Jolly Shah> Cc: mturque...@baylibre.com; sb...@codeaurora.org; > michal.si...@xilinx.com; mark.rutl...@arm.com; linux-...@vger.kernel.org; > devicet...@vger.kernel.org; Shubhrajyoti Datta ; linux- > ker...@vger.kernel.org; Rajan Vaja ; linux-arm- > ker...@lists.infradead.org > Subject: Re: [PATCH 2/3] dt-bindings: clock: Add bindings for ZynqMP clock > driver > > On Wed, Mar 7, 2018 at 4:47 PM, Jolly Shah wrote: > > Hi Rob, > > > > > >> -Original Message- > >> From: Rob Herring [mailto:r...@kernel.org] > >> Sent: Monday, March 05, 2018 5:46 PM > >> To: Jolly Shah > >> Cc: mturque...@baylibre.com; sb...@codeaurora.org; > >> michal.si...@xilinx.com; mark.rutl...@arm.com; > >> linux-...@vger.kernel.org; devicet...@vger.kernel.org; Shubhrajyoti > >> Datta ; linux- ker...@vger.kernel.org; Jolly > >> Shah ; Rajan Vaja ; > >> linux-arm-ker...@lists.infradead.org > >> Subject: Re: [PATCH 2/3] dt-bindings: clock: Add bindings for ZynqMP > >> clock driver > >> > >> On Wed, Feb 28, 2018 at 02:27:40PM -0800, Jolly Shah wrote: > >> > Add documentation to describe Xilinx ZynqMP clock driver bindings. > >> > > >> > Signed-off-by: Jolly Shah > >> > Signed-off-by: Rajan Vaja > >> > Signed-off-by: Shubhrajyoti Datta > >> > --- > > >> > +95 dpll_post_src > >> > +96 vpll_int > >> > +97 vpll_pre_src > >> > +98 vpll_half > >> > +99 vpll_int_mux > >> > +100vpll_post_src > >> > +101can0_mio > >> > +102can1_mio > >> > + > >> > +Example: > >> > + > >> > +clk: clk { > >> > + #clock-cells = <1>; > >> > + compatible = "xlnx,zynqmp-clk"; > >> > >> How do you control the clocks? > > > > Clocks are controlled by a dedicated platform management controller. Above > clock ids are used to identify clocks between master and PMU. > > What is the interface to the "platform management controller"? Because you > have no registers, I'm guessing a firmware interface? If so, then just define > the > firmware node as a clock provider. Yes it is firmware interface. Along with clocks, firmware interface also controls power and pinctrl operations as major. I am not sure if I understand you correctly. Do you suggest to register clocks through Firmware driver or just use firmware DT node as clock provider and clock driver DT node can reference clocks from FW node to register same? > > Rob
[PATCH v3 2/3] dt-bindings: net: bluetooth: Add qualcomm-bluetooth
Add binding document for serial bluetooth chips using Qualcomm protocol. Signed-off-by: Thierry Escande--- .../devicetree/bindings/net/qualcomm-bluetooth.txt | 34 ++ 1 file changed, 34 insertions(+) create mode 100644 Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt diff --git a/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt new file mode 100644 index ..288cf062e906 --- /dev/null +++ b/Documentation/devicetree/bindings/net/qualcomm-bluetooth.txt @@ -0,0 +1,34 @@ +Qualcomm Bluetooth Chips +- + +This documents the binding structure and common properties for serial +attached Qualcomm devices. + +Serial attached Qualcomm devices shall be a child node of the host UART +device the slave device is attached to. + +Required properties: + + - compatible: should contain one of the following: + * "qcom,qca6174-bt" + +Optional properties: + + - bt-disable-n-gpios: GPIO specifier, used to enable chip during probe + +Example: + +serial@757 { + pinctrl-names = "default", "sleep"; + pinctrl-0 = <_uart1_default>; + pinctrl-1 = <_uart1_sleep>; + + bluetooth { + compatible = "qcom,qca6174-bt"; + + bt-disable-n-gpios = <_gpios 19 GPIO_ACTIVE_HIGH>; + + pinctrl-names = "default"; + pinctrl-0 = <_en_pin_a>, <_pin_a>; + }; +}; -- 2.14.1
Re: [PATCH v4 4/6] ipc: Clamp msgmni and shmmni to the real IPCMNI limit
On 03/13/2018 02:17 PM, Eric W. Biederman wrote: > Waiman Longwrites: > >> A user can write arbitrary integer values to msgmni and shmmni sysctl >> parameters without getting error, but the actual limit is really >> IPCMNI (32k). This can mislead users as they think they can get a >> value that is not real. >> >> Enforcing the limit by failing the sysctl parameter write, however, >> can break existing user applications. > Which applications examples please. > > I am seeing this patchset late but it looks like a whole lot of changes > to avoid a theoretical possibility. > > Changes that have an impact on more than just the ipc code you are > patching. > > That makes me feel very uncomfortable with these changes. > > Eric This patchset is constructed to address a customer request that there is no easy way to find out the actual usable range of a sysctl parameter. In this particular case, the customer wants to use more than 32k shared memory segments. They can put in a large value into shmmni, but the application didn't work properly because shmmni was internally clamped to 32k without any visible sign that a smaller limit has been imposed. Out of a concern that there might be customers out there setting those sysctl parameters outside of the allowable range without knowing it, just enforcing the right limits may have the undesirable consequence of breaking their existing setup scripts. I don't have concrete example of what customers are doing that, but it won't look good if we wait until the complaints come in. The new code won't affect existing code unless the necessary flag is set. So would you mind elaborating what other impact do you see that will affect other non-IPC code in an undesirable way? Cheers, Longman
[RT PATCH 1/2] Revert "block: blk-mq: Use swait"
This reverts commit "block: blk-mq: Use swait". The issue remains but will be fixed differently. Signed-off-by: Sebastian Andrzej Siewior--- block/blk-core.c | 6 +++--- block/blk-mq.c | 8 include/linux/blkdev.h | 2 +- 3 files changed, 8 insertions(+), 8 deletions(-) diff --git a/block/blk-core.c b/block/blk-core.c index ff1258ca236c..f9ac6f169c67 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -799,7 +799,7 @@ int blk_queue_enter(struct request_queue *q, bool nowait) */ smp_rmb(); - ret = swait_event_interruptible(q->mq_freeze_wq, + ret = wait_event_interruptible(q->mq_freeze_wq, !atomic_read(>mq_freeze_depth) || blk_queue_dying(q)); if (blk_queue_dying(q)) @@ -819,7 +819,7 @@ static void blk_queue_usage_counter_release(struct percpu_ref *ref) struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); - swake_up_all(>mq_freeze_wq); + wake_up_all(>mq_freeze_wq); } static void blk_rq_timed_out_timer(unsigned long data) @@ -895,7 +895,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) q->bypass_depth = 1; __set_bit(QUEUE_FLAG_BYPASS, >queue_flags); - init_swait_queue_head(>mq_freeze_wq); + init_waitqueue_head(>mq_freeze_wq); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. diff --git a/block/blk-mq.c b/block/blk-mq.c index bbe43d32f71a..c5bd467dd97b 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -132,14 +132,14 @@ EXPORT_SYMBOL_GPL(blk_freeze_queue_start); void blk_mq_freeze_queue_wait(struct request_queue *q) { - swait_event(q->mq_freeze_wq, percpu_ref_is_zero(>q_usage_counter)); + wait_event(q->mq_freeze_wq, percpu_ref_is_zero(>q_usage_counter)); } EXPORT_SYMBOL_GPL(blk_mq_freeze_queue_wait); int blk_mq_freeze_queue_wait_timeout(struct request_queue *q, unsigned long timeout) { - return swait_event_timeout(q->mq_freeze_wq, + return wait_event_timeout(q->mq_freeze_wq, percpu_ref_is_zero(>q_usage_counter), timeout); } @@ -182,7 +182,7 @@ void blk_mq_unfreeze_queue(struct request_queue *q) WARN_ON_ONCE(freeze_depth < 0); if (!freeze_depth) { percpu_ref_reinit(>q_usage_counter); - swake_up_all(>mq_freeze_wq); + wake_up_all(>mq_freeze_wq); } } EXPORT_SYMBOL_GPL(blk_mq_unfreeze_queue); @@ -263,7 +263,7 @@ void blk_mq_wake_waiters(struct request_queue *q) * dying, we need to ensure that processes currently waiting on * the queue are notified as well. */ - swake_up_all(>mq_freeze_wq); + wake_up_all(>mq_freeze_wq); } bool blk_mq_can_queue(struct blk_mq_hw_ctx *hctx) diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index 6f278f1fd634..b68752bfb645 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -598,7 +598,7 @@ struct request_queue { struct throtl_data *td; #endif struct rcu_head rcu_head; - struct swait_queue_head mq_freeze_wq; + wait_queue_head_t mq_freeze_wq; struct percpu_ref q_usage_counter; struct list_headall_q_node; -- 2.16.2
Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory
On 12/03/18 09:28 PM, Sinan Kaya wrote: Maybe, dev parameter should also be struct pci_dev so that you can get rid of all to_pci_dev() calls in this code including find_parent_pci_dev() function. No, this was mentioned in v2. find_parent_pci_dev is necessary because the calling drivers aren't likely to have a bunch of struct pci_dev's for all the devices they are working with lying around. It's a much nicer from an API stand point to take struct devs and not worth it just to have a PCI API only taking struct pci_devs. Logan
Re: [GIT PULL] bcm2835-dt-next-2018-03-13
On 03/13/2018 10:30 AM, Eric Anholt wrote: > Hi Florian, > > The following changes since commit bcc76c4014dce4e3834dbd5b7f6593cbcfbfebe0: > > ARM: dts: bcm2835-rpi-zero-w: Enable OTG mode (2018-02-26 15:02:23 -0800) > > are available in the Git repository at: > > git://github.com/anholt/linux tags/bcm2835-dt-next-2018-03-13 > > for you to fetch changes up to ae67cfc4f6abb379e5bbe9a5616f00b23bbecb01: > > ARM: bcm2835: Add the DPI hardware to the device tree. (2018-03-12 11:44:29 > -0700) > > > This pull request brings in one last patch to add the BCM2835 DPI > controller to the device tree. > > > Eric Anholt (1): > ARM: bcm2835: Add the DPI hardware to the device tree. Merged, thanks Eric! -- Florian
Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory
On 13/03/18 11:49 AM, Sinan Kaya wrote: And there's also the ACS problem which means if you want to use P2P on the root ports you'll have to disable ACS on the entire system. (Or preferably, the IOMMU groups need to get more sophisticated to allow for dynamic changes). Do you think you can keep a pointer to the parent bridge instead of querying it via get_upstream_bridge_port() here so that we can reuse your pci_p2pdma_disable_acs() in the future. Keep a pointer where? pci_p2pdma_disable_acs() and pci_p2pdma_add_client() are used in completely different cases on completely different devices. There really is no overlap and no obvious place to store the port pointer (except in the struct pci_dev itself, in which case why not just call the function again). +int pci_p2pdma_disable_acs(struct pci_dev *pdev) +{ + struct pci_dev *up; + int pos; + u16 ctrl; + + up = get_upstream_bridge_port(pdev); + if (!up) + return 0; Some future device would only deal with pci_p2pdma_add_client(() for whitelisting instead of changing all of your code. That's a problem for whoever writes the future code. We should at least limit the usage of get_upstream_bridge_port() family of functions to probe time only. Why? We can tell iommu to do one to one translation by passing iommu.passthrough=1 to kernel command line to have identical behavior to your switch case. Well, someone will need to write code for all available IOMMUs to support this. That's a very big project. Logan
Re: [PATCH v3 3/3] arm64: dts: renesas: Add LVDS decoder to R-Car V3M Eagle
On Tue, Mar 13, 2018 at 03:30:25PM +0100, Jacopo Mondi wrote: > The R-Car V3M Eagle board includes a transparent THC63LVD1024 LVDS > decoder, connected to the on-chip LVDS encoder output on one side > and to HDMI encoder ADV7511w on the other one. > > As the decoder does not need any configuration it has been so-far > omitted from DTS. Now that a driver is available, describe it in DT > as well. > > Signed-off-by: Jacopo MondiI have marked this as deferred pending acceptance of the new binding. Please repost or ping me once that has happened.
Re: [3/3] irqchip/gic-v3: Bounds check redistributor accesses
Marc, On Tue, Mar 13, 2018 at 9:21 AM, Marc Zyngierwrote: > Hi Lokesh, > > On 13/03/18 13:38, Lokesh Vutla wrote: >> Hi All, >> >> On Wednesday 11 October 2017 03:11 PM, Punit Agrawal wrote: >>> The kernel crashes while iterating over a redistributor that is >>> in-correctly sized by the platform firmware or doesn't contain the last >>> record. >>> >>> Prevent the crash by checking accesses against the size of the region >>> provided by the firmware. While we are at it, warn the user about >>> incorrect region size. >>> >>> Signed-off-by: Punit Agrawal >>> Cc: Marc Zyngier >> >> Sorry to bring up an old thread. Just wanted to check what is the status >> on this series. > > So far, I wasn't inclined to merge it, as it only allowed you to detect > a broken system, as opposed to help with a working one. Is'nt that a good reason to have it? Why not help an error in dtb with a debug helper than an obtuse crash to debug painfully? > >> This will also be useful when we try to boot linux + hypervisor with >> less number of cores than the SoC supports. For example: >> - SoC has 4 cores and Linux tries to boot with 2 cores. >> - then a type-2 hypervisor gets installed. >> - Hypervisor tries to boot a VM with linux on core 1. >> >> Now the VM boot will fail while it iterates over all the GICR regions >> till GICR_TYPER is found. Hypervisor will trap any accesses to GICR >> regions of any invalid cpus(cpu 2, cpu 3 in this case). > > It you're passing the redistributors to a guest, you're doing something > terribly wrong. You're putting the guest in a position to do a DoS on > the hypervisor (disabling its timer interrupt, for example). Not the > greatest move. There is a number of other gotchas with this approach > (virtual interrupts, distributor virtualization...). > >> If the $patch is not the right approach, can you suggest on how to >> handle the above scenario? > > The proper way to handle this is to virtualize the distributor and > redistributor by trap/emulate. The only thing you can safely pass to a > guest is the CPU interface, either as system registers or in its MMIO > form (if you have the GICv2 compatibility interface). > Dumb question: Would'nt a trap emulate be real expensive with hyp context in v8 (all the context save/restore we'd have to do? (in the context of discussion, GICV2 compat is disabled). -- --- Regards, Nishanth Menon
Re: [PATCH v2 1/5] iommu/amd - Add debugfs support
On 03/13/2018 12:16 PM, Andy Shevchenko wrote: On Fri, Mar 9, 2018 at 2:50 AM, Gary R Hookwrote: + default n Redundant Roger that. +#include +#include +#include Keep in order? What order would that be? These few needed files are listed in the same order as which they appear in amd_iommu.c. I'm gonna need a preference spelled out, please (and a rationale, so I may better understand). +#include "amd_iommu_proto.h" +#include "amd_iommu_types.h" +/* DebugFS helpers */ +#defineOBUFP (obuf + oboff) +#defineOBUFLEN obuflen +#defineOBUFSPC (OBUFLEN - oboff) +#defineOSCNPRINTF(fmt, ...) \ + scnprintf(OBUFP, OBUFSPC, fmt, ## __VA_ARGS__) I don't see any advantages of this. Other way around, they will simple makes things hard to read an understand in place. I used this technique in the CCP driver code (where it was accepted), in an effort to do the opposite of what you claim: make the code more readable. Given the 80 column limit, a large number of arguments, and very long statements, IMO something needs to give. I don't find the use of #defines to be obfuscating. I'm not trying to argue, but rather simply state the perspective / reasoning I used to create a source file I feel is manageable. I have 17 more iommu patches built upon this strategy, and this seems to be advantageous for all of them. + for (i = start ; i <= end ; i++) Missed {} Wasn't sure about the M.O. given that the body of this loop is a single if statement. And I don't see anywhere in https://www.kernel.org/doc/html/latest/process/coding-style.html in section 3.1 where curly braces are called for in this situation. May I ask for clarification on the style rule, please? + if ((amd_iommu_dev_table[i].data[0] ^ 0x3) + || amd_iommu_dev_table[i].data[1]) + n++; + return n; +} + +static ssize_t amd_iommu_debugfs_dtecount_read(struct file *filp, + char __user *ubuf, + size_t count, loff_t *offp) +{ + struct amd_iommu *iommu = filp->private_data; + unsigned int obuflen = 512; Sounds like way too much. I can tune these up. + if (!iommu) + return 0; When this possible? It was intended as a sanity check, but if this happens, much worse has already gone wrong. I'll remove. + obuf = kmalloc(OBUFLEN, GFP_KERNEL); + if (!obuf) + return -ENOMEM; + + n = amd_iommu_count_valid_dtes(0, 0x); + oboff += OSCNPRINTF("%d\n", n); + return ret; +} @@ -89,6 +89,7 @@ #define ACPI_DEVFLAG_ATSDIS 0x1000 #define LOOP_TIMEOUT 10 + /* * ACPI table definitions * Doesn't belong to the patch. I'm sorry, I don't understand. The added blank line doesn't belong to the patch? +#endif + + Extra unneeded line. Thanks,
Re: [PATCH v2 5/5] iommu/amd - Add a debugfs entry to specify a IOMMU device table entry
On 03/13/2018 12:20 PM, Andy Shevchenko wrote: + oboff += OSCNPRINTF("%02x:%02x:%x (%u / %04x)\n", + PCI_BUS_NUM(amd_iommu_devid), + PCI_SLOT(amd_iommu_devid), + PCI_FUNC(amd_iommu_devid), Perhaps at some point we will have an extension to %p to print PCI BDFs. But until then ;-) + if (strnchr(obuf, OBUFLEN, ':')) + { Style D'oh! + } else if (obuf[0] == '0' && obuf[1] == 'x') { + n = sscanf(obuf, "%x", _iommu_devid); + } else { + n = sscanf(obuf, "%d", _iommu_devid); + } kstrtoint() ? I see various mechanisms for this sort of thing, and simply chose one. Am happy to use whatever is preferred.
Re: [PATCH v2 7/8] [PATCH 7/8] drivers/hwmon: Add a generic PECI hwmon client driver
Hi Stef, Thanks for sharing your time to test it. That is expected result in v2. Previously in v1, it used delayed creation on core temperature group so it was okay if hwmon driver is registered when client CPU is powered down, but in v2, the driver should check resolved cores at probing time to prevent the delayed creation on core temperature group and indexing gap which breaks hwmon subsystem's common rule, so I added peci_detect() into peci_new_device() in PECI core driver to check online status of the client CPU when registering a new device. You may need to use dynamic dtoverlay for loading/unloading a PECI hwmon driver according to the current client CPU power state. It means a PECI hwmon driver can be registered only when the client CPU is powered on. This design will be kept in v3 as well. Thanks, Jae On 3/13/2018 2:32 AM, Stef van Os wrote: Hi Jae, I tried version 1 and 2 of your PECI patch on our (AST2500 / Xeon E5 v4) system. The V1 patchset works as expected (reading back temperature 0 until PECI is up), but the hwmon driver probe fails with version 2. It communicates with the Xeon and assumes during kernel boot of the Aspeed that PECI to the Xeon's is already up and running, but our system enables the main Xeon supplies from AST2500 userspace. If I load the hwmon driver as a module to load later on, the driver does not call probe like e.g. a I2C driver on the I2C bus does. Am I using V2 wrongly? BR, Stef On 02/21/2018 05:16 PM, Jae Hyun Yoo wrote: This commit adds a generic PECI hwmon client driver implementation. Signed-off-by: Jae Hyun Yoo--- drivers/hwmon/Kconfig | 10 + drivers/hwmon/Makefile | 1 + drivers/hwmon/peci-hwmon.c | 928 + 3 files changed, 939 insertions(+) create mode 100644 drivers/hwmon/peci-hwmon.c diff --git a/drivers/hwmon/Kconfig b/drivers/hwmon/Kconfig index ef23553ff5cb..f22e0c31f597 100644 --- a/drivers/hwmon/Kconfig +++ b/drivers/hwmon/Kconfig @@ -1246,6 +1246,16 @@ config SENSORS_NCT7904 This driver can also be built as a module. If so, the module will be called nct7904. +config SENSORS_PECI_HWMON + tristate "PECI hwmon support" + depends on PECI + help + If you say yes here you get support for the generic PECI hwmon + driver. + + This driver can also be built as a module. If so, the module + will be called peci-hwmon. + config SENSORS_NSA320 tristate "ZyXEL NSA320 and compatible fan speed and temperature sensors" depends on GPIOLIB && OF diff --git a/drivers/hwmon/Makefile b/drivers/hwmon/Makefile index f814b4ace138..946f54b168e5 100644 --- a/drivers/hwmon/Makefile +++ b/drivers/hwmon/Makefile @@ -135,6 +135,7 @@ obj-$(CONFIG_SENSORS_NCT7802) += nct7802.o obj-$(CONFIG_SENSORS_NCT7904) += nct7904.o obj-$(CONFIG_SENSORS_NSA320) += nsa320-hwmon.o obj-$(CONFIG_SENSORS_NTC_THERMISTOR) += ntc_thermistor.o +obj-$(CONFIG_SENSORS_PECI_HWMON) += peci-hwmon.o obj-$(CONFIG_SENSORS_PC87360) += pc87360.o obj-$(CONFIG_SENSORS_PC87427) += pc87427.o obj-$(CONFIG_SENSORS_PCF8591) += pcf8591.o diff --git a/drivers/hwmon/peci-hwmon.c b/drivers/hwmon/peci-hwmon.c new file mode 100644 index ..edd27744adcb --- /dev/null +++ b/drivers/hwmon/peci-hwmon.c @@ -0,0 +1,928 @@ +// SPDX-License-Identifier: GPL-2.0 +// Copyright (c) 2018 Intel Corporation + +#include +#include +#include +#include +#include +#include +#include +#include + +#define DIMM_SLOT_NUMS_MAX 12 /* Max DIMM numbers (channel ranks x 2) */ +#define CORE_NUMS_MAX 28 /* Max core numbers (max on SKX Platinum) */ +#define TEMP_TYPE_PECI 6 /* Sensor type 6: Intel PECI */ + +#define CORE_TEMP_ATTRS 5 +#define DIMM_TEMP_ATTRS 2 +#define ATTR_NAME_LEN 24 + +#define DEFAULT_ATTR_GRP_NUMS 5 + +#define UPDATE_INTERVAL_MIN HZ +#define DIMM_MASK_CHECK_DELAY msecs_to_jiffies(5000) + +enum sign { + POS, + NEG +}; + +struct temp_data { + bool valid; + s32 value; + unsigned long last_updated; +}; + +struct temp_group { + struct temp_data tjmax; + struct temp_data tcontrol; + struct temp_data tthrottle; + struct temp_data dts_margin; + struct temp_data die; + struct temp_data core[CORE_NUMS_MAX]; + struct temp_data dimm[DIMM_SLOT_NUMS_MAX]; +}; + +struct core_temp_group { + struct sensor_device_attribute sd_attrs[CORE_TEMP_ATTRS]; + char attr_name[CORE_TEMP_ATTRS][ATTR_NAME_LEN]; + struct attribute *attrs[CORE_TEMP_ATTRS + 1]; + struct attribute_group attr_group; +}; + +struct dimm_temp_group { + struct sensor_device_attribute sd_attrs[DIMM_TEMP_ATTRS]; + char attr_name[DIMM_TEMP_ATTRS][ATTR_NAME_LEN]; + struct attribute *attrs[DIMM_TEMP_ATTRS + 1]; + struct attribute_group attr_group; +}; + +struct peci_hwmon { + struct peci_client *client; + struct device *dev; + struct device
Re: [PATCH v4 4/6] ipc: Clamp msgmni and shmmni to the real IPCMNI limit
Waiman Longwrites: > A user can write arbitrary integer values to msgmni and shmmni sysctl > parameters without getting error, but the actual limit is really > IPCMNI (32k). This can mislead users as they think they can get a > value that is not real. > > Enforcing the limit by failing the sysctl parameter write, however, > can break existing user applications. Which applications examples please. I am seeing this patchset late but it looks like a whole lot of changes to avoid a theoretical possibility. Changes that have an impact on more than just the ipc code you are patching. That makes me feel very uncomfortable with these changes. Eric > Instead, the range clamping flag > is set to enforce the limit without failing existing user code. Users > can easily figure out if the sysctl parameter value is out of range > by either reading back the parameter value or checking the kernel > ring buffer for warning. > > Signed-off-by: Waiman Long > --- > ipc/ipc_sysctl.c | 9 +++-- > 1 file changed, 7 insertions(+), 2 deletions(-) > > diff --git a/ipc/ipc_sysctl.c b/ipc/ipc_sysctl.c > index 8ad93c2..1955dd4 100644 > --- a/ipc/ipc_sysctl.c > +++ b/ipc/ipc_sysctl.c > @@ -99,6 +99,7 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, > int write, > static int zero; > static int one = 1; > static int int_max = INT_MAX; > +static int ipc_mni = IPCMNI; > > static struct ctl_table ipc_kern_table[] = { > { > @@ -120,7 +121,10 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, > int write, > .data = _ipc_ns.shm_ctlmni, > .maxlen = sizeof(init_ipc_ns.shm_ctlmni), > .mode = 0644, > - .proc_handler = proc_ipc_dointvec, > + .proc_handler = proc_ipc_dointvec_minmax, > + .extra1 = , > + .extra2 = _mni, > + .flags = CTL_FLAGS_CLAMP_RANGE, > }, > { > .procname = "shm_rmid_forced", > @@ -147,7 +151,8 @@ static int proc_ipc_auto_msgmni(struct ctl_table *table, > int write, > .mode = 0644, > .proc_handler = proc_ipc_dointvec_minmax, > .extra1 = , > - .extra2 = _max, > + .extra2 = _mni, > + .flags = CTL_FLAGS_CLAMP_RANGE, > }, > { > .procname = "auto_msgmni",
Re: [RFC PATCH linux-next] evm: evm_hmac can be static
On Wed, 2018-03-14 at 01:42 +0800, kbuild test robot wrote: > Fixes: c49fc264be98 ("evm: Move evm_hmac and evm_hash from evm_main.c to > evm_crypto.c") > Signed-off-by: Fengguang WuThanks! > --- > evm_crypto.c |4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/security/integrity/evm/evm_crypto.c > b/security/integrity/evm/evm_crypto.c > index fdde9cb..a46fba3 100644 > --- a/security/integrity/evm/evm_crypto.c > +++ b/security/integrity/evm/evm_crypto.c > @@ -37,8 +37,8 @@ static DEFINE_MUTEX(mutex); > > static unsigned long evm_set_key_flags; > > -char * const evm_hmac = "hmac(sha1)"; > -char * const evm_hash = "sha1"; > +static char * const evm_hmac = "hmac(sha1)"; > +static char * const evm_hash = "sha1"; > > /** > * evm_set_key() - set EVM HMAC key from the kernel
Re: [PATCH 3/3] Bluetooth: hci_qca: Add serdev support
On 13 March 2018 at 17:03, Thierry Escandewrote: > From: Thierry Escande > > Add support for Qualcomm serial slave devices. Probe the serial device, > retrieve its maximum speed and register a new hci uart device. > > Signed-off-by: Thierry Escande > --- > drivers/bluetooth/Kconfig | 2 +- > drivers/bluetooth/hci_qca.c | 102 > +++- > 2 files changed, 101 insertions(+), 3 deletions(-) > > diff --git a/drivers/bluetooth/Kconfig b/drivers/bluetooth/Kconfig > index 07e55cd8f8c8..c2a6a7ebd14b 100644 > --- a/drivers/bluetooth/Kconfig > +++ b/drivers/bluetooth/Kconfig > @@ -195,7 +195,7 @@ config BT_HCIUART_BCM > > config BT_HCIUART_QCA > bool "Qualcomm Atheros protocol support" > - depends on BT_HCIUART > + depends on BT_HCIUART_SERDEV Driver depends on both BT_HCIUART and BT_HCIUART_SERDEV. > @@ -386,6 +399,7 @@ static void hci_ibs_wake_retrans_timeout(struct > timer_list *t) > /* Initialize protocol */ > static int qca_open(struct hci_uart *hu) > { > + struct qca_serdev *qcadev; > struct qca_data *qca; > > BT_DBG("hu %p qca_open", hu); > @@ -444,6 +458,13 @@ static int qca_open(struct hci_uart *hu) > timer_setup(>tx_idle_timer, hci_ibs_tx_idle_timeout, 0); > qca->tx_idle_delay = IBS_TX_IDLE_TIMEOUT_MS; > > + if (hu->serdev) { > + serdev_device_open(hu->serdev); > + > + qcadev = serdev_device_get_drvdata(hu->serdev); > + gpiod_set_value(qcadev->bt_en, 1); I suggest to replace gpiod_set_value calls with the _cansleep version. You always set gpio value from sleepable context and this should avoid any further issue with different gpio controller. Regards, Loic
Re: [PATCH v2 1/3] KVM: X86: Provides userspace with a capability to not intercept MWAIT
Is there a need for a new API for yielding MONITOR/MWAIT to the guest? Why not just tie this to the guest CPUID.01H:ECX[MWAIT] being set? On Mon, Mar 12, 2018 at 4:53 AM, Wanpeng Liwrote: > From: Wanpeng Li > > Allowing a guest to execute MWAIT without interception enables a guest > to put a (physical) CPU into a power saving state, where it takes > longer to return from than what may be desired by the host. > > Don't give a guest that power over a host by default. (Especially, > since nothing prevents a guest from using MWAIT even when it is not > advertised via CPUID.) > > Cc: Paolo Bonzini > Cc: Radim Krčmář > Cc: Jan H. Schönherr > Signed-off-by: Wanpeng Li > --- > Documentation/virtual/kvm/api.txt | 23 ++- > arch/x86/include/asm/kvm_host.h | 2 ++ > arch/x86/kvm/svm.c| 2 +- > arch/x86/kvm/vmx.c| 9 + > arch/x86/kvm/x86.c| 24 > arch/x86/kvm/x86.h| 10 +- > include/uapi/linux/kvm.h | 2 +- > tools/include/uapi/linux/kvm.h| 2 +- > 8 files changed, 49 insertions(+), 25 deletions(-) > > diff --git a/Documentation/virtual/kvm/api.txt > b/Documentation/virtual/kvm/api.txt > index 98de506..76e5a15 100644 > --- a/Documentation/virtual/kvm/api.txt > +++ b/Documentation/virtual/kvm/api.txt > @@ -4358,6 +4358,20 @@ enables QEMU to build error log and branch to guest > kernel registered > machine check handling routine. Without this capability KVM will > branch to guests' 0x200 interrupt vector. > > +7.13 KVM_CAP_X86_DISABLE_EXITS > + > +Architectures: x86 > +Parameters: args[0] defines which exits are disabled > +Returns: 0 on success, -EINVAL when args[0] contains invalid exits > + > +Valid exits in args[0] are > + > +#define KVM_X86_DISABLE_EXITS_MWAIT(1 << 0) > + > +Enabling this capability on a VM provides userspace with a way to no > +longer intercepts some instructions for improved latency in some > +workloads. > + > 8. Other capabilities. > -- > > @@ -4470,15 +4484,6 @@ reserved. > Both registers and addresses are 64-bits wide. > It will be possible to run 64-bit or 32-bit guest code. > > -8.8 KVM_CAP_X86_GUEST_MWAIT > - > -Architectures: x86 > - > -This capability indicates that guest using memory monotoring instructions > -(MWAIT/MWAITX) to stop the virtual CPU will not cause a VM exit. As such > time > -spent while virtual CPU is halted in this way will then be accounted for as > -guest running time on the host (as opposed to e.g. HLT). > - > 8.9 KVM_CAP_ARM_USER_IRQ > > Architectures: arm, arm64 > diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h > index 0395c35..e107171 100644 > --- a/arch/x86/include/asm/kvm_host.h > +++ b/arch/x86/include/asm/kvm_host.h > @@ -811,6 +811,8 @@ struct kvm_arch { > > gpa_t wall_clock; > > + bool mwait_in_guest; > + > bool ept_identity_pagetable_done; > gpa_t ept_identity_map_addr; > > diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c > index be9c839..321b3fd 100644 > --- a/arch/x86/kvm/svm.c > +++ b/arch/x86/kvm/svm.c > @@ -1390,7 +1390,7 @@ static void init_vmcb(struct vcpu_svm *svm) > set_intercept(svm, INTERCEPT_XSETBV); > set_intercept(svm, INTERCEPT_RSM); > > - if (!kvm_mwait_in_guest()) { > + if (!kvm_mwait_in_guest(svm->vcpu.kvm)) { > set_intercept(svm, INTERCEPT_MONITOR); > set_intercept(svm, INTERCEPT_MWAIT); > } > diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c > index 6cefd7b..2302ae2 100644 > --- a/arch/x86/kvm/vmx.c > +++ b/arch/x86/kvm/vmx.c > @@ -3733,13 +3733,11 @@ static __init int setup_vmcs_config(struct > vmcs_config *vmcs_conf) > CPU_BASED_UNCOND_IO_EXITING | > CPU_BASED_MOV_DR_EXITING | > CPU_BASED_USE_TSC_OFFSETING | > + CPU_BASED_MWAIT_EXITING | > + CPU_BASED_MONITOR_EXITING | > CPU_BASED_INVLPG_EXITING | > CPU_BASED_RDPMC_EXITING; > > - if (!kvm_mwait_in_guest()) > - min |= CPU_BASED_MWAIT_EXITING | > - CPU_BASED_MONITOR_EXITING; > - > opt = CPU_BASED_TPR_SHADOW | > CPU_BASED_USE_MSR_BITMAPS | > CPU_BASED_ACTIVATE_SECONDARY_CONTROLS; > @@ -5531,6 +5529,9 @@ static u32 vmx_exec_control(struct vcpu_vmx *vmx) > exec_control |= CPU_BASED_CR3_STORE_EXITING | > CPU_BASED_CR3_LOAD_EXITING | > CPU_BASED_INVLPG_EXITING; > + if (kvm_mwait_in_guest(vmx->vcpu.kvm)) > + exec_control &= ~(CPU_BASED_MWAIT_EXITING | > + CPU_BASED_MONITOR_EXITING); > return exec_control; > } > > diff --git
Re: [PATCH v4 2/2] ARM: ftrace: Add MODULE_PLTS support
Hi! On 13/03/18 18:51, Ard Biesheuvel wrote: >>> if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) >>> module_plt_alloc_fixed(); >> Do you consider this a legal C code if without module-plts.o the function >> would not exist at all? >> That's too much relying on optimizer I think... >> > Yes, we rely on that in many different places in the kernel. https://www.kernel.org/doc/Documentation/process/coding-style.rst: "However, this approach still allows the C compiler to see the code inside the block, and check it for correctness (syntax, types, symbol references, etc). Thus, you still have to use an #ifdef if the code inside the block references symbols that will not exist if the condition is not met." But we can of course ignore it. -- Best regards, Alexander Sverdlin.
[v6 1/2] mm: disable interrupts while initializing deferred pages
Vlastimil Babka reported about a window issue during which when deferred pages are initialized, and the current version of on-demand initialization is finished, allocations may fail. While this is highly unlikely scenario, since this kind of allocation request must be large, and must come from interrupt handler, we still want to cover it. We solve this by initializing deferred pages with interrupts disabled, and holding node_size_lock spin lock while pages in the node are being initialized. The on-demand deferred page initialization that comes later will use the same lock, and thus synchronize with deferred_init_memmap(). It is unlikely for threads that initialize deferred pages to be interrupted. They run soon after smp_init(), but before modules are initialized, and long before user space programs. This is why there is no adverse effect of having these threads running with interrupts disabled. Signed-off-by: Pavel Tatashin--- include/linux/memory_hotplug.h | 53 ++ include/linux/mmzone.h | 5 ++-- mm/page_alloc.c| 19 --- 3 files changed, 42 insertions(+), 35 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index aba5f86eb038..2b0265265c28 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -51,24 +51,6 @@ enum { MMOP_ONLINE_MOVABLE, }; -/* - * pgdat resizing functions - */ -static inline -void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags) -{ - spin_lock_irqsave(>node_size_lock, *flags); -} -static inline -void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags) -{ - spin_unlock_irqrestore(>node_size_lock, *flags); -} -static inline -void pgdat_resize_init(struct pglist_data *pgdat) -{ - spin_lock_init(>node_size_lock); -} /* * Zone resizing functions * @@ -246,13 +228,6 @@ extern void clear_zone_contiguous(struct zone *zone); ___page;\ }) -/* - * Stub functions for when hotplug is off - */ -static inline void pgdat_resize_lock(struct pglist_data *p, unsigned long *f) {} -static inline void pgdat_resize_unlock(struct pglist_data *p, unsigned long *f) {} -static inline void pgdat_resize_init(struct pglist_data *pgdat) {} - static inline unsigned zone_span_seqbegin(struct zone *zone) { return 0; @@ -293,6 +268,34 @@ static inline bool movable_node_is_enabled(void) } #endif /* ! CONFIG_MEMORY_HOTPLUG */ +#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT) +/* + * pgdat resizing functions + */ +static inline +void pgdat_resize_lock(struct pglist_data *pgdat, unsigned long *flags) +{ + spin_lock_irqsave(>node_size_lock, *flags); +} +static inline +void pgdat_resize_unlock(struct pglist_data *pgdat, unsigned long *flags) +{ + spin_unlock_irqrestore(>node_size_lock, *flags); +} +static inline +void pgdat_resize_init(struct pglist_data *pgdat) +{ + spin_lock_init(>node_size_lock); +} +#else /* !(CONFIG_MEMORY_HOTPLUG || CONFIG_DEFERRED_STRUCT_PAGE_INIT) */ +/* + * Stub functions for when hotplug is off + */ +static inline void pgdat_resize_lock(struct pglist_data *p, unsigned long *f) {} +static inline void pgdat_resize_unlock(struct pglist_data *p, unsigned long *f) {} +static inline void pgdat_resize_init(struct pglist_data *pgdat) {} +#endif /* !(CONFIG_MEMORY_HOTPLUG || CONFIG_DEFERRED_STRUCT_PAGE_INIT) */ + #ifdef CONFIG_MEMORY_HOTREMOVE extern bool is_mem_section_removable(unsigned long pfn, unsigned long nr_pages); diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 7522a6987595..d14168da66a7 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -633,14 +633,15 @@ typedef struct pglist_data { #ifndef CONFIG_NO_BOOTMEM struct bootmem_data *bdata; #endif -#ifdef CONFIG_MEMORY_HOTPLUG +#if defined(CONFIG_MEMORY_HOTPLUG) || defined(CONFIG_DEFERRED_STRUCT_PAGE_INIT) /* * Must be held any time you expect node_start_pfn, node_present_pages * or node_spanned_pages stay constant. Holding this will also * guarantee that any pfn_valid() stays that way. * * pgdat_resize_lock() and pgdat_resize_unlock() are provided to -* manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG. +* manipulate node_size_lock without checking for CONFIG_MEMORY_HOTPLUG +* or CONFIG_DEFERRED_STRUCT_PAGE_INIT. * * Nests above zone->lock and zone->span_seqlock */ diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 3d974cb2a1a1..cada509e2176 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -1506,7 +1506,7 @@ static void __init deferred_free_pages(int nid, int zid, unsigned long pfn, } else if (!(pfn & nr_pgmask)) { deferred_free_range(pfn - nr_free, nr_free);
[v6 0/2] initialize pages on demand during boot
Change log: v5 - v6 - Fixed issues found by Andrew Morton: replaced cond_resched() with touch_nmi_watchdog(), instead of simply deleting it. - Removed useless pgdata_resize_lock_irq(), as regular pgdata_resize_lock() does exactly what is needed. - Included fixes to comments by Andrew from mm-initialize-pages-on-demand-during-boot-v5-fix.patch. v4 - v5 - Fix issue reported by Vlasimil Babka: > I've noticed that this function first disables the > on-demand initialization, and then runs the kthreads. > Doesn't that leave a window where allocations can fail? The > chances are probably small, but I think it would be better > to avoid it completely, rare failures suck. > > Fixing that probably means rethinking the whole > synchronization more dramatically though :/ - Introduced a new patch that uses node resize lock to synchronize on-demand deferred page initialization, and regular deferred page initialization. v3 - v4 - Fix !CONFIG_NUMA issue. v2 - v3 Andrew Morton's comments: - Moved read of pgdat->first_deferred_pfn into deferred_zone_grow_lock, thus got rid of READ_ONCE()/WRITE_ONCE() - Replaced spin_lock() with spin_lock_irqsave() in deferred_grow_zone - Updated comments for deferred_zone_grow_lock - Updated comment before deferred_grow_zone() explaining return value, and also noinline specifier. - Fixed comment before _deferred_grow_zone(). v1 - v2 Added Tested-by: Masayoshi Mizuma This change helps for three reasons: 1. Insufficient amount of reserved memory due to arguments provided by user. User may request some buffers, increased hash tables sizes etc. Currently, machine panics during boot if it can't allocate memory due to insufficient amount of reserved memory. With this change, it will be able to grow zone before deferred pages are initialized. One observed example is described in the linked discussion [1] Mel Gorman writes: " Yasuaki Ishimatsu reported a premature OOM when trace_buf_size=100m was specified on a machine with many CPUs. The kernel tried to allocate 38.4GB but only 16GB was available due to deferred memory initialisation. " The allocations in the above scenario happen per-cpu in smp_init(), and before deferred pages are initialized. So, there is no way to predict how much memory we should put aside to boot successfully with deferred page initialization feature compiled in. 2. The second reason is future proof. The kernel memory requirements may change, and we do not want to constantly update reset_deferred_meminit() to satisfy the new requirements. In addition, this function is currently in common code, but potentially would need to be split into arch specific variants, as more arches will start taking advantage of deferred page initialization feature. 3. On demand initialization of reserved pages guarantees that we will initialize only as many pages early in boot using only one thread as needed, the rest are going to be efficiently initialized in parallel. [1] https://www.spinics.net/lists/linux-mm/msg139087.html Pavel Tatashin (2): mm: disable interrupts while initializing deferred pages mm: initialize pages on demand during boot include/linux/memblock.h | 10 -- include/linux/memory_hotplug.h | 53 ++- include/linux/mmzone.h | 5 +- mm/memblock.c | 23 - mm/page_alloc.c| 202 +++-- 5 files changed, 186 insertions(+), 107 deletions(-) -- 2.16.2
[v6 2/2] mm: initialize pages on demand during boot
Deferred page initialization allows the boot cpu to initialize a small subset of the system's pages early in boot, with other cpus doing the rest later on. It is, however, problematic to know how many pages the kernel needs during boot. Different modules and kernel parameters may change the requirement, so the boot cpu either initializes too many pages or runs out of memory. To fix that, initialize early pages on demand. This ensures the kernel does the minimum amount of work to initialize pages during boot and leaves the rest to be divided in the multithreaded initialization path (deferred_init_memmap). The on-demand code is permanently disabled using static branching once deferred pages are initialized. After the static branch is changed to false, the overhead is up-to two branch-always instructions if the zone watermark check fails or if rmqueue fails. Sergey Senozhatsky noticed that while deferred pages currently make sense only on NUMA machines (we start one thread per latency node), CONFIG_NUMA is not a requirement for CONFIG_DEFERRED_STRUCT_PAGE_INIT, so that is also must be addressed in the patch. Signed-off-by: Pavel TatashinReviewed-by: Daniel Jordan Reviewed-by: Steven Sistare Reviewed-by: Andrew Morton Tested-by: Masayoshi Mizuma Acked-by: Mel Gorman --- include/linux/memblock.h | 10 --- mm/memblock.c| 23 -- mm/page_alloc.c | 183 +-- 3 files changed, 144 insertions(+), 72 deletions(-) diff --git a/include/linux/memblock.h b/include/linux/memblock.h index 8be5077efb5f..6c305afd95ab 100644 --- a/include/linux/memblock.h +++ b/include/linux/memblock.h @@ -417,21 +417,11 @@ static inline void early_memtest(phys_addr_t start, phys_addr_t end) { } #endif - -extern unsigned long memblock_reserved_memory_within(phys_addr_t start_addr, - phys_addr_t end_addr); #else static inline phys_addr_t memblock_alloc(phys_addr_t size, phys_addr_t align) { return 0; } - -static inline unsigned long memblock_reserved_memory_within(phys_addr_t start_addr, - phys_addr_t end_addr) -{ - return 0; -} - #endif /* CONFIG_HAVE_MEMBLOCK */ #endif /* __KERNEL__ */ diff --git a/mm/memblock.c b/mm/memblock.c index b6ba6b7adadc..80a12c64b203 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1778,29 +1778,6 @@ static void __init_memblock memblock_dump(struct memblock_type *type) } } -extern unsigned long __init_memblock -memblock_reserved_memory_within(phys_addr_t start_addr, phys_addr_t end_addr) -{ - struct memblock_region *rgn; - unsigned long size = 0; - int idx; - - for_each_memblock_type(idx, (), rgn) { - phys_addr_t start, end; - - if (rgn->base + rgn->size < start_addr) - continue; - if (rgn->base > end_addr) - continue; - - start = rgn->base; - end = start + rgn->size; - size += end - start; - } - - return size; -} - void __init_memblock __memblock_dump_all(void) { pr_info("MEMBLOCK configuration:\n"); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index cada509e2176..529e2dce7d16 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -292,40 +292,6 @@ EXPORT_SYMBOL(nr_online_nodes); int page_group_by_mobility_disabled __read_mostly; #ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT - -/* - * Determine how many pages need to be initialized during early boot - * (non-deferred initialization). - * The value of first_deferred_pfn will be set later, once non-deferred pages - * are initialized, but for now set it ULONG_MAX. - */ -static inline void reset_deferred_meminit(pg_data_t *pgdat) -{ - phys_addr_t start_addr, end_addr; - unsigned long max_pgcnt; - unsigned long reserved; - - /* -* Initialise at least 2G of a node but also take into account that -* two large system hashes that can take up 1GB for 0.25TB/node. -*/ - max_pgcnt = max(2UL << (30 - PAGE_SHIFT), - (pgdat->node_spanned_pages >> 8)); - - /* -* Compensate the all the memblock reservations (e.g. crash kernel) -* from the initial estimation to make sure we will initialize enough -* memory to boot. -*/ - start_addr = PFN_PHYS(pgdat->node_start_pfn); - end_addr = PFN_PHYS(pgdat->node_start_pfn + max_pgcnt); - reserved = memblock_reserved_memory_within(start_addr, end_addr); - max_pgcnt += PHYS_PFN(reserved); - - pgdat->static_init_pgcnt = min(max_pgcnt, pgdat->node_spanned_pages); - pgdat->first_deferred_pfn = ULONG_MAX; -} - /* Returns true if the struct page for the pfn is uninitialised */ static inline bool __meminit
Re: [PATCH v4 2/2] ARM: ftrace: Add MODULE_PLTS support
On 13 March 2018 at 18:24, Alexander Sverdlinwrote: > Hi! > > On 13/03/18 18:51, Ard Biesheuvel wrote: if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) module_plt_alloc_fixed(); >>> Do you consider this a legal C code if without module-plts.o the function >>> would not exist at all? >>> That's too much relying on optimizer I think... >>> >> Yes, we rely on that in many different places in the kernel. > > https://www.kernel.org/doc/Documentation/process/coding-style.rst: > "However, this approach still allows the C compiler to see the code > inside the block, and check it for correctness (syntax, types, symbol > references, etc). Thus, you still have to use an #ifdef if the code inside > the > block references symbols that will not exist if the condition is not met." > "will not exist" is ambiguous here. It is rather common to declare symbols, but only define them conditionally, and use IS_ENABLED() to refer to them. As the documentation says, this gets rid of #ifdefs, making the code always visible to the compiler which is a good thing.
Re: [PATCH v4 2/2] ARM: ftrace: Add MODULE_PLTS support
Hi! On 13/03/18 18:39, Ard Biesheuvel wrote: >> u32 get_module_plt(struct module *mod, unsigned long loc, Elf32_Addr >> val) >> { >> struct mod_plt_sec *pltsec = !in_init(mod, loc) ? >> >arch.core : >> >> >arch.init; >> + struct plt_entries *plt; >> + int idx; >> >> - struct plt_entries *plt = (struct plt_entries >> *)pltsec->plt->sh_addr; ^^^ (*) >> - int idx = 0; >> + /* cache the address, ELF header is available only during module >> load */ >> + if (!pltsec->plt_ent) >> + pltsec->plt_ent = (struct plt_entries >> *)pltsec->plt->sh_addr; >> + plt = pltsec->plt_ent; >> + (*) > Where is plt_ent ever used? Above is exactly the place it's used. I need to cache it because after the module load is finished the ELF header is freed, pltsec->plt pointer (*) is not valid any more. With the above modification it's possible to call the function during the whole life time of the module. >>> Right, ok. That's a problem. >>> >>> This means that you are relying on get_module_plt() being called at >>> least once at module load time, which is not guaranteed. >> This is indeed guaranteed. For FTRACE use case. If it's being called from >> FTRACE in >> run time, this would mean there were long calls in this module section, >> which in >> turn means, get_module_plt() was called at least once for this module and >> this >> section. >> >> This doesn't hold in general, though. >> >> In any case, if you insist, I can try to rework the whole stuff implementing >> module_finalize(). >> > I think it would be much better to use the module_finalize() hook for > this, given that it is only called once already, and all the data you > need is still available. No problem, but some kind of (*) block would still be required, because get_module_plt() has to work after module_finalize() as well as *before* it. So before module_finalize() we would have to dereference pltsec->plt->sh_addr conditionally. > Note that ARM already has such a function, so you'll need to add a > call there, i.e., > > if (IS_ENABLED(CONFIG_ARM_MODULE_PLTS)) > module_plt_alloc_fixed(); > > or something like that. The CONFIG_FTRACE dependency can be kept local > to module-plts.c > -- Best regards, Alexander Sverdlin.
Re: [PATCH] debugobjects: avoid another unused variable warning.
On 3/13/18 6:18 AM, Arnd Bergmann wrote: debug_objects_maxchecked is only updated in __debug_check_no_obj_freed(), and only read in debug_objects_maxchecked, unfortunately both of these are optional and depend on different Kconfig symbols. When both CONFIG_DEBUG_OBJECTS_FREE and CONFIG_DEBUG_FS are disabled, we get this warning: lib/debugobjects.c:56:14: error: 'debug_objects_maxchecked' defined but not used [-Werror=unused-variable] Rather than trying to add more complex #ifdef protections, this marks the variable as __maybe_unused so it can be silently dropped when usused. Thanks for catching this. Acked-by: Yang ShiFixes: bd9dcd046509 ("debugobjects: Export max loops counter") Signed-off-by: Arnd Bergmann --- lib/debugobjects.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/lib/debugobjects.c b/lib/debugobjects.c index 105ecfc47d8c..994be4805cec 100644 --- a/lib/debugobjects.c +++ b/lib/debugobjects.c @@ -53,7 +53,7 @@ static intobj_nr_tofree; static struct kmem_cache *obj_cache; static int debug_objects_maxchain __read_mostly; -static int debug_objects_maxchecked __read_mostly; +static int __maybe_unused debug_objects_maxchecked __read_mostly; static intdebug_objects_fixups __read_mostly; static intdebug_objects_warnings __read_mostly; static intdebug_objects_enabled __read_mostly
Re: [PATCH 0/3] irqchip: GIC kexec/kdump improvement and workarounds
On 13/03/18 17:51, Mark Rutland wrote: > On Tue, Mar 13, 2018 at 05:21:00PM +, Marc Zyngier wrote: >> As kexec and kdump are getting used a bit more intensively, I've been >> made aware of a number of shortcomings. >> >> The main gripe is from folks trying to launch a kdump kernel from >> within an interrupt handler. If using EOImode==1, things work as >> expected. If using EOImode==0 (such as in a guest), the secondary >> kernel hangs as the previous interrupt hasn't been EOI'd, and the >> active priority is still set. The first two patches are addressing >> this situation for both GICv2 and GICv3 by reseting the APRs to their >> default value. > > As a more general thing, if irqchip drivers have state that needs to be > reset in their init code, can we live all this irqchip reset to the > crashdump kernel, and kill machine_kexec_mask_interrupts() entirely? We could, once we know for sure that all the potential irqchips have been fixed. Or we could just remove it immediately, and see what breaks. > That would avoid some work (including pointer chasing on potentially > corrupt memory) in the kernel that crashed, making it more likely that > we get to the crashkernel intact... Seems perfectly sensible to me. M. -- Jazz is not dead. It just smells funny...
Re: [bug, bisected] pfifo_fast causes packet reordering
On Tue, Mar 13, 2018 at 11:24 AM, Jakob Unterwurzacherwrote: > During stress-testing our "ucan" USB/CAN adapter SocketCAN driver on Linux > v4.16-rc4-383-ged58d66f60b3 we observed that a small fraction of packets are > delivered out-of-order. > > We have tracked the problem down to the driver interface level, and it seems > that the driver's net_device_ops.ndo_start_xmit() function gets the packets > handed over in the wrong order. > > This behavior was not observed on Linux v4.15 and I have bisected the > problem down to this patch: > >> commit c5ad119fb6c09b0297446be05bd66602fa564758 >> Author: John Fastabend >> Date: Thu Dec 7 09:58:19 2017 -0800 >> >>net: sched: pfifo_fast use skb_array >> >>This converts the pfifo_fast qdisc to use the skb_array data structure >>and set the lockless qdisc bit. pfifo_fast is the first qdisc to >> support >>the lockless bit that can be a child of a qdisc requiring locking. So >>we add logic to clear the lock bit on initialization in these cases >> when >>the qdisc graft operation occurs. >> >>This also removes the logic used to pick the next band to dequeue from >>and instead just checks a per priority array for packets from top >> priority >>to lowest. This might need to be a bit more clever but seems to work >>for now. >> >>Signed-off-by: John Fastabend >>Signed-off-by: David S. Miller > > > The patch does not revert cleanly, but moving to one commit earlier makes > the problem go away. > > Selecting the "fq" scheduler instead of "pfifo_fast" makes the problem go > away as well. I am of course, a fan of obsoleting pfifo_fast. There's no good reason for it anymore. > > Is this an unintended side-effect of the patch or is there something the > driver has to do to request in-order delivery? > > Thanks, > Jakob -- Dave Täht CEO, TekLibre, LLC http://www.teklibre.com Tel: 1-669-226-2619
[PATCH RFC 4/8] powerpc/64s: Add support for ori barrier_nospec
Copypasta from rfi implementation Signed-off-by: Michal Suchanek--- arch/powerpc/include/asm/barrier.h| 4 ++-- arch/powerpc/include/asm/feature-fixups.h | 9 + arch/powerpc/include/asm/setup.h | 8 arch/powerpc/kernel/setup_64.c| 29 + arch/powerpc/kernel/vmlinux.lds.S | 7 +++ arch/powerpc/lib/feature-fixups.c | 27 +++ 6 files changed, 82 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 8e47b3abe405..4079a95e84c2 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -75,9 +75,9 @@ do { \ ___p1; \ }) -/* TODO: add patching so this can be disabled */ /* Prevent speculative execution past this barrier. */ -#define barrier_nospec_asm ori 31,31,0 +#define barrier_nospec_asm SPEC_BARRIER_FIXUP_SECTION; \ + nop #ifdef __ASSEMBLY__ #define barrier_nospec barrier_nospec_asm #else diff --git a/arch/powerpc/include/asm/feature-fixups.h b/arch/powerpc/include/asm/feature-fixups.h index 1e82eb3caabd..9d3382618ffd 100644 --- a/arch/powerpc/include/asm/feature-fixups.h +++ b/arch/powerpc/include/asm/feature-fixups.h @@ -195,11 +195,20 @@ label##3: \ FTR_ENTRY_OFFSET 951b-952b; \ .popsection; +#define SPEC_BARRIER_FIXUP_SECTION \ +953: \ + .pushsection __spec_barrier_fixup,"a"; \ + .align 2; \ +954: \ + FTR_ENTRY_OFFSET 953b-954b; \ + .popsection; + #ifndef __ASSEMBLY__ #include extern long __start___rfi_flush_fixup, __stop___rfi_flush_fixup; +extern long __start___spec_barrier_fixup, __stop___spec_barrier_fixup; void apply_feature_fixups(void); void setup_feature_keys(void); diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 469b7fdc9be4..486d02e4a310 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -49,8 +49,16 @@ enum l1d_flush_type { L1D_FLUSH_MTTRIG= 0x8, }; +/* These are bit flags */ +enum spec_barrier_type { + SPEC_BARRIER_NONE = 0x1, + SPEC_BARRIER_ORI= 0x2, +}; + void __init setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); +void __init setup_barrier_nospec(enum spec_barrier_type, bool enable); +void do_barrier_nospec_fixups(enum spec_barrier_type type); #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index c388cc3357fa..09f21a954bfc 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -815,6 +815,10 @@ static enum l1d_flush_type enabled_flush_types; static void *l1d_flush_fallback_area; static bool no_rfi_flush; bool rfi_flush; +enum spec_barrier_type powerpc_barrier_nospec; +static enum spec_barrier_type barrier_nospec_type; +static bool no_nospec; +bool barrier_nospec_enabled; static int __init handle_no_rfi_flush(char *p) { @@ -899,6 +903,31 @@ void __init setup_rfi_flush(enum l1d_flush_type types, bool enable) rfi_flush_enable(enable); } +void barrier_nospec_enable(bool enable) +{ + barrier_nospec_enabled = enable; + + if (enable) { + powerpc_barrier_nospec = barrier_nospec_type; + do_barrier_nospec_fixups(powerpc_barrier_nospec); + on_each_cpu(do_nothing, NULL, 1); + } else { + powerpc_barrier_nospec = SPEC_BARRIER_NONE; + do_barrier_nospec_fixups(powerpc_barrier_nospec); + } +} + +void __init setup_barrier_nospec(enum spec_barrier_type type, bool enable) +{ + if (type & SPEC_BARRIER_ORI) + pr_info("barrier_nospec: Using ori type flush\n"); + + barrier_nospec_type = type; + + if (!no_nospec) + barrier_nospec_enable(enable); +} + #ifdef CONFIG_DEBUG_FS static int rfi_flush_set(void *data, u64 val) { diff --git a/arch/powerpc/kernel/vmlinux.lds.S b/arch/powerpc/kernel/vmlinux.lds.S index c8af90ff49f0..744b58ff77f1 100644 --- a/arch/powerpc/kernel/vmlinux.lds.S +++ b/arch/powerpc/kernel/vmlinux.lds.S @@ -139,6 +139,13 @@ SECTIONS *(__rfi_flush_fixup) __stop___rfi_flush_fixup = .; } + + . = ALIGN(8); + __spec_barrier_fixup : AT(ADDR(__spec_barrier_fixup) - LOAD_OFFSET) { + __start___spec_barrier_fixup = .; + *(__spec_barrier_fixup) +
[PATCH RFC 5/8] powerpc/64: Patch barrier_nospec in modules
Copypasta from lwsync patching. Note that unlike RFI which is patched only in kernel the nospec state reflects settings at the time the module was loaded. Iterating all modules and re-patching every time the settings change is not implemented. Signed-off-by: Michal Suchanek--- arch/powerpc/include/asm/setup.h | 5 - arch/powerpc/kernel/module.c | 6 ++ arch/powerpc/kernel/setup_64.c| 4 ++-- arch/powerpc/lib/feature-fixups.c | 17 ++--- 4 files changed, 26 insertions(+), 6 deletions(-) diff --git a/arch/powerpc/include/asm/setup.h b/arch/powerpc/include/asm/setup.h index 486d02e4a310..7e3a41248810 100644 --- a/arch/powerpc/include/asm/setup.h +++ b/arch/powerpc/include/asm/setup.h @@ -58,7 +58,10 @@ enum spec_barrier_type { void __init setup_rfi_flush(enum l1d_flush_type, bool enable); void do_rfi_flush_fixups(enum l1d_flush_type types); void __init setup_barrier_nospec(enum spec_barrier_type, bool enable); -void do_barrier_nospec_fixups(enum spec_barrier_type type); +void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type); +void do_barrier_nospec_fixups(enum spec_barrier_type type, + void *start, void *end); +extern enum spec_barrier_type powerpc_barrier_nospec; #endif /* !__ASSEMBLY__ */ diff --git a/arch/powerpc/kernel/module.c b/arch/powerpc/kernel/module.c index 3f7ba0f5bf29..7b6d0ec06a21 100644 --- a/arch/powerpc/kernel/module.c +++ b/arch/powerpc/kernel/module.c @@ -72,6 +72,12 @@ int module_finalize(const Elf_Ehdr *hdr, do_feature_fixups(powerpc_firmware_features, (void *)sect->sh_addr, (void *)sect->sh_addr + sect->sh_size); + + sect = find_section(hdr, sechdrs, "__spec_barrier_fixup"); + if (sect != NULL) + do_barrier_nospec_fixups(powerpc_barrier_nospec, + (void *)sect->sh_addr, + (void *)sect->sh_addr + sect->sh_size); #endif sect = find_section(hdr, sechdrs, "__lwsync_fixup"); diff --git a/arch/powerpc/kernel/setup_64.c b/arch/powerpc/kernel/setup_64.c index 09f21a954bfc..d1d9f047161e 100644 --- a/arch/powerpc/kernel/setup_64.c +++ b/arch/powerpc/kernel/setup_64.c @@ -909,11 +909,11 @@ void barrier_nospec_enable(bool enable) if (enable) { powerpc_barrier_nospec = barrier_nospec_type; - do_barrier_nospec_fixups(powerpc_barrier_nospec); + do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec); on_each_cpu(do_nothing, NULL, 1); } else { powerpc_barrier_nospec = SPEC_BARRIER_NONE; - do_barrier_nospec_fixups(powerpc_barrier_nospec); + do_barrier_nospec_fixups_kernel(powerpc_barrier_nospec); } } diff --git a/arch/powerpc/lib/feature-fixups.c b/arch/powerpc/lib/feature-fixups.c index 000e153184ad..b59ebc2215e8 100644 --- a/arch/powerpc/lib/feature-fixups.c +++ b/arch/powerpc/lib/feature-fixups.c @@ -156,14 +156,15 @@ void do_rfi_flush_fixups(enum l1d_flush_type types) printk(KERN_DEBUG "rfi-flush: patched %d locations\n", i); } -void do_barrier_nospec_fixups(enum spec_barrier_type type) +void do_barrier_nospec_fixups(enum spec_barrier_type type, + void *fixup_start, void *fixup_end) { unsigned int instr, *dest; long *start, *end; int i; - start = PTRRELOC(&__start___spec_barrier_fixup), - end = PTRRELOC(&__stop___spec_barrier_fixup); + start = fixup_start; + end = fixup_end; instr = 0x6000; /* nop */ @@ -182,6 +183,16 @@ void do_barrier_nospec_fixups(enum spec_barrier_type type) printk(KERN_DEBUG "barrier-nospec: patched %d locations\n", i); } +void do_barrier_nospec_fixups_kernel(enum spec_barrier_type type) +{ + void *start, *end; + + start = PTRRELOC(&__start___spec_barrier_fixup), + end = PTRRELOC(&__stop___spec_barrier_fixup); + + do_barrier_nospec_fixups(type, start, end); +} + #endif /* CONFIG_PPC_BOOK3S_64 */ void do_lwsync_fixups(unsigned long value, void *fixup_start, void *fixup_end) -- 2.13.6
Re: [PATCH 00/11] RISC-V: Resolve the issue of loadable module on 64-bit
On Tue, 13 Mar 2018 01:35:05 PDT (-0700), z...@andestech.com wrote: These patches resolve the some issues of loadable module. - symbol out of ranges - unknown relocation types The reference of external variable and function symbols cannot exceed 32-bit offset ranges in kernel module. The module only can work on the 32-bit OS or the 64-bit OS with sv32 virtual addressing. These patches will generate the .got, .got.plt and .plt sections during loading module, let it can refer to the symbol which locate more than 32-bit offset. These sections depend on the relocation types: - R_RISCV_GOT_HI20 - R_RISCV_CALL_PLT These patches also support more relocation types - R_RISCV_CALL - R_RISCV_HI20 - R_RISCV_LO12_I - R_RISCV_LO12_S - R_RISCV_RVC_BRANCH - R_RISCV_RVC_JUMP - R_RISCV_ALIGN - R_RISCV_ADD32 - R_RISCV_SUB32 Zong Li (11): RISC-V: Add sections of PLT and GOT for kernel module RISC-V: Add section of GOT.PLT for kernel module RISC-V: Support GOT_HI20/CALL_PLT relocation type in kernel module RISC-V: Support CALL relocation type in kernel module RISC-V: Support HI20/LO12_I/LO12_S relocation type in kernel module RISC-V: Support RVC_BRANCH/JUMP relocation type in kernel modulewq RISC-V: Support ALIGN relocation type in kernel module RISC-V: Support ADD32 relocation type in kernel module RISC-V: Support SUB32 relocation type in kernel module RISC-V: Enable module support in defconfig RISC-V: Add definition of relocation types arch/riscv/Kconfig | 5 ++ arch/riscv/Makefile | 3 + arch/riscv/configs/defconfig| 2 + arch/riscv/include/asm/module.h | 112 +++ arch/riscv/include/uapi/asm/elf.h | 24 + arch/riscv/kernel/Makefile | 1 + arch/riscv/kernel/module-sections.c | 156 arch/riscv/kernel/module.c | 175 ++-- arch/riscv/kernel/module.lds| 8 ++ 9 files changed, 480 insertions(+), 6 deletions(-) create mode 100644 arch/riscv/include/asm/module.h create mode 100644 arch/riscv/kernel/module-sections.c create mode 100644 arch/riscv/kernel/module.lds This is the second set of patches that turn on modules, and it has the same R_RISCV_ALIGN problem as the other one http://lists.infradead.org/pipermail/linux-riscv/2018-February/81.html It looks like this one uses shared libraries for modules instead of static objects. I think using shared objects is the right thing to do, as it'll allow us to place modules anywhere in the address space by having multiple GOTs and PLTs. That's kind of complicated, though, so we can start with something simpler like this.
[PATCH RFC 1/8] powerpc: Add barrier_nospec
Copypasta from original gmb() and rfi implementation Signed-off-by: Michal Suchanek--- arch/powerpc/include/asm/barrier.h | 9 + 1 file changed, 9 insertions(+) diff --git a/arch/powerpc/include/asm/barrier.h b/arch/powerpc/include/asm/barrier.h index 10daa1d56e0a..8e47b3abe405 100644 --- a/arch/powerpc/include/asm/barrier.h +++ b/arch/powerpc/include/asm/barrier.h @@ -75,6 +75,15 @@ do { \ ___p1; \ }) +/* TODO: add patching so this can be disabled */ +/* Prevent speculative execution past this barrier. */ +#define barrier_nospec_asm ori 31,31,0 +#ifdef __ASSEMBLY__ +#define barrier_nospec barrier_nospec_asm +#else +#define barrier_nospec() __asm__ __volatile__ (stringify_in_c(barrier_nospec_asm) : : :) +#endif + #include #endif /* _ASM_POWERPC_BARRIER_H */ -- 2.13.6
[PATCH RFC 2/8] powerpc: Use barrier_nospec in copy_from_user
Coopypasta from x86. Signed-off-by: Michal Suchanek--- arch/powerpc/include/asm/uaccess.h | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 51bfeb8777f0..af9b0e731f46 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -248,6 +248,7 @@ do { \ __chk_user_ptr(ptr);\ if (!is_kernel_addr((unsigned long)__gu_addr)) \ might_fault(); \ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -258,8 +259,10 @@ do { \ long __gu_err = -EFAULT;\ unsigned long __gu_val = 0;\ const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ + int can_access = access_ok(VERIFY_READ, __gu_addr, (size)); \ might_fault(); \ - if (access_ok(VERIFY_READ, __gu_addr, (size))) \ + barrier_nospec(); \ + if (can_access) \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -271,6 +274,7 @@ do { \ unsigned long __gu_val; \ const __typeof__(*(ptr)) __user *__gu_addr = (ptr); \ __chk_user_ptr(ptr);\ + barrier_nospec(); \ __get_user_size(__gu_val, __gu_addr, (size), __gu_err); \ (x) = (__force __typeof__(*(ptr)))__gu_val; \ __gu_err; \ @@ -298,15 +302,19 @@ static inline unsigned long raw_copy_from_user(void *to, switch (n) { case 1: + barrier_nospec(); __get_user_size(*(u8 *)to, from, 1, ret); break; case 2: + barrier_nospec(); __get_user_size(*(u16 *)to, from, 2, ret); break; case 4: + barrier_nospec(); __get_user_size(*(u32 *)to, from, 4, ret); break; case 8: + barrier_nospec(); __get_user_size(*(u64 *)to, from, 8, ret); break; } @@ -314,6 +322,7 @@ static inline unsigned long raw_copy_from_user(void *to, return 0; } + barrier_nospec(); return __copy_tofrom_user((__force void __user *)to, from, n); } -- 2.13.6
[PATCH v3 1/3] arm64: dts: apq8096-db820c: enable bluetooth node
Add a new serial node for the Qualcomm BT controller QCA6174. This allows automatic probing and hci registration through the serdev framework instead of relying on the userspace helpers. Signed-off-by: Thierry Escande--- arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi | 14 ++ .../boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi| 17 arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi | 32 ++ arch/arm64/boot/dts/qcom/msm8996.dtsi | 10 +++ 4 files changed, 73 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi b/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi index 24552f19b3fa..172165d84669 100644 --- a/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi +++ b/arch/arm64/boot/dts/qcom/apq8096-db820c-pins.dtsi @@ -36,4 +36,18 @@ drive-strength = <2>; /* 2 MA */ }; }; + + blsp1_uart1_default: blsp1_uart1_default { + function = "blsp_uart2"; + pins = "gpio41", "gpio42", "gpio43", "gpio44"; + drive-strength = <16>; + bias-disable; + }; + + blsp1_uart1_sleep: blsp1_uart1_sleep { + function = "gpio"; + pins = "gpio41", "gpio42", "gpio43", "gpio44"; + drive-strength = <2>; + bias-disable; + }; }; diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi b/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi index 59b29ddfb6e9..f8d2a3b10b1f 100644 --- a/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi +++ b/arch/arm64/boot/dts/qcom/apq8096-db820c-pmic-pins.dtsi @@ -26,6 +26,23 @@ }; }; + divclk4_pin_a: divclk4 { + pins = "gpio18"; + function = "func2"; + + bias-disable; + power-source = ; + }; + + bt_en_pin_a: bt-en-active { + pins = "gpio19"; + function = "normal"; + + output-low; + power-source = ; + qcom,drive-strength = ; + }; + usb3_vbus_det_gpio: pm8996_gpio22 { pinconf { pins = "gpio22"; diff --git a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi index 1c8f1b86472d..b05d6bc0b856 100644 --- a/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi +++ b/arch/arm64/boot/dts/qcom/apq8096-db820c.dtsi @@ -23,6 +23,7 @@ aliases { serial0 = _uart1; serial1 = _uart2; + serial2 = _uart1; i2c0= _i2c2; i2c1= _i2c1; i2c2= _i2c0; @@ -34,7 +35,38 @@ stdout-path = "serial0:115200n8"; }; + clocks { + divclk4: divclk4 { + compatible = "fixed-clock"; + #clock-cells = <0>; + clock-frequency = <32768>; + clock-output-names = "divclk4"; + + pinctrl-names = "default"; + pinctrl-0 = <_pin_a>; + }; + }; + soc { + serial@757 { + label = "BT-UART"; + status = "okay"; + pinctrl-names = "default", "sleep"; + pinctrl-0 = <_uart1_default>; + pinctrl-1 = <_uart1_sleep>; + + bluetooth { + compatible = "qcom,qca6174-bt"; + + bt-disable-n-gpios = <_gpios 19 GPIO_ACTIVE_HIGH>; + + pinctrl-names = "default"; + pinctrl-0 = <_en_pin_a>; + + clocks = <>; + }; + }; + serial@75b { label = "LS-UART1"; status = "okay"; diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi index 0a6f7952bbb1..2d54a86a027f 100644 --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi @@ -408,6 +408,16 @@ #clock-cells = <1>; }; + blsp1_uart1: serial@757 { + compatible = "qcom,msm-uartdm-v1.4", "qcom,msm-uartdm"; + reg = <0x0757 0x1000>; + interrupts = ; + clocks = < GCC_BLSP1_UART2_APPS_CLK>, +< GCC_BLSP1_AHB_CLK>; + clock-names = "core", "iface"; + status = "disabled"; + }; + blsp1_spi0: spi@7575000 { compatible = "qcom,spi-qup-v2.2.1"; reg = <0x07575000 0x600>; -- 2.14.1
[RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context
| BUG: sleeping function called from invalid context at kernel/locking/rtmutex.c:914 | in_atomic(): 1, irqs_disabled(): 0, pid: 255, name: kworker/u257:6 | 5 locks held by kworker/u257:6/255: | #0: ("events_unbound"){.+.+.+}, at: [] process_one_work+0x171/0x5e0 | #1: ((>work)){+.+.+.}, at: [] process_one_work+0x171/0x5e0 | #2: (>scan_mutex){+.+.+.}, at: [] __scsi_add_device+0xa3/0x130 [scsi_mod] | #3: (>tag_list_lock){+.+...}, at: [] blk_mq_init_queue+0x96a/0xa50 | #4: (rcu_read_lock_sched){..}, at: [] percpu_ref_kill_and_confirm+0x1d/0x120 | Preemption disabled at:[] blk_mq_freeze_queue_start+0x56/0x70 | | CPU: 2 PID: 255 Comm: kworker/u257:6 Not tainted 3.18.7-rt0+ #1 | Workqueue: events_unbound async_run_entry_fn | 0003 8800bc29f998 815b3a12 | 8800bc29f9b8 8109aa16 8800bc29fa28 | 8800bc5d1bc8 8800bc29f9e8 815b8dd4 8800 | Call Trace: | [] dump_stack+0x4f/0x7c | [] __might_sleep+0x116/0x190 | [] rt_spin_lock+0x24/0x60 | [] __wake_up+0x29/0x60 | [] blk_mq_usage_counter_release+0x1e/0x20 | [] percpu_ref_kill_and_confirm+0x106/0x120 | [] blk_mq_freeze_queue_start+0x56/0x70 | [] blk_mq_update_tag_set_depth+0x40/0xd0 | [] blk_mq_init_queue+0x98c/0xa50 | [] scsi_mq_alloc_queue+0x20/0x60 [scsi_mod] | [] scsi_alloc_sdev+0x2f5/0x370 [scsi_mod] | [] scsi_probe_and_add_lun+0x9e4/0xdd0 [scsi_mod] | [] __scsi_add_device+0x126/0x130 [scsi_mod] | [] ata_scsi_scan_host+0xaf/0x200 [libata] | [] async_port_probe+0x46/0x60 [libata] | [] async_run_entry_fn+0x3b/0xf0 | [] process_one_work+0x201/0x5e0 percpu_ref_kill_and_confirm() invokes blk_mq_usage_counter_release() in a rcu-sched region. swait based wake queue can't be used due to wake_up_all() usage and disabled interrupts in !RT configs (as reported by Corey Minyard). Uses work_queue() to invoke wake_up_all() in process context. Signed-off-by: Sebastian Andrzej Siewior--- block/blk-core.c | 13 - include/linux/blkdev.h | 2 ++ 2 files changed, 14 insertions(+), 1 deletion(-) diff --git a/block/blk-core.c b/block/blk-core.c index f9ac6f169c67..4db4051724ea 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -814,12 +814,20 @@ void blk_queue_exit(struct request_queue *q) percpu_ref_put(>q_usage_counter); } +static void blk_queue_usage_counter_release_swork(struct swork_event *sev) +{ + struct request_queue *q = + container_of(sev, struct request_queue, mq_pcpu_wake); + + wake_up_all(>mq_freeze_wq); +} + static void blk_queue_usage_counter_release(struct percpu_ref *ref) { struct request_queue *q = container_of(ref, struct request_queue, q_usage_counter); - wake_up_all(>mq_freeze_wq); + swork_queue(>mq_pcpu_wake); } static void blk_rq_timed_out_timer(unsigned long data) @@ -896,6 +904,7 @@ struct request_queue *blk_alloc_queue_node(gfp_t gfp_mask, int node_id) __set_bit(QUEUE_FLAG_BYPASS, >queue_flags); init_waitqueue_head(>mq_freeze_wq); + INIT_SWORK(>mq_pcpu_wake, blk_queue_usage_counter_release_swork); /* * Init percpu_ref in atomic mode so that it's faster to shutdown. @@ -3623,6 +3632,8 @@ int __init blk_dev_init(void) if (!kblockd_workqueue) panic("Failed to create kblockd\n"); + BUG_ON(swork_get()); + request_cachep = kmem_cache_create("blkdev_requests", sizeof(struct request), 0, SLAB_PANIC, NULL); diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h index b68752bfb645..49b53ad6d2d6 100644 --- a/include/linux/blkdev.h +++ b/include/linux/blkdev.h @@ -27,6 +27,7 @@ #include #include #include +#include struct module; struct scsi_ioctl_command; @@ -599,6 +600,7 @@ struct request_queue { #endif struct rcu_head rcu_head; wait_queue_head_t mq_freeze_wq; + struct swork_event mq_pcpu_wake; struct percpu_ref q_usage_counter; struct list_headall_q_node; -- 2.16.2
[GIT PULL] auxdisplay for v4.16-rc6
Hi Linus, Please pull these small changes to silence a few warnings in auxdisplay. This is my first pull request, so if there is anything wrong, please let me know! Also, note that for the moment my PGP/GPG key is only signed by 2 other kernel maintainers and that we did it over email/Skype. I used GitHub as well for the same reason. Cheers, Miguel Ojeda The following changes since commit 0c8efd610b58cb23cefdfa12015799079aef94ae: Linux 4.16-rc5 (2018-03-11 17:25:09 -0700) are available in the git repository at: https://github.com/ojeda/linux.git tags/auxdisplay-for-linus-v4.16-rc6 for you to fetch changes up to 26a2c54d03bd514fb3d3520706f911b3ca56b011: auxdisplay: img-ascii-lcd: Silence 2 uninitialized warnings (2018-03-13 18:16:38 +0100) Silence a few warnings in auxdisplay. - A couple of uninitialized warnings reported by the build service. - A doc comment warning under W=1. - Three fall through comments not recognized under W=1. Miguel Ojeda (3): auxdisplay: panel: Change comments to silence fallthrough warnings auxdisplay: img-ascii-lcd: Fix doc comment to silence warnings auxdisplay: img-ascii-lcd: Silence 2 uninitialized warnings drivers/auxdisplay/img-ascii-lcd.c | 6 +++--- drivers/auxdisplay/panel.c | 6 +++--- 2 files changed, 6 insertions(+), 6 deletions(-)
Re: [PATCH v4 1/6] sysctl: Add flags to support min/max range clamping
On 03/13/2018 01:46 PM, Eric W. Biederman wrote: > Waiman Longwrites: > >> When minimum/maximum values are specified for a sysctl parameter in >> the ctl_table structure with proc_dointvec_minmax() handler, update >> to that parameter will fail with error if the given value is outside >> of the required range. >> >> There are use cases where it may be better to clamp the value of >> the sysctl parameter to the given range without failing the update, >> especially if the users are not aware of the actual range limits. >> Reading the value back after the update will now be a good practice >> to see if the provided value exceeds the range limits. > What use cases? Who will break. Examples would be good. See my response to the 4/6 patch. >> To provide this less restrictive form of range checking, a new flags >> field is added to the ctl_table structure. >> >> When the CTL_FLAGS_CLAMP_RANGE flag is set in the ctl_table >> entry, any update from the userspace will be clamped to the given >> range without error if either the proc_dointvec_minmax() or the >> proc_douintvec_minmax() handlers is used. > As this is constructed it will increase the size of ctl_table for > everyone. Better would be to add a new function that behaves similary > but differently than to burden struct ctl_table for the rest of time. If you have concern about increasing the size of the ctl_table, I can revert it back to use unsigned short for flag which will fit into the hole left by the 16-bit mode field. I thought increasing the size of ctl_table a bit isn't a big problem, I may be wrong on that. Adding the new flags field will afford us the ability to better customize the sysctl parameters to different needs that may arise in the future. Cheers, Longman
[PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register
From: Kan LiangThe number of CHAs is miscalculated on multi PCI domain systems on Skylake server. (From Kroening, Gary: For systems with a single PCI segment, it is sufficient to look for the bus number to change in order to determine that all of the CHa's have been counted for a single socket. However, for multi PCI segment systems, each socket is given a new segment and the bus number does NOT change. So looking only for the bus number to change ends up counting all of the CHa's on all sockets in the system. This leads to writing CPU MSRs beyond a valid range and causes an error in ivbep_uncore_msr_init_box().) To determine the number of CHAs, it should read bits 27:0 in the CAPID6 register located at Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector of available LLC slices and the CHAs that manage those slices. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Reported-by: Kroening, Gary Signed-off-by: Kan Liang --- Changes since V1: - add missed pci_dev_put() - Drop ugly casting by using hweight32() - Add comments for macros. arch/x86/events/intel/uncore_snbep.c | 31 +-- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c index 6d8044a..8970f71 100644 --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[] = { NULL, }; +/* + * To determine the number of CHAs, it should read bits 27:0 in the CAPID6 + * register which located at Device 30, Function 3, Offset 0x9C. PCI ID 0x2083. + */ +#define SKX_CAPID6 0x9c +#define SKX_CHA_BIT_MASK GENMASK(27, 0) + static int skx_count_chabox(void) { - struct pci_dev *chabox_dev = NULL; - int bus, count = 0; + struct pci_dev *dev = NULL; + u32 val = 0; - while (1) { - chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d, chabox_dev); - if (!chabox_dev) - break; - if (count == 0) - bus = chabox_dev->bus->number; - if (bus != chabox_dev->bus->number) - break; - count++; - } + dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev); + if (!dev) + goto out; - pci_dev_put(chabox_dev); - return count; + pci_read_config_dword(dev, SKX_CAPID6, ); + val &= SKX_CHA_BIT_MASK; +out: + pci_dev_put(dev); + return hweight32(val); } void skx_uncore_cpu_init(void) -- 2.7.4
Re: [v5 1/2] mm: disable interrupts while initializing deferred pages
On Tue, 13 Mar 2018 12:04:30 -0400 Pavel Tatashinwrote: > > > > > --- a/mm/page_alloc.c > > > +++ b/mm/page_alloc.c > > > @@ -1506,7 +1506,6 @@ static void __init deferred_free_pages(int nid, int > > > zid, unsigned long pfn, > > > } else if (!(pfn & nr_pgmask)) { > > > deferred_free_range(pfn - nr_free, nr_free); > > > nr_free = 1; > > > - cond_resched(); > > > } else { > > > nr_free++; > > > > And how can we simply remove these cond_resched()s? I assume this is > > being done because interrupts are now disabled? But those were there > > for a reason, weren't they? > > We must remove cond_resched() because we can't sleep anymore. They were > added to fight NMI timeouts, so I will replace them with > touch_nmi_watchdog() in a follow-up fix. This makes no sense. Any code section where we can add cond_resched() was never subject to NMI timeouts because that code cannot be running with disabled interrupts.
RE: [PATCH v5 1/4] dt-bindings: firmware: Add bindings for ZynqMP firmware
> -Original Message- > From: Sudeep Holla [mailto:sudeep.ho...@arm.com] > Sent: Tuesday, March 13, 2018 3:16 AM > To: Jolly Shah> Cc: Sudeep Holla ; gre...@linuxfoundation.org; > m...@codeblueprint.co.uk; hkallwe...@gmail.com; michal.si...@xilinx.com; > robh...@kernel.org; mark.rutl...@arm.com; ard.biesheu...@linaro.org; > mi...@kernel.org; keesc...@chromium.org; dmitry.torok...@gmail.com; > Rajan Vaja ; linux-arm-ker...@lists.infradead.org; linux- > ker...@vger.kernel.org; devicet...@vger.kernel.org > Subject: Re: [PATCH v5 1/4] dt-bindings: firmware: Add bindings for ZynqMP > firmware > > > > On 12/03/18 23:07, Jolly Shah wrote: > > Hi Sudeep, > > Do you foresee using SMC/HVC for this firmware even on future > platforms? > If not, I suggest to keep the protocol part separate from the transport > i.e. > smc/hvc via ATF. It could be replaced with mailbox or some h/w > mechanism in future ? > > >>> > >>> We have PSCI and EEMI interfaces exposed to linux from ATF. PSCI is > >>> an EEMI client. We do not have current plans to switch to mailbox as > >>> it will require 2 communication channels to PMU as PSCI is through ATF. > >>> > >> > >> OK, but I just saw some bindings that has mailbox interface, honestly > >> it's getting too confusing with multiple series on the same thing > >> floating and hence I requested to put it together as one series. > > > > Mailbox binding is used for power management driver. Mailbox is only > > used for PMU->APU communication. APU->PMU communication is always > > through EEMI firmware interface which is using SMC/HVC. > > > > Ah OK, is it because there's no non-secure mailbox or to avoid races, all non- > secure EEMI is channeled through SMC ? > 2 reasons: 1> Avoid multiple EEMI communication channels as PSCI is through ATF. 2> We have some secure operations handled in ATF because of memory constraints on PMU > -- > Regards, > Sudeep
Re: [PATCH 2/2] hwmon: (ucd9000) Add debugfs attributes to provide mfr_status
On 03/10/2018 10:50 AM, Guenter Roeck wrote: On 03/09/2018 11:19 AM, Eddie James wrote: From: Christopher BosticExpose the gpiN_fault fields of mfr_status as individual debugfs attributes. This provides a way for users to be easily notified of gpi faults. Also provide the whole mfr_status register in debugfs. Signed-off-by: Christopher Bostic Signed-off-by: Andrew Jeffery Signed-off-by: Eddie James --- drivers/hwmon/pmbus/ucd9000.c | 172 +- 1 file changed, 171 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/pmbus/ucd9000.c b/drivers/hwmon/pmbus/ucd9000.c index e3a507f..297da0e 100644 --- a/drivers/hwmon/pmbus/ucd9000.c +++ b/drivers/hwmon/pmbus/ucd9000.c @@ -19,6 +19,7 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ +#include #include #include #include @@ -36,6 +37,7 @@ #define UCD9000_NUM_PAGES 0xd6 #define UCD9000_FAN_CONFIG_INDEX 0xe7 #define UCD9000_FAN_CONFIG 0xe8 +#define UCD9000_MFR_STATUS 0xf3 #define UCD9000_GPIO_SELECT 0xfa #define UCD9000_GPIO_CONFIG 0xfb #define UCD9000_DEVICE_ID 0xfd @@ -63,13 +65,22 @@ #define UCD901XX_NUM_GPIOS 26 #define UCD90910_NUM_GPIOS 26 +#define UCD9000_DEBUGFS_NAME_LEN 24 +#define UCD9000_GPI_COUNT 8 + struct ucd9000_data { u8 fan_data[UCD9000_NUM_FAN][I2C_SMBUS_BLOCK_MAX]; struct pmbus_driver_info info; struct gpio_chip gpio; + struct dentry *debugfs; }; #define to_ucd9000_data(_info) container_of(_info, struct ucd9000_data, info) +struct ucd9000_debugfs_entry { + struct i2c_client *client; + u8 index; +}; + static int ucd9000_get_fan_config(struct i2c_client *client, int fan) { int fan_config = 0; @@ -328,6 +339,156 @@ static int ucd9000_gpio_direction_output(struct gpio_chip *gc, val); } +#if IS_ENABLED(CONFIG_DEBUG_FS) +static int ucd9000_get_mfr_status(struct i2c_client *client, u8 *buffer) +{ + int ret = pmbus_set_page(client, 0); + + if (ret < 0) + return ret; + + /* + * With the ucd90120 and ucd90124 devices, this command [MFR_STATUS] + * is 2 bytes long (bits 0-15). With the ucd90240 this command is 5 + * bytes long. With all other devices, it is 4 bytes long. + */ + return i2c_smbus_read_block_data(client, UCD9000_MFR_STATUS, buffer); +} + +static int ucd9000_debugfs_show_mfr_status_bit(void *data, u64 *val) +{ + struct ucd9000_debugfs_entry *entry = data; + struct i2c_client *client = entry->client; + u8 buffer[4]; + int ret; + + /* + * This attribute is only created for devices that return 4 bytes for + * status_mfr, so it's safe to call with 4-byte buffer. + */ + ret = ucd9000_get_mfr_status(client, buffer); + if (ret < 0) { + dev_err(>dev, "Failed to read mfr status. rc:%d\n", + ret); + + return ret; + } + + /* + * Attribute only created for devices with gpi fault bits at bits + * 16-23, which is the second byte of the response. + */ + *val = !!(buffer[1] & BIT(entry->index)); + + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_bit, + ucd9000_debugfs_show_mfr_status_bit, NULL, "%1lld\n"); + +static int ucd9000_debugfs_show_mfr_status_word2(void *data, u64 *val) +{ + struct i2c_client *client = data; + __be16 buffer; + int ret; + + ret = ucd9000_get_mfr_status(client, (u8 *)); + if (ret < 0) { + dev_err(>dev, "Failed to read mfr status. rc:%d\n", + ret); + + return ret; + } + + *val = be16_to_cpu(buffer); + + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word2, + ucd9000_debugfs_show_mfr_status_word2, NULL, + "%04llx\n"); + +static int ucd9000_debugfs_show_mfr_status_word4(void *data, u64 *val) +{ + struct i2c_client *client = data; + __be32 buffer; + int ret; + + ret = ucd9000_get_mfr_status(client, (u8 *)); + if (ret < 0) { + dev_err(>dev, "Failed to read mfr status. rc:%d\n", + ret); + + return ret; + } + + *val = be32_to_cpu(buffer); + + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word4, + ucd9000_debugfs_show_mfr_status_word4, NULL, + "%08llx\n"); + +static int ucd9000_init_debugfs(struct i2c_client *client, + const struct i2c_device_id *mid, + struct ucd9000_data *data) +{ + struct dentry *debugfs; + struct ucd9000_debugfs_entry *entries; + int i; + char name[UCD9000_DEBUGFS_NAME_LEN]; + + debugfs = pmbus_get_debugfs_dir(client); + if (!debugfs) + return -ENOENT; + + data->debugfs = debugfs_create_dir(client->name, debugfs); + if (!data->debugfs) + return -ENOENT; + + /* + * Of
[PATCH] hv_netvsc: Make sure out channel is fully opened on send
Dring high network traffic changes to network interface parameters such as number of channels or MTU can cause a kernel panic with a NULL pointer dereference. This is due to netvsc_device_remove() being called and deallocating the channel ring buffers, which can then be accessed by netvsc_send_pkt() before they're allocated on calling netvsc_device_add() The patch fixes this problem by checking the channel state and returning ENODEV if not yet opened. We also move the call to hv_ringbuf_avail_percent() which may access the uninitialized ring buffer. Signed-off-by: Mohammed Gamal--- drivers/net/hyperv/netvsc.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/net/hyperv/netvsc.c b/drivers/net/hyperv/netvsc.c index 0265d70..44a8358 100644 --- a/drivers/net/hyperv/netvsc.c +++ b/drivers/net/hyperv/netvsc.c @@ -757,7 +757,7 @@ static inline int netvsc_send_pkt( struct netdev_queue *txq = netdev_get_tx_queue(ndev, packet->q_idx); u64 req_id; int ret; - u32 ring_avail = hv_ringbuf_avail_percent(_channel->outbound); + u32 ring_avail; nvmsg.hdr.msg_type = NVSP_MSG1_TYPE_SEND_RNDIS_PKT; if (skb) @@ -773,7 +773,7 @@ static inline int netvsc_send_pkt( req_id = (ulong)skb; - if (out_channel->rescind) + if (out_channel->rescind || out_channel->state != CHANNEL_OPENED_STATE) return -ENODEV; if (packet->page_buf_cnt) { @@ -791,6 +791,7 @@ static inline int netvsc_send_pkt( VMBUS_DATA_PACKET_FLAG_COMPLETION_REQUESTED); } + ring_avail = hv_ringbuf_avail_percent(_channel->outbound); if (ret == 0) { atomic_inc_return(>queue_sends); -- 1.8.3.1
Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory
On 3/13/2018 2:44 PM, Logan Gunthorpe wrote: >> Do you think you can keep a pointer to the parent bridge instead of querying >> it >> via get_upstream_bridge_port() here so that we can reuse your >> pci_p2pdma_disable_acs() in the future. > > Keep a pointer where? pci_p2pdma_disable_acs() and pci_p2pdma_add_client() > are used in completely different cases on completely different devices. There > really is no overlap and no obvious place to store the port pointer (except > in the struct pci_dev itself, in which case why not just call the function > again). I was thinking of this for the pci_p2pdma_add_client() case for the parent pointer. +struct pci_p2pdma_client { + struct list_head list; + struct pci_dev *client; + struct pci_dev *provider; +}; You are right second code seems to be there to disable ACS altogether when this kernel configuration is selected. It doesn't have any relationship to the client code. But then, Why bother searching for the switch at all? Even if the switch is there, there is no guarantee that it is currently being used for P2P. It seems that we are going with the assumption that enabling this config option implies you want P2P, then we can simplify this code as well. -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
[PATCH net 2/2] vmxnet3: use correct flag to indicate LRO feature
'Commit 45dac1d6ea04 ("vmxnet3: Changes for vmxnet3 adapter version 2 (fwd)")' introduced a flag "lro" in structure vmxnet3_adapter which is used to indicate whether LRO is enabled or not. However, the patch did not set the flag and hence it was never exercised. So, when LRO is enabled, it resulted in poor TCP performance due to delayed acks. This issue is seen with packets which are larger than the mss getting a delayed ack rather than an immediate ack, thus resulting in high latency. This patch removes the lro flag and directly uses device features against NETIF_F_LRO to check if lro is enabled. Reported-by: Rachel LunnonSigned-off-by: Ronak Doshi Acked-by: Shrikrishna Khare --- drivers/net/vmxnet3/vmxnet3_drv.c | 3 ++- drivers/net/vmxnet3/vmxnet3_int.h | 5 ++--- 2 files changed, 4 insertions(+), 4 deletions(-) diff --git a/drivers/net/vmxnet3/vmxnet3_drv.c b/drivers/net/vmxnet3/vmxnet3_drv.c index 052eef2f729f..86c4d6e4dfaa 100644 --- a/drivers/net/vmxnet3/vmxnet3_drv.c +++ b/drivers/net/vmxnet3/vmxnet3_drv.c @@ -1473,7 +1473,8 @@ vmxnet3_rq_rx_complete(struct vmxnet3_rx_queue *rq, vmxnet3_rx_csum(adapter, skb, (union Vmxnet3_GenericDesc *)rcd); skb->protocol = eth_type_trans(skb, adapter->netdev); - if (!rcd->tcp || !adapter->lro) + if (!rcd->tcp || + !(adapter->netdev->features & NETIF_F_LRO)) goto not_lro; if (segCnt != 0 && mss != 0) { diff --git a/drivers/net/vmxnet3/vmxnet3_int.h b/drivers/net/vmxnet3/vmxnet3_int.h index b94fdfd0b6f1..99387a4a20a8 100644 --- a/drivers/net/vmxnet3/vmxnet3_int.h +++ b/drivers/net/vmxnet3/vmxnet3_int.h @@ -69,10 +69,10 @@ /* * Version numbers */ -#define VMXNET3_DRIVER_VERSION_STRING "1.4.12.0-k" +#define VMXNET3_DRIVER_VERSION_STRING "1.4.13.0-k" /* a 32-bit int, each byte encode a verion number in VMXNET3_DRIVER_VERSION */ -#define VMXNET3_DRIVER_VERSION_NUM 0x01040c00 +#define VMXNET3_DRIVER_VERSION_NUM 0x01040d00 #if defined(CONFIG_PCI_MSI) /* RSS only makes sense if MSI-X is supported. */ @@ -343,7 +343,6 @@ struct vmxnet3_adapter { u8 version; boolrxcsum; - boollro; #ifdef VMXNET3_RSS struct UPT1_RSSConf *rss_conf; -- 2.11.0
[PATCH] iio/adc/nau7802: Improve unlocking of a mutex in nau7802_read_raw()
From: Markus ElfringDate: Tue, 13 Mar 2018 20:52:26 +0100 * Add a jump target so that a call of the function "mutex_unlock" is stored only once in this function implementation. * Replace two calls by goto statements. This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/iio/adc/nau7802.c | 15 +-- 1 file changed, 5 insertions(+), 10 deletions(-) diff --git a/drivers/iio/adc/nau7802.c b/drivers/iio/adc/nau7802.c index 8997e74a8847..68d06a492760 100644 --- a/drivers/iio/adc/nau7802.c +++ b/drivers/iio/adc/nau7802.c @@ -303,10 +303,8 @@ static int nau7802_read_raw(struct iio_dev *indio_dev, * - Channel 2 is value 1 in the CHS register */ ret = i2c_smbus_read_byte_data(st->client, NAU7802_REG_CTRL2); - if (ret < 0) { - mutex_unlock(>lock); - return ret; - } + if (ret < 0) + goto unlock; if (((ret & NAU7802_CTRL2_CHS_BIT) && !chan->channel) || (!(ret & NAU7802_CTRL2_CHS_BIT) && @@ -316,18 +314,15 @@ static int nau7802_read_raw(struct iio_dev *indio_dev, NAU7802_REG_CTRL2, NAU7802_CTRL2_CHS(chan->channel) | NAU7802_CTRL2_CRS(st->sample_rate)); - - if (ret < 0) { - mutex_unlock(>lock); - return ret; - } + if (ret < 0) + goto unlock; } if (st->client->irq) ret = nau7802_read_irq(indio_dev, chan, val); else ret = nau7802_read_poll(indio_dev, chan, val); - +unlock: mutex_unlock(>lock); return ret; -- 2.16.2
Re: [PATCH 2/2] hwmon: (ucd9000) Add debugfs attributes to provide mfr_status
On 03/13/2018 02:29 PM, Guenter Roeck wrote: On Tue, Mar 13, 2018 at 02:01:51PM -0500, Eddie James wrote: On 03/10/2018 10:50 AM, Guenter Roeck wrote: On 03/09/2018 11:19 AM, Eddie James wrote: From: Christopher BosticExpose the gpiN_fault fields of mfr_status as individual debugfs attributes. This provides a way for users to be easily notified of gpi faults. Also provide the whole mfr_status register in debugfs. Signed-off-by: Christopher Bostic Signed-off-by: Andrew Jeffery Signed-off-by: Eddie James --- drivers/hwmon/pmbus/ucd9000.c | 172 +- 1 file changed, 171 insertions(+), 1 deletion(-) diff --git a/drivers/hwmon/pmbus/ucd9000.c b/drivers/hwmon/pmbus/ucd9000.c index e3a507f..297da0e 100644 --- a/drivers/hwmon/pmbus/ucd9000.c +++ b/drivers/hwmon/pmbus/ucd9000.c @@ -19,6 +19,7 @@ * Foundation, Inc., 675 Mass Ave, Cambridge, MA 02139, USA. */ +#include #include #include #include @@ -36,6 +37,7 @@ #define UCD9000_NUM_PAGES 0xd6 #define UCD9000_FAN_CONFIG_INDEX 0xe7 #define UCD9000_FAN_CONFIG 0xe8 +#define UCD9000_MFR_STATUS 0xf3 #define UCD9000_GPIO_SELECT 0xfa #define UCD9000_GPIO_CONFIG 0xfb #define UCD9000_DEVICE_ID 0xfd @@ -63,13 +65,22 @@ #define UCD901XX_NUM_GPIOS 26 #define UCD90910_NUM_GPIOS 26 +#define UCD9000_DEBUGFS_NAME_LEN 24 +#define UCD9000_GPI_COUNT 8 + struct ucd9000_data { u8 fan_data[UCD9000_NUM_FAN][I2C_SMBUS_BLOCK_MAX]; struct pmbus_driver_info info; struct gpio_chip gpio; + struct dentry *debugfs; }; #define to_ucd9000_data(_info) container_of(_info, struct ucd9000_data, info) +struct ucd9000_debugfs_entry { + struct i2c_client *client; + u8 index; +}; + static int ucd9000_get_fan_config(struct i2c_client *client, int fan) { int fan_config = 0; @@ -328,6 +339,156 @@ static int ucd9000_gpio_direction_output(struct gpio_chip *gc, val); } +#if IS_ENABLED(CONFIG_DEBUG_FS) +static int ucd9000_get_mfr_status(struct i2c_client *client, u8 *buffer) +{ + int ret = pmbus_set_page(client, 0); + + if (ret < 0) + return ret; + + /* + * With the ucd90120 and ucd90124 devices, this command [MFR_STATUS] + * is 2 bytes long (bits 0-15). With the ucd90240 this command is 5 + * bytes long. With all other devices, it is 4 bytes long. + */ + return i2c_smbus_read_block_data(client, UCD9000_MFR_STATUS, buffer); +} + +static int ucd9000_debugfs_show_mfr_status_bit(void *data, u64 *val) +{ + struct ucd9000_debugfs_entry *entry = data; + struct i2c_client *client = entry->client; + u8 buffer[4]; + int ret; + + /* + * This attribute is only created for devices that return 4 bytes for + * status_mfr, so it's safe to call with 4-byte buffer. + */ + ret = ucd9000_get_mfr_status(client, buffer); + if (ret < 0) { + dev_err(>dev, "Failed to read mfr status. rc:%d\n", + ret); + + return ret; + } + + /* + * Attribute only created for devices with gpi fault bits at bits + * 16-23, which is the second byte of the response. + */ + *val = !!(buffer[1] & BIT(entry->index)); + + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_bit, + ucd9000_debugfs_show_mfr_status_bit, NULL, "%1lld\n"); + +static int ucd9000_debugfs_show_mfr_status_word2(void *data, u64 *val) +{ + struct i2c_client *client = data; + __be16 buffer; + int ret; + + ret = ucd9000_get_mfr_status(client, (u8 *)); + if (ret < 0) { + dev_err(>dev, "Failed to read mfr status. rc:%d\n", + ret); + + return ret; + } + + *val = be16_to_cpu(buffer); + + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word2, + ucd9000_debugfs_show_mfr_status_word2, NULL, + "%04llx\n"); + +static int ucd9000_debugfs_show_mfr_status_word4(void *data, u64 *val) +{ + struct i2c_client *client = data; + __be32 buffer; + int ret; + + ret = ucd9000_get_mfr_status(client, (u8 *)); + if (ret < 0) { + dev_err(>dev, "Failed to read mfr status. rc:%d\n", + ret); + + return ret; + } + + *val = be32_to_cpu(buffer); + + return 0; +} +DEFINE_DEBUGFS_ATTRIBUTE(ucd9000_debugfs_mfr_status_word4, + ucd9000_debugfs_show_mfr_status_word4, NULL, + "%08llx\n"); + +static int ucd9000_init_debugfs(struct i2c_client *client, + const struct i2c_device_id *mid, + struct ucd9000_data *data) +{ + struct dentry *debugfs; + struct ucd9000_debugfs_entry *entries; + int i; + char name[UCD9000_DEBUGFS_NAME_LEN]; + + debugfs = pmbus_get_debugfs_dir(client); + if (!debugfs) + return -ENOENT; + +
Re: [tip:locking/core 9/11] include/asm-generic/atomic-instrumented.h:288:24: sparse: cast truncates bits from constant value (100 becomes 0)
On Tue, Mar 13, 2018 at 07:08:06PM +0100, Peter Zijlstra wrote: > On Tue, Mar 13, 2018 at 07:00:17PM +0100, Luc Van Oostenryck wrote: > > The issue here is that sparse has a whole class of warnings that are > > given very early (here at expansion of constant expressions), before > > eliminating code from branches that are never taken (which, surprise, > > need itself to have constant expressions already expanded). > > > > It's often annoying like the case here. > > OTOH, I don't think it's always a bad thing. Sometimes we want to > > have warnings even from code we know will not be executed (in this > > config but maybe it will in another one). > > Is that really a valid concern with all the automated randconfig > building going on today? I don't think so, for the kernel at least. For other uses it may. But don't take me wrongly: I don't want to defend those warnings here, I just want to say that the situation is not totally black & white. One easy-short-term solution that wouldn't make things ugly would be to use a mask instead of a cast: static __always_inline unsigned long cmpxchg_size(volatile void *ptr, unsigned long old, unsigned long new, int size) { switch (size) { case 1: - return arch_cmpxchg((u8 *)ptr, (u8)old, (u8)new); + return arch_cmpxchg((u8 *)ptr, old & 0xff, new & 0xff); case 2: - return arch_cmpxchg((u16 *)ptr, (u16)old, (u16)new); + return arch_cmpxchg((u16 *)ptr, old & 0x, new & 0x); -- Luc
Re: [PATCH RFC 1/8] powerpc: Add barrier_nospec
On Tue, Mar 13, 2018 at 07:32:59PM +0100, Michal Suchanek wrote: > Copypasta from original gmb() and rfi implementation Actual real changelogs would be very welcome. Seeing as I've not seen these mythical RFI patches, this leaves one quite puzzled :-)
[PATCH] media: staging/imx: fill vb2_v4l2_buffer sequence entry
Signed-off-by: Peter Seiderer--- drivers/staging/media/imx/imx-media-csi.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/staging/media/imx/imx-media-csi.c b/drivers/staging/media/imx/imx-media-csi.c index 5a195f80a24d..3a6a645b9dce 100644 --- a/drivers/staging/media/imx/imx-media-csi.c +++ b/drivers/staging/media/imx/imx-media-csi.c @@ -111,6 +111,7 @@ struct csi_priv { struct v4l2_ctrl_handler ctrl_hdlr; int stream_count; /* streaming counter */ + __u32 frame_sequence; /* frame sequence counter */ bool last_eof; /* waiting for last EOF at stream off */ bool nfb4eof;/* NFB4EOF encountered during streaming */ struct completion last_eof_comp; @@ -234,8 +235,11 @@ static void csi_vb2_buf_done(struct csi_priv *priv) struct vb2_buffer *vb; dma_addr_t phys; + priv->frame_sequence++; + done = priv->active_vb2_buf[priv->ipu_buf_num]; if (done) { + done->vbuf.sequence = priv->frame_sequence; vb = >vbuf.vb2_buf; vb->timestamp = ktime_get_ns(); vb2_buffer_done(vb, priv->nfb4eof ? @@ -543,6 +547,7 @@ static int csi_idmac_start(struct csi_priv *priv) /* init EOF completion waitq */ init_completion(>last_eof_comp); + priv->frame_sequence = 0; priv->last_eof = false; priv->nfb4eof = false; -- 2.16.2
Re: [PATCH] net: dsa: drop some VLAs in switch.c
On 03/13/2018 12:58 PM, Vivien Didelot wrote: > Hi Salvatore, > > Salvatore Mesoracawrites: > >> dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid >> 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports. >> >> [1] https://lkml.org/lkml/2018/3/7/621 >> >> Signed-off-by: Salvatore Mesoraca > > NAK. > > We are in the process to remove hardcoded limits such as DSA_MAX_PORTS > and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports. Then this means that we need to allocate a bitmap from the heap, which sounds a bit superfluous and could theoretically fail... not sure which way is better, but bumping the size to DSA_MAX_PORTS definitively does help people working on enabling -Wvla. -- Florian
Re: [PATCH] net: dsa: drop some VLAs in switch.c
Hi Salvatore, Salvatore Mesoracawrites: > dsa_switch's num_ports is currently fixed to DSA_MAX_PORTS. So we avoid > 2 VLAs[1] by using DSA_MAX_PORTS instead of ds->num_ports. > > [1] https://lkml.org/lkml/2018/3/7/621 > > Signed-off-by: Salvatore Mesoraca NAK. We are in the process to remove hardcoded limits such as DSA_MAX_PORTS and DSA_MAX_SWITCHES, so we have to stick with ds->num_ports. Thanks, Vivien
Re: [RESEND PATCH] rsi: Remove stack VLA usage
On Sun, Mar 11, 2018 at 09:06:10PM -0500, Larry Finger wrote: > On 03/11/2018 08:43 PM, Tobin C. Harding wrote: > >The kernel would like to have all stack VLA usage removed[1]. rsi uses > >a VLA based on 'blksize'. Elsewhere in the SDIO code maximum block size > >is defined using a magic number. We can use a pre-processor defined > >constant and declare the array to maximum size. We add a check before > >accessing the array in case of programmer error. > > > >[1]: https://lkml.org/lkml/2018/3/7/621 > > > >Signed-off-by: Tobin C. Harding> >--- > > > >RESEND: add wireless mailing list to CC's (requested by Kalle) > > > > drivers/net/wireless/rsi/rsi_91x_hal.c | 13 +++-- > > drivers/net/wireless/rsi/rsi_91x_sdio.c | 9 +++-- > > 2 files changed, 14 insertions(+), 8 deletions(-) > > > >diff --git a/drivers/net/wireless/rsi/rsi_91x_hal.c > >b/drivers/net/wireless/rsi/rsi_91x_hal.c > >index 1176de646942..839ebdd602df 100644 > >--- a/drivers/net/wireless/rsi/rsi_91x_hal.c > >+++ b/drivers/net/wireless/rsi/rsi_91x_hal.c > >@@ -641,7 +641,7 @@ static int ping_pong_write(struct rsi_hw *adapter, u8 > >cmd, u8 *addr, u32 size) > > u32 cmd_addr; > > u16 cmd_resp, cmd_req; > > u8 *str; > >-int status; > >+int status, ret; > > if (cmd == PING_WRITE) { > > cmd_addr = PING_BUFFER_ADDRESS; > >@@ -655,12 +655,13 @@ static int ping_pong_write(struct rsi_hw *adapter, u8 > >cmd, u8 *addr, u32 size) > > str = "PONG_VALID"; > > } > >-status = hif_ops->load_data_master_write(adapter, cmd_addr, size, > >+ret = hif_ops->load_data_master_write(adapter, cmd_addr, size, > > block_size, addr); > >-if (status) { > >-rsi_dbg(ERR_ZONE, "%s: Unable to write blk at addr %0x\n", > >-__func__, *addr); > >-return status; > >+if (ret) { > >+if (ret != -EINVAL) > >+rsi_dbg(ERR_ZONE, "%s: Unable to write blk at addr > >%0x\n", > >+__func__, *addr); > >+return ret; > > } > > status = bl_cmd(adapter, cmd_req, cmd_resp, str); > >diff --git a/drivers/net/wireless/rsi/rsi_91x_sdio.c > >b/drivers/net/wireless/rsi/rsi_91x_sdio.c > >index b0cf41195051..b766578b591a 100644 > >--- a/drivers/net/wireless/rsi/rsi_91x_sdio.c > >+++ b/drivers/net/wireless/rsi/rsi_91x_sdio.c > >@@ -20,6 +20,8 @@ > > #include "rsi_common.h" > > #include "rsi_hal.h" > >+#define RSI_MAX_BLOCK_SIZE 256 > >+ > > /** > > * rsi_sdio_set_cmd52_arg() - This function prepares cmd 52 read/write arg. > > * @rw: Read/write > >@@ -362,7 +364,7 @@ static int rsi_setblocklength(struct rsi_hw *adapter, > >u32 length) > > rsi_dbg(INIT_ZONE, "%s: Setting the block length\n", __func__); > > status = sdio_set_block_size(dev->pfunction, length); > >-dev->pfunction->max_blksize = 256; > >+dev->pfunction->max_blksize = RSI_MAX_BLOCK_SIZE; > > adapter->block_size = dev->pfunction->max_blksize; > > rsi_dbg(INFO_ZONE, > >@@ -567,9 +569,12 @@ static int rsi_sdio_load_data_master_write(struct > >rsi_hw *adapter, > > { > > u32 num_blocks, offset, i; > > u16 msb_address, lsb_address; > >-u8 temp_buf[block_size]; > >+u8 temp_buf[RSI_MAX_BLOCK_SIZE]; > > int status; > >+if (block_size > RSI_MAX_BLOCK_SIZE) > >+return -EINVAL; > >+ > > num_blocks = instructions_sz / block_size; > > msb_address = base_address >> 16; > > I am not giving this patch a negative review, but my solution to the same > problem has been to change the on-stack array into a u8 pointer, use > kmalloc() to assign the space, and then free that space at the end. That way > large stack allocations are avoided, with a minimum of changes. Your idea is better Larry, have you got a patch done already or do you want me to knock one up? thanks, Tobin.
Re: perf-core build fails on powerpc
John Garry [john.ga...@huawei.com] wrote: > On 13/03/2018 19:17, Sukadev Bhattiprolu wrote: > > > > > > Building perf on Powerpc seems broken when using Arnaldo's perf/core branch > > with HEAD as: > > > > 1b442ed ("perf test: Fix exit code for record+probe_libc_inet_pton.sh") > > > > It maybe related to this commit: > > > > commit d596299 > > Author: John Garry> > Date: Thu Mar 8 18:58:29 2018 +0800 > > > > perf vendor events: Add support for pmu events vendor subdirectory > > > > Reverting this hunk from tools/perf/pmu-events/jevents.c, seems to fix the > > problem for me. > > Hi John, I have an xfs file system which seems to have d_type == DT_UNKNOWN for all entries in 'tools/perf/pmu-events/arch/power8'! readdir(3) says ->d_type may not be supported by all file systems. Not relying on ->d_type seems to fix it: @@ -873,26 +879,26 @@ static int is_leaf_dir(const char *fpath) return 0; while ((dir = readdir(d)) != NULL) { - if (dir->d_type == DT_DIR && dir->d_name[0] != '.') { - res = 0; - break; - } else if (dir->d_type == DT_UNKNOWN) { - char path[PATH_MAX]; - struct stat st; + char path[PATH_MAX]; + struct stat st; - sprintf(path, "%s/%s", fpath, dir->d_name); - if (stat(path, )) - break; + if (strcmp(dir->d_name, ".") == 0 || + strcmp(dir->d_name, "..") == 0) + continue; - if (S_ISDIR(st.st_mode)) { - res = 0; - break; - } + sprintf(path, "%s/%s", fpath, dir->d_name); + if (stat(path, )) + break; + + if (S_ISDIR(st.st_mode)) { + res = 0; + break;
Re: linux-next: Tree for Mar 13
Hi Mike, On Tue, 13 Mar 2018 11:57:40 -0400 Mike Snitzerwrote: > > I had to revert the following commits to get this kernel to build > (otherwise I got macro expansion errors, using RHEL7 > gcc-4.8.5-28.el7.x86_64): Yeah, I reported that earlier (216 of my 244 builds failed - gcc 4.6.3) > beb7eb2 kernelh-skip-single-eval-logic-on-literals-in-min-max-v3 > 19ff7e5 kernelh-skip-single-eval-logic-on-literals-in-min-max-v2 > c7c133f kernel.h: skip single-eval logic on literals in min()/max() > > (it wasn't until I reverted commit c7c133f that the kernel build worked) I will revert those from linux-next today unless someone sends a fix. -- Cheers, Stephen Rothwell pgpwRHWBWLjIO.pgp Description: OpenPGP digital signature
Re: [Intel-gfx] [PATCH] [v2] drm/i915/pmu: avoid -Wmaybe-uninitialized warning
On Tue, Mar 13, 2018 at 6:46 PM, Tvrtko Ursulinwrote: > > On 13/03/2018 16:19, Arnd Bergmann wrote: >> >> The conditional spinlock confuses gcc into thinking the 'flags' value >> might contain uninitialized data: >> >> drivers/gpu/drm/i915/i915_pmu.c: In function '__i915_pmu_event_read': >> arch/x86/include/asm/paravirt_types.h:573:3: error: 'flags' may be used >> uninitialized in this function [-Werror=maybe-uninitialized] > > > Hm, how does paravirt_types.h comes into the picture? spin_unlock_irqrestore() calls arch_local_irq_restore() >> The code is correct, but it's easy to see how the compiler gets confused >> here. This avoids the problem by pulling the lock outside of the function >> into its only caller. > > > Is it specific gcc version, specific options, or specific kernel config that > this happens? Not gcc version specific (same result with gcc-4.9 through 8, didn't test earlier versions that are currently broken). > Strange that it hasn't been seen so far. It seems to be a relatively rare 'randconfig' combination. Looking at the preprocessed sources, I find: static u64 get_rc6(struct drm_i915_private *i915, bool locked) { unsigned long flags; u64 val; if (intel_runtime_pm_get_if_in_use(i915)) { val = __get_rc6(i915); intel_runtime_pm_put(i915); if (!locked) do { do { ({ unsigned long __dummy; typeof(flags) __dummy2; (void)(&__dummy == &__dummy2); 1; }); do { do { do { ({ unsigned long __dummy; typeof(flags) __dummy2; (void)(&__dummy == &__dummy2); 1; }); flags = arch_local_irq_save(); } while (0); trace_hardirqs_off(); } while (0); do { __asm__ __volatile__("": : :"memory"); do { (void)0; (void)(spinlock_check(>pmu.lock)); } while (0); } while (0); } while (0); } while (0); } while (0); if (val >= i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur) { i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = 0; i915->pmu.sample[__I915_SAMPLE_RC6].cur = val; } else { val = i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur; } if (!locked) spin_unlock_irqrestore(>pmu.lock, flags); } else { struct pci_dev *pdev = i915->drm.pdev; struct device *kdev = >dev; unsigned long flags2; # 455 "/git/arm-soc/drivers/gpu/drm/i915/i915_pmu.c" if (!locked) do { do { ({ unsigned long __dummy; typeof(flags) __dummy2; (void)(&__dummy == &__dummy2); 1; }); do { do { do { ({ unsigned long __dummy; typeof(flags) __dummy2; (void)(&__dummy == &__dummy2); 1; }); flags = arch_local_irq_save(); } while (0); trace_hardirqs_off(); } while (0); do { __asm__ __volatile__("": : :"memory"); do { (void)0; (void)(spinlock_check(>pmu.lock)); } while (0); } while (0); } while (0); } while (0); } while (0); do { do { ({ unsigned long __dummy; typeof(flags2) __dummy2; (void)(&__dummy == &__dummy2); 1; }); do { do { do { ({ unsigned long __dummy; typeof(flags2) __dummy2; (void)(&__dummy == &__dummy2); 1; }); flags2 = arch_local_irq_save(); } while (0); trace_hardirqs_off(); } while (0); do { __asm__ __volatile__("": : :"memory"); do { (void)0; (void)(spinlock_check(>power.lock)); } while (0); } while (0); } while (0); } while (0); } while (0); if (!i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur) i915->pmu.suspended_jiffies_last = kdev->power.suspended_jiffies; val = kdev->power.suspended_jiffies - i915->pmu.suspended_jiffies_last; val += jiffies - kdev->power.accounting_timestamp; spin_unlock_irqrestore(>power.lock, flags2); val = jiffies_to_nsecs(val); val += i915->pmu.sample[__I915_SAMPLE_RC6].cur; i915->pmu.sample[__I915_SAMPLE_RC6_ESTIMATED].cur = val; if (!locked) spin_unlock_irqrestore(>pmu.lock, flags); } return val; } so it seems that the spin_lock_irqsave() is completely inlined through a macro while the unlock is not, and the lock contains a memory barrier (among other things) that might tell the compiler that the state of the 'locked' flag could changed underneath it. It could also be the problem that arch_local_irq_restore() uses __builtin_expect() in PVOP_TEST_NULL(op) when CONFIG_PARAVIRT_DEBUG is enabled, see static inline __attribute__((unused)) __attribute__((no_instrument_function)) __attribute__((no_instrument_function)) void arch_local_irq_restore(unsigned long f) { ({ unsigned long __eax = __eax, __edx = __edx, __ecx = __ecx;; do { if (__builtin_expect(!!(pv_irq_ops.restore_fl.func == ((void *)0)), 0)) do { do { asm volatile("1:\t" ".byte 0x0f, 0x0b" "\n" ".pushsection __bug_table,\"aw\"\n" "2:\t" ".long " "1b" "\t# bug_entry::bug_addr\n" "\t" ".long " "%c0" "\t# bug_entry::file\n" "\t.word %c1" "\t# bug_entry::line\n" "\t.word %c2" "\t# bug_entry::flags\n" "\t.org 2b+%c3\n" ".popsection" : : "i" ("/git/arm-soc/arch/x86/include/asm/paravirt.h"), "i" (783), "i" (0), "i" (sizeof(struct bug_entry))); } while (0); do { ; asm volatile(""); __builtin_unreachable(); } while (0); } while (0); } while (0); asm volatile("" "771:\n\t" "999:\n\t" ".pushsection .discard.retpoline_safe\n\t" " "
Re: [RT PATCH 2/2] block: blk-mq: move blk_queue_usage_counter_release() into process context
On Tue, Mar 13, 2018 at 07:42:41PM +0100, Sebastian Andrzej Siewior wrote: > +static void blk_queue_usage_counter_release_swork(struct swork_event *sev) > +{ > + struct request_queue *q = > + container_of(sev, struct request_queue, mq_pcpu_wake); > + > + wake_up_all(>mq_freeze_wq); > +} > + > static void blk_queue_usage_counter_release(struct percpu_ref *ref) > { > struct request_queue *q = > container_of(ref, struct request_queue, q_usage_counter); > > - wake_up_all(>mq_freeze_wq); > + swork_queue(>mq_pcpu_wake); > } Depending on if we expect there to actually be wakeups, you could do something like: if (wq_has_sleepers(>mq_freeze_wq)) swork_queue(>mq_pcpu_wake)); avoiding the whole workqueue thing in the case there wasn't anybody waiting for it. But since I don't know this code, I can't say if it makes sense or not. Tejun?
Re: [v5 1/2] mm: disable interrupts while initializing deferred pages
On Tue, 13 Mar 2018 15:45:46 -0400 Pavel Tatashinwrote: > > > > > > We must remove cond_resched() because we can't sleep anymore. They were > > > added to fight NMI timeouts, so I will replace them with > > > touch_nmi_watchdog() in a follow-up fix. > > > > This makes no sense. Any code section where we can add cond_resched() > > was never subject to NMI timeouts because that code cannot be running with > > disabled interrupts. > > > > Hi Andrew, > > I was talking about this patch: > > 9b6e63cbf85b89b2dbffa4955dbf2df8250e5375 > mm, page_alloc: add scheduling point to memmap_init_zone > > Which adds cond_resched() to memmap_init_zone() to avoid NMI timeouts. > > memmap_init_zone() is used both, early in boot, when non-deferred struct > pages are initialized, but also may be used later, during memory hotplug. > > As I understand, the later case could cause the timeout on non-preemptible > kernels. > > My understanding, is that the same logic was used here when cond_resched()s > were added. > > Please correct me if I am wrong. Yes, the message is a bit confusing and the terminology is perhaps vague. And it's been a while since I played with this stuff, so from (dated) memory: Soft lockup: kernel has run for too long without rescheduling Hard lockup: kernel has run for too long with interrupts disabled Both of these are detected by the NMI watchdog handler. 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling point. Replacing that with touch_nmi_watchdog() won't work (I think). Presumably calling touch_softlockup_watchdog() will "work", in that it suppresses the warning. But it won't fix the thing which the warning is actually warning about: starvation of the CPU scheduler. That's what the cond_resched() does. I'm not sure what to suggest, really. Your changelog isn't the best: "Vlastimil Babka reported about a window issue during which when deferred pages are initialized, and the current version of on-demand initialization is finished, allocations may fail". Well... where is ths mysterious window? Without such detail it's hard for others to suggest alternative approaches.
Re: [PATCH v2] usb: musb: Fix external abort in musb_remove on omap2430
Hi, On Mon, Mar 12, 2018 at 11:30:39PM +0100, Merlijn Wajer wrote: > Hi Bin, > > On 09/03/18 15:11, Bin Liu wrote: > > Hi, > > > > On Thu, Mar 08, 2018 at 11:19:48PM +0100, Merlijn Wajer wrote: > >> This fixes an oops on unbind / module unload (on the musb omap2430 > >> platform). > >> > >> musb_remove function now calls musb_platform_exit before disabling > >> runtime pm. > >> > >> Signed-off-by: Merlijn Wajer> > > > Applied. Thanks. > > Thank you. Is there any chance that this patch and the vbus patch could > end up in the 4.16-rc series? I was waiting for another rc to see if the > vbus patch would land, but it didn't land with 4.16-rc5. Those two patches are already in the upstream tree, hopefully they will be in the v4.16-rc. Regards, -Bin.
Re: [RESEND] rsi: Remove stack VLA usage
On Mon, Mar 12, 2018 at 09:46:06AM +, Kalle Valo wrote: > tchardingwrote: > > > The kernel would like to have all stack VLA usage removed[1]. rsi uses > > a VLA based on 'blksize'. Elsewhere in the SDIO code maximum block size > > is defined using a magic number. We can use a pre-processor defined > > constant and declare the array to maximum size. We add a check before > > accessing the array in case of programmer error. > > > > [1]: https://lkml.org/lkml/2018/3/7/621 > > > > Signed-off-by: Tobin C. Harding > > Tobin, your name in patchwork.kernel.org is just "tcharding" then it should be > "Tobin C. Harding". Patchwork is braindead in a way as it takes the name from > it's database instead of the From header of the patch in question. > > I can fix that manually but it would be helpful if you could register to > patchwork and fix your name during registration. You have only one chance to > fix your name (another braindead feature!) so be careful :) Hi Kalle, I logged into my patchwork account but I don't see any way to set the name. Within 'profile' there is only 'change password' and 'link email'. I thought I could unregister then re-register but I can't see how to do that either. Is there a maintainer of patchwork.kernel.org who I can email to manually remove me from the system? thanks, Tobin.
[PATCH] crypto: arm,arm64 - Fix random regeneration of S_shipped
The decision to rebuild .S_shipped is made based on the relative timestamps of .S_shipped and .pl files but git makes this essentially random. This means that the perl script might run anyway (usually at most once per checkout), defeating the whole purpose of _shipped. Fix by skipping the rule unless explicit make variables are provided: REGENERATE_ARM_CRYPTO or REGENERATE_ARM64_CRYPTO. This can produce nasty occasional build failures downstream, for example for toolchains with broken perl. The solution is minimally intrusive to make it easier to push into stable. Another report on a similar issue here: https://lkml.org/lkml/2018/3/8/1379 Signed-off-by: Leonard CrestezCc: --- arch/arm/crypto/Makefile | 2 ++ arch/arm64/crypto/Makefile | 2 ++ 2 files changed, 4 insertions(+) Not clear if this needs to go through crypto or arm but all commits in these directories start with "crypto:". My problems were only on arm64 because of a yocto toolchain which ships a version of perl which fails on "use integer;". CC stable because this can cause trouble for downstream packagers. diff --git a/arch/arm/crypto/Makefile b/arch/arm/crypto/Makefile index 30ef8e2..c9919c2 100644 --- a/arch/arm/crypto/Makefile +++ b/arch/arm/crypto/Makefile @@ -47,20 +47,22 @@ sha256-arm-y:= sha256-core.o sha256_glue.o $(sha256-arm-neon-y) sha512-arm-neon-$(CONFIG_KERNEL_MODE_NEON) := sha512-neon-glue.o sha512-arm-y := sha512-core.o sha512-glue.o $(sha512-arm-neon-y) sha1-arm-ce-y := sha1-ce-core.o sha1-ce-glue.o sha2-arm-ce-y := sha2-ce-core.o sha2-ce-glue.o aes-arm-ce-y := aes-ce-core.o aes-ce-glue.o ghash-arm-ce-y := ghash-ce-core.o ghash-ce-glue.o crct10dif-arm-ce-y := crct10dif-ce-core.o crct10dif-ce-glue.o crc32-arm-ce-y:= crc32-ce-core.o crc32-ce-glue.o chacha20-neon-y := chacha20-neon-core.o chacha20-neon-glue.o +ifdef REGENERATE_ARM_CRYPTO quiet_cmd_perl = PERL$@ cmd_perl = $(PERL) $(<) > $(@) $(src)/sha256-core.S_shipped: $(src)/sha256-armv4.pl $(call cmd,perl) $(src)/sha512-core.S_shipped: $(src)/sha512-armv4.pl $(call cmd,perl) +endif .PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S diff --git a/arch/arm64/crypto/Makefile b/arch/arm64/crypto/Makefile index cee9b8d9..dfe651b 100644 --- a/arch/arm64/crypto/Makefile +++ b/arch/arm64/crypto/Makefile @@ -60,20 +60,22 @@ obj-$(CONFIG_CRYPTO_AES_ARM64_BS) += aes-neon-bs.o aes-neon-bs-y := aes-neonbs-core.o aes-neonbs-glue.o AFLAGS_aes-ce.o:= -DINTERLEAVE=4 AFLAGS_aes-neon.o := -DINTERLEAVE=4 CFLAGS_aes-glue-ce.o := -DUSE_V8_CRYPTO_EXTENSIONS $(obj)/aes-glue-%.o: $(src)/aes-glue.c FORCE $(call if_changed_rule,cc_o_c) +ifdef REGENERATE_ARM64_CRYPTO quiet_cmd_perlasm = PERLASM $@ cmd_perlasm = $(PERL) $(<) void $(@) $(src)/sha256-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) $(src)/sha512-core.S_shipped: $(src)/sha512-armv8.pl $(call cmd,perlasm) +endif .PRECIOUS: $(obj)/sha256-core.S $(obj)/sha512-core.S -- 2.7.4
Re: [3/3] irqchip/gic-v3: Bounds check redistributor accesses
On Tue, 13 Mar 2018 18:49:44 +, Nishanth Menon wrote: > > Marc, > > On Tue, Mar 13, 2018 at 9:21 AM, Marc Zyngierwrote: > > Hi Lokesh, > > > > On 13/03/18 13:38, Lokesh Vutla wrote: > >> Hi All, > >> > >> On Wednesday 11 October 2017 03:11 PM, Punit Agrawal wrote: > >>> The kernel crashes while iterating over a redistributor that is > >>> in-correctly sized by the platform firmware or doesn't contain the last > >>> record. > >>> > >>> Prevent the crash by checking accesses against the size of the region > >>> provided by the firmware. While we are at it, warn the user about > >>> incorrect region size. > >>> > >>> Signed-off-by: Punit Agrawal > >>> Cc: Marc Zyngier > >> > >> Sorry to bring up an old thread. Just wanted to check what is the status > >> on this series. > > > > So far, I wasn't inclined to merge it, as it only allowed you to detect > > a broken system, as opposed to help with a working one. > > Is'nt that a good reason to have it? Why not help an error in dtb with > a debug helper than an obtuse crash to debug painfully? The kernel is not in the business of validating DTBs. And we both know that if we start papering over firmware bugs, these bugs become ABI, and we live with them forever. If you want to validate DTBs, you should use a tool that is actually validating it against your HW, and not use the kernel as the validation tool. > > > > >> This will also be useful when we try to boot linux + hypervisor with > >> less number of cores than the SoC supports. For example: > >> - SoC has 4 cores and Linux tries to boot with 2 cores. > >> - then a type-2 hypervisor gets installed. > >> - Hypervisor tries to boot a VM with linux on core 1. > >> > >> Now the VM boot will fail while it iterates over all the GICR regions > >> till GICR_TYPER is found. Hypervisor will trap any accesses to GICR > >> regions of any invalid cpus(cpu 2, cpu 3 in this case). > > > > It you're passing the redistributors to a guest, you're doing something > > terribly wrong. You're putting the guest in a position to do a DoS on > > the hypervisor (disabling its timer interrupt, for example). Not the > > greatest move. There is a number of other gotchas with this approach > > (virtual interrupts, distributor virtualization...). > > > >> If the $patch is not the right approach, can you suggest on how to > >> handle the above scenario? > > > > The proper way to handle this is to virtualize the distributor and > > redistributor by trap/emulate. The only thing you can safely pass to a > > guest is the CPU interface, either as system registers or in its MMIO > > form (if you have the GICv2 compatibility interface). > > > > Dumb question: Would'nt a trap emulate be real expensive with hyp > context in v8 (all the context save/restore we'd have to do? (in the > context of discussion, GICV2 compat is disabled). How often do you hit the distributor and redistributors? Why do you need to context switch a distributor that you emulate (hint: you don't). I suggest you look at how real life hypervisor works (there's a pretty good one in the tree). Thanks, M. -- Jazz is not dead, it just smell funny.
Re: [PATCH 3/8] gpio: pci-idio-16: Implement get_multiple callback
On Tue, Mar 13, 2018 at 06:04:33PM +0200, Andy Shevchenko wrote: >On Mon, Mar 12, 2018 at 10:49 PM, William Breathitt Gray >wrote: > >> +static int idio_16_gpio_get_multiple(struct gpio_chip *chip, >> + unsigned long *mask, unsigned long *bits) >> +{ > > >> + /* clear bits array to a clean slate */ >> + for (i = 0; i < chip->ngpio; i += BITS_PER_LONG) >> + bits[i / BITS_PER_LONG] = 0; > >bitmap_clear() or bitmap_zero() bitmap_zero() would be perfect for this situation. I'll submit a version 2 of this patchset with this change for the various drivers herein. > >> + /* get bits are evaluated a gpio register size at a time */ >> + for (i = 0; i < chip->ngpio; i += gpio_reg_size) { >> + bit_word_offset = i % BITS_PER_LONG; >> + bits_mask = mask[BIT_WORD(i)] & (reg_mask << >> bit_word_offset); >> + if (!bits_mask) { >> + /* no get bits in this register so skip to next one >> */ >> + continue; >> + } > >for_each_set_bit() ? > >> + /* store acquired bits */ >> + bits[BIT_WORD(i)] |= port_state << bit_word_offset; > >__set_bit() >__clear_bit() I'm not sure for_each_set_bit() and __set_bit()/__clear_bit() would be good for this particular situation since I'm working with the entire register word (8 bits for this specific device) at a time. Since I know each register word is 8-bits, I can make 8-bit jumps and skips rather than evaluating each bit individually. Similarly, it seems more efficient to store the input 8 bits at a time rather than breaking it up to individual bits. However, I do see how all these inline bitwise operations could make it difficult to follow the code, so perhaps I can break it up across more lines so that the logic of the loop becomes easier to read. William Breathitt Gray > >We have bitmap API for *ages*. Is it too hard to check? > >-- >With Best Regards, >Andy Shevchenko
Re: [2/3] mwifiex: support sysfs initiated device coredump
Hi Arend, > Since commit 3c47d19ff4dc ("drivers: base: add coredump driver ops") > it is possible to initiate a device coredump from user-space. This > patch adds support for it adding the .coredump() driver callback. > As there is no longer a need to initiate it through debugfs remove > that code. > > Signed-off-by: Arend van SprielBased on the discussion I assume this is ok to take to w-d-next. If that's not the case, please let me know ASAP. >>> >>> It is up to the mwifiex maintainers to decide, I guess. The ABI >>> documentation need to be revised and change the callback to void >>> return type. I am not sure what the best approach is. 1) apply this >>> and fix return type later, or 2) fix return type and resubmit this. >>> What is your opinion? >> >> I guess the callback change will go through Greg's tree? Then I suspect >> it's easier that you submit the callback change to Greg first and wait >> for it to trickle down to wireless-drivers-next (after the next merge >> window) and then I can apply the driver patches. Otherwise there might >> be a conflict between my and Greg's tree. > > That was my assessment, but unfortunately Marcel already applied the btmrvl > patch before I could reply. So how do I move from here? Option 1) revert > brmrvl and fix callback return type, or 2) apply mwifiex patch and fix > callback return type later for both drivers. I can take the patch back out of bluetooth-next if needed. It is your call. Regards Marcel
Re: [2/3] mwifiex: support sysfs initiated device coredump
On 3/13/2018 9:19 PM, Marcel Holtmann wrote: Hi Arend, Since commit 3c47d19ff4dc ("drivers: base: add coredump driver ops") it is possible to initiate a device coredump from user-space. This patch adds support for it adding the .coredump() driver callback. As there is no longer a need to initiate it through debugfs remove that code. Signed-off-by: Arend van SprielBased on the discussion I assume this is ok to take to w-d-next. If that's not the case, please let me know ASAP. It is up to the mwifiex maintainers to decide, I guess. The ABI documentation need to be revised and change the callback to void return type. I am not sure what the best approach is. 1) apply this and fix return type later, or 2) fix return type and resubmit this. What is your opinion? I guess the callback change will go through Greg's tree? Then I suspect it's easier that you submit the callback change to Greg first and wait for it to trickle down to wireless-drivers-next (after the next merge window) and then I can apply the driver patches. Otherwise there might be a conflict between my and Greg's tree. That was my assessment, but unfortunately Marcel already applied the btmrvl patch before I could reply. So how do I move from here? Option 1) revert brmrvl and fix callback return type, or 2) apply mwifiex patch and fix callback return type later for both drivers. I can take the patch back out of bluetooth-next if needed. It is your call. Thanks, Marcel Let's go for that. Please revert/remove the patch. Regards, Arend
Re: [PATCH v2 1/5] iommu/amd - Add debugfs support
On Tue, Mar 13, 2018 at 8:54 PM, Gary R Hookwrote: > On 03/13/2018 12:16 PM, Andy Shevchenko wrote: >> On Fri, Mar 9, 2018 at 2:50 AM, Gary R Hook wrote: >>> +#include >>> +#include >>> +#include >> >> >> Keep in order? > What order would that be? These few needed files are listed in the same > order as which they appear in amd_iommu.c. I'm gonna need a preference > spelled out, please (and a rationale, so I may better understand). To increase readability and avoid potential header duplication (here is can bus protocol implementation where the problem exists for real, even in new code!) >>> +/* DebugFS helpers */ >>> +#defineOBUFP (obuf + oboff) >>> +#defineOBUFLEN obuflen >>> +#defineOBUFSPC (OBUFLEN - oboff) >>> +#defineOSCNPRINTF(fmt, ...) \ >>> + scnprintf(OBUFP, OBUFSPC, fmt, ## __VA_ARGS__) >> >> >> I don't see any advantages of this. Other way around, they will simple >> makes things hard to read an understand in place. > > > I used this technique in the CCP driver code (where it was accepted), in an > effort to do the opposite of what you claim: make the code more readable. > Given the 80 column limit, a large number of arguments, and very long > statements, IMO something needs to give. I don't find the use of #defines to > be obfuscating. > > I'm not trying to argue, but rather simply state the perspective / reasoning > I used to create a source file I feel is manageable. I have 17 more iommu > patches built upon this strategy, and this seems to be advantageous for all > of them. It's fine to me as long as it's fine to maintainer, but honestly speaking I would avoid such code as much as possible. Imagine that your "advantage" basically becomes disadvantage to everyone else who is not familiar with the code. Each time I see macro in the code, I would need to at least step on it, run cscope, read, and come back. And at this point of time I already forgot what this code is doing, it does use sNprintf() or sCNprintf() or whatever wrapper on top of either... >>> + for (i = start ; i <= end ; i++) >> Missed {} > Wasn't sure about the M.O. given that the body of this loop is a single if > statement. And I don't see anywhere in > https://www.kernel.org/doc/html/latest/process/coding-style.html > in section 3.1 where curly braces are called for in this situation. May I > ask for clarification on the style rule, please? You can do nothing, though I'm guided by the end of section 3.0 (though it tells only about 'if' case). >>> @@ -89,6 +89,7 @@ >>> #define ACPI_DEVFLAG_ATSDIS 0x1000 >>> >>> #define LOOP_TIMEOUT 10 >>> + >>> /* >>>* ACPI table definitions >>>* >> Doesn't belong to the patch. > I'm sorry, I don't understand. The added blank line doesn't belong to the > patch? Correct. -- With Best Regards, Andy Shevchenko
Re: [PATCH ghak21 V2 3/4] audit: add refused symlink to audit_names
On Tue, Mar 13, 2018 at 6:52 AM, Richard Guy Briggswrote: > On 2018-03-13 11:38, Steve Grubb wrote: >> On Tue, 13 Mar 2018 06:11:08 -0400 >> Richard Guy Briggs wrote: >> >> > On 2018-03-13 09:35, Steve Grubb wrote: >> > > On Mon, 12 Mar 2018 11:52:56 -0400 >> > > Richard Guy Briggs wrote: >> > > >> > > > On 2018-03-12 11:53, Paul Moore wrote: >> > > > > On Mon, Mar 12, 2018 at 11:26 AM, Richard Guy Briggs >> > > > > wrote: >> > > > > > On 2018-03-12 11:12, Paul Moore wrote: >> > > > > >> On Mon, Mar 12, 2018 at 2:31 AM, Richard Guy Briggs >> > > > > >> wrote: >> > > > > >> > Audit link denied events for symlinks had duplicate PATH >> > > > > >> > records rather than just updating the existing PATH record. >> > > > > >> > Update the symlink's PATH record with the current dentry >> > > > > >> > and inode information. >> > > > > >> > >> > > > > >> > See: https://github.com/linux-audit/audit-kernel/issues/21 >> > > > > >> > Signed-off-by: Richard Guy Briggs >> > > > > >> > --- >> > > > > >> > fs/namei.c | 1 + >> > > > > >> > 1 file changed, 1 insertion(+) >> > > > > >> >> > > > > >> Why didn't you include this in patch 4/4 like I asked during >> > > > > >> the previous review? >> > > > > > >> > > > > > Please see the last comment of: >> > > > > > https://www.redhat.com/archives/linux-audit/2018-March/msg00070.html >> > > > > >> > > > > Yes, I just saw that ... I hadn't seen your replies on the v1 >> > > > > patches until I had finished reviewing v2. I just replied to >> > > > > that mail in the v1 thread, but basically you need to figure >> > > > > out what is necessary here and let us know. If I have to >> > > > > figure it out it likely isn't going to get done with enough >> > > > > soak time prior to the upcoming merge window. >> > > > >> > > > Steve? I was hoping you could chime in here. >> > > >> > > If the CWD record will always be the same as the PARENT record, >> > > then we do not need the parent record. Duplicate information is >> > > bad. Like all the duplicate SYSCALL information. >> > >> > The CWD record could be different from the PARENT record, since I >> > could have SYMLINK=/tmp/test/symlink, CWD=/tmp, PARENT=/tmp/test. >> > Does the parent record even matter since it might not be a directory >> > operation like creat, unlink or rename? >> >> There's 2 issues. One is creating the path if what we have is relative. >> In this situation CWD should be enough. But if the question is whether >> the PARENT directory should be included...what if the PARENT >> permissions do not allow the successful name resolution? In that case >> we might only get a PARENT record no? In that case we would need it. > > I think in the case of symlink creation, normal file create code path > would be in effect, and would properly log parent and symlink source > file paths (if a rule to log it was in effect) which is not something > that would trigger a symlink link denied error. Symlink link denied > happens only when trying to actually follow the link before > resolving the target path of a read/write/exec of the symlink target. > > If the parent permissions of the link's target don't allow successful > name resolution then the symlink link denied condition isn't met, but > rather any other rule that applies to the target path. I'm guessing you are in the process of tracking all this down, but if not, lets get to a point where we can answer this definitively and not guess :) -- paul moore www.paul-moore.com
RE: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register
I've tested this patch on the same set of hubless (single-segment) and scalable (segment-per-socket) configurations as for Kan's version 1. As far as we can tell this will also work for Cascade Lake, but will need revisiting for Ice Lake. Thanks. Gary > -Original Message- > From: kan.li...@linux.intel.com [mailto:kan.li...@linux.intel.com] > Sent: Tuesday, March 13, 2018 1:52 PM > To: mi...@redhat.com; h...@zytor.com; t...@linutronix.de; > pet...@infradead.org; andy.shevche...@gmail.com > Cc: Kroening, Gary; Travis, Mike; Banman, Andrew; Sivanich, Dimitri; > Anderson, Russ; x...@kernel.org; linux-kernel@vger.kernel.org; Kan Liang > Subject: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from > CAPID6 register > > From: Kan Liang> > The number of CHAs is miscalculated on multi PCI domain systems on > Skylake server. > > (From Kroening, Gary: > > For systems with a single PCI segment, it is sufficient to look for the > bus number to change in order to determine that all of the CHa's have > been counted for a single socket. > However, for multi PCI segment systems, each socket is given a new > segment and the bus number does NOT change. So looking only for the > bus number to change ends up counting all of the CHa's on all sockets > in the system. This leads to writing CPU MSRs beyond a valid range and > causes an error in ivbep_uncore_msr_init_box().) > > To determine the number of CHAs, it should read bits 27:0 in the CAPID6 > register located at Device 30, Function 3, Offset 0x9C. These 28 bits > form a bit vector of available LLC slices and the CHAs that manage those > slices. > > Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore > support") > Reported-by: Kroening, Gary > Signed-off-by: Kan Liang > --- > > Changes since V1: > - add missed pci_dev_put() > - Drop ugly casting by using hweight32() > - Add comments for macros. > > arch/x86/events/intel/uncore_snbep.c | 31 +-- > 1 file changed, 17 insertions(+), 14 deletions(-) > > diff --git a/arch/x86/events/intel/uncore_snbep.c > b/arch/x86/events/intel/uncore_snbep.c > index 6d8044a..8970f71 100644 > --- a/arch/x86/events/intel/uncore_snbep.c > +++ b/arch/x86/events/intel/uncore_snbep.c > @@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[] > = { > NULL, > }; > > +/* > + * To determine the number of CHAs, it should read bits 27:0 in the > CAPID6 > + * register which located at Device 30, Function 3, Offset 0x9C. PCI ID > 0x2083. > + */ > +#define SKX_CAPID6 0x9c > +#define SKX_CHA_BIT_MASK GENMASK(27, 0) > + > static int skx_count_chabox(void) > { > - struct pci_dev *chabox_dev = NULL; > - int bus, count = 0; > + struct pci_dev *dev = NULL; > + u32 val = 0; > > - while (1) { > - chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d, > chabox_dev); > - if (!chabox_dev) > - break; > - if (count == 0) > - bus = chabox_dev->bus->number; > - if (bus != chabox_dev->bus->number) > - break; > - count++; > - } > + dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev); > + if (!dev) > + goto out; > > - pci_dev_put(chabox_dev); > - return count; > + pci_read_config_dword(dev, SKX_CAPID6, ); > + val &= SKX_CHA_BIT_MASK; > +out: > + pci_dev_put(dev); > + return hweight32(val); > } > > void skx_uncore_cpu_init(void) > -- > 2.7.4
Re: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register
On Tue, Mar 13, 2018 at 8:51 PM,wrote: > From: Kan Liang > > The number of CHAs is miscalculated on multi PCI domain systems on > Skylake server. > > (From Kroening, Gary: > > For systems with a single PCI segment, it is sufficient to look for the > bus number to change in order to determine that all of the CHa's have > been counted for a single socket. > However, for multi PCI segment systems, each socket is given a new > segment and the bus number does NOT change. So looking only for the > bus number to change ends up counting all of the CHa's on all sockets > in the system. This leads to writing CPU MSRs beyond a valid range and > causes an error in ivbep_uncore_msr_init_box().) > > To determine the number of CHAs, it should read bits 27:0 in the CAPID6 > register located at Device 30, Function 3, Offset 0x9C. These 28 bits > form a bit vector of available LLC slices and the CHAs that manage those > slices. > FWIW, Reviewed-by: Andy Shevchenko > Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore > support") > Reported-by: Kroening, Gary > Signed-off-by: Kan Liang > --- > > Changes since V1: > - add missed pci_dev_put() > - Drop ugly casting by using hweight32() > - Add comments for macros. > > arch/x86/events/intel/uncore_snbep.c | 31 +-- > 1 file changed, 17 insertions(+), 14 deletions(-) > > diff --git a/arch/x86/events/intel/uncore_snbep.c > b/arch/x86/events/intel/uncore_snbep.c > index 6d8044a..8970f71 100644 > --- a/arch/x86/events/intel/uncore_snbep.c > +++ b/arch/x86/events/intel/uncore_snbep.c > @@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[] = { > NULL, > }; > > +/* > + * To determine the number of CHAs, it should read bits 27:0 in the CAPID6 > + * register which located at Device 30, Function 3, Offset 0x9C. PCI ID > 0x2083. > + */ > +#define SKX_CAPID6 0x9c > +#define SKX_CHA_BIT_MASK GENMASK(27, 0) > + > static int skx_count_chabox(void) > { > - struct pci_dev *chabox_dev = NULL; > - int bus, count = 0; > + struct pci_dev *dev = NULL; > + u32 val = 0; > > - while (1) { > - chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d, > chabox_dev); > - if (!chabox_dev) > - break; > - if (count == 0) > - bus = chabox_dev->bus->number; > - if (bus != chabox_dev->bus->number) > - break; > - count++; > - } > + dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev); > + if (!dev) > + goto out; > > - pci_dev_put(chabox_dev); > - return count; > + pci_read_config_dword(dev, SKX_CAPID6, ); > + val &= SKX_CHA_BIT_MASK; > +out: > + pci_dev_put(dev); > + return hweight32(val); > } > > void skx_uncore_cpu_init(void) > -- > 2.7.4 > -- With Best Regards, Andy Shevchenko
Re: [PATCH v2] workqueue: use put_device() instead of kfree()
On Tue, Mar 06, 2018 at 03:35:43PM +0530, Arvind Yadav wrote: > Never directly free @dev after calling device_register(), even > if it returned an error! Always use put_device() to give up the > reference initialized in this function instead. > > Signed-off-by: Arvind YadavApplied to wq/for-4.16-fixes. Thanks. -- tejun
Re: linux-next: build warning after merge of the net-next tree
On 03/13/2018 10:33 AM, David Miller wrote: From: "Gustavo A. R. Silva"Date: Tue, 13 Mar 2018 06:46:24 -0500 Hi Stephen, On 03/13/2018 01:11 AM, Stephen Rothwell wrote: Hi all, After merging the net-next tree, today's linux-next build (sparc defconfig) produced this warning: net/core/pktgen.c: In function 'pktgen_if_write': net/core/pktgen.c:1710:1: warning: the frame size of 1048 bytes is larger than 1024 bytes [-Wframe-larger-than=] } ^ Introduced by commit 35951393bbff ("pktgen: Remove VLA usage") Thanks for the report. David: If this code is not going to be executed very often [1], then I think it is safe to use dynamic memory allocation instead, as this is not going to impact the performance. What do you think? [1] https://lkml.org/lkml/2018/3/9/630 Sure, that works. It is only invoked when pktgen configuration changes are made. OK. I'll send a new patch for this. Thanks -- Gustavo
Re: [PATCH v4 4/6] ipc: Clamp msgmni and shmmni to the real IPCMNI limit
Waiman Longwrites: > On 03/13/2018 02:17 PM, Eric W. Biederman wrote: >> Waiman Long writes: >> >>> A user can write arbitrary integer values to msgmni and shmmni sysctl >>> parameters without getting error, but the actual limit is really >>> IPCMNI (32k). This can mislead users as they think they can get a >>> value that is not real. >>> >>> Enforcing the limit by failing the sysctl parameter write, however, >>> can break existing user applications. >> Which applications examples please. >> >> I am seeing this patchset late but it looks like a whole lot of changes >> to avoid a theoretical possibility. >> >> Changes that have an impact on more than just the ipc code you are >> patching. >> >> That makes me feel very uncomfortable with these changes. >> >> Eric > > This patchset is constructed to address a customer request that there is > no easy way to find out the actual usable range of a sysctl parameter. > In this particular case, the customer wants to use more than 32k shared > memory segments. They can put in a large value into shmmni, but the > application didn't work properly because shmmni was internally clamped > to 32k without any visible sign that a smaller limit has been imposed. > > Out of a concern that there might be customers out there setting those > sysctl parameters outside of the allowable range without knowing it, > just enforcing the right limits may have the undesirable consequence of > breaking their existing setup scripts. I don't have concrete example of > what customers are doing that, but it won't look good if we wait until > the complaints come in. > > The new code won't affect existing code unless the necessary flag is > set. So would you mind elaborating what other impact do you see that > will affect other non-IPC code in an undesirable way? The increase in size of struct ctl_table. Every caller is affected. Plus it increases everyone's cognitive load to figure out what is this flags field as they fill out ctl_table. Just introducing a proc_dointvec_minmax_clamped follows the existing pattern and it makes it easier for everyone who both read the code. It strikes me as quite peculiar that the response to bug report where the complaint is an error is not given, is to continue the current behavior without giving an error. Arguably the simplest fix here would be to kill IPCMNI entirely. Assign the shmids from a sequence counter. And place those structures in a rbtree indexed by shmni. There are 32bit fields but I don't think we must keep the low 16bits for an index into an array and the high 16bits as the actual sequence number. Except for the checkpoint/restart case which is aguably much too specific about how these ids are assigned that would give much more freedom and allow people the number of shm segments that they actually want to use. For a further complication I don't expect you can get away with changing the size or the fields in struct ctl_table in the kernel your customers are running. So please use a new function not flags it will simplify everyone's life. If you can please actually fix this so you can have more shmids that would be the really classy thing todo. Eric
Re: [PATCH] libata: add refcounting to ata_host
On Fri, Mar 09, 2018 at 08:34:41AM +, Taras Kondratiuk wrote: > After commit 9a6d6a2ddabb ("ata: make ata port as parent device of scsi > host") manual driver unbind/remove causes use-after-free. > > Unbind unconditionally invokes devres_release_all() which calls > ata_host_release() and frees ata_host/ata_port memory while it is still > being referenced as a parent of SCSI host. When SCSI host is finally > released scsi_host_dev_release() calls put_device(parent) and accesses > freed ata_port memory. > > Add reference counting to make sure that ata_host lives long enough. > > Bug report: https://lkml.org/lkml/2017/11/1/945 > Fixes: 9a6d6a2ddabb ("ata: make ata port as parent device of scsi host") > Cc: Tejun Heo> Cc: Lin Ming > Cc: linux-...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Signed-off-by: Taras Kondratiuk Applied to libata/for-4.17. Thanks. -- tejun
[PATCH] regulator: gpio: Fix some error handling paths in 'gpio_regulator_probe()'
Re-order error handling code and gotos to avoid leaks in error handling paths. Fixes: 9f946099fe19 ("regulator: gpio: fix parsing of gpio list") Signed-off-by: Christophe JAILLET--- drivers/regulator/gpio-regulator.c | 16 +++- 1 file changed, 7 insertions(+), 9 deletions(-) diff --git a/drivers/regulator/gpio-regulator.c b/drivers/regulator/gpio-regulator.c index 0fce06acfaec..a2eb50719c7b 100644 --- a/drivers/regulator/gpio-regulator.c +++ b/drivers/regulator/gpio-regulator.c @@ -271,8 +271,7 @@ static int gpio_regulator_probe(struct platform_device *pdev) drvdata->desc.name = kstrdup(config->supply_name, GFP_KERNEL); if (drvdata->desc.name == NULL) { dev_err(>dev, "Failed to allocate supply name\n"); - ret = -ENOMEM; - goto err; + return -ENOMEM; } if (config->nr_gpios != 0) { @@ -292,7 +291,7 @@ static int gpio_regulator_probe(struct platform_device *pdev) dev_err(>dev, "Could not obtain regulator setting GPIOs: %d\n", ret); - goto err_memstate; + goto err_memgpio; } } @@ -303,7 +302,7 @@ static int gpio_regulator_probe(struct platform_device *pdev) if (drvdata->states == NULL) { dev_err(>dev, "Failed to allocate state data\n"); ret = -ENOMEM; - goto err_memgpio; + goto err_stategpio; } drvdata->nr_states = config->nr_states; @@ -324,7 +323,7 @@ static int gpio_regulator_probe(struct platform_device *pdev) default: dev_err(>dev, "No regulator type set\n"); ret = -EINVAL; - goto err_memgpio; + goto err_memstate; } /* build initial state from gpio init data. */ @@ -361,22 +360,21 @@ static int gpio_regulator_probe(struct platform_device *pdev) if (IS_ERR(drvdata->dev)) { ret = PTR_ERR(drvdata->dev); dev_err(>dev, "Failed to register regulator: %d\n", ret); - goto err_stategpio; + goto err_memstate; } platform_set_drvdata(pdev, drvdata); return 0; -err_stategpio: - gpio_free_array(drvdata->gpios, drvdata->nr_gpios); err_memstate: kfree(drvdata->states); +err_stategpio: + gpio_free_array(drvdata->gpios, drvdata->nr_gpios); err_memgpio: kfree(drvdata->gpios); err_name: kfree(drvdata->desc.name); -err: return ret; } -- 2.14.1
Re: [PATCH 1/8] Uprobe: Export vaddr <-> offset conversion functions
On Tue, Mar 13, 2018 at 06:25:56PM +0530, Ravi Bangoria wrote: > No functionality changes. > > Signed-off-by: Ravi BangoriaReviewed-by: Jérôme Glisse > --- > include/linux/mm.h | 12 > kernel/events/uprobes.c | 10 -- > 2 files changed, 12 insertions(+), 10 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index ad06d42..95909f2 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2274,6 +2274,18 @@ struct vm_unmapped_area_info { > return unmapped_area(info); > } > > +static inline unsigned long > +offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) > +{ > + return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT); > +} > + > +static inline loff_t > +vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) > +{ > + return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start); > +} > + > /* truncate.c */ > extern void truncate_inode_pages(struct address_space *, loff_t); > extern void truncate_inode_pages_range(struct address_space *, > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index ce6848e..bd6f230 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -130,16 +130,6 @@ static bool valid_vma(struct vm_area_struct *vma, bool > is_register) > return vma->vm_file && (vma->vm_flags & flags) == VM_MAYEXEC; > } > > -static unsigned long offset_to_vaddr(struct vm_area_struct *vma, loff_t > offset) > -{ > - return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT); > -} > - > -static loff_t vaddr_to_offset(struct vm_area_struct *vma, unsigned long > vaddr) > -{ > - return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start); > -} > - > /** > * __replace_page - replace page in vma by new page. > * based on replace_page in mm/ksm.c > -- > 1.8.3.1 >
[PATCH] clk: rockchip: Add 1.6GHz PLL rate
We need this rate to generate 100, 200, and 228.57MHz from the same PLL. 228.57MHz is useful for a pixel clock when the VPLL is used for and external display. Signed-off-by: Derek Basehore--- drivers/clk/rockchip/clk-rk3399.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/clk/rockchip/clk-rk3399.c b/drivers/clk/rockchip/clk-rk3399.c index 6847120b61cd..3e57c6eef93d 100644 --- a/drivers/clk/rockchip/clk-rk3399.c +++ b/drivers/clk/rockchip/clk-rk3399.c @@ -57,6 +57,7 @@ static struct rockchip_pll_rate_table rk3399_pll_rates[] = { RK3036_PLL_RATE(165600, 1, 69, 1, 1, 1, 0), RK3036_PLL_RATE(163200, 1, 68, 1, 1, 1, 0), RK3036_PLL_RATE(160800, 1, 67, 1, 1, 1, 0), + RK3036_PLL_RATE(16, 3, 200, 1, 1, 1, 0), RK3036_PLL_RATE(158400, 1, 66, 1, 1, 1, 0), RK3036_PLL_RATE(156000, 1, 65, 1, 1, 1, 0), RK3036_PLL_RATE(153600, 1, 64, 1, 1, 1, 0), -- 2.16.2.660.g709887971b-goog
Re: [PATCH 2/8] mm: Prefix vma_ to vaddr_to_offset() and offset_to_vaddr()
On Tue, Mar 13, 2018 at 06:25:57PM +0530, Ravi Bangoria wrote: > No functionality changes. > > Signed-off-by: Ravi BangoriaDoing this with coccinelle would have been nicer but this is small enough to review without too much fatigue :) Reviewed-by: Jérôme Glisse > --- > include/linux/mm.h | 4 ++-- > kernel/events/uprobes.c | 14 +++--- > 2 files changed, 9 insertions(+), 9 deletions(-) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 95909f2..d7ee526 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -2275,13 +2275,13 @@ struct vm_unmapped_area_info { > } > > static inline unsigned long > -offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) > +vma_offset_to_vaddr(struct vm_area_struct *vma, loff_t offset) > { > return vma->vm_start + offset - ((loff_t)vma->vm_pgoff << PAGE_SHIFT); > } > > static inline loff_t > -vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) > +vma_vaddr_to_offset(struct vm_area_struct *vma, unsigned long vaddr) > { > return ((loff_t)vma->vm_pgoff << PAGE_SHIFT) + (vaddr - vma->vm_start); > } > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index bd6f230..535fd39 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -748,7 +748,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > curr = info; > > info->mm = vma->vm_mm; > - info->vaddr = offset_to_vaddr(vma, offset); > + info->vaddr = vma_offset_to_vaddr(vma, offset); > } > i_mmap_unlock_read(mapping); > > @@ -807,7 +807,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > goto unlock; > > if (vma->vm_start > info->vaddr || > - vaddr_to_offset(vma, info->vaddr) != uprobe->offset) > + vma_vaddr_to_offset(vma, info->vaddr) != uprobe->offset) > goto unlock; > > if (is_register) { > @@ -977,7 +977,7 @@ static int unapply_uprobe(struct uprobe *uprobe, struct > mm_struct *mm) > uprobe->offset >= offset + vma->vm_end - vma->vm_start) > continue; > > - vaddr = offset_to_vaddr(vma, uprobe->offset); > + vaddr = vma_offset_to_vaddr(vma, uprobe->offset); > err |= remove_breakpoint(uprobe, mm, vaddr); > } > up_read(>mmap_sem); > @@ -1023,7 +1023,7 @@ static void build_probe_list(struct inode *inode, > struct uprobe *u; > > INIT_LIST_HEAD(head); > - min = vaddr_to_offset(vma, start); > + min = vma_vaddr_to_offset(vma, start); > max = min + (end - start) - 1; > > spin_lock(_treelock); > @@ -1076,7 +1076,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > list_for_each_entry_safe(uprobe, u, _list, pending_list) { > if (!fatal_signal_pending(current) && > filter_chain(uprobe, UPROBE_FILTER_MMAP, vma->vm_mm)) { > - unsigned long vaddr = offset_to_vaddr(vma, > uprobe->offset); > + unsigned long vaddr = vma_offset_to_vaddr(vma, > uprobe->offset); > install_breakpoint(uprobe, vma->vm_mm, vma, vaddr); > } > put_uprobe(uprobe); > @@ -1095,7 +1095,7 @@ int uprobe_mmap(struct vm_area_struct *vma) > > inode = file_inode(vma->vm_file); > > - min = vaddr_to_offset(vma, start); > + min = vma_vaddr_to_offset(vma, start); > max = min + (end - start) - 1; > > spin_lock(_treelock); > @@ -1730,7 +1730,7 @@ static struct uprobe *find_active_uprobe(unsigned long > bp_vaddr, int *is_swbp) > if (vma && vma->vm_start <= bp_vaddr) { > if (valid_vma(vma, false)) { > struct inode *inode = file_inode(vma->vm_file); > - loff_t offset = vaddr_to_offset(vma, bp_vaddr); > + loff_t offset = vma_vaddr_to_offset(vma, bp_vaddr); > > uprobe = find_uprobe(inode, offset); > } > -- > 1.8.3.1 >
Re: [PATCH 3/8] Uprobe: Rename map_info to uprobe_map_info
On Tue, Mar 13, 2018 at 06:25:58PM +0530, Ravi Bangoria wrote: > map_info is very generic name, rename it to uprobe_map_info. > Renaming will help to export this structure outside of the > file. > > Also rename free_map_info() to uprobe_free_map_info() and > build_map_info() to uprobe_build_map_info(). > > No functionality changes. > > Signed-off-by: Ravi BangoriaSame coccinelle comments as previously. Reviewed-by: Jérôme Glisse > --- > kernel/events/uprobes.c | 32 +--- > 1 file changed, 17 insertions(+), 15 deletions(-) > > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 535fd39..081b88c1 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -695,27 +695,29 @@ static void delete_uprobe(struct uprobe *uprobe) > put_uprobe(uprobe); > } > > -struct map_info { > - struct map_info *next; > +struct uprobe_map_info { > + struct uprobe_map_info *next; > struct mm_struct *mm; > unsigned long vaddr; > }; > > -static inline struct map_info *free_map_info(struct map_info *info) > +static inline struct uprobe_map_info * > +uprobe_free_map_info(struct uprobe_map_info *info) > { > - struct map_info *next = info->next; > + struct uprobe_map_info *next = info->next; > kfree(info); > return next; > } > > -static struct map_info * > -build_map_info(struct address_space *mapping, loff_t offset, bool > is_register) > +static struct uprobe_map_info * > +uprobe_build_map_info(struct address_space *mapping, loff_t offset, > + bool is_register) > { > unsigned long pgoff = offset >> PAGE_SHIFT; > struct vm_area_struct *vma; > - struct map_info *curr = NULL; > - struct map_info *prev = NULL; > - struct map_info *info; > + struct uprobe_map_info *curr = NULL; > + struct uprobe_map_info *prev = NULL; > + struct uprobe_map_info *info; > int more = 0; > > again: > @@ -729,7 +731,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) >* Needs GFP_NOWAIT to avoid i_mmap_rwsem recursion > through >* reclaim. This is optimistic, no harm done if it > fails. >*/ > - prev = kmalloc(sizeof(struct map_info), > + prev = kmalloc(sizeof(struct uprobe_map_info), > GFP_NOWAIT | __GFP_NOMEMALLOC | > __GFP_NOWARN); > if (prev) > prev->next = NULL; > @@ -762,7 +764,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > } > > do { > - info = kmalloc(sizeof(struct map_info), GFP_KERNEL); > + info = kmalloc(sizeof(struct uprobe_map_info), GFP_KERNEL); > if (!info) { > curr = ERR_PTR(-ENOMEM); > goto out; > @@ -774,7 +776,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > goto again; > out: > while (prev) > - prev = free_map_info(prev); > + prev = uprobe_free_map_info(prev); > return curr; > } > > @@ -782,11 +784,11 @@ static inline struct map_info *free_map_info(struct > map_info *info) > register_for_each_vma(struct uprobe *uprobe, struct uprobe_consumer *new) > { > bool is_register = !!new; > - struct map_info *info; > + struct uprobe_map_info *info; > int err = 0; > > percpu_down_write(_mmap_sem); > - info = build_map_info(uprobe->inode->i_mapping, > + info = uprobe_build_map_info(uprobe->inode->i_mapping, > uprobe->offset, is_register); > if (IS_ERR(info)) { > err = PTR_ERR(info); > @@ -825,7 +827,7 @@ static inline struct map_info *free_map_info(struct > map_info *info) > up_write(>mmap_sem); > free: > mmput(mm); > - info = free_map_info(info); > + info = uprobe_free_map_info(info); > } > out: > percpu_up_write(_mmap_sem); > -- > 1.8.3.1 >
Re: [PATCH 4/8] Uprobe: Export uprobe_map_info along with uprobe_{build/free}_map_info()
On Tue, Mar 13, 2018 at 06:25:59PM +0530, Ravi Bangoria wrote: > These exported data structure and functions will be used by other > files in later patches. > > No functionality changes. > > Signed-off-by: Ravi BangoriaReviewed-by: Jérôme Glisse > --- > include/linux/uprobes.h | 9 + > kernel/events/uprobes.c | 14 +++--- > 2 files changed, 12 insertions(+), 11 deletions(-) > > diff --git a/include/linux/uprobes.h b/include/linux/uprobes.h > index 0a294e9..7bd2760 100644 > --- a/include/linux/uprobes.h > +++ b/include/linux/uprobes.h > @@ -109,12 +109,19 @@ enum rp_check { > RP_CHECK_RET, > }; > > +struct address_space; > struct xol_area; > > struct uprobes_state { > struct xol_area *xol_area; > }; > > +struct uprobe_map_info { > + struct uprobe_map_info *next; > + struct mm_struct *mm; > + unsigned long vaddr; > +}; > + > extern int set_swbp(struct arch_uprobe *aup, struct mm_struct *mm, unsigned > long vaddr); > extern int set_orig_insn(struct arch_uprobe *aup, struct mm_struct *mm, > unsigned long vaddr); > extern bool is_swbp_insn(uprobe_opcode_t *insn); > @@ -149,6 +156,8 @@ struct uprobes_state { > extern bool arch_uprobe_ignore(struct arch_uprobe *aup, struct pt_regs > *regs); > extern void arch_uprobe_copy_ixol(struct page *page, unsigned long vaddr, >void *src, unsigned long len); > +extern struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info > *info); > +extern struct uprobe_map_info *uprobe_build_map_info(struct address_space > *mapping, loff_t offset, bool is_register); > #else /* !CONFIG_UPROBES */ > struct uprobes_state { > }; > diff --git a/kernel/events/uprobes.c b/kernel/events/uprobes.c > index 081b88c1..e7830b8 100644 > --- a/kernel/events/uprobes.c > +++ b/kernel/events/uprobes.c > @@ -695,23 +695,15 @@ static void delete_uprobe(struct uprobe *uprobe) > put_uprobe(uprobe); > } > > -struct uprobe_map_info { > - struct uprobe_map_info *next; > - struct mm_struct *mm; > - unsigned long vaddr; > -}; > - > -static inline struct uprobe_map_info * > -uprobe_free_map_info(struct uprobe_map_info *info) > +struct uprobe_map_info *uprobe_free_map_info(struct uprobe_map_info *info) > { > struct uprobe_map_info *next = info->next; > kfree(info); > return next; > } > > -static struct uprobe_map_info * > -uprobe_build_map_info(struct address_space *mapping, loff_t offset, > - bool is_register) > +struct uprobe_map_info *uprobe_build_map_info(struct address_space *mapping, > + loff_t offset, bool is_register) > { > unsigned long pgoff = offset >> PAGE_SHIFT; > struct vm_area_struct *vma; > -- > 1.8.3.1 >
Re: [v5 1/2] mm: disable interrupts while initializing deferred pages
> Soft lockup: kernel has run for too long without rescheduling > Hard lockup: kernel has run for too long with interrupts disabled > > Both of these are detected by the NMI watchdog handler. > > 9b6e63cbf85b89b2d fixes a soft lockup by adding a manual rescheduling > point. Replacing that with touch_nmi_watchdog() won't work (I think). > Presumably calling touch_softlockup_watchdog() will "work", in that it > suppresses the warning. But it won't fix the thing which the warning > is actually warning about: starvation of the CPU scheduler. That's > what the cond_resched() does. But, unlike memmap_init_zone(), which can be used after boot, here we do not worry about kernel running for too long. This is because we are booting, and no user programs are running. So, it is acceptable to have a long uninterruptible span, as long as we making a useful progress. BTW, the boot CPU still has interrupts enabled during this span. Comment in: include/linux/nmi.h, states: * If the architecture supports the NMI watchdog, touch_nmi_watchdog() * may be used to reset the timeout - for code which intentionally * disables interrupts for a long time. This call is stateless. Which is exactly what we are trying to do here, now that these threads run with interrupts disabled. Before, where they were running with interrupts enabled, and cond_resched() was enough to satisfy soft lockups. > > I'm not sure what to suggest, really. Your changelog isn't the best: > "Vlastimil Babka reported about a window issue during which when > deferred pages are initialized, and the current version of on-demand > initialization is finished, allocations may fail". Well... where is > ths mysterious window? Without such detail it's hard for others to > suggest alternative approaches. Here is hopefully a better description of the problem: Currently, during boot we preinitialize some number of struct pages to satisfy all boot allocations. Even if these allocations happen when we initialize the reset of deferred pages in page_alloc_init_late(). The problem is that we do not know how much kernel will need, and it also depends on various options. So, with this work, we are changing this behavior to initialize struct pages on-demand, only when allocations happen. During boot, when we try to allocate memory, the on-demand struct page initialization code takes care of it. But, once the deferred pages are initializing in: page_alloc_init_late() for_each_node_state(nid, N_MEMORY) kthread_run(deferred_init_memmap()) We cannot use on-demand initialization, as these threads resize pgdat. This whole thing is to take care of this time. My first version of on-demand deferred page initialization would simply fail to allocate memory during this period of time. But, this new version waits for threads to finish initializing deferred memory, and successfully perform the allocation. Because interrupt handler would wait for pgdat resize lock. Thank you, Pavel
Re: [PATCH v3 01/11] PCI/P2PDMA: Support peer-to-peer memory
On 13/03/18 01:53 PM, Sinan Kaya wrote: > I agree disabling globally would be bad. Somebody can always say I have > ten switches on my system. I want to do peer-to-peer on one switch only. Now, > this change weakened security for the other switches that I had no intention > with doing P2P. > > Isn't this a problem? Well, if it's a problem for someone they'll have to solve it. We're targeting JBOFs that have no use for ACS / IOMMU groups at all. > Can we specify the BDF of the downstream device we want P2P with during boot > via > kernel command line? That's a painful configuration burden. And then things might stop working if you change your topology at all and now have to change boot parameters. Logan
Re: dcache: remove trylock loops (was Re: [BUG] lock_parent() breakage when used from shrink_dentry_list())
On 2018-03-12, Al Virowrote: >> If someone else has grabbed a reference, it shouldn't be added to the >> lru list. Only decremented. >> >> if (entry->d_lockref.count == 1) > > Nah, better handle that in retain_dentry() itself. See updated > #work.dcache. > > + if (unlikely(dentry->d_lockref.count != 1)) { > + dentry->d_lockref.count--; > + } else if (likely(!retain_dentry(dentry))) { > + __dentry_kill(dentry); > + return parent; > + } Although the updated version is correct (and saves on lines of code), I find putting the deref and lru_add code in the "true" case of retain_dentry() to be pretty tricky. IMHO the code is easier to understand if it looks like this: if (unlikely(dentry->d_lockref.count != 1)) { dentry->d_lockref.count--; } else if (likely(!retain_dentry(dentry))) { __dentry_kill(dentry); return parent; } else { dentry->d_lockref.count--; dentry_lru_add(dentry); } This is what your version is doing, but that final else is hiding in the retain_dentry() "true" case. My suggestion is to revert 7479f57fecd2a4837b5c79ce1cf0dcf284db54be (and then fixup dput() to deref before calling dentry_lru_add()). > FWIW, there's another trylock loop on dentries - one in > autofs get_next_positive_dentry(). Any plans re dealing > with that one? I will need to dig into it a bit deeper (I am unfamiliar with autofs), but it looks like it is trying to do basically the same thing as the ascend loop in d_walk(). > I'd spent the last couple of weeks (when not being too sick > for any work) going through dcache.c and related code; hopefully > this time I will get the documentation into postable shape ;-/ Thank you for all your help in getting these changes cleaned and correctly implemented so quickly. I've reviewed your latest trylock loop removal patches and found only 1 minor issue. I'll post that separately. John Ogness
Re: [RESEND RFC] translate_pid API
On Mon, Mar 12, 2018 at 10:18 AM,wrote: > Resending the RFC with participants of previous discussions > in the list. > > Following patch which is a variation of a solution discussed > in https://lwn.net/Articles/736330/ provides the users of > pid namespace, the functionality of pid translation between > namespaces using a namespace identifier. The topic of > pid translation has been discussed in the community few times > but there has always been a resistance to adding new solution > for this problem. > I will outline the planned usecase of pid namespace by oracle > database and explain why any of the existing solution cannot > be used to solve their problem. > > Consider a system in which several PID namespaces with multiple > nested levels exists in parallel with monitor processes managing > all the namespaces. PID translation is required for controlling > and accessing information about the processes by the monitors > and other processes down the hierarchy of namespaces. Controlling > primarily involves sending signals or using ptrace by a process in > parent namespace on any of the processes in its child namespace. > Accessing information deals with the reading /proc//* files > of processes in child namespace. None of the processes have > root/CAP_SYS_ADMIN privileges. How are you dealing with PID reuse? [...] > diff --git a/fs/nsfs.c b/fs/nsfs.c > index 36b0772..c635465 100644 > --- a/fs/nsfs.c > +++ b/fs/nsfs.c > @@ -222,8 +222,13 @@ int ns_get_name(char *buf, size_t size, struct > task_struct *task, > const char *name; > ns = ns_ops->get(task); > if (ns) { > - name = ns_ops->real_ns_name ? : ns_ops->name; > - res = snprintf(buf, size, "%s:[%u]", name, ns->inum); > + if (!strcmp(ns_ops->name, "pidns_id")) { Wouldn't it be cleaner to check for "ns_ops==_id_operations"? > + res = snprintf(buf, size, "[%llu]", > + (unsigned long long)ns->ns_id); > + } else { > + name = ns_ops->real_ns_name ? : ns_ops->name; > + res = snprintf(buf, size, "%s:[%u]", name, ns->inum); > + } > ns_ops->put(ns); > } > return res; [...] > diff --git a/include/linux/pid_namespace.h b/include/linux/pid_namespace.h > index 49538b1..11d1d57 100644 > --- a/include/linux/pid_namespace.h > +++ b/include/linux/pid_namespace.h > @@ -11,6 +11,7 @@ > #include > #include > #include > +#include > > > struct fs_pin; > @@ -44,6 +45,8 @@ struct pid_namespace { > kgid_t pid_gid; > int hide_pid; > int reboot; /* group exit code if this pidns was rebooted */ > + struct hlist_bl_node node; > + atomic_t lookups_pending; > struct ns_common ns; > } __randomize_layout; > [...] > diff --git a/kernel/pid_namespace.c b/kernel/pid_namespace.c > index 0b53eef..ff83aa8 100644 > --- a/kernel/pid_namespace.c > +++ b/kernel/pid_namespace.c [...] > @@ -159,6 +201,30 @@ static void delayed_free_pidns(struct rcu_head *p) > > static void destroy_pid_namespace(struct pid_namespace *ns) > { > + struct pid_namespace *ph; > + struct hlist_bl_head *head; > + struct hlist_bl_node *dup_node; > + > + /* > +* Remove the namespace structure from hash table so > +* now new lookups can start on it. s/now new/no new/ [...] > @@ -474,9 +551,116 @@ static struct user_namespace *pidns_owner(struct > ns_common *ns) > .get_parent = pidns_get_parent, > }; > > +/* > + * translate_pid - convert pid in source pid-ns into target pid-ns. > + * @pid: pid for translation > + * @source: pid-ns id > + * @target: pid-ns id > + * > + * Return pid in @target pid-ns, zero if task have no pid there, > + * or -ESRCH of task with @pid is not found in @source pid-ns. s/of/if/ > + */ > +SYSCALL_DEFINE3(translate_pid, pid_t, pid, u64, source, > + u64, target) > +{ > + struct pid_namespace *source_ns = NULL, *target_ns = NULL; > + struct pid *struct_pid; > + struct pid_namespace *ph; > + struct hlist_bl_head *shead = NULL; > + struct hlist_bl_head *thead = NULL; > + struct hlist_bl_node *dup_node; > + pid_t result; > + > + if (!source) { > + source_ns = _pid_ns; > + } else { > + shead = pid_ns_hash_head(pid_ns_hash, source); > + hlist_bl_lock(shead); > + hlist_bl_for_each_entry(ph, dup_node, shead, node) { > + if (source == ph->ns.ns_id) { > + source_ns = ph; > + break; > + } > + } > + if (!source_ns) { > + hlist_bl_unlock(shead); > + return -EINVAL; > + } > + } > + if
Re: linux-next: Tree for Mar 13
Hi Dominik, On Tue, 13 Mar 2018 18:52:05 +0100 Dominik Brodowskiwrote: > > would it be possible to have > >https://git.kernel.org/pub/scm/linux/kernel/git/brodo/linux.git > syscalls-next > > added to -next? This will probably only be for one or two release cycles, > but it might easen the development process of that patch series. Added from today. I put it near the start since Linus said he may take (some of) it this release. Thanks for adding your subsystem tree as a participant of linux-next. As you may know, this is not a judgement of your code. The purpose of linux-next is for integration testing and to lower the impact of conflicts between subsystems in the next merge window. You will need to ensure that the patches/commits in your tree/series have been: * submitted under GPL v2 (or later) and include the Contributor's Signed-off-by, * posted to the relevant mailing list, * reviewed by you (or another maintainer of your subsystem tree), * successfully unit tested, and * destined for the current or next Linux merge window. Basically, this should be just what you would send to Linus (or ask him to fetch). It is allowed to be rebased if you deem it necessary. -- Cheers, Stephen Rothwell s...@canb.auug.org.au pgpPZcLtBfF2O.pgp Description: OpenPGP digital signature
Re: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register
On 3/13/2018 4:24 PM, Kroening, Gary wrote: I've tested this patch on the same set of hubless (single-segment) and scalable (segment-per-socket) configurations as for Kan's version 1. As far as we can tell this will also work for Cascade Lake, but will need revisiting for Ice Lake. For Ice lake, there will be another method to query the number of CHAs. This patch is only for Skylake server. Thanks, Kan Thanks. Gary -Original Message- From: kan.li...@linux.intel.com [mailto:kan.li...@linux.intel.com] Sent: Tuesday, March 13, 2018 1:52 PM To: mi...@redhat.com; h...@zytor.com; t...@linutronix.de; pet...@infradead.org; andy.shevche...@gmail.com Cc: Kroening, Gary; Travis, Mike; Banman, Andrew; Sivanich, Dimitri; Anderson, Russ; x...@kernel.org; linux-kernel@vger.kernel.org; Kan Liang Subject: [PATCH V2] perf/x86/intel/uncore: Querying number of CHAs from CAPID6 register From: Kan LiangThe number of CHAs is miscalculated on multi PCI domain systems on Skylake server. (From Kroening, Gary: For systems with a single PCI segment, it is sufficient to look for the bus number to change in order to determine that all of the CHa's have been counted for a single socket. However, for multi PCI segment systems, each socket is given a new segment and the bus number does NOT change. So looking only for the bus number to change ends up counting all of the CHa's on all sockets in the system. This leads to writing CPU MSRs beyond a valid range and causes an error in ivbep_uncore_msr_init_box().) To determine the number of CHAs, it should read bits 27:0 in the CAPID6 register located at Device 30, Function 3, Offset 0x9C. These 28 bits form a bit vector of available LLC slices and the CHAs that manage those slices. Fixes: cd34cd97b7b4 ("perf/x86/intel/uncore: Add Skylake server uncore support") Reported-by: Kroening, Gary Signed-off-by: Kan Liang --- Changes since V1: - add missed pci_dev_put() - Drop ugly casting by using hweight32() - Add comments for macros. arch/x86/events/intel/uncore_snbep.c | 31 +-- 1 file changed, 17 insertions(+), 14 deletions(-) diff --git a/arch/x86/events/intel/uncore_snbep.c b/arch/x86/events/intel/uncore_snbep.c index 6d8044a..8970f71 100644 --- a/arch/x86/events/intel/uncore_snbep.c +++ b/arch/x86/events/intel/uncore_snbep.c @@ -3562,24 +3562,27 @@ static struct intel_uncore_type *skx_msr_uncores[] = { NULL, }; +/* + * To determine the number of CHAs, it should read bits 27:0 in the CAPID6 + * register which located at Device 30, Function 3, Offset 0x9C. PCI ID 0x2083. + */ +#define SKX_CAPID6 0x9c +#define SKX_CHA_BIT_MASK GENMASK(27, 0) + static int skx_count_chabox(void) { - struct pci_dev *chabox_dev = NULL; - int bus, count = 0; + struct pci_dev *dev = NULL; + u32 val = 0; - while (1) { - chabox_dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x208d, chabox_dev); - if (!chabox_dev) - break; - if (count == 0) - bus = chabox_dev->bus->number; - if (bus != chabox_dev->bus->number) - break; - count++; - } + dev = pci_get_device(PCI_VENDOR_ID_INTEL, 0x2083, dev); + if (!dev) + goto out; - pci_dev_put(chabox_dev); - return count; + pci_read_config_dword(dev, SKX_CAPID6, ); + val &= SKX_CHA_BIT_MASK; +out: + pci_dev_put(dev); + return hweight32(val); } void skx_uncore_cpu_init(void) -- 2.7.4
[PATCH v3 2/2] kbuild: Don't mess with the .cache.mk when root
As pointed out by Masahiro Yamada people often run "sudo make install" or "sudo make modules_install". In theory, that could cause a cache file (or the directory containing it) to be created by root. After doing this then subsequent invocations to make would yell with a whole bunch of warnings like: /bin/sh: ./.cache.mk: Permission denied These yells would be mostly harmless (we'd just go on running without the cache), but they're ugly. The above situation would be fairly unlikely because it should only happen if someone does "sudo make install" before doing a normal "make", which is an invalid thing to do. If you did a normal "make" before the "sudo make install" then all the cache files and directories should have already been created by a normal user and, even if the superuser needed to add to the existing files, we wouldn't end up with a permission problem. The above situation would also not have been catastrophic because presumably if the user was able to run "sudo make install" then they should also be able to run "sudo make clean" to clean things up. However, even though the problem described is relatively minor, it probably makes sense to add a heuristic to avoid creating cache files when we're running as root. This heuristic doesn't add a ton of overhead and it might save someone time tracking down strange "Permission denied" messages. We'll consider this heuristic good enough because the problem really shouldn't be that serious. Fixes: 3298b690b21c ("kbuild: Add a cache for generated variables") Suggested-by: Masahiro YamadaSigned-off-by: Douglas Anderson --- Changes in v3: - Use "uid 0" as the heuristic instead of install - Do the checking in the main Makefile instead of Kbuild.include Changes in v2: None Makefile | 6 ++ scripts/Kbuild.include | 2 ++ 2 files changed, 8 insertions(+) diff --git a/Makefile b/Makefile index f1e61470640b..c4d2131831ba 100644 --- a/Makefile +++ b/Makefile @@ -268,6 +268,12 @@ __build_one_by_one: else +# Don't create Makefile caches if running as root since they can't be deleted +# easily; in the real world we might be root when doing "sudo make install" +ifeq ($(shell id -u),0) +export KBUILD_NOCACHE := 1 +endif + # We need some generic definitions (do not try to remake the file). scripts/Kbuild.include: ; include scripts/Kbuild.include diff --git a/scripts/Kbuild.include b/scripts/Kbuild.include index 065324a8046f..581f93f20119 100644 --- a/scripts/Kbuild.include +++ b/scripts/Kbuild.include @@ -137,12 +137,14 @@ __sanitize-opt = $(subst $$,_,$(subst $(right_paren),_,$(subst $(left_paren),_,$ define __run-and-store ifeq ($(origin $(1)),undefined) $$(eval $(1) := $$(shell $$(2))) +ifneq ($(KBUILD_NOCACHE),1) ifeq ($(create-cache-dir),1) $$(shell mkdir -p $(dir $(make-cache))) $$(eval create-cache-dir :=) endif $$(shell echo '$(1) := $$($(1))' >> $(make-cache)) endif +endif endef __shell-cached = $(eval $(call __run-and-store,$(1)))$($(1)) shell-cached = $(call __shell-cached,__cached_$(call __sanitize-opt,$(1)),$(1)) -- 2.16.2.660.g709887971b-goog
RE: Dell Inc. XPS 13 9343/0TM99H fails to boot v4.16-rc5
> -Original Message- > From: Dominik Brodowski [mailto:li...@dominikbrodowski.net] > Sent: Tuesday, March 13, 2018 1:51 PM > To: Limonciello, Mario> Cc: dvh...@infradead.org; platform-driver-...@vger.kernel.org; linux- > ker...@vger.kernel.org > Subject: Re: Dell Inc. XPS 13 9343/0TM99H fails to boot v4.16-rc5 > > On Mon, Mar 12, 2018 at 10:42:01PM +, mario.limoncie...@dell.com wrote: > > > > > > > -Original Message- > > > From: Dominik Brodowski [mailto:li...@dominikbrodowski.net] > > > Sent: Tuesday, March 13, 2018 2:54 AM > > > To: dvh...@infradead.org; Limonciello, Mario > > > Cc: platform-driver-...@vger.kernel.org; linux-kernel@vger.kernel.org > > > Subject: Dell Inc. XPS 13 9343/0TM99H fails to boot v4.16-rc5 > > > > > > Mario, > > > > > > unfortunately, my Dell Inc. XPS 13 9343/0TM99H, BIOS A11 12/08/2016 fails > > > to > > > boot v4.16-rc5. More exactly, I could bisect it down to commit 25d47027e10 > > > ("platform/x86: dell-smbios: Link all dell-smbios-* modules together"). > > > Usually, I have enabled > > > > > > CONFIG_SENSORS_DELL_SMM=y > > > CONFIG_DELL_SMBIOS=y > > > CONFIG_DELL_SMBIOS_WMI=y > > > CONFIG_DELL_SMBIOS_SMM=y > > > CONFIG_DELL_LAPTOP=y > > > CONFIG_DELL_WMI=y > > > CONFIG_DELL_WMI_DESCRIPTOR=y > > > # CONFIG_DELL_WMI_AIO is not set > > > # CONFIG_DELL_WMI_LED is not set > > > # CONFIG_DELL_SMO8800 is not set > > > # CONFIG_DELL_RBTN is not set > > > # CONFIG_DELL_RBU is not set > > > > > > For v4.16-rc5 to work, I need to manually disable DELL_SMBIOS_WMI: > > > > > > -CONFIG_DELL_SMBIOS_WMI=y > > > +# CONFIG_DELL_SMBIOS_WMI is not set > > > > > > Any ideas? > > > > > Dominick, > > > > Interesting. Can you please change CONFIG_DELL_SMBIOS to a module > > and see if that behavior persists? If it does, can you please blacklist it > > on > > the kernel command line and try to load it manually and share any > > backtrace? > > Mario, > > building and running it as a *module* works flawlessly. But that was > actually expected after a 'grep "initcall"' in drivers/platform/x86: Dominck, Thanks for your checking and explanation of the situation. > > As Darren pointed out, DELL_SMBIOS_WMI depends on ACPI_WMI, so probably > ACPI_WMI needs to be initialized first. However, the all-in-one > dell-smbios.o is run as subsys_initcall(), same as wmi.o > (subsys_initcall_sync() there). > > If both are built-ins, that means that dell-smbios.o is run first, and wmi.o > second. Changing dell-smbios.o to run at the later fs_initcall() level > instead lets me boot the kernel. HOWEVER: > > 1) Is there a reason why both the core and the dell-smbios-smm driver have >to run already at subsys_initcall() time? They did so previous to your >patch. Is it OK to defer these parts opf the all-in-one dell-smbios.o >to fs_initcall(), or even to the default device_initcall()? As long as they're ready before dell-laptop's initialization which uses late_initcall that should be fine. Am I correct to presume you're going to propose a patch you can test and confirm your hypothesis rather than Darren reverting my patch to bring them together? > > 2) dell-smbios-wmi depends on (well, selects) DELL_WMI_DESCRIPTOR. The >dell-smbios-wmi is running at the default device_initcall() time, but >(AFAICS) probably later than the initialization of dell-smbios-wmi.o. >May I presume that this poses no additional problem? > I "think" that should actually be fine. The driver will use deferred probing until it's ready meaning that dell-smbios-wmi should wait until dell-wmi-descriptor is done with it's probing routine.