Re: [PATCH 6/9] bus: brcmstb_gisb: correct support for 64-bit address output
On Fri, Mar 24, 2017 at 7:46 AM, Doug Bergerwrote: > The GISB bus can support addresses beyond 32-bits. So this commit > corrects support for reading a captured 64-bit address into a 64-bit > variable by obtaining the high bits from the ARB_ERR_CAP_HI_ADDR > register (when present) and then outputting the full 64-bit value. > > It also removes unused definitions. > > Fixes: 44127b771d9c ("bus: add Broadcom GISB bus arbiter timeout/error > handler") > Signed-off-by: Doug Berger > --- > drivers/bus/brcmstb_gisb.c | 36 > 1 file changed, 20 insertions(+), 16 deletions(-) > > diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c > [snip] > @@ -119,6 +116,16 @@ static u32 gisb_read(struct brcmstb_gisb_arb_device > *gdev, int reg) > return ioread32(gdev->base + offset); > } > > +static u64 gisb_read_address(struct brcmstb_gisb_arb_device *gdev) > +{ > + u64 value; > + > + value = (u64)gisb_read(gdev, ARB_ERR_CAP_ADDR); Unlike the one on the next line, this cast can be omitted. > + value |= (u64)gisb_read(gdev, ARB_ERR_CAP_HI_ADDR) << 32; > + > + return value; > +} > [snip] Acked-by: Gregory Fong
Re: [PATCH 6/9] bus: brcmstb_gisb: correct support for 64-bit address output
On Fri, Mar 24, 2017 at 7:46 AM, Doug Berger wrote: > The GISB bus can support addresses beyond 32-bits. So this commit > corrects support for reading a captured 64-bit address into a 64-bit > variable by obtaining the high bits from the ARB_ERR_CAP_HI_ADDR > register (when present) and then outputting the full 64-bit value. > > It also removes unused definitions. > > Fixes: 44127b771d9c ("bus: add Broadcom GISB bus arbiter timeout/error > handler") > Signed-off-by: Doug Berger > --- > drivers/bus/brcmstb_gisb.c | 36 > 1 file changed, 20 insertions(+), 16 deletions(-) > > diff --git a/drivers/bus/brcmstb_gisb.c b/drivers/bus/brcmstb_gisb.c > [snip] > @@ -119,6 +116,16 @@ static u32 gisb_read(struct brcmstb_gisb_arb_device > *gdev, int reg) > return ioread32(gdev->base + offset); > } > > +static u64 gisb_read_address(struct brcmstb_gisb_arb_device *gdev) > +{ > + u64 value; > + > + value = (u64)gisb_read(gdev, ARB_ERR_CAP_ADDR); Unlike the one on the next line, this cast can be omitted. > + value |= (u64)gisb_read(gdev, ARB_ERR_CAP_HI_ADDR) << 32; > + > + return value; > +} > [snip] Acked-by: Gregory Fong
RE: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
Hi Robin, I have made 3 separate patches now, which gives clear idea about the changes. we can have discussion there. Regards, Oza. -Original Message- From: Robin Murphy [mailto:robin.mur...@arm.com] Sent: Monday, March 20, 2017 9:14 PM To: Oza Oza Cc: Joerg Roedel; linux-...@vger.kernel.org; io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org; bcm-kernel-feedback-l...@broadcom.com Subject: Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation On 20/03/17 08:57, Oza Oza wrote: > + linux-pci > > Regards, > Oza. > > -Original Message- > From: Oza Pawandeep [mailto:oza@broadcom.com] > Sent: Friday, March 17, 2017 11:41 AM > To: Joerg Roedel; Robin Murphy > Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; > linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org; > bcm-kernel-feedback-l...@broadcom.com; Oza Pawandeep > Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for > IOVA allocation > > It is possible that PCI device supports 64-bit DMA addressing, and > thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however > PCI host bridge may have limitations on the inbound transaction > addressing. As an example, consider NVME SSD device connected to > iproc-PCIe controller. > > Currently, the IOMMU DMA ops only considers PCI device dma_mask when > allocating an IOVA. This is particularly problematic on > ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for > in-bound transactions only after PCI Host has forwarded these > transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA > of in-bound transactions has to honor the addressing restrictions of the > PCI Host. > > this patch is inspired by > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.ht > ml http://www.spinics.net/lists/arm-kernel/msg566947.html > > but above inspiraiton solves the half of the problem. > the rest of the problem is descrbied below, what we face on iproc > based SOCs. > > current pcie frmework and of framework integration assumes dma-ranges > in a way where memory-mapped devices define their dma-ranges. > dma-ranges: (child-bus-address, parent-bus-address, length). > > but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. > dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; > > of_dma_configure is specifically witten to take care of memory mapped > devices. > but no implementation exists for pci to take care of pcie based memory > ranges. > in fact pci world doesnt seem to define standard dma-ranges since > there is an absense of the same, the dma_mask used to remain 32bit > because of > 0 size return (parsed by of_dma_configure()) > > this patch also implements of_pci_get_dma_ranges to cater to pci world > dma-ranges. > so then the returned size get best possible (largest) dma_mask. > for e.g. > dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get > dev->coherent_dma_mask=0x7f. > > conclusion: there are following problems > 1) linux pci and iommu framework integration has glitches with respect > to dma-ranges > 2) pci linux framework look very uncertain about dma-ranges, rather > binding is not defined >the way it is defined for memory mapped devices. >rcar and iproc based SOCs use their custom one dma-ranges >(rather can be standard) > 3) even if in case of default parser of_dma_get_ranges,: >it throws and erro" >"no dma-ranges found for node" >because of the bug which exists. >following lines should be moved to the end of while(1) > 839 node = of_get_next_parent(node); > 840 if (!node) > 841 break; Right, having made sense of this and looked into things myself I think I understand now; what this boils down to is that the existing implementation of of_dma_get_range() expects always to be given a leaf device_node, and doesn't cope with being given a device_node for the given device's parent bus directly. That's really all there is; it's not specific to PCI (there are other probeable and DMA-capable buses whose children aren't described in DT, like the fsl-mc thing), and it definitely doesn't have anything to do with IOMMUs. Now, that's certainly something to fix, but AFAICS this patch doesn't do that, only adds some PCI-specific code which is never called. DMA mask inheritance for arm64 is another issue, which again is general, but does tend to be more visible in the IOMMU case. That still needs some work on the APCI side - all the DT-centric approaches so far either regress or at best do nothing for ACPI. I've made a note to try to look into that soon, but from what I recall I fear there is still an open question about what to do for a default in the absence of IORT or _DMA (once the current assumption that drivers can override our arbitrary default at
RE: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation
Hi Robin, I have made 3 separate patches now, which gives clear idea about the changes. we can have discussion there. Regards, Oza. -Original Message- From: Robin Murphy [mailto:robin.mur...@arm.com] Sent: Monday, March 20, 2017 9:14 PM To: Oza Oza Cc: Joerg Roedel; linux-...@vger.kernel.org; io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org; bcm-kernel-feedback-l...@broadcom.com Subject: Re: [RFC PATCH] iommu/dma/pci: account pci host bridge dma_mask for IOVA allocation On 20/03/17 08:57, Oza Oza wrote: > + linux-pci > > Regards, > Oza. > > -Original Message- > From: Oza Pawandeep [mailto:oza@broadcom.com] > Sent: Friday, March 17, 2017 11:41 AM > To: Joerg Roedel; Robin Murphy > Cc: io...@lists.linux-foundation.org; linux-kernel@vger.kernel.org; > linux-arm-ker...@lists.infradead.org; devicet...@vger.kernel.org; > bcm-kernel-feedback-l...@broadcom.com; Oza Pawandeep > Subject: [RFC PATCH] iommu/dma: account pci host bridge dma_mask for > IOVA allocation > > It is possible that PCI device supports 64-bit DMA addressing, and > thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however > PCI host bridge may have limitations on the inbound transaction > addressing. As an example, consider NVME SSD device connected to > iproc-PCIe controller. > > Currently, the IOMMU DMA ops only considers PCI device dma_mask when > allocating an IOVA. This is particularly problematic on > ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for > in-bound transactions only after PCI Host has forwarded these > transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA > of in-bound transactions has to honor the addressing restrictions of the > PCI Host. > > this patch is inspired by > http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.ht > ml http://www.spinics.net/lists/arm-kernel/msg566947.html > > but above inspiraiton solves the half of the problem. > the rest of the problem is descrbied below, what we face on iproc > based SOCs. > > current pcie frmework and of framework integration assumes dma-ranges > in a way where memory-mapped devices define their dma-ranges. > dma-ranges: (child-bus-address, parent-bus-address, length). > > but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. > dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; > > of_dma_configure is specifically witten to take care of memory mapped > devices. > but no implementation exists for pci to take care of pcie based memory > ranges. > in fact pci world doesnt seem to define standard dma-ranges since > there is an absense of the same, the dma_mask used to remain 32bit > because of > 0 size return (parsed by of_dma_configure()) > > this patch also implements of_pci_get_dma_ranges to cater to pci world > dma-ranges. > so then the returned size get best possible (largest) dma_mask. > for e.g. > dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get > dev->coherent_dma_mask=0x7f. > > conclusion: there are following problems > 1) linux pci and iommu framework integration has glitches with respect > to dma-ranges > 2) pci linux framework look very uncertain about dma-ranges, rather > binding is not defined >the way it is defined for memory mapped devices. >rcar and iproc based SOCs use their custom one dma-ranges >(rather can be standard) > 3) even if in case of default parser of_dma_get_ranges,: >it throws and erro" >"no dma-ranges found for node" >because of the bug which exists. >following lines should be moved to the end of while(1) > 839 node = of_get_next_parent(node); > 840 if (!node) > 841 break; Right, having made sense of this and looked into things myself I think I understand now; what this boils down to is that the existing implementation of of_dma_get_range() expects always to be given a leaf device_node, and doesn't cope with being given a device_node for the given device's parent bus directly. That's really all there is; it's not specific to PCI (there are other probeable and DMA-capable buses whose children aren't described in DT, like the fsl-mc thing), and it definitely doesn't have anything to do with IOMMUs. Now, that's certainly something to fix, but AFAICS this patch doesn't do that, only adds some PCI-specific code which is never called. DMA mask inheritance for arm64 is another issue, which again is general, but does tend to be more visible in the IOMMU case. That still needs some work on the APCI side - all the DT-centric approaches so far either regress or at best do nothing for ACPI. I've made a note to try to look into that soon, but from what I recall I fear there is still an open question about what to do for a default in the absence of IORT or _DMA (once the current assumption that drivers can override our arbitrary default at
[RFC PATCH 2/3] iommu/dma: account pci host bridge dma_mask for IOVA allocation
it is possible that PCI device supports 64-bit DMA addressing, and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host bridge may have limitations on the inbound transaction addressing. As an example, consider NVME SSD device connected to iproc-PCIe controller. Currently, the IOMMU DMA ops only considers PCI device dma_mask when allocating an IOVA. This is particularly problematic on ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions only after PCI Host has forwarded these transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound transactions has to honor the addressing restrictions of the PCI Host. this patch is inspired by http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html http://www.spinics.net/lists/arm-kernel/msg566947.html but above inspiraiton solves the half of the problem. the rest of the problem is descrbied below, what we face on iproc based SOCs. current pcie frmework and of framework integration assumes dma-ranges in a way where memory-mapped devices define their dma-ranges. dma-ranges: (child-bus-address, parent-bus-address, length). but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; of_dma_configure is specifically witten to take care of memory mapped devices. but no implementation exists for pci to take care of pcie based memory ranges. in fact pci world doesnt seem to define standard dma-ranges this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges. so then the returned size get best possible (largest) dma_mask. for e.g. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get dev->coherent_dma_mask=0x7f. Reviewed-by: Anup PatelReviewed-by: Scott Branden Signed-off-by: Oza Pawandeep diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 8c7c244..20cfff7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE config NEED_SG_DMA_LENGTH def_bool y +config ARCH_HAS_DMA_SET_COHERENT_MASK + def_bool y + config SMP def_bool y diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644 --- a/arch/arm64/include/asm/device.h +++ b/arch/arm64/include/asm/device.h @@ -20,6 +20,7 @@ struct dev_archdata { #ifdef CONFIG_IOMMU_API void *iommu;/* private IOMMU data */ #endif + u64 parent_dma_mask; bool dma_coherent; }; diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 81cdb2e..5845ecd 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void *virt, phys_addr_t phys) __dma_flush_area(virt, PAGE_SIZE); } + static void *__iommu_alloc_attrs(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, unsigned long attrs) @@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev, iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs); } +static int __iommu_set_dma_mask(struct device *dev, u64 mask) +{ + /* device is not DMA capable */ + if (!dev->dma_mask) + return -EIO; + + if (mask > dev->archdata.parent_dma_mask) + mask = dev->archdata.parent_dma_mask; + + *dev->dma_mask = mask; + + return 0; +} + static const struct dma_map_ops iommu_dma_ops = { .alloc = __iommu_alloc_attrs, .free = __iommu_free_attrs, @@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev, .map_resource = iommu_dma_map_resource, .unmap_resource = iommu_dma_unmap_resource, .mapping_error = iommu_dma_mapping_error, + .set_dma_mask = __iommu_set_dma_mask, }; +int dma_set_coherent_mask(struct device *dev, u64 mask) +{ + if (get_dma_ops(dev) == _dma_ops && + mask > dev->archdata.parent_dma_mask) + mask = dev->archdata.parent_dma_mask; + + dev->coherent_dma_mask = mask; + return 0; +} +EXPORT_SYMBOL(dma_set_coherent_mask); + + /* * TODO: Right now __iommu_setup_dma_ops() gets called too early to do * everything it needs to - the device is only partially created and the @@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, if (!dev->dma_ops) dev->dma_ops = _dma_ops; + dev->archdata.parent_dma_mask = size - 1; + dev->archdata.dma_coherent = coherent; __iommu_setup_dma_ops(dev, dma_base, size, iommu); } diff --git a/drivers/of/device.c b/drivers/of/device.c index d362a98..471dcdf 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -139,10 +139,8 @@ void of_dma_configure(struct
[RFC PATCH 3/3] of: fix node traversing in of_dma_get_range
it jumps to the parent node without examining the child node. also with that, it throws "no dma-ranges found for node" for pci dma-ranges. this patch fixes device node traversing for dma-ranges. Reviewed-by: Anup PatelSigned-off-by: Oza Pawandeep diff --git a/drivers/of/address.c b/drivers/of/address.c index 02b2903..3293d55 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -836,9 +836,6 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz while (1) { naddr = of_n_addr_cells(node); nsize = of_n_size_cells(node); - node = of_get_next_parent(node); - if (!node) - break; ranges = of_get_property(node, "dma-ranges", ); @@ -852,6 +849,10 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz */ if (!ranges) break; + + node = of_get_next_parent(node); + if (!node) + break; } if (!ranges) { -- 1.9.1
[RFC PATCH 2/3] iommu/dma: account pci host bridge dma_mask for IOVA allocation
it is possible that PCI device supports 64-bit DMA addressing, and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host bridge may have limitations on the inbound transaction addressing. As an example, consider NVME SSD device connected to iproc-PCIe controller. Currently, the IOMMU DMA ops only considers PCI device dma_mask when allocating an IOVA. This is particularly problematic on ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions only after PCI Host has forwarded these transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound transactions has to honor the addressing restrictions of the PCI Host. this patch is inspired by http://www.mail-archive.com/linux-kernel@vger.kernel.org/msg1306545.html http://www.spinics.net/lists/arm-kernel/msg566947.html but above inspiraiton solves the half of the problem. the rest of the problem is descrbied below, what we face on iproc based SOCs. current pcie frmework and of framework integration assumes dma-ranges in a way where memory-mapped devices define their dma-ranges. dma-ranges: (child-bus-address, parent-bus-address, length). but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; of_dma_configure is specifically witten to take care of memory mapped devices. but no implementation exists for pci to take care of pcie based memory ranges. in fact pci world doesnt seem to define standard dma-ranges this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges. so then the returned size get best possible (largest) dma_mask. for e.g. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get dev->coherent_dma_mask=0x7f. Reviewed-by: Anup Patel Reviewed-by: Scott Branden Signed-off-by: Oza Pawandeep diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 8c7c244..20cfff7 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -217,6 +217,9 @@ config NEED_DMA_MAP_STATE config NEED_SG_DMA_LENGTH def_bool y +config ARCH_HAS_DMA_SET_COHERENT_MASK + def_bool y + config SMP def_bool y diff --git a/arch/arm64/include/asm/device.h b/arch/arm64/include/asm/device.h index 73d5bab..64b4dc3 100644 --- a/arch/arm64/include/asm/device.h +++ b/arch/arm64/include/asm/device.h @@ -20,6 +20,7 @@ struct dev_archdata { #ifdef CONFIG_IOMMU_API void *iommu;/* private IOMMU data */ #endif + u64 parent_dma_mask; bool dma_coherent; }; diff --git a/arch/arm64/mm/dma-mapping.c b/arch/arm64/mm/dma-mapping.c index 81cdb2e..5845ecd 100644 --- a/arch/arm64/mm/dma-mapping.c +++ b/arch/arm64/mm/dma-mapping.c @@ -564,6 +564,7 @@ static void flush_page(struct device *dev, const void *virt, phys_addr_t phys) __dma_flush_area(virt, PAGE_SIZE); } + static void *__iommu_alloc_attrs(struct device *dev, size_t size, dma_addr_t *handle, gfp_t gfp, unsigned long attrs) @@ -795,6 +796,20 @@ static void __iommu_unmap_sg_attrs(struct device *dev, iommu_dma_unmap_sg(dev, sgl, nelems, dir, attrs); } +static int __iommu_set_dma_mask(struct device *dev, u64 mask) +{ + /* device is not DMA capable */ + if (!dev->dma_mask) + return -EIO; + + if (mask > dev->archdata.parent_dma_mask) + mask = dev->archdata.parent_dma_mask; + + *dev->dma_mask = mask; + + return 0; +} + static const struct dma_map_ops iommu_dma_ops = { .alloc = __iommu_alloc_attrs, .free = __iommu_free_attrs, @@ -811,8 +826,21 @@ static void __iommu_unmap_sg_attrs(struct device *dev, .map_resource = iommu_dma_map_resource, .unmap_resource = iommu_dma_unmap_resource, .mapping_error = iommu_dma_mapping_error, + .set_dma_mask = __iommu_set_dma_mask, }; +int dma_set_coherent_mask(struct device *dev, u64 mask) +{ + if (get_dma_ops(dev) == _dma_ops && + mask > dev->archdata.parent_dma_mask) + mask = dev->archdata.parent_dma_mask; + + dev->coherent_dma_mask = mask; + return 0; +} +EXPORT_SYMBOL(dma_set_coherent_mask); + + /* * TODO: Right now __iommu_setup_dma_ops() gets called too early to do * everything it needs to - the device is only partially created and the @@ -975,6 +1003,8 @@ void arch_setup_dma_ops(struct device *dev, u64 dma_base, u64 size, if (!dev->dma_ops) dev->dma_ops = _dma_ops; + dev->archdata.parent_dma_mask = size - 1; + dev->archdata.dma_coherent = coherent; __iommu_setup_dma_ops(dev, dma_base, size, iommu); } diff --git a/drivers/of/device.c b/drivers/of/device.c index d362a98..471dcdf 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -139,10 +139,8 @@ void of_dma_configure(struct device *dev, struct device_node *np) * Limit coherent and dma
[RFC PATCH 3/3] of: fix node traversing in of_dma_get_range
it jumps to the parent node without examining the child node. also with that, it throws "no dma-ranges found for node" for pci dma-ranges. this patch fixes device node traversing for dma-ranges. Reviewed-by: Anup Patel Signed-off-by: Oza Pawandeep diff --git a/drivers/of/address.c b/drivers/of/address.c index 02b2903..3293d55 100644 --- a/drivers/of/address.c +++ b/drivers/of/address.c @@ -836,9 +836,6 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz while (1) { naddr = of_n_addr_cells(node); nsize = of_n_size_cells(node); - node = of_get_next_parent(node); - if (!node) - break; ranges = of_get_property(node, "dma-ranges", ); @@ -852,6 +849,10 @@ int of_dma_get_range(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *siz */ if (!ranges) break; + + node = of_get_next_parent(node); + if (!node) + break; } if (!ranges) { -- 1.9.1
[RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask
it is possible that PCI device supports 64-bit DMA addressing, and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host bridge may have limitations on the inbound transaction addressing. As an example, consider NVME SSD device connected to iproc-PCIe controller. Currently, the IOMMU DMA ops only considers PCI device dma_mask when allocating an IOVA. This is particularly problematic on ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions only after PCI Host has forwarded these transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound transactions has to honor the addressing restrictions of the PCI Host. current pcie frmework and of framework integration assumes dma-ranges in a way where memory-mapped devices define their dma-ranges. dma-ranges: (child-bus-address, parent-bus-address, length). but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; of_dma_configure is specifically witten to take care of memory mapped devices. but no implementation exists for pci to take care of pcie based memory ranges. in fact pci world doesnt seem to define standard dma-ranges this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges. so then the returned size get best possible (largest) dma_mask. for e.g. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get dev->coherent_dma_mask=0x7f. Reviewed-by: Anup PatelReviewed-by: Scott Branden Signed-off-by: Oza Pawandeep Signed-off-by: Oza Pawandeep diff --git a/drivers/of/device.c b/drivers/of/device.c index b1e6beb..d362a98 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include "of_private.h" @@ -104,7 +105,11 @@ void of_dma_configure(struct device *dev, struct device_node *np) if (!dev->dma_mask) dev->dma_mask = >coherent_dma_mask; - ret = of_dma_get_range(np, _addr, , ); + if (dev_is_pci(dev)) + ret = of_pci_get_dma_ranges(np, _addr, , ); + else + ret = of_dma_get_range(np, _addr, , ); + if (ret < 0) { dma_addr = offset = 0; size = dev->coherent_dma_mask + 1; diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 0ee42c3..c7f8626 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -283,6 +283,52 @@ int of_pci_get_host_bridge_resources(struct device_node *dev, return err; } EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources); + +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size) +{ + struct device_node *node = of_node_get(np); + int rlen, ret = 0; + const int na = 3, ns = 2; + struct of_pci_range_parser parser; + struct of_pci_range range; + + if (!node) + return -EINVAL; + + parser.node = node; + parser.pna = of_n_addr_cells(node); + parser.np = parser.pna + na + ns; + + parser.range = of_get_property(node, "dma-ranges", ); + + if (!parser.range) { + pr_debug("pcie device has no dma-ranges defined for node(%s)\n", np->full_name); + ret = -ENODEV; + goto out; + } + + parser.end = parser.range + rlen / sizeof(__be32); + *size = 0; + + for_each_of_pci_range(, ) { + if (*size < range.size) { + *dma_addr = range.pci_addr; + *size = range.size; + *paddr = range.cpu_addr; + } + } + + pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n", +*dma_addr, *paddr, *size); +*dma_addr = range.pci_addr; +*size = range.size; + +out: + of_node_put(node); + return ret; + +} +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges); #endif /* CONFIG_OF_ADDRESS */ #ifdef CONFIG_PCI_MSI diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index 0e0974e..907ace0 100644 --- a/include/linux/of_pci.h +++ b/include/linux/of_pci.h @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { } int of_pci_get_host_bridge_resources(struct device_node *dev, unsigned char busno, unsigned char bus_max, struct list_head *resources, resource_size_t *io_base); +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size); #else static inline int of_pci_get_host_bridge_resources(struct device_node *dev, unsigned char busno, unsigned char bus_max, @@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev, { return -EINVAL; } + +static inline int of_pci_get_dma_ranges(struct device_node
[RFC PATCH 1/3] of/pci: dma-ranges to account highest possible host bridge dma_mask
it is possible that PCI device supports 64-bit DMA addressing, and thus it's driver sets device's dma_mask to DMA_BIT_MASK(64), however PCI host bridge may have limitations on the inbound transaction addressing. As an example, consider NVME SSD device connected to iproc-PCIe controller. Currently, the IOMMU DMA ops only considers PCI device dma_mask when allocating an IOVA. This is particularly problematic on ARM/ARM64 SOCs where the IOMMU (i.e. SMMU) translates IOVA to PA for in-bound transactions only after PCI Host has forwarded these transactions on SOC IO bus. This means on such ARM/ARM64 SOCs the IOVA of in-bound transactions has to honor the addressing restrictions of the PCI Host. current pcie frmework and of framework integration assumes dma-ranges in a way where memory-mapped devices define their dma-ranges. dma-ranges: (child-bus-address, parent-bus-address, length). but iproc based SOCs and even Rcar based SOCs has PCI world dma-ranges. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; of_dma_configure is specifically witten to take care of memory mapped devices. but no implementation exists for pci to take care of pcie based memory ranges. in fact pci world doesnt seem to define standard dma-ranges this patch implements of_pci_get_dma_ranges to cater to pci world dma-ranges. so then the returned size get best possible (largest) dma_mask. for e.g. dma-ranges = <0x4300 0x00 0x00 0x00 0x00 0x80 0x00>; we should get dev->coherent_dma_mask=0x7f. Reviewed-by: Anup Patel Reviewed-by: Scott Branden Signed-off-by: Oza Pawandeep Signed-off-by: Oza Pawandeep diff --git a/drivers/of/device.c b/drivers/of/device.c index b1e6beb..d362a98 100644 --- a/drivers/of/device.c +++ b/drivers/of/device.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include "of_private.h" @@ -104,7 +105,11 @@ void of_dma_configure(struct device *dev, struct device_node *np) if (!dev->dma_mask) dev->dma_mask = >coherent_dma_mask; - ret = of_dma_get_range(np, _addr, , ); + if (dev_is_pci(dev)) + ret = of_pci_get_dma_ranges(np, _addr, , ); + else + ret = of_dma_get_range(np, _addr, , ); + if (ret < 0) { dma_addr = offset = 0; size = dev->coherent_dma_mask + 1; diff --git a/drivers/of/of_pci.c b/drivers/of/of_pci.c index 0ee42c3..c7f8626 100644 --- a/drivers/of/of_pci.c +++ b/drivers/of/of_pci.c @@ -283,6 +283,52 @@ int of_pci_get_host_bridge_resources(struct device_node *dev, return err; } EXPORT_SYMBOL_GPL(of_pci_get_host_bridge_resources); + +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size) +{ + struct device_node *node = of_node_get(np); + int rlen, ret = 0; + const int na = 3, ns = 2; + struct of_pci_range_parser parser; + struct of_pci_range range; + + if (!node) + return -EINVAL; + + parser.node = node; + parser.pna = of_n_addr_cells(node); + parser.np = parser.pna + na + ns; + + parser.range = of_get_property(node, "dma-ranges", ); + + if (!parser.range) { + pr_debug("pcie device has no dma-ranges defined for node(%s)\n", np->full_name); + ret = -ENODEV; + goto out; + } + + parser.end = parser.range + rlen / sizeof(__be32); + *size = 0; + + for_each_of_pci_range(, ) { + if (*size < range.size) { + *dma_addr = range.pci_addr; + *size = range.size; + *paddr = range.cpu_addr; + } + } + + pr_debug("dma_addr(%llx) cpu_addr(%llx) size(%llx)\n", +*dma_addr, *paddr, *size); +*dma_addr = range.pci_addr; +*size = range.size; + +out: + of_node_put(node); + return ret; + +} +EXPORT_SYMBOL_GPL(of_pci_get_dma_ranges); #endif /* CONFIG_OF_ADDRESS */ #ifdef CONFIG_PCI_MSI diff --git a/include/linux/of_pci.h b/include/linux/of_pci.h index 0e0974e..907ace0 100644 --- a/include/linux/of_pci.h +++ b/include/linux/of_pci.h @@ -76,6 +76,7 @@ static inline void of_pci_check_probe_only(void) { } int of_pci_get_host_bridge_resources(struct device_node *dev, unsigned char busno, unsigned char bus_max, struct list_head *resources, resource_size_t *io_base); +int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size); #else static inline int of_pci_get_host_bridge_resources(struct device_node *dev, unsigned char busno, unsigned char bus_max, @@ -83,6 +84,11 @@ static inline int of_pci_get_host_bridge_resources(struct device_node *dev, { return -EINVAL; } + +static inline int of_pci_get_dma_ranges(struct device_node *np, u64 *dma_addr, u64 *paddr, u64 *size) +{ + return -EINVAL; +} #endif #if
Re: [PATCH v6 03/15] IB/mthca: Replace PCI pool old API
On Sun, 2017-03-19 at 18:03 +0100, Romain Perier wrote: > The PCI pool API is deprecated. This commit replaces the PCI pool old > API by the appropriate function with the DMA pool API. > > Signed-off-by: Romain Perier> Acked-by: Peter Senna Tschudin > Tested-by: Peter Senna Tschudin > Changes look ok, and I've tested them here as well. Acked-by: Doug Ledford Tested-by: Doug Ledford -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v6 03/15] IB/mthca: Replace PCI pool old API
On Sun, 2017-03-19 at 18:03 +0100, Romain Perier wrote: > The PCI pool API is deprecated. This commit replaces the PCI pool old > API by the appropriate function with the DMA pool API. > > Signed-off-by: Romain Perier > Acked-by: Peter Senna Tschudin > Tested-by: Peter Senna Tschudin > Changes look ok, and I've tested them here as well. Acked-by: Doug Ledford Tested-by: Doug Ledford -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH 4/9] bus: brcmstb_gisb: Use register offsets with writes too
On Fri, Mar 24, 2017 at 7:46 AM, Doug Bergerwrote: > This commit corrects the bug introduced in commit f80835875d3d > ("bus: brcmstb_gisb: Look up register offsets in a table") such > that gisb_write() translates the register enumeration into an > offset from the base address for writes as well as reads. > > Fixes: f80835875d3d ("bus: brcmstb_gisb: Look up register offsets in a table") > Signed-off-by: Doug Berger Acked-by: Gregory Fong
Re: [PATCH 4/9] bus: brcmstb_gisb: Use register offsets with writes too
On Fri, Mar 24, 2017 at 7:46 AM, Doug Berger wrote: > This commit corrects the bug introduced in commit f80835875d3d > ("bus: brcmstb_gisb: Look up register offsets in a table") such > that gisb_write() translates the register enumeration into an > offset from the base address for writes as well as reads. > > Fixes: f80835875d3d ("bus: brcmstb_gisb: Look up register offsets in a table") > Signed-off-by: Doug Berger Acked-by: Gregory Fong
[PATCH] staging: greybus: compress return logic
Simplify function returns by merging assignment and return. Signed-off-by: Arushi Singhal--- drivers/staging/greybus/loopback.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index aaf29a5fac83..08e255884206 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -365,11 +365,8 @@ static void gb_loopback_calculate_stats(struct gb_loopback *gb, bool error); static u32 gb_loopback_nsec_to_usec_latency(u64 elapsed_nsecs) { - u32 lat; - do_div(elapsed_nsecs, NSEC_PER_USEC); - lat = elapsed_nsecs; - return lat; + return elapsed_nsecs; } static u64 __gb_loopback_calc_latency(u64 t1, u64 t2) -- 2.11.0
[PATCH] staging: greybus: compress return logic
Simplify function returns by merging assignment and return. Signed-off-by: Arushi Singhal --- drivers/staging/greybus/loopback.c | 5 + 1 file changed, 1 insertion(+), 4 deletions(-) diff --git a/drivers/staging/greybus/loopback.c b/drivers/staging/greybus/loopback.c index aaf29a5fac83..08e255884206 100644 --- a/drivers/staging/greybus/loopback.c +++ b/drivers/staging/greybus/loopback.c @@ -365,11 +365,8 @@ static void gb_loopback_calculate_stats(struct gb_loopback *gb, bool error); static u32 gb_loopback_nsec_to_usec_latency(u64 elapsed_nsecs) { - u32 lat; - do_div(elapsed_nsecs, NSEC_PER_USEC); - lat = elapsed_nsecs; - return lat; + return elapsed_nsecs; } static u64 __gb_loopback_calc_latency(u64 t1, u64 t2) -- 2.11.0
Re: [PATCH] regulator: core: Limit propagation of parent voltage count and list
Hello Matthias, On 03/24/2017 05:38 PM, Brian Norris wrote: > On Fri, Mar 24, 2017 at 01:09:52PM -0700, Matthias Kaehlcke wrote: >> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c >> index 53d4fc70dbd0..121838e0125b 100644 >> --- a/drivers/regulator/core.c >> +++ b/drivers/regulator/core.c >> @@ -2487,6 +2487,10 @@ static int _regulator_list_voltage(struct regulator >> *regulator, >> if (lock) >> mutex_unlock(>mutex); >> } else if (rdev->supply) { >> +// Limit propagation of parent values to switch regulators > > The kernel doesn't use C99 comments. Oddly enough, this isn't actually +1 > in the coding style doc (Documentation/process/coding-style.rst), nor is > it caught by scripts/checkpatch.pl (even though it clearly has a 'C99 > comment' rule). > >> +if (ops->get_voltage || ops->get_voltage_sel) It's valid to have a .get_voltage_sel callback without a .list_voltage? At least it seems that _regulator_get_voltage() assumes that having a .get_voltage_sel implies that a .list_voltage will also be available. static int _regulator_get_voltage(struct regulator_dev *rdev) { ... if (rdev->desc->ops->get_voltage_sel) { sel = rdev->desc->ops->get_voltage_sel(rdev); if (sel < 0) return sel; ret = rdev->desc->ops->list_voltage(rdev, sel); } else if (rdev->desc->ops->get_voltage) { ... } So I would only check for if (ops->get_voltage). >> +return -EINVAL; >> + >> ret = _regulator_list_voltage(rdev->supply, selector, lock); >> } else { >> return -EINVAL; >> @@ -2540,6 +2544,7 @@ EXPORT_SYMBOL_GPL(regulator_is_enabled); >> int regulator_count_voltages(struct regulator *regulator) >> { >> struct regulator_dev*rdev = regulator->rdev; >> +const struct regulator_ops *ops = rdev->desc->ops; >> >> if (rdev->desc->n_voltages) >> return rdev->desc->n_voltages; >> @@ -2547,6 +2552,10 @@ int regulator_count_voltages(struct regulator >> *regulator) >> if (!rdev->supply) >> return -EINVAL; >> >> +// Limit propagation of parent value to switch regulators > > Same here. > >> +if (ops->get_voltage || ops->get_voltage_sel || ops->list_voltage) >> +return -EINVAL; >> + I wonder if instead of always checking if the regulator lacks operations, it wouldn't be better to do it just once and store that the regulator is a switch so that state can be used as explicit check for switch instead. Something like if (!rdev->supply || !rdev->switch) looks more clear to me. >> return regulator_count_voltages(rdev->supply); >> } >> EXPORT_SYMBOL_GPL(regulator_count_voltages); > > I'm not very familiar with this code, but judging by your problem > description in previous threads and by comparing with the logic in > _regulator_get_voltage() (for when to reference the ->supply), this > seems resonable. So: > > Reviewed-by: Brian Norris> Agreed, the logic sounds reasonable indeed and I didn't think of this case when writing the mentioned commit, so feel free to add: Reviewed-by: Javier Martinez Canillas > It's probably worth verifying that this doesn't break whatever Javier > was supporting in the first place, as a sanity check. > I've tested in the system that led to the mentioned commit and I did not find any issue with $SUBJECT. Tested-by: Javier Martinez Canillas > Brian > Best regards, -- Javier Martinez Canillas Open Source Group Samsung Research America
Re: [PATCH] regulator: core: Limit propagation of parent voltage count and list
Hello Matthias, On 03/24/2017 05:38 PM, Brian Norris wrote: > On Fri, Mar 24, 2017 at 01:09:52PM -0700, Matthias Kaehlcke wrote: >> diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c >> index 53d4fc70dbd0..121838e0125b 100644 >> --- a/drivers/regulator/core.c >> +++ b/drivers/regulator/core.c >> @@ -2487,6 +2487,10 @@ static int _regulator_list_voltage(struct regulator >> *regulator, >> if (lock) >> mutex_unlock(>mutex); >> } else if (rdev->supply) { >> +// Limit propagation of parent values to switch regulators > > The kernel doesn't use C99 comments. Oddly enough, this isn't actually +1 > in the coding style doc (Documentation/process/coding-style.rst), nor is > it caught by scripts/checkpatch.pl (even though it clearly has a 'C99 > comment' rule). > >> +if (ops->get_voltage || ops->get_voltage_sel) It's valid to have a .get_voltage_sel callback without a .list_voltage? At least it seems that _regulator_get_voltage() assumes that having a .get_voltage_sel implies that a .list_voltage will also be available. static int _regulator_get_voltage(struct regulator_dev *rdev) { ... if (rdev->desc->ops->get_voltage_sel) { sel = rdev->desc->ops->get_voltage_sel(rdev); if (sel < 0) return sel; ret = rdev->desc->ops->list_voltage(rdev, sel); } else if (rdev->desc->ops->get_voltage) { ... } So I would only check for if (ops->get_voltage). >> +return -EINVAL; >> + >> ret = _regulator_list_voltage(rdev->supply, selector, lock); >> } else { >> return -EINVAL; >> @@ -2540,6 +2544,7 @@ EXPORT_SYMBOL_GPL(regulator_is_enabled); >> int regulator_count_voltages(struct regulator *regulator) >> { >> struct regulator_dev*rdev = regulator->rdev; >> +const struct regulator_ops *ops = rdev->desc->ops; >> >> if (rdev->desc->n_voltages) >> return rdev->desc->n_voltages; >> @@ -2547,6 +2552,10 @@ int regulator_count_voltages(struct regulator >> *regulator) >> if (!rdev->supply) >> return -EINVAL; >> >> +// Limit propagation of parent value to switch regulators > > Same here. > >> +if (ops->get_voltage || ops->get_voltage_sel || ops->list_voltage) >> +return -EINVAL; >> + I wonder if instead of always checking if the regulator lacks operations, it wouldn't be better to do it just once and store that the regulator is a switch so that state can be used as explicit check for switch instead. Something like if (!rdev->supply || !rdev->switch) looks more clear to me. >> return regulator_count_voltages(rdev->supply); >> } >> EXPORT_SYMBOL_GPL(regulator_count_voltages); > > I'm not very familiar with this code, but judging by your problem > description in previous threads and by comparing with the logic in > _regulator_get_voltage() (for when to reference the ->supply), this > seems resonable. So: > > Reviewed-by: Brian Norris > Agreed, the logic sounds reasonable indeed and I didn't think of this case when writing the mentioned commit, so feel free to add: Reviewed-by: Javier Martinez Canillas > It's probably worth verifying that this doesn't break whatever Javier > was supporting in the first place, as a sanity check. > I've tested in the system that led to the mentioned commit and I did not find any issue with $SUBJECT. Tested-by: Javier Martinez Canillas > Brian > Best regards, -- Javier Martinez Canillas Open Source Group Samsung Research America
0324 tree BUG at kernel/auditsc.c:1513!
[11230.930568] [ cut here ] [11230.953828] kernel BUG at kernel/auditsc.c:1513! [11230.976157] invalid opcode: [#1] SMP [11230.995917] Modules linked in: btrfs xor raid6_pq ext2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop ext4 jbd2 mbcache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_mirror dm_region_hash dm_log dm_mod intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ipmi_ssif aesni_intel crypto_simd ipmi_si glue_helper iTCO_wdt ipmi_devintf iTCO_vendor_support cryptd dax_pmem sg hpilo ipmi_msghandler hpwdt lpc_ich pcc_cpufreq pcspkr dax ioatdma i2c_i801 wmi acpi_power_meter acpi_cpufreq [11231.318010] dca shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm tg3 ptp hpsa crc32c_intel nd_pmem serio_raw i2c_core pps_core scsi_transport_sas [last unloaded: scsi_debug] [11231.440342] CPU: 24 PID: 15334 Comm: dio_truncate Not tainted 4.11.0-rc3-linux-next-65b2dc3-next-20170324 #336 [11231.488861] Hardware name: HP ProLiant DL360 Gen9, BIOS P89 05/06/2015 [11231.521003] task: 9eb578bc5a00 task.stack: c277665d8000 [11231.547477] RIP: 0010:__audit_syscall_entry+0xf0/0x100 [11231.570495] RSP: 0018:c277665dbe90 EFLAGS: 00010206 [11231.594551] RAX: RBX: 9ebf2896a800 RCX: [11231.626815] RDX: 4000 RSI: 7ffe7a853c60 RDI: 0002 [11231.658965] RBP: c277665dbea0 R08: 7ffe7a853940 R09: 9eb578bc5a00 [11231.691211] R10: 7ffe7a853940 R11: 770b5a00 R12: [11231.723119] R13: 0002 R14: R15: [11231.755258] FS: 7fdbdb18b740() GS:9ebf3fc0() knlGS: [11231.791482] CS: 0010 DS: ES: CR0: 80050033 [11231.817433] CR2: 7fff02451000 CR3: 00076082 CR4: 001406e0 [11231.849728] Call Trace: [11231.860748] syscall_trace_enter+0x1d0/0x2b0 [11231.880034] ? __audit_syscall_exit+0x209/0x290 [11231.900057] do_syscall_64+0x155/0x180 [11231.916776] entry_SYSCALL64_slow_path+0x25/0x25 [11231.937440] RIP: 0033:0x7fdbdad70c20 [11231.953513] RSP: 002b:7ffe7a853c28 EFLAGS: 0246 ORIG_RAX: 0002 [11231.989037] RAX: ffda RBX: 7fdbdb18b6c0 RCX: 7fdbdad70c20 [11232.023770] RDX: RSI: 4000 RDI: 7ffe7a853c60 [11232.059308] RBP: 7ffe7a853c60 R08: R09: 01a68010 [11232.091419] R10: 7ffe7a853940 R11: 0246 R12: [11232.123493] R13: 7ffe7a854d50 R14: R15: [11232.155457] Code: 02 00 00 00 00 00 00 5b 41 5c 5d c3 48 c7 43 50 00 00 00 00 48 c7 c2 a0 f8 6f a6 48 89 de 4c 89 cf e8 05 f5 ff ff 41 89 c4 eb a9 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 [11232.240315] RIP: __audit_syscall_entry+0xf0/0x100 RSP: c277665dbe90 [11232.272441] BUG: unable to handle kernel paging request at 9ebf29362000 [11232.272451] ---[ end trace 7e25ab22dc4e0f7a ]--- [11232.273486] Kernel panic - not syncing: Fatal exception [11232.347746] IP: __memmove+0x24/0x1a0 [11232.363684] PGD 7362e1067 [11232.363684] P4D 7362e1067 [11232.375799] PUD 107243f063 [11232.387851] PMD 1069361063 [11232.400372] BAD [11232.422587] Oops: 0002 [#2] SMP [11232.436808] Modules linked in: btrfs xor raid6_pq ext2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop ext4 jbd2 mbcache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_mirror dm_region_hash dm_log dm_mod intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ipmi_ssif aesni_intel crypto_simd ipmi_si glue_helper iTCO_wdt ipmi_devintf iTCO_vendor_support cryptd dax_pmem sg hpilo ipmi_msghandler hpwdt lpc_ich pcc_cpufreq pcspkr dax ioatdma i2c_i801 wmi acpi_power_meter acpi_cpufreq [11232.793787] dca shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm tg3 ptp hpsa crc32c_intel nd_pmem serio_raw i2c_core pps_core scsi_transport_sas [last unloaded: scsi_debug] [11232.920032] CPU: 19 PID: 15333 Comm: dio_truncate Tainted: G D 4.11
0324 tree BUG at kernel/auditsc.c:1513!
[11230.930568] [ cut here ] [11230.953828] kernel BUG at kernel/auditsc.c:1513! [11230.976157] invalid opcode: [#1] SMP [11230.995917] Modules linked in: btrfs xor raid6_pq ext2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop ext4 jbd2 mbcache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_mirror dm_region_hash dm_log dm_mod intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ipmi_ssif aesni_intel crypto_simd ipmi_si glue_helper iTCO_wdt ipmi_devintf iTCO_vendor_support cryptd dax_pmem sg hpilo ipmi_msghandler hpwdt lpc_ich pcc_cpufreq pcspkr dax ioatdma i2c_i801 wmi acpi_power_meter acpi_cpufreq [11231.318010] dca shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm tg3 ptp hpsa crc32c_intel nd_pmem serio_raw i2c_core pps_core scsi_transport_sas [last unloaded: scsi_debug] [11231.440342] CPU: 24 PID: 15334 Comm: dio_truncate Not tainted 4.11.0-rc3-linux-next-65b2dc3-next-20170324 #336 [11231.488861] Hardware name: HP ProLiant DL360 Gen9, BIOS P89 05/06/2015 [11231.521003] task: 9eb578bc5a00 task.stack: c277665d8000 [11231.547477] RIP: 0010:__audit_syscall_entry+0xf0/0x100 [11231.570495] RSP: 0018:c277665dbe90 EFLAGS: 00010206 [11231.594551] RAX: RBX: 9ebf2896a800 RCX: [11231.626815] RDX: 4000 RSI: 7ffe7a853c60 RDI: 0002 [11231.658965] RBP: c277665dbea0 R08: 7ffe7a853940 R09: 9eb578bc5a00 [11231.691211] R10: 7ffe7a853940 R11: 770b5a00 R12: [11231.723119] R13: 0002 R14: R15: [11231.755258] FS: 7fdbdb18b740() GS:9ebf3fc0() knlGS: [11231.791482] CS: 0010 DS: ES: CR0: 80050033 [11231.817433] CR2: 7fff02451000 CR3: 00076082 CR4: 001406e0 [11231.849728] Call Trace: [11231.860748] syscall_trace_enter+0x1d0/0x2b0 [11231.880034] ? __audit_syscall_exit+0x209/0x290 [11231.900057] do_syscall_64+0x155/0x180 [11231.916776] entry_SYSCALL64_slow_path+0x25/0x25 [11231.937440] RIP: 0033:0x7fdbdad70c20 [11231.953513] RSP: 002b:7ffe7a853c28 EFLAGS: 0246 ORIG_RAX: 0002 [11231.989037] RAX: ffda RBX: 7fdbdb18b6c0 RCX: 7fdbdad70c20 [11232.023770] RDX: RSI: 4000 RDI: 7ffe7a853c60 [11232.059308] RBP: 7ffe7a853c60 R08: R09: 01a68010 [11232.091419] R10: 7ffe7a853940 R11: 0246 R12: [11232.123493] R13: 7ffe7a854d50 R14: R15: [11232.155457] Code: 02 00 00 00 00 00 00 5b 41 5c 5d c3 48 c7 43 50 00 00 00 00 48 c7 c2 a0 f8 6f a6 48 89 de 4c 89 cf e8 05 f5 ff ff 41 89 c4 eb a9 <0f> 0b 0f 1f 40 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 [11232.240315] RIP: __audit_syscall_entry+0xf0/0x100 RSP: c277665dbe90 [11232.272441] BUG: unable to handle kernel paging request at 9ebf29362000 [11232.272451] ---[ end trace 7e25ab22dc4e0f7a ]--- [11232.273486] Kernel panic - not syncing: Fatal exception [11232.347746] IP: __memmove+0x24/0x1a0 [11232.363684] PGD 7362e1067 [11232.363684] P4D 7362e1067 [11232.375799] PUD 107243f063 [11232.387851] PMD 1069361063 [11232.400372] BAD [11232.422587] Oops: 0002 [#2] SMP [11232.436808] Modules linked in: btrfs xor raid6_pq ext2 dm_thin_pool dm_persistent_data dm_bio_prison dm_bufio loop ext4 jbd2 mbcache xt_CHECKSUM iptable_mangle ipt_MASQUERADE nf_nat_masquerade_ipv4 iptable_nat nf_nat_ipv4 nf_nat nf_conntrack_ipv4 nf_defrag_ipv4 xt_conntrack nf_conntrack ipt_REJECT nf_reject_ipv4 tun bridge stp llc ebtable_filter ebtables ip6table_filter ip6_tables iptable_filter dm_mirror dm_region_hash dm_log dm_mod intel_rapl sb_edac edac_core x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel kvm irqbypass crct10dif_pclmul crc32_pclmul ghash_clmulni_intel pcbc ipmi_ssif aesni_intel crypto_simd ipmi_si glue_helper iTCO_wdt ipmi_devintf iTCO_vendor_support cryptd dax_pmem sg hpilo ipmi_msghandler hpwdt lpc_ich pcc_cpufreq pcspkr dax ioatdma i2c_i801 wmi acpi_power_meter acpi_cpufreq [11232.793787] dca shpchp nfsd auth_rpcgss nfs_acl lockd grace sunrpc binfmt_misc ip_tables xfs libcrc32c sd_mod mgag200 i2c_algo_bit drm_kms_helper syscopyarea sysfillrect sysimgblt fb_sys_fops ttm drm tg3 ptp hpsa crc32c_intel nd_pmem serio_raw i2c_core pps_core scsi_transport_sas [last unloaded: scsi_debug] [11232.920032] CPU: 19 PID: 15333 Comm: dio_truncate Tainted: G D 4.11
[PATCH][RFC] cpufreq: Bring CPUs up even if cpufreq_online failed
There is a report that after commit 27622b061eb4 ("cpufreq: Convert to hotplug state machine"), the normal CPU offline/online cycle failed on some platforms. According to the ftrace result, this problem was triggered on platforms using acpi-freq as the default cpufreq driver, and due to the lack of some ACPI freq method(_PCT eg), the cpufreq_online failed and returned a negative value, thus the cpu hotplug statemachine rollbacked the CPU online process. Actually the failure of cpufreq_online should not impact the whole CPU online process according to the original semantics before above patch. BTW, during system bootup the cpufreq_online is not invoked via cpuhotplug statemachine but by the cpufreq device creation process, thus the APs can be brought up although cpufreq_online failed in that stage. This patch ignores the return value of cpufreq_online/offline and prints a warning if there is a failure. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581 Fixes: 27622b061eb4 ("cpufreq: Convert to hotplug state machine") Reported-and-tested-by: Tomasz Maciej NowakCc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: linux...@vger.kernel.org Cc: Stable # 4.9+ Signed-off-by: Chen Yu --- drivers/cpufreq/cpufreq.c | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b8ff617..1c13873 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2391,6 +2391,28 @@ EXPORT_SYMBOL_GPL(cpufreq_boost_enabled); */ static enum cpuhp_state hp_online; +static int cpuhp_cpufreq_online(unsigned int cpu) +{ + int ret = cpufreq_online(cpu); + + if (ret) + pr_err("Failed to bring cpufreq online for CPU%u. (%d)\n", + cpu, ret); + + return 0; +} + +static int cpuhp_cpufreq_offline(unsigned int cpu) +{ + int ret = cpufreq_offline(cpu); + + if (ret) + pr_err("Failed to put cpufreq offline for CPU%u. (%d)\n", + cpu, ret); + + return 0; +} + /** * cpufreq_register_driver - register a CPU Frequency driver * @driver_data: A struct cpufreq_driver containing the values# @@ -2453,8 +2475,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data) } ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "cpufreq:online", - cpufreq_online, - cpufreq_offline); + cpuhp_cpufreq_online, + cpuhp_cpufreq_offline); if (ret < 0) goto err_if_unreg; hp_online = ret; -- 2.7.4
[PATCH][RFC] cpufreq: Bring CPUs up even if cpufreq_online failed
There is a report that after commit 27622b061eb4 ("cpufreq: Convert to hotplug state machine"), the normal CPU offline/online cycle failed on some platforms. According to the ftrace result, this problem was triggered on platforms using acpi-freq as the default cpufreq driver, and due to the lack of some ACPI freq method(_PCT eg), the cpufreq_online failed and returned a negative value, thus the cpu hotplug statemachine rollbacked the CPU online process. Actually the failure of cpufreq_online should not impact the whole CPU online process according to the original semantics before above patch. BTW, during system bootup the cpufreq_online is not invoked via cpuhotplug statemachine but by the cpufreq device creation process, thus the APs can be brought up although cpufreq_online failed in that stage. This patch ignores the return value of cpufreq_online/offline and prints a warning if there is a failure. Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=194581 Fixes: 27622b061eb4 ("cpufreq: Convert to hotplug state machine") Reported-and-tested-by: Tomasz Maciej Nowak Cc: "Rafael J. Wysocki" Cc: Viresh Kumar Cc: linux...@vger.kernel.org Cc: Stable # 4.9+ Signed-off-by: Chen Yu --- drivers/cpufreq/cpufreq.c | 26 -- 1 file changed, 24 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index b8ff617..1c13873 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -2391,6 +2391,28 @@ EXPORT_SYMBOL_GPL(cpufreq_boost_enabled); */ static enum cpuhp_state hp_online; +static int cpuhp_cpufreq_online(unsigned int cpu) +{ + int ret = cpufreq_online(cpu); + + if (ret) + pr_err("Failed to bring cpufreq online for CPU%u. (%d)\n", + cpu, ret); + + return 0; +} + +static int cpuhp_cpufreq_offline(unsigned int cpu) +{ + int ret = cpufreq_offline(cpu); + + if (ret) + pr_err("Failed to put cpufreq offline for CPU%u. (%d)\n", + cpu, ret); + + return 0; +} + /** * cpufreq_register_driver - register a CPU Frequency driver * @driver_data: A struct cpufreq_driver containing the values# @@ -2453,8 +2475,8 @@ int cpufreq_register_driver(struct cpufreq_driver *driver_data) } ret = cpuhp_setup_state_nocalls(CPUHP_AP_ONLINE_DYN, "cpufreq:online", - cpufreq_online, - cpufreq_offline); + cpuhp_cpufreq_online, + cpuhp_cpufreq_offline); if (ret < 0) goto err_if_unreg; hp_online = ret; -- 2.7.4
Re: [PATCH 4.10 00/27] 4.10.6-stable review
On 03/24/2017 10:58 AM, Greg Kroah-Hartman wrote: This is the start of the stable review cycle for the 4.10.6 release. There are 27 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun Mar 26 15:12:13 UTC 2017. Anything received after that time might be too late. Build results: total: 149 pass: 149 fail: 0 Qemu test results: total: 122 pass: 122 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH 4.10 00/27] 4.10.6-stable review
On 03/24/2017 10:58 AM, Greg Kroah-Hartman wrote: This is the start of the stable review cycle for the 4.10.6 release. There are 27 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun Mar 26 15:12:13 UTC 2017. Anything received after that time might be too late. Build results: total: 149 pass: 149 fail: 0 Qemu test results: total: 122 pass: 122 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH 4.9 00/24] 4.9.18-stable review
On 03/24/2017 10:58 AM, Greg Kroah-Hartman wrote: This is the start of the stable review cycle for the 4.9.18 release. There are 24 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun Mar 26 15:12:08 UTC 2017. Anything received after that time might be too late. Build results: total: 149 pass: 149 fail: 0 Qemu test results: total: 122 pass: 122 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH 4.9 00/24] 4.9.18-stable review
On 03/24/2017 10:58 AM, Greg Kroah-Hartman wrote: This is the start of the stable review cycle for the 4.9.18 release. There are 24 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun Mar 26 15:12:08 UTC 2017. Anything received after that time might be too late. Build results: total: 149 pass: 149 fail: 0 Qemu test results: total: 122 pass: 122 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH 4.4 00/30] 4.4.57-stable review
On 03/24/2017 10:58 AM, Greg Kroah-Hartman wrote: This is the start of the stable review cycle for the 4.4.57 release. There are 30 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun Mar 26 15:12:02 UTC 2017. Anything received after that time might be too late. Build results: total: 149 pass: 149 fail: 0 Qemu test results: total: 115 pass: 115 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH 4.4 00/30] 4.4.57-stable review
On 03/24/2017 10:58 AM, Greg Kroah-Hartman wrote: This is the start of the stable review cycle for the 4.4.57 release. There are 30 patches in this series, all will be posted as a response to this one. If anyone has any issues with these being applied, please let me know. Responses should be made by Sun Mar 26 15:12:02 UTC 2017. Anything received after that time might be too late. Build results: total: 149 pass: 149 fail: 0 Qemu test results: total: 115 pass: 115 fail: 0 Details are available at http://kerneltests.org/builders. Guenter
Re: [PATCH 4.4 00/30] 4.4.57-stable review
On 03/24/2017 05:10 PM, Kevin Hilman wrote: + at91 maintainers kernelci.org botwrites: stable-rc boot: 496 boots: 1 failed, 492 passed with 2 offline, 1 conflict (v4.4.56-31-gbcd1e808ead3) Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/kernel/v4.4.56-31-gbcd1e808ead3/ Full Build Summary: https://kernelci.org/build/stable-rc/kernel/v4.4.56-31-gbcd1e808ead3/ Tree: stable-rc Branch: local/linux-4.4.y Git Describe: v4.4.56-31-gbcd1e808ead3 Git Commit: bcd1e808ead359a9af8476025d8b8a5349796dcd Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 97 unique boards, 23 SoC families, 31 builds out of 202 Boot Regressions Detected: arm: multi_v7_defconfig+CONFIG_LKDTM=y: at91-sama5d2_xplained: lab-free-electrons: new failure (last pass: v4.4.51-27-g2ffd736763bc) This one is definitely a new regression. Hopefully the AT91 maintainers (now Cc'd) can have a closer look. 6b1d7b6f54c7 would be a candidate for a culprit. Guenter [...] Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.) arm: multi_v7_defconfig+CONFIG_PROVE_LOCKING=y: at91-sama5d3_xplained: lab-baylibre-seattle: PASS lab-free-electrons: FAIL @Alexandre: Because it's passing in my lab and failing in yours, I'm guessing this is still the UART overflow issue we've discussed before? What's strange is that this defconfig in your lab seems to only be booting for stable/linux-4.4.y[1] but not mailine or newer stable trees, so I couldn't check if the problem still exists in mainline. Kevin [1] https://kernelci.org/boot/at91-sama5d3_xplained/?CONFIG_PROVE_LOCKING
Re: [PATCH 4.4 00/30] 4.4.57-stable review
On 03/24/2017 05:10 PM, Kevin Hilman wrote: + at91 maintainers kernelci.org bot writes: stable-rc boot: 496 boots: 1 failed, 492 passed with 2 offline, 1 conflict (v4.4.56-31-gbcd1e808ead3) Full Boot Summary: https://kernelci.org/boot/all/job/stable-rc/kernel/v4.4.56-31-gbcd1e808ead3/ Full Build Summary: https://kernelci.org/build/stable-rc/kernel/v4.4.56-31-gbcd1e808ead3/ Tree: stable-rc Branch: local/linux-4.4.y Git Describe: v4.4.56-31-gbcd1e808ead3 Git Commit: bcd1e808ead359a9af8476025d8b8a5349796dcd Git URL: http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git Tested: 97 unique boards, 23 SoC families, 31 builds out of 202 Boot Regressions Detected: arm: multi_v7_defconfig+CONFIG_LKDTM=y: at91-sama5d2_xplained: lab-free-electrons: new failure (last pass: v4.4.51-27-g2ffd736763bc) This one is definitely a new regression. Hopefully the AT91 maintainers (now Cc'd) can have a closer look. 6b1d7b6f54c7 would be a candidate for a culprit. Guenter [...] Conflicting Boot Failure Detected: (These likely are not failures as other labs are reporting PASS. Needs review.) arm: multi_v7_defconfig+CONFIG_PROVE_LOCKING=y: at91-sama5d3_xplained: lab-baylibre-seattle: PASS lab-free-electrons: FAIL @Alexandre: Because it's passing in my lab and failing in yours, I'm guessing this is still the UART overflow issue we've discussed before? What's strange is that this defconfig in your lab seems to only be booting for stable/linux-4.4.y[1] but not mailine or newer stable trees, so I couldn't check if the problem still exists in mainline. Kevin [1] https://kernelci.org/boot/at91-sama5d3_xplained/?CONFIG_PROVE_LOCKING
[PATCH net-next 0/2] netvsc: Fix miscellaneous issues
From: K. Y. SrinivasanFix miscellaneous issues. K. Y. Srinivasan (2): netvsc: Fix a bug in sub-channel handling netvsc: Properly initialize the return value drivers/net/hyperv/netvsc_drv.c |2 +- drivers/net/hyperv/rndis_filter.c |5 + 2 files changed, 6 insertions(+), 1 deletions(-)
[PATCH net-next 0/2] netvsc: Fix miscellaneous issues
From: K. Y. Srinivasan Fix miscellaneous issues. K. Y. Srinivasan (2): netvsc: Fix a bug in sub-channel handling netvsc: Properly initialize the return value drivers/net/hyperv/netvsc_drv.c |2 +- drivers/net/hyperv/rndis_filter.c |5 + 2 files changed, 6 insertions(+), 1 deletions(-)
[PATCH net-next 1/2] netvsc: Fix a bug in sub-channel handling
From: K. Y. SrinivasanAll netvsc channels are handled via NAPI. Setup the "read mode" correctly for the netvsc sub-channels. Signed-off-by: K. Y. Srinivasan --- drivers/net/hyperv/rndis_filter.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c index 91b3bcf..9835825 100644 --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -1002,6 +1002,11 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc) if (!nvchan->mrc.buf) return; + /* Because the device uses NAPI, all the interrupt batching and +* control is done via Net softirq, not the channel handling +*/ + set_channel_read_mode(new_sc, HV_CALL_ISR); + ret = vmbus_open(new_sc, nvscdev->ring_size * PAGE_SIZE, nvscdev->ring_size * PAGE_SIZE, NULL, 0, netvsc_channel_cb, nvchan); -- 1.7.1
[PATCH net-next 2/2] netvsc: Properly initialize the return value
From: K. Y. SrinivasanInitialize the return value correctly. Signed-off-by: K. Y. Srinivasan --- drivers/net/hyperv/netvsc_drv.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index eb7ae79..f830bbb 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -855,7 +855,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) struct hv_device *hdev = ndevctx->device_ctx; struct netvsc_device_info device_info; bool was_running; - int ret; + int ret = 0; if (!nvdev || nvdev->destroy) return -ENODEV; -- 1.7.1
[PATCH net-next 1/2] netvsc: Fix a bug in sub-channel handling
From: K. Y. Srinivasan All netvsc channels are handled via NAPI. Setup the "read mode" correctly for the netvsc sub-channels. Signed-off-by: K. Y. Srinivasan --- drivers/net/hyperv/rndis_filter.c |5 + 1 files changed, 5 insertions(+), 0 deletions(-) diff --git a/drivers/net/hyperv/rndis_filter.c b/drivers/net/hyperv/rndis_filter.c index 91b3bcf..9835825 100644 --- a/drivers/net/hyperv/rndis_filter.c +++ b/drivers/net/hyperv/rndis_filter.c @@ -1002,6 +1002,11 @@ static void netvsc_sc_open(struct vmbus_channel *new_sc) if (!nvchan->mrc.buf) return; + /* Because the device uses NAPI, all the interrupt batching and +* control is done via Net softirq, not the channel handling +*/ + set_channel_read_mode(new_sc, HV_CALL_ISR); + ret = vmbus_open(new_sc, nvscdev->ring_size * PAGE_SIZE, nvscdev->ring_size * PAGE_SIZE, NULL, 0, netvsc_channel_cb, nvchan); -- 1.7.1
[PATCH net-next 2/2] netvsc: Properly initialize the return value
From: K. Y. Srinivasan Initialize the return value correctly. Signed-off-by: K. Y. Srinivasan --- drivers/net/hyperv/netvsc_drv.c |2 +- 1 files changed, 1 insertions(+), 1 deletions(-) diff --git a/drivers/net/hyperv/netvsc_drv.c b/drivers/net/hyperv/netvsc_drv.c index eb7ae79..f830bbb 100644 --- a/drivers/net/hyperv/netvsc_drv.c +++ b/drivers/net/hyperv/netvsc_drv.c @@ -855,7 +855,7 @@ static int netvsc_change_mtu(struct net_device *ndev, int mtu) struct hv_device *hdev = ndevctx->device_ctx; struct netvsc_device_info device_info; bool was_running; - int ret; + int ret = 0; if (!nvdev || nvdev->destroy) return -ENODEV; -- 1.7.1
Re: [net-next PATCH v3 0/8] Add busy poll support for epoll
From: Alexander DuyckDate: Fri, 24 Mar 2017 10:07:47 -0700 > This patch set adds support for using busy polling with epoll. Series applied, thanks!
Re: [net-next PATCH v3 0/8] Add busy poll support for epoll
From: Alexander Duyck Date: Fri, 24 Mar 2017 10:07:47 -0700 > This patch set adds support for using busy polling with epoll. Series applied, thanks!
Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
Hi Vincent, On Thu, Mar 23, 2017 at 3:08 PM, Vincent Guittotwrote: [..] >>> So I'm not really aligned with the description of your problem: PELT metric underestimates the load of the CPU. The PELT is just about tracking CFS task utilization but not whole CPU utilization and according to your description of the problem (time stolen by irq), your problem doesn't come from an underestimation of CFS task but from time spent in something else but not accounted in the value used by schedutil >>> >>> Quite likely. Indeed, it can really be that the CFS task is preempted >>> because of some RT activity generated by the IRQ handler. >>> >>> More in general, I've also noticed many suboptimal freq switches when >>> RT tasks interleave with CFS ones, because of: >>> - relatively long down _and up_ throttling times >>> - the way schedutil's flags are tracked and updated >>> - the callsites from where we call schedutil updates >>> >>> For example it can really happen that we are running at the highest >>> OPP because of some RT activity. Then we switch back to a relatively >>> low utilization CFS workload and then: >>> 1. a tick happens which produces a frequency drop >> >> Any idea why this frequency drop would happen? Say a running CFS task >> gets preempted by RT task, the PELT signal shouldn't drop for the >> duration the CFS task is preempted because the task is runnable, so > > utilization only tracks the running state but not runnable state. > Runnable state is tracked in load_avg Thanks. I got it now. Correct me if I'm wrong but strictly speaking utilization for a cfs_rq (which drives the frequency for CFS) still tracks the blocked/runnable time of tasks although its decayed as time moves forward. Only when we migrate the rq of a cfs task is the util_avg contribution removed from the rq. But I can see now why running RT can decay this load tracking signal. Regards, Joel
Re: [RFC][PATCH 2/2] cpufreq: schedutil: Force max frequency on busy CPUs
Hi Vincent, On Thu, Mar 23, 2017 at 3:08 PM, Vincent Guittot wrote: [..] >>> So I'm not really aligned with the description of your problem: PELT metric underestimates the load of the CPU. The PELT is just about tracking CFS task utilization but not whole CPU utilization and according to your description of the problem (time stolen by irq), your problem doesn't come from an underestimation of CFS task but from time spent in something else but not accounted in the value used by schedutil >>> >>> Quite likely. Indeed, it can really be that the CFS task is preempted >>> because of some RT activity generated by the IRQ handler. >>> >>> More in general, I've also noticed many suboptimal freq switches when >>> RT tasks interleave with CFS ones, because of: >>> - relatively long down _and up_ throttling times >>> - the way schedutil's flags are tracked and updated >>> - the callsites from where we call schedutil updates >>> >>> For example it can really happen that we are running at the highest >>> OPP because of some RT activity. Then we switch back to a relatively >>> low utilization CFS workload and then: >>> 1. a tick happens which produces a frequency drop >> >> Any idea why this frequency drop would happen? Say a running CFS task >> gets preempted by RT task, the PELT signal shouldn't drop for the >> duration the CFS task is preempted because the task is runnable, so > > utilization only tracks the running state but not runnable state. > Runnable state is tracked in load_avg Thanks. I got it now. Correct me if I'm wrong but strictly speaking utilization for a cfs_rq (which drives the frequency for CFS) still tracks the blocked/runnable time of tasks although its decayed as time moves forward. Only when we migrate the rq of a cfs task is the util_avg contribution removed from the rq. But I can see now why running RT can decay this load tracking signal. Regards, Joel
Re: [PATCH v6 06/15] mlx5: Replace PCI pool old API
On Mon, 2017-03-20 at 08:31 +0200, Leon Romanovsky wrote: > On Sun, Mar 19, 2017 at 06:03:55PM +0100, Romain Perier wrote: > > > > The PCI pool API is deprecated. This commit replaces the PCI pool > > old > > API by the appropriate function with the DMA pool API. > > > > Signed-off-by: Romain Perier> > Reviewed-by: Peter Senna Tschudin > > --- > > drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++- > > include/linux/mlx5/driver.h | 2 +- > > 2 files changed, 7 insertions(+), 6 deletions(-) > > > > Thanks, > Acked-by: Leon Romanovsky Changes look fine to me, and in addition I've compiled and tested them on mlx5 hardware and verified that RDMA communications still work. Acked-by: Doug Ledford Tested-by: Doug Ledford -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v6 06/15] mlx5: Replace PCI pool old API
On Mon, 2017-03-20 at 08:31 +0200, Leon Romanovsky wrote: > On Sun, Mar 19, 2017 at 06:03:55PM +0100, Romain Perier wrote: > > > > The PCI pool API is deprecated. This commit replaces the PCI pool > > old > > API by the appropriate function with the DMA pool API. > > > > Signed-off-by: Romain Perier > > Reviewed-by: Peter Senna Tschudin > > --- > > drivers/net/ethernet/mellanox/mlx5/core/cmd.c | 11 ++- > > include/linux/mlx5/driver.h | 2 +- > > 2 files changed, 7 insertions(+), 6 deletions(-) > > > > Thanks, > Acked-by: Leon Romanovsky Changes look fine to me, and in addition I've compiled and tested them on mlx5 hardware and verified that RDMA communications still work. Acked-by: Doug Ledford Tested-by: Doug Ledford -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [net-next PATCH v3 5/8] net: Track start of busy loop instead of when it should end
On Fri, 2017-03-24 at 10:08 -0700, Alexander Duyck wrote: > From: Alexander Duyck> > This patch flips the logic we were using to determine if the busy polling > has timed out. The main motivation for this is that we will need to > support two different possible timeout values in the future and by > recording the start time rather than when we would want to end we can focus > on making the end_time specific to the task be it epoll or socket based > polling. > > Signed-off-by: Alexander Duyck > --- Acked-by: Eric Dumazet Thanks guys !
Re: [net-next PATCH v3 5/8] net: Track start of busy loop instead of when it should end
On Fri, 2017-03-24 at 10:08 -0700, Alexander Duyck wrote: > From: Alexander Duyck > > This patch flips the logic we were using to determine if the busy polling > has timed out. The main motivation for this is that we will need to > support two different possible timeout values in the future and by > recording the start time rather than when we would want to end we can focus > on making the end_time specific to the task be it epoll or socket based > polling. > > Signed-off-by: Alexander Duyck > --- Acked-by: Eric Dumazet Thanks guys !
Re: [net-next PATCH v3 7/8] epoll: Add busy poll support to epoll with socket fds.
On Fri, 2017-03-24 at 10:08 -0700, Alexander Duyck wrote: > From: Sridhar Samudrala> > This patch adds busy poll support to epoll. The implementation is meant to > be opportunistic in that it will take the NAPI ID from the last socket > that is added to the ready list that contains a valid NAPI ID and it will > use that for busy polling until the ready list goes empty. Once the ready > list goes empty the NAPI ID is reset and busy polling is disabled until a > new socket is added to the ready list. > > In addition when we insert a new socket into the epoll we record the NAPI > ID and assume we are going to receive events on it. If that doesn't occur > it will be evicted as the active NAPI ID and we will resume normal > behavior. > > An application can use SO_INCOMING_CPU or SO_REUSEPORT_ATTACH_C/EBPF socket > options to spread the incoming connections to specific worker threads > based on the incoming queue. This enables epoll for each worker thread > to have only sockets that receive packets from a single queue. So when an > application calls epoll_wait() and there are no events available to report, > busy polling is done on the associated queue to pull the packets. > > Signed-off-by: Sridhar Samudrala > Signed-off-by: Alexander Duyck > --- > fs/eventpoll.c | 93 > > 1 file changed, 93 insertions(+) Acked-by: Eric Dumazet
Re: [net-next PATCH v3 7/8] epoll: Add busy poll support to epoll with socket fds.
On Fri, 2017-03-24 at 10:08 -0700, Alexander Duyck wrote: > From: Sridhar Samudrala > > This patch adds busy poll support to epoll. The implementation is meant to > be opportunistic in that it will take the NAPI ID from the last socket > that is added to the ready list that contains a valid NAPI ID and it will > use that for busy polling until the ready list goes empty. Once the ready > list goes empty the NAPI ID is reset and busy polling is disabled until a > new socket is added to the ready list. > > In addition when we insert a new socket into the epoll we record the NAPI > ID and assume we are going to receive events on it. If that doesn't occur > it will be evicted as the active NAPI ID and we will resume normal > behavior. > > An application can use SO_INCOMING_CPU or SO_REUSEPORT_ATTACH_C/EBPF socket > options to spread the incoming connections to specific worker threads > based on the incoming queue. This enables epoll for each worker thread > to have only sockets that receive packets from a single queue. So when an > application calls epoll_wait() and there are no events available to report, > busy polling is done on the associated queue to pull the packets. > > Signed-off-by: Sridhar Samudrala > Signed-off-by: Alexander Duyck > --- > fs/eventpoll.c | 93 > > 1 file changed, 93 insertions(+) Acked-by: Eric Dumazet
Re: [PATCH] drm/vmwgfx: Check check that number of mip levels is above zero in vmw_surface_define_ioctl()
Hi, thank you for this patch. Murray McAllister reported this one a couple of months ago, and this is already in our queue. Sinclair On Fri, Mar 24, 2017 at 04:37:10PM +0100, Vladis Dronov wrote: > In vmw_surface_define_ioctl(), a num_sizes parameter is assigned a > user-controlled value which is not checked for zero. It is used in > a call to kmalloc() which returns ZERO_SIZE_PTR. Later ZERO_SIZE_PTR > is dereferenced which leads to a GPF and possibly to a kernel panic. > Add the check for zero to avoid this. > > Reference: > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugzilla.redhat.com_show-5Fbug.cgi-3Fid-3D1435719=DwIBAg=uilaK90D4TOVoH58JNXRgQ=HaJ2a6NYExoV0cntAYcoqA=OW9cIAAez9eRIxEYMaToDu2szuR_YrfQcOzAH6L8dXo=-3P2pG3n1YW6-8NG6mLC7kyxmx7mMxJmXgY79ZgQeo4= > > Signed-off-by: Vladis Dronov> --- > drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > index b445ce9..42840cc 100644 > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > @@ -716,8 +716,8 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void > *data, > for (i = 0; i < DRM_VMW_MAX_SURFACE_FACES; ++i) > num_sizes += req->mip_levels[i]; > > - if (num_sizes > DRM_VMW_MAX_SURFACE_FACES * > - DRM_VMW_MAX_MIP_LEVELS) > + if (num_sizes <= 0 || > + num_sizes > DRM_VMW_MAX_SURFACE_FACES * DRM_VMW_MAX_MIP_LEVELS) > return -EINVAL; > > size = vmw_user_surface_size + 128 + > -- > 2.9.3 >
Re: [PATCH] drm/vmwgfx: Check check that number of mip levels is above zero in vmw_surface_define_ioctl()
Hi, thank you for this patch. Murray McAllister reported this one a couple of months ago, and this is already in our queue. Sinclair On Fri, Mar 24, 2017 at 04:37:10PM +0100, Vladis Dronov wrote: > In vmw_surface_define_ioctl(), a num_sizes parameter is assigned a > user-controlled value which is not checked for zero. It is used in > a call to kmalloc() which returns ZERO_SIZE_PTR. Later ZERO_SIZE_PTR > is dereferenced which leads to a GPF and possibly to a kernel panic. > Add the check for zero to avoid this. > > Reference: > https://urldefense.proofpoint.com/v2/url?u=https-3A__bugzilla.redhat.com_show-5Fbug.cgi-3Fid-3D1435719=DwIBAg=uilaK90D4TOVoH58JNXRgQ=HaJ2a6NYExoV0cntAYcoqA=OW9cIAAez9eRIxEYMaToDu2szuR_YrfQcOzAH6L8dXo=-3P2pG3n1YW6-8NG6mLC7kyxmx7mMxJmXgY79ZgQeo4= > > Signed-off-by: Vladis Dronov > --- > drivers/gpu/drm/vmwgfx/vmwgfx_surface.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > index b445ce9..42840cc 100644 > --- a/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > +++ b/drivers/gpu/drm/vmwgfx/vmwgfx_surface.c > @@ -716,8 +716,8 @@ int vmw_surface_define_ioctl(struct drm_device *dev, void > *data, > for (i = 0; i < DRM_VMW_MAX_SURFACE_FACES; ++i) > num_sizes += req->mip_levels[i]; > > - if (num_sizes > DRM_VMW_MAX_SURFACE_FACES * > - DRM_VMW_MAX_MIP_LEVELS) > + if (num_sizes <= 0 || > + num_sizes > DRM_VMW_MAX_SURFACE_FACES * DRM_VMW_MAX_MIP_LEVELS) > return -EINVAL; > > size = vmw_user_surface_size + 128 + > -- > 2.9.3 >
[PATCH v3] staging: media: davinci_vpfe: Replace a bit shift.
This patch replaces bit shifting on 1 with the BIT(x) macro. This was done with coccinelle: @@ constant c; @@ -1 << c +BIT(c) Signed-off-by: Arushi Singhal--- changes in v3 - change the subject. - remove extra parenthesis. drivers/staging/media/davinci_vpfe/dm365_ipipe.c | 2 +- drivers/staging/media/davinci_vpfe/dm365_ipipeif.c | 2 +- drivers/staging/media/davinci_vpfe/dm365_isif.c| 10 +- drivers/staging/media/davinci_vpfe/dm365_resizer.c | 6 +++--- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/staging/media/davinci_vpfe/dm365_ipipe.c b/drivers/staging/media/davinci_vpfe/dm365_ipipe.c index 6a3434cebd79..7eeb53217168 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_ipipe.c +++ b/drivers/staging/media/davinci_vpfe/dm365_ipipe.c @@ -1815,7 +1815,7 @@ vpfe_ipipe_init(struct vpfe_ipipe_device *ipipe, struct platform_device *pdev) v4l2_subdev_init(sd, _v4l2_ops); sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI IPIPE", sizeof(sd->name)); - sd->grp_id = 1 << 16; /* group ID for davinci subdevs */ + sd->grp_id = BIT(16); /* group ID for davinci subdevs */ v4l2_set_subdevdata(sd, ipipe); sd->flags |= V4L2_SUBDEV_FL_HAS_DEVNODE; diff --git a/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c b/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c index 46fd2c7f69c3..c07f028dd6be 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c +++ b/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c @@ -1021,7 +1021,7 @@ int vpfe_ipipeif_init(struct vpfe_ipipeif_device *ipipeif, sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI IPIPEIF", sizeof(sd->name)); - sd->grp_id = 1 << 16; /* group ID for davinci subdevs */ + sd->grp_id = BIT(16); /* group ID for davinci subdevs */ v4l2_set_subdevdata(sd, ipipeif); diff --git a/drivers/staging/media/davinci_vpfe/dm365_isif.c b/drivers/staging/media/davinci_vpfe/dm365_isif.c index 569bcdc9ce2f..74b1247203b1 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_isif.c +++ b/drivers/staging/media/davinci_vpfe/dm365_isif.c @@ -821,7 +821,7 @@ isif_config_dfc(struct vpfe_isif_device *isif, struct vpfe_isif_dfc *vdfc) /* Correct whole line or partial */ if (vdfc->corr_whole_line) - val |= 1 << ISIF_VDFC_CORR_WHOLE_LN_SHIFT; + val |= BIT(ISIF_VDFC_CORR_WHOLE_LN_SHIFT); /* level shift value */ val |= (vdfc->def_level_shift & ISIF_VDFC_LEVEL_SHFT_MASK) << @@ -849,7 +849,7 @@ isif_config_dfc(struct vpfe_isif_device *isif, struct vpfe_isif_dfc *vdfc) val = isif_read(isif->isif_cfg.base_addr, DFCMEMCTL); /* set DFCMARST and set DFCMWR */ - val |= 1 << ISIF_DFCMEMCTL_DFCMARST_SHIFT; + val |= BIT(ISIF_DFCMEMCTL_DFCMARST_SHIFT); val |= 1; isif_write(isif->isif_cfg.base_addr, val, DFCMEMCTL); @@ -880,7 +880,7 @@ isif_config_dfc(struct vpfe_isif_device *isif, struct vpfe_isif_dfc *vdfc) } val = isif_read(isif->isif_cfg.base_addr, DFCMEMCTL); /* clear DFCMARST and set DFCMWR */ - val &= ~(1 << ISIF_DFCMEMCTL_DFCMARST_SHIFT); + val &= ~BIT(ISIF_DFCMEMCTL_DFCMARST_SHIFT); val |= 1; isif_write(isif->isif_cfg.base_addr, val, DFCMEMCTL); @@ -1140,7 +1140,7 @@ static int isif_config_raw(struct v4l2_subdev *sd, int mode) isif_write(isif->isif_cfg.base_addr, val, CGAMMAWD); /* Configure DPCM compression settings */ if (params->v4l2_pix_fmt == V4L2_PIX_FMT_SGRBG10DPCM8) { - val = 1 << ISIF_DPCM_EN_SHIFT; + val = BIT(ISIF_DPCM_EN_SHIFT); val |= (params->dpcm_predictor & ISIF_DPCM_PREDICTOR_MASK) << ISIF_DPCM_PREDICTOR_SHIFT; } @@ -2044,7 +2044,7 @@ int vpfe_isif_init(struct vpfe_isif_device *isif, struct platform_device *pdev) v4l2_subdev_init(sd, _v4l2_ops); sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI ISIF", sizeof(sd->name)); - sd->grp_id = 1 << 16; /* group ID for davinci subdevs */ + sd->grp_id = BIT(16); /* group ID for davinci subdevs */ v4l2_set_subdevdata(sd, isif); sd->flags |= V4L2_SUBDEV_FL_HAS_EVENTS | V4L2_SUBDEV_FL_HAS_DEVNODE; pads[ISIF_PAD_SINK].flags = MEDIA_PAD_FL_SINK; diff --git a/drivers/staging/media/davinci_vpfe/dm365_resizer.c b/drivers/staging/media/davinci_vpfe/dm365_resizer.c index 857b0e847c5e..3b3469adaf91 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_resizer.c +++ b/drivers/staging/media/davinci_vpfe/dm365_resizer.c @@ -1903,7 +1903,7 @@ int vpfe_resizer_init(struct vpfe_resizer_device *vpfe_rsz, v4l2_subdev_init(sd, _v4l2_ops); sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI RESIZER
[PATCH v3] staging: media: davinci_vpfe: Replace a bit shift.
This patch replaces bit shifting on 1 with the BIT(x) macro. This was done with coccinelle: @@ constant c; @@ -1 << c +BIT(c) Signed-off-by: Arushi Singhal --- changes in v3 - change the subject. - remove extra parenthesis. drivers/staging/media/davinci_vpfe/dm365_ipipe.c | 2 +- drivers/staging/media/davinci_vpfe/dm365_ipipeif.c | 2 +- drivers/staging/media/davinci_vpfe/dm365_isif.c| 10 +- drivers/staging/media/davinci_vpfe/dm365_resizer.c | 6 +++--- 4 files changed, 10 insertions(+), 10 deletions(-) diff --git a/drivers/staging/media/davinci_vpfe/dm365_ipipe.c b/drivers/staging/media/davinci_vpfe/dm365_ipipe.c index 6a3434cebd79..7eeb53217168 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_ipipe.c +++ b/drivers/staging/media/davinci_vpfe/dm365_ipipe.c @@ -1815,7 +1815,7 @@ vpfe_ipipe_init(struct vpfe_ipipe_device *ipipe, struct platform_device *pdev) v4l2_subdev_init(sd, _v4l2_ops); sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI IPIPE", sizeof(sd->name)); - sd->grp_id = 1 << 16; /* group ID for davinci subdevs */ + sd->grp_id = BIT(16); /* group ID for davinci subdevs */ v4l2_set_subdevdata(sd, ipipe); sd->flags |= V4L2_SUBDEV_FL_HAS_DEVNODE; diff --git a/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c b/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c index 46fd2c7f69c3..c07f028dd6be 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c +++ b/drivers/staging/media/davinci_vpfe/dm365_ipipeif.c @@ -1021,7 +1021,7 @@ int vpfe_ipipeif_init(struct vpfe_ipipeif_device *ipipeif, sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI IPIPEIF", sizeof(sd->name)); - sd->grp_id = 1 << 16; /* group ID for davinci subdevs */ + sd->grp_id = BIT(16); /* group ID for davinci subdevs */ v4l2_set_subdevdata(sd, ipipeif); diff --git a/drivers/staging/media/davinci_vpfe/dm365_isif.c b/drivers/staging/media/davinci_vpfe/dm365_isif.c index 569bcdc9ce2f..74b1247203b1 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_isif.c +++ b/drivers/staging/media/davinci_vpfe/dm365_isif.c @@ -821,7 +821,7 @@ isif_config_dfc(struct vpfe_isif_device *isif, struct vpfe_isif_dfc *vdfc) /* Correct whole line or partial */ if (vdfc->corr_whole_line) - val |= 1 << ISIF_VDFC_CORR_WHOLE_LN_SHIFT; + val |= BIT(ISIF_VDFC_CORR_WHOLE_LN_SHIFT); /* level shift value */ val |= (vdfc->def_level_shift & ISIF_VDFC_LEVEL_SHFT_MASK) << @@ -849,7 +849,7 @@ isif_config_dfc(struct vpfe_isif_device *isif, struct vpfe_isif_dfc *vdfc) val = isif_read(isif->isif_cfg.base_addr, DFCMEMCTL); /* set DFCMARST and set DFCMWR */ - val |= 1 << ISIF_DFCMEMCTL_DFCMARST_SHIFT; + val |= BIT(ISIF_DFCMEMCTL_DFCMARST_SHIFT); val |= 1; isif_write(isif->isif_cfg.base_addr, val, DFCMEMCTL); @@ -880,7 +880,7 @@ isif_config_dfc(struct vpfe_isif_device *isif, struct vpfe_isif_dfc *vdfc) } val = isif_read(isif->isif_cfg.base_addr, DFCMEMCTL); /* clear DFCMARST and set DFCMWR */ - val &= ~(1 << ISIF_DFCMEMCTL_DFCMARST_SHIFT); + val &= ~BIT(ISIF_DFCMEMCTL_DFCMARST_SHIFT); val |= 1; isif_write(isif->isif_cfg.base_addr, val, DFCMEMCTL); @@ -1140,7 +1140,7 @@ static int isif_config_raw(struct v4l2_subdev *sd, int mode) isif_write(isif->isif_cfg.base_addr, val, CGAMMAWD); /* Configure DPCM compression settings */ if (params->v4l2_pix_fmt == V4L2_PIX_FMT_SGRBG10DPCM8) { - val = 1 << ISIF_DPCM_EN_SHIFT; + val = BIT(ISIF_DPCM_EN_SHIFT); val |= (params->dpcm_predictor & ISIF_DPCM_PREDICTOR_MASK) << ISIF_DPCM_PREDICTOR_SHIFT; } @@ -2044,7 +2044,7 @@ int vpfe_isif_init(struct vpfe_isif_device *isif, struct platform_device *pdev) v4l2_subdev_init(sd, _v4l2_ops); sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI ISIF", sizeof(sd->name)); - sd->grp_id = 1 << 16; /* group ID for davinci subdevs */ + sd->grp_id = BIT(16); /* group ID for davinci subdevs */ v4l2_set_subdevdata(sd, isif); sd->flags |= V4L2_SUBDEV_FL_HAS_EVENTS | V4L2_SUBDEV_FL_HAS_DEVNODE; pads[ISIF_PAD_SINK].flags = MEDIA_PAD_FL_SINK; diff --git a/drivers/staging/media/davinci_vpfe/dm365_resizer.c b/drivers/staging/media/davinci_vpfe/dm365_resizer.c index 857b0e847c5e..3b3469adaf91 100644 --- a/drivers/staging/media/davinci_vpfe/dm365_resizer.c +++ b/drivers/staging/media/davinci_vpfe/dm365_resizer.c @@ -1903,7 +1903,7 @@ int vpfe_resizer_init(struct vpfe_resizer_device *vpfe_rsz, v4l2_subdev_init(sd, _v4l2_ops); sd->internal_ops = _v4l2_internal_ops; strlcpy(sd->name, "DAVINCI RESIZER CROP", sizeof(sd->name)); -
Re: [PATCH] ipmi_ssif: use setup_timer
A little nicer, yes. In queue for the next release. Thanks, -corey On 03/24/2017 09:15 AM, Geliang Tang wrote: Use setup_timer() instead of init_timer() to simplify the code. Signed-off-by: Geliang Tang--- drivers/char/ipmi/ipmi_ssif.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index cca6e5b..a92a049 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -1642,9 +1642,8 @@ static int ssif_probe(struct i2c_client *client, const struct i2c_device_id *id) spin_lock_init(_info->lock); ssif_info->ssif_state = SSIF_NORMAL; - init_timer(_info->retry_timer); - ssif_info->retry_timer.data = (unsigned long) ssif_info; - ssif_info->retry_timer.function = retry_timeout; + setup_timer(_info->retry_timer, retry_timeout, + (unsigned long)ssif_info); for (i = 0; i < SSIF_NUM_STATS; i++) atomic_set(_info->stats[i], 0);
Re: [PATCH] ipmi_ssif: use setup_timer
A little nicer, yes. In queue for the next release. Thanks, -corey On 03/24/2017 09:15 AM, Geliang Tang wrote: Use setup_timer() instead of init_timer() to simplify the code. Signed-off-by: Geliang Tang --- drivers/char/ipmi/ipmi_ssif.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index cca6e5b..a92a049 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -1642,9 +1642,8 @@ static int ssif_probe(struct i2c_client *client, const struct i2c_device_id *id) spin_lock_init(_info->lock); ssif_info->ssif_state = SSIF_NORMAL; - init_timer(_info->retry_timer); - ssif_info->retry_timer.data = (unsigned long) ssif_info; - ssif_info->retry_timer.function = retry_timeout; + setup_timer(_info->retry_timer, retry_timeout, + (unsigned long)ssif_info); for (i = 0; i < SSIF_NUM_STATS; i++) atomic_set(_info->stats[i], 0);
Re: [PATCH] ACPI / IPMI: allow ACPI_IPMI with IPMI_SSIF
Oops, yes, this is an issue. However, it should probably depend on IPMI_HANDLER, not the individual interface types. -corey On 03/23/2017 10:53 AM, Sinan Kaya wrote: ACPI_IPMI driver currently depends on IPMI System Interface (IPMI_SI) driver to be enabled. IPMI_SI driver only handles KCS, SMIC and BT BMC interfaces. IPMI_SSIF is an alternative BMC communication method. It allows BMC to be accessed over an I2C bus instead of a standard interface. Enabling ACPI_IPMI over IPMI_SSIF with this change. Signed-off-by: Sinan Kaya--- drivers/acpi/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 83e5f7e..8767062 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -256,7 +256,7 @@ config ACPI_PROCESSOR config ACPI_IPMI tristate "IPMI" - depends on IPMI_SI + depends on IPMI_SI||IPMI_SSIF default n help This driver enables the ACPI to access the BMC controller. And it
Re: [PATCH] ACPI / IPMI: allow ACPI_IPMI with IPMI_SSIF
Oops, yes, this is an issue. However, it should probably depend on IPMI_HANDLER, not the individual interface types. -corey On 03/23/2017 10:53 AM, Sinan Kaya wrote: ACPI_IPMI driver currently depends on IPMI System Interface (IPMI_SI) driver to be enabled. IPMI_SI driver only handles KCS, SMIC and BT BMC interfaces. IPMI_SSIF is an alternative BMC communication method. It allows BMC to be accessed over an I2C bus instead of a standard interface. Enabling ACPI_IPMI over IPMI_SSIF with this change. Signed-off-by: Sinan Kaya --- drivers/acpi/Kconfig | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/acpi/Kconfig b/drivers/acpi/Kconfig index 83e5f7e..8767062 100644 --- a/drivers/acpi/Kconfig +++ b/drivers/acpi/Kconfig @@ -256,7 +256,7 @@ config ACPI_PROCESSOR config ACPI_IPMI tristate "IPMI" - depends on IPMI_SI + depends on IPMI_SI||IPMI_SSIF default n help This driver enables the ACPI to access the BMC controller. And it
Re: [PATCH] ACPI / IPMI: change warning to debug on timeout
Why would a timeout for a message be expected? The BMC should at least respond with an error for an incorrect message. -corey On 03/23/2017 10:32 AM, Sinan Kaya wrote: Getting timeout message from BMC when trying to read from a non-existent FRU. This is expected but warning is not. Let's reduce the warning to debug. Signed-off-by: Sinan Kaya--- drivers/acpi/acpi_ipmi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index 747c2ba..1b64419 100644 --- a/drivers/acpi/acpi_ipmi.c +++ b/drivers/acpi/acpi_ipmi.c @@ -429,8 +429,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) if (msg->recv_type == IPMI_RESPONSE_RECV_TYPE && msg->msg.data_len == 1) { if (msg->msg.data[0] == IPMI_TIMEOUT_COMPLETION_CODE) { - dev_WARN_ONCE(dev, true, - "Unexpected response (timeout).\n"); + dev_dbg_once(dev, "Unexpected response (timeout).\n"); tx_msg->msg_done = ACPI_IPMI_TIMEOUT; } goto out_comp;
[PATCH] Staging: atomisp - octal permissions, style fix
Changed permissions to octal style Found using checkpatch Signed-off-by: Derek Robson--- drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c | 9 +++-- drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c | 8 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c b/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c index fcfe8d7190b0..763bc5f2a033 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c +++ b/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c @@ -163,12 +163,9 @@ static ssize_t iunit_dbgopt_store(struct device_driver *drv, const char *buf, } static struct driver_attribute iunit_drvfs_attrs[] = { - __ATTR(dbglvl, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, iunit_dbglvl_show, - iunit_dbglvl_store), - __ATTR(dbgfun, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, iunit_dbgfun_show, - iunit_dbgfun_store), - __ATTR(dbgopt, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, iunit_dbgopt_show, - iunit_dbgopt_store), + __ATTR(dbglvl, 0644, iunit_dbglvl_show, iunit_dbglvl_store), + __ATTR(dbgfun, 0644, iunit_dbgfun_show, iunit_dbgfun_store), + __ATTR(dbgopt, 0644, iunit_dbgopt_show, iunit_dbgopt_store), }; static int iunit_drvfs_create_files(struct pci_driver *drv) diff --git a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c index a362b492ec8e..edf554e8e96e 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c +++ b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c @@ -161,10 +161,10 @@ static ssize_t dynamic_pool_show(struct device *dev, return ret; }; -static DEVICE_ATTR(active_bo, S_IRUGO, active_bo_show, NULL); -static DEVICE_ATTR(free_bo, S_IRUGO, free_bo_show, NULL); -static DEVICE_ATTR(reserved_pool, S_IRUGO, reserved_pool_show, NULL); -static DEVICE_ATTR(dynamic_pool, S_IRUGO, dynamic_pool_show, NULL); +static DEVICE_ATTR(active_bo, 0444, active_bo_show, NULL); +static DEVICE_ATTR(free_bo, 0444, free_bo_show, NULL); +static DEVICE_ATTR(reserved_pool, 0444, reserved_pool_show, NULL); +static DEVICE_ATTR(dynamic_pool, 0444, dynamic_pool_show, NULL); static struct attribute *sysfs_attrs_ctrl[] = { _attr_active_bo.attr, -- 2.12.0
[PATCH] Staging: atomisp - octal permissions, style fix
Changed permissions to octal style Found using checkpatch Signed-off-by: Derek Robson --- drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c | 9 +++-- drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c | 8 2 files changed, 7 insertions(+), 10 deletions(-) diff --git a/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c b/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c index fcfe8d7190b0..763bc5f2a033 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c +++ b/drivers/staging/media/atomisp/pci/atomisp2/atomisp_drvfs.c @@ -163,12 +163,9 @@ static ssize_t iunit_dbgopt_store(struct device_driver *drv, const char *buf, } static struct driver_attribute iunit_drvfs_attrs[] = { - __ATTR(dbglvl, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, iunit_dbglvl_show, - iunit_dbglvl_store), - __ATTR(dbgfun, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, iunit_dbgfun_show, - iunit_dbgfun_store), - __ATTR(dbgopt, S_IRUSR|S_IWUSR|S_IRGRP|S_IROTH, iunit_dbgopt_show, - iunit_dbgopt_store), + __ATTR(dbglvl, 0644, iunit_dbglvl_show, iunit_dbglvl_store), + __ATTR(dbgfun, 0644, iunit_dbgfun_show, iunit_dbgfun_store), + __ATTR(dbgopt, 0644, iunit_dbgopt_show, iunit_dbgopt_store), }; static int iunit_drvfs_create_files(struct pci_driver *drv) diff --git a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c index a362b492ec8e..edf554e8e96e 100644 --- a/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c +++ b/drivers/staging/media/atomisp/pci/atomisp2/hmm/hmm.c @@ -161,10 +161,10 @@ static ssize_t dynamic_pool_show(struct device *dev, return ret; }; -static DEVICE_ATTR(active_bo, S_IRUGO, active_bo_show, NULL); -static DEVICE_ATTR(free_bo, S_IRUGO, free_bo_show, NULL); -static DEVICE_ATTR(reserved_pool, S_IRUGO, reserved_pool_show, NULL); -static DEVICE_ATTR(dynamic_pool, S_IRUGO, dynamic_pool_show, NULL); +static DEVICE_ATTR(active_bo, 0444, active_bo_show, NULL); +static DEVICE_ATTR(free_bo, 0444, free_bo_show, NULL); +static DEVICE_ATTR(reserved_pool, 0444, reserved_pool_show, NULL); +static DEVICE_ATTR(dynamic_pool, 0444, dynamic_pool_show, NULL); static struct attribute *sysfs_attrs_ctrl[] = { _attr_active_bo.attr, -- 2.12.0
Re: [PATCH] ACPI / IPMI: change warning to debug on timeout
Why would a timeout for a message be expected? The BMC should at least respond with an error for an incorrect message. -corey On 03/23/2017 10:32 AM, Sinan Kaya wrote: Getting timeout message from BMC when trying to read from a non-existent FRU. This is expected but warning is not. Let's reduce the warning to debug. Signed-off-by: Sinan Kaya --- drivers/acpi/acpi_ipmi.c | 3 +-- 1 file changed, 1 insertion(+), 2 deletions(-) diff --git a/drivers/acpi/acpi_ipmi.c b/drivers/acpi/acpi_ipmi.c index 747c2ba..1b64419 100644 --- a/drivers/acpi/acpi_ipmi.c +++ b/drivers/acpi/acpi_ipmi.c @@ -429,8 +429,7 @@ static void ipmi_msg_handler(struct ipmi_recv_msg *msg, void *user_msg_data) if (msg->recv_type == IPMI_RESPONSE_RECV_TYPE && msg->msg.data_len == 1) { if (msg->msg.data[0] == IPMI_TIMEOUT_COMPLETION_CODE) { - dev_WARN_ONCE(dev, true, - "Unexpected response (timeout).\n"); + dev_dbg_once(dev, "Unexpected response (timeout).\n"); tx_msg->msg_done = ACPI_IPMI_TIMEOUT; } goto out_comp;
Re: [PATCH] ipmi: Fix kernel panic at ipmi_ssif_thread()
This is incorrect. These values *must* be set before ssif_i2c_send() is done. Once ssif_i2c_send() is called, msg_written_handler can be called at any time after that, including before where you moved the new code. If I understand this correctly, I think you need to add a variable, maybe named "bytes_to_write" and do bytes_to_write = ssif_info->multi_data + ssif_info->multi_pos; before the comparison and pass bytes_to_write into ssif_i2c_send(). Thanks, -corey On 03/23/2017 02:07 AM, Joeseph Chang wrote: From: Joeseph Changmsg_written_handler() may set ssif_info->multi_data to NULL when using ipmitool to write fru. Change the ssif i2c send data sequence in msg_written_handler() to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos. Signed-off-by: Joeseph Chang --- drivers/char/ipmi/ipmi_ssif.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index cca6e5b..39346ee 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -899,21 +899,13 @@ static void msg_written_handler(struct ssif_info *ssif_info, int result, left = 32; /* Length byte. */ ssif_info->multi_data[ssif_info->multi_pos] = left; - ssif_info->multi_pos += left; - if (left < 32) - /* -* Write is finished. Note that we must end -* with a write of less than 32 bytes to -* complete the transaction, even if it is -* zero bytes. -*/ - ssif_info->multi_data = NULL; rv = ssif_i2c_send(ssif_info, msg_written_handler, I2C_SMBUS_WRITE, SSIF_IPMI_MULTI_PART_REQUEST_MIDDLE, ssif_info->multi_data + ssif_info->multi_pos, I2C_SMBUS_BLOCK_DATA); + if (rv < 0) { /* request failed, just return the error. */ ssif_inc_stat(ssif_info, send_errors); @@ -922,6 +914,16 @@ static void msg_written_handler(struct ssif_info *ssif_info, int result, pr_info("Error from i2c_non_blocking_op(3)\n"); msg_done_handler(ssif_info, -EIO, NULL, 0); } + + ssif_info->multi_pos += left; + if (left < 32) + /* +* Write is finished. Note that we must end +* with a write of less than 32 bytes to +* complete the transaction, even if it is +* zero bytes. +*/ + ssif_info->multi_data = NULL; } else { /* Ready to request the result. */ unsigned long oflags, *flags;
Re: [PATCH] ipmi: Fix kernel panic at ipmi_ssif_thread()
This is incorrect. These values *must* be set before ssif_i2c_send() is done. Once ssif_i2c_send() is called, msg_written_handler can be called at any time after that, including before where you moved the new code. If I understand this correctly, I think you need to add a variable, maybe named "bytes_to_write" and do bytes_to_write = ssif_info->multi_data + ssif_info->multi_pos; before the comparison and pass bytes_to_write into ssif_i2c_send(). Thanks, -corey On 03/23/2017 02:07 AM, Joeseph Chang wrote: From: Joeseph Chang msg_written_handler() may set ssif_info->multi_data to NULL when using ipmitool to write fru. Change the ssif i2c send data sequence in msg_written_handler() to fix NULL pointer kernel panic and incorrect ssif_info->multi_pos. Signed-off-by: Joeseph Chang --- drivers/char/ipmi/ipmi_ssif.c | 20 +++- 1 file changed, 11 insertions(+), 9 deletions(-) diff --git a/drivers/char/ipmi/ipmi_ssif.c b/drivers/char/ipmi/ipmi_ssif.c index cca6e5b..39346ee 100644 --- a/drivers/char/ipmi/ipmi_ssif.c +++ b/drivers/char/ipmi/ipmi_ssif.c @@ -899,21 +899,13 @@ static void msg_written_handler(struct ssif_info *ssif_info, int result, left = 32; /* Length byte. */ ssif_info->multi_data[ssif_info->multi_pos] = left; - ssif_info->multi_pos += left; - if (left < 32) - /* -* Write is finished. Note that we must end -* with a write of less than 32 bytes to -* complete the transaction, even if it is -* zero bytes. -*/ - ssif_info->multi_data = NULL; rv = ssif_i2c_send(ssif_info, msg_written_handler, I2C_SMBUS_WRITE, SSIF_IPMI_MULTI_PART_REQUEST_MIDDLE, ssif_info->multi_data + ssif_info->multi_pos, I2C_SMBUS_BLOCK_DATA); + if (rv < 0) { /* request failed, just return the error. */ ssif_inc_stat(ssif_info, send_errors); @@ -922,6 +914,16 @@ static void msg_written_handler(struct ssif_info *ssif_info, int result, pr_info("Error from i2c_non_blocking_op(3)\n"); msg_done_handler(ssif_info, -EIO, NULL, 0); } + + ssif_info->multi_pos += left; + if (left < 32) + /* +* Write is finished. Note that we must end +* with a write of less than 32 bytes to +* complete the transaction, even if it is +* zero bytes. +*/ + ssif_info->multi_data = NULL; } else { /* Ready to request the result. */ unsigned long oflags, *flags;
Re: [RESEND PATCH] IB/qib: fix false-postive maybe-uninitialized warning
On Tue, 2017-03-14 at 13:18 +0100, Arnd Bergmann wrote: > aarch64-linux-gcc-7 complains about code it doesn't fully understand: > > drivers/infiniband/hw/qib/qib_iba7322.c: In function > 'qib_7322_txchk_change': > include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may > be used uninitialized in this function [-Werror=maybe-uninitialized] > > The code is right, and despite trying hard, I could not come up with > a version > that I liked better than just adding a fake initialization here to > shut up the > warning. > > Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe > InfiniBand adapters") > Signed-off-by: Arnd Bergmann> Acked-by: Ira Weiny > --- > Submitted originally on Feb 27, the patch is still required on v4.11- > rc2 to get > a clean build. > --- > drivers/infiniband/hw/qib/qib_iba7322.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c > b/drivers/infiniband/hw/qib/qib_iba7322.c > index 12c4208fd701..af9f596bb68b 100644 > --- a/drivers/infiniband/hw/qib/qib_iba7322.c > +++ b/drivers/infiniband/hw/qib/qib_iba7322.c > @@ -7068,7 +7068,7 @@ static void qib_7322_txchk_change(struct > qib_devdata *dd, u32 start, > unsigned long flags; > > while (wait) { > - unsigned long shadow; > + unsigned long shadow = 0; > int cstart, previ = -1; > > /* Applied. Although, it would be preferable to fix the compiler, but I don't have any control over that and as this breaks compiles with -Werror, I'll relent and let it in. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [RESEND PATCH] IB/qib: fix false-postive maybe-uninitialized warning
On Tue, 2017-03-14 at 13:18 +0100, Arnd Bergmann wrote: > aarch64-linux-gcc-7 complains about code it doesn't fully understand: > > drivers/infiniband/hw/qib/qib_iba7322.c: In function > 'qib_7322_txchk_change': > include/asm-generic/bitops/non-atomic.h:105:35: error: 'shadow' may > be used uninitialized in this function [-Werror=maybe-uninitialized] > > The code is right, and despite trying hard, I could not come up with > a version > that I liked better than just adding a fake initialization here to > shut up the > warning. > > Fixes: f931551bafe1 ("IB/qib: Add new qib driver for QLogic PCIe > InfiniBand adapters") > Signed-off-by: Arnd Bergmann > Acked-by: Ira Weiny > --- > Submitted originally on Feb 27, the patch is still required on v4.11- > rc2 to get > a clean build. > --- > drivers/infiniband/hw/qib/qib_iba7322.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/drivers/infiniband/hw/qib/qib_iba7322.c > b/drivers/infiniband/hw/qib/qib_iba7322.c > index 12c4208fd701..af9f596bb68b 100644 > --- a/drivers/infiniband/hw/qib/qib_iba7322.c > +++ b/drivers/infiniband/hw/qib/qib_iba7322.c > @@ -7068,7 +7068,7 @@ static void qib_7322_txchk_change(struct > qib_devdata *dd, u32 start, > unsigned long flags; > > while (wait) { > - unsigned long shadow; > + unsigned long shadow = 0; > int cstart, previ = -1; > > /* Applied. Although, it would be preferable to fix the compiler, but I don't have any control over that and as this breaks compiles with -Werror, I'll relent and let it in. -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v2 09/23] MAINTAINERS: Add file patterns for infiniband device tree bindings
On Sun, 2017-03-12 at 14:16 +0100, Geert Uytterhoeven wrote: > Submitters of device tree binding documentation may forget to CC > the subsystem maintainer if this is missing. > > Signed-off-by: Geert Uytterhoeven> Cc: Doug Ledford > Cc: Sean Hefty > Cc: Hal Rosenstock > Cc: linux-r...@vger.kernel.org > --- > Please apply this patch directly if you want to be involved in device > tree binding documentation for your subsystem. I assume this is going through someone else' tree since I only see patch 09 of 23 and not the entire series. But, for this specific patch: Acked-by: Doug Ledford > > v2: > - New. > > Impact on next-20170310: > > +Doug Ledford (supporter:INFINIBAND SUBSYSTEM) > +Sean Hefty (supporter:INFINIBAND SUBSYSTEM) > +Hal Rosenstock (supporter:INFINIBAND > SUBSYSTEM) > Rob Herring (maintainer:OPEN FIRMWARE AND > FLATTENED DEVICE TREE BINDINGS) > Mark Rutland (maintainer:OPEN FIRMWARE AND > FLATTENED DEVICE TREE BINDINGS) > -Wei Hu (commit_signer:2/2=100%) > -Lijun Ou > (commit_signer:2/2=100%,authored:2/2=100%) > -Doug Ledford (commit_signer:2/2=100%) > -Salil Mehta (commit_signer:1/2=50%) > +linux-r...@vger.kernel.org (open list:INFINIBAND SUBSYSTEM) > devicet...@vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED > DEVICE TREE BINDINGS) > linux-kernel@vger.kernel.org (open list) > --- > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index fa42535181d48e27..c3fbd1af0f49a7e9 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -6459,6 +6459,7 @@ W: http://www.openfabrics.org/ > Q: http://patchwork.kernel.org/project/linux-rdma/list/ > T: git > git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git > S: Supported > +F: Documentation/devicetree/bindings/infiniband/ > F: Documentation/infiniband/ > F: drivers/infiniband/ > F: include/uapi/linux/if_infiniband.h -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v2 09/23] MAINTAINERS: Add file patterns for infiniband device tree bindings
On Sun, 2017-03-12 at 14:16 +0100, Geert Uytterhoeven wrote: > Submitters of device tree binding documentation may forget to CC > the subsystem maintainer if this is missing. > > Signed-off-by: Geert Uytterhoeven > Cc: Doug Ledford > Cc: Sean Hefty > Cc: Hal Rosenstock > Cc: linux-r...@vger.kernel.org > --- > Please apply this patch directly if you want to be involved in device > tree binding documentation for your subsystem. I assume this is going through someone else' tree since I only see patch 09 of 23 and not the entire series. But, for this specific patch: Acked-by: Doug Ledford > > v2: > - New. > > Impact on next-20170310: > > +Doug Ledford (supporter:INFINIBAND SUBSYSTEM) > +Sean Hefty (supporter:INFINIBAND SUBSYSTEM) > +Hal Rosenstock (supporter:INFINIBAND > SUBSYSTEM) > Rob Herring (maintainer:OPEN FIRMWARE AND > FLATTENED DEVICE TREE BINDINGS) > Mark Rutland (maintainer:OPEN FIRMWARE AND > FLATTENED DEVICE TREE BINDINGS) > -Wei Hu (commit_signer:2/2=100%) > -Lijun Ou > (commit_signer:2/2=100%,authored:2/2=100%) > -Doug Ledford (commit_signer:2/2=100%) > -Salil Mehta (commit_signer:1/2=50%) > +linux-r...@vger.kernel.org (open list:INFINIBAND SUBSYSTEM) > devicet...@vger.kernel.org (open list:OPEN FIRMWARE AND FLATTENED > DEVICE TREE BINDINGS) > linux-kernel@vger.kernel.org (open list) > --- > MAINTAINERS | 1 + > 1 file changed, 1 insertion(+) > > diff --git a/MAINTAINERS b/MAINTAINERS > index fa42535181d48e27..c3fbd1af0f49a7e9 100644 > --- a/MAINTAINERS > +++ b/MAINTAINERS > @@ -6459,6 +6459,7 @@ W: http://www.openfabrics.org/ > Q: http://patchwork.kernel.org/project/linux-rdma/list/ > T: git > git://git.kernel.org/pub/scm/linux/kernel/git/dledford/rdma.git > S: Supported > +F: Documentation/devicetree/bindings/infiniband/ > F: Documentation/infiniband/ > F: drivers/infiniband/ > F: include/uapi/linux/if_infiniband.h -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [RFC v5 5/9] sched/deadline: do not reclaim the whole CPU bandwidth
On Fri, 24 Mar 2017 22:58:27 +0100 luca abeniwrote: > Hi Peter, > > On Fri, 24 Mar 2017 15:00:15 +0100 > Peter Zijlstra wrote: > > > On Fri, Mar 24, 2017 at 04:52:58AM +0100, luca abeni wrote: > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index 20c62e7..efa88eb 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -6716,6 +6716,12 @@ static void sched_dl_do_global(void) > > > raw_spin_unlock_irqrestore(_b->lock, flags); > > > > > > rcu_read_unlock_sched(); > > > + if (dl_b->bw == -1) > > > + cpu_rq(cpu)->dl.deadline_bw_inv = 1 << 8; > > > + else > > > + cpu_rq(cpu)->dl.deadline_bw_inv = > > > + to_ratio(global_rt_runtime(), > > > + global_rt_period()) >> > > > 12; > > > > Coding style requires braces here (on both legs of the condition).. > > Sorry about this; checkpatch did not complain and I did not check the > coding rules. I'll add the braces. I'm not sure it's completely documented anywhere. The brackets are not needed if there's one statement after the if, but for readability, it's sometimes best to put brackets in if there's more than one line. That can even include comments. It's not a hard rule, but more of a preference. I'm personally OK with the above, but Peter being the maintainer, has the say to give the preference of this kind of rule. -- Steve
Re: [RFC v5 5/9] sched/deadline: do not reclaim the whole CPU bandwidth
On Fri, 24 Mar 2017 22:58:27 +0100 luca abeni wrote: > Hi Peter, > > On Fri, 24 Mar 2017 15:00:15 +0100 > Peter Zijlstra wrote: > > > On Fri, Mar 24, 2017 at 04:52:58AM +0100, luca abeni wrote: > > > > > diff --git a/kernel/sched/core.c b/kernel/sched/core.c > > > index 20c62e7..efa88eb 100644 > > > --- a/kernel/sched/core.c > > > +++ b/kernel/sched/core.c > > > @@ -6716,6 +6716,12 @@ static void sched_dl_do_global(void) > > > raw_spin_unlock_irqrestore(_b->lock, flags); > > > > > > rcu_read_unlock_sched(); > > > + if (dl_b->bw == -1) > > > + cpu_rq(cpu)->dl.deadline_bw_inv = 1 << 8; > > > + else > > > + cpu_rq(cpu)->dl.deadline_bw_inv = > > > + to_ratio(global_rt_runtime(), > > > + global_rt_period()) >> > > > 12; > > > > Coding style requires braces here (on both legs of the condition).. > > Sorry about this; checkpatch did not complain and I did not check the > coding rules. I'll add the braces. I'm not sure it's completely documented anywhere. The brackets are not needed if there's one statement after the if, but for readability, it's sometimes best to put brackets in if there's more than one line. That can even include comments. It's not a hard rule, but more of a preference. I'm personally OK with the above, but Peter being the maintainer, has the say to give the preference of this kind of rule. -- Steve
Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
On Fri, 24 Mar 2017 22:47:15 +0100 luca abeniwrote: > Ok... Since I am not good at ascii art, would it be ok to add a textual > description? If yes, I'll add a comment like: > " > The utilization of a task is added to the runqueue's active utilization > when the task becomes active (is enqueued in the runqueue), and is > removed when the task becomes inactive. A task does not become > immediately inactive when it blocks, but becomes inactive at the so > called "0 lag time"; so, we setup the "inactive timer" to fire at the > "0 lag time". When the "inactive timer" fires, the task utilization is > removed from the runqueue's active utilization. If the task wakes up > again on the same runqueue before the "0 lag time", the active > utilization must not be changed and the "inactive timer" must be > cancelled. If the task wakes up again on a different runqueue before > the "0 lag time", then the task's utilization must be removed from the > previous runqueue's active utilization and must be added to the new > runqueue's active utilization. > In order to avoid races between a task waking up on a runqueue while the > "inactive timer" is running on a different CPU, the "dl_non_contending" > flag is used to indicate that a task is not on a runqueue but is active > (so, the flag is set when the task blocks and is cleared when the > "inactive timer" fires or when the task wakes up). Sure, the above is great if you never want anyone to read it ;) Can you please break it up a little. My head starts to spin by the third line down. -- Steve > " > (if this is ok, where can I add this comment?)
Re: [RFC v5 2/9] sched/deadline: improve the tracking of active utilization
On Fri, 24 Mar 2017 22:47:15 +0100 luca abeni wrote: > Ok... Since I am not good at ascii art, would it be ok to add a textual > description? If yes, I'll add a comment like: > " > The utilization of a task is added to the runqueue's active utilization > when the task becomes active (is enqueued in the runqueue), and is > removed when the task becomes inactive. A task does not become > immediately inactive when it blocks, but becomes inactive at the so > called "0 lag time"; so, we setup the "inactive timer" to fire at the > "0 lag time". When the "inactive timer" fires, the task utilization is > removed from the runqueue's active utilization. If the task wakes up > again on the same runqueue before the "0 lag time", the active > utilization must not be changed and the "inactive timer" must be > cancelled. If the task wakes up again on a different runqueue before > the "0 lag time", then the task's utilization must be removed from the > previous runqueue's active utilization and must be added to the new > runqueue's active utilization. > In order to avoid races between a task waking up on a runqueue while the > "inactive timer" is running on a different CPU, the "dl_non_contending" > flag is used to indicate that a task is not on a runqueue but is active > (so, the flag is set when the task blocks and is cleared when the > "inactive timer" fires or when the task wakes up). Sure, the above is great if you never want anyone to read it ;) Can you please break it up a little. My head starts to spin by the third line down. -- Steve > " > (if this is ok, where can I add this comment?)
Re: [net-next PATCH v3 0/8] Add busy poll support for epoll
From: Alexander DuyckDate: Fri, 24 Mar 2017 10:07:47 -0700 > v3: Split off the code for limiting busy_poll and busy_read into a separate > patch for net. > Updated patch that changed busy loop time tracking so that it uses > "local_clock() >> 10" as we originally did. > Tweaked "Change return type.." patch by moving declaration of "work" > inside the loop where is was accessed and always reset to 0. > Added "Acked-by" for patches that received acks. Eric, I believe Alexander has addresed your concerns. Could you please review patches 5 and 7 and ACK if they look ok to you? Thanks.
Re: [net-next PATCH v3 0/8] Add busy poll support for epoll
From: Alexander Duyck Date: Fri, 24 Mar 2017 10:07:47 -0700 > v3: Split off the code for limiting busy_poll and busy_read into a separate > patch for net. > Updated patch that changed busy loop time tracking so that it uses > "local_clock() >> 10" as we originally did. > Tweaked "Change return type.." patch by moving declaration of "work" > inside the loop where is was accessed and always reset to 0. > Added "Acked-by" for patches that received acks. Eric, I believe Alexander has addresed your concerns. Could you please review patches 5 and 7 and ACK if they look ok to you? Thanks.
Re: [PATCH] uapi: fix rdma/mlx5-abi.h userspace compilation errors
On Fri, 2017-02-24 at 03:28 +0300, Dmitry V. Levin wrote: > Consistently use types from linux/types.h to fix the following > rdma/mlx5-abi.h userspace compilation errors: > > /usr/include/rdma/mlx5-abi.h:69:25: error: 'u64' undeclared here (not > in a function) > MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, > /usr/include/rdma/mlx5-abi.h:69:29: error: expected ',' or '}' before > numeric constant > MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, > > Include to fix the following rdma/mlx5-abi.h > userspace compilation error: > > /usr/include/rdma/mlx5-abi.h:286:12: error: 'ETH_ALEN' undeclared > here (not in a function) > __u8 dmac[ETH_ALEN]; > > Signed-off-by: Dmitry V. LevinThanks, applied. > --- > include/uapi/rdma/mlx5-abi.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5- > abi.h > index da7cd62..0b3d308 100644 > --- a/include/uapi/rdma/mlx5-abi.h > +++ b/include/uapi/rdma/mlx5-abi.h > @@ -34,6 +34,7 @@ > #define MLX5_ABI_USER_H > > #include > +#include /* For ETH_ALEN. */ > > enum { > MLX5_QP_FLAG_SIGNATURE = 1 << 0, > @@ -66,7 +67,7 @@ struct mlx5_ib_alloc_ucontext_req { > }; > > enum mlx5_lib_caps { > - MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, > + MLX5_LIB_CAP_4K_UAR = (__u64)1 << 0, > }; > > struct mlx5_ib_alloc_ucontext_req_v2 { -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH] uapi: fix rdma/mlx5-abi.h userspace compilation errors
On Fri, 2017-02-24 at 03:28 +0300, Dmitry V. Levin wrote: > Consistently use types from linux/types.h to fix the following > rdma/mlx5-abi.h userspace compilation errors: > > /usr/include/rdma/mlx5-abi.h:69:25: error: 'u64' undeclared here (not > in a function) > MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, > /usr/include/rdma/mlx5-abi.h:69:29: error: expected ',' or '}' before > numeric constant > MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, > > Include to fix the following rdma/mlx5-abi.h > userspace compilation error: > > /usr/include/rdma/mlx5-abi.h:286:12: error: 'ETH_ALEN' undeclared > here (not in a function) > __u8 dmac[ETH_ALEN]; > > Signed-off-by: Dmitry V. Levin Thanks, applied. > --- > include/uapi/rdma/mlx5-abi.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/uapi/rdma/mlx5-abi.h b/include/uapi/rdma/mlx5- > abi.h > index da7cd62..0b3d308 100644 > --- a/include/uapi/rdma/mlx5-abi.h > +++ b/include/uapi/rdma/mlx5-abi.h > @@ -34,6 +34,7 @@ > #define MLX5_ABI_USER_H > > #include > +#include /* For ETH_ALEN. */ > > enum { > MLX5_QP_FLAG_SIGNATURE = 1 << 0, > @@ -66,7 +67,7 @@ struct mlx5_ib_alloc_ucontext_req { > }; > > enum mlx5_lib_caps { > - MLX5_LIB_CAP_4K_UAR = (u64)1 << 0, > + MLX5_LIB_CAP_4K_UAR = (__u64)1 << 0, > }; > > struct mlx5_ib_alloc_ucontext_req_v2 { -- Doug Ledford GPG KeyID: B826A3330E572FDD Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD
Re: [PATCH v3 1/2] module: verify address is read-only
On Fri, Mar 24, 2017 at 5:42 PM, Jessica Yuwrote: > +++ Kees Cook [23/03/17 14:13 -0700]: >> >> On Wed, Mar 22, 2017 at 7:55 PM, Eddie Kovsky wrote: >>> >>> Implement a mechanism to check if a module's address is in >>> the rodata or ro_after_init sections. It mimics the exsiting functions >>> that test if an address is inside a module's text section. >>> >>> Functions that take a module as an argument will be able to >>> verify that the module is in a read-only section. >>> >>> Signed-off-by: Eddie Kovsky >> >> >> Awesome! I'll be glad to have these. >> >> Reviewed-by: Kees Cook >> >> Jessica, if this looks good to you, should this go via modules or via >> my tree? If mine, can I have your Ack? > > > Sure, would be happy to let you take this patch through your tree. I > only had a few minor comments left, so I think just one more small > respin and it'll be good to go. Awesome, I'll wait for v4 and your Ack and put it through the kspp tree. :) Thanks for the review! -Kees -- Kees Cook Pixel Security
Re: [PATCH v3 1/2] module: verify address is read-only
On Fri, Mar 24, 2017 at 5:42 PM, Jessica Yu wrote: > +++ Kees Cook [23/03/17 14:13 -0700]: >> >> On Wed, Mar 22, 2017 at 7:55 PM, Eddie Kovsky wrote: >>> >>> Implement a mechanism to check if a module's address is in >>> the rodata or ro_after_init sections. It mimics the exsiting functions >>> that test if an address is inside a module's text section. >>> >>> Functions that take a module as an argument will be able to >>> verify that the module is in a read-only section. >>> >>> Signed-off-by: Eddie Kovsky >> >> >> Awesome! I'll be glad to have these. >> >> Reviewed-by: Kees Cook >> >> Jessica, if this looks good to you, should this go via modules or via >> my tree? If mine, can I have your Ack? > > > Sure, would be happy to let you take this patch through your tree. I > only had a few minor comments left, so I think just one more small > respin and it'll be good to go. Awesome, I'll wait for v4 and your Ack and put it through the kspp tree. :) Thanks for the review! -Kees -- Kees Cook Pixel Security
Re: [PATCH v3 1/2] module: verify address is read-only
On Fri, Mar 24, 2017 at 6:41 PM, Eddie Kovskywrote: > On 03/24/17, Jessica Yu wrote: >> +++ Eddie Kovsky [22/03/17 20:55 -0600]: >> > Implement a mechanism to check if a module's address is in >> > the rodata or ro_after_init sections. It mimics the exsiting functions >> > that test if an address is inside a module's text section. >> > >> > Functions that take a module as an argument will be able to >> >> > verify that the module is in a read-only section. >> >> s/module/module address/? >> > Yes, that is more accurate. > >> Also, there is some useful information in your cover letter on the >> context and motivation for adding to the api, it would be good to >> reproduce that info here so that we can have it officially in the >> changelog. (sentences "This implements a suggestion made by Kees..." >> and "The idea is to prevent structures..." would be nice to copy here) >> > Okay, that's easy to add. > > Kees, would you like me to add your Suggested-by as well? Sure! I'm find either way. :) -Kees -- Kees Cook Pixel Security
Re: [PATCH v3 1/2] module: verify address is read-only
On Fri, Mar 24, 2017 at 6:41 PM, Eddie Kovsky wrote: > On 03/24/17, Jessica Yu wrote: >> +++ Eddie Kovsky [22/03/17 20:55 -0600]: >> > Implement a mechanism to check if a module's address is in >> > the rodata or ro_after_init sections. It mimics the exsiting functions >> > that test if an address is inside a module's text section. >> > >> > Functions that take a module as an argument will be able to >> >> > verify that the module is in a read-only section. >> >> s/module/module address/? >> > Yes, that is more accurate. > >> Also, there is some useful information in your cover letter on the >> context and motivation for adding to the api, it would be good to >> reproduce that info here so that we can have it officially in the >> changelog. (sentences "This implements a suggestion made by Kees..." >> and "The idea is to prevent structures..." would be nice to copy here) >> > Okay, that's easy to add. > > Kees, would you like me to add your Suggested-by as well? Sure! I'm find either way. :) -Kees -- Kees Cook Pixel Security
[PATCH] Staging: vt6655 - block comments style fix
Fixed style of block comments Found using checkpatch Signed-off-by: Derek Robson--- drivers/staging/vt6655/rf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/vt6655/rf.h b/drivers/staging/vt6655/rf.h index b6e853784a26..37600093cab2 100644 --- a/drivers/staging/vt6655/rf.h +++ b/drivers/staging/vt6655/rf.h @@ -30,7 +30,7 @@ /*- Export Definitions -*/ /* * Baseband RF pair definition in eeprom (Bits 6..0) -*/ + */ #define RF_RFMD2959 0x01 #define RF_MAXIMAG 0x02 #define RF_AIROHA 0x03 -- 2.12.0
[PATCH] Staging: vt6655 - block comments style fix
Fixed style of block comments Found using checkpatch Signed-off-by: Derek Robson --- drivers/staging/vt6655/rf.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/staging/vt6655/rf.h b/drivers/staging/vt6655/rf.h index b6e853784a26..37600093cab2 100644 --- a/drivers/staging/vt6655/rf.h +++ b/drivers/staging/vt6655/rf.h @@ -30,7 +30,7 @@ /*- Export Definitions -*/ /* * Baseband RF pair definition in eeprom (Bits 6..0) -*/ + */ #define RF_RFMD2959 0x01 #define RF_MAXIMAG 0x02 #define RF_AIROHA 0x03 -- 2.12.0
Re: [PATCH v3 1/2] module: verify address is read-only
On 03/24/17, Jessica Yu wrote: > +++ Eddie Kovsky [22/03/17 20:55 -0600]: > > Implement a mechanism to check if a module's address is in > > the rodata or ro_after_init sections. It mimics the exsiting functions > > that test if an address is inside a module's text section. > > > > Functions that take a module as an argument will be able to > > > verify that the module is in a read-only section. > > s/module/module address/? > Yes, that is more accurate. > Also, there is some useful information in your cover letter on the > context and motivation for adding to the api, it would be good to > reproduce that info here so that we can have it officially in the > changelog. (sentences "This implements a suggestion made by Kees..." > and "The idea is to prevent structures..." would be nice to copy here) > Okay, that's easy to add. Kees, would you like me to add your Suggested-by as well? > > Signed-off-by: Eddie Kovsky> > --- > > Changes in v3: > > - Change function name is_module_ro_address() to > > is_module_rodata_address(). > > - Improve comments on is_module_rodata_address(). > > - Add a !CONFIG_MODULES stub for is_module_rodata_address. > > - Correct and simplify the check for the read-only memory regions in > > the module address. > > > > include/linux/module.h | 12 > > kernel/module.c| 53 > > ++ > > 2 files changed, 65 insertions(+) > > > > diff --git a/include/linux/module.h b/include/linux/module.h > > index 9ad68561d8c2..66b7fd321334 100644 > > --- a/include/linux/module.h > > +++ b/include/linux/module.h > > @@ -492,7 +492,9 @@ static inline int module_is_live(struct module *mod) > > > > struct module *__module_text_address(unsigned long addr); > > struct module *__module_address(unsigned long addr); > > +struct module *__module_ro_address(unsigned long addr); > > Can we rename __module_ro_address to __module_rodata_address (to match > is_module_rodata_address())? > Sure, consistency is a good thing. > > bool is_module_address(unsigned long addr); > > +bool is_module_rodata_address(unsigned long addr); > > bool __is_module_percpu_address(unsigned long addr, unsigned long > > *can_addr); > > bool is_module_percpu_address(unsigned long addr); > > bool is_module_text_address(unsigned long addr); > > @@ -646,6 +648,16 @@ static inline struct module *__module_address(unsigned > > long addr) > > return NULL; > > } > > > > +static inline struct module *__module_ro_address(unsigned long addr) > > +{ > > + return NULL; > > +} > > + > > +static inline bool is_module_rodata_address(unsigned long addr) > > +{ > > + return false; > > +} > > + > > Style nitpick: Can you move is_module_rodata_address() below > is_module_address()? > That way, the function stubs are grouped a bit more consistently. > I think I might have shuffled that on the last rebase. > > static inline struct module *__module_text_address(unsigned long addr) > > { > > return NULL; > > diff --git a/kernel/module.c b/kernel/module.c > > index 8ffcd29a4245..99ada1257aaa 100644 > > --- a/kernel/module.c > > +++ b/kernel/module.c > > @@ -4292,6 +4292,59 @@ struct module *__module_text_address(unsigned long > > addr) > > } > > EXPORT_SYMBOL_GPL(__module_text_address); > > > > +/** > > + * is_module_rodata_address - is this address inside read-only module code? > > s/code/data/ > Yes. > > + * @addr: the address to check. > > + * > > + */ > > +bool is_module_rodata_address(unsigned long addr) > > +{ > > + bool ret; > > + > > + preempt_disable(); > > + ret = __module_ro_address(addr) != NULL; > > + preempt_enable(); > > + > > + return ret; > > +} > > + > > +/* > > + * __module_ro_address - get the module whose rodata/ro_after_init sections > > + * contain the given address. > > As mentioned in the first comment, let's rename __module_ro_address to > __module_rodata_address, so it mirrors is_module_rodata_address(). > > > + * @addr: the address. > > + * > > + * Must be called with preempt disabled or module mutex held so that > > + * module doesn't get freed during this. > > + */ > > +struct module *__module_ro_address(unsigned long addr) > > +{ > > + struct module *mod = __module_address(addr); > > We need to check that mod is not NULL before dereferencing it below. > I missed that. The new variables are now using mod before we check it. I'll fix it on the next round. > > + void *init_base = mod->init_layout.base; > > + unsigned int init_text_size = mod->init_layout.text_size; > > + unsigned int init_ro_after_init_size = > > mod->init_layout.ro_after_init_size; > > + > > + void *core_base = mod->core_layout.base; > > + unsigned int core_text_size = mod->core_layout.text_size; > > + unsigned int core_ro_after_init_size = > > mod->core_layout.ro_after_init_size; > > + > > + /* > > +* Make sure module is within the read-only section. > > +* range(base + text_size, base + ro_after_init_size)
Re: [PATCH v3 1/2] module: verify address is read-only
On 03/24/17, Jessica Yu wrote: > +++ Eddie Kovsky [22/03/17 20:55 -0600]: > > Implement a mechanism to check if a module's address is in > > the rodata or ro_after_init sections. It mimics the exsiting functions > > that test if an address is inside a module's text section. > > > > Functions that take a module as an argument will be able to > > > verify that the module is in a read-only section. > > s/module/module address/? > Yes, that is more accurate. > Also, there is some useful information in your cover letter on the > context and motivation for adding to the api, it would be good to > reproduce that info here so that we can have it officially in the > changelog. (sentences "This implements a suggestion made by Kees..." > and "The idea is to prevent structures..." would be nice to copy here) > Okay, that's easy to add. Kees, would you like me to add your Suggested-by as well? > > Signed-off-by: Eddie Kovsky > > --- > > Changes in v3: > > - Change function name is_module_ro_address() to > > is_module_rodata_address(). > > - Improve comments on is_module_rodata_address(). > > - Add a !CONFIG_MODULES stub for is_module_rodata_address. > > - Correct and simplify the check for the read-only memory regions in > > the module address. > > > > include/linux/module.h | 12 > > kernel/module.c| 53 > > ++ > > 2 files changed, 65 insertions(+) > > > > diff --git a/include/linux/module.h b/include/linux/module.h > > index 9ad68561d8c2..66b7fd321334 100644 > > --- a/include/linux/module.h > > +++ b/include/linux/module.h > > @@ -492,7 +492,9 @@ static inline int module_is_live(struct module *mod) > > > > struct module *__module_text_address(unsigned long addr); > > struct module *__module_address(unsigned long addr); > > +struct module *__module_ro_address(unsigned long addr); > > Can we rename __module_ro_address to __module_rodata_address (to match > is_module_rodata_address())? > Sure, consistency is a good thing. > > bool is_module_address(unsigned long addr); > > +bool is_module_rodata_address(unsigned long addr); > > bool __is_module_percpu_address(unsigned long addr, unsigned long > > *can_addr); > > bool is_module_percpu_address(unsigned long addr); > > bool is_module_text_address(unsigned long addr); > > @@ -646,6 +648,16 @@ static inline struct module *__module_address(unsigned > > long addr) > > return NULL; > > } > > > > +static inline struct module *__module_ro_address(unsigned long addr) > > +{ > > + return NULL; > > +} > > + > > +static inline bool is_module_rodata_address(unsigned long addr) > > +{ > > + return false; > > +} > > + > > Style nitpick: Can you move is_module_rodata_address() below > is_module_address()? > That way, the function stubs are grouped a bit more consistently. > I think I might have shuffled that on the last rebase. > > static inline struct module *__module_text_address(unsigned long addr) > > { > > return NULL; > > diff --git a/kernel/module.c b/kernel/module.c > > index 8ffcd29a4245..99ada1257aaa 100644 > > --- a/kernel/module.c > > +++ b/kernel/module.c > > @@ -4292,6 +4292,59 @@ struct module *__module_text_address(unsigned long > > addr) > > } > > EXPORT_SYMBOL_GPL(__module_text_address); > > > > +/** > > + * is_module_rodata_address - is this address inside read-only module code? > > s/code/data/ > Yes. > > + * @addr: the address to check. > > + * > > + */ > > +bool is_module_rodata_address(unsigned long addr) > > +{ > > + bool ret; > > + > > + preempt_disable(); > > + ret = __module_ro_address(addr) != NULL; > > + preempt_enable(); > > + > > + return ret; > > +} > > + > > +/* > > + * __module_ro_address - get the module whose rodata/ro_after_init sections > > + * contain the given address. > > As mentioned in the first comment, let's rename __module_ro_address to > __module_rodata_address, so it mirrors is_module_rodata_address(). > > > + * @addr: the address. > > + * > > + * Must be called with preempt disabled or module mutex held so that > > + * module doesn't get freed during this. > > + */ > > +struct module *__module_ro_address(unsigned long addr) > > +{ > > + struct module *mod = __module_address(addr); > > We need to check that mod is not NULL before dereferencing it below. > I missed that. The new variables are now using mod before we check it. I'll fix it on the next round. > > + void *init_base = mod->init_layout.base; > > + unsigned int init_text_size = mod->init_layout.text_size; > > + unsigned int init_ro_after_init_size = > > mod->init_layout.ro_after_init_size; > > + > > + void *core_base = mod->core_layout.base; > > + unsigned int core_text_size = mod->core_layout.text_size; > > + unsigned int core_ro_after_init_size = > > mod->core_layout.ro_after_init_size; > > + > > + /* > > +* Make sure module is within the read-only section. > > +* range(base + text_size, base + ro_after_init_size) > > +*
Re: [PATCH 4.9 00/24] 4.9.18-stable review
Hello Kevin, On 03/24/2017 08:57 PM, Kevin Hilman wrote: > + Sjoerd, Javier, > > kernelci.org botwrites: > >> stable-rc boot: 233 boots: 1 failed, 222 passed with 10 offline >> (v4.9.17-25-g4d90baeca3c8) >> >> Full Boot Summary: >> https://kernelci.org/boot/all/job/stable-rc/kernel/v4.9.17-25-g4d90baeca3c8/ >> Full Build Summary: >> https://kernelci.org/build/stable-rc/kernel/v4.9.17-25-g4d90baeca3c8/ >> >> Tree: stable-rc >> Branch: local/linux-4.9.y >> Git Describe: v4.9.17-25-g4d90baeca3c8 >> Git Commit: 4d90baeca3c8aebdcb6e3ec8a6b31d4279cfbf3d >> Git URL: >> http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git >> Tested: 62 unique boards, 16 SoC families, 24 builds out of 199 >> >> Boot Regressions Detected: >> >> arm: >> >> exynos_defconfig: >> exynos5250-snow: >> lab-collabora: new failure (last pass: v4.9.17) > > This one a new regression, but this board is in the Collabora lab and I > can't currently bisect remotely. > > @Sjoerd, Javier: can either of you have a closer look at this? > > Looks like a failure in the DRM component init path. > This seems to be the bug that's fixed by commit f0a8b49c03d2 ("drm/bridge: analogix dp: Fix runtime PM state on driver bind") [0]. I thought it was only a regression for v4.10, but it looks like that also affects v4.9. [0]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f0a8b49c03d2 > Kevin > Best regards, -- Javier Martinez Canillas Open Source Group Samsung Research America
Re: [PATCH 4.9 00/24] 4.9.18-stable review
Hello Kevin, On 03/24/2017 08:57 PM, Kevin Hilman wrote: > + Sjoerd, Javier, > > kernelci.org bot writes: > >> stable-rc boot: 233 boots: 1 failed, 222 passed with 10 offline >> (v4.9.17-25-g4d90baeca3c8) >> >> Full Boot Summary: >> https://kernelci.org/boot/all/job/stable-rc/kernel/v4.9.17-25-g4d90baeca3c8/ >> Full Build Summary: >> https://kernelci.org/build/stable-rc/kernel/v4.9.17-25-g4d90baeca3c8/ >> >> Tree: stable-rc >> Branch: local/linux-4.9.y >> Git Describe: v4.9.17-25-g4d90baeca3c8 >> Git Commit: 4d90baeca3c8aebdcb6e3ec8a6b31d4279cfbf3d >> Git URL: >> http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git >> Tested: 62 unique boards, 16 SoC families, 24 builds out of 199 >> >> Boot Regressions Detected: >> >> arm: >> >> exynos_defconfig: >> exynos5250-snow: >> lab-collabora: new failure (last pass: v4.9.17) > > This one a new regression, but this board is in the Collabora lab and I > can't currently bisect remotely. > > @Sjoerd, Javier: can either of you have a closer look at this? > > Looks like a failure in the DRM component init path. > This seems to be the bug that's fixed by commit f0a8b49c03d2 ("drm/bridge: analogix dp: Fix runtime PM state on driver bind") [0]. I thought it was only a regression for v4.10, but it looks like that also affects v4.9. [0]: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=f0a8b49c03d2 > Kevin > Best regards, -- Javier Martinez Canillas Open Source Group Samsung Research America
Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
On Sat, Mar 25, 2017 at 2:14 AM, Sai Gurrappadiwrote: > Hi Rafael, > > On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki >> >> The way the schedutil governor uses the PELT metric causes it to >> underestimate the CPU utilization in some cases. >> >> That can be easily demonstrated by running kernel compilation on >> a Sandy Bridge Intel processor, running turbostat in parallel with >> it and looking at the values written to the MSR_IA32_PERF_CTL >> register. Namely, the expected result would be that when all CPUs >> were 100% busy, all of them would be requested to run in the maximum >> P-state, but observation shows that this clearly isn't the case. >> The CPUs run in the maximum P-state for a while and then are >> requested to run slower and go back to the maximum P-state after >> a while again. That causes the actual frequency of the processor to >> visibly oscillate below the sustainable maximum in a jittery fashion >> which clearly is not desirable. >> >> That has been attributed to CPU utilization metric updates on task >> migration that cause the total utilization value for the CPU to be >> reduced by the utilization of the migrated task. If that happens, >> the schedutil governor may see a CPU utilization reduction and will >> attempt to reduce the CPU frequency accordingly right away. That >> may be premature, though, for example if the system is generally >> busy and there are other runnable tasks waiting to be run on that >> CPU already. >> > > Thinking out loud a bit, I wonder if what you really want to do is basically: > > schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg); > > Where total_cpu_util_avg tracks the average utilization of the CPU itself > over time (% of time the CPU was busy) in the same PELT like manner. The > difference here is that it doesn't change instantaneously as tasks migrate > in/out but it decays/accumulates just like the per-entity util_avgs. > > Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards > each other the lesser the amount of 'overlap' / overloading. > > Yes, the above metric would 'overestimate' in case all tasks have migrated > away and we are left with an idle CPU. A fix for that could be to just use > the PELT value like so: > > schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : > total_cpu_util_avg); > > Note that the problem described here in the commit message doesn't need fully > runnable threads, it just needs two threads to execute in parallel on the > same CPU for a period of time. I don't think looking at just idle_calls > necessarily covers all cases. > > Thoughts? Well, is the total_cpu_util_avg metric readily available? Thanks, Rafael
Re: lockdep warning: console vs. mem hotplug
On (03/24/17 21:08), Steven Rostedt wrote: > > Sebastian, does this change make lockdep happy? > > > > it removes console drivers from the __offline_isolated_pages(). not the > > best solution I can think of, but the simplest one. > > > > --- > > > > mm/page_alloc.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index f749b7ff7c50..eb61e6ab5f4f 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, > > unsigned long end_pfn) > > BUG_ON(!PageBuddy(page)); > > order = page_order(page); > > #ifdef CONFIG_DEBUG_VM > > - pr_info("remove from free list %lx %d %lx\n", > > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n", > > pfn, 1 << order, end_pfn); > > #endif > > list_del(>lru); > > > My fear is that this will trigger for any printk in page_alloc.c under > the zone lock. absolutely true. I Cc'd debugobjects, mm and sclp_console maintainers because the real (smart) solution to the problem is somewhere there. another problem (not reported) is that we have conflicting dependencies mod_timer -> sclp_console sclp_console -> mod_timer which can result in a deadlock: mod_timer -> debugobjects -> printk -> sclp_console -> mod_timer this one, I think, can be addressed by switching to a printk_safe in debugobjects. // I'm traveling now. there will be delays in replies, sorry. -ss
Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
On Sat, Mar 25, 2017 at 2:14 AM, Sai Gurrappadi wrote: > Hi Rafael, > > On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote: >> From: Rafael J. Wysocki >> >> The way the schedutil governor uses the PELT metric causes it to >> underestimate the CPU utilization in some cases. >> >> That can be easily demonstrated by running kernel compilation on >> a Sandy Bridge Intel processor, running turbostat in parallel with >> it and looking at the values written to the MSR_IA32_PERF_CTL >> register. Namely, the expected result would be that when all CPUs >> were 100% busy, all of them would be requested to run in the maximum >> P-state, but observation shows that this clearly isn't the case. >> The CPUs run in the maximum P-state for a while and then are >> requested to run slower and go back to the maximum P-state after >> a while again. That causes the actual frequency of the processor to >> visibly oscillate below the sustainable maximum in a jittery fashion >> which clearly is not desirable. >> >> That has been attributed to CPU utilization metric updates on task >> migration that cause the total utilization value for the CPU to be >> reduced by the utilization of the migrated task. If that happens, >> the schedutil governor may see a CPU utilization reduction and will >> attempt to reduce the CPU frequency accordingly right away. That >> may be premature, though, for example if the system is generally >> busy and there are other runnable tasks waiting to be run on that >> CPU already. >> > > Thinking out loud a bit, I wonder if what you really want to do is basically: > > schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg); > > Where total_cpu_util_avg tracks the average utilization of the CPU itself > over time (% of time the CPU was busy) in the same PELT like manner. The > difference here is that it doesn't change instantaneously as tasks migrate > in/out but it decays/accumulates just like the per-entity util_avgs. > > Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards > each other the lesser the amount of 'overlap' / overloading. > > Yes, the above metric would 'overestimate' in case all tasks have migrated > away and we are left with an idle CPU. A fix for that could be to just use > the PELT value like so: > > schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : > total_cpu_util_avg); > > Note that the problem described here in the commit message doesn't need fully > runnable threads, it just needs two threads to execute in parallel on the > same CPU for a period of time. I don't think looking at just idle_calls > necessarily covers all cases. > > Thoughts? Well, is the total_cpu_util_avg metric readily available? Thanks, Rafael
Re: lockdep warning: console vs. mem hotplug
On (03/24/17 21:08), Steven Rostedt wrote: > > Sebastian, does this change make lockdep happy? > > > > it removes console drivers from the __offline_isolated_pages(). not the > > best solution I can think of, but the simplest one. > > > > --- > > > > mm/page_alloc.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > > index f749b7ff7c50..eb61e6ab5f4f 100644 > > --- a/mm/page_alloc.c > > +++ b/mm/page_alloc.c > > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, > > unsigned long end_pfn) > > BUG_ON(!PageBuddy(page)); > > order = page_order(page); > > #ifdef CONFIG_DEBUG_VM > > - pr_info("remove from free list %lx %d %lx\n", > > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n", > > pfn, 1 << order, end_pfn); > > #endif > > list_del(>lru); > > > My fear is that this will trigger for any printk in page_alloc.c under > the zone lock. absolutely true. I Cc'd debugobjects, mm and sclp_console maintainers because the real (smart) solution to the problem is somewhere there. another problem (not reported) is that we have conflicting dependencies mod_timer -> sclp_console sclp_console -> mod_timer which can result in a deadlock: mod_timer -> debugobjects -> printk -> sclp_console -> mod_timer this one, I think, can be addressed by switching to a printk_safe in debugobjects. // I'm traveling now. there will be delays in replies, sorry. -ss
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
I think we can all agree that the *ideal* situation would be, for the balloon driver to not immediately hotplug memory so it can add 11 more pages, so maybe I just need to figure out why the balloon driver thinks it needs 11 more pages, and fix that. How does the new memory appear in the guest? Via online_pages()? Or is ballooning triggered from watch_target()? -boris
Re: maybe revert commit c275a57f5ec3 "xen/balloon: Set balloon's initial state to number of existing RAM pages"
I think we can all agree that the *ideal* situation would be, for the balloon driver to not immediately hotplug memory so it can add 11 more pages, so maybe I just need to figure out why the balloon driver thinks it needs 11 more pages, and fix that. How does the new memory appear in the guest? Via online_pages()? Or is ballooning triggered from watch_target()? -boris
Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
Hi Rafael, On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki> > The way the schedutil governor uses the PELT metric causes it to > underestimate the CPU utilization in some cases. > > That can be easily demonstrated by running kernel compilation on > a Sandy Bridge Intel processor, running turbostat in parallel with > it and looking at the values written to the MSR_IA32_PERF_CTL > register. Namely, the expected result would be that when all CPUs > were 100% busy, all of them would be requested to run in the maximum > P-state, but observation shows that this clearly isn't the case. > The CPUs run in the maximum P-state for a while and then are > requested to run slower and go back to the maximum P-state after > a while again. That causes the actual frequency of the processor to > visibly oscillate below the sustainable maximum in a jittery fashion > which clearly is not desirable. > > That has been attributed to CPU utilization metric updates on task > migration that cause the total utilization value for the CPU to be > reduced by the utilization of the migrated task. If that happens, > the schedutil governor may see a CPU utilization reduction and will > attempt to reduce the CPU frequency accordingly right away. That > may be premature, though, for example if the system is generally > busy and there are other runnable tasks waiting to be run on that > CPU already. > Thinking out loud a bit, I wonder if what you really want to do is basically: schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg); Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs. Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards each other the lesser the amount of 'overlap' / overloading. Yes, the above metric would 'overestimate' in case all tasks have migrated away and we are left with an idle CPU. A fix for that could be to just use the PELT value like so: schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : total_cpu_util_avg); Note that the problem described here in the commit message doesn't need fully runnable threads, it just needs two threads to execute in parallel on the same CPU for a period of time. I don't think looking at just idle_calls necessarily covers all cases. Thoughts? Thanks, -Sai
Re: [RFC][PATCH v3 2/2] cpufreq: schedutil: Avoid reducing frequency of busy CPUs prematurely
Hi Rafael, On 03/21/2017 04:08 PM, Rafael J. Wysocki wrote: > From: Rafael J. Wysocki > > The way the schedutil governor uses the PELT metric causes it to > underestimate the CPU utilization in some cases. > > That can be easily demonstrated by running kernel compilation on > a Sandy Bridge Intel processor, running turbostat in parallel with > it and looking at the values written to the MSR_IA32_PERF_CTL > register. Namely, the expected result would be that when all CPUs > were 100% busy, all of them would be requested to run in the maximum > P-state, but observation shows that this clearly isn't the case. > The CPUs run in the maximum P-state for a while and then are > requested to run slower and go back to the maximum P-state after > a while again. That causes the actual frequency of the processor to > visibly oscillate below the sustainable maximum in a jittery fashion > which clearly is not desirable. > > That has been attributed to CPU utilization metric updates on task > migration that cause the total utilization value for the CPU to be > reduced by the utilization of the migrated task. If that happens, > the schedutil governor may see a CPU utilization reduction and will > attempt to reduce the CPU frequency accordingly right away. That > may be premature, though, for example if the system is generally > busy and there are other runnable tasks waiting to be run on that > CPU already. > Thinking out loud a bit, I wonder if what you really want to do is basically: schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, total_cpu_util_avg); Where total_cpu_util_avg tracks the average utilization of the CPU itself over time (% of time the CPU was busy) in the same PELT like manner. The difference here is that it doesn't change instantaneously as tasks migrate in/out but it decays/accumulates just like the per-entity util_avgs. Over time, total_cpu_util_avg and cfs_rq(cpu)->util_avg will tend towards each other the lesser the amount of 'overlap' / overloading. Yes, the above metric would 'overestimate' in case all tasks have migrated away and we are left with an idle CPU. A fix for that could be to just use the PELT value like so: schedutil_cpu_util(cpu) = max(cpu_rq(cpu)->cfs.util_avg, idle_cpu(cpu) ? 0 : total_cpu_util_avg); Note that the problem described here in the commit message doesn't need fully runnable threads, it just needs two threads to execute in parallel on the same CPU for a period of time. I don't think looking at just idle_calls necessarily covers all cases. Thoughts? Thanks, -Sai
Re: lockdep warning: console vs. mem hotplug
On Sat, 25 Mar 2017 09:04:42 +0900 Sergey Senozhatskywrote: > On (03/21/17 13:44), Sergey Senozhatsky wrote: > [..] > > so we probably can > > > > > > 1) move pr_info() out of zone->lock in __offline_isolated_pages(). > >meh... > > > > > > 2) switch to printk_deferred() in __offline_isolated_pages(). > >meh.. there might a bunch of other printks done from under zone->lock. > > > > > > 3) move add_timer() out of sclp_con_lock console in sclp_console_write(). > >well, there can be other consoles that do something similar. > > > > > > 4) ... something smart. > > > Sebastian, does this change make lockdep happy? > > it removes console drivers from the __offline_isolated_pages(). not the > best solution I can think of, but the simplest one. > > --- > > mm/page_alloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index f749b7ff7c50..eb61e6ab5f4f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, > unsigned long end_pfn) > BUG_ON(!PageBuddy(page)); > order = page_order(page); > #ifdef CONFIG_DEBUG_VM > - pr_info("remove from free list %lx %d %lx\n", > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n", > pfn, 1 << order, end_pfn); > #endif > list_del(>lru); My fear is that this will trigger for any printk in page_alloc.c under the zone lock. -- Steve
Re: lockdep warning: console vs. mem hotplug
On Sat, 25 Mar 2017 09:04:42 +0900 Sergey Senozhatsky wrote: > On (03/21/17 13:44), Sergey Senozhatsky wrote: > [..] > > so we probably can > > > > > > 1) move pr_info() out of zone->lock in __offline_isolated_pages(). > >meh... > > > > > > 2) switch to printk_deferred() in __offline_isolated_pages(). > >meh.. there might a bunch of other printks done from under zone->lock. > > > > > > 3) move add_timer() out of sclp_con_lock console in sclp_console_write(). > >well, there can be other consoles that do something similar. > > > > > > 4) ... something smart. > > > Sebastian, does this change make lockdep happy? > > it removes console drivers from the __offline_isolated_pages(). not the > best solution I can think of, but the simplest one. > > --- > > mm/page_alloc.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index f749b7ff7c50..eb61e6ab5f4f 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -7705,7 +7705,7 @@ __offline_isolated_pages(unsigned long start_pfn, > unsigned long end_pfn) > BUG_ON(!PageBuddy(page)); > order = page_order(page); > #ifdef CONFIG_DEBUG_VM > - pr_info("remove from free list %lx %d %lx\n", > + printk_deferred(KERN_INFO "remove from free list %lx %d %lx\n", > pfn, 1 << order, end_pfn); > #endif > list_del(>lru); My fear is that this will trigger for any printk in page_alloc.c under the zone lock. -- Steve
[PATCH] cfg80211: Fix array-bounds warning in fragment copy
__ieee80211_amsdu_copy_frag intentionally initializes a pointer to array[-1] to increment it later to valid values. clang rightfully generates an array-bounds warning on the initialization statement. Work around this by initializing the pointer to array[0] and decrementing it later, which allows to leave the rest of the algorithm untouched. Signed-off-by: Matthias Kaehlcke--- net/wireless/util.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/wireless/util.c b/net/wireless/util.c index 68e5f2ecee1a..d3d459e4a070 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -659,7 +659,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame, int offset, int len) { struct skb_shared_info *sh = skb_shinfo(skb); - const skb_frag_t *frag = >frags[-1]; + const skb_frag_t *frag = >frags[0]; struct page *frag_page; void *frag_ptr; int frag_len, frag_size; @@ -669,6 +669,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame, frag_page = virt_to_head_page(skb->head); frag_ptr = skb->data; frag_size = head_size; + frag--; while (offset >= frag_size) { offset -= frag_size; -- 2.12.1.578.ge9c3154ca4-goog
[PATCH] cfg80211: Fix array-bounds warning in fragment copy
__ieee80211_amsdu_copy_frag intentionally initializes a pointer to array[-1] to increment it later to valid values. clang rightfully generates an array-bounds warning on the initialization statement. Work around this by initializing the pointer to array[0] and decrementing it later, which allows to leave the rest of the algorithm untouched. Signed-off-by: Matthias Kaehlcke --- net/wireless/util.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/net/wireless/util.c b/net/wireless/util.c index 68e5f2ecee1a..d3d459e4a070 100644 --- a/net/wireless/util.c +++ b/net/wireless/util.c @@ -659,7 +659,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame, int offset, int len) { struct skb_shared_info *sh = skb_shinfo(skb); - const skb_frag_t *frag = >frags[-1]; + const skb_frag_t *frag = >frags[0]; struct page *frag_page; void *frag_ptr; int frag_len, frag_size; @@ -669,6 +669,7 @@ __ieee80211_amsdu_copy_frag(struct sk_buff *skb, struct sk_buff *frame, frag_page = virt_to_head_page(skb->head); frag_ptr = skb->data; frag_size = head_size; + frag--; while (offset >= frag_size) { offset -= frag_size; -- 2.12.1.578.ge9c3154ca4-goog
[GIT PULL] clk fixes for v4.11-rc3
The following changes since commit c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201: Linux 4.11-rc1 (2017-03-05 12:59:56 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git tags/clk-fixes-for-linus for you to fetch changes up to 7f0b97d5bb4c1c99c38dd6770ad11f714ea42583: Merge tag 'sunxi-clk-fixes-for-4.11' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into clk-fixes (2017-03-23 16:08:46 -0700) A handful of Sunxi and Rockchip clk driver fixes and a core framework one where we need to copy a string because we can't guarantee it isn't freed sometime later. Arnd Bergmann (1): clk: sunxi: ccu-sun5i needs nkmp Chen-Yu Tsai (2): clk: sunxi-ng: mp: Adjust parent rate for pre-dividers clk: sunxi-ng: sun6i: Fix enable bit offset for hdmi-ddc module clock Heiko Stuebner (2): clk: rockchip: add "," to mux_pll_src_apll_dpll_gpll_usb480m_p on rk3036 clk: rockchip: Make uartpll a child of the gpll on rk3036 Icenowy Zheng (1): clk: sunxi-ng: fix recalc_rate formula of NKMP clocks Leonard Crestez (1): clk: core: Copy connection id Philipp Tomsich (1): clk: sunxi-ng: Fix div/mult settings for osc12M on A64 Stephen Boyd (1): Merge tag 'sunxi-clk-fixes-for-4.11' of https://git.kernel.org/.../mripard/linux into clk-fixes Thomas Petazzoni (1): dt-bindings: arm: update Armada CP110 system controller binding .../devicetree/bindings/arm/marvell/cp110-system-controller0.txt | 6 +++--- drivers/clk/clk.c| 3 ++- drivers/clk/rockchip/clk-rk3036.c| 9 - drivers/clk/sunxi-ng/Kconfig | 1 + drivers/clk/sunxi-ng/ccu-sun50i-a64.c| 2 +- drivers/clk/sunxi-ng/ccu-sun6i-a31.c | 2 +- drivers/clk/sunxi-ng/ccu_mp.c| 8 drivers/clk/sunxi-ng/ccu_nkmp.c | 2 +- 8 files changed, 25 insertions(+), 8 deletions(-) -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
[GIT PULL] clk fixes for v4.11-rc3
The following changes since commit c1ae3cfa0e89fa1a7ecc4c99031f5e9ae99d9201: Linux 4.11-rc1 (2017-03-05 12:59:56 -0800) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git tags/clk-fixes-for-linus for you to fetch changes up to 7f0b97d5bb4c1c99c38dd6770ad11f714ea42583: Merge tag 'sunxi-clk-fixes-for-4.11' of https://git.kernel.org/pub/scm/linux/kernel/git/mripard/linux into clk-fixes (2017-03-23 16:08:46 -0700) A handful of Sunxi and Rockchip clk driver fixes and a core framework one where we need to copy a string because we can't guarantee it isn't freed sometime later. Arnd Bergmann (1): clk: sunxi: ccu-sun5i needs nkmp Chen-Yu Tsai (2): clk: sunxi-ng: mp: Adjust parent rate for pre-dividers clk: sunxi-ng: sun6i: Fix enable bit offset for hdmi-ddc module clock Heiko Stuebner (2): clk: rockchip: add "," to mux_pll_src_apll_dpll_gpll_usb480m_p on rk3036 clk: rockchip: Make uartpll a child of the gpll on rk3036 Icenowy Zheng (1): clk: sunxi-ng: fix recalc_rate formula of NKMP clocks Leonard Crestez (1): clk: core: Copy connection id Philipp Tomsich (1): clk: sunxi-ng: Fix div/mult settings for osc12M on A64 Stephen Boyd (1): Merge tag 'sunxi-clk-fixes-for-4.11' of https://git.kernel.org/.../mripard/linux into clk-fixes Thomas Petazzoni (1): dt-bindings: arm: update Armada CP110 system controller binding .../devicetree/bindings/arm/marvell/cp110-system-controller0.txt | 6 +++--- drivers/clk/clk.c| 3 ++- drivers/clk/rockchip/clk-rk3036.c| 9 - drivers/clk/sunxi-ng/Kconfig | 1 + drivers/clk/sunxi-ng/ccu-sun50i-a64.c| 2 +- drivers/clk/sunxi-ng/ccu-sun6i-a31.c | 2 +- drivers/clk/sunxi-ng/ccu_mp.c| 8 drivers/clk/sunxi-ng/ccu_nkmp.c | 2 +- 8 files changed, 25 insertions(+), 8 deletions(-) -- The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
Re: lockdep warning: console vs. mem hotplug
On Sat, 25 Mar 2017 09:00:05 +0900 Sergey Senozhatskywrote: > Hello, > > On (03/24/17 12:39), Steven Rostedt wrote: > [..] > > Is there a stack trace of where the lockdep dump happened? That is > > useful too. Otherwise we don't see where the inverse happened. > > Steven, isn't it the inversion I describe in [1] (after the first lockdep > warning)? > > [1] lkml.kernel.org/r/20170321044421.GB448@jagdpanzerIV.localdomain > Yeah, I believe you are right. I just wanted to make sure. It's the same backtrace as the "(&(_con_lock)->rlock){-.-...}:" dump, but I wanted to make sure. -- Steve
Re: lockdep warning: console vs. mem hotplug
On Sat, 25 Mar 2017 09:00:05 +0900 Sergey Senozhatsky wrote: > Hello, > > On (03/24/17 12:39), Steven Rostedt wrote: > [..] > > Is there a stack trace of where the lockdep dump happened? That is > > useful too. Otherwise we don't see where the inverse happened. > > Steven, isn't it the inversion I describe in [1] (after the first lockdep > warning)? > > [1] lkml.kernel.org/r/20170321044421.GB448@jagdpanzerIV.localdomain > Yeah, I believe you are right. I just wanted to make sure. It's the same backtrace as the "(&(_con_lock)->rlock){-.-...}:" dump, but I wanted to make sure. -- Steve
Re: [PATCH] perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()
Em Fri, Mar 24, 2017 at 04:20:12PM -0700, Andi Kleen escreveu: > On Fri, Mar 24, 2017 at 02:15:52PM +0200, Adrian Hunter wrote: > > Address filtering with kernel symbols incorrectly resulted in the error > > "Cannot determine size of symbol" because the no_size logic was the wrong > > way around. > > > > Signed-off-by: Adrian Hunter> > Cc: sta...@vger.kernel.org # v4.9+ > > --- > > Tested-by: Andi Kleen Thanks, applied. - Arnaldo > Fixes my problems with tracing kernel with PT with filtering. > > -Andi
Re: [PATCH] perf auxtrace: Fix no_size logic in addr_filter__resolve_kernel_syms()
Em Fri, Mar 24, 2017 at 04:20:12PM -0700, Andi Kleen escreveu: > On Fri, Mar 24, 2017 at 02:15:52PM +0200, Adrian Hunter wrote: > > Address filtering with kernel symbols incorrectly resulted in the error > > "Cannot determine size of symbol" because the no_size logic was the wrong > > way around. > > > > Signed-off-by: Adrian Hunter > > Cc: sta...@vger.kernel.org # v4.9+ > > --- > > Tested-by: Andi Kleen Thanks, applied. - Arnaldo > Fixes my problems with tracing kernel with PT with filtering. > > -Andi