Re: [PATCH 4.4 00/43] 4.4.66-stable review
On 05/01/2017 09:59 PM, kernelci.org bot wrote: > stable-rc/linux-4.4.y boot: 100 boots: 0 failed, 100 passed > (v4.4.65-44-gbb57d8400b2e) > > Full Boot Summary: > https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.4.y/kernel/v4.4.65-44-gbb57d8400b2e/ > Full Build Summary: > https://kernelci.org/build/stable-rc/branch/linux-4.4.y/kernel/v4.4.65-44-gbb57d8400b2e/ > > Tree: stable-rc > Branch: linux-4.4.y > Git Describe: v4.4.65-44-gbb57d8400b2e > Git Commit: bb57d8400b2eec5a7cb4526879431f4b6cac6009 > Git URL: > http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > Tested: 14 unique boards, 8 SoC families, 18 builds out of 199 > > --- > For more info write to > Compiled and booted on my test system. No dmesg regressions. thanks, -- Shuah
Re: [PATCH v2 29/30] dt-bindings: add vendor prefix for bananapi
+arm-soc On Mon, May 1, 2017 at 1:58 AM, Sean Wang wrote: > On Fri, 2017-04-28 at 15:37 -0500, Rob Herring wrote: >> On Wed, Apr 26, 2017 at 05:26:13PM +0800, sean.w...@mediatek.com wrote: >> > From: Sean Wang >> > >> > Banana Pi team in Sinovoip Co., Limited which are dedicated to >> > design and manufacture open hardware product. >> > >> > Website: http://www.banana-pi.org/ >> > >> > Signed-off-by: Sean Wang >> > --- >> > Documentation/devicetree/bindings/vendor-prefixes.txt | 1 + >> > 1 file changed, 1 insertion(+) >> > >> > diff --git a/Documentation/devicetree/bindings/vendor-prefixes.txt >> > b/Documentation/devicetree/bindings/vendor-prefixes.txt >> > index ec0bfb9..8ca0f3c 100644 >> > --- a/Documentation/devicetree/bindings/vendor-prefixes.txt >> > +++ b/Documentation/devicetree/bindings/vendor-prefixes.txt >> > @@ -44,6 +44,7 @@ avia avia semiconductor >> > avic Shanghai AVIC Optoelectronics Co., Ltd. >> > axentiaAxentia Technologies AB >> > axis Axis Communications AB >> > +bananapi Banana Pi SINOVOP CO., LIMITED >> >> s/SINOVOP/SINOVOIP/ > > Hi Rob, > > thanks for your careful reviewing , i will correct it in the next > version. > > BTW, Could those related dts changes go through your tree if everything > looks good? Because I find Matthias have been inactive for a while and > the latest branch in his tree seems still staying on 4.10. Happy to take binding doc changes, but I don't take dts changes because those go thru arm-soc. If the platform maintainer is not responsive, then we should replace or add maintainers. Rob
Re: [PATCH 1/3] drm: fourcc byteorder: drop DRM_FORMAT_BIG_ENDIAN
Hi Gerd, I did not have the chance to follow the discussion, so pardon if my suggestions have already been discussed. On 2 May 2017 at 14:34, Gerd Hoffmann wrote: > It's unused. > > Suggested-by: Daniel Vetter > Signed-off-by: Gerd Hoffmann > --- > include/uapi/drm/drm_fourcc.h | 2 -- > drivers/gpu/drm/drm_fourcc.c | 3 +-- > drivers/gpu/drm/drm_framebuffer.c | 2 +- > 3 files changed, 2 insertions(+), 5 deletions(-) > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > index 995c8f9c69..305bc34be0 100644 > --- a/include/uapi/drm/drm_fourcc.h > +++ b/include/uapi/drm/drm_fourcc.h > @@ -33,8 +33,6 @@ extern "C" { > #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \ > ((__u32)(c) << 16) | ((__u32)(d) << 24)) > > -#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of > little endian */ > - Weston references DRM_FORMAT_BIG_ENDIAN, thus this patch will lead to build breakage. That is never fun, so please coordinate carefully with the Weston devs to minimise the fireworks. Personally I would leave the symbol, since it's UAPI, and document that it should not be used. Not my call, so feel free to ignore. Thanks Emil
Re: [PATCH 4.10 00/62] 4.10.14-stable review
On 05/01/2017 10:59 PM, kernelci.org bot wrote: > stable-rc/linux-4.10.y boot: 87 boots: 0 failed, 87 passed > (v4.10.13-63-gcabfe9402479) > > Full Boot Summary: > https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.10.y/kernel/v4.10.13-63-gcabfe9402479/ > Full Build Summary: > https://kernelci.org/build/stable-rc/branch/linux-4.10.y/kernel/v4.10.13-63-gcabfe9402479/ > > Tree: stable-rc > Branch: linux-4.10.y > Git Describe: v4.10.13-63-gcabfe9402479 > Git Commit: cabfe940247902943161bc50dec830a5f19352e5 > Git URL: > http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > Tested: 15 unique boards, 9 SoC families, 18 builds out of 203 > > --- > For more info write to > Compiled and booted on my test system. No dmesg regressions. thanks, -- Shuah
Re: [PATCH 4.9 00/54] 4.9.26-stable review
On 05/01/2017 10:19 PM, kernelci.org bot wrote: > stable-rc/linux-4.9.y boot: 96 boots: 0 failed, 96 passed > (v4.9.25-55-g49bccd4690a2) > > Full Boot Summary: > https://kernelci.org/boot/all/job/stable-rc/branch/linux-4.9.y/kernel/v4.9.25-55-g49bccd4690a2/ > Full Build Summary: > https://kernelci.org/build/stable-rc/branch/linux-4.9.y/kernel/v4.9.25-55-g49bccd4690a2/ > > Tree: stable-rc > Branch: linux-4.9.y > Git Describe: v4.9.25-55-g49bccd4690a2 > Git Commit: 49bccd4690a2adf6e3236fef3005d905b91cf520 > Git URL: > http://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > Tested: 16 unique boards, 9 SoC families, 20 builds out of 203 > > --- > For more info write to > Compiled and booted on my test system. No dmesg regressions. thanks, -- Shuah
[PATCH 0/6] mailbox: arm_mhu: add support for subchannels
Hi Jassi, This series adds subchannel support to the ARM MHU mailbox controller driver. Since SCPI never used the second slot, we were able to use the existing driver as is. However, that's changing soon, and the new SCMI protocol under development needs subchannel support. If you recall, when you initially added this driver I was pushing for some of these changes, like the threaded irq. This patch series adds support for the subchannels on ARM MHU controllers. Regards, Sudeep Sudeep Holla (6): mailbox: arm_mhu: reorder header inclusion and drop unneeded ones Documentation: devicetree: add bindings to support ARM MHU subchannels mailbox: arm_mhu: migrate to threaded irq handler mailbox: arm_mhu: re-factor data structure to add subchannel support mailbox: arm_mhu: add full support for sub-channels mailbox: arm_mhu: add name support to record mbox-name .../devicetree/bindings/mailbox/arm-mhu.txt| 44 ++- drivers/mailbox/arm_mhu.c | 381 ++--- 2 files changed, 376 insertions(+), 49 deletions(-) -- 2.7.4
[PATCH 1/6] mailbox: arm_mhu: reorder header inclusion and drop unneeded ones
This patch just re-orders some of the header includes and also drops the ones that are unnecessary. Cc: Alexey Klimov Cc: Jassi Brar Signed-off-by: Sudeep Holla --- drivers/mailbox/arm_mhu.c | 11 --- 1 file changed, 4 insertions(+), 7 deletions(-) diff --git a/drivers/mailbox/arm_mhu.c b/drivers/mailbox/arm_mhu.c index 99befa76e37c..be0f293a9457 100644 --- a/drivers/mailbox/arm_mhu.c +++ b/drivers/mailbox/arm_mhu.c @@ -13,16 +13,13 @@ * GNU General Public License for more details. */ -#include -#include -#include -#include -#include +#include +#include #include +#include #include -#include -#include #include +#include #define INTR_STAT_OFS 0x0 #define INTR_SET_OFS 0x8 -- 2.7.4
[PATCH 4/6] mailbox: arm_mhu: re-factor data structure to add subchannel support
In order to support subchannels, we need a bit of reword around data structures that are per-channel. Since the number of subchannels are not fixed though restricted to maximum of 20, the channel assignment and initialization is move to xlate function. This patch also adds the platform data for the existing support of one channel per physical channel. Cc: Alexey Klimov Cc: Jassi Brar Signed-off-by: Sudeep Holla --- drivers/mailbox/arm_mhu.c | 208 +- 1 file changed, 186 insertions(+), 22 deletions(-) diff --git a/drivers/mailbox/arm_mhu.c b/drivers/mailbox/arm_mhu.c index ebe17c097f43..4e0f690b97fd 100644 --- a/drivers/mailbox/arm_mhu.c +++ b/drivers/mailbox/arm_mhu.c @@ -20,6 +20,8 @@ #include #include #include +#include +#include #define INTR_STAT_OFS 0x0 #define INTR_SET_OFS 0x8 @@ -30,7 +32,8 @@ #define MHU_SEC_OFFSET 0x200 #define TX_REG_OFFSET 0x100 -#define MHU_CHANS 3 +#define MHU_NUM_PCHANS 3 /* Secure, Non-Secure High and Low Priority */ +#define MHU_CHAN_MAX 20 /* Max channels to save on unused RAM */ struct mhu_link { unsigned irq; @@ -40,53 +43,175 @@ struct mhu_link { struct arm_mhu { void __iomem *base; - struct mhu_link mlink[MHU_CHANS]; - struct mbox_chan chan[MHU_CHANS]; + struct mhu_link mlink[MHU_NUM_PCHANS]; struct mbox_controller mbox; + struct device *dev; }; +/** + * ARM MHU Mailbox platform specific configuration + * + * @num_pchans: Maximum number of physical channels + * @num_subchans: Maximum number of sub-channels per physical channel + */ +struct mhu_mbox_pdata { + unsigned int num_pchans; + unsigned int num_subchans; + bool support_subchans; +}; + +/** + * ARM MHU Mailbox allocated channel information + * + * @mhu: Pointer to parent mailbox device + * @pchan: Physical channel within which this sub-channel resides in + * @channel: Sub-channel number pertaining to this container + */ +struct mhu_channel { + struct arm_mhu *mhu; + unsigned int pchan; + unsigned int subchan; +}; + +static inline struct mbox_chan * 
+mhu_mbox_to_channel(struct mbox_controller *mbox, + unsigned int pchan, unsigned int subchan) +{ + int i; + struct mhu_channel *chan_info; + + for (i = 0; i < mbox->num_chans; i++) { + chan_info = mbox->chans[i].con_priv; + if (chan_info && chan_info->pchan == pchan && + chan_info->subchan == subchan) + return >chans[i]; + } + + dev_err(mbox->dev, + "Channel not registered: physical channel: %d subchan: %d\n", + pchan, subchan); + + return NULL; +} + +static unsigned int mhu_mbox_irq_to_pchan_num(struct arm_mhu *mhu, int irq) +{ + unsigned int pchan; + struct mhu_mbox_pdata *pdata = dev_get_platdata(mhu->dev); + + for (pchan = 0; pchan < pdata->num_pchans; pchan++) + if (mhu->mlink[pchan].irq == irq) + break; + return pchan; +} + +static struct mbox_chan *mhu_mbox_xlate(struct mbox_controller *mbox, + const struct of_phandle_args *spec) +{ + struct arm_mhu *mhu = dev_get_drvdata(mbox->dev); + struct mhu_mbox_pdata *pdata = dev_get_platdata(mhu->dev); + struct mhu_channel *chan_info; + struct mbox_chan *chan = NULL; + unsigned int pchan = spec->args[0]; + unsigned int subchan = pdata->support_subchans ? spec->args[1] : 0; + int i; + + /* Bounds checking */ + if (pchan >= pdata->num_pchans || subchan >= pdata->num_subchans) { + dev_err(mbox->dev, + "Invalid channel requested pchan: %d subchan: %d\n", + pchan, subchan); + return ERR_PTR(-EINVAL); + } + + for (i = 0; i < mbox->num_chans; i++) { + chan_info = mbox->chans[i].con_priv; + + /* Is requested channel free? 
*/ + if (chan_info && + mbox->dev == chan_info->mhu->dev && + pchan == chan_info->pchan && + subchan == chan_info->subchan) { + dev_err(mbox->dev, "Channel in use\n"); + return ERR_PTR(-EBUSY); + } + + /* +* Find the first free slot, then continue checking +* to see if requested channel is in use +*/ + if (!chan && !chan_info) + chan = >chans[i]; + } + + if (!chan) { + dev_err(mbox->dev, "No free channels left\n"); + return ERR_PTR(-EBUSY); + } + + chan_info = devm_kzalloc(mbox->dev, sizeof(*chan_info), GFP_KERNEL); + if (!chan_info) + return ERR_PTR(-ENOMEM); + + chan_info->mhu = mhu; + chan_info->pchan = pchan; +
[PATCH 6/6] mailbox: arm_mhu: add name support to record mbox-name
It's sometimes useful to identify the mailbox controller with the name as specified in the devicetree via mbox-name property especially in a system with multiple controllers. This patch adds support to read and record the mailbox controller name. Cc: Alexey Klimov Cc: Jassi Brar Signed-off-by: Sudeep Holla --- drivers/mailbox/arm_mhu.c | 12 +--- 1 file changed, 9 insertions(+), 3 deletions(-) diff --git a/drivers/mailbox/arm_mhu.c b/drivers/mailbox/arm_mhu.c index 0f5ab2204649..9aa623a3aa9a 100644 --- a/drivers/mailbox/arm_mhu.c +++ b/drivers/mailbox/arm_mhu.c @@ -47,6 +47,7 @@ struct arm_mhu { struct mhu_link mlink[MHU_NUM_PCHANS]; struct mbox_controller mbox; struct device *dev; + const char *name; }; /** @@ -257,8 +258,8 @@ static struct mbox_chan *mhu_mbox_xlate(struct mbox_controller *mbox, chan->con_priv = chan_info; - dev_dbg(mbox->dev, "mbox: created channel - physical: %d sub: %d\n", - pchan, subchan); + dev_dbg(mbox->dev, "mbox: %s, created channel - physical: %d sub: %d\n", + mhu->name, pchan, subchan); return chan; } @@ -361,6 +362,7 @@ static int mhu_probe(struct amba_device *adev, const struct amba_id *id) struct mhu_mbox_pdata *pdata; const struct of_device_id *match; struct device *dev = >dev; + struct device_node *np = dev->of_node; int mhu_reg[MHU_NUM_PCHANS] = { MHU_LP_OFFSET, MHU_HP_OFFSET, MHU_SEC_OFFSET, }; @@ -390,6 +392,10 @@ static int mhu_probe(struct amba_device *adev, const struct amba_id *id) return PTR_ERR(mhu->base); } + err = of_property_read_string(np, "mbox-name", >name); + if (err) + mhu->name = np->full_name; + chans = devm_kcalloc(dev, max_chans, sizeof(*chans), GFP_KERNEL); if (!chans) return -ENOMEM; @@ -441,7 +447,7 @@ static int mhu_probe(struct amba_device *adev, const struct amba_id *id) } } - dev_info(dev, "ARM MHU Mailbox registered\n"); + dev_info(dev, "%s mailbox registered\n", mhu->name); return 0; } -- 2.7.4
[PATCH 3/6] mailbox: arm_mhu: migrate to threaded irq handler
In preparation to introduce support for subchannels which require the interrupt handlers to be threaded, this patch moves the existing interrupt handler to threaded handler. Also it moves out the registering and freeing of the handlers from the mailbox startup and shutdown methods. This also is required to support sub-channels. Cc: Alexey Klimov Cc: Jassi Brar Signed-off-by: Sudeep Holla --- drivers/mailbox/arm_mhu.c | 41 ++--- 1 file changed, 22 insertions(+), 19 deletions(-) diff --git a/drivers/mailbox/arm_mhu.c b/drivers/mailbox/arm_mhu.c index be0f293a9457..ebe17c097f43 100644 --- a/drivers/mailbox/arm_mhu.c +++ b/drivers/mailbox/arm_mhu.c @@ -84,27 +84,15 @@ static int mhu_startup(struct mbox_chan *chan) { struct mhu_link *mlink = chan->con_priv; u32 val; - int ret; val = readl_relaxed(mlink->tx_reg + INTR_STAT_OFS); writel_relaxed(val, mlink->tx_reg + INTR_CLR_OFS); - ret = request_irq(mlink->irq, mhu_rx_interrupt, - IRQF_SHARED, "mhu_link", chan); - if (ret) { - dev_err(chan->mbox->dev, - "Unable to acquire IRQ %d\n", mlink->irq); - return ret; - } - return 0; } static void mhu_shutdown(struct mbox_chan *chan) { - struct mhu_link *mlink = chan->con_priv; - - free_irq(mlink->irq, chan); } static const struct mbox_chan_ops mhu_ops = { @@ -132,13 +120,6 @@ static int mhu_probe(struct amba_device *adev, const struct amba_id *id) return PTR_ERR(mhu->base); } - for (i = 0; i < MHU_CHANS; i++) { - mhu->chan[i].con_priv = >mlink[i]; - mhu->mlink[i].irq = adev->irq[i]; - mhu->mlink[i].rx_reg = mhu->base + mhu_reg[i]; - mhu->mlink[i].tx_reg = mhu->mlink[i].rx_reg + TX_REG_OFFSET; - } - mhu->mbox.dev = dev; mhu->mbox.chans = >chan[0]; mhu->mbox.num_chans = MHU_CHANS; @@ -155,6 +136,28 @@ static int mhu_probe(struct amba_device *adev, const struct amba_id *id) return err; } + for (i = 0; i < MHU_CHANS; i++) { + int irq = mhu->mlink[i].irq = adev->irq[i]; + + if (irq <= 0) { + dev_dbg(dev, "No IRQ found for Channel %d\n", i); + continue; + } + + mhu->chan[i].con_priv = 
>mlink[i]; + mhu->mlink[i].rx_reg = mhu->base + mhu_reg[i]; + mhu->mlink[i].tx_reg = mhu->mlink[i].rx_reg + TX_REG_OFFSET; + + err = devm_request_threaded_irq(dev, irq, NULL, + mhu_rx_interrupt, IRQF_ONESHOT, + "mhu_link", >chan[i]); + if (err) { + dev_err(dev, "Can't claim IRQ %d\n", irq); + mbox_controller_unregister(>mbox); + return err; + } + } + dev_info(dev, "ARM MHU Mailbox registered\n"); return 0; } -- 2.7.4
[PATCH 2/6] Documentation: devicetree: add bindings to support ARM MHU subchannels
The ARM MHU has a mechanism to assert interrupt signals to facilitate inter-processor message-based communication. It drives the signal using a 32-bit register, with all 32 bits logically ORed together. It also enables software to set, clear and check the status of each of the bits of this register independently. Each bit of the register can be associated with a type of event that can contribute to raising the interrupt, thereby allowing it to be used as independent subchannels. Since the first version of this binding can't support sub-channels, this patch extends the existing binding to support them. Cc: Alexey Klimov Cc: Jassi Brar Cc: Rob Herring Cc: devicet...@vger.kernel.org Signed-off-by: Sudeep Holla --- .../devicetree/bindings/mailbox/arm-mhu.txt| 44 -- 1 file changed, 41 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/mailbox/arm-mhu.txt b/Documentation/devicetree/bindings/mailbox/arm-mhu.txt index 4971f03f0b33..86a66f7918e2 100644 --- a/Documentation/devicetree/bindings/mailbox/arm-mhu.txt +++ b/Documentation/devicetree/bindings/mailbox/arm-mhu.txt @@ -10,21 +10,40 @@ STAT register and the remote clears it after having read the data. The last channel is specified to be a 'Secure' resource, hence can't be used by Linux running NS. +The MHU drives the interrupt signal using a 32-bit register, with all +32-bits logically ORed together. It provides a set of registers to +enable software to set, clear and check the status of each of the bits +of this register independently. The use of 32 bits per interrupt line +enables software to provide more information about the source of the +interrupt. For example, each bit of the register can be associated with +a type of event that can contribute to raising the interrupt. 
+ Mailbox Device Node: Required properties: -- compatible: Shall be "arm,mhu" & "arm,primecell" +- compatible: Shall be "arm,primecell" and one of the below: + "arm,mhu" - if the controller doesn't support + subchannels + "arm,mhu-v2" - if the controller supports subchannels - reg: Contains the mailbox register address range (base address and length) -- #mbox-cells Shall be 1 - the index of the channel needed. +- #mbox-cells Shall be 1 - the index of the channel needed when + compatible is "arm,mhu" + Shall be 2 - the index of the channel needed, and + the index of the subchannel with the channel when + compatible is "arm,mhu-v2" - interrupts: Contains the interrupt information corresponding to - each of the 3 links of MHU. + each of the 3 physical channels of MHU namely low + priority non-secure, high priority non-secure and + secure channels. Example: +1. Controller which doesn't support subchannels + mhu: mailbox@2b1f { #mbox-cells = <1>; compatible = "arm,mhu", "arm,primecell"; @@ -41,3 +60,22 @@ used by Linux running NS. reg = <0 0x2e00 0x4000>; mboxes = < 1>; /* HP-NonSecure */ }; + +2. Controller which supports subchannels + + mhu: mailbox@2b1f { + #mbox-cells = <2>; + compatible = "arm,mhu-v2", "arm,primecell"; + reg = <0 0x2b1f 0x1000>; + interrupts = <0 36 4>, /* LP-NonSecure */ +<0 35 4>, /* HP-NonSecure */ +<0 37 4>; /* Secure */ + clocks = < 0 2 1>; + clock-names = "apb_pclk"; + }; + + mhu_client: scb@2e00 { + compatible = "arm,scpi"; + reg = <0 0x2e00 0x200>; + mboxes = < 1 4>; /* HP-NonSecure 5th sub-channel */ + }; -- 2.7.4
[PATCH 5/6] mailbox: arm_mhu: add full support for sub-channels
We now have the basic infrastructure and binding to support subchannels on ARM MHU controller. This patch adds all the necessary mailbox operations to add support for the sub-channels. Maximum of 32 subchannels are supported on each physical channel, however the total number of subchannels is restricted to 20 in order to save memory. It can increased if required in future. Cc: Alexey Klimov Cc: Jassi Brar Signed-off-by: Sudeep Holla --- drivers/mailbox/arm_mhu.c | 127 -- 1 file changed, 123 insertions(+), 4 deletions(-) diff --git a/drivers/mailbox/arm_mhu.c b/drivers/mailbox/arm_mhu.c index 4e0f690b97fd..0f5ab2204649 100644 --- a/drivers/mailbox/arm_mhu.c +++ b/drivers/mailbox/arm_mhu.c @@ -18,6 +18,7 @@ #include #include #include +#include #include #include #include @@ -94,6 +95,14 @@ mhu_mbox_to_channel(struct mbox_controller *mbox, return NULL; } +static void mhu_mbox_clear_irq(struct mbox_chan *chan) +{ + struct mhu_channel *chan_info = chan->con_priv; + void __iomem *base = chan_info->mhu->mlink[chan_info->pchan].rx_reg; + + writel_relaxed(BIT(chan_info->subchan), base + INTR_CLR_OFS); +} + static unsigned int mhu_mbox_irq_to_pchan_num(struct arm_mhu *mhu, int irq) { unsigned int pchan; @@ -105,6 +114,95 @@ static unsigned int mhu_mbox_irq_to_pchan_num(struct arm_mhu *mhu, int irq) return pchan; } +static struct mbox_chan *mhu_mbox_irq_to_channel(struct arm_mhu *mhu, +unsigned int pchan) +{ + unsigned long bits; + unsigned int subchan; + struct mbox_chan *chan = NULL; + struct mbox_controller *mbox = >mbox; + void __iomem *base = mhu->mlink[pchan].rx_reg; + + bits = readl_relaxed(base + INTR_STAT_OFS); + if (!bits) + /* No IRQs fired in specified physical channel */ + return NULL; + + /* An IRQ has fired, find the associated channel */ + for (subchan = 0; bits; subchan++) { + if (!test_and_clear_bit(subchan, )) + continue; + + chan = mhu_mbox_to_channel(mbox, pchan, subchan); + if (chan) + break; + } + + return chan; +} + +static irqreturn_t 
mhu_mbox_thread_handler(int irq, void *data) +{ + struct mbox_chan *chan; + struct arm_mhu *mhu = data; + unsigned int pchan = mhu_mbox_irq_to_pchan_num(mhu, irq); + + while (NULL != (chan = mhu_mbox_irq_to_channel(mhu, pchan))) { + mbox_chan_received_data(chan, NULL); + mhu_mbox_clear_irq(chan); + } + + return IRQ_HANDLED; +} + +static bool mhu_subchan_last_tx_done(struct mbox_chan *chan) +{ + struct mhu_channel *chan_info = chan->con_priv; + void __iomem *base = chan_info->mhu->mlink[chan_info->pchan].tx_reg; + + if (readl_relaxed(base + INTR_STAT_OFS) & BIT(chan_info->subchan)) + return false; + + return true; +} + +static int mhu_subchan_send_data(struct mbox_chan *chan, void *data) +{ + struct mhu_channel *chan_info = chan->con_priv; + void __iomem *base = chan_info->mhu->mlink[chan_info->pchan].tx_reg; + + /* Send event to co-processor */ + writel_relaxed(BIT(chan_info->subchan), base + INTR_SET_OFS); + + return 0; +} + +static int mhu_subchan_startup(struct mbox_chan *chan) +{ + mhu_mbox_clear_irq(chan); + return 0; +} + +static void mhu_subchan_shutdown(struct mbox_chan *chan) +{ + struct mhu_channel *chan_info = chan->con_priv; + struct mbox_controller *mbox = _info->mhu->mbox; + int i; + + for (i = 0; i < mbox->num_chans; i++) + if (chan == >chans[i]) + break; + + if (mbox->num_chans == i) { + dev_warn(mbox->dev, "Request to free non-existent channel\n"); + return; + } + + /* Reset channel */ + mhu_mbox_clear_irq(chan); + chan->con_priv = NULL; +} + static struct mbox_chan *mhu_mbox_xlate(struct mbox_controller *mbox, const struct of_phandle_args *spec) { @@ -227,14 +325,28 @@ static const struct mbox_chan_ops mhu_ops = { .last_tx_done = mhu_last_tx_done, }; +static const struct mbox_chan_ops mhu_subchan_ops = { + .send_data = mhu_subchan_send_data, + .startup = mhu_subchan_startup, + .shutdown = mhu_subchan_shutdown, + .last_tx_done = mhu_subchan_last_tx_done, +}; + static const struct mhu_mbox_pdata arm_mhu_pdata = { .num_pchans = 3, .num_subchans = 1, 
.support_subchans = false, }; +static const struct mhu_mbox_pdata arm_mhu_v2_pdata = { + .num_pchans = 2,/* Secure can't be used */ + .num_subchans = 32, + .support_subchans = true, +}; + static const struct of_device_id mhu_mbox_match[] = { { .compatible = "arm,mhu",
RE: [PATCH 3/4] net: macb: Add hardware PTP support
> From: Richard Cochran [mailto:richardcoch...@gmail.com] > Sent: 14 April 2017 20:29 > To: Rafal Ozieblo > Cc: David Miller ; nicolas.fe...@atmel.com; > net...@vger.kernel.org; linux-kernel@vger.kernel.org; > devicet...@vger.kernel.org; linux-arm-ker...@lists.infradead.org; > harinikatakamli...@gmail.com; harini.kata...@xilinx.com; > andrei.pistir...@microchip.com > Subject: Re: [PATCH 3/4] net: macb: Add hardware PTP support > > On Thu, Apr 13, 2017 at 02:39:23PM +0100, Rafal Ozieblo wrote: (...) > > +static int gem_tsu_get_time(struct macb *bp, struct timespec64 *ts) > > +{ > > + long first, second; > > + u32 secl, sech; > > + unsigned long flags; > > + > > + if (!bp || !ts) > > + return -EINVAL; > > Useless test. Sorry for being overly careful; I'll remove all such tests. > (...) > > +static int gem_ptp_gettime(struct ptp_clock_info *ptp, struct timespec64 > *ts) > > +{ > > + struct macb *bp = container_of(ptp, struct macb, ptp_clock_info); > > + > > + if (!ptp || !ts) > > + return -EINVAL; > > Useles test. > > What is the point of this wrapper function anyhow? Please remove it. gem_ptp_gettime() is assigned in ptp_clock_info and it has to take a ptp_clock_info pointer as its first parameter. gem_tsu_get_time() is used in the source code but with a macb pointer. Do you want me to do something like: gem_ptp_gettime(macb->ptp, ts); and first get the macb pointer from ptp? struct macb *bp = container_of(ptp, struct macb, ptp_clock_info);
ditto > > > + gem_tsu_set_time(bp, ts); > > + return 0; > > +} > > + > > +static int gem_ptp_enable(struct ptp_clock_info *ptp, > > + struct ptp_clock_request *rq, int on) > > +{ > > + struct macb *bp = container_of(ptp, struct macb, ptp_clock_info); > > + > > + if (!ptp || !rq) > > + return -EINVAL; > > Sigh. > > > + > > + switch (rq->type) { > > + case PTP_CLK_REQ_EXTTS: /* Toggle TSU match interrupt */ > > + if (on) > > + macb_writel(bp, IER, MACB_BIT(TCI)); > > No locking to protect IER and IDE? There is no need. > > > + else > > + macb_writel(bp, IDR, MACB_BIT(TCI)); > > + break; > > + case PTP_CLK_REQ_PEROUT: /* Toggle Periodic output */ > > + return -EOPNOTSUPP; > > + /* break; */ > > + case PTP_CLK_REQ_PPS: /* Toggle TSU periodic (second) > interrupt */ > > + if (on) > > + macb_writel(bp, IER, MACB_BIT(SRI)); > > + else > > + macb_writel(bp, IDR, MACB_BIT(SRI)); > > + break; > > + default: > > + break; > > + } > > + return 0; > > +} > > + (...) > > -- > > 2.4.5 > > > > Thanks, > Richard
Re: [PATCH RFC v2 5/6] proc: instantiate only pids that we can ptrace on 'limit_pids=1' mount option
Hello Andy, (Sorry for my late response) On Thu, Apr 27, 2017 at 12:09 AM, Andy Lutomirski wrote: > On Tue, Apr 25, 2017 at 5:23 AM, Djalal Harouni wrote: >> If "limit_pids=1" mount option is set then do not instantiate pids that >> we can not ptrace. "limit_pids=1" means that procfs should only contain >> pids that the caller can ptrace. >> >> Cc: Kees Cook >> Cc: Andy Lutomirski >> Signed-off-by: Djalal Harouni >> --- >> fs/proc/base.c | 9 + >> 1 file changed, 9 insertions(+) >> >> diff --git a/fs/proc/base.c b/fs/proc/base.c >> index 2e0f661..a663284 100644 >> --- a/fs/proc/base.c >> +++ b/fs/proc/base.c >> @@ -3149,6 +3149,7 @@ struct dentry *proc_pid_lookup(struct inode *dir, >> struct dentry * dentry, unsign >> unsigned tgid; >> struct proc_fs_info *fs_info = proc_sb(dir->i_sb); >> struct pid_namespace *ns = fs_info->pid_ns; >> + int limit_pids = proc_fs_limit_pids(fs_info); > > Shouldn't the addition of proc_fs_limit_pids() be in this patch? Actually I think this patch should be part of proc_fs_limit_pids() since it will cover paths that are not handled by inode ->permission() checks. But I will review this patch and the discussion "devpts: Make each mount of devpts an independent filesystem" [1] and try to get the rationale for it: whether doing permission-like checks in lookups is related, or if it should be done in the first place... The other thing that I didn't have time to check is standardizing on the returned error codes: I prefer to always return -ENOENT, instead of -EPERM in some places and -ENOENT in others. > Also, can we name it something self-documented? > "ptraceable_pids_only=1", perhaps? Or even pids=ptraceable (as > opposed to pids=all or maybe other choices in the future)? Yes, I will update. [1] https://patchwork.kernel.org/patch/9150781/ -- tixxdz
Re: [RESEND PATCH] spi: bcm63xx-hsspi: Export OF device ID table as module aliases
On 1 May 2017 at 22:13, Andres Galacho wrote: > The device table is required to load modules based on > modaliases. After adding MODULE_DEVICE_TABLE, below entries > for example will be added to module.alias: > alias: of:N*T*Cbrcm,bcm6328-hsspiC* > alias: of:N*T*Cbrcm,bcm6328-hsspi > > Signed-off-by: Andres Galacho Not that it matters much with a trivial patch like this, but Acked-by: Jonas Gorski also thanks for doing it. Regards Jonas
error value for "internal error"
I've been wondering what to return for soft asserts like this: if (WARN_ON(something unexpected)) return -E; EINVAL doesn't fit because it means the input from userspace was wrong. EIO means something went bad with the hardware. There's no "software error" or "internal error" type return code. Would it make sense to introduce one? Rule would be to always add more detail to dmesg (such as done by WARN_ON) when such a code is returned. Thanks, Miklos
Re: [PATCH] tools/kvm: fix top level makefile
On 11/04/2017 17:34, Justin M. Forbes wrote: > The top level tools/Makefile includes kvm_stat as a target in help, but > the actual target is missing. > > Signed-off-by: Justin M. Forbes > --- > tools/Makefile | 5 - > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/tools/Makefile b/tools/Makefile > index 00caacd..c8a90d0 100644 > --- a/tools/Makefile > +++ b/tools/Makefile > @@ -86,10 +86,13 @@ tmon: FORCE > freefall: FORCE > $(call descend,laptop/$@) > > +kvm_stat: FORCE > + $(call descend,kvm/$@) > + > all: acpi cgroup cpupower gpio hv firewire lguest \ > perf selftests turbostat usb \ > virtio vm net x86_energy_perf_policy \ > - tmon freefall objtool > + tmon freefall objtool kvm_stat > > acpi_install: > $(call descend,power/$(@:_install=),install) > Applied, thanks. Paolo
[PATCH V2] pwm: tegra: Set maximum pwm clock source per SoC tapeout
The PWM hardware IP is taped out with a different maximum frequency on different SoCs. From the HW team: Before Tegra186, it is 38.4MHz. In Tegra186, it is 102MHz. Add support to limit the clock source frequency to the maximum frequency supported by the IP. Provide these values via SoC chipdata. Signed-off-by: Laxman Dewangan --- Changes from V1: - Set the 48MHz maximum frequency for Tegra210 and earlier. - Set the maximum frequency unconditionally as per V1 review comment. --- drivers/pwm/pwm-tegra.c | 18 +- 1 file changed, 17 insertions(+), 1 deletion(-) diff --git a/drivers/pwm/pwm-tegra.c b/drivers/pwm/pwm-tegra.c index 8c6ed55..e9b33f0 100644 --- a/drivers/pwm/pwm-tegra.c +++ b/drivers/pwm/pwm-tegra.c @@ -41,6 +41,9 @@ struct tegra_pwm_soc { unsigned int num_channels; + + /* Maximum IP frequency for given SoCs */ + unsigned long max_frequency; }; struct tegra_pwm_chip { @@ -201,7 +204,18 @@ static int tegra_pwm_probe(struct platform_device *pdev) if (IS_ERR(pwm->clk)) return PTR_ERR(pwm->clk); - /* Read PWM clock rate from source */ + /* Set maximum frequency of the IP */ + ret = clk_set_rate(pwm->clk, pwm->soc->max_frequency); + if (ret < 0) { + dev_err(&pdev->dev, "Failed to set max frequency: %d\n", ret); + return ret; + } + + /* +* The requested and configured frequency may differ due to +* clock register resolutions. Get the configured frequency +* so that PWM period can be calculated more accurately. +*/ pwm->clk_rate = clk_get_rate(pwm->clk); pwm->rst = devm_reset_control_get(&pdev->dev, "pwm"); @@ -273,10 +287,12 @@ static int tegra_pwm_resume(struct device *dev) static const struct tegra_pwm_soc tegra20_pwm_soc = { .num_channels = 4, + .max_frequency = 48000000UL, }; static const struct tegra_pwm_soc tegra186_pwm_soc = { .num_channels = 1, + .max_frequency = 102000000UL, }; static const struct of_device_id tegra_pwm_of_match[] = { -- 2.1.4
Re: net/ipv6: warning in inet6_ifa_finish_destroy
On 28/04/17 21:39, Cong Wang wrote: > On Fri, Apr 28, 2017 at 6:08 AM, Andrey Konovalov > wrote: >> Hi, >> >> I've got the following error report while fuzzing the kernel with syzkaller. >> >> On commit 5a7ad1146caa895ad718a534399e38bd2ba721b7 (4.11-rc8). >> >> C reproducer and .config are attached. >> It takes 1-2 minutes of running the reproducer to trigger the issue. >> >> [ cut here ] >> WARNING: CPU: 0 PID: 21 at net/ipv6/addrconf.c:894 >> inet6_ifa_finish_destroy+0x12e/0x190 >> Modules linked in: >> CPU: 0 PID: 21 Comm: kworker/0:1 Not tainted 4.11.0-rc8+ #296 >> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS Bochs 01/01/2011 >> Workqueue: ipv6_addrconf addrconf_dad_work >> Call Trace: >> __dump_stack lib/dump_stack.c:16 >> dump_stack+0x292/0x398 lib/dump_stack.c:52 >> __warn+0x19f/0x1e0 kernel/panic.c:549 >> warn_slowpath_null+0x2c/0x40 kernel/panic.c:584 >> inet6_ifa_finish_destroy+0x12e/0x190 c:894 >> in6_ifa_put ./include/net/addrconf.h:330 >> addrconf_dad_work+0x4e9/0x1040 net/ipv6/addrconf.c:3963 > > > I don't look too much, but a quick glance shows in the following > path: > > } else if (action == DAD_ABORT) { > in6_ifa_hold(ifp); > addrconf_dad_stop(ifp, 1); > if (disable_ipv6) > addrconf_ifdown(idev->dev, 0); > goto out; > } > > the inet6_addr could be removed from hash table in > addrconf_ifdown() before calling in6_ifa_put(). which causes > this warning. > git describe 85b51b12115c79cce7ea1ced6c0bd0339a165d3f --contains v4.8-rc5~34^2~32 This fix was introduced in 4.8, so it is interesting that this problem is only showing up now for 4.11. Also, it is not reproducible manually, i.e. DAD failure with disable_ipv6 works just fine without triggering this warning, with or without keeping IPv6 addresses on admin down. I will go ahead with putting out a fix so that in6_ifa_put() precedes the call to addrconf_ifdown() in this case. Thanks for the heads up on this, Mike
Re: [v6,10/23] drivers/fsi: Add device read/write/peek API
On Apr 10, 2017, at 3:46 PM, Christopher Bostic wrote: From: Jeremy Kerr This change introduces the fsi device API: simple read, write and peek accessors for the devices' address spaces. Includes contributions from Chris Bostic and Edward A. James . Signed-off-by: Edward A. James Signed-off-by: Jeremy Kerr Signed-off-by: Chris Bostic Signed-off-by: Joel Stanley --- drivers/fsi/fsi-core.c | 33 + include/linux/fsi.h| 7 ++- 2 files changed, 39 insertions(+), 1 deletion(-) diff --git a/drivers/fsi/fsi-core.c b/drivers/fsi/fsi-core.c index a8faa89..4da0b030 100644 --- a/drivers/fsi/fsi-core.c +++ b/drivers/fsi/fsi-core.c @@ -32,6 +32,8 @@ #define FSI_SLAVE_CONF_CRC_MASK0x000f #define FSI_SLAVE_CONF_DATA_BITS28 +#define FSI_PEEK_BASE0x410 + static const int engine_page_size = 0x400; #define FSI_SLAVE_BASE0x800 @@ -73,9 +75,40 @@ static int fsi_master_read(struct fsi_master *master, int link, uint8_t slave_id, uint32_t addr, void *val, size_t size); static int fsi_master_write(struct fsi_master *master, int link, uint8_t slave_id, uint32_t addr, const void *val, size_t size); +static int fsi_slave_read(struct fsi_slave *slave, uint32_t addr, +void *val, size_t size); +static int fsi_slave_write(struct fsi_slave *slave, uint32_t addr, +const void *val, size_t size); /* FSI endpoint-device support */ +int fsi_device_read(struct fsi_device *dev, uint32_t addr, void *val, +size_t size) +{ +if (addr > dev->size || size > dev->size || addr > dev->size - size) +return -EINVAL; + +return fsi_slave_read(dev->slave, dev->addr + addr, val, size); +} +EXPORT_SYMBOL_GPL(fsi_device_read); + +int fsi_device_write(struct fsi_device *dev, uint32_t addr, const void *val, +size_t size) +{ +if (addr > dev->size || size > dev->size || addr > dev->size - size) +return -EINVAL; + +return fsi_slave_write(dev->slave, dev->addr + addr, val, size); +} +EXPORT_SYMBOL_GPL(fsi_device_write); + +int fsi_device_peek(struct fsi_device *dev, void *val) +{ +uint32_t addr = FSI_PEEK_BASE + ((dev->unit - 2) 
* sizeof(uint32_t)); + +return fsi_slave_read(dev->slave, addr, val, sizeof(uint32_t)); +} + static void fsi_device_release(struct device *_device) { struct fsi_device *device = to_fsi_dev(_device); diff --git a/include/linux/fsi.h b/include/linux/fsi.h index efa55ba..66bce48 100644 --- a/include/linux/fsi.h +++ b/include/linux/fsi.h @@ -27,6 +27,12 @@ struct fsi_device { uint32_tsize; }; +extern int fsi_device_read(struct fsi_device *dev, uint32_t addr, +void *val, size_t size); +extern int fsi_device_write(struct fsi_device *dev, uint32_t addr, +const void *val, size_t size); +extern int fsi_device_peek(struct fsi_device *dev, void *val); + struct fsi_device_id { u8engine_type; u8version; @@ -40,7 +46,6 @@ struct fsi_device_id { #define FSI_DEVICE_VERSIONED(t, v) \ .engine_type = (t), .version = (v), - struct fsi_driver { struct device_driverdrv; const struct fsi_device_id*id_table; I also wrote a driver against this API (an i2c algorithm). Everything looks good and working well. Edward (Eddie) James
Re: [PATCH v4 1/5] dt-bindings: gpu: add bindings for the ARM Mali Midgard GPU
On Tue, May 2, 2017 at 6:23 AM, Guillaume Tucker wrote: > Hi Rob, > > On 28/04/17 20:27, Rob Herring wrote: >> >> On Tue, Apr 25, 2017 at 02:16:16PM +0100, Guillaume Tucker wrote: > > >>> diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt >>> b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt >>> new file mode 100644 >>> index ..547ddeceb498 >>> --- /dev/null >>> +++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt >>> @@ -0,0 +1,82 @@ >>> +ARM Mali Midgard GPU >>> + >>> + >>> +Required properties: >>> + >>> +- compatible : >>> + * Must be one of the following: >>> ++ "arm,mali-t60x" >>> ++ "arm,mali-t62x" >> >> >> Don't use wildcards. > > > Sure, old habits die hard... I'll fix it in patch v5. > >>> ++ "arm,mali-t720" >>> ++ "arm,mali-t760" >>> ++ "arm,mali-t820" >>> ++ "arm,mali-t830" >>> ++ "arm,mali-t860" >>> ++ "arm,mali-t880" >>> + * And, optionally, one of the vendor specific compatible: >> >> >> IMO, these should not be optional. > > > Well, vendor compatible strings are clearly optional for the > Utgard GPU series for which the bindings docs were recently > merged. It seems that whether these should be optional or not, > the documentation should be consistent between at least all > similar types of devices like Midgard and Utgard GPUs. They have > different architectures but from a device tree point of view, > they both have the same kind of SoC-specific integration (clocks, > irqs, regulators...). Clocks should not vary by SoC. There is often variation because clocks get driven by same source or are not s/w controlled, but really there should not be that variation. I noticed Utgard has 2 clocks. So is Midgard really just 1 clock? The DT should have all the clocks listed in the TRMs. > So was this was overlooked in the Utgard case and should it > ideally be fixed there as well as non-optional? Or, is it OK to > keep these optional on a second thought? Probably should be required in the Utgard case as well. Rob
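To make the compatible-string discussion concrete, here is a hedged sketch of a GPU node pairing a vendor-specific compatible with the generic Midgard one. The vendor compatible, addresses, interrupt numbers, and clock phandle are all invented for illustration; only the most-specific-first ordering and the generic "arm,mali-t760" string reflect the binding being discussed:

```dts
gpu@ffa30000 {
	/* most-specific compatible first, generic Midgard one last */
	compatible = "vendor,soc-mali", "arm,mali-t760";
	reg = <0xffa30000 0x10000>;
	interrupts = <0 18 4>, <0 19 4>, <0 20 4>;
	interrupt-names = "job", "mmu", "gpu";
	clocks = <&clk_gpu>;
	clock-names = "core";
};
```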
Re: [PATCH 1/3] drm: fourcc byteorder: drop DRM_FORMAT_BIG_ENDIAN
> > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > > index 995c8f9c69..305bc34be0 100644 > > --- a/include/uapi/drm/drm_fourcc.h > > +++ b/include/uapi/drm/drm_fourcc.h > > @@ -33,8 +33,6 @@ extern "C" { > > #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \ > > ((__u32)(c) << 16) | ((__u32)(d) << 24)) > > > > -#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of > > little endian */ > > - > > Weston references DRM_FORMAT_BIG_ENDIAN thus this patch will lead to > build breakage. > Personally I would leave the symbol, since it's UAPI and document that > should not be used. Fair enough. I can surely add a deprecated comment instead of just dropping it. I'm wondering how it is used though, given that none of the drivers in the kernel actually support this flag ... cheers, Gerd
Re: 4.8.8 kernel trigger OOM killer repeatedly when I have lots of RAM that should be free
On Tue, May 02, 2017 at 09:44:33AM +0200, Michal Hocko wrote: > On Mon 01-05-17 21:12:35, Marc MERLIN wrote: > > Howdy, > > > > Well, sadly, the problem is more or less back is 4.11.0. The system doesn't > > really > > crash but it goes into an infinite loop with > > [34776.826800] BUG: workqueue lockup - pool cpus=6 node=0 flags=0x0 nice=0 > > stuck for 33s! > > More logs: https://pastebin.com/YqE4riw0 > > I am seeing a lot of traces where tasks is waiting for an IO. I do not > see any OOM report there. Why do you believe this is an OOM killer > issue? Good question. This is a followup of the problem I had in 4.8.8 until I got a patch to fix the issue. Then, it used to OOM and later, to pile up I/O tasks like this. Now it doesn't OOM anymore, but tasks still pile up. I temporarily fixed the issue by doing this: gargamel:~# echo 0 > /proc/sys/vm/dirty_ratio gargamel:~# echo 0 > /proc/sys/vm/dirty_background_ratio of course my performance is abysmal now, but I can at least run btrfs scrub without piling up enough IO to deadlock the system. On Tue, May 02, 2017 at 07:44:47PM +0900, Tetsuo Handa wrote: > > Any idea what I should do next? > > Maybe you can try collecting list of all in-flight allocations with backtraces > using kmallocwd patches at > http://lkml.kernel.org/r/1489578541-81526-1-git-send-email-penguin-ker...@i-love.sakura.ne.jp > and > http://lkml.kernel.org/r/201704272019.jeh26057.shfotmljoov...@i-love.sakura.ne.jp > which also tracks mempool allocations. > (Well, the > > - cond_resched(); > + //cond_resched(); > > change in the latter patch would not be preferable.) Thanks. I can give that a shot as soon as my current scrub is done, it may take another 12 to 24H at this rate. 
In the meantime, as explained above, not allowing any dirty VM has worked around the problem (Linus pointed out to me in the original thread that on a lightly loaded 24GB system, even 1 or 2% could still be a lot of memory for requests to pile up in and cause issues in degenerative cases like mine). Now I'm still curious what changed between 4.8.8 + custom patches and 4.11 to cause this. Marc -- "A mouse is a device used to point at the xterm you want to type in" - A.S.R. Microsoft is to operating systems what McDonalds is to gourmet cooking Home page: http://marc.merlins.org/ | PGP 1024R/763BE901
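Setting both ratios to 0 is the bluntest form of this workaround. The byte-granular knobs allow capping dirty memory at a small absolute value instead of a percentage of a 24GB machine, which keeps some write-back batching. A hedged alternative (the values are illustrative, not recommendations):

```sh
# Cap total dirty memory at 64 MiB and start background writeback at 16 MiB.
# Writing the *_bytes knobs automatically zeroes the corresponding *_ratio,
# and vice versa - only one of each pair is in effect at a time.
echo $((64 * 1024 * 1024)) > /proc/sys/vm/dirty_bytes
echo $((16 * 1024 * 1024)) > /proc/sys/vm/dirty_background_bytes
```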
Re: [GIT PULL] PCI fixes for v4.10
On 5/2/2017 6:49 AM, Lukas Wunner wrote: > On Mon, May 01, 2017 at 10:41:20PM -0400, Sinan Kaya wrote: >> On 5/1/2017 9:54 PM, Lukas Wunner wrote: >>> (b) ASPM L1 enabled on boot, but disabled after powering off and back on >>> => I believe Sinan is working on this (+cc). >> >> The decision was made not to touch ASPM registers following hotplug insertion >> unless pcie_aspm.policy=powersave is specified. >> >> The discussion is here: https://lkml.org/lkml/2017/4/17/255 >> >> This was done to maintain existing behavior and not break things. > > Thanks for the reference, I hadn't followed the discussion in April > very closely, but I think the outcome of the discussion is unfortunate. > > As can be seen in Ashok's tests, merely turning slot power off and back > on is sufficient to end up with a setting that draws more power. That > may be equally surprising for users as the issues would be that we seek > to avoid with a "safety-first" ASPM policy. In any case it seems > undesirable. > > I hope this is not the end if it and would like to encourage you to > keep working on this. Perhaps it is too simple to just define a > default policy, and what is really needed is a policy that adjusts > itself dynamically to specific devices or workloads, or that can be > influenced by device drivers. > I think our conclusion was to push PM decision to a userspace utility. ASPM already has a knob in sysfs that can be adjusted at runtime. The name of the file is policy. Same argument as the kernel command line but it is writable. Different policies can be programmed into this field after OS boot. It is really the system power management entity that needs to decide how system should behave. We just need to educate the userspace utility about the presence of ASPM policy variable. 1. OS could still boot with the default parameters. 2. Userspace utility sets the policy variable to powersave 3. User performs hotplug remove + insert 4. ASPM is enabled due to policy change. 
I don't know anything about what this userspace utility is or whether it understands PCIe ASPM at all. As an analogy, we can choose the level of power management in Windows through its power policy: Control Panel -> Power Options -> Change advanced power settings -> PCI Express -> Link State Power Management. We want to have the same level of configuration. > Thanks for your efforts, > > Lukas > -- Sinan Kaya Qualcomm Datacenter Technologies, Inc. as an affiliate of Qualcomm Technologies, Inc. Qualcomm Technologies, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project.
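For reference, the runtime knob being discussed lives at the standard sysfs location for the pcie_aspm module parameter (availability and the list of policies depend on the kernel configuration):

```sh
# Show available policies; the active one is shown in brackets
cat /sys/module/pcie_aspm/parameters/policy
# e.g.: default performance [powersave]

# Select powersave before the hotplug remove + insert
echo powersave > /sys/module/pcie_aspm/parameters/policy
```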
Re: [usb-host] question about pointer dereference before null check
On Mon, 1 May 2017, Gustavo A. R. Silva wrote: > Hello everybody, > > While taking a look into Coverity ID 100828 I ran into the following > piece of code at drivers/usb/host/ehci-sched.c:1096: > > u32 addr; > int think_time; > int hs_transfers; > > addr = dev->ttport << 24; > if (!ehci_is_TDI(ehci) > || (dev->tt->hub != > ehci_to_hcd(ehci)->self.root_hub)) > addr |= dev->tt->hub->devnum << 16; > addr |= epnum << 8; > addr |= dev->devnum; > stream->ps.usecs = HS_USECS_ISO(maxp); > think_time = dev->tt ? dev->tt->think_time : 0; > > The issue here is that dev->tt is being dereferenced before null check. > > I was thinking on placing think_time = dev->tt ? dev->tt->think_time : > 0; just before the _if_ statement. But this doesn't address the > problem of dev->tt actually being NULL. > > While looking into the callers of the function containing this piece > of code (iso_stream_init()) my impression is that dev->tt is not NULL > at the time this function is called and, a very simple patch like the > following can be applied in order to avoid the Coverity issue: > > -think_time = dev->tt ? dev->tt->think_time : 0; > +think_time = dev->tt->think_time; > > But I can't tell for sure, so in case dev->tt is NULL, a good strategy > to properly update the _addr_ variable would be needed. > > What do you think? > > I would really appreciate any comment on this, > Thank you! You are right; udev->tt cannot ever be NULL when this section of code runs. The test should be removed. Alan Stern
[PATCH 0/6] md: Fine-tuning for some function implementations
From: Markus Elfring
Date: Tue, 2 May 2017 16:12:34 +0200

Some update suggestions were taken into account from static source code analysis.

Markus Elfring (6):
  Replace seven seq_printf() calls by seq_putc()
  Replace 17 seq_printf() calls by seq_puts()
  Adjust four function calls together with a variable assignment
  Use seq_puts() in faulty_status()
  Adjust six function calls together with a variable assignment in faulty_status()
  Add some spaces for better code readability

 drivers/md/faulty.c | 48 +++++++++++++++++-----------------
 drivers/md/md.c     | 66 ++++++++++++++++++++++++---------------------
 2 files changed, 64 insertions(+), 50 deletions(-)

--
2.12.2
[PATCH] Revert "KVM: Support vCPU-based gfn->hva cache"
This reverts commit bbd6411513aa8ef3ea02abab61318daf87c1af1e. I've been sitting on this revert for too long and it unfortunately missed 4.11. It's also the reason why I haven't merged ring-based dirty tracking for 4.12. Using kvm_vcpu_memslots in kvm_gfn_to_hva_cache_init and kvm_vcpu_write_guest_offset_cached means that the MSR value can now be used to access SMRAM, simply by making it point to an SMRAM physical address. This is problematic because it lets the guest OS overwrite memory that it shouldn't be able to touch. Cc: sta...@vger.kernel.org Fixes: bbd6411513aa8ef3ea02abab61318daf87c1af1e Signed-off-by: Paolo Bonzini --- arch/x86/kvm/lapic.c | 22 -- arch/x86/kvm/x86.c | 41 + include/linux/kvm_host.h | 16 virt/kvm/kvm_main.c | 34 +- 4 files changed, 58 insertions(+), 55 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index bad6a25067bc..9fa5b8164961 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -529,14 +529,16 @@ int kvm_apic_set_irq(struct kvm_vcpu *vcpu, struct kvm_lapic_irq *irq, static int pv_eoi_put_user(struct kvm_vcpu *vcpu, u8 val) { - return kvm_vcpu_write_guest_cached(vcpu, >arch.pv_eoi.data, , - sizeof(val)); + + return kvm_write_guest_cached(vcpu->kvm, >arch.pv_eoi.data, , + sizeof(val)); } static int pv_eoi_get_user(struct kvm_vcpu *vcpu, u8 *val) { - return kvm_vcpu_read_guest_cached(vcpu, >arch.pv_eoi.data, val, - sizeof(*val)); + + return kvm_read_guest_cached(vcpu->kvm, >arch.pv_eoi.data, val, + sizeof(*val)); } static inline bool pv_eoi_enabled(struct kvm_vcpu *vcpu) @@ -2285,8 +2287,8 @@ void kvm_lapic_sync_from_vapic(struct kvm_vcpu *vcpu) if (!test_bit(KVM_APIC_CHECK_VAPIC, >arch.apic_attention)) return; - if (kvm_vcpu_read_guest_cached(vcpu, >arch.apic->vapic_cache, , - sizeof(u32))) + if (kvm_read_guest_cached(vcpu->kvm, >arch.apic->vapic_cache, , + sizeof(u32))) return; apic_set_tpr(vcpu->arch.apic, data & 0xff); @@ -2338,14 +2340,14 @@ void kvm_lapic_sync_to_vapic(struct kvm_vcpu *vcpu) max_isr 
= 0; data = (tpr & 0xff) | ((max_isr & 0xf0) << 8) | (max_irr << 24); - kvm_vcpu_write_guest_cached(vcpu, >arch.apic->vapic_cache, , - sizeof(u32)); + kvm_write_guest_cached(vcpu->kvm, >arch.apic->vapic_cache, , + sizeof(u32)); } int kvm_lapic_set_vapic_addr(struct kvm_vcpu *vcpu, gpa_t vapic_addr) { if (vapic_addr) { - if (kvm_vcpu_gfn_to_hva_cache_init(vcpu, + if (kvm_gfn_to_hva_cache_init(vcpu->kvm, >arch.apic->vapic_cache, vapic_addr, sizeof(u32))) return -EINVAL; @@ -2439,7 +2441,7 @@ int kvm_lapic_enable_pv_eoi(struct kvm_vcpu *vcpu, u64 data) vcpu->arch.pv_eoi.msr_val = data; if (!pv_eoi_enabled(vcpu)) return 0; - return kvm_vcpu_gfn_to_hva_cache_init(vcpu, >arch.pv_eoi.data, + return kvm_gfn_to_hva_cache_init(vcpu->kvm, >arch.pv_eoi.data, addr, sizeof(u8)); } diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c index 2fe9aa116288..b38a302858a0 100644 --- a/arch/x86/kvm/x86.c +++ b/arch/x86/kvm/x86.c @@ -1785,7 +1785,7 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v) struct kvm_vcpu_arch *vcpu = >arch; struct pvclock_vcpu_time_info guest_hv_clock; - if (unlikely(kvm_vcpu_read_guest_cached(v, >pv_time, + if (unlikely(kvm_read_guest_cached(v->kvm, >pv_time, _hv_clock, sizeof(guest_hv_clock return; @@ -1806,9 +1806,9 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v) BUILD_BUG_ON(offsetof(struct pvclock_vcpu_time_info, version) != 0); vcpu->hv_clock.version = guest_hv_clock.version + 1; - kvm_vcpu_write_guest_cached(v, >pv_time, - >hv_clock, - sizeof(vcpu->hv_clock.version)); + kvm_write_guest_cached(v->kvm, >pv_time, + >hv_clock, + sizeof(vcpu->hv_clock.version)); smp_wmb(); @@ -1822,16 +1822,16 @@ static void kvm_setup_pvclock_page(struct kvm_vcpu *v) trace_kvm_pvclock_update(v->vcpu_id, >hv_clock); - kvm_vcpu_write_guest_cached(v, >pv_time, - >hv_clock, - sizeof(vcpu->hv_clock)); + kvm_write_guest_cached(v->kvm,
[PATCH 1/6] md: Replace seven seq_printf() calls by seq_putc()
From: Markus Elfring
Date: Tue, 2 May 2017 14:01:17 +0200

A few single characters should be put into a sequence. Thus use the corresponding function "seq_putc".

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring
---
 drivers/md/md.c | 16 ++++++++--------
 1 file changed, 8 insertions(+), 8 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 82f798be964f..7f0a35ee192a 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -7498,7 +7498,7 @@ static void status_unused(struct seq_file *seq)
 	if (!i)
 		seq_printf(seq, "<none>");
 
-	seq_printf(seq, "\n");
+	seq_putc(seq, '\n');
 }
 
 static int status_resync(struct seq_file *seq, struct mddev *mddev)
@@ -7552,12 +7552,13 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev)
 	per_milli = res;
 	{
 		int i, x = per_milli/50, y = 20-x;
-		seq_printf(seq, "[");
+
+		seq_putc(seq, '[');
 		for (i = 0; i < x; i++)
-			seq_printf(seq, "=");
-		seq_printf(seq, ">");
+			seq_putc(seq, '=');
+		seq_putc(seq, '>');
 		for (i = 0; i < y; i++)
-			seq_printf(seq, ".");
+			seq_putc(seq, '.');
 		seq_printf(seq, "] ");
 	}
 	seq_printf(seq, " %s =%3u.%u%% (%llu/%llu)",
@@ -7678,7 +7679,7 @@ static int md_seq_show(struct seq_file *seq, void *v)
 			seq_printf(seq, "[%s] ", pers->name);
 		spin_unlock(&pers_lock);
 
-		seq_printf(seq, "\n");
+		seq_putc(seq, '\n');
 		seq->poll_event = atomic_read(&md_event_count);
 		return 0;
 	}
@@ -7754,8 +7755,7 @@ static int md_seq_show(struct seq_file *seq, void *v)
 		seq_printf(seq, "\n       ");
 		bitmap_status(seq, mddev->bitmap);
-
-		seq_printf(seq, "\n");
+		seq_putc(seq, '\n');
 	}
 	spin_unlock(&mddev->lock);
 
 	return 0;
--
2.12.2
[PATCH 2/6] md: Replace 17 seq_printf() calls by seq_puts()
From: Markus Elfring Date: Tue, 2 May 2017 14:22:45 +0200 Strings which did not contain data format specifications should be put into a sequence. Thus use the corresponding function "seq_puts". This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/md/md.c | 38 -- 1 file changed, 20 insertions(+), 18 deletions(-) diff --git a/drivers/md/md.c b/drivers/md/md.c index 7f0a35ee192a..65233a91e5e2 100644 --- a/drivers/md/md.c +++ b/drivers/md/md.c @@ -7487,7 +7487,7 @@ static void status_unused(struct seq_file *seq) int i = 0; struct md_rdev *rdev; - seq_printf(seq, "unused devices: "); + seq_puts(seq, "unused devices: "); list_for_each_entry(rdev, _raid_disks, same_set) { char b[BDEVNAME_SIZE]; @@ -7496,7 +7496,7 @@ static void status_unused(struct seq_file *seq) bdevname(rdev->bdev,b)); } if (!i) - seq_printf(seq, ""); + seq_puts(seq, ""); seq_putc(seq, '\n'); } @@ -7525,13 +7525,13 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) if (resync == 0) { if (mddev->recovery_cp < MaxSector) { - seq_printf(seq, "\tresync=PENDING"); + seq_puts(seq, "\tresync=PENDING"); return 1; } return 0; } if (resync < 3) { - seq_printf(seq, "\tresync=DELAYED"); + seq_puts(seq, "\tresync=DELAYED"); return 1; } @@ -7559,7 +7559,7 @@ static int status_resync(struct seq_file *seq, struct mddev *mddev) seq_putc(seq, '>'); for (i = 0; i < y; i++) seq_putc(seq, '.'); - seq_printf(seq, "] "); + seq_puts(seq, "] "); } seq_printf(seq, " %s =%3u.%u%% (%llu/%llu)", (test_bit(MD_RECOVERY_RESHAPE, >recovery)? @@ -7673,7 +7673,8 @@ static int md_seq_show(struct seq_file *seq, void *v) if (v == (void*)1) { struct md_personality *pers; - seq_printf(seq, "Personalities : "); + + seq_puts(seq, "Personalities : "); spin_lock(_lock); list_for_each_entry(pers, _list, list) seq_printf(seq, "[%s] ", pers->name); @@ -7694,9 +7695,9 @@ static int md_seq_show(struct seq_file *seq, void *v) mddev->pers ? 
"" : "in"); if (mddev->pers) { if (mddev->ro==1) - seq_printf(seq, " (read-only)"); + seq_puts(seq, " (read-only)"); if (mddev->ro==2) - seq_printf(seq, " (auto-read-only)"); + seq_puts(seq, " (auto-read-only)"); seq_printf(seq, " %s", mddev->pers->name); } @@ -7707,17 +7708,17 @@ static int md_seq_show(struct seq_file *seq, void *v) seq_printf(seq, " %s[%d]", bdevname(rdev->bdev,b), rdev->desc_nr); if (test_bit(WriteMostly, >flags)) - seq_printf(seq, "(W)"); + seq_puts(seq, "(W)"); if (test_bit(Journal, >flags)) - seq_printf(seq, "(J)"); + seq_puts(seq, "(J)"); if (test_bit(Faulty, >flags)) { - seq_printf(seq, "(F)"); + seq_puts(seq, "(F)"); continue; } if (rdev->raid_disk < 0) - seq_printf(seq, "(S)"); /* spare */ + seq_puts(seq, "(S)"); /* spare */ if (test_bit(Replacement, >flags)) - seq_printf(seq, "(R)"); + seq_puts(seq, "(R)"); sectors += rdev->sectors; } rcu_read_unlock(); @@ -7742,17 +7743,18 @@ static int md_seq_show(struct seq_file *seq, void *v) seq_printf(seq, " super external:%s", mddev->metadata_type); else - seq_printf(seq, " super non-persistent"); + seq_puts(seq, " super non-persistent"); if (mddev->pers) { mddev->pers->status(seq, mddev); - seq_printf(seq, "\n "); + seq_puts(seq, "\n "); if (mddev->pers->sync_request) { if (status_resync(seq, mddev)) -
[PATCH 3/6] md: Adjust four function calls together with a variable assignment
From: Markus Elfring
Date: Tue, 2 May 2017 15:07:21 +0200

The script "checkpatch.pl" pointed information out like the following.

ERROR: do not use assignment in if condition

Thus fix affected source code places.

Signed-off-by: Markus Elfring
---
 drivers/md/md.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

diff --git a/drivers/md/md.c b/drivers/md/md.c
index 65233a91e5e2..60580095a5de 100644
--- a/drivers/md/md.c
+++ b/drivers/md/md.c
@@ -2153,7 +2153,8 @@ static int bind_rdev_to_array(struct md_rdev *rdev, struct mddev *mddev)
 	rdev->mddev = mddev;
 	pr_debug("md: bind<%s>\n", b);
 
-	if ((err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b)))
+	err = kobject_add(&rdev->kobj, &mddev->kobj, "dev-%s", b);
+	if (err)
 		goto fail;
 
 	ko = &part_to_dev(rdev->bdev->bd_part)->kobj;
@@ -7306,7 +7307,8 @@ static int md_open(struct block_device *bdev, fmode_t mode)
 	}
 	BUG_ON(mddev != bdev->bd_disk->private_data);
 
-	if ((err = mutex_lock_interruptible(&mddev->open_mutex)))
+	err = mutex_lock_interruptible(&mddev->open_mutex);
+	if (err)
 		goto out;
 
 	if (test_bit(MD_CLOSING, &mddev->flags)) {
@@ -8927,10 +8929,12 @@ static int __init md_init(void)
 	if (!md_misc_wq)
 		goto err_misc_wq;
 
-	if ((ret = register_blkdev(MD_MAJOR, "md")) < 0)
+	ret = register_blkdev(MD_MAJOR, "md");
+	if (ret < 0)
 		goto err_md;
 
-	if ((ret = register_blkdev(0, "mdp")) < 0)
+	ret = register_blkdev(0, "mdp");
+	if (ret < 0)
 		goto err_mdp;
 
 	mdp_major = ret;
--
2.12.2
Re: [PATCH 1/3] drm: fourcc byteorder: drop DRM_FORMAT_BIG_ENDIAN
On Tue, 2 May 2017 14:53:43 +0100 Emil Velikov wrote: > Hi Gerd, > > I did not have the change to follow through the discussion. > Pardon if my suggestion have already been discussed. > > On 2 May 2017 at 14:34, Gerd Hoffmann wrote: > > It's unused. > > > > Suggested-by: Daniel Vetter > > Signed-off-by: Gerd Hoffmann > > --- > > include/uapi/drm/drm_fourcc.h | 2 -- > > drivers/gpu/drm/drm_fourcc.c | 3 +-- > > drivers/gpu/drm/drm_framebuffer.c | 2 +- > > 3 files changed, 2 insertions(+), 5 deletions(-) > > > > diff --git a/include/uapi/drm/drm_fourcc.h b/include/uapi/drm/drm_fourcc.h > > index 995c8f9c69..305bc34be0 100644 > > --- a/include/uapi/drm/drm_fourcc.h > > +++ b/include/uapi/drm/drm_fourcc.h > > @@ -33,8 +33,6 @@ extern "C" { > > #define fourcc_code(a, b, c, d) ((__u32)(a) | ((__u32)(b) << 8) | \ > > ((__u32)(c) << 16) | ((__u32)(d) << 24)) > > > > -#define DRM_FORMAT_BIG_ENDIAN (1<<31) /* format is big endian instead of > > little endian */ > > - > > Weston references DRM_FORMAT_BIG_ENDIAN thus this patch will lead to > build breakage. > That is never fun, so please carefully coordinate with the Weston devs > to minimise the fireworks. > > Personally I would leave the symbol, since it's UAPI and document that > should not be used. Not my call, so feel free to ignore. Hi, indeed, weston does have one occurrence of it. I don't think it has actually been needed in practice ever, but removing it will cause a build failure: https://cgit.freedesktop.org/wayland/weston/tree/libweston/gl-renderer.c?id=2.0.0#n1820 Funnily enough, it's only ever used to get rid of the bit, "just in case". I also think that this patch requires more comments than the commit message has at the moment. Removing the definition also removes the possibility to describe a lot of pixel formats, so that should definitely be mentioned. I think it would also be good to have some kind of justified claim that no hardware actually needs the pixel formats this is removing (e.g. RGB565 BE). 
Maybe this was already in the long discussions, but I feel it should be summarized in the commit message. Thanks, pq pgp6YKhPrkfy_.pgp Description: OpenPGP digital signature
Re: [patch 2/2] MM: allow per-cpu vmstat_threshold and vmstat_worker configuration
On Tue, 25 Apr 2017 10:57:19 -0300 Marcelo Tosatti wrote: > The per-CPU vmstat worker is a problem on -RT workloads (because > ideally the CPU is entirely reserved for the -RT app, without > interference). The worker transfers accumulated per-CPU > vmstat counters to global counters. This is a problem for non-RT too. Any task pinned to an isolated CPU that doesn't want to be ever interrupted will be interrupted by the vmstat kworker. > To resolve the problem, create two tunables: > > * Userspace configurable per-CPU vmstat threshold: by default the > VM code calculates the size of the per-CPU vmstat arrays. This > tunable allows userspace to configure the values. > > * Userspace configurable per-CPU vmstat worker: allow disabling > the per-CPU vmstat worker. I have several questions about the tunables: - What does the vmstat_threshold value mean? What are the implications of changing this value? What's the difference in choosing 1, 2, 3 or 500? - If the purpose of having vmstat_threshold is to allow disabling the vmstat kworker, why can't the kernel pick a value automatically? - What are the implications of disabling the vmstat kworker? Will vm stats still be collected someway or will it be completely off for the CPU? Also, shouldn't this patch be split into two? > The patch below contains documentation which describes the tunables > in more detail. 
> > Signed-off-by: Marcelo Tosatti > > --- > Documentation/vm/vmstat_thresholds.txt | 38 + > mm/vmstat.c| 248 > +++-- > 2 files changed, 272 insertions(+), 14 deletions(-) > > Index: linux-2.6-git-disable-vmstat-worker/mm/vmstat.c > === > --- linux-2.6-git-disable-vmstat-worker.orig/mm/vmstat.c 2017-04-25 > 07:39:13.941019853 -0300 > +++ linux-2.6-git-disable-vmstat-worker/mm/vmstat.c 2017-04-25 > 10:44:51.581977296 -0300 > @@ -91,8 +91,17 @@ > EXPORT_SYMBOL(vm_zone_stat); > EXPORT_SYMBOL(vm_node_stat); > > +struct vmstat_uparam { > + atomic_t vmstat_work_enabled; > + atomic_t user_stat_thresh; > +}; > + > +static DEFINE_PER_CPU(struct vmstat_uparam, vmstat_uparam); > + > #ifdef CONFIG_SMP > > +#define MAX_THRESHOLD 125 > + > int calculate_pressure_threshold(struct zone *zone) > { > int threshold; > @@ -110,9 +119,9 @@ > threshold = max(1, (int)(watermark_distance / num_online_cpus())); > > /* > - * Maximum threshold is 125 > + * Maximum threshold is MAX_THRESHOLD == 125 >*/ > - threshold = min(125, threshold); > + threshold = min(MAX_THRESHOLD, threshold); > > return threshold; > } > @@ -188,15 +197,31 @@ > threshold = calculate_normal_threshold(zone); > > for_each_online_cpu(cpu) { > - int pgdat_threshold; > + int pgdat_threshold, ustat_thresh; > + struct vmstat_uparam *vup; > > - per_cpu_ptr(zone->pageset, cpu)->stat_threshold > - = threshold; > + struct per_cpu_nodestat __percpu *pcp; > + struct per_cpu_pageset *p; > + > + p = per_cpu_ptr(zone->pageset, cpu); > + > + vup = _cpu(vmstat_uparam, cpu); > + ustat_thresh = atomic_read(>user_stat_thresh); > + > + if (ustat_thresh) > + p->stat_threshold = ustat_thresh; > + else > + p->stat_threshold = threshold; > + > + pcp = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu); > > /* Base nodestat threshold on the largest populated > zone. 
*/ > - pgdat_threshold = per_cpu_ptr(pgdat->per_cpu_nodestats, cpu)->stat_threshold; > - per_cpu_ptr(pgdat->per_cpu_nodestats, cpu)->stat_threshold > - = max(threshold, pgdat_threshold); > + pgdat_threshold = pcp->stat_threshold; > + if (ustat_thresh) > + pcp->stat_threshold = ustat_thresh; > + else > + pcp->stat_threshold = max(threshold, > + pgdat_threshold); > } > > /* > @@ -226,9 +251,24 @@ > continue; > > threshold = (*calculate_pressure)(zone); > - for_each_online_cpu(cpu) > + for_each_online_cpu(cpu) { > + int t, ustat_thresh; > + struct vmstat_uparam *vup; > + > + vup = &per_cpu(vmstat_uparam, cpu); > + ustat_thresh = atomic_read(&vup->user_stat_thresh); > + t = threshold; > + > + /* > +
[PATCH 4/6] md: Use seq_puts() in faulty_status()
From: Markus Elfring Date: Tue, 2 May 2017 15:20:30 +0200 A string which did not contain a data format specification should be put into a sequence. Thus use the corresponding function "seq_puts". This issue was detected by using the Coccinelle software. Signed-off-by: Markus Elfring --- drivers/md/faulty.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c index b0536cfd8e17..f5536c91be5c 100644 --- a/drivers/md/faulty.c +++ b/drivers/md/faulty.c @@ -253,7 +253,7 @@ static void faulty_status(struct seq_file *seq, struct mddev *mddev) n, conf->period[ReadFixable]); if ((n=atomic_read(&conf->counters[WriteAll])) != 0) - seq_printf(seq, " WriteAll"); + seq_puts(seq, " WriteAll"); seq_printf(seq, " nfaults=%d", conf->nfaults); } -- 2.12.2
[PATCH 5/6] md: Adjust six function calls together with a variable assignment in faulty_status()
From: Markus Elfring Date: Tue, 2 May 2017 15:35:35 +0200 The script "checkpatch.pl" pointed information out like the following. ERROR: do not use assignment in if condition Thus fix the affected source code places. Signed-off-by: Markus Elfring --- drivers/md/faulty.c | 19 --- 1 file changed, 12 insertions(+), 7 deletions(-) diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c index f5536c91be5c..2573009b1265 100644 --- a/drivers/md/faulty.c +++ b/drivers/md/faulty.c @@ -231,28 +231,33 @@ static void faulty_status(struct seq_file *seq, struct mddev *mddev) struct faulty_conf *conf = mddev->private; int n; - if ((n=atomic_read(&conf->counters[WriteTransient])) != 0) + n = atomic_read(&conf->counters[WriteTransient]); + if (n != 0) seq_printf(seq, " WriteTransient=%d(%d)", n, conf->period[WriteTransient]); - if ((n=atomic_read(&conf->counters[ReadTransient])) != 0) + n = atomic_read(&conf->counters[ReadTransient]); + if (n != 0) seq_printf(seq, " ReadTransient=%d(%d)", n, conf->period[ReadTransient]); - if ((n=atomic_read(&conf->counters[WritePersistent])) != 0) + n = atomic_read(&conf->counters[WritePersistent]); + if (n != 0) seq_printf(seq, " WritePersistent=%d(%d)", n, conf->period[WritePersistent]); - if ((n=atomic_read(&conf->counters[ReadPersistent])) != 0) + n = atomic_read(&conf->counters[ReadPersistent]); + if (n != 0) seq_printf(seq, " ReadPersistent=%d(%d)", n, conf->period[ReadPersistent]); - - if ((n=atomic_read(&conf->counters[ReadFixable])) != 0) + n = atomic_read(&conf->counters[ReadFixable]); + if (n != 0) seq_printf(seq, " ReadFixable=%d(%d)", n, conf->period[ReadFixable]); - if ((n=atomic_read(&conf->counters[WriteAll])) != 0) + n = atomic_read(&conf->counters[WriteAll]); + if (n != 0) seq_puts(seq, " WriteAll"); seq_printf(seq, " nfaults=%d", conf->nfaults); -- 2.12.2
Re: [PATCH RFC v2 4/6] proc: support mounting private procfs instances inside same pid namespace
On Thu, Apr 27, 2017 at 12:13 AM, Andy Lutomirski wrote: > On Tue, Apr 25, 2017 at 5:23 AM, Djalal Harouni wrote: [...] >> We have to align procfs and modernize it to have a per mount context >> where at least the mount options do not propagate to all other mounts, >> then maybe we can continue to implement new features. One example is to >> require CAP_SYS_ADMIN in the init user namespace on some /proc/* which are >> not pids and which are not virtualized by design, or CAP_NET_ADMIN >> inside userns on the net bits that are virtualized, etc. >> These mount options won't propagate to previous mounts, and the system >> will continue to be usable. >> >> This patch introduces the new 'limit_pids' mount option as it was also >> suggested by Andy Lutomirski [1]. When this option is passed we >> automatically create a private procfs instance. This is not the default >> behaviour since we do not want to break userspace and we do not want to >> provide different device IDs by default, please see [1] for why. > > I think that calling the option to make a separate instance > "limit_pids" is extremely counterintuitive. Ok. > My strong preference would be to make proc *always* make a separate > instance (unless it's a bind mount) and to make it work. If that > means fudging stat() output, so be it. I also agree, but as said, if we change stat(), userspace won't be able to notice whether these two proc instances are really separated; the device ID is the only indication here. So, > Failing that, let's come up with some coherent way to make this work. > "new_instance" or similar would do. Then make limit_pid cause an > error unless new_instance is also set. This is reasonable and it follows the devpts filesystem: when 'new_instance' was introduced there to gradually switch to private instances, it was later made the default behaviour. We can do the same and improve the situation bit by bit without breaking anything.
I will prepare a new clean version with "newinstance" and "pids=ptraceable" requiring it, this way we don't change anything that is related to current 'hidepid' behaviour. Let me know please if you don't agree. Thank you for the feedback! -- tixxdz
[PATCH 6/6] md: Add some spaces for better code readability
From: Markus Elfring Date: Tue, 2 May 2017 16:00:45 +0200 Use space characters and blank lines at some source code places according to the Linux coding style convention. Signed-off-by: Markus Elfring --- drivers/md/faulty.c | 27 +++ 1 file changed, 15 insertions(+), 12 deletions(-) diff --git a/drivers/md/faulty.c b/drivers/md/faulty.c index 2573009b1265..434c3643c9c3 100644 --- a/drivers/md/faulty.c +++ b/drivers/md/faulty.c @@ -110,19 +110,20 @@ static int check_sector(struct faulty_conf *conf, sector_t start, sector_t end, { /* If we find a ReadFixable sector, we fix it ... */ int i; - for (i=0; i<conf->nfaults; i++) + + for (i = 0; i < conf->nfaults; i++) if (conf->faults[i] >= start && conf->faults[i] < end) { /* found it ... */ switch (conf->modes[i] * 2 + dir) { - case WritePersistent*2+WRITE: return 1; - case ReadPersistent*2+READ: return 1; - case ReadFixable*2+READ: return 1; - case ReadFixable*2+WRITE: + case WritePersistent * 2 + WRITE: return 1; + case ReadPersistent * 2 + READ: return 1; + case ReadFixable * 2 + READ: return 1; + case ReadFixable * 2 + WRITE: conf->modes[i] = NoPersist; return 0; - case AllPersist*2+READ: - case AllPersist*2+WRITE: return 1; + case AllPersist * 2 + READ: + case AllPersist * 2 + WRITE: return 1; default: return 0; } @@ -134,9 +135,10 @@ static void add_sector(struct faulty_conf *conf, sector_t start, int mode) { int i; int n = conf->nfaults; - for (i=0; i<conf->nfaults; i++) + + for (i = 0; i < conf->nfaults; i++) if (conf->faults[i] == start) { - switch(mode) { + switch (mode) { case NoPersist: conf->modes[i] = mode; return; case WritePersistent: if (conf->modes[i] == ReadPersistent || @@ -167,7 +169,7 @@ static void add_sector(struct faulty_conf *conf, sector_t start, int mode) conf->faults[n] = start; conf->modes[n] = mode; if (conf->nfaults == n) - conf->nfaults = n+1; + conf->nfaults = n + 1; } static void faulty_make_request(struct mddev *mddev, struct bio *bio) @@ -278,7 +280,8 @@ static int faulty_reshape(struct mddev
*mddev) conf->nfaults = 0; else if (mode == ClearErrors) { int i; - for (i=0 ; i < Modes ; i++) { + + for (i = 0; i < Modes; i++) { conf->period[i] = 0; atomic_set(&conf->counters[i], 0); } @@ -317,7 +320,7 @@ static int faulty_run(struct mddev *mddev) if (!conf) return -ENOMEM; - for (i=0; i<Modes; i++) { + for (i = 0; i < Modes; i++) { atomic_set(&conf->counters[i], 0); conf->period[i] = 0; } -- 2.12.2
Re: [PATCH] usb: gadget: udc: add null check before pointer dereference
On Mon, 1 May 2017, Gustavo A. R. Silva wrote: > Add null check before dereferencing dev->regs pointer inside > net2280_led_shutdown() function. > > Addresses-Coverity-ID: 101783 > Signed-off-by: Gustavo A. R. Silva > --- > drivers/usb/gadget/udc/net2280.c | 8 +--- > 1 file changed, 5 insertions(+), 3 deletions(-) > > diff --git a/drivers/usb/gadget/udc/net2280.c > b/drivers/usb/gadget/udc/net2280.c > index 3828c2e..1898a4b 100644 > --- a/drivers/usb/gadget/udc/net2280.c > +++ b/drivers/usb/gadget/udc/net2280.c > @@ -3573,7 +3573,11 @@ static void net2280_remove(struct pci_dev *pdev) > BUG_ON(dev->driver); > > /* then clean up the resources we allocated during probe() */ > - net2280_led_shutdown(dev); > + > + if (dev->regs) { > + net2280_led_shutdown(dev); > + iounmap(dev->regs); > + } > if (dev->requests) { > int i; > for (i = 1; i < 5; i++) { > @@ -3588,8 +3592,6 @@ static void net2280_remove(struct pci_dev *pdev) > free_irq(pdev->irq, dev); > if (dev->quirks & PLX_PCIE) > pci_disable_msi(pdev); > - if (dev->regs) > - iounmap(dev->regs); > if (dev->region) > release_mem_region(pci_resource_start(pdev, 0), > pci_resource_len(pdev, 0)); No, you must not move the iounmap() call, because an interrupt could theoretically occur at any time. Either just live with an extra test of dev->regs, or else move the net2280_led_shutdown() call later. Alan Stern
Re: [PATCH] md/bitmap: use i_blocksize()
On Sun, Jan 22, 2017 at 09:50:30AM +0800, Geliang Tang wrote: > On Sat, Jan 21, 2017 at 10:13:07AM -0800, Shaohua Li wrote: > > On Fri, Jan 20, 2017 at 10:29:52PM +0800, Geliang Tang wrote: > > > Since i_blocksize() helper has been defined in fs.h, use it instead > > > of open-coding. > > > > which tree is this patch applied to? I can't find it in Linus's tree > > > > This patch is against the last linux-next tree, next-20170120. It > depends on the patch '2651ed7 fs: add i_blocksize()' which isn't merged > into Linus's tree yet. > > -Geliang Hi Shaohua, The i_blocksize() helper has been merged into Linus's tree already, and this patch is still valid. Could you please reconsider it if this patch can be applied? Thanks. -Geliang
Re: [PATCH 00/14] sched/topology fixes
On Fri, Apr 28, 2017 at 03:53:39PM +0200, Peter Zijlstra wrote: > Also, the following occurred to me: > > sg_span & sg_mask == sg_mask > > Therefore, we don't need to do the whole "sg_span &" business. > > Hmm? > @@ -856,7 +857,7 @@ build_sched_groups(struct sched_domain * > continue; > > group = get_group(i, sdd, &sg); > - cpumask_setall(sched_group_mask(sg)); > + cpumask_copy(sched_group_mask(sg), sched_group_cpus(sg)); > > for_each_cpu(j, span) { > if (get_group(j, sdd, NULL) != group) OK, so this explodes mightily. That code also hurt my brain bad, so I had to fix that a little. The below seems to boot. --- --- a/kernel/sched/fair.c +++ b/kernel/sched/fair.c @@ -7996,7 +7996,7 @@ static int active_load_balance_cpu_stop( static int should_we_balance(struct lb_env *env) { struct sched_group *sg = env->sd->groups; - struct cpumask *sg_cpus, *sg_mask; + struct cpumask *sg_mask; int cpu, balance_cpu = -1; /* @@ -8006,11 +8006,10 @@ static int should_we_balance(struct lb_e if (env->idle == CPU_NEWLY_IDLE) return 1; - sg_cpus = sched_group_cpus(sg); sg_mask = sched_group_mask(sg); /* Try to find first idle cpu */ - for_each_cpu_and(cpu, sg_cpus, env->cpus) { - if (!cpumask_test_cpu(cpu, sg_mask) || !idle_cpu(cpu)) + for_each_cpu_and(cpu, sg_mask, env->cpus) { + if (!idle_cpu(cpu)) continue; balance_cpu = cpu; --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -85,7 +85,8 @@ static int sched_domain_debug_one(struct group->sgc->id, cpumask_pr_args(sched_group_cpus(group))); - if ((sd->flags & SD_OVERLAP) && !cpumask_full(sched_group_mask(group))) { + if ((sd->flags & SD_OVERLAP) && + !cpumask_equal(sched_group_mask(group), sched_group_cpus(group))) { printk(KERN_CONT " mask=%*pbl", cpumask_pr_args(sched_group_mask(group))); } @@ -505,7 +506,7 @@ enum s_alloc { */ int group_balance_cpu(struct sched_group *sg) { - return cpumask_first_and(sched_group_cpus(sg), sched_group_mask(sg)); + return cpumask_first(sched_group_mask(sg)); } @@ -833,23 +834,34 @@
build_overlap_sched_groups(struct sched_ * [*] in other words, the first group of each domain is its child domain. */ -static int get_group(int cpu, struct sd_data *sdd, struct sched_group **sg) +static struct sched_group *get_group(int cpu, struct sd_data *sdd) { struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu); struct sched_domain *child = sd->child; + struct sched_group *sg; if (child) cpu = cpumask_first(sched_domain_span(child)); - if (sg) { - *sg = *per_cpu_ptr(sdd->sg, cpu); - (*sg)->sgc = *per_cpu_ptr(sdd->sgc, cpu); + sg = *per_cpu_ptr(sdd->sg, cpu); + sg->sgc = *per_cpu_ptr(sdd->sgc, cpu); - /* For claim_allocations: */ - atomic_set(&(*sg)->sgc->ref, 1); + /* For claim_allocations: */ + atomic_inc(&sg->ref); + atomic_inc(&sg->sgc->ref); + + if (child) { + cpumask_copy(sched_group_cpus(sg), sched_domain_span(child)); + cpumask_copy(sched_group_mask(sg), sched_group_cpus(sg)); + } else { + cpumask_set_cpu(cpu, sched_group_cpus(sg)); + cpumask_set_cpu(cpu, sched_group_cpus(sg)); } - return cpu; + sg->sgc->capacity = SCHED_CAPACITY_SCALE * cpumask_weight(sched_group_cpus(sg)); + sg->sgc->min_capacity = SCHED_CAPACITY_SCALE; + + return sg; } /* @@ -868,34 +880,20 @@ build_sched_groups(struct sched_domain * struct cpumask *covered; int i; - get_group(cpu, sdd, &sd->groups); - atomic_inc(&sd->groups->ref); - - if (cpu != cpumask_first(span)) - return 0; - lockdep_assert_held(&sched_domains_mutex); covered = sched_domains_tmpmask; cpumask_clear(covered); - for_each_cpu(i, span) { + for_each_cpu_wrap(i, span, cpu) { struct sched_group *sg; - int group, j; if (cpumask_test_cpu(i, covered)) continue; - group = get_group(i, sdd, &sg); - cpumask_setall(sched_group_mask(sg)); + sg = get_group(i, sdd); - for_each_cpu(j, span) { - if (get_group(j, sdd, NULL) != group) - continue; - - cpumask_set_cpu(j, covered); - cpumask_set_cpu(j, sched_group_cpus(sg)); - } + cpumask_or(covered, covered, sched_group_cpus(sg));
[PATCH V2 0/3] try to save some memory for kmem_cache in some cases
kmem_cache is a frequently used data structure in the kernel. While reading the code, I found that we could save some space in a few cases. 1. On a 64bit arch, an int will occupy a whole word if it doesn't sit next to another int. 2. cpu_slab->partial is only used when CONFIG_SLUB_CPU_PARTIAL is set. 3. cpu_partial is only used when CONFIG_SLUB_CPU_PARTIAL is set; wrapping it only saves space on 32bit arches. v2: define some macros to make the code more elegant Wei Yang (3): mm/slub: pack red_left_pad with another int to save a word mm/slub: wrap cpu_slab->partial in CONFIG_SLUB_CPU_PARTIAL mm/slub: wrap kmem_cache->cpu_partial in config CONFIG_SLUB_CPU_PARTIAL include/linux/slub_def.h | 34 ++- mm/slub.c | 85 ++-- 2 files changed, 80 insertions(+), 39 deletions(-) -- 2.11.0
[PATCH V2 1/3] mm/slub: pack red_left_pad with another int to save a word
On a 64bit arch, the struct is 8-byte aligned, so an int will occupy a whole word if it doesn't sit next to another int. This patch packs red_left_pad with reserved to save 8 bytes in struct kmem_cache on a 64bit arch. Signed-off-by: Wei Yang --- include/linux/slub_def.h | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index 07ef550c6627..ec13aab32647 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -79,9 +79,9 @@ struct kmem_cache { int inuse; /* Offset to metadata */ int align; /* Alignment */ int reserved; /* Reserved bytes at the end of slabs */ + int red_left_pad; /* Left redzone padding size */ const char *name; /* Name (only for display!) */ struct list_head list; /* List of slab caches */ - int red_left_pad; /* Left redzone padding size */ #ifdef CONFIG_SYSFS struct kobject kobj; /* For sysfs */ #endif -- 2.11.0
[PATCH V2 3/3] mm/slub: wrap kmem_cache->cpu_partial in config CONFIG_SLUB_CPU_PARTIAL
kmem_cache->cpu_partial is just used when CONFIG_SLUB_CPU_PARTIAL is set, so wrap it with config CONFIG_SLUB_CPU_PARTIAL will save some space on 32bit arch. This patch wrap kmem_cache->cpu_partial in config CONFIG_SLUB_CPU_PARTIAL and wrap its sysfs too. Signed-off-by: Wei Yang --- v2: define slub_cpu_partial() to make code more elegant --- include/linux/slub_def.h | 13 + mm/slub.c| 69 ++-- 2 files changed, 51 insertions(+), 31 deletions(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index f882a34bb9aa..d808e8e6293b 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -86,7 +86,9 @@ struct kmem_cache { int size; /* The size of an object including meta data */ int object_size;/* The size of an object without meta data */ int offset; /* Free pointer offset. */ +#ifdef CONFIG_SLUB_CPU_PARTIAL int cpu_partial;/* Number of per cpu partial objects to keep around */ +#endif struct kmem_cache_order_objects oo; /* Allocation and freeing of slabs */ @@ -130,6 +132,17 @@ struct kmem_cache { struct kmem_cache_node *node[MAX_NUMNODES]; }; +#ifdef CONFIG_SLUB_CPU_PARTIAL +#define slub_cpu_partial(s)((s)->cpu_partial) +#define slub_set_cpu_partial(s, n) \ +({ \ + slub_cpu_partial(s) = (n); \ +}) +#else +#define slub_cpu_partial(s)(0) +#define slub_set_cpu_partial(s, n) +#endif // CONFIG_SLUB_CPU_PARTIAL + #ifdef CONFIG_SYSFS #define SLAB_SUPPORTS_SYSFS void sysfs_slab_release(struct kmem_cache *); diff --git a/mm/slub.c b/mm/slub.c index ae6166533261..795112b65c61 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -1829,7 +1829,7 @@ static void *get_partial_node(struct kmem_cache *s, struct kmem_cache_node *n, stat(s, CPU_PARTIAL_NODE); } if (!kmem_cache_has_cpu_partial(s) - || available > s->cpu_partial / 2) + || available > slub_cpu_partial(s) / 2) break; } @@ -3410,6 +3410,39 @@ static void set_min_partial(struct kmem_cache *s, unsigned long min) s->min_partial = min; } +static void set_cpu_partial(struct kmem_cache *s) +{ +#ifdef 
CONFIG_SLUB_CPU_PARTIAL + /* +* cpu_partial determined the maximum number of objects kept in the +* per cpu partial lists of a processor. +* +* Per cpu partial lists mainly contain slabs that just have one +* object freed. If they are used for allocation then they can be +* filled up again with minimal effort. The slab will never hit the +* per node partial lists and therefore no locking will be required. +* +* This setting also determines +* +* A) The number of objects from per cpu partial slabs dumped to the +*per node list when we reach the limit. +* B) The number of objects in cpu partial slabs to extract from the +*per node list when we run out of per cpu objects. We only fetch +*50% to keep some capacity around for frees. +*/ + if (!kmem_cache_has_cpu_partial(s)) + s->cpu_partial = 0; + else if (s->size >= PAGE_SIZE) + s->cpu_partial = 2; + else if (s->size >= 1024) + s->cpu_partial = 6; + else if (s->size >= 256) + s->cpu_partial = 13; + else + s->cpu_partial = 30; +#endif +} + /* * calculate_sizes() determines the order and the distribution of data within * a slab object. @@ -3568,33 +3601,7 @@ static int kmem_cache_open(struct kmem_cache *s, unsigned long flags) */ set_min_partial(s, ilog2(s->size) / 2); - /* -* cpu_partial determined the maximum number of objects kept in the -* per cpu partial lists of a processor. -* -* Per cpu partial lists mainly contain slabs that just have one -* object freed. If they are used for allocation then they can be -* filled up again with minimal effort. The slab will never hit the -* per node partial lists and therefore no locking will be required. -* -* This setting also determines -* -* A) The number of objects from per cpu partial slabs dumped to the -*per node list when we reach the limit. -* B) The number of objects in cpu partial slabs to extract from the -*per node list when we run out of per cpu objects. We only fetch -*50% to keep some capacity around for frees. 
-*/ - if (!kmem_cache_has_cpu_partial(s)) - s->cpu_partial = 0; - else if (s->size >= PAGE_SIZE) - s->cpu_partial = 2; - else if (s->size >= 1024) -
[PATCH V2 2/3] mm/slub: wrap cpu_slab->partial in CONFIG_SLUB_CPU_PARTIAL
cpu_slab's field partial is used when CONFIG_SLUB_CPU_PARTIAL is set, which means we can save a pointer's space on each cpu for every slub item. This patch wrap cpu_slab->partial in CONFIG_SLUB_CPU_PARTIAL and wrap its sysfs too. Signed-off-by: Wei Yang --- v2: define slub_percpu_partial() to make code more elegant --- include/linux/slub_def.h | 19 +++ mm/slub.c| 16 +--- 2 files changed, 28 insertions(+), 7 deletions(-) diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h index ec13aab32647..f882a34bb9aa 100644 --- a/include/linux/slub_def.h +++ b/include/linux/slub_def.h @@ -41,12 +41,31 @@ struct kmem_cache_cpu { void **freelist;/* Pointer to next available object */ unsigned long tid; /* Globally unique transaction id */ struct page *page; /* The slab from which we are allocating */ +#ifdef CONFIG_SLUB_CPU_PARTIAL struct page *partial; /* Partially allocated frozen slabs */ +#endif #ifdef CONFIG_SLUB_STATS unsigned stat[NR_SLUB_STAT_ITEMS]; #endif }; +#ifdef CONFIG_SLUB_CPU_PARTIAL +#define slub_percpu_partial(c) ((c)->partial) + +#define slub_set_percpu_partial(c, p) \ +({ \ + slub_percpu_partial(c) = (p)->next; \ +}) + +#define slub_percpu_partial_read_once(c) READ_ONCE(slub_percpu_partial(c)) +#else +#define slub_percpu_partial(c) NULL + +#define slub_set_percpu_partial(c, p) + +#define slub_percpu_partial_read_once(c) NULL +#endif // CONFIG_SLUB_CPU_PARTIAL + /* * Word size structure that can be atomically updated or read and that * contains both the order and the number of objects that a slab of the diff --git a/mm/slub.c b/mm/slub.c index 7f4bc7027ed5..ae6166533261 100644 --- a/mm/slub.c +++ b/mm/slub.c @@ -2302,7 +2302,7 @@ static bool has_cpu_slab(int cpu, void *info) struct kmem_cache *s = info; struct kmem_cache_cpu *c = per_cpu_ptr(s->cpu_slab, cpu); - return c->page || c->partial; + return c->page || slub_percpu_partial(c); } static void flush_all(struct kmem_cache *s) @@ -2568,9 +2568,9 @@ static void *___slab_alloc(struct kmem_cache *s, 
gfp_t gfpflags, int node, new_slab: - if (c->partial) { - page = c->page = c->partial; - c->partial = page->next; + if (slub_percpu_partial(c)) { + page = c->page = slub_percpu_partial(c); + slub_set_percpu_partial(c, page); stat(s, CPU_PARTIAL_ALLOC); c->freelist = NULL; goto redo; @@ -4760,7 +4760,7 @@ static ssize_t show_slab_objects(struct kmem_cache *s, total += x; nodes[node] += x; - page = READ_ONCE(c->partial); + page = slub_percpu_partial_read_once(c); if (page) { node = page_to_nid(page); if (flags & SO_TOTAL) @@ -4988,7 +4988,8 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf) int len; for_each_online_cpu(cpu) { - struct page *page = per_cpu_ptr(s->cpu_slab, cpu)->partial; + struct page *page = + slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu)); if (page) { pages += page->pages; @@ -5000,7 +5001,8 @@ static ssize_t slabs_cpu_partial_show(struct kmem_cache *s, char *buf) #ifdef CONFIG_SMP for_each_online_cpu(cpu) { - struct page *page = per_cpu_ptr(s->cpu_slab, cpu) ->partial; + struct page *page = + slub_percpu_partial(per_cpu_ptr(s->cpu_slab, cpu)); if (page && len < PAGE_SIZE - 20) len += sprintf(buf + len, " C%d=%d(%d)", cpu, -- 2.11.0
[PATCH] ARM: imx_v6_v7_defconfig: Enable cpufreq governors
Enable more common cpufreq governors in imx defconfig because this is very useful for testing. In particular you can't use cpufreq-set -f $FREQ without explicitly defining CONFIG_CPU_FREQ_GOV_USERSPACE=y. Signed-off-by: Leonard Crestez --- It might make sense for all governors to be enabled by default from drivers/cpufreq/Kconfig and allow defconfigs to be shorter. Right now the descriptions for some of them include a line that says "If in doubt, say Y" but the config options don't actually have a default value defined and they effectively default to N. Cycling via savedefconfig on shawnguo/for-next also generates some reordering for some newly added options CONFIG_TOUCHSCREEN_MAX11801=y and CONFIG_HID_MULTITOUCH=y. Those were not included but it's strange that this happens. Maybe those options were inserted manually, or otherwise there is an annoying bug in kconfig? arch/arm/configs/imx_v6_v7_defconfig | 3 +++ 1 file changed, 3 insertions(+) diff --git a/arch/arm/configs/imx_v6_v7_defconfig b/arch/arm/configs/imx_v6_v7_defconfig index bb6fa56..bf1e7e3 100644 --- a/arch/arm/configs/imx_v6_v7_defconfig +++ b/arch/arm/configs/imx_v6_v7_defconfig @@ -55,6 +55,9 @@ CONFIG_CMDLINE="noinitrd console=ttymxc0,115200" CONFIG_KEXEC=y CONFIG_CPU_FREQ=y CONFIG_CPU_FREQ_DEFAULT_GOV_ONDEMAND=y +CONFIG_CPU_FREQ_GOV_POWERSAVE=y +CONFIG_CPU_FREQ_GOV_USERSPACE=y +CONFIG_CPU_FREQ_GOV_CONSERVATIVE=y CONFIG_ARM_IMX6Q_CPUFREQ=y CONFIG_CPU_IDLE=y CONFIG_VFP=y -- 2.7.4
Re: [PATCH 2/2] efi/efi_test: drop useless kfree
On Tue, May 02, 2017 at 03:11:45PM +0800, ivanhu wrote: > > > On 04/29/2017 09:42 AM, Geliang Tang wrote: > > Drop useless kfree when memdup_user() failed, since we have already > > called kfree in memdup_user(). > > > > Signed-off-by: Geliang Tang > > --- > > drivers/firmware/efi/test/efi_test.c | 4 +--- > > 1 file changed, 1 insertion(+), 3 deletions(-) > > > > diff --git a/drivers/firmware/efi/test/efi_test.c > > b/drivers/firmware/efi/test/efi_test.c > > index 08129b7..00332ae 100644 > > --- a/drivers/firmware/efi/test/efi_test.c > > +++ b/drivers/firmware/efi/test/efi_test.c > > @@ -261,10 +261,8 @@ static long efi_runtime_set_variable(unsigned long arg) > > } > > data = memdup_user(setvariable.data, setvariable.data_size); > > - if (IS_ERR(data)) { > > - kfree(name); > You mean "name" or "data"? > Sorry, it's my mistake. The original code is correct. Please ignore this patch. -Geliang > Cheers, > Ivan > > > + if (IS_ERR(data)) > > return PTR_ERR(data); > > - } > > status = efi.set_variable(name, &vendor_guid, > > setvariable.attributes,
Re: [PATCH v4 1/5] dt-bindings: gpu: add bindings for the ARM Mali Midgard GPU
On 02/05/17 15:13, Rob Herring wrote: On Tue, May 2, 2017 at 6:23 AM, Guillaume Tucker wrote: Hi Rob, On 28/04/17 20:27, Rob Herring wrote: On Tue, Apr 25, 2017 at 02:16:16PM +0100, Guillaume Tucker wrote: diff --git a/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt new file mode 100644 index ..547ddeceb498 --- /dev/null +++ b/Documentation/devicetree/bindings/gpu/arm,mali-midgard.txt @@ -0,0 +1,82 @@ +ARM Mali Midgard GPU + + +Required properties: + +- compatible : + * Must be one of the following: ++ "arm,mali-t60x" ++ "arm,mali-t62x" Don't use wildcards. Sure, old habits die hard... I'll fix it in patch v5. ++ "arm,mali-t720" ++ "arm,mali-t760" ++ "arm,mali-t820" ++ "arm,mali-t830" ++ "arm,mali-t860" ++ "arm,mali-t880" + * And, optionally, one of the vendor specific compatible: IMO, these should not be optional. Well, vendor compatible strings are clearly optional for the Utgard GPU series for which the bindings docs were recently merged. It seems that whether these should be optional or not, the documentation should be consistent between at least all similar types of devices like Midgard and Utgard GPUs. They have different architectures but from a device tree point of view, they both have the same kind of SoC-specific integration (clocks, irqs, regulators...). Clocks should not vary by SoC. There is often variation because clocks get driven by same source or are not s/w controlled, but really there should not be that variation. I noticed Utgard has 2 clocks. So is Midgard really just 1 clock? The DT should have all the clocks listed in the TRMs. I meant to say that the clock sources are different in each SoC, but yes the same clock input is always needed by the GPU. The TRM is confidential but to the best of my knowledge and based on existing device trees and the out-of-tree kernel driver, the Midgard GPU has only one clock input. 
So was this was overlooked in the Utgard case and should it ideally be fixed there as well as non-optional? Or, is it OK to keep these optional on a second thought? Probably should be required in the Utgard case as well. OK, so I'll make the vendor compatible strings required (for Midgard) in patch v5. Guillaume
Re: [PATCH v2] IB/i40iw: use setup_timer
On Sat, Apr 29, 2017 at 09:45:27AM -0400, Doug Ledford wrote: > On 4/28/2017 9:37 PM, Geliang Tang wrote: > > Use setup_timer() instead of init_timer() to simplify the code. > > > > Signed-off-by: Geliang Tang > > --- > > Changes in v2: > > - use setup_timer() in i40iw_terminate_start_timer(). > > I already applied your previous patch and did the fixup to add > i40iw_terminate_start_timer() into the patch myself. Thanks. > > > -- > Doug Ledford > GPG Key ID: B826A3330E572FDD > Key fingerprint = AE6B 1BDA 122B 23B4 265B 1274 B826 A333 0E57 2FDD > Thank you, Doug. -Geliang
Re: [PATCH] vmscan: scan pages until it founds eligible pages
Hi Michal, On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote: > On Tue 02-05-17 14:14:52, Minchan Kim wrote: > > Oops, forgot to add lkml and linux-mm. > > Sorry for that. > > Send it again. > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001 > > From: Minchan Kim > > Date: Tue, 2 May 2017 12:34:05 +0900 > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages > > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote: > > There are premature OOM happening. Although there are a ton of free > > swap and anonymous LRU list of elgible zones, OOM happened. > > > > With investigation, skipping page of isolate_lru_pages makes reclaim > > void because it returns zero nr_taken easily so LRU shrinking is > > effectively nothing and just increases priority aggressively. > > Finally, OOM happens. > > I am not really sure I understand the problem you are facing. Could you > be more specific please? What is your configuration etc... Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured movablecore=1G to simulate highmem zone. Workload is a process consumes 2.2G memory and then random touch the address space so it makes lots of swap in/out. > > > balloon invoked oom-killer: > > gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), > > nodemask=(null), order=0, oom_score_adj=0 > [...] > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB > > inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB > > mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB > > unstable:0kB all_unreclaimable? 
no > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB > > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB > > writepending:0kB present:15992kB managed:15908kB mlocked:0kB > > slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB > > pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > > lowmem_reserve[]: 0 992 992 1952 > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB > > inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB > > writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB > > slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB > > pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB > > lowmem_reserve[]: 0 0 0 959 > > Hmm DMA32 has sufficient free memory to allow this order-0 request. > Inactive anon lru is basically empty. Why do not we rotate a really > large active anon list? Isn't this the primary problem? It's a side effect of the page-skipping logic in isolate_lru_pages that I mentioned above in the changelog. The problem is that there is a lot of anonymous memory in the movable zone (i.e., highmem) and a non-trivial amount of memory in the DMA32 zone. Under heavy memory pressure, requesting a page with GFP_KERNEL triggers reclaim. The VM knows the inactive list is low, so it tries to deactivate pages. To do that, it first tries to isolate pages from the active list, but there are lots of anonymous pages from the movable zone, so the skipping logic in isolate_lru_pages kicks in. As a result, isolate_lru_pages cannot isolate any eligible pages, so the reclaim attempt is effectively void, and the system keeps hitting OOM. I'm on a long vacation from today, so please understand if my response is slow.
Re: [PATCH 06/14] sched/topology,debug: Verify the first group matches the child domain
On Mon, May 01, 2017 at 05:13:26PM -0400, Rik van Riel wrote: > On Fri, 2017-04-28 at 15:20 +0200, Peter Zijlstra wrote: > > Signed-off-by: Peter Zijlstra (Intel) > > This could use a changelog. Yes indeed... I put off writing one because $hard, and clearly I forgot entirely :-/ How's this? --- Subject: sched/topology,debug: Verify the first group matches the child domain From: Peter Zijlstra Date: Fri Apr 14 18:20:48 CEST 2017 We want sched_groups to be sibling child domains (or individual CPUs when there are no child domains). Furthermore, since the first group of a domain should include the CPU of that domain, the first group of each domain should match the child domain. Verify this is indeed so. Signed-off-by: Peter Zijlstra (Intel) --- kernel/sched/topology.c | 6 ++++++ 1 file changed, 6 insertions(+) --- a/kernel/sched/topology.c +++ b/kernel/sched/topology.c @@ -93,6 +93,12 @@ static int sched_domain_debug_one(struct group->sgc->capacity); } + if (group == sd->groups && sd->child && + !cpumask_equal(sched_domain_span(sd->child), + sched_group_cpus(group))) { + printk(KERN_ERR "ERROR: domain->groups does not match domain->child\n"); + } + group = group->next; if (group != sd->groups)
Re: [PATCH v2 1/9] dt-bindings: display: sun4i: Add component endpoint ID numbering scheme
Hi, On Fri, Apr 28, 2017 at 08:48:41AM -0500, Rob Herring wrote: > On Fri, Apr 21, 2017 at 04:38:49PM +0800, Chen-Yu Tsai wrote: > > The Allwinner display pipeline contains many hardware components, some > > of which can consume data from one of multiple upstream components. > > The numbering scheme of these components must be encoded into the device > > tree so the driver can figure out which component out of two or more of > > the same type it is supposed to use or program. > > > > This patch adds the constraint that local endpoint IDs must be the index > > or number of the remote endpoint's hardware block, for all components > > in the display pipeline up to the TCONs. > > > > Signed-off-by: Chen-Yu Tsai > > --- > > Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt | 10 > > ++ > > 1 file changed, 10 insertions(+) > > > > diff --git a/Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt > > b/Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt > > index 57a8d0610062..7acdbf14ae1c 100644 > > --- a/Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt > > +++ b/Documentation/devicetree/bindings/display/sunxi/sun4i-drm.txt > > @@ -4,6 +4,16 @@ Allwinner A10 Display Pipeline > > The Allwinner A10 Display pipeline is composed of several components > > that are going to be documented below: > > > > +For the input port of all components up to the TCON in the display > > +pipeline, if there are multiple components, the local endpoint IDs > > +must correspond to the index of the upstream block. For example, if > > +the remote endpoint is Frontend 1, then the local endpoint ID must > > +be 1. > > + > > +Conversely, for the output ports of the same group, the remote endpoint > > +ID must be the index of the local hardware block. If the local backend > > +is backend 1, then the remote endpoint ID must be 1. > > It would be clearer if you just explicitly listed IDs and their > connections. 
From how this is worded, it would not work if you had > connections like this: > > DevA 0 > DevA 1 > DevB 0 > DevB 1 > > These would need to be endpoints 0-3 in TCON, and that doesn't reflect > the remote devices' index. Chen-Yu, can you send a patch to rephrase the doc that way? Thanks, Maxime -- Maxime Ripard, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com
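As an illustration of the numbering rule under discussion, a hypothetical device-tree fragment (all node and label names below are made up, not taken from an actual Allwinner dtsi):

```
/* Hypothetical fragment: TCON0's input port, fed by backend 1.  The
 * TCON's local endpoint uses ID 1 because the remote hardware block is
 * backend *1*; on the backend's output port, the remote endpoint ID
 * would likewise be the index of the local backend. */
tcon0: lcd-controller {
	ports {
		tcon0_in: port@0 {
			reg = <0>;
			tcon0_in_be1: endpoint@1 {
				reg = <1>;	/* remote block is backend 1 */
				remote-endpoint = <&be1_out_tcon0>;
			};
		};
	};
};
```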
Re: [PATCH 00/14] sched/topology fixes
On 05/02/2017 11:43 AM, Peter Zijlstra wrote: > On Fri, Apr 28, 2017 at 03:53:39PM +0200, Peter Zijlstra wrote: >> Also, the following occurred to me: >> >> sg_span & sg_mask == sg_mask >> >> Therefore, we don't need to do the whole "sg_span &" business. >> >> Hmm? >> @@ -856,7 +857,7 @@ build_sched_groups(struct sched_domain * >> continue; >> >> group = get_group(i, sdd, &sg); >> -cpumask_setall(sched_group_mask(sg)); >> +cpumask_copy(sched_group_mask(sg), sched_group_cpus(sg)); >> >> for_each_cpu(j, span) { >> if (get_group(j, sdd, NULL) != group) > OK, so this explodes mightily. > > That code also hurt my brain bad, so I had to fix that a little. > > The below seems to boot. > > --- > --- a/kernel/sched/fair.c > +++ b/kernel/sched/fair.c > @@ -7996,7 +7996,7 @@ static int active_load_balance_cpu_stop( > static int should_we_balance(struct lb_env *env) > { > struct sched_group *sg = env->sd->groups; > - struct cpumask *sg_cpus, *sg_mask; > + struct cpumask *sg_mask; > int cpu, balance_cpu = -1; > > /* > @@ -8006,11 +8006,10 @@ static int should_we_balance(struct lb_e > if (env->idle == CPU_NEWLY_IDLE) > return 1; > > - sg_cpus = sched_group_cpus(sg); > sg_mask = sched_group_mask(sg); > /* Try to find first idle cpu */ > - for_each_cpu_and(cpu, sg_cpus, env->cpus) { > - if (!cpumask_test_cpu(cpu, sg_mask) || !idle_cpu(cpu)) > + for_each_cpu_and(cpu, sg_mask, env->cpus) { > + if (!idle_cpu(cpu)) > continue; > > balance_cpu = cpu; > --- a/kernel/sched/topology.c > +++ b/kernel/sched/topology.c > @@ -85,7 +85,8 @@ static int sched_domain_debug_one(struct > group->sgc->id, > cpumask_pr_args(sched_group_cpus(group))); > > - if ((sd->flags & SD_OVERLAP) && > !cpumask_full(sched_group_mask(group))) { > + if ((sd->flags & SD_OVERLAP) && > + !cpumask_equal(sched_group_mask(group), > sched_group_cpus(group))) { > printk(KERN_CONT " mask=%*pbl", > cpumask_pr_args(sched_group_mask(group))); > } > @@ -505,7 +506,7 @@ enum s_alloc { > */ > int group_balance_cpu(struct sched_group
*sg) > { > - return cpumask_first_and(sched_group_cpus(sg), sched_group_mask(sg)); > + return cpumask_first(sched_group_mask(sg)); > } > > > @@ -833,23 +834,34 @@ build_overlap_sched_groups(struct sched_ > * [*] in other words, the first group of each domain is its child domain. > */ > > -static int get_group(int cpu, struct sd_data *sdd, struct sched_group **sg) > +static struct sched_group *get_group(int cpu, struct sd_data *sdd) > { > struct sched_domain *sd = *per_cpu_ptr(sdd->sd, cpu); > struct sched_domain *child = sd->child; > + struct sched_group *sg; > > if (child) > cpu = cpumask_first(sched_domain_span(child)); > > - if (sg) { > - *sg = *per_cpu_ptr(sdd->sg, cpu); > - (*sg)->sgc = *per_cpu_ptr(sdd->sgc, cpu); > + sg = *per_cpu_ptr(sdd->sg, cpu); > + sg->sgc = *per_cpu_ptr(sdd->sgc, cpu); > > - /* For claim_allocations: */ > - atomic_set(&(*sg)->sgc->ref, 1); > + /* For claim_allocations: */ > + atomic_inc(&sg->ref); > + atomic_inc(&sg->sgc->ref); > + > + if (child) { > + cpumask_copy(sched_group_cpus(sg), sched_domain_span(child)); > + cpumask_copy(sched_group_mask(sg), sched_group_cpus(sg)); > + } else { > + cpumask_set_cpu(cpu, sched_group_cpus(sg)); > + cpumask_set_cpu(cpu, sched_group_cpus(sg)); Typo here. The mask is not being set in the else clause.
> } > > - return cpu; > + sg->sgc->capacity = SCHED_CAPACITY_SCALE * > cpumask_weight(sched_group_cpus(sg)); > + sg->sgc->min_capacity = SCHED_CAPACITY_SCALE; > + > + return sg; > } > > /* > @@ -868,34 +880,20 @@ build_sched_groups(struct sched_domain * > struct cpumask *covered; > int i; > > - get_group(cpu, sdd, &sd->groups); > - atomic_inc(&sd->groups->ref); > - > - if (cpu != cpumask_first(span)) > - return 0; > - > lockdep_assert_held(&sched_domains_mutex); > covered = sched_domains_tmpmask; > > cpumask_clear(covered); > > - for_each_cpu(i, span) { > + for_each_cpu_wrap(i, span, cpu) { > struct sched_group *sg; > - int group, j; > > if (cpumask_test_cpu(i, covered)) > continue; > > - group = get_group(i, sdd, &sg); > - cpumask_setall(sched_group_mask(sg)); > + sg = get_group(i, sdd); > > - for_each_cpu(j, span) { > - if (get_group(j, sdd, NULL) != group) > -
Re: [PATCH 1/4] PCI: add resizeable BAR infrastructure v4
On 26.04.2017 at 18:45, Andy Shevchenko wrote: [SNIP] -#define PCI_REBAR_CTRL_NBAR_MASK (7 << 5) /* mask for # bars */ -#define PCI_REBAR_CTRL_NBAR_SHIFT 5 /* shift for # bars */ +#define PCI_REBAR_CTRL_NBAR_MASK (7 << 5) /* mask for # BARs */ +#define PCI_REBAR_CTRL_NBAR_SHIFT 5 /* shift for # BARs */ I understand why, but I dunno it worth to do. Bjorn suggested that. Either way is fine with me, but he needs to stick his Signed-off-by on it when pushing it upstream. So his opinion usually wins. All other comments will be integrated in the next version of the patch. Thanks for the review, Christian.
Re: [PATCH v2 1/2] mm: Uncharge poisoned pages
On 28/04/2017 15:48, Michal Hocko wrote: > On Fri 28-04-17 11:17:34, Laurent Dufour wrote: >> On 28/04/2017 09:31, Michal Hocko wrote: >>> [CC Johannes and Vladimir - the patch is >>> http://lkml.kernel.org/r/1493130472-22843-2-git-send-email-lduf...@linux.vnet.ibm.com] >>> >>> On Fri 28-04-17 08:07:55, Michal Hocko wrote: On Thu 27-04-17 13:51:23, Andi Kleen wrote: > Michal Hocko writes: > >> On Tue 25-04-17 16:27:51, Laurent Dufour wrote: >>> When page are poisoned, they should be uncharged from the root memory >>> cgroup. >>> >>> This is required to avoid a BUG raised when the page is onlined back: >>> BUG: Bad page state in process mem-on-off-test pfn:7ae3b >>> page:f1eb8ec0 count:0 mapcount:0 mapping: (null) >>> index:0x1 >>> flags: 0x380020(hwpoison) >> >> My knowledge of memory poisoning is very rudimentary but aren't those >> pages supposed to leak and never come back? In other words isn't the >> hoplug code broken because it should leave them alone? > > Yes that would be the right interpretation. If it was really offlined > due to a hardware error the memory will be poisoned and any access > could cause a machine check. OK, thanks for the clarification. Then I am not sure the patch is correct. Why do we need to uncharge that page at all? >>> >>> Now, I have realized that we actually want to uncharge that page because >>> it will pin the memcg and we do not want to have that memcg and its >>> whole hierarchy pinned as well. This used to work before the charge >>> rework 0a31bc97c80c ("mm: memcontrol: rewrite uncharge API") I guess >>> because we used to uncharge on page cache removal. >>> >>> I do not think the patch is correct, though. memcg_kmem_enabled() will >>> check whether kmem accounting is enabled and we are talking about page >>> cache pages here. You should be using mem_cgroup_uncharge instead. >> >> Thanks for the review Michal. >> >> I was not comfortable either with this patch. 
>> >> I did some tests calling mem_cgroup_uncharge() when isolate_lru_page() >> succeeds only, so not calling it if isolate_lru_page() failed. > > Wait a moment. This cannot possibly work. isolate_lru_page assumes page > count > 0 and increments the counter so the resulting page count is > 1 > I have only now realized that we have VM_BUG_ON_PAGE(page_count(page), page) > in uncharge_list(). My mistake, my kernel was not built with CONFIG_DEBUG_VM set. You're right, this cannot work this way. > This is getting quite hairy. What is the expected page count of the > hwpoison page? I guess we would need to update the VM_BUG_ON in the > memcg uncharge code to ignore the page count of hwpoison pages if it can > be arbitrary. Based on the experiment I did, the page count == 2 when isolate_lru_page() succeeds, even in the case of a poisoned page. In my case I think this is because the page is still used by the process which is calling madvise(). I'm wondering if I'm looking at the right place. Maybe the poisoned page should remain attached to the memory cgroup until no one is using it. In that case this means that something should be done when the page is off-lined... I have to dig further here. > > Before we go any further, is there any documentation about the expected > behavior and the state of the hwpoison pages? I have a very bad feeling > that the current behavior is quite arbitrary and "testing driven" > holes plugging will make it only more messy. So let's start with the > clear description of what should happen with the hwpoison pages. I didn't find any documentation about that. The root cause is that a bug message is displayed when a poisoned page is off-lined; maybe it is in that path that something is missing. Cheers, Laurent.
[PATCH] input: edt-ft5x06: increase allowed data range for threshold parameter
The datasheet and application note do not mention an allowed range for the M09_REGISTER_THRESHOLD parameter. One of our customers needs to set lower values than 20 and they seem to work just fine on EDT EP0xx0M09 with T5x06 touch. So, lacking a known lower limit, we increase the range for thresholds, and set the lower limit to 0. The documentation is updated accordingly. Signed-off-by: Schoefegger Stefan Signed-off-by: Manfred Schlaegl Signed-off-by: Martin Kepplinger --- Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt | 2 +- Documentation/input/devices/edt-ft5x06.rst | 2 +- drivers/input/touchscreen/edt-ft5x06.c | 2 +- 3 files changed, 3 insertions(+), 3 deletions(-) diff --git a/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt b/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt index 6db2210..025cf8c 100644 --- a/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt +++ b/Documentation/devicetree/bindings/input/touchscreen/edt-ft5x06.txt @@ -36,7 +36,7 @@ Optional properties: control gpios - threshold: allows setting the "click"-threshold in the range -from 20 to 80. +from 0 to 80. - gain:allows setting the sensitivity in the range from 0 to 31. Note that lower values indicate higher
Note that diff --git a/drivers/input/touchscreen/edt-ft5x06.c b/drivers/input/touchscreen/edt-ft5x06.c index 8cf8d8d..f872817 100644 --- a/drivers/input/touchscreen/edt-ft5x06.c +++ b/drivers/input/touchscreen/edt-ft5x06.c @@ -471,7 +471,7 @@ static EDT_ATTR(gain, S_IWUSR | S_IRUGO, WORK_REGISTER_GAIN, static EDT_ATTR(offset, S_IWUSR | S_IRUGO, WORK_REGISTER_OFFSET, M09_REGISTER_OFFSET, 0, 31); static EDT_ATTR(threshold, S_IWUSR | S_IRUGO, WORK_REGISTER_THRESHOLD, - M09_REGISTER_THRESHOLD, 20, 80); + M09_REGISTER_THRESHOLD, 0, 80); static EDT_ATTR(report_rate, S_IWUSR | S_IRUGO, WORK_REGISTER_REPORT_RATE, NO_REGISTER, 3, 14); -- 2.1.4
Re: [PATCH 1/3] drm: fourcc byteorder: drop DRM_FORMAT_BIG_ENDIAN
Hi, > I also think that this patch requires more comments than the > commit message has at the moment. > > Removing the definition also removes the possibility to describe a lot > of pixel formats, so that should definitely be mentioned. I think it > would also be good to have some kind of justified claim that no > hardware actually needs the pixel formats this is removing (e.g. RGB565 > BE). That and RGB2101010 BE are the only candidates I can think of. Dealing with those in non-native byte order is a PITA though. Display hardware has been little endian (PCI byte order) for quite a while already. > Maybe this was already in the long discussions, but I feel it > should be summarized in the commit message. Radeon and nvidia (nv40) cards were mentioned. I'll try to summarize (feel free to correct me if I'm wrong). nvidia has support for 8 bit-per-color formats only on bigendian hosts. Not sure whether this is a driver or hardware limitation. radeon behaves differently on pre-R600 and R600+ hardware. pre-R600 can byteswap on cpu access, so the cpu view is bigendian whereas things are actually stored in little-endian byte order. Hardware supports both 16bit and 32bit swaps. Used for fbdev emulation (probably 32bit swaps, but not fully sure). The xorg radeon driver doesn't use the byteswapping feature, because it is a PITA when bo's are moved between vram and system memory. R600+ supports bigendian framebuffer formats, so no byteswapping on access is needed. Not sure whether that includes 16bpp formats or whether this is limited to the 8 bit-per-color formats (similar to nvidia). Discussion focused on the pre-R600 hardware and how this on-cpu-access byteswapping is more a problem than it helps ... cheers, Gerd
Re: [PATCH 2/7] perf config: Check list empty before showing configs
Em Wed, Apr 26, 2017 at 09:21:03PM +0900, Taeung Song escreveu: > If existent config files contains nothing, > the sections list in config_set can be empty. > > So check not only NULL pointer of config_set but > also the list in config_set. > +++ b/tools/perf/builtin-config.c > @@ -75,7 +75,7 @@ static int show_spec_config(struct perf_config_set *set, > const char *var) > struct perf_config_section *section; > struct perf_config_item *item; > > - if (set == NULL) > + if (set == NULL || list_empty(&set->sections)) > return -1; But should we consider an error to have an empty config file? I don't think so :-\ - Arnaldo > > perf_config_items__for_each_entry(&set->sections, section) { > @@ -105,7 +105,7 @@ static int show_config(struct perf_config_set *set) > struct perf_config_section *section; > struct perf_config_item *item; > > - if (set == NULL) > + if (set == NULL || list_empty(&set->sections)) > return -1; > > perf_config_set__for_each_entry(set, section, item) { > diff --git a/tools/perf/util/config.c b/tools/perf/util/config.c > index 8d724f0..492c862 100644 > --- a/tools/perf/util/config.c > +++ b/tools/perf/util/config.c > @@ -707,7 +707,7 @@ int perf_config(config_fn_t fn, void *data) > struct perf_config_section *section; > struct perf_config_item *item; > > - if (config_set == NULL) > + if (config_set == NULL || list_empty(&config_set->sections)) > return -1; > > perf_config_set__for_each_entry(config_set, section, item) { > -- > 2.7.4
Re: [PATCH] vmscan: scan pages until it founds eligible pages
On Tue 02-05-17 23:51:50, Minchan Kim wrote: > Hi Michal, > > On Tue, May 02, 2017 at 09:54:32AM +0200, Michal Hocko wrote: > > On Tue 02-05-17 14:14:52, Minchan Kim wrote: > > > Oops, forgot to add lkml and linux-mm. > > > Sorry for that. > > > Send it again. > > > > > > >From 8ddf1c8aa15baf085bc6e8c62ce705459d57ea4c Mon Sep 17 00:00:00 2001 > > > From: Minchan Kim > > > Date: Tue, 2 May 2017 12:34:05 +0900 > > > Subject: [PATCH] vmscan: scan pages until it founds eligible pages > > > > > > On Tue, May 02, 2017 at 01:40:38PM +0900, Minchan Kim wrote: > > > There are premature OOM happening. Although there are a ton of free > > > swap and anonymous LRU list of elgible zones, OOM happened. > > > > > > With investigation, skipping page of isolate_lru_pages makes reclaim > > > void because it returns zero nr_taken easily so LRU shrinking is > > > effectively nothing and just increases priority aggressively. > > > Finally, OOM happens. > > > > I am not really sure I understand the problem you are facing. Could you > > be more specific please? What is your configuration etc... > > Sure, KVM guest on x86_64, It has 2G memory and 1G swap and configured > movablecore=1G to simulate highmem zone. > Workload is a process consumes 2.2G memory and then random touch the > address space so it makes lots of swap in/out. > > > > > > balloon invoked oom-killer: > > > gfp_mask=0x17080c0(GFP_KERNEL_ACCOUNT|__GFP_ZERO|__GFP_NOTRACK), > > > nodemask=(null), order=0, oom_score_adj=0 > > [...] > > > Node 0 active_anon:1698864kB inactive_anon:261256kB active_file:208kB > > > inactive_file:184kB unevictable:0kB isolated(anon):0kB isolated(file):0kB > > > mapped:532kB dirty:108kB writeback:0kB shmem:172kB writeback_tmp:0kB > > > unstable:0kB all_unreclaimable? 
no > > > DMA free:7316kB min:32kB low:44kB high:56kB active_anon:8064kB > > > inactive_anon:0kB active_file:0kB inactive_file:0kB unevictable:0kB > > > writepending:0kB present:15992kB managed:15908kB mlocked:0kB > > > slab_reclaimable:464kB slab_unreclaimable:40kB kernel_stack:0kB > > > pagetables:24kB bounce:0kB free_pcp:0kB local_pcp:0kB free_cma:0kB > > > lowmem_reserve[]: 0 992 992 1952 > > > DMA32 free:9088kB min:2048kB low:3064kB high:4080kB active_anon:952176kB > > > inactive_anon:0kB active_file:36kB inactive_file:0kB unevictable:0kB > > > writepending:88kB present:1032192kB managed:1019388kB mlocked:0kB > > > slab_reclaimable:13532kB slab_unreclaimable:16460kB kernel_stack:3552kB > > > pagetables:6672kB bounce:0kB free_pcp:56kB local_pcp:24kB free_cma:0kB > > > lowmem_reserve[]: 0 0 0 959 > > > > Hmm DMA32 has sufficient free memory to allow this order-0 request. > > Inactive anon lru is basically empty. Why do not we rotate a really > > large active anon list? Isn't this the primary problem? > > It's a side effect by skipping page logic in isolate_lru_pages > I mentioned above in changelog. > > The problem is a lot of anonymous memory in movable zone(ie, highmem) > and non-small memory in DMA32 zone. Such a configuration is questionable on its own. But let's keep this part alone. > In heavy memory pressure, > requesting a page in GFP_KERNEL triggers reclaim. VM knows inactive list > is low so it tries to deactivate pages. For it, first of all, it tries > to isolate pages from active list but there are lots of anonymous pages > from movable zone so skipping logic in isolate_lru_pages works. With > the result, isolate_lru_pages cannot isolate any eligible pages so > reclaim trial is effectively void. It continues to meet OOM. But skipped pages should be rotated and we should eventually hit pages from the right zone(s). Moreover we should scan the full LRU at priority 0, so why exactly do we hit the OOM killer? Anyway [1] has changed this behavior.
Are you seeing the issue with this patch dropped? [1] http://www.ozlabs.org/~akpm/mmotm/broken-out/revert-mm-vmscan-account-for-skipped-pages-as-a-partial-scan.patch -- Michal Hocko SUSE Labs
Re: [PATCH] Allow to use DMA_CTRL_REUSE flag for all channel types
Hi Vinod, On Mon, 2017-05-01 at 11:21 +0530, Vinod Koul wrote: > On Fri, Apr 28, 2017 at 04:37:46PM +0300, Eugeniy Paltsev wrote: > > In the current implementation dma_get_slave_caps is used to check > > state of descriptor_reuse option. But dma_get_slave_caps includes > > check if the channel supports slave transactions. > > So DMA_CTRL_REUSE flag can be set (even for MEM-TO-MEM tranfers) > > only if channel supports slave transactions. > > > > Now we can use DMA_CTRL_REUSE flag for all channel types. > > Also it allows to test reusing mechanism with simply mem-to-mem dma > > test. > > We do not want to allow that actually. Slave is always treated as a > special > case, so resue was allowed. > > With memcpy the assumptions are different and clients can do reuse. Could you please clarify why we don't want to allow DMA_CTRL_REUSE for mem-to-mem transfers? Reusing mem-to-mem (MEMCPY and DMA_SG) descriptors will work fine on virt-dma based drivers. Anyway, the current implementation's behaviour is quite strange: if a channel supports *slave* transfers, DMA_CTRL_REUSE can be set for both slave and *mem-to-mem* transfers. And, of course, we can pass the DMA_CTRL_REUSE flag to device_prep_dma_sg or device_prep_dma_memcpy directly, without any checks.
> > > > Signed-off-by: Eugeniy Paltsev > > --- > > include/linux/dmaengine.h | 6 +- > > 1 file changed, 1 insertion(+), 5 deletions(-) > > > > diff --git a/include/linux/dmaengine.h b/include/linux/dmaengine.h > > index 5336808..92cf8b0 100644 > > --- a/include/linux/dmaengine.h > > +++ b/include/linux/dmaengine.h > > @@ -1376,11 +1376,7 @@ static inline int dma_get_slave_caps(struct > > dma_chan *chan, > > > > static inline int dmaengine_desc_set_reuse(struct > > dma_async_tx_descriptor *tx) > > { > > - struct dma_slave_caps caps; > > - > > - dma_get_slave_caps(tx->chan, &caps); > > - > > - if (caps.descriptor_reuse) { > > + if (tx->chan->device->descriptor_reuse) { > > tx->flags |= DMA_CTRL_REUSE; > > return 0; > > } else { > > -- > > 2.9.3 > > > > -- Eugeniy Paltsev
Re: [PATCH] Remove ARM errata Workarounds 458693 and 460075
On Tue, May 02, 2017 at 01:27:54PM +0100, Robin Murphy wrote: > On 18/04/17 16:57, Catalin Marinas wrote: > > On Sun, Apr 16, 2017 at 09:04:46AM +0100, Russell King - ARM Linux wrote: > >> On Sat, Apr 15, 2017 at 07:06:06PM -0500, Nisal Menuka wrote: > >>> According to ARM, these errata exist only in a version of Cortex-A8 > >>> (r2p0) which was never built. Therefore, I believe there are no platforms > >>> where this workaround should be enabled. > >>> link :http://infocenter.arm.com/help/index.jsp?topic= > >>> /com.arm.doc.faqs/ka15634.html > >> > >> These were submitted by ARM Ltd back in 2009 - if the silicon was never > >> built, there would've been no reason to submit them. Maybe Catalin can > >> shed some light on this, being the commit author who introduced these? > > > > We normally try not to submit errata workarounds for revisions that are > > not going to be built/deployed. It's possible that at the time there > > were plans for r2p0 to be licensed and built (not just FPGA) but I don't > > really remember the details. The A8 errata document indeed states that > > r1p0 and r2p0 are obsolete but this can mean many things (like not > > available to license). > > > > I'll try to see if any of the A8 past product managers know anything > > about this. In the meantime, I would leave them in (no run-time > > overhead). > > FWIW, I just fired up a RealView PB-A8 board to check, and that reports > r1p1. True, it's not strictly a real silicon implementation (I think > it's one of the structured ASIC test chips), but since it was, as far as > I'm aware, a commercially-available development system, it's not > impossible that someone may still own and use one of these beasts. The above errata were specific to r1p0 (not r1p1) and r2p0. Since ARM Ltd claims there are no products built around these revisions, I'm fine with removing the workarounds from the kernel (internal testchips/FPGA don't count as products but I doubt they are still relevant 8 years later). 
-- Catalin
Re: [PATCH V2] pwm: tegra: Set maximum pwm clock source per SoC tapeout
On 02/05/17 15:05, Laxman Dewangan wrote: > The PWM hardware IP is taped-out with different maximum frequency > on different SoCs. > > From HW team: > Before Tegra186, it is 38.4MHz. > In Tegra186, it is 102MHz. > > Add support to limit the clock source frequency to the maximum IP > supported frequency. Provide these values via SoC chipdata. > > Signed-off-by: Laxman Dewangan > > --- > Changes from V1: > - Set the 48MHz maximum frequency for Tegra210 and earlier. I think that your changelog needs to be updated, because it still says 38.4MHz and not 48MHz. Cheers Jon -- nvpublic
[GIT PULL] Hardware module parameter annotation for secure boot
Hi Linus,

Can you pull this branch, please? It consists of a set of patches that annotate module parameters that configure hardware resources, including ioports, iomem addresses, irq lines and dma channels. This allows a future patch to prohibit the use of such module parameters to prevent that hardware from being abused to gain access to the running kernel image as part of locking the kernel down under UEFI secure boot conditions.

Annotations are made by changing:

  module_param(n, t, p)
  module_param_named(n, v, t, p)
  module_param_array(n, t, m, p)

to:

  module_param_hw(n, t, hwtype, p)
  module_param_hw_named(n, v, t, hwtype, p)
  module_param_hw_array(n, t, hwtype, m, p)

where the module parameter refers to a hardware setting. hwtype specifies the type of the resource being configured. This can be one of:

  ioport           Module parameter configures an I/O port
  iomem            Module parameter configures an I/O mem address
  ioport_or_iomem  Module parameter could be either (runtime set)
  irq              Module parameter configures an interrupt line
  dma              Module parameter configures a DMA channel
  dma_addr         Module parameter configures a DMA buffer address
  other            Module parameter configures some other value

Note that the hwtype is compile checked, but not currently stored (the lockdown code probably won't require it). It is, however, there for future use. A bonus is that the hwtype can also be used for grepping.

The intention is for the kernel to ignore or reject attempts to set annotated module parameters if lockdown is enabled. This applies to options passed on the boot command line, passed to insmod/modprobe or direct twiddling in /sys/module/ parameter files.

The module initialisation then needs to handle the parameter not being set, by (1) giving an error, (2) probing for a value or (3) using a reasonable default.

What I can't do is just reject a module out of hand because it may take a hardware setting in the module parameters.
Some important modules, some ipmi stuff for instance, both probe for hardware and allow hardware to be manually specified; if the driver aborts with any error, you don't get any ipmi hardware. Further, trying to do this entirely in the module initialisation code doesn't protect against sysfs twiddling.

[!] Note that in and of itself, this series of patches should have no effect on the size of the kernel or code execution - that is left to a patch in the next series to effect. It does mark annotated kernel parameters with a KERNEL_PARAM_FL_HWPARAM flag in an already existing field.

[!] Further note that this series needs to be pulled after James Morris's security/next. Pulling it at the end of the merge window should be fine and it shouldn't be a problem if annotations are discarded due to merge collisions.

David

---
The following changes since commit ddb99e118e37f324a4be65a411bb60ae62795cf9:

  security, keys: convert key_user.usage from atomic_t to refcount_t (2017-04-03 10:49:06 +1000)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/dhowells/linux-fs.git tags/hwparam-20170420

for you to fetch changes up to 6192c41fc608b0a58d5540b015aa1672c266f3c5:

  Annotate hardware config module parameters in sound/pci/ (2017-04-20 12:02:32 +0100)

Annotation of module parameters that specify device settings

David Howells (38):
      Annotate module params that specify hardware parameters (eg.
ioport)
      Annotate hardware config module parameters in arch/x86/mm/
      Annotate hardware config module parameters in drivers/char/ipmi/
      Annotate hardware config module parameters in drivers/char/mwave/
      Annotate hardware config module parameters in drivers/char/
      Annotate hardware config module parameters in drivers/clocksource/
      Annotate hardware config module parameters in drivers/cpufreq/
      Annotate hardware config module parameters in drivers/gpio/
      Annotate hardware config module parameters in drivers/i2c/
      Annotate hardware config module parameters in drivers/iio/
      Annotate hardware config module parameters in drivers/input/
      Annotate hardware config module parameters in drivers/isdn/
      Annotate hardware config module parameters in drivers/media/
      Annotate hardware config module parameters in drivers/misc/
      Annotate hardware config module parameters in drivers/mmc/host/
      Annotate hardware config module parameters in drivers/net/appletalk/
      Annotate hardware config module parameters in drivers/net/arcnet/
      Annotate hardware config module parameters in drivers/net/can/
      Annotate hardware config module parameters in drivers/net/ethernet/
      Annotate hardware config module parameters in
Re: linux-next: Tree for May 2 (xen: events_base.c)
On 05/01/17 23:47, Stephen Rothwell wrote: > Hi all, > > Please do not add any v4.13 destined material in your linux-next > included branches until after v4.12-rc1 has been released. > > Changes since 20170501: > on x86_64: drivers/built-in.o: In function `set_affinity_irq': events_base.c:(.text+0x1632c4): undefined reference to `xen_have_vector_callback' arch/x86/pci/built-in.o: In function `pci_xen_hvm_init': (.init.text+0x59f): undefined reference to `xen_have_vector_callback' -- ~Randy
[PATCH tip/core/rcu 0/2] Debloat rcu_segcblist header file
Hello! This series debloats the include/linux/rcu_segcblist.h include file: 1. Leave data structures in include/linux/rcu_segcblist.h, but move functions to kernel/rcu/rcu_segcblist.h, courtesy of Ingo Molnar. 2. Move non-trivial functions from kernel/rcu/rcu_segcblist.h to kernel/rcu/rcu_segcblist.c to avoid excessive inlining of large functions. Additional patches open-coding some of the more trivial functions to follow. Thanx, Paul

 include/linux/rcu_segcblist.h |  628 --
 init/Kconfig                  |    3
 kernel/rcu/Makefile           |    1
 kernel/rcu/rcu_segcblist.c    |  505 ++
 kernel/rcu/rcu_segcblist.h    | 1178 --
 kernel/rcu/srcutiny.c         |    1
 kernel/rcu/srcutree.c         |    1
 kernel/rcu/tree.h             |    3
 8 files changed, 1196 insertions(+), 1124 deletions(-)
[PATCH tip/core/rcu 2/2] rcu: Separately compile large rcu_segcblist functions
This commit creates a new kernel/rcu/rcu_segcblist.c file that contains non-trivial segcblist functions. Trivial functions remain as static inline functions in kernel/rcu/rcu_segcblist.h Reported-by: Linus Torvalds Signed-off-by: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner --- init/Kconfig | 3 + kernel/rcu/Makefile| 1 + kernel/rcu/rcu_segcblist.c | 505 ++ kernel/rcu/rcu_segcblist.h | 533 +++-- 4 files changed, 544 insertions(+), 498 deletions(-) create mode 100644 kernel/rcu/rcu_segcblist.c diff --git a/init/Kconfig b/init/Kconfig index 42a346b0df43..1d3475fc9496 100644 --- a/init/Kconfig +++ b/init/Kconfig @@ -573,6 +573,9 @@ config RCU_STALL_COMMON the tiny variants to disable RCU CPU stall warnings, while making these warnings mandatory for the tree variants. +config RCU_NEED_SEGCBLIST + def_bool ( TREE_RCU || PREEMPT_RCU || TINY_SRCU || TREE_SRCU ) + config CONTEXT_TRACKING bool diff --git a/kernel/rcu/Makefile b/kernel/rcu/Makefile index 158e6593d58c..23803c7d5180 100644 --- a/kernel/rcu/Makefile +++ b/kernel/rcu/Makefile @@ -12,3 +12,4 @@ obj-$(CONFIG_TREE_RCU) += tree.o obj-$(CONFIG_PREEMPT_RCU) += tree.o obj-$(CONFIG_TREE_RCU_TRACE) += tree_trace.o obj-$(CONFIG_TINY_RCU) += tiny.o +obj-$(CONFIG_RCU_NEED_SEGCBLIST) += rcu_segcblist.o diff --git a/kernel/rcu/rcu_segcblist.c b/kernel/rcu/rcu_segcblist.c new file mode 100644 index ..2b62a38b080f --- /dev/null +++ b/kernel/rcu/rcu_segcblist.c @@ -0,0 +1,505 @@ +/* + * RCU segmented callback lists, function definitions + * + * This program is free software; you can redistribute it and/or modify + * it under the terms of the GNU General Public License as published by + * the Free Software Foundation; either version 2 of the License, or + * (at your option) any later version. + * + * This program is distributed in the hope that it will be useful, + * but WITHOUT ANY WARRANTY; without even the implied warranty of + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. 
See the + * GNU General Public License for more details. + * + * You should have received a copy of the GNU General Public License + * along with this program; if not, you can access it online at + * http://www.gnu.org/licenses/gpl-2.0.html. + * + * Copyright IBM Corporation, 2017 + * + * Authors: Paul E. McKenney + */ + +#include <linux/types.h> +#include <linux/kernel.h> +#include <linux/interrupt.h> + +#include "rcu_segcblist.h" + +/* Initialize simple callback list. */ +void rcu_cblist_init(struct rcu_cblist *rclp) +{ + rclp->head = NULL; + rclp->tail = &rclp->head; + rclp->len = 0; + rclp->len_lazy = 0; +} + +/* + * Debug function to actually count the number of callbacks. + * If the number exceeds the limit specified, return -1. + */ +long rcu_cblist_count_cbs(struct rcu_cblist *rclp, long lim) +{ + int cnt = 0; + struct rcu_head **rhpp = &rclp->head; + + for (;;) { + if (!*rhpp) + return cnt; + if (++cnt > lim) + return -1; + rhpp = &(*rhpp)->next; + } +} + +/* + * Dequeue the oldest rcu_head structure from the specified callback + * list. This function assumes that the callback is non-lazy, but + * the caller can later invoke rcu_cblist_dequeued_lazy() if it + * finds otherwise (and if it cares about laziness). This allows + * different users to have different ways of determining laziness. + */ +struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp) +{ + struct rcu_head *rhp; + + rhp = rclp->head; + if (!rhp) + return NULL; + rclp->len--; + rclp->head = rhp->next; + if (!rclp->head) + rclp->tail = &rclp->head; + return rhp; +} + +/* + * Initialize an rcu_segcblist structure. + */ +void rcu_segcblist_init(struct rcu_segcblist *rsclp) +{ + int i; + + BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq)); + BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq)); + rsclp->head = NULL; + for (i = 0; i < RCU_CBLIST_NSEGS; i++) + rsclp->tails[i] = &rsclp->head; + rsclp->len = 0; + rsclp->len_lazy = 0; +} + +/* + * Disable the specified rcu_segcblist structure, so that callbacks can + * no longer be posted to it. 
This structure must be empty. + */ +void rcu_segcblist_disable(struct rcu_segcblist *rsclp) +{ + WARN_ON_ONCE(!rcu_segcblist_empty(rsclp)); + WARN_ON_ONCE(rcu_segcblist_n_cbs(rsclp)); + WARN_ON_ONCE(rcu_segcblist_n_lazy_cbs(rsclp)); + rsclp->tails[RCU_NEXT_TAIL] = NULL; +} + +/* + * Is the specified segment of the specified rcu_segcblist structure + * empty of callbacks? + */ +bool rcu_segcblist_segempty(struct rcu_segcblist *rsclp, int seg) +{ + if (seg == RCU_DONE_TAIL) + return &rsclp->head == rsclp->tails[RCU_DONE_TAIL]; + return
[PATCH tip/core/rcu 1/2] srcu: Debloat the header
From: Ingo Molnar Linus noticed that the <linux/rcu_segcblist.h> header has huge inline functions which should not be inline at all. As a first step in cleaning this up, move them all to kernel/rcu/ and only keep an absolute minimum of data type defines in the header: before: -rw-r--r-- 1 mingo mingo 22284 May 2 10:25 include/linux/rcu_segcblist.h after: -rw-r--r-- 1 mingo mingo 3180 May 2 10:22 include/linux/rcu_segcblist.h More can be done, such as uninlining the large functions, whose inlining is unjustified even if it's an RCU-internal matter. Reported-by: Linus Torvalds Cc: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Signed-off-by: Ingo Molnar Signed-off-by: Paul E. McKenney --- include/linux/rcu_segcblist.h | 628 +--- kernel/rcu/rcu_segcblist.h | 645 ++ kernel/rcu/srcutiny.c | 1 + kernel/rcu/srcutree.c | 1 + kernel/rcu/tree.h | 3 +- 5 files changed, 652 insertions(+), 626 deletions(-) create mode 100644 kernel/rcu/rcu_segcblist.h diff --git a/include/linux/rcu_segcblist.h b/include/linux/rcu_segcblist.h index ced8f313fd05..ba4d2621d9ca 100644 --- a/include/linux/rcu_segcblist.h +++ b/include/linux/rcu_segcblist.h @@ -20,8 +20,8 @@ * Authors: Paul E. McKenney */ -#ifndef __KERNEL_RCU_SEGCBLIST_H -#define __KERNEL_RCU_SEGCBLIST_H +#ifndef __INCLUDE_LINUX_RCU_SEGCBLIST_H +#define __INCLUDE_LINUX_RCU_SEGCBLIST_H /* Simple unsegmented callback lists. */ struct rcu_cblist { @@ -33,102 +33,6 @@ struct rcu_cblist { #define RCU_CBLIST_INITIALIZER(n) { .head = NULL, .tail = &n.head } -/* Initialize simple callback list. */ -static inline void rcu_cblist_init(struct rcu_cblist *rclp) -{ - rclp->head = NULL; - rclp->tail = &rclp->head; - rclp->len = 0; - rclp->len_lazy = 0; -} - -/* Is simple callback list empty? */ -static inline bool rcu_cblist_empty(struct rcu_cblist *rclp) -{ - return !rclp->head; -} - -/* Return number of callbacks in simple callback list. 
*/ -static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp) -{ - return rclp->len; -} - -/* Return number of lazy callbacks in simple callback list. */ -static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp) -{ - return rclp->len_lazy; -} - -/* - * Debug function to actually count the number of callbacks. - * If the number exceeds the limit specified, return -1. - */ -static inline long rcu_cblist_count_cbs(struct rcu_cblist *rclp, long lim) -{ - int cnt = 0; - struct rcu_head **rhpp = &rclp->head; - - for (;;) { - if (!*rhpp) - return cnt; - if (++cnt > lim) - return -1; - rhpp = &(*rhpp)->next; - } -} - -/* - * Dequeue the oldest rcu_head structure from the specified callback - * list. This function assumes that the callback is non-lazy, but - * the caller can later invoke rcu_cblist_dequeued_lazy() if it - * finds otherwise (and if it cares about laziness). This allows - * different users to have different ways of determining laziness. - */ -static inline struct rcu_head *rcu_cblist_dequeue(struct rcu_cblist *rclp) -{ - struct rcu_head *rhp; - - rhp = rclp->head; - if (!rhp) - return NULL; - rclp->len--; - rclp->head = rhp->next; - if (!rclp->head) - rclp->tail = &rclp->head; - return rhp; -} - -/* - * Account for the fact that a previously dequeued callback turned out - * to be marked as lazy. - */ -static inline void rcu_cblist_dequeued_lazy(struct rcu_cblist *rclp) -{ - rclp->len_lazy--; -} - -/* - * Interim function to return rcu_cblist head pointer. Longer term, the - * rcu_cblist will be used more pervasively, removing the need for this - * function. - */ -static inline struct rcu_head *rcu_cblist_head(struct rcu_cblist *rclp) -{ - return rclp->head; -} - -/* - * Interim function to return rcu_cblist head pointer. Longer term, the - * rcu_cblist will be used more pervasively, removing the need for this - * function. 
- */ -static inline struct rcu_head **rcu_cblist_tail(struct rcu_cblist *rclp) -{ - WARN_ON_ONCE(rcu_cblist_empty(rclp)); - return rclp->tail; -} - /* Complicated segmented callback lists. ;-) */ /* @@ -183,530 +87,4 @@ struct rcu_segcblist { .tails[RCU_NEXT_TAIL] = &n.head, \ } -/* - * Initialize an rcu_segcblist structure. - */ -static inline void rcu_segcblist_init(struct rcu_segcblist *rsclp) -{ - int i; - - BUILD_BUG_ON(RCU_NEXT_TAIL + 1 != ARRAY_SIZE(rsclp->gp_seq)); - BUILD_BUG_ON(ARRAY_SIZE(rsclp->tails) != ARRAY_SIZE(rsclp->gp_seq)); - rsclp->head = NULL; - for (i = 0; i < RCU_CBLIST_NSEGS; i++) - rsclp->tails[i] = &rsclp->head; - rsclp->len = 0; - rsclp->len_lazy = 0; -} - -/* - * Is the specified rcu_segcblist structure empty? - * - * But careful! The fact that the ->head field is NULL
RE: Asmedia USB 1343 crashes
From: Thomas Fjellstrom > Sent: 01 May 2017 14:40 > I've got a 970 Pro gaming aura motherboard with an Asmedia 1343 Usb 3.1 > controller. It's been consistently throwing errors and eventually crashing and > becoming unresponsive. ... I've an earlier Asmedia 1042 controller. It has a bug (which I don't remember seeing a fix for) that it will only process one entry from the command ring each time the doorbell is rung. If more than one command gets queued then it all gets very confusing because responses to timed-out commands keep being seen. (I no longer have the USB3 ethernet device needed to reproduce this.) The later part might have the same bug. David
Re: [PATCH 1/1 linux-next] btrfs: kmap() can't fail
On Tue, Apr 25, 2017 at 08:11:02PM +0200, Fabian Frederick wrote: > Remove NULL test on kmap() > > Signed-off-by: Fabian Frederick Reviewed-by: David Sterba
Re: [PATCH 2/4] PCI: add functionality for resizing resources v3
On 26.04.2017 at 19:00, Andy Shevchenko wrote: On Tue, Apr 25, 2017 at 4:19 PM, Christian König wrote: From: Christian König This allows device drivers to request resizing their BARs. The function only tries to reprogram the windows of the bridge directly above the requesting device and only the BAR of the same type (usually mem, 64bit, prefetchable). This is done to make sure not to disturb other drivers by changing the BARs of their devices. If reprogramming the bridge BAR fails the old status is restored and -ENOSPC returned to the calling device driver. +int pci_reassign_bridge_resources(struct pci_dev *bridge, unsigned long type) +{ + const unsigned long type_mask = IORESOURCE_IO | IORESOURCE_MEM | + IORESOURCE_PREFETCH | IORESOURCE_MEM_64; + Redundant. Redundant, but also a reminder to myself that I wanted to ask something about that. This type_mask is used already three times in this file, shouldn't we add a define for that? [SNIP] + list_for_each_entry(dev_res, &saved, list) { + /* Skip the bridge we just assigned resources for. */ + if (bridge == dev_res->dev) + continue; + + bridge = dev_res->dev; + pci_setup_bridge(bridge->subordinate); + } + + free_list(&saved); + free_list(&failed); + return ret; You might re-use two lines with below, but perhaps better to show which case returns 0 explicitly and drop assignment ret = 0 above. Good point, but actually the free_list(&failed) is superfluous here since when the failed list isn't empty we end up in the cleanup path. Going to fix all other comments in the next version. Regards, Christian.
Re: [PATCH] usb: gadget: udc: add null check before pointer dereference
Hi Alan, Quoting Alan Stern : On Mon, 1 May 2017, Gustavo A. R. Silva wrote: Add null check before dereferencing dev->regs pointer inside net2280_led_shutdown() function. Addresses-Coverity-ID: 101783 Signed-off-by: Gustavo A. R. Silva --- drivers/usb/gadget/udc/net2280.c | 8 +--- 1 file changed, 5 insertions(+), 3 deletions(-) diff --git a/drivers/usb/gadget/udc/net2280.c b/drivers/usb/gadget/udc/net2280.c index 3828c2e..1898a4b 100644 --- a/drivers/usb/gadget/udc/net2280.c +++ b/drivers/usb/gadget/udc/net2280.c @@ -3573,7 +3573,11 @@ static void net2280_remove(struct pci_dev *pdev) BUG_ON(dev->driver); /* then clean up the resources we allocated during probe() */ - net2280_led_shutdown(dev); + + if (dev->regs) { + net2280_led_shutdown(dev); + iounmap(dev->regs); + } if (dev->requests) { int i; for (i = 1; i < 5; i++) { @@ -3588,8 +3592,6 @@ static void net2280_remove(struct pci_dev *pdev) free_irq(pdev->irq, dev); if (dev->quirks & PLX_PCIE) pci_disable_msi(pdev); - if (dev->regs) - iounmap(dev->regs); if (dev->region) release_mem_region(pci_resource_start(pdev, 0), pci_resource_len(pdev, 0)); No, you must not move the iounmap() call, because an interrupt could theoretically occur at any time. Yeah, I was suspicious about it. Either just live with an extra test of dev->regs, or else move the net2280_led_shutdown() call later. In this case I think it is safe to move the net2280_led_shutdown() call, as the function is only turning off the LEDs. I'll send a patch shortly. Thank you -- Gustavo A. R. Silva
[PATCH v4 RFT 1/2] powerpc/fadump: reduce memory consumption for capture kernel
With fadump (dump capture) kernel booting like a regular kernel, it almost needs the same amount of memory to boot as the production kernel, which is unwarranted for a dump capture kernel. But with no option to disable some of the unnecessary subsystems in fadump kernel, that much memory is wasted on fadump, depriving the production kernel of that memory. Introduce kernel parameter 'fadump_append=' that would take regular kernel parameters to be appended when fadump is active. This 'fadump_append=' parameter can be leveraged to pass parameters like nr_cpus=1, cgroup_disable=memory and numa=off, to disable unwarranted resources/subsystems. Also, ensure the log "Firmware-assisted dump is active" is printed early in the boot process to put the subsequent fadump messages in context. Suggested-by: Michael Ellerman Signed-off-by: Hari Bathini Signed-off-by: Michal Suchanek --- v4: - use space-separated arguments instead of comma-separated - do not append parameters when fadump is disabled --- arch/powerpc/kernel/fadump.c | 27 --- 1 file changed, 24 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/kernel/fadump.c b/arch/powerpc/kernel/fadump.c index ebf2e9c..e0c728a 100644 --- a/arch/powerpc/kernel/fadump.c +++ b/arch/powerpc/kernel/fadump.c @@ -79,8 +79,10 @@ int __init early_init_dt_scan_fw_dump(unsigned long node, * dump data waiting for us. */ fdm_active = of_get_flat_dt_prop(node, "ibm,kernel-dump", NULL); - if (fdm_active) + if (fdm_active) { + pr_info("Firmware-assisted dump is active.\n"); fw_dump.dump_active = 1; + } /* Get the sizes required to store dump data for the firmware provided * dump sections. 
@@ -263,8 +265,12 @@ int __init fadump_reserve_mem(void) { unsigned long base, size, memory_boundary; - if (!fw_dump.fadump_enabled) + if (!fw_dump.fadump_enabled) { + if (fw_dump.dump_active) + pr_warn("Firmware-assisted dump was active but kernel" + " booted with fadump disabled!\n"); return 0; + } if (!fw_dump.fadump_supported) { printk(KERN_INFO "Firmware-assisted dump is not supported on" @@ -304,7 +310,6 @@ int __init fadump_reserve_mem(void) memory_boundary = memblock_end_of_DRAM(); if (fw_dump.dump_active) { - printk(KERN_INFO "Firmware-assisted dump is active.\n"); /* * If last boot has crashed then reserve all the memory * above boot_memory_size so that we don't touch it until @@ -363,6 +368,22 @@ unsigned long __init arch_reserved_kernel_pages(void) return memblock_reserved_size() / PAGE_SIZE; } +/* Look for fadump_append= cmdline option. */ +static int __init early_fadump_append_param(char *p) +{ + if (!p) + return 1; + + if (fw_dump.fadump_enabled && fw_dump.dump_active) { + pr_info("enforcing additional parameters (%s) passed through " + "'fadump_append=' parameter\n", p); + parse_early_options(p); + } + + return 0; +} +early_param("fadump_append", early_fadump_append_param); + /* Look for fadump= cmdline option. */ static int __init early_fadump_param(char *p) { -- 2.10.2
[PATCH v4 RFT 2/2] powerpc/fadump: update documentation about 'fadump_append=' parameter
With the introduction of 'fadump_append=' parameter to pass additional parameters to fadump (capture) kernel, update documentation about it. Signed-off-by: Hari Bathini Signed-off-by: Michal Suchanek --- Documentation/powerpc/firmware-assisted-dump.txt | 8 +++- 1 file changed, 7 insertions(+), 1 deletion(-) diff --git a/Documentation/powerpc/firmware-assisted-dump.txt b/Documentation/powerpc/firmware-assisted-dump.txt index 9cabaf8..99ab8f4 100644 --- a/Documentation/powerpc/firmware-assisted-dump.txt +++ b/Documentation/powerpc/firmware-assisted-dump.txt @@ -162,7 +162,11 @@ How to enable firmware-assisted dump (fadump): 1. Set config option CONFIG_FA_DUMP=y and build kernel. 2. Boot into linux kernel with 'fadump=on' kernel cmdline option. -3. Optionally, user can also set 'crashkernel=' kernel cmdline +3. A user can pass additional kernel parameters as a space-separated list + through 'fadump_append=' parameter, to be appended when fadump is active. + This can be used to pass parameters like nr_cpus=1, numa=off to reduce + memory consumption during dump capture. +4. Optionally, user can also set 'fadump_reserve_mem=' kernel cmdline to specify size of the memory to reserve for boot memory dump preservation. @@ -172,6 +176,8 @@ NOTE: 1. 'fadump_reserve_mem=' parameter has been deprecated. Instead 2. If firmware-assisted dump fails to reserve memory then it will fall back to existing kdump mechanism if 'crashkernel=' option is set at kernel cmdline. + 3. 'fadump_append=' parameter can be quoted to append multiple arguments + as in 'fadump_append="nr_cpus=8 numa=off"' Sysfs/debugfs files: -- 2.10.2
Re: [Xen-devel] linux-next: Tree for May 2 (xen: events_base.c)
On 05/02/2017 11:34 AM, Randy Dunlap wrote: > On 05/01/17 23:47, Stephen Rothwell wrote: >> Hi all, >> >> Please do not add any v4.13 destined material in your linux-next >> included branches until after v4.12-rc1 has been released. >> >> Changes since 20170501: >> > on x86_64: > > drivers/built-in.o: In function `set_affinity_irq': > events_base.c:(.text+0x1632c4): undefined reference to > `xen_have_vector_callback' > arch/x86/pci/built-in.o: In function `pci_xen_hvm_init': > (.init.text+0x59f): undefined reference to `xen_have_vector_callback' > > Can you send me (or to the list) your config file? Thanks. -boris
Re: [PATCH v2 3/4] iio: accel: adxl345: Setup DATA_READY trigger
On 02/05/17 04:01, Rob Herring wrote: > On Sun, Apr 30, 2017 at 7:32 PM, Jonathan Cameron wrote: >> On 29/04/17 08:49, Eva Rachel Retuya wrote: >>> The ADXL345 provides a DATA_READY interrupt function to signal >>> availability of new data. This interrupt function is latched and can be >>> cleared by reading the data registers. The polarity is set to active >>> high by default. >>> >>> Support this functionality by setting it up as an IIO trigger. >>> >>> In addition, two output pins INT1 and INT2 are available for driving >>> interrupts. Allow mapping to either pins by specifying the >>> interrupt-names property in device tree. >>> >>> Signed-off-by: Eva Rachel Retuya >> Coming together nicely, but a few more bits and pieces inline... >> >> One slight worry is that the irq names stuff is to restrictive >> as we want to direct different interrupts to different pins if >> both are supported! > > [...] > >>> @@ -199,6 +253,22 @@ int adxl345_core_probe(struct device *dev, struct >>> regmap *regmap, >>> dev_err(dev, "Failed to set data range: %d\n", ret); >>> return ret; >>> } >>> + /* >>> + * Any bits set to 0 send their respective interrupts to the INT1 pin, >>> + * whereas bits set to 1 send their respective interrupts to the INT2 >>> + * pin. Map all interrupts to the specified pin. >> This is an interesting comment. The usual reason for dual interrupt >> pins is precisely to not map all functions to the same one. That allows >> for a saving in querying which interrupt it is by having just the data ready >> on one pin and just the events on the other... >> >> Perhaps the current approach won't support that mode of operation? >> Clearly we can't merge a binding that enforces them all being the same >> and then change it later as it'll be incompatible. >> >> I'm not quite sure how one should do this sort of stuff in DT though. >> >> Rob? > > DT should just describe what is connected which I gather here could be > either one or both IRQs. 
We generally distinguish the IRQs with the > interrupt-names property and then retrieve it as below. Picking this branch to continue on so I'll grab Eva's reply as well. Eva said: > I've thought about this before since to me that's the better approach > than one or the other. I was in a time crunch, hence I went with > this way. The input driver does this as well and what I just did is to > match what it does. If you could point me to some drivers for reference, > I'll gladly analyze those and present something better on the next > revision. So taking both of these and having thought about it a bit more in my current jet-lagged state (I hate travelling - particularly with the added amusement of a flat tyre on the plane). To my mind we need to describe what interrupts are there, as Rob says. It's all obvious if there is only one interrupt connected (often the case I suspect, as pins are in short supply on many SoCs). If we allow the binding to specify both pins (using names to do the matching to which pin they are on the chip), then we could allow the driver itself to optimize the usage according to what is enabled. Note though that this can come later - for now we just need to allow the specification of both interrupts if they are present. So let's talk about the ideal ;) Probably makes sense to separate dataready and the events if possible. Ideal would be to even allow individual events to have their own pins as long as there are only two available. So we need a heuristic to work out what interrupts to put where. It doesn't work well as a lookup table (I tried it) #define ADXL345_OVERRUN BIT(0) #define ADXL345_WATERMARK BIT(1) #define ADXL345_FREEFALL BIT(2) #define ADXL345_INACTIVITY BIT(3) #define ADXL345_ACTIVITY BIT(4) #define ADXL345_DOUBLE_TAP BIT(5) #define ADXL345_SINGLE_TAP BIT(6) #define ADXL345_DATA_READY BIT(7) So some function that takes the bitmap of what is enabled and tries to divide it sensibly. 
int adxl345_int_heuristic(u8 input, u8 *output) { switch (hweight8(input)) { case 0 ... 1: *output = input; break; case 2: *output = BIT(ffs(input) - 1); // this will put one on each interrupt. break; case 3 ... 7: // now it gets tricky. Perhaps always have dataready and watermark on own interrupt if set? if (input & (ADXL345_DATA_READY | ADXL345_WATERMARK)) *output = input & (ADXL345_DATA_READY | ADXL345_WATERMARK); else // taps always on same one etc... } } Then your interrupt handler will need to look at the incoming interrupt and work out if it needs to read the status register to know what it has. If it doesn't need to then it doesn't do so. Be careful to only clear the right interrupts though in that case as it is always possible both are set. Anyhow, right now all that needs to be there is the
Re: [usb-host] question about pointer dereference before null check
Hi Alan, Quoting Alan Stern : On Mon, 1 May 2017, Gustavo A. R. Silva wrote: Hello everybody, While taking a look into Coverity ID 100828 I ran into the following piece of code at drivers/usb/host/ehci-sched.c:1096: u32 addr; int think_time; int hs_transfers; addr = dev->ttport << 24; if (!ehci_is_TDI(ehci) || (dev->tt->hub != ehci_to_hcd(ehci)->self.root_hub)) addr |= dev->tt->hub->devnum << 16; addr |= epnum << 8; addr |= dev->devnum; stream->ps.usecs = HS_USECS_ISO(maxp); think_time = dev->tt ? dev->tt->think_time : 0; The issue here is that dev->tt is being dereferenced before null check. I was thinking on placing think_time = dev->tt ? dev->tt->think_time : 0; just before the _if_ statement. But this doesn't address the problem of dev->tt actually being NULL. While looking into the callers of the function containing this piece of code (iso_stream_init()) my impression is that dev->tt is not NULL at the time this function is called and, a very simple patch like the following can be applied in order to avoid the Coverity issue: -think_time = dev->tt ? dev->tt->think_time : 0; +think_time = dev->tt->think_time; But I can't tell for sure, so in case dev->tt is NULL, a good strategy to properly update the _addr_ variable would be needed. What do you think? I would really appreciate any comment on this, Thank you! You are right; udev->tt cannot ever be NULL when this section of code runs. The test should be removed. Thanks for confirming, I'll send a patch shortly. -- Gustavo A. R. Silva
Re: [PATCH RFC] hugetlbfs 'noautofill' mount option
On 5/2/17 3:53 AM, Anshuman Khandual wrote: On 05/01/2017 11:30 PM, Prakash Sangappa wrote: Some applications like a database use hugetlbfs for performance reasons. Files on hugetlbfs filesystem are created and huge pages allocated using fallocate() API. Pages are deallocated/freed using fallocate() hole punching support that has been added to hugetlbfs. These files are mmapped and accessed by many processes as shared memory. Such applications keep track of which offsets in the hugetlbfs file have pages allocated. Any access to mapped address over holes in the file, which can occur due s/mapped/unmapped/ ^ ? It is 'mapped' address. to bugs in the application, is considered invalid and expects the process to simply receive a SIGBUS. However, currently when a hole in the file is accessed via the mapped address, kernel/mm attempts to automatically allocate a page at page fault time, resulting in implicitly filling the hole But this is expected when you try to control the file allocation from a mapped address. Any changes while walking past or writing the range in the memory mapped should reflect exactly in the file on the disk. Why is it not a valid behavior? Sure, that is a valid behavior. However, hugetlbfs is a pseudo filesystem and the purpose is for applications to use hugepage memory. The contents of this filesystem are not backed by disk nor are they swapped out. The proposed new behavior is only applicable for hugetlbfs filesystem mounted with the new 'noautofill' mount option. The file's page allocation/free are managed using the 'fallocate()' API. For hugetlbfs filesystems mounted without this option, there is no change in behavior. in the file. This may not be the desired behavior for applications like the database that want to explicitly manage page allocations of hugetlbfs files. This patch adds a new hugetlbfs mount option 'noautofill', to indicate that pages should not be allocated at page fault time when accessed through mmapped address. 
When the page should be allocated for mapping ? The application would allocate/free file pages using the fallocate() API.
[PATCH] usb: host: remove unnecessary null check
Remove unnecessary null check. udev->tt cannot ever be NULL when this section of code runs. Addresses-Coverity-ID: 100828 Cc: Alan Stern Signed-off-by: Gustavo A. R. Silva --- drivers/usb/host/ehci-sched.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/usb/host/ehci-sched.c b/drivers/usb/host/ehci-sched.c index 980a6b3..6bc6304 100644 --- a/drivers/usb/host/ehci-sched.c +++ b/drivers/usb/host/ehci-sched.c @@ -1105,7 +1105,7 @@ iso_stream_init( addr |= epnum << 8; addr |= dev->devnum; stream->ps.usecs = HS_USECS_ISO(maxp); - think_time = dev->tt ? dev->tt->think_time : 0; + think_time = dev->tt->think_time; stream->ps.tt_usecs = NS_TO_US(think_time + usb_calc_bus_time( dev->speed, is_input, 1, maxp)); hs_transfers = max(1u, (maxp + 187) / 188); -- 2.5.0
Re: [PATCH v2 4/4] iio: accel: adxl345: Add support for triggered buffer
On 02/05/17 13:23, Eva Rachel Retuya wrote: > On Mon, May 01, 2017 at 01:42:29AM +0100, Jonathan Cameron wrote: > [...] >> Few minor bits inline... I'm a little bit in two minds about the >> holding up waiting for new data when using another trigger... >> >> Jonathan > [...] >>> static int adxl345_read_raw(struct iio_dev *indio_dev, >>> @@ -127,6 +151,10 @@ static int adxl345_read_raw(struct iio_dev *indio_dev, >>> >>> switch (mask) { >>> case IIO_CHAN_INFO_RAW: >>> + ret = iio_device_claim_direct_mode(indio_dev); >>> + if (ret) >>> + return ret; >>> + >>> mutex_lock(&data->lock); >>> ret = adxl345_set_mode(data, ADXL345_POWER_CTL_MEASURE); >>> if (ret < 0) { >>> @@ -148,12 +176,14 @@ static int adxl345_read_raw(struct iio_dev *indio_dev, >>> ret = regmap_bulk_read(data->regmap, chan->address, &regval, >>> sizeof(regval)); >>> mutex_unlock(&data->lock); >>> + iio_device_release_direct_mode(indio_dev); >>> if (ret < 0) { >>> adxl345_set_mode(data, ADXL345_POWER_CTL_STANDBY); >>> return ret; >>> } >>> >>> - *val = sign_extend32(le16_to_cpu(regval), 12); >>> + *val = sign_extend32(le16_to_cpu(regval), >>> + chan->scan_type.realbits - 1) >> This change isn't really needed, but I suppose it does little harm... >> >>> adxl345_set_mode(data, ADXL345_POWER_CTL_STANDBY); >>> >>> return IIO_VAL_INT; >>> @@ -186,6 +216,64 @@ static irqreturn_t adxl345_irq(int irq, void *p) >>> return IRQ_NONE; >>> } >>> >>> +static irqreturn_t adxl345_trigger_handler(int irq, void *p) >>> +{ >>> + struct iio_poll_func *pf = p; >>> + struct iio_dev *indio_dev = pf->indio_dev; >>> + struct adxl345_data *data = iio_priv(indio_dev); >>> + int ret; >>> + >>> + mutex_lock(&data->lock); >>> + /* Make sure data is ready when using external trigger */ >> I 'think' this is only really relevant for the very first one. >> After that general rule of thumb is that if an external trigger >> is too quick - bad luck you'll get repeated data. 
>> >> One of the reasons we would want to use another trigger is to >> support capture in parallel from several sensors - if we 'hold' >> like this we'll get out of sync. >> >> As such I wonder if a better strategy would be to 'hold' for the >> first reading in the buffer enable - thus guaranteeing valid >> data before we start. After that we wouldn't need to check this >> here. >> > > Thanks for the explanation. If we are to go with this one, where to put > it, preenable or postenable? I'm assuming the latter but would like to > confirm. postenable. It could in theory be affected by a future use of the update_scan_mode callback so should be after that. J > >> What do others think? >> > > Any other inputs are greatly appreciated. > > Eva
Re: [PATCH] device-dax: fix sysfs attribute deadlock
On Tue, May 2, 2017 at 3:43 AM, Yi Zhang wrote: > Verified this patch on 4.11. > > Tested-by: Yi Zhang Thanks!
Re: [PATCH v3 0/4] Improved seccomp logging
On Mon, May 1, 2017 at 7:41 PM, Tyler Hicks wrote: > On 04/27/2017 07:42 PM, Kees Cook wrote: >> On Thu, Apr 27, 2017 at 3:17 PM, Tyler Hicks wrote: >>> Quick update... I finished the move from the high-water mark >>> log_max_action sysctl to the bitmask based actions_logged sysctl. >> >> Awesome! >> >>> Unfortunately, I've just realized that SECCOMP_SET_LOGGING, or any >>> process-wide logging configuration mechanism, will not work. It is fine >>> for the situation where two unrelated processes set up seccomp filters >>> that should be logged differently. However, it fails when two closely >>> related processes, such as parent and child, need to set up seccomp >>> filters that should be logged differently. Imagine a launcher that sets >>> up an application sandbox (including a seccomp filter) and then launches >>> an electron app which will have its own seccomp filter for sandboxing >>> untrusted code that it runs. Unless the launcher and app completely >>> agree on actions that should be logged, the logging won't work as >>> intended for both processes. >> >> Oh, you mean the forked process sets up the logging it wants for the >> filters it just installed, then after exec a process sets up new >> logging requirements? > > Yes - see below. > >> >>> I think this needs to be configured at the filter level. >> >> I'm not sure that's even the right way to compose the logging desires. >> >> So, my initial thought was "whatever ran SECCOMP_SET_LOGGING knows >> what it's doing" and it should be the actual value. >> >> If the launcher wants logs of everything the application does with its >> filters, then a purely-tied-to-filter approach won't work either. >> >> Perhaps log bits can only be enabled? I.e. SECCOMP_SET_LOGGING >> performs an OR instead of an assignment? > > The problem that I'm envisioning with this design is this: > > 1. Launcher is told to launch Chrome and forks off a process. > > 2. 
Launcher sets up a filter using RET_ERRNO for all unacceptable > syscalls and enables auditing of RET_ERRNO. > > 3. Launcher execs Chrome. > > 4. Chrome then sets up its own, more restrictive filter that uses > RET_ERRNO, among other actions, but does not want auditing of RET_ERRNO. > > If we use process-wide auditing controls, the logs will be filled with > RET_ERRNO messages that were unintended and unrelated to the RET_ERRNO > actions set up in the launcher's filter. > > Unfortunately, the OR'ing idea doesn't solve the problem. Things like my more complicated solution solve this completely, I think. The launcher would, by whatever means, say "RET_ERRNO and log this". The more restrictive sandbox would say "RET_ERRNO and don't log this" and we'd just make sure that the composition rules mean the inner rule wins. --Andy
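The "inner rule wins" composition Andy describes can be sketched in plain C. This is a hypothetical model for illustration only, not the kernel's actual seccomp implementation: the action values, struct layout, and function names are all made up; the real seccomp composes stacked filters by taking the most restrictive return value.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative action values: lower = more restrictive, mirroring how
 * seccomp picks the most restrictive result across stacked filters. */
enum action { ACT_KILL = 0, ACT_ERRNO = 1, ACT_ALLOW = 2 };

struct filter {
	enum action verdict;	/* what this filter returns for the syscall */
	bool log_errno;		/* does this filter want RET_ERRNO logged? */
};

/* Most restrictive verdict across the stack (chain[0] is the outermost
 * filter, chain[n-1] the innermost, most recently installed one). */
static enum action evaluate_action(const struct filter *chain, size_t n)
{
	enum action result = ACT_ALLOW;
	size_t i;

	for (i = 0; i < n; i++)
		if (chain[i].verdict < result)
			result = chain[i].verdict;
	return result;
}

/* "Inner rule wins": the filter that produced the winning verdict decides
 * whether the event is logged; on equal verdicts the later (inner) filter
 * overrides the outer one's preference. */
static bool should_log(const struct filter *chain, size_t n)
{
	enum action result = ACT_ALLOW;
	bool log = false;
	size_t i;

	for (i = 0; i < n; i++) {
		if (chain[i].verdict <= result) {
			result = chain[i].verdict;
			log = (result == ACT_ERRNO) && chain[i].log_errno;
		}
	}
	return log;
}
```

With this composition, the launcher/Chrome scenario from the thread comes out as intended: Chrome's inner RET_ERRNO rule fires, and Chrome's "don't log" preference wins over the launcher's "log everything" setting.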
Re: [PATCH] mtd: nand: gpio: make nCE GPIO optional
Hi Christophe, Boris, On Tue, May 02, 2017 at 11:03:34AM +0200, Boris Brezillon wrote: > On Tue, 2 May 2017 07:47:40 +0200 > Christophe LEROY wrote: > > > Le 01/05/2017 à 23:46, Brian Norris a écrit : > > > On Fri, Feb 10, 2017 at 03:01:10PM +0100, Christophe Leroy wrote: > > >> On some hardware, the nCE signal is wired to the ChipSelect associated > > >> to bus address of the NAND, so it is automatically driven during the > > >> memory access and it is not managed by a GPIO. > > >> > > >> Signed-off-by: Christophe Leroy > > > > > > Not really a problem with this patch exactly, but FYI you're only making > > > this optional for the non-DT case. For device tree, this is kinda hard > > > to do, since the current binding suggests we retrieve the GPIOs based on > > > index position, not by name. So if you leave one off...I guess we well > > > just be off-by-1 on the indeces until we hit a non-optional one...which > > > I guess is "CLE". > > > > > > If we wanted this to work for DT, we'd need to extend this driver (and > > > binding doc) to support requesting GPIOs by name. > > > > > > > It works for me with devicetree. > > > > I have the following definition in my DT: > > > > nand@1,0 { > > compatible = "gpio-control-nand"; > > reg = <1 0x0 0x01>; > > #address-cells = <1>; > > #size-cells = <1>; > > gpios = <_pio_c 24 1 // RDY > > 0 // nCE > > _pio_c 26 1 // ALE > > _pio_c 25 1 // CLE > > 0>;// nwp > > }; > > > > Yep, it's perfectly fine to have 'empty' gpio entries (entries with > phandle set to 0/NULL), we're using this trick in the atmel_nand > driver as well. I wasn't aware. In that case, you need to change the binding doc to note that nCE is optional now. Brian
Re: LKML: "Fixes as per checkpatch.pl" patches
On Tue, 2017-05-02 at 18:08 +0300, Ali Kaasinen wrote: > Hellooo, presumably contacted the right people. > > I often browse LKML, and patches saying "Fix style issues as reported by > checkpatch.pl", and Greg responding "That's really vague, you need to be > specific, and only fix one type of thing per patch" seem fairly common > these days, e.g.: > > - http://lkml.iu.edu/hypermail/linux/kernel/1704.3/01867.html > > - http://lkml.iu.edu/hypermail/linux/kernel/1705.0/00105.html > > Just wondering, shouldn't checkpatch mention that? This question should be asked while cc'ing lkml. What Greg does with patches is up to Greg. A checkpatch user guide in Documentation could be useful.
[PATCH tip/core/rcu 0/3] Open-code trivial rcu_cblist accessors
Hello! This series open-codes trivial rcu_cblist structure accessors:

1. Open-code rcu_cblist_empty(p) as !p->head.
2. Open-code rcu_cblist_n_cbs(p) as p->len.
3. Open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy.

Thanx, Paul

 rcu_segcblist.h |   20 +---
 tree.c          |   20 +---
 tree_plugin.h   |    8
 tree_trace.c    |    4 ++--

 4 files changed, 16 insertions(+), 36 deletions(-)
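The pattern in all three patches is the same: a one-line inline accessor is replaced by direct field access at each call site. A minimal sketch of the idea, using a simplified stand-in for struct rcu_cblist (the real definition in kernel/rcu/rcu_segcblist.h has more context):

```c
#include <assert.h>
#include <stddef.h>

/* Simplified stand-in for the kernel's struct rcu_cblist. */
struct rcu_cblist {
	void *head;	/* first callback, NULL if the list is empty */
	long len;	/* number of callbacks */
	long len_lazy;	/* number of lazy callbacks */
};

/* A trivial accessor of the kind the series removes: it adds a level of
 * indirection over a single field read and nothing else. */
static inline int rcu_cblist_empty(struct rcu_cblist *rclp)
{
	return !rclp->head;
}

/* After the series, callers simply write !p->head, p->len, or
 * p->len_lazy directly instead of calling the accessor. */
```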
Re: [PATCH v4] irqchip/gicv3-its: Avoid memory over allocation for ITEs
On Sun, Apr 30 2017 at 3:36:15 pm BST, Shanker Donthineni wrote: > We are always allocating extra 255Bytes of memory to handle ITE > physical address alignment requirement. The kmalloc() satisfies > the ITE alignment since the ITS driver is requesting a minimum > size of ITS_ITT_ALIGN bytes. > > Let's try to allocate the exact amount of memory that is required > for ITEs to avoid wastage. > > Signed-off-by: Shanker Donthineni > --- > Changes: > v2: removed 'Change-Id: Ia8084189833f2081ff13c392deb5070c46a64038' from > commit. > v3: changed from IITE to ITE. > v3: removed fallback since kmalloc() guarantees the right alignment. > > drivers/irqchip/irq-gic-v3-its.c | 6 +++--- > 1 file changed, 3 insertions(+), 3 deletions(-) > > diff --git a/drivers/irqchip/irq-gic-v3-its.c > b/drivers/irqchip/irq-gic-v3-its.c > index 45ea1933..72e56f03 100644 > --- a/drivers/irqchip/irq-gic-v3-its.c > +++ b/drivers/irqchip/irq-gic-v3-its.c > @@ -261,7 +261,6 @@ static struct its_collection *its_build_mapd_cmd(struct > its_cmd_block *cmd, > u8 size = ilog2(desc->its_mapd_cmd.dev->nr_ites); > > itt_addr = virt_to_phys(desc->its_mapd_cmd.dev->itt); > - itt_addr = ALIGN(itt_addr, ITS_ITT_ALIGN); > > its_encode_cmd(cmd, GITS_CMD_MAPD); > its_encode_devid(cmd, desc->its_mapd_cmd.dev->device_id); > @@ -1329,13 +1328,14 @@ static struct its_device *its_create_device(struct > its_node *its, u32 dev_id, >*/ > nr_ites = max(2UL, roundup_pow_of_two(nvecs)); > sz = nr_ites * its->ite_size; > - sz = max(sz, ITS_ITT_ALIGN) + ITS_ITT_ALIGN - 1; > + sz = max(sz, ITS_ITT_ALIGN); > itt = kzalloc(sz, GFP_KERNEL); > lpi_map = its_lpi_alloc_chunks(nvecs, _base, _lpis); > if (lpi_map) > col_map = kzalloc(sizeof(*col_map) * nr_lpis, GFP_KERNEL); > > - if (!dev || !itt || !lpi_map || !col_map) { > + if (!dev || !itt || !lpi_map || !col_map || > + !IS_ALIGNED(virt_to_phys(itt), ITS_ITT_ALIGN)) { > kfree(dev); > kfree(itt); > kfree(lpi_map); I'm confused. 
Either the alignment is guaranteed (and you should document why it is so), or it is not, and we need to handle the non-alignment (instead of failing). Thanks, M. -- Jazz is not dead, it just smell funny.
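The alignment trade-off being discussed can be made concrete with kernel-style ALIGN()/IS_ALIGNED() helpers. In this sketch the macros mirror the kernel's power-of-two alignment helpers, and ITS_ITT_ALIGN is assumed to be 256 for illustration (so the 255-byte over-allocation in the old code is exactly align - 1):

```c
#include <assert.h>
#include <stdint.h>

/* Kernel-style power-of-two alignment helpers. */
#define ALIGN(x, a)      (((x) + ((a) - 1)) & ~((uintptr_t)(a) - 1))
#define IS_ALIGNED(x, a) (((x) & ((uintptr_t)(a) - 1)) == 0)

#define ITS_ITT_ALIGN 256	/* assumed value, for illustration only */

/* Old scheme: over-allocate by align - 1 bytes and round the address up,
 * so the ITEs always start on an aligned boundary regardless of what the
 * allocator returned. */
static uintptr_t old_scheme(uintptr_t addr)
{
	return ALIGN(addr, ITS_ITT_ALIGN);
}

/* New scheme (v4 of the patch): rely on the allocator returning aligned
 * memory and merely verify it, failing the allocation otherwise -- which
 * is the behavior Marc is questioning: either the guarantee holds and the
 * check is dead code, or it doesn't and failing is the wrong response. */
static int new_scheme_ok(uintptr_t addr)
{
	return IS_ALIGNED(addr, ITS_ITT_ALIGN);
}
```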
[PATCH tip/core/rcu 1/3] rcu: Open-code the rcu_cblist_empty() function
Because the rcu_cblist_empty() just samples the ->head pointer, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_empty(p) as !p->head, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar Signed-off-by: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Linus Torvalds --- kernel/rcu/rcu_segcblist.h | 8 +--- kernel/rcu/tree.c | 9 - kernel/rcu/tree_plugin.h | 4 ++-- 3 files changed, 7 insertions(+), 14 deletions(-) diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h index 86bc1101b806..7d18d41f0116 100644 --- a/kernel/rcu/rcu_segcblist.h +++ b/kernel/rcu/rcu_segcblist.h @@ -22,12 +22,6 @@ #include -/* Is simple callback list empty? */ -static inline bool rcu_cblist_empty(struct rcu_cblist *rclp) -{ - return !rclp->head; -} - /* Return number of callbacks in simple callback list. */ static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp) { @@ -66,7 +60,7 @@ static inline struct rcu_head *rcu_cblist_head(struct rcu_cblist *rclp) */ static inline struct rcu_head **rcu_cblist_tail(struct rcu_cblist *rclp) { - WARN_ON_ONCE(rcu_cblist_empty(rclp)); + WARN_ON_ONCE(!rclp->head); return rclp->tail; } diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 91fff49d5869..35152414760d 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2647,9 +2647,9 @@ static void rcu_adopt_orphan_cbs(struct rcu_state *rsp, unsigned long flags) /* First adopt the ready-to-invoke callbacks, then the done ones. 
*/ rcu_segcblist_insert_done_cbs(>cblist, >orphan_done); - WARN_ON_ONCE(!rcu_cblist_empty(>orphan_done)); + WARN_ON_ONCE(rsp->orphan_done.head); rcu_segcblist_insert_pend_cbs(>cblist, >orphan_pend); - WARN_ON_ONCE(!rcu_cblist_empty(>orphan_pend)); + WARN_ON_ONCE(rsp->orphan_pend.head); WARN_ON_ONCE(rcu_segcblist_empty(>cblist) != !rcu_segcblist_n_cbs(>cblist)); } @@ -2800,9 +2800,8 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp) local_irq_save(flags); count = -rcu_cblist_n_cbs(); - trace_rcu_batch_end(rsp->name, count, !rcu_cblist_empty(), - need_resched(), is_idle_task(current), - rcu_is_callbacks_kthread()); + trace_rcu_batch_end(rsp->name, count, !!rcl.head, need_resched(), + is_idle_task(current), rcu_is_callbacks_kthread()); /* Update counts and requeue any remaining callbacks. */ rcu_segcblist_insert_done_cbs(>cblist, ); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 7f1d677a2a25..6b4b1f8a272d 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1942,12 +1942,12 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, return false; /* First, enqueue the donelist, if any. This preserves CB ordering. */ - if (!rcu_cblist_empty(>orphan_done)) { + if (rsp->orphan_done.head) { __call_rcu_nocb_enqueue(rdp, rcu_cblist_head(>orphan_done), rcu_cblist_tail(>orphan_done), ql, qll, flags); } - if (!rcu_cblist_empty(>orphan_pend)) { + if (rsp->orphan_pend.head) { __call_rcu_nocb_enqueue(rdp, rcu_cblist_head(>orphan_pend), rcu_cblist_tail(>orphan_pend), ql, qll, flags); -- 2.5.2
[PATCH tip/core/rcu 3/3] rcu: Open-code the rcu_cblist_n_lazy_cbs() function
Because the rcu_cblist_n_lazy_cbs() just samples the ->len_lazy counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_lazy_cbs(p) as p->len_lazy, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar Signed-off-by: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Linus Torvalds --- kernel/rcu/rcu_segcblist.h | 6 -- kernel/rcu/tree.c | 2 +- kernel/rcu/tree_plugin.h | 2 +- kernel/rcu/tree_trace.c| 2 +- 4 files changed, 3 insertions(+), 9 deletions(-) diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h index 424a6b230921..6e36e36478cd 100644 --- a/kernel/rcu/rcu_segcblist.h +++ b/kernel/rcu/rcu_segcblist.h @@ -22,12 +22,6 @@ #include -/* Return number of lazy callbacks in simple callback list. */ -static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp) -{ - return rclp->len_lazy; -} - /* * Account for the fact that a previously dequeued callback turned out * to be marked as lazy. diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 942d529fccbc..1205c8ad138a 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2634,7 +2634,7 @@ static void rcu_adopt_orphan_cbs(struct rcu_state *rsp, unsigned long flags) /* Do the accounting first. 
*/ rdp->n_cbs_adopted += rsp->orphan_done.len; - if (rcu_cblist_n_lazy_cbs(>orphan_done) != rsp->orphan_done.len) + if (rsp->orphan_done.len_lazy != rsp->orphan_done.len) rcu_idle_count_callbacks_posted(); rcu_segcblist_insert_count(>cblist, >orphan_done); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 7ebe357df155..c9a48657512a 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1935,7 +1935,7 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, unsigned long flags) { long ql = rsp->orphan_done.len; - long qll = rcu_cblist_n_lazy_cbs(>orphan_done); + long qll = rsp->orphan_done.len_lazy; /* If this is not a no-CBs CPU, tell the caller to do it the old way. */ if (!rcu_is_nocb_cpu(smp_processor_id())) diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c index b7743aa2965f..6cea17a1ea30 100644 --- a/kernel/rcu/tree_trace.c +++ b/kernel/rcu/tree_trace.c @@ -277,7 +277,7 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp) rsp->n_force_qs, rsp->n_force_qs_ngp, rsp->n_force_qs - rsp->n_force_qs_ngp, READ_ONCE(rsp->n_force_qs_lh), - rcu_cblist_n_lazy_cbs(>orphan_done), + rsp->orphan_done.len_lazy, rsp->orphan_done.len); for (rnp = >node[0]; rnp - >node[0] < rcu_num_nodes; rnp++) { if (rnp->level != level) { -- 2.5.2
Re: [PATCH] x86: kvm: Avoid guest page table walk when gpa_available is set
On 28/04/2017 21:15, Brijesh Singh wrote: > Hi Radim, > >> >> This will probably return false negatives when then vcpu->arch.gpa_val >> couldn't be used anyway (possibly after a VM exits), so it is hard to >> draw a conclusion. >> >> I would really like if we had a solution that passed the gpa inside >> function parameters. >> >> (Btw. cr2 can also be a virtual address, so we can call it gpa directly) >> > > I've tried the below patch and it seems to work fine. This does not > consider > PIO case and as you rightly pointed PIO should trigger #NPF relatively > rarely. > At least so far in my runs I have not seen PIO causing #NPF. If this sounds > acceptable approach then I can submit v2 with these changes and remove > gpa_val > additional. PIO can certainly cause #NPF, albeit rarely, so this is not a viable one. Paolo > diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c > index 13c132b..c040e38 100644 > --- a/arch/x86/kvm/x86.c > +++ b/arch/x86/kvm/x86.c > @@ -4662,17 +4662,17 @@ static int emulator_read_write_onepage(unsigned > long addr, void *val, > */ > if (vcpu->arch.gpa_available && > emulator_can_use_gpa(ctxt) && > - vcpu_is_mmio_gpa(vcpu, addr, exception->address, write) && > (addr & ~PAGE_MASK) == (exception->address & ~PAGE_MASK)) { > gpa = exception->address; > - goto mmio; > + ret = vcpu_is_mmio_gpa(vcpu, addr, gpa, write); > + } else { > + dump_stack(); > + ret = vcpu_mmio_gva_to_gpa(vcpu, addr, , exception, > write); > + > + if (ret < 0) > + return X86EMUL_PROPAGATE_FAULT; > } > > - ret = vcpu_mmio_gva_to_gpa(vcpu, addr, , exception, write); > - > - if (ret < 0) > - return X86EMUL_PROPAGATE_FAULT; > - > /* For APIC access vmexit */ > if (ret) > goto mmio; > @@ -7056,11 +7056,11 @@ static int vcpu_run(struct kvm_vcpu *vcpu) > return r; > } > > -static inline int complete_emulated_io(struct kvm_vcpu *vcpu) > +static inline int complete_emulated_io(struct kvm_vcpu *vcpu, unsigned > long cr2) > { > int r; > vcpu->srcu_idx = srcu_read_lock(>kvm->srcu); > 
- r = emulate_instruction(vcpu, EMULTYPE_NO_DECODE); > + r = x86_emulate_instruction(vcpu, cr2, EMULTYPE_NO_DECODE, NULL, > 0); > srcu_read_unlock(>kvm->srcu, vcpu->srcu_idx); > if (r != EMULATE_DONE) > return 0; > @@ -7071,7 +7071,7 @@ static int complete_emulated_pio(struct kvm_vcpu > *vcpu) > { > BUG_ON(!vcpu->arch.pio.count); > > - return complete_emulated_io(vcpu); > + return complete_emulated_io(vcpu, 0); > } > /* > @@ -7097,6 +7097,7 @@ static int complete_emulated_mmio(struct kvm_vcpu > *vcpu) > struct kvm_run *run = vcpu->run; > struct kvm_mmio_fragment *frag; > unsigned len; > + gpa_t gpa; > > BUG_ON(!vcpu->mmio_needed); > > @@ -7106,6 +7107,12 @@ static int complete_emulated_mmio(struct kvm_vcpu > *vcpu) > if (!vcpu->mmio_is_write) > memcpy(frag->data, run->mmio.data, len); > > + /* > + * lets use the GPA from previous guest page table walk to avoid yet > + * another guest page table walk when completing the MMIO > page-fault. > +*/ > + gpa = frag->gpa; > + > if (frag->len <= 8) { > /* Switch to the next fragment. */ > frag++; > @@ -7124,7 +7131,7 @@ static int complete_emulated_mmio(struct kvm_vcpu > *vcpu) > if (vcpu->mmio_is_write) > return 1; > vcpu->mmio_read_completed = 1; > - return complete_emulated_io(vcpu); > + return complete_emulated_io(vcpu, gpa); > } > > run->exit_reason = KVM_EXIT_MMIO; >
[PATCH tip/core/rcu 2/3] rcu: Open-code the rcu_cblist_n_cbs() function
Because the rcu_cblist_n_cbs() just samples the ->len counter, and because the rcu_cblist structure is quite straightforward, it makes sense to open-code rcu_cblist_n_cbs(p) as p->len, cutting out a level of indirection. This commit makes this change. Reported-by: Ingo Molnar Signed-off-by: Paul E. McKenney Cc: Peter Zijlstra Cc: Thomas Gleixner Cc: Linus Torvalds --- kernel/rcu/rcu_segcblist.h | 6 -- kernel/rcu/tree.c | 9 - kernel/rcu/tree_plugin.h | 2 +- kernel/rcu/tree_trace.c| 2 +- 4 files changed, 6 insertions(+), 13 deletions(-) diff --git a/kernel/rcu/rcu_segcblist.h b/kernel/rcu/rcu_segcblist.h index 7d18d41f0116..424a6b230921 100644 --- a/kernel/rcu/rcu_segcblist.h +++ b/kernel/rcu/rcu_segcblist.h @@ -22,12 +22,6 @@ #include -/* Return number of callbacks in simple callback list. */ -static inline long rcu_cblist_n_cbs(struct rcu_cblist *rclp) -{ - return rclp->len; -} - /* Return number of lazy callbacks in simple callback list. */ static inline long rcu_cblist_n_lazy_cbs(struct rcu_cblist *rclp) { diff --git a/kernel/rcu/tree.c b/kernel/rcu/tree.c index 35152414760d..942d529fccbc 100644 --- a/kernel/rcu/tree.c +++ b/kernel/rcu/tree.c @@ -2633,9 +2633,8 @@ static void rcu_adopt_orphan_cbs(struct rcu_state *rsp, unsigned long flags) return; /* Do the accounting first. */ - rdp->n_cbs_adopted += rcu_cblist_n_cbs(>orphan_done); - if (rcu_cblist_n_lazy_cbs(>orphan_done) != - rcu_cblist_n_cbs(>orphan_done)) + rdp->n_cbs_adopted += rsp->orphan_done.len; + if (rcu_cblist_n_lazy_cbs(>orphan_done) != rsp->orphan_done.len) rcu_idle_count_callbacks_posted(); rcu_segcblist_insert_count(>cblist, >orphan_done); @@ -2792,14 +2791,14 @@ static void rcu_do_batch(struct rcu_state *rsp, struct rcu_data *rdp) * Stop only if limit reached and CPU has something to do. * Note: The rcl structure counts down from zero. 
*/ - if (-rcu_cblist_n_cbs() >= bl && + if (-rcl.len >= bl && (need_resched() || (!is_idle_task(current) && !rcu_is_callbacks_kthread( break; } local_irq_save(flags); - count = -rcu_cblist_n_cbs(); + count = -rcl.len; trace_rcu_batch_end(rsp->name, count, !!rcl.head, need_resched(), is_idle_task(current), rcu_is_callbacks_kthread()); diff --git a/kernel/rcu/tree_plugin.h b/kernel/rcu/tree_plugin.h index 6b4b1f8a272d..7ebe357df155 100644 --- a/kernel/rcu/tree_plugin.h +++ b/kernel/rcu/tree_plugin.h @@ -1934,7 +1934,7 @@ static bool __maybe_unused rcu_nocb_adopt_orphan_cbs(struct rcu_state *rsp, struct rcu_data *rdp, unsigned long flags) { - long ql = rcu_cblist_n_cbs(>orphan_done); + long ql = rsp->orphan_done.len; long qll = rcu_cblist_n_lazy_cbs(>orphan_done); /* If this is not a no-CBs CPU, tell the caller to do it the old way. */ diff --git a/kernel/rcu/tree_trace.c b/kernel/rcu/tree_trace.c index 30c5bf89ee58..b7743aa2965f 100644 --- a/kernel/rcu/tree_trace.c +++ b/kernel/rcu/tree_trace.c @@ -278,7 +278,7 @@ static void print_one_rcu_state(struct seq_file *m, struct rcu_state *rsp) rsp->n_force_qs - rsp->n_force_qs_ngp, READ_ONCE(rsp->n_force_qs_lh), rcu_cblist_n_lazy_cbs(>orphan_done), - rcu_cblist_n_cbs(>orphan_done)); + rsp->orphan_done.len); for (rnp = >node[0]; rnp - >node[0] < rcu_num_nodes; rnp++) { if (rnp->level != level) { seq_puts(m, "\n"); -- 2.5.2
Re: [BUG] cannot mount nfs, bisected to 0db10944a76ba ("nfs: Convert to separately allocated bdi")
Hello, On Fri 28-04-17 11:56:24, Corentin Labbe wrote: > Since linux next-20170421, mounting nfs give me: > [ 774.994934] [ cut here ] > [ 774.994975] WARNING: CPU: 1 PID: 10284 at /linux-next/fs/sysfs/dir.c:31 > sysfs_warn_dup+0x64/0x74 > [ 774.994985] sysfs: cannot create duplicate filename > '/devices/virtual/bdi/0:32' > [ 774.994992] Modules linked in: axp20x_usb_power gpio_axp209 > nvmem_sunxi_sid sun4i_dma sun4i_ss virt_dma > [ 774.995047] CPU: 1 PID: 10284 Comm: mount.nfs Not tainted 4.11.0-rc4+ #14 > [ 774.995054] Hardware name: Allwinner sun7i (A20) Family > [ 774.995085] [] (unwind_backtrace) from [] > (show_stack+0x10/0x14) > [ 774.995104] [] (show_stack) from [] > (dump_stack+0x78/0x8c) > [ 774.995121] [] (dump_stack) from [] (__warn+0xe8/0x100) > [ 774.995135] [] (__warn) from [] > (warn_slowpath_fmt+0x38/0x48) > [ 774.995150] [] (warn_slowpath_fmt) from [] > (sysfs_warn_dup+0x64/0x74) > [ 774.995167] [] (sysfs_warn_dup) from [] > (sysfs_create_dir_ns+0x84/0x94) > [ 774.995184] [] (sysfs_create_dir_ns) from [] > (kobject_add_internal+0x9c/0x2ec) > [ 774.995199] [] (kobject_add_internal) from [] > (kobject_add+0x48/0x98) > [ 774.995217] [] (kobject_add) from [] > (device_add+0xe4/0x5a0) > [ 774.995232] [] (device_add) from [] > (device_create_groups_vargs+0xac/0xbc) > [ 774.995247] [] (device_create_groups_vargs) from [] > (device_create_vargs+0x20/0x28) > [ 774.995263] [] (device_create_vargs) from [] > (bdi_register_va+0x44/0xfc) > [ 774.995280] [] (bdi_register_va) from [] > (super_setup_bdi_name+0x48/0xa4) > [ 774.995299] [] (super_setup_bdi_name) from [] > (nfs_fill_super+0x1a4/0x204) > [ 774.995315] [] (nfs_fill_super) from [] > (nfs_fs_mount_common+0x140/0x1e8) > [ 774.995333] [] (nfs_fs_mount_common) from [] > (nfs4_remote_mount+0x50/0x58) > [ 774.995349] [] (nfs4_remote_mount) from [] > (mount_fs+0x14/0xa4) > [ 774.995368] [] (mount_fs) from [] > (vfs_kern_mount+0x54/0x128) > [ 774.995385] [] (vfs_kern_mount) from [] > 
(nfs_do_root_mount+0x80/0xa0) > [ 774.995400] [] (nfs_do_root_mount) from [] > (nfs4_try_mount+0x28/0x3c) > [ 774.995415] [] (nfs4_try_mount) from [] > (nfs_fs_mount+0x2cc/0x8c4) > [ 774.995430] [] (nfs_fs_mount) from [] > (mount_fs+0x14/0xa4) > [ 774.995445] [] (mount_fs) from [] > (vfs_kern_mount+0x54/0x128) > [ 774.995461] [] (vfs_kern_mount) from [] > (do_mount+0x158/0xc7c) > [ 774.995475] [] (do_mount) from [] (SyS_mount+0x8c/0xb4) > [ 774.995491] [] (SyS_mount) from [] > (ret_fast_syscall+0x0/0x3c) > [ 774.995501] ---[ end trace 0665e451f8864ff0 ]--- ... > The mount command is > mount -t nfs -o tcp,hard,intr,async,rsize=4096,wsize=4096 > 192.168.1.100:/mnt/local_kernel /usr/src/ > > the mount command failling with: "mount.nfs: Cannot allocate memory" I've tried reproducing this (both with NFSv3 and NFSv4) and failed. Also I have looked through the code and I fail to see how this could happen. Is this the only NFS mount that you have on your system? Didn't also the WARN_ON in super_setup_bdi_name() trigger? Can you run with the attached debug patch and post full dmesg after the failure? Honza -- Jan Kara SUSE Labs, CR >From 90f66349ce91d8dd4811eca5efffb605cf084193 Mon Sep 17 00:00:00 2001 From: Jan Kara Date: Tue, 2 May 2017 18:24:57 +0200 Subject: [PATCH] nfs: Debug bdi registration failure Debug failure and fixup error code propagation. 
Signed-off-by: Jan Kara --- fs/nfs/super.c | 5 - mm/backing-dev.c | 2 ++ 2 files changed, 6 insertions(+), 1 deletion(-) diff --git a/fs/nfs/super.c b/fs/nfs/super.c index dc69314d455e..f60a387105ad 100644 --- a/fs/nfs/super.c +++ b/fs/nfs/super.c @@ -2344,6 +2344,8 @@ int nfs_fill_super(struct super_block *sb, struct nfs_mount_info *mount_info) nfs_initialise_sb(sb); + printk("Registering bdi %u:%u for server %p\n", MAJOR(server->s_dev), MINOR(server->s_dev), server); + dump_stack(); ret = super_setup_bdi_name(sb, "%u:%u", MAJOR(server->s_dev), MINOR(server->s_dev)); if (ret) @@ -2607,7 +2609,7 @@ struct dentry *nfs_fs_mount_common(struct nfs_server *server, /* initial superblock/root creation */ error = mount_info->fill_super(s, mount_info); if (error) - goto error_splat_super; + goto error_splat_super_err; nfs_get_cache_cookie(s, mount_info->parsed, mount_info->cloned); } @@ -2630,6 +2632,7 @@ struct dentry *nfs_fs_mount_common(struct nfs_server *server, error_splat_root: dput(mntroot); +error_splat_super_err: mntroot = ERR_PTR(error); error_splat_super: deactivate_locked_super(s); diff --git a/mm/backing-dev.c b/mm/backing-dev.c index f028a9a472fd..ac173d06834a 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -923,6 +923,8 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi) void bdi_unregister(struct backing_dev_info *bdi) { + printk("Unregistering bdi %s\n", dev_name(bdi->dev)); +
[PATCH 2/2] ARM: dts: omap4-droid4: Add vibrator
Add vibrator to Droid4's device tree. Signed-off-by: Sebastian Reichel --- arch/arm/boot/dts/omap4-droid4-xt894.dts | 38 1 file changed, 38 insertions(+) diff --git a/arch/arm/boot/dts/omap4-droid4-xt894.dts b/arch/arm/boot/dts/omap4-droid4-xt894.dts index 89eb607f4a9e..67d286a53368 100644 --- a/arch/arm/boot/dts/omap4-droid4-xt894.dts +++ b/arch/arm/boot/dts/omap4-droid4-xt894.dts @@ -116,6 +116,32 @@ }; }; + + pwm8: dmtimer-pwm-8 { + pinctrl-names = "default"; + pinctrl-0 = <_direction_pin>; + + compatible = "ti,omap-dmtimer-pwm"; + #pwm-cells = <3>; + ti,timers = <>; + ti,clock-source = <0x01>; + }; + + pwm9: dmtimer-pwm-9 { + pinctrl-names = "default"; + pinctrl-0 = <_enable_pin>; + + compatible = "ti,omap-dmtimer-pwm"; + #pwm-cells = <3>; + ti,timers = <>; + ti,clock-source = <0x01>; + }; + + vibrator { + compatible = "motorola,mapphone-pwm-vibrator"; + pwms = < 0 10 0>, < 0 10 0>; + pwm-names = "enable", "direction"; + }; }; { @@ -453,6 +479,18 @@ OMAP4_IOPAD(0x1c8, PIN_INPUT_PULLUP | MUX_MODE7) >; }; + + vibrator_direction_pin: pinmux_vibrator_direction_pin { + pinctrl-single,pins = < + OMAP4_IOPAD(0x1ce, PIN_OUTPUT | MUX_MODE1) /* dmtimer8_pwm_evt (gpio_27) */ + >; + }; + + vibrator_enable_pin: pinmux_vibrator_enable_pin { + pinctrl-single,pins = < + OMAP4_IOPAD(0X1d0, PIN_OUTPUT | MUX_MODE1) /* dmtimer9_pwm_evt (gpio_28) */ + >; + }; }; _pmx_wkup { -- 2.11.0
[PATCH 0/2] PWM Vibrator driver
Hi, The Motorola Droid 4 has a vibrator that is connected to two GPIOs. Motorola's stock kernel names them vib_en and vib_dir, which probably stand for vibrator_enable and vibrator_direction. In their stock kernel both GPIOs are toggled using a hrtimer, and a custom vibrator "misc" device is provided to userspace. Thankfully, the GPIOs the hardware designers used can also be driven from OMAP's dmtimers, so they can be driven as PWM outputs instead, saving some CPU cycles (and code). The driver is loosely based on an old patch from Dmitry that I found on the internet(tm) [0]. Note that I did not check the generic vibrator stuff; I just kept it in the driver, since it's probably what other people expect from a pwm-vibra driver :) [0] https://lkml.org/lkml/2012/4/10/41 -- Sebastian Sebastian Reichel (2): Input: pwm-vibra: new driver ARM: dts: omap4-droid4: Add vibrator .../devicetree/bindings/input/pwm-vibrator.txt | 60 arch/arm/boot/dts/omap4-droid4-xt894.dts | 38 +++ drivers/input/misc/Kconfig | 11 + drivers/input/misc/Makefile| 1 + drivers/input/misc/pwm-vibra.c | 348 + 5 files changed, 458 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/pwm-vibrator.txt create mode 100644 drivers/input/misc/pwm-vibra.c -- 2.11.0
[PATCH 1/2] Input: pwm-vibra: new driver
Provide a simple driver for PWM controllable vibrators. It will be used by Motorola Droid 4. Signed-off-by: Sebastian Reichel --- .../devicetree/bindings/input/pwm-vibrator.txt | 60 drivers/input/misc/Kconfig | 11 + drivers/input/misc/Makefile| 1 + drivers/input/misc/pwm-vibra.c | 348 + 4 files changed, 420 insertions(+) create mode 100644 Documentation/devicetree/bindings/input/pwm-vibrator.txt create mode 100644 drivers/input/misc/pwm-vibra.c diff --git a/Documentation/devicetree/bindings/input/pwm-vibrator.txt b/Documentation/devicetree/bindings/input/pwm-vibrator.txt new file mode 100644 index ..c35be4691366 --- /dev/null +++ b/Documentation/devicetree/bindings/input/pwm-vibrator.txt @@ -0,0 +1,60 @@ +* PWM vibrator device tree bindings + +Registers a PWM device as vibrator. + +Required properties: +- compatible: should be + * "pwm-vibrator" +For vibrators controlled using the PWM channel's duty cycle (higher duty means +the vibrator becomes stronger). + * "motorola,mapphone-pwm-vibrator" + For vibrator found in Motorola Droid 4. This vibrator generates a pulse for + every rising edge, so its controlled using a duty cycle of 50% and changing + the period time. 
+- pwm-names: Should contain "enable" and optionally "direction" +- pwms: Should contain a PWM handle for each entry in pwm-names + +Example from Motorola Droid 4: + +_pmx_core { + vibrator_direction_pin: pinmux_vibrator_direction_pin { + pinctrl-single,pins = < + OMAP4_IOPAD(0x1ce, PIN_OUTPUT | MUX_MODE1) /* dmtimer8_pwm_evt (gpio_27) */ + >; + }; + + vibrator_enable_pin: pinmux_vibrator_enable_pin { + pinctrl-single,pins = < + OMAP4_IOPAD(0X1d0, PIN_OUTPUT | MUX_MODE1) /* dmtimer9_pwm_evt (gpio_28) */ + >; + }; +}; + +/ { + pwm8: dmtimer-pwm { + pinctrl-names = "default"; + pinctrl-0 = <_direction_pin>; + + compatible = "ti,omap-dmtimer-pwm"; + #pwm-cells = <3>; + ti,timers = <>; + ti,clock-source = <0x01>; + }; + + pwm9: dmtimer-pwm { + pinctrl-names = "default"; + pinctrl-0 = <_enable_pin>; + + compatible = "ti,omap-dmtimer-pwm"; + #pwm-cells = <3>; + ti,timers = <>; + ti,clock-source = <0x01>; + }; + + vibrator { + compatible = "pwm-vibrator"; + pwms = < 0 10 0>, + < 0 10 0>; + pwm-names = "enable", "direction"; + }; +}; diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig index 5b6c52210d20..d8e0b25eb217 100644 --- a/drivers/input/misc/Kconfig +++ b/drivers/input/misc/Kconfig @@ -571,6 +571,17 @@ config INPUT_PWM_BEEPER To compile this driver as a module, choose M here: the module will be called pwm-beeper. +config INPUT_PWM_VIBRA + tristate "PWM vibrator support" + depends on PWM + help + Say Y here to get support for PWM based vibrator devices. + + If unsure, say N. + + To compile this driver as a module, choose M here: the module will be + called pwm-vibra. 
+ config INPUT_GPIO_ROTARY_ENCODER tristate "Rotary encoders connected to GPIO pins" depends on GPIOLIB || COMPILE_TEST diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile index b10523f2878e..9a6517f5458c 100644 --- a/drivers/input/misc/Makefile +++ b/drivers/input/misc/Makefile @@ -58,6 +58,7 @@ obj-$(CONFIG_INPUT_PM8XXX_VIBRATOR) += pm8xxx-vibrator.o obj-$(CONFIG_INPUT_PMIC8XXX_PWRKEY)+= pmic8xxx-pwrkey.o obj-$(CONFIG_INPUT_POWERMATE) += powermate.o obj-$(CONFIG_INPUT_PWM_BEEPER) += pwm-beeper.o +obj-$(CONFIG_INPUT_PWM_VIBRA) += pwm-vibra.o obj-$(CONFIG_INPUT_RB532_BUTTON) += rb532_button.o obj-$(CONFIG_INPUT_REGULATOR_HAPTIC) += regulator-haptic.o obj-$(CONFIG_INPUT_RETU_PWRBUTTON) += retu-pwrbutton.o diff --git a/drivers/input/misc/pwm-vibra.c b/drivers/input/misc/pwm-vibra.c new file mode 100644 index ..a7d36d88679a --- /dev/null +++ b/drivers/input/misc/pwm-vibra.c @@ -0,0 +1,348 @@ +/* + * PWM vibrator driver + * + * Copyright (C) 2017 Collabora Ltd. + * + * Based on previous work from: + * Copyright (C) 2012 Dmitry Torokhov + * + * Based on PWM beeper driver: + * Copyright (C) 2010, Lars-Peter Clausen + * + * This program is free software; you can redistribute it and/or modify it + * under the terms of the GNU General Public License as published by the + * Free Software Foundation; either version 2 of the License, or (at your + * option) any later version. + */ + +#define DEBUG + +#include +#include +#include +#include +#include +#include
[PATCH] btrfs: always write superblocks synchronously
Commit b685d3d65ac7 ("block: treat REQ_FUA and REQ_PREFLUSH as synchronous") removed the REQ_SYNC flag from the WRITE_FUA implementation. Because the REQ_FUA and REQ_FLUSH flags are stripped from submitted IO when the disk doesn't have a volatile write cache, this effectively makes the write async. It was seen to cause performance regressions of up to 90% in disk-IO-related benchmarks such as reaim and dbench [1]. Fix the problem by making sure the first superblock write is also treated as synchronous, since superblock writes can block progress of the journalling (commit, log syncs) machinery and thus the whole filesystem. [1] https://www.spinics.net/lists/linux-ext4/msg56238.html Fixes: b685d3d65ac (block: treat REQ_FUA and REQ_PREFLUSH as synchronous) Cc: stable Cc: Jan Kara Signed-off-by: Davidlohr Bueso --- fs/btrfs/disk-io.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c index 061c1d1f774f..51b2fd8ceccb 100644 --- a/fs/btrfs/disk-io.c +++ b/fs/btrfs/disk-io.c @@ -3468,7 +3468,7 @@ static int write_dev_supers(struct btrfs_device *device, * to go down lazy. */ if (i == 0) - ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_FUA, bh); + ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_FUA | REQ_SYNC, bh); else ret = btrfsic_submit_bh(REQ_OP_WRITE, REQ_SYNC, bh); if (ret) -- 2.12.0
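The interaction between the stripped cache flags and the "synchronous" classification can be sketched as follows. The flag values and helper names here are illustrative stand-ins; the kernel's real REQ_* bits live in include/linux/blk_types.h, and the stripping happens in the block layer for devices without a volatile write cache.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative flag bits (arbitrary values, not the kernel's encoding). */
enum {
	REQ_SYNC     = 1 << 0,
	REQ_FUA      = 1 << 1,
	REQ_PREFLUSH = 1 << 2,
};

/* After commit b685d3d65ac7, FUA and PREFLUSH writes are treated as
 * synchronous, so callers no longer need to pass REQ_SYNC explicitly --
 * as long as those flags survive to the scheduler. */
static bool op_is_sync(unsigned flags)
{
	return flags & (REQ_SYNC | REQ_FUA | REQ_PREFLUSH);
}

/* Devices without a volatile write cache strip FUA and PREFLUSH, since
 * the data is durable without them. If REQ_SYNC wasn't set too, the
 * write silently becomes asynchronous -- the bug this patch fixes. */
static unsigned strip_cache_flags(unsigned flags)
{
	return flags & ~(REQ_FUA | REQ_PREFLUSH);
}
```

This is why the fix passes REQ_FUA | REQ_SYNC: the write stays synchronous whether or not the device strips the cache flags.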
Re: [PATCH 1/1] arm64: Always provide "model name" in /proc/cpuinfo
On 05/02/2017 01:08 PM, Catalin Marinas wrote: > On Tue, May 02, 2017 at 12:39:13AM +0200, Heinrich Schuchardt wrote: >> There is no need to hide the model name in processes >> that are not PER_LINUX32. >> >> So let us always provide a model name that is easily readable. >> >> Fixes: e47b020a323d ("arm64: Provide "model name" in /proc/cpuinfo for >> PER_LINUX32 tasks") >> Signed-off-by: Heinrich Schuchardt >> --- >> arch/arm64/kernel/cpuinfo.c | 5 ++--- >> 1 file changed, 2 insertions(+), 3 deletions(-) >> >> diff --git a/arch/arm64/kernel/cpuinfo.c b/arch/arm64/kernel/cpuinfo.c >> index b3d5b3e8fbcb..9ad9ddcd2f19 100644 >> --- a/arch/arm64/kernel/cpuinfo.c >> +++ b/arch/arm64/kernel/cpuinfo.c >> @@ -118,9 +118,8 @@ static int c_show(struct seq_file *m, void *v) >> * "processor". Give glibc what it expects. >> */ >> seq_printf(m, "processor\t: %d\n", i); >> -if (compat) >> -seq_printf(m, "model name\t: ARMv8 Processor rev %d >> (%s)\n", >> - MIDR_REVISION(midr), COMPAT_ELF_PLATFORM); >> +seq_printf(m, "model name\t: ARMv8 Processor rev %d (%s)\n", >> + MIDR_REVISION(midr), COMPAT_ELF_PLATFORM); >> >> seq_printf(m, "BogoMIPS\t: %lu.%02lu\n", >> loops_per_jiffy / (50UL/HZ), > > Such patch seems to come up regularly: > > https://patchwork.kernel.org/patch/9303311/ > > (and it usually gets rejected) > Dear Catalin, thank you for pointing me to the previous discussion. I understand that adding model name in the current form would not provide sufficient valuable information. The real interesting thing in an ARM SOC is to see which CPU is A72, A57, A53 or whatever. This information is available from the device tree in the compatible property of the individual CPUs (/sys/firmware/devicetree/base/cpus/cpu@*/compatible), e.g. compatible = "arm,cortex-a53", "arm,armv8"; I guess this information is the closest to the model name property on other architectures that we can get. Would exposing this information in /proc/cpuinfo as 'model name' make sense to you?
Best regards Heinrich
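Heinrich's suggestion relies on the devicetree "compatible" property, which is a list of NUL-terminated strings with the most specific match first. A minimal userspace sketch of walking such a property follows; the function name and buffer handling are illustrative, not kernel API:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * A devicetree "compatible" property is a sequence of NUL-terminated
 * strings, most specific first, e.g. "arm,cortex-a53\0arm,armv8".
 * Return the n-th entry, or NULL if the property is malformed
 * (an entry without a terminator) or has fewer than n+1 entries.
 */
static const char *compatible_entry(const char *buf, size_t len,
                                    unsigned int n)
{
    const char *p = buf, *end = buf + len;

    while (p < end) {
        const char *nul = memchr(p, '\0', (size_t)(end - p));

        if (!nul)
            return NULL;        /* malformed: unterminated entry */
        if (n-- == 0)
            return p;           /* found the requested entry */
        p = nul + 1;            /* step past the terminator */
    }
    return NULL;                /* fewer than n+1 entries */
}
```

The first entry ("arm,cortex-a53" above) is the part that would land in a per-CPU 'model name' field.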
[PATCH v3] arm64: perf: Ignore exclude_hv when kernel is running in HYP
commit d98ecdaca296 ("arm64: perf: Count EL2 events if the kernel is running in HYP") returns -EINVAL when perf system call perf_event_open is called with exclude_hv != exclude_kernel. This change breaks applications on VHE enabled ARMv8.1 platforms. The issue was observed with HHVM application, which calls perf_event_open with exclude_hv = 1 and exclude_kernel = 0. There is no separate hypervisor privilege level when VHE is enabled, the host kernel runs at EL2. So when VHE is enabled, we should ignore exclude_hv from the application. This behaviour is consistent with PowerPC where the exclude_hv is ignored when the hypervisor is not present and with x86 where this flag is ignored. Signed-off-by: Ganapatrao Kulkarni --- Changelog: V2/V3: - Changes as per Will Deacon's suggestions. V1: Initial patch arch/arm64/kernel/perf_event.c | 16 +--- 1 file changed, 9 insertions(+), 7 deletions(-) diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c index 57ae9d9..f6748c0 100644 --- a/arch/arm64/kernel/perf_event.c +++ b/arch/arm64/kernel/perf_event.c @@ -871,15 +871,17 @@ static int armv8pmu_set_event_filter(struct hw_perf_event *event, if (attr->exclude_idle) return -EPERM; - if (is_kernel_in_hyp_mode() && - attr->exclude_kernel != attr->exclude_hv) - return -EINVAL; + if (is_kernel_in_hyp_mode()) { + if (!attr->exclude_kernel) + config_base |= ARMV8_PMU_INCLUDE_EL2; + } else { + if (attr->exclude_kernel) + config_base |= ARMV8_PMU_EXCLUDE_EL1; + if (!attr->exclude_hv) + config_base |= ARMV8_PMU_INCLUDE_EL2; + } if (attr->exclude_user) config_base |= ARMV8_PMU_EXCLUDE_EL0; - if (!is_kernel_in_hyp_mode() && attr->exclude_kernel) - config_base |= ARMV8_PMU_EXCLUDE_EL1; - if (!attr->exclude_hv) - config_base |= ARMV8_PMU_INCLUDE_EL2; /* * Install the filter into config_base as this is used to -- 1.8.1.4
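The resulting filter logic is easy to check in isolation. Below is a hedged userspace mirror of the patched armv8pmu_set_event_filter(); the bit positions follow the ARMV8_PMU_* definitions in arch/arm64/kernel/perf_event.c at the time of this patch, and all other names are illustrative:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Event-type register bits, as defined in arch/arm64/kernel/perf_event.c. */
#define ARMV8_PMU_EXCLUDE_EL1 (1U << 31)
#define ARMV8_PMU_EXCLUDE_EL0 (1U << 30)
#define ARMV8_PMU_INCLUDE_EL2 (1U << 27)

/*
 * Mirror of the patched filter logic: with VHE the host kernel runs at
 * EL2, so exclude_kernel alone decides EL2 counting and exclude_hv is
 * ignored; without VHE the two flags act independently.
 */
static uint32_t event_filter(bool vhe, bool excl_kernel, bool excl_hv,
                             bool excl_user)
{
    uint32_t config_base = 0;

    if (vhe) {
        if (!excl_kernel)
            config_base |= ARMV8_PMU_INCLUDE_EL2;
    } else {
        if (excl_kernel)
            config_base |= ARMV8_PMU_EXCLUDE_EL1;
        if (!excl_hv)
            config_base |= ARMV8_PMU_INCLUDE_EL2;
    }
    if (excl_user)
        config_base |= ARMV8_PMU_EXCLUDE_EL0;

    return config_base;
}
```

In particular, the HHVM case from the changelog (exclude_hv = 1, exclude_kernel = 0 on a VHE host) now simply counts EL2 instead of getting -EINVAL.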
[PATCH] ata-sff: always map page before data transfer
The XPFO [1] patchset may unmap pages from physmap if they happened to be destined for userspace. If such a page is unmapped, it needs to be remapped. Rather than test if a page is in the highmem/xpfo unmapped state, Christoph suggested [2] that we simply always map the page. Suggested-by: Christoph Hellwig Signed-off-by: Tycho Andersen CC: Juerg Haefliger CC: Tejun Heo [1]: https://lkml.org/lkml/2016/11/4/245 [2]: https://lkml.org/lkml/2016/11/4/253 --- I don't understand all the factors at play here, so thoughts are definitely welcome. --- drivers/ata/libata-sff.c | 50 +--- 1 file changed, 18 insertions(+), 32 deletions(-) diff --git a/drivers/ata/libata-sff.c b/drivers/ata/libata-sff.c index 2bd92dc..8da2572 100644 --- a/drivers/ata/libata-sff.c +++ b/drivers/ata/libata-sff.c @@ -703,6 +703,7 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) struct page *page; unsigned int offset; unsigned char *buf; + unsigned long flags; if (qc->curbytes == qc->nbytes - qc->sect_size) ap->hsm_task_state = HSM_ST_LAST; @@ -716,24 +717,16 @@ static void ata_pio_sector(struct ata_queued_cmd *qc) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? 
"write" : "read"); - if (PageHighMem(page)) { - unsigned long flags; - - /* FIXME: use a bounce buffer */ - local_irq_save(flags); - buf = kmap_atomic(page); + /* FIXME: use a bounce buffer */ + local_irq_save(flags); + buf = kmap_atomic(page); - /* do the actual data transfer */ - ap->ops->sff_data_xfer(qc, buf + offset, qc->sect_size, - do_write); + /* do the actual data transfer */ + ap->ops->sff_data_xfer(qc, buf + offset, qc->sect_size, + do_write); - kunmap_atomic(buf); - local_irq_restore(flags); - } else { - buf = page_address(page); - ap->ops->sff_data_xfer(qc, buf + offset, qc->sect_size, - do_write); - } + kunmap_atomic(buf); + local_irq_restore(flags); if (!do_write && !PageSlab(page)) flush_dcache_page(page); @@ -836,6 +829,7 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) struct page *page; unsigned char *buf; unsigned int offset, count, consumed; + unsigned long flags; next_sg: sg = qc->cursg; @@ -861,24 +855,16 @@ static int __atapi_pio_bytes(struct ata_queued_cmd *qc, unsigned int bytes) DPRINTK("data %s\n", qc->tf.flags & ATA_TFLAG_WRITE ? "write" : "read"); - if (PageHighMem(page)) { - unsigned long flags; - - /* FIXME: use bounce buffer */ - local_irq_save(flags); - buf = kmap_atomic(page); + /* FIXME: use bounce buffer */ + local_irq_save(flags); + buf = kmap_atomic(page); - /* do the actual data transfer */ - consumed = ap->ops->sff_data_xfer(qc, buf + offset, - count, rw); + /* do the actual data transfer */ + consumed = ap->ops->sff_data_xfer(qc, buf + offset, + count, rw); - kunmap_atomic(buf); - local_irq_restore(flags); - } else { - buf = page_address(page); - consumed = ap->ops->sff_data_xfer(qc, buf + offset, - count, rw); - } + kunmap_atomic(buf); + local_irq_restore(flags); bytes -= min(bytes, consumed); qc->curbytes += count; -- 2.9.3
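The shape of the change - collapsing the PageHighMem() branch into one unconditional map/transfer/unmap path - can be sketched in userspace with a stub in place of kmap_atomic(). fake_page, map_page() and pio_transfer() below are illustrative stand-ins, not libata code:

```c
#include <assert.h>
#include <string.h>

/*
 * kmap_atomic() on a lowmem page degenerates to page_address() plus
 * preemption bookkeeping, so the old two-branch code
 * ("if (PageHighMem(page)) map ... else page_address() ...") can
 * collapse into a single always-map sequence, which also covers pages
 * XPFO has unmapped from the physmap.  Real pages are replaced by
 * plain buffers here.
 */
struct fake_page { char data[64]; };

static char *map_page(struct fake_page *p)  { return p->data; }
static void  unmap_page(char *buf)          { (void)buf; }

static void pio_transfer(struct fake_page *page, unsigned int offset,
                         const char *src, unsigned int n)
{
    char *buf = map_page(page);     /* always map, as in the patch */

    memcpy(buf + offset, src, n);   /* the actual data transfer */
    unmap_page(buf);
}
```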
Re: [PATCH v2 2/4] iio: accel: adxl345_core: Introduce set_mode and data_ready functions
On 02/05/17 12:39, Eva Rachel Retuya wrote: > On Mon, May 01, 2017 at 01:22:52AM +0100, Jonathan Cameron wrote: > Hello Jonathan, > [...] >>> +static int adxl345_set_mode(struct adxl345_data *data, u8 mode) >>> +{ >>> + struct device *dev = regmap_get_device(data->regmap); >>> + int ret; >>> + >>> + ret = regmap_write(data->regmap, ADXL345_REG_POWER_CTL, mode); >>> + if (ret < 0) { >>> + dev_err(dev, "Failed to set power mode, %d\n", ret); >>> + return ret; >> drop the return ret here and just return ret at the end of the function. >> One of the static checkers will probably moan about this otherwise. > > OK. > >>> + } >>> + >>> + return 0; >>> +} >>> + >>> +static int adxl345_data_ready(struct adxl345_data *data) >>> +{ >> So this is polling the dataready bit. Will ensure we always >> get fresh data when a read occurs. Please add a comment to >> that effect as that's not always how devices work. > > OK. > >>> + struct device *dev = regmap_get_device(data->regmap); >>> + int tries = 5; >>> + u32 val; >>> + int ret; >>> + >>> + do { >>> + /* >>> +* 1/ODR + 1.1ms; 11.1ms at ODR of 100 Hz >>> +* Sensor currently operates at default ODR of 100 Hz >>> +*/ >>> + usleep_range(1100, 11100); >> That's a huge range to allow... I'm not following the argument for why. >> Or do we have a stray 0? >> > > Not a stray 0. Range is from 1.1ms to 11.1ms, this represents the > wake-up time when going to standby/other power saving modes -> > measurement mode. I'm going to clarify the comment on why it is needed > on the next revision. The point about a range sleep is to allow the kernel flexibility in scheduling so as to avoid waking the processor from lower power states when high precision is not needed. If the thing you are talking about needs the maximum time sometimes then you should set the minimum value to that and add a bit to avoid unnecessary processor wake-ups.
> >>> + >>> + ret = regmap_read(data->regmap, ADXL345_REG_INT_SOURCE, &val); >>> + if (ret < 0) >>> + return ret; >>> + if ((val & ADXL345_INT_DATA_READY) == ADXL345_INT_DATA_READY) >>> + return 0; >>> + } while (--tries); >>> + dev_err(dev, "Data is not yet ready, try again.\n"); >>> + >> This is almost certainly a hardware fault. I'd be more brutal with >> the error and return -EIO. If you get here your hardware is very unlikely >> to be working correctly if you try again. > > OK, will change it to -EIO then. > >>> + return -EAGAIN; >>> +} >>> + >>> #define ADXL345_CHANNEL(reg, axis) { >>> \ >>> .type = IIO_ACCEL, \ >>> .modified = 1, \ >>> @@ -72,6 +118,19 @@ static int adxl345_read_raw(struct iio_dev *indio_dev, >>> >>> switch (mask) { >>> case IIO_CHAN_INFO_RAW: >>> + mutex_lock(&data->lock); >>> + ret = adxl345_set_mode(data, ADXL345_POWER_CTL_MEASURE); >>> + if (ret < 0) { >>> + mutex_unlock(&data->lock); >>> + return ret; >>> + } >>> + >>> + ret = adxl345_data_ready(data); >>> + if (ret < 0) { >>> + adxl345_set_mode(data, ADXL345_POWER_CTL_STANDBY); >>> + mutex_unlock(&data->lock); >> What is the logic that puts the mutex_unlock here in the error case >> and before the set_mode in the normal path? Even if it doesn't >> matter make them the same as it is less likely to raise questions >> in the future! > > OK, will make it consistent. > >>> + return ret; >>> + } >>> /* >>> * Data is stored in adjacent registers: >>> * ADXL345_REG_DATA(X0/Y0/Z0) contain the least significant byte >>> @@ -79,10 +138,15 @@ static int adxl345_read_raw(struct iio_dev *indio_dev, >>> */ >>> ret = regmap_bulk_read(data->regmap, chan->address, &regval, >>>sizeof(regval)); >>> - if (ret < 0) >>> + mutex_unlock(&data->lock); >>> + if (ret < 0) { >>> + adxl345_set_mode(data, ADXL345_POWER_CTL_STANDBY); >>> return ret; >>> + } >>> >>> *val = sign_extend32(le16_to_cpu(regval), 12); >>> + adxl345_set_mode(data, ADXL345_POWER_CTL_STANDBY); >>> + >>> return IIO_VAL_INT; >>> case IIO_CHAN_INFO_SCALE: >>> *val = 0; > [...] 
>>> @@ -169,8 +224,7 @@ int adxl345_core_remove(struct device *dev) >>> >>> iio_device_unregister(indio_dev); >>> >>> - return regmap_write(data->regmap, ADXL345_REG_POWER_CTL, >>> - ADXL345_POWER_CTL_STANDBY); >>> + return adxl345_set_mode(data, ADXL345_POWER_CTL_STANDBY); >> Under what circumstances would we not already be in the correct state? >> A
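The bounded poll the review converges on - propagate read errors, and return -EIO rather than -EAGAIN when the bit never sets - looks roughly like this as a userspace sketch. wait_data_ready(), the fake register and the callback stand in for the regmap calls, and the inter-try sleep is elided:

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

/* DATA_READY is bit 7 of the ADXL345 INT_SOURCE register. */
#define INT_DATA_READY (1u << 7)

/*
 * Bounded data-ready poll: a fixed number of tries, read errors
 * propagated, and -EIO (not -EAGAIN) when the bit never sets -- at
 * that point the hardware is almost certainly broken and retrying
 * will not help.  read_int_source() stands in for regmap_read() of
 * ADXL345_REG_INT_SOURCE.
 */
static int wait_data_ready(int (*read_int_source)(uint32_t *val))
{
    int tries = 5;

    do {
        uint32_t val;
        int ret = read_int_source(&val);

        if (ret < 0)
            return ret;             /* bus error: propagate */
        if (val & INT_DATA_READY)
            return 0;               /* fresh sample available */
    } while (--tries);

    return -EIO;                    /* hardware never became ready */
}

/* Illustrative stand-in for the hardware register. */
static uint32_t fake_int_source;
static int fake_read(uint32_t *val)
{
    *val = fake_int_source;
    return 0;
}
```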
Re: [PATCH 2/2] pid_ns: Introduce ioctl to set vector of ns_last_pid's on ns hierarhy
sorry for delay, vacation... On 04/28, Kirill Tkhai wrote: > > On 27.04.2017 19:22, Oleg Nesterov wrote: > > > > Ah, OK, I didn't notice the ns->child_reaper check in > > pidns_for_children_get(). > > > > But note that it doesn't need tasklist_lock either. > > Hm, are there possible strange situations with memory ordering, when we see > ns->child_reaper of already died ns, which was placed in the same memory? > Do we have to use some memory barriers here? Could you spell please? I don't understand your concerns... I don't see how, say, static struct ns_common *pidns_for_children_get(struct task_struct *task) { struct ns_common *ns = NULL; struct pid_namespace *pid_ns; task_lock(task); if (task->nsproxy) { pid_ns = task->nsproxy->pid_ns_for_children; if (pid_ns->child_reaper) { ns = &pid_ns->ns; get_pid_ns(pid_ns); } } task_unlock(task); return ns; } can be wrong. It also looks cleaner to me. ->child_reaper is not stable without tasklist, it can be dead/etc, but we do not care? Oleg.
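The locking pattern Oleg sketches - take only the task's own lock, re-check the "fully alive" marker (child_reaper) under it, and grab a reference before unlocking - can be modelled in userspace; every name below is a stand-in, not the kernel structure:

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Illustrative stand-ins for pid_namespace and task_struct. */
struct pidns {
    int refcount;           /* get_pid_ns() bumps this */
    void *child_reaper;     /* NULL until the ns is fully set up */
};

struct task {
    pthread_mutex_t lock;   /* plays the role of task_lock() */
    struct pidns *ns_for_children;
};

/*
 * Take only the object's own lock, re-check the liveness marker under
 * it, and take the reference before dropping the lock, so the caller
 * can never see a half-constructed (or torn-down) namespace.
 */
static struct pidns *pidns_for_children_get(struct task *t)
{
    struct pidns *ns = NULL;

    pthread_mutex_lock(&t->lock);                   /* task_lock(task) */
    if (t->ns_for_children && t->ns_for_children->child_reaper) {
        ns = t->ns_for_children;
        ns->refcount++;                             /* get_pid_ns() */
    }
    pthread_mutex_unlock(&t->lock);                 /* task_unlock(task) */
    return ns;
}
```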
Re: [PATCH RFC v2 4/6] proc: support mounting private procfs instances inside same pid namespace
On Tue, May 2, 2017 at 7:29 AM, Djalal Harouni wrote: > On Thu, Apr 27, 2017 at 12:13 AM, Andy Lutomirski wrote: >> On Tue, Apr 25, 2017 at 5:23 AM, Djalal Harouni wrote: > [...] >>> We have to align procfs and modernize it to have a per mount context >>> where at least the mount options do not propagate to all other mounts, >>> then maybe we can continue to implement new features. One example is to >>> require CAP_SYS_ADMIN in the init user namespace on some /proc/* which are >>> not pids and which are not virtualized by design, or CAP_NET_ADMIN >>> inside userns on the net bits that are virtualized, etc. >>> These mount options won't propagate to previous mounts, and the system >>> will continue to be usable. >>> >>> This patch introduces the new 'limit_pids' mount option as was also >>> suggested by Andy Lutomirski [1]. When this option is passed we >>> automatically create a private procfs instance. This is not the default >>> behaviour since we do not want to break userspace and we do not want to >>> provide different device IDs by default, please see [1] for why. >> >> I think that calling the option to make a separate instance >> "limit_pids" is extremely counterintuitive. > > Ok. > >> My strong preference would be to make proc *always* make a separate >> instance (unless it's a bind mount) and to make it work. If that >> means fudging stat() output, so be it. > > I also agree, but as said if we change stat(), userspace won't be able > to notice if these two proc instances are really separated, the device > ID is the only indication here. I re-read all the threads and I'm still not convinced I see why we need new_instance to be non-default. It's true that the device numbers of /proc/ns/* matter, but if you look (with stat -L, for example), they're *already* not tied to the procfs instance. 
I'm okay with adding new_instance to be on the safe side, but I'd like it to be done in a way that we could make it become the default some day without breaking anything. This means that we need to be rather careful about how new_instance and hidepid interact.
Re: [GIT PULL] TTY/Serial driver fixes for 4.11-rc4
On Fri, Apr 14, 2017 at 2:30 PM, Greg KH wrote: > On Fri, Apr 14, 2017 at 11:41:26AM +0200, Vegard Nossum wrote: >> On 13 April 2017 at 20:34, Greg KH wrote: >> > On Thu, Apr 13, 2017 at 09:07:40AM -0700, Linus Torvalds wrote: >> >> On Thu, Apr 13, 2017 at 3:50 AM, Vegard Nossum >> >> wrote: >> >> > >> >> > I've bisected a syzkaller crash down to this commit >> >> > (5362544bebe85071188dd9e479b5a5040841c895). The crash is: >> >> > >> >> > [ 25.137552] BUG: unable to handle kernel paging request at >> >> > 2280 >> >> > [ 25.137579] IP: mutex_lock_interruptible+0xb/0x30 >> >> >> >> It would seem to be the >> >> >> >> if (mutex_lock_interruptible(&ldata->atomic_read_lock)) >> >> >> >> call in n_tty_read(), the offset is about right for a NULL 'ldata' >> >> pointer (it's a big structure, it has a couple of character buffers of >> >> size N_TTY_BUF_SIZE). >> >> >> >> I don't see the obvious fix, so I suspect at this point we should just >> >> revert, as that commit seems to introduce worse problems than it is >> >> supposed to fix. Greg? >> > >> > Unless Dmitry has a better idea, I will just revert it and send you the >> > pull request in a day or so. >> >> I don't think we need to rush a revert, I'd hope there's a way to fix >> it properly. > > For this late in the release cycle, for something as complex as tty > ldisc handling, for an issue that has been present for over a decade, > the safest thing right now is to go back to the old well-known code by > applying a revert :) > >> So the original problem is that the vmalloc() in n_tty_open() can >> fail, and that will panic in tty_set_ldisc()/tty_ldisc_restore() >> because of its unwillingness to proceed if the tty doesn't have an >> ldisc. >> >> Dmitry fixed this by allowing tty->ldisc == NULL in the case of memory >> allocation failure as we can see from the comment in tty_set_ldisc(). 
>> >> Unfortunately, it would appear that some other bits of code do not >> like tty->ldisc == NULL (other than the crash in this thread, I saw >> 2-3 similar crashes in other functions, e.g. poll()). I see two >> possibilities: >> >> 1) make other code handle tty->ldisc == NULL. >> >> 2) don't close/free the old ldisc until the new one has been >> successfully created/initialised/opened/attached to the tty, and >> return an error to userspace if changing it failed. >> >> I'm leaning towards #2 as the more obviously correct fix, it makes >> tty_set_ldisc() transactional, the fix seems limited in scope to >> tty_set_ldisc() itself, and we don't need to make every other bit of >> code that uses tty->ldisc handle the NULL case. > > That sounds reasonable to me, care to work on a patch for this? Vegard, do you know how to do this? That was the first thing that I tried, but I did not manage to make it work. The ldisc is tied to the tty, so it's not that one can create a fully initialized ldisc on the side and then simply swap pointers. Looking at the code now, there is at least the TTY_LDISC_OPEN bit in the tty. But as far as I remember there were more fundamental problems. Or maybe I just did not try too hard.
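For reference, the transactional shape Vegard proposes in option #2 - fully construct the replacement before touching the old object, and keep the old one on any failure - is easy to express as a pointer-swap skeleton. This deliberately ignores the hard part Dmitry raises (the ldisc being opened against the tty while in use); every name here is illustrative:

```c
#include <assert.h>
#include <stdlib.h>

/* Illustrative stand-ins for the tty and line-discipline objects. */
struct ldisc { int num; };
struct tty   { struct ldisc *ldisc; };

/*
 * Transactional replace: build the new ldisc completely first; on any
 * failure the tty keeps its old, working ldisc and the caller gets an
 * error, so tty->ldisc is never left NULL.
 */
static int set_ldisc(struct tty *tty, struct ldisc *(*alloc)(int), int num)
{
    struct ldisc *nld = alloc(num);     /* fully construct first */

    if (!nld)
        return -1;                      /* old ldisc left untouched */

    struct ldisc *old = tty->ldisc;
    tty->ldisc = nld;                   /* commit: swap pointers */
    free(old);                          /* only now retire the old one */
    return 0;
}

/* Stand-in allocators for demonstration. */
static struct ldisc *ok_alloc(int num)
{
    struct ldisc *l = calloc(1, sizeof(*l));

    if (l)
        l->num = num;
    return l;
}

static struct ldisc *fail_alloc(int num)
{
    (void)num;                          /* simulated vmalloc() failure */
    return NULL;
}
```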