[RFC] ext4: fix a BUG in ext4_mpage_readpages()
The last block should be the last block of all the pages being read, so the block number needs to be adjusted when it is calculated for each page.

Signed-off-by: yalin wang
---
 fs/ext4/readpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 5dc5e95..d20c39d 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -174,7 +174,7 @@ int ext4_mpage_readpages(struct address_space *mapping,
 			goto confused;

 		block_in_file = (sector_t)page->index << (PAGE_CACHE_SHIFT - blkbits);
-		last_block = block_in_file + nr_pages * blocks_per_page;
+		last_block = block_in_file + (nr_pages - page_idx) * blocks_per_page;
 		last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
 		if (last_block > last_block_in_file)
 			last_block = last_block_in_file;
-- 
1.9.1

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at http://vger.kernel.org/majordomo-info.html
Please read the FAQ at http://www.tux.org/lkml/
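The block arithmetic in the hunk above can be sketched in standalone C. This is a minimal userspace sketch, not the ext4 code: the helper name clamp_last_block and its plain integer parameters are assumptions made for illustration, with page_shift standing in for PAGE_CACHE_SHIFT and pages_left for the number of pages still to be read.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper (not part of fs/ext4/readpage.c): compute the
 * exclusive end block for a run of pages_left pages starting at
 * page_index, clamped to the last block covered by the file size.
 */
static uint64_t clamp_last_block(uint64_t page_index, unsigned int pages_left,
                                 unsigned int page_shift, unsigned int blkbits,
                                 uint64_t i_size)
{
    uint64_t blocksize = 1ULL << blkbits;
    uint64_t blocks_per_page;
    uint64_t block_in_file, last_block, last_block_in_file;

    /* A page must hold at least one block. */
    assert(page_shift >= blkbits);
    blocks_per_page = 1ULL << (page_shift - blkbits);

    /* First block of this page, then the end of the remaining run. */
    block_in_file = page_index << (page_shift - blkbits);
    last_block = block_in_file + (uint64_t)pages_left * blocks_per_page;

    /* Never read past the block that contains the end of the file. */
    last_block_in_file = (i_size + blocksize - 1) >> blkbits;
    if (last_block > last_block_in_file)
        last_block = last_block_in_file;
    return last_block;
}
```

With 4 KiB pages and 1 KiB blocks, a run of 3 pages starting at page index 2 ends at block 20 unless the file size clamps it earlier.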
Re: [PATCH v2 2/2] watchdog: imx2_wdt: add set_pretimeout interface
On Tue, Nov 03, 2015 at 09:26:16AM -0800, Guenter Roeck wrote: > On 11/03/2015 12:04 AM, Robin Gong wrote: > >On Mon, Nov 02, 2015 at 11:48:29PM -0800, Guenter Roeck wrote: > >>On 11/02/2015 10:11 PM, Robin Gong wrote: > >>>Enable set_pretimeout interface and trigger the pretimeout interrupt before > >>>watchdog timeout event happen. > >>> > >>>Signed-off-by: Robin Gong > >>>--- > >>> drivers/watchdog/imx2_wdt.c | 58 > >>> - > >>> 1 file changed, 57 insertions(+), 1 deletion(-) > >>> > >>>diff --git a/drivers/watchdog/imx2_wdt.c b/drivers/watchdog/imx2_wdt.c > >>>index 0bb1a1d..bd42857 100644 > >>>--- a/drivers/watchdog/imx2_wdt.c > >>>+++ b/drivers/watchdog/imx2_wdt.c > >>>@@ -24,6 +24,7 @@ > >>> #include > >>> #include > >>> #include > >>>+#include > >>> #include > >>> #include > >>> #include > >>>@@ -52,12 +53,18 @@ > >>> #define IMX2_WDT_WRSR0x04/* Reset Status > >>> Register */ > >>> #define IMX2_WDT_WRSR_TOUT (1 << 1)/* -> Reset due to > >>> Timeout */ > >>> > >>>+#define IMX2_WDT_WICR 0x06/*Interrupt Control > >>>Register*/ > >>>+#define IMX2_WDT_WICR_WIE (1 << 15) /* -> Interrupt Enable */ > >>>+#define IMX2_WDT_WICR_WTIS(1 << 14) /* -> Interrupt Status > >>>*/ > >>>+#define IMX2_WDT_WICR_WICT(0xFF) /* Watchdog Interrupt Timeout */ > >> > >>Unnecessary ( ) > >> > >>>+ > >>> #define IMX2_WDT_WMCR0x08/* Misc Register */ > >>> > >>> #define IMX2_WDT_MAX_TIME128 > >>> #define IMX2_WDT_DEFAULT_TIME60 /* in seconds */ > >>> > >>> #define WDOG_SEC_TO_COUNT(s) ((s * 2 - 1) << 8) > >>>+#define WDOG_SEC_TO_PRECOUNT(s) ((s) * 2) /* set WDOG pre timeout > >>>count*/ > >>> > >>> struct imx2_wdt_device { > >>> struct clk *clk; > >>>@@ -80,7 +87,8 @@ MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds > >>>(default=" > >>> > >>> static const struct watchdog_info imx2_wdt_info = { > >>> .identity = "imx2+ watchdog", > >>>- .options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE, > >>>+ .options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE > 
>>>+ | WDIOF_PRETIMEOUT, > >>> }; > >>> > >>> static int imx2_restart_handler(struct notifier_block *this, unsigned > >>> long mode, > >>>@@ -207,12 +215,49 @@ static inline void imx2_wdt_ping_if_active(struct > >>>watchdog_device *wdog) > >>> } > >>> } > >>> > >>>+static int imx2_wdt_set_pretimeout(struct watchdog_device *wdog, > >>>+ unsigned int new_timeout) > >>>+{ > >>>+ struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog); > >>>+ u32 val; > >>>+ > >>>+ regmap_read(wdev->regmap, IMX2_WDT_WICR, ); > >>>+ /* set the new pre-timeout value in the WSR */ > >>>+ val &= ~IMX2_WDT_WICR_WICT; > >>>+ val |= WDOG_SEC_TO_PRECOUNT(new_timeout); > >> > >>Looking into this, I think we need more information in the first patch. > >>Specifically, we need a value for max_pretimeout and validate it > >>(new_timeout must be smaller than 128 to avoid overflows, independent > >>of the maximum timeout). > >> > >I think it's enough, since there is "t >= wdd->timeout" in > >watchdog_pretimeout_invalid() of the first patch and wdd->timeout is never > >bigger than wdd->max_timeout which setting in imx2_wdt.c as 128. > > We are dealing with infrastructure code here. This happens to work for your > driver, > but that doesn't mean it will work for other drivers. > > Anyway, I recalled that there is another patchset pending to add pretimeout > support into the watchdog code. I'll have to dig that out and consolidate the > two. > > Guenter > Got it. Thanks. 
> >>>+ > >>>+ regmap_write(wdev->regmap, IMX2_WDT_WICR, val | IMX2_WDT_WICR_WIE); > >>>+ > >>>+ wdog->pretimeout = new_timeout; > >>>+ > >>>+ return 0; > >>>+} > >>>+ > >>>+static irqreturn_t imx2_wdt_isr(int irq, void *dev_id) > >>>+{ > >>>+ struct platform_device *pdev = dev_id; > >>>+ struct watchdog_device *wdog = platform_get_drvdata(pdev); > >>>+ struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog); > >>>+ u32 val; > >>>+ > >>>+ regmap_read(wdev->regmap, IMX2_WDT_WICR, ); > >>>+ if (val & IMX2_WDT_WICR_WTIS) { > >>>+ /*clear interrupt status bit*/ > >>>+ regmap_write(wdev->regmap, IMX2_WDT_WICR, val); > >>>+ panic("pre-timeout:%d, %d Seconds remained\n", wdog->pretimeout, > >> > >>"seconds" - no capital 'S'. > >> > >>>+wdog->timeout - wdog->pretimeout); > >>>+ } > >>>+ > >>>+ return IRQ_HANDLED; > >>>+} > >>>+ > >>> static const struct watchdog_ops imx2_wdt_ops = { > >>> .owner = THIS_MODULE, > >>> .start = imx2_wdt_start, > >>> .stop = imx2_wdt_stop, > >>> .ping = imx2_wdt_ping, > >>> .set_timeout = imx2_wdt_set_timeout, > >>>+ .set_pretimeout = imx2_wdt_set_pretimeout, > >>> }; > >>> > >>> static const struct regmap_config imx2_wdt_regmap_config = { >
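The overflow concern raised in the review can be illustrated with a small standalone sketch. It assumes only what the quoted patch shows: an 8-bit WICT field (mask 0xFF) and a seconds-to-count conversion of (s) * 2. The helper name wict_encode and the DEMO_ macros are hypothetical and are not part of imx2_wdt.c or the watchdog core.

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_WICT_MASK 0xFFu                 /* 8-bit count field, as in the patch */
#define DEMO_SEC_TO_PRECOUNT(s) ((s) * 2u)   /* two counts per second, as in the patch */

/*
 * Hypothetical helper: store the register value with the new pretimeout
 * count in *out and return 0, or return -1 if the count would not fit
 * in the field and would spill into the neighbouring bits.
 */
static int wict_encode(uint32_t reg, unsigned int pretimeout_s, uint32_t *out)
{
    /* 0xFF / 2 == 127, so anything from 128 seconds up is rejected. */
    if (pretimeout_s > DEMO_WICT_MASK / 2u)
        return -1;

    *out = (reg & ~DEMO_WICT_MASK) | DEMO_SEC_TO_PRECOUNT(pretimeout_s);
    return 0;
}
```

Checking the range before the multiplication also avoids unsigned wrap-around for very large inputs. Where such a check belongs (driver or core) is exactly what the thread is debating; the sketch takes no position on that.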
Re: [PATCH v7,1/3] dt-bindings: binding for jz4780-{nand,bch}
Hi Harvey, On Tue, 6 Oct 2015 17:27:15 +0100 Harvey Hunt wrote: > From: Alex Smith > > Add DT bindings for NAND devices connected to the NEMC on JZ4780 SoCs, > as well as the hardware BCH controller, used by the jz4780_{nand,bch} > drivers. Patch 3 does not seem to follow this binding: it still uses the ingenic,ecc-xxx properties and defines NAND timings. Also, as answered to patch 3, I think it would be clearer to separate the nand controller and nand chip representation. Best Regards, Boris > > Signed-off-by: Alex Smith > Cc: Zubair Lutfullah Kakakhel > Cc: David Woodhouse > Cc: Brian Norris > Cc: linux-...@lists.infradead.org > Cc: devicet...@vger.kernel.org > Cc: linux-kernel@vger.kernel.org > Cc: Alex Smith > Signed-off-by: Harvey Hunt > --- > v6 -> v7: > - Add nand-ecc-mode to DT bindings. > - Add nand-on-flash-bbt to DT bindings. > > v5 -> v6: > - No change. > > v4 -> v5: > - Rename ingenic,bch-device to ingenic,bch-controller to fit with >existing convention. > > v3 -> v4: > - No change > > v2 -> v3: > - Rebase to 4.0-rc6 > - Changed ingenic,ecc-size to common nand-ecc-step-size > - Changed ingenic,ecc-strength to common nand-ecc-strength > - Changed ingenic,busy-gpio to common rb-gpios > - Changed ingenic,wp-gpio to common wp-gpios > > v1 -> v2: > - Rebase to 4.0-rc3 > > .../bindings/mtd/ingenic,jz4780-nand.txt | 61 > ++ > 1 file changed, 61 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt > > diff --git a/Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt > b/Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt > new file mode 100644 > index 000..44a0468 > --- /dev/null > +++ b/Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt > @@ -0,0 +1,61 @@ > +* Ingenic JZ4780 NAND/BCH > + > +This file documents the device tree bindings for NAND flash devices on the > +JZ4780. 
NAND devices are connected to the NEMC controller (described in > +memory-controllers/ingenic,jz4780-nemc.txt), and thus NAND device nodes must > +be children of the NEMC node. > + > +Required NAND device properties: > +- compatible: Should be set to "ingenic,jz4780-nand". > +- reg: For each bank with a NAND chip attached, should specify a bank number, > + an offset of 0 and a size of 0x100 (i.e. the whole NEMC bank). > + > +Optional NAND device properties: > +- ingenic,bch-controller: To make use of the hardware BCH controller, this > + property must contain a phandle for the BCH controller node. The required > + properties for this node are described below. If this is not specified, > + software BCH will be used instead. > +- nand-ecc-step-size: ECC block size in bytes. > +- nand-ecc-strength: ECC strength (max number of correctable bits). > +- nand-ecc-mode: String, operation mode of the NAND ecc mode. "hw" by default > +- nand-on-flash-bbt: boolean to enable on flash bbt option if not present > false > +- rb-gpios: GPIO specifier for the busy pin. > +- wp-gpios: GPIO specifier for the write protect pin. > + > +Example: > + > +nemc: nemc@1341 { > + ... > + > + nand: nand@1 { > + compatible = "ingenic,jz4780-nand"; > + reg = <1 0 0x100>; /* Bank 1 */ > + > + ingenic,bch-controller = <>; > + nand-ecc-step-size = <1024>; > + nand-ecc-strength = <24>; > + nand-ecc-mode = "hw"; > + nand-on-flash-bbt; > + > + rb-gpios = < 20 GPIO_ACTIVE_LOW>; > + wp-gpios = < 22 GPIO_ACTIVE_LOW>; > + }; > +}; > + > +The BCH controller is a separate SoC component used for error correction on > +NAND devices. The following is a description of the device properties for a > +BCH controller. > + > +Required BCH properties: > +- compatible: Should be set to "ingenic,jz4780-bch". > +- reg: Should specify the BCH controller registers location and length. > +- clocks: Clock for the BCH controller. 
> + > +Example: > + > +bch: bch@134d { > + compatible = "ingenic,jz4780-bch"; > + reg = <0x134d 0x1>; > + > + clocks = < JZ4780_CLK_BCH>; > +}; -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com
Re: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"
On Wed, Nov 04, 2015 at 01:39:17AM -0200, Fabio Estevam wrote:
> On Wed, Nov 4, 2015 at 12:58 AM, Robin Gong wrote:
> > commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare").
> > Otherwise below warning happen since there are some printk logs in
> > interrupt.
>
> This has already been reverted since v4.3-rc5.

That's great, please ignore this patch.
RE: [PATCH v7] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller
> > Without #ifdefs if we compile driver for legacy, MSI structures will not be
> > available and we get compile time error.
>
> Sorry for nitpicking but at least can we use elegant version of #ifdefs .i.e.
> #if IS_ENABLED() here ?

Code guarded by a C-level if (IS_ENABLED()) check is still compiled even when the option is off, so the compile-time error would still occur for legacy.

Thanks
Bharat
Re: [PATCH v7,3/3] MIPS: dts: jz4780/ci20: Add NEMC, BCH and NAND device tree nodes
Paul, Harvey, On Fri, 16 Oct 2015 11:48:48 +0100 Paul Burton wrote: > On Fri, Oct 16, 2015 at 11:31:12AM +0100, James Hogan wrote: > > > >> + > > > >> + { > > > >> + status = "okay"; > > > >> + > > > >> + nand: nand@1 { > > > >> + compatible = "ingenic,jz4780-nand"; > > > > > > > > Isn't the NAND a micron part? This doesn't seem right. Is the device > > > > driver and binding already accepted upstream with that compatible > > > > string? > > > > > > This is the compatible string for the JZ4780 NAND driver, this patch > > > is part of the series adding that. Detection of the NAND part is > > > handled by the MTD subsystem. > > > > Right (didn't spot that it was part of a series). > > > > The node appears to describe the NAND interface itself, i.e. a part of > > the SoC, so should be in the SoC dtsi file, with overrides in the board > > file if necessary for it to work with a particular NAND part > > (potentially utilising status="disabled"). Would you agree? > > Hi James, > > The "nemc" node there is for the Nand & External Memory Controller which > is a hardware block inside the SoC. It has 6 banks (ie. 6 chip select > pins, each associated with a different address range, that connect to > different devices). NAND flash is one such possible device, but a board > could connect it to any of the 6 chip selects, or banks. To represent > that in the SoC dtsi you'd want to have 6 NAND nodes, each disabled by > default, which doesn't make a whole lot of sense to me. Other, non-NAND > devices can connect to the NEMC too - for example the ethernet > controller on the CI20 is connected to one bank. > > The NAND device nodes are sort of a mix of describing the NAND flash > (ie. Micron part as you point out) and its connections & properties, the > way the NEMC should be used to interact with it alongside the BCH block, > and the configuration for the NEMC such as timing parameters. 
> > I imagine the most semantically correct means of describing it would > probably be for the compatible string to reflect the Micron NAND part, > and the NEMC driver to pick up on the relevant properties of its child > nodes for configuring timings, whether the device is NAND etc. However > the handling of registering NAND devices with MTD would probably then > have to be part of the NEMC driver, which feels a bit off too. Another solution would be to describe both the NAND controller and the NAND chip in the DT (with the NAND chip being a chip of the NAND controller). Actually this is already what other binding are doing [1][2]. I know those bindings are representing NAND controllers which can interface with more than one NAND chip, but I think that even in the 1:1 case it would make it clearer to represent both the NAND chip and the NAND controller. In your case this would give the following representation + { + status = "okay"; + + nandc: nand-controller@1 { + compatible = "ingenic,jz4780-nand"; + reg = <1 0 0x100>; + #address-cells = <1>; + #size-cells = <0>; + + ingenic,bch-controller = <>; + + nand@0 { + nand-ecc-mode = "hw"; + nand-on-flash-bbt; + nand-ecc-size = <1024>; + nand-ecc-strength = <24>; + + #address-cells = <2>; + #size-cells = <2>; + + partition@0 { + label = "u-boot-spl"; + reg = <0x0 0x0 0x0 0x80>; + }; + /* ... 
*/ + + }; + }; +}; Best Regards, Boris [1]http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/mtd/brcm,brcmnand.txt#L119 [2]http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/mtd/sunxi-nand.txt#L28 > > Thanks, > Paul > > __ > Linux MTD discussion mailing list > http://lists.infradead.org/mailman/listinfo/linux-mtd/ -- Boris Brezillon, Free Electrons Embedded Linux and Kernel engineering http://free-electrons.com
Re: [PATCH v4 3/6] arm64: ftrace: fix a stack tracer's output under function graph tracer
On 11/01/2015 05:00 PM, Jungseok Lee wrote: On Oct 30, 2015, at 2:25 PM, AKASHI Takahiro wrote: Hi Akashi, Function graph tracer modifies a return address (LR) in a stack frame to hook a function return. This will result in many useless entries (return_to_handler) showing up in a stack tracer's output. This patch replaces such entries with originals values preserved in current->ret_stack[]. Signed-off-by: AKASHI Takahiro --- arch/arm64/include/asm/ftrace.h |2 ++ arch/arm64/kernel/stacktrace.c | 21 + 2 files changed, 23 insertions(+) diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h index 2b43e20..b7d597c 100644 --- a/arch/arm64/include/asm/ftrace.h +++ b/arch/arm64/include/asm/ftrace.h @@ -29,6 +29,8 @@ struct dyn_arch_ftrace { extern unsigned long ftrace_graph_call; +extern void return_to_handler(void); + static inline unsigned long ftrace_call_adjust(unsigned long addr) { /* diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c index bc0689a..631c49d 100644 --- a/arch/arm64/kernel/stacktrace.c +++ b/arch/arm64/kernel/stacktrace.c @@ -17,6 +17,7 @@ */ #include #include +#include #include #include @@ -78,6 +79,9 @@ struct stack_trace_data { struct stack_trace *trace; unsigned int no_sched_functions; unsigned int skip; +#ifdef CONFIG_FUNCTION_GRAPH_TRACER + unsigned int ret_stack_index; +#endif }; static int save_trace(struct stackframe *frame, void *d) @@ -86,6 +90,20 @@ static int save_trace(struct stackframe *frame, void *d) struct stack_trace *trace = data->trace; unsigned long addr = frame->pc; +#ifdef CONFIG_FUNCTION_GRAPH_TRACER + if (addr == (unsigned long)return_to_handler - AARCH64_INSN_SIZE) { + /* +* This is a case where function graph tracer has +* modified a return address (LR) in a stack frame +* to hook a function return. +* So replace it to an original value. 
+		 */
+		frame->pc = addr =
+			current->ret_stack[data->ret_stack_index--].ret
+							- AARCH64_INSN_SIZE;
+	}
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+

This hunk would be affected, as the commit "ARM64: unwind: Fix PC calculation" (e306dfd0) has been reverted.

Yeah, we don't need to apply this patch any more. The other patches in the series will not be affected.

Thanks,
-Takahiro AKASHI

Best Regards
Jungseok Lee
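The idea behind the quoted hunk, replacing hooked return addresses with the originals saved by the function graph tracer, can be sketched in userspace C. Everything here is an assumption made for illustration: struct saved_ret, restore_hooked and the flat trace array are hypothetical stand-ins for current->ret_stack[] and the stack walker, and the AARCH64_INSN_SIZE adjustment from the patch is deliberately left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Stand-in for one saved entry of the tracer's return stack. */
struct saved_ret {
    uintptr_t ret;
};

/*
 * Hypothetical sketch: walk the trace from the innermost frame outwards
 * and replace every entry equal to hook_addr with a saved return address,
 * consuming saved[] from index top downwards.
 * Returns the number of entries replaced.
 */
static size_t restore_hooked(uintptr_t *trace, size_t n, uintptr_t hook_addr,
                             const struct saved_ret *saved, int top)
{
    size_t replaced = 0;

    for (size_t i = 0; i < n; i++) {
        if (trace[i] == hook_addr && top >= 0) {
            trace[i] = saved[top--].ret;
            replaced++;
        }
    }
    return replaced;
}
```

The index runs downwards because the most recently hooked function is the innermost frame, which the walker visits first.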
Re: [PATCH] tmpfs: avoid a little creat and stat slowdown
Hugh Dickins writes: > LKP reports that v4.2 commit afa2db2fb6f1 ("tmpfs: truncate prealloc > blocks past i_size") causes a 14.5% slowdown in the AIM9 creat-clo > benchmark. > > creat-clo does just what you'd expect from the name, and creat's O_TRUNC > on 0-length file does indeed get into more overhead now shmem_setattr() > tests "0 <= 0" instead of "0 < 0". > > I'm not sure how much we care, but I think it would not be too VW-like > to add in a check for whether any pages (or swap) are allocated: if none > are allocated, there's none to remove from the radix_tree. At first I > thought that check would be good enough for the unmaps too, but no: we > should not skip the unlikely case of unmapping pages beyond the new EOF, > which were COWed from holes which have now been reclaimed, leaving none. > > This gives me an 8.5% speedup: on Haswell instead of LKP's Westmere, > and running a debug config before and after: I hope those account for > the lesser speedup. > > And probably someone has a benchmark where a thousand threads keep on > stat'ing the same file repeatedly: forestall that report by adjusting > v4.3 commit 44a30220bc0a ("shmem: recalculate file inode when fstat") > not to take the spinlock in shmem_getattr() when there's no work to do. > > Reported-by: Ying Huang > Signed-off-by: Hugh Dickins Hi, Hugh, Thanks a lot for your support! The test on LKP shows that this patch restores a big part of the regression! In following list, c435a390574d012f8d30074135d8fcc6f480b484: is parent commit afa2db2fb6f15f860069de94a1257db57589fe95: is the first bad commit has performance regression. 
43819159da2b77fedcf7562134d6003dccd6a068: is the fixing patch = compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime: gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-wsx02/creat-clo/aim9/300s commit: c435a390574d012f8d30074135d8fcc6f480b484 afa2db2fb6f15f860069de94a1257db57589fe95 43819159da2b77fedcf7562134d6003dccd6a068 c435a390574d012f afa2db2fb6f15f860069de94a1 43819159da2b77fedcf7562134 -- -- %stddev %change %stddev %change %stddev \ |\ |\ 563556 ± 1% -12.5% 493033 ± 5% -5.6% 531968 ± 1% aim9.creat-clo.ops_per_sec 11836 ± 7% +11.4% 13184 ± 7% +15.0% 13608 ± 5% numa-meminfo.node1.SReclaimable 10121526 ± 3% -12.1%8897097 ± 5% -4.1%9707953 ± 4% proc-vmstat.pgalloc_normal 9.34 ± 4% -11.4% 8.28 ± 3% -4.8% 8.88 ± 2% time.user_time 3480 ± 3% -2.5% 3395 ± 1% -28.5% 2488 ± 3% vmstat.system.cs 203275 ± 17% -6.8% 189453 ± 5% -34.4% 133352 ± 11% cpuidle.C1-NHM.usage 8081280 ±129% -93.3% 538377 ± 97% +31.5% 10625496 ±106% cpuidle.C1E-NHM.time 3144 ± 58%+619.0% 22606 ± 56%+903.9% 31563 ± 0% numa-vmstat.node0.numa_other 2958 ± 7% +11.4% 3295 ± 7% +15.0% 3401 ± 5% numa-vmstat.node1.nr_slab_reclaimable 45074 ± 5% -43.4% 25494 ± 57% -68.7% 14105 ± 2% numa-vmstat.node2.numa_other 56140 ± 0% +0.0% 56158 ± 0% -94.4% 3120 ± 0% slabinfo.Acpi-ParseExt.active_objs 1002 ± 0% +0.0% 1002 ± 0% -92.0% 80.00 ± 0% slabinfo.Acpi-ParseExt.active_slabs 56140 ± 0% +0.0% 56158 ± 0% -94.4% 3120 ± 0% slabinfo.Acpi-ParseExt.num_objs 1002 ± 0% +0.0% 1002 ± 0% -92.0% 80.00 ± 0% slabinfo.Acpi-ParseExt.num_slabs 1079 ± 5% -10.8% 962.00 ± 10%-100.0% 0.00 ± -1% slabinfo.blkdev_ioc.active_objs 1079 ± 5% -10.8% 962.00 ± 10%-100.0% 0.00 ± -1% slabinfo.blkdev_ioc.num_objs 110.67 ± 39% +74.4% 193.00 ± 46%+317.5% 462.00 ± 8% slabinfo.blkdev_queue.active_objs 189.33 ± 23% +43.7% 272.00 ± 33%+151.4% 476.00 ± 10% slabinfo.blkdev_queue.num_objs 1129 ± 10% -1.9% 1107 ± 7% +20.8% 1364 ± 6% slabinfo.blkdev_requests.active_objs 1129 ± 10% -1.9% 1107 ± 7% +20.8% 1364 ± 6% 
slabinfo.blkdev_requests.num_objs 1058 ± 3% -10.3% 949.00 ± 9%-100.0% 0.00 ± -1% slabinfo.file_lock_ctx.active_objs 1058 ± 3% -10.3% 949.00 ± 9%-100.0% 0.00 ± -1% slabinfo.file_lock_ctx.num_objs 4060 ± 1% -2.1% 3973 ± 1% -10.5% 3632 ± 1% slabinfo.files_cache.active_objs 4060 ± 1% -2.1% 3973 ± 1% -10.5% 3632 ± 1% slabinfo.files_cache.num_objs 10001 ± 0% -0.3% 9973 ± 0% -61.1% 3888 ± 0%
Re: Mobility Radeon HD 4530/4570/545v: flicker in 1920x1080
On 04.11.2015 00:03, Pavel Machek wrote:

Hi!

Any ideas?

Alex probably knows more about this, but it sounds like problems with switching the memory clocks on 3D load. Try to disable power management completely with radeon.dpm=0 on the kernel command line, or nail the hardware at a specific power level using sysfs.

I tried that, but it still flickers.

It's probably pll stability. There seem to be a number of regressions since the pll code was rewritten to support matching the hdmi clocks more closely. Does this patch help?

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c b/drivers/gpu/drm/radeon/atombios_crtc.c
index dac78ad..b86f06a 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -569,6 +569,8 @@ static u32 atombios_adjust_pll(struct drm_crtc *crtc,
 	radeon_crtc->pll_flags = 0;

 	if (ASIC_IS_AVIVO(rdev)) {
+		radeon_crtc->pll_flags |= RADEON_PLL_PREFER_MINM_OVER_MAXP;
+
 		if ((rdev->family == CHIP_RS600) ||
 		    (rdev->family == CHIP_RS690) ||
 		    (rdev->family == CHIP_RS740))

Help.. maybe... it is tricky to tell. It definitely does _not_ fix the issue completely.

You could also try the old pll algorithm:

I reverted the patch above, and switched to the old algorithm. The flicker is still there. (But maybe it's less horrible, like with RADEON_PLL_PREFER_MINM_OVER_MAXP).

The flickering would vanish completely if that's the reason for the issue you are seeing. Try setting ref_div_min and ref_div_max to 2 in radeon_compute_pll_avivo().

But I'm not 100% convinced that this is actually a PLL problem, try to compile the firmware it complains about into the kernel as well.

Regards,
Christian.

Thanks,
Pavel
Re: [PATCH v7] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller
> Without #ifdefs if we compile driver for legacy, MSI structures will not be
> available and we get compile time error.

Sorry for nitpicking, but can we at least use the elegant version of #ifdefs, i.e. #if IS_ENABLED(), here?

Thanks,
Amit.
Re: [PATCH] dmaengine: edma: fix build without CONFIG_OF
On 11/03/2015 04:00 PM, Arnd Bergmann wrote:
> During the edma rework, a build error was introduced for the
> case that CONFIG_OF is disabled:
>
> drivers/built-in.o: In function `edma_tc_set_pm_state':
> :(.text+0x43bf0): undefined reference to `of_find_device_by_node'
>
> As the edma_tc_set_pm_state() function does nothing in case
> we are running without OF, this adds an IS_ENABLED() check
> that turns the function into an empty stub then and avoids the
> link error.
>
> Signed-off-by: Arnd Bergmann
> Fixes: ca304fa9bb76 ("ARM/dmaengine: edma: Public API to use private struct pointer")

The actual commit this patch is fixing is:
1be5336bc7ba dmaengine: edma: New device tree binding

> ---
> Found on ARM randconfig builds with today's linux-next

I have sanity-built the kernel with omap2plus_defconfig and davinci_all_defconfig, since eDMA is used by these platforms, and did not hit this issue, as these defconfigs obviously result in OF being enabled.

> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
> index 31722d436a42..16713a93da10 100644
> --- a/drivers/dma/edma.c
> +++ b/drivers/dma/edma.c
> @@ -1560,7 +1560,7 @@ static void edma_tc_set_pm_state(struct edma_tc *tc, bool enable)
>  	struct platform_device *tc_pdev;
>  	int ret;
>
> -	if (!tc)
> +	if (!IS_ENABLED(CONFIG_OF) || !tc)
>  		return;

Should we instead put the function inside of:

#if IS_ENABLED(CONFIG_OF)
static void edma_tc_set_pm_state(struct edma_tc *tc, bool enable)
{
	...
}
#else
static inline void edma_tc_set_pm_state(struct edma_tc *tc, bool enable)
{
}
#endif /* IS_ENABLED(CONFIG_OF) */

>
>  	tc_pdev = of_find_device_by_node(tc->node);
>

-- 
Péter
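The stub arrangement suggested in the reply can be tried outside the kernel with a small sketch. This is an illustration under stated assumptions: DEMO_CONFIG_OF is a plain 0/1 macro standing in for a Kconfig symbol (the kernel's real IS_ENABLED() machinery is not reproduced here), and struct demo_tc and demo_tc_set_pm_state are hypothetical names, not the edma driver's.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for a Kconfig option; build with -DDEMO_CONFIG_OF=1 to get the real body. */
#ifndef DEMO_CONFIG_OF
#define DEMO_CONFIG_OF 0
#endif

struct demo_tc {
    int pm_enabled;
};

#if DEMO_CONFIG_OF
/* Real implementation: only compiled when the option is on. */
static void demo_tc_set_pm_state(struct demo_tc *tc, int enable)
{
    if (!tc)
        return;
    tc->pm_enabled = enable;
}
#else
/* Empty stub: callers still compile, but nothing option-specific is referenced. */
static inline void demo_tc_set_pm_state(struct demo_tc *tc, int enable)
{
    (void)tc;
    (void)enable;
}
#endif /* DEMO_CONFIG_OF */
```

The difference from a check inside the function body is that the disabled variant never reaches the compiler, so it does not rely on dead-code elimination to drop references to symbols that are missing in that configuration.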
[PATCH] h8300: Initialize cleanup and remove module code
h8300's clocksource driver cleanup. - Use CLOCKSOURCE_OF_DECLARE - Remove module exit Signed-off-by: Yoshinori Sato --- drivers/clocksource/h8300_timer16.c | 133 - drivers/clocksource/h8300_timer8.c | 164 ++-- drivers/clocksource/h8300_tpu.c | 105 --- 3 files changed, 151 insertions(+), 251 deletions(-) diff --git a/drivers/clocksource/h8300_timer16.c b/drivers/clocksource/h8300_timer16.c index 82941c1..e2e10cf 100644 --- a/drivers/clocksource/h8300_timer16.c +++ b/drivers/clocksource/h8300_timer16.c @@ -17,6 +17,8 @@ #include #include #include +#include +#include #include #include @@ -47,7 +49,6 @@ #define ABSOLUTE 1 struct timer16_priv { - struct platform_device *pdev; struct clocksource cs; struct irqaction irqaction; unsigned long total_cycles; @@ -147,43 +148,58 @@ static void timer16_disable(struct clocksource *cs) #define REG_CH 0 #define REG_COMM 1 -static int timer16_setup(struct timer16_priv *p, struct platform_device *pdev) +static void __init h8300_16timer_init(struct device_node *node) { - struct resource *res[2]; + void __iomem *base[2]; int ret, irq; unsigned int ch; + struct timer16_priv *p; + struct clk *clk; + + clk = of_clk_get(node, 0); + if (IS_ERR(clk)) { + pr_err("failed to get clock for clocksource\n"); + return; + } - memset(p, 0, sizeof(*p)); - p->pdev = pdev; + base[REG_CH] = of_iomap(node, 0); + if (!base[0]) { + pr_err("failed to map registers for clocksource\n"); + goto free_clk; + } - res[REG_CH] = platform_get_resource(p->pdev, - IORESOURCE_MEM, REG_CH); - res[REG_COMM] = platform_get_resource(p->pdev, - IORESOURCE_MEM, REG_COMM); - if (!res[REG_CH] || !res[REG_COMM]) { - dev_err(>pdev->dev, "failed to get I/O memory\n"); - return -ENXIO; + base[REG_COMM] = of_iomap(node, 1); + if (!base[1]) { + pr_err("failed to map registers for clocksource\n"); + goto unmap_ch; } - irq = platform_get_irq(p->pdev, 0); + + irq = irq_of_parse_and_map(node, 0); if (irq < 0) { - dev_err(>pdev->dev, "failed to get irq\n"); - return irq; + pr_err("failed 
to get irq for clockevent\n"); + goto unmap_comm; } - p->clk = clk_get(>pdev->dev, "fck"); - if (IS_ERR(p->clk)) { - dev_err(>pdev->dev, "can't get clk\n"); - return PTR_ERR(p->clk); + p = kzalloc(sizeof(*p), GFP_KERNEL); + if (!p) { + pr_err("failed to allocate memory for clocksource\n"); + goto unmap_comm; } - of_property_read_u32(p->pdev->dev.of_node, "renesas,channel", ); - p->pdev = pdev; - p->mapbase = res[REG_CH]->start; - p->mapcommon = res[REG_COMM]->start; + of_property_read_u32(node, "renesas,channel", ); + + p->mapbase = (unsigned long)base[REG_CH]; + p->mapcommon = (unsigned long)base[REG_COMM]; p->enb = 1 << ch; p->imfa = 1 << ch; p->imiea = 1 << (4 + ch); - p->cs.name = pdev->name; + + p->irqaction.name = node->name; + p->irqaction.handler = timer16_interrupt; + p->irqaction.dev_id = p; + p->irqaction.flags = IRQF_TIMER; + + p->cs.name = node->name; p->cs.rating = 200; p->cs.read = timer16_clocksource_read; p->cs.enable = timer16_enable; @@ -191,64 +207,23 @@ static int timer16_setup(struct timer16_priv *p, struct platform_device *pdev) p->cs.mask = CLOCKSOURCE_MASK(sizeof(unsigned long) * 8); p->cs.flags = CLOCK_SOURCE_IS_CONTINUOUS; - ret = request_irq(irq, timer16_interrupt, - IRQF_TIMER, pdev->name, p); + ret = setup_irq(irq, >irqaction); if (ret < 0) { - dev_err(>pdev->dev, "failed to request irq %d\n", irq); - return ret; + pr_err("failed to request irq %d of clocksource\n", irq); + goto free_priv; } clocksource_register_hz(>cs, clk_get_rate(p->clk) / 8); - - return 0; -} - -static int timer16_probe(struct platform_device *pdev) -{ - struct timer16_priv *p = platform_get_drvdata(pdev); - - if (p) { - dev_info(>dev, "kept as earlytimer\n"); - return 0; - } - - p = devm_kzalloc(>dev, sizeof(*p), GFP_KERNEL); - if (!p) - return -ENOMEM; - - return timer16_setup(p, pdev); -} - -static int timer16_remove(struct platform_device *pdev) -{ - return -EBUSY; -} - -static const struct of_device_id timer16_of_table[] = { - { .compatible =
Re: [PATCH v5 0/4] Exynos SROMc configuration and Ethernet support for SMDK5410
On 03.11.2015 18:16, Pavel Fedin wrote:
> This patch extends Exynos SROM controller driver with ability to configure
> controller outputs and enables SMSC9115 Ethernet chip on SMDK5410 board,
> which is connected via SROMc bank #3.
>
> With this patchset, support for the whole existing SMDK range can be added.
> Actually, only bank number is different.
>
> This patchset also depends on Exynos 5410 pinctrl support, introduced by
> patches 0003 and 0004 from this set:
> [PATCH v4 0/5] ARM: EXYNOS: ODROID-XU DT and LEDs
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/330862.html

Only the first patch made it into the kernel. The rest (2-5) received feedback and I am waiting for them to be fixed. Andreas has apparently lost interest...

Best regards,
Krzysztof
[PATCH] mm: change tlb_finish_mmu() to be more simple
This patch removes the unneeded *next temporary variable to make the function simpler to read.

Signed-off-by: yalin wang
---
 mm/memory.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7f3b9f2..f0040ed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -270,17 +270,16 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
  */
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
-	struct mmu_gather_batch *batch, *next;
+	struct mmu_gather_batch *batch;

 	tlb_flush_mmu(tlb);

 	/* keep the page table cache within bounds */
 	check_pgt_cache();

-	for (batch = tlb->local.next; batch; batch = next) {
-		next = batch->next;
+	for (batch = tlb->local.next; batch; batch = batch->next)
 		free_pages((unsigned long)batch, 0);
-	}
+
 	tlb->local.next = NULL;
 }
-- 
1.9.1
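The loop touched by this patch frees each node while walking the list. As a point of reference, here is a minimal userspace sketch of that kind of traversal; struct batch, make_batches and free_batches are hypothetical names using malloc/free, not the mm/memory.c code. In this sketch the successor is read before the node is freed, which is what the removed lines did, since reading b->next after free(b) would touch released memory.

```c
#include <assert.h>
#include <stdlib.h>

/* Stand-in for a singly linked batch node. */
struct batch {
    struct batch *next;
};

/* Frees every node reachable from head; returns the number freed. */
static size_t free_batches(struct batch *head)
{
    size_t n = 0;
    struct batch *b, *next;

    for (b = head; b; b = next) {
        next = b->next;   /* read the link while b is still valid */
        free(b);
        n++;
    }
    return n;
}

/* Builds a list of count nodes; returns NULL if count is 0 or allocation fails. */
static struct batch *make_batches(size_t count)
{
    struct batch *head = NULL;

    while (count--) {
        struct batch *b = malloc(sizeof(*b));

        if (!b) {
            free_batches(head);
            return NULL;
        }
        b->next = head;   /* prepend the new node */
        head = b;
    }
    return head;
}
```

Calling free_batches(make_batches(3)) releases three nodes; an empty list is handled by the loop condition alone.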
Re: [PATCH] perf sched latency: Fix removed thread issue
On Tue, Nov 03, 2015 at 03:33:10PM -0300, Arnaldo Carvalho de Melo wrote: > Em Tue, Nov 03, 2015 at 08:41:48AM +0100, Jiri Olsa escreveu: > > On Mon, Nov 02, 2015 at 07:53:53PM -0300, Arnaldo Carvalho de Melo wrote: > > > Em Mon, Nov 02, 2015 at 12:10:25PM +0100, Jiri Olsa escreveu: > > > > If machine's thread gets excited (EXIT event is received), > > > > we set thread->dead = true and it is later on removed from > > > > machine's tree if the pid is reused on new thread. > > > > > > > The latency subcommand holds tree of working atoms sorted > > > > by thread's pid/tid. If there's new thread with same pid > > > > > > Humm, wher is the latency subcommand handling the EXIT event? > > > > > > I see: > > > > > >perf_sched__lat > > > perf_sched__read_events > > >session = perf_session__new(, false, >tool); > > >perf_session__process_events(session) > > > > > > And sched->tool->exit() is not set, which will make > > > perf_session__process_events(), when calling perf_tool__fill_defaults() > > > set it to process_event_stub() which will do nothing for > > > PERF_RECORD_EXIT events, no? > > > > yep, latency command does not handle EXIT event, but the > > thread is removed via FORK event.. the first changelog > > So its not related to processing an EXIT event as described in the > changelog, ok. And I don't see where is that thread->dead is set, i.e. > the sequence is: > > machine__process_fork_event() > machine__remove_thread() > > The only place where thread->dead is set to true is in thread__exited() > and that is only called by machine__process_exit_event(), which is never > called by 'perf sched'. > > It is not "later removed from machine's tree", it is removed straight > away, in __machine__remove_thread(). 
it is removed later via the FORK event, as stated in the changelog and in
the updated changelog I sent in my previous email:

SNIP

> > could you please change it to:
> >
> > ---
> > If machine's thread gets exited (EXIT event is received),
> > we set thread->dead = true and it is later on removed from
> > machine's tree if the pid is reused on new thread.
> >
> > We don't handle EXIT command in 'perf sched latency',
> > however the old thread is removed anyway when FORK
> > event is received for new thread with same pid/tid.
> > ---

thanks,
jirka
[PATCH] rtc: Group Kconfig entries by vendor
The RTC entries are mostly grouped by vendor. Move the few outliers in place. Also, change the one occurrence of 'nxp' to 'NXP' to make all NXP entries consistent. Signed-off-by: Soren Brinkmann --- drivers/rtc/Kconfig | 90 ++--- 1 file changed, 45 insertions(+), 45 deletions(-) diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig index 2a524244afec..ea7f43f12f69 100644 --- a/drivers/rtc/Kconfig +++ b/drivers/rtc/Kconfig @@ -325,16 +325,6 @@ config RTC_DRV_MAX77686 This driver can also be built as a module. If so, the module will be called rtc-max77686. -config RTC_DRV_RK808 - tristate "Rockchip RK808 RTC" - depends on MFD_RK808 - help - If you say yes here you will get support for the - RTC of RK808 PMIC. - - This driver can also be built as a module. If so, the module - will be called rk808-rtc. - config RTC_DRV_MAX77802 tristate "Maxim 77802 RTC" depends on MFD_MAX77686 @@ -345,6 +335,16 @@ config RTC_DRV_MAX77802 This driver can also be built as a module. If so, the module will be called rtc-max77802. +config RTC_DRV_RK808 + tristate "Rockchip RK808 RTC" + depends on MFD_RK808 + help + If you say yes here you will get support for the + RTC of RK808 PMIC. + + This driver can also be built as a module. If so, the module + will be called rk808-rtc. + config RTC_DRV_RS5C372 tristate "Ricoh R2025S/D, RS5C372A/B, RV5C386, RV5C387A" help @@ -391,16 +391,6 @@ config RTC_DRV_X1205 This driver can also be built as a module. If so, the module will be called rtc-x1205. -config RTC_DRV_PALMAS - tristate "TI Palmas RTC driver" - depends on MFD_PALMAS - help - If you say yes here you get support for the RTC of TI PALMA series PMIC - chips. - - This driver can also be built as a module. If so, the module - will be called rtc-palma. - config RTC_DRV_PCF2127 tristate "NXP PCF2127" help @@ -419,6 +409,14 @@ config RTC_DRV_PCF8523 This driver can also be built as a module. If so, the module will be called rtc-pcf8523. 
+config RTC_DRV_PCF85063 + tristate "NXP PCF85063" + help + If you say yes here you get support for the PCF85063 RTC chip + + This driver can also be built as a module. If so, the module + will be called rtc-pcf85063. + config RTC_DRV_PCF8563 tristate "Philips PCF8563/Epson RTC8564" help @@ -429,14 +427,6 @@ config RTC_DRV_PCF8563 This driver can also be built as a module. If so, the module will be called rtc-pcf8563. -config RTC_DRV_PCF85063 - tristate "nxp PCF85063" - help - If you say yes here you get support for the PCF85063 RTC chip - - This driver can also be built as a module. If so, the module - will be called rtc-pcf85063. - config RTC_DRV_PCF8583 tristate "Philips PCF8583" help @@ -501,6 +491,16 @@ config RTC_DRV_TWL4030 This driver can also be built as a module. If so, the module will be called rtc-twl. +config RTC_DRV_PALMAS + tristate "TI Palmas RTC driver" + depends on MFD_PALMAS + help + If you say yes here you get support for the RTC of TI PALMA series PMIC + chips. + + This driver can also be built as a module. If so, the module + will be called rtc-palma. + config RTC_DRV_TPS6586X tristate "TI TPS6586X RTC driver" depends on MFD_TPS6586X @@ -681,15 +681,6 @@ config RTC_DRV_DS1390 This driver can also be built as a module. If so, the module will be called rtc-ds1390. -config RTC_DRV_MAX6902 - tristate "Maxim MAX6902" - help - If you say yes here you will get support for the - Maxim MAX6902 SPI RTC chip. - - This driver can also be built as a module. If so, the module - will be called rtc-max6902. - config RTC_DRV_R9701 tristate "Epson RTC-9701JE" help @@ -699,6 +690,14 @@ config RTC_DRV_R9701 This driver can also be built as a module. If so, the module will be called rtc-r9701. +config RTC_DRV_RX4581 + tristate "Epson RX-4581" + help + If you say yes here you will get support for the Epson RX-4581. + + This driver can also be built as a module. If so the module + will be called rtc-rx4581. 
+ config RTC_DRV_RS5C348 tristate "Ricoh RS5C348A/B" help @@ -708,6 +707,15 @@ config RTC_DRV_RS5C348 This driver can also be built as a module. If so, the module will be called rtc-rs5c348. +config RTC_DRV_MAX6902 + tristate "Maxim MAX6902" + help + If you say yes here you will get support for the + Maxim MAX6902 SPI RTC chip. + + This
Re: [PATCH] Documentation: dt: Add bindings for Secure-only devices
On 10/30/2015 1:07 PM, Peter Maydell wrote:
> On 30 October 2015 at 18:28, Rob Herring wrote:
>> On Thu, Oct 29, 2015 at 9:01 AM, Peter Maydell wrote:
>>> +Valid Secure world properties:
>>> +
>>> +- secure-status : specifies whether the device is present and usable
>>> +  in the secure world. The combination of this with "status" allows
>>> +  the various possible combinations of device visibility to be
>>> +  specified:
>>> +  status = "okay"; // visible in S and NS
>>
>> I assume neither property present or both okay also mean the same.
>>
>> status = "okay"; secure-status = "okay";
>>
>> We should be explicit.
>
> Yes; status defaults to "okay" (presumably this is listed in
> the overall DT binding spec somewhere),

Valid values for the "status" property do not seem to be well documented
for Linux. The code, __of_device_is_available(), implements:

   value of "okay"                   returns true
   value of "ok"                     returns true
   status property does not exist    returns true (thus defaults to true)
   anything else                     returns false

Eight individual bindings define the status property (I'll put them on
my todo list to remove since they should be inherited).

The status property should be in a top level binding, which should be
created as part of the bindings documentation cleanup. I'll watch for
it to get created or do it myself.

(Lots of bindings show examples of using the status property in their
examples -- that is fine.)

ePAPR v1.1, section "2.3.4 status" does not list "ok", but does list:

   "okay"
   "disabled"
   "fail"
   "fail-sss"

< snip >

-Frank
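The decision table described for __of_device_is_available() can be written out as a small userspace model. This is only a sketch of the behaviour listed in the mail, not the kernel helper itself: the function name is hypothetical, and a NULL argument stands for "the status property does not exist".

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Userspace model of the table above (hypothetical name, not kernel code).
 * status: value of the node's "status" property, or NULL when absent.
 */
bool status_means_available(const char *status)
{
	if (status == NULL)
		return true;	/* property missing: defaults to available */

	/* only these two spellings count as available */
	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}
```

Under this model the ePAPR values "disabled", "fail" and "fail-sss" all fall into the "anything else" row and yield false.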
Re: [PATCH] arm64: bpf: fix div-by-zero case
On Tue, Nov 3, 2015 at 10:56 PM, Zi Shen Lim wrote:
> 	case BPF_ALU | BPF_DIV | BPF_X:
> 	case BPF_ALU64 | BPF_DIV | BPF_X:
> +	{
> +		const u8 r0 = bpf2a64[BPF_REG_0];
> +
> +		/* if (src == 0) return 0 */
> +		jmp_offset = 3; /* skip ahead to else path */
> +		check_imm19(jmp_offset);
> +		emit(A64_CBNZ(is64, src, jmp_offset), ctx);
> +		emit(A64_MOVZ(1, r0, 0, 0), ctx);
> +		jmp_offset = epilogue_offset(ctx);
> +		check_imm26(jmp_offset);
> +		emit(A64_B(jmp_offset), ctx);
> +		/* else */
> 		emit(A64_UDIV(is64, dst, dst, src), ctx);
> 		break;
> +	}
> 	case BPF_ALU | BPF_MOD | BPF_X:
> 	case BPF_ALU64 | BPF_MOD | BPF_X:

BPF_MOD might need the same fix.
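The behaviour the JIT has to reproduce can be modelled in plain C. This is a sketch of the intended semantics only, not of the emitted A64 instructions: the enum and function names are hypothetical, and it assumes, as the reply suggests, that BPF_MOD should follow the same "zero divisor ends the program with return value 0" rule as BPF_DIV.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical selector for the two operations discussed above. */
enum alu_op { ALU_DIV, ALU_MOD };

/*
 * Apply dst = dst / src or dst = dst % src.
 * Returns false when src == 0; in that case *ret is set to 0 (the value the
 * program should return) and *dst is left untouched.
 */
bool alu_div_mod(enum alu_op op, uint64_t *dst, uint64_t src, uint64_t *ret)
{
	if (src == 0) {
		*ret = 0;
		return false;
	}
	*dst = (op == ALU_DIV) ? *dst / src : *dst % src;
	return true;
}
```

In the quoted hunk the same split is expressed with CBNZ skipping over the "move 0 to r0, branch to epilogue" pair when src is nonzero.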
Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids
On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
> Using "write" does kill the set-gid bit. I haven't looked at
> why.
> Al or anyone else, is there a meaningful distinction here?

I remember this one, I got caught once while trying to put a shell into
a suid-writable file to get some privileges someone forgot to offer me :-)

It's done by should_remove_suid() which is called upon write() and truncate().

> Should the
> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
> holding the file open with either open or mmap, I get a Text-in-use
> error, so I would kind of expect the same behavior between either
> close() and munmap(). I wonder if this is a bug, and if so, then your
> link patch is indeed useful again. :)

I don't see how this could be done with mmap(). Maybe we have a way to know
when the first write is performed via this path, I have no idea.

Willy
Re: [PATCH] mm: Introduce kernelcore=reliable option
On 2015/10/31 4:42, Luck, Tony wrote:
>> If each memory controller has the same distance/latency, you (your firmware)
>> don't need to allocate reliable memory per each memory controller.
>> If distance is problem, another node should be allocated.
>>
>> ...is the behavior(splitting zone) really required ?
>
> It's useful from a memory bandwidth perspective to have allocations
> spread across both memory controllers. Keeping a whole bunch of Xeon
> cores fed needs all the bandwidth you can get.

Hmm. But the physical address layout is not tied to the dual memory
controllers. I think the firmware can make the reliable range contiguous...

-Kame
[PATCH] arm64: bpf: fix div-by-zero case
In the case of division by zero in a BPF program:

	A = A / X;  (X == 0)

the expected behavior is to terminate with return value 0.

This is confirmed by the test case introduced in commit 86bf1721b226
("test_bpf: add tests checking that JIT/interpreter sets A and X to 0.").

Reported-by: Shi, Yang
CC: Xi Wang
CC: Alexei Starovoitov
CC: Catalin Marinas
CC: linux-arm-ker...@lists.infradead.org
CC: linux-kernel@vger.kernel.org
Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Cc: # 3.18+
Signed-off-by: Zi Shen Lim
---
 arch/arm64/net/bpf_jit.h      |  3 ++-
 arch/arm64/net/bpf_jit_comp.c | 37 +
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index 98a26ce..aee5637 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -1,7 +1,7 @@
 /*
  * BPF JIT compiler for ARM64
  *
- * Copyright (C) 2014 Zi Shen Lim
+ * Copyright (C) 2014-2015 Zi Shen Lim
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -35,6 +35,7 @@
 	aarch64_insn_gen_comp_branch_imm(0, offset, Rt, A64_VARIANT(sf), \
 		AARCH64_INSN_BRANCH_COMP_##type)
 #define A64_CBZ(sf, Rt, imm19) A64_COMP_BRANCH(sf, Rt, (imm19) << 2, ZERO)
+#define A64_CBNZ(sf, Rt, imm19) A64_COMP_BRANCH(sf, Rt, (imm19) << 2, NONZERO)
 
 /* Conditional branch (immediate) */
 #define A64_COND_BRANCH(cond, offset) \
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index c047598..9ae6f23 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1,7 +1,7 @@
 /*
  * BPF JIT compiler for ARM64
  *
- * Copyright (C) 2014 Zi Shen Lim
+ * Copyright (C) 2014-2015 Zi Shen Lim
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -225,6 +225,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 	u8 jmp_cond;
 	s32 jmp_offset;
 
+#define check_imm(bits, imm) do { \
+	if ((((imm) > 0) && ((imm) >> (bits))) || \
+	    (((imm) < 0) && (~(imm) >> (bits)))) { \
+		pr_info("[%2d] imm=%d(0x%x) out of range\n",\
+			i, imm, imm); \
+		return -EINVAL; \
+	} \
+} while (0)
+#define check_imm19(imm) check_imm(19, imm)
+#define check_imm26(imm) check_imm(26, imm)
+
 	switch (code) {
 	/* dst = src */
 	case BPF_ALU | BPF_MOV | BPF_X:
@@ -258,8 +269,21 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
 		break;
 	case BPF_ALU | BPF_DIV | BPF_X:
 	case BPF_ALU64 | BPF_DIV | BPF_X:
+	{
+		const u8 r0 = bpf2a64[BPF_REG_0];
+
+		/* if (src == 0) return 0 */
+		jmp_offset = 3; /* skip ahead to else path */
+		check_imm19(jmp_offset);
+		emit(A64_CBNZ(is64, src, jmp_offset), ctx);
+		emit(A64_MOVZ(1, r0, 0, 0), ctx);
+		jmp_offset = epilogue_offset(ctx);
+		check_imm26(jmp_offset);
+		emit(A64_B(jmp_offset), ctx);
+		/* else */
 		emit(A64_UDIV(is64, dst, dst, src), ctx);
 		break;
+	}
 	case BPF_ALU | BPF_MOD | BPF_X:
 	case BPF_ALU64 | BPF_MOD | BPF_X:
 		ctx->tmp_used = 1;
@@ -393,17 +417,6 @@ emit_bswap_uxt:
 		emit(A64_ASR(is64, dst, dst, imm), ctx);
 		break;
 
-#define check_imm(bits, imm) do { \
-	if ((((imm) > 0) && ((imm) >> (bits))) || \
-	    (((imm) < 0) && (~(imm) >> (bits)))) { \
-		pr_info("[%2d] imm=%d(0x%x) out of range\n",\
-			i, imm, imm); \
-		return -EINVAL; \
-	} \
-} while (0)
-#define check_imm19(imm) check_imm(19, imm)
-#define check_imm26(imm) check_imm(26, imm)
-
 	/* JUMP off */
 	case BPF_JMP | BPF_JA:
 		jmp_offset = bpf2a64_offset(i + off, i, ctx);
--
1.9.1
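The range test that the check_imm macro performs before a branch offset is encoded can be tried out in isolation. The function below is a userspace sketch with a hypothetical name; it assumes a 32-bit signed immediate and a bit count below 32, and it returns a flag where the macro would print a message and return -EINVAL.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * True when imm would be rejected for the given width, following the two
 * conditions of the macro: a positive value with bits set at or above
 * position `bits`, or a negative value whose complement has such bits set.
 */
bool imm_out_of_range(int32_t imm, unsigned int bits)
{
	if (imm > 0 && (imm >> bits))
		return true;
	if (imm < 0 && (~imm >> bits))
		return true;
	return false;
}
```

Read this way, the test accepts values from -(1 << bits) up to (1 << bits) - 1, so the offset of 3 used in the DIV hunk passes the 19-bit check trivially.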
Re: [PATCH 0/3] cpuidle: small improvements & fixes for menu governor
On Wed, 2015-11-04 at 00:03 +0100, Rafael J. Wysocki wrote: > On 11/3/2015 11:35 PM, Rik van Riel wrote: > > On 11/03/2015 05:05 PM, Rafael J. Wysocki wrote: > > > On 10/28/2015 11:46 PM, r...@redhat.com wrote: > > > > While working on a paravirt cpuidle driver for KVM guests, I > > > > noticed a number of small logic errors in the menu governor > > > > code. > > > > > > > > These patches should get rid of some artifacts that can break > > > > the logic in the menu governor under certain corner cases, and > > > > make idle state selection work better on CPUs with long C1 exit > > > > latencies. > > > > > > > > I have not seen any adverse effects with them in my (quick) > > > > tests. As expected, they do not seem to do much on systems with > > > > many power states and very low C1 exit latencies and target > > > > residencies. > > > > > > > Sorry for the trouble, but can you please resend the series with > > > CCs to > > > linux...@vger.kernel.org? That will make it way easier to handle > > > for me. > > Not a problem. Done. > > > > What change do I need to send in to ensure that the > > linux-pm mailing list shows up in get_maintainer.pl > > output? > > > > $ ./scripts/get_maintainer.pl -f drivers/cpuidle/governors/menu.c > > "Rafael J. Wysocki" > > (commit_signer:4/7=57%,authored:1/7=14%,added_lines:2/19=11%,remove > > d_lines:2/28=7%) > > Daniel Lezcano > > (commit_signer:3/7=43%,authored:1/7=14%,added_lines:1/19=5%) > > Rik van Riel > > (commit_signer:3/7=43%,authored:3/7=43%,added_lines:5/19=26%,remove > > d_lines:3/28=11%) > > Len Brown > > (commit_signer:1/7=14%,authored:1/7=14%,added_lines:10/19=53%,remov > > ed_lines:15/28=54%) > > "Peter Zijlstra (Intel)" > > (commit_signer:1/7=14%) > > Javi Merino > > (authored:1/7=14%,added_lines:1/19=5%,removed_lines:7/28=25%) > > linux-kernel@vger.kernel.org (open list) > > > > I'm not sure why it doesn't show up in there. > If you look at MAINTAINERS under CPUIDLE DRIVERS, linux-pm is > actually listed there. 
Because the pattern is just the files in the top level directory of
drivers/cpuidle and not any files in any directory below

From the MAINTAINERS pattern descriptions:

	F: Files and directories with wildcard patterns.
	   A trailing slash includes all files and subdirectory files.
	   F:	drivers/net/	all files in and below drivers/net
	   F:	drivers/net/*	all files in drivers/net, but not below

> Maybe the answer is that get_maintainer.pl needs to be fixed ...

Nope, the MAINTAINERS - CPUIDLE DRIVERS entry does though.
---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 77ed3a0..8bd5c7e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3004,7 +3004,7 @@ M:	Daniel Lezcano
 L:	linux...@vger.kernel.org
 S:	Maintained
 T:	git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
-F:	drivers/cpuidle/*
+F:	drivers/cpuidle/
 F:	include/linux/cpuidle.h
 
 CPUID/MSR DRIVER
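The difference between the two pattern forms quoted from MAINTAINERS can be sketched with a small matcher. This is a simplified illustration, not how get_maintainer.pl is implemented: the function name is hypothetical, and only the trailing-slash form, the trailing "/*" form and literal paths are handled.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

// Simplified matcher for two forms of MAINTAINERS "F:" patterns:
//   a pattern ending in "/"   matches everything in and below the directory
//   a pattern ending in "/*"  matches entries directly in it, nothing below
// Any other pattern is compared literally; other wildcards are not handled.
bool f_pattern_matches(const char *pattern, const char *path)
{
	size_t len = strlen(pattern);

	if (len >= 2 && strcmp(pattern + len - 2, "/*") == 0) {
		size_t dir_len = len - 1;	// keep the trailing '/'

		if (strncmp(pattern, path, dir_len) != 0)
			return false;
		// the remainder must be a single, non-empty path component
		return path[dir_len] != '\0' &&
		       strchr(path + dir_len, '/') == NULL;
	}

	if (len >= 1 && pattern[len - 1] == '/')
		return strncmp(pattern, path, len) == 0;

	return strcmp(pattern, path) == 0;
}
```

With the old entry, drivers/cpuidle/governors/menu.c does not match; with the trailing-slash form from the patch it does, which is why linux-pm was missing from the get_maintainer.pl output.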
RE: [PATCH v7] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller
> > +static struct msi_domain_info nwl_msi_domain_info = { > > + .flags = (MSI_FLAG_USE_DEF_DOM_OPS | > MSI_FLAG_USE_DEF_CHIP_OPS | > > + MSI_FLAG_MULTI_PCI_MSI), > > If you're supporting multi-MSI, how do you ensure that all hwirqs are > contiguous as required by the spec? Clearly, your allocator doesn't provide > this guarantee. > Oh ok, then how can we know if one EP is requesting for multiple irq's, because in pci_enable_msi_range, it does do while loop according to msi_capability_init return value which in turn requests for multiple irq's. Is there any way to find out when single EP requesting for multiple MSI? > Also, you still lack support for MSI-X (which would come for free...). We don't support MSI-X in root port mode. > > > + .chip = _msi_irq_chip, > > +}; > > +#endif > > + > > + irq_domain_remove(pcie->legacy_irq_domain); > > + > > +#ifdef CONFIG_PCI_MSI > > + irq_set_chained_handler_and_data(msi->irq_msi0, NULL, NULL); > > + irq_set_chained_handler_and_data(msi->irq_msi1, NULL, NULL); > > + > > + irq_domain_remove(msi->msi_domain); > > + irq_domain_remove(msi->dev_domain); > > +#endif > > + > > +} > > Remove these #ifdefs. You can test the validity of these fields before calling > the various functions. You can also move this to a separate function. Without #ifdefs if we compile driver for legacy, MSI structures will not be available and we get compile time error. > > > + > > + /* setup AFI/FPCI range */ > > + msi->pages = __get_free_pages(GFP_KERNEL, 0); > > + base = virt_to_phys((void *)msi->pages); > > + nwl_bridge_writel(pcie, lower_32_bits(base), I_MSII_BASE_LO); > > + nwl_bridge_writel(pcie, upper_32_bits(base), I_MSII_BASE_HI); > > I just read this, and I'm puzzled. Actually, puzzled is an understatement. Why > on Earth do you need to give RAM to your MSI HW? > This should be a device, not RAM. By the look of it, this could be absolutely > anything. Are you sure you have to supply RAM here? 
> This is required in our hardware, so that bridge identifies incoming MWr as MSI. > > + > > + /* Disable high range msi interrupts */ > > + nwl_bridge_writel(pcie, (u32)~MSGF_MSI_SR_HI_MASK, > > +MSGF_MSI_MASK_HI); > > + > > + Bharat -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
linux-next: Tree for Nov 4
Hi all, Please do *not* add any material intended for v4.5 to your linux-next included branches until after v4.4-rc1 has been released. Changes since 20151103: The battery tree still had its build failure so I used the version from next-20150925. The mailbox tree still had its build failure so I used the version from next-20151022. The akpm tree gained a conflict against the drm tree and lost a patch that turned up elsewhere. Non-merge commits (relative to Linus' tree): 10589 8442 files changed, 409626 insertions(+), 188833 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc and an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), it is also built with powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig (this fails its final link) and pseries_le_defconfig and i386, sparc, sparc64 and arm defconfig. Below is a summary of the state of the merge. I am currently merging 229 trees (counting Linus' and 35 trees of patches pending for Linus' tree). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. 
And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwells...@canb.auug.org.au $ git checkout master $ git reset --hard stable Merging origin/master (316dde2fe95b Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm) Merging fixes/master (25cb62b76430 Linux 4.3-rc5) Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on module install) Merging arc-current/for-curr (049e6dde7e57 Linux 4.3-rc4) Merging arm-current/fixes (38850d786a79 ARM: 8449/1: fix bug in vdsomunge swab32 macro) Merging m68k-current/for-linus (95bc06ef049b m68k/defconfig: Update defconfigs for v4.3-rc1) Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors) Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5) Merging powerpc-fixes/fixes (977bf062bba3 powerpc/dma: dma_set_coherent_mask() should not be GPL only) Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2) Merging sparc/master (73958c651fbf sparc64: use ENTRY/ENDPROC in VISsave) Merging net/master (ebac62fe3d24 ipv6: fix tunnel error handling) Merging ipsec/master (a8a572a6b5f2 xfrm: dst_entries_init() per-net dst_ops) Merging ipvs/master (6ece90f9a13e netfilter: fix Kconfig dependencies for nf_dup_ipv{4,6}) Merging sound-current/for-linus (de1ab6af5c3d ALSA: hda - Fix lost 4k BDL boundary workaround) Merging pci-current/for-linus (1266963170f5 PCI: Prevent out of bounds access in numa_node override) Merging wireless-drivers/master (de28a05ee28e Merge tag 'iwlwifi-for-kalle-2015-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes) Merging driver-core.current/driver-core-linus (9ffecb102835 Linux 4.3-rc3) Merging tty.current/tty-linus (f235f664a8af fbcon: initialize blink interval before calling fb_set_par) Merging usb.current/usb-linus (32b88194f71d Linux 4.3-rc7) Merging usb-gadget-fixes/fixes (25cb62b76430 Linux 4.3-rc5) Merging usb-serial-fixes/usb-linus (32b88194f71d Linux 4.3-rc7) Merging 
usb-chipidea-fixes/ci-for-usb-stable (f256896afdb6 usb: chipidea: otg: gadget module load and unload support) Merging staging.current/staging-linus (32b88194f71d Linux 4.3-rc7) Merging char-misc.current/char-misc-linus (25cb62b76430 Linux 4.3-rc5) Merging input-current/for-linus (195562194aad Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits) Merging crypto-current/master (4afa5f961792 crypto: algif_hash - Only export and import on sockets with data) Merging ide/master (353b39d1b151 ide: pdc202xx_new: Replace timeval with ktime_t) Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test for PPC_PSERIES) Merging rr-fixes/fixes (275d7d44d802 module: Fix locking in symbol_put_addr()) Merging vfio-fixes/for-linus (4bc94d5dc95d vfio: Fix lockdep issue) Merging kselftest-fixes/fixes (ae785818
linux-next: manual merge of the akpm tree with the drm tree
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  drivers/gpu/drm/nouveau/nouveau_ttm.c

between commits:

  524883bb4846 ("drm/nouveau/ttm: convert to DMA API")
  b31cf78b9324 ("drm/nouveau/ttm: set the DMA mask for platform devices")

from the drm tree and patch:

  "nouveau: don't call pci_dma_supported"

from the akpm tree.

I fixed it up (so the patch now looks like below) and can carry the fix
as necessary (no action is required).

diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 3f713c1b5dc1..d2e7d209f651 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -353,8 +353,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
 
 	bits = nvxx_mmu(>device)->dma_bits;
 	if (nvxx_device(>device)->func->pci) {
-		if (drm->agp.bridge ||
-		     !dma_supported(dev->dev, DMA_BIT_MASK(bits)))
+		if (drm->agp.bridge)
 			bits = 32;
 	} else if (device->func->tegra) {
 		struct nvkm_device_tegra *tegra = device->func->tegra(device);
@@ -369,6 +368,10 @@ nouveau_ttm_init(struct nouveau_drm *drm)
 	}
 
 	ret = dma_set_mask(dev->dev, DMA_BIT_MASK(bits));
+	if (ret && bits != 32) {
+		bits = 32;
+		ret = dma_set_mask(dev->dev, DMA_BIT_MASK(bits));
+	}
 	if (ret)
 		return ret;
 
--
Cheers,
Stephen Rothwell s...@canb.auug.org.au
Re: [RFC PATCH 0/3] CFS idle injection
Hello Jacob,

On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> Hi Peter and all,
>
> A while ago, we had discussion about how powerclamp is broken in the
> sense of turning off idle ticks in the forced idle period.
> https://lkml.org/lkml/2014/12/18/369
>
> It was suggested to replace the current kthread play idle loop with a
> timer based runqueue throttling scheme. I finally got around to implement
> this and code is much simpler. I also have good test results in terms of
> efficiency, scalability, etc.
> http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> slide #18+ shows the data on client and server.
>
> I have two choices for this code:
> 1) be part of existing powerclamp driver but require exporting some
>    sched APIs.
> 2) be part of sched since the genernal rule applies when it comes down
>    to sycnhronized idle time for best power savings.
>
> The patches below are for #2. There is a known problem with LOW RES timer
> mode that I am working on. But I am hoping to get review earlier.
>

I also like #2, especially now that it is not limited to a specific platform.

One question though: could you still keep its cooling device support? In
some systems, it might make sense to enable / disable idle injections based
on temperature. Was there any particular reason you dropped the cooling
device support?

BR,

Eduardo Valentin

> We are entering a very power limited environment on client side, frequency
> scaling can only be efficient at certain range. e.g. on SKL, upto ~900MHz,
> anything below, it is increasingly more efficient to do C-states insertion
> if coordinated.
>
> Looking forward, there are use case beyond thermal/power capping. I think
> we can consolidate ballanced partial busy workload that are evenly
> distributed among CPUs.
>
> Please let me know what you think.
> > Thanks, > > > Jacob Pan (3): > ktime: add a roundup function > timer: relax tick stop in idle entry > sched: introduce synchronized idle injection > > include/linux/ktime.h| 10 ++ > include/linux/sched.h| 12 ++ > include/linux/sched/sysctl.h | 5 + > include/trace/events/sched.h | 23 +++ > init/Kconfig | 8 + > kernel/sched/fair.c | 345 > +++ > kernel/sched/sched.h | 3 + > kernel/sysctl.c | 20 +++ > kernel/time/tick-sched.c | 2 +- > 9 files changed, 427 insertions(+), 1 deletion(-) > > -- > 1.9.1 > > -- > To unsubscribe from this list: send the line "unsubscribe linux-kernel" in > the body of a message to majord...@vger.kernel.org > More majordomo info at http://vger.kernel.org/majordomo-info.html > Please read the FAQ at http://www.tux.org/lkml/ -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
On 04/11/15 12:53 AM, Daniel Micay wrote:
>> In the common case it will be passed many pages by the allocator. There
>> will still be a layer of purging logic on top of MADV_FREE but it can be
>> much thinner than the current workarounds for MADV_DONTNEED. So the
>> allocator would still be coalescing dirty ranges and only purging when
>> the ratio of dirty:clean pages rises above some threshold. It would be
>> able to weight the largest ranges for purging first rather than logic
>> based on stuff like aging as is used for MADV_DONTNEED.
>
> I would expect that jemalloc would just start putting the dirty ranges
> into the usual pair of red-black trees (with coalescing) and then doing
> purging starting from the largest spans to get back down below whatever
> dirty:clean ratio it's trying to keep. Right now, it has all lots of
> other logic to deal with this since each MADV_DONTNEED call results in
> lots of zeroing and then page faults.

Er, I mean dirty:active (i.e. ratio of unpurged, dirty pages to ones
that are handed out as allocations, which is kept at something like
1:8). A high constant cost in the madvise call but quick handling of
each page means that allocators need to be more aggressive with purging
more than they strictly need to in one go. For example, it might need to
purge 2M to meet the ratio but it could have a contiguous span of 32M of
dirty pages. If the cost per page is low enough, it could just do the
entire range.
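The policy described here, purging whole dirty spans from the largest down until the dirty:active ratio is back under its target, can be sketched in a few lines. This is an illustration of the idea only and not jemalloc's implementation: the function names are hypothetical, a sorted array stands in for the red-black trees, and sizes are plain page counts.

```c
#include <stddef.h>
#include <stdlib.h>

/* qsort() comparator that orders span sizes from largest to smallest. */
static int cmp_desc(const void *a, const void *b)
{
	size_t x = *(const size_t *)a, y = *(const size_t *)b;

	return (x < y) - (x > y);
}

/*
 * spans:  sizes (in pages) of coalesced dirty ranges; sorted in place
 * active: pages currently handed out as allocations
 * ratio:  keep dirty * ratio <= active (8 for a 1:8 dirty:active target)
 * Purges whole spans, largest first, and returns the pages purged.
 */
size_t purge_largest_first(size_t *spans, size_t nspans,
			   size_t active, size_t ratio)
{
	size_t dirty = 0, purged = 0, i;

	for (i = 0; i < nspans; i++)
		dirty += spans[i];

	qsort(spans, nspans, sizeof(*spans), cmp_desc);

	for (i = 0; i < nspans && dirty * ratio > active; i++) {
		dirty -= spans[i];
		purged += spans[i];
	}
	return purged;
}
```

Because whole spans are purged, a single large span can take the allocator well below the target in one call, which matches the "it could just do the entire range" remark when the per-page cost is low.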
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim wrote: > Hi Brendan, > > On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote: >> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo >> wrote: >> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu: >> >> Hello, >> >> >> >> This is what Brendan requested on the perf-users mailing list [1] to >> >> support FlameGraphs [2] more efficiently. This patchset adds a few >> >> more callchain options to adjust the output for it. >> >> >> >> * changes in v4) >> >> - add missing doc update >> >> - cleanup/fix callchain value print code >> >> - add Acked-by from Brendan and Jiri >> > >> > Do those Acked-by stand? Things changed, the values moved from the end >> > of the line to the start, etc. >> > >> [...] >> >> I'd Ack this change as it's a useful addition. It doesn't quite >> address the folded-only output, but it's a step in that direction. I >> think having the value at the start of a line only makes sense for the >> perf report output containing the hist summary lines, for consistency. > > Right, thanks! 
> > >> >> Here's how I'd shuffle the output of this patch (ignore word wrap >> issues with this email): >> >> # ./perf report --stdio -g folded,count,caller -F pid | \ >> awk '/^ / { n = $1 } >> /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }' >> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op >> 809 >> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op >> 135 >> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version >> 63 >> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf >> 54 >> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms >> 3 >> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3 >> >> So the output is folded stacks, prefixed by comm-PID. Shuffling the >> summarized output is a lot better than doing a "perf script" dump and >> re-processing call chains. (Note that since I'm using -F, I didn't >> need --no-children; > > Nope. The '-F pid' doesn't affect --children. It doesn't show the > children overhead column but we still have hist entries for > (synthesized) children.. > > $ perf report --no-children | wc -l > 998 > > $ perf report --no-children -F pid,dso,sym | wc -l > 998 > > $ perf report --children | wc -l > 3229 > > $ perf report --children -F pid,dso,sym | wc -l > 3202 > > So I think you still need to use --no-children (or set report.children > config variable to false) for your script. Ok, good to know, thanks. > > >> and with "-g count", I didn't need --show-nr-samples.) > > Yes, I used -n/--show-nr-samples just to check the number is correct. 
> > >> >> I notice the fields (-F) option already has this precedent: >> >> - "comm": prints PID:comm >> - "pid": prints PID > > It's opposite: "comm" prints comm, "pid" prints PID:comm. :) Ah, right, sorry, I'd typed those the wrong way around. :) > > >> >> If these were added to -g, along with a no-hists, then the two types >> of folded-only output could be generated using: >> >> perf report --stdio -g folded,count,comm,no-hists,caller >> perf report --stdio -g folded,count,pid,no-hists,caller > > As I said, using fields like comm, pid requires to have same keys in > --sort option. So it's basically unreliable to use those specific > field names in the -g option IMHO. I suggested to use 'info' (yes, it > needs better name) to print all sort keys. > > >> >> ... although "no-hists" doesn't hit me as intuitive. How about "-F >> none" to specify zero columns? ie: >> >> perf report --stdio -g folded,count,comm,caller -F none >> perf report --stdio -g folded,count,pid,caller -F none > > Ah, makes sense. So it'd look like > > $ perf report --stdio -g folded,count,info -F none -s comm > $ perf report --stdio -g folded,count,info -F none -s pid > > The output would be > > 809 swapper-0 > cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op > Thanks, looks almost right: a couple of minor changes: 1. If perf already has the precedent of "PID:comm", instead of my "comm-PID", then maybe it should use "PID:comm" for perf consistency. Doesn't make much difference to me. 2. The second space, delimiting "PID:comm" (or comm) and the stack... I'm nervous about using space as a delimiter any more than once, since it can also appear in comm (eg, "java main") and frames (eg, "JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*, Thread*)" -- that's direct from "perf script"!). I'd consider
Re: [PATCH v2 1/7] staging: lustre: remove white space in libcfs_hash.h
On Mon, Nov 02, 2015 at 12:22:07PM -0500, James Simmons wrote:
> Cleanup all the unneeded white space in libcfs_hash.h.
>
> Signed-off-by: James Simmons
> ---
>  .../lustre/include/linux/libcfs/libcfs_hash.h | 135 ++--
>  1 files changed, 70 insertions(+), 65 deletions(-)

Doesn't apply, did I already queue up this series?

confused,

greg k-h
Re: [PATCH] Staging: comedi: ssc_dnp: fixed a comment style
On Fri, Oct 30, 2015 at 06:46:20PM +0100, Philippe Loctaux wrote:
> Fixed a comment issue.
>
> Signed-off-by: Philippe Loctaux
> ---
>  drivers/staging/comedi/drivers/ssv_dnp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/staging/comedi/drivers/ssv_dnp.c b/drivers/staging/comedi/drivers/ssv_dnp.c
> index acc7f34..c28b6cb 100644
> --- a/drivers/staging/comedi/drivers/ssv_dnp.c
> +++ b/drivers/staging/comedi/drivers/ssv_dnp.c
> @@ -150,7 +150,8 @@ static int dnp_attach(struct comedi_device *dev, struct comedi_devconfig *it)
>
>  	/* We use the I/O ports 0x22,0x23 and 0xa3-0xa9, which are always
>  	 * allocated for the primary 8259, so we don't need to allocate them
> -	 * ourselves. */
> +	 * ourselves.
> +	 */
>
>  	/* configure all ports as input (default)*/
>  	outb(PAMR, CSCIR);

Does not apply to my tree at all :(
Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
> In the common case it will be passed many pages by the allocator. There
> will still be a layer of purging logic on top of MADV_FREE but it can be
> much thinner than the current workarounds for MADV_DONTNEED. So the
> allocator would still be coalescing dirty ranges and only purging when
> the ratio of dirty:clean pages rises above some threshold. It would be
> able to weight the largest ranges for purging first rather than logic
> based on stuff like aging as is used for MADV_DONTNEED.

I would expect that jemalloc would just start putting the dirty ranges
into the usual pair of red-black trees (with coalescing) and then doing
purging starting from the largest spans to get back down below whatever
dirty:clean ratio it's trying to keep. Right now, it has lots of
other logic to deal with this since each MADV_DONTNEED call results in
lots of zeroing and then page faults.
[lkp] [f5f3497cad] 8678414074: Fixed kernel test crashed
FYI, we noticed the below changes on

https://github.com/0day-ci/linux Matt-Fleming/x86-setup-Fix-recent-boot-crash-on-32-bit-SMP-machines/20151103-221445
commit 867841407423402e594463cbdddc800f1c5f7b6f ("f5f3497cad: BUG: kernel boot crashed")

+------------------------------------------------+------------+------------+
|                                                | 90939cc535 | 8678414074 |
+------------------------------------------------+------------+------------+
| boot_successes                                 | 0          | 9          |
| boot_failures                                  | 14         | 3          |
| BUG:kernel_boot_crashed                        | 12         |            |
| IP-Config:Auto-configuration_of_network_failed | 2          | 2          |
| BUG:kernel_test_crashed                        | 0          | 1          |
+------------------------------------------------+------------+------------+

The patch fixes the kernel test crashed issue before.

Tested-by: Huang, Ying

Thanks,
Ying Huang

# # Automatically generated file; DO NOT EDIT. # Linux/i386 4.3.0-rc7 Kernel Configuration # # CONFIG_64BIT is not set CONFIG_X86_32=y CONFIG_X86=y CONFIG_INSTRUCTION_DECODER=y CONFIG_PERF_EVENTS_INTEL_UNCORE=y CONFIG_OUTPUT_FORMAT="elf32-i386" CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig" CONFIG_LOCKDEP_SUPPORT=y CONFIG_STACKTRACE_SUPPORT=y CONFIG_HAVE_LATENCYTOP_SUPPORT=y CONFIG_MMU=y CONFIG_NEED_SG_DMA_LENGTH=y CONFIG_GENERIC_ISA_DMA=y CONFIG_GENERIC_BUG=y CONFIG_GENERIC_HWEIGHT=y CONFIG_ARCH_MAY_HAVE_PC_FDC=y CONFIG_RWSEM_XCHGADD_ALGORITHM=y CONFIG_GENERIC_CALIBRATE_DELAY=y CONFIG_ARCH_HAS_CPU_RELAX=y CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y CONFIG_HAVE_SETUP_PER_CPU_AREA=y CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y CONFIG_ARCH_HIBERNATION_POSSIBLE=y CONFIG_ARCH_SUSPEND_POSSIBLE=y CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y CONFIG_ARCH_WANT_GENERAL_HUGETLB=y CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y CONFIG_X86_32_SMP=y CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx" CONFIG_ARCH_SUPPORTS_UPROBES=y CONFIG_FIX_EARLYCON_MEM=y CONFIG_PGTABLE_LEVELS=2 CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config" CONFIG_IRQ_WORK=y CONFIG_BUILDTIME_EXTABLE_SORT=y # # General setup # CONFIG_INIT_ENV_ARG_LIMIT=32 CONFIG_CROSS_COMPILE="" # CONFIG_COMPILE_TEST is not set CONFIG_LOCALVERSION="" CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y CONFIG_HAVE_KERNEL_BZIP2=y CONFIG_HAVE_KERNEL_LZMA=y CONFIG_HAVE_KERNEL_XZ=y CONFIG_HAVE_KERNEL_LZO=y CONFIG_HAVE_KERNEL_LZ4=y # CONFIG_KERNEL_GZIP is not set # CONFIG_KERNEL_BZIP2 is not set CONFIG_KERNEL_LZMA=y # CONFIG_KERNEL_XZ is not set # CONFIG_KERNEL_LZO is not set # CONFIG_KERNEL_LZ4 is not set CONFIG_DEFAULT_HOSTNAME="(none)" CONFIG_SYSVIPC=y CONFIG_SYSVIPC_SYSCTL=y CONFIG_POSIX_MQUEUE=y CONFIG_POSIX_MQUEUE_SYSCTL=y # CONFIG_CROSS_MEMORY_ATTACH is not set CONFIG_FHANDLE=y CONFIG_USELIB=y CONFIG_AUDIT=y CONFIG_HAVE_ARCH_AUDITSYSCALL=y # CONFIG_AUDITSYSCALL is not set # # IRQ subsystem # CONFIG_GENERIC_IRQ_PROBE=y CONFIG_GENERIC_IRQ_SHOW=y CONFIG_GENERIC_PENDING_IRQ=y CONFIG_GENERIC_IRQ_CHIP=y CONFIG_IRQ_DOMAIN=y CONFIG_IRQ_DOMAIN_HIERARCHY=y # CONFIG_IRQ_DOMAIN_DEBUG is not set CONFIG_IRQ_FORCED_THREADING=y CONFIG_SPARSE_IRQ=y CONFIG_CLOCKSOURCE_WATCHDOG=y CONFIG_ARCH_CLOCKSOURCE_DATA=y CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y CONFIG_GENERIC_TIME_VSYSCALL=y CONFIG_GENERIC_CLOCKEVENTS=y CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y CONFIG_GENERIC_CMOS_UPDATE=y # # Timers subsystem # CONFIG_TICK_ONESHOT=y CONFIG_HZ_PERIODIC=y # CONFIG_NO_HZ_IDLE is not set CONFIG_NO_HZ=y CONFIG_HIGH_RES_TIMERS=y # # CPU/Task time and stats accounting # # CONFIG_TICK_CPU_ACCOUNTING is not set CONFIG_IRQ_TIME_ACCOUNTING=y CONFIG_BSD_PROCESS_ACCT=y CONFIG_BSD_PROCESS_ACCT_V3=y # CONFIG_TASKSTATS is not set # # RCU Subsystem # CONFIG_PREEMPT_RCU=y CONFIG_RCU_EXPERT=y CONFIG_SRCU=y # CONFIG_TASKS_RCU is not set CONFIG_RCU_STALL_COMMON=y CONFIG_RCU_FANOUT=32 CONFIG_RCU_FANOUT_LEAF=16 CONFIG_TREE_RCU_TRACE=y # CONFIG_RCU_BOOST is not set CONFIG_RCU_KTHREAD_PRIO=0 CONFIG_RCU_NOCB_CPU=y # CONFIG_RCU_NOCB_CPU_NONE is not set # CONFIG_RCU_NOCB_CPU_ZERO is not set CONFIG_RCU_NOCB_CPU_ALL=y # CONFIG_RCU_EXPEDITE_BOOT is not set CONFIG_BUILD_BIN2C=y CONFIG_IKCONFIG=y # CONFIG_IKCONFIG_PROC is not set CONFIG_LOG_BUF_SHIFT=17 
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12 CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y CONFIG_CGROUPS=y # CONFIG_CGROUP_DEBUG is not set # CONFIG_CGROUP_FREEZER is not set CONFIG_CGROUP_PIDS=y # CONFIG_CGROUP_DEVICE is not set CONFIG_CPUSETS=y # CONFIG_PROC_PID_CPUSET is not set # CONFIG_CGROUP_CPUACCT is not set # CONFIG_MEMCG is not set # CONFIG_CGROUP_HUGETLB is not set # CONFIG_CGROUP_PERF is not set CONFIG_CGROUP_SCHED=y CON
Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
> Does this set the write protect bit?
>
> What happens on architectures without hardware dirty tracking?

It's supposed to avoid needing page faults when the data is accessed
again, but it can just be implemented via page faults on architectures
without a way to check for access or writes. MADV_DONTNEED is also a
valid implementation of MADV_FREE if it comes to that (which is what it
does on swapless systems for now).

> Using the dirty bit for these semantics scares me. This API creates a
> page that can have visible nonzero contents and then can
> asynchronously and magically zero itself thereafter. That makes me
> nervous. Could we use the accessed bit instead? Then the observable
> semantics would be equivalent to having MADV_FREE either zero the page
> or do nothing, except that it doesn't make up its mind until the next
> read.

FWIW, those are already basically the semantics provided by GCC and LLVM
for data the compiler considers uninitialized (they could be more
aggressive since C just says it's undefined, but in practice they allow
it but can produce inconsistent results even if it isn't touched).

http://llvm.org/docs/LangRef.html#undefined-values

It doesn't seem like there would be an advantage to checking if the data
was written to vs. whether it was accessed if checking for both of those
is comparable in performance. I don't know enough about that.

>> + ptent = pte_mkold(ptent);
>> + ptent = pte_mkclean(ptent);
>> + set_pte_at(mm, addr, pte, ptent);
>> + tlb_remove_tlb_entry(tlb, pte, addr);
>
> It looks like you are flushing the TLB. In a multithreaded program,
> that's rather expensive. Potentially silly question: would it be
> better to just zero the page immediately in a multithreaded program
> and then, when swapping out, check the page is zeroed and, if so, skip
> swapping it out? That could be done without forcing an IPI.

In the common case it will be passed many pages by the allocator. There
will still be a layer of purging logic on top of MADV_FREE but it can be
much thinner than the current workarounds for MADV_DONTNEED. So the
allocator would still be coalescing dirty ranges and only purging when
the ratio of dirty:clean pages rises above some threshold. It would be
able to weight the largest ranges for purging first rather than logic
based on stuff like aging as is used for MADV_DONTNEED.
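For anyone who wants to poke at the semantics discussed above from userspace, a minimal sketch follows. It assumes a Linux system; the fallback value 8 for MADV_FREE is the asm-generic number and is only an assumption for headers that predate the flag, and lazy_free() and demo_lazy_free() are made-up names. The EINVAL fallback mirrors the point above that MADV_DONTNEED is a valid implementation of MADV_FREE.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>

#ifndef MADV_FREE
#define MADV_FREE 8	/* assumption: asm-generic value, for older headers */
#endif

/* Hypothetical wrapper: try MADV_FREE, fall back to MADV_DONTNEED. */
static int lazy_free(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_FREE) == 0)
		return 0;
	if (errno == EINVAL)	/* kernel does not know MADV_FREE */
		return madvise(addr, len, MADV_DONTNEED);
	return -1;
}

/* Returns 0 if the observable contents match the semantics described. */
static int demo_lazy_free(void)
{
	size_t len = 1 << 20;
	unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
				MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	unsigned char first;

	if (p == MAP_FAILED)
		return -1;
	memset(p, 0xAB, len);
	if (lazy_free(p, len) != 0) {
		munmap(p, len);
		return -1;
	}
	/* A later read sees either the old contents or a zeroed page. */
	first = p[0];
	munmap(p, len);
	return (first == 0xAB || first == 0) ? 0 : -1;
}
```

Which of the two values the read returns depends on whether the kernel has reclaimed the page in the meantime, so the sketch accepts both.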
Re: [PATCH v1 1/4] Crypto: Crypto driver support aes/des/des3 for rk3288
Hi LABBE, On 2015年11月03日 16:59, LABBE Corentin wrote: > On Tue, Nov 03, 2015 at 01:52:05PM +0800, Zain Wang wrote: >> Crypto driver support cbc/ecb two chainmode, and aes/des/des3 three cipher >> mode. >> The names registered are: >> ecb(aes) cbc(aes) ecb(des) cbc(des) ecb(des3_ede) cbc(des3_ede) >> You can alloc tags above in your case. >> >> And other algorithms and platforms will be added later on. >> >> Signed-off-by: Zain Wang >> --- >> drivers/crypto/Kconfig | 11 + >> drivers/crypto/Makefile| 1 + >> drivers/crypto/rockchip/Makefile | 3 + >> drivers/crypto/rockchip/rk3288_crypto.c| 383 >> drivers/crypto/rockchip/rk3288_crypto.h| 290 >> drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c | 501 >> + >> 6 files changed, 1189 insertions(+) >> create mode 100644 drivers/crypto/rockchip/Makefile >> create mode 100644 drivers/crypto/rockchip/rk3288_crypto.c >> create mode 100644 drivers/crypto/rockchip/rk3288_crypto.h >> create mode 100644 drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c >> >> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig >> index 2569e04..d1e42cf 100644 >> --- a/drivers/crypto/Kconfig >> +++ b/drivers/crypto/Kconfig >> @@ -498,4 +498,15 @@ config CRYPTO_DEV_SUN4I_SS >>To compile this driver as a module, choose M here: the module >>will be called sun4i-ss. >> >> +config CRYPTO_DEV_ROCKCHIP >> +tristate "Rockchip's Cryptographic Engine driver" >> + >> +select CRYPTO_AES >> +select CRYPTO_DES >> +select CRYPTO_BLKCIPHER >> + >> +help >> + This driver interfaces with the hardware crypto accelerator. >> + Supporting cbc/ecb chainmode, and aes/des/des3_ede cipher mode. 
>> + >> endif # CRYPTO_HW >> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile >> index c3ced6f..713de9d 100644 >> --- a/drivers/crypto/Makefile >> +++ b/drivers/crypto/Makefile >> @@ -29,3 +29,4 @@ obj-$(CONFIG_CRYPTO_DEV_QAT) += qat/ >> obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/ >> obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/ >> obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/ >> +obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/ >> diff --git a/drivers/crypto/rockchip/Makefile >> b/drivers/crypto/rockchip/Makefile >> new file mode 100644 >> index 000..7051c6c >> --- /dev/null >> +++ b/drivers/crypto/rockchip/Makefile >> @@ -0,0 +1,3 @@ >> +obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rk_crypto.o >> +rk_crypto-objs := rk3288_crypto.o \ >> + rk3288_crypto_ablkcipher.o \ >> diff --git a/drivers/crypto/rockchip/rk3288_crypto.c >> b/drivers/crypto/rockchip/rk3288_crypto.c >> new file mode 100644 >> index 000..02830f2 >> --- /dev/null >> +++ b/drivers/crypto/rockchip/rk3288_crypto.c >> @@ -0,0 +1,383 @@ >> +/* >> + *Crypto acceleration support for Rockchip RK3288 >> + * >> + * Copyright (c) 2015, Fuzhou Rockchip Electronics Co., Ltd >> + * >> + * Author: Zain Wang >> + * >> + * This program is free software; you can redistribute it and/or modify it >> + * under the terms and conditions of the GNU General Public License, >> + * version 2, as published by the Free Software Foundation. >> + * >> + * Some ideas are from marvell-cesa.c and s5p-sss.c driver. 
>> + */ >> + >> +#include "rk3288_crypto.h" >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +struct crypto_info_t *crypto_p; >> + >> +static int rk_crypto_enable_clk(struct crypto_info_t *dev) >> +{ >> +int err; >> + >> +err = clk_prepare_enable(dev->sclk); >> +if (err) { >> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'sclk'\n", >> +__func__, __LINE__); >> +goto err_return; >> +} >> +err = clk_prepare_enable(dev->aclk); >> +if (err) { >> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'aclk'\n", >> +__func__, __LINE__); >> +goto err_aclk; >> +} >> +err = clk_prepare_enable(dev->hclk); >> +if (err) { >> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'hclk'\n", >> +__func__, __LINE__); >> +goto err_hclk; >> +} >> + >> +err = clk_prepare_enable(dev->dmaclk); >> +if (err) { >> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'dmaclk'\n", >> +__func__, __LINE__); >> +goto err_dmaclk; >> +} >> +return err; >> +err_dmaclk: >> +clk_disable_unprepare(dev->hclk); >> +err_hclk: >> +clk_disable_unprepare(dev->aclk); >> +err_aclk: >> +clk_disable_unprepare(dev->sclk); >> +err_return: >> +return err; >> +} >> + >> +static void rk_crypto_disable_clk(struct crypto_info_t *dev) >> +{ >> +clk_disable_unprepare(dev->dmaclk); >> +clk_disable_unprepare(dev->hclk); >> +
[lkp] [f2fs] 60b99b486b: +1.9% fileio.requests_per_sec
FYI, we noticed the below changes on https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master commit 60b99b486b568c13cbb7caa83cf8a12af7665f1e ("f2fs: introduce a periodic checkpoint flow") = tbox_group/testcase/rootfs/kconfig/compiler/period/nr_threads/disk/fs/size/filenum/rwmode/iomode: lkp-ws02/fileio/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/600s/100%/1HDD/f2fs/64G/1024f/seqrewr/sync commit: 5c2674347466d5c2d5169214e95f4ad6dc09e9b6 60b99b486b568c13cbb7caa83cf8a12af7665f1e 5c2674347466d5c2 60b99b486b568c13cbb7caa83c -- %stddev %change %stddev \ |\ ± 8%+279.7% 4219 ± 23% fileio.request_latency_max_ms 5656 ± 0% +1.9% 5764 ± 0% fileio.requests_per_sec 1.088e+08 ± 0% +1.9% 1.108e+08 ± 0% fileio.time.file_system_outputs 46225 ± 0% +4.9% 48473 ± 0% fileio.time.involuntary_context_switches 3759 ± 0% +2.6% 3858 ± 1% fileio.time.maximum_resident_set_size 47.00 ± 92%+163.8% 124.00 ± 8% numa-vmstat.node0.workingset_refault 37306 ± 2% -10.6% 33365 ± 1% softirqs.BLOCK 328.00 ± 0% +55.5% 510.00 ± 1% time.file_system_inputs 79132 ± 17% +30.7% 103393 ± 2% numa-meminfo.node0.Active 122897 ± 10% -20.4% 97780 ± 2% numa-meminfo.node1.Active 1.45 ± 1% +3.3% 1.50 ± 1% turbostat.%Busy 0.40 ± 13% +76.1% 0.70 ± 19% turbostat.Pkg%pc3 0.12 ± 15%+638.8% 0.91 ± 16% turbostat.Pkg%pc6 83662 ± 0% +1.9% 85235 ± 0% vmstat.io.bo 626.00 ± 5% +23.0% 770.25 ± 8% vmstat.memory.buff 9980 ± 1% +2.6% 10241 ± 1% vmstat.system.cs 2.012e+08 ± 11% +25.0% 2.514e+08 ± 8% cpuidle.C1E-NHM.time 52329 ± 11% -23.6% 39994 ± 12% cpuidle.C3-NHM.usage 5227 ± 29%+413.0% 26816 ±114% cpuidle.POLL.time 336.25 ± 33% +82.5% 613.50 ± 6% cpuidle.POLL.usage 2648 ± 5% -13.8% 2283 ± 2% proc-vmstat.kswapd_high_wmark_hit_quickly 3872 ± 1% +12.6% 4361 ± 3% proc-vmstat.kswapd_low_wmark_hit_quickly 1072 ± 29% +30.1% 1395 ± 16% proc-vmstat.pgpgin 9856 ± 11%+350.3% 44384 ± 3% proc-vmstat.slabs_scanned 3364 ± 2% -20.9% 2660 ± 2% slabinfo.f2fs_extent_tree.active_objs 3364 ± 2% -20.9% 2660 ± 2% 
slabinfo.f2fs_extent_tree.num_objs 19645 ± 0% +9.8% 21579 ± 4% slabinfo.jbd2_revoke_record_s.num_objs 3165 ± 4% -8.7% 2891 ± 8% slabinfo.proc_inode_cache.num_objs 0.00 ± -1% +Inf%1774752 ± 7% latency_stats.avg.call_rwsem_down_read_failed.f2fs_convert_inline_inode.[f2fs].f2fs_write_begin.[f2fs].generic_perform_write.__generic_file_write_iter.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath 0.00 ± -1% +Inf%1804979 ± 68% latency_stats.avg.get_request.blk_queue_bio.generic_make_request.submit_bio.f2fs_submit_page_bio.[f2fs].get_meta_page.[f2fs].get_node_info.[f2fs].read_node_page.[f2fs].get_node_page.[f2fs].get_dnode_of_data.[f2fs].f2fs_reserve_block.[f2fs].f2fs_get_block.[f2fs] 0.00 ± -1% +Inf%3191014 ± 6% latency_stats.max.call_rwsem_down_read_failed.f2fs_convert_inline_inode.[f2fs].f2fs_write_begin.[f2fs].generic_perform_write.__generic_file_write_iter.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath 1043277 ± 6%+294.7%4118293 ± 23% latency_stats.max.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath 0.00 ± -1% +Inf%1804979 ± 68% latency_stats.max.get_request.blk_queue_bio.generic_make_request.submit_bio.f2fs_submit_page_bio.[f2fs].get_meta_page.[f2fs].get_node_info.[f2fs].read_node_page.[f2fs].get_node_page.[f2fs].get_dnode_of_data.[f2fs].f2fs_reserve_block.[f2fs].f2fs_get_block.[f2fs] 16549 ± 50%+248.8% 57724 ±145% latency_stats.max.rpc_wait_bit_killable.__rpc_wait_for_completion_task.nfs4_run_open_task.[nfsv4]._nfs4_open_and_get_state.[nfsv4].nfs4_do_open.[nfsv4].nfs4_atomic_open.[nfsv4].nfs4_file_open.[nfsv4].do_dentry_open.vfs_open.path_openat.do_filp_open.do_sys_open 0.00 ± -1% +Inf% 16400627 ± 7% 
latency_stats.sum.call_rwsem_down_read_failed.f2fs_convert_inline_inode.[f2fs].f2fs_write_begin.[f2fs].generic_perform_write.__generic_file_write_iter.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath 0.00 ± -1% +Inf%1804979 ± 68%
Re: [PATCH V3 2/3] perf/powerpc :add support for sampling intr machine state
On Tuesday 03 November 2015 02:46 PM, Michael Ellerman wrote:
> On Tue, 2015-11-03 at 11:40 +0530, Anju T wrote:
>
>> The perf infrastructure uses a bit mask to find out
>> valid registers to display. Define a register mask
>> for supported registers defined in asm/perf_regs.h.
>> The bit positions also correspond to register IDs
>> which is used by perf infrastructure to fetch the register
>> values. CONFIG_HAVE_PERF_REGS enables
>> sampling of the interrupted machine state.
>> diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
>> new file mode 100644
>> index 000..0520492
>> --- /dev/null
>> +++ b/arch/powerpc/perf/perf_regs.c
>> @@ -0,0 +1,92 @@
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +#include
>> +
>> +#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
>> +
>> +#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1))
>> +
>> +static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
>> +	PT_REGS_OFFSET(PERF_REG_POWERPC_GPR0, gpr[0]),
>> +	PT_REGS_OFFSET(PERF_REG_POWERPC_GPR1, gpr[1]),
>> +	PT_REGS_OFFSET(PERF_REG_POWERPC_GPR2, gpr[2]),
>
>
> I realise you're following the example of other architectures, but we have
> almost this exact same structure in ptrace.c, see regoffset_table.

That won't work because we want to add more regs to the perf version
but not the ptrace version.

Maddy

> It would be really nice if we could share them between ptrace and perf.
>
> cheers
[GIT PULL] Thermal-SoC management updates for v4.4-rc1 #2
Hello Rui,

Please pull from
  git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal linus

to receive the second Thermal-SoC Management updates for v4.4-rc1 with top-most
7e38a5b1daa12cbaace3c76402999a84460df3e2:

  thermal: rockchip: support the sleep pinctrl state to avoid glitches in s2r (2015-11-03 09:57:42 -0800)

on top of commit 8fb2b9ac2aadd6d87f89071c2c85f8c12b41c943:

  thermal: underflow bug in imx_set_trip_temp() (2015-10-30 11:35:50 -0700)

Specifics:
- Rockchip thermal driver now has pinctrl support
- First round of fixes on devfreq cooling device.
- This branch has been compile tested and boot tested by Linaro
  kernelci bot [1,2].

This is a second round of pull after going through the pending patches on
patchwork. Still trying to reduce the patchwork counts. The rockchip
support has already been reviewed. And there is no point in waiting for
the devfreq fixes.

Still checking what is pending. If I missed your patch, ping me.

[1] - http://kernelci.org/boot/all/job/evalenti/kernel/v4.3-rc3-56-g7e38a5b1daa1/
[2] - http://kernelci.org/build/evalenti/kernel/v4.3-rc3-56-g7e38a5b1daa1/

BR,

Eduardo Valentin

Caesar Wang (2):
  dt-bindings: rockchip-thermal: Add the pinctrl states in this document
  thermal: rockchip: support the sleep pinctrl state to avoid glitches in s2r

Javi Merino (2):
  thermal: devfreq_cooling: use a thermal_cooling_device for register and unregister
  thermal: devfreq_cooling: Make power a u64

 .../devicetree/bindings/thermal/rockchip-thermal.txt | 11 +--
 drivers/thermal/devfreq_cooling.c                    | 18 +++---
 drivers/thermal/rockchip_thermal.c                   |  4
 include/linux/devfreq_cooling.h                      | 16
 4 files changed, 32 insertions(+), 17 deletions(-)
[PATCH v2 0/2] Stop Intel Processor Trace logging on panic
Hi all,

This patch series provides a feature to stop Intel Processor Trace (Intel
PT) logging and save its registers in memory on panic.

Intel PT is a new feature of the Intel CPU "Broadwell"; it captures
information about program execution flow. Here is an article about Intel PT.
https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing

Once Intel PT is enabled, the events which change program flow, like
branch instructions, exceptions, interrupts, traps and so on are logged
in memory. This is very useful for debugging because we can know the
detailed behavior of software.

When a kernel panic occurs while you are running a perf command with Intel
PT (with the -e intel_pt// option), these patches disable Intel PT and save
its registers in memory. After the crash dump is captured by kdump, you
can retrieve the Intel PT log buffer from the vmcore and investigate kernel
behavior. I have not made a tool yet to salvage the Intel PT log buffer from
the vmcore, but I'll do so once these patches are accepted.

changelog:
v2:
- Define function in intel_pt.h with static inline

v1:
https://lkml.org/lkml/2015/10/28/136

Background:
These patches are a part of a patch series I posted before; the original
discussion is below.

x86: Intel Processor Trace Logger
v1: https://lkml.org/lkml/2015/7/29/6
v2: https://lkml.org/lkml/2015/9/8/24

The purpose of the original patches is to introduce an in-kernel logger
using Intel PT. To implement it I need to add some APIs to control the perf
counter and ring buffer in the kernel. Alexander Shishkin is working on such
APIs for his work to make use of Intel PT for process core dumps.

Apart from such APIs, the feature to save Intel PT registers on panic is
helpful for normal perf command users as I described above, therefore I
separated the feature from the original patches.

Takao Indoh (2):
  perf/x86/intel/pt: Add interface to stop Intel PT logging
  x86: Stop Intel PT before kdump starts

 arch/x86/include/asm/intel_pt.h           | 10 ++
 arch/x86/kernel/cpu/perf_event_intel_pt.c |  9 +
 arch/x86/kernel/crash.c                   | 11 +++
 3 files changed, 30 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_pt.h
[PATCH v2 1/2] perf/x86/intel/pt: Add interface to stop Intel PT logging
This patch adds a function for external components to stop Intel PT.
Basically this function is used when a kernel panic occurs. When it is
called, the intel_pt driver disables Intel PT and saves its registers using
pt_event_stop, which is also used by the pmu.stop handler. This function
stops Intel PT on the cpu where it is called, therefore the caller needs to
call it on each cpu to stop all logging.

Signed-off-by: Takao Indoh
---
 arch/x86/include/asm/intel_pt.h           | 10 ++
 arch/x86/kernel/cpu/perf_event_intel_pt.c |  9 +
 2 files changed, 19 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_pt.h

diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
new file mode 100644
index 000..e1a4117
--- /dev/null
+++ b/arch/x86/include/asm/intel_pt.h
@@ -0,0 +1,10 @@
+#ifndef _ASM_X86_INTEL_PT_H
+#define _ASM_X86_INTEL_PT_H
+
+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
+void cpu_emergency_stop_pt(void);
+#else
+static inline void cpu_emergency_stop_pt(void) {}
+#endif
+
+#endif /* _ASM_X86_INTEL_PT_H */
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 4216928..a638b5b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -27,6 +27,7 @@
 #include
 #include
 #include
+#include

 #include "perf_event.h"
 #include "intel_pt.h"
@@ -1125,6 +1126,14 @@ static int pt_event_init(struct perf_event *event)
 	return 0;
 }

+void cpu_emergency_stop_pt(void)
+{
+	struct pt *pt = this_cpu_ptr(_ctx);
+
+	if (pt->handle.event)
+		pt_event_stop(pt->handle.event, PERF_EF_UPDATE);
+}
+
 static __init int pt_init(void)
 {
 	int ret, cpu, prior_warn = 0;
--
1.7.1
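The per-cpu point in the changelog can be modelled in a few lines of userspace C. This is only a sketch under stated assumptions: struct demo_event, struct demo_pt, demo_pt_ctx and both functions are invented stand-ins, a plain array replaces the kernel's per-CPU data, and the loop in demo_stop_all() stands in for each cpu calling the function on itself during crash shutdown.

```c
#include <stddef.h>

#define DEMO_NR_CPUS 4

/* Stand-ins for the perf event and the per-CPU struct pt; not kernel types. */
struct demo_event { int running; };
struct demo_pt { struct demo_event *event; };

/* Plain array modelling per-CPU data; index is the cpu number. */
static struct demo_pt demo_pt_ctx[DEMO_NR_CPUS];

/* Models cpu_emergency_stop_pt(): touches only the given cpu's context. */
static int demo_stop_pt(int cpu)
{
	struct demo_pt *pt = &demo_pt_ctx[cpu];

	if (!pt->event || !pt->event->running)
		return 0;	/* no active trace on this cpu */
	pt->event->running = 0;
	return 1;
}

/* Stopping everything requires one call per cpu; returns how many stopped. */
static int demo_stop_all(void)
{
	int cpu, stopped = 0;

	for (cpu = 0; cpu < DEMO_NR_CPUS; cpu++)
		stopped += demo_stop_pt(cpu);
	return stopped;
}
```

A single demo_stop_pt() call leaves traces on the other cpus running, which is why the changelog asks the caller to invoke the real function on every cpu.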
[PATCH v2 2/2] x86: Stop Intel PT before kdump starts
This patch stops Intel PT logging and saves its registers in memory
before kdump is started. This feature is needed to prevent Intel PT from
overwriting its log buffer after panic, and the saved registers are needed to
find the last position where Intel PT wrote data. After the crash dump is
captured by kdump, the user can retrieve the log buffer from the vmcore and
use it to investigate kernel behavior.

Signed-off-by: Takao Indoh
---
 arch/x86/kernel/crash.c | 11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 74ca2fe..5f383d2 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -35,6 +35,7 @@
 #include
 #include
 #include
+#include

 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN 4096
@@ -127,6 +128,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 	cpu_emergency_vmxoff();
 	cpu_emergency_svm_disable();

+	/*
+	 * Disable Intel PT to stop its logging
+	 */
+	cpu_emergency_stop_pt();
+
 	disable_local_APIC();
 }

@@ -172,6 +178,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	cpu_emergency_vmxoff();
 	cpu_emergency_svm_disable();

+	/*
+	 * Disable Intel PT to stop its logging
+	 */
+	cpu_emergency_stop_pt();
+
 #ifdef CONFIG_X86_IO_APIC
 	/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
 	ioapic_zap_locks();
--
1.7.1
Re: [PATCH V3 2/3] perf/powerpc :add support for sampling intr machine state
Hi Michael, On Tuesday 03 November 2015 02:46 PM, Michael Ellerman wrote: On Tue, 2015-11-03 at 11:40 +0530, Anju T wrote: The perf infrastructure uses a bit mask to find out valid registers to display. Define a register mask for supported registers defined in asm/perf_regs.h. The bit positions also correspond to register IDs which are used by the perf infrastructure to fetch the register values. CONFIG_HAVE_PERF_REGS enables sampling of the interrupted machine state. diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c new file mode 100644 index 000..0520492 --- /dev/null +++ b/arch/powerpc/perf/perf_regs.c @@ -0,0 +1,92 @@ +#include +#include +#include +#include +#include +#include +#include +#include + +#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r) + +#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1)) + +static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = { + PT_REGS_OFFSET(PERF_REG_POWERPC_GPR0, gpr[0]), + PT_REGS_OFFSET(PERF_REG_POWERPC_GPR1, gpr[1]), + PT_REGS_OFFSET(PERF_REG_POWERPC_GPR2, gpr[2]), I realise you're following the example of other architectures, but we have almost this exact same structure in ptrace.c, see regoffset_table. It would be really nice if we could share them between ptrace and perf. cheers Thank you for reviewing the patch. That is a great suggestion. In ptrace.c the structure doesn't include ORIG_R3. So, in that case, what should we do? Thanks and Regards Anju
perf related lockdep bug
This is new to me since todays merge window. Dave [ 293.955261] == [ 293.955395] [ INFO: possible circular locking dependency detected ] [ 293.955536] 4.3.0-think+ #3 Not tainted [ 293.955618] --- [ 293.955756] trinity-c343/12586 is trying to acquire lock: [ 293.955875] (rcu_node_0){-.-...}, at: [] rcu_report_exp_cpu_mult+0x24/0x70 [ 293.956070] but task is already holding lock: [ 293.956194] (>lock){-.}, at: [] perf_lock_task_context+0x53/0x300 [ 293.956386] which lock already depends on the new lock. [ 293.956559] the existing dependency chain (in reverse order) is: [ 293.956720] -> #3 (>lock){-.}: [ 293.956807][] lock_acquire+0xc3/0x1d0 [ 293.956940][] _raw_spin_lock+0x41/0x80 [ 293.957075][] __perf_event_task_sched_out+0x276/0x700 [ 293.957239][] __schedule+0x530/0xaf0 [ 293.957370][] schedule+0x3f/0x90 [ 293.957493][] exit_to_usermode_loop+0x76/0xa0 [ 293.957642][] prepare_exit_to_usermode+0x53/0x60 [ 293.957795][] retint_user+0x8/0x20 [ 293.957923] -> #2 (>lock){-.-.-.}: [ 293.958008][] lock_acquire+0xc3/0x1d0 [ 293.958140][] _raw_spin_lock+0x41/0x80 [ 293.958275][] sched_move_task+0x4d/0x270 [ 293.958417][] cpu_cgroup_fork+0xe/0x10 [ 293.958551][] cgroup_post_fork+0x67/0x160 [ 293.958692][] copy_process+0x147a/0x1e10 [ 293.958831][] _do_fork+0x97/0x680 [ 293.958955][] kernel_thread+0x29/0x30 [ 293.959087][] rest_init+0x27/0x140 [ 293.959211][] start_kernel+0x450/0x45d [ 293.959347][] x86_64_start_reservations+0x2a/0x2c [ 293.959503][] x86_64_start_kernel+0x168/0x176 [ 293.959652] -> #1 (>pi_lock){-.-.-.}: [ 293.959742][] lock_acquire+0xc3/0x1d0 [ 293.959876][] _raw_spin_lock_irqsave+0x4c/0x90 [ 293.960026][] try_to_wake_up+0x30/0x5f0 [ 293.960165][] wake_up_process+0x23/0x40 [ 293.960300][] rcu_spawn_gp_kthread+0xd5/0xfd [ 293.960449][] do_one_initcall+0xb3/0x1f0 [ 293.960587][] kernel_init_freeable+0x101/0x234 [ 293.960736][] kernel_init+0xe/0xe0 [ 293.960860][] ret_from_fork+0x3f/0x70 [ 293.960993] -> #0 (rcu_node_0){-.-...}: [ 293.961081][] 
__lock_acquire+0x19d2/0x1cf0 [ 293.961222][] lock_acquire+0xc3/0x1d0 [ 293.961351][] _raw_spin_lock_irqsave+0x4c/0x90 [ 293.961501][] rcu_report_exp_cpu_mult+0x24/0x70 [ 293.961652][] rcu_read_unlock_special+0x109/0x5e0 [ 293.961808][] __rcu_read_unlock+0x87/0x90 [ 293.961947][] perf_lock_task_context+0x19d/0x300 [ 293.962101][] perf_event_init_task+0x9f/0x370 [ 293.962248][] copy_process+0x7fa/0x1e10 [ 293.962385][] _do_fork+0x97/0x680 [ 293.962508][] SyS_clone+0x19/0x20 [ 293.962631][] entry_SYSCALL_64_fastpath+0x12/0x6f [ 293.962787] other info that might help us debug this: [ 293.962955] Chain exists of: rcu_node_0 --> >lock --> >lock [ 293.963106] Possible unsafe locking scenario: [ 293.963233]CPU0CPU1 [ 293.963332] [ 293.963430] lock(>lock); [ 293.963500]lock(>lock); [ 293.963627]lock(>lock); [ 293.963755] lock(rcu_node_0); [ 293.963824] *** DEADLOCK *** [ 293.963947] 2 locks held by trinity-c343/12586: [ 293.964047] #0: (rcu_read_lock){..}, at: [] perf_lock_task_context+0xd0/0x300 [ 293.964258] #1: (>lock){-.}, at: [] perf_lock_task_context+0x53/0x300 [ 293.972934] stack backtrace: [ 293.990221] CPU: 3 PID: 12586 Comm: trinity-c343 Not tainted 4.3.0-think+ #3 [ 293.999210] 8804fb6824a0 88044a0efa60 b03607bc b1779630 [ 294.008115] 88044a0efaa0 b00d1453 88044a0efb10 0002 [ 294.017066] 8804fb682468 8804fb6824a0 0001 8804fb681b80 [ 294.026109] Call Trace: [ 294.035079] [] dump_stack+0x4e/0x82 [ 294.044014] [] print_circular_bug+0x1e3/0x250 [ 294.052982] [] __lock_acquire+0x19d2/0x1cf0 [ 294.061926] [] ? get_lock_stats+0x19/0x60 [ 294.070878] [] lock_acquire+0xc3/0x1d0 [ 294.079828] [] ? rcu_report_exp_cpu_mult+0x24/0x70 [ 294.088805] [] _raw_spin_lock_irqsave+0x4c/0x90 [ 294.097771] [] ? rcu_report_exp_cpu_mult+0x24/0x70 [ 294.106807] [] rcu_report_exp_cpu_mult+0x24/0x70 [ 294.115823] [] rcu_read_unlock_special+0x109/0x5e0 [ 294.124813] [] ? get_lock_stats+0x19/0x60 [ 294.133801] [] ? 
perf_lock_task_context+0x53/0x300 [ 294.142790] [] __rcu_read_unlock+0x87/0x90 [ 294.151675] []
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On 2015-11-04 06:58, Eric Dumazet wrote: On Tue, 2015-11-03 at 20:46 -0800, Eric Dumazet wrote: On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote: > On 2015-11-04 00:06, Cong Wang wrote: > > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko > > wrote: > >> Hi! > >> > >> Actually seems i was getting this panic for a while (once per week) on > >> loaded pppoe server, but just now was able to get full panic message. > >> After checking commit logs on sch_fq.c i didnt seen any fixes, so > >> probably > >> upgrading to newer kernel wont help? > > > > > > Can you share your `tc qdisc show dev ` with us? And how to > > reproduce > > it? I tried to setup htb+fq and then flip the interface back and forth > > but I don't > > see any crash. > My guess it wont be easy to reproduce, it is happening on box with 4.5k > interfaces, that constantly create/delete interfaces, > and even with that this problem may happen once per day, or may not > happen for 1 week. > > Here is script that is being fired after new ppp interface detected. But > pppoe process are independent from > process that are "establishing" shapers. It is probably a generic bug. sch_fq seems OK to me. Somehow nobody tries to change qdisc hundred times per second ;) Could you try following patch ? It seems to 'fix' the issue for me. Following patch would be more appropriate. Prior one was meant to 'show' the issue. diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index cb5d4ad32946..7f5f3e8a10f5 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -706,9 +706,11 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue, spin_lock_bh(root_lock); /* Prune old scheduler */ - if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1) - qdisc_reset(oqdisc); - + if (oqdisc) { + if (atomic_read(&oqdisc->refcnt) <= 1) + qdisc_reset(oqdisc); + set_bit(__QDISC_STATE_DEACTIVATED, &oqdisc->state); + } /* ... and graft new one */ if (qdisc == NULL) qdisc = &noop_qdisc; Applied, will test it, but this bug is triggered only rarely. I will try to push it to more pppoe servers to stress-test them (and 4.3) more.
RE: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"
From: Robin Gong Sent: Wednesday, November 04, 2015 10:58 AM > To: gre...@linuxfoundation.org; edubez...@gmail.com > Cc: Duan Fugang-B38611; linux-ser...@vger.kernel.org; linux- > ker...@vger.kernel.org > Subject: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare" > > commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare"). > Otherwise below warning happen since there are some printk logs in > interrupt. > > [ 14.868319] udevd[501]: starting version 182 > [ 16.386107] random: nonblocking pool is initialized > [ 16.386123] [ cut here ] > [ 16.386140] WARNING: CPU: 0 PID: 501 at kernel/locking/mutex.c:868 > mutex_trylock+0x210/0x230() > [ 16.386146] DEBUG_LOCKS_WARN_ON(in_interrupt()) > [ 16.386149] Modules linked in: > [ 16.386157] CPU: 0 PID: 501 Comm: udevd Not tainted 4.3.0-rc1-00014- > gf843df8 #28 > [ 16.386160] Hardware name: Freescale i.MX6 SoloX (Device Tree) > [ 16.386165] Backtrace: > [ 16.386182] [] (dump_backtrace) from [] > (show_stack+0x18/0x1c) > [ 16.386192] r6:c0af1840 r5: r4: r3: > [ 16.386201] [] (show_stack) from [] > (dump_stack+0x8c/0xa4) > [ 16.386216] [] (dump_stack) from [] > (warn_slowpath_common+0x80/0xbc) > [ 16.386225] r6:c07b1ccc r5:0009 r4:edcf5c60 r3:0001 > [ 16.386234] [] (warn_slowpath_common) from [] > (warn_slowpath_fmt+0x38/0x40) > [ 16.386246] r8:c0564694 r7:eeb76880 r6:c13323ec r5:0001 > r4:c0976990 > [ 16.386255] [] (warn_slowpath_fmt) from [] > (mutex_trylock+0x210/0x230) > [ 16.386261] r3:c0978f50 r2:c0976990 > [ 16.386264] r4:c0b29a70 > [ 16.386278] [] (mutex_trylock) from [] > (clk_prepare_lock+0x14/0xf4) > [ 16.386289] r8:c133000c r7:eeb76880 r6:0037 r5:c12efbc8 > r4:eeb76880 > [ 16.386297] [] (clk_prepare_lock) from [] > (clk_prepare+0x18/0x38) > [ 16.386303] r5:c12efbc8 r4:eeb76880 > [ 16.386311] [] (clk_prepare) from [] > (imx_console_write+0x34/0x248) > [ 16.386317] r4:ee999c10 r3:c134a92c > [ 16.386329] [] (imx_console_write) from [] > (call_console_drivers.constprop.25+0xe0/0x104) > [ 
16.386342] r10:c07b938c r9: r8:c133000c r7:0037 > r6:edcf4000 r5:c12ef6c0 > [ 16.386345] r4:c0afef38 > [ 16.386355] [] (call_console_drivers.constprop.25) from > [] (console_unlock+0x3fc/0x57c) > [ 16.386367] r10:c132fff0 r9:0100 r8:0037 r7: > r6:0005 r5:c12f4df0 > > Signed-off-by: Robin Gong > --- > drivers/tty/serial/imx.c | 20 ++-- > 1 file changed, 14 insertions(+), 6 deletions(-) I remember there was already a patch for this.
Re: [PATCH] crypto: qce: dma_map_sg can handle chained SG
On Tue, Nov 03, 2015 at 01:36:55PM +0100, LABBE Corentin wrote: > > Herbert, do you prefer that I drop the patch series and send an updated one, or > should I send a new patch series for checking the return value of > sg_nents_for_len() ? Your patch has already been applied so please send any fixes on top of it. Thanks, -- Email: Herbert Xu Home Page: http://gondor.apana.org.au/~herbert/ PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On Tue, 2015-11-03 at 20:46 -0800, Eric Dumazet wrote: > On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote: > > On 2015-11-04 00:06, Cong Wang wrote: > > > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko > > > wrote: > > >> Hi! > > >> > > >> Actually seems i was getting this panic for a while (once per week) on > > >> loaded pppoe server, but just now was able to get full panic message. > > >> After checking commit logs on sch_fq.c i didnt seen any fixes, so > > >> probably > > >> upgrading to newer kernel wont help? > > > > > > > > > Can you share your `tc qdisc show dev ` with us? And how to > > > reproduce > > > it? I tried to setup htb+fq and then flip the interface back and forth > > > but I don't > > > see any crash. > > My guess it wont be easy to reproduce, it is happening on box with 4.5k > > interfaces, that constantly create/delete interfaces, > > and even with that this problem may happen once per day, or may not > > happen for 1 week. > > > > Here is script that is being fired after new ppp interface detected. But > > pppoe process are independent from > > process that are "establishing" shapers. > > > It is probably a generic bug. sch_fq seems OK to me. > > Somehow nobody tries to change qdisc hundred times per second ;) > > Could you try following patch ? > > It seems to 'fix' the issue for me. Following patch would be more appropriate. Prior one was meant to 'show' the issue. diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c index cb5d4ad32946..7f5f3e8a10f5 100644 --- a/net/sched/sch_generic.c +++ b/net/sched/sch_generic.c @@ -706,9 +706,11 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue, spin_lock_bh(root_lock); /* Prune old scheduler */ - if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1) - qdisc_reset(oqdisc); - + if (oqdisc) { + if (atomic_read(&oqdisc->refcnt) <= 1) + qdisc_reset(oqdisc); + set_bit(__QDISC_STATE_DEACTIVATED, &oqdisc->state); + } /* ... and graft new one */ if (qdisc == NULL) qdisc = &noop_qdisc;
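The idea in the patch above can be illustrated outside the kernel: when a queue is replaced, the old one is marked as retired under its lock, and every enqueue checks that mark under the same lock and drops the packet instead of touching stale state. The sketch below is a simplified user-space model, not the qdisc code; toy_queue, toy_enqueue() and toy_retire() are made-up names, the spinlock is a bare atomic_flag, and the refcount test from the patch is left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a qdisc: a lock, a retired flag, a length. */
struct toy_queue {
	atomic_flag lock;
	bool deactivated;
	int qlen;
};

static void toy_init(struct toy_queue *q)
{
	atomic_flag_clear(&q->lock);
	q->deactivated = false;
	q->qlen = 0;
}

static void toy_lock(struct toy_queue *q)
{
	while (atomic_flag_test_and_set_explicit(&q->lock, memory_order_acquire))
		; /* spin until the flag was previously clear */
}

static void toy_unlock(struct toy_queue *q)
{
	atomic_flag_clear_explicit(&q->lock, memory_order_release);
}

/* Returns true if queued, false if dropped because the queue was retired. */
static bool toy_enqueue(struct toy_queue *q)
{
	bool ok;

	toy_lock(q);
	ok = !q->deactivated;	/* checked under the same lock as toy_retire() */
	if (ok)
		q->qlen++;
	toy_unlock(q);
	return ok;
}

/* Called when a new queue replaces this one: flush it and mark it retired. */
static void toy_retire(struct toy_queue *old)
{
	toy_lock(old);
	old->qlen = 0;
	old->deactivated = true;
	toy_unlock(old);
}
```

Because the flag is set and tested under one lock, a sender that raced with the replacement either enqueues before the flush or sees the flag and drops, which is the property the patch relies on.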
Re: [PATCH v2 1/5] spi: introduce mmap read support for spi flash devices
Hi, On 11/03/2015 04:49 PM, Michal Suchanek wrote: > On 3 November 2015 at 11:06, Vignesh R wrote: >> In addition to providing direct access to SPI bus, some spi controller >> hardwares (like ti-qspi) provide special memory mapped port >> to accesses SPI flash devices in order to increase read performance. >> This means the controller can automatically send the SPI signals >> required to read data from the SPI flash device. >> For this, spi controller needs to know flash specific information like >> read command to use, dummy bytes and address width. Once these settings >> are populated in hardware registers, any read accesses to flash's memory >> map region(SoC specific) through memcpy (or mem-to mem DMA copy) will be >> handled by controller hardware. The hardware will automatically generate >> SPI signals required to read data from flash and present it to CPU/DMA. >> >> Introduce spi_mtd_mmap_read() interface to support memory mapped read >> over SPI flash devices. SPI master drivers can implement this callback to >> support memory mapped read interfaces. m25p80 flash driver and other >> flash drivers can call this to request memory mapped read. The interface >> should only be used MTD flashes and cannot be used with other SPI devices. 
>> >> Signed-off-by: Vignesh R >> --- >> drivers/spi/spi.c | 35 +++ >> include/linux/spi/spi.h | 23 +++ >> 2 files changed, 58 insertions(+) >> >> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c >> index a5f53de813d3..5a5c7a7d47f2 100644 >> --- a/drivers/spi/spi.c >> +++ b/drivers/spi/spi.c >> @@ -1059,6 +1059,7 @@ static void __spi_pump_messages(struct spi_master >> *master, bool in_kthread) >> } >> } >> >> + mutex_lock(>mmap_lock_mutex); >> trace_spi_message_start(master->cur_msg); >> >> if (master->prepare_message) { >> @@ -1068,6 +1069,7 @@ static void __spi_pump_messages(struct spi_master >> *master, bool in_kthread) >> "failed to prepare message: %d\n", ret); >> master->cur_msg->status = ret; >> spi_finalize_current_message(master); >> + mutex_unlock(>mmap_lock_mutex); >> return; >> } >> master->cur_msg_prepared = true; >> @@ -1077,6 +1079,7 @@ static void __spi_pump_messages(struct spi_master >> *master, bool in_kthread) >> if (ret) { >> master->cur_msg->status = ret; >> spi_finalize_current_message(master); >> + mutex_unlock(>mmap_lock_mutex); >> return; >> } >> >> @@ -1084,8 +1087,10 @@ static void __spi_pump_messages(struct spi_master >> *master, bool in_kthread) >> if (ret) { >> dev_err(>dev, >> "failed to transfer one message from queue\n"); >> + mutex_unlock(>mmap_lock_mutex); >> return; >> } >> + mutex_unlock(>mmap_lock_mutex); >> } >> >> /** >> @@ -1732,6 +1737,7 @@ int spi_register_master(struct spi_master *master) >> spin_lock_init(>queue_lock); >> spin_lock_init(>bus_lock_spinlock); >> mutex_init(>bus_lock_mutex); >> + mutex_init(>mmap_lock_mutex); >> master->bus_lock_flag = 0; >> init_completion(>xfer_completion); >> if (!master->max_dma_len) >> @@ -2237,6 +2243,35 @@ int spi_async_locked(struct spi_device *spi, struct >> spi_message *message) >> EXPORT_SYMBOL_GPL(spi_async_locked); >> >> >> +int spi_mtd_mmap_read(struct spi_device *spi, loff_t from, size_t len, >> + size_t *retlen, u_char *buf, u8 read_opcode, >> + u8 addr_width, u8 
dummy_bytes) >> + >> +{ >> + struct spi_master *master = spi->master; >> + int ret; >> + >> + if (master->auto_runtime_pm) { >> + ret = pm_runtime_get_sync(master->dev.parent); >> + if (ret < 0) { >> + dev_err(>dev, "Failed to power device: %d\n", >> + ret); >> + goto err; >> + } >> + } >> + mutex_lock(>mmap_lock_mutex); >> + ret = master->spi_mtd_mmap_read(spi, from, len, retlen, buf, >> + read_opcode, addr_width, >> + dummy_bytes); >> + mutex_unlock(>mmap_lock_mutex); >> + if (master->auto_runtime_pm) >> + pm_runtime_put(master->dev.parent); >> + >> +err: >> + return ret; >> +} >> +EXPORT_SYMBOL_GPL(spi_mtd_mmap_read); >> + >> >> /*-*/ >> >> /* Utility methods for SPI master protocol drivers, layered on >> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h >> index 6b00f18f5e6b..0a6d8ad57357 100644 >> --- a/include/linux/spi/spi.h >> +++
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote: > On 2015-11-04 00:06, Cong Wang wrote: > > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko > > wrote: > >> Hi! > >> > >> Actually seems i was getting this panic for a while (once per week) on > >> loaded pppoe server, but just now was able to get full panic message. > >> After checking commit logs on sch_fq.c i didnt seen any fixes, so > >> probably > >> upgrading to newer kernel wont help? > > > > > > Can you share your `tc qdisc show dev ` with us? And how to > > reproduce > > it? I tried to setup htb+fq and then flip the interface back and forth > > but I don't > > see any crash. > My guess it wont be easy to reproduce, it is happening on box with 4.5k > interfaces, that constantly create/delete interfaces, > and even with that this problem may happen once per day, or may not > happen for 1 week. > > Here is script that is being fired after new ppp interface detected. But > pppoe process are independent from > process that are "establishing" shapers. It is probably a generic bug. sch_fq seems OK to me. Somehow nobody tries to change qdisc hundred times per second ;) Could you try following patch ? It seems to 'fix' the issue for me. diff --git a/net/core/dev.c b/net/core/dev.c index 8ce3f74cd6b9..bf136103bc7b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -2880,6 +2880,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, spin_lock(&q->busylock); spin_lock(root_lock); + if (unlikely(q != rcu_dereference_bh(txq->qdisc))) { + pr_err_ratelimited("Arg, qdisc changed ! state %lx\n", q->state); + kfree_skb(skb); + rc = NET_XMIT_DROP; + goto end; + } if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) { kfree_skb(skb); rc = NET_XMIT_DROP; @@ -2913,6 +2919,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q, __qdisc_run(q); } } +end: spin_unlock(root_lock); if (unlikely(contended)) spin_unlock(&q->busylock);
Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
On Tue, Nov 3, 2015 at 7:57 PM, Paul E. McKenney wrote: > > Thank you, and yes, it clearly states that read-to-write dependencies > are ordered. Well, I wouldn't say that it's exactly "clear". The fact that they explicitly say "Note that the DP relation does not directly impose a BEFORE (⇐) ordering between accesses u and v" makes it all clear as mud. They *then* go on to talk about how the DP relationship *together* with the odd "is source of" ordering (which in turn is defined in terms of BEFORE ordering) cannot have cycles. I have no idea why they do it that way, but the reason seems to be that they wanted to make "BEFORE" be purely about barriers and accesses, and make the other orderings be described separately. So the "BEFORE" ordering is used to define how memory must act, which is then used as a basis for that storage definition and the "is source of" thing. But none of that seems to make much sense to a *user*. The fact that they seem to equate "BEFORE" with "Processor Issue Constraints" also makes me think that the whole logic was written by a CPU designer, and part of why they document it that way is that the CPU designer literally thought of "can I issue this access" as being very different from "is there some inherent ordering that just results from issues outside of my design". I really don't know. That whole series of memory ordering rules makes my head hurt. But I do think the only thing that matters in the end is that they do have that DP relationship between reads and subsequently dependent writes, but basically not for *any* other relationship. So on alpha read-vs-read, write-vs-later-read, and write-vs-write all have to have memory barriers, unless the accesses physically overlap. Linus
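For readers following the thread subject: the read-vs-read case mentioned above is the one a "spin on a condition, then add a read barrier" construct is meant to cover, so that reads issued after the wait cannot be satisfied before the condition was observed. A rough C11 user-space analogue is sketched below. It is only a model of the concept, not the kernel macro, and ready, payload, publish() and wait_and_read() are made-up names.

```c
#include <assert.h>
#include <stdatomic.h>

static atomic_int ready;	/* hypothetical flag set by the producer */
static int payload;		/* hypothetical data guarded by the flag */

/* Producer: write the data, then publish the flag with release ordering. */
static void publish(int value)
{
	payload = value;
	atomic_store_explicit(&ready, 1, memory_order_release);
}

/*
 * Consumer: spin with relaxed loads until the flag is seen, then issue an
 * acquire fence. The fence orders the later read of payload after the
 * load that observed the flag, which a plain spin loop alone would not
 * guarantee for read-vs-read on a weakly ordered machine.
 */
static int wait_and_read(void)
{
	while (!atomic_load_explicit(&ready, memory_order_relaxed))
		; /* spin */
	atomic_thread_fence(memory_order_acquire);
	return payload;
}
```

In real use publish() and wait_and_read() would run on different threads; called in sequence on one thread they simply hand the value through.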
Re: [PATCH v3] perf: fix RCU issues with cgroup monitoring mode
Hi Stephane, [auto build test WARNING on: tip/perf/core] [also build test WARNING on: v4.3 next-20151103] url: https://github.com/0day-ci/linux/commits/Stephane-Eranian/perf-fix-RCU-issues-with-cgroup-monitoring-mode/20151104-121512 config: i386-randconfig-i0-201544 (attached as .config) reproduce: # save the attached .config to linux build tree make ARCH=i386 All warnings (new ones prefixed by >>): In file included from include/linux/trace_events.h:9:0, from include/trace/syscall.h:6, from include/linux/syscalls.h:81, from init/main.c:18: include/linux/perf_event.h: In function 'perf_cgroup_from_task': >> include/linux/perf_event.h:702:7: warning: unused variable 'safe' [-Wunused-variable] bool safe = ctx ? lockdep_is_held(&ctx->lock) : true; ^ vim +/safe +702 include/linux/perf_event.h 686 u64 timestamp; 687 }; 688 689 struct perf_cgroup { 690 struct cgroup_subsys_state css; 691 struct perf_cgroup_info __percpu *info; 692 }; 693 694 /* 695 * Must ensure cgroup is pinned (css_get) before calling 696 * this function. In other words, we cannot call this function 697 * if there is no cgroup event for the current CPU context. 698 */ 699 static inline struct perf_cgroup * 700 perf_cgroup_from_task(struct task_struct *task, struct perf_event_context *ctx) 701 { > 702 bool safe = ctx ? lockdep_is_held(&ctx->lock) : true; 703 return container_of(task_css_check(task, perf_event_cgrp_id, safe), 704 struct perf_cgroup, css); 705 } 706 #endif /* CONFIG_CGROUP_PERF */ 707 708 #ifdef CONFIG_PERF_EVENTS 709 710 extern void *perf_aux_output_begin(struct perf_output_handle *handle, --- 0-DAY kernel test infrastructure, Open Source Technology Center, Intel Corporation https://lists.01.org/pipermail/kbuild-all
Re: [PATCH v3] perf: fix RCU issues with cgroup monitoring mode
Hi Stephane, [auto build test ERROR on: tip/perf/core] [also build test ERROR on: v4.3 next-20151103] url: https://github.com/0day-ci/linux/commits/Stephane-Eranian/perf-fix-RCU-issues-with-cgroup-monitoring-mode/20151104-121512 config: parisc-allyesconfig (attached as .config) reproduce: wget https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # save the attached .config to linux build tree make.cross ARCH=parisc All errors (new ones prefixed by >>): In file included from include/linux/trace_events.h:9:0, from include/trace/syscall.h:6, from include/linux/syscalls.h:81, from kernel/events/core.c:34: include/linux/perf_event.h: In function 'perf_cgroup_from_task': >> include/linux/perf_event.h:702:2: error: implicit declaration of function 'lockdep_is_held' [-Werror=implicit-function-declaration] bool safe = ctx ? lockdep_is_held(&ctx->lock) : true; ^ include/linux/perf_event.h:702:7: warning: unused variable 'safe' [-Wunused-variable] bool safe = ctx ? lockdep_is_held(&ctx->lock) : true; ^ cc1: some warnings being treated as errors vim +/lockdep_is_held +702 include/linux/perf_event.h 696 * this function. In other words, we cannot call this function 697 * if there is no cgroup event for the current CPU context. 698 */ 699 static inline struct perf_cgroup * 700 perf_cgroup_from_task(struct task_struct *task, struct perf_event_context *ctx) 701 { > 702 bool safe = ctx ? lockdep_is_held(&ctx->lock) : true; 703 return container_of(task_css_check(task, perf_event_cgrp_id, safe), 704 struct perf_cgroup, css); 705 } --- 0-DAY kernel test infrastructure, Open Source Technology Center, Intel Corporation https://lists.01.org/pipermail/kbuild-all
Re: kernel panic in 4.2.3, rb_erase in sch_fq
On 2015-11-04 00:06, Cong Wang wrote: On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko wrote: Hi! Actually seems i was getting this panic for a while (once per week) on loaded pppoe server, but just now was able to get full panic message. After checking commit logs on sch_fq.c i didnt seen any fixes, so probably upgrading to newer kernel wont help? Can you share your `tc qdisc show dev ` with us? And how to reproduce it? I tried to setup htb+fq and then flip the interface back and forth but I don't see any crash. My guess it wont be easy to reproduce, it is happening on box with 4.5k interfaces, that constantly create/delete interfaces, and even with that this problem may happen once per day, or may not happen for 1 week. Here is script that is being fired after new ppp interface detected. But pppoe process are independent from process that are "establishing" shapers. /sbin/tc qdisc del root /sbin/tc qdisc add handle 1: root htb default 3 /sbin/tc filter add parent 1:0 protocol ip prio 4 handle 1 fw flowid 1:3 /sbin/tc filter add parent 1:0 protocol ip prio 3 u32 match ip protocol 6 0xff match ip src 10.0.252.8/32 flowid 1:3/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 1 0xff flowid 1:0 /sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 80 0x flowid 1:4 /sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 443 0x flowid 1:5 /sbin/tc filter add parent 1:0 protocol ip prio 100 u32 match u32 0 0 flowid 1:2 /sbin/tc class add classid 1:1 parent 1:0 htb rate 512Kbit ceil 512Kbit. 
/sbin/tc class add classid 1:2 parent 1:1 htb rate 32Kbit ceil 512Kbit /sbin/tc class add classid 1:3 parent 1:0 htb rate 10Mbit ceil 10Mbit /sbin/tc class add classid 1:4 parent 1:1 htb rate 32Kbit ceil 512Kbit /sbin/tc class add classid 1:5 parent 1:1 htb rate 32Kbit ceil 512Kbit /sbin/tc qdisc add parent 1:2 fq limit 300 /sbin/tc qdisc add parent 1:3 pfifo limit 300 /sbin/tc qdisc add parent 1:4 fq limit 300 /sbin/tc qdisc add parent 1:5 fq limit 300 Possible cases that come to mind (I may have missed others): (1) the script and tc are still running while the interface is being deleted (e.g. the interface disappears); (2) the script deletes the root qdisc while there is heavy traffic on the interface and a lot of packets are queued; (3) the ppp interface is destroyed while a lot of traffic is queued on it (this one is a bit rare). Thanks.
Re: [GIT PULL] locking changes for v4.4
On Tue, Nov 03, 2015 at 05:30:29PM -0800, Linus Torvalds wrote: > On Tue, Nov 3, 2015 at 3:58 PM, Linus Torvalds > wrote: > > > > I think I'll pull this, but then just make a separate commit to remove > > all the bogus games with "control" dependencies that seem to have no > > basis in reality. > > So the attached is what I committed in my tree. It took much longer to > try to write the rationale than it took to actually remove the > atomic_read_ctrl() functions, and even so I'm not sure how good that > commit message is. But at least it tries to explain what's going on. > > Note the final part of the rationale: > > I may have to eat my words at some point, but in the absence of clear > proof that alpha actually needs this, or indeed even an explanation of > how alpha could _possibly_ need it, I do not believe these functions are > called for. > > And if it turns out that alpha really _does_ need a barrier for this > case, that barrier still should not be "smp_read_barrier_depends()". > We'd have to make up some new speciality barrier just for alpha, along > with the documentation for why it really is necessary. For whatever it is worth, the patch looks good to me. The reasons I could imagine why we might want to mark control dependencies are things like documentation and tooling, but given that we currently only have a very small number of them, it is hard to argue that this is of immediate concern, if it is ever of concern. Thanx, Paul
[PATCH v3] perf: fix RCU issues with cgroup monitoring mode
This patch eliminates RCU violations detected by the RCU checker (PROVE_RCU). The impacted code paths were all related to cgroup mode monitoring and involved accessing a task's cgrp. V2 is updated to include comments from PeterZ to eliminate some of the warnings without grabbing the rcu_read lock, because we know we are already holding the ctx->lock, which prevents the cgroup from disappearing while we are accessing it. The trick, as suggested by Peter, is to modify perf_cgroup_from_task() to take an extra boolean parameter to allow bypassing the lockdep test in the task_subsys_cstate() macros. This patch uses this approach to update all calls to perf_cgroup_from_task(). In V3, we change the boolean parameter to a pointer to a perf_event_context so we can check the ctx->lock explicitly. This is more robust than passing a boolean to express that we know the lock is held. The code can change, and with it the locking assumption; checking lockdep_is_held() ensures that the proper locking is in place. Patch relative to tip.git at commit 57ef9fc.
Signed-off-by: Stephane Eranian --- arch/x86/kernel/cpu/perf_event_intel_cqm.c | 2 +- include/linux/perf_event.h | 5 +++-- kernel/events/core.c | 25 +++-- 3 files changed, 19 insertions(+), 13 deletions(-) diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c b/arch/x86/kernel/cpu/perf_event_intel_cqm.c index 377e8f8..a316ca9 100644 --- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c +++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c @@ -298,7 +298,7 @@ static bool __match_event(struct perf_event *a, struct perf_event *b) static inline struct perf_cgroup *event_to_cgroup(struct perf_event *event) { if (event->attach_state & PERF_ATTACH_TASK) - return perf_cgroup_from_task(event->hw.target); + return perf_cgroup_from_task(event->hw.target, event->ctx); return event->cgrp; } diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h index d841d33..94107e4 100644 --- a/include/linux/perf_event.h +++ b/include/linux/perf_event.h @@ -697,9 +697,10 @@ struct perf_cgroup { * if there is no cgroup event for the current CPU context. */ static inline struct perf_cgroup * -perf_cgroup_from_task(struct task_struct *task) +perf_cgroup_from_task(struct task_struct *task, struct perf_event_context *ctx) { - return container_of(task_css(task, perf_event_cgrp_id), + bool safe = ctx ? 
lockdep_is_held(>lock) : true; + return container_of(task_css_check(task, perf_event_cgrp_id, safe), struct perf_cgroup, css); } #endif /* CONFIG_CGROUP_PERF */ diff --git a/kernel/events/core.c b/kernel/events/core.c index ea02109..f611246 100644 --- a/kernel/events/core.c +++ b/kernel/events/core.c @@ -435,7 +435,7 @@ static inline void update_cgrp_time_from_event(struct perf_event *event) if (!is_cgroup_event(event)) return; - cgrp = perf_cgroup_from_task(current); + cgrp = perf_cgroup_from_task(current, event->ctx); /* * Do not update time when cgroup is not active */ @@ -458,7 +458,7 @@ perf_cgroup_set_timestamp(struct task_struct *task, if (!task || !ctx->nr_cgroups) return; - cgrp = perf_cgroup_from_task(task); + cgrp = perf_cgroup_from_task(task, ctx); info = this_cpu_ptr(cgrp->info); info->timestamp = ctx->timestamp; } @@ -489,7 +489,6 @@ static void perf_cgroup_switch(struct task_struct *task, int mode) * we reschedule only in the presence of cgroup * constrained events. */ - rcu_read_lock(); list_for_each_entry_rcu(pmu, , entry) { cpuctx = this_cpu_ptr(pmu->pmu_cpu_context); @@ -523,7 +522,7 @@ static void perf_cgroup_switch(struct task_struct *task, int mode) * event_filter_match() to not have to pass * task around */ - cpuctx->cgrp = perf_cgroup_from_task(task); + cpuctx->cgrp = perf_cgroup_from_task(task, NULL); cpu_ctx_sched_in(cpuctx, EVENT_ALL, task); } perf_pmu_enable(cpuctx->ctx.pmu); @@ -531,8 +530,6 @@ static void perf_cgroup_switch(struct task_struct *task, int mode) } } - rcu_read_unlock(); - local_irq_restore(flags); } @@ -542,17 +539,18 @@ static inline void perf_cgroup_sched_out(struct task_struct *task, struct perf_cgroup *cgrp1; struct perf_cgroup *cgrp2 = NULL; + rcu_read_lock(); /* * we come here when we know perf_cgroup_events > 0 */ - cgrp1 = perf_cgroup_from_task(task); + cgrp1 = perf_cgroup_from_task(task, NULL); /* * next is NULL when called from
[PATCH v2 net-next] net/core: ensure features get disabled on new lower devs
With moving netdev_sync_lower_features() after the .ndo_set_features calls, I neglected to verify that devices added *after* a flag had been disabled on an upper device were properly added with that flag disabled as well. This currently happens, because we exit __netdev_update_features() when we see dev->features == features for the upper dev. We can retain the optimization of leaving without calling .ndo_set_features with a bit of tweaking and a goto here. Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features down stack") CC: "David S. Miller" CC: Eric Dumazet CC: Jay Vosburgh CC: Veaceslav Falico CC: Andy Gospodarek CC: Jiri Pirko CC: Nikolay Aleksandrov CC: Michal Kubecek CC: Alexander Duyck CC: net...@vger.kernel.org Reported-by: Nikolay Aleksandrov Signed-off-by: Jarod Wilson --- v2: Based on suggestions from Alex, and with not changing err to ret, this patch actually becomes quite minimal and doesn't ugly up the code much. net/core/dev.c | 7 --- 1 file changed, 4 insertions(+), 3 deletions(-) diff --git a/net/core/dev.c b/net/core/dev.c index 8ce3f74..ab9b8d0 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -6402,7 +6402,7 @@ int __netdev_update_features(struct net_device *dev) struct net_device *upper, *lower; netdev_features_t features; struct list_head *iter; - int err = 0; + int err = -1; ASSERT_RTNL(); @@ -6419,7 +6419,7 @@ int __netdev_update_features(struct net_device *dev) features = netdev_sync_upper_features(dev, upper, features); if (dev->features == features) - return 0; + goto sync_lower; netdev_dbg(dev, "Features changed: %pNF -> %pNF\n", >features, ); @@ -6434,6 +6434,7 @@ int __netdev_update_features(struct net_device *dev) return -1; } +sync_lower: /* some features must be disabled on lower devices when disabled * on an upper device (think: bonding master or bridge) */ @@ -6443,7 +6444,7 @@ int __netdev_update_features(struct net_device *dev) if (!err) dev->features = features; - return 1; + return err < 0 ? 
0 : 1; } /** -- 1.8.3.1
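The control flow after this patch can be modelled in userspace. This is only a sketch: `struct model_dev`, `model_set_features()` and `model_update_features()` are hypothetical stand-ins, the lower-device sync is reduced to a bit mask, and only the `err = -1` initial value, the `goto sync_lower` and the final `return err < 0 ? 0 : 1` are taken from the diff above.

```c
#include <stdbool.h>

/* Hypothetical stand-in for struct net_device; not a kernel type. */
struct model_dev {
	unsigned int features;       /* feature bits currently active */
	unsigned int wanted;         /* feature bits computed for this update */
	unsigned int lower_features; /* feature bits of a single lower device */
	bool set_features_fails;     /* makes the fake driver callback fail */
};

/* Stand-in for the .ndo_set_features callback. */
static int model_set_features(struct model_dev *dev, unsigned int features)
{
	(void)features;
	return dev->set_features_fails ? -1 : 0;
}

/* Returns 1 if features changed, 0 if nothing changed, -1 on error. */
int model_update_features(struct model_dev *dev)
{
	unsigned int features = dev->wanted;
	int err = -1; /* -1 here means "driver callback was not invoked" */

	/* Nothing to change on this device, but lower devices still need a sync. */
	if (dev->features == features)
		goto sync_lower;

	err = model_set_features(dev, features);
	if (err < 0)
		return -1;

sync_lower:
	/* Simplification: anything disabled on the upper device is masked below. */
	dev->lower_features &= features;

	if (!err)
		dev->features = features;

	return err < 0 ? 0 : 1;
}
```

In this model a lower device that still carries a disabled bit gets cleaned up even when the upper device reports no change, which is the case the commit message describes.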
[PATCH] mailbox: mailbox-test: avoid reading iomem twice
From: Jassi Brar Don't pass mmio region as source to print_hex_dump() and then again to memcpy_fromio(). Do it once and give print_hex_dump() the buffer we just read the data in. Signed-off-by: Jassi Brar --- drivers/mailbox/mailbox-test.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/drivers/mailbox/mailbox-test.c b/drivers/mailbox/mailbox-test.c index f82dc89..684ae17 100644 --- a/drivers/mailbox/mailbox-test.c +++ b/drivers/mailbox/mailbox-test.c @@ -221,11 +221,10 @@ static void mbox_test_receive_message(struct mbox_client *client, void *message) spin_lock_irqsave(&tdev->lock, flags); if (tdev->mmio) { + memcpy_fromio(tdev->rx_buffer, tdev->mmio, MBOX_MAX_MSG_LEN); print_hex_dump(KERN_INFO, "Client: Received [MMIO]: ", DUMP_PREFIX_ADDRESS, MBOX_BYTES_PER_LINE, 1, - __io_virt(tdev->mmio), MBOX_MAX_MSG_LEN, true); - memcpy_fromio(tdev->rx_buffer, tdev->mmio, MBOX_MAX_MSG_LEN); - + tdev->rx_buffer, MBOX_MAX_MSG_LEN, true); } else if (message) { print_hex_dump(KERN_INFO, "Client: Received [API]: ", DUMP_PREFIX_ADDRESS, MBOX_BYTES_PER_LINE, 1, -- 1.9.1
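The "read the device once, then work on the copy" ordering can be illustrated outside the kernel. The sketch below is a userspace model under stated assumptions: `model_copy_from_device()` stands in for memcpy_fromio(), `model_hex_dump()` for print_hex_dump(), and `MODEL_MSG_LEN` for MBOX_MAX_MSG_LEN; none of these names exist in the driver.

```c
#include <stddef.h>
#include <stdio.h>

#define MODEL_MSG_LEN 8 /* hypothetical stand-in for MBOX_MAX_MSG_LEN */

static unsigned int device_reads; /* how often the fake device window was read */

/* Stand-in for memcpy_fromio(): the only code allowed to touch "mmio". */
static void model_copy_from_device(unsigned char *dst,
				   const volatile unsigned char *mmio,
				   size_t len)
{
	for (size_t i = 0; i < len; i++)
		dst[i] = mmio[i];
	device_reads++;
}

/* Stand-in for print_hex_dump(): operates on ordinary memory only. */
static void model_hex_dump(char *out, size_t out_len,
			   const unsigned char *buf, size_t len)
{
	size_t pos = 0;

	if (out_len)
		out[0] = '\0';
	/* Each byte needs three characters plus room for the terminator. */
	for (size_t i = 0; i < len && pos + 3 < out_len; i++)
		pos += (size_t)snprintf(out + pos, out_len - pos, "%02x ", buf[i]);
}

/* Copy first, then dump the copy, as in the patched receive path. */
void model_receive(const volatile unsigned char *mmio, unsigned char *rx_buffer,
		   char *dump, size_t dump_len)
{
	model_copy_from_device(rx_buffer, mmio, MODEL_MSG_LEN);
	model_hex_dump(dump, dump_len, rx_buffer, MODEL_MSG_LEN);
}
```

Besides avoiding the second device access, the dump routine in this arrangement never sees an I/O pointer, so it needs no `__io_virt()`-style conversion.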
Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()
On Tue, Nov 03, 2015 at 11:40:24AM -0800, Linus Torvalds wrote: > On Mon, Nov 2, 2015 at 5:57 PM, Paul E. McKenney > wrote: [ . . . ] > > I am in India and my Alpha Architecture Manual is in the USA. > > I sent you a link to something that should work, and that has the section. Thank you, and yes, it clearly states that read-to-write dependencies are ordered. Color me slow and stupid. > > And they did post a clarification on the web: > > So for alpha, you trust a random web posting by a unknown person that > talks about some random problem in an application that we don't even > know what it is. In my defense, the Alpha architects pointed me at that web posting, but yes, it appears to be pushing for overly conservative safety rather than accuracy. Please accept my apologies for my confusion. Thanx, Paul
Re: [PATCH V4] usb: remove unnecessary CONFIG_PM dependency from USB_OTG
Hi, Peter Chen writes: > On Tue, Nov 03, 2015 at 07:56:55AM -0600, Felipe Balbi wrote: >> >> Hi, >> >> Nathan Sullivan writes: >> > The USB OTG support currently depends on power management >> > (CONFIG_PM) being enabled, but does not actually need it enabled. >> > Remove this dependency. >> > >> > Tested on Bay Trail hardware with dwc3 USB. >> > >> > Signed-off-by: Nathan Sullivan >> > --- >> > drivers/usb/core/Kconfig |1 - >> > 1 file changed, 1 deletion(-) >> > >> > diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig >> > index a99c89e..9c5cdf3 100644 >> > --- a/drivers/usb/core/Kconfig >> > +++ b/drivers/usb/core/Kconfig >> > @@ -43,7 +43,6 @@ config USB_DYNAMIC_MINORS >> > >> > config USB_OTG >> >bool "OTG support" >> > - depends on PM >> >> I don't think this is correct. OTG depends on USB bus suspend, which is >> only available on PM builds. Care to further detail why you think PM is >> not needed on OTG ? >> > > OTG depends on USB bus suspend is not a must, the hardware controlled OTG > design do HNP when the bus goes to suspend; but if the software > implements OTG FSM, it is the user option whether do HNP, and bus > suspend is controlled by OTG FSM software (stop SOF), but not by host > stack (eg, ehci). > > I am sorry I did not consider the legacy OTG design, this patch should > be dropped. there is no "legacy" OTG design. OTG requires a bus suspend to enter HNP, and that's achieved by stopping all transfers and avoid new URB submission so usbcore can put the bus in suspend (by means of USB autosuspend). If you're bypassing that in the OTG FSM thing, that needs to be fixed ASAP as that makes it a lot harder for any generic changes in usbcore to be validated. Specially when you consider not many will have whatever special HW which, likely, doesn't even work with mainline to validate a change. Please, make sure to fix that design so that HNP *always* goes through the proper code path. 
If you have devices which would prevent HNP because their class driver (host side driver) would never autosuspend, fix that as well. cheers -- balbi
Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
On Nov 3, 2015 5:30 PM, "Minchan Kim" wrote: > > Linux doesn't have an ability to free pages lazy while other OS already > have been supported that named by madvise(MADV_FREE). > > The gain is clear that kernel can discard freed pages rather than swapping > out or OOM if memory pressure happens. > > Without memory pressure, freed pages would be reused by userspace without > another additional overhead(ex, page fault + allocation + zeroing). > [...] > > How it works: > > When madvise syscall is called, VM clears dirty bit of ptes of the range. > If memory pressure happens, VM checks dirty bit of page table and if it > found still "clean", it means it's a "lazyfree pages" so VM could discard > the page instead of swapping out. Once there was store operation for the > page before VM peek a page to reclaim, dirty bit is set so VM can swap out > the page instead of discarding. What happens if you MADV_FREE something that's MAP_SHARED or isn't ordinary anonymous memory? There's a long history of MADV_DONTNEED on such mappings causing exploitable problems, and I think it would be nice if MADV_FREE were obviously safe. Does this set the write protect bit? What happens on architectures without hardware dirty tracking? For that matter, even on architecture with hardware dirty tracking, what happens in multithreaded processes that have the dirty TLB state cached in a different CPU's TLB? Using the dirty bit for these semantics scares me. This API creates a page that can have visible nonzero contents and then can asynchronously and magically zero itself thereafter. That makes me nervous. Could we use the accessed bit instead? Then the observable semantics would be equivalent to having MADV_FREE either zero the page or do nothing, except that it doesn't make up its mind until the next read. > + ptent = pte_mkold(ptent); > + ptent = pte_mkclean(ptent); > + set_pte_at(mm, addr, pte, ptent); > + tlb_remove_tlb_entry(tlb, pte, addr); It looks like you are flushing the TLB. 
In a multithreaded program, that's rather expensive. Potentially silly question: would it be better to just zero the page immediately in a multithreaded program and then, when swapping out, check the page is zeroed and, if so, skip swapping it out? That could be done without forcing an IPI. > +static int madvise_free_single_vma(struct vm_area_struct *vma, > + unsigned long start_addr, unsigned long end_addr) > +{ > + unsigned long start, end; > + struct mm_struct *mm = vma->vm_mm; > + struct mmu_gather tlb; > + > + if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP)) > + return -EINVAL; > + > + /* MADV_FREE works for only anon vma at the moment */ > + if (!vma_is_anonymous(vma)) > + return -EINVAL; Does anything weird happen if it's shared? > + if (!PageDirty(page) && (flags & TTU_FREE)) { > + /* It's a freeable page by MADV_FREE */ > + dec_mm_counter(mm, MM_ANONPAGES); > + goto discard; > + } Does something clear TTU_FREE the next time the page gets marked clean? --Andy
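The observable semantics under discussion can be probed from userspace. The sketch below assumes the flag is exposed as `MADV_FREE` through `<sys/mman.h>` and that the kernel accepts it on private anonymous memory, as the quoted patch describes; `madv_free_roundtrip()` is a hypothetical helper. It relies on two properties only: after the call a read returns either the old bytes or zeroes, and a store after the call makes the new contents stick.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE /* for MAP_ANONYMOUS and MADV_FREE on glibc */
#endif
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Returns 0 if the round trip behaved as described, 1 if MADV_FREE is not
 * available (old headers or kernel), and -1 on an unexpected result.
 */
int madv_free_roundtrip(void)
{
#ifndef MADV_FREE
	return 1;
#else
	long page = sysconf(_SC_PAGESIZE);
	unsigned char *p;
	unsigned char seen;
	size_t len;
	int ret = -1;

	if (page <= 0)
		return -1;
	len = (size_t)page;

	/* Private anonymous memory: the only kind the quoted patch accepts. */
	p = mmap(NULL, len, PROT_READ | PROT_WRITE,
		 MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (p == MAP_FAILED)
		return -1;

	memset(p, 0xAA, len);

	if (madvise(p, len, MADV_FREE) != 0) {
		ret = (errno == EINVAL || errno == ENOSYS) ? 1 : -1;
		goto out;
	}

	/* The page may still hold the old data or may already read as zero. */
	seen = p[0];
	if (seen != 0xAA && seen != 0x00)
		goto out;

	/* A store dirties the page again, so these contents must survive. */
	memset(p, 0x55, len);
	ret = (p[0] == 0x55 && p[len - 1] == 0x55) ? 0 : -1;
out:
	munmap(p, len);
	return ret;
#endif
}
```

The window between the `madvise()` call and the next store is where the "asynchronously zeroes itself" behaviour raised above would be visible to another thread reading the same page.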
Re: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"
On Wed, Nov 4, 2015 at 12:58 AM, Robin Gong wrote: > commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare"). > Otherwise below warning happen since there are some printk logs in > interrupt. This has already been reverted since v4.3-rc5.
RE: [PATCH] ARM: add v7 LPAE multi-platform defconfig
> On Tue, Oct 27, 2015 at 10:35:07PM +0800, Alison Wang wrote: > > v7 LPAE multi-platform defconfig is based on v7 multi-platform > > defconfig and adds LPAE support. > > > > This defconfig is verified on LS1021A which enables GIANFAR, I2C, > > WATCHDOG, AUDIO, EDMA and DSPI drivers, etc. > > > > Signed-off-by: Alison Wang > > I think this would be great if it also had: > > CONFIG_ARCH_VIRT=y > CONFIG_VIRTUALIZATION=y > CONFIG_KVM=y > > That would allow it to be used for KVM testing both as host and guest. > > FWIW, I just tried booting this on my TC2, but I don't get any output > from there. Is it supposed to work on this platform? If not, why not? > [Alison Wang] One question, does TC2 support LPAE? If so, we could try to find the reason and add it in this defconfig for armv7 LPAE support. Best Regards, Alison Wang -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
Re: [PATCH 2/2] arm64: Allow changing of attributes outside of modules
On 2015/11/4 5:48, Laura Abbott wrote: > > Currently, the set_memory_* functions that are implemented for arm64 > are restricted to module addresses only. This was mostly done > because arm64 maps normal zone memory with larger page sizes to > improve TLB performance. This has the side effect though of making it > difficult to adjust attributes at the PAGE_SIZE granularity. There are > an increasing number of use cases related to security where it is > necessary to change the attributes of kernel memory. Add functionality > to the page attribute changing code under a Kconfig to let systems > designers decide if they want to make the trade off of security for TLB > pressure. > > Signed-off-by: Laura Abbott > --- > arch/arm64/Kconfig.debug | 11 +++ > arch/arm64/mm/mm.h | 3 ++ > arch/arm64/mm/mmu.c | 2 +- > arch/arm64/mm/pageattr.c | 74 > > 4 files changed, 84 insertions(+), 6 deletions(-) > > diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug > index d6285ef..abc6922 100644 > --- a/arch/arm64/Kconfig.debug > +++ b/arch/arm64/Kconfig.debug > @@ -89,6 +89,17 @@ config DEBUG_ALIGN_RODATA > > If in doubt, say N > > +config DEBUG_CHANGE_PAGEATTR > + bool "Allow all kernel memory to have attributes changed" > + help > + If this option is selected, APIs that change page attributes > + (RW <-> RO, X <-> NX) will be valid for all memory mapped in > + the kernel space. The trade off is that there may be increased > + TLB pressure from finer grained page mapping. 
Turn on this option > + if performance is more important than security > + > + If in doubt, say N > + > source "drivers/hwtracing/coresight/Kconfig" > > endmenu > diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h > index ef47d99..7b0dcc4 100644 > --- a/arch/arm64/mm/mm.h > +++ b/arch/arm64/mm/mm.h > @@ -1,3 +1,6 @@ > extern void __init bootmem_init(void); > > void fixup_init(void); > + > +void split_pud(pud_t *old_pud, pmd_t *pmd); > +void split_pmd(pmd_t *pmd, pte_t *pte); > diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c > index ff41efa..cefad2d 100644 > --- a/arch/arm64/mm/mmu.c > +++ b/arch/arm64/mm/mmu.c > @@ -72,7 +72,7 @@ static void __init *early_alloc(unsigned long sz) > /* > * remap a PMD into pages > */ > -static void split_pmd(pmd_t *pmd, pte_t *pte) > +void split_pmd(pmd_t *pmd, pte_t *pte) > { > unsigned long pfn = pmd_pfn(*pmd); > unsigned long addr = pfn << PAGE_SHIFT; > diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c > index e47ed1c..48a4ce9 100644 > --- a/arch/arm64/mm/pageattr.c > +++ b/arch/arm64/mm/pageattr.c > @@ -15,9 +15,12 @@ > #include > #include > > +#include > #include > #include > > +#include "mm.h" > + > struct page_change_data { > pgprot_t set_mask; > pgprot_t clear_mask; > @@ -36,6 +39,66 @@ static int change_page_range(pte_t *ptep, pgtable_t token, > unsigned long addr, > return 0; > } > > +#ifdef CONFIG_DEBUG_CHANGE_PAGEATTR > +static int check_address(unsigned long addr) > +{ > + pgd_t *pgd = pgd_offset_k(addr); > + pud_t *pud; > + pmd_t *pmd; > + pte_t *pte; > + int ret = -EFAULT; > + > + if (pgd_none(*pgd)) > + goto out; > + > + pud = pud_offset(pgd, addr); > + if (pud_none(*pud)) > + goto out; > + > + if (pud_sect(*pud)) { > + pmd = pmd_alloc_one(_mm, addr); > + if (!pmd) { > + ret = -ENOMEM; > + goto out; > + } > + split_pud(pud, pmd); > + pud_populate(_mm, pud, pmd); > + } > + > + pmd = pmd_offset(pud, addr); > + if (pmd_none(*pmd)) > + goto out; > + > + if (pmd_sect(*pmd)) { > + pte = 
pte_alloc_one_kernel(_mm, addr); > + if (!pte) { > + ret = -ENOMEM; > + goto out; > + } > + split_pmd(pmd, pte); > + __pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE); > + } > + > + pte = pte_offset_kernel(pmd, addr); > + if (pte_none(*pte)) > + goto out; > + > + flush_tlb_all(); > + ret = 0; > + > +out: > + return ret; > +} > +#else > +static int check_address(unsigned long addr) > +{ > + if (addr < MODULES_VADDR || addr >= MODULES_END) > + return -EINVAL; > + > + return 0; > +} > +#endif > + > static int change_memory_common(unsigned long addr, int numpages, > pgprot_t set_mask, pgprot_t clear_mask) > { > @@ -45,17 +108,18 @@ static int change_memory_common(unsigned long addr, int > numpages, > int ret; > struct page_change_data data; > > + if (addr < PAGE_OFFSET && !is_vmalloc_addr((void *)addr)) > + return -EINVAL; > + > if (!IS_ALIGNED(addr, PAGE_SIZE)) { > start &= PAGE_MASK; >
Re: [RFC] iommu: arm-smmu: correct reference count
Hi Will, On Tue, Nov 03, 2015 at 01:17:34PM +, Will Deacon wrote: >On Tue, Nov 03, 2015 at 08:59:17PM +0800, Peng Fan wrote: >> iommu_group_alloc will initialize the reference count for group to 1. >> iommu_group_add_device also increase the group reference count, >> if nothing bad happends. And we need to add iommu_group_put to >> decrease the reference count for group. >> >> Signed-off-by: Peng Fan >> Cc: Will Deacon >> Cc: Joerg Roedel >> --- >> >> Not sure whether my understanding is correct or not. I checked >> rockchip-iommu.c >> exynos-iommu.c and fsl_pamu_domain.c, and they all have iommu_group_put after >> iommu_group_add_device. > >Doesn't this pair up with the iommu_group_remove_device in >arm_smmu_remove_device? Are you actually seeing an issue in practice? In arm_smmu_add_platform_device: iommu_group_alloc --> the group->device_kobj ref count is initialized to 1. iommu_group_add_device --> the group->device_kobj ref count is incremented from 1 to 2. In arm_smmu_remove_device: iommu_group_remove_device --> the group->device_kobj ref count is decreased by 1. After arm_smmu_remove_device, the ref count of group->device_kobj is therefore still not 0. So I think we need to add iommu_group_put after iommu_group_add_device. If I am wrong, please correct me. This is from code inspection only; I do not have a platform to test it. Regards, Peng. > >Will
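The counting argument above can be written down as a small userspace model. This is a sketch of the reasoning in the message, not of the real IOMMU core: `struct model_group` and the `model_group_*()` helpers are hypothetical, and each one only applies the increment or decrement that the message attributes to its kernel counterpart.

```c
/* Hypothetical model of the group reference count described above. */
struct model_group {
	int refcount; /* stands in for the group->device_kobj ref count */
	int ndevices; /* devices currently attached to the group */
};

/* Allocation hands the caller one reference. */
void model_group_alloc(struct model_group *g)
{
	g->refcount = 1;
	g->ndevices = 0;
}

/* Adding a device takes a reference on behalf of that device. */
void model_group_add_device(struct model_group *g)
{
	g->refcount++;
	g->ndevices++;
}

/* Removing a device drops the reference taken when it was added. */
void model_group_remove_device(struct model_group *g)
{
	g->ndevices--;
	g->refcount--;
}

/* Drops the caller's own reference (the one returned by alloc). */
void model_group_put(struct model_group *g)
{
	g->refcount--;
}
```

In this model, alloc + add + remove leaves the count at 1, while alloc + add + put + remove brings it to 0, which is the imbalance the proposed iommu_group_put is meant to address.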
Re: [PATCH V4] usb: remove unnecessary CONFIG_PM dependency from USB_OTG
On Tue, Nov 03, 2015 at 07:56:55AM -0600, Felipe Balbi wrote: > > Hi, > > Nathan Sullivan writes: > > The USB OTG support currently depends on power management > > (CONFIG_PM) being enabled, but does not actually need it enabled. > > Remove this dependency. > > > > Tested on Bay Trail hardware with dwc3 USB. > > > > Signed-off-by: Nathan Sullivan > > --- > > drivers/usb/core/Kconfig |1 - > > 1 file changed, 1 deletion(-) > > > > diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig > > index a99c89e..9c5cdf3 100644 > > --- a/drivers/usb/core/Kconfig > > +++ b/drivers/usb/core/Kconfig > > @@ -43,7 +43,6 @@ config USB_DYNAMIC_MINORS > > > > config USB_OTG > > bool "OTG support" > > - depends on PM > > I don't think this is correct. OTG depends on USB bus suspend, which is > only available on PM builds. Care to further detail why you think PM is > not needed on OTG ? > OTG depends on USB bus suspend is not a must, the hardware controlled OTG design do HNP when the bus goes to suspend; but if the software implements OTG FSM, it is the user option whether do HNP, and bus suspend is controlled by OTG FSM software (stop SOF), but not by host stack (eg, ehci). I am sorry I did not consider the legacy OTG design, this patch should be dropped. -- Best Regards, Peter Chen
Re: [PATCH v13 02/51] vfs: Add MAY_CREATE_FILE and MAY_CREATE_DIR permission flags
On Wed, Nov 4, 2015 at 3:33 AM, Andreas Dilger wrote: > On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher wrote: >> @@ -3667,7 +3674,7 @@ EXPORT_SYMBOL(dentry_unhash); >> >> int vfs_rmdir(struct inode *dir, struct dentry *dentry) >> { >> - int error = may_delete(dir, dentry, 1); >> + int error = may_delete(dir, dentry, true, false); > > This is a prime example why passing "true" and "false" as function arguments > is not very useful, and especially prone to bugs when there are two of them. > > That said, this is code originally from Al, so he may have a different > opinion. Have you checked how vfs_rename uses the is_dir and new_is_dir variables? Using file modes there probably won't help readability. An enum maybe? Thanks, Andreas
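As a sketch of what "an enum maybe" could look like at a call site, the following userspace fragment contrasts the two styles. Everything in it is hypothetical: the enum names, the `model_may_delete*()` helpers, and in particular the meaning given to the second boolean (treated here as a "replace" flag) are assumptions for illustration, not the signature from the patch.

```c
#include <stdbool.h>

/* Hypothetical names, chosen only for this illustration. */
enum victim_kind { VICTIM_FILE, VICTIM_DIR };
enum delete_mode { DELETE_REMOVE, DELETE_REPLACE };

/* Bool style: the call site reads as (true, false) with no hint of meaning. */
static int model_may_delete_bools(bool isdir, bool replace)
{
	return (isdir ? 1 : 0) | (replace ? 2 : 0);
}

/* Enum style: same decision, but each argument names itself at the call site. */
static int model_may_delete(enum victim_kind kind, enum delete_mode mode)
{
	return model_may_delete_bools(kind == VICTIM_DIR, mode == DELETE_REPLACE);
}
```

With distinct enum types, swapping the two arguments by mistake also becomes something a compiler or static checker can flag, which two adjacent `bool` parameters do not allow.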
[PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"
commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare"). Otherwise below warning happen since there are some printk logs in interrupt. [ 14.868319] udevd[501]: starting version 182 [ 16.386107] random: nonblocking pool is initialized [ 16.386123] [ cut here ] [ 16.386140] WARNING: CPU: 0 PID: 501 at kernel/locking/mutex.c:868 mutex_trylock+0x210/0x230() [ 16.386146] DEBUG_LOCKS_WARN_ON(in_interrupt()) [ 16.386149] Modules linked in: [ 16.386157] CPU: 0 PID: 501 Comm: udevd Not tainted 4.3.0-rc1-00014-gf843df8 #28 [ 16.386160] Hardware name: Freescale i.MX6 SoloX (Device Tree) [ 16.386165] Backtrace: [ 16.386182] [] (dump_backtrace) from [] (show_stack+0x18/0x1c) [ 16.386192] r6:c0af1840 r5: r4: r3: [ 16.386201] [] (show_stack) from [] (dump_stack+0x8c/0xa4) [ 16.386216] [] (dump_stack) from [] (warn_slowpath_common+0x80/0xbc) [ 16.386225] r6:c07b1ccc r5:0009 r4:edcf5c60 r3:0001 [ 16.386234] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x38/0x40) [ 16.386246] r8:c0564694 r7:eeb76880 r6:c13323ec r5:0001 r4:c0976990 [ 16.386255] [] (warn_slowpath_fmt) from [] (mutex_trylock+0x210/0x230) [ 16.386261] r3:c0978f50 r2:c0976990 [ 16.386264] r4:c0b29a70 [ 16.386278] [] (mutex_trylock) from [] (clk_prepare_lock+0x14/0xf4) [ 16.386289] r8:c133000c r7:eeb76880 r6:0037 r5:c12efbc8 r4:eeb76880 [ 16.386297] [] (clk_prepare_lock) from [] (clk_prepare+0x18/0x38) [ 16.386303] r5:c12efbc8 r4:eeb76880 [ 16.386311] [] (clk_prepare) from [] (imx_console_write+0x34/0x248) [ 16.386317] r4:ee999c10 r3:c134a92c [ 16.386329] [] (imx_console_write) from [] (call_console_drivers.constprop.25+0xe0/0x104) [ 16.386342] r10:c07b938c r9: r8:c133000c r7:0037 r6:edcf4000 r5:c12ef6c0 [ 16.386345] r4:c0afef38 [ 16.386355] [] (call_console_drivers.constprop.25) from [] (console_unlock+0x3fc/0x57c) [ 16.386367] r10:c132fff0 r9:0100 r8:0037 r7: r6:0005 r5:c12f4df0 Signed-off-by: Robin Gong --- drivers/tty/serial/imx.c | 20 ++-- 1 file changed, 14 insertions(+), 6 deletions(-) diff --git 
a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c index fe3d41c..d0388a0 100644 --- a/drivers/tty/serial/imx.c +++ b/drivers/tty/serial/imx.c @@ -1631,12 +1631,12 @@ imx_console_write(struct console *co, const char *s, unsigned int count) int locked = 1; int retval; - retval = clk_prepare_enable(sport->clk_per); + retval = clk_enable(sport->clk_per); if (retval) return; - retval = clk_prepare_enable(sport->clk_ipg); + retval = clk_enable(sport->clk_ipg); if (retval) { - clk_disable_unprepare(sport->clk_per); + clk_disable(sport->clk_per); return; } @@ -1675,8 +1675,8 @@ imx_console_write(struct console *co, const char *s, unsigned int count) if (locked) spin_unlock_irqrestore(&sport->port.lock, flags); - clk_disable_unprepare(sport->clk_ipg); - clk_disable_unprepare(sport->clk_per); + clk_disable(sport->clk_ipg); + clk_disable(sport->clk_per); } /* @@ -1777,7 +1777,15 @@ imx_console_setup(struct console *co, char *options) retval = uart_set_options(&sport->port, co, baud, parity, bits, flow); - clk_disable_unprepare(sport->clk_ipg); + clk_disable(sport->clk_ipg); + if (retval) { + clk_unprepare(sport->clk_ipg); + goto error_console; + } + + retval = clk_prepare(sport->clk_per); + if (retval) + clk_disable_unprepare(sport->clk_ipg); error_console: return retval; -- 1.9.1
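The backtrace shows clk_prepare() reaching mutex_trylock() from imx_console_write() while in interrupt context. A userspace model of the split this revert restores (prepare once in setup, only enable/disable in the write path) might look as follows. All `model_*` names, the single-clock simplification, and the plain -1 error value are assumptions made for the sketch.

```c
#include <stdbool.h>

/* Hypothetical clock model; counts mirror the prepare/enable nesting. */
struct model_clk {
	int prepare_count;
	int enable_count;
};

static bool in_atomic_context;       /* set to mimic printk from an interrupt */
static int sleeping_calls_in_atomic; /* times a may-sleep call ran in atomic context */

/* The prepare step may sleep (it takes a mutex in the trace above). */
static int model_clk_prepare(struct model_clk *c)
{
	if (in_atomic_context)
		sleeping_calls_in_atomic++;
	c->prepare_count++;
	return 0;
}

/* The enable step is safe in atomic context but needs a prepared clock. */
static int model_clk_enable(struct model_clk *c)
{
	if (c->prepare_count == 0)
		return -1;
	c->enable_count++;
	return 0;
}

static void model_clk_disable(struct model_clk *c)
{
	c->enable_count--;
}

/* Setup runs in process context: prepare once and leave the clock prepared. */
int model_console_setup(struct model_clk *c)
{
	return model_clk_prepare(c);
}

/* Write path after the revert: only the atomic-safe half is used. */
int model_console_write(struct model_clk *c)
{
	if (model_clk_enable(c))
		return -1;
	/* ... characters would be pushed to the UART here ... */
	model_clk_disable(c);
	return 0;
}

/* Write path before the revert: prepare+enable on every call. */
int model_console_write_with_prepare(struct model_clk *c)
{
	if (model_clk_prepare(c))
		return -1;
	if (model_clk_enable(c))
		return -1;
	model_clk_disable(c);
	c->prepare_count--; /* unprepare */
	return 0;
}
```

Calling `model_console_write_with_prepare()` with `in_atomic_context` set is the situation the warning above reports; `model_console_write()` never reaches the may-sleep step.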
Re: linux-next: build failure after merge of the mailbox tree
On 1 November 2015 at 17:42, Stephen Rothwell wrote: > Hi Jassi, > > After merging the mailbox tree, today's linux-next build (x86_64 > allmodconfig) failed like this: > > drivers/mailbox/mailbox-test.c: In function 'mbox_test_receive_message': > drivers/mailbox/mailbox-test.c:226:11: error: implicit declaration of > function '__io_virt' [-Werror=implicit-function-declaration] >__io_virt(tdev->mmio), MBOX_MAX_MSG_LEN, true); >^ > > Caused by commit > > a133f8b65d59 ("mailbox: mailbox-test: Correctly repair Sparse warnings") > > I have used the mailbox tree from next-20151022 for today. > Lee, would you please send a fix for your last fix please? Thanks.
Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map
Hi Namhyung, On Wed, 2015-11-04 at 11:26 +0900, Namhyung Kim wrote: > Hi Tom, > > On Tue, Nov 03, 2015 at 07:47:52PM -0600, Tom Zanussi wrote: > > Hi Namhyung, > > > > On Mon, 2015-11-02 at 16:08 +0900, Namhyung Kim wrote: > > > I thought it'd be better if users can see which one is the real drop > > > or not. IOW if drop count is much smaller than the normal event > > > count, [s]he might want to ignore the occasional drops. Otherwise, > > > [s]he should restart with a bigger table. This requires accurate > > > counts of events and drops though. > > > > > > > OK, how about the below - it basically moves the drops set/test/inc into > > tracing_map_insert(), as well as a total hits count. So those values > > will be available for users to use in deciding whether to use the data > > or restart with a bigger table, and the loop is bailed out of only if no > > matching keys are found and there are drops, so callers can continue > > updating existing entries. > > But if a key didn't get a desired index, it'd still fail to update.. > > > > > > Users who want the original behavior still get the NULL return and can > > stop calling tracing_map_insert() as before: > > > > struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void > > *key) > > { > > u32 idx, key_hash, test_key; > > struct tracing_map_entry *entry; > > > > key_hash = jhash(key, map->key_size, 0); > > if (key_hash == 0) > > key_hash = 1; > > idx = key_hash >> (32 - (map->map_bits + 1)); > > > > while (1) { > > idx &= (map->map_size - 1); > > entry = TRACING_MAP_ENTRY(map->map, idx); > > test_key = entry->key; > > > > if (test_key && test_key == key_hash && entry->val && > > keys_match(key, entry->val->key, map->key_size)) { > > atomic64_inc(&map->hits); > > return entry->val; > > } > > > > if (atomic64_read(&map->drops)) { > > atomic64_inc(&map->drops); > > break; > > } > > IMHO it should be removed.
> > > > > if (!test_key && !cmpxchg(>key, 0, key_hash)) { > > struct tracing_map_elt *elt; > > > > elt = get_free_elt(map); > > if (!elt) { > > atomic64_inc(>drops); > > And reset entry->key here.. > > > break; > > } > > memcpy(elt->key, key, map->key_size); > > entry->val = elt; > > > > atomic64_inc(>hits); > > return entry->val; > > } > > idx++; > > } > > > > return NULL; > > } > > > Then tracing_map_lookup() can be implemented like *_insert but bail out > from the loop if test_key is 0. The caller might do like this: > > if (!atomic64_read(>drops)) > elt = tracing_map_insert(...); > else > elt = tracing_map_lookup(...); > Yeah, I think that should work - let me try that in the next version... Thanks, Tom > Thanks, > Namhyung -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
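The insert/lookup split agreed on above can be tried out in userspace with C11 atomics. The following is a sketch under several assumptions: keys are plain 32-bit integers, `model_hash()` replaces jhash(), the table sizes are fixed, all `model_*` names are hypothetical, and counting a failed lookup as a drop in `model_update()` is a choice made here rather than something stated in the thread. It is exercised single-threaded only.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_MAP_BITS 3
#define MODEL_MAX_ELTS (1u << MODEL_MAP_BITS)
#define MODEL_MAP_SIZE (1u << (MODEL_MAP_BITS + 1)) /* twice the element count */

struct model_elt {
	uint32_t key;
	uint64_t count;
};

struct model_entry {
	_Atomic uint32_t key; /* 0 means the slot is free */
	struct model_elt *_Atomic val;
};

struct model_map {
	struct model_entry entries[MODEL_MAP_SIZE];
	struct model_elt elts[MODEL_MAX_ELTS];
	atomic_uint next_elt;
	atomic_ullong hits;
	atomic_ullong drops;
};

/* Toy hash standing in for jhash(); 0 is reserved for free slots. */
static uint32_t model_hash(uint32_t key)
{
	uint32_t h = key * 2654435761u;

	return h ? h : 1;
}

/* Shared probe loop; a free slot is claimed only when insert is nonzero. */
static struct model_elt *model_probe(struct model_map *map, uint32_t key,
				     int insert)
{
	uint32_t key_hash = model_hash(key);
	uint32_t idx = key_hash >> (32 - (MODEL_MAP_BITS + 1));

	for (;;) {
		idx &= MODEL_MAP_SIZE - 1;
		struct model_entry *entry = &map->entries[idx];
		uint32_t test_key = atomic_load(&entry->key);
		struct model_elt *val = atomic_load(&entry->val);

		if (test_key == key_hash && val && val->key == key) {
			atomic_fetch_add(&map->hits, 1);
			return val;
		}
		if (!test_key) {
			uint32_t expected = 0;

			if (!insert)
				return NULL; /* lookup: a free slot ends the search */
			if (atomic_compare_exchange_strong(&entry->key, &expected,
							   key_hash)) {
				unsigned int n = atomic_fetch_add(&map->next_elt, 1);

				if (n >= MODEL_MAX_ELTS) {
					atomic_fetch_add(&map->drops, 1);
					atomic_store(&entry->key, 0); /* give the slot back */
					return NULL;
				}
				map->elts[n].key = key;
				atomic_store(&entry->val, &map->elts[n]);
				atomic_fetch_add(&map->hits, 1);
				return &map->elts[n];
			}
		}
		idx++;
	}
}

/* Caller pattern from the thread: insert until the first drop, then look up. */
struct model_elt *model_update(struct model_map *map, uint32_t key)
{
	struct model_elt *elt;

	if (!atomic_load(&map->drops)) {
		elt = model_probe(map, key, 1);
	} else {
		elt = model_probe(map, key, 0);
		if (!elt)
			atomic_fetch_add(&map->drops, 1); /* assumption: count the miss */
	}
	if (elt)
		elt->count++;
	return elt;
}
```

Because a claimed slot is released again when no element is available, the table in this model never fills up completely, so both the insert and the lookup probe always reach a free slot and terminate.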
Re: [PATCH V5] mm: memory hot-add: memory can not be added to movable zone defaultly
On 11/4 2015 0:13, Yasuaki Ishimatsu wrote: Hi Changsheng, According to the following thread, Tang has no objection to change kernel behavior since udev cannot online memory as movable. https://lkml.org/lkml/2015/10/21/159 So how about reposting the v5 patch? I have a comment about the patch. Please see below. Thanks,I will update the patch and repost it On Tue, 15 Sep 2015 03:49:58 -0400 Changsheng Liu wrote: From: Changsheng Liu After the user config CONFIG_MOVABLE_NODE and movable_node kernel option, When the memory is hot added, should_add_memory_movable() return 0 because all zones including movable zone are empty, so the memory that was hot added will be added to the normal zone and the normal zone will be created firstly. But we want the whole node to be added to movable zone defaultly. So we change should_add_memory_movable(): if the user config CONFIG_MOVABLE_NODE and movable_node kernel option it will always return 1 and all zones is empty at the same time, so that the movable zone will be created firstly and then the whole node will be added to movable zone defaultly. If we want the node to be added to normal zone, we can do it as follows: "echo online_kernel > /sys/devices/system/memory/memoryXXX/state" Signed-off-by: Xiaofeng Yan Signed-off-by: Changsheng Liu Tested-by: Dongdong Fan --- mm/memory_hotplug.c |8 1 files changed, 8 insertions(+), 0 deletions(-) diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c index 26fbba7..d39dbb0 100644 --- a/mm/memory_hotplug.c +++ b/mm/memory_hotplug.c @@ -1190,6 +1190,9 @@ static int check_hotplug_memory_range(u64 start, u64 size) /* * If movable zone has already been setup, newly added memory should be check. * If its address is higher than movable zone, it should be added as movable. 
+ * And if system boots up with movable_node and config CONFIG_MOVABLE_NOD and + * added memory does not overlap the zone before MOVABLE_ZONE, + * the memory is added as movable * Without this check, movable zone may overlap with other zone. */ static int should_add_memory_movable(int nid, u64 start, u64 size) @@ -1197,6 +1200,11 @@ static int should_add_memory_movable(int nid, u64 start, u64 size) unsigned long start_pfn = start >> PAGE_SHIFT; pg_data_t *pgdat = NODE_DATA(nid); struct zone *movable_zone = pgdat->node_zones + ZONE_MOVABLE; + struct zone *pre_zone = pgdat->node_zones + (ZONE_MOVABLE - 1); + + if (movable_node_is_enabled() + && zone_end_pfn(pre_zone) <= start_pfn) + return 1; if (movable_node_is_enabled() && (zone_end_pfn(pre_zone) <= start_pfn)) Thanks, Yasuaki Ishimatsu if (zone_is_empty(movable_zone)) return 0; -- 1.7.1 . -- To unsubscribe from this list: send the line "unsubscribe linux-kernel" in the body of a message to majord...@vger.kernel.org More majordomo info at http://vger.kernel.org/majordomo-info.html Please read the FAQ at http://www.tux.org/lkml/
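The decision discussed in this patch can be modelled outside the kernel. The following is only a user-space sketch, not the kernel implementation: `struct demo_zone`, `demo_zone_end_pfn()` and `demo_should_add_movable()` are hypothetical stand-ins, and the last two checks are assumed from the existing comment ("If its address is higher than movable zone, it should be added as movable") and the visible `zone_is_empty(movable_zone)` test.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for struct zone: only the span matters here. */
struct demo_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;	/* 0 means the zone is empty */
};

static unsigned long demo_zone_end_pfn(const struct demo_zone *z)
{
	return z->start_pfn + z->spanned_pages;
}

/*
 * Returns 1 if hot-added memory starting at start_pfn should go to the
 * movable zone, 0 if it should go to a kernel zone.
 */
static int demo_should_add_movable(bool movable_node_enabled,
				   const struct demo_zone *pre_zone,
				   const struct demo_zone *movable_zone,
				   unsigned long start_pfn)
{
	assert(pre_zone && movable_zone);

	/* Proposed check: the zone before ZONE_MOVABLE ends at or below start_pfn. */
	if (movable_node_enabled && demo_zone_end_pfn(pre_zone) <= start_pfn)
		return 1;

	/* Existing check: an empty movable zone means "not movable". */
	if (movable_zone->spanned_pages == 0)
		return 0;

	/* Assumed existing rule: memory above the movable zone start is movable. */
	return movable_zone->start_pfn <= start_pfn;
}
```

With all zones empty, the model returns 1 only when the movable_node stand-in flag is set, which is the behaviour change the commit message describes.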
Re: [PATCH v13 21/51] ext4: Add richacl feature flag
On Wed, Nov 4, 2015 at 3:28 AM, Andreas Gruenbacher wrote: > It's the commit message that's misleading here, I'll fix it. Commit message changed to: This feature flag selects richacl instead of POSIX ACL support on the filesystem. When this feature is off, the "acl" and "noacl" mount options control whether POSIX ACLs are enabled. When it is on, richacls are automatically enabled and using the "noacl" mount option leads to an error. Thanks, Andreas
Re: [PATCH v11 17/28] tracing: Add hist trigger 'execname' modifier
On Mon, 2015-11-02 at 23:10 +0900, Namhyung Kim wrote: > On Thu, Oct 22, 2015 at 01:14:21PM -0500, Tom Zanussi wrote: > > Allow users to have pid fields displayed as program names in the output > > by appending '.execname' to field names: > > > ># echo hist:keys=aaa.execname ... \ > > [ if filter] > event/trigger > > So in this case, 'aaa' should be 'common_pid', right? > Right. > Also, it's just for display thus tasks which have different pid but > same comm name will be saved in different entries, right? If so, it'd > be better to note that somewhere as it might confuse users.. > Right, good point, will document that. > > > > > Signed-off-by: Tom Zanussi > > Tested-by: Masami Hiramatsu > > --- > > kernel/trace/trace.c | 3 +- > > kernel/trace/trace_events_hist.c | 86 > > +++- > > 2 files changed, 87 insertions(+), 2 deletions(-) > > > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > > index 4f0968e..a143be0 100644 > > --- a/kernel/trace/trace.c > > +++ b/kernel/trace/trace.c > > @@ -3822,7 +3822,8 @@ static const char readme_msg[] = > > "\tfollowing modifiers to the field name, as applicable:\n\n" > > "\t.hexdisplay a number as a hex value\n" > > "\t.symdisplay an address as a symbol\n" > > - "\t.sym-offset display an address as a symbol and > > offset\n\n" > > + "\t.sym-offset display an address as a symbol and offset\n" > > + "\t.execname display a common_pid as a program name\n\n" > > "\tThe 'pause' parameter can be used to pause an existing hist\n" > > "\ttrigger or to start a hist trigger but not log any events\n" > > "\tuntil told to do so. 
'continue' can be used to start or\n" > > diff --git a/kernel/trace/trace_events_hist.c > > b/kernel/trace/trace_events_hist.c > > index e6e5c6e..656973a 100644 > > --- a/kernel/trace/trace_events_hist.c > > +++ b/kernel/trace/trace_events_hist.c > > @@ -75,6 +75,7 @@ enum hist_field_flags { > > HIST_FIELD_HEX = 8, > > HIST_FIELD_SYM = 16, > > HIST_FIELD_SYM_OFFSET = 32, > > + HIST_FIELD_EXECNAME = 64, > > }; > > > > struct hist_trigger_attrs { > > @@ -218,6 +219,78 @@ static struct hist_trigger_attrs > > *parse_hist_trigger_attrs(char *trigger_str) > > return ERR_PTR(ret); > > } > > > > +static inline void save_comm(char *comm, struct task_struct *task) > > +{ > > + if (!task->pid) { > > + strcpy(comm, ""); > > + return; > > + } > > + > > + if (WARN_ON_ONCE(task->pid < 0)) { > > + strcpy(comm, ""); > > + return; > > + } > > + > > + if (task->pid > PID_MAX_DEFAULT) { > > + strcpy(comm, "<...>"); > > + return; > > + } > > I guess this comes from the saved_cmdlines code which uses a fixed > array of PID_MAX_DEFAULT size. But as it doesn't use the array, we > can simply copy the task->comm IMHO. > > > > + > > + memcpy(comm, task->comm, TASK_COMM_LEN); > > +} > > + > > +static void hist_trigger_elt_free(struct tracing_map_elt *elt) > > +{ > > + kfree((char *)elt->private_data); > > +} > > + > > +static int hist_trigger_elt_alloc(struct tracing_map_elt *elt) > > +{ > > + struct hist_trigger_data *hist_data = elt->map->private_data; > > + struct hist_field *key_field; > > + unsigned int i; > > + > > + for (i = hist_data->n_vals; i < hist_data->n_fields; i++) { > > I think it'd be better to add a list iterator macro like > for_each_key_field() or something since it's easy to miss that the key > fields always follows the value fields IMHO. 
> Yeah, that's a good idea (sometimes I even forget that myself ;-) > > > + key_field = hist_data->fields[i]; > > + > > + if (key_field->flags & HIST_FIELD_EXECNAME) { > > + unsigned int size = TASK_COMM_LEN + 1; > > + > > + elt->private_data = kzalloc(size, GFP_KERNEL); > > + if (!elt->private_data) > > + return -ENOMEM; > > + break; > > + } > > + } > > + > > + return 0; > > +} > > + > > +static void hist_trigger_elt_copy(struct tracing_map_elt *to, > > + struct tracing_map_elt *from) > > +{ > > + char *comm_from = from->private_data; > > + char *comm_to = to->private_data; > > + > > + if (comm_from) > > + memcpy(comm_to, comm_from, TASK_COMM_LEN + 1); > > +} > > + > > +static void hist_trigger_elt_init(struct tracing_map_elt *elt) > > +{ > > + char *comm = elt->private_data; > > + > > + if (comm) > > + save_comm(comm, current); > > +} > > + > > +static struct tracing_map_ops hist_trigger_ops = { > > + .elt_alloc = hist_trigger_elt_alloc, > > + .elt_copy = hist_trigger_elt_copy, > > + .elt_free = hist_trigger_elt_free, > > + .elt_init =
RE: [GIT PULL] Thermal-SoC management updates for v4.4-rc1 #1
> -Original Message- > From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi- > ow...@vger.kernel.org] On Behalf Of Rafael J. Wysocki > Sent: Wednesday, November 04, 2015 5:14 AM > To: Eduardo Valentin > Cc: Zhang, Rui; Rafael J. Wysocki; Linux ACPI; Linux PM; LKML > Subject: Re: [GIT PULL] Thermal-SoC management updates for v4.4-rc1 #1 > Importance: High > > HI, > > On Tue, Nov 3, 2015 at 6:13 PM, Eduardo Valentin > wrote: > > Rui, Rafael, > > > > On Tue, Nov 03, 2015 at 03:10:28AM +, Zhang, Rui wrote: > >> > >> > Javi Merino (2): > >> > PM / OPP: get the voltage for all OPPs > >> > >> I think this patch should be took by Rafael, right? > >> > >> Thanks, > >> Rui > >> > >> > drivers/base/power/opp.c | 4 +- > > > > Well, yes. I included it here because the devfreq support has a > > (functional) dependency on this change. As it is a small change, and I > > did not notice any break in linux-next, I included it in my pull > > request. > > > > Rafael, I think it is your call. Do you want collect this one and send > > it via your tree, or are you fine having it going through Ruis? > > It can go in through the Rui's tree as far as I'm concerned. > Good, I will take this patch. Thanks! -Rui
Re: [PATCH v13 02/51] vfs: Add MAY_CREATE_FILE and MAY_CREATE_DIR permission flags
On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher wrote: > > Richacls distinguish between creating non-directories and directories. To > support that, add an isdir parameter to may_create(). When checking > inode_permission() for create permission, pass in an additional > MAY_CREATE_FILE or MAY_CREATE_DIR mask flag. > > To allow checking for delete *and* create access when replacing an existing > file via vfs_rename(), add a replace parameter to may_delete(). > > Signed-off-by: Andreas Gruenbacher > Reviewed-by: J. Bruce Fields > --- > fs/namei.c | 43 +-- > include/linux/fs.h | 2 ++ > 2 files changed, 27 insertions(+), 18 deletions(-) > > diff --git a/fs/namei.c b/fs/namei.c > index 224ecf1..0259392 100644 > --- a/fs/namei.c > +++ b/fs/namei.c > @@ -453,7 +453,9 @@ static int sb_permission(struct super_block *sb, struct > inode *inode, int mask) > * this, letting us set arbitrary permissions for filesystem access without > * changing the "normal" UIDs which are used for other things. > * > - * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask. > + * MAY_WRITE must be set in @mask whenever MAY_APPEND, MAY_CREATE_FILE, or > + * MAY_CREATE_DIR are set. That way, file systems that don't support these > + * permissions will check for MAY_WRITE instead. > */ > int inode_permission(struct inode *inode, int mask) > { > @@ -2549,10 +2551,11 @@ EXPORT_SYMBOL(__check_sticky); > * 10. We don't allow removal of NFS sillyrenamed files; it's handled by > * nfs_async_unlink(). 
> */ > -static int may_delete(struct inode *dir, struct dentry *victim, bool isdir) > +static int may_delete(struct inode *dir, struct dentry *victim, > + bool isdir, bool replace) > { > struct inode *inode = d_backing_inode(victim); > - int error; > + int error, mask = MAY_WRITE | MAY_EXEC; > > if (d_is_negative(victim)) > return -ENOENT; > @@ -2561,7 +2564,9 @@ static int may_delete(struct inode *dir, struct dentry > *victim, bool isdir) > BUG_ON(victim->d_parent->d_inode != dir); > audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE); > > - error = inode_permission(dir, MAY_WRITE | MAY_EXEC); > + if (replace) > + mask |= isdir ? MAY_CREATE_DIR : MAY_CREATE_FILE; > + error = inode_permission(dir, mask); > if (error) > return error; > if (IS_APPEND(dir)) > @@ -2592,14 +2597,16 @@ static int may_delete(struct inode *dir, struct > dentry *victim, bool isdir) > * 3. We should have write and exec permissions on dir > * 4. We can't do it if dir is immutable (done in permission()) > */ > -static inline int may_create(struct inode *dir, struct dentry *child) > +static inline int may_create(struct inode *dir, struct dentry *child, bool > isdir) > { > + int mask = isdir ? 
MAY_CREATE_DIR : MAY_CREATE_FILE; > + > audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE); > if (child->d_inode) > return -EEXIST; > if (IS_DEADDIR(dir)) > return -ENOENT; > - return inode_permission(dir, MAY_WRITE | MAY_EXEC); > + return inode_permission(dir, MAY_WRITE | MAY_EXEC | mask); > } > > /* > @@ -2649,7 +2656,7 @@ EXPORT_SYMBOL(unlock_rename); > int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode, > bool want_excl) > { > - int error = may_create(dir, dentry); > + int error = may_create(dir, dentry, false); > if (error) > return error; > > @@ -3494,7 +3501,7 @@ EXPORT_SYMBOL(user_path_create); > > int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t > dev) > { > - int error = may_create(dir, dentry); > + int error = may_create(dir, dentry, false); Passing "true" and "false" from the caller doesn't really make it clear to the reader what the argument is for. Since it isn't possible to check the file mode inside may_create() from dentry->d_inode->i_mode (inode doesn't exist yet) then passing "mode" as an argument would at least make the code more readable. "mode" is available in all of the callers. > > if (error) > return error; > @@ -3586,7 +3593,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, > umode_t, mode, unsigned, d > > int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode) > { > - int error = may_create(dir, dentry); > + int error = may_create(dir, dentry, true); > unsigned max_links = dir->i_sb->s_max_links; > > if (error) > @@ -3667,7 +3674,7 @@ EXPORT_SYMBOL(dentry_unhash); > > int vfs_rmdir(struct inode *dir, struct dentry *dentry) > { > - int error = may_delete(dir, dentry, 1); > + int error = may_delete(dir, dentry, true, false); This is a prime example why passing "true" and "false" as function arguments is not very useful, and especially prone to bugs when there are two of them. That said, this is code originally from Al,
Re: [PATCH v13 21/51] ext4: Add richacl feature flag
Andreas, On Wed, Nov 4, 2015 at 3:18 AM, Andreas Dilger wrote: > This patch confuses me. I thought the whole point of INCOMPAT_RICHACL > was that the filesystem should never, ever be mounted without ACL support > because the ACLs will get confused without it. In that case, it doesn't > make sense to have a mount option that _has_ to be specified to mount the > filesystem, and returns an error when trying to disable it. > > It makes more sense to just enable "acl" by default if INCOMPAT_RICHACL > is set in the superblock and not need the mount option at all. It's the commit message that's misleading here, I'll fix it. On richacl filesystems, the acl mount option is always on. It's only on POSIX ACL filesystems that the mount option can be used to turn POSIX ACLs off (which arguably wasn't such a good idea, but there we have it). Thanks, Andreas
Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
On (11/04/15 10:25), Minchan Kim wrote: [..] > +static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr, > + unsigned long end, struct mm_walk *walk) > + > +{ > + struct mmu_gather *tlb = walk->private; > + struct mm_struct *mm = tlb->mm; > + struct vm_area_struct *vma = walk->vma; > + spinlock_t *ptl; > + pte_t *pte, ptent; > + struct page *page; I'll just ask (probably I'm missing something) + pmd_trans_huge_lock() ? > + split_huge_page_pmd(vma, addr, pmd); > + if (pmd_trans_unstable(pmd)) > + return 0; > + > + pte = pte_offset_map_lock(mm, pmd, addr, &ptl); > + arch_enter_lazy_mmu_mode(); > + for (; addr != end; pte++, addr += PAGE_SIZE) { > + ptent = *pte; > + > + if (!pte_present(ptent)) > + continue; > + > + page = vm_normal_page(vma, addr, ptent); > + if (!page) > + continue; > + -ss
[PATCH v2 1/4] perf test: Keep test result clean if '-v' not set
According to [1], 'perf test' should avoid printing too much information when '-v' is not set; only 'Ok', 'FAIL' or 'Skip' needs to be printed. This patch removes several stderr outputs to keep the output clean. Before this patch: # perf test dummy 23: Test using a dummy software event to keep tracking : (not supported) Ok After this patch: # perf test dummy 23: Test using a dummy software event to keep tracking : Skip [1] http://lkml.kernel.org/r/20151020134155.ge4...@redhat.com Signed-off-by: Wang Nan Acked-by: Namhyung Kim Cc: Arnaldo Carvalho de Melo Link: http://lkml.kernel.org/n/tip-446450727-218554-1-git-send-email-wangn...@huawei.com --- tools/perf/tests/attr.c| 3 +-- tools/perf/tests/code-reading.c| 8 tools/perf/tests/keep-tracking.c | 4 ++-- tools/perf/tests/llvm.c| 11 --- tools/perf/tests/switch-tracking.c | 4 ++-- 5 files changed, 13 insertions(+), 17 deletions(-) diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c index 2dfc9ad..638875a 100644 --- a/tools/perf/tests/attr.c +++ b/tools/perf/tests/attr.c @@ -171,6 +171,5 @@ int test__attr(void) !lstat(path_perf, )) return run_dir(path_dir, path_perf); - fprintf(stderr, " (omitted)"); - return 0; + return TEST_SKIP; } diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c index 49b1959..a767a64 100644 --- a/tools/perf/tests/code-reading.c +++ b/tools/perf/tests/code-reading.c @@ -613,16 +613,16 @@ int test__code_reading(void) case TEST_CODE_READING_OK: return 0; case TEST_CODE_READING_NO_VMLINUX: - fprintf(stderr, " (no vmlinux)"); + pr_debug("no vmlinux\n"); return 0; case TEST_CODE_READING_NO_KCORE: - fprintf(stderr, " (no kcore)"); + pr_debug("no kcore\n"); return 0; case TEST_CODE_READING_NO_ACCESS: - fprintf(stderr, " (no access)"); + pr_debug("no access\n"); return 0; case TEST_CODE_READING_NO_KERNEL_OBJ: - fprintf(stderr, " (no kernel obj)"); + pr_debug("no kernel obj\n"); return 0; default: return -1; diff --git a/tools/perf/tests/keep-tracking.c 
b/tools/perf/tests/keep-tracking.c index 4d4b983..a2e2269 100644 --- a/tools/perf/tests/keep-tracking.c +++ b/tools/perf/tests/keep-tracking.c @@ -90,8 +90,8 @@ int test__keep_tracking(void) evsel->attr.enable_on_exec = 0; if (perf_evlist__open(evlist) < 0) { - fprintf(stderr, " (not supported)"); - err = 0; + pr_debug("Unable to open dummy and cycles event\n"); + err = TEST_SKIP; goto out_err; } diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c index 52d5597..512d362 100644 --- a/tools/perf/tests/llvm.c +++ b/tools/perf/tests/llvm.c @@ -36,7 +36,7 @@ static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz) static int test__bpf_parsing(void *obj_buf __maybe_unused, size_t obj_buf_sz __maybe_unused) { - fprintf(stderr, " (skip bpf parsing)"); + pr_debug("Skip bpf parsing\n"); return 0; } #endif @@ -55,7 +55,7 @@ int test__llvm(void) * and clang is not found in $PATH, and this is not perf test -v */ if (verbose == 0 && !llvm_param.user_set_param && llvm__search_clang()) { - fprintf(stderr, " (no clang, try 'perf test -v LLVM')"); + pr_debug("No clang and no verbosive, skip this test\n"); return TEST_SKIP; } @@ -86,11 +86,8 @@ int test__llvm(void) err = llvm__compile_bpf("-", _buf, _buf_sz); verbose = old_verbose; - if (err) { - if (!verbose) - fprintf(stderr, " (use -v to see error message)"); - return -1; - } + if (err) + return TEST_FAIL; err = test__bpf_parsing(obj_buf, obj_buf_sz); free(obj_buf); diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c index e698742..a02af50 100644 --- a/tools/perf/tests/switch-tracking.c +++ b/tools/perf/tests/switch-tracking.c @@ -366,7 +366,7 @@ int test__switch_tracking(void) /* Third event */ if (!perf_evlist__can_select_event(evlist, sched_switch)) { - fprintf(stderr, " (no sched_switch)"); + pr_debug("No sched_switch\n"); err = 0; goto out; } @@ -442,7 +442,7 @@ int test__switch_tracking(void) } if (perf_evlist__open(evlist) < 0) { - fprintf(stderr, " (not supported)"); + 
pr_debug("Not supported\n"); err = 0; goto out; } -- 1.8.3.4
[PATCH v2 4/4] perf tools: Improve BPF related error messages output
A series of BPF loader related error codes is introduced to help deliver errors. Functions are improved to return those new error codes. Functions which return pointers are adjusted to encode the error code into the return value using "ERR_PTR". bpf_loader_strerror() is introduced to convert those error codes to strings. It checks the value of the error code and calls libbpf_strerror() or strerror_r() accordingly, so callers don't need to check the range of the error code. bpf__strerror_head() is updated so that existing strerror functions support these error codes automatically. Signed-off-by: Wang Nan Cc: Arnaldo Carvalho de Melo Cc: Namhyung Kim --- tools/perf/util/bpf-loader.c | 98 +- tools/perf/util/bpf-loader.h | 18 tools/perf/util/parse-events.c | 7 +-- 3 files changed, 110 insertions(+), 13 deletions(-) diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c index 76f07ce..375f254 100644 --- a/tools/perf/util/bpf-loader.c +++ b/tools/perf/util/bpf-loader.c @@ -14,6 +14,10 @@ #include "probe-finder.h" // for MAX_PROBES #include "llvm-utils.h" +#if BPF_LOADER_ERRNO__END >= LIBBPF_ERRNO__START +# error Too many BPF loader error code +#endif + #define DEFINE_PRINT_FN(name, level) \ static int libbpf_##name(const char *fmt, ...) 
\ { \ @@ -53,7 +57,7 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source) err = llvm__compile_bpf(filename, _buf, _buf_sz); if (err) - return ERR_PTR(err); + return ERR_PTR(-BPF_LOADER_ERRNO__ECOMPILE); obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename); free(obj_buf); } else @@ -113,14 +117,14 @@ config_bpf_program(struct bpf_program *prog) if (err < 0) { pr_debug("bpf: '%s' is not a valid config string\n", config_str); - err = -EINVAL; + err = -BPF_LOADER_ERRNO__ECONFIG; goto errout; } if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) { pr_debug("bpf: '%s': group for event is set and not '%s'.\n", config_str, PERF_BPF_PROBE_GROUP); - err = -EINVAL; + err = -BPF_LOADER_ERRNO__EGROUP; goto errout; } else if (!pev->group) pev->group = strdup(PERF_BPF_PROBE_GROUP); @@ -132,9 +136,9 @@ config_bpf_program(struct bpf_program *prog) } if (!pev->event) { - pr_debug("bpf: '%s': event name is missing\n", + pr_debug("bpf: '%s': event name is missing. Section name should be 'key=value'\n", config_str); - err = -EINVAL; + err = -BPF_LOADER_ERRNO__EEVENTNAME; goto errout; } pr_debug("bpf: config '%s' is ok\n", config_str); @@ -285,7 +289,7 @@ int bpf__foreach_tev(struct bpf_object *obj, (void **)); if (err || !priv) { pr_debug("bpf: failed to get private field\n"); - return -EINVAL; + return -BPF_LOADER_ERRNO__EINTERNAL; } pev = >pev; @@ -308,13 +312,65 @@ int bpf__foreach_tev(struct bpf_object *obj, return 0; } +#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0])) + +struct { + int code; + const char *msg; +} bpf_loader_strerror_table[] = { + {BPF_LOADER_ERRNO__ECONFIG, "Invalid config string"}, + {BPF_LOADER_ERRNO__EGROUP, "Invalid group name"}, + {BPF_LOADER_ERRNO__EEVENTNAME, "No event name found in config string"}, + {BPF_LOADER_ERRNO__EINTERNAL, "BPF loader internal error"}, + {BPF_LOADER_ERRNO__ECOMPILE, "Error when compiling BPF scriptlet"}, +}; + +static int +bpf_loader_strerror(int err, char *buf, size_t size) +{ + unsigned int i; + 
+ if (!buf || !size) + return -1; + + err = err > 0 ? err : -err; + + if (err > LIBBPF_ERRNO__START) + return libbpf_strerror(err, buf, size); + + if (err < BPF_LOADER_ERRNO__START) { + char sbuf[STRERR_BUFSIZE], *msg; + + msg = strerror_r(err, sbuf, sizeof(sbuf)); + snprintf(buf, size, "%s", msg); + buf[size - 1] = '\0'; + return 0; + } + + for (i = 0; i < ARRAY_SIZE(bpf_loader_strerror_table); i++) { + if (bpf_loader_strerror_table[i].code == err) { + const char *msg; + + msg = bpf_loader_strerror_table[i].msg; + snprintf(buf, size, "%s", msg); + buf[size - 1] = '\0'; + return 0; + } + } + + snprintf(buf, size, "Unknown bpf loader error %d",
[PATCH v2 3/4] bpf tools: Improve libbpf error reporting
In this patch, a series of libbpf-specific error numbers and libbpf_strerror() are created to help report errors to the caller. Functions are updated to pass the correct error number through the macro CHECK_ERR(). All users of bpf_object__open{_buffer}() and bpf_program__title() in perf are modified accordingly. Signed-off-by: Wang Nan Cc: Arnaldo Carvalho de Melo Cc: Namhyung Kim --- tools/lib/bpf/libbpf.c | 149 - tools/lib/bpf/libbpf.h | 12 tools/perf/tests/llvm.c| 2 +- tools/perf/util/bpf-loader.c | 8 +-- tools/perf/util/parse-events.c | 4 +- 5 files changed, 120 insertions(+), 55 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 4252fc2..74c64b1 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -61,6 +61,62 @@ void libbpf_set_print(libbpf_print_fn_t warn, __pr_debug = debug; } +#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0])) +#define STRERR_BUFSIZE 128 + +struct { + int code; + const char *msg; +} libbpf_strerror_table[] = { + {LIBBPF_ERRNO__ELIBELF, "Something wrong in libelf"}, + {LIBBPF_ERRNO__EFORMAT, "BPF object format invalid"}, + {LIBBPF_ERRNO__EKVERSION, "'version' section incorrect or lost"}, + {LIBBPF_ERRNO__EENDIAN, "Endian missmatch"}, + {LIBBPF_ERRNO__EINTERNAL, "Internal error in libbpf"}, + {LIBBPF_ERRNO__ERELOC, "Relocation failed"}, + {LIBBPF_ERRNO__ELOAD, "Failed to load program"}, +}; + +int libbpf_strerror(int err, char *buf, size_t size) +{ + unsigned int i; + + if (!buf || !size) + return -1; + + err = err > 0 ? 
err : -err; + + if (err < LIBBPF_ERRNO__START) { + int ret; + + ret = strerror_r(err, buf, size); + buf[size - 1] = '\0'; + return ret; + } + + for (i = 0; i < ARRAY_SIZE(libbpf_strerror_table); i++) { + if (libbpf_strerror_table[i].code == err) { + const char *msg; + + msg = libbpf_strerror_table[i].msg; + snprintf(buf, size, "%s", msg); + buf[size - 1] = '\0'; + return 0; + } + } + + snprintf(buf, size, "Unknown libbpf error %d", err); + buf[size - 1] = '\0'; + return -1; +} + +#define CHECK_ERR(action, err, out) do { \ + err = action; \ + if (err)\ + goto out; \ +} while(0) + + /* Copied from tools/perf/util/util.h */ #ifndef zfree # define zfree(ptr) ({ free(*ptr); *ptr = NULL; }) @@ -258,7 +314,7 @@ static struct bpf_object *bpf_object__new(const char *path, obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1); if (!obj) { pr_warning("alloc memory failed for %s\n", path); - return NULL; + return ERR_PTR(-ENOMEM); } strcpy(obj->path, path); @@ -305,7 +361,7 @@ static int bpf_object__elf_init(struct bpf_object *obj) if (obj_elf_valid(obj)) { pr_warning("elf init: internal error\n"); - return -EEXIST; + return -LIBBPF_ERRNO__ELIBELF; } if (obj->efile.obj_buf_sz > 0) { @@ -331,14 +387,14 @@ static int bpf_object__elf_init(struct bpf_object *obj) if (!obj->efile.elf) { pr_warning("failed to open %s as ELF file\n", obj->path); - err = -EINVAL; + err = -LIBBPF_ERRNO__ELIBELF; goto errout; } if (!gelf_getehdr(obj->efile.elf, >efile.ehdr)) { pr_warning("failed to get EHDR from %s\n", obj->path); - err = -EINVAL; + err = -LIBBPF_ERRNO__EFORMAT; goto errout; } ep = >efile.ehdr; @@ -346,7 +402,7 @@ static int bpf_object__elf_init(struct bpf_object *obj) if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) { pr_warning("%s is not an eBPF object file\n", obj->path); - err = -EINVAL; + err = -LIBBPF_ERRNO__EFORMAT; goto errout; } @@ -374,14 +430,14 @@ bpf_object__check_endianness(struct bpf_object *obj) goto mismatch; break; default: - return -EINVAL; + return 
-LIBBPF_ERRNO__EENDIAN; } return 0; mismatch: pr_warning("Error: endianness mismatch.\n"); - return -EINVAL; + return -LIBBPF_ERRNO__EENDIAN; } static int @@ -402,7 +458,7 @@ bpf_object__init_kversion(struct bpf_object *obj, if (size != sizeof(kver)) { pr_warning("invalid kver section in %s\n", obj->path); - return -EINVAL; + return
[PATCH v2 2/4] perf tools: Mute libbpf when '-v' not set
According to [1], libbpf should be muted. This patch resets the info and warning message levels to ensure libbpf doesn't output anything, even if an error happens. [1] http://lkml.kernel.org/r/20151020151255.gf5...@kernel.org Signed-off-by: Wang Nan Cc: Arnaldo Carvalho de Melo Cc: Namhyung Kim --- tools/perf/util/bpf-loader.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c index ba6f752..0c5d174 100644 --- a/tools/perf/util/bpf-loader.c +++ b/tools/perf/util/bpf-loader.c @@ -26,8 +26,8 @@ static int libbpf_##name(const char *fmt, ...)\ return ret; \ } -DEFINE_PRINT_FN(warning, 0) -DEFINE_PRINT_FN(info, 0) +DEFINE_PRINT_FN(warning, 1) +DEFINE_PRINT_FN(info, 1) DEFINE_PRINT_FN(debug, 1) struct bpf_prog_priv { -- 1.8.3.4
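The effect of raising the level argument from 0 to 1 can be illustrated with a standalone macro. This is only a sketch: the body of perf's DEFINE_PRINT_FN() is not shown in the patch, so `DEMO_PRINT_FN()`, `demo_verbose` and the `vfprintf()` call are hypothetical stand-ins for a printer that stays silent until the verbosity reaches its level.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Stand-in for a verbosity counter: 0 means '-v' was not given. */
static int demo_verbose;

/* Generate demo_<name>(), which prints only if demo_verbose >= level. */
#define DEMO_PRINT_FN(name, level)				\
static int demo_##name(const char *fmt, ...)			\
{								\
	va_list args;						\
	int ret = 0;						\
								\
	assert(fmt != NULL);					\
	if (demo_verbose >= (level)) {				\
		va_start(args, fmt);				\
		ret = vfprintf(stderr, fmt, args);		\
		va_end(args);					\
	}							\
	return ret;	/* characters written, 0 when muted */	\
}

/* All three printers at level 1, mirroring the values after the patch. */
DEMO_PRINT_FN(warning, 1)
DEMO_PRINT_FN(info, 1)
DEMO_PRINT_FN(debug, 1)
```

In this model, a level of 0 would print even with `demo_verbose == 0`, which corresponds to the behaviour the patch removes for warnings and info messages.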
Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map
Hi Tom, On Tue, Nov 03, 2015 at 07:47:52PM -0600, Tom Zanussi wrote: > Hi Namhyung, > > On Mon, 2015-11-02 at 16:08 +0900, Namhyung Kim wrote: > > I thought it'd be better if users can see which one is the real drop > > or not. IOW if drop count is much smaller than the normal event > > count, [s]he might want to ignore the occasional drops. Otherwise, > > [s]he should restart with a bigger table. This requires accurate > > counts of events and drops though. > > > > OK, how about the below - it basically moves the drops set/test/inc into > tracing_map_insert(), as well as a total hits count. So those values > will be available for users to use in deciding whether to use the data > or restart with a bigger table, and the loop is bailed out of only if no > matching keys are found and there are drops, so callers can continue > updating existing entries. But if a key didn't get a desired index, it'd still fail to update.. > > Users who want the original behavior still get the NULL return and can > stop calling tracing_map_insert() as before: > > struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key) > { > u32 idx, key_hash, test_key; > struct tracing_map_entry *entry; > > key_hash = jhash(key, map->key_size, 0); > if (key_hash == 0) > key_hash = 1; > idx = key_hash >> (32 - (map->map_bits + 1)); > > while (1) { > idx &= (map->map_size - 1); > entry = TRACING_MAP_ENTRY(map->map, idx); > test_key = entry->key; > > if (test_key && test_key == key_hash && entry->val && > keys_match(key, entry->val->key, map->key_size)) { > atomic64_inc(&map->hits); > return entry->val; > } > > if (atomic64_read(&map->drops)) { > atomic64_inc(&map->drops); > break; > } IMHO it should be removed. > > if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) { > struct tracing_map_elt *elt; > > elt = get_free_elt(map); > if (!elt) { > atomic64_inc(&map->drops); And reset entry->key here.. 
> break; > } > memcpy(elt->key, key, map->key_size); > entry->val = elt; > > atomic64_inc(&map->hits); > return entry->val; > } > idx++; > } > > return NULL; > } Then tracing_map_lookup() can be implemented like *_insert but bail out from the loop if test_key is 0. The caller might do like this: if (!atomic64_read(&map->drops)) elt = tracing_map_insert(...); else elt = tracing_map_lookup(...); Thanks, Namhyung
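The insert/lookup split suggested above can be tried out in user space. What follows is a toy model under stated assumptions, not the tracing_map code: all `toy_*` names are hypothetical, keys are plain 32-bit integers, a multiplicative hash stands in for jhash(), C11 atomics stand in for cmpxchg() and atomic64_*, and the model is only exercised single-threaded.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_MAP_BITS	3			/* toy pool of 8 elements */
#define TOY_MAX_ELTS	(1u << TOY_MAP_BITS)
#define TOY_MAP_SIZE	(TOY_MAX_ELTS * 2)	/* twice as many slots as elements */

struct toy_elt { uint32_t key; };

struct toy_entry {
	_Atomic uint32_t key;			/* key hash, 0 means free */
	struct toy_elt *_Atomic val;
};

struct toy_map {
	struct toy_entry slots[TOY_MAP_SIZE];
	struct toy_elt pool[TOY_MAX_ELTS];
	atomic_uint next_elt;
	atomic_ullong hits, drops;
};

static uint32_t toy_hash(uint32_t key)
{
	uint32_t h = key * 2654435761u;		/* stand-in for jhash() */
	return h ? h : 1;			/* 0 is reserved for "free" */
}

static struct toy_elt *toy_get_free_elt(struct toy_map *map)
{
	unsigned int i = atomic_load(&map->next_elt);

	while (i < TOY_MAX_ELTS)		/* i is reloaded when the CAS fails */
		if (atomic_compare_exchange_weak(&map->next_elt, &i, i + 1))
			return &map->pool[i];
	return NULL;
}

/* Shared probe loop: lookup_only turns the insert into a pure lookup. */
static struct toy_elt *toy_walk(struct toy_map *map, uint32_t key, int lookup_only)
{
	uint32_t key_hash = toy_hash(key);
	uint32_t idx = key_hash >> (32 - (TOY_MAP_BITS + 1));
	unsigned int probes;

	assert(map != NULL);
	for (probes = 0; probes < TOY_MAP_SIZE; probes++, idx++) {
		struct toy_entry *entry = &map->slots[idx & (TOY_MAP_SIZE - 1)];
		uint32_t test_key = atomic_load(&entry->key);
		struct toy_elt *val = atomic_load(&entry->val);
		uint32_t expected = 0;

		if (test_key == key_hash && val && val->key == key) {
			atomic_fetch_add(&map->hits, 1);
			return val;
		}
		if (test_key)
			continue;		/* slot taken by another key */
		if (lookup_only)
			break;			/* a free slot ends the probe sequence */
		if (atomic_compare_exchange_strong(&entry->key, &expected, key_hash)) {
			struct toy_elt *elt = toy_get_free_elt(map);

			if (!elt) {
				atomic_store(&entry->key, 0);	/* hand the slot back */
				atomic_fetch_add(&map->drops, 1);
				break;
			}
			elt->key = key;
			atomic_store(&entry->val, elt);
			atomic_fetch_add(&map->hits, 1);
			return elt;
		}
	}
	return NULL;
}

/* Caller-side policy from the mail: once anything was dropped, only look up. */
static struct toy_elt *toy_update(struct toy_map *map, uint32_t key)
{
	return toy_walk(map, key, atomic_load(&map->drops) != 0);
}
```

In this model the drop counter increments once, on the first failed insert; later calls for unknown keys return NULL from the lookup path without touching it, while existing entries can still be found and updated.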
[PATCH v2 0/4] perf bpf: Improve error code delivering and output
This patchset is based on perf/core (commit bebd23a2ed31d47e7dd746d3b125068aa2c42d85 in Arnaldo's tree). Compared with v1 [1], this patchset changes the error reporting convention and its users in the same patch, which avoids NULL checking in perf. Wang Nan (4): perf test: Keep test result clean if '-v' not set perf tools: Mute libbpf when '-v' not set bpf tools: Improve libbpf error reporting perf tools: Improve BPF related error messages output tools/lib/bpf/libbpf.c | 149 + tools/lib/bpf/libbpf.h | 12 +++ tools/perf/tests/attr.c| 3 +- tools/perf/tests/code-reading.c| 8 +- tools/perf/tests/keep-tracking.c | 4 +- tools/perf/tests/llvm.c| 13 ++-- tools/perf/tests/switch-tracking.c | 4 +- tools/perf/util/bpf-loader.c | 110 +++ tools/perf/util/bpf-loader.h | 18 + tools/perf/util/parse-events.c | 11 +-- 10 files changed, 245 insertions(+), 87 deletions(-) -- 1.8.3.4
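The convention this series moves to, returning an encoded error pointer instead of NULL, can be sketched in plain C. The helpers below are hypothetical user-space stand-ins modelled on the kernel's ERR_PTR()/IS_ERR()/PTR_ERR() idea; `demo_open()` and `struct demo_object` are invented for illustration and are not libbpf API.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>

#define DEMO_MAX_ERRNO	4095	/* error codes live in the top 4095 addresses */

static inline void *demo_err_ptr(long err)
{
	return (void *)(intptr_t)err;
}

static inline long demo_ptr_err(const void *ptr)
{
	return (long)(intptr_t)ptr;
}

static inline int demo_is_err(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)(-(long)DEMO_MAX_ERRNO);
}

struct demo_object { int loaded; };

/* Returns a valid object, or an encoded negative errno; never NULL. */
static struct demo_object *demo_open(int fail_with)
{
	struct demo_object *obj;

	assert(fail_with >= 0 && fail_with <= DEMO_MAX_ERRNO);
	if (fail_with)
		return demo_err_ptr(-fail_with);

	obj = calloc(1, sizeof(*obj));
	if (!obj)
		return demo_err_ptr(-ENOMEM);
	return obj;
}
```

A caller then tests `demo_is_err(obj)` and recovers the code with `demo_ptr_err(obj)`, so the reason for a failure travels with the return value and a separate NULL check is not needed.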
Re: [PATCH v13 43/51] ext4: Don't allow unmapped identifiers in richacls
> On Nov 3, 2015, at 8:17 AM, Andreas Gruenbacher wrote: > > Don't allow acls which contain unmapped identifiers: they are meaningful > for remote file systems only. Looks fine. Reviewed-by: Andreas Dilger > Signed-off-by: Andreas Gruenbacher > --- > fs/ext4/richacl.c | 4 > 1 file changed, 4 insertions(+) > > diff --git a/fs/ext4/richacl.c b/fs/ext4/richacl.c > index 906d048..2115385 100644 > --- a/fs/ext4/richacl.c > +++ b/fs/ext4/richacl.c > @@ -74,6 +74,10 @@ __ext4_set_richacl(handle_t *handle, struct inode *inode, > struct richacl *acl) > int retval, size; > void *value; > > + /* Don't allow acls with unmapped identifiers. */ > + if (richacl_has_unmapped_identifiers(acl)) > + return -EINVAL; > + > if (richacl_equiv_mode(acl, &mode) == 0) { > inode->i_ctime = ext4_current_time(inode); > inode->i_mode = mode; > -- > 2.5.0 > Cheers, Andreas
Re: [PATCH v13 20/51] ext4: Add richacl support
On Wed, Nov 4, 2015 at 3:13 AM, Andreas Dilger wrote:
> Patch looks reasonable. One minor cleanup below that could be fixed when
> the patch series is refreshed, and you can add:
>
> Reviewed-by: Andreas Dilger

Okay, thank you.

Andreas
Re: [PATCH] PM / OPP: Protect updates to list_dev with mutex
On 02-11-15, 11:14, Stephen Boyd wrote:
> On 10/31, Viresh Kumar wrote:
> > On 30-10-15, 10:06, Stephen Boyd wrote:
> > > On 10/30, Viresh Kumar wrote:
> > > > dev_opp_list_lock is used everywhere to protect device and OPP lists,
> > > > but dev_pm_opp_set_sharing_cpus() is missed somehow. And instead we used
> > > > rcu-lock, which wouldn't help here as we are adding a new list_dev.
> > > >
> > > > This also fixes a problem where we have called kzalloc(..., GFP_KERNEL)
> > > > from within rcu-lock, which isn't allowed as kzalloc can sleep when
> > > > called with GFP_KERNEL.
> > >
> > > Care to share the splat here?
> >
> > I don't know what is wrong (or right) with my exynos 5250 board, but I
> > didn't got any splat here even with the right config options (yes I
> > should have mentioned that earlier). I have seen this at other times
> > as well, while we were running after some cpufreq traces..
> >
> > But, the case in hand is pretty straight forward and Mike T. did get a
> > splat as that's what he told me. We are calling a sleep-able function
> > from rcu_lock and that's obviously wrong.
>
> That's slightly concerning. Given that the bug is so straight
> forward but we can't reproduce it doesn't instill a lot of
> confidence that the patch is correct.

I have asked Mike to provide the splat he got and test the new patch to see if it is fixed or not.

> > > > diff --git a/drivers/base/power/opp/cpu.c b/drivers/base/power/opp/cpu.c
> > > > index 7654c5606307..91f15b2e25ee 100644
> > > > --- a/drivers/base/power/opp/cpu.c
> > > > +++ b/drivers/base/power/opp/cpu.c
> > > > @@ -124,12 +124,12 @@ int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, cpumask_var_t cpumask)
> > > >         struct device *dev;
> > > >         int cpu, ret = 0;
> > > >
> > > > -       rcu_read_lock();
> > > > +       mutex_lock(&dev_opp_list_lock);
> > > >
> > > >         dev_opp = _find_device_opp(cpu_dev);
> > >
> > > So does _find_device_opp() need to be called with rcu_read_lock()
> > > held or not? The comment above the function makes it sound like
> > > we need RCU, but we don't do that here anymore.
> >
> > That is more for the readers, as this function is going to return a
> > pointer to the device OPP, and to make sure it isn't freed behind
> > their back, they need to take the RCU lock.
> >
> > There are other writer code paths as well, like add-opp, where we just
> > take the mutex as there can't be anything stronger than that :)
> >
>
> Agreed, but the comment above the function is misleading. We
> should correct that comment and/or add the lockdep checks to the
> function like we have elsewhere in this file.

Will do.

--
viresh
Re: [PATCH v13 21/51] ext4: Add richacl feature flag
> On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher wrote: > > From: "Aneesh Kumar K.V" > > This feature flag selects richacl instead of posix acl support on the > file system. In addition, the "acl" mount option is needed for enabling > either of the two kinds of acls. This patch confuses me. I thought the whole point of INCOMPAT_RICHACL was that the filesystem should never, ever be mounted without ACL support because the ACLs will get confused without it. In that case, it doesn't make sense to have a mount option that _has_ to be specified to mount the filesystem, and returns an error when trying to disable it. It makes more sense to just enable "acl" by default if INCOMPAT_RICHACL is set in the superblock and not need the mount option at all. Having a mount option made more sense when enabling this support was optional for the filesystem, and it could be mounted without it enabled, but you wanted INCOMPAT_RICHACL to prevent that so it needs to be fixed. Cheers, Andreas > Signed-off-by: Aneesh Kumar K.V > Signed-off-by: Andreas Gruenbacher > --- > fs/ext4/ext4.h | 6 -- > fs/ext4/super.c | 49 - > 2 files changed, 44 insertions(+), 11 deletions(-) > > diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h > index fd1f28b..b97a3b1 100644 > --- a/fs/ext4/ext4.h > +++ b/fs/ext4/ext4.h > @@ -991,7 +991,7 @@ struct ext4_inode_info { > #define EXT4_MOUNT_UPDATE_JOURNAL 0x01000 /* Update the journal format */ > #define EXT4_MOUNT_NO_UID32 0x02000 /* Disable 32-bit UIDs */ > #define EXT4_MOUNT_XATTR_USER 0x04000 /* Extended user attributes */ > -#define EXT4_MOUNT_POSIX_ACL 0x08000 /* POSIX Access Control Lists */ > +#define EXT4_MOUNT_ACL 0x08000 /* Access Control Lists > */ > #define EXT4_MOUNT_NO_AUTO_DA_ALLOC 0x1 /* No auto delalloc mapping */ > #define EXT4_MOUNT_BARRIER0x2 /* Use block barriers */ > #define EXT4_MOUNT_QUOTA 0x8 /* Some quota option set */ > @@ -1582,6 +1582,7 @@ static inline int ext4_encrypted_inode(struct inode > *inode) > #define 
EXT4_FEATURE_INCOMPAT_LARGEDIR0x4000 /* >2GB or 3-lvl > htree */ > #define EXT4_FEATURE_INCOMPAT_INLINE_DATA 0x8000 /* data in inode */ > #define EXT4_FEATURE_INCOMPAT_ENCRYPT 0x1 > +#define EXT4_FEATURE_INCOMPAT_RICHACL0x2 > > #define EXT2_FEATURE_COMPAT_SUPP EXT4_FEATURE_COMPAT_EXT_ATTR > #define EXT2_FEATURE_INCOMPAT_SUPP(EXT4_FEATURE_INCOMPAT_FILETYPE| \ > @@ -1607,7 +1608,8 @@ static inline int ext4_encrypted_inode(struct inode > *inode) >EXT4_FEATURE_INCOMPAT_FLEX_BG| \ >EXT4_FEATURE_INCOMPAT_MMP | \ >EXT4_FEATURE_INCOMPAT_INLINE_DATA | \ > - EXT4_FEATURE_INCOMPAT_ENCRYPT) > + EXT4_FEATURE_INCOMPAT_ENCRYPT | \ > + EXT4_FEATURE_INCOMPAT_RICHACL) > #define EXT4_FEATURE_RO_COMPAT_SUPP (EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \ >EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \ >EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \ > diff --git a/fs/ext4/super.c b/fs/ext4/super.c > index a63c7b0..7457ea8 100644 > --- a/fs/ext4/super.c > +++ b/fs/ext4/super.c > @@ -1270,6 +1270,28 @@ static ext4_fsblk_t get_sb_block(void **data) > return sb_block; > } > > +static int enable_acl(struct super_block *sb) > +{ > + sb->s_flags &= ~(MS_POSIXACL | MS_RICHACL); > + if (test_opt(sb, ACL)) { > + if (EXT4_HAS_INCOMPAT_FEATURE(sb, > + EXT4_FEATURE_INCOMPAT_RICHACL)) { > +#ifdef CONFIG_EXT4_FS_RICHACL > + sb->s_flags |= MS_RICHACL; > +#else > + return -EOPNOTSUPP; > +#endif > + } else { > +#ifdef CONFIG_EXT4_FS_POSIX_ACL > + sb->s_flags |= MS_POSIXACL; > +#else > + return -EOPNOTSUPP; > +#endif > + } > + } > + return 0; > +} > + > #define DEFAULT_JOURNAL_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3)) > static char deprecated_msg[] = "Mount option \"%s\" will be removed by %s\n" > "Contact linux-e...@vger.kernel.org if you think we should keep it.\n"; > @@ -1416,9 +1438,9 @@ static const struct mount_opts { >MOPT_NO_EXT2 | MOPT_DATAJ}, > {Opt_user_xattr, EXT4_MOUNT_XATTR_USER, MOPT_SET}, > {Opt_nouser_xattr, EXT4_MOUNT_XATTR_USER, MOPT_CLEAR}, > -#ifdef CONFIG_EXT4_FS_POSIX_ACL > - {Opt_acl, 
EXT4_MOUNT_POSIX_ACL, MOPT_SET}, > - {Opt_noacl, EXT4_MOUNT_POSIX_ACL, MOPT_CLEAR}, > +#if defined(CONFIG_EXT4_FS_POSIX_ACL) || defined(CONFIG_EXT4_FS_RICHACL) > + {Opt_acl, EXT4_MOUNT_ACL, MOPT_SET}, > + {Opt_noacl, EXT4_MOUNT_ACL, MOPT_CLEAR}, > #else > {Opt_acl, 0,
Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)
Hi Minchan,

On (11/04/15 10:25), Minchan Kim wrote:
[..]
>+static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>+				unsigned long end, struct mm_walk *walk)
>+
...
> +	if (pmd_trans_unstable(pmd))
> +		return 0;

I think it makes sense to update the pmd_trans_unstable() and pmd_none_or_trans_huge_or_clear_bad() comments in asm-generic/pgtable.h, because they explicitly mention MADV_DONTNEED only. Just a thought.

> @@ -379,6 +502,14 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
> 		return madvise_remove(vma, prev, start, end);
> 	case MADV_WILLNEED:
> 		return madvise_willneed(vma, prev, start, end);
> +	case MADV_FREE:
> +		/*
> +		 * XXX: In this implementation, MADV_FREE works like XXX
> +		 * MADV_DONTNEED on swapless system or full swap.
> +		 */
> +		if (get_nr_swap_pages() > 0)
> +			return madvise_free(vma, prev, start, end);
> +		/* passthrough */

	-ss
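For readers following the thread from user space, here is a minimal sketch of how a caller might use the advice value under discussion. It is not part of the patch. It assumes a Linux system, defines MADV_FREE as 8 when the libc headers lack it (an assumption based on the Linux uapi value), and the helper names release_range() and demo_release() are made up for illustration. Because a kernel without MADV_FREE support rejects the request, the sketch simply retries with MADV_DONTNEED.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE		/* for MAP_ANONYMOUS and madvise() with strict -std= */
#endif
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#ifndef MADV_FREE
#define MADV_FREE 8		/* assumed Linux value for older libc headers */
#endif

/*
 * Hypothetical helper: tell the kernel the range is no longer needed.
 * Tries the lazy MADV_FREE first and falls back to MADV_DONTNEED if
 * the first call fails for any reason (for example an older kernel).
 */
int release_range(void *addr, size_t len)
{
	if (madvise(addr, len, MADV_FREE) == 0)
		return 0;
	return madvise(addr, len, MADV_DONTNEED);
}

/* Map four anonymous pages, dirty them, then release them. */
int demo_release(void)
{
	long page = sysconf(_SC_PAGESIZE);
	size_t len;
	char *buf;
	int rc;

	assert(page > 0);
	len = 4 * (size_t)page;

	buf = mmap(NULL, len, PROT_READ | PROT_WRITE,
		   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
	if (buf == MAP_FAILED)
		return -1;

	memset(buf, 0xaa, len);		/* make the pages dirty */
	rc = release_range(buf, len);
	munmap(buf, len);
	return rc;
}
```

After a successful MADV_FREE the caller must not rely on the old contents: a later read may see either the previous data or zero-filled pages, so the sketch deliberately checks only the return value.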
Re: [GIT PULL] x86/apic changes for v4.4
On Tue, Nov 3, 2015 at 2:32 AM, Ingo Molnar wrote: > > Please pull the latest x86-apic-for-linus git tree from: Side note, I have an *old* patch that I think simplifies the (reasonably common) case of sending IPI's to individual CPU's. That's done by the "reschedule ipi" in particular. That's something that is trivially done on the x2apic, but that the mask-based interfaces makes insanely complicated. I have *not* rebased this on top of modern kernels so it may not actually work as a patch any more, because I'm just sending this out as a "Hmm, what do you guys think" rather than a real submission. Comments? Linus From 9f09a8b2a1fa58f2e692af4a7283fd1674749d39 Mon Sep 17 00:00:00 2001 From: Linus Torvalds Date: Fri, 20 Feb 2015 11:06:02 -0800 Subject: [PATCH] x86: add a single-target IPI function to the apic We still fall back on the "send mask" versions if an apic definition doesn't have the single-target version, but at least this allows the (trivial) case for the common clustered x2apic case. 
Signed-off-by: Linus Torvalds --- arch/x86/include/asm/apic.h | 1 + arch/x86/kernel/apic/x2apic_cluster.c | 9 + arch/x86/kernel/smp.c | 16 ++-- 3 files changed, 24 insertions(+), 2 deletions(-) diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h index 92003f3c8a42..5d47ffcfa65d 100644 --- a/arch/x86/include/asm/apic.h +++ b/arch/x86/include/asm/apic.h @@ -296,6 +296,7 @@ struct apic { unsigned int *apicid); /* ipi */ + void (*send_IPI)(int cpu, int vector); void (*send_IPI_mask)(const struct cpumask *mask, int vector); void (*send_IPI_mask_allbutself)(const struct cpumask *mask, int vector); diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c index e658f21681c8..177c4fb027a1 100644 --- a/arch/x86/kernel/apic/x2apic_cluster.c +++ b/arch/x86/kernel/apic/x2apic_cluster.c @@ -23,6 +23,14 @@ static inline u32 x2apic_cluster(int cpu) return per_cpu(x86_cpu_to_logical_apicid, cpu) >> 16; } +static void x2apic_send_IPI(int cpu, int vector) +{ + u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu); + + x2apic_wrmsr_fence(); + __x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL); +} + static void __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest) { @@ -266,6 +274,7 @@ static struct apic apic_x2apic_cluster = { .cpu_mask_to_apicid_and = x2apic_cpu_mask_to_apicid_and, + .send_IPI = x2apic_send_IPI, .send_IPI_mask = x2apic_send_IPI_mask, .send_IPI_mask_allbutself = x2apic_send_IPI_mask_allbutself, .send_IPI_allbutself = x2apic_send_IPI_allbutself, diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c index be8e1bde07aa..78c9b12d892a 100644 --- a/arch/x86/kernel/smp.c +++ b/arch/x86/kernel/smp.c @@ -114,6 +114,18 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1); static bool smp_no_nmi_ipi = false; /* + * Helper wrapper: not all apic definitions support sending to + * a single CPU, so we fall back to sending to a mask. 
+ */ +static void send_IPI_cpu(int cpu, int vector) +{ + if (apic->send_IPI) + apic->send_IPI(cpu, vector); + else + apic->send_IPI_mask(cpumask_of(cpu), vector); +} + +/* * this function sends a 'reschedule' IPI to another CPU. * it goes straight through and wastes no time serializing * anything. Worst case is that we lose a reschedule ... @@ -124,12 +136,12 @@ static void native_smp_send_reschedule(int cpu) WARN_ON(1); return; } - apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR); + send_IPI_cpu(cpu, RESCHEDULE_VECTOR); } void native_send_call_func_single_ipi(int cpu) { - apic->send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR); + send_IPI_cpu(cpu, CALL_FUNCTION_SINGLE_VECTOR); } void native_send_call_func_ipi(const struct cpumask *mask) -- 2.6.2.402.g2635c2b
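The fallback in send_IPI_cpu() is an instance of the "optional hook with a generic fallback" pattern. The following user-space sketch shows only that pattern; struct ipi_ops, the demo_* functions and the unsigned long mask are hypothetical stand-ins and do not correspond to kernel types.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical user-space stand-in for struct apic. */
struct ipi_ops {
	void (*send_one)(int cpu, int vector);			/* optional, may be NULL */
	void (*send_mask)(unsigned long mask, int vector);	/* always provided */
};

/* Record which path ran so the behaviour can be observed. */
int last_path;			/* 1 = single-target hook, 2 = mask fallback */
unsigned long last_mask;

void demo_send_one(int cpu, int vector)
{
	(void)cpu;
	(void)vector;
	last_path = 1;
}

void demo_send_mask(unsigned long mask, int vector)
{
	(void)vector;
	last_path = 2;
	last_mask = mask;
}

/*
 * Same shape as the wrapper in the patch: prefer the single-target
 * hook and build a one-bit mask only when the hook is missing.
 */
void send_ipi_cpu(const struct ipi_ops *ops, int cpu, int vector)
{
	assert(ops && ops->send_mask);
	assert(cpu >= 0 && cpu < 32);	/* keep the demo shift well defined */

	if (ops->send_one)
		ops->send_one(cpu, vector);
	else
		ops->send_mask(1UL << cpu, vector);
}
```

The design point is that callers such as the reschedule path never need to know which hook a given implementation provides, so implementations can gain the faster path one at a time.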
Re: [PATCH v13 20/51] ext4: Add richacl support
On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher wrote: > > From: "Aneesh Kumar K.V" > > Support the richacl permission model in ext4. The richacls are stored > in "system.richacl" xattrs. Richacls need to be enabled by tune2fs or > at file system create time. Patch looks reasonable. One minor cleanup below that could be fixed when the patch series is refreshed, and you can add: Reviewed-by: Andreas Dilger > > Signed-off-by: Aneesh Kumar K.V > Signed-off-by: Andreas Gruenbacher > --- > fs/ext4/Kconfig | 11 + > fs/ext4/Makefile | 1 + > fs/ext4/file.c| 3 ++ > fs/ext4/ialloc.c | 11 - > fs/ext4/inode.c | 12 - > fs/ext4/namei.c | 5 ++ > fs/ext4/richacl.c | 141 ++ > fs/ext4/richacl.h | 40 > fs/ext4/xattr.c | 7 +++ > 9 files changed, 228 insertions(+), 3 deletions(-) > create mode 100644 fs/ext4/richacl.c > create mode 100644 fs/ext4/richacl.h > > diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig > index b46e9fc..65c5230 100644 > --- a/fs/ext4/Kconfig > +++ b/fs/ext4/Kconfig > @@ -22,6 +22,17 @@ config EXT3_FS_POSIX_ACL > This config option is here only for backward compatibility. ext3 > filesystem is now handled by the ext4 driver. > > +config EXT4_FS_RICHACL > + bool "Ext4 Rich Access Control Lists (EXPERIMENTAL)" > + depends on EXT4_FS > + select FS_RICHACL > + help > + Richacls are an implementation of NFSv4 ACLs, extended by file masks > + to cleanly integrate into the POSIX file permission model. To learn > + more about them, see http://www.bestbits.at/richacl/. > + > + If you don't know what Richacls are, say N. 
> + > config EXT3_FS_SECURITY > bool "Ext3 Security Labels" > depends on EXT3_FS > diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile > index 75285ea..ea0d539 100644 > --- a/fs/ext4/Makefile > +++ b/fs/ext4/Makefile > @@ -14,3 +14,4 @@ ext4-$(CONFIG_EXT4_FS_POSIX_ACL)+= acl.o > ext4-$(CONFIG_EXT4_FS_SECURITY) += xattr_security.o > ext4-$(CONFIG_EXT4_FS_ENCRYPTION) += crypto_policy.o crypto.o \ > crypto_key.o crypto_fname.o > +ext4-$(CONFIG_EXT4_FS_RICHACL) += richacl.o > diff --git a/fs/ext4/file.c b/fs/ext4/file.c > index 113837e..a03b4a5 100644 > --- a/fs/ext4/file.c > +++ b/fs/ext4/file.c > @@ -30,6 +30,7 @@ > #include "ext4_jbd2.h" > #include "xattr.h" > #include "acl.h" > +#include "richacl.h" > > /* > * Called when an inode is released. Note that this is different > @@ -719,6 +720,8 @@ const struct inode_operations ext4_file_inode_operations > = { > .removexattr= generic_removexattr, > .get_acl= ext4_get_acl, > .set_acl= ext4_set_acl, > + .get_richacl= ext4_get_richacl, > + .set_richacl= ext4_set_richacl, > .fiemap = ext4_fiemap, > }; > > diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c > index 619bfc1..9657b3a 100644 > --- a/fs/ext4/ialloc.c > +++ b/fs/ext4/ialloc.c > @@ -27,6 +27,7 @@ > #include "ext4_jbd2.h" > #include "xattr.h" > #include "acl.h" > +#include "richacl.h" > > #include > > @@ -697,6 +698,14 @@ out: > return ret; > } > > +static inline int > +ext4_new_acl(handle_t *handle, struct inode *inode, struct inode *dir) > +{ > + if (IS_RICHACL(dir)) > + return ext4_init_richacl(handle, inode, dir); > + return ext4_init_acl(handle, inode, dir); > +} > + > /* > * There are two policies for allocating an inode. 
If the new inode is > * a directory, then a forward search is made for a block group with both > @@ -1052,7 +1061,7 @@ got: > if (err) > goto fail_drop; > > - err = ext4_init_acl(handle, inode, dir); > + err = ext4_new_acl(handle, inode, dir); > if (err) > goto fail_free_drop; > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c > index 612fbcf..647f3c3 100644 > --- a/fs/ext4/inode.c > +++ b/fs/ext4/inode.c > @@ -42,6 +42,7 @@ > #include "xattr.h" > #include "acl.h" > #include "truncate.h" > +#include "richacl.h" > > #include > > @@ -4638,6 +4639,14 @@ static void ext4_wait_for_tail_page_commit(struct > inode *inode) > } > } > > +static inline int > +ext4_acl_chmod(struct inode *inode, umode_t mode) > +{ > + if (IS_RICHACL(inode)) > + return richacl_chmod(inode, inode->i_mode); > + return posix_acl_chmod(inode, inode->i_mode); > +} > + > /* > * ext4_setattr() > * > @@ -4806,8 +4815,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr > *attr) > ext4_orphan_del(NULL, inode); > > if (!rc && (ia_valid & ATTR_MODE)) > - rc = posix_acl_chmod(inode, inode->i_mode); > - > + rc = ext4_acl_chmod(inode, inode->i_mode); > err_out: > ext4_std_error(inode->i_sb, error); > if (!error) > diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c > index 9f61e76..9b6e8b9 100644 > --- a/fs/ext4/namei.c > +++ b/fs/ext4/namei.c > @@
Re: [PATCH v13 12/51] vfs: Cache richacl in struct inode
On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher wrote: > > Cache richacls in struct inode so that this doesn't have to be done > individually in each filesystem. This is similar to POSIX ACLs. > > Signed-off-by: Andreas Gruenbacher > --- > fs/inode.c | 11 ++-- > fs/posix_acl.c | 2 +- > fs/richacl_base.c | 4 +-- > fs/richacl_inode.c | 75 + > include/linux/fs.h | 5 +++- > include/linux/richacl.h | 15 ++ > 6 files changed, 100 insertions(+), 12 deletions(-) > > diff --git a/fs/inode.c b/fs/inode.c > index 2a387f4..8462ddb 100644 > --- a/fs/inode.c > +++ b/fs/inode.c > @@ -174,8 +174,11 @@ int inode_init_always(struct super_block *sb, struct > inode *inode) > inode->i_private = NULL; > inode->i_mapping = mapping; > INIT_HLIST_HEAD(>i_dentry); /* buggered by rcu freeing */ > -#ifdef CONFIG_FS_POSIX_ACL > - inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED; > +#if defined(CONFIG_FS_POSIX_ACL) || defined(CONFIG_FS_RICHACL) > + inode->i_acl = ACL_NOT_CACHED; > +# if defined(CONFIG_FS_POSIX_ACL) > + inode->i_default_acl = ACL_NOT_CACHED; > +# endif > #endif > > #ifdef CONFIG_FSNOTIFY > @@ -231,11 +234,13 @@ void __destroy_inode(struct inode *inode) > atomic_long_dec(>i_sb->s_remove_count); > } > > -#ifdef CONFIG_FS_POSIX_ACL > +#if defined(CONFIG_FS_POSIX_ACL) || defined(CONFIG_FS_RICHACL) > if (inode->i_acl && inode->i_acl != ACL_NOT_CACHED) > put_base_acl(inode->i_acl); > +# if defined(CONFIG_FS_POSIX_ACL) > if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED) > put_base_acl(inode->i_default_acl); > +# endif > #endif > this_cpu_dec(nr_inodes); > } > diff --git a/fs/posix_acl.c b/fs/posix_acl.c > index b3b2265..1d766a5 100644 > --- a/fs/posix_acl.c > +++ b/fs/posix_acl.c > @@ -38,7 +38,7 @@ struct posix_acl *get_cached_acl(struct inode *inode, int > type) > { > struct posix_acl **p = acl_by_type(inode, type); > struct posix_acl *acl = ACCESS_ONCE(*p); > - if (acl) { > + if (acl && IS_POSIXACL(inode)) { > spin_lock(>i_lock); > acl = *p; > if (acl != 
ACL_NOT_CACHED) > diff --git a/fs/richacl_base.c b/fs/richacl_base.c > index 69b806c..d0ab5e9 100644 > --- a/fs/richacl_base.c > +++ b/fs/richacl_base.c > @@ -33,7 +33,7 @@ richacl_alloc(int count, gfp_t gfp) > struct richacl *acl = kzalloc(size, gfp); > > if (acl) { > - atomic_set(>a_refcount, 1); > + atomic_set(>a_base.ba_refcount, 1); > acl->a_count = count; > } > return acl; > @@ -52,7 +52,7 @@ richacl_clone(const struct richacl *acl, gfp_t gfp) > > if (dup) { > memcpy(dup, acl, size); > - atomic_set(>a_refcount, 1); > + atomic_set(>a_base.ba_refcount, 1); These two calls should be base_acl_init(). There isn't any point to have an abstraction that is only used by the posix_acl* code and not the richacl code, as it will just result in richacl breaking in the future if/when the abstraction is changed. > } > return dup; > } > diff --git a/fs/richacl_inode.c b/fs/richacl_inode.c > index 99b3c93..c41a6c4 100644 > --- a/fs/richacl_inode.c > +++ b/fs/richacl_inode.c > @@ -20,6 +20,81 @@ > #include > #include > > +struct richacl *get_cached_richacl(struct inode *inode) > +{ > + struct richacl *acl; > + > + acl = (struct richacl *)ACCESS_ONCE(inode->i_acl); > + if (acl && IS_RICHACL(inode)) { > + spin_lock(>i_lock); > + acl = (struct richacl *)inode->i_acl; > + if (acl != ACL_NOT_CACHED) > + acl = richacl_get(acl); > + spin_unlock(>i_lock); > + } > + return acl; > +} > +EXPORT_SYMBOL_GPL(get_cached_richacl); > + > +struct richacl *get_cached_richacl_rcu(struct inode *inode) > +{ > + return (struct richacl *)rcu_dereference(inode->i_acl); > +} > +EXPORT_SYMBOL_GPL(get_cached_richacl_rcu); > + > +void set_cached_richacl(struct inode *inode, struct richacl *acl) > +{ > + struct base_acl *old = NULL; > + > + spin_lock(>i_lock); > + old = inode->i_acl; > + rcu_assign_pointer(inode->i_acl, _get(acl)->a_base); > + spin_unlock(>i_lock); > + if (old != ACL_NOT_CACHED) > + put_base_acl(old); > +} > +EXPORT_SYMBOL_GPL(set_cached_richacl); > + > +void forget_cached_richacl(struct 
inode *inode) > +{ > + struct base_acl *old = NULL; > + > + spin_lock(>i_lock); > + old = inode->i_acl; > + inode->i_acl = ACL_NOT_CACHED; > + spin_unlock(>i_lock); > + if (old != ACL_NOT_CACHED) > + put_base_acl(old); > +} > +EXPORT_SYMBOL_GPL(forget_cached_richacl); > + > +struct richacl *get_richacl(struct inode *inode) > +{ > + struct richacl *acl; > + > + acl = get_cached_richacl(inode); > + if (acl != ACL_NOT_CACHED) > +
Re: [PATCH v11 22/28] tracing: Add enable_hist/disable_hist triggers
On Tue, 2015-11-03 at 17:55 +0900, Namhyung Kim wrote: > On Thu, Oct 22, 2015 at 01:14:26PM -0500, Tom Zanussi wrote: > > Similar to enable_event/disable_event triggers, these triggers enable > > and disable the aggregation of events into maps rather than enabling > > and disabling their writing into the trace buffer. > > > > They can be used to automatically start and stop hist triggers based > > on a matching filter condition. > > > > If there's a paused hist trigger on system:event, the following would > > start it when the filter condition was hit: > > > > # echo enable_hist:system:event [ if filter] > event/trigger > > > > And the following would disable a running system:event hist trigger: > > > > # echo disable_hist:system:event [ if filter] > event/trigger > > What about named hist triggers? Maybe worth adding > "enable/disable_hist_name" too? > enable_disable_hist should work fine with both named and unnamed hist triggers. As an example, the below sets a couple each of both named and unnamed triggers and enables and disables them using enable/disable_hist. Running this I got the expected output i.e. 
there were events captured in both named and unnamed histograms:

echo 'hist:name=foo:keys=skbaddr.hex:vals=len:pause if len < 0' > /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
echo 'hist:name=foo:keys=skbaddr.hex:vals=len:pause if len > 256' > /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
echo 'hist:keys=skbaddr.hex:vals=len:pause if len == 256' > /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
echo 'hist:keys=skbaddr.hex:vals=len:pause' > /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
echo 'disable_hist:net:netif_receive_skb if comm==wget' > /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger

Thanks,

Tom

> Thanks,
> Namhyung
>
> >
> > See Documentation/trace/events.txt for real examples.
> >
> > Signed-off-by: Tom Zanussi
> > Tested-by: Masami Hiramatsu
Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)
Hi Brendan, On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote: > On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo > wrote: > > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu: > >> Hello, > >> > >> This is what Brendan requested on the perf-users mailing list [1] to > >> support FlameGraphs [2] more efficiently. This patchset adds a few > >> more callchain options to adjust the output for it. > >> > >> * changes in v4) > >> - add missing doc update > >> - cleanup/fix callchain value print code > >> - add Acked-by from Brendan and Jiri > > > > Do those Acked-by stand? Things changed, the values moved from the end > > of the line to the start, etc. > > > [...] > > I'd Ack this change as it's a useful addition. It doesn't quite > address the folded-only output, but it's a step in that direction. I > think having the value at the start of a line only makes sense for the > perf report output containing the hist summary lines, for consistency. Right, thanks! 
> > Here's how I'd shuffle the output of this patch (ignore word wrap > issues with this email): > > # ./perf report --stdio -g folded,count,caller -F pid | \ > awk '/^ / { n = $1 } > /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }' > swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op > 809 > swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op > 135 > dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version > 63 > dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf > 54 > dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms > 3 > dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3 > > So the output is folded stacks, prefixed by comm-PID. Shuffling the > summarized output is a lot better than doing a "perf script" dump and > re-processing call chains. (Note that since I'm using -F, I didn't > need --no-children; Nope. The '-F pid' doesn't affect --children. It doesn't show the children overhead column but we still have hist entries for (synthesized) children.. $ perf report --no-children | wc -l 998 $ perf report --no-children -F pid,dso,sym | wc -l 998 $ perf report --children | wc -l 3229 $ perf report --children -F pid,dso,sym | wc -l 3202 So I think you still need to use --no-children (or set report.children config variable to false) for your script. > and with "-g count", I didn't need --show-nr-samples.) Yes, I used -n/--show-nr-samples just to check the number is correct. 
>
> I notice the fields (-F) option already has this precedent:
>
> - "comm": prints PID:comm
> - "pid": prints PID

It's the opposite: "comm" prints comm, "pid" prints PID:comm. :)

>
> If these were added to -g, along with a no-hists, then the two types
> of folded-only output could be generated using:
>
> perf report --stdio -g folded,count,comm,no-hists,caller
> perf report --stdio -g folded,count,pid,no-hists,caller

As I said, using fields like comm and pid requires having the same keys in the --sort option. So it's basically unreliable to use those specific field names in the -g option IMHO. I suggested using 'info' (yes, it needs a better name) to print all sort keys.

>
> ... although "no-hists" doesn't hit me as intuitive. How about "-F
> none" to specify zero columns? ie:
>
> perf report --stdio -g folded,count,comm,caller -F none
> perf report --stdio -g folded,count,pid,caller -F none

Ah, makes sense. So it'd look like

  $ perf report --stdio -g folded,count,info -F none -s comm
  $ perf report --stdio -g folded,count,info -F none -s pid

The output would be

  809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op

Thoughts?

Thanks,
Namhyung
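The reshuffling done by the awk one-liner quoted earlier in this message can be spelled out as a small function. This is only an illustrative sketch: fold_line() is a made-up name, and it assumes the "pid" field has the PID:comm form described above and that the count and the folded stack have already been split out of the callchain line.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper mirroring the awk script: given a "PID:comm"
 * field plus the count and folded stack from a callchain line, write
 * "comm-PID;stack count" into out. Returns 0 on success and -1 if the
 * field has no ':' or the buffer is too small.
 */
int fold_line(const char *pid_comm, const char *count, const char *stack,
	      char *out, size_t outlen)
{
	const char *colon;
	int n;

	assert(pid_comm && count && stack && out);

	colon = strchr(pid_comm, ':');
	if (!colon)
		return -1;

	/* comm is everything after ':', PID is everything before it. */
	n = snprintf(out, outlen, "%s-%.*s;%s %s",
		     colon + 1, (int)(colon - pid_comm), pid_comm,
		     stack, count);
	if (n < 0 || (size_t)n >= outlen)
		return -1;
	return 0;
}
```

Calling fold_line("0:swapper", "809", "cpu_bringup_and_idle;cpu_startup_entry", buf, sizeof(buf)) yields "swapper-0;cpu_bringup_and_idle;cpu_startup_entry 809", which is the comm-PID prefixed shape shown in the quoted sample output.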
Re: [PATCH v4 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend
On Wed, Nov 04, 2015 at 09:11:10AM +0800, Bob Liu wrote: > > On 11/04/2015 04:40 AM, Konrad Rzeszutek Wilk wrote: > > On Mon, Nov 02, 2015 at 12:21:41PM +0800, Bob Liu wrote: > >> The number of hardware queues for xen/blkfront is set by parameter > >> 'max_queues'(default 4), while the max value xen/blkback supported is > >> notified > >> through xenstore("multi-queue-max-queues"). > > > > That is not right. > > > > s/The number/The max number/ > > > > The second part: ",while the max value xen/blkback supported is..". I think > > you are trying to say: "it is also capped by the max value that > > the xen/blkback exposes through XenStore key 'multi-queue-max-queues'. > > > >> > >> The negotiated number is the smaller one and would be written back to > >> xenstore > >> as "multi-queue-num-queues", blkback need to read this negotiated number. > > > > s/blkback need to read/blkback needs to read this/ > > > >> > >> Signed-off-by: Bob Liu > >> --- > >> drivers/block/xen-blkfront.c | 166 > >> +++ > >> 1 file changed, 120 insertions(+), 46 deletions(-) > >> > >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > >> index 8cc5995..23096d7 100644 > >> --- a/drivers/block/xen-blkfront.c > >> +++ b/drivers/block/xen-blkfront.c > >> @@ -98,6 +98,10 @@ static unsigned int xen_blkif_max_segments = 32; > >> module_param_named(max, xen_blkif_max_segments, int, S_IRUGO); > >> MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests > >> (default is 32)"); > >> > >> +static unsigned int xen_blkif_max_queues = 4; > >> +module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO); > >> +MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings > >> used per virtual disk"); > >> + > >> /* > >> * Maximum order of pages to be used for the shared ring between front and > >> * backend, 4KB page granularity is used. 
> >> @@ -113,6 +117,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order > >> of pages to be used for the > >> * characters are enough. Define to 20 to keep consist with backend. > >> */ > >> #define RINGREF_NAME_LEN (20) > >> +#define QUEUE_NAME_LEN (12) > > > > Little bit of documentation please. Why 12? Why not 31415 for example? > > I presume it is 'queue-%u' and since so that is 7 + 10 (UINT_MAX is > > 4294967295) = 17! > > > > > >> > >> /* > >> * Per-ring info. > >> @@ -695,7 +700,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, > >> u16 sector_size, > >> > >>memset(>tag_set, 0, sizeof(info->tag_set)); > >>info->tag_set.ops = _mq_ops; > >> - info->tag_set.nr_hw_queues = 1; > >> + info->tag_set.nr_hw_queues = info->nr_rings; > >>info->tag_set.queue_depth = BLK_RING_SIZE(info); > >>info->tag_set.numa_node = NUMA_NO_NODE; > >>info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE; > >> @@ -1352,6 +1357,51 @@ fail: > >>return err; > >> } > >> > >> +static int write_per_ring_nodes(struct xenbus_transaction xbt, > >> + struct blkfront_ring_info *rinfo, const char > >> *dir) > >> +{ > >> + int err, i; > > > > Please make 'i' be an unsigned int. Especially as you are using '%u' in the > > snprintf. > > > > > >> + const char *message = NULL; > >> + struct blkfront_info *info = rinfo->dev_info; > >> + > >> + if (info->nr_ring_pages == 1) { > >> + err = xenbus_printf(xbt, dir, "ring-ref", "%u", > >> rinfo->ring_ref[0]); > >> + if (err) { > >> + message = "writing ring-ref"; > >> + goto abort_transaction; > >> + } > >> + pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[0]); > > > > Ewww. No. 
> > > >> + } else { > >> + for (i = 0; i < info->nr_ring_pages; i++) { > >> + char ring_ref_name[RINGREF_NAME_LEN]; > >> + > >> + snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", > >> i); > >> + err = xenbus_printf(xbt, dir, ring_ref_name, > >> + "%u", rinfo->ring_ref[i]); > >> + if (err) { > >> + message = "writing ring-ref"; > >> + goto abort_transaction; > >> + } > >> + pr_info("%s: write ring-ref:%d\n", dir, > >> rinfo->ring_ref[i]); > > > > No no please. > > > >> + } > >> + } > >> + > >> + err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn); > > > > That is not right. > > > > That only creates one. But the blkif.h says (And the example agrees) > > that there are N 'event-channel' for N-rings. > > > > Shouldn't this be part of the above loop? > > > > No, this loop is only for per-ring each with "multipage". > > The loop you want is.. > > >> + if (err) { > >> + message = "writing event-channel"; > >> + goto abort_transaction; > >> + } > >> + pr_info("%s: write
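On the QUEUE_NAME_LEN question raised in the review above: the buffer size needed for a per-queue name can be computed instead of guessed. The following is a minimal userspace sketch, assuming the node name really has the form "queue-%u" (the review only presumes this). The helpers queue_name_len() and negotiate_queues() are hypothetical names used for illustration and are not part of the patch.

```c
#include <limits.h>
#include <stdio.h>

/*
 * Hypothetical helper: number of bytes needed to hold "queue-<idx>",
 * including the terminating NUL. snprintf() with a NULL buffer and
 * size 0 only reports the length it would have written.
 */
static int queue_name_len(unsigned int idx)
{
	return snprintf(NULL, 0, "queue-%u", idx) + 1;
}

/*
 * Hypothetical helper: the negotiated queue count is the smaller of
 * the frontend limit (module parameter) and the backend limit
 * (advertised through xenstore).
 */
static unsigned int negotiate_queues(unsigned int front_max,
				     unsigned int back_max)
{
	return front_max < back_max ? front_max : back_max;
}
```

With a 32-bit unsigned int, the worst case is 6 characters for "queue-", 10 digits for 4294967295 and one NUL byte, so a 12-byte buffer would be too small for the largest index.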
Re: [PATCH v11 13/28] tracing: Add hist trigger support for pausing and continuing a trace
On Tue, 2015-11-03 at 17:38 +0900, Namhyung Kim wrote: > On Thu, Oct 22, 2015 at 01:14:17PM -0500, Tom Zanussi wrote: > > Allow users to append 'pause' or 'continue' to an existing trigger in > > order to have it paused or to have a paused trace continue. > > > > This expands the hist trigger syntax from this: > > # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \ > > [ if filter] > event/trigger > > > > to this: > > > > # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \ > > [ if filter] > event/trigger > > What if 'cont' is used for a non-existing hist? I think it should > fail with -ENOENT but it seems that it'd create a new hist trigger.. > You're right - 'pause' is the only one that should create a new trigger, not 'cont' or 'clear'. Thanks, Tom > Thanks, > Namhyung > > > > > > Signed-off-by: Tom Zanussi > > Tested-by: Masami Hiramatsu > > --- > > kernel/trace/trace.c | 7 ++- > > kernel/trace/trace_events_hist.c | 26 +++--- > > 2 files changed, 29 insertions(+), 4 deletions(-) > > > > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c > > index efa8d05..a223959 100644 > > --- a/kernel/trace/trace.c > > +++ b/kernel/trace/trace.c > > @@ -3802,6 +3802,7 @@ static const char readme_msg[] = > > "\t[:values= > "\t[:sort=field1,field2,...]\n" > > "\t[:size=#entries]\n" > > + "\t[:pause][:continue]\n" > > "\t[if ]\n\n" > > "\tWhen a matching event is hit, an entry is added to a hash\n" > > "\ttable using the key(s) and value(s) named, and the value of a\n" > > @@ -3816,7 +3817,11 @@ static const char readme_msg[] = > > "\tused to specify more or fewer than the default 2048 entries\n" > > "\tfor the hashtable size.\n\n" > > "\tReading the 'hist' file for the event will dump the hash\n" > > - "\ttable in its entirety to stdout." > > + "\ttable in its entirety to stdout.\n\n" > > + "\tThe 'pause' parameter can be used to pause an existing hist\n" > > + "\ttrigger or to start a hist trigger but not log any events\n" > > + "\tuntil told to do so. 
'continue' can be used to start or\n" > > + "\trestart a paused hist trigger.\n\n" > > #endif > > ; > > > > diff --git a/kernel/trace/trace_events_hist.c > > b/kernel/trace/trace_events_hist.c > > index 4fc3136..8753c3b 100644 > > --- a/kernel/trace/trace_events_hist.c > > +++ b/kernel/trace/trace_events_hist.c > > @@ -78,6 +78,8 @@ struct hist_trigger_attrs { > > char*keys_str; > > char*vals_str; > > char*sort_key_str; > > + boolpause; > > + boolcont; > > unsigned intmap_bits; > > }; > > > > @@ -184,6 +186,11 @@ static struct hist_trigger_attrs > > *parse_hist_trigger_attrs(char *trigger_str) > > attrs->vals_str = kstrdup(str, GFP_KERNEL); > > else if (!strncmp(str, "sort", strlen("sort"))) > > attrs->sort_key_str = kstrdup(str, GFP_KERNEL); > > + else if (!strncmp(str, "pause", strlen("pause"))) > > + attrs->pause = true; > > + else if (!strncmp(str, "continue", strlen("continue")) || > > +!strncmp(str, "cont", strlen("cont"))) > > + attrs->cont = true; > > else if (!strncmp(str, "size", strlen("size"))) { > > int map_bits = parse_map_size(str); > > > > @@ -843,7 +850,10 @@ static int event_hist_trigger_print(struct seq_file *m, > > if (data->filter_str) > > seq_printf(m, " if %s", data->filter_str); > > > > - seq_puts(m, " [active]"); > > + if (data->paused) > > + seq_puts(m, " [paused]"); > > + else > > + seq_puts(m, " [active]"); > > > > seq_putc(m, '\n'); > > > > @@ -882,16 +892,25 @@ static int hist_register_trigger(char *glob, struct > > event_trigger_ops *ops, > > struct event_trigger_data *data, > > struct trace_event_file *file) > > { > > + struct hist_trigger_data *hist_data = data->private_data; > > struct event_trigger_data *test; > > int ret = 0; > > > > list_for_each_entry_rcu(test, >triggers, list) { > > if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) { > > - ret = -EEXIST; > > + if (hist_data->attrs->pause) > > + test->paused = true; > > + else if (hist_data->attrs->cont) > > + test->paused = false; > > + else > > + ret = -EEXIST; > > goto out; 
> > } > > } > > > > + if (hist_data->attrs->pause) > > + data->paused = true; > > + > > if (data->ops->init) { > > ret = data->ops->init(data->ops,
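The behaviour agreed on in this thread ('pause' may create a new trigger, 'cont' must not) can be sketched outside the kernel as follows. This is only an illustration under simplified assumptions: struct hist_trigger, struct hist_attrs and hist_register() are hypothetical stand-ins, not the actual code in trace_events_hist.c.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins for the kernel structures. */
struct hist_trigger {
	bool paused;
};

struct hist_attrs {
	bool pause;
	bool cont;
};

/*
 * existing: the hist trigger already attached to the event, or NULL.
 * fresh:    the trigger that would be registered if none exists.
 * created:  set to true only when 'fresh' is actually registered.
 */
static int hist_register(struct hist_trigger *existing,
			 struct hist_trigger *fresh,
			 const struct hist_attrs *attrs, bool *created)
{
	*created = false;

	if (existing) {
		if (attrs->pause)
			existing->paused = true;
		else if (attrs->cont)
			existing->paused = false;
		else
			return -EEXIST;
		return 0;
	}

	/* Nothing to continue: fail instead of creating a trigger. */
	if (attrs->cont)
		return -ENOENT;

	/* 'pause' on a new trigger starts it in the paused state. */
	fresh->paused = attrs->pause;
	*created = true;
	return 0;
}
```

The early -ENOENT return is the piece missing from the quoted hunk, where 'cont' on an event without a hist trigger falls through to normal registration.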
Re: [PATCH v11 24/28] tracing: Add support for multiple hist triggers per event
On Tue, 2015-11-03 at 09:34 +0900, Namhyung Kim wrote: > On Thu, Oct 22, 2015 at 01:14:28PM -0500, Tom Zanussi wrote: > > Allow users to define any number of hist triggers per trace event. > > Any number of hist triggers may be added for a given event, which may > > differ by key, value, or filter. > > > > Reading the event's 'hist' file will display the output of all the > > hist triggers defined on an event concatenated in the order they were > > defined. > > > > Signed-off-by: Tom Zanussi > > Tested-by: Masami Hiramatsu > > --- > > Documentation/trace/events.txt | 151 > > +-- > > kernel/trace/trace.c | 8 ++- > > kernel/trace/trace_events_hist.c | 138 ++- > > 3 files changed, 256 insertions(+), 41 deletions(-) > > > > diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt > > index b3aa47e..6c64cb7 100644 > > --- a/Documentation/trace/events.txt > > +++ b/Documentation/trace/events.txt > > @@ -532,12 +532,14 @@ The following commands are supported: > > > >'hist' triggers add a 'hist' file to each event's subdirectory. > >Reading the 'hist' file for the event will dump the hash table in > > - its entirety to stdout. Each printed hash table entry is a simple > > - list of the keys and values comprising the entry; keys are printed > > - first and are delineated by curly braces, and are followed by the > > - set of value fields for the entry. By default, numeric fields are > > - displayed as base-10 integers. This can be modified by appending > > - any of the following modifiers to the field name: > > + its entirety to stdout. If there are multiple hist triggers > > + attached to an event, there will be a table for each trigger in the > > + output. Each printed hash table entry is a simple list of the keys > > + and values comprising the entry; keys are printed first and are > > + delineated by curly braces, and are followed by the set of value > > + fields for the entry. By default, numeric fields are displayed as > > + base-10 integers. 
This can be modified by appending any of the > > + following modifiers to the field name: > > > > .hexdisplay a number as a hex value > > .symdisplay an address as a symbol > > @@ -1629,3 +1631,140 @@ The following commands are supported: > > . > > . > > . > > + > > + The following example demonstrates how multiple hist triggers can be > > + attached to a given event. This capability can be useful for > > + creating a set of different summaries derived from the same set of > > + events, or for comparing the effects of different filters, among > > + other things. > > + > > +# echo 'hist:keys=skbaddr.hex:vals=len if len < 0' > \ > > + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger > > +# echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' > \ > > + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger > > +# echo 'hist:keys=skbaddr.hex:vals=len if len == 256' > \ > > + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger > > +# echo 'hist:keys=skbaddr.hex:vals=len' > \ > > + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger > > +# echo 'hist:keys=len:vals=common_preempt_count' > \ > > + /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger > > AFAIK other tracefs files honor the truncation flag so open for > writing would destroy other hist triggers. What do you think? > Yeah, I think you're right - adding multiple triggers should use >> (and > should truncate as mentioned). > > > + > > + The above set of commands create four triggers differing only in > > + their filters, along with a completely different though fairly > > + nonsensical trigger. > > [SNIP] > > @@ -1289,22 +1368,18 @@ static int event_hist_trigger_func(struct > > event_command *cmd_ops, > > > > trigger_data->private_data = hist_data; > > > > + if (param) { /* if param is non-empty, it's supposed to be a filter */ > > + ret = cmd_ops->set_filter(param, trigger_data, file); > > Maybe you want to check ->set_filter being NULL first. 
:) Yep, thanks. Tom > Thanks, > Namhyung > > > > + if (ret < 0) > > + goto out_free; > > + } > > + > > if (glob[0] == '!') { > > cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file); > > ret = 0; > > goto out_free; > > } > > > > - if (!param) /* if param is non-empty, it's supposed to be a filter */ > > - goto out_reg; > > - > > - if (!cmd_ops->set_filter) > > - goto out_reg; > > - > > - ret = cmd_ops->set_filter(param, trigger_data, file); > > - if (ret < 0) > > - goto out_free; > > - out_reg: > > ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file); > > /* > > * The above returns on success
Re: [PATCH v4 04/10] xen/blkfront: split per device io_lock
On Wed, Nov 04, 2015 at 09:07:12AM +0800, Bob Liu wrote: > > On 11/04/2015 04:09 AM, Konrad Rzeszutek Wilk wrote: > > On Mon, Nov 02, 2015 at 12:21:40PM +0800, Bob Liu wrote: > >> The per device io_lock became a coarser grained lock after > >> multi-queues/rings > >> was introduced, this patch introduced a fine-grained ring_lock for each > >> ring. > > > > s/was introduced/was introduced (see commit titled XYZ)/ > > > > s/introdued/introduces/ > >> > >> The old io_lock was renamed to dev_lock and only protect the ->grants list > > > > s/was/is/ > > s/protect/protects/ > > > >> which is shared by all rings. > >> > >> Signed-off-by: Bob Liu > >> --- > >> drivers/block/xen-blkfront.c | 57 > >> ++-- > >> 1 file changed, 34 insertions(+), 23 deletions(-) > >> > >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c > >> index eab78e7..8cc5995 100644 > >> --- a/drivers/block/xen-blkfront.c > >> +++ b/drivers/block/xen-blkfront.c > >> @@ -121,6 +121,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order > >> of pages to be used for the > >> */ > >> struct blkfront_ring_info { > >>struct blkif_front_ring ring; > > > > Can you add a comment explaining the lock semantic? As in under what > > conditions > > should it be taken? Like you have it below. > > > >> + spinlock_t ring_lock; > >>unsigned int ring_ref[XENBUS_MAX_RING_PAGES]; > >>unsigned int evtchn, irq; > >>struct work_struct work; > >> @@ -138,7 +139,8 @@ struct blkfront_ring_info { > >> */ > >> struct blkfront_info > >> { > >> - spinlock_t io_lock; > >> + /* Lock to proect info->grants list shared by multi rings */ > > > > s/proect/protect/ > > > > Missing full stop. > > > >> + spinlock_t dev_lock; > > > > Shouldn't it be right next to what it is protecting? > > > > That is right below (or above): 'struct list_head grants;'? 
> > > >>struct mutex mutex; > >>struct xenbus_device *xbdev; > >>struct gendisk *gd; > >> @@ -224,6 +226,7 @@ static int fill_grant_buffer(struct blkfront_ring_info > >> *rinfo, int num) > >>struct grant *gnt_list_entry, *n; > >>int i = 0; > >> > >> + spin_lock_irq(>dev_lock); > > > > Why there? Why not where you add it to the list? > >>while(i < num) { > >>gnt_list_entry = kzalloc(sizeof(struct grant), GFP_NOIO); > >>if (!gnt_list_entry) > >> @@ -242,6 +245,7 @@ static int fill_grant_buffer(struct blkfront_ring_info > >> *rinfo, int num) > >>list_add(_list_entry->node, >grants); > > > > Right here that is? > > > > You are holding the lock for the duration of 'kzalloc' and 'alloc_page'. > > > > And more interestingly, GFP_NOIO translates to __GFP_WAIT which means > > it can call 'schedule'. - And you have taken an spinlock. That should > > have thrown lots of warnings? > > > >>i++; > >>} > >> + spin_unlock_irq(>dev_lock); > >> > >>return 0; > >> > >> @@ -254,6 +258,7 @@ out_of_memory: > >>kfree(gnt_list_entry); > >>i--; > >>} > >> + spin_unlock_irq(>dev_lock); > > > > Just do it around the 'list_del' operation. You are using an > > 'safe' > >>BUG_ON(i != 0); > >>return -ENOMEM; > >> } > >> @@ -265,6 +270,7 @@ static struct grant *get_grant(grant_ref_t *gref_head, > >>struct grant *gnt_list_entry; > >>unsigned long buffer_gfn; > >> > >> + spin_lock(>dev_lock); > >>BUG_ON(list_empty(>grants)); > >>gnt_list_entry = list_first_entry(>grants, struct grant, > >> node); > >> @@ -272,8 +278,10 @@ static struct grant *get_grant(grant_ref_t *gref_head, > >> > >>if (gnt_list_entry->gref != GRANT_INVALID_REF) { > >>info->persistent_gnts_c--; > >> + spin_unlock(>dev_lock); > >>return gnt_list_entry; > >>} > >> + spin_unlock(>dev_lock); > > > > Just have one spin_unlock. Put it right before the 'if > > (gnt_list_entry->gref)..'. > > That's used to protect info->persistent_gnts_c, will update all other place. 
But you don't mention that in the description - that the lock is supposed
to also protect persistent_gnts_c. Please update that.

>
> Thanks,
> -Bob
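The locking pattern asked for in the fill_grant_buffer() comments (allocate first, then hold the lock only around the list update) can be sketched in userspace roughly as below. This is an illustration under stated assumptions, not the driver code: a C11 atomic_flag spin loop stands in for the kernel spinlock, calloc() stands in for kzalloc()/alloc_page(), and struct dev, dev_lock(), dev_unlock() and fill_grants() are hypothetical names.

```c
#include <stdatomic.h>
#include <stdlib.h>

struct grant {
	struct grant *next;
};

struct dev {
	atomic_flag lock;	/* stand-in for the kernel spinlock */
	struct grant *grants;	/* list protected by 'lock' */
	unsigned int count;	/* counter also protected by 'lock' */
};

static void dev_lock(struct dev *d)
{
	while (atomic_flag_test_and_set_explicit(&d->lock,
						 memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void dev_unlock(struct dev *d)
{
	atomic_flag_clear_explicit(&d->lock, memory_order_release);
}

/* Returns 0 on success, -1 if an allocation failed part-way. */
static int fill_grants(struct dev *d, int num)
{
	for (int i = 0; i < num; i++) {
		/* The allocation may block, so do it before locking. */
		struct grant *g = calloc(1, sizeof(*g));

		if (!g)
			return -1;	/* caller frees what was added */

		/* Hold the lock only for the shared-state update. */
		dev_lock(d);
		g->next = d->grants;
		d->grants = g;
		d->count++;
		dev_unlock(d);
	}
	return 0;
}
```

Keeping the critical section this small avoids sleeping with a spinlock held, which is the problem pointed out for the GFP_NOIO allocation in the review.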
Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map
Hi Namhyung, On Mon, 2015-11-02 at 16:08 +0900, Namhyung Kim wrote: > Hi Tom, > > On Thu, Oct 29, 2015 at 01:35:43PM -0500, Tom Zanussi wrote: > > Hi Namhyung, > > > > On Thu, 2015-10-29 at 17:31 +0900, Namhyung Kim wrote: > > > Hi Tom, > > > > > > On Thu, Oct 22, 2015 at 01:14:12PM -0500, Tom Zanussi wrote: > > > > Add tracing_map, a special-purpose lock-free map for tracing. > > > > > > > > tracing_map is designed to aggregate or 'sum' one or more values > > > > associated with a specific object of type tracing_map_elt, which > > > > is associated by the map to a given key. > > > > > > > > It provides various hooks allowing per-tracer customization and is > > > > separated out into a separate file in order to allow it to be shared > > > > between multiple tracers, but isn't meant to be generally used outside > > > > of that context. > > > > > > > > The tracing_map implementation was inspired by lock-free map > > > > algorithms originated by Dr. Cliff Click: > > > > > > > > http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable > > > > http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf > > > > > > > > Signed-off-by: Tom Zanussi > > > > Tested-by: Masami Hiramatsu > > > > --- > > > > +/** > > > > + * tracing_map_insert - Insert key and/or retrieve val from a > > > > tracing_map > > > > + * @map: The tracing_map to insert into > > > > + * @key: The key to insert > > > > + * > > > > + * Inserts a key into a tracing_map and creates and returns a new > > > > + * tracing_map_elt for it, or if the key has already been inserted by > > > > + * a previous call, returns the tracing_map_elt already associated > > > > + * with it. When the map was created, the number of elements to be > > > > + * allocated for the map was specified (internally maintained as > > > > + * 'max_elts' in struct tracing_map), and that number of > > > > + * tracing_map_elts was created by tracing_map_init(). 
This is the > > > > + * pre-allocated pool of tracing_map_elts that tracing_map_insert() > > > > + * will allocate from when adding new keys. Once that pool is > > > > + * exhausted, tracing_map_insert() is useless and will return NULL to > > > > + * signal that state. > > > > + * > > > > + * This is a lock-free tracing map insertion function implementing a > > > > + * modified form of Cliff Click's basic insertion algorithm. It > > > > + * requires the table size be a power of two. To prevent any > > > > + * possibility of an infinite loop we always make the internal table > > > > + * size double the size of the requested table size (max_elts * 2). > > > > + * Likewise, we never reuse a slot or resize or delete elements - when > > > > + * we've reached max_elts entries, we simply return NULL once we've > > > > + * run out of entries. Readers can at any point in time traverse the > > > > + * tracing map and safely access the key/val pairs. > > > > + * > > > > + * Return: the tracing_map_elt pointer val associated with the key. > > > > + * If this was a newly inserted key, the val will be a newly allocated > > > > + * and associated tracing_map_elt pointer val. If the key wasn't > > > > + * found and the pool of tracing_map_elts has been exhausted, NULL is > > > > + * returned and no further insertions will succeed. 
> > > > + */ > > > > +struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, > > > > void *key) > > > > +{ > > > > + u32 idx, key_hash, test_key; > > > > + struct tracing_map_entry *entry; > > > > + > > > > + key_hash = jhash(key, map->key_size, 0); > > > > + if (key_hash == 0) > > > > + key_hash = 1; > > > > + idx = key_hash >> (32 - (map->map_bits + 1)); > > > > + > > > > + while (1) { > > > > + idx &= (map->map_size - 1); > > > > + entry = TRACING_MAP_ENTRY(map->map, idx); > > > > + test_key = entry->key; > > > > + > > > > + if (test_key && test_key == key_hash && entry->val && > > > > + keys_match(key, entry->val->key, map->key_size)) > > > > + return entry->val; > > > > + > > > > + if (!test_key && !cmpxchg(>key, 0, key_hash)) { > > > > + struct tracing_map_elt *elt; > > > > + > > > > + elt = get_free_elt(map); > > > > + if (!elt) > > > > + break; > > > > + memcpy(elt->key, key, map->key_size); > > > > + entry->val = elt; > > > > + > > > > + return entry->val; > > > > + } > > > > + idx++; > > > > + } > > > > + > > > > + return NULL; > > > > +} > > > > > > IIUC this always insert new entry if no matching key found. And if > > > the map is full, it only fails after walking through the entries to > > > find an empty one, mark the entry with the key and call to > > > get_free_elt() returns NULL. As
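The insertion scheme described in the tracing_map_insert() comment (claim a slot with compare-and-swap, take an element from a fixed pool, never resize or delete) can be sketched with C11 atomics as below. This is a simplified illustration, not the kernel implementation: keys are non-zero 32-bit values that double as their own hash, a multiplicative hash stands in for jhash(), the probe loop is bounded, and all names (struct map, map_insert(), get_free_elt(), ...) are hypothetical. A concurrent insert of the same key whose value is not yet published is not handled here.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_BITS	3			/* requested capacity: 8 */
#define MAX_ELTS	(1u << MAP_BITS)
#define TABLE_SIZE	(MAX_ELTS * 2)		/* table is twice as large */

struct elt {
	uint32_t key;
	uint64_t sum;
};

struct entry {
	_Atomic uint32_t key;		/* 0 means the slot is unclaimed */
	struct elt *_Atomic val;
};

struct map {
	struct entry table[TABLE_SIZE];
	struct elt pool[MAX_ELTS];	/* pre-allocated elements */
	atomic_uint next_elt;
};

/* Hand out the next pooled element, or NULL once the pool is used up. */
static struct elt *get_free_elt(struct map *m)
{
	unsigned int i = atomic_fetch_add(&m->next_elt, 1);

	return i < MAX_ELTS ? &m->pool[i] : NULL;
}

/* Returns the element for 'key', inserting it if needed; NULL if full. */
static struct elt *map_insert(struct map *m, uint32_t key)
{
	uint32_t idx = key * 2654435761u;	/* stand-in for jhash() */

	if (key == 0)
		return NULL;			/* 0 is reserved */

	for (unsigned int n = 0; n < TABLE_SIZE; n++, idx++) {
		struct entry *e = &m->table[idx & (TABLE_SIZE - 1)];
		uint32_t cur = atomic_load(&e->key);

		if (cur == key) {
			struct elt *v = atomic_load(&e->val);

			if (v)
				return v;	/* key already present */
		} else if (cur == 0) {
			uint32_t expected = 0;

			/* Claim the empty slot; only one thread wins. */
			if (atomic_compare_exchange_strong(&e->key,
							   &expected, key)) {
				struct elt *elt = get_free_elt(m);

				if (!elt)
					return NULL;	/* pool exhausted */
				elt->key = key;
				atomic_store(&e->val, elt);
				return elt;
			}
		}
	}
	return NULL;
}
```

Note that in this sketch, as in the quoted function, a slot is claimed before the pool is consulted, so an insert into an exhausted map still marks a table entry before returning NULL.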
Re: [PATCH v2] pwm-backlight: fix the panel power sequence
On Tue, 2015-11-03 at 12:08 +0100, Philipp Zabel wrote:
> Hi YH,
>
> On Tuesday, 03.11.2015, 16:11 +0800, YH Huang wrote:
> > > The reasoning is that devices where there is no phandle link pointing to
> > > the backlight (for example from a simple-panel node), we should keep the
> > > current default behaviour (enable during probe).
> >
> > I have a little problem for the current default behaviour.
> > Should we enable during probe?
>
> Here I mean enabling the backlight (at the end of the probe function),
> not enabling the GPIO already when requesting it.
>
> > Before this patch ( http://patchwork.ozlabs.org/patch/324690/ ),
> > we disable "enable-gpio" in the probe function.
>
> While before this patch the GPIO would be initialized in the disabled
> state, the call to backlight_update_status at the end of the probe
> function would still enable the backlight afterwards.

Based on this, could we disable it initially and update it in the
backlight_update_status function? Like this:

	if (pb->enable_gpio) {
		if (phandle &&
		    gpiod_get_direction(pb->enable_gpio) == GPIOF_DIR_OUT &&
		    gpiod_get_value(pb->enable_gpio) == 1)
			gpiod_direction_output(pb->enable_gpio, 1);
		else
			gpiod_direction_output(pb->enable_gpio, 0);
	}

And then update with props.brightness in backlight_update_status.

I am not sure, maybe I am missing something.

Regards,
YH Huang
Re: [PATCH 0/5] perf bpf: Improve error code delivering and output
On 2015/11/3 23:43, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 03, 2015 at 10:44:41AM +, Wang Nan wrote:
> > This patchset is based on tip/core. Arnaldo found some confusing error
> > reports when reviewing BPF related patches. This patch set makes some
> > changes:
>
> Thanks for working on this! I applied the first two and made comments
> about the conversion to using err.h, please check,
>
> - Arnaldo

I can't find them in your git repository at git.kernel.org. Could you
please recheck?

Thank you.
Re: [PATCH 10/31] perf test: Enforce LLVM test for BPF test
On 2015/11/4 2:24, Arnaldo Carvalho de Melo wrote:
> On Thu, Oct 15, 2015 at 07:58:38PM +0800, Wangnan (F) wrote:
> > > +void test__llvm_prepare(void)
> > > +{
> > > +	p_test_llvm__bpf_result = mmap(NULL, SHARED_BUF_INIT_SIZE,
> > > +				       PROT_READ | PROT_WRITE,
> > > +				       MAP_SHARED | MAP_ANONYMOUS, -1, 0);
> > > +	if (!p_test_llvm__bpf_result)
> >
> > It should check MAP_FAILED instead. Fixed by this way:
>
> Can you please try refreshing this patchset on top of what is now in
> acme/perf/core?
>
> Also why do we need those struct test->{prepare,cleanup} pointers? You
> introduced it and then, on the next patch that touches 'perf test' and
> uses test__llvm_{prepare,cleanup} you call them directly, which I think
> should be enough, i.e. keep them as functions to call from inside the
> test called from run_test(), right?
>
> - Arnaldo

These prepare/cleanup functions were introduced because I want the BPF
test to reuse the result of the LLVM test, so it doesn't need to compile
those BPF scripts twice. This is the reason I use shared memory. However,
now I think compiling twice is acceptable and can make things simpler.

I'll update this patchset in my next pull request.

Thank you.
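For reference, the MAP_FAILED point from this thread in isolation: mmap() signals failure by returning MAP_FAILED, which is (void *)-1, so a NULL test does not catch it. A minimal userspace sketch follows; alloc_shared_buf() and free_shared_buf() are hypothetical helper names, not functions from the perf patch.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1	/* for MAP_ANONYMOUS on glibc */
#endif
#include <stddef.h>
#include <sys/mman.h>

/* Returns a shared anonymous buffer, or NULL if the mapping failed. */
static void *alloc_shared_buf(size_t size)
{
	void *buf = mmap(NULL, size, PROT_READ | PROT_WRITE,
			 MAP_SHARED | MAP_ANONYMOUS, -1, 0);

	/* Failure is reported as MAP_FAILED, not as NULL. */
	if (buf == MAP_FAILED)
		return NULL;
	return buf;
}

/* Returns 0 on success, -1 on error (as munmap() does). */
static int free_shared_buf(void *buf, size_t size)
{
	return munmap(buf, size);
}
```

Translating MAP_FAILED to NULL inside one helper keeps the usual "if (!ptr)" checks in callers correct.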