[RFC] ext4: fix a BUG in ext4_mpage_readpages()

2015-11-03 Thread yalin wang
The last block should be the last block of all the pages being read, so
the block number needs to be adjusted when it is calculated for each page.

Signed-off-by: yalin wang 
---
 fs/ext4/readpage.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/fs/ext4/readpage.c b/fs/ext4/readpage.c
index 5dc5e95..d20c39d 100644
--- a/fs/ext4/readpage.c
+++ b/fs/ext4/readpage.c
@@ -174,7 +174,7 @@ int ext4_mpage_readpages(struct address_space *mapping,
        goto confused;
 
    block_in_file = (sector_t)page->index << (PAGE_CACHE_SHIFT - blkbits);
-   last_block = block_in_file + nr_pages * blocks_per_page;
+   last_block = block_in_file + (nr_pages - page_idx) * blocks_per_page;
    last_block_in_file = (i_size_read(inode) + blocksize - 1) >> blkbits;
    if (last_block > last_block_in_file)
        last_block = last_block_in_file;
-- 
1.9.1
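
The arithmetic in this hunk can be checked in isolation. The following is a
minimal user-space sketch, not the kernel code: demo_last_block() and its
parameters are hypothetical names, and it assumes nr_pages is the total number
of pages in the request while page_idx counts the pages already handled. If
the surrounding loop already decrements nr_pages on each iteration (which is
not visible in this hunk), the subtraction would count the processed pages
twice.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper modelling the calculation in the hunk above.
 * Assumption: nr_pages is the total request size and page_idx is the
 * zero-based index of the page currently being mapped.
 */
static uint64_t demo_last_block(uint64_t page_index, unsigned int page_shift,
				unsigned int blkbits, unsigned int nr_pages,
				unsigned int page_idx, uint64_t i_size)
{
	uint64_t blocks_per_page, blocksize, block_in_file;
	uint64_t last_block, last_block_in_file;

	assert(page_idx < nr_pages);
	assert(blkbits <= page_shift);

	blocks_per_page = 1ULL << (page_shift - blkbits);
	blocksize = 1ULL << blkbits;
	block_in_file = page_index << (page_shift - blkbits);

	/* Only the pages that are still to be read extend the range. */
	last_block = block_in_file +
		     (uint64_t)(nr_pages - page_idx) * blocks_per_page;

	/* Never map past the end of the file. */
	last_block_in_file = (i_size + blocksize - 1) >> blkbits;
	if (last_block > last_block_in_file)
		last_block = last_block_in_file;

	return last_block;
}
```

With 4 KiB pages and 1 KiB blocks, a four-page request starting at page 10
ends at block 56 whether it is evaluated at page_idx 0 (page 10) or at
page_idx 2 (page 12), as long as the file is large enough.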

--
To unsubscribe from this list: send the line "unsubscribe linux-kernel" in
the body of a message to majord...@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Please read the FAQ at  http://www.tux.org/lkml/


Re: [PATCH v2 2/2] watchdog: imx2_wdt: add set_pretimeout interface

2015-11-03 Thread Robin Gong
On Tue, Nov 03, 2015 at 09:26:16AM -0800, Guenter Roeck wrote:
> On 11/03/2015 12:04 AM, Robin Gong wrote:
> >On Mon, Nov 02, 2015 at 11:48:29PM -0800, Guenter Roeck wrote:
> >>On 11/02/2015 10:11 PM, Robin Gong wrote:
> >>>Enable the set_pretimeout interface and trigger the pretimeout interrupt
> >>>before the watchdog timeout event happens.
> >>>
> >>>Signed-off-by: Robin Gong 
> >>>---
> >>>  drivers/watchdog/imx2_wdt.c | 58 
> >>> -
> >>>  1 file changed, 57 insertions(+), 1 deletion(-)
> >>>
> >>>diff --git a/drivers/watchdog/imx2_wdt.c b/drivers/watchdog/imx2_wdt.c
> >>>index 0bb1a1d..bd42857 100644
> >>>--- a/drivers/watchdog/imx2_wdt.c
> >>>+++ b/drivers/watchdog/imx2_wdt.c
> >>>@@ -24,6 +24,7 @@
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>>+#include 
> >>>  #include 
> >>>  #include 
> >>>  #include 
> >>>@@ -52,12 +53,18 @@
> >>>  #define IMX2_WDT_WRSR         0x04        /* Reset Status Register */
> >>>  #define IMX2_WDT_WRSR_TOUT    (1 << 1)    /* -> Reset due to Timeout */
> >>>
> >>>+#define IMX2_WDT_WICR          0x06        /*Interrupt Control Register*/
> >>>+#define IMX2_WDT_WICR_WIE      (1 << 15)   /* -> Interrupt Enable */
> >>>+#define IMX2_WDT_WICR_WTIS     (1 << 14)   /* -> Interrupt Status */
> >>>+#define IMX2_WDT_WICR_WICT     (0xFF)      /* Watchdog Interrupt Timeout */
> >>
> >>Unnecessary ( )
> >>
> >>>+
> >>>  #define IMX2_WDT_WMCR         0x08        /* Misc Register */
> >>>
> >>>  #define IMX2_WDT_MAX_TIME     128
> >>>  #define IMX2_WDT_DEFAULT_TIME 60          /* in seconds */
> >>>
> >>>  #define WDOG_SEC_TO_COUNT(s) ((s * 2 - 1) << 8)
> >>>+#define WDOG_SEC_TO_PRECOUNT(s)   ((s) * 2)   /* set WDOG pre timeout count*/
> >>>
> >>>  struct imx2_wdt_device {
> >>>   struct clk *clk;
> >>>@@ -80,7 +87,8 @@ MODULE_PARM_DESC(timeout, "Watchdog timeout in seconds 
> >>>(default="
> >>>
> >>>  static const struct watchdog_info imx2_wdt_info = {
> >>>   .identity = "imx2+ watchdog",
> >>>-  .options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE,
> >>>+  .options = WDIOF_KEEPALIVEPING | WDIOF_SETTIMEOUT | WDIOF_MAGICCLOSE
> >>>+ | WDIOF_PRETIMEOUT,
> >>>  };
> >>>
> >>>  static int imx2_restart_handler(struct notifier_block *this, unsigned 
> >>> long mode,
> >>>@@ -207,12 +215,49 @@ static inline void imx2_wdt_ping_if_active(struct 
> >>>watchdog_device *wdog)
> >>>   }
> >>>  }
> >>>
> >>>+static int imx2_wdt_set_pretimeout(struct watchdog_device *wdog,
> >>>+ unsigned int new_timeout)
> >>>+{
> >>>+  struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);
> >>>+  u32 val;
> >>>+
> >>>+  regmap_read(wdev->regmap, IMX2_WDT_WICR, &val);
> >>>+  /* set the new pre-timeout value in the WSR */
> >>>+  val &= ~IMX2_WDT_WICR_WICT;
> >>>+  val |= WDOG_SEC_TO_PRECOUNT(new_timeout);
> >>
> >>Looking into this, I think we need more information in the first patch.
> >>Specifically, we need a value for max_pretimeout and validate it
> >>(new_timeout must be smaller than 128 to avoid overflows, independent
> >>of the maximum timeout).
> >>
> >I think it's enough, since the first patch checks "t >= wdd->timeout" in
> >watchdog_pretimeout_invalid(), and wdd->timeout is never bigger than
> >wdd->max_timeout, which imx2_wdt.c sets to 128.
> 
> We are dealing with infrastructure code here. This happens to work for your
> driver, but that doesn't mean it will work for other drivers.
> 
> Anyway, I recalled that there is another patchset pending to add pretimeout
> support into the watchdog code. I'll have to dig that out and consolidate the
> two.
> 
> Guenter
>
Got it. Thanks.
> >>>+
> >>>+  regmap_write(wdev->regmap, IMX2_WDT_WICR, val | IMX2_WDT_WICR_WIE);
> >>>+
> >>>+  wdog->pretimeout = new_timeout;
> >>>+
> >>>+  return 0;
> >>>+}
> >>>+
> >>>+static irqreturn_t imx2_wdt_isr(int irq, void *dev_id)
> >>>+{
> >>>+  struct platform_device *pdev = dev_id;
> >>>+  struct watchdog_device *wdog = platform_get_drvdata(pdev);
> >>>+  struct imx2_wdt_device *wdev = watchdog_get_drvdata(wdog);
> >>>+  u32 val;
> >>>+
> >>>+  regmap_read(wdev->regmap, IMX2_WDT_WICR, &val);
> >>>+  if (val & IMX2_WDT_WICR_WTIS) {
> >>>+  /*clear interrupt status bit*/
> >>>+  regmap_write(wdev->regmap, IMX2_WDT_WICR, val);
> >>>+  panic("pre-timeout:%d, %d Seconds remained\n", wdog->pretimeout,
> >>
> >>"seconds" - no capital 'S'.
> >>
> >>>+wdog->timeout - wdog->pretimeout);
> >>>+  }
> >>>+
> >>>+  return IRQ_HANDLED;
> >>>+}
> >>>+
> >>>  static const struct watchdog_ops imx2_wdt_ops = {
> >>>   .owner = THIS_MODULE,
> >>>   .start = imx2_wdt_start,
> >>>   .stop = imx2_wdt_stop,
> >>>   .ping = imx2_wdt_ping,
> >>>   .set_timeout = imx2_wdt_set_timeout,
> >>>+  .set_pretimeout = imx2_wdt_set_pretimeout,
> >>>  };
> >>>
> >>>  static const struct regmap_config imx2_wdt_regmap_config = {
> 
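
The overflow concern raised in this thread can be illustrated with a small
user-space sketch. The two macro values are copied from the quoted patch;
demo_pretimeout_valid(), demo_wicr_with_pretimeout() and DEMO_MAX_PRETIMEOUT
are hypothetical names, not part of the watchdog core or of the driver. The
WICT field is 8 bits wide and the count is two ticks per second, so the
largest pretimeout that fits is 127 seconds.

```c
#include <assert.h>
#include <stdbool.h>

#define IMX2_WDT_WICR_WICT		0xFF		/* 8-bit pre-timeout field */
#define WDOG_SEC_TO_PRECOUNT(s)		((s) * 2)	/* two counts per second */

/* Hypothetical limit derived from the field width: 0xFF / 2 = 127 s. */
#define DEMO_MAX_PRETIMEOUT		(IMX2_WDT_WICR_WICT / 2)

/* Reject values that overflow the field or are not below the timeout. */
static bool demo_pretimeout_valid(unsigned int pretimeout, unsigned int timeout)
{
	if (pretimeout > DEMO_MAX_PRETIMEOUT)
		return false;
	return pretimeout < timeout;
}

/* Replace only the WICT field of a register value, keeping the other bits. */
static unsigned int demo_wicr_with_pretimeout(unsigned int wicr,
					      unsigned int pretimeout)
{
	assert(pretimeout <= DEMO_MAX_PRETIMEOUT);
	wicr &= ~(unsigned int)IMX2_WDT_WICR_WICT;
	wicr |= WDOG_SEC_TO_PRECOUNT(pretimeout);
	return wicr;
}
```

A pretimeout of 128 seconds converts to a count of 256, which no longer fits
in the 8-bit field and would spill into the neighbouring bits, so the range
check has to be independent of the maximum timeout.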

Re: [PATCH v7,1/3] dt-bindings: binding for jz4780-{nand,bch}

2015-11-03 Thread Boris Brezillon
Hi Harvey,

On Tue, 6 Oct 2015 17:27:15 +0100
Harvey Hunt  wrote:

> From: Alex Smith 
> 
> Add DT bindings for NAND devices connected to the NEMC on JZ4780 SoCs,
> as well as the hardware BCH controller, used by the jz4780_{nand,bch}
> drivers.

Patch 3 does not seem to follow this binding: it still uses the
ingenic,ecc-xxx properties and defines NAND timings.

Also, as answered to patch 3, I think it would be clearer to separate
the nand controller and nand chip representation.

Best Regards,

Boris

> 
> Signed-off-by: Alex Smith 
> Cc: Zubair Lutfullah Kakakhel 
> Cc: David Woodhouse 
> Cc: Brian Norris 
> Cc: linux-...@lists.infradead.org
> Cc: devicet...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Cc: Alex Smith 
> Signed-off-by: Harvey Hunt 
> ---
> v6 -> v7:
>  - Add nand-ecc-mode to DT bindings.
>  - Add nand-on-flash-bbt to DT bindings.
> 
> v5 -> v6:
>  - No change.
> 
> v4 -> v5:
>  - Rename ingenic,bch-device to ingenic,bch-controller to fit with
>existing convention.
> 
> v3 -> v4:
>  - No change
> 
> v2 -> v3:
>  - Rebase to 4.0-rc6
>  - Changed ingenic,ecc-size to common nand-ecc-step-size
>  - Changed ingenic,ecc-strength to common nand-ecc-strength
>  - Changed ingenic,busy-gpio to common rb-gpios
>  - Changed ingenic,wp-gpio to common wp-gpios
> 
> v1 -> v2:
>  - Rebase to 4.0-rc3
> 
>  .../bindings/mtd/ingenic,jz4780-nand.txt   | 61 ++
>  1 file changed, 61 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt
> 
> diff --git a/Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt b/Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt
> new file mode 100644
> index 000..44a0468
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/mtd/ingenic,jz4780-nand.txt
> @@ -0,0 +1,61 @@
> +* Ingenic JZ4780 NAND/BCH
> +
> +This file documents the device tree bindings for NAND flash devices on the
> +JZ4780. NAND devices are connected to the NEMC controller (described in
> +memory-controllers/ingenic,jz4780-nemc.txt), and thus NAND device nodes must
> +be children of the NEMC node.
> +
> +Required NAND device properties:
> +- compatible: Should be set to "ingenic,jz4780-nand".
> +- reg: For each bank with a NAND chip attached, should specify a bank number,
> +  an offset of 0 and a size of 0x100 (i.e. the whole NEMC bank).
> +
> +Optional NAND device properties:
> +- ingenic,bch-controller: To make use of the hardware BCH controller, this
> +  property must contain a phandle for the BCH controller node. The required
> +  properties for this node are described below. If this is not specified,
> +  software BCH will be used instead.
> +- nand-ecc-step-size: ECC block size in bytes.
> +- nand-ecc-strength: ECC strength (max number of correctable bits).
> +- nand-ecc-mode: String, operation mode of the NAND ecc mode. "hw" by default
> +- nand-on-flash-bbt: boolean to enable on flash bbt option if not present false
> +- rb-gpios: GPIO specifier for the busy pin.
> +- wp-gpios: GPIO specifier for the write protect pin.
> +
> +Example:
> +
> +nemc: nemc@1341 {
> + ...
> +
> + nand: nand@1 {
> + compatible = "ingenic,jz4780-nand";
> + reg = <1 0 0x100>;  /* Bank 1 */
> +
> + ingenic,bch-controller = <&bch>;
> + nand-ecc-step-size = <1024>;
> + nand-ecc-strength = <24>;
> + nand-ecc-mode = "hw";
> + nand-on-flash-bbt;
> +
> + rb-gpios = < 20 GPIO_ACTIVE_LOW>;
> + wp-gpios = < 22 GPIO_ACTIVE_LOW>;
> + };
> +};
> +
> +The BCH controller is a separate SoC component used for error correction on
> +NAND devices. The following is a description of the device properties for a
> +BCH controller.
> +
> +Required BCH properties:
> +- compatible: Should be set to "ingenic,jz4780-bch".
> +- reg: Should specify the BCH controller registers location and length.
> +- clocks: Clock for the BCH controller.
> +
> +Example:
> +
> +bch: bch@134d {
> + compatible = "ingenic,jz4780-bch";
> + reg = <0x134d 0x1>;
> +
> + clocks = < JZ4780_CLK_BCH>;
> +};



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"

2015-11-03 Thread Robin Gong
On Wed, Nov 04, 2015 at 01:39:17AM -0200, Fabio Estevam wrote:
> On Wed, Nov 4, 2015 at 12:58 AM, Robin Gong  wrote:
> > commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare").
> > Otherwise below warning happen since there are some printk logs in
> > interrupt.
> 
> This has already been reverted since v4.3-rc5.
That's great, please ignore this patch.


RE: [PATCH v7] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller

2015-11-03 Thread Bharat Kumar Gogada
> > Without #ifdefs if we compile driver for legacy, MSI structures will not be
> > available and we get compile time error.
> 
> Sorry for nitpicking, but can we at least use the more elegant version of
> #ifdef, i.e. #if IS_ENABLED(), here?
> 
> 
Code under a plain if (IS_ENABLED()) check is still compiled, so the
compile-time error would still occur for legacy.

Thanks
Bharat


Re: [PATCH v7,3/3] MIPS: dts: jz4780/ci20: Add NEMC, BCH and NAND device tree nodes

2015-11-03 Thread Boris Brezillon
Paul, Harvey,

On Fri, 16 Oct 2015 11:48:48 +0100
Paul Burton  wrote:

> On Fri, Oct 16, 2015 at 11:31:12AM +0100, James Hogan wrote:
> > > >> +
> > > >> + {
> > > >> + status = "okay";
> > > >> +
> > > >> + nand: nand@1 {
> > > >> + compatible = "ingenic,jz4780-nand";
> > > >
> > > > Isn't the NAND a micron part? This doesn't seem right. Is the device
> > > > driver and binding already accepted upstream with that compatible
> > > > string?
> > > 
> > > This is the compatible string for the JZ4780 NAND driver, this patch
> > > is part of the series adding that. Detection of the NAND part is
> > > handled by the MTD subsystem.
> > 
> > Right (didn't spot that it was part of a series).
> > 
> > The node appears to describe the NAND interface itself, i.e. a part of
> > the SoC, so should be in the SoC dtsi file, with overrides in the board
> > file if necessary for it to work with a particular NAND part
> > (potentially utilising status="disabled"). Would you agree?
> 
> Hi James,
> 
> The "nemc" node there is for the Nand & External Memory Controller which
> is a hardware block inside the SoC. It has 6 banks (ie. 6 chip select
> pins, each associated with a different address range, that connect to
> different devices). NAND flash is one such possible device, but a board
> could connect it to any of the 6 chip selects, or banks. To represent
> that in the SoC dtsi you'd want to have 6 NAND nodes, each disabled by
> default, which doesn't make a whole lot of sense to me. Other, non-NAND
> devices can connect to the NEMC too - for example the ethernet
> controller on the CI20 is connected to one bank.
> 
> The NAND device nodes are sort of a mix of describing the NAND flash
> (ie. Micron part as you point out) and its connections & properties, the
> way the NEMC should be used to interact with it alongside the BCH block,
> and the configuration for the NEMC such as timing parameters.
> 
> I imagine the most semantically correct means of describing it would
> probably be for the compatible string to reflect the Micron NAND part,
> and the NEMC driver to pick up on the relevant properties of its child
> nodes for configuring timings, whether the device is NAND etc. However
> the handling of registering NAND devices with MTD would probably then
> have to be part of the NEMC driver, which feels a bit off too.

Another solution would be to describe both the NAND controller and the
NAND chip in the DT (with the NAND chip being a child of the NAND
controller).
Actually, this is already what other bindings are doing [1][2]. I know
those bindings represent NAND controllers which can interface
with more than one NAND chip, but I think that even in the 1:1 case it
would be clearer to represent both the NAND chip and the NAND
controller.

In your case this would give the following representation:

+ {
+   status = "okay";
+
+   nandc: nand-controller@1 {
+   compatible = "ingenic,jz4780-nand";
+   reg = <1 0 0x100>;
+   #address-cells = <1>;
+   #size-cells = <0>;
+
+   ingenic,bch-controller = <&bch>;
+
+   nand@0 {
+   nand-ecc-mode = "hw";
+   nand-on-flash-bbt;
+   nand-ecc-size = <1024>;
+   nand-ecc-strength = <24>;
+
+   #address-cells = <2>;
+   #size-cells = <2>;
+
+   partition@0 {
+   label = "u-boot-spl";
+   reg = <0x0 0x0 0x0 0x80>;
+   };
+   /* ... */
+
+   };
+   };
+};

Best Regards,

Boris

[1]http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/mtd/brcm,brcmnand.txt#L119
[2]http://lxr.free-electrons.com/source/Documentation/devicetree/bindings/mtd/sunxi-nand.txt#L28

> 
> Thanks,
> Paul
> 
> __
> Linux MTD discussion mailing list
> http://lists.infradead.org/mailman/listinfo/linux-mtd/



-- 
Boris Brezillon, Free Electrons
Embedded Linux and Kernel engineering
http://free-electrons.com


Re: [PATCH v4 3/6] arm64: ftrace: fix a stack tracer's output under function graph tracer

2015-11-03 Thread AKASHI Takahiro

On 11/01/2015 05:00 PM, Jungseok Lee wrote:

On Oct 30, 2015, at 2:25 PM, AKASHI Takahiro wrote:

Hi Akashi,


Function graph tracer modifies a return address (LR) in a stack frame
to hook a function return. This will result in many useless entries
(return_to_handler) showing up in a stack tracer's output.

This patch replaces such entries with the original values preserved in
current->ret_stack[].

Signed-off-by: AKASHI Takahiro 
---
arch/arm64/include/asm/ftrace.h |2 ++
arch/arm64/kernel/stacktrace.c  |   21 +
2 files changed, 23 insertions(+)

diff --git a/arch/arm64/include/asm/ftrace.h b/arch/arm64/include/asm/ftrace.h
index 2b43e20..b7d597c 100644
--- a/arch/arm64/include/asm/ftrace.h
+++ b/arch/arm64/include/asm/ftrace.h
@@ -29,6 +29,8 @@ struct dyn_arch_ftrace {

extern unsigned long ftrace_graph_call;

+extern void return_to_handler(void);
+
static inline unsigned long ftrace_call_adjust(unsigned long addr)
{
/*
diff --git a/arch/arm64/kernel/stacktrace.c b/arch/arm64/kernel/stacktrace.c
index bc0689a..631c49d 100644
--- a/arch/arm64/kernel/stacktrace.c
+++ b/arch/arm64/kernel/stacktrace.c
@@ -17,6 +17,7 @@
  */
#include 
#include 
+#include 
#include 
#include 

@@ -78,6 +79,9 @@ struct stack_trace_data {
struct stack_trace *trace;
unsigned int no_sched_functions;
unsigned int skip;
+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+   unsigned int ret_stack_index;
+#endif
};

static int save_trace(struct stackframe *frame, void *d)
@@ -86,6 +90,20 @@ static int save_trace(struct stackframe *frame, void *d)
struct stack_trace *trace = data->trace;
unsigned long addr = frame->pc;

+#ifdef CONFIG_FUNCTION_GRAPH_TRACER
+   if (addr == (unsigned long)return_to_handler - AARCH64_INSN_SIZE) {
+   /*
+* This is a case where function graph tracer has
+* modified a return address (LR) in a stack frame
+* to hook a function return.
+* So replace it to an original value.
+*/
+   frame->pc = addr =
+   current->ret_stack[data->ret_stack_index--].ret
+   - AARCH64_INSN_SIZE;
+   }
+#endif /* CONFIG_FUNCTION_GRAPH_TRACER */
+


This hunk would be affected as the commit, "ARM64: unwind: Fix PC calculation",
e306dfd0, has been reverted.


Yeah, we don't need to apply this patch any more.
The other patches in the series will not be affected.

Thanks,
-Takahiro AKASHI
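
For readers following the thread, the substitution that the patch describes
can be modelled outside the kernel. This is a simplified sketch with
hypothetical names (demo_fixup_trace(), DEMO_RETURN_TO_HANDLER); it ignores
the AARCH64_INSN_SIZE adjustment that the hunk applies and assumes the saved
return addresses are stacked in call order, so the unwinder consumes them
from the top down.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker standing in for the address of return_to_handler. */
#define DEMO_RETURN_TO_HANDLER 0xdeadUL

/*
 * Walk a captured trace (innermost frame first) and replace every hooked
 * entry with the original return address saved by the graph tracer.
 * ret_stack[top] belongs to the innermost hooked frame.
 * Returns the number of entries that were replaced.
 */
static size_t demo_fixup_trace(unsigned long *trace, size_t nr_entries,
			       const unsigned long *ret_stack, int top)
{
	size_t replaced = 0;
	size_t i;

	assert(trace != NULL && ret_stack != NULL);

	for (i = 0; i < nr_entries; i++) {
		if (trace[i] != DEMO_RETURN_TO_HANDLER)
			continue;
		if (top < 0)
			break;	/* no saved address left; leave the rest alone */
		trace[i] = ret_stack[top--];
		replaced++;
	}
	return replaced;
}
```

The decrementing index mirrors data->ret_stack_index-- in the hunk: each
hooked frame met while unwinding outwards takes the next older saved address.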



Best Regards
Jungseok Lee




Re: [PATCH] tmpfs: avoid a little creat and stat slowdown

2015-11-03 Thread Huang, Ying
Hugh Dickins  writes:

> LKP reports that v4.2 commit afa2db2fb6f1 ("tmpfs: truncate prealloc
> blocks past i_size") causes a 14.5% slowdown in the AIM9 creat-clo
> benchmark.
>
> creat-clo does just what you'd expect from the name, and creat's O_TRUNC
> on 0-length file does indeed get into more overhead now shmem_setattr()
> tests "0 <= 0" instead of "0 < 0".
>
> I'm not sure how much we care, but I think it would not be too VW-like
> to add in a check for whether any pages (or swap) are allocated: if none
> are allocated, there's none to remove from the radix_tree.  At first I
> thought that check would be good enough for the unmaps too, but no: we
> should not skip the unlikely case of unmapping pages beyond the new EOF,
> which were COWed from holes which have now been reclaimed, leaving none.
>
> This gives me an 8.5% speedup: on Haswell instead of LKP's Westmere,
> and running a debug config before and after: I hope those account for
> the lesser speedup.
>
> And probably someone has a benchmark where a thousand threads keep on
> stat'ing the same file repeatedly: forestall that report by adjusting
> v4.3 commit 44a30220bc0a ("shmem: recalculate file inode when fstat")
> not to take the spinlock in shmem_getattr() when there's no work to do.
>
> Reported-by: Ying Huang 
> Signed-off-by: Hugh Dickins 

Hi, Hugh,

Thanks a lot for your support!  The test on LKP shows that this patch
recovers a large part of the regression.  In the following list:

c435a390574d012f8d30074135d8fcc6f480b484: is the parent commit
afa2db2fb6f15f860069de94a1257db57589fe95: is the first bad commit, which
introduces the performance regression
43819159da2b77fedcf7562134d6003dccd6a068: is the fixing patch

=
compiler/cpufreq_governor/kconfig/rootfs/tbox_group/test/testcase/testtime:
  gcc-4.9/performance/x86_64-rhel/debian-x86_64-2015-02-07.cgz/lkp-wsx02/creat-clo/aim9/300s

commit: 
  c435a390574d012f8d30074135d8fcc6f480b484
  afa2db2fb6f15f860069de94a1257db57589fe95
  43819159da2b77fedcf7562134d6003dccd6a068

c435a390574d012f afa2db2fb6f15f860069de94a1 43819159da2b77fedcf7562134
---------------- -------------------------- --------------------------
         %stddev  %change          %stddev  %change          %stddev
    563556 ±  1%   -12.5%     493033 ±  5%    -5.6%     531968 ±  1%  aim9.creat-clo.ops_per_sec
     11836 ±  7%   +11.4%      13184 ±  7%   +15.0%      13608 ±  5%  numa-meminfo.node1.SReclaimable
  10121526 ±  3%   -12.1%    8897097 ±  5%    -4.1%    9707953 ±  4%  proc-vmstat.pgalloc_normal
      9.34 ±  4%   -11.4%       8.28 ±  3%    -4.8%       8.88 ±  2%  time.user_time
      3480 ±  3%    -2.5%       3395 ±  1%   -28.5%       2488 ±  3%  vmstat.system.cs
    203275 ± 17%    -6.8%     189453 ±  5%   -34.4%     133352 ± 11%  cpuidle.C1-NHM.usage
   8081280 ±129%   -93.3%     538377 ± 97%   +31.5%   10625496 ±106%  cpuidle.C1E-NHM.time
      3144 ± 58%  +619.0%      22606 ± 56%  +903.9%      31563 ±  0%  numa-vmstat.node0.numa_other
      2958 ±  7%   +11.4%       3295 ±  7%   +15.0%       3401 ±  5%  numa-vmstat.node1.nr_slab_reclaimable
     45074 ±  5%   -43.4%      25494 ± 57%   -68.7%      14105 ±  2%  numa-vmstat.node2.numa_other
     56140 ±  0%    +0.0%      56158 ±  0%   -94.4%       3120 ±  0%  slabinfo.Acpi-ParseExt.active_objs
      1002 ±  0%    +0.0%       1002 ±  0%   -92.0%      80.00 ±  0%  slabinfo.Acpi-ParseExt.active_slabs
     56140 ±  0%    +0.0%      56158 ±  0%   -94.4%       3120 ±  0%  slabinfo.Acpi-ParseExt.num_objs
      1002 ±  0%    +0.0%       1002 ±  0%   -92.0%      80.00 ±  0%  slabinfo.Acpi-ParseExt.num_slabs
      1079 ±  5%   -10.8%     962.00 ± 10%  -100.0%       0.00 ± -1%  slabinfo.blkdev_ioc.active_objs
      1079 ±  5%   -10.8%     962.00 ± 10%  -100.0%       0.00 ± -1%  slabinfo.blkdev_ioc.num_objs
    110.67 ± 39%   +74.4%     193.00 ± 46%  +317.5%     462.00 ±  8%  slabinfo.blkdev_queue.active_objs
    189.33 ± 23%   +43.7%     272.00 ± 33%  +151.4%     476.00 ± 10%  slabinfo.blkdev_queue.num_objs
      1129 ± 10%    -1.9%       1107 ±  7%   +20.8%       1364 ±  6%  slabinfo.blkdev_requests.active_objs
      1129 ± 10%    -1.9%       1107 ±  7%   +20.8%       1364 ±  6%  slabinfo.blkdev_requests.num_objs
      1058 ±  3%   -10.3%     949.00 ±  9%  -100.0%       0.00 ± -1%  slabinfo.file_lock_ctx.active_objs
      1058 ±  3%   -10.3%     949.00 ±  9%  -100.0%       0.00 ± -1%  slabinfo.file_lock_ctx.num_objs
      4060 ±  1%    -2.1%       3973 ±  1%   -10.5%       3632 ±  1%  slabinfo.files_cache.active_objs
      4060 ±  1%    -2.1%       3973 ±  1%   -10.5%       3632 ±  1%  slabinfo.files_cache.num_objs
     10001 ±  0%    -0.3%       9973 ±  0%   -61.1%       3888 ±  0%

Re: Mobility Radeon HD 4530/4570/545v: flicker in 1920x1080

2015-11-03 Thread Christian König

On 04.11.2015 00:03, Pavel Machek wrote:

Hi!



Any ideas?

Alex probably knows more about this, but it sounds like problems with
switching the memory clocks on 3D load.
Try to disable power management completely with radeon.dpm=0 on the kernel
command line or nailing the hardware at a specific power level using
sysfs.

I tried that, but it still flickers.

It's probably pll stability.  There seem to be a number of regressions
since the pll code was rewritten to support matching the hdmi clocks
more closely.  Does this patch help?

diff --git a/drivers/gpu/drm/radeon/atombios_crtc.c
b/drivers/gpu/drm/radeon/atombios_crtc.c
index dac78ad..b86f06a 100644
--- a/drivers/gpu/drm/radeon/atombios_crtc.c
+++ b/drivers/gpu/drm/radeon/atombios_crtc.c
@@ -569,6 +569,8 @@ static u32 atombios_adjust_pll(struct drm_crtc *crtc,
 radeon_crtc->pll_flags = 0;

 if (ASIC_IS_AVIVO(rdev)) {
+   radeon_crtc->pll_flags |= RADEON_PLL_PREFER_MINM_OVER_MAXP;
+
 if ((rdev->family == CHIP_RS600) ||
 (rdev->family == CHIP_RS690) ||
 (rdev->family == CHIP_RS740))


Help.. maybe... it is tricky to tell. It definitely does _not_ fix the
issue completely.

You could also try the old pll algorithm:

I reverted the patch above, and switched to the old algorithm.

The flicker is still there. (But maybe it's less horrible, like with
RADEON_PLL_PREFER_MINM_OVER_MAXP).


The flickering would vanish completely if that were the reason for the
issue you are seeing.


Try setting ref_div_min and ref_div_max to 2 in radeon_compute_pll_avivo().

But I'm not 100% convinced that this is actually a PLL problem. Try
compiling the firmware it complains about into the kernel as well.


Regards,
Christian.



Thanks,
Pavel





Re: [PATCH v7] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller

2015-11-03 Thread Amit Tomer
> Without #ifdefs if we compile driver for legacy, MSI structures will not be 
> available and we get compile time error.

Sorry for nitpicking, but can we at least use the more elegant version of
#ifdef, i.e. #if IS_ENABLED(), here?

Thanks,
Amit.


Re: [PATCH] dmaengine: edma: fix build without CONFIG_OF

2015-11-03 Thread Peter Ujfalusi
On 11/03/2015 04:00 PM, Arnd Bergmann wrote:
> During the edma rework, a build error was introduced for the
> case that CONFIG_OF is disabled:
> 
> drivers/built-in.o: In function `edma_tc_set_pm_state':
> :(.text+0x43bf0): undefined reference to `of_find_device_by_node'
> 
> As the edma_tc_set_pm_state() function does nothing in case
> we are running without OF, this adds an IS_ENABLED() check
> that turns the function into an empty stub then and avoids the
> link error.
> 
> Signed-off-by: Arnd Bergmann 
> Fixes: ca304fa9bb76 ("ARM/dmaengine: edma: Public API to use private struct pointer")

The actual commit this patch is fixing is:
1be5336bc7ba dmaengine: edma: New device tree binding

> ---
> Found on ARM randconfig builds with today's linux-next

I sanity-built the kernel with omap2plus_defconfig and
davinci_all_defconfig, since eDMA is used by these platforms, and did not hit
this issue, as these defconfigs obviously enable OF.

> diff --git a/drivers/dma/edma.c b/drivers/dma/edma.c
> index 31722d436a42..16713a93da10 100644
> --- a/drivers/dma/edma.c
> +++ b/drivers/dma/edma.c
> @@ -1560,7 +1560,7 @@ static void edma_tc_set_pm_state(struct edma_tc *tc, bool enable)
>   struct platform_device *tc_pdev;
>   int ret;
>  
> - if (!tc)
> + if (!IS_ENABLED(CONFIG_OF) || !tc)
>   return;

Should we instead put the function inside of:
#if IS_ENABLED(CONFIG_OF)
static void edma_tc_set_pm_state(struct edma_tc *tc, bool enable)
{
...
}
#else
static inline void edma_tc_set_pm_state(struct edma_tc *tc, bool enable)
{
}
#endif /* IS_ENABLED(CONFIG_OF) */
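
The two shapes under discussion differ in what the compiler still has to see.
With if (!IS_ENABLED(CONFIG_OF)) the rest of the function body is still parsed
and type-checked, and the build relies on the compiler discarding the
unreachable call so that no undefined reference remains; with the preprocessor
form the body is not compiled at all. The sketch below shows the preprocessor
form in user space. DEMO_CONFIG_OF, struct demo_tc and the function names are
hypothetical stand-ins, not the edma driver's definitions.

```c
#include <assert.h>

/* Hypothetical stand-in for a Kconfig symbol; set to 1 to build the real body. */
#ifndef DEMO_CONFIG_OF
#define DEMO_CONFIG_OF 0
#endif

struct demo_tc {
	int node;
	int pm_enabled;
};

#if DEMO_CONFIG_OF
/* Real implementation: only compiled when the feature is configured in. */
static void demo_tc_set_pm_state(struct demo_tc *tc, int enable)
{
	if (!tc)
		return;
	tc->pm_enabled = enable;
}
#else
/* Empty stub: callers need no #ifdef and nothing OF-specific is referenced. */
static inline void demo_tc_set_pm_state(struct demo_tc *tc, int enable)
{
	(void)tc;
	(void)enable;
}
#endif

/* The caller is identical in both configurations. */
static int demo_enable_tc(struct demo_tc *tc)
{
	assert(tc);
	demo_tc_set_pm_state(tc, 1);
	return tc->pm_enabled;
}
```

The stub keeps the call sites free of conditionals, at the cost of the real
body never being compile-tested in configurations where the symbol is off.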


>  
>   tc_pdev = of_find_device_by_node(tc->node);
> 

-- 
Péter


[PATCH] h8300: Initialize cleanup and remove module code

2015-11-03 Thread Yoshinori Sato
h8300's clocksource driver cleanup.
- Use CLOCKSOURCE_OF_DECLARE
- Remove module exit

Signed-off-by: Yoshinori Sato 
---
 drivers/clocksource/h8300_timer16.c | 133 -
 drivers/clocksource/h8300_timer8.c  | 164 ++--
 drivers/clocksource/h8300_tpu.c | 105 ---
 3 files changed, 151 insertions(+), 251 deletions(-)

diff --git a/drivers/clocksource/h8300_timer16.c 
b/drivers/clocksource/h8300_timer16.c
index 82941c1..e2e10cf 100644
--- a/drivers/clocksource/h8300_timer16.c
+++ b/drivers/clocksource/h8300_timer16.c
@@ -17,6 +17,8 @@
 #include 
 #include 
 #include 
+#include 
+#include 
 
 #include 
 #include 
@@ -47,7 +49,6 @@
 #define ABSOLUTE 1
 
 struct timer16_priv {
-   struct platform_device *pdev;
struct clocksource cs;
struct irqaction irqaction;
unsigned long total_cycles;
@@ -147,43 +148,58 @@ static void timer16_disable(struct clocksource *cs)
 #define REG_CH   0
 #define REG_COMM 1
 
-static int timer16_setup(struct timer16_priv *p, struct platform_device *pdev)
+static void __init h8300_16timer_init(struct device_node *node)
 {
-   struct resource *res[2];
+   void __iomem *base[2];
int ret, irq;
unsigned int ch;
+   struct timer16_priv *p;
+   struct clk *clk;
+
+   clk = of_clk_get(node, 0);
+   if (IS_ERR(clk)) {
+   pr_err("failed to get clock for clocksource\n");
+   return;
+   }
 
-   memset(p, 0, sizeof(*p));
-   p->pdev = pdev;
+   base[REG_CH] = of_iomap(node, 0);
+   if (!base[0]) {
+   pr_err("failed to map registers for clocksource\n");
+   goto free_clk;
+   }
 
-   res[REG_CH] = platform_get_resource(p->pdev,
-   IORESOURCE_MEM, REG_CH);
-   res[REG_COMM] = platform_get_resource(p->pdev,
- IORESOURCE_MEM, REG_COMM);
-   if (!res[REG_CH] || !res[REG_COMM]) {
-   dev_err(&p->pdev->dev, "failed to get I/O memory\n");
-   return -ENXIO;
+   base[REG_COMM] = of_iomap(node, 1);
+   if (!base[1]) {
+   pr_err("failed to map registers for clocksource\n");
+   goto unmap_ch;
}
-   irq = platform_get_irq(p->pdev, 0);
+
+   irq = irq_of_parse_and_map(node, 0);
if (irq < 0) {
-   dev_err(&p->pdev->dev, "failed to get irq\n");
-   return irq;
+   pr_err("failed to get irq for clockevent\n");
+   goto unmap_comm;
}
 
-   p->clk = clk_get(&p->pdev->dev, "fck");
-   if (IS_ERR(p->clk)) {
-   dev_err(&p->pdev->dev, "can't get clk\n");
-   return PTR_ERR(p->clk);
+   p = kzalloc(sizeof(*p), GFP_KERNEL);
+   if (!p) {
+   pr_err("failed to allocate memory for clocksource\n");
+   goto unmap_comm;
}
-   of_property_read_u32(p->pdev->dev.of_node, "renesas,channel", &ch);
 
-   p->pdev = pdev;
-   p->mapbase = res[REG_CH]->start;
-   p->mapcommon = res[REG_COMM]->start;
+   of_property_read_u32(node, "renesas,channel", &ch);
+
+   p->mapbase = (unsigned long)base[REG_CH];
+   p->mapcommon = (unsigned long)base[REG_COMM];
p->enb = 1 << ch;
p->imfa = 1 << ch;
p->imiea = 1 << (4 + ch);
-   p->cs.name = pdev->name;
+
+   p->irqaction.name = node->name;
+   p->irqaction.handler = timer16_interrupt;
+   p->irqaction.dev_id = p;
+   p->irqaction.flags = IRQF_TIMER;
+
+   p->cs.name = node->name;
p->cs.rating = 200;
p->cs.read = timer16_clocksource_read;
p->cs.enable = timer16_enable;
@@ -191,64 +207,23 @@ static int timer16_setup(struct timer16_priv *p, struct 
platform_device *pdev)
p->cs.mask = CLOCKSOURCE_MASK(sizeof(unsigned long) * 8);
p->cs.flags = CLOCK_SOURCE_IS_CONTINUOUS;
 
-   ret = request_irq(irq, timer16_interrupt,
- IRQF_TIMER, pdev->name, p);
+   ret = setup_irq(irq, &p->irqaction);
if (ret < 0) {
-   dev_err(&p->pdev->dev, "failed to request irq %d\n", irq);
-   return ret;
+   pr_err("failed to request irq %d of clocksource\n", irq);
+   goto free_priv;
}
 
clocksource_register_hz(&p->cs, clk_get_rate(p->clk) / 8);
-
-   return 0;
-}
-
-static int timer16_probe(struct platform_device *pdev)
-{
-   struct timer16_priv *p = platform_get_drvdata(pdev);
-
-   if (p) {
-   dev_info(&pdev->dev, "kept as earlytimer\n");
-   return 0;
-   }
-
-   p = devm_kzalloc(&pdev->dev, sizeof(*p), GFP_KERNEL);
-   if (!p)
-   return -ENOMEM;
-
-   return timer16_setup(p, pdev);
-}
-
-static int timer16_remove(struct platform_device *pdev)
-{
-   return -EBUSY;
-}
-
-static const struct of_device_id timer16_of_table[] = {
-   { .compatible = 

Re: [PATCH v5 0/4] Exynos SROMc configuration and Ethernet support for SMDK5410

2015-11-03 Thread Krzysztof Kozlowski
On 03.11.2015 18:16, Pavel Fedin wrote:
> This patch extends Exynos SROM controller driver with ability to configure
> controller outputs and enables SMSC9115 Ethernet chip on SMDK5410 board,
> which is connected via SROMc bank #3.
> 
> With this patchset, support for the whole existing SMDK range can be added.
> Actually, only bank number is different.
> 
> This patchset also depends on Exynos 5410 pinctrl support, introduced by
> patches 0003 and 0004 from this set:
> [PATCH v4 0/5] ARM: EXYNOS: ODROID-XU DT and LEDs
> http://lists.infradead.org/pipermail/linux-arm-kernel/2015-March/330862.html

Only the first patch made it into the kernel. The rest (2-5) received feedback
and I am waiting for them to be fixed. Andreas has apparently lost interest...

Best regards,
Krzysztof



[PATCH] mm: change tlb_finish_mmu() to be more simple

2015-11-03 Thread yalin wang
This patch removes the unneeded *next temp variable,
making this function simpler to read.

Signed-off-by: yalin wang 
---
 mm/memory.c | 7 +++
 1 file changed, 3 insertions(+), 4 deletions(-)

diff --git a/mm/memory.c b/mm/memory.c
index 7f3b9f2..f0040ed 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -270,17 +270,16 @@ void tlb_flush_mmu(struct mmu_gather *tlb)
  */
 void tlb_finish_mmu(struct mmu_gather *tlb, unsigned long start, unsigned long end)
 {
-   struct mmu_gather_batch *batch, *next;
+   struct mmu_gather_batch *batch;
 
tlb_flush_mmu(tlb);
 
/* keep the page table cache within bounds */
check_pgt_cache();
 
-   for (batch = tlb->local.next; batch; batch = next) {
-   next = batch->next;
+   for (batch = tlb->local.next; batch; batch = batch->next)
free_pages((unsigned long)batch, 0);
-   }
+
tlb->local.next = NULL;
 }
 
-- 
1.9.1



Re: [PATCH] perf sched latency: Fix removed thread issue

2015-11-03 Thread Jiri Olsa
On Tue, Nov 03, 2015 at 03:33:10PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Tue, Nov 03, 2015 at 08:41:48AM +0100, Jiri Olsa escreveu:
> > On Mon, Nov 02, 2015 at 07:53:53PM -0300, Arnaldo Carvalho de Melo wrote:
> > > Em Mon, Nov 02, 2015 at 12:10:25PM +0100, Jiri Olsa escreveu:
> > > > If machine's thread has exited (EXIT event is received),
> > > > we set thread->dead = true and it is later on removed from
> > > > machine's tree if the pid is reused on new thread.
> > > 
> > > > The latency subcommand holds tree of working atoms sorted
> > > > by thread's pid/tid. If there's new thread with same pid
> > > 
> > > Humm, where is the latency subcommand handling the EXIT event?
> > > 
> > > I see:
> > > 
> > >    perf_sched__lat
> > >      perf_sched__read_events
> > >        session = perf_session__new(&file, false, &sched->tool);
> > >        perf_session__process_events(session)
> > > 
> > > And sched->tool->exit() is not set, which will make
> > > perf_session__process_events(), when calling perf_tool__fill_defaults()
> > > set it to process_event_stub() which will do nothing for
> > > PERF_RECORD_EXIT events, no?
> > 
> > yep, latency command does not handle EXIT event, but the
> > thread is removed via FORK event.. the first changelog
> 
> So it's not related to processing an EXIT event as described in the
> changelog, ok. And I don't see where thread->dead is set, i.e.
> the sequence is:
> 
>   machine__process_fork_event()
> machine__remove_thread()
> 
> The only place where thread->dead is set to true is in thread__exited()
> and that is only called by machine__process_exit_event(), which is never
> called by 'perf sched'.
> 
> It is not "later removed from machine's tree", it is removed straight
> away, in __machine__remove_thread().

it is removed later via FORK event, as stated in changelog
and in updated changelog I sent in previous email:

SNIP

> > could you please change it to:
> > 
> > ---
> > If machine's thread has exited (EXIT event is received),
> > we set thread->dead = true and it is later on removed from
> > machine's tree if the pid is reused on new thread.
> > 
> > We don't handle EXIT command in 'perf sched latency',
> > however the old thread is removed anyway when FORK
> > event is received for new thread with same pid/tid.
> > ---

thanks,
jirka


[PATCH] rtc: Group Kconfig entries by vendor

2015-11-03 Thread Soren Brinkmann
The RTC entries are mostly grouped by vendor. Move the few outliers in
place.
Also, change the one occurrence of 'nxp' to 'NXP' to make all NXP
entries consistent.

Signed-off-by: Soren Brinkmann 
---
 drivers/rtc/Kconfig | 90 ++---
 1 file changed, 45 insertions(+), 45 deletions(-)

diff --git a/drivers/rtc/Kconfig b/drivers/rtc/Kconfig
index 2a524244afec..ea7f43f12f69 100644
--- a/drivers/rtc/Kconfig
+++ b/drivers/rtc/Kconfig
@@ -325,16 +325,6 @@ config RTC_DRV_MAX77686
  This driver can also be built as a module. If so, the module
  will be called rtc-max77686.
 
-config RTC_DRV_RK808
-   tristate "Rockchip RK808 RTC"
-   depends on MFD_RK808
-   help
- If you say yes here you will get support for the
- RTC of RK808 PMIC.
-
- This driver can also be built as a module. If so, the module
- will be called rk808-rtc.
-
 config RTC_DRV_MAX77802
tristate "Maxim 77802 RTC"
depends on MFD_MAX77686
@@ -345,6 +335,16 @@ config RTC_DRV_MAX77802
  This driver can also be built as a module. If so, the module
  will be called rtc-max77802.
 
+config RTC_DRV_RK808
+   tristate "Rockchip RK808 RTC"
+   depends on MFD_RK808
+   help
+ If you say yes here you will get support for the
+ RTC of RK808 PMIC.
+
+ This driver can also be built as a module. If so, the module
+ will be called rk808-rtc.
+
 config RTC_DRV_RS5C372
tristate "Ricoh R2025S/D, RS5C372A/B, RV5C386, RV5C387A"
help
@@ -391,16 +391,6 @@ config RTC_DRV_X1205
  This driver can also be built as a module. If so, the module
  will be called rtc-x1205.
 
-config RTC_DRV_PALMAS
-   tristate "TI Palmas RTC driver"
-   depends on MFD_PALMAS
-   help
- If you say yes here you get support for the RTC of TI PALMA series PMIC
- chips.
-
- This driver can also be built as a module. If so, the module
- will be called rtc-palma.
-
 config RTC_DRV_PCF2127
tristate "NXP PCF2127"
help
@@ -419,6 +409,14 @@ config RTC_DRV_PCF8523
  This driver can also be built as a module. If so, the module
  will be called rtc-pcf8523.
 
+config RTC_DRV_PCF85063
+   tristate "NXP PCF85063"
+   help
+ If you say yes here you get support for the PCF85063 RTC chip
+
+ This driver can also be built as a module. If so, the module
+ will be called rtc-pcf85063.
+
 config RTC_DRV_PCF8563
tristate "Philips PCF8563/Epson RTC8564"
help
@@ -429,14 +427,6 @@ config RTC_DRV_PCF8563
  This driver can also be built as a module. If so, the module
  will be called rtc-pcf8563.
 
-config RTC_DRV_PCF85063
-   tristate "nxp PCF85063"
-   help
- If you say yes here you get support for the PCF85063 RTC chip
-
- This driver can also be built as a module. If so, the module
- will be called rtc-pcf85063.
-
 config RTC_DRV_PCF8583
tristate "Philips PCF8583"
help
@@ -501,6 +491,16 @@ config RTC_DRV_TWL4030
  This driver can also be built as a module. If so, the module
  will be called rtc-twl.
 
+config RTC_DRV_PALMAS
+   tristate "TI Palmas RTC driver"
+   depends on MFD_PALMAS
+   help
+ If you say yes here you get support for the RTC of TI PALMA series PMIC
+ chips.
+
+ This driver can also be built as a module. If so, the module
+ will be called rtc-palma.
+
 config RTC_DRV_TPS6586X
tristate "TI TPS6586X RTC driver"
depends on MFD_TPS6586X
@@ -681,15 +681,6 @@ config RTC_DRV_DS1390
  This driver can also be built as a module. If so, the module
  will be called rtc-ds1390.
 
-config RTC_DRV_MAX6902
-   tristate "Maxim MAX6902"
-   help
- If you say yes here you will get support for the
- Maxim MAX6902 SPI RTC chip.
-
- This driver can also be built as a module. If so, the module
- will be called rtc-max6902.
-
 config RTC_DRV_R9701
tristate "Epson RTC-9701JE"
help
@@ -699,6 +690,14 @@ config RTC_DRV_R9701
  This driver can also be built as a module. If so, the module
  will be called rtc-r9701.
 
+config RTC_DRV_RX4581
+   tristate "Epson RX-4581"
+   help
+ If you say yes here you will get support for the Epson RX-4581.
+
+ This driver can also be built as a module. If so the module
+ will be called rtc-rx4581.
+
 config RTC_DRV_RS5C348
tristate "Ricoh RS5C348A/B"
help
@@ -708,6 +707,15 @@ config RTC_DRV_RS5C348
  This driver can also be built as a module. If so, the module
  will be called rtc-rs5c348.
 
+config RTC_DRV_MAX6902
+   tristate "Maxim MAX6902"
+   help
+ If you say yes here you will get support for the
+ Maxim MAX6902 SPI RTC chip.
+
+ This 


Re: [PATCH] Documentation: dt: Add bindings for Secure-only devices

2015-11-03 Thread Frank Rowand
On 10/30/2015 1:07 PM, Peter Maydell wrote:
> On 30 October 2015 at 18:28, Rob Herring  wrote:
>> On Thu, Oct 29, 2015 at 9:01 AM, Peter Maydell  
>> wrote:
>>> +Valid Secure world properties:
>>> +
>>> +- secure-status : specifies whether the device is present and usable
>>> +  in the secure world. The combination of this with "status" allows
>>> +  the various possible combinations of device visibility to be
>>> +  specified:
>>> +   status = "okay"; // visible in S and NS
>>
>> I assume neither property present or both okay also mean the same.
>>
>> status = "okay"; secure-status = "okay";
>>
>> We should be explicit.
> 
> Yes; status defaults to "okay" (presumably this is listed in
> the overall DT binding spec somewhere), 

Valid values for the "status" property do not seem to be well documented
for Linux.

The code, __of_device_is_available(), implements:

  value of "okay"                 returns true
  value of "ok"                   returns true
  status property does not exist  returns true (thus defaults to true)

  anything else returns false.
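
The rules listed above can be sketched as a small standalone C function. This is
only an illustration of the described behaviour, not the kernel's code: the name
status_is_available() and the use of a NULL pointer to mean "the property does
not exist" are assumptions made for the example.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical userspace model of the availability check described above.
 * A NULL status stands for "the status property does not exist".
 */
static bool status_is_available(const char *status)
{
	if (status == NULL)
		return true;	/* missing property defaults to available */

	/* only the two spellings listed above count as available */
	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}
```

Under this model every other string, including "disabled" and "fail", comes out
as not available.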

Eight individual bindings define the status property (I'll put them
on my todo list to remove since they should be inherited).  The status
property should be in a top level binding, which should be created as
part of the bindings documentation cleanup.  I'll watch for it to get
created or do it myself.

(Lots of bindings show examples of using the status property in their
examples -- that is fine.)

ePAPR v1.1, section "2.3.4 status" does not list "ok", but does list:
  "okay"
  "disabled"
  "fail"
  "fail-sss"


< snip >

-Frank


Re: [PATCH] arm64: bpf: fix div-by-zero case

2015-11-03 Thread Xi Wang
On Tue, Nov 3, 2015 at 10:56 PM, Zi Shen Lim  wrote:
> case BPF_ALU | BPF_DIV | BPF_X:
> case BPF_ALU64 | BPF_DIV | BPF_X:
> +   {
> +   const u8 r0 = bpf2a64[BPF_REG_0];
> +
> +   /* if (src == 0) return 0 */
> +   jmp_offset = 3; /* skip ahead to else path */
> +   check_imm19(jmp_offset);
> +   emit(A64_CBNZ(is64, src, jmp_offset), ctx);
> +   emit(A64_MOVZ(1, r0, 0, 0), ctx);
> +   jmp_offset = epilogue_offset(ctx);
> +   check_imm26(jmp_offset);
> +   emit(A64_B(jmp_offset), ctx);
> +   /* else */
> emit(A64_UDIV(is64, dst, dst, src), ctx);
> break;
> +   }
> case BPF_ALU | BPF_MOD | BPF_X:
> case BPF_ALU64 | BPF_MOD | BPF_X:

BPF_MOD might need the same fix.


Re: [RFC] namei: prevent sgid-hardlinks for unmapped gids

2015-11-03 Thread Willy Tarreau
On Tue, Nov 03, 2015 at 03:29:55PM -0800, Kees Cook wrote:
> Using "write" does kill the set-gid bit. I haven't looked at
> why.
> Al or anyone else, is there a meaningful distinction here?

I remember this one, I got caught once while trying to put a shell into
a suid-writable file to get some privileges someone forgot to offer me :-)

It's done by should_remove_suid() which is called upon write() and truncate().

> Should the
> mmap MAP_SHARED-write trigger the loss of the set-gid bit too? While
> holding the file open with either open or mmap, I get a Text-in-use
> error, so I would kind of expect the same behavior between either
> close() and munmap(). I wonder if this is a bug, and if so, then your
> link patch is indeed useful again. :)

I don't see how this could be done with mmap(). Maybe we have a way to know
when the first write is performed via this path, I have no idea.

Willy



Re: [PATCH] mm: Introduce kernelcore=reliable option

2015-11-03 Thread Kamezawa Hiroyuki

On 2015/10/31 4:42, Luck, Tony wrote:
>> If each memory controller has the same distance/latency, you (your firmware)
>> don't need to allocate reliable memory per each memory controller.
>> If distance is a problem, another node should be allocated.
>>
>> ...is the behavior (splitting zone) really required?
>
> It's useful from a memory bandwidth perspective to have allocations
> spread across both memory controllers. Keeping a whole bunch of
> Xeon cores fed needs all the bandwidth you can get.

Hmm. But physical address layout is not related to dual memory controller.
I think reliable range can be contiguous by firmware...

-Kame




[PATCH] arm64: bpf: fix div-by-zero case

2015-11-03 Thread Zi Shen Lim
In the case of division by zero in a BPF program:
A = A / X;  (X == 0)
the expected behavior is to terminate with return value 0.

This is confirmed by the test case introduced in commit 86bf1721b226
("test_bpf: add tests checking that JIT/interpreter sets A and X to 0.").
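
A minimal C model of the semantics described above may help when reading the
patch. It is a sketch only: bpf_div_step() and its parameters are hypothetical
names for this example and do not appear in the kernel.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical model of a single "A = A / X" step.
 * Returns false when the program must stop; *ret then holds its return value.
 */
static bool bpf_div_step(uint64_t *a, uint64_t x, uint64_t *ret)
{
	if (x == 0) {
		*ret = 0;	/* division by zero: terminate, returning 0 */
		return false;
	}

	*a /= x;		/* unsigned divide, leaving *ret untouched */
	return true;
}
```

The JIT change below has the same shape: a zero check on the source register,
a branch to the epilogue with the return register cleared, and otherwise the
divide.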

Reported-by: Shi, Yang 
CC: Xi Wang 
CC: Alexei Starovoitov 
CC: Catalin Marinas 
CC: linux-arm-ker...@lists.infradead.org
CC: linux-kernel@vger.kernel.org
Fixes: e54bcde3d69d ("arm64: eBPF JIT compiler")
Cc:  # 3.18+
Signed-off-by: Zi Shen Lim 
---
 arch/arm64/net/bpf_jit.h  |  3 ++-
 arch/arm64/net/bpf_jit_comp.c | 37 +
 2 files changed, 27 insertions(+), 13 deletions(-)

diff --git a/arch/arm64/net/bpf_jit.h b/arch/arm64/net/bpf_jit.h
index 98a26ce..aee5637 100644
--- a/arch/arm64/net/bpf_jit.h
+++ b/arch/arm64/net/bpf_jit.h
@@ -1,7 +1,7 @@
 /*
  * BPF JIT compiler for ARM64
  *
- * Copyright (C) 2014 Zi Shen Lim 
+ * Copyright (C) 2014-2015 Zi Shen Lim 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -35,6 +35,7 @@
aarch64_insn_gen_comp_branch_imm(0, offset, Rt, A64_VARIANT(sf), \
AARCH64_INSN_BRANCH_COMP_##type)
 #define A64_CBZ(sf, Rt, imm19) A64_COMP_BRANCH(sf, Rt, (imm19) << 2, ZERO)
+#define A64_CBNZ(sf, Rt, imm19) A64_COMP_BRANCH(sf, Rt, (imm19) << 2, NONZERO)
 
 /* Conditional branch (immediate) */
 #define A64_COND_BRANCH(cond, offset) \
diff --git a/arch/arm64/net/bpf_jit_comp.c b/arch/arm64/net/bpf_jit_comp.c
index c047598..9ae6f23 100644
--- a/arch/arm64/net/bpf_jit_comp.c
+++ b/arch/arm64/net/bpf_jit_comp.c
@@ -1,7 +1,7 @@
 /*
  * BPF JIT compiler for ARM64
  *
- * Copyright (C) 2014 Zi Shen Lim 
+ * Copyright (C) 2014-2015 Zi Shen Lim 
  *
  * This program is free software; you can redistribute it and/or modify
  * it under the terms of the GNU General Public License version 2 as
@@ -225,6 +225,17 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
u8 jmp_cond;
s32 jmp_offset;
 
+#define check_imm(bits, imm) do {  \
+   if ((((imm) > 0) && ((imm) >> (bits))) ||   \
+   (((imm) < 0) && (~(imm) >> (bits)))) {  \
+   pr_info("[%2d] imm=%d(0x%x) out of range\n",\
+   i, imm, imm);   \
+   return -EINVAL; \
+   }   \
+} while (0)
+#define check_imm19(imm) check_imm(19, imm)
+#define check_imm26(imm) check_imm(26, imm)
+
switch (code) {
/* dst = src */
case BPF_ALU | BPF_MOV | BPF_X:
@@ -258,8 +269,21 @@ static int build_insn(const struct bpf_insn *insn, struct jit_ctx *ctx)
break;
case BPF_ALU | BPF_DIV | BPF_X:
case BPF_ALU64 | BPF_DIV | BPF_X:
+   {
+   const u8 r0 = bpf2a64[BPF_REG_0];
+
+   /* if (src == 0) return 0 */
+   jmp_offset = 3; /* skip ahead to else path */
+   check_imm19(jmp_offset);
+   emit(A64_CBNZ(is64, src, jmp_offset), ctx);
+   emit(A64_MOVZ(1, r0, 0, 0), ctx);
+   jmp_offset = epilogue_offset(ctx);
+   check_imm26(jmp_offset);
+   emit(A64_B(jmp_offset), ctx);
+   /* else */
emit(A64_UDIV(is64, dst, dst, src), ctx);
break;
+   }
case BPF_ALU | BPF_MOD | BPF_X:
case BPF_ALU64 | BPF_MOD | BPF_X:
ctx->tmp_used = 1;
@@ -393,17 +417,6 @@ emit_bswap_uxt:
emit(A64_ASR(is64, dst, dst, imm), ctx);
break;
 
-#define check_imm(bits, imm) do {  \
-   if ((((imm) > 0) && ((imm) >> (bits))) ||   \
-   (((imm) < 0) && (~(imm) >> (bits)))) {  \
-   pr_info("[%2d] imm=%d(0x%x) out of range\n",\
-   i, imm, imm);   \
-   return -EINVAL; \
-   }   \
-} while (0)
-#define check_imm19(imm) check_imm(19, imm)
-#define check_imm26(imm) check_imm(26, imm)
-
/* JUMP off */
case BPF_JMP | BPF_JA:
jmp_offset = bpf2a64_offset(i + off, i, ctx);
-- 
1.9.1



Re: [PATCH 0/3] cpuidle: small improvements & fixes for menu governor

2015-11-03 Thread Joe Perches
On Wed, 2015-11-04 at 00:03 +0100, Rafael J. Wysocki wrote:
> On 11/3/2015 11:35 PM, Rik van Riel wrote:
> > On 11/03/2015 05:05 PM, Rafael J. Wysocki wrote:
> > > On 10/28/2015 11:46 PM, r...@redhat.com wrote:
> > > > While working on a paravirt cpuidle driver for KVM guests, I
> > > > noticed a number of small logic errors in the menu governor
> > > > code.
> > > > 
> > > > These patches should get rid of some artifacts that can break
> > > > the logic in the menu governor under certain corner cases, and
> > > > make idle state selection work better on CPUs with long C1 exit
> > > > latencies.
> > > > 
> > > > I have not seen any adverse effects with them in my (quick)
> > > > tests. As expected, they do not seem to do much on systems with
> > > > many power states and very low C1 exit latencies and target
> > > > residencies.
> > > > 
> > > Sorry for the trouble, but can you please resend the series with
> > > CCs to
> > > linux...@vger.kernel.org?  That will make it way easier to handle
> > > for me.
> > Not a problem. Done.
> > 
> > What change do I need to send in to ensure that the
> > linux-pm mailing list shows up in get_maintainer.pl
> > output?
> > 
> > $ ./scripts/get_maintainer.pl -f drivers/cpuidle/governors/menu.c
> > "Rafael J. Wysocki" 
> > > (commit_signer:4/7=57%,authored:1/7=14%,added_lines:2/19=11%,removed_lines:2/28=7%)
> > Daniel Lezcano 
> > (commit_signer:3/7=43%,authored:1/7=14%,added_lines:1/19=5%)
> > Rik van Riel 
> > > (commit_signer:3/7=43%,authored:3/7=43%,added_lines:5/19=26%,removed_lines:3/28=11%)
> > Len Brown 
> > > (commit_signer:1/7=14%,authored:1/7=14%,added_lines:10/19=53%,removed_lines:15/28=54%)
> > "Peter Zijlstra (Intel)" 
> > (commit_signer:1/7=14%)
> > Javi Merino 
> > (authored:1/7=14%,added_lines:1/19=5%,removed_lines:7/28=25%)
> > linux-kernel@vger.kernel.org (open list)
> > 
> 
> I'm not sure why it doesn't show up in there.
> If you look at MAINTAINERS under CPUIDLE DRIVERS, linux-pm is
> actually listed there.

Because the pattern is just the files in the top level directory
of drivers/cpuidle and not any files in any directory below

From the MAINTAINERS pattern descriptions:

F: Files and directories with wildcard patterns.
   A trailing slash includes all files and subdirectory files.
   F:   drivers/net/    all files in and below drivers/net
   F:   drivers/net/*   all files in drivers/net, but not below
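
The difference between the two pattern forms can be sketched in a few lines of
C. This is a deliberately simplified illustration, not how get_maintainer.pl is
implemented: f_pattern_matches() is a hypothetical helper that only understands
a trailing slash, a trailing star after a slash, and exact file names.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Hypothetical, simplified matcher for the two F: pattern forms quoted
 * above. Real MAINTAINERS patterns allow more general wildcards.
 */
static bool f_pattern_matches(const char *pattern, const char *path)
{
	size_t len = strlen(pattern);

	if (len >= 2 && strcmp(pattern + len - 2, "/*") == 0) {
		/* trailing star: files directly in the directory only */
		size_t dir_len = len - 1;	/* keep the trailing slash */

		return strncmp(path, pattern, dir_len) == 0 &&
		       path[dir_len] != '\0' &&
		       strchr(path + dir_len, '/') == NULL;
	}

	if (len >= 1 && pattern[len - 1] == '/')
		/* trailing slash: everything in and below the directory */
		return strncmp(path, pattern, len) == 0;

	/* anything else is treated as an exact file name */
	return strcmp(pattern, path) == 0;
}
```

With this model, drivers/cpuidle/governors/menu.c is not matched by
"drivers/cpuidle/*" but is matched by "drivers/cpuidle/".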

> Maybe the answer is that get_maintainer.pl needs to be fixed ...

Nope, the MAINTAINERS - CPUIDLE DRIVERS entry does though.

---
 MAINTAINERS | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/MAINTAINERS b/MAINTAINERS
index 77ed3a0..8bd5c7e 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -3004,7 +3004,7 @@ M:Daniel Lezcano 
 L: linux...@vger.kernel.org
 S: Maintained
 T: git git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm.git
-F: drivers/cpuidle/*
+F: drivers/cpuidle/
 F: include/linux/cpuidle.h
 
 CPUID/MSR DRIVER



RE: [PATCH v7] PCI: Xilinx-NWL-PCIe: Added support for Xilinx NWL PCIe Host Controller

2015-11-03 Thread Bharat Kumar Gogada
> > +static struct msi_domain_info nwl_msi_domain_info = {
> > +   .flags = (MSI_FLAG_USE_DEF_DOM_OPS |
> MSI_FLAG_USE_DEF_CHIP_OPS |
> > + MSI_FLAG_MULTI_PCI_MSI),
> 
> If you're supporting multi-MSI, how do you ensure that all hwirqs are
> contiguous as required by the spec? Clearly, your allocator doesn't provide
> this guarantee.
> 
Oh ok, then how can we know whether one EP is requesting multiple IRQs?
pci_enable_msi_range() runs a do-while loop based on the return value of
msi_capability_init(), which in turn requests multiple IRQs. Is there any way
to find out when a single EP is requesting multiple MSIs?
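
For illustration, the contiguity requirement raised in the review can be met by
handing out hwirqs in naturally aligned power-of-two blocks. The sketch below is
a standalone model under stated assumptions: alloc_msi_block(), NR_HWIRQS and
the single 64-bit bitmap are hypothetical and are not taken from the driver. In
the kernel, bitmap_find_free_region() is the usual helper for this sort of
allocation.

```c
#include <stdint.h>

#define NR_HWIRQS 64	/* hypothetical size of the hwirq space */

/*
 * Hypothetical allocator: reserve nvec contiguous hwirqs from the bitmap
 * *used, with the block naturally aligned to nvec. nvec must be a power
 * of two no larger than 32, as for multi-MSI. Returns the first hwirq of
 * the block, or -1 if the request is invalid or no block is free.
 */
static int alloc_msi_block(uint64_t *used, unsigned int nvec)
{
	unsigned int base;

	if (nvec == 0 || nvec > 32 || (nvec & (nvec - 1)) != 0)
		return -1;

	/* step by nvec so every candidate block is aligned to its size */
	for (base = 0; base + nvec <= NR_HWIRQS; base += nvec) {
		uint64_t mask = ((UINT64_C(1) << nvec) - 1) << base;

		if ((*used & mask) == 0) {
			*used |= mask;
			return (int)base;
		}
	}

	return -1;
}
```

The alignment matters because a multi-MSI endpoint selects a vector by changing
only the low bits of the message data it was given.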

> Also, you still lack support for MSI-X (which would come for free...).

We don't support MSI-X in root port mode.
> 
> > +   .chip = &nwl_msi_irq_chip,
> > +};
> > +#endif
> > +
> > +   irq_domain_remove(pcie->legacy_irq_domain);
> > +
> > +#ifdef CONFIG_PCI_MSI
> > +   irq_set_chained_handler_and_data(msi->irq_msi0, NULL, NULL);
> > +   irq_set_chained_handler_and_data(msi->irq_msi1, NULL, NULL);
> > +
> > +   irq_domain_remove(msi->msi_domain);
> > +   irq_domain_remove(msi->dev_domain);
> > +#endif
> > +
> > +}
> 
> Remove these #ifdefs. You can test the validity of these fields before calling
> the various functions. You can also move this to a separate function.

Without the #ifdefs, if we compile the driver for legacy interrupts only, the
MSI structures will not be available and we get a compile-time error.
> 
> > +
> > +   /* setup AFI/FPCI range */
> > +   msi->pages = __get_free_pages(GFP_KERNEL, 0);
> > +   base = virt_to_phys((void *)msi->pages);
> > +   nwl_bridge_writel(pcie, lower_32_bits(base), I_MSII_BASE_LO);
> > +   nwl_bridge_writel(pcie, upper_32_bits(base), I_MSII_BASE_HI);
> 
> I just read this, and I'm puzzled. Actually, puzzled is an understatement. Why
> on Earth do you need to give RAM to your MSI HW?
> This should be a device, not RAM. By the look of it, this could be absolutely
> anything. Are you sure you have to supply RAM here?
> 
This is required in our hardware, so that bridge identifies incoming MWr as MSI.
> > +
> > +   /* Disable high range msi interrupts */
> > +   nwl_bridge_writel(pcie, (u32)~MSGF_MSI_SR_HI_MASK,
> > +MSGF_MSI_MASK_HI);
> > +
> > +

Bharat


linux-next: Tree for Nov 4

2015-11-03 Thread Stephen Rothwell
Hi all,

Please do *not* add any material intended for v4.5 to your linux-next
included branches until after v4.4-rc1 has been released.

Changes since 20151103:

The battery tree still had its build failure so I used the version from
next-20150925.

The mailbox tree still had its build failure so I used the version from
next-20151022.

The akpm tree gained a conflict against the drm tree and lost a patch
that turned up elsewhere.

Non-merge commits (relative to Linus' tree): 10589
 8442 files changed, 409626 insertions(+), 188833 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig for x86_64,
a multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), it is also built with powerpc allnoconfig
(32 and 64 bit), ppc44x_defconfig, allyesconfig (this fails its final
link) and pseries_le_defconfig and i386, sparc, sparc64 and arm defconfig.

Below is a summary of the state of the merge.

I am currently merging 229 trees (counting Linus' and 35 trees of patches
pending for Linus' tree).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au

$ git checkout master
$ git reset --hard stable
Merging origin/master (316dde2fe95b Merge branch 'for-linus' of git://ftp.arm.linux.org.uk/~rmk/linux-arm)
Merging fixes/master (25cb62b76430 Linux 4.3-rc5)
Merging kbuild-current/rc-fixes (3d1450d54a4f Makefile: Force gzip and xz on module install)
Merging arc-current/for-curr (049e6dde7e57 Linux 4.3-rc4)
Merging arm-current/fixes (38850d786a79 ARM: 8449/1: fix bug in vdsomunge swab32 macro)
Merging m68k-current/for-linus (95bc06ef049b m68k/defconfig: Update defconfigs for v4.3-rc1)
Merging metag-fixes/fixes (0164a711c97b metag: Fix ioremap_wc/ioremap_cached build errors)
Merging mips-fixes/mips-fixes (1795cd9b3a91 Linux 3.16-rc5)
Merging powerpc-fixes/fixes (977bf062bba3 powerpc/dma: dma_set_coherent_mask() should not be GPL only)
Merging powerpc-merge-mpe/fixes (bc0195aad0da Linux 4.2-rc2)
Merging sparc/master (73958c651fbf sparc64: use ENTRY/ENDPROC in VISsave)
Merging net/master (ebac62fe3d24 ipv6: fix tunnel error handling)
Merging ipsec/master (a8a572a6b5f2 xfrm: dst_entries_init() per-net dst_ops)
Merging ipvs/master (6ece90f9a13e netfilter: fix Kconfig dependencies for nf_dup_ipv{4,6})
Merging sound-current/for-linus (de1ab6af5c3d ALSA: hda - Fix lost 4k BDL boundary workaround)
Merging pci-current/for-linus (1266963170f5 PCI: Prevent out of bounds access in numa_node override)
Merging wireless-drivers/master (de28a05ee28e Merge tag 'iwlwifi-for-kalle-2015-10-05' of git://git.kernel.org/pub/scm/linux/kernel/git/iwlwifi/iwlwifi-fixes)
Merging driver-core.current/driver-core-linus (9ffecb102835 Linux 4.3-rc3)
Merging tty.current/tty-linus (f235f664a8af fbcon: initialize blink interval before calling fb_set_par)
Merging usb.current/usb-linus (32b88194f71d Linux 4.3-rc7)
Merging usb-gadget-fixes/fixes (25cb62b76430 Linux 4.3-rc5)
Merging usb-serial-fixes/usb-linus (32b88194f71d Linux 4.3-rc7)
Merging usb-chipidea-fixes/ci-for-usb-stable (f256896afdb6 usb: chipidea: otg: gadget module load and unload support)
Merging staging.current/staging-linus (32b88194f71d Linux 4.3-rc7)
Merging char-misc.current/char-misc-linus (25cb62b76430 Linux 4.3-rc5)
Merging input-current/for-linus (195562194aad Input: alps - only the Dell Latitude D420/430/620/630 have separate stick button bits)
Merging crypto-current/master (4afa5f961792 crypto: algif_hash - Only export and import on sockets with data)
Merging ide/master (353b39d1b151 ide: pdc202xx_new: Replace timeval with ktime_t)
Merging devicetree-current/devicetree/merge (f76502aa9140 of/dynamic: Fix test for PPC_PSERIES)
Merging rr-fixes/fixes (275d7d44d802 module: Fix locking in symbol_put_addr())
Merging vfio-fixes/for-linus (4bc94d5dc95d vfio: Fix lockdep issue)
Merging kselftest-fixes/fixes (ae785818

linux-next: manual merge of the akpm tree with the drm tree

2015-11-03 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  drivers/gpu/drm/nouveau/nouveau_ttm.c

between commits:

  524883bb4846 ("drm/nouveau/ttm: convert to DMA API")
  b31cf78b9324 ("drm/nouveau/ttm: set the DMA mask for platform devices")

from the drm tree and patch:

  "nouveau: don't call pci_dma_supported"

from the akpm tree.

I fixed it up (so the patch now looks like below) and can carry the fix
as necessary (no action is required).

diff --git a/drivers/gpu/drm/nouveau/nouveau_ttm.c b/drivers/gpu/drm/nouveau/nouveau_ttm.c
index 3f713c1b5dc1..d2e7d209f651 100644
--- a/drivers/gpu/drm/nouveau/nouveau_ttm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_ttm.c
@@ -353,8 +353,7 @@ nouveau_ttm_init(struct nouveau_drm *drm)
 
bits = nvxx_mmu(&drm->device)->dma_bits;
if (nvxx_device(&drm->device)->func->pci) {
-   if (drm->agp.bridge ||
-!dma_supported(dev->dev, DMA_BIT_MASK(bits)))
+   if (drm->agp.bridge)
bits = 32;
} else if (device->func->tegra) {
struct nvkm_device_tegra *tegra = device->func->tegra(device);
@@ -369,6 +368,10 @@ nouveau_ttm_init(struct nouveau_drm *drm)
}
 
ret = dma_set_mask(dev->dev, DMA_BIT_MASK(bits));
+   if (ret && bits != 32) {
+   bits = 32;
+   ret = dma_set_mask(dev->dev, DMA_BIT_MASK(bits));
+   }
if (ret)
return ret;
 

-- 
Cheers,
Stephen Rothwell                    s...@canb.auug.org.au


Re: [RFC PATCH 0/3] CFS idle injection

2015-11-03 Thread Eduardo Valentin
Hello Jacob,

On Mon, Nov 02, 2015 at 04:10:25PM -0800, Jacob Pan wrote:
> Hi Peter and all,
> 
> A while ago, we had discussion about how powerclamp is broken in the
> sense of turning off idle ticks in the forced idle period.
> https://lkml.org/lkml/2014/12/18/369
> 
> It was suggested to replace the current kthread play idle loop with a
> timer based runqueue throttling scheme. I finally got around to implement
> this and code is much simpler. I also have good test results in terms of
> efficiency, scalability, etc.
> http://events.linuxfoundation.org/sites/events/files/slides/LinuxCon_Japan_2015_idle_injection1_0.pdf
> slide #18+ shows the data on client and server.
> 
> I have two choices for this code:
> 1) be part of existing powerclamp driver but require exporting some
>    sched APIs.
> 2) be part of sched since the general rule applies when it comes down
>    to synchronized idle time for best power savings.
> 
> The patches below are for #2. There is a known problem with LOW RES timer
> mode that I am working on. But I am hoping to get review earlier.
> 

I like #2 too, especially now that it is not limited to a specific
platform. One question though, could you still keep the cooling device
support of it? In some systems, it might make sense to enable / disable
idle injections based on temperature.

Was there any particular reason you dropped the cooling device support?

BR,

Eduardo Valentin


> We are entering a very power limited environment on client side, frequency
> scaling can only be efficient at certain range. e.g. on SKL, up to ~900MHz,
> anything below, it is increasingly more efficient to do C-states insertion
> if coordinated.
> 
> Looking forward, there are use cases beyond thermal/power capping. I think
> we can consolidate balanced partial busy workloads that are evenly
> distributed among CPUs.
> 
> Please let me know what you think.
> 
> Thanks,
> 
> 
> Jacob Pan (3):
>   ktime: add a roundup function
>   timer: relax tick stop in idle entry
>   sched: introduce synchronized idle injection
> 
>  include/linux/ktime.h|  10 ++
>  include/linux/sched.h|  12 ++
>  include/linux/sched/sysctl.h |   5 +
>  include/trace/events/sched.h |  23 +++
>  init/Kconfig |   8 +
>  kernel/sched/fair.c  | 345 +++
>  kernel/sched/sched.h |   3 +
>  kernel/sysctl.c  |  20 +++
>  kernel/time/tick-sched.c |   2 +-
>  9 files changed, 427 insertions(+), 1 deletion(-)
> 
> -- 
> 1.9.1
> 


Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-03 Thread Daniel Micay
On 04/11/15 12:53 AM, Daniel Micay wrote:
>> In the common case it will be passed many pages by the allocator. There
>> will still be a layer of purging logic on top of MADV_FREE but it can be
>> much thinner than the current workarounds for MADV_DONTNEED. So the
>> allocator would still be coalescing dirty ranges and only purging when
>> the ratio of dirty:clean pages rises above some threshold. It would be
>> able to weight the largest ranges for purging first rather than logic
>> based on stuff like aging as is used for MADV_DONTNEED.
> 
> I would expect that jemalloc would just start putting the dirty ranges
> into the usual pair of red-black trees (with coalescing) and then doing
> purging starting from the largest spans to get back down below whatever
> dirty:clean ratio it's trying to keep. Right now, it has lots of
> other logic to deal with this since each MADV_DONTNEED call results in
> lots of zeroing and then page faults.

Er, I mean dirty:active (i.e. ratio of unpurged, dirty pages to ones
that are handed out as allocations, which is kept at something like
1:8). A high constant cost in the madvise call but quick handling of
each page means that allocators should be more aggressive, purging
more than they strictly need to in one go. For example, it might need to
purge 2M to meet the ratio but it could have a contiguous span of 32M of
dirty pages. If the cost per page is low enough, it could just do the
entire range.
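
The arithmetic in that example can be sketched as below. This is a rough
illustration only: pages_over_ratio() and purge_len() are hypothetical
helpers, not jemalloc functions, and the 1:8 target is the approximate
figure mentioned above.

```c
#include <stddef.h>

/* Number of dirty pages that must be purged so that dirty:active
 * drops to at most 1:ratio (e.g. ratio = 8 for 1:8). */
static size_t pages_over_ratio(size_t dirty, size_t active, size_t ratio)
{
	size_t allowed = active / ratio;

	return dirty > allowed ? dirty - allowed : 0;
}

/* Decide how much of the largest dirty span to purge in one call.
 * If the per-page cost is low (as hoped for MADV_FREE), take the whole
 * span to amortize the fixed cost of the call; otherwise purge only what
 * is needed. A span smaller than the need is purged whole, and the
 * caller would move on to the next span. */
static size_t purge_len(size_t needed, size_t largest_span, int cheap_per_page)
{
	if (needed == 0)
		return 0;
	if (cheap_per_page || needed > largest_span)
		return largest_span;
	return needed;
}
```

With 4K pages, 2M is 512 pages and 32M is 8192 pages, so a need of 512
pages against an 8192-page span purges all 8192 pages when the per-page
cost is considered cheap, and only 512 otherwise.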





Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)

2015-11-03 Thread Brendan Gregg
On Tue, Nov 3, 2015 at 5:54 PM, Namhyung Kim  wrote:
> Hi Brendan,
>
> On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
>> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
>>  wrote:
>> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
>> >> Hello,
>> >>
>> >> This is what Brendan requested on the perf-users mailing list [1] to
>> >> support FlameGraphs [2] more efficiently.  This patchset adds a few
>> >> more callchain options to adjust the output for it.
>> >>
>> >>  * changes in v4)
>> >>   - add missing doc update
>> >>   - cleanup/fix callchain value print code
>> >>   - add Acked-by from Brendan and Jiri
>> >
>> > Do those Acked-by stand? Things changed, the values moved from the end
>> > of the line to the start, etc.
>> >
>> [...]
>>
>> I'd Ack this change as it's a useful addition. It doesn't quite
>> address the folded-only output, but it's a step in that direction. I
>> think having the value at the start of a line only makes sense for the
>> perf report output containing the hist summary lines, for consistency.
>
> Right, thanks!
>
>
>>
>> Here's how I'd shuffle the output of this patch (ignore word wrap
>> issues with this email):
>>
>> # ./perf report --stdio -g folded,count,caller -F pid | \
>> awk '/^ / { n = $1 }
>> /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
>> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>> 809
>> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>> 135
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
>> 63
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
>> 54
>> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
>> 3
>> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
>>
>> So the output is folded stacks, prefixed by comm-PID. Shuffling the
>> summarized output is a lot better than doing a "perf script" dump and
>> re-processing call chains. (Note that since I'm using -F, I didn't
>> need --no-children;
>
> Nope.  The '-F pid' doesn't affect --children.  It doesn't show the
> children overhead column but we still have hist entries for
> (synthesized) children..
>
>   $ perf report --no-children | wc -l
>   998
>
>   $ perf report --no-children -F pid,dso,sym | wc -l
>   998
>
>   $ perf report --children | wc -l
>   3229
>
>   $ perf report --children -F pid,dso,sym | wc -l
>   3202
>
> So I think you still need to use --no-children (or set report.children
> config variable to false) for your script.

Ok, good to know, thanks.

>
>
>> and with "-g count", I didn't need --show-nr-samples.)
>
> Yes, I used -n/--show-nr-samples just to check the number is correct.
>
>
>>
>> I notice the fields (-F) option already has this precedent:
>>
>> - "comm": prints PID:comm
>> - "pid": prints PID
>
> It's opposite:  "comm" prints comm, "pid" prints PID:comm. :)

Ah, right, sorry, I'd typed those the wrong way around. :)
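
For reference, the "PID:comm" to "comm-PID" shuffle done by the awk
one-liner quoted earlier can be sketched in C. to_comm_pid() is a
hypothetical helper for illustration, not part of perf. Unlike the awk
version, it splits at the first colon only, so a comm containing spaces
or further colons is kept intact.

```c
#include <stdio.h>
#include <string.h>

/* Rewrite "PID:comm" as "comm-PID" into out (of size outsz).
 * Returns 0 on success, -1 if there is no colon or out is too small. */
static int to_comm_pid(const char *in, char *out, size_t outsz)
{
	const char *colon = strchr(in, ':');
	int n;

	if (!colon)
		return -1;
	/* comm is everything after the colon; PID is the part before it. */
	n = snprintf(out, outsz, "%s-%.*s", colon + 1, (int)(colon - in), in);
	return (n < 0 || (size_t)n >= outsz) ? -1 : 0;
}
```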

>
>
>>
>> If these were added to -g, along with a no-hists, then the two types
>> of folded-only output could be generated using:
>>
>> perf report --stdio -g folded,count,comm,no-hists,caller
>> perf report --stdio -g folded,count,pid,no-hists,caller
>
> As I said, using fields like comm, pid requires to have same keys in
> --sort option.  So it's basically unreliable to use those specific
> field names in the -g option IMHO.  I suggested to use 'info' (yes, it
> needs better name) to print all sort keys.
>
>
>>
>> ... although "no-hists" doesn't hit me as intuitive. How about "-F
>> none" to specify zero columns? ie:
>>
>> perf report --stdio -g folded,count,comm,caller -F none
>> perf report --stdio -g folded,count,pid,caller -F none
>
> Ah, makes sense.  So it'd look like
>
>   $ perf report --stdio -g folded,count,info -F none -s comm
>   $ perf report --stdio -g folded,count,info -F none -s pid
>
> The output would be
>
>   809 swapper-0 
> cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
>

Thanks, looks almost right: a couple of minor changes:

1. If perf already has the precedent of "PID:comm", instead of my
"comm-PID", then maybe it should use "PID:comm" for perf consistency.
Doesn't make much difference to me.
2. The second space, delimiting "PID:comm" (or comm) and the stack...
I'm nervous about using space as a delimiter any more than once, since
it can also appear in comm (eg, "java main") and frames (eg,
"JavaCalls::call_helper(JavaValue*, methodHandle*, JavaCallArguments*,
Thread*)" -- that's direct from "perf script"!). I'd consider 

Re: [PATCH v2 1/7] staging: lustre: remove white space in libcfs_hash.h

2015-11-03 Thread Greg Kroah-Hartman
On Mon, Nov 02, 2015 at 12:22:07PM -0500, James Simmons wrote:
> Cleanup all the unneeded white space in libcfs_hash.h.
> 
> Signed-off-by: James Simmons 
> ---
>  .../lustre/include/linux/libcfs/libcfs_hash.h  |  135 ++--
>  1 files changed, 70 insertions(+), 65 deletions(-)

Doesn't apply, did I already queue up this series?

confused,

greg k-h


Re: [PATCH] Staging: comedi: ssc_dnp: fixed a comment style

2015-11-03 Thread Greg KH
On Fri, Oct 30, 2015 at 06:46:20PM +0100, Philippe Loctaux wrote:
> Fixed a comment issue.
> 
> Signed-off-by: Philippe Loctaux 
> ---
>  drivers/staging/comedi/drivers/ssv_dnp.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/staging/comedi/drivers/ssv_dnp.c b/drivers/staging/comedi/drivers/ssv_dnp.c
> index acc7f34..c28b6cb 100644
> --- a/drivers/staging/comedi/drivers/ssv_dnp.c
> +++ b/drivers/staging/comedi/drivers/ssv_dnp.c
> @@ -150,7 +150,8 @@ static int dnp_attach(struct comedi_device *dev, struct comedi_devconfig *it)
>  
>   /* We use the I/O ports 0x22,0x23 and 0xa3-0xa9, which are always
>* allocated for the primary 8259, so we don't need to allocate them
> -  * ourselves. */
> +  * ourselves.
> +  */
>  
>   /* configure all ports as input (default)*/
>   outb(PAMR, CSCIR);

Does not apply to my tree at all :(


Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-03 Thread Daniel Micay
> In the common case it will be passed many pages by the allocator. There
> will still be a layer of purging logic on top of MADV_FREE but it can be
> much thinner than the current workarounds for MADV_DONTNEED. So the
> allocator would still be coalescing dirty ranges and only purging when
> the ratio of dirty:clean pages rises above some threshold. It would be
> able to weight the largest ranges for purging first rather than logic
> based on stuff like aging as is used for MADV_DONTNEED.

I would expect that jemalloc would just start putting the dirty ranges
into the usual pair of red-black trees (with coalescing) and then doing
purging starting from the largest spans to get back down below whatever
dirty:clean ratio it's trying to keep. Right now, it has lots of
other logic to deal with this since each MADV_DONTNEED call results in
lots of zeroing and then page faults.





[lkp] [f5f3497cad] 8678414074: Fixed kernel test crashed

2015-11-03 Thread kernel test robot
FYI, we noticed the below changes on

https://github.com/0day-ci/linux Matt-Fleming/x86-setup-Fix-recent-boot-crash-on-32-bit-SMP-machines/20151103-221445
commit 867841407423402e594463cbdddc800f1c5f7b6f ("f5f3497cad: BUG: kernel boot crashed")


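+------------------------------------------------+------------+------------+
|                                                | 90939cc535 | 8678414074 |
+------------------------------------------------+------------+------------+
| boot_successes                                 | 0          | 9          |
| boot_failures                                  | 14         | 3          |
| BUG:kernel_boot_crashed                        | 12         |            |
| IP-Config:Auto-configuration_of_network_failed | 2          | 2          |
| BUG:kernel_test_crashed                        | 0          | 1          |
+------------------------------------------------+------------+------------+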

The patch fixes the kernel test crashed issue before.

Tested-by: Huang, Ying 

Thanks,
Ying Huang
#
# Automatically generated file; DO NOT EDIT.
# Linux/i386 4.3.0-rc7 Kernel Configuration
#
# CONFIG_64BIT is not set
CONFIG_X86_32=y
CONFIG_X86=y
CONFIG_INSTRUCTION_DECODER=y
CONFIG_PERF_EVENTS_INTEL_UNCORE=y
CONFIG_OUTPUT_FORMAT="elf32-i386"
CONFIG_ARCH_DEFCONFIG="arch/x86/configs/i386_defconfig"
CONFIG_LOCKDEP_SUPPORT=y
CONFIG_STACKTRACE_SUPPORT=y
CONFIG_HAVE_LATENCYTOP_SUPPORT=y
CONFIG_MMU=y
CONFIG_NEED_SG_DMA_LENGTH=y
CONFIG_GENERIC_ISA_DMA=y
CONFIG_GENERIC_BUG=y
CONFIG_GENERIC_HWEIGHT=y
CONFIG_ARCH_MAY_HAVE_PC_FDC=y
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_GENERIC_CALIBRATE_DELAY=y
CONFIG_ARCH_HAS_CPU_RELAX=y
CONFIG_ARCH_HAS_CACHE_LINE_SIZE=y
CONFIG_HAVE_SETUP_PER_CPU_AREA=y
CONFIG_NEED_PER_CPU_EMBED_FIRST_CHUNK=y
CONFIG_NEED_PER_CPU_PAGE_FIRST_CHUNK=y
CONFIG_ARCH_HIBERNATION_POSSIBLE=y
CONFIG_ARCH_SUSPEND_POSSIBLE=y
CONFIG_ARCH_WANT_HUGE_PMD_SHARE=y
CONFIG_ARCH_WANT_GENERAL_HUGETLB=y
CONFIG_ARCH_SUPPORTS_OPTIMIZED_INLINING=y
CONFIG_ARCH_SUPPORTS_DEBUG_PAGEALLOC=y
CONFIG_X86_32_SMP=y
CONFIG_ARCH_HWEIGHT_CFLAGS="-fcall-saved-ecx -fcall-saved-edx"
CONFIG_ARCH_SUPPORTS_UPROBES=y
CONFIG_FIX_EARLYCON_MEM=y
CONFIG_PGTABLE_LEVELS=2
CONFIG_DEFCONFIG_LIST="/lib/modules/$UNAME_RELEASE/.config"
CONFIG_IRQ_WORK=y
CONFIG_BUILDTIME_EXTABLE_SORT=y

#
# General setup
#
CONFIG_INIT_ENV_ARG_LIMIT=32
CONFIG_CROSS_COMPILE=""
# CONFIG_COMPILE_TEST is not set
CONFIG_LOCALVERSION=""
CONFIG_LOCALVERSION_AUTO=y
CONFIG_HAVE_KERNEL_GZIP=y
CONFIG_HAVE_KERNEL_BZIP2=y
CONFIG_HAVE_KERNEL_LZMA=y
CONFIG_HAVE_KERNEL_XZ=y
CONFIG_HAVE_KERNEL_LZO=y
CONFIG_HAVE_KERNEL_LZ4=y
# CONFIG_KERNEL_GZIP is not set
# CONFIG_KERNEL_BZIP2 is not set
CONFIG_KERNEL_LZMA=y
# CONFIG_KERNEL_XZ is not set
# CONFIG_KERNEL_LZO is not set
# CONFIG_KERNEL_LZ4 is not set
CONFIG_DEFAULT_HOSTNAME="(none)"
CONFIG_SYSVIPC=y
CONFIG_SYSVIPC_SYSCTL=y
CONFIG_POSIX_MQUEUE=y
CONFIG_POSIX_MQUEUE_SYSCTL=y
# CONFIG_CROSS_MEMORY_ATTACH is not set
CONFIG_FHANDLE=y
CONFIG_USELIB=y
CONFIG_AUDIT=y
CONFIG_HAVE_ARCH_AUDITSYSCALL=y
# CONFIG_AUDITSYSCALL is not set

#
# IRQ subsystem
#
CONFIG_GENERIC_IRQ_PROBE=y
CONFIG_GENERIC_IRQ_SHOW=y
CONFIG_GENERIC_PENDING_IRQ=y
CONFIG_GENERIC_IRQ_CHIP=y
CONFIG_IRQ_DOMAIN=y
CONFIG_IRQ_DOMAIN_HIERARCHY=y
# CONFIG_IRQ_DOMAIN_DEBUG is not set
CONFIG_IRQ_FORCED_THREADING=y
CONFIG_SPARSE_IRQ=y
CONFIG_CLOCKSOURCE_WATCHDOG=y
CONFIG_ARCH_CLOCKSOURCE_DATA=y
CONFIG_CLOCKSOURCE_VALIDATE_LAST_CYCLE=y
CONFIG_GENERIC_TIME_VSYSCALL=y
CONFIG_GENERIC_CLOCKEVENTS=y
CONFIG_GENERIC_CLOCKEVENTS_BROADCAST=y
CONFIG_GENERIC_CLOCKEVENTS_MIN_ADJUST=y
CONFIG_GENERIC_CMOS_UPDATE=y

#
# Timers subsystem
#
CONFIG_TICK_ONESHOT=y
CONFIG_HZ_PERIODIC=y
# CONFIG_NO_HZ_IDLE is not set
CONFIG_NO_HZ=y
CONFIG_HIGH_RES_TIMERS=y

#
# CPU/Task time and stats accounting
#
# CONFIG_TICK_CPU_ACCOUNTING is not set
CONFIG_IRQ_TIME_ACCOUNTING=y
CONFIG_BSD_PROCESS_ACCT=y
CONFIG_BSD_PROCESS_ACCT_V3=y
# CONFIG_TASKSTATS is not set

#
# RCU Subsystem
#
CONFIG_PREEMPT_RCU=y
CONFIG_RCU_EXPERT=y
CONFIG_SRCU=y
# CONFIG_TASKS_RCU is not set
CONFIG_RCU_STALL_COMMON=y
CONFIG_RCU_FANOUT=32
CONFIG_RCU_FANOUT_LEAF=16
CONFIG_TREE_RCU_TRACE=y
# CONFIG_RCU_BOOST is not set
CONFIG_RCU_KTHREAD_PRIO=0
CONFIG_RCU_NOCB_CPU=y
# CONFIG_RCU_NOCB_CPU_NONE is not set
# CONFIG_RCU_NOCB_CPU_ZERO is not set
CONFIG_RCU_NOCB_CPU_ALL=y
# CONFIG_RCU_EXPEDITE_BOOT is not set
CONFIG_BUILD_BIN2C=y
CONFIG_IKCONFIG=y
# CONFIG_IKCONFIG_PROC is not set
CONFIG_LOG_BUF_SHIFT=17
CONFIG_LOG_CPU_MAX_BUF_SHIFT=12
CONFIG_HAVE_UNSTABLE_SCHED_CLOCK=y
CONFIG_ARCH_WANT_BATCHED_UNMAP_TLB_FLUSH=y
CONFIG_CGROUPS=y
# CONFIG_CGROUP_DEBUG is not set
# CONFIG_CGROUP_FREEZER is not set
CONFIG_CGROUP_PIDS=y
# CONFIG_CGROUP_DEVICE is not set
CONFIG_CPUSETS=y
# CONFIG_PROC_PID_CPUSET is not set
# CONFIG_CGROUP_CPUACCT is not set
# CONFIG_MEMCG is not set
# CONFIG_CGROUP_HUGETLB is not set
# CONFIG_CGROUP_PERF is not set
CONFIG_CGROUP_SCHED=y
CON

Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-03 Thread Daniel Micay
> Does this set the write protect bit?
> 
> What happens on architectures without hardware dirty tracking?

It's supposed to avoid needing page faults when the data is accessed
again, but it can just be implemented via page faults on architectures
without a way to check for access or writes. MADV_DONTNEED is also a
valid implementation of MADV_FREE if it comes to that (which is what it
does on swapless systems for now).
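
A userspace caller can mirror that fallback. The sketch below is an
assumption about how an allocator might wrap the call, not code from the
patch set: release_range() is a hypothetical helper, and it relies on
MADV_FREE being defined by the installed headers (true on Linux once this
series is merged, and on some BSDs).

```c
#define _DEFAULT_SOURCE
#include <stddef.h>
#include <sys/mman.h>

/* Tell the kernel the range's contents are no longer needed.
 * Prefer MADV_FREE where the headers define it; if the running kernel
 * rejects it (or it is not defined), fall back to MADV_DONTNEED, which
 * is described above as a valid, if costlier, implementation. */
static int release_range(void *addr, size_t len)
{
#ifdef MADV_FREE
	if (madvise(addr, len, MADV_FREE) == 0)
		return 0;
#endif
	return madvise(addr, len, MADV_DONTNEED);
}
```

After a successful call the caller must treat the range as uninitialized:
a later read may return either the old contents or zeroes.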

> Using the dirty bit for these semantics scares me.  This API creates a
> page that can have visible nonzero contents and then can
> asynchronously and magically zero itself thereafter.  That makes me
> nervous.  Could we use the accessed bit instead?  Then the observable
> semantics would be equivalent to having MADV_FREE either zero the page
> or do nothing, except that it doesn't make up its mind until the next
> read.

FWIW, those are already basically the semantics provided by GCC and LLVM
for data the compiler considers uninitialized (they could be more
aggressive since C just says it's undefined, but in practice they allow
it but can produce inconsistent results even if it isn't touched).

http://llvm.org/docs/LangRef.html#undefined-values

It doesn't seem like there would be an advantage to checking if the data
was written to vs. whether it was accessed if checking for both of those
is comparable in performance. I don't know enough about that.

>> +   ptent = pte_mkold(ptent);
>> +   ptent = pte_mkclean(ptent);
>> +   set_pte_at(mm, addr, pte, ptent);
>> +   tlb_remove_tlb_entry(tlb, pte, addr);
> 
> It looks like you are flushing the TLB.  In a multithreaded program,
> that's rather expensive.  Potentially silly question: would it be
> better to just zero the page immediately in a multithreaded program
> and then, when swapping out, check the page is zeroed and, if so, skip
> swapping it out?  That could be done without forcing an IPI.

In the common case it will be passed many pages by the allocator. There
will still be a layer of purging logic on top of MADV_FREE but it can be
much thinner than the current workarounds for MADV_DONTNEED. So the
allocator would still be coalescing dirty ranges and only purging when
the ratio of dirty:clean pages rises above some threshold. It would be
able to weight the largest ranges for purging first rather than logic
based on stuff like aging as is used for MADV_DONTNEED.





Re: [PATCH v1 1/4] Crypto: Crypto driver support aes/des/des3 for rk3288

2015-11-03 Thread Zain
Hi LABBE,

On 2015-11-03 16:59, LABBE Corentin wrote:
> On Tue, Nov 03, 2015 at 01:52:05PM +0800, Zain Wang wrote:
>> Crypto driver support cbc/ecb two chainmode, and aes/des/des3 three cipher
>> mode.
>> The names registered are:
>> ecb(aes) cbc(aes) ecb(des) cbc(des) ecb(des3_ede) cbc(des3_ede)
>> You can alloc tags above in your case.
>>
>> And other algorithms and platforms will be added later on.
>>
>> Signed-off-by: Zain Wang 
>> ---
>>  drivers/crypto/Kconfig |  11 +
>>  drivers/crypto/Makefile|   1 +
>>  drivers/crypto/rockchip/Makefile   |   3 +
>>  drivers/crypto/rockchip/rk3288_crypto.c| 383 
>>  drivers/crypto/rockchip/rk3288_crypto.h| 290 
>>  drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c | 501 +
>>  6 files changed, 1189 insertions(+)
>>  create mode 100644 drivers/crypto/rockchip/Makefile
>>  create mode 100644 drivers/crypto/rockchip/rk3288_crypto.c
>>  create mode 100644 drivers/crypto/rockchip/rk3288_crypto.h
>>  create mode 100644 drivers/crypto/rockchip/rk3288_crypto_ablkcipher.c
>>
>> diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
>> index 2569e04..d1e42cf 100644
>> --- a/drivers/crypto/Kconfig
>> +++ b/drivers/crypto/Kconfig
>> @@ -498,4 +498,15 @@ config CRYPTO_DEV_SUN4I_SS
>>To compile this driver as a module, choose M here: the module
>>will be called sun4i-ss.
>>  
>> +config CRYPTO_DEV_ROCKCHIP
>> +tristate "Rockchip's Cryptographic Engine driver"
>> +
>> +select CRYPTO_AES
>> +select CRYPTO_DES
>> +select CRYPTO_BLKCIPHER
>> +
>> +help
>> +  This driver interfaces with the hardware crypto accelerator.
>> +  Supporting cbc/ecb chainmode, and aes/des/des3_ede cipher mode.
>> +
>>  endif # CRYPTO_HW
>> diff --git a/drivers/crypto/Makefile b/drivers/crypto/Makefile
>> index c3ced6f..713de9d 100644
>> --- a/drivers/crypto/Makefile
>> +++ b/drivers/crypto/Makefile
>> @@ -29,3 +29,4 @@ obj-$(CONFIG_CRYPTO_DEV_QAT) += qat/
>>  obj-$(CONFIG_CRYPTO_DEV_QCE) += qce/
>>  obj-$(CONFIG_CRYPTO_DEV_VMX) += vmx/
>>  obj-$(CONFIG_CRYPTO_DEV_SUN4I_SS) += sunxi-ss/
>> +obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rockchip/
>> diff --git a/drivers/crypto/rockchip/Makefile b/drivers/crypto/rockchip/Makefile
>> new file mode 100644
>> index 0000000..7051c6c
>> --- /dev/null
>> +++ b/drivers/crypto/rockchip/Makefile
>> @@ -0,0 +1,3 @@
>> +obj-$(CONFIG_CRYPTO_DEV_ROCKCHIP) += rk_crypto.o
>> +rk_crypto-objs := rk3288_crypto.o \
>> +  rk3288_crypto_ablkcipher.o \
>> diff --git a/drivers/crypto/rockchip/rk3288_crypto.c b/drivers/crypto/rockchip/rk3288_crypto.c
>> new file mode 100644
>> index 0000000..02830f2
>> --- /dev/null
>> +++ b/drivers/crypto/rockchip/rk3288_crypto.c
>> @@ -0,0 +1,383 @@
>> +/*
>> + *Crypto acceleration support for Rockchip RK3288
>> + *
>> + * Copyright (c) 2015, Fuzhou Rockchip Electronics Co., Ltd
>> + *
>> + * Author: Zain Wang 
>> + *
>> + * This program is free software; you can redistribute it and/or modify it
>> + * under the terms and conditions of the GNU General Public License,
>> + * version 2, as published by the Free Software Foundation.
>> + *
>> + * Some ideas are from marvell-cesa.c and s5p-sss.c driver.
>> + */
>> +
>> +#include "rk3288_crypto.h"
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +struct crypto_info_t *crypto_p;
>> +
>> +static int rk_crypto_enable_clk(struct crypto_info_t *dev)
>> +{
>> +int err;
>> +
>> +err = clk_prepare_enable(dev->sclk);
>> +if (err) {
>> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'sclk'\n",
>> +__func__, __LINE__);
>> +goto err_return;
>> +}
>> +err = clk_prepare_enable(dev->aclk);
>> +if (err) {
>> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'aclk'\n",
>> +__func__, __LINE__);
>> +goto err_aclk;
>> +}
>> +err = clk_prepare_enable(dev->hclk);
>> +if (err) {
>> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'hclk'\n",
>> +__func__, __LINE__);
>> +goto err_hclk;
>> +}
>> +
>> +err = clk_prepare_enable(dev->dmaclk);
>> +if (err) {
>> +dev_err(dev->dev, "[%s:%d], Couldn't enable clock 'dmaclk'\n",
>> +__func__, __LINE__);
>> +goto err_dmaclk;
>> +}
>> +return err;
>> +err_dmaclk:
>> +clk_disable_unprepare(dev->hclk);
>> +err_hclk:
>> +clk_disable_unprepare(dev->aclk);
>> +err_aclk:
>> +clk_disable_unprepare(dev->sclk);
>> +err_return:
>> +return err;
>> +}
>> +
>> +static void rk_crypto_disable_clk(struct crypto_info_t *dev)
>> +{
>> +clk_disable_unprepare(dev->dmaclk);
>> +clk_disable_unprepare(dev->hclk);
>> +

[lkp] [f2fs] 60b99b486b: +1.9% fileio.requests_per_sec

2015-11-03 Thread kernel test robot
FYI, we noticed the below changes on

https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
commit 60b99b486b568c13cbb7caa83cf8a12af7665f1e ("f2fs: introduce a periodic checkpoint flow")


=========================================================================================
tbox_group/testcase/rootfs/kconfig/compiler/period/nr_threads/disk/fs/size/filenum/rwmode/iomode:
  lkp-ws02/fileio/debian-x86_64-2015-02-07.cgz/x86_64-rhel/gcc-4.9/600s/100%/1HDD/f2fs/64G/1024f/seqrewr/sync

commit: 
  5c2674347466d5c2d5169214e95f4ad6dc09e9b6
  60b99b486b568c13cbb7caa83cf8a12af7665f1e

5c2674347466d5c2 60b99b486b568c13cbb7caa83c 
---------------- -------------------------- 
         %stddev     %change         %stddev
             \          |                \  
   ±  8%+279.7%   4219 ± 23%  fileio.request_latency_max_ms
  5656 ±  0%  +1.9%   5764 ±  0%  fileio.requests_per_sec
 1.088e+08 ±  0%  +1.9%  1.108e+08 ±  0%  fileio.time.file_system_outputs
     46225 ±  0%  +4.9%  48473 ±  0%  fileio.time.involuntary_context_switches
      3759 ±  0%  +2.6%   3858 ±  1%  fileio.time.maximum_resident_set_size
     47.00 ± 92%+163.8% 124.00 ±  8%  numa-vmstat.node0.workingset_refault
 37306 ±  2% -10.6%  33365 ±  1%  softirqs.BLOCK
328.00 ±  0% +55.5% 510.00 ±  1%  time.file_system_inputs
 79132 ± 17% +30.7% 103393 ±  2%  numa-meminfo.node0.Active
122897 ± 10% -20.4%  97780 ±  2%  numa-meminfo.node1.Active
  1.45 ±  1%  +3.3%   1.50 ±  1%  turbostat.%Busy
  0.40 ± 13% +76.1%   0.70 ± 19%  turbostat.Pkg%pc3
  0.12 ± 15%+638.8%   0.91 ± 16%  turbostat.Pkg%pc6
 83662 ±  0%  +1.9%  85235 ±  0%  vmstat.io.bo
626.00 ±  5% +23.0% 770.25 ±  8%  vmstat.memory.buff
  9980 ±  1%  +2.6%  10241 ±  1%  vmstat.system.cs
 2.012e+08 ± 11% +25.0%  2.514e+08 ±  8%  cpuidle.C1E-NHM.time
 52329 ± 11% -23.6%  39994 ± 12%  cpuidle.C3-NHM.usage
  5227 ± 29%+413.0%  26816 ±114%  cpuidle.POLL.time
336.25 ± 33% +82.5% 613.50 ±  6%  cpuidle.POLL.usage
      2648 ±  5% -13.8%   2283 ±  2%  proc-vmstat.kswapd_high_wmark_hit_quickly
      3872 ±  1% +12.6%   4361 ±  3%  proc-vmstat.kswapd_low_wmark_hit_quickly
  1072 ± 29% +30.1%   1395 ± 16%  proc-vmstat.pgpgin
  9856 ± 11%+350.3%  44384 ±  3%  proc-vmstat.slabs_scanned
      3364 ±  2% -20.9%   2660 ±  2%  slabinfo.f2fs_extent_tree.active_objs
  3364 ±  2% -20.9%   2660 ±  2%  slabinfo.f2fs_extent_tree.num_objs
     19645 ±  0%  +9.8%  21579 ±  4%  slabinfo.jbd2_revoke_record_s.num_objs
  3165 ±  4%  -8.7%   2891 ±  8%  slabinfo.proc_inode_cache.num_objs
      0.00 ± -1%  +Inf%    1774752 ±  7%  latency_stats.avg.call_rwsem_down_read_failed.f2fs_convert_inline_inode.[f2fs].f2fs_write_begin.[f2fs].generic_perform_write.__generic_file_write_iter.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath
      0.00 ± -1%  +Inf%    1804979 ± 68%  latency_stats.avg.get_request.blk_queue_bio.generic_make_request.submit_bio.f2fs_submit_page_bio.[f2fs].get_meta_page.[f2fs].get_node_info.[f2fs].read_node_page.[f2fs].get_node_page.[f2fs].get_dnode_of_data.[f2fs].f2fs_reserve_block.[f2fs].f2fs_get_block.[f2fs]
      0.00 ± -1%  +Inf%    3191014 ±  6%  latency_stats.max.call_rwsem_down_read_failed.f2fs_convert_inline_inode.[f2fs].f2fs_write_begin.[f2fs].generic_perform_write.__generic_file_write_iter.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath
   1043277 ±  6% +294.7%   4118293 ± 23%  latency_stats.max.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath
      0.00 ± -1%  +Inf%    1804979 ± 68%  latency_stats.max.get_request.blk_queue_bio.generic_make_request.submit_bio.f2fs_submit_page_bio.[f2fs].get_meta_page.[f2fs].get_node_info.[f2fs].read_node_page.[f2fs].get_node_page.[f2fs].get_dnode_of_data.[f2fs].f2fs_reserve_block.[f2fs].f2fs_get_block.[f2fs]
     16549 ± 50% +248.8%     57724 ±145%  latency_stats.max.rpc_wait_bit_killable.__rpc_wait_for_completion_task.nfs4_run_open_task.[nfsv4]._nfs4_open_and_get_state.[nfsv4].nfs4_do_open.[nfsv4].nfs4_atomic_open.[nfsv4].nfs4_file_open.[nfsv4].do_dentry_open.vfs_open.path_openat.do_filp_open.do_sys_open
      0.00 ± -1%  +Inf%   16400627 ±  7%  latency_stats.sum.call_rwsem_down_read_failed.f2fs_convert_inline_inode.[f2fs].f2fs_write_begin.[f2fs].generic_perform_write.__generic_file_write_iter.generic_file_write_iter.f2fs_file_write_iter.[f2fs].__vfs_write.vfs_write.SyS_pwrite64.entry_SYSCALL_64_fastpath
  0.00 ± -1%  +Inf%1804979 ± 68%  
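
The %change column in the report above appears to be the relative
difference between the two commits' means. A minimal sketch, assuming it
is computed as (patched - base) / base (percent_change() is a
hypothetical helper, not part of the lkp tooling):

```c
/* Relative change from the base commit's mean to the patched commit's
 * mean, in percent. base must be nonzero; rows with a zero base are the
 * ones the report prints as "+Inf%". */
static double percent_change(double base, double patched)
{
	return (patched - base) / base * 100.0;
}
```

Under that assumption, fileio.requests_per_sec going from 5656 to 5764
works out to about +1.9%, and softirqs.BLOCK going from 37306 to 33365 to
about -10.6%, which agrees with the rows shown.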

Re: [PATCH V3 2/3] perf/powerpc :add support for sampling intr machine state

2015-11-03 Thread Madhavan Srinivasan


On Tuesday 03 November 2015 02:46 PM, Michael Ellerman wrote:
> On Tue, 2015-11-03 at 11:40 +0530, Anju T wrote:
>
>> The perf infrastructure uses a bit mask to find out
>> valid registers to display. Define a register mask
>> for supported registers defined in asm/perf_regs.h.
>> The bit positions also correspond to register IDs
>> which are used by perf infrastructure to fetch the register
>> values. CONFIG_HAVE_PERF_REGS enables
>> sampling of the interrupted machine state.
>> diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
>> new file mode 100644
>> index 0000000..0520492
>> --- /dev/null
>> +++ b/arch/powerpc/perf/perf_regs.c
>> @@ -0,0 +1,92 @@
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +#include 
>> +
>> +#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
>> +
>> +#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1))
>> +
>> +static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
>> +PT_REGS_OFFSET(PERF_REG_POWERPC_GPR0, gpr[0]),
>> +PT_REGS_OFFSET(PERF_REG_POWERPC_GPR1, gpr[1]),
>> +PT_REGS_OFFSET(PERF_REG_POWERPC_GPR2, gpr[2]),
> 
>
> I realise you're following the example of other architectures, but we have
> almost this exact same structure in ptrace.c, see regoffset_table.

That won't work because we want to add more regs to the perf version but
not the ptrace version.

Maddy

> It would be really nice if we could share them between ptrace and perf.
>
> cheers
>
> ___
> Linuxppc-dev mailing list
> linuxppc-...@lists.ozlabs.org
> https://lists.ozlabs.org/listinfo/linuxppc-dev
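
The register-mask scheme described in the quoted commit message (bit
positions double as register IDs, and an offset table maps each ID to a
field) can be sketched in plain C. Everything here is a cut-down,
hypothetical stand-in: struct demo_regs and the DEMO_REG_* IDs are not
the real struct pt_regs or PERF_REG_POWERPC_* values, and the validation
rule (reject an empty mask or any reserved bit) is an assumption modelled
on other architectures.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, cut-down register file. */
struct demo_regs {
	uint64_t gpr[3];
	uint64_t nip;
};

enum { DEMO_REG_GPR0, DEMO_REG_GPR1, DEMO_REG_GPR2, DEMO_REG_NIP, DEMO_REG_MAX };

#define DEMO_OFFSET(id, r) [id] = offsetof(struct demo_regs, r)
/* Every bit at or above DEMO_REG_MAX names a register that does not exist. */
#define DEMO_RESERVED (~((1ULL << DEMO_REG_MAX) - 1))

/* Register ID -> byte offset of the matching field. */
static const size_t demo_offset[DEMO_REG_MAX] = {
	DEMO_OFFSET(DEMO_REG_GPR0, gpr[0]),
	DEMO_OFFSET(DEMO_REG_GPR1, gpr[1]),
	DEMO_OFFSET(DEMO_REG_GPR2, gpr[2]),
	DEMO_OFFSET(DEMO_REG_NIP, nip),
};

/* Returns 0 if the mask selects only known registers, -1 otherwise. */
static int demo_validate(uint64_t mask)
{
	return (!mask || (mask & DEMO_RESERVED)) ? -1 : 0;
}

/* Fetch one register by ID through the offset table; 0 for a bad ID. */
static uint64_t demo_reg_value(const struct demo_regs *regs, int id)
{
	if (id < 0 || id >= DEMO_REG_MAX)
		return 0;
	return *(const uint64_t *)((const char *)regs + demo_offset[id]);
}
```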



[GIT PULL] Thermal-SoC management updates for v4.4-rc1 #2

2015-11-03 Thread Eduardo Valentin
Hello Rui,

Please pull from

  git://git.kernel.org/pub/scm/linux/kernel/git/evalenti/linux-soc-thermal linus

to receive second Thermal-SoC Management updates for v4.4-rc1 with top-most

7e38a5b1daa12cbaace3c76402999a84460df3e2:

  thermal: rockchip: support the sleep pinctrl state to avoid glitches in s2r (2015-11-03 09:57:42 -0800)

on top of commit 8fb2b9ac2aadd6d87f89071c2c85f8c12b41c943:

  thermal: underflow bug in imx_set_trip_temp() (2015-10-30 11:35:50 -0700)

Specifics:
- Rockchip thermal driver now has pinctrl support.
- First round of fixes on devfreq cooling device.
- This branch has been compile tested and boot tested by the Linaro kernelci bot [1,2].

This is a second round of pull after going through the pending patches
on patchwork. Still trying to reduce the patchwork counts. The rockchip
support has already been reviewed. And there is no point in waiting
for the devfreq fixes.

Still checking what is pending. If I missed your patch, ping me.

[1] - http://kernelci.org/boot/all/job/evalenti/kernel/v4.3-rc3-56-g7e38a5b1daa1/
[2] - http://kernelci.org/build/evalenti/kernel/v4.3-rc3-56-g7e38a5b1daa1/

BR,

Eduardo Valentin

Caesar Wang (2):
  dt-bindings: rockchip-thermal: Add the pinctrl states in this document
  thermal: rockchip: support the sleep pinctrl state to avoid glitches in s2r

Javi Merino (2):
  thermal: devfreq_cooling: use a thermal_cooling_device for register and unregister
  thermal: devfreq_cooling: Make power a u64

 .../devicetree/bindings/thermal/rockchip-thermal.txt   | 11 +--
 drivers/thermal/devfreq_cooling.c  | 18 +++---
 drivers/thermal/rockchip_thermal.c |  4 
 include/linux/devfreq_cooling.h| 16 
 4 files changed, 32 insertions(+), 17 deletions(-)


[PATCH v2 0/2] Stop Intel Processor Trace logging on panic

2015-11-03 Thread Takao Indoh
Hi all,

This patch series provides a feature to stop Intel Processor Trace
(Intel PT) logging and save its registers in memory on panic.

Intel PT is a new feature of Intel "Broadwell" CPUs that captures
information about program execution flow. Here is an article about
Intel PT:
https://software.intel.com/en-us/blogs/2013/09/18/processor-tracing

Once Intel PT is enabled, events that change program flow, such as
branch instructions, exceptions, interrupts and traps, are logged in
memory. This is very useful for debugging because it reveals the
detailed behavior of the software.

When a kernel panic occurs while you are running a perf command with
Intel PT (with the -e intel_pt// option), these patches disable Intel PT
and save its registers in memory. After the crash dump is captured by
kdump, you can retrieve the Intel PT log buffer from the vmcore and
investigate kernel behavior. I have not yet written a tool to salvage the
Intel PT log buffer from the vmcore, but I will do so once these patches
are accepted.

changelog:
v2:
- Define function in intel_pt.h with static inline

v1:
https://lkml.org/lkml/2015/10/28/136


Background:
These patches are part of a patch series I posted before; the original
discussion is below.

x86: Intel Processor Trace Logger
v1: https://lkml.org/lkml/2015/7/29/6
v2: https://lkml.org/lkml/2015/9/8/24

The purpose of the original patches is to introduce an in-kernel logger
using Intel PT. To implement it I need to add some APIs to control the
perf counter and ring buffer in the kernel. Alexander Shishkin is working
on such APIs for his work on using Intel PT for process core dumps. Apart
from such APIs, the feature to save Intel PT registers on panic is
helpful for normal perf command users as described above, so I have
separated that feature from the original patches.

Takao Indoh (2):
  perf/x86/intel/pt: Add interface to stop Intel PT logging
  x86: Stop Intel PT before kdump starts

 arch/x86/include/asm/intel_pt.h   |   10 ++
 arch/x86/kernel/cpu/perf_event_intel_pt.c |9 +
 arch/x86/kernel/crash.c   |   11 +++
 3 files changed, 30 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_pt.h




[PATCH v2 1/2] perf/x86/intel/pt: Add interface to stop Intel PT logging

2015-11-03 Thread Takao Indoh
This patch adds a function that lets an external component stop Intel PT.
It is mainly intended for use when a kernel panic occurs. When it is
called, the intel_pt driver disables Intel PT and saves its registers using
pt_event_stop, which is also used by the pmu.stop handler. The function
stops Intel PT only on the CPU where it is called, so the caller needs to
invoke it on each CPU to stop all logging.

Signed-off-by: Takao Indoh 
---
 arch/x86/include/asm/intel_pt.h   |   10 ++
 arch/x86/kernel/cpu/perf_event_intel_pt.c |9 +
 2 files changed, 19 insertions(+), 0 deletions(-)
 create mode 100644 arch/x86/include/asm/intel_pt.h

diff --git a/arch/x86/include/asm/intel_pt.h b/arch/x86/include/asm/intel_pt.h
new file mode 100644
index 000..e1a4117
--- /dev/null
+++ b/arch/x86/include/asm/intel_pt.h
@@ -0,0 +1,10 @@
+#ifndef _ASM_X86_INTEL_PT_H
+#define _ASM_X86_INTEL_PT_H
+
+#if defined(CONFIG_PERF_EVENTS) && defined(CONFIG_CPU_SUP_INTEL)
+void cpu_emergency_stop_pt(void);
+#else
+static inline void cpu_emergency_stop_pt(void) {}
+#endif
+
+#endif /* _ASM_X86_INTEL_PT_H */
diff --git a/arch/x86/kernel/cpu/perf_event_intel_pt.c b/arch/x86/kernel/cpu/perf_event_intel_pt.c
index 4216928..a638b5b 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_pt.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_pt.c
@@ -27,6 +27,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "perf_event.h"
 #include "intel_pt.h"
@@ -1125,6 +1126,14 @@ static int pt_event_init(struct perf_event *event)
return 0;
 }
 
+void cpu_emergency_stop_pt(void)
+{
+	struct pt *pt = this_cpu_ptr(&pt_ctx);
+
+	if (pt->handle.event)
+		pt_event_stop(pt->handle.event, PERF_EF_UPDATE);
+}
+
 static __init int pt_init(void)
 {
int ret, cpu, prior_warn = 0;
-- 
1.7.1
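
The header pattern used in this patch can be illustrated with a minimal
userspace sketch. This is not the kernel code: DEMO_HAVE_TRACER is a
hypothetical switch standing in for the CONFIG_PERF_EVENTS and
CONFIG_CPU_SUP_INTEL test, and the demo_* names and the struct are invented
for the example. The point it tries to show is that the crash path can call
the stop function unconditionally, because a build without the feature gets
an empty static inline stub.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical switch; stands in for the CONFIG_* test in intel_pt.h. */
#ifndef DEMO_HAVE_TRACER
#define DEMO_HAVE_TRACER 1
#endif

/* Invented state: whether tracing is active and where it stopped. */
struct demo_tracer {
    bool active;
    unsigned long write_offset;  /* current position in the log buffer */
    unsigned long saved_offset;  /* position recorded when tracing stopped */
};

static struct demo_tracer demo_tracer_state;

#if DEMO_HAVE_TRACER
/* Real implementation: stop tracing and remember the last position. */
void demo_emergency_stop(void)
{
    struct demo_tracer *t = &demo_tracer_state;

    if (t->active) {
        t->saved_offset = t->write_offset;
        t->active = false;
    }
}
#else
/* Feature compiled out: callers still build, the call does nothing. */
static inline void demo_emergency_stop(void) {}
#endif

/* Crash path: no #ifdef is needed at the call site. */
void demo_crash_shutdown(void)
{
    demo_emergency_stop();
}
```

Calling demo_crash_shutdown() a second time leaves saved_offset untouched,
since the tracer is no longer marked active.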




[PATCH v2 2/2] x86: Stop Intel PT before kdump starts

2015-11-03 Thread Takao Indoh
This patch stops Intel PT logging and saves its registers in memory
before kdump is started. This is needed to prevent Intel PT from
overwriting its log buffer after a panic, and the saved registers are
needed to find the last position where Intel PT wrote data. After the
crash dump is captured by kdump, the user can retrieve the log buffer from
the vmcore and use it to investigate kernel behavior.

Signed-off-by: Takao Indoh 
---
 arch/x86/kernel/crash.c |   11 +++
 1 files changed, 11 insertions(+), 0 deletions(-)

diff --git a/arch/x86/kernel/crash.c b/arch/x86/kernel/crash.c
index 74ca2fe..5f383d2 100644
--- a/arch/x86/kernel/crash.c
+++ b/arch/x86/kernel/crash.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 /* Alignment required for elf header segment */
 #define ELF_CORE_HEADER_ALIGN   4096
@@ -127,6 +128,11 @@ static void kdump_nmi_callback(int cpu, struct pt_regs *regs)
 	cpu_emergency_vmxoff();
 	cpu_emergency_svm_disable();
 
+	/*
+	 * Disable Intel PT to stop its logging
+	 */
+	cpu_emergency_stop_pt();
+
 	disable_local_APIC();
 }
 
@@ -172,6 +178,11 @@ void native_machine_crash_shutdown(struct pt_regs *regs)
 	cpu_emergency_vmxoff();
 	cpu_emergency_svm_disable();
 
+	/*
+	 * Disable Intel PT to stop its logging
+	 */
+	cpu_emergency_stop_pt();
+
 #ifdef CONFIG_X86_IO_APIC
 	/* Prevent crash_kexec() from deadlocking on ioapic_lock. */
 	ioapic_zap_locks();
-- 
1.7.1




Re: [PATCH V3 2/3] perf/powerpc :add support for sampling intr machine state

2015-11-03 Thread Anju T

Hi Michael,
On Tuesday 03 November 2015 02:46 PM, Michael Ellerman wrote:

On Tue, 2015-11-03 at 11:40 +0530, Anju T wrote:


The perf infrastructure uses a bit mask to find out
valid registers to display. Define a register mask
for supported registers defined in asm/perf_regs.h.
The bit positions also correspond to register IDs
which are used by perf infrastructure to fetch the register
values. CONFIG_HAVE_PERF_REGS enables
sampling of the interrupted machine state.
diff --git a/arch/powerpc/perf/perf_regs.c b/arch/powerpc/perf/perf_regs.c
new file mode 100644
index 000..0520492
--- /dev/null
+++ b/arch/powerpc/perf/perf_regs.c
@@ -0,0 +1,92 @@
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define PT_REGS_OFFSET(id, r) [id] = offsetof(struct pt_regs, r)
+
+#define REG_RESERVED (~((1ULL << PERF_REG_POWERPC_MAX) - 1))
+
+static unsigned int pt_regs_offset[PERF_REG_POWERPC_MAX] = {
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR0, gpr[0]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR1, gpr[1]),
+   PT_REGS_OFFSET(PERF_REG_POWERPC_GPR2, gpr[2]),



I realise you're following the example of other architectures, but we have
almost this exact same structure in ptrace.c, see regoffset_table.

It would be really nice if we could share them between ptrace and perf.

cheers




Thank you for reviewing the patch.

That is a great suggestion.

In ptrace.c the structure doesn't include ORIG_R3. So, in that case, what
should we do?




Thanks and Regards

Anju
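
For readers following the thread, the register-mask idea from the quoted
patch can be sketched in plain userspace C. This is only an illustration
under assumptions: the demo_* register IDs, struct demo_regs and both helper
functions are hypothetical and much smaller than the real powerpc tables.
It shows an offset table indexed by register ID, a reserved-bits mask built
the same way as REG_RESERVED, and a lookup that uses the ID as both bit
position and table index.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical register IDs; bit N of a sample mask selects ID N. */
enum demo_reg {
    DEMO_REG_GPR0,
    DEMO_REG_GPR1,
    DEMO_REG_GPR2,
    DEMO_REG_NIP,
    DEMO_REG_MAX
};

/* Hypothetical saved register frame (stand-in for struct pt_regs). */
struct demo_regs {
    unsigned long gpr[3];
    unsigned long nip;
};

#define DEMO_REGS_OFFSET(id, r) [id] = offsetof(struct demo_regs, r)

/* Every bit at or above DEMO_REG_MAX is reserved and must be clear. */
#define DEMO_REG_RESERVED (~((1ULL << DEMO_REG_MAX) - 1))

static const size_t demo_regs_offset[DEMO_REG_MAX] = {
    DEMO_REGS_OFFSET(DEMO_REG_GPR0, gpr[0]),
    DEMO_REGS_OFFSET(DEMO_REG_GPR1, gpr[1]),
    DEMO_REGS_OFFSET(DEMO_REG_GPR2, gpr[2]),
    DEMO_REGS_OFFSET(DEMO_REG_NIP, nip),
};

/* Reject empty masks and masks that select unsupported registers. */
int demo_validate_mask(uint64_t mask)
{
    if (!mask || (mask & DEMO_REG_RESERVED))
        return -1;
    return 0;
}

/* Fetch one register by ID through the offset table. */
unsigned long demo_reg_value(const struct demo_regs *regs, int idx)
{
    if (idx < 0 || idx >= DEMO_REG_MAX)
        return 0;
    return *(const unsigned long *)((const char *)regs + demo_regs_offset[idx]);
}
```

A shared table, as suggested in the review, would mean that ptrace and perf
both index one array of this shape instead of keeping two copies.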




perf related lockdep bug

2015-11-03 Thread Dave Jones
This is new to me since today's merge window.

Dave

[  293.955261] ==
[  293.955395] [ INFO: possible circular locking dependency detected ]
[  293.955536] 4.3.0-think+ #3 Not tainted
[  293.955618] ---
[  293.955756] trinity-c343/12586 is trying to acquire lock:
[  293.955875]  (rcu_node_0){-.-...}, at: [] 
rcu_report_exp_cpu_mult+0x24/0x70
[  293.956070] 
   but task is already holding lock:
[  293.956194]  (>lock){-.}, at: [] 
perf_lock_task_context+0x53/0x300
[  293.956386] 
   which lock already depends on the new lock.

[  293.956559] 
   the existing dependency chain (in reverse order) is:
[  293.956720] 
   -> #3 (>lock){-.}:
[  293.956807][] lock_acquire+0xc3/0x1d0
[  293.956940][] _raw_spin_lock+0x41/0x80
[  293.957075][] 
__perf_event_task_sched_out+0x276/0x700
[  293.957239][] __schedule+0x530/0xaf0
[  293.957370][] schedule+0x3f/0x90
[  293.957493][] exit_to_usermode_loop+0x76/0xa0
[  293.957642][] prepare_exit_to_usermode+0x53/0x60
[  293.957795][] retint_user+0x8/0x20
[  293.957923] 
   -> #2 (>lock){-.-.-.}:
[  293.958008][] lock_acquire+0xc3/0x1d0
[  293.958140][] _raw_spin_lock+0x41/0x80
[  293.958275][] sched_move_task+0x4d/0x270
[  293.958417][] cpu_cgroup_fork+0xe/0x10
[  293.958551][] cgroup_post_fork+0x67/0x160
[  293.958692][] copy_process+0x147a/0x1e10
[  293.958831][] _do_fork+0x97/0x680
[  293.958955][] kernel_thread+0x29/0x30
[  293.959087][] rest_init+0x27/0x140
[  293.959211][] start_kernel+0x450/0x45d
[  293.959347][] x86_64_start_reservations+0x2a/0x2c
[  293.959503][] x86_64_start_kernel+0x168/0x176
[  293.959652] 
   -> #1 (>pi_lock){-.-.-.}:
[  293.959742][] lock_acquire+0xc3/0x1d0
[  293.959876][] _raw_spin_lock_irqsave+0x4c/0x90
[  293.960026][] try_to_wake_up+0x30/0x5f0
[  293.960165][] wake_up_process+0x23/0x40
[  293.960300][] rcu_spawn_gp_kthread+0xd5/0xfd
[  293.960449][] do_one_initcall+0xb3/0x1f0
[  293.960587][] kernel_init_freeable+0x101/0x234
[  293.960736][] kernel_init+0xe/0xe0
[  293.960860][] ret_from_fork+0x3f/0x70
[  293.960993] 
   -> #0 (rcu_node_0){-.-...}:
[  293.961081][] __lock_acquire+0x19d2/0x1cf0
[  293.961222][] lock_acquire+0xc3/0x1d0
[  293.961351][] _raw_spin_lock_irqsave+0x4c/0x90
[  293.961501][] rcu_report_exp_cpu_mult+0x24/0x70
[  293.961652][] rcu_read_unlock_special+0x109/0x5e0
[  293.961808][] __rcu_read_unlock+0x87/0x90
[  293.961947][] perf_lock_task_context+0x19d/0x300
[  293.962101][] perf_event_init_task+0x9f/0x370
[  293.962248][] copy_process+0x7fa/0x1e10
[  293.962385][] _do_fork+0x97/0x680
[  293.962508][] SyS_clone+0x19/0x20
[  293.962631][] entry_SYSCALL_64_fastpath+0x12/0x6f
[  293.962787] 
   other info that might help us debug this:

[  293.962955] Chain exists of:
 rcu_node_0 --> >lock --> >lock

[  293.963106]  Possible unsafe locking scenario:

[  293.963233]CPU0CPU1
[  293.963332]
[  293.963430]   lock(>lock);
[  293.963500]lock(>lock);
[  293.963627]lock(>lock);
[  293.963755]   lock(rcu_node_0);
[  293.963824] 
*** DEADLOCK ***

[  293.963947] 2 locks held by trinity-c343/12586:
[  293.964047]  #0:  (rcu_read_lock){..}, at: [] 
perf_lock_task_context+0xd0/0x300
[  293.964258]  #1:  (>lock){-.}, at: [] 
perf_lock_task_context+0x53/0x300
[  293.972934] 
   stack backtrace:
[  293.990221] CPU: 3 PID: 12586 Comm: trinity-c343 Not tainted 4.3.0-think+ #3
[  293.999210]  8804fb6824a0 88044a0efa60 b03607bc 
b1779630
[  294.008115]  88044a0efaa0 b00d1453 88044a0efb10 
0002
[  294.017066]  8804fb682468 8804fb6824a0 0001 
8804fb681b80
[  294.026109] Call Trace:
[  294.035079]  [] dump_stack+0x4e/0x82
[  294.044014]  [] print_circular_bug+0x1e3/0x250
[  294.052982]  [] __lock_acquire+0x19d2/0x1cf0
[  294.061926]  [] ? get_lock_stats+0x19/0x60
[  294.070878]  [] lock_acquire+0xc3/0x1d0
[  294.079828]  [] ? rcu_report_exp_cpu_mult+0x24/0x70
[  294.088805]  [] _raw_spin_lock_irqsave+0x4c/0x90
[  294.097771]  [] ? rcu_report_exp_cpu_mult+0x24/0x70
[  294.106807]  [] rcu_report_exp_cpu_mult+0x24/0x70
[  294.115823]  [] rcu_read_unlock_special+0x109/0x5e0
[  294.124813]  [] ? get_lock_stats+0x19/0x60
[  294.133801]  [] ? perf_lock_task_context+0x53/0x300
[  294.142790]  [] __rcu_read_unlock+0x87/0x90
[  294.151675]  [] 

Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-04 06:58, Eric Dumazet wrote:

On Tue, 2015-11-03 at 20:46 -0800, Eric Dumazet wrote:

On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:
> On 2015-11-04 00:06, Cong Wang wrote:
> > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
> >  wrote:
> >> Hi!
> >>
> >> Actually seems i was getting this panic for a while (once per week) on
> >> loaded pppoe server, but just now was able to get full panic message.
> >> After checking commit logs on sch_fq.c i didnt seen any fixes, so
> >> probably
> >> upgrading to newer kernel wont help?
> >
> >
> > Can you share your `tc qdisc show dev ` with us? And how to
> > reproduce
> > it? I tried to setup htb+fq and then flip the interface back and forth
> > but I don't
> > see any crash.
> My guess it wont be easy to reproduce, it is happening on box with 4.5k
> interfaces, that constantly create/delete interfaces,
> and even with that this problem may happen once per day, or may not
> happen for 1 week.
>
> Here is script that is being fired after new ppp interface detected. But
> pppoe process are independent from
> process that are "establishing" shapers.


It is probably a generic bug. sch_fq seems OK to me.

Somehow nobody tries to change qdisc hundred times per second ;)

Could you try following patch ?

It seems to 'fix' the issue for me.


Following patch would be more appropriate.
Prior one was meant to 'show' the issue.

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cb5d4ad32946..7f5f3e8a10f5 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -706,9 +706,11 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 	spin_lock_bh(root_lock);
 
 	/* Prune old scheduler */
-	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-		qdisc_reset(oqdisc);
-
+	if (oqdisc) {
+		if (atomic_read(&oqdisc->refcnt) <= 1)
+			qdisc_reset(oqdisc);
+		set_bit(__QDISC_STATE_DEACTIVATED, &oqdisc->state);
+	}
 	/* ... and graft new one */
 	if (qdisc == NULL)
 		qdisc = &noop_qdisc;


Applied, will test it, but this bug might be triggered rarely.
I will try to push it to more pppoe servers in order to stress test them 
(and 4.3) more.



RE: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"

2015-11-03 Thread Duan Andy
From: Robin Gong  Sent: Wednesday, November 04, 2015 10:58 AM
> To: gre...@linuxfoundation.org; edubez...@gmail.com
> Cc: Duan Fugang-B38611; linux-ser...@vger.kernel.org; linux-
> ker...@vger.kernel.org
> Subject: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"
> 
> commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare").
> Otherwise below warning happen since there are some printk logs in
> interrupt.
> 
> [   14.868319] udevd[501]: starting version 182
> [   16.386107] random: nonblocking pool is initialized
> [   16.386123] [ cut here ]
> [   16.386140] WARNING: CPU: 0 PID: 501 at kernel/locking/mutex.c:868
> mutex_trylock+0x210/0x230()
> [   16.386146] DEBUG_LOCKS_WARN_ON(in_interrupt())
> [   16.386149] Modules linked in:
> [   16.386157] CPU: 0 PID: 501 Comm: udevd Not tainted 4.3.0-rc1-00014-
> gf843df8 #28
> [   16.386160] Hardware name: Freescale i.MX6 SoloX (Device Tree)
> [   16.386165] Backtrace:
> [   16.386182] [] (dump_backtrace) from []
> (show_stack+0x18/0x1c)
> [   16.386192]  r6:c0af1840 r5: r4: r3:
> [   16.386201] [] (show_stack) from []
> (dump_stack+0x8c/0xa4)
> [   16.386216] [] (dump_stack) from []
> (warn_slowpath_common+0x80/0xbc)
> [   16.386225]  r6:c07b1ccc r5:0009 r4:edcf5c60 r3:0001
> [   16.386234] [] (warn_slowpath_common) from []
> (warn_slowpath_fmt+0x38/0x40)
> [   16.386246]  r8:c0564694 r7:eeb76880 r6:c13323ec r5:0001
> r4:c0976990
> [   16.386255] [] (warn_slowpath_fmt) from []
> (mutex_trylock+0x210/0x230)
> [   16.386261]  r3:c0978f50 r2:c0976990
> [   16.386264]  r4:c0b29a70
> [   16.386278] [] (mutex_trylock) from []
> (clk_prepare_lock+0x14/0xf4)
> [   16.386289]  r8:c133000c r7:eeb76880 r6:0037 r5:c12efbc8
> r4:eeb76880
> [   16.386297] [] (clk_prepare_lock) from []
> (clk_prepare+0x18/0x38)
> [   16.386303]  r5:c12efbc8 r4:eeb76880
> [   16.386311] [] (clk_prepare) from []
> (imx_console_write+0x34/0x248)
> [   16.386317]  r4:ee999c10 r3:c134a92c
> [   16.386329] [] (imx_console_write) from []
> (call_console_drivers.constprop.25+0xe0/0x104)
> [   16.386342]  r10:c07b938c r9: r8:c133000c r7:0037
> r6:edcf4000 r5:c12ef6c0
> [   16.386345]  r4:c0afef38
> [   16.386355] [] (call_console_drivers.constprop.25) from
> [] (console_unlock+0x3fc/0x57c)
> [   16.386367]  r10:c132fff0 r9:0100 r8:0037 r7:
> r6:0005 r5:c12f4df0
> 
> Signed-off-by: Robin Gong 
> ---
>  drivers/tty/serial/imx.c | 20 ++--
>  1 file changed, 14 insertions(+), 6 deletions(-)

I remember there was already a patch for this.


Re: [PATCH] crypto: qce: dma_map_sg can handle chained SG

2015-11-03 Thread Herbert Xu
On Tue, Nov 03, 2015 at 01:36:55PM +0100, LABBE Corentin wrote:
>
> Herbert, do you prefer to drop the patch series and I send an updated one, or 
> does I will send a new patch series for checking the return value of 
> sg_nents_for_len() ?

Your patch has already been applied so please send any fixes on
top of it.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt


Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Eric Dumazet
On Tue, 2015-11-03 at 20:46 -0800, Eric Dumazet wrote:
> On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:
> > On 2015-11-04 00:06, Cong Wang wrote:
> > > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
> > >  wrote:
> > >> Hi!
> > >> 
> > >> Actually seems i was getting this panic for a while (once per week) on
> > >> loaded pppoe server, but just now was able to get full panic message.
> > >> After checking commit logs on sch_fq.c i didnt seen any fixes, so 
> > >> probably
> > >> upgrading to newer kernel wont help?
> > > 
> > > 
> > > Can you share your `tc qdisc show dev ` with us? And how to 
> > > reproduce
> > > it? I tried to setup htb+fq and then flip the interface back and forth
> > > but I don't
> > > see any crash.
> > My guess it wont be easy to reproduce, it is happening on box with 4.5k 
> > interfaces, that constantly create/delete interfaces,
> > and even with that this problem may happen once per day, or may not 
> > happen for 1 week.
> > 
> > Here is script that is being fired after new ppp interface detected. But 
> > pppoe process are independent from
> > process that are "establishing" shapers.
> 
> 
> It is probably a generic bug. sch_fq seems OK to me.
> 
> Somehow nobody tries to change qdisc hundred times per second ;)
> 
> Could you try following patch ?
> 
> It seems to 'fix' the issue for me.

Following patch would be more appropriate.
Prior one was meant to 'show' the issue.

diff --git a/net/sched/sch_generic.c b/net/sched/sch_generic.c
index cb5d4ad32946..7f5f3e8a10f5 100644
--- a/net/sched/sch_generic.c
+++ b/net/sched/sch_generic.c
@@ -706,9 +706,11 @@ struct Qdisc *dev_graft_qdisc(struct netdev_queue *dev_queue,
 	spin_lock_bh(root_lock);
 
 	/* Prune old scheduler */
-	if (oqdisc && atomic_read(&oqdisc->refcnt) <= 1)
-		qdisc_reset(oqdisc);
-
+	if (oqdisc) {
+		if (atomic_read(&oqdisc->refcnt) <= 1)
+			qdisc_reset(oqdisc);
+		set_bit(__QDISC_STATE_DEACTIVATED, &oqdisc->state);
+	}
 	/* ... and graft new one */
 	if (qdisc == NULL)
 		qdisc = &noop_qdisc;
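
The effect of the change above can be sketched in userspace C. This is a
simplified, single-threaded model under assumptions, not the networking
code: the demo_* names, the struct layout and the return codes are invented,
and the real code runs under the qdisc root lock. It shows the idea that a
replaced queue is always flagged as deactivated, even when it is still
referenced and therefore not reset, so that a later enqueue on the stale
queue is dropped.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>

enum { DEMO_XMIT_SUCCESS = 0, DEMO_XMIT_DROP = 1 };

#define DEMO_STATE_DEACTIVATED (1UL << 0)

/* Invented stand-in for a queueing discipline. */
struct demo_qdisc {
    atomic_ulong state;   /* flag bits, here only DEACTIVATED */
    atomic_int refcnt;    /* number of users still holding the queue */
    int qlen;             /* packets currently queued */
};

/* Enqueue path: refuse to queue on a deactivated queue. */
int demo_enqueue(struct demo_qdisc *q)
{
    if (atomic_load(&q->state) & DEMO_STATE_DEACTIVATED)
        return DEMO_XMIT_DROP;
    q->qlen++;
    return DEMO_XMIT_SUCCESS;
}

/* Replace the queue in *slot; returns the old queue (may be NULL). */
struct demo_qdisc *demo_graft(struct demo_qdisc **slot,
                              struct demo_qdisc *new_q)
{
    struct demo_qdisc *old = *slot;

    if (old) {
        /* Only flush the old queue if nobody else still uses it. */
        if (atomic_load(&old->refcnt) <= 1)
            old->qlen = 0;
        /* Always mark it dead so stale users stop enqueueing. */
        atomic_fetch_or(&old->state, DEMO_STATE_DEACTIVATED);
    }
    *slot = new_q;
    return old;
}
```

In this model the flag is the only thing that protects a still-referenced
old queue, which matches the motivation given in the thread.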




Re: [PATCH v2 1/5] spi: introduce mmap read support for spi flash devices

2015-11-03 Thread Vignesh R
Hi,

On 11/03/2015 04:49 PM, Michal Suchanek wrote:
> On 3 November 2015 at 11:06, Vignesh R  wrote:
>> In addition to providing direct access to SPI bus, some spi controller
>> hardwares (like ti-qspi) provide special memory mapped port
>> to accesses SPI flash devices in order to increase read performance.
>> This means the controller can automatically send the SPI signals
>> required to read data from the SPI flash device.
>> For this, spi controller needs to know flash specific information like
>> read command to use, dummy bytes and address width. Once these settings
>> are populated in hardware registers, any read accesses to flash's memory
>> map region(SoC specific) through memcpy (or mem-to mem DMA copy) will be
>> handled by controller hardware. The hardware will automatically generate
>> SPI signals required to read data from flash and present it to CPU/DMA.
>>
>> Introduce spi_mtd_mmap_read() interface to support memory mapped read
>> over SPI flash devices. SPI master drivers can implement this callback to
>> support memory mapped read interfaces. m25p80 flash driver and other
>> flash drivers can call this to request memory mapped read. The interface
>> should only be used MTD flashes and cannot be used with other SPI devices.
>>
>> Signed-off-by: Vignesh R 
>> ---
>>  drivers/spi/spi.c   | 35 +++
>>  include/linux/spi/spi.h | 23 +++
>>  2 files changed, 58 insertions(+)
>>
>> diff --git a/drivers/spi/spi.c b/drivers/spi/spi.c
>> index a5f53de813d3..5a5c7a7d47f2 100644
>> --- a/drivers/spi/spi.c
>> +++ b/drivers/spi/spi.c
>> @@ -1059,6 +1059,7 @@ static void __spi_pump_messages(struct spi_master 
>> *master, bool in_kthread)
>> }
>> }
>>
>> +   mutex_lock(>mmap_lock_mutex);
>> trace_spi_message_start(master->cur_msg);
>>
>> if (master->prepare_message) {
>> @@ -1068,6 +1069,7 @@ static void __spi_pump_messages(struct spi_master 
>> *master, bool in_kthread)
>> "failed to prepare message: %d\n", ret);
>> master->cur_msg->status = ret;
>> spi_finalize_current_message(master);
>> +   mutex_unlock(>mmap_lock_mutex);
>> return;
>> }
>> master->cur_msg_prepared = true;
>> @@ -1077,6 +1079,7 @@ static void __spi_pump_messages(struct spi_master 
>> *master, bool in_kthread)
>> if (ret) {
>> master->cur_msg->status = ret;
>> spi_finalize_current_message(master);
>> +   mutex_unlock(>mmap_lock_mutex);
>> return;
>> }
>>
>> @@ -1084,8 +1087,10 @@ static void __spi_pump_messages(struct spi_master 
>> *master, bool in_kthread)
>> if (ret) {
>> dev_err(>dev,
>> "failed to transfer one message from queue\n");
>> +   mutex_unlock(>mmap_lock_mutex);
>> return;
>> }
>> +   mutex_unlock(>mmap_lock_mutex);
>>  }
>>
>>  /**
>> @@ -1732,6 +1737,7 @@ int spi_register_master(struct spi_master *master)
>> spin_lock_init(>queue_lock);
>> spin_lock_init(>bus_lock_spinlock);
>> mutex_init(>bus_lock_mutex);
>> +   mutex_init(>mmap_lock_mutex);
>> master->bus_lock_flag = 0;
>> init_completion(>xfer_completion);
>> if (!master->max_dma_len)
>> @@ -2237,6 +2243,35 @@ int spi_async_locked(struct spi_device *spi, struct 
>> spi_message *message)
>>  EXPORT_SYMBOL_GPL(spi_async_locked);
>>
>>
>> +int spi_mtd_mmap_read(struct spi_device *spi, loff_t from, size_t len,
>> + size_t *retlen, u_char *buf, u8 read_opcode,
>> + u8 addr_width, u8 dummy_bytes)
>> +
>> +{
>> +   struct spi_master *master = spi->master;
>> +   int ret;
>> +
>> +   if (master->auto_runtime_pm) {
>> +   ret = pm_runtime_get_sync(master->dev.parent);
>> +   if (ret < 0) {
>> +   dev_err(>dev, "Failed to power device: %d\n",
>> +   ret);
>> +   goto err;
>> +   }
>> +   }
>> +   mutex_lock(>mmap_lock_mutex);
>> +   ret = master->spi_mtd_mmap_read(spi, from, len, retlen, buf,
>> +   read_opcode, addr_width,
>> +   dummy_bytes);
>> +   mutex_unlock(>mmap_lock_mutex);
>> +   if (master->auto_runtime_pm)
>> +   pm_runtime_put(master->dev.parent);
>> +
>> +err:
>> +   return ret;
>> +}
>> +EXPORT_SYMBOL_GPL(spi_mtd_mmap_read);
>> +
>>  
>> /*-*/
>>
>>  /* Utility methods for SPI master protocol drivers, layered on
>> diff --git a/include/linux/spi/spi.h b/include/linux/spi/spi.h
>> index 6b00f18f5e6b..0a6d8ad57357 100644
>> --- a/include/linux/spi/spi.h
>> +++ 

Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Eric Dumazet
On Wed, 2015-11-04 at 06:25 +0200, Denys Fedoryshchenko wrote:
> On 2015-11-04 00:06, Cong Wang wrote:
> > On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
> >  wrote:
> >> Hi!
> >> 
> >> Actually seems i was getting this panic for a while (once per week) on
> >> loaded pppoe server, but just now was able to get full panic message.
> >> After checking commit logs on sch_fq.c i didnt seen any fixes, so 
> >> probably
> >> upgrading to newer kernel wont help?
> > 
> > 
> > Can you share your `tc qdisc show dev ` with us? And how to 
> > reproduce
> > it? I tried to setup htb+fq and then flip the interface back and forth
> > but I don't
> > see any crash.
> My guess it wont be easy to reproduce, it is happening on box with 4.5k 
> interfaces, that constantly create/delete interfaces,
> and even with that this problem may happen once per day, or may not 
> happen for 1 week.
> 
> Here is script that is being fired after new ppp interface detected. But 
> pppoe process are independent from
> process that are "establishing" shapers.


It is probably a generic bug. sch_fq seems OK to me.

Somehow nobody tries to change qdisc hundred times per second ;)

Could you try following patch ?

It seems to 'fix' the issue for me.

diff --git a/net/core/dev.c b/net/core/dev.c
index 8ce3f74cd6b9..bf136103bc7b 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -2880,6 +2880,12 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 		spin_lock(&q->busylock);
 
 	spin_lock(root_lock);
+	if (unlikely(q != rcu_dereference_bh(txq->qdisc))) {
+		pr_err_ratelimited("Arg, qdisc changed ! state %lx\n", q->state);
+		kfree_skb(skb);
+		rc = NET_XMIT_DROP;
+		goto end;
+	}
 	if (unlikely(test_bit(__QDISC_STATE_DEACTIVATED, &q->state))) {
 		kfree_skb(skb);
 		rc = NET_XMIT_DROP;
@@ -2913,6 +2919,7 @@ static inline int __dev_xmit_skb(struct sk_buff *skb, struct Qdisc *q,
 				__qdisc_run(q);
 			}
 		}
+end:
 	spin_unlock(root_lock);
 	if (unlikely(contended))
 		spin_unlock(&q->busylock);




Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()

2015-11-03 Thread Linus Torvalds
On Tue, Nov 3, 2015 at 7:57 PM, Paul E. McKenney
 wrote:
>
> Thank you, and yes, it clearly states that read-to-write dependencies
> are ordered.

Well, I wouldn't say that it's exactly "clear".

The fact that they explicitly say "Note that the DP relation does not
directly impose a BEFORE (⇐) ordering between accesses u and v" makes
it all clear as mud.

They *then* go on to talk about how the DP relationship *together*
with the odd "is source of" ordering (which in turn is defined in
terms of BEFORE ordering) cannot have cycles.

I have no idea why they do it that way, but the reason seems to be
that they wanted to make "BEFORE" be purely about barriers and
accesses, and make the other orderings be described separately. So the
"BEFORE" ordering is used to define how memory must act, which is then
used as a basis for that storage definition and the "is source of"
thing.

But none of that seems to make much sense to a *user*.

The fact that they seem to equate "BEFORE" with "Processor Issue
Constraints" also makes me think that the whole logic was written by a
CPU designer, and part of why they document it that way is that the
CPU designer literally thought of "can I issue this access" as being
very different from "is there some inherent ordering that just results
from issues outside of my design".

I really don't know. That whole series of memory ordering rules makes
my head hurt.

But I do think the only thing that matters in the end is that they do
have that DP relationship between reads and subsequently dependent
writes, but  basically not for *any* other relationship.

So on alpha read-vs-read, write-vs-later-read, and write-vs-write all
have to have memory barriers, unless the accesses physically overlap.

 Linus
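
As a rough illustration of the last point, here is a userspace sketch in
C11 atomics rather than kernel barriers. It is an analogy under
assumptions, not the kernel's primitives: the demo_* names are invented,
and release/acquire stand in for the explicit write-vs-write and
read-vs-read barriers that the message says cannot be omitted.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Invented one-slot mailbox: a plain payload guarded by a ready flag. */
struct demo_mailbox {
    int payload;
    atomic_bool ready;
};

/*
 * Writer: the release store keeps the payload write ordered before
 * the flag write (the write-vs-write case).
 */
void demo_publish(struct demo_mailbox *m, int value)
{
    m->payload = value;
    atomic_store_explicit(&m->ready, true, memory_order_release);
}

/*
 * Reader: the acquire load keeps the flag read ordered before the
 * payload read (the read-vs-read case).
 */
bool demo_try_consume(struct demo_mailbox *m, int *out)
{
    if (!atomic_load_explicit(&m->ready, memory_order_acquire))
        return false;
    *out = m->payload;
    return true;
}
```

With both orderings relaxed, nothing in the C11 model would stop a reader
on another thread from seeing the flag set and still reading a stale
payload.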


Re: [PATCH v3] perf: fix RCU issues with cgroup monitoring mode

2015-11-03 Thread kbuild test robot
Hi Stephane,

[auto build test WARNING on: tip/perf/core]
[also build test WARNING on: v4.3 next-20151103]

url:
https://github.com/0day-ci/linux/commits/Stephane-Eranian/perf-fix-RCU-issues-with-cgroup-monitoring-mode/20151104-121512
config: i386-randconfig-i0-201544 (attached as .config)
reproduce:
# save the attached .config to linux build tree
make ARCH=i386 

All warnings (new ones prefixed by >>):

   In file included from include/linux/trace_events.h:9:0,
from include/trace/syscall.h:6,
from include/linux/syscalls.h:81,
from init/main.c:18:
   include/linux/perf_event.h: In function 'perf_cgroup_from_task':
>> include/linux/perf_event.h:702:7: warning: unused variable 'safe' 
>> [-Wunused-variable]
 bool safe = ctx ? lockdep_is_held(&ctx->lock) : true;
  ^

vim +/safe +702 include/linux/perf_event.h

   686  u64 timestamp;
   687  };
   688  
   689  struct perf_cgroup {
   690  struct cgroup_subsys_state  css;
   691  struct perf_cgroup_info __percpu *info;
   692  };
   693  
   694  /*
   695   * Must ensure cgroup is pinned (css_get) before calling
   696   * this function. In other words, we cannot call this function
   697   * if there is no cgroup event for the current CPU context.
   698   */
   699  static inline struct perf_cgroup *
   700  perf_cgroup_from_task(struct task_struct *task, struct 
perf_event_context *ctx)
   701  {
 > 702  bool safe = ctx ? lockdep_is_held(&ctx->lock) : true;
   703  return container_of(task_css_check(task, perf_event_cgrp_id, 
safe),
   704  struct perf_cgroup, css);
   705  }
   706  #endif /* CONFIG_CGROUP_PERF */
   707  
   708  #ifdef CONFIG_PERF_EVENTS
   709  
   710  extern void *perf_aux_output_begin(struct perf_output_handle *handle,

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation




Re: [PATCH v3] perf: fix RCU issues with cgroup monitoring mode

2015-11-03 Thread kbuild test robot
Hi Stephane,

[auto build test ERROR on: tip/perf/core]
[also build test ERROR on: v4.3 next-20151103]

url:
https://github.com/0day-ci/linux/commits/Stephane-Eranian/perf-fix-RCU-issues-with-cgroup-monitoring-mode/20151104-121512
config: parisc-allyesconfig (attached as .config)
reproduce:
wget 
https://git.kernel.org/cgit/linux/kernel/git/wfg/lkp-tests.git/plain/sbin/make.cross
 -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# save the attached .config to linux build tree
make.cross ARCH=parisc 

All errors (new ones prefixed by >>):

   In file included from include/linux/trace_events.h:9:0,
from include/trace/syscall.h:6,
from include/linux/syscalls.h:81,
from kernel/events/core.c:34:
   include/linux/perf_event.h: In function 'perf_cgroup_from_task':
>> include/linux/perf_event.h:702:2: error: implicit declaration of function 
>> 'lockdep_is_held' [-Werror=implicit-function-declaration]
 bool safe = ctx ? lockdep_is_held(&ctx->lock) : true;
 ^
   include/linux/perf_event.h:702:7: warning: unused variable 'safe' 
[-Wunused-variable]
 bool safe = ctx ? lockdep_is_held(&ctx->lock) : true;
  ^
   cc1: some warnings being treated as errors

vim +/lockdep_is_held +702 include/linux/perf_event.h

   696   * this function. In other words, we cannot call this function
   697   * if there is no cgroup event for the current CPU context.
   698   */
   699  static inline struct perf_cgroup *
   700  perf_cgroup_from_task(struct task_struct *task, struct 
perf_event_context *ctx)
   701  {
 > 702  bool safe = ctx ? lockdep_is_held(&ctx->lock) : true;
   703  return container_of(task_css_check(task, perf_event_cgrp_id, 
safe),
   704  struct perf_cgroup, css);
   705  }

---
0-DAY kernel test infrastructureOpen Source Technology Center
https://lists.01.org/pipermail/kbuild-all   Intel Corporation
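
Both diagnostics reported for this patch come from one pattern: a debug
helper that is not declared when its config option is off, and a local
variable whose only consumer is a macro that may ignore its argument. A
minimal userspace sketch of one way to avoid both follows. All names in it
(DEBUG_LOCKS, lock_is_held(), checked_deref(), struct lock_ctx) are
hypothetical stand-ins, not kernel APIs, and this is not necessarily the fix
that was applied to the patch.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a context that owns a lock. */
struct lock_ctx {
	int lock;	/* nonzero means "held" in this sketch */
};

#ifdef DEBUG_LOCKS
/* Debug build: really inspect the lock. */
static inline bool lock_is_held(const int *lock)
{
	return *lock != 0;
}
/* Debug build: refuse the access when the condition does not hold. */
#define checked_deref(p, cond)	((cond) ? (p) : NULL)
#else
/* Non-debug build: the helper still exists, so every caller compiles. */
#define lock_is_held(lock)	((void)(lock), true)
/* Non-debug build: the condition is evaluated, then ignored. */
#define checked_deref(p, cond)	((void)(cond), (p))
#endif

/*
 * The condition is passed straight to the macro instead of being stored
 * in a local variable, so nothing is left unused in either build.
 */
static inline const int *value_of(const int *p, const struct lock_ctx *ctx)
{
	return checked_deref(p, ctx ? lock_is_held(&ctx->lock) : true);
}
```

Building the sketch with and without -DDEBUG_LOCKS exercises both branches;
in each case value_of(&v, NULL) returns &v.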




Re: kernel panic in 4.2.3, rb_erase in sch_fq

2015-11-03 Thread Denys Fedoryshchenko

On 2015-11-04 00:06, Cong Wang wrote:
> On Mon, Nov 2, 2015 at 6:11 AM, Denys Fedoryshchenko
>  wrote:
>> Hi!
>>
>> It seems I have actually been getting this panic for a while (about once
>> per week) on a loaded pppoe server, but only now was I able to get the
>> full panic message.
>> After checking the commit logs for sch_fq.c I didn't see any fixes, so
>> probably upgrading to a newer kernel won't help?
>
> Can you share your `tc qdisc show dev ` with us? And how to reproduce
> it? I tried to setup htb+fq and then flip the interface back and forth
> but I don't see any crash.

My guess is that it won't be easy to reproduce. It happens on a box with
4.5k interfaces that constantly creates and deletes interfaces, and even
so the problem may happen once per day, or may not happen for a week.

Here is the script that is fired after a new ppp interface is detected.
Note that the pppoe processes are independent of the processes that set
up the shapers.

/sbin/tc qdisc del  root
/sbin/tc qdisc add  handle 1: root htb default 3

/sbin/tc filter add parent 1:0 protocol ip prio 4 handle 1 fw flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 3 u32 match ip protocol 6 0xff match ip src 10.0.252.8/32 flowid 1:3
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 1 0xff flowid 1:0
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 80 0x flowid 1:4
/sbin/tc filter add parent 1:0 protocol ip prio 5 u32 match ip protocol 6 0xff match ip sport 443 0x flowid 1:5
/sbin/tc filter add parent 1:0 protocol ip prio 100 u32 match u32 0 0 flowid 1:2

/sbin/tc class add  classid 1:1 parent 1:0 htb rate 512Kbit ceil 512Kbit.

/sbin/tc class add  classid 1:2 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add  classid 1:3 parent 1:0 htb rate 10Mbit ceil 10Mbit
/sbin/tc class add  classid 1:4 parent 1:1 htb rate 32Kbit ceil 512Kbit
/sbin/tc class add  classid 1:5 parent 1:1 htb rate 32Kbit ceil 512Kbit

/sbin/tc qdisc add parent 1:2 fq limit 300
/sbin/tc qdisc add parent 1:3 pfifo limit 300
/sbin/tc qdisc add parent 1:4 fq limit 300
/sbin/tc qdisc add parent 1:5 fq limit 300

Possible cases that come to mind (I may have missed others):
 - The script and tc are still running when the interface is deleted
   (e.g. the interface disappears)
 - The script deletes the root qdisc while there is heavy traffic on the
   interface and a lot of packets are queued
 - The ppp interface is destroyed while a lot of traffic is queued on it
   (this one is a somewhat rare situation)




Thanks.



Re: [GIT PULL] locking changes for v4.4

2015-11-03 Thread Paul E. McKenney
On Tue, Nov 03, 2015 at 05:30:29PM -0800, Linus Torvalds wrote:
> On Tue, Nov 3, 2015 at 3:58 PM, Linus Torvalds
>  wrote:
> >
> > I think I'll pull this, but then just make a separate commit to remove
> > all the bogus games with "control" dependencies that seem to have no
> > basis in reality.
> 
> So the attached is what I committed in my tree. It took much longer to
> try to write the rationale than it took to actually remove the
> atomic_read_ctrl() functions, and even so I'm not sure how good that
> commit message is. But at least it tries to explain what's going on.
> 
> Note the final part of the rationale:
> 
> I may have to eat my words at some point, but in the absence of clear
> proof that alpha actually needs this, or indeed even an explanation of
> how alpha could _possibly_ need it, I do not believe these functions are
> called for.
> 
> And if it turns out that alpha really _does_ need a barrier for this
> case, that barrier still should not be "smp_read_barrier_depends()".
> We'd have to make up some new speciality barrier just for alpha, along
> with the documentation for why it really is necessary.

For whatever it is worth, the patch looks good to me.  The reasons I
could imagine why we might want to mark control dependencies are things
like documentation and tooling, but given that we currently only have a
very small number of them, it is hard to argue that this is of immediate
concern, if it is ever of concern.

Thanx, Paul
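
For readers following the thread: a control dependency, as the term is used
here, is a load whose value decides whether a later store is executed. The
sketch below only shows the shape of such code. READ_ONCE_INT() and
WRITE_ONCE_INT() are hypothetical macros that imitate the kernel's
READ_ONCE()/WRITE_ONCE() through volatile accesses, struct mailbox is made
up for illustration, and nothing here is a statement about what a given CPU
or compiler guarantees.

```c
/* Hypothetical volatile accessors, loosely modeled on READ_ONCE/WRITE_ONCE. */
#define READ_ONCE_INT(x)	(*(const volatile int *)&(x))
#define WRITE_ONCE_INT(x, v)	(*(volatile int *)&(x) = (v))

struct mailbox {
	int ready;	/* written by some other party */
	int ack;	/* written here, only after ready was observed */
};

/*
 * The store to ack is executed only if the load of ready returned nonzero:
 * a load-to-store control dependency. Returns 1 if the store happened.
 */
static int ack_if_ready(struct mailbox *m)
{
	if (READ_ONCE_INT(m->ready)) {
		WRITE_ONCE_INT(m->ack, 1);
		return 1;
	}
	return 0;
}
```

The removed atomic_read_ctrl() was a way of marking the load in this kind of
sequence; the sketch simply leaves the load unmarked.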



[PATCH v3] perf: fix RCU issues with cgroup monitoring mode

2015-11-03 Thread Stephane Eranian

This patch eliminates RCU violations detected by the RCU
checker (PROVE_RCU). The impacted code paths were all related
to cgroup mode monitoring and involved accessing a task's cgrp.

V2 is updated to include comments from PeterZ to eliminate
some of the warnings without grabbing the rcu_read lock, because
we know we are already holding the ctx->lock, which prevents
the cgroup from disappearing while we are accessing it.
The trick, as suggested by Peter, is to modify
perf_cgroup_from_task() to take an extra boolean parameter
to allow bypassing the lockdep test in the task_subsys_cstate()
macros. This patch uses this approach to update all calls to
perf_cgroup_from_task().

In V3, we replace the boolean parameter with a pointer to a
perf_event_context so we can check the ctx->lock explicitly.
This is more robust than passing a boolean to express that
we know the lock is held: the code can change, and with it the
locking assumption, so checking lockdep_is_held() ensures that
the proper locking is in place. Patch relative to tip.git
at commit 57ef9fc.

Signed-off-by: Stephane Eranian 
---
 arch/x86/kernel/cpu/perf_event_intel_cqm.c |  2 +-
 include/linux/perf_event.h |  5 +++--
 kernel/events/core.c   | 25 +++--
 3 files changed, 19 insertions(+), 13 deletions(-)

diff --git a/arch/x86/kernel/cpu/perf_event_intel_cqm.c 
b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
index 377e8f8..a316ca9 100644
--- a/arch/x86/kernel/cpu/perf_event_intel_cqm.c
+++ b/arch/x86/kernel/cpu/perf_event_intel_cqm.c
@@ -298,7 +298,7 @@ static bool __match_event(struct perf_event *a, struct 
perf_event *b)
 static inline struct perf_cgroup *event_to_cgroup(struct perf_event *event)
 {
if (event->attach_state & PERF_ATTACH_TASK)
-   return perf_cgroup_from_task(event->hw.target);
+   return perf_cgroup_from_task(event->hw.target, event->ctx);
 
return event->cgrp;
 }
diff --git a/include/linux/perf_event.h b/include/linux/perf_event.h
index d841d33..94107e4 100644
--- a/include/linux/perf_event.h
+++ b/include/linux/perf_event.h
@@ -697,9 +697,10 @@ struct perf_cgroup {
  * if there is no cgroup event for the current CPU context.
  */
 static inline struct perf_cgroup *
-perf_cgroup_from_task(struct task_struct *task)
+perf_cgroup_from_task(struct task_struct *task, struct perf_event_context *ctx)
 {
-   return container_of(task_css(task, perf_event_cgrp_id),
+   bool safe = ctx ? lockdep_is_held(&ctx->lock) : true;
+   return container_of(task_css_check(task, perf_event_cgrp_id, safe),
struct perf_cgroup, css);
 }
 #endif /* CONFIG_CGROUP_PERF */
diff --git a/kernel/events/core.c b/kernel/events/core.c
index ea02109..f611246 100644
--- a/kernel/events/core.c
+++ b/kernel/events/core.c
@@ -435,7 +435,7 @@ static inline void update_cgrp_time_from_event(struct 
perf_event *event)
if (!is_cgroup_event(event))
return;
 
-   cgrp = perf_cgroup_from_task(current);
+   cgrp = perf_cgroup_from_task(current, event->ctx);
/*
 * Do not update time when cgroup is not active
 */
@@ -458,7 +458,7 @@ perf_cgroup_set_timestamp(struct task_struct *task,
if (!task || !ctx->nr_cgroups)
return;
 
-   cgrp = perf_cgroup_from_task(task);
+   cgrp = perf_cgroup_from_task(task, ctx);
info = this_cpu_ptr(cgrp->info);
info->timestamp = ctx->timestamp;
 }
@@ -489,7 +489,6 @@ static void perf_cgroup_switch(struct task_struct *task, 
int mode)
 * we reschedule only in the presence of cgroup
 * constrained events.
 */
-   rcu_read_lock();
 
list_for_each_entry_rcu(pmu, &pmus, entry) {
cpuctx = this_cpu_ptr(pmu->pmu_cpu_context);
@@ -523,7 +522,7 @@ static void perf_cgroup_switch(struct task_struct *task, 
int mode)
 * event_filter_match() to not have to pass
 * task around
 */
-   cpuctx->cgrp = perf_cgroup_from_task(task);
+   cpuctx->cgrp = perf_cgroup_from_task(task, 
NULL);
cpu_ctx_sched_in(cpuctx, EVENT_ALL, task);
}
perf_pmu_enable(cpuctx->ctx.pmu);
@@ -531,8 +530,6 @@ static void perf_cgroup_switch(struct task_struct *task, 
int mode)
}
}
 
-   rcu_read_unlock();
-
local_irq_restore(flags);
 }
 
@@ -542,17 +539,18 @@ static inline void perf_cgroup_sched_out(struct 
task_struct *task,
struct perf_cgroup *cgrp1;
struct perf_cgroup *cgrp2 = NULL;
 
+   rcu_read_lock();
/*
 * we come here when we know perf_cgroup_events > 0
 */
-   cgrp1 = perf_cgroup_from_task(task);
+   cgrp1 = perf_cgroup_from_task(task, NULL);
 
/*
 * next is NULL when called from 
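
The difference between the V2 and V3 interfaces described in this patch can
be sketched in userspace as follows. This is only an illustration under
stated assumptions: struct fake_lock, struct event_ctx and struct cgroup_ref
are hypothetical types, and a plain flag stands in for lockdep_is_held().

```c
#include <stdbool.h>
#include <stddef.h>

struct fake_lock { bool held; };		/* stand-in for a spinlock */
struct event_ctx { struct fake_lock lock; };	/* stand-in for the perf context */
struct cgroup_ref { int id; };

/*
 * V2 style: the caller merely claims that the lock is held. If the claim
 * becomes stale after a refactoring, nothing notices.
 */
static const struct cgroup_ref *
cgroup_from_task_v2(const struct cgroup_ref *cg, bool lock_held)
{
	return lock_held ? cg : NULL;
}

/*
 * V3 style: the caller passes the context and the callee checks the lock
 * itself. A NULL context means the caller relies on some other protection
 * (for example an RCU read-side section), so no lock check is made.
 */
static const struct cgroup_ref *
cgroup_from_task_v3(const struct cgroup_ref *cg, const struct event_ctx *ctx)
{
	bool safe = ctx ? ctx->lock.held : true;

	return safe ? cg : NULL;
}
```

With the lock not held, cgroup_from_task_v2(&cg, true) still returns &cg,
while cgroup_from_task_v3(&cg, &ctx) returns NULL, which is the kind of
mismatch the V3 interface is meant to expose.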

[PATCH v2 net-next] net/core: ensure features get disabled on new lower devs

2015-11-03 Thread Jarod Wilson
With moving netdev_sync_lower_features() after the .ndo_set_features
calls, I neglected to verify that devices added *after* a flag had been
disabled on an upper device were properly added with that flag disabled as
well. This currently happens, because we exit __netdev_update_features()
when we see dev->features == features for the upper dev. We can retain the
optimization of leaving without calling .ndo_set_features with a bit of
tweaking and a goto here.

Fixes: fd867d51f889 ("net/core: generic support for disabling netdev features 
down stack")
CC: "David S. Miller" 
CC: Eric Dumazet 
CC: Jay Vosburgh 
CC: Veaceslav Falico 
CC: Andy Gospodarek 
CC: Jiri Pirko 
CC: Nikolay Aleksandrov 
CC: Michal Kubecek 
CC: Alexander Duyck 
CC: net...@vger.kernel.org
Reported-by: Nikolay Aleksandrov 
Signed-off-by: Jarod Wilson 
---
v2: Based on suggestions from Alex, and with not changing err to ret, this
patch actually becomes quite minimal and doesn't ugly up the code much.

 net/core/dev.c | 7 ---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/net/core/dev.c b/net/core/dev.c
index 8ce3f74..ab9b8d0 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -6402,7 +6402,7 @@ int __netdev_update_features(struct net_device *dev)
struct net_device *upper, *lower;
netdev_features_t features;
struct list_head *iter;
-   int err = 0;
+   int err = -1;
 
ASSERT_RTNL();
 
@@ -6419,7 +6419,7 @@ int __netdev_update_features(struct net_device *dev)
features = netdev_sync_upper_features(dev, upper, features);
 
if (dev->features == features)
-   return 0;
+   goto sync_lower;
 
netdev_dbg(dev, "Features changed: %pNF -> %pNF\n",
&dev->features, &features);
@@ -6434,6 +6434,7 @@ int __netdev_update_features(struct net_device *dev)
return -1;
}
 
+sync_lower:
/* some features must be disabled on lower devices when disabled
 * on an upper device (think: bonding master or bridge)
 */
@@ -6443,7 +6444,7 @@ int __netdev_update_features(struct net_device *dev)
if (!err)
dev->features = features;
 
-   return 1;
+   return err < 0 ? 0 : 1;
 }
 
 /**
-- 
1.8.3.1
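
The control flow this patch introduces (skip the "set features" step when
nothing changed, but still sync the lower devices) can be modeled in a few
lines. This is a simplified sketch, not the kernel code: struct dev is a
hypothetical type with at most one lower device, the "set features" step
always succeeds, and the sync is reduced to masking the lower device's bits.

```c
#include <stddef.h>

struct dev {
	unsigned int features;
	struct dev *lower;	/* at most one lower device in this sketch */
};

/* Simplification: clear on the lower device every bit that is off above. */
static void sync_lower_features(struct dev *upper, unsigned int features)
{
	if (upper->lower)
		upper->lower->features &= features;
}

/* Returns 1 if the device's own features changed, 0 otherwise. */
static int update_features(struct dev *dev, unsigned int wanted)
{
	int err = -1;	/* -1 means the "set features" step was skipped */

	if (dev->features == wanted)
		goto sync_lower;

	err = 0;	/* stands in for a successful .ndo_set_features call */

sync_lower:
	/* Runs on both paths, so newly added lower devices are covered. */
	sync_lower_features(dev, wanted);

	if (!err)
		dev->features = wanted;

	return err < 0 ? 0 : 1;
}
```

Calling update_features() with the features the upper device already has
returns 0 and still clears the extra bits on a lower device that was
attached later.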



[PATCH] mailbox: mailbox-test: avoid reading iomem twice

2015-11-03 Thread jaswinder . singh
From: Jassi Brar 

Don't pass mmio region as source to print_hex_dump() and then
again to memcpy_fromio(). Do it once and give print_hex_dump()
the buffer we just read the data in.

Signed-off-by: Jassi Brar 
---
 drivers/mailbox/mailbox-test.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/drivers/mailbox/mailbox-test.c b/drivers/mailbox/mailbox-test.c
index f82dc89..684ae17 100644
--- a/drivers/mailbox/mailbox-test.c
+++ b/drivers/mailbox/mailbox-test.c
@@ -221,11 +221,10 @@ static void mbox_test_receive_message(struct mbox_client 
*client, void *message)
 
spin_lock_irqsave(&tdev->lock, flags);
if (tdev->mmio) {
+   memcpy_fromio(tdev->rx_buffer, tdev->mmio, MBOX_MAX_MSG_LEN);
print_hex_dump(KERN_INFO, "Client: Received [MMIO]: ",
   DUMP_PREFIX_ADDRESS, MBOX_BYTES_PER_LINE, 1,
-  __io_virt(tdev->mmio), MBOX_MAX_MSG_LEN, true);
-   memcpy_fromio(tdev->rx_buffer, tdev->mmio, MBOX_MAX_MSG_LEN);
-
+  tdev->rx_buffer, MBOX_MAX_MSG_LEN, true);
} else if (message) {
print_hex_dump(KERN_INFO, "Client: Received [API]: ",
   DUMP_PREFIX_ADDRESS, MBOX_BYTES_PER_LINE, 1,
-- 
1.9.1
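
The idea of the patch (read the device region once, then print from the
copy) can be illustrated outside the kernel. The sketch below assumes a
plain byte array in place of real MMIO, and MSG_LEN, copy_fromio() and
hex_dump() are hypothetical stand-ins for MBOX_MAX_MSG_LEN, memcpy_fromio()
and print_hex_dump().

```c
#include <stddef.h>
#include <stdio.h>

#define MSG_LEN 8	/* hypothetical; stands in for MBOX_MAX_MSG_LEN */

/* Counts byte reads from the simulated device region. */
static unsigned int mmio_reads;

/* Copy n bytes out of "device" memory, one volatile read per byte. */
static void copy_fromio(unsigned char *dst,
			const volatile unsigned char *src, size_t n)
{
	size_t i;

	for (i = 0; i < n; i++) {
		dst[i] = src[i];
		mmio_reads++;
	}
}

/* Format n bytes of ordinary memory as "xx xx ... " into out. */
static void hex_dump(char *out, size_t outlen,
		     const unsigned char *buf, size_t n)
{
	size_t i, pos = 0;

	if (outlen)
		out[0] = '\0';
	/* Each byte needs three characters plus the terminating NUL. */
	for (i = 0; i < n && outlen - pos >= 4; i++)
		pos += (size_t)snprintf(out + pos, outlen - pos, "%02x ",
					(unsigned int)buf[i]);
}

/* Read the device once, then dump from the buffer that was just filled. */
static void receive_message(const volatile unsigned char *mmio,
			    unsigned char *rx, char *out, size_t outlen)
{
	copy_fromio(rx, mmio, MSG_LEN);
	hex_dump(out, outlen, rx, MSG_LEN);
}
```

After one receive_message() call the counter equals MSG_LEN, that is, the
device region was touched once per byte and never by the dump.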



Re: [PATCH 4/4] locking: Introduce smp_cond_acquire()

2015-11-03 Thread Paul E. McKenney
On Tue, Nov 03, 2015 at 11:40:24AM -0800, Linus Torvalds wrote:
> On Mon, Nov 2, 2015 at 5:57 PM, Paul E. McKenney
>  wrote:

[ . . . ]

> > I am in India and my Alpha Architecture Manual is in the USA.
> 
> I sent you a link to something that should work, and that has the section.

Thank you, and yes, it clearly states that read-to-write dependencies
are ordered.  Color me slow and stupid.

> > And they did post a clarification on the web:
> 
> So for alpha, you trust a random web posting by a unknown person that
> talks about some random problem in an application that we don't even
> know what it is.

In my defense, the Alpha architects pointed me at that web posting, but
yes, it appears to be pushing for overly conservative safety rather than
accuracy.  Please accept my apologies for my confusion.

Thanx, Paul



Re: [PATCH V4] usb: remove unnecessary CONFIG_PM dependency from USB_OTG

2015-11-03 Thread Felipe Balbi

Hi,

Peter Chen  writes:
> On Tue, Nov 03, 2015 at 07:56:55AM -0600, Felipe Balbi wrote:
>> 
>> Hi,
>> 
>> Nathan Sullivan  writes:
>> > The USB OTG support currently depends on power management
>> > (CONFIG_PM) being enabled, but does not actually need it enabled.
>> > Remove this dependency.
>> >
>> > Tested on Bay Trail hardware with dwc3 USB.
>> >
>> > Signed-off-by: Nathan Sullivan 
>> > ---
>> >  drivers/usb/core/Kconfig |1 -
>> >  1 file changed, 1 deletion(-)
>> >
>> > diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig
>> > index a99c89e..9c5cdf3 100644
>> > --- a/drivers/usb/core/Kconfig
>> > +++ b/drivers/usb/core/Kconfig
>> > @@ -43,7 +43,6 @@ config USB_DYNAMIC_MINORS
>> >  
>> >  config USB_OTG
>> >bool "OTG support"
>> > -  depends on PM
>> 
>> I don't think this is correct. OTG depends on USB bus suspend, which is
>> only available on PM builds. Care to further detail why you think PM is
>> not needed on OTG ?
>> 
>
> OTG depends on USB bus suspend is not a must, the hardware controlled OTG
> design do HNP when the bus goes to suspend; but if the software
> implements OTG FSM, it is the user option whether do HNP, and bus
> suspend is controlled by OTG FSM software (stop SOF), but not by host 
> stack (eg, ehci).
>
> I am sorry I did not consider the legacy OTG design, this patch should
> be dropped.

there is no "legacy" OTG design. OTG requires a bus suspend to enter
HNP, and that's achieved by stopping all transfers and avoiding new URB
submission so usbcore can put the bus in suspend (by means of USB
autosuspend). If you're bypassing that in the OTG FSM thing, that needs
to be fixed ASAP as that makes it a lot harder for any generic changes
in usbcore to be validated. Especially when you consider not many will
have whatever special HW which, likely, doesn't even work with mainline
to validate a change.

Please, make sure to fix that design so that HNP *always* goes through
the proper code path. If you have devices which would prevent HNP
because their class driver (host side driver) would never autosuspend,
fix that as well.

cheers

-- 
balbi




Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-03 Thread Andy Lutomirski
On Nov 3, 2015 5:30 PM, "Minchan Kim"  wrote:
>
> Linux doesn't have an ability to free pages lazy while other OS already
> have been supported that named by madvise(MADV_FREE).
>
> The gain is clear that kernel can discard freed pages rather than swapping
> out or OOM if memory pressure happens.
>
> Without memory pressure, freed pages would be reused by userspace without
> another additional overhead(ex, page fault + allocation + zeroing).
>

[...]

>
> How it works:
>
> When madvise syscall is called, VM clears dirty bit of ptes of the range.
> If memory pressure happens, VM checks dirty bit of page table and if it
> found still "clean", it means it's a "lazyfree pages" so VM could discard
> the page instead of swapping out.  Once there was store operation for the
> page before VM peek a page to reclaim, dirty bit is set so VM can swap out
> the page instead of discarding.

What happens if you MADV_FREE something that's MAP_SHARED or isn't
ordinary anonymous memory?  There's a long history of MADV_DONTNEED on
such mappings causing exploitable problems, and I think it would be
nice if MADV_FREE were obviously safe.

Does this set the write protect bit?

What happens on architectures without hardware dirty tracking?  For
that matter, even on architecture with hardware dirty tracking, what
happens in multithreaded processes that have the dirty TLB state
cached in a different CPU's TLB?

Using the dirty bit for these semantics scares me.  This API creates a
page that can have visible nonzero contents and then can
asynchronously and magically zero itself thereafter.  That makes me
nervous.  Could we use the accessed bit instead?  Then the observable
semantics would be equivalent to having MADV_FREE either zero the page
or do nothing, except that it doesn't make up its mind until the next
read.

> +   ptent = pte_mkold(ptent);
> +   ptent = pte_mkclean(ptent);
> +   set_pte_at(mm, addr, pte, ptent);
> +   tlb_remove_tlb_entry(tlb, pte, addr);

It looks like you are flushing the TLB.  In a multithreaded program,
that's rather expensive.  Potentially silly question: would it be
better to just zero the page immediately in a multithreaded program
and then, when swapping out, check the page is zeroed and, if so, skip
swapping it out?  That could be done without forcing an IPI.

> +static int madvise_free_single_vma(struct vm_area_struct *vma,
> +   unsigned long start_addr, unsigned long end_addr)
> +{
> +   unsigned long start, end;
> +   struct mm_struct *mm = vma->vm_mm;
> +   struct mmu_gather tlb;
> +
> +   if (vma->vm_flags & (VM_LOCKED|VM_HUGETLB|VM_PFNMAP))
> +   return -EINVAL;
> +
> +   /* MADV_FREE works for only anon vma at the moment */
> +   if (!vma_is_anonymous(vma))
> +   return -EINVAL;

Does anything weird happen if it's shared?

> +   if (!PageDirty(page) && (flags & TTU_FREE)) {
> +   /* It's a freeable page by MADV_FREE */
> +   dec_mm_counter(mm, MM_ANONPAGES);
> +   goto discard;
> +   }

Does something clear TTU_FREE the next time the page gets marked clean?

--Andy
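
The mechanism quoted under "How it works" in this message can be pictured
with a small state model. This is only a sketch of the description as
quoted: struct page_model and the model_*() functions are hypothetical, a
single flag stands in for the PTE dirty bit, and the model deliberately
ignores the TLB caching and multithreading questions raised in the reply.

```c
#include <stdbool.h>

enum reclaim_action { RECLAIM_DISCARD, RECLAIM_SWAP_OUT };

/* One flag models the dirty bit of the page table entry. */
struct page_model {
	bool dirty;
};

/* madvise(MADV_FREE): clear the dirty bit for the range. */
static void model_madvise_free(struct page_model *p)
{
	p->dirty = false;
}

/* A later store by userspace sets the dirty bit again. */
static void model_store(struct page_model *p)
{
	p->dirty = true;
}

/* Under memory pressure: clean means lazily freed, dirty means still in use. */
static enum reclaim_action model_reclaim(const struct page_model *p)
{
	return p->dirty ? RECLAIM_SWAP_OUT : RECLAIM_DISCARD;
}
```

A page that is written after model_madvise_free() is swapped out rather than
discarded, which is the behaviour the quoted description relies on.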


Re: [PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"

2015-11-03 Thread Fabio Estevam
On Wed, Nov 4, 2015 at 12:58 AM, Robin Gong  wrote:
> commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare").
> Otherwise below warning happen since there are some printk logs in
> interrupt.

This has already been reverted since v4.3-rc5.


RE: [PATCH] ARM: add v7 LPAE multi-platform defconfig

2015-11-03 Thread Huan Wang
> On Tue, Oct 27, 2015 at 10:35:07PM +0800, Alison Wang wrote:
> > v7 LPAE multi-platform defconfig is based on v7 multi-platform
> > defconfig and adds LPAE support.
> >
> > This defconfig is verified on LS1021A which enables GIANFAR, I2C,
> > WATCHDOG, AUDIO, EDMA and DSPI drivers, etc.
> >
> > Signed-off-by: Alison Wang 
> 
> I think this would be great if it also had:
> 
> CONFIG_ARCH_VIRT=y
> CONFIG_VIRTUALIZATION=y
> CONFIG_KVM=y
> 
> That would allow it to be used for KVM testing both as host and guest.
> 
> FWIW, I just tried booting this on my TC2, but I don't get any output
> from there.  Is it supposed to work on this platform?  If not, why not?
> 
[Alison Wang] One question, does TC2 support LPAE? If so, we could try to
find the reason and add it in this defconfig for armv7 LPAE support.


Best Regards,
Alison Wang


Re: [PATCH 2/2] arm64: Allow changing of attributes outside of modules

2015-11-03 Thread zhong jiang
On 2015/11/4 5:48, Laura Abbott wrote:
> 
> Currently, the set_memory_* functions that are implemented for arm64
> are restricted to module addresses only. This was mostly done
> because arm64 maps normal zone memory with larger page sizes to
> improve TLB performance. This has the side effect though of making it
> difficult to adjust attributes at the PAGE_SIZE granularity. There are
> an increasing number of use cases related to security where it is
> necessary to change the attributes of kernel memory. Add functionality
> to the page attribute changing code under a Kconfig to let systems
> designers decide if they want to make the trade off of security for TLB
> pressure.
> 
> Signed-off-by: Laura Abbott 
> ---
>  arch/arm64/Kconfig.debug | 11 +++
>  arch/arm64/mm/mm.h   |  3 ++
>  arch/arm64/mm/mmu.c  |  2 +-
>  arch/arm64/mm/pageattr.c | 74 
> 
>  4 files changed, 84 insertions(+), 6 deletions(-)
> 
> diff --git a/arch/arm64/Kconfig.debug b/arch/arm64/Kconfig.debug
> index d6285ef..abc6922 100644
> --- a/arch/arm64/Kconfig.debug
> +++ b/arch/arm64/Kconfig.debug
> @@ -89,6 +89,17 @@ config DEBUG_ALIGN_RODATA
>  
> If in doubt, say N
>  
> +config DEBUG_CHANGE_PAGEATTR
> + bool "Allow all kernel memory to have attributes changed"
> + help
> +   If this option is selected, APIs that change page attributes
> +   (RW <-> RO, X <-> NX) will be valid for all memory mapped in
> +   the kernel space. The trade off is that there may be increased
> +   TLB pressure from finer grained page mapping. Turn on this option
> +   if security is more important than performance
> +
> +   If in doubt, say N
> +
>  source "drivers/hwtracing/coresight/Kconfig"
>  
>  endmenu
> diff --git a/arch/arm64/mm/mm.h b/arch/arm64/mm/mm.h
> index ef47d99..7b0dcc4 100644
> --- a/arch/arm64/mm/mm.h
> +++ b/arch/arm64/mm/mm.h
> @@ -1,3 +1,6 @@
>  extern void __init bootmem_init(void);
>  
>  void fixup_init(void);
> +
> +void split_pud(pud_t *old_pud, pmd_t *pmd);
> +void split_pmd(pmd_t *pmd, pte_t *pte);
> diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
> index ff41efa..cefad2d 100644
> --- a/arch/arm64/mm/mmu.c
> +++ b/arch/arm64/mm/mmu.c
> @@ -72,7 +72,7 @@ static void __init *early_alloc(unsigned long sz)
>  /*
>   * remap a PMD into pages
>   */
> -static void split_pmd(pmd_t *pmd, pte_t *pte)
> +void split_pmd(pmd_t *pmd, pte_t *pte)
>  {
>   unsigned long pfn = pmd_pfn(*pmd);
>   unsigned long addr = pfn << PAGE_SHIFT;
> diff --git a/arch/arm64/mm/pageattr.c b/arch/arm64/mm/pageattr.c
> index e47ed1c..48a4ce9 100644
> --- a/arch/arm64/mm/pageattr.c
> +++ b/arch/arm64/mm/pageattr.c
> @@ -15,9 +15,12 @@
>  #include 
>  #include 
>  
> +#include 
>  #include 
>  #include 
>  
> +#include "mm.h"
> +
>  struct page_change_data {
>   pgprot_t set_mask;
>   pgprot_t clear_mask;
> @@ -36,6 +39,66 @@ static int change_page_range(pte_t *ptep, pgtable_t token, 
> unsigned long addr,
>   return 0;
>  }
>  
> +#ifdef CONFIG_DEBUG_CHANGE_PAGEATTR
> +static int check_address(unsigned long addr)
> +{
> + pgd_t *pgd = pgd_offset_k(addr);
> + pud_t *pud;
> + pmd_t *pmd;
> + pte_t *pte;
> + int ret = -EFAULT;
> +
> + if (pgd_none(*pgd))
> + goto out;
> +
> + pud = pud_offset(pgd, addr);
> + if (pud_none(*pud))
> + goto out;
> +
> + if (pud_sect(*pud)) {
> + pmd = pmd_alloc_one(&init_mm, addr);
> + if (!pmd) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + split_pud(pud, pmd);
> + pud_populate(&init_mm, pud, pmd);
> + }
> +
> + pmd = pmd_offset(pud, addr);
> + if (pmd_none(*pmd))
> + goto out;
> +
> + if (pmd_sect(*pmd)) {
> + pte = pte_alloc_one_kernel(&init_mm, addr);
> + if (!pte) {
> + ret = -ENOMEM;
> + goto out;
> + }
> + split_pmd(pmd, pte);
> + __pmd_populate(pmd, __pa(pte), PMD_TYPE_TABLE);
> + }
> +
> + pte = pte_offset_kernel(pmd, addr);
> + if (pte_none(*pte))
> + goto out;
> +
> + flush_tlb_all();
> + ret = 0;
> +
> +out:
> + return ret;
> +}
> +#else
> +static int check_address(unsigned long addr)
> +{
> + if (addr < MODULES_VADDR || addr >= MODULES_END)
> + return -EINVAL;
> +
> + return 0;
> +}
> +#endif
> +
>  static int change_memory_common(unsigned long addr, int numpages,
>   pgprot_t set_mask, pgprot_t clear_mask)
>  {
> @@ -45,17 +108,18 @@ static int change_memory_common(unsigned long addr, int 
> numpages,
>   int ret;
>   struct page_change_data data;
>  
> + if (addr < PAGE_OFFSET && !is_vmalloc_addr((void *)addr))
> + return -EINVAL;
> +
>   if (!IS_ALIGNED(addr, PAGE_SIZE)) {
>   start &= PAGE_MASK;
>  

Re: [RFC] iommu: arm-smmu: correct reference count

2015-11-03 Thread Peng Fan
Hi Will,

On Tue, Nov 03, 2015 at 01:17:34PM +, Will Deacon wrote:
>On Tue, Nov 03, 2015 at 08:59:17PM +0800, Peng Fan wrote:
>> iommu_group_alloc will initialize the reference count for group to 1.
>> iommu_group_add_device also increase the group reference count,
>> if nothing bad happens. And we need to add iommu_group_put to
>> decrease the reference count for group.
>> 
>> Signed-off-by: Peng Fan 
>> Cc: Will Deacon 
>> Cc: Joerg Roedel 
>> ---
>> 
>> Not sure whether my understanding is correct or not. I checked 
>> rockchip-iommu.c
>> exynos-iommu.c and fsl_pamu_domain.c, and they all have iommu_group_put after
>> iommu_group_add_device.
>
>Doesn't this pair up with the iommu_group_remove_device in
>arm_smmu_remove_device? Are you actually seeing an issue in practice?

In arm_smmu_add_platform_device:
iommu_group_alloc --> the group->device_kobj ref count is initialized to 1.
iommu_group_add_device --> the group->device_kobj ref count is incremented from 1 to 2.

In arm_smmu_remove_device:
iommu_group_remove_device --> decreases the group->device_kobj ref count by 1.
After arm_smmu_remove_device, the ref count of group->device_kobj is still not 0.

So I think we need to add iommu_group_put after iommu_group_add_device.
If I am wrong, please correct me.

This is based on code inspection only; I do not have a platform to test it.

Regards,
Peng.

>
>Will
>___
>iommu mailing list
>io...@lists.linux-foundation.org
>https://lists.linuxfoundation.org/mailman/listinfo/iommu
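
The counting argument in this message can be replayed with a toy model. The
sketch below encodes only the reference count changes as the message
describes them; struct group_model and its functions are hypothetical and do
not call the real iommu API.

```c
/* Toy model of the group reference count, following the message above. */
struct group_model {
	int refcount;
};

static void group_alloc(struct group_model *g)         { g->refcount = 1; }
static void group_add_device(struct group_model *g)    { g->refcount++; }
static void group_remove_device(struct group_model *g) { g->refcount--; }
static void group_put(struct group_model *g)           { g->refcount--; }

/*
 * Run alloc -> add_device -> (optional put) -> remove_device and return
 * the references that are still outstanding afterwards.
 */
static int refs_after_teardown(int put_after_add)
{
	struct group_model g;

	group_alloc(&g);
	group_add_device(&g);
	if (put_after_add)
		group_put(&g);
	group_remove_device(&g);

	return g.refcount;
}
```

In this model refs_after_teardown(0) leaves one reference behind, while
refs_after_teardown(1) ends at zero, which matches the reasoning for adding
iommu_group_put after iommu_group_add_device.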



Re: [PATCH V4] usb: remove unnecessary CONFIG_PM dependency from USB_OTG

2015-11-03 Thread Peter Chen
On Tue, Nov 03, 2015 at 07:56:55AM -0600, Felipe Balbi wrote:
> 
> Hi,
> 
> Nathan Sullivan  writes:
> > The USB OTG support currently depends on power management
> > (CONFIG_PM) being enabled, but does not actually need it enabled.
> > Remove this dependency.
> >
> > Tested on Bay Trail hardware with dwc3 USB.
> >
> > Signed-off-by: Nathan Sullivan 
> > ---
> >  drivers/usb/core/Kconfig |1 -
> >  1 file changed, 1 deletion(-)
> >
> > diff --git a/drivers/usb/core/Kconfig b/drivers/usb/core/Kconfig
> > index a99c89e..9c5cdf3 100644
> > --- a/drivers/usb/core/Kconfig
> > +++ b/drivers/usb/core/Kconfig
> > @@ -43,7 +43,6 @@ config USB_DYNAMIC_MINORS
> >  
> >  config USB_OTG
> > bool "OTG support"
> > -   depends on PM
> 
> I don't think this is correct. OTG depends on USB bus suspend, which is
> only available on PM builds. Care to further detail why you think PM is
> not needed on OTG ?
> 

OTG does not strictly depend on USB bus suspend. A hardware-controlled OTG
design does HNP when the bus goes into suspend; but if software
implements the OTG FSM, it is up to the user whether to do HNP, and bus
suspend is controlled by the OTG FSM software (stopping SOF), not by the
host stack (e.g. ehci).

I am sorry I did not consider the legacy OTG design; this patch should
be dropped.

-- 

Best Regards,
Peter Chen


Re: [PATCH v13 02/51] vfs: Add MAY_CREATE_FILE and MAY_CREATE_DIR permission flags

2015-11-03 Thread Andreas Gruenbacher
On Wed, Nov 4, 2015 at 3:33 AM, Andreas Dilger  wrote:
> On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher  wrote:
>> @@ -3667,7 +3674,7 @@ EXPORT_SYMBOL(dentry_unhash);
>>
>> int vfs_rmdir(struct inode *dir, struct dentry *dentry)
>> {
>> - int error = may_delete(dir, dentry, 1);
>> + int error = may_delete(dir, dentry, true, false);
>
> This is a prime example why passing "true" and "false" as function arguments
> is not very useful, and especially prone to bugs when there are two of them.
>
> That said, this is code originally from Al, so he may have a different
> opinion.

Have you checked how vfs_rename uses the is_dir and new_is_dir
variables? Using file modes there probably won't help readability. An
enum maybe?

Thanks,
Andreas
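
To make the enum suggestion concrete, here is one possible shape. It is a
sketch under assumptions: the two booleans are taken to mean "the victim is
expected to be a directory" and "the victim is being replaced", and
enum delete_flags and may_delete_model() are hypothetical names that do not
appear in the patch.

```c
#include <errno.h>

/* Named flags in place of two positional booleans. */
enum delete_flags {
	DELETE_DIR     = 1 << 0,	/* caller expects a directory */
	DELETE_REPLACE = 1 << 1,	/* victim is being replaced */
};

/*
 * Toy permission check: only the file type is compared with what the
 * caller expects. Returns 0 on success or a negative errno value.
 */
static int may_delete_model(int victim_is_dir, unsigned int flags)
{
	int want_dir = !!(flags & DELETE_DIR);

	if (want_dir && !victim_is_dir)
		return -ENOTDIR;
	if (!want_dir && victim_is_dir)
		return -EISDIR;
	return 0;
}
```

A call such as may_delete_model(1, DELETE_DIR) states its intent at the call
site, whereas a pair like (true, false) has to be looked up in the function
signature.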


[PATCH v1] Revert "serial: imx: remove unbalanced clk_prepare"

2015-11-03 Thread Robin Gong
This reverts commit 9e7b399d6528 ("serial: imx: remove unbalanced clk_prepare").
Otherwise the warning below is triggered, since some printk messages are
logged from interrupt context.

[   14.868319] udevd[501]: starting version 182
[   16.386107] random: nonblocking pool is initialized
[   16.386123] ------------[ cut here ]------------
[   16.386140] WARNING: CPU: 0 PID: 501 at kernel/locking/mutex.c:868 mutex_trylock+0x210/0x230()
[   16.386146] DEBUG_LOCKS_WARN_ON(in_interrupt())
[   16.386149] Modules linked in:
[   16.386157] CPU: 0 PID: 501 Comm: udevd Not tainted 4.3.0-rc1-00014-gf843df8 #28
[   16.386160] Hardware name: Freescale i.MX6 SoloX (Device Tree)
[   16.386165] Backtrace:
[   16.386182] [] (dump_backtrace) from [] (show_stack+0x18/0x1c)
[   16.386192]  r6:c0af1840 r5: r4: r3:
[   16.386201] [] (show_stack) from [] (dump_stack+0x8c/0xa4)
[   16.386216] [] (dump_stack) from [] (warn_slowpath_common+0x80/0xbc)
[   16.386225]  r6:c07b1ccc r5:0009 r4:edcf5c60 r3:0001
[   16.386234] [] (warn_slowpath_common) from [] (warn_slowpath_fmt+0x38/0x40)
[   16.386246]  r8:c0564694 r7:eeb76880 r6:c13323ec r5:0001 r4:c0976990
[   16.386255] [] (warn_slowpath_fmt) from [] (mutex_trylock+0x210/0x230)
[   16.386261]  r3:c0978f50 r2:c0976990
[   16.386264]  r4:c0b29a70
[   16.386278] [] (mutex_trylock) from [] (clk_prepare_lock+0x14/0xf4)
[   16.386289]  r8:c133000c r7:eeb76880 r6:0037 r5:c12efbc8 r4:eeb76880
[   16.386297] [] (clk_prepare_lock) from [] (clk_prepare+0x18/0x38)
[   16.386303]  r5:c12efbc8 r4:eeb76880
[   16.386311] [] (clk_prepare) from [] (imx_console_write+0x34/0x248)
[   16.386317]  r4:ee999c10 r3:c134a92c
[   16.386329] [] (imx_console_write) from [] (call_console_drivers.constprop.25+0xe0/0x104)
[   16.386342]  r10:c07b938c r9: r8:c133000c r7:0037 r6:edcf4000 r5:c12ef6c0
[   16.386345]  r4:c0afef38
[   16.386355] [] (call_console_drivers.constprop.25) from [] (console_unlock+0x3fc/0x57c)
[   16.386367]  r10:c132fff0 r9:0100 r8:0037 r7: r6:0005 r5:c12f4df0

Signed-off-by: Robin Gong 
---
 drivers/tty/serial/imx.c | 20 ++--
 1 file changed, 14 insertions(+), 6 deletions(-)

diff --git a/drivers/tty/serial/imx.c b/drivers/tty/serial/imx.c
index fe3d41c..d0388a0 100644
--- a/drivers/tty/serial/imx.c
+++ b/drivers/tty/serial/imx.c
@@ -1631,12 +1631,12 @@ imx_console_write(struct console *co, const char *s, unsigned int count)
int locked = 1;
int retval;
 
-   retval = clk_prepare_enable(sport->clk_per);
+   retval = clk_enable(sport->clk_per);
if (retval)
return;
-   retval = clk_prepare_enable(sport->clk_ipg);
+   retval = clk_enable(sport->clk_ipg);
if (retval) {
-   clk_disable_unprepare(sport->clk_per);
+   clk_disable(sport->clk_per);
return;
}
 
@@ -1675,8 +1675,8 @@ imx_console_write(struct console *co, const char *s, unsigned int count)
 if (locked)
 spin_unlock_irqrestore(&sport->port.lock, flags);
 
-   clk_disable_unprepare(sport->clk_ipg);
-   clk_disable_unprepare(sport->clk_per);
+   clk_disable(sport->clk_ipg);
+   clk_disable(sport->clk_per);
 }
 
 /*
@@ -1777,7 +1777,15 @@ imx_console_setup(struct console *co, char *options)
 
 retval = uart_set_options(&sport->port, co, baud, parity, bits, flow);
 
-   clk_disable_unprepare(sport->clk_ipg);
+   clk_disable(sport->clk_ipg);
+   if (retval) {
+   clk_unprepare(sport->clk_ipg);
+   goto error_console;
+   }
+
+   retval = clk_prepare(sport->clk_per);
+   if (retval)
+   clk_disable_unprepare(sport->clk_ipg);
 
 error_console:
return retval;
-- 
1.9.1
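
For readers unfamiliar with the split the revert relies on: in the common
clock framework clk_prepare() may sleep (the backtrace above shows it taking
a mutex), while clk_enable() is safe in atomic context. The following is a
small userspace toy model of that rule, not kernel code; every toy_* name and
the in_atomic_context flag are invented for illustration.

```c
#include <stdbool.h>

/* Toy model of the two-level clock API (prepare may sleep, enable may not). */
struct toy_clk {
	int prepare_count;	/* protected by a sleeping lock in the kernel */
	int enable_count;	/* protected by a spinlock in the kernel */
};

bool in_atomic_context;		/* stands in for in_interrupt() */
int sleep_violations;		/* counts "sleeping call in atomic context" misuse */

int toy_clk_prepare(struct toy_clk *clk)
{
	if (in_atomic_context)
		sleep_violations++;	/* the real clk_prepare() takes a mutex */
	clk->prepare_count++;
	return 0;
}

int toy_clk_enable(struct toy_clk *clk)
{
	if (clk->prepare_count == 0)
		return -1;		/* enabling an unprepared clock is a bug */
	clk->enable_count++;
	return 0;
}

void toy_clk_disable(struct toy_clk *clk)
{
	clk->enable_count--;
}

/* Setup path: process context, so the clock is prepared once here. */
int toy_console_setup(struct toy_clk *clk)
{
	return toy_clk_prepare(clk);
}

/* Write path: may run from interrupt context, so only enable/disable. */
int toy_console_write(struct toy_clk *clk)
{
	if (toy_clk_enable(clk))
		return -1;
	/* ... characters would be written to the UART here ... */
	toy_clk_disable(clk);
	return 0;
}
```

The model mirrors the shape of the revert: the prepare is left in place by
the setup path, and the write path touches only the atomic-safe half of the
API.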



Re: linux-next: build failure after merge of the mailbox tree

2015-11-03 Thread Jassi Brar
On 1 November 2015 at 17:42, Stephen Rothwell  wrote:
> Hi Jassi,
>
> After merging the mailbox tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> drivers/mailbox/mailbox-test.c: In function 'mbox_test_receive_message':
> drivers/mailbox/mailbox-test.c:226:11: error: implicit declaration of function '__io_virt' [-Werror=implicit-function-declaration]
>__io_virt(tdev->mmio), MBOX_MAX_MSG_LEN, true);
>^
>
> Caused by commit
>
>   a133f8b65d59 ("mailbox: mailbox-test: Correctly repair Sparse warnings")
>
> I have used the mailbox tree from next-20151022 for today.
>
Lee, would you please send a fix for your last fix?

Thanks.


Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map

2015-11-03 Thread Tom Zanussi
Hi Namhyung,

On Wed, 2015-11-04 at 11:26 +0900, Namhyung Kim wrote:
> Hi Tom,
> 
> On Tue, Nov 03, 2015 at 07:47:52PM -0600, Tom Zanussi wrote:
> > Hi Namhyung,
> > 
> > On Mon, 2015-11-02 at 16:08 +0900, Namhyung Kim wrote:
> > > I thought it'd be better if users can see which one is the real drop
> > > or not.  IOW if drop count is much smaller than the normal event
> > > count, [s]he might want to ignore the occasional drops.  Otherwise,
> > > [s]he should restart with a bigger table.  This requires accurate
> > > counts of events and drops though.
> > > 
> > 
> > OK, how about the below - it basically moves the drops set/test/inc into
> > tracing_map_insert(), as well as a total hits count.  So those values
> > will be available for users to use in deciding whether to use the data
> > or restart with a bigger table, and the loop is bailed out of only if no
> > matching keys are found and there are drops, so callers can continue
> > updating existing entries.
> 
> But if a key didn't get a desired index, it'd still fail to update..
> 
> 
> > 
> > Users who want the original behavior still get the NULL return and can
> > stop calling tracing_map_insert() as before:
> > 
> > struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
> > {
> > u32 idx, key_hash, test_key;
> > struct tracing_map_entry *entry;
> > 
> > key_hash = jhash(key, map->key_size, 0);
> > if (key_hash == 0)
> > key_hash = 1;
> > idx = key_hash >> (32 - (map->map_bits + 1));
> > 
> > while (1) {
> > idx &= (map->map_size - 1);
> > entry = TRACING_MAP_ENTRY(map->map, idx);
> > test_key = entry->key;
> > 
> > if (test_key && test_key == key_hash && entry->val &&
> > keys_match(key, entry->val->key, map->key_size)) {
> > atomic64_inc(&map->hits);
> > return entry->val;
> > }
> > 
> > if (atomic64_read(&map->drops)) {
> > atomic64_inc(&map->drops);
> > break;
> > }
> 
> IMHO it should be removed.
> 
> > 
> > if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
> > struct tracing_map_elt *elt;
> > 
> > elt = get_free_elt(map);
> > if (!elt) {
> > atomic64_inc(&map->drops);
> 
> And reset entry->key here..
> 
> > break;
> > }
> > memcpy(elt->key, key, map->key_size);
> > entry->val = elt;
> > 
> > atomic64_inc(&map->hits);
> > return entry->val;
> > }
> > idx++;
> > }
> > 
> > return NULL;
> > }
> 
> 
> Then tracing_map_lookup() can be implemented like *_insert but bail out
> from the loop if test_key is 0.  The caller might do like this:
> 
>   if (!atomic64_read(&map->drops))
>   elt = tracing_map_insert(...);
>   else
>   elt = tracing_map_lookup(...);
> 

Yeah, I think that should work - let me try that in the next version...
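
To make the insert/lookup split concrete, here is a userspace sketch using
C11 atomics. It is only a model of the idea discussed above, not the
tracing_map code: the toy_* names, the fixed table size and the hash function
are assumptions, keys are plain 64-bit integers, and it has only been
reasoned about single-threaded.

```c
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

#define MAP_BITS 3
#define MAX_ELTS (1u << MAP_BITS)
#define MAP_SIZE (1u << (MAP_BITS + 1))	/* twice as many slots as elements */

struct toy_elt {
	uint64_t key;
	uint64_t count;
};

struct toy_entry {
	_Atomic uint32_t key;		/* 0 means the slot was never claimed */
	struct toy_elt *_Atomic val;
};

struct toy_map {
	struct toy_entry slots[MAP_SIZE];
	struct toy_elt elts[MAX_ELTS];
	atomic_uint next_elt;
	atomic_ullong hits;
	atomic_ullong drops;
};

static uint32_t toy_hash(uint64_t key)
{
	uint32_t h = (uint32_t)((key * 0x9E3779B97F4A7C15ull) >> 32);

	return h ? h : 1;		/* 0 is reserved for "empty" */
}

static struct toy_elt *toy_get_free_elt(struct toy_map *map)
{
	unsigned int idx = atomic_fetch_add(&map->next_elt, 1);

	return idx < MAX_ELTS ? &map->elts[idx] : NULL;
}

/* Shared probe loop: insert claims empty slots, lookup stops at them. */
static struct toy_elt *toy_probe(struct toy_map *map, uint64_t key, int insert)
{
	uint32_t key_hash = toy_hash(key);
	uint32_t idx = key_hash >> (32 - (MAP_BITS + 1));
	unsigned int probes;

	for (probes = 0; probes < MAP_SIZE; probes++, idx++) {
		struct toy_entry *entry;
		struct toy_elt *elt;
		uint32_t test_key, expected = 0;

		idx &= MAP_SIZE - 1;
		entry = &map->slots[idx];
		test_key = atomic_load(&entry->key);

		if (test_key == key_hash) {
			elt = atomic_load(&entry->val);
			if (elt && elt->key == key) {
				atomic_fetch_add(&map->hits, 1);
				return elt;
			}
		}
		if (test_key)
			continue;
		if (!insert)
			return NULL;	/* lookup: an empty slot ends the chain */
		if (!atomic_compare_exchange_strong(&entry->key, &expected, key_hash))
			continue;	/* somebody else claimed this slot */
		elt = toy_get_free_elt(map);
		if (!elt) {
			atomic_store(&entry->key, 0);	/* "reset entry->key here" */
			atomic_fetch_add(&map->drops, 1);
			return NULL;
		}
		elt->key = key;
		atomic_store(&entry->val, elt);
		atomic_fetch_add(&map->hits, 1);
		return elt;
	}
	return NULL;
}

/* Caller-side pattern from the mail: insert until the first drop, then look up. */
struct toy_elt *toy_map_get(struct toy_map *map, uint64_t key)
{
	return toy_probe(map, key, atomic_load(&map->drops) == 0);
}
```

Once the element pool is exhausted, the first failing insert records a drop;
after that callers only look up, so existing entries keep being updated while
new keys are rejected without touching the table.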

Thanks,

Tom

> Thanks,
> Namhyung




Re: [PATCH V5] mm: memory hot-add: memory can not be added to movable zone defaultly

2015-11-03 Thread Changsheng Liu



On 11/4 2015 0:13, Yasuaki Ishimatsu wrote:

Hi Changsheng,

According to the following thread, Tang has no objection to change kernel
behavior since udev cannot online memory as movable.

https://lkml.org/lkml/2015/10/21/159

So how about reposting the v5 patch?
I have a comment about the patch. Please see below.

 Thanks, I will update the patch and repost it.

On Tue, 15 Sep 2015 03:49:58 -0400
Changsheng Liu  wrote:


From: Changsheng Liu 

When the user enables CONFIG_MOVABLE_NODE and boots with the movable_node
kernel option, should_add_memory_movable() returns 0 for hot-added memory
because all zones, including the movable zone, are empty.
The hot-added memory is therefore added to the normal zone,
and the normal zone is created first.
But we want the whole node to be added to the movable zone by default.

So we change should_add_memory_movable(): if CONFIG_MOVABLE_NODE is enabled
and the movable_node kernel option is set,
it always returns 1 when all zones are empty,
so that the movable zone is created first
and the whole node is then added to the movable zone by default.
If we want the node to be added to the normal zone,
we can do it as follows:
"echo online_kernel > /sys/devices/system/memory/memoryXXX/state"

Signed-off-by: Xiaofeng Yan 
Signed-off-by: Changsheng Liu 
Tested-by: Dongdong Fan 
---
  mm/memory_hotplug.c |8 
  1 files changed, 8 insertions(+), 0 deletions(-)

diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 26fbba7..d39dbb0 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -1190,6 +1190,9 @@ static int check_hotplug_memory_range(u64 start, u64 size)
  /*
   * If movable zone has already been setup, newly added memory should be check.
   * If its address is higher than movable zone, it should be added as movable.
+ * And if system boots up with movable_node and config CONFIG_MOVABLE_NODE and
+ * added memory does not overlap the zone before MOVABLE_ZONE,
+ * the memory is added as movable
   * Without this check, movable zone may overlap with other zone.
   */
  static int should_add_memory_movable(int nid, u64 start, u64 size)
@@ -1197,6 +1200,11 @@ static int should_add_memory_movable(int nid, u64 start, u64 size)
unsigned long start_pfn = start >> PAGE_SHIFT;
pg_data_t *pgdat = NODE_DATA(nid);
struct zone *movable_zone = pgdat->node_zones + ZONE_MOVABLE;
+   struct zone *pre_zone = pgdat->node_zones + (ZONE_MOVABLE - 1);
+
+   if (movable_node_is_enabled()
+   && zone_end_pfn(pre_zone) <= start_pfn)
+   return 1;

if (movable_node_is_enabled() && (zone_end_pfn(pre_zone) <= start_pfn))

Thanks,
Yasuaki Ishimatsu

  
  	if (zone_is_empty(movable_zone))

return 0;
--
1.7.1
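
The decision the patched function makes can be modelled in userspace as
below. This is a sketch under assumptions: the toy_* types are stand-ins for
struct zone, and the final "start of movable zone" comparison is reproduced
from memory of the existing function rather than from the hunk shown above.

```c
#include <stdbool.h>

/* Minimal stand-in for the zone fields the check looks at. */
struct toy_zone {
	unsigned long start_pfn;
	unsigned long spanned_pages;
};

static unsigned long toy_zone_end_pfn(const struct toy_zone *z)
{
	return z->start_pfn + z->spanned_pages;
}

static bool toy_zone_is_empty(const struct toy_zone *z)
{
	return z->spanned_pages == 0;
}

/* Returns 1 if hot-added memory at start_pfn should go to the movable zone. */
int toy_should_add_memory_movable(bool movable_node_enabled,
				  const struct toy_zone *pre_zone,
				  const struct toy_zone *movable_zone,
				  unsigned long start_pfn)
{
	/* New check: with movable_node, anything above the previous zone is movable. */
	if (movable_node_enabled && toy_zone_end_pfn(pre_zone) <= start_pfn)
		return 1;

	/* Old behaviour: an empty movable zone means "add as normal". */
	if (toy_zone_is_empty(movable_zone))
		return 0;

	/* Assumed from the existing code: memory above the movable zone start is movable. */
	if (movable_zone->start_pfn <= start_pfn)
		return 1;

	return 0;
}
```

With all zones empty, the same hot-add request yields 0 without movable_node
and 1 with it, which is the behaviour change the commit message describes.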


.





Re: [PATCH v13 21/51] ext4: Add richacl feature flag

2015-11-03 Thread Andreas Gruenbacher
On Wed, Nov 4, 2015 at 3:28 AM, Andreas Gruenbacher  wrote:
> It's the commit message that's misleading here, I'll fix it.

Commit message changed to:

This feature flag selects richacl instead of POSIX ACL support on the
filesystem.  When this feature is off, the "acl" and "noacl" mount options
control whether POSIX ACLs are enabled.  When it is on, richacls are
automatically enabled and using the "noacl" mount option leads to an error.
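
Spelled out as code, the rule in that commit message might look like the
sketch below. The enum names, resolve_acl_kind() and the posix_acl_default
parameter (what happens when neither option is given) are illustrative
assumptions, not taken from the ext4 patch.

```c
#include <errno.h>
#include <stdbool.h>

enum acl_opt  { ACL_OPT_UNSET, ACL_OPT_ACL, ACL_OPT_NOACL };
enum acl_kind { ACL_KIND_NONE, ACL_KIND_POSIX, ACL_KIND_RICH };

/*
 * Stores the resulting ACL model in *kind and returns 0, or returns -EINVAL
 * for the rejected combination (richacl feature set plus "noacl").
 */
int resolve_acl_kind(bool richacl_feature, enum acl_opt opt,
		     bool posix_acl_default, enum acl_kind *kind)
{
	if (richacl_feature) {
		if (opt == ACL_OPT_NOACL)
			return -EINVAL;
		*kind = ACL_KIND_RICH;	/* always on; "acl" is accepted but redundant */
		return 0;
	}

	if (opt == ACL_OPT_UNSET)
		*kind = posix_acl_default ? ACL_KIND_POSIX : ACL_KIND_NONE;
	else
		*kind = (opt == ACL_OPT_ACL) ? ACL_KIND_POSIX : ACL_KIND_NONE;
	return 0;
}
```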

Thanks,
Andreas


Re: [PATCH v11 17/28] tracing: Add hist trigger 'execname' modifier

2015-11-03 Thread Tom Zanussi
On Mon, 2015-11-02 at 23:10 +0900, Namhyung Kim wrote:
> On Thu, Oct 22, 2015 at 01:14:21PM -0500, Tom Zanussi wrote:
> > Allow users to have pid fields displayed as program names in the output
> > by appending '.execname' to field names:
> > 
> ># echo hist:keys=aaa.execname ... \
> >   [ if filter] > event/trigger
> 
> So in this case, 'aaa' should be 'common_pid', right?
> 

Right.

> Also, it's just for display thus tasks which have different pid but
> same comm name will be saved in different entries, right?  If so, it'd
> be better to note that somewhere as it might confuse users..
> 

Right, good point, will document that.

> 
> > 
> > Signed-off-by: Tom Zanussi 
> > Tested-by: Masami Hiramatsu 
> > ---
> >  kernel/trace/trace.c |  3 +-
> >  kernel/trace/trace_events_hist.c | 86 
> > +++-
> >  2 files changed, 87 insertions(+), 2 deletions(-)
> > 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index 4f0968e..a143be0 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -3822,7 +3822,8 @@ static const char readme_msg[] =
> > "\tfollowing modifiers to the field name, as applicable:\n\n"
> > "\t.hexdisplay a number as a hex value\n"
> > "\t.symdisplay an address as a symbol\n"
> > -   "\t.sym-offset display an address as a symbol and offset\n\n"
> > +   "\t.sym-offset display an address as a symbol and offset\n"
> > +   "\t.execname   display a common_pid as a program name\n\n"
> > "\tThe 'pause' parameter can be used to pause an existing hist\n"
> > "\ttrigger or to start a hist trigger but not log any events\n"
> > "\tuntil told to do so.  'continue' can be used to start or\n"
> > diff --git a/kernel/trace/trace_events_hist.c b/kernel/trace/trace_events_hist.c
> > index e6e5c6e..656973a 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -75,6 +75,7 @@ enum hist_field_flags {
> > HIST_FIELD_HEX  = 8,
> > HIST_FIELD_SYM  = 16,
> > HIST_FIELD_SYM_OFFSET   = 32,
> > +   HIST_FIELD_EXECNAME = 64,
> >  };
> >  
> >  struct hist_trigger_attrs {
> > @@ -218,6 +219,78 @@ static struct hist_trigger_attrs *parse_hist_trigger_attrs(char *trigger_str)
> > return ERR_PTR(ret);
> >  }
> >  
> > +static inline void save_comm(char *comm, struct task_struct *task)
> > +{
> > +   if (!task->pid) {
> > +   strcpy(comm, "<idle>");
> > +   return;
> > +   }
> > +
> > +   if (WARN_ON_ONCE(task->pid < 0)) {
> > +   strcpy(comm, "<XXX>");
> > +   return;
> > +   }
> > +
> > +   if (task->pid > PID_MAX_DEFAULT) {
> > +   strcpy(comm, "<...>");
> > +   return;
> > +   }
> 
> I guess this comes from the saved_cmdlines code which uses a fixed
> array of PID_MAX_DEFAULT size.  But as it doesn't use the array, we
> can simply copy the task->comm IMHO.
> 
> 
> > +
> > +   memcpy(comm, task->comm, TASK_COMM_LEN);
> > +}
> > +
> > +static void hist_trigger_elt_free(struct tracing_map_elt *elt)
> > +{
> > +   kfree((char *)elt->private_data);
> > +}
> > +
> > +static int hist_trigger_elt_alloc(struct tracing_map_elt *elt)
> > +{
> > +   struct hist_trigger_data *hist_data = elt->map->private_data;
> > +   struct hist_field *key_field;
> > +   unsigned int i;
> > +
> > +   for (i = hist_data->n_vals; i < hist_data->n_fields; i++) {
> 
> I think it'd be better to add a list iterator macro like
> for_each_key_field() or something since it's easy to miss that the key
> fields always follows the value fields IMHO.
> 

Yeah, that's a good idea (sometimes I even forget that myself ;-)
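
Something along these lines, perhaps. This is a standalone sketch with
cut-down structs: the macro names follow the suggestion above, while
has_execname_key() and the fixed-size fields[] array are only there to make
the example compile on its own.

```c
#define HIST_FIELD_EXECNAME 64

struct hist_field {
	unsigned long flags;
};

struct hist_trigger_data {
	struct hist_field *fields[8];
	unsigned int n_vals;	/* value fields are fields[0 .. n_vals - 1] */
	unsigned int n_fields;	/* key fields are fields[n_vals .. n_fields - 1] */
};

/* Hypothetical iterators that hide the "keys follow values" layout. */
#define for_each_val_field(i, hist_data) \
	for ((i) = 0; (i) < (hist_data)->n_vals; (i)++)

#define for_each_key_field(i, hist_data) \
	for ((i) = (hist_data)->n_vals; (i) < (hist_data)->n_fields; (i)++)

/* Returns nonzero if any key field carries the .execname modifier. */
int has_execname_key(const struct hist_trigger_data *hist_data)
{
	unsigned int i;

	for_each_key_field(i, hist_data) {
		if (hist_data->fields[i]->flags & HIST_FIELD_EXECNAME)
			return 1;
	}
	return 0;
}
```

The loop in hist_trigger_elt_alloc() would then read
for_each_key_field(i, hist_data) and no longer depend on the reader knowing
where the key fields start.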

> 
> > +   key_field = hist_data->fields[i];
> > +
> > +   if (key_field->flags & HIST_FIELD_EXECNAME) {
> > +   unsigned int size = TASK_COMM_LEN + 1;
> > +
> > +   elt->private_data = kzalloc(size, GFP_KERNEL);
> > +   if (!elt->private_data)
> > +   return -ENOMEM;
> > +   break;
> > +   }
> > +   }
> > +
> > +   return 0;
> > +}
> > +
> > +static void hist_trigger_elt_copy(struct tracing_map_elt *to,
> > + struct tracing_map_elt *from)
> > +{
> > +   char *comm_from = from->private_data;
> > +   char *comm_to = to->private_data;
> > +
> > +   if (comm_from)
> > +   memcpy(comm_to, comm_from, TASK_COMM_LEN + 1);
> > +}
> > +
> > +static void hist_trigger_elt_init(struct tracing_map_elt *elt)
> > +{
> > +   char *comm = elt->private_data;
> > +
> > +   if (comm)
> > +   save_comm(comm, current);
> > +}
> > +
> > +static struct tracing_map_ops hist_trigger_ops = {
> > +   .elt_alloc  = hist_trigger_elt_alloc,
> > +   .elt_copy   = hist_trigger_elt_copy,
> > +   .elt_free   = hist_trigger_elt_free,
> > +   .elt_init   = 

RE: [GIT PULL] Thermal-SoC management updates for v4.4-rc1 #1

2015-11-03 Thread Zhang, Rui


> -----Original Message-----
> From: linux-acpi-ow...@vger.kernel.org [mailto:linux-acpi-
> ow...@vger.kernel.org] On Behalf Of Rafael J. Wysocki
> Sent: Wednesday, November 04, 2015 5:14 AM
> To: Eduardo Valentin
> Cc: Zhang, Rui; Rafael J. Wysocki; Linux ACPI; Linux PM; LKML
> Subject: Re: [GIT PULL] Thermal-SoC management updates for v4.4-rc1 #1
> Importance: High
> 
> Hi,
> 
> On Tue, Nov 3, 2015 at 6:13 PM, Eduardo Valentin 
> wrote:
> > Rui, Rafael,
> >
> > On Tue, Nov 03, 2015 at 03:10:28AM +, Zhang, Rui wrote:
> >>
> >> > Javi Merino (2):
> >> >   PM / OPP: get the voltage for all OPPs
> >>
> >> I think this patch should be taken by Rafael, right?
> >>
> >> Thanks,
> >> Rui
> >>
> >> >  drivers/base/power/opp.c   |   4 +-
> >
> > Well, yes. I included it here because the devfreq support has a
> > (functional) dependency on this change. As it is a small change, and I
> > did not notice any break in linux-next, I included it in my pull
> > request.
> >
> > Rafael, I think it is your call. Do you want collect this one and send
> > it via your tree, or are you fine having it go through Rui's?
> 
> It can go in through Rui's tree as far as I'm concerned.
> 

Good, I will take this patch. Thanks!

-Rui 



Re: [PATCH v13 02/51] vfs: Add MAY_CREATE_FILE and MAY_CREATE_DIR permission flags

2015-11-03 Thread Andreas Dilger
On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher  wrote:
> 
> Richacls distinguish between creating non-directories and directories. To
> support that, add an isdir parameter to may_create(). When checking
> inode_permission() for create permission, pass in an additional
> MAY_CREATE_FILE or MAY_CREATE_DIR mask flag.
> 
> To allow checking for delete *and* create access when replacing an existing
> file via vfs_rename(), add a replace parameter to may_delete().
> 
> Signed-off-by: Andreas Gruenbacher 
> Reviewed-by: J. Bruce Fields 
> ---
> fs/namei.c | 43 +--
> include/linux/fs.h |  2 ++
> 2 files changed, 27 insertions(+), 18 deletions(-)
> 
> diff --git a/fs/namei.c b/fs/namei.c
> index 224ecf1..0259392 100644
> --- a/fs/namei.c
> +++ b/fs/namei.c
> @@ -453,7 +453,9 @@ static int sb_permission(struct super_block *sb, struct inode *inode, int mask)
>  * this, letting us set arbitrary permissions for filesystem access without
>  * changing the "normal" UIDs which are used for other things.
>  *
> - * When checking for MAY_APPEND, MAY_WRITE must also be set in @mask.
> + * MAY_WRITE must be set in @mask whenever MAY_APPEND, MAY_CREATE_FILE, or
> + * MAY_CREATE_DIR are set.  That way, file systems that don't support these
> + * permissions will check for MAY_WRITE instead.
>  */
> int inode_permission(struct inode *inode, int mask)
> {
> @@ -2549,10 +2551,11 @@ EXPORT_SYMBOL(__check_sticky);
>  * 10. We don't allow removal of NFS sillyrenamed files; it's handled by
>  * nfs_async_unlink().
>  */
> -static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
> +static int may_delete(struct inode *dir, struct dentry *victim,
> +   bool isdir, bool replace)
> {
>   struct inode *inode = d_backing_inode(victim);
> - int error;
> + int error, mask = MAY_WRITE | MAY_EXEC;
> 
>   if (d_is_negative(victim))
>   return -ENOENT;
> @@ -2561,7 +2564,9 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
>   BUG_ON(victim->d_parent->d_inode != dir);
>   audit_inode_child(dir, victim, AUDIT_TYPE_CHILD_DELETE);
> 
> - error = inode_permission(dir, MAY_WRITE | MAY_EXEC);
> + if (replace)
> + mask |= isdir ? MAY_CREATE_DIR : MAY_CREATE_FILE;
> + error = inode_permission(dir, mask);
>   if (error)
>   return error;
>   if (IS_APPEND(dir))
> @@ -2592,14 +2597,16 @@ static int may_delete(struct inode *dir, struct dentry *victim, bool isdir)
>  *  3. We should have write and exec permissions on dir
>  *  4. We can't do it if dir is immutable (done in permission())
>  */
> -static inline int may_create(struct inode *dir, struct dentry *child)
> +static inline int may_create(struct inode *dir, struct dentry *child, bool isdir)
> {
> + int mask = isdir ? MAY_CREATE_DIR : MAY_CREATE_FILE;
> +
>   audit_inode_child(dir, child, AUDIT_TYPE_CHILD_CREATE);
>   if (child->d_inode)
>   return -EEXIST;
>   if (IS_DEADDIR(dir))
>   return -ENOENT;
> - return inode_permission(dir, MAY_WRITE | MAY_EXEC);
> + return inode_permission(dir, MAY_WRITE | MAY_EXEC | mask);
> }
> 
> /*
> @@ -2649,7 +2656,7 @@ EXPORT_SYMBOL(unlock_rename);
> int vfs_create(struct inode *dir, struct dentry *dentry, umode_t mode,
>   bool want_excl)
> {
> - int error = may_create(dir, dentry);
> + int error = may_create(dir, dentry, false);
>   if (error)
>   return error;
> 
> @@ -3494,7 +3501,7 @@ EXPORT_SYMBOL(user_path_create);
> 
> int vfs_mknod(struct inode *dir, struct dentry *dentry, umode_t mode, dev_t dev)
> {
> - int error = may_create(dir, dentry);
> + int error = may_create(dir, dentry, false);

Passing "true" and "false" from the caller doesn't really make it clear
to the reader what the argument is for.

Since it isn't possible to check the file mode inside may_create() via
dentry->d_inode->i_mode (the inode doesn't exist yet), passing "mode" as
an argument would at least make the code more readable.  "mode" is
available in all of the callers.

> 
>   if (error)
>   return error;
> @@ -3586,7 +3593,7 @@ SYSCALL_DEFINE3(mknod, const char __user *, filename, umode_t, mode, unsigned, d
> 
> int vfs_mkdir(struct inode *dir, struct dentry *dentry, umode_t mode)
> {
> - int error = may_create(dir, dentry);
> + int error = may_create(dir, dentry, true);
>   unsigned max_links = dir->i_sb->s_max_links;
> 
>   if (error)
> @@ -3667,7 +3674,7 @@ EXPORT_SYMBOL(dentry_unhash);
> 
> int vfs_rmdir(struct inode *dir, struct dentry *dentry)
> {
> - int error = may_delete(dir, dentry, 1);
> + int error = may_delete(dir, dentry, true, false);

This is a prime example why passing "true" and "false" as function arguments
is not very useful, and especially prone to bugs when there are two of them.

That said, this is code originally from Al, 

Re: [PATCH v13 21/51] ext4: Add richacl feature flag

2015-11-03 Thread Andreas Gruenbacher
Andreas,

On Wed, Nov 4, 2015 at 3:18 AM, Andreas Dilger  wrote:
> This patch confuses me.  I thought the whole point of INCOMPAT_RICHACL
> was that the filesystem should never, ever be mounted without ACL support
> because the ACLs will get confused without it.  In that case, it doesn't
> make sense to have a mount option that _has_ to be specified to mount the
> filesystem, and returns an error when trying to disable it.
>
> It makes more sense to just enable "acl" by default if INCOMPAT_RICHACL
> is set in the superblock and not need the mount option at all.

It's the commit message that's misleading here, I'll fix it. On
richacl filesystems, the acl mount option is always on. It's only on
POSIX ACL filesystems that the mount option can be used to turn POSIX
ACLs off (which arguably wasn't such a good idea, but there we have
it).

Thanks,
Andreas


Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-03 Thread Sergey Senozhatsky
On (11/04/15 10:25), Minchan Kim wrote:
[..]
> +static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
> + unsigned long end, struct mm_walk *walk)
> +
> +{
> + struct mmu_gather *tlb = walk->private;
> + struct mm_struct *mm = tlb->mm;
> + struct vm_area_struct *vma = walk->vma;
> + spinlock_t *ptl;
> + pte_t *pte, ptent;
> + struct page *page;


I'll just ask (probably I'm missing something)

+ pmd_trans_huge_lock() ?

> + split_huge_page_pmd(vma, addr, pmd);
> + if (pmd_trans_unstable(pmd))
> + return 0;
> +
> + pte = pte_offset_map_lock(mm, pmd, addr, &ptl);
> + arch_enter_lazy_mmu_mode();
> + for (; addr != end; pte++, addr += PAGE_SIZE) {
> + ptent = *pte;
> +
> + if (!pte_present(ptent))
> + continue;
> +
> + page = vm_normal_page(vma, addr, ptent);
> + if (!page)
> + continue;
> +

-ss


[PATCH v2 1/4] perf test: Keep test result clean if '-v' not set

2015-11-03 Thread Wang Nan
According to [1], 'perf test' should avoid printing too much information
if '-v' is not set; only 'Ok', 'FAIL' or 'Skip' need to be printed.

This patch removes several stderr outputs to keep the output clean.

Before this patch:
 # perf test dummy
 23: Test using a dummy software event to keep tracking   : (not supported) Ok

After this patch:
 # perf test dummy
 23: Test using a dummy software event to keep tracking   : Skip

[1] http://lkml.kernel.org/r/20151020134155.ge4...@redhat.com
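
The convention this patch moves the tests towards can be sketched as follows.
It is a simplified userspace model: the TEST_* values and the result labels
are written from memory of perf's test harness and should be treated as
assumptions, and the pr_debug() here is a stand-in that prints only when
verbose is set.

```c
#include <stdio.h>

enum { TEST_OK = 0, TEST_FAIL = -1, TEST_SKIP = -2 };	/* assumed values */

static int verbose;	/* would be raised by 'perf test -v' */

/* Stand-in for perf's pr_debug(): silent unless verbose is set. */
#define pr_debug(...) \
	do { if (verbose > 0) fprintf(stderr, __VA_ARGS__); } while (0)

/* The harness, not the test, decides what is printed for each result. */
const char *test_result_label(int err)
{
	switch (err) {
	case TEST_OK:
		return "Ok";
	case TEST_SKIP:
		return "Skip";
	default:
		return "FAILED!";
	}
}

/* A test reports why it skipped via pr_debug() and returns TEST_SKIP. */
int toy_keep_tracking_test(int event_supported)
{
	if (!event_supported) {
		pr_debug("Unable to open dummy and cycles event\n");
		return TEST_SKIP;
	}
	return TEST_OK;
}
```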

Signed-off-by: Wang Nan 
Acked-by: Namhyung Kim 
Cc: Arnaldo Carvalho de Melo 
Link: http://lkml.kernel.org/n/tip-446450727-218554-1-git-send-email-wangn...@huawei.com
---
 tools/perf/tests/attr.c|  3 +--
 tools/perf/tests/code-reading.c|  8 
 tools/perf/tests/keep-tracking.c   |  4 ++--
 tools/perf/tests/llvm.c| 11 ---
 tools/perf/tests/switch-tracking.c |  4 ++--
 5 files changed, 13 insertions(+), 17 deletions(-)

diff --git a/tools/perf/tests/attr.c b/tools/perf/tests/attr.c
index 2dfc9ad..638875a 100644
--- a/tools/perf/tests/attr.c
+++ b/tools/perf/tests/attr.c
@@ -171,6 +171,5 @@ int test__attr(void)
 !lstat(path_perf, &st))
return run_dir(path_dir, path_perf);
 
-   fprintf(stderr, " (omitted)");
-   return 0;
+   return TEST_SKIP;
 }
diff --git a/tools/perf/tests/code-reading.c b/tools/perf/tests/code-reading.c
index 49b1959..a767a64 100644
--- a/tools/perf/tests/code-reading.c
+++ b/tools/perf/tests/code-reading.c
@@ -613,16 +613,16 @@ int test__code_reading(void)
case TEST_CODE_READING_OK:
return 0;
case TEST_CODE_READING_NO_VMLINUX:
-   fprintf(stderr, " (no vmlinux)");
+   pr_debug("no vmlinux\n");
return 0;
case TEST_CODE_READING_NO_KCORE:
-   fprintf(stderr, " (no kcore)");
+   pr_debug("no kcore\n");
return 0;
case TEST_CODE_READING_NO_ACCESS:
-   fprintf(stderr, " (no access)");
+   pr_debug("no access\n");
return 0;
case TEST_CODE_READING_NO_KERNEL_OBJ:
-   fprintf(stderr, " (no kernel obj)");
+   pr_debug("no kernel obj\n");
return 0;
default:
return -1;
diff --git a/tools/perf/tests/keep-tracking.c b/tools/perf/tests/keep-tracking.c
index 4d4b983..a2e2269 100644
--- a/tools/perf/tests/keep-tracking.c
+++ b/tools/perf/tests/keep-tracking.c
@@ -90,8 +90,8 @@ int test__keep_tracking(void)
evsel->attr.enable_on_exec = 0;
 
if (perf_evlist__open(evlist) < 0) {
-   fprintf(stderr, " (not supported)");
-   err = 0;
+   pr_debug("Unable to open dummy and cycles event\n");
+   err = TEST_SKIP;
goto out_err;
}
 
diff --git a/tools/perf/tests/llvm.c b/tools/perf/tests/llvm.c
index 52d5597..512d362 100644
--- a/tools/perf/tests/llvm.c
+++ b/tools/perf/tests/llvm.c
@@ -36,7 +36,7 @@ static int test__bpf_parsing(void *obj_buf, size_t obj_buf_sz)
 static int test__bpf_parsing(void *obj_buf __maybe_unused,
 size_t obj_buf_sz __maybe_unused)
 {
-   fprintf(stderr, " (skip bpf parsing)");
+   pr_debug("Skip bpf parsing\n");
return 0;
 }
 #endif
@@ -55,7 +55,7 @@ int test__llvm(void)
 * and clang is not found in $PATH, and this is not perf test -v
 */
 if (verbose == 0 && !llvm_param.user_set_param && llvm__search_clang()) {
-   fprintf(stderr, " (no clang, try 'perf test -v LLVM')");
+   pr_debug("No clang and no verbosive, skip this test\n");
return TEST_SKIP;
}
 
@@ -86,11 +86,8 @@ int test__llvm(void)
 err = llvm__compile_bpf("-", &obj_buf, &obj_buf_sz);
 
verbose = old_verbose;
-   if (err) {
-   if (!verbose)
-   fprintf(stderr, " (use -v to see error message)");
-   return -1;
-   }
+   if (err)
+   return TEST_FAIL;
 
err = test__bpf_parsing(obj_buf, obj_buf_sz);
free(obj_buf);
diff --git a/tools/perf/tests/switch-tracking.c b/tools/perf/tests/switch-tracking.c
index e698742..a02af50 100644
--- a/tools/perf/tests/switch-tracking.c
+++ b/tools/perf/tests/switch-tracking.c
@@ -366,7 +366,7 @@ int test__switch_tracking(void)
 
/* Third event */
if (!perf_evlist__can_select_event(evlist, sched_switch)) {
-   fprintf(stderr, " (no sched_switch)");
+   pr_debug("No sched_switch\n");
err = 0;
goto out;
}
@@ -442,7 +442,7 @@ int test__switch_tracking(void)
}
 
if (perf_evlist__open(evlist) < 0) {
-   fprintf(stderr, " (not supported)");
+   pr_debug("Not supported\n");
err = 0;
goto out;
}
-- 
1.8.3.4


[PATCH v2 4/4] perf tools: Improve BPF related error messages output

2015-11-03 Thread Wang Nan
A series of BPF loader related error codes is introduced to help deliver
errors. Functions are improved to return those new error codes.
Functions which return pointers are adjusted to encode the error code into
the return value using "ERR_PTR".
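
For reference, the ERR_PTR convention packs a small negative error code into
the top of the pointer range. The userspace sketch below shows the idea; the
three helpers mimic the kernel-style ones, while toy_prepare_load(), struct
toy_object and the numeric error value are invented for illustration.

```c
#include <stdint.h>

#define MAX_ERRNO 4095

/* Minimal userspace equivalents of the kernel-style helpers. */
static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return (uintptr_t)ptr >= (uintptr_t)-MAX_ERRNO;
}

#define TOY_LOADER_ERRNO__ECOMPILE 3904	/* hypothetical value, must stay <= MAX_ERRNO */

struct toy_object {
	int fd;
};

/* Returns a valid object, or an error code encoded in the pointer. */
struct toy_object *toy_prepare_load(int compile_ok)
{
	static struct toy_object obj = { 3 };

	if (!compile_ok)
		return ERR_PTR(-TOY_LOADER_ERRNO__ECOMPILE);
	return &obj;
}
```

A caller checks IS_ERR() on the result and, on failure, recovers the code
with PTR_ERR(); this only works as long as the custom codes stay within
MAX_ERRNO.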

bpf_loader_strerror() is introduced to convert those error codes to
strings. It checks the value of the error code and calls libbpf_strerror()
or strerror_r() accordingly, so callers don't need to check
the range of the error code. bpf__strerror_head() is updated so existing
strerror functions support these error codes automatically.
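
The range-based dispatch described above can be sketched in userspace as
follows. All toy_* names and the numeric bases are assumptions; the libbpf
branch is a placeholder for the real libbpf_strerror() call, and plain
strerror() is used instead of strerror_r() to sidestep the GNU/XSI signature
difference.

```c
#include <errno.h>
#include <stdio.h>
#include <string.h>

enum {
	TOY_LOADER_ERRNO__START = 3900,	/* hypothetical base */
	TOY_LOADER_ERRNO__ECONFIG = TOY_LOADER_ERRNO__START,
	TOY_LOADER_ERRNO__EGROUP,
	TOY_LOADER_ERRNO__END,
};

enum { TOY_LIBBPF_ERRNO__START = 4000 };	/* hypothetical base */

/* Same intent as the #if guard in the patch: the two ranges must not overlap. */
_Static_assert(TOY_LOADER_ERRNO__END < TOY_LIBBPF_ERRNO__START,
	       "too many BPF loader error codes");

static const struct {
	int code;
	const char *msg;
} toy_strerror_table[] = {
	{ TOY_LOADER_ERRNO__ECONFIG, "Invalid config string" },
	{ TOY_LOADER_ERRNO__EGROUP, "Invalid group name" },
};

/* Returns 0 if a message was found for err, -1 otherwise. */
int toy_loader_strerror(int err, char *buf, size_t size)
{
	size_t i;

	if (!buf || !size)
		return -1;

	err = err > 0 ? err : -err;

	if (err >= TOY_LIBBPF_ERRNO__START) {
		/* Placeholder: the real code would defer to libbpf here. */
		snprintf(buf, size, "libbpf error %d", err);
		return 0;
	}

	if (err < TOY_LOADER_ERRNO__START) {
		/* Ordinary errno values go to the C library. */
		snprintf(buf, size, "%s", strerror(err));
		return 0;
	}

	for (i = 0; i < sizeof(toy_strerror_table) / sizeof(toy_strerror_table[0]); i++) {
		if (toy_strerror_table[i].code == err) {
			snprintf(buf, size, "%s", toy_strerror_table[i].msg);
			return 0;
		}
	}

	snprintf(buf, size, "Unknown bpf loader error %d", err);
	return -1;
}
```

Because the three ranges are disjoint, a caller can pass any of the error
kinds, positive or negative, and get a message without knowing which layer
produced it.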

Signed-off-by: Wang Nan 
Cc: Arnaldo Carvalho de Melo 
Cc: Namhyung Kim 
---
 tools/perf/util/bpf-loader.c   | 98 +-
 tools/perf/util/bpf-loader.h   | 18 
 tools/perf/util/parse-events.c |  7 +--
 3 files changed, 110 insertions(+), 13 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index 76f07ce..375f254 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -14,6 +14,10 @@
 #include "probe-finder.h" // for MAX_PROBES
 #include "llvm-utils.h"
 
+#if BPF_LOADER_ERRNO__END >= LIBBPF_ERRNO__START
+# error Too many BPF loader error code
+#endif
+
 #define DEFINE_PRINT_FN(name, level) \
 static int libbpf_##name(const char *fmt, ...) \
 {  \
@@ -53,7 +57,7 @@ struct bpf_object *bpf__prepare_load(const char *filename, bool source)
 
 err = llvm__compile_bpf(filename, &obj_buf, &obj_buf_sz);
if (err)
-   return ERR_PTR(err);
+   return ERR_PTR(-BPF_LOADER_ERRNO__ECOMPILE);
obj = bpf_object__open_buffer(obj_buf, obj_buf_sz, filename);
free(obj_buf);
} else
@@ -113,14 +117,14 @@ config_bpf_program(struct bpf_program *prog)
if (err < 0) {
pr_debug("bpf: '%s' is not a valid config string\n",
 config_str);
-   err = -EINVAL;
+   err = -BPF_LOADER_ERRNO__ECONFIG;
goto errout;
}
 
if (pev->group && strcmp(pev->group, PERF_BPF_PROBE_GROUP)) {
pr_debug("bpf: '%s': group for event is set and not '%s'.\n",
 config_str, PERF_BPF_PROBE_GROUP);
-   err = -EINVAL;
+   err = -BPF_LOADER_ERRNO__EGROUP;
goto errout;
} else if (!pev->group)
pev->group = strdup(PERF_BPF_PROBE_GROUP);
@@ -132,9 +136,9 @@ config_bpf_program(struct bpf_program *prog)
}
 
if (!pev->event) {
-   pr_debug("bpf: '%s': event name is missing\n",
+   pr_debug("bpf: '%s': event name is missing. Section name should be 'key=value'\n",
 config_str);
-   err = -EINVAL;
+   err = -BPF_LOADER_ERRNO__EEVENTNAME;
goto errout;
}
pr_debug("bpf: config '%s' is ok\n", config_str);
@@ -285,7 +289,7 @@ int bpf__foreach_tev(struct bpf_object *obj,
 (void **)&priv);
if (err || !priv) {
pr_debug("bpf: failed to get private field\n");
-   return -EINVAL;
+   return -BPF_LOADER_ERRNO__EINTERNAL;
}
 
 pev = &priv->pev;
@@ -308,13 +312,65 @@ int bpf__foreach_tev(struct bpf_object *obj,
return 0;
 }
 
+#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+
+struct {
+   int code;
+   const char *msg;
+} bpf_loader_strerror_table[] = {
+   {BPF_LOADER_ERRNO__ECONFIG, "Invalid config string"},
+   {BPF_LOADER_ERRNO__EGROUP, "Invalid group name"},
+   {BPF_LOADER_ERRNO__EEVENTNAME, "No event name found in config string"},
+   {BPF_LOADER_ERRNO__EINTERNAL, "BPF loader internal error"},
+   {BPF_LOADER_ERRNO__ECOMPILE, "Error when compiling BPF scriptlet"},
+};
+
+static int
+bpf_loader_strerror(int err, char *buf, size_t size)
+{
+   unsigned int i;
+
+   if (!buf || !size)
+   return -1;
+
+   err = err > 0 ? err : -err;
+
+   if (err > LIBBPF_ERRNO__START)
+   return libbpf_strerror(err, buf, size);
+
+   if (err < BPF_LOADER_ERRNO__START) {
+   char sbuf[STRERR_BUFSIZE], *msg;
+
+   msg = strerror_r(err, sbuf, sizeof(sbuf));
+   snprintf(buf, size, "%s", msg);
+   buf[size - 1] = '\0';
+   return 0;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(bpf_loader_strerror_table); i++) {
+   if (bpf_loader_strerror_table[i].code == err) {
+   const char *msg;
+
+   msg = bpf_loader_strerror_table[i].msg;
+   snprintf(buf, size, "%s", msg);
+   buf[size - 1] = '\0';
+   return 0;
+   }
+   }
+
+   snprintf(buf, size, "Unknown bpf loader error %d", 

[PATCH v2 3/4] bpf tools: Improve libbpf error reporting

2015-11-03 Thread Wang Nan
This patch creates a series of libbpf-specific error numbers and
libbpf_strerror() to help report errors to the caller. Functions are
updated to pass the correct error number through the CHECK_ERR() macro.

All users of bpf_object__open{_buffer}() and bpf_program__title()
in perf are modified accordingly.
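
For readers skimming the diff, the table-driven lookup that libbpf_strerror()
uses can be sketched in isolation. This is only an illustration under stated
assumptions: the demo_* names and the 4000 base value are made up here, and
plain strerror() stands in for strerror_r() to sidestep the GNU/XSI return
type difference.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define DEMO_ARRAY_SIZE(x) (sizeof(x) / sizeof((x)[0]))

/* Hypothetical error range; the real LIBBPF_ERRNO__* values live in libbpf.h. */
enum demo_errno {
	DEMO_ERRNO__START = 4000,
	DEMO_ERRNO__EFORMAT = DEMO_ERRNO__START,
	DEMO_ERRNO__ERELOC,
	DEMO_ERRNO__END,
};

static const struct {
	int code;
	const char *msg;
} demo_strerror_table[] = {
	{ DEMO_ERRNO__EFORMAT, "Object format invalid" },
	{ DEMO_ERRNO__ERELOC, "Relocation failed" },
};

/* Accepts either sign of err, like the patch does. Returns 0 on success. */
int demo_strerror(int err, char *buf, size_t size)
{
	size_t i;

	if (!buf || !size)
		return -1;

	err = err > 0 ? err : -err;

	/* Below the private range: treat it as a plain errno value. */
	if (err < DEMO_ERRNO__START) {
		snprintf(buf, size, "%s", strerror(err));
		return 0;
	}

	for (i = 0; i < DEMO_ARRAY_SIZE(demo_strerror_table); i++) {
		if (demo_strerror_table[i].code == err) {
			snprintf(buf, size, "%s", demo_strerror_table[i].msg);
			return 0;
		}
	}

	snprintf(buf, size, "Unknown error %d", err);
	return -1;
}
```

Keeping the private codes in a range above any errno value is what lets one
function serve both kinds of error without a separate flag.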

Signed-off-by: Wang Nan 
Cc: Arnaldo Carvalho de Melo 
Cc: Namhyung Kim 
---
 tools/lib/bpf/libbpf.c | 149 -
 tools/lib/bpf/libbpf.h |  12 
 tools/perf/tests/llvm.c|   2 +-
 tools/perf/util/bpf-loader.c   |   8 +--
 tools/perf/util/parse-events.c |   4 +-
 5 files changed, 120 insertions(+), 55 deletions(-)

diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c
index 4252fc2..74c64b1 100644
--- a/tools/lib/bpf/libbpf.c
+++ b/tools/lib/bpf/libbpf.c
@@ -61,6 +61,62 @@ void libbpf_set_print(libbpf_print_fn_t warn,
__pr_debug = debug;
 }
 
+#define ARRAY_SIZE(x) (sizeof(x)/sizeof(x[0]))
+#define STRERR_BUFSIZE  128
+
+struct {
+   int code;
+   const char *msg;
+} libbpf_strerror_table[] = {
+   {LIBBPF_ERRNO__ELIBELF, "Something wrong in libelf"},
+   {LIBBPF_ERRNO__EFORMAT, "BPF object format invalid"},
+   {LIBBPF_ERRNO__EKVERSION, "'version' section incorrect or lost"},
+   {LIBBPF_ERRNO__EENDIAN, "Endian mismatch"},
+   {LIBBPF_ERRNO__EINTERNAL, "Internal error in libbpf"},
+   {LIBBPF_ERRNO__ERELOC, "Relocation failed"},
+   {LIBBPF_ERRNO__ELOAD, "Failed to load program"},
+};
+
+int libbpf_strerror(int err, char *buf, size_t size)
+{
+   unsigned int i;
+
+   if (!buf || !size)
+   return -1;
+
+   err = err > 0 ? err : -err;
+
+   if (err < LIBBPF_ERRNO__START) {
+   int ret;
+
+   ret = strerror_r(err, buf, size);
+   buf[size - 1] = '\0';
+   return ret;
+   }
+
+   for (i = 0; i < ARRAY_SIZE(libbpf_strerror_table); i++) {
+   if (libbpf_strerror_table[i].code == err) {
+   const char *msg;
+
+   msg = libbpf_strerror_table[i].msg;
+   snprintf(buf, size, "%s", msg);
+   buf[size - 1] = '\0';
+   return 0;
+   }
+   }
+
+   snprintf(buf, size, "Unknown libbpf error %d", err);
+   buf[size - 1] = '\0';
+   return -1;
+}
+
+#define CHECK_ERR(action, err, out) do {   \
+   err = action;   \
+   if (err)\
+   goto out;   \
+} while(0)
+
+
 /* Copied from tools/perf/util/util.h */
 #ifndef zfree
 # define zfree(ptr) ({ free(*ptr); *ptr = NULL; })
@@ -258,7 +314,7 @@ static struct bpf_object *bpf_object__new(const char *path,
obj = calloc(1, sizeof(struct bpf_object) + strlen(path) + 1);
if (!obj) {
pr_warning("alloc memory failed for %s\n", path);
-   return NULL;
+   return ERR_PTR(-ENOMEM);
}
 
strcpy(obj->path, path);
@@ -305,7 +361,7 @@ static int bpf_object__elf_init(struct bpf_object *obj)
 
if (obj_elf_valid(obj)) {
pr_warning("elf init: internal error\n");
-   return -EEXIST;
+   return -LIBBPF_ERRNO__ELIBELF;
}
 
if (obj->efile.obj_buf_sz > 0) {
@@ -331,14 +387,14 @@ static int bpf_object__elf_init(struct bpf_object *obj)
if (!obj->efile.elf) {
pr_warning("failed to open %s as ELF file\n",
obj->path);
-   err = -EINVAL;
+   err = -LIBBPF_ERRNO__ELIBELF;
goto errout;
}
 
if (!gelf_getehdr(obj->efile.elf, &obj->efile.ehdr)) {
pr_warning("failed to get EHDR from %s\n",
obj->path);
-   err = -EINVAL;
+   err = -LIBBPF_ERRNO__EFORMAT;
goto errout;
}
ep = &obj->efile.ehdr;
@@ -346,7 +402,7 @@ static int bpf_object__elf_init(struct bpf_object *obj)
if ((ep->e_type != ET_REL) || (ep->e_machine != 0)) {
pr_warning("%s is not an eBPF object file\n",
obj->path);
-   err = -EINVAL;
+   err = -LIBBPF_ERRNO__EFORMAT;
goto errout;
}
 
@@ -374,14 +430,14 @@ bpf_object__check_endianness(struct bpf_object *obj)
goto mismatch;
break;
default:
-   return -EINVAL;
+   return -LIBBPF_ERRNO__EENDIAN;
}
 
return 0;
 
 mismatch:
pr_warning("Error: endianness mismatch.\n");
-   return -EINVAL;
+   return -LIBBPF_ERRNO__EENDIAN;
 }
 
 static int
@@ -402,7 +458,7 @@ bpf_object__init_kversion(struct bpf_object *obj,
 
if (size != sizeof(kver)) {
pr_warning("invalid kver section in %s\n", obj->path);
-   return -EINVAL;
+   return 

[PATCH v2 2/4] perf tools: Mute libbpf when '-v' not set

2015-11-03 Thread Wang Nan
According to [1], libbpf should be muted. This patch resets the info and
warning message levels to ensure libbpf doesn't output anything, even
if an error happens.

[1] http://lkml.kernel.org/r/20151020151255.gf5...@kernel.org

Signed-off-by: Wang Nan 
Cc: Arnaldo Carvalho de Melo 
Cc: Namhyung Kim 
---
 tools/perf/util/bpf-loader.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/tools/perf/util/bpf-loader.c b/tools/perf/util/bpf-loader.c
index ba6f752..0c5d174 100644
--- a/tools/perf/util/bpf-loader.c
+++ b/tools/perf/util/bpf-loader.c
@@ -26,8 +26,8 @@ static int libbpf_##name(const char *fmt, ...)\
return ret; \
 }
 
-DEFINE_PRINT_FN(warning, 0)
-DEFINE_PRINT_FN(info, 0)
+DEFINE_PRINT_FN(warning, 1)
+DEFINE_PRINT_FN(info, 1)
 DEFINE_PRINT_FN(debug, 1)
 
 struct bpf_prog_priv {
-- 
1.8.3.4



Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map

2015-11-03 Thread Namhyung Kim
Hi Tom,

On Tue, Nov 03, 2015 at 07:47:52PM -0600, Tom Zanussi wrote:
> Hi Namhyung,
> 
> On Mon, 2015-11-02 at 16:08 +0900, Namhyung Kim wrote:
> > I thought it'd be better if users can see which one is the real drop
> > or not.  IOW if drop count is much smaller than the normal event
> > count, [s]he might want to ignore the occasional drops.  Otherwise,
> > [s]he should restart with a bigger table.  This requires accurate
> > counts of events and drops though.
> > 
> 
> OK, how about the below - it basically moves the drops set/test/inc into
> tracing_map_insert(), as well as a total hits count.  So those values
> will be available for users to use in deciding whether to use the data
> or restart with a bigger table, and the loop is bailed out of only if no
> matching keys are found and there are drops, so callers can continue
> updating existing entries.

But if a key didn't get a desired index, it'd still fail to update..


> 
> Users who want the original behavior still get the NULL return and can
> stop calling tracing_map_insert() as before:
> 
> struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, void *key)
> {
> u32 idx, key_hash, test_key;
> struct tracing_map_entry *entry;
> 
> key_hash = jhash(key, map->key_size, 0);
> if (key_hash == 0)
> key_hash = 1;
> idx = key_hash >> (32 - (map->map_bits + 1));
> 
> while (1) {
> idx &= (map->map_size - 1);
> entry = TRACING_MAP_ENTRY(map->map, idx);
> test_key = entry->key;
> 
> if (test_key && test_key == key_hash && entry->val &&
> keys_match(key, entry->val->key, map->key_size)) {
> atomic64_inc(&map->hits);
> return entry->val;
> }
> 
> if (atomic64_read(&map->drops)) {
> atomic64_inc(&map->drops);
> break;
> }

IMHO it should be removed.

> 
> if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
> struct tracing_map_elt *elt;
> 
> elt = get_free_elt(map);
> if (!elt) {
> atomic64_inc(&map->drops);

And reset entry->key here..

> break;
> }
> memcpy(elt->key, key, map->key_size);
> entry->val = elt;
> 
> atomic64_inc(&map->hits);
> return entry->val;
> }
> idx++;
> }
> 
> return NULL;
> }


Then tracing_map_lookup() can be implemented like *_insert but bail out
from the loop if test_key is 0.  The caller might do like this:

if (!atomic64_read(&map->drops))
elt = tracing_map_insert(...);
else
elt = tracing_map_lookup(...);
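
To make the insert/lookup split concrete, here is a rough user-space sketch.
It is single-threaded on purpose: plain stores stand in for cmpxchg() and the
atomic64 counters, a nonzero 32-bit key doubles as its own hash, and every
demo_* name and size is invented for illustration rather than taken from the
patch.

```c
#include <stddef.h>
#include <stdint.h>

#define DEMO_MAP_SIZE 8u	/* number of slots, must be a power of two */
#define DEMO_MAX_ELTS 2u	/* kept tiny so drops are easy to trigger */

struct demo_elt {
	uint32_t key;
	uint64_t count;
};

struct demo_entry {
	uint32_t key;		/* 0 marks a free slot, so keys must be nonzero */
	struct demo_elt *val;
};

struct demo_map {
	struct demo_entry slots[DEMO_MAP_SIZE];
	struct demo_elt elts[DEMO_MAX_ELTS];
	unsigned int next_elt;
	uint64_t hits;
	uint64_t drops;
};

static struct demo_elt *demo_get_free_elt(struct demo_map *map)
{
	if (map->next_elt >= DEMO_MAX_ELTS)
		return NULL;
	return &map->elts[map->next_elt++];
}

/* Find the key or claim the first free slot on its probe chain. */
static struct demo_elt *demo_map_insert(struct demo_map *map, uint32_t key)
{
	uint32_t idx = key;
	unsigned int probes;

	for (probes = 0; probes < DEMO_MAP_SIZE; probes++, idx++) {
		struct demo_entry *entry = &map->slots[idx & (DEMO_MAP_SIZE - 1)];

		if (entry->key == key && entry->val) {
			map->hits++;
			return entry->val;
		}
		if (!entry->key) {
			struct demo_elt *elt;

			entry->key = key;	/* stands in for the cmpxchg() claim */
			elt = demo_get_free_elt(map);
			if (!elt) {
				entry->key = 0;	/* give the slot back on failure */
				map->drops++;
				return NULL;
			}
			elt->key = key;
			elt->count = 0;
			entry->val = elt;
			map->hits++;
			return elt;
		}
	}
	map->drops++;	/* every slot is held by some other key */
	return NULL;
}

/* Same probe sequence, but an empty slot ends the search. */
static struct demo_elt *demo_map_lookup(struct demo_map *map, uint32_t key)
{
	uint32_t idx = key;
	unsigned int probes;

	for (probes = 0; probes < DEMO_MAP_SIZE; probes++, idx++) {
		struct demo_entry *entry = &map->slots[idx & (DEMO_MAP_SIZE - 1)];

		if (!entry->key)
			return NULL;
		if (entry->key == key && entry->val) {
			map->hits++;
			return entry->val;
		}
	}
	return NULL;
}

/* Caller pattern: stop inserting once anything has been dropped. */
static struct demo_elt *demo_map_update(struct demo_map *map, uint32_t key)
{
	struct demo_elt *elt;

	if (!map->drops)
		elt = demo_map_insert(map, key);
	else
		elt = demo_map_lookup(map, key);
	if (elt)
		elt->count++;
	return elt;
}
```

With this split, existing entries keep getting updated after the first drop,
while new keys are simply not found.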

Thanks,
Namhyung


[PATCH v2 0/4] perf bpf: Improve error code delivering and output

2015-11-03 Thread Wang Nan
This patchset is based on perf/core
(commit bebd23a2ed31d47e7dd746d3b125068aa2c42d85 in Arnaldo's tree).

Compared with v1 [1], this patchset changes the error reporting convention
and its users in the same patch, which avoids NULL checks in perf.

Wang Nan (4):
  perf test: Keep test result clean if '-v' not set
  perf tools: Mute libbpf when '-v' not set
  bpf tools: Improve libbpf error reporting
  perf tools: Improve BPF related error messages output

 tools/lib/bpf/libbpf.c | 149 +
 tools/lib/bpf/libbpf.h |  12 +++
 tools/perf/tests/attr.c|   3 +-
 tools/perf/tests/code-reading.c|   8 +-
 tools/perf/tests/keep-tracking.c   |   4 +-
 tools/perf/tests/llvm.c|  13 ++--
 tools/perf/tests/switch-tracking.c |   4 +-
 tools/perf/util/bpf-loader.c   | 110 +++
 tools/perf/util/bpf-loader.h   |  18 +
 tools/perf/util/parse-events.c |  11 +--
 10 files changed, 245 insertions(+), 87 deletions(-)

-- 
1.8.3.4



Re: [PATCH v13 43/51] ext4: Don't allow unmapped identifiers in richacls

2015-11-03 Thread Andreas Dilger

> On Nov 3, 2015, at 8:17 AM, Andreas Gruenbacher  wrote:
> 
> Don't allow acls which contain unmapped identifiers: they are meaningful
> for remote file systems only.

Looks fine.

Reviewed-by: Andreas Dilger 

> Signed-off-by: Andreas Gruenbacher 
> ---
> fs/ext4/richacl.c | 4 
> 1 file changed, 4 insertions(+)
> 
> diff --git a/fs/ext4/richacl.c b/fs/ext4/richacl.c
> index 906d048..2115385 100644
> --- a/fs/ext4/richacl.c
> +++ b/fs/ext4/richacl.c
> @@ -74,6 +74,10 @@ __ext4_set_richacl(handle_t *handle, struct inode *inode, struct richacl *acl)
>   int retval, size;
>   void *value;
> 
> + /* Don't allow acls with unmapped identifiers. */
> + if (richacl_has_unmapped_identifiers(acl))
> + return -EINVAL;
> +
>   if (richacl_equiv_mode(acl, ) == 0) {
>   inode->i_ctime = ext4_current_time(inode);
>   inode->i_mode = mode;
> --
> 2.5.0
> 


Cheers, Andreas



Re: [PATCH v13 20/51] ext4: Add richacl support

2015-11-03 Thread Andreas Gruenbacher
On Wed, Nov 4, 2015 at 3:13 AM, Andreas Dilger  wrote:
> Patch looks reasonable.  One minor cleanup below that could be fixed when
> the patch series is refreshed, and you can add:
>
> Reviewed-by: Andreas Dilger 

Okay, thank you.

Andreas


Re: [PATCH] PM / OPP: Protect updates to list_dev with mutex

2015-11-03 Thread Viresh Kumar
On 02-11-15, 11:14, Stephen Boyd wrote:
> On 10/31, Viresh Kumar wrote:
> > On 30-10-15, 10:06, Stephen Boyd wrote:
> > > On 10/30, Viresh Kumar wrote:
> > > > dev_opp_list_lock is used everywhere to protect device and OPP lists,
> > > > but dev_pm_opp_set_sharing_cpus() is missed somehow. And instead we used
> > > > rcu-lock, which wouldn't help here as we are adding a new list_dev.
> > > > 
> > > > This also fixes a problem where we have called kzalloc(..., GFP_KERNEL)
> > > > from within rcu-lock, which isn't allowed as kzalloc can sleep when
> > > > called with GFP_KERNEL.
> > > 
> > > Care to share the splat here?
> > 
> > I don't know what is wrong (or right) with my exynos 5250 board, but I
> > didn't get any splat here even with the right config options (yes I
> > should have mentioned that earlier). I have seen this at other times
> > as well, while we were running after some cpufreq traces..
> > 
> > But the case in hand is pretty straightforward and Mike T. did get a
> > splat as that's what he told me. We are calling a sleep-able function
> > from rcu_lock and that's obviously wrong.
> 
> That's slightly concerning. That the bug is so straightforward
> but we can't reproduce it doesn't instill a lot of
> confidence that the patch is correct.

I have asked Mike to provide the splat he got and test the new patch
to see if it is fixed or not.

> > > > diff --git a/drivers/base/power/opp/cpu.c b/drivers/base/power/opp/cpu.c
> > > > index 7654c5606307..91f15b2e25ee 100644
> > > > --- a/drivers/base/power/opp/cpu.c
> > > > +++ b/drivers/base/power/opp/cpu.c
> > > > @@ -124,12 +124,12 @@ int dev_pm_opp_set_sharing_cpus(struct device *cpu_dev, cpumask_var_t cpumask)
> > > > struct device *dev;
> > > > int cpu, ret = 0;
> > > >  
> > > > -   rcu_read_lock();
> > > > +   mutex_lock(&dev_opp_list_lock);
> > > >  
> > > > dev_opp = _find_device_opp(cpu_dev);
> > > 
> > > So does _find_device_opp() need to be called with rcu_read_lock()
> > > held or not? The comment above the function makes it sound like
> > > we need RCU, but we don't do that here anymore.
> > 
> > That is more for the readers, as this function is going to return a
> > pointer to the device OPP, and to make sure it isn't freed behind
> > their back, they need to take the RCU lock.
> > 
> > There are other writer code paths as well, like add-opp, where we just
> > take the mutex as there can't be anything stronger than that :)
> > 
> 
> Agreed, but the comment above the function is misleading. We
> should correct that comment and/or add the lockdep checks to the
> function like we have elsewhere in this file.

Will do.

-- 
viresh


Re: [PATCH v13 21/51] ext4: Add richacl feature flag

2015-11-03 Thread Andreas Dilger

> On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher  wrote:
> 
> From: "Aneesh Kumar K.V" 
> 
> This feature flag selects richacl instead of posix acl support on the
> file system. In addition, the "acl" mount option is needed for enabling
> either of the two kinds of acls.

This patch confuses me.  I thought the whole point of INCOMPAT_RICHACL
was that the filesystem should never, ever be mounted without ACL support
because the ACLs will get confused without it.  In that case, it doesn't
make sense to have a mount option that _has_ to be specified to mount the
filesystem, and returns an error when trying to disable it.

It makes more sense to just enable "acl" by default if INCOMPAT_RICHACL
is set in the superblock and not need the mount option at all.

Having a mount option made more sense when enabling this support was optional
for the filesystem, and it could be mounted without it enabled, but you
wanted INCOMPAT_RICHACL to prevent that so it needs to be fixed.
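
Roughly what I have in mind, as a stand-alone sketch rather than a patch
against super.c. All demo_* names and bit values below are placeholders, and
the CONFIG_* handling from enable_acl() is left out to keep it short:

```c
#include <stdint.h>

/* Placeholder bit values, chosen only for this sketch. */
#define DEMO_INCOMPAT_RICHACL	(1u << 0)	/* superblock feature flag */
#define DEMO_MOUNT_ACL		(1u << 1)	/* "acl" mount option */
#define DEMO_MS_POSIXACL	(1u << 2)
#define DEMO_MS_RICHACL		(1u << 3)

struct demo_sb {
	uint32_t incompat;	/* on-disk incompat features */
	uint32_t mount_opt;	/* parsed mount options */
	uint32_t s_flags;	/* resulting superblock flags */
};

/*
 * The feature flag alone selects richacls; the mount option only
 * matters for POSIX ACLs on filesystems without the feature.
 */
static void demo_enable_acl(struct demo_sb *sb)
{
	sb->s_flags &= ~(DEMO_MS_POSIXACL | DEMO_MS_RICHACL);
	if (sb->incompat & DEMO_INCOMPAT_RICHACL)
		sb->s_flags |= DEMO_MS_RICHACL;
	else if (sb->mount_opt & DEMO_MOUNT_ACL)
		sb->s_flags |= DEMO_MS_POSIXACL;
}
```

That way "noacl" cannot leave a richacl filesystem mounted without ACL
enforcement, and nobody has to remember to pass "acl".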

Cheers, Andreas

> Signed-off-by: Aneesh Kumar K.V 
> Signed-off-by: Andreas Gruenbacher 
> ---
> fs/ext4/ext4.h  |  6 --
> fs/ext4/super.c | 49 -
> 2 files changed, 44 insertions(+), 11 deletions(-)
> 
> diff --git a/fs/ext4/ext4.h b/fs/ext4/ext4.h
> index fd1f28b..b97a3b1 100644
> --- a/fs/ext4/ext4.h
> +++ b/fs/ext4/ext4.h
> @@ -991,7 +991,7 @@ struct ext4_inode_info {
> #define EXT4_MOUNT_UPDATE_JOURNAL 0x01000 /* Update the journal format */
> #define EXT4_MOUNT_NO_UID32   0x02000  /* Disable 32-bit UIDs */
> #define EXT4_MOUNT_XATTR_USER 0x04000 /* Extended user attributes */
> -#define EXT4_MOUNT_POSIX_ACL 0x08000 /* POSIX Access Control Lists */
> +#define EXT4_MOUNT_ACL   0x08000 /* Access Control Lists */
> #define EXT4_MOUNT_NO_AUTO_DA_ALLOC   0x1 /* No auto delalloc mapping */
> #define EXT4_MOUNT_BARRIER0x2 /* Use block barriers */
> #define EXT4_MOUNT_QUOTA  0x8 /* Some quota option set */
> @@ -1582,6 +1582,7 @@ static inline int ext4_encrypted_inode(struct inode 
> *inode)
> #define EXT4_FEATURE_INCOMPAT_LARGEDIR0x4000 /* >2GB or 3-lvl 
> htree */
> #define EXT4_FEATURE_INCOMPAT_INLINE_DATA 0x8000 /* data in inode */
> #define EXT4_FEATURE_INCOMPAT_ENCRYPT 0x1
> +#define EXT4_FEATURE_INCOMPAT_RICHACL0x2
> 
> #define EXT2_FEATURE_COMPAT_SUPP  EXT4_FEATURE_COMPAT_EXT_ATTR
> #define EXT2_FEATURE_INCOMPAT_SUPP(EXT4_FEATURE_INCOMPAT_FILETYPE| \
> @@ -1607,7 +1608,8 @@ static inline int ext4_encrypted_inode(struct inode 
> *inode)
>EXT4_FEATURE_INCOMPAT_FLEX_BG| \
>EXT4_FEATURE_INCOMPAT_MMP | \
>EXT4_FEATURE_INCOMPAT_INLINE_DATA | \
> -  EXT4_FEATURE_INCOMPAT_ENCRYPT)
> +  EXT4_FEATURE_INCOMPAT_ENCRYPT | \
> +  EXT4_FEATURE_INCOMPAT_RICHACL)
> #define EXT4_FEATURE_RO_COMPAT_SUPP   (EXT4_FEATURE_RO_COMPAT_SPARSE_SUPER| \
>EXT4_FEATURE_RO_COMPAT_LARGE_FILE| \
>EXT4_FEATURE_RO_COMPAT_GDT_CSUM| \
> diff --git a/fs/ext4/super.c b/fs/ext4/super.c
> index a63c7b0..7457ea8 100644
> --- a/fs/ext4/super.c
> +++ b/fs/ext4/super.c
> @@ -1270,6 +1270,28 @@ static ext4_fsblk_t get_sb_block(void **data)
>   return sb_block;
> }
> 
> +static int enable_acl(struct super_block *sb)
> +{
> + sb->s_flags &= ~(MS_POSIXACL | MS_RICHACL);
> + if (test_opt(sb, ACL)) {
> + if (EXT4_HAS_INCOMPAT_FEATURE(sb,
> +   EXT4_FEATURE_INCOMPAT_RICHACL)) {
> +#ifdef CONFIG_EXT4_FS_RICHACL
> + sb->s_flags |= MS_RICHACL;
> +#else
> + return -EOPNOTSUPP;
> +#endif
> + } else {
> +#ifdef CONFIG_EXT4_FS_POSIX_ACL
> + sb->s_flags |= MS_POSIXACL;
> +#else
> + return -EOPNOTSUPP;
> +#endif
> + }
> + }
> + return 0;
> +}
> +
> #define DEFAULT_JOURNAL_IOPRIO (IOPRIO_PRIO_VALUE(IOPRIO_CLASS_BE, 3))
> static char deprecated_msg[] = "Mount option \"%s\" will be removed by %s\n"
>   "Contact linux-e...@vger.kernel.org if you think we should keep it.\n";
> @@ -1416,9 +1438,9 @@ static const struct mount_opts {
>MOPT_NO_EXT2 | MOPT_DATAJ},
>   {Opt_user_xattr, EXT4_MOUNT_XATTR_USER, MOPT_SET},
>   {Opt_nouser_xattr, EXT4_MOUNT_XATTR_USER, MOPT_CLEAR},
> -#ifdef CONFIG_EXT4_FS_POSIX_ACL
> - {Opt_acl, EXT4_MOUNT_POSIX_ACL, MOPT_SET},
> - {Opt_noacl, EXT4_MOUNT_POSIX_ACL, MOPT_CLEAR},
> +#if defined(CONFIG_EXT4_FS_POSIX_ACL) || defined(CONFIG_EXT4_FS_RICHACL)
> + {Opt_acl, EXT4_MOUNT_ACL, MOPT_SET},
> + {Opt_noacl, EXT4_MOUNT_ACL, MOPT_CLEAR},
> #else
>   {Opt_acl, 0, 

Re: [PATCH v2 01/13] mm: support madvise(MADV_FREE)

2015-11-03 Thread Sergey Senozhatsky
Hi Minchan,

On (11/04/15 10:25), Minchan Kim wrote:
[..]
>+static int madvise_free_pte_range(pmd_t *pmd, unsigned long addr,
>+   unsigned long end, struct mm_walk *walk)
>+
...
> + if (pmd_trans_unstable(pmd))
> + return 0;

I think it makes sense to update the pmd_trans_unstable() and
pmd_none_or_trans_huge_or_clear_bad() comments in asm-generic/pgtable.h,
because they explicitly mention MADV_DONTNEED only. Just a thought.


> @@ -379,6 +502,14 @@ madvise_vma(struct vm_area_struct *vma, struct vm_area_struct **prev,
>   return madvise_remove(vma, prev, start, end);
>   case MADV_WILLNEED:
>   return madvise_willneed(vma, prev, start, end);
> + case MADV_FREE:
> + /*
> +  * XXX: In this implementation, MADV_FREE works like
  
XXX

> +  * MADV_DONTNEED on swapless system or full swap.
> +  */
> + if (get_nr_swap_pages() > 0)
> + return madvise_free(vma, prev, start, end);
> + /* passthrough */

-ss


Re: [GIT PULL] x86/apic changes for v4.4

2015-11-03 Thread Linus Torvalds
On Tue, Nov 3, 2015 at 2:32 AM, Ingo Molnar  wrote:
>
> Please pull the latest x86-apic-for-linus git tree from:

Side note, I have an *old* patch that I think simplifies the
(reasonably common) case of sending IPIs to individual CPUs. That's
done by the "reschedule ipi" in particular.

That's something that is trivially done on the x2apic, but that the
mask-based interfaces make insanely complicated.

I have *not* rebased this on top of modern kernels so it may not
actually work as a patch any more, because I'm just sending this out
as a "Hmm, what do you guys think" rather than a real submission.

Comments?

Linus
From 9f09a8b2a1fa58f2e692af4a7283fd1674749d39 Mon Sep 17 00:00:00 2001
From: Linus Torvalds 
Date: Fri, 20 Feb 2015 11:06:02 -0800
Subject: [PATCH] x86: add a single-target IPI function to the apic

We still fall back on the "send mask" versions if an apic definition
doesn't have the single-target version, but at least this allows the
(trivial) case for the common clustered x2apic case.

Signed-off-by: Linus Torvalds 
---
 arch/x86/include/asm/apic.h   |  1 +
 arch/x86/kernel/apic/x2apic_cluster.c |  9 +
 arch/x86/kernel/smp.c | 16 ++--
 3 files changed, 24 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
index 92003f3c8a42..5d47ffcfa65d 100644
--- a/arch/x86/include/asm/apic.h
+++ b/arch/x86/include/asm/apic.h
@@ -296,6 +296,7 @@ struct apic {
   unsigned int *apicid);
 
 	/* ipi */
+	void (*send_IPI)(int cpu, int vector);
 	void (*send_IPI_mask)(const struct cpumask *mask, int vector);
 	void (*send_IPI_mask_allbutself)(const struct cpumask *mask,
 	 int vector);
diff --git a/arch/x86/kernel/apic/x2apic_cluster.c b/arch/x86/kernel/apic/x2apic_cluster.c
index e658f21681c8..177c4fb027a1 100644
--- a/arch/x86/kernel/apic/x2apic_cluster.c
+++ b/arch/x86/kernel/apic/x2apic_cluster.c
@@ -23,6 +23,14 @@ static inline u32 x2apic_cluster(int cpu)
 	return per_cpu(x86_cpu_to_logical_apicid, cpu) >> 16;
 }
 
+static void x2apic_send_IPI(int cpu, int vector)
+{
+	u32 dest = per_cpu(x86_cpu_to_logical_apicid, cpu);
+
+	x2apic_wrmsr_fence();
+	__x2apic_send_IPI_dest(dest, vector, APIC_DEST_LOGICAL);
+}
+
 static void
 __x2apic_send_IPI_mask(const struct cpumask *mask, int vector, int apic_dest)
 {
@@ -266,6 +274,7 @@ static struct apic apic_x2apic_cluster = {
 
 	.cpu_mask_to_apicid_and		= x2apic_cpu_mask_to_apicid_and,
 
+	.send_IPI			= x2apic_send_IPI,
 	.send_IPI_mask			= x2apic_send_IPI_mask,
 	.send_IPI_mask_allbutself	= x2apic_send_IPI_mask_allbutself,
 	.send_IPI_allbutself		= x2apic_send_IPI_allbutself,
diff --git a/arch/x86/kernel/smp.c b/arch/x86/kernel/smp.c
index be8e1bde07aa..78c9b12d892a 100644
--- a/arch/x86/kernel/smp.c
+++ b/arch/x86/kernel/smp.c
@@ -114,6 +114,18 @@ static atomic_t stopping_cpu = ATOMIC_INIT(-1);
 static bool smp_no_nmi_ipi = false;
 
 /*
+ * Helper wrapper: not all apic definitions support sending to
+ * a single CPU, so we fall back to sending to a mask.
+ */
+static void send_IPI_cpu(int cpu, int vector)
+{
+	if (apic->send_IPI)
+		apic->send_IPI(cpu, vector);
+	else
+		apic->send_IPI_mask(cpumask_of(cpu), vector);
+}
+
+/*
  * this function sends a 'reschedule' IPI to another CPU.
  * it goes straight through and wastes no time serializing
  * anything. Worst case is that we lose a reschedule ...
@@ -124,12 +136,12 @@ static void native_smp_send_reschedule(int cpu)
 		WARN_ON(1);
 		return;
 	}
-	apic->send_IPI_mask(cpumask_of(cpu), RESCHEDULE_VECTOR);
+	send_IPI_cpu(cpu, RESCHEDULE_VECTOR);
 }
 
 void native_send_call_func_single_ipi(int cpu)
 {
-	apic->send_IPI_mask(cpumask_of(cpu), CALL_FUNCTION_SINGLE_VECTOR);
+	send_IPI_cpu(cpu, CALL_FUNCTION_SINGLE_VECTOR);
 }
 
 void native_send_call_func_ipi(const struct cpumask *mask)
-- 
2.6.2.402.g2635c2b
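
The fallback in send_IPI_cpu() above is the usual "optional callback" pattern,
and it can be tried outside the kernel. The sketch below is an illustration
only: the demo_* names are invented, a 64-bit bitmask stands in for
struct cpumask, and the callbacks just record what was called.

```c
#include <stddef.h>
#include <stdint.h>

struct demo_apic {
	void (*send_one)(int cpu, int vector);		/* optional */
	void (*send_mask)(uint64_t mask, int vector);	/* always present */
};

static int demo_one_calls;
static int demo_mask_calls;
static uint64_t demo_last_mask;

/* Recording callbacks, so the chosen path can be observed. */
static void demo_record_one(int cpu, int vector)
{
	(void)cpu;
	(void)vector;
	demo_one_calls++;
}

static void demo_record_mask(uint64_t mask, int vector)
{
	(void)vector;
	demo_last_mask = mask;
	demo_mask_calls++;
}

/* Prefer the single-target hook, else build a one-bit mask. */
static void demo_send_cpu(const struct demo_apic *apic, int cpu, int vector)
{
	if (apic->send_one)
		apic->send_one(cpu, vector);
	else
		apic->send_mask(UINT64_C(1) << cpu, vector);
}
```

Drivers that never fill in the new hook keep working unchanged, which is why
only x2apic_cluster needs touching in the patch.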



Re: [PATCH v13 20/51] ext4: Add richacl support

2015-11-03 Thread Andreas Dilger
On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher  wrote:
> 
> From: "Aneesh Kumar K.V" 
> 
> Support the richacl permission model in ext4.  The richacls are stored
> in "system.richacl" xattrs.  Richacls need to be enabled by tune2fs or
> at file system create time.

Patch looks reasonable.  One minor cleanup below that could be fixed when
the patch series is refreshed, and you can add:

Reviewed-by: Andreas Dilger 

> 
> Signed-off-by: Aneesh Kumar K.V 
> Signed-off-by: Andreas Gruenbacher 
> ---
> fs/ext4/Kconfig   |  11 +
> fs/ext4/Makefile  |   1 +
> fs/ext4/file.c|   3 ++
> fs/ext4/ialloc.c  |  11 -
> fs/ext4/inode.c   |  12 -
> fs/ext4/namei.c   |   5 ++
> fs/ext4/richacl.c | 141 ++
> fs/ext4/richacl.h |  40 
> fs/ext4/xattr.c   |   7 +++
> 9 files changed, 228 insertions(+), 3 deletions(-)
> create mode 100644 fs/ext4/richacl.c
> create mode 100644 fs/ext4/richacl.h
> 
> diff --git a/fs/ext4/Kconfig b/fs/ext4/Kconfig
> index b46e9fc..65c5230 100644
> --- a/fs/ext4/Kconfig
> +++ b/fs/ext4/Kconfig
> @@ -22,6 +22,17 @@ config EXT3_FS_POSIX_ACL
> This config option is here only for backward compatibility. ext3
> filesystem is now handled by the ext4 driver.
> 
> +config EXT4_FS_RICHACL
> + bool "Ext4 Rich Access Control Lists (EXPERIMENTAL)"
> + depends on EXT4_FS
> + select FS_RICHACL
> + help
> +   Richacls are an implementation of NFSv4 ACLs, extended by file masks
> +   to cleanly integrate into the POSIX file permission model.  To learn
> +   more about them, see http://www.bestbits.at/richacl/.
> +
> +   If you don't know what Richacls are, say N.
> +
> config EXT3_FS_SECURITY
>   bool "Ext3 Security Labels"
>   depends on EXT3_FS
> diff --git a/fs/ext4/Makefile b/fs/ext4/Makefile
> index 75285ea..ea0d539 100644
> --- a/fs/ext4/Makefile
> +++ b/fs/ext4/Makefile
> @@ -14,3 +14,4 @@ ext4-$(CONFIG_EXT4_FS_POSIX_ACL)+= acl.o
> ext4-$(CONFIG_EXT4_FS_SECURITY)   += xattr_security.o
> ext4-$(CONFIG_EXT4_FS_ENCRYPTION) += crypto_policy.o crypto.o \
>   crypto_key.o crypto_fname.o
> +ext4-$(CONFIG_EXT4_FS_RICHACL)   += richacl.o
> diff --git a/fs/ext4/file.c b/fs/ext4/file.c
> index 113837e..a03b4a5 100644
> --- a/fs/ext4/file.c
> +++ b/fs/ext4/file.c
> @@ -30,6 +30,7 @@
> #include "ext4_jbd2.h"
> #include "xattr.h"
> #include "acl.h"
> +#include "richacl.h"
> 
> /*
>  * Called when an inode is released. Note that this is different
> @@ -719,6 +720,8 @@ const struct inode_operations ext4_file_inode_operations 
> = {
>   .removexattr= generic_removexattr,
>   .get_acl= ext4_get_acl,
>   .set_acl= ext4_set_acl,
> + .get_richacl= ext4_get_richacl,
> + .set_richacl= ext4_set_richacl,
>   .fiemap = ext4_fiemap,
> };
> 
> diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
> index 619bfc1..9657b3a 100644
> --- a/fs/ext4/ialloc.c
> +++ b/fs/ext4/ialloc.c
> @@ -27,6 +27,7 @@
> #include "ext4_jbd2.h"
> #include "xattr.h"
> #include "acl.h"
> +#include "richacl.h"
> 
> #include 
> 
> @@ -697,6 +698,14 @@ out:
>   return ret;
> }
> 
> +static inline int
> +ext4_new_acl(handle_t *handle, struct inode *inode, struct inode *dir)
> +{
> + if (IS_RICHACL(dir))
> + return ext4_init_richacl(handle, inode, dir);
> + return ext4_init_acl(handle, inode, dir);
> +}
> +
> /*
>  * There are two policies for allocating an inode.  If the new inode is
>  * a directory, then a forward search is made for a block group with both
> @@ -1052,7 +1061,7 @@ got:
>   if (err)
>   goto fail_drop;
> 
> - err = ext4_init_acl(handle, inode, dir);
> + err = ext4_new_acl(handle, inode, dir);
>   if (err)
>   goto fail_free_drop;
> 
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 612fbcf..647f3c3 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -42,6 +42,7 @@
> #include "xattr.h"
> #include "acl.h"
> #include "truncate.h"
> +#include "richacl.h"
> 
> #include 
> 
> @@ -4638,6 +4639,14 @@ static void ext4_wait_for_tail_page_commit(struct 
> inode *inode)
>   }
> }
> 
> +static inline int
> +ext4_acl_chmod(struct inode *inode, umode_t mode)
> +{
> + if (IS_RICHACL(inode))
> + return richacl_chmod(inode, inode->i_mode);
> + return posix_acl_chmod(inode, inode->i_mode);
> +}
> +
> /*
>  * ext4_setattr()
>  *
> @@ -4806,8 +4815,7 @@ int ext4_setattr(struct dentry *dentry, struct iattr 
> *attr)
>   ext4_orphan_del(NULL, inode);
> 
>   if (!rc && (ia_valid & ATTR_MODE))
> - rc = posix_acl_chmod(inode, inode->i_mode);
> -
> + rc = ext4_acl_chmod(inode, inode->i_mode);
> err_out:
>   ext4_std_error(inode->i_sb, error);
>   if (!error)
> diff --git a/fs/ext4/namei.c b/fs/ext4/namei.c
> index 9f61e76..9b6e8b9 100644
> --- a/fs/ext4/namei.c
> +++ b/fs/ext4/namei.c
> @@ 

Re: [PATCH v13 12/51] vfs: Cache richacl in struct inode

2015-11-03 Thread Andreas Dilger
On Nov 3, 2015, at 8:16 AM, Andreas Gruenbacher  wrote:
> 
> Cache richacls in struct inode so that this doesn't have to be done
> individually in each filesystem.  This is similar to POSIX ACLs.
> 
> Signed-off-by: Andreas Gruenbacher 
> ---
> fs/inode.c  | 11 ++--
> fs/posix_acl.c  |  2 +-
> fs/richacl_base.c   |  4 +--
> fs/richacl_inode.c  | 75 +
> include/linux/fs.h  |  5 +++-
> include/linux/richacl.h | 15 ++
> 6 files changed, 100 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/inode.c b/fs/inode.c
> index 2a387f4..8462ddb 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -174,8 +174,11 @@ int inode_init_always(struct super_block *sb, struct 
> inode *inode)
>   inode->i_private = NULL;
>   inode->i_mapping = mapping;
>   INIT_HLIST_HEAD(>i_dentry);  /* buggered by rcu freeing */
> -#ifdef CONFIG_FS_POSIX_ACL
> - inode->i_acl = inode->i_default_acl = ACL_NOT_CACHED;
> +#if defined(CONFIG_FS_POSIX_ACL) || defined(CONFIG_FS_RICHACL)
> + inode->i_acl = ACL_NOT_CACHED;
> +# if defined(CONFIG_FS_POSIX_ACL)
> + inode->i_default_acl = ACL_NOT_CACHED;
> +# endif
> #endif
> 
> #ifdef CONFIG_FSNOTIFY
> @@ -231,11 +234,13 @@ void __destroy_inode(struct inode *inode)
>   atomic_long_dec(>i_sb->s_remove_count);
>   }
> 
> -#ifdef CONFIG_FS_POSIX_ACL
> +#if defined(CONFIG_FS_POSIX_ACL) || defined(CONFIG_FS_RICHACL)
>   if (inode->i_acl && inode->i_acl != ACL_NOT_CACHED)
>   put_base_acl(inode->i_acl);
> +# if defined(CONFIG_FS_POSIX_ACL)
>   if (inode->i_default_acl && inode->i_default_acl != ACL_NOT_CACHED)
>   put_base_acl(inode->i_default_acl);
> +# endif
> #endif
>   this_cpu_dec(nr_inodes);
> }
> diff --git a/fs/posix_acl.c b/fs/posix_acl.c
> index b3b2265..1d766a5 100644
> --- a/fs/posix_acl.c
> +++ b/fs/posix_acl.c
> @@ -38,7 +38,7 @@ struct posix_acl *get_cached_acl(struct inode *inode, int 
> type)
> {
>   struct posix_acl **p = acl_by_type(inode, type);
>   struct posix_acl *acl = ACCESS_ONCE(*p);
> - if (acl) {
> + if (acl && IS_POSIXACL(inode)) {
>   spin_lock(>i_lock);
>   acl = *p;
>   if (acl != ACL_NOT_CACHED)
> diff --git a/fs/richacl_base.c b/fs/richacl_base.c
> index 69b806c..d0ab5e9 100644
> --- a/fs/richacl_base.c
> +++ b/fs/richacl_base.c
> @@ -33,7 +33,7 @@ richacl_alloc(int count, gfp_t gfp)
>   struct richacl *acl = kzalloc(size, gfp);
> 
>   if (acl) {
> - atomic_set(&acl->a_refcount, 1);
> + atomic_set(&acl->a_base.ba_refcount, 1);
>   acl->a_count = count;
>   }
>   return acl;
> @@ -52,7 +52,7 @@ richacl_clone(const struct richacl *acl, gfp_t gfp)
> 
>   if (dup) {
>   memcpy(dup, acl, size);
> - atomic_set(&dup->a_refcount, 1);
> + atomic_set(&dup->a_base.ba_refcount, 1);

These two calls should be base_acl_init().  There isn't any point to have
an abstraction that is only used by the posix_acl* code and not the richacl
code, as it will just result in richacl breaking in the future if/when the
abstraction is changed.
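
Something along these lines, as a user-space sketch with C11 atomics. The
demo_* names are placeholders and I am not quoting the actual helper's
signature from the series here:

```c
#include <stdatomic.h>
#include <stdlib.h>

struct demo_base_acl {
	atomic_int ba_refcount;
};

struct demo_richacl {
	struct demo_base_acl a_base;
	unsigned int a_count;
};

/* The one place that knows how a base ACL starts out. */
static inline void demo_base_acl_init(struct demo_base_acl *base)
{
	atomic_init(&base->ba_refcount, 1);
}

static struct demo_richacl *demo_richacl_alloc(unsigned int count)
{
	struct demo_richacl *acl = calloc(1, sizeof(*acl));

	if (acl) {
		demo_base_acl_init(&acl->a_base);	/* not a raw atomic_set() */
		acl->a_count = count;
	}
	return acl;
}
```

If the base structure later grows another field, only the init helper has to
change.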

>   }
>   return dup;
> }
> diff --git a/fs/richacl_inode.c b/fs/richacl_inode.c
> index 99b3c93..c41a6c4 100644
> --- a/fs/richacl_inode.c
> +++ b/fs/richacl_inode.c
> @@ -20,6 +20,81 @@
> #include 
> #include 
> 
> +struct richacl *get_cached_richacl(struct inode *inode)
> +{
> + struct richacl *acl;
> +
> + acl = (struct richacl *)ACCESS_ONCE(inode->i_acl);
> + if (acl && IS_RICHACL(inode)) {
> + spin_lock(&inode->i_lock);
> + acl = (struct richacl *)inode->i_acl;
> + if (acl != ACL_NOT_CACHED)
> + acl = richacl_get(acl);
> + spin_unlock(&inode->i_lock);
> + }
> + return acl;
> +}
> +EXPORT_SYMBOL_GPL(get_cached_richacl);
> +
> +struct richacl *get_cached_richacl_rcu(struct inode *inode)
> +{
> + return (struct richacl *)rcu_dereference(inode->i_acl);
> +}
> +EXPORT_SYMBOL_GPL(get_cached_richacl_rcu);
> +
> +void set_cached_richacl(struct inode *inode, struct richacl *acl)
> +{
> + struct base_acl *old = NULL;
> +
> + spin_lock(&inode->i_lock);
> + old = inode->i_acl;
> + rcu_assign_pointer(inode->i_acl, &richacl_get(acl)->a_base);
> + spin_unlock(&inode->i_lock);
> + if (old != ACL_NOT_CACHED)
> + put_base_acl(old);
> +}
> +EXPORT_SYMBOL_GPL(set_cached_richacl);
> +
> +void forget_cached_richacl(struct inode *inode)
> +{
> + struct base_acl *old = NULL;
> +
> + spin_lock(&inode->i_lock);
> + old = inode->i_acl;
> + inode->i_acl = ACL_NOT_CACHED;
> + spin_unlock(&inode->i_lock);
> + if (old != ACL_NOT_CACHED)
> + put_base_acl(old);
> +}
> +EXPORT_SYMBOL_GPL(forget_cached_richacl);
> +
> +struct richacl *get_richacl(struct inode *inode)
> +{
> + struct richacl *acl;
> +
> + acl = get_cached_richacl(inode);
> + if (acl != ACL_NOT_CACHED)
> +  

Re: [PATCH v11 22/28] tracing: Add enable_hist/disable_hist triggers

2015-11-03 Thread Tom Zanussi
On Tue, 2015-11-03 at 17:55 +0900, Namhyung Kim wrote:
> On Thu, Oct 22, 2015 at 01:14:26PM -0500, Tom Zanussi wrote:
> > Similar to enable_event/disable_event triggers, these triggers enable
> > and disable the aggregation of events into maps rather than enabling
> > and disabling their writing into the trace buffer.
> > 
> > They can be used to automatically start and stop hist triggers based
> > on a matching filter condition.
> > 
> > If there's a paused hist trigger on system:event, the following would
> > start it when the filter condition was hit:
> > 
> >   # echo enable_hist:system:event [ if filter] > event/trigger
> > 
> > And the following would disable a running system:event hist trigger:
> > 
> >   # echo disable_hist:system:event [ if filter] > event/trigger
> 
> What about named hist triggers?  Maybe worth adding
> "enable/disable_hist_name" too?
> 

enable_hist/disable_hist should work fine with both named and unnamed
hist triggers.  As an example, the commands below set up two named and
two unnamed triggers, then enable and disable them using
enable_hist/disable_hist.  Running this I got the expected output, i.e.
events were captured in both the named and unnamed histograms:

echo 'hist:name=foo:keys=skbaddr.hex:vals=len:pause if len < 0' > \
  /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
echo 'hist:name=foo:keys=skbaddr.hex:vals=len:pause if len > 256' > \
  /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
echo 'hist:keys=skbaddr.hex:vals=len:pause if len == 256' > \
  /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
echo 'hist:keys=skbaddr.hex:vals=len:pause' > \
  /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger

echo 'enable_hist:net:netif_receive_skb if filename==/usr/bin/wget' > \
  /sys/kernel/debug/tracing/events/sched/sched_process_exec/trigger
echo 'disable_hist:net:netif_receive_skb if comm==wget' > \
  /sys/kernel/debug/tracing/events/sched/sched_process_exit/trigger

Thanks,

Tom

> Thanks,
> Namhyung
> 
> 
> > 
> > See Documentation/trace/events.txt for real examples.
> > 
> > Signed-off-by: Tom Zanussi 
> > Tested-by: Masami Hiramatsu 




Re: [PATCHSET 0/4] perf report: Support folded callchain output (v4)

2015-11-03 Thread Namhyung Kim
Hi Brendan,

On Tue, Nov 03, 2015 at 01:33:43PM -0800, Brendan Gregg wrote:
> On Tue, Nov 3, 2015 at 6:40 AM, Arnaldo Carvalho de Melo
>  wrote:
> > Em Tue, Nov 03, 2015 at 09:52:07PM +0900, Namhyung Kim escreveu:
> >> Hello,
> >>
> >> This is what Brendan requested on the perf-users mailing list [1] to
> >> support FlameGraphs [2] more efficiently.  This patchset adds a few
> >> more callchain options to adjust the output for it.
> >>
> >>  * changes in v4)
> >>   - add missing doc update
> >>   - cleanup/fix callchain value print code
> >>   - add Acked-by from Brendan and Jiri
> >
> > Do those Acked-by stand? Things changed, the values moved from the end
> > of the line to the start, etc.
> >
> [...]
> 
> I'd Ack this change as it's a useful addition. It doesn't quite
> address the folded-only output, but it's a step in that direction. I
> think having the value at the start of a line only makes sense for the
> perf report output containing the hist summary lines, for consistency.

Right, thanks!


> 
> Here's how I'd shuffle the output of this patch (ignore word wrap
> issues with this email):
> 
> # ./perf report --stdio -g folded,count,caller -F pid | \
> awk '/^ / { n = $1 }
> /^[0-9]/ { split(n,a,":"); print a[2] "-" a[1] ";" $2,$1 }'
> swapper-0;cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> 809
> swapper-0;xen_start_kernel;x86_64_start_reservations;start_kernel;rest_init;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op
> 135
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;check_events;xen_hypercall_xen_version
> 63
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf
> 54
> dd-30551;__GI___libc_read;entry_SYSCALL_64_fastpath;sys_read;vfs_read;__vfs_read;urandom_read;extract_entropy_user;extract_buf;memset_erms
> 3
> dd-30551;xen_irq_enable_direct_end;check_events;xen_hypercall_xen_version 3
> 
> So the output is folded stacks, prefixed by comm-PID. Shuffling the
> summarized output is a lot better than doing a "perf script" dump and
> re-processing call chains. (Note that since I'm using -F, I didn't
> need --no-children;

Nope.  The '-F pid' doesn't affect --children.  It doesn't show the
children overhead column but we still have hist entries for
(synthesized) children..

  $ perf report --no-children | wc -l
  998

  $ perf report --no-children -F pid,dso,sym | wc -l
  998

  $ perf report --children | wc -l
  3229

  $ perf report --children -F pid,dso,sym | wc -l
  3202

So I think you still need to use --no-children (or set report.children
config variable to false) for your script.


> and with "-g count", I didn't need --show-nr-samples.)

Yes, I used -n/--show-nr-samples just to check the number is correct.


> 
> I notice the fields (-F) option already has this precedent:
> 
> - "comm": prints PID:comm
> - "pid": prints PID

It's the opposite: "comm" prints comm, "pid" prints PID:comm. :)


> 
> If these were added to -g, along with a no-hists, then the two types
> of folded-only output could be generated using:
> 
> perf report --stdio -g folded,count,comm,no-hists,caller
> perf report --stdio -g folded,count,pid,no-hists,caller

As I said, using fields like comm or pid requires having the same keys
in the --sort option.  So it's basically unreliable to use those
specific field names in the -g option, IMHO.  I suggested using 'info'
(yes, it needs a better name) to print all sort keys.


> 
> ... although "no-hists" doesn't hit me as intuitive. How about "-F
> none" to specify zero columns? ie:
> 
> perf report --stdio -g folded,count,comm,caller -F none
> perf report --stdio -g folded,count,pid,caller -F none

Ah, makes sense.  So it'd look like

  $ perf report --stdio -g folded,count,info -F none -s comm
  $ perf report --stdio -g folded,count,info -F none -s pid

The output would be

  809 swapper-0 cpu_bringup_and_idle;cpu_startup_entry;default_idle_call;arch_cpu_idle;default_idle;xen_hypercall_sched_op

Thoughts?

Thanks,
Namhyung


Re: [PATCH v4 05/10] xen/blkfront: negotiate number of queues/rings to be used with backend

2015-11-03 Thread Konrad Rzeszutek Wilk
On Wed, Nov 04, 2015 at 09:11:10AM +0800, Bob Liu wrote:
> 
> On 11/04/2015 04:40 AM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Nov 02, 2015 at 12:21:41PM +0800, Bob Liu wrote:
> >> The number of hardware queues for xen/blkfront is set by parameter
> >> 'max_queues'(default 4), while the max value xen/blkback supported is 
> >> notified
> >> through xenstore("multi-queue-max-queues").
> > 
> > That is not right.
> > 
> > s/The number/The max number/
> > 
> > The second part: ",while the max value xen/blkback supported is..". I think
> > you are trying to say: "it is also capped by the max value that
> > the xen/blkback exposes through XenStore key 'multi-queue-max-queues'.
> > 
> >>
> >> The negotiated number is the smaller one and would be written back to 
> >> xenstore
> >> as "multi-queue-num-queues", blkback need to read this negotiated number.
> > 
> > s/blkback need to read/blkback needs to read this/
> > 
> >>
> >> Signed-off-by: Bob Liu 
> >> ---
> >>  drivers/block/xen-blkfront.c | 166 
> >> +++
> >>  1 file changed, 120 insertions(+), 46 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >> index 8cc5995..23096d7 100644
> >> --- a/drivers/block/xen-blkfront.c
> >> +++ b/drivers/block/xen-blkfront.c
> >> @@ -98,6 +98,10 @@ static unsigned int xen_blkif_max_segments = 32;
> >>  module_param_named(max, xen_blkif_max_segments, int, S_IRUGO);
> >>  MODULE_PARM_DESC(max, "Maximum amount of segments in indirect requests 
> >> (default is 32)");
> >>  
> >> +static unsigned int xen_blkif_max_queues = 4;
> >> +module_param_named(max_queues, xen_blkif_max_queues, uint, S_IRUGO);
> >> +MODULE_PARM_DESC(max_queues, "Maximum number of hardware queues/rings 
> >> used per virtual disk");
> >> +
> >>  /*
> >>   * Maximum order of pages to be used for the shared ring between front and
> >>   * backend, 4KB page granularity is used.
> >> @@ -113,6 +117,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order 
> >> of pages to be used for the
> >>   * characters are enough. Define to 20 to keep consist with backend.
> >>   */
> >>  #define RINGREF_NAME_LEN (20)
> >> +#define QUEUE_NAME_LEN (12)
> > 
> > Little bit of documentation please. Why 12? Why not 31415 for example?
> > I presume it is 'queue-%u' and since so that is 7 + 10 (UINT_MAX is
> > 4294967295) = 17!
> > 
> > 
> >>  
> >>  /*
> >>   *  Per-ring info.
> >> @@ -695,7 +700,7 @@ static int xlvbd_init_blk_queue(struct gendisk *gd, 
> >> u16 sector_size,
> >>  
> >>memset(&info->tag_set, 0, sizeof(info->tag_set));
> >>info->tag_set.ops = &blkfront_mq_ops;
> >> -  info->tag_set.nr_hw_queues = 1;
> >> +  info->tag_set.nr_hw_queues = info->nr_rings;
> >>info->tag_set.queue_depth =  BLK_RING_SIZE(info);
> >>info->tag_set.numa_node = NUMA_NO_NODE;
> >>info->tag_set.flags = BLK_MQ_F_SHOULD_MERGE | BLK_MQ_F_SG_MERGE;
> >> @@ -1352,6 +1357,51 @@ fail:
> >>return err;
> >>  }
> >>  
> >> +static int write_per_ring_nodes(struct xenbus_transaction xbt,
> >> +  struct blkfront_ring_info *rinfo, const char 
> >> *dir)
> >> +{
> >> +  int err, i;
> > 
> > Please make 'i' be an unsigned int. Especially as you are using '%u' in the 
> > snprintf.
> > 
> > 
> >> +  const char *message = NULL;
> >> +  struct blkfront_info *info = rinfo->dev_info;
> >> +
> >> +  if (info->nr_ring_pages == 1) {
> >> +  err = xenbus_printf(xbt, dir, "ring-ref", "%u", 
> >> rinfo->ring_ref[0]);
> >> +  if (err) {
> >> +  message = "writing ring-ref";
> >> +  goto abort_transaction;
> >> +  }
> >> +  pr_info("%s: write ring-ref:%d\n", dir, rinfo->ring_ref[0]);
> > 
> > Ewww. No.
> > 
> >> +  } else {
> >> +  for (i = 0; i < info->nr_ring_pages; i++) {
> >> +  char ring_ref_name[RINGREF_NAME_LEN];
> >> +
> >> +  snprintf(ring_ref_name, RINGREF_NAME_LEN, "ring-ref%u", 
> >> i);
> >> +  err = xenbus_printf(xbt, dir, ring_ref_name,
> >> +  "%u", rinfo->ring_ref[i]);
> >> +  if (err) {
> >> +  message = "writing ring-ref";
> >> +  goto abort_transaction;
> >> +  }
> >> +  pr_info("%s: write ring-ref:%d\n", dir, 
> >> rinfo->ring_ref[i]);
> > 
> > No no please.
> > 
> >> +  }
> >> +  }
> >> +
> >> +  err = xenbus_printf(xbt, dir, "event-channel", "%u", rinfo->evtchn);
> > 
> > That is not right.
> > 
> > That only creates one. But the blkif.h says (And the example agrees)
> > that there are N 'event-channel' for N-rings.
> > 
> > Shouldn't this be part of the above loop?
> > 
> 
> No, this loop is only for per-ring each with "multipage".
> 
> The loop you want is..
> 
> >> +  if (err) {
> >> +  message = "writing event-channel";
> >> +  goto abort_transaction;
> >> +  }
> >> +  pr_info("%s: write 

Re: [PATCH v11 13/28] tracing: Add hist trigger support for pausing and continuing a trace

2015-11-03 Thread Tom Zanussi
On Tue, 2015-11-03 at 17:38 +0900, Namhyung Kim wrote:
> On Thu, Oct 22, 2015 at 01:14:17PM -0500, Tom Zanussi wrote:
> > Allow users to append 'pause' or 'continue' to an existing trigger in
> > order to have it paused or to have a paused trace continue.
> > 
> > This expands the hist trigger syntax from this:
> > # echo hist:keys=xxx:vals=yyy:sort=zzz.descending \
> >   [ if filter] > event/trigger
> > 
> > to this:
> > 
> > # echo hist:keys=xxx:vals=yyy:sort=zzz.descending:pause or cont \
> >   [ if filter] > event/trigger
> 
> What if 'cont' is used for a non-existing hist?  I think it should
> fail with -ENOENT but it seems that it'd create a new hist trigger..
> 

You're right - 'pause' is the only one that should create a new trigger,
not 'cont' or 'clear'.

Thanks,

Tom

> Thanks,
> Namhyung
> 
> 
> > 
> > Signed-off-by: Tom Zanussi 
> > Tested-by: Masami Hiramatsu 
> > ---
> >  kernel/trace/trace.c |  7 ++-
> >  kernel/trace/trace_events_hist.c | 26 +++---
> >  2 files changed, 29 insertions(+), 4 deletions(-)
> > 
> > diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
> > index efa8d05..a223959 100644
> > --- a/kernel/trace/trace.c
> > +++ b/kernel/trace/trace.c
> > @@ -3802,6 +3802,7 @@ static const char readme_msg[] =
> > "\t[:values= > "\t[:sort=field1,field2,...]\n"
> > "\t[:size=#entries]\n"
> > +   "\t[:pause][:continue]\n"
> > "\t[if ]\n\n"
> > "\tWhen a matching event is hit, an entry is added to a hash\n"
> > "\ttable using the key(s) and value(s) named, and the value of a\n"
> > @@ -3816,7 +3817,11 @@ static const char readme_msg[] =
> > "\tused to specify more or fewer than the default 2048 entries\n"
> > "\tfor the hashtable size.\n\n"
> > "\tReading the 'hist' file for the event will dump the hash\n"
> > -   "\ttable in its entirety to stdout."
> > +   "\ttable in its entirety to stdout.\n\n"
> > +   "\tThe 'pause' parameter can be used to pause an existing hist\n"
> > +   "\ttrigger or to start a hist trigger but not log any events\n"
> > +   "\tuntil told to do so.  'continue' can be used to start or\n"
> > +   "\trestart a paused hist trigger.\n\n"
> >  #endif
> >  ;
> >  
> > diff --git a/kernel/trace/trace_events_hist.c 
> > b/kernel/trace/trace_events_hist.c
> > index 4fc3136..8753c3b 100644
> > --- a/kernel/trace/trace_events_hist.c
> > +++ b/kernel/trace/trace_events_hist.c
> > @@ -78,6 +78,8 @@ struct hist_trigger_attrs {
> > char*keys_str;
> > char*vals_str;
> > char*sort_key_str;
> > +   boolpause;
> > +   boolcont;
> > unsigned intmap_bits;
> >  };
> >  
> > @@ -184,6 +186,11 @@ static struct hist_trigger_attrs 
> > *parse_hist_trigger_attrs(char *trigger_str)
> > attrs->vals_str = kstrdup(str, GFP_KERNEL);
> > else if (!strncmp(str, "sort", strlen("sort")))
> > attrs->sort_key_str = kstrdup(str, GFP_KERNEL);
> > +   else if (!strncmp(str, "pause", strlen("pause")))
> > +   attrs->pause = true;
> > +   else if (!strncmp(str, "continue", strlen("continue")) ||
> > +!strncmp(str, "cont", strlen("cont")))
> > +   attrs->cont = true;
> > else if (!strncmp(str, "size", strlen("size"))) {
> > int map_bits = parse_map_size(str);
> >  
> > @@ -843,7 +850,10 @@ static int event_hist_trigger_print(struct seq_file *m,
> > if (data->filter_str)
> > seq_printf(m, " if %s", data->filter_str);
> >  
> > -   seq_puts(m, " [active]");
> > +   if (data->paused)
> > +   seq_puts(m, " [paused]");
> > +   else
> > +   seq_puts(m, " [active]");
> >  
> > seq_putc(m, '\n');
> >  
> > @@ -882,16 +892,25 @@ static int hist_register_trigger(char *glob, struct 
> > event_trigger_ops *ops,
> >  struct event_trigger_data *data,
> >  struct trace_event_file *file)
> >  {
> > +   struct hist_trigger_data *hist_data = data->private_data;
> > struct event_trigger_data *test;
> > int ret = 0;
> >  
> > >   list_for_each_entry_rcu(test, &file->triggers, list) {
> > if (test->cmd_ops->trigger_type == ETT_EVENT_HIST) {
> > -   ret = -EEXIST;
> > +   if (hist_data->attrs->pause)
> > +   test->paused = true;
> > +   else if (hist_data->attrs->cont)
> > +   test->paused = false;
> > +   else
> > +   ret = -EEXIST;
> > goto out;
> > }
> > }
> >  
> > +   if (hist_data->attrs->pause)
> > +   data->paused = true;
> > +
> > if (data->ops->init) {
> > ret = data->ops->init(data->ops, 

Re: [PATCH v11 24/28] tracing: Add support for multiple hist triggers per event

2015-11-03 Thread Tom Zanussi
On Tue, 2015-11-03 at 09:34 +0900, Namhyung Kim wrote:
> On Thu, Oct 22, 2015 at 01:14:28PM -0500, Tom Zanussi wrote:
> > Allow users to define any number of hist triggers per trace event.
> > Any number of hist triggers may be added for a given event, which may
> > differ by key, value, or filter.
> > 
> > Reading the event's 'hist' file will display the output of all the
> > hist triggers defined on an event concatenated in the order they were
> > defined.
> > 
> > Signed-off-by: Tom Zanussi 
> > Tested-by: Masami Hiramatsu 
> > ---
> >  Documentation/trace/events.txt   | 151 
> > +--
> >  kernel/trace/trace.c |   8 ++-
> >  kernel/trace/trace_events_hist.c | 138 ++-
> >  3 files changed, 256 insertions(+), 41 deletions(-)
> > 
> > diff --git a/Documentation/trace/events.txt b/Documentation/trace/events.txt
> > index b3aa47e..6c64cb7 100644
> > --- a/Documentation/trace/events.txt
> > +++ b/Documentation/trace/events.txt
> > @@ -532,12 +532,14 @@ The following commands are supported:
> >  
> >'hist' triggers add a 'hist' file to each event's subdirectory.
> >Reading the 'hist' file for the event will dump the hash table in
> > -  its entirety to stdout.  Each printed hash table entry is a simple
> > -  list of the keys and values comprising the entry; keys are printed
> > -  first and are delineated by curly braces, and are followed by the
> > -  set of value fields for the entry.  By default, numeric fields are
> > -  displayed as base-10 integers.  This can be modified by appending
> > -  any of the following modifiers to the field name:
> > +  its entirety to stdout.  If there are multiple hist triggers
> > +  attached to an event, there will be a table for each trigger in the
> > +  output.  Each printed hash table entry is a simple list of the keys
> > +  and values comprising the entry; keys are printed first and are
> > +  delineated by curly braces, and are followed by the set of value
> > +  fields for the entry.  By default, numeric fields are displayed as
> > +  base-10 integers.  This can be modified by appending any of the
> > +  following modifiers to the field name:
> >  
> >  .hexdisplay a number as a hex value
> > .symdisplay an address as a symbol
> > @@ -1629,3 +1631,140 @@ The following commands are supported:
> >  .
> >  .
> >  .
> > +
> > +  The following example demonstrates how multiple hist triggers can be
> > +  attached to a given event.  This capability can be useful for
> > +  creating a set of different summaries derived from the same set of
> > +  events, or for comparing the effects of different filters, among
> > +  other things.
> > +
> > +# echo 'hist:keys=skbaddr.hex:vals=len if len < 0' > \
> > +   /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
> > +# echo 'hist:keys=skbaddr.hex:vals=len if len > 4096' > \
> > +   /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
> > +# echo 'hist:keys=skbaddr.hex:vals=len if len == 256' > \
> > +   /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
> > +# echo 'hist:keys=skbaddr.hex:vals=len' > \
> > +   /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
> > +# echo 'hist:keys=len:vals=common_preempt_count' > \
> > +   /sys/kernel/debug/tracing/events/net/netif_receive_skb/trigger
> 
> AFAIK other tracefs files honor the truncation flag so open for
> writing would destroy other hist triggers.  What do you think?
> 

Yeah, I think you're right - adding multiple triggers should use '>>'
(and '>' should truncate, as mentioned).

> 
> > +
> > +  The above set of commands create four triggers differing only in
> > +  their filters, along with a completely different though fairly
> > +  nonsensical trigger.
> 
> [SNIP]
> > @@ -1289,22 +1368,18 @@ static int event_hist_trigger_func(struct 
> > event_command *cmd_ops,
> >  
> > trigger_data->private_data = hist_data;
> >  
> > +   if (param) { /* if param is non-empty, it's supposed to be a filter */
> > +   ret = cmd_ops->set_filter(param, trigger_data, file);
> 
> Maybe you want to check ->set_filter being NULL first. :)

Yep, thanks.

Tom

> Thanks,
> Namhyung
> 
> 
> > +   if (ret < 0)
> > +   goto out_free;
> > +   }
> > +
> > if (glob[0] == '!') {
> > cmd_ops->unreg(glob+1, trigger_ops, trigger_data, file);
> > ret = 0;
> > goto out_free;
> > }
> >  
> > -   if (!param) /* if param is non-empty, it's supposed to be a filter */
> > -   goto out_reg;
> > -
> > -   if (!cmd_ops->set_filter)
> > -   goto out_reg;
> > -
> > -   ret = cmd_ops->set_filter(param, trigger_data, file);
> > -   if (ret < 0)
> > -   goto out_free;
> > - out_reg:
> > ret = cmd_ops->reg(glob, trigger_ops, trigger_data, file);
> > /*
> >  * The above returns on success 

Re: [PATCH v4 04/10] xen/blkfront: split per device io_lock

2015-11-03 Thread Konrad Rzeszutek Wilk
On Wed, Nov 04, 2015 at 09:07:12AM +0800, Bob Liu wrote:
> 
> On 11/04/2015 04:09 AM, Konrad Rzeszutek Wilk wrote:
> > On Mon, Nov 02, 2015 at 12:21:40PM +0800, Bob Liu wrote:
> >> The per device io_lock became a coarser grained lock after 
> >> multi-queues/rings
> >> was introduced, this patch introduced a fine-grained ring_lock for each 
> >> ring.
> > 
> > s/was introduced/was introduced (see commit titled XYZ)/
> > 
> > s/introdued/introduces/
> >>
> >> The old io_lock was renamed to dev_lock and only protect the ->grants list
> > 
> > s/was/is/
> > s/protect/protects/
> > 
> >> which is shared by all rings.
> >>
> >> Signed-off-by: Bob Liu 
> >> ---
> >>  drivers/block/xen-blkfront.c | 57 
> >> ++--
> >>  1 file changed, 34 insertions(+), 23 deletions(-)
> >>
> >> diff --git a/drivers/block/xen-blkfront.c b/drivers/block/xen-blkfront.c
> >> index eab78e7..8cc5995 100644
> >> --- a/drivers/block/xen-blkfront.c
> >> +++ b/drivers/block/xen-blkfront.c
> >> @@ -121,6 +121,7 @@ MODULE_PARM_DESC(max_ring_page_order, "Maximum order 
> >> of pages to be used for the
> >>   */
> >>  struct blkfront_ring_info {
> >>struct blkif_front_ring ring;
> > 
> > Can you add a comment explaining the lock semantic? As in under what 
> > conditions
> > should it be taken? Like you have it below.
> > 
> >> +  spinlock_t ring_lock;
> >>unsigned int ring_ref[XENBUS_MAX_RING_PAGES];
> >>unsigned int evtchn, irq;
> >>struct work_struct work;
> >> @@ -138,7 +139,8 @@ struct blkfront_ring_info {
> >>   */
> >>  struct blkfront_info
> >>  {
> >> -  spinlock_t io_lock;
> >> +  /* Lock to proect info->grants list shared by multi rings */
> > 
> > s/proect/protect/
> > 
> > Missing full stop.
> > 
> >> +  spinlock_t dev_lock;
> > 
> > Shouldn't it be right next to what it is protecting?
> > 
> > That is right below (or above): 'struct list_head grants;'?
> > 
> >>struct mutex mutex;
> >>struct xenbus_device *xbdev;
> >>struct gendisk *gd;
> >> @@ -224,6 +226,7 @@ static int fill_grant_buffer(struct blkfront_ring_info 
> >> *rinfo, int num)
> >>struct grant *gnt_list_entry, *n;
> >>int i = 0;
> >>  
> >> +  spin_lock_irq(&info->dev_lock);
> > 
> > Why there? Why not where you add it to the list?
> >>while(i < num) {
> >>gnt_list_entry = kzalloc(sizeof(struct grant), GFP_NOIO);
> >>if (!gnt_list_entry)
> >> @@ -242,6 +245,7 @@ static int fill_grant_buffer(struct blkfront_ring_info 
> >> *rinfo, int num)
> >>list_add(&gnt_list_entry->node, &info->grants);
> > 
> > Right here that is?
> > 
> > You are holding the lock for the duration of 'kzalloc' and 'alloc_page'.
> > 
> > And more interestingly, GFP_NOIO translates to __GFP_WAIT which means
> > it can call 'schedule'. - And you have taken an spinlock. That should
> > have thrown lots of warnings?
> > 
> >>i++;
> >>}
> >> +  spin_unlock_irq(&info->dev_lock);
> >>  
> >>return 0;
> >>  
> >> @@ -254,6 +258,7 @@ out_of_memory:
> >>kfree(gnt_list_entry);
> >>i--;
> >>}
> >> +  spin_unlock_irq(&info->dev_lock);
> > 
> > Just do it around the 'list_del' operation. You are using an
> > 'safe'
> >>BUG_ON(i != 0);
> >>return -ENOMEM;
> >>  }
> >> @@ -265,6 +270,7 @@ static struct grant *get_grant(grant_ref_t *gref_head,
> >>struct grant *gnt_list_entry;
> >>unsigned long buffer_gfn;
> >>  
> >> +  spin_lock(&info->dev_lock);
> >>BUG_ON(list_empty(&info->grants));
> >>gnt_list_entry = list_first_entry(&info->grants, struct grant,
> >>  node);
> >> @@ -272,8 +278,10 @@ static struct grant *get_grant(grant_ref_t *gref_head,
> >>  
> >>if (gnt_list_entry->gref != GRANT_INVALID_REF) {
> >>info->persistent_gnts_c--;
> >> +  spin_unlock(&info->dev_lock);
> >>return gnt_list_entry;
> >>}
> >> +  spin_unlock(&info->dev_lock);
> > 
> > Just have one spin_unlock. Put it right before the 'if 
> > (gnt_list_entry->gref)..'.
> 
> That's used to protect info->persistent_gnts_c, will update all other place.

But you don't mention that in the description - that the lock is supposed
to also protect persistent_gnts_c. Please update that.

> 
> Thanks,
> -Bob


Re: [PATCH v11 08/28] tracing: Add lock-free tracing_map

2015-11-03 Thread Tom Zanussi
Hi Namhyung,

On Mon, 2015-11-02 at 16:08 +0900, Namhyung Kim wrote:
> Hi Tom,
> 
> On Thu, Oct 29, 2015 at 01:35:43PM -0500, Tom Zanussi wrote:
> > Hi Namhyung,
> > 
> > On Thu, 2015-10-29 at 17:31 +0900, Namhyung Kim wrote:
> > > Hi Tom,
> > > 
> > > On Thu, Oct 22, 2015 at 01:14:12PM -0500, Tom Zanussi wrote:
> > > > Add tracing_map, a special-purpose lock-free map for tracing.
> > > > 
> > > > tracing_map is designed to aggregate or 'sum' one or more values
> > > > associated with a specific object of type tracing_map_elt, which
> > > > is associated by the map to a given key.
> > > > 
> > > > It provides various hooks allowing per-tracer customization and is
> > > > separated out into a separate file in order to allow it to be shared
> > > > between multiple tracers, but isn't meant to be generally used outside
> > > > of that context.
> > > > 
> > > > The tracing_map implementation was inspired by lock-free map
> > > > algorithms originated by Dr. Cliff Click:
> > > > 
> > > >  http://www.azulsystems.com/blog/cliff/2007-03-26-non-blocking-hashtable
> > > >  http://www.azulsystems.com/events/javaone_2007/2007_LockFreeHash.pdf
> > > > 
> > > > Signed-off-by: Tom Zanussi 
> > > > Tested-by: Masami Hiramatsu 
> > > > ---
> > > > +/**
> > > > + * tracing_map_insert - Insert key and/or retrieve val from a 
> > > > tracing_map
> > > > + * @map: The tracing_map to insert into
> > > > + * @key: The key to insert
> > > > + *
> > > > + * Inserts a key into a tracing_map and creates and returns a new
> > > > + * tracing_map_elt for it, or if the key has already been inserted by
> > > > + * a previous call, returns the tracing_map_elt already associated
> > > > + * with it.  When the map was created, the number of elements to be
> > > > + * allocated for the map was specified (internally maintained as
> > > > + * 'max_elts' in struct tracing_map), and that number of
> > > > + * tracing_map_elts was created by tracing_map_init().  This is the
> > > > + * pre-allocated pool of tracing_map_elts that tracing_map_insert()
> > > > + * will allocate from when adding new keys.  Once that pool is
> > > > + * exhausted, tracing_map_insert() is useless and will return NULL to
> > > > + * signal that state.
> > > > + *
> > > > + * This is a lock-free tracing map insertion function implementing a
> > > > + * modified form of Cliff Click's basic insertion algorithm.  It
> > > > + * requires the table size be a power of two.  To prevent any
> > > > + * possibility of an infinite loop we always make the internal table
> > > > + * size double the size of the requested table size (max_elts * 2).
> > > > + * Likewise, we never reuse a slot or resize or delete elements - when
> > > > + * we've reached max_elts entries, we simply return NULL once we've
> > > > + * run out of entries.  Readers can at any point in time traverse the
> > > > + * tracing map and safely access the key/val pairs.
> > > > + *
> > > > + * Return: the tracing_map_elt pointer val associated with the key.
> > > > + * If this was a newly inserted key, the val will be a newly allocated
> > > > + * and associated tracing_map_elt pointer val.  If the key wasn't
> > > > + * found and the pool of tracing_map_elts has been exhausted, NULL is
> > > > + * returned and no further insertions will succeed.
> > > > + */
> > > > +struct tracing_map_elt *tracing_map_insert(struct tracing_map *map, 
> > > > void *key)
> > > > +{
> > > > +   u32 idx, key_hash, test_key;
> > > > +   struct tracing_map_entry *entry;
> > > > +
> > > > +   key_hash = jhash(key, map->key_size, 0);
> > > > +   if (key_hash == 0)
> > > > +   key_hash = 1;
> > > > +   idx = key_hash >> (32 - (map->map_bits + 1));
> > > > +
> > > > +   while (1) {
> > > > +   idx &= (map->map_size - 1);
> > > > +   entry = TRACING_MAP_ENTRY(map->map, idx);
> > > > +   test_key = entry->key;
> > > > +
> > > > +   if (test_key && test_key == key_hash && entry->val &&
> > > > +   keys_match(key, entry->val->key, map->key_size))
> > > > +   return entry->val;
> > > > +
> > > > > +   if (!test_key && !cmpxchg(&entry->key, 0, key_hash)) {
> > > > +   struct tracing_map_elt *elt;
> > > > +
> > > > +   elt = get_free_elt(map);
> > > > +   if (!elt)
> > > > +   break;
> > > > +   memcpy(elt->key, key, map->key_size);
> > > > +   entry->val = elt;
> > > > +
> > > > +   return entry->val;
> > > > +   }
> > > > +   idx++;
> > > > +   }
> > > > +
> > > > +   return NULL;
> > > > +}
> > > 
> > > IIUC this always insert new entry if no matching key found.  And if
> > > the map is full, it only fails after walking through the entries to
> > > find an empty one, mark the entry with the key and call to
> > > get_free_elt() returns NULL.  As 

Re: [PATCH v2] pwm-backlight: fix the panel power sequence

2015-11-03 Thread YH Huang
On Tue, 2015-11-03 at 12:08 +0100, Philipp Zabel wrote:
> Hi YH,
> 
> Am Dienstag, den 03.11.2015, 16:11 +0800 schrieb YH Huang:
> > > The reasoning is that devices where there is no phandle link pointing to
> > > the backlight (for example from a simple-panel node), we should keep the
> > > current default behaviour (enable during probe).
> > 
> > > I have a small question about the current default behaviour.
> > > Should we enable during probe?
> 
> Here I mean enabling the backlight (at the end of the probe function),
> not enabling the GPIO already when requesting it.
> 
> > Before this patch ( http://patchwork.ozlabs.org/patch/324690/ ),
> > we disable "enable-gpio" in the probe function.
> 
> While before this patch the GPIO would be initialized in the disabled
> state, the call to backlight_update_status at the end of the probe
> function would still enable the backlight afterwards.

Based on this, could we disable it initially and update it in the
backlight_update_status function?

Like this:

	if (pb->enable_gpio) {
		if (phandle &&
		    gpiod_get_direction(pb->enable_gpio) == GPIOF_DIR_OUT &&
		    gpiod_get_value(pb->enable_gpio) == 1)
			gpiod_direction_output(pb->enable_gpio, 1);
		else
			gpiod_direction_output(pb->enable_gpio, 0);
	}

And then update with props.brightness in backlight_update_status.
I am not sure; maybe I am missing something.

Regards,
YH Huang



Re: [PATCH 0/5] perf bpf: Improve error code delivering and output

2015-11-03 Thread Wangnan (F)



On 2015/11/3 23:43, Arnaldo Carvalho de Melo wrote:
> On Tue, Nov 03, 2015 at 10:44:41AM +, Wang Nan wrote:
> > This patchset is based on tip/core.
> >
> > Arnaldo found some confusing error reports when reviewing BPF related
> > patches. This patch set makes some changes:
>
> Thanks for working on this!
>
> I applied the first two and made comments about the conversion to using
> err.h, please check,
>
> - Arnaldo


I can't find them in your git repository at git.kernel.org. Could you
please recheck?

Thank you.



Re: [PATCH 10/31] perf test: Enforce LLVM test for BPF test

2015-11-03 Thread Wangnan (F)



On 2015/11/4 2:24, Arnaldo Carvalho de Melo wrote:

On Thu, Oct 15, 2015 at 07:58:38PM +0800, Wangnan (F) wrote:

+void test__llvm_prepare(void)
+{
+   p_test_llvm__bpf_result = mmap(NULL, SHARED_BUF_INIT_SIZE,
+  PROT_READ | PROT_WRITE,
+  MAP_SHARED | MAP_ANONYMOUS, -1, 0);
+   if (!p_test_llvm__bpf_result)

It should check MAP_FAILED instead.


Fixed this way:

Can you please try refreshing this patchset on top of what is now in
acme/perf/core?

Also why do we need those struct test->{prepare,cleanup} pointers? You
introduced it and then, on the next patch that touches 'perf test' and
uses test__llvm_{prepare,cleanup} you call them directly, which I think
should be enough, i.e. keep them as functions to call from inside
the test called from run_test(), right?

- Arnaldo
  


These prepare/cleanup functions were introduced because I wanted the BPF
test to reuse the result of the LLVM test, so that it doesn't need to
compile those BPF scripts twice. This is the reason I use shared
memory. However, I now think compiling twice is acceptable and
makes things simpler.

I'll update this patchset in my next pull request.

Thank you.

