Re: [PATCH V3] arm64: tegra: add topology data for Tegra194 cpu

2019-02-21 Thread Bo Yan

The patch V3 adopted changes suggested by Thierry.


On 2/13/19 8:33 AM, Bo Yan wrote:

The xavier CPU architecture includes 8 CPU cores organized in
4 clusters. Add cpu-map data for topology initialization, this
fixes the topology information in
/sys/devices/system/cpu/cpu[n]/topology

Signed-off-by: Bo Yan 
---
V3: Replaced phandles with full path to CPU node
V2: remove cache nodes, add topology data only

  arch/arm64/boot/dts/nvidia/tegra194.dtsi | 42 
  1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6dfa1ca..708d20c 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -870,6 +870,48 @@
#address-cells = <1>;
#size-cells = <0>;
  
+		cpu-map {

+   cluster0 {
+   core0 {
+   cpu = <&{/cpus/cpu@0}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@1}>;
+   };
+   };
+
+   cluster1 {
+   core0 {
+   cpu = <&{/cpus/cpu@2}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@3}>;
+   };
+   };
+
+   cluster2 {
+   core0 {
+   cpu = <&{/cpus/cpu@4}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@5}>;
+   };
+   };
+
+   cluster3 {
+   core0 {
+   cpu = <&{/cpus/cpu@6}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@7}>;
+   };
+   };
+   };
+
cpu@0 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";



[PATCH V3] arm64: tegra: add topology data for Tegra194 cpu

2019-02-13 Thread Bo Yan
The xavier CPU architecture includes 8 CPU cores organized in
4 clusters. Add cpu-map data for topology initialization, this
fixes the topology information in
/sys/devices/system/cpu/cpu[n]/topology

Signed-off-by: Bo Yan 
---
V3: Replaced phandles with full path to CPU node
V2: remove cache nodes, add topology data only

 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 42 
 1 file changed, 42 insertions(+)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6dfa1ca..708d20c 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -870,6 +870,48 @@
#address-cells = <1>;
#size-cells = <0>;
 
+   cpu-map {
+   cluster0 {
+   core0 {
+   cpu = <&{/cpus/cpu@0}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@1}>;
+   };
+   };
+
+   cluster1 {
+   core0 {
+   cpu = <&{/cpus/cpu@2}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@3}>;
+   };
+   };
+
+   cluster2 {
+   core0 {
+   cpu = <&{/cpus/cpu@4}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@5}>;
+   };
+   };
+
+   cluster3 {
+   core0 {
+   cpu = <&{/cpus/cpu@6}>;
+   };
+
+   core1 {
+   cpu = <&{/cpus/cpu@7}>;
+   };
+   };
+   };
+
cpu@0 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
-- 
2.7.4



Re: [PATCH V2] arm64: tegra: add topology data for Tegra194 cpu

2019-02-13 Thread Bo Yan

I agree, will update this patch with full paths replacing phandles.

On 2/13/19 12:12 AM, Thierry Reding wrote:

On Mon, Feb 11, 2019 at 03:47:07PM -0800, Bo Yan wrote:

The xavier CPU architecture includes 8 CPU cores organized in
4 clusters. Add cpu-map data for topology initialization, this
fixes the topology information in
/sys/devices/system/cpu/cpu[n]/topology

Signed-off-by: Bo Yan 
---
V2: remove cache nodes, add topology data only

  arch/arm64/boot/dts/nvidia/tegra194.dtsi | 58 +++-
  1 file changed, 50 insertions(+), 8 deletions(-)


This mostly looks good to me. One minor comment below.



diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6dfa1ca..35e6e76 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -870,56 +870,98 @@
#address-cells = <1>;
#size-cells = <0>;
  
-		cpu@0 {

+   cpu-map {
+   cluster0 {
+   core0 {
+   cpu = <_0>;


I wonder if perhaps in this case it would be better to use the full path
to refer to the phandle here. That way we can avoid the labels, which
are somewhat cumbersome to write and the hierarchy, in my opinion, is a
much more natural way to reference these.

What I'm suggesting would look roughly like this:

cpu-map {
cluster0 {
core0 {
cpu = <&{/cpus/cpu@0}>;
};

core1 {
cpu = <&{/cpus/cpu@1}>;
};
};

cluster1 {
core0 {
cpu = <&{/cpus/cpu@2}>;
};

core1 {
cpu = <&{/cpus/cpu@3}>;
};
};

...
};

That's slightly more characters, but I think it's much easier to read
than the labels.

I don't feel very strongly about it, though, so feel free to keep this
as-is if you prefer.

Thierry



[PATCH V2] arm64: tegra: add topology data for Tegra194 cpu

2019-02-11 Thread Bo Yan
The xavier CPU architecture includes 8 CPU cores organized in
4 clusters. Add cpu-map data for topology initialization, this
fixes the topology information in
/sys/devices/system/cpu/cpu[n]/topology

Signed-off-by: Bo Yan 
---
V2: remove cache nodes, add topology data only

 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 58 +++-
 1 file changed, 50 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6dfa1ca..35e6e76 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -870,56 +870,98 @@
#address-cells = <1>;
#size-cells = <0>;
 
-   cpu@0 {
+   cpu-map {
+   cluster0 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+
+   cluster1 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+
+   cluster2 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+
+   cluster3 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+   };
+
+   cl0_0: cpu@0 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x1>;
enable-method = "psci";
};
 
-   cpu@1 {
+   cl0_1: cpu@1 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x10001>;
enable-method = "psci";
};
 
-   cpu@2 {
+   cl1_0: cpu@2 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x100>;
enable-method = "psci";
};
 
-   cpu@3 {
+   cl1_1: cpu@3 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x101>;
enable-method = "psci";
};
 
-   cpu@4 {
+   cl2_0: cpu@4 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x200>;
enable-method = "psci";
};
 
-   cpu@5 {
+   cl2_1: cpu@5 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x201>;
enable-method = "psci";
};
 
-   cpu@6 {
+   cl3_0: cpu@6 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x10300>;
enable-method = "psci";
};
 
-   cpu@7 {
+   cl3_1: cpu@7 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x10301>;
-- 
2.7.4



Re: [PATCH] arm64: tegra: add topology data for Tegra194 cpu

2019-02-11 Thread Bo Yan
To make this simpler, I think it's best to isolate the cache information 
in its own patch. So I will amend this patch to include topology 
information only.


On 1/31/19 3:29 PM, Bo Yan wrote:


On 1/31/19 2:25 PM, Thierry Reding wrote:

On Thu, Jan 31, 2019 at 10:35:54AM -0800, Bo Yan wrote:

The xavier CPU architecture includes 8 CPU cores organized in
4 clusters. Add cpu-map data for topology initialization, add
cache data for cache node creation in sysfs.

Signed-off-by: Bo Yan 
---
  arch/arm64/boot/dts/nvidia/tegra194.dtsi | 148 
+--

  1 file changed, 140 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi

index 6dfa1ca..7c2a1fb 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -870,63 +870,195 @@
  #address-cells = <1>;
  #size-cells = <0>;



These don't seem to be well-defined. They are mentioned in a very weird
locations (Documentation/devicetree/booting-without-of.txt) but there
seem to be examples and other device tree files that use them so maybe
those are all valid. It might be worth mentioning these in other places
where people can more easily find them.


It might be logical to place a reference to this document 
(booting-without-of.txt) in architecture specific documents, for 
example, arm/cpus.txt. I see the need for improved documentation, but 
this probably should be best done in a separate change.


According to the above document, {i,d}-cache-line-size are deprecated in
favour of {i,d}-cache-block-size.


Mostly, this seems to be derived from the oddity of PowerPC, which might 
have different cache-line-size and cache-block-size. I don't know if 
there are other examples? It looks like the {i,d}-cache-line-size are 
being used in dts files for almost all architectures, the only exception 
is arch/sh/boot/dts/j2_mimas_v2.dts. On ARM and ARM64, cache-line-size 
is the same as cache-block-size. So I am wondering whether the 
booting-without-of.txt should be fixed instead? just to keep it 
consistent among dts files, especially in arm64.




I also don't see any mention of {i,d}-cache_sets in the device tree
bindings, though riscv/cpus.txt mentions {i,d}-cache-sets (note the
hyphen instead of underscore) in the examples. arm/l2c2x0.txt and
arm/uniphier/cache-unifier.txt describe cache-sets, though that's
slightly different.

Might make sense to document all these in more standard places. Maybe
adding them to arm/cpus.txt. For consistency with other properties, I
think there should be called {i,d}-cache-sets like for RISC-V.


+    l2-cache = <_0>;


This seems to be called next-level-cache everywhere else, though it's
only formally described in arm/uniphier/cache-uniphier.txt. So might
also make sense to add this to arm/cpus.txt.


the improved documentation is certainly desired, I agree.



  };
-    cpu@1 {
+    cl0_1: cpu@1 {
  compatible = "nvidia,tegra194-carmel", "arm,armv8";
  device_type = "cpu";
  reg = <0x10001>;
  enable-method = "psci";
+    i-cache-size = <131072>;
+    i-cache-line-size = <64>;
+    i-cache-sets = <512>;
+    d-cache-size = <65536>;
+    d-cache-line-size = <64>;
+    d-cache_sets = <256>;
+    l2-cache = <_0>;
  };
-    cpu@2 {
+    cl1_0: cpu@2 {
  compatible = "nvidia,tegra194-carmel", "arm,armv8";
  device_type = "cpu";
  reg = <0x100>;
  enable-method = "psci";
+    i-cache-size = <131072>;
+    i-cache-line-size = <64>;
+    i-cache-sets = <512>;
+    d-cache-size = <65536>;
+    d-cache-line-size = <64>;
+    d-cache_sets = <256>;
+    l2-cache = <_1>;
  };
-    cpu@3 {
+    cl1_1: cpu@3 {
  compatible = "nvidia,tegra194-carmel", "arm,armv8";
  device_type = "cpu";
  reg = <0x101>;
  enable-method = "psci";
+    i-cache-size = <131072>;
+    i-cache-line-size = <64>;
+    i-cache-sets = <512>;
+    d-cache-size = <65536>;
+    d-cache-line-size = <64>;
+    d-cache_sets = <256>;
+    l2-cache = <_1>;
  };
-    cpu@4 {
+    cl2_0: cpu@4 {
  compatible = "nvidia,tegra194-carmel", "arm,armv8";
  device_type = "cpu";
  reg = <0x200>;
  enable-method = "psci";
+    i-cache-size = <131072>;
+    i-cache-line-

Re: [PATCH] arm64: tegra: add topology data for Tegra194 cpu

2019-01-31 Thread Bo Yan



On 1/31/19 2:25 PM, Thierry Reding wrote:

On Thu, Jan 31, 2019 at 10:35:54AM -0800, Bo Yan wrote:

The xavier CPU architecture includes 8 CPU cores organized in
4 clusters. Add cpu-map data for topology initialization, add
cache data for cache node creation in sysfs.

Signed-off-by: Bo Yan 
---
  arch/arm64/boot/dts/nvidia/tegra194.dtsi | 148 +--
  1 file changed, 140 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6dfa1ca..7c2a1fb 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -870,63 +870,195 @@
#address-cells = <1>;
#size-cells = <0>;



These don't seem to be well-defined. They are mentioned in a very weird
locations (Documentation/devicetree/booting-without-of.txt) but there
seem to be examples and other device tree files that use them so maybe
those are all valid. It might be worth mentioning these in other places
where people can more easily find them.


It might be logical to place a reference to this document 
(booting-without-of.txt) in architecture specific documents, for 
example, arm/cpus.txt. I see the need for improved documentation, but 
this probably should be best done in a separate change.


According to the above document, {i,d}-cache-line-size are deprecated in
favour of {i,d}-cache-block-size.


Mostly, this seems to be derived from the oddity of PowerPC, which might 
have different cache-line-size and cache-block-size. I don't know if 
there are other examples? It looks like the {i,d}-cache-line-size are 
being used in dts files for almost all architectures, the only exception 
is arch/sh/boot/dts/j2_mimas_v2.dts. On ARM and ARM64, cache-line-size 
is the same as cache-block-size. So I am wondering whether the 
booting-without-of.txt should be fixed instead? just to keep it 
consistent among dts files, especially in arm64.




I also don't see any mention of {i,d}-cache_sets in the device tree
bindings, though riscv/cpus.txt mentions {i,d}-cache-sets (note the
hyphen instead of underscore) in the examples. arm/l2c2x0.txt and
arm/uniphier/cache-unifier.txt describe cache-sets, though that's
slightly different.

Might make sense to document all these in more standard places. Maybe
adding them to arm/cpus.txt. For consistency with other properties, I
think there should be called {i,d}-cache-sets like for RISC-V.


+   l2-cache = <_0>;


This seems to be called next-level-cache everywhere else, though it's
only formally described in arm/uniphier/cache-uniphier.txt. So might
also make sense to add this to arm/cpus.txt.


the improved documentation is certainly desired, I agree.



};
  
-		cpu@1 {

+   cl0_1: cpu@1 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x10001>;
enable-method = "psci";
+   i-cache-size = <131072>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <512>;
+   d-cache-size = <65536>;
+   d-cache-line-size = <64>;
+   d-cache_sets = <256>;
+   l2-cache = <_0>;
};
  
-		cpu@2 {

+   cl1_0: cpu@2 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x100>;
enable-method = "psci";
+   i-cache-size = <131072>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <512>;
+   d-cache-size = <65536>;
+   d-cache-line-size = <64>;
+   d-cache_sets = <256>;
+   l2-cache = <_1>;
};
  
-		cpu@3 {

+   cl1_1: cpu@3 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x101>;
enable-method = "psci";
+   i-cache-size = <131072>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <512>;
+   d-cache-size = <65536>;
+   d-cache-line-size = <64>;
+   d-cache_sets = <256>;
+   l2-cache = <_1>;
};
  
-		cpu@4 {

+   cl2_0: cpu@4 {
compatible = "nvidia,tegra194-carmel&qu

[PATCH] arm64: tegra: add topology data for Tegra194 cpu

2019-01-31 Thread Bo Yan
The xavier CPU architecture includes 8 CPU cores organized in
4 clusters. Add cpu-map data for topology initialization, add
cache data for cache node creation in sysfs.

Signed-off-by: Bo Yan 
---
 arch/arm64/boot/dts/nvidia/tegra194.dtsi | 148 +--
 1 file changed, 140 insertions(+), 8 deletions(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 6dfa1ca..7c2a1fb 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -870,63 +870,195 @@
#address-cells = <1>;
#size-cells = <0>;
 
-   cpu@0 {
+   cpu-map {
+   cluster0 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+
+   cluster1 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+
+   cluster2 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+
+   cluster3 {
+   core0 {
+   cpu = <_0>;
+   };
+
+   core1 {
+   cpu = <_1>;
+   };
+   };
+   };
+
+   cl0_0: cpu@0 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x1>;
enable-method = "psci";
+   i-cache-size = <131072>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <512>;
+   d-cache-size = <65536>;
+   d-cache-line-size = <64>;
+   d-cache_sets = <256>;
+   l2-cache = <_0>;
};
 
-   cpu@1 {
+   cl0_1: cpu@1 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x10001>;
enable-method = "psci";
+   i-cache-size = <131072>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <512>;
+   d-cache-size = <65536>;
+   d-cache-line-size = <64>;
+   d-cache_sets = <256>;
+   l2-cache = <_0>;
};
 
-   cpu@2 {
+   cl1_0: cpu@2 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x100>;
enable-method = "psci";
+   i-cache-size = <131072>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <512>;
+   d-cache-size = <65536>;
+   d-cache-line-size = <64>;
+   d-cache_sets = <256>;
+   l2-cache = <_1>;
};
 
-   cpu@3 {
+   cl1_1: cpu@3 {
compatible = "nvidia,tegra194-carmel", "arm,armv8";
device_type = "cpu";
reg = <0x101>;
enable-method = "psci";
+   i-cache-size = <131072>;
+   i-cache-line-size = <64>;
+   i-cache-sets = <512>;
+   d-cache-size = <65536>;
+   d-cache-line-size = <64>;
+   d-cache_sets = <256>;
+   l2-cache = <_1>;
};
 
-   cpu@4 {
+   cl2_0: cpu@4 {
compatible = "nvidia,tegra194-carmel", "arm

[PATCH] soc/tegra: remove duplicated function declaration

2018-11-13 Thread Bo Yan
The function tegra_read_chipid is declared twice in fuse.h. Remove
the redundant declaration.

Signed-off-by: Bo Yan 
---
 include/soc/tegra/fuse.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/soc/tegra/fuse.h b/include/soc/tegra/fuse.h
index 9b6ea0c..8fb2f8a 100644
--- a/include/soc/tegra/fuse.h
+++ b/include/soc/tegra/fuse.h
@@ -60,7 +60,6 @@ struct tegra_sku_info {
 
 u32 tegra_read_straps(void);
 u32 tegra_read_ram_code(void);
-u32 tegra_read_chipid(void);
 int tegra_fuse_readl(unsigned long offset, u32 *value);
 
 extern struct tegra_sku_info tegra_sku_info;
-- 
2.7.4



[PATCH] soc/tegra: remove duplicated function declaration

2018-11-13 Thread Bo Yan
The function tegra_read_chipid is declared twice in fuse.h. Remove
the redundant declaration.

Signed-off-by: Bo Yan 
---
 include/soc/tegra/fuse.h | 1 -
 1 file changed, 1 deletion(-)

diff --git a/include/soc/tegra/fuse.h b/include/soc/tegra/fuse.h
index 9b6ea0c..8fb2f8a 100644
--- a/include/soc/tegra/fuse.h
+++ b/include/soc/tegra/fuse.h
@@ -60,7 +60,6 @@ struct tegra_sku_info {
 
 u32 tegra_read_straps(void);
 u32 tegra_read_ram_code(void);
-u32 tegra_read_chipid(void);
 int tegra_fuse_readl(unsigned long offset, u32 *value);
 
 extern struct tegra_sku_info tegra_sku_info;
-- 
2.7.4



Re: a question about IP checksum helper for arm64

2018-07-09 Thread Bo Yan

Hi Robin,

That UBSAN error prompted me to check the generated instructions. The 
error by itself doesn't make sense to me because there is no requirement 
for 128b alignment on ldp/stp.


With 4.18-rc3, when I build for the default "defconfig" in 
arch/arm64/configs/, I see the disassembled code shows two ldr instead 
of a ldp.


One example is in function "ip_rcv" (/net/ipv4/ip_input.c). After 
disassembling vmlinux, I see following:


if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
089bcdb8:   38776b05ldrbw5, [x24,x23]
static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
{
__uint128_t tmp;
u64 sum;

tmp = *(const __uint128_t *)iph;
089bcdbc:   f9400481ldr x1, [x4,#8]
089bcdc0:   f8776b00ldr x0, [x24,x23]
089bcdc4:   12000ca5and w5, w5, #0xf
089bcdc8:   510014a3sub w3, w5, #0x5

This is done with "make ARCH=arm64 CROSS_COMPILE=... defconfig", so the 
default optimization level is -O2.


I tried the same test as you did: aarch64-linux-gnu-objdump -S -d *.o 
in net/ipv4. The result is inconsistent. In some instances, I do see ldp 
instruction being generated, in some other cases, it's two ldr. For 
example, in inet_gro_receive and ip_mc_check_igmp, it's compiled as I 
expected. For ip_rcv, it's not.


So it looks like this is not very consistent, but it also looks like in 
majority of cases it generates the ldp instructions.




On 07/09/2018 04:54 AM, Robin Murphy wrote:

Hi Bo,

On 06/07/18 17:27, Bo Yan wrote:

Hi Robin, Luke,

Recently I bumped into an error when running GCC undefined behavior 
sanitizer:


UBSAN: Undefined behaviour in 
kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
    load of misaligned address ffc198c8b254 for type 'const 
__int128 unsigned'

    which requires 16 byte alignment


What's your config and reproducer here? I've had UBSan enabled a few 
times since that patch went in and never noticed anything. I've just 
tried it with 4.18-rc3 and indeed don't see anything from just booting 
the machine and making some network traffic. It does indeed fire if I 
also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a 
million other warnings for all manner of types - that's to be expected 
since, as the help text says, "Enabling this option on architectures 
that support unaligned accesses may produce a lot of false positives."




The relevant code:

 tmp = *(const __uint128_t *)iph;
 iph += 16;
 ihl -= 4;
 tmp += ((tmp >> 64) | (tmp << 64));
 sum = tmp >> 64;
 do {
 sum += *(const u32 *)iph;
 iph += 4;
 } while (--ihl);

But, I checked the generated disassembly, it doesn't look like 
anything special is generated taking advantage of that.


I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be 
emitted, but don't see it.


My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried 
the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I 
see LDP emitted as expected for most of the identifiable int128 accesses 
(both in a standalone test harness and a quick survey of kernel code via 
'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well 
be places where the compiler gets clever enough to elide all or part of 
that load where data is already held in registers - I've not audited 
*that* closely - but the whole point of having a pure C implementation 
is that it can be aggressively inlined more than inline asm ever could.


There were some prior discussions about GCC behavior, like this 
thread: https://patchwork.kernel.org/patch/9081911/ , in which you 
talked about the difference between GCC4 and GCC5.3. It looks to me 
this is regressed in Linaro GCC6.4 build.


I have not checked newer GCC versions.

Will it be more stable to just do this with inline assembly instead of 
relying on __uint128_t data type?


GCC documentation says __int128 is supported for targets which have an 
integer mode wide enough to hold 128 bits. aarch64 doesn't have such 
an integer mode.


Yet AArch64 GCC definitely does support __uint128_t, or this code 
wouldn't even build ;)


Robin.



Thanks

Bo


Re: a question about IP checksum helper for arm64

2018-07-09 Thread Bo Yan

Hi Robin,

That UBSAN error prompted me to check the generated instructions. The 
error by itself doesn't make sense to me because there is no requirement 
for 128b alignment on ldp/stp.


With 4.18-rc3, when I build for the default "defconfig" in 
arch/arm64/configs/, I see the disassembled code shows two ldr instead 
of a ldp.


One example is in function "ip_rcv" (/net/ipv4/ip_input.c). After 
disassembling vmlinux, I see following:


if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl)))
089bcdb8:   38776b05ldrbw5, [x24,x23]
static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl)
{
__uint128_t tmp;
u64 sum;

tmp = *(const __uint128_t *)iph;
089bcdbc:   f9400481ldr x1, [x4,#8]
089bcdc0:   f8776b00ldr x0, [x24,x23]
089bcdc4:   12000ca5and w5, w5, #0xf
089bcdc8:   510014a3sub w3, w5, #0x5

This is done with "make ARCH=arm64 CROSS_COMPILE=... defconfig", so the 
default optimization level is -O2.


I tried the same test as you did: aarch64-linux-gnu-objdump -S -d *.o 
in net/ipv4. The result is inconsistent. In some instances, I do see ldp 
instruction being generated, in some other cases, it's two ldr. For 
example, in inet_gro_receive and ip_mc_check_igmp, it's compiled as I 
expected. For ip_rcv, it's not.


So it looks like this is not very consistent, but it also looks like in 
majority of cases it generates the ldp instructions.




On 07/09/2018 04:54 AM, Robin Murphy wrote:

Hi Bo,

On 06/07/18 17:27, Bo Yan wrote:

Hi Robin, Luke,

Recently I bumped into an error when running GCC undefined behavior 
sanitizer:


UBSAN: Undefined behaviour in 
kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
    load of misaligned address ffc198c8b254 for type 'const 
__int128 unsigned'

    which requires 16 byte alignment


What's your config and reproducer here? I've had UBSan enabled a few 
times since that patch went in and never noticed anything. I've just 
tried it with 4.18-rc3 and indeed don't see anything from just booting 
the machine and making some network traffic. It does indeed fire if I 
also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a 
million other warnings for all manner of types - that's to be expected 
since, as the help text says, "Enabling this option on architectures 
that support unaligned accesses may produce a lot of false positives."




The relevant code:

 tmp = *(const __uint128_t *)iph;
 iph += 16;
 ihl -= 4;
 tmp += ((tmp >> 64) | (tmp << 64));
 sum = tmp >> 64;
 do {
 sum += *(const u32 *)iph;
 iph += 4;
 } while (--ihl);

But, I checked the generated disassembly, it doesn't look like 
anything special is generated taking advantage of that.


I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be 
emitted, but don't see it.


My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried 
the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I 
see LDP emitted as expected for most of the identifiable int128 accesses 
(both in a standalone test harness and a quick survey of kernel code via 
'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well 
be places where the compiler gets clever enough to elide all or part of 
that load where data is already held in registers - I've not audited 
*that* closely - but the whole point of having a pure C implementation 
is that it can be aggressively inlined more than inline asm ever could.


There were some prior discussions about GCC behavior, like this 
thread: https://patchwork.kernel.org/patch/9081911/ , in which you 
talked about the difference between GCC4 and GCC5.3. It looks to me 
this is regressed in Linaro GCC6.4 build.


I have not checked newer GCC versions.

Will it be more stable to just do this with inline assembly instead of 
relying on __uint128_t data type?


GCC documentation says __int128 is supported for targets which have an 
integer mode wide enough to hold 128 bits. aarch64 doesn't have such 
an integer mode.


Yet AArch64 GCC definitely does support __uint128_t, or this code 
wouldn't even build ;)


Robin.



Thanks

Bo


a question about IP checksum helper for arm64

2018-07-06 Thread Bo Yan

Hi Robin, Luke,

Recently I bumped into an error when running GCC undefined behavior 
sanitizer:


UBSAN: Undefined behaviour in 
kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
   load of misaligned address ffc198c8b254 for type 'const 
__int128 unsigned'

   which requires 16 byte alignment


The relevant code:

tmp = *(const __uint128_t *)iph;
iph += 16;
ihl -= 4;
tmp += ((tmp >> 64) | (tmp << 64));
sum = tmp >> 64;
do {
sum += *(const u32 *)iph;
iph += 4;
} while (--ihl);

But, I checked the generated disassembly, it doesn't look like anything 
special is generated taking advantage of that.


I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be 
emitted, but don't see it.


There were some prior discussions about GCC behavior, like this thread: 
https://patchwork.kernel.org/patch/9081911/ , in which you talked about 
the difference between GCC4 and GCC5.3. It looks to me this is regressed 
in Linaro GCC6.4 build.


I have not checked newer GCC versions.

Will it be more stable to just do this with inline assembly instead of 
relying on __uint128_t data type?


GCC documentation says __int128 is supported for targets which have an 
integer mode wide enough to hold 128 bits. aarch64 doesn't have such an 
integer mode.


Thanks

Bo


a question about IP checksum helper for arm64

2018-07-06 Thread Bo Yan

Hi Robin, Luke,

Recently I bumped into an error when running GCC undefined behavior 
sanitizer:


UBSAN: Undefined behaviour in 
kernel-4.9/arch/arm64/include/asm/checksum.h:34:6
   load of misaligned address ffc198c8b254 for type 'const 
__int128 unsigned'

   which requires 16 byte alignment


The relevant code:

tmp = *(const __uint128_t *)iph;
iph += 16;
ihl -= 4;
tmp += ((tmp >> 64) | (tmp << 64));
sum = tmp >> 64;
do {
sum += *(const u32 *)iph;
iph += 4;
} while (--ihl);

But, I checked the generated disassembly, it doesn't look like anything 
special is generated taking advantage of that.


I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be 
emitted, but don't see it.


There were some prior discussions about GCC behavior, like this thread: 
https://patchwork.kernel.org/patch/9081911/ , in which you talked about 
the difference between GCC4 and GCC5.3. It looks to me this is regressed 
in Linaro GCC6.4 build.


I have not checked newer GCC versions.

Will it be more stable to just do this with inline assembly instead of 
relying on __uint128_t data type?


GCC documentation says __int128 is supported for targets which have an 
integer mode wide enough to hold 128 bits. aarch64 doesn't have such an 
integer mode.


Thanks

Bo


Re: [PATCH] irqchip/gic: check return value of of_address_to_resource

2018-07-05 Thread Bo Yan

Marc,

Sorry for the previous reply. My email settings were not correct, so it 
inserted those confidentiality text, which was not what I intended.


This is what I think:

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0b60bb0 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,7 @@ static bool gic_check_eoimode(struct device_node 
*node, void __iomem **base)

 {
struct resource cpuif_res;

-   of_address_to_resource(node, 1, _res);
+   (void)of_address_to_resource(node, 1, _res);

if (!is_hyp_mode_available())
return false;

We are 100% sure of_address_to_resource will succeed in this particular 
case, so "(void)" will help suppress Coverity warning.


On 07/05/2018 12:18 PM, Bo Yan wrote:

Marc,

I'm also wondering if of_address_to_resource can really fail in this 
particular case?


What if we just explicitly discard the return value like this:

(void)of_address_to_resource(node, 1, _res);

This suppresses Coverity warning by explicitly stating we are 100% sure 
the function call will always return success.


On 07/05/2018 12:13 PM, Marc Zyngier wrote:

Hi Bo,

On Thu, 5 Jul 2018 11:20:59 -0700
Bo Yan  wrote:


The of_address_to_resource returns 0 if successful. gic_check_eoimode
calls it without checking the return value. This induces Coverity
warning: "Unchecked return value".

Return false from gic_check_eoimode if of_address_to_resource returns
non-0 value.

Signed-off-by: Bo Yan 
---
  drivers/irqchip/irq-gic.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0bceb10 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct 
device_node *node, void __iomem **base)

  {
  struct resource cpuif_res;
-    of_address_to_resource(node, 1, _res);
+    if (of_address_to_resource(node, 1, _res))
+    return false;


We've just done an of_iomap() on this resource, which succeeded. How
can the same thing now fail? It would mean that the device tree has
been pulled from under our feet...

And if it could happen, why is returning false the right thing to do?
Why would we say we want EOImode==0 instead of 1?


  if (!is_hyp_mode_available())
  return false;


As it stands, I'm not taking such a patch. It either papers over a
bigger problem, or just keeps a warning quiet for the sake of it.

Thanks,

M.



Re: [PATCH] irqchip/gic: check return value of of_address_to_resource

2018-07-05 Thread Bo Yan

Marc,

Sorry for the previous reply. My email settings were not correct, so it 
inserted those confidentiality text, which was not what I intended.


This is what I think:

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0b60bb0 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,7 @@ static bool gic_check_eoimode(struct device_node 
*node, void __iomem **base)

 {
struct resource cpuif_res;

-   of_address_to_resource(node, 1, _res);
+   (void)of_address_to_resource(node, 1, _res);

if (!is_hyp_mode_available())
return false;

We are 100% sure of_address_to_resource will succeed in this particular 
case, so "(void)" will help suppress Coverity warning.


On 07/05/2018 12:18 PM, Bo Yan wrote:

Marc,

I'm also wondering if of_address_to_resource can really fail in this 
particular case?


What if we just explicitly discard the return value like this:

(void)of_address_to_resource(node, 1, _res);

This suppresses Coverity warning by explicitly stating we are 100% sure 
the function call will always return success.


On 07/05/2018 12:13 PM, Marc Zyngier wrote:

Hi Bo,

On Thu, 5 Jul 2018 11:20:59 -0700
Bo Yan  wrote:


The of_address_to_resource returns 0 if successful. gic_check_eoimode
calls it without checking the return value. This induces Coverity
warning: "Unchecked return value".

Return false from gic_check_eoimode if of_address_to_resource returns
non-0 value.

Signed-off-by: Bo Yan 
---
  drivers/irqchip/irq-gic.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0bceb10 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct 
device_node *node, void __iomem **base)

  {
  struct resource cpuif_res;
-    of_address_to_resource(node, 1, _res);
+    if (of_address_to_resource(node, 1, _res))
+    return false;


We've just done an of_iomap() on this resource, which succeeded. How
can the same thing now fail? It would mean that the device tree has
been pulled from under our feet...

And if it could happen, why is returning false the right thing to do?
Why would we say we want EOImode==0 instead of 1?


  if (!is_hyp_mode_available())
  return false;


As it stands, I'm not taking such a patch. It either papers over a
bigger problem, or just keeps a warning quiet for the sake of it.

Thanks,

M.



Re: [PATCH] irqchip/gic: check return value of of_address_to_resource

2018-07-05 Thread Bo Yan

Marc,

I'm also wondering if of_address_to_resource can really fail in this 
particular case?


What if we just explicitly discard the return value like this:

(void)of_address_to_resource(node, 1, _res);

This suppresses Coverity warning by explicitly stating we are 100% sure 
the function call will always return success.


On 07/05/2018 12:13 PM, Marc Zyngier wrote:

Hi Bo,

On Thu, 5 Jul 2018 11:20:59 -0700
Bo Yan  wrote:


The of_address_to_resource returns 0 if successful. gic_check_eoimode
calls it without checking the return value. This induces Coverity
warning: "Unchecked return value".

Return false from gic_check_eoimode if of_address_to_resource returns
non-0 value.

Signed-off-by: Bo Yan 
---
  drivers/irqchip/irq-gic.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0bceb10 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, 
void __iomem **base)
  {
struct resource cpuif_res;
  
-	of_address_to_resource(node, 1, _res);

+   if (of_address_to_resource(node, 1, _res))
+   return false;


We've just done an of_iomap() on this resource, which succeeded. How
can the same thing now fail? It would mean that the device tree has
been pulled from under our feet...

And if it could happen, why is returning false the right thing to do?
Why would we say we want EOImode==0 instead of 1?

  
  	if (!is_hyp_mode_available())

return false;


As it stands, I'm not taking such a patch. It either papers over a
bigger problem, or just keeps a warning quiet for the sake of it.

Thanks,

M.



Re: [PATCH] irqchip/gic: check return value of of_address_to_resource

2018-07-05 Thread Bo Yan

Marc,

I'm also wondering if of_address_to_resource can really fail in this 
particular case?


What if we just explicitly discard the return value like this:

(void)of_address_to_resource(node, 1, _res);

This suppresses Coverity warning by explicitly stating we are 100% sure 
the function call will always return success.


On 07/05/2018 12:13 PM, Marc Zyngier wrote:

Hi Bo,

On Thu, 5 Jul 2018 11:20:59 -0700
Bo Yan  wrote:


The of_address_to_resource returns 0 if successful. gic_check_eoimode
calls it without checking the return value. This induces Coverity
warning: "Unchecked return value".

Return false from gic_check_eoimode if of_address_to_resource returns
non-0 value.

Signed-off-by: Bo Yan 
---
  drivers/irqchip/irq-gic.c | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0bceb10 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, 
void __iomem **base)
  {
struct resource cpuif_res;
  
-	of_address_to_resource(node, 1, _res);

+   if (of_address_to_resource(node, 1, _res))
+   return false;


We've just done an of_iomap() on this resource, which succeeded. How
can the same thing now fail? It would mean that the device tree has
been pulled from under our feet...

And if it could happen, why is returning false the right thing to do?
Why would we say we want EOImode==0 instead of 1?

  
  	if (!is_hyp_mode_available())

return false;


As it stands, I'm not taking such a patch. It either papers over a
bigger problem, or just keeps a warning quiet for the sake of it.

Thanks,

M.



[PATCH] irqchip/gic: check return value of of_address_to_resource

2018-07-05 Thread Bo Yan
The of_address_to_resource returns 0 if successful. gic_check_eoimode
calls it without checking the return value. This induces Coverity
warning: "Unchecked return value".

Return false from gic_check_eoimode if of_address_to_resource returns
non-0 value.

Signed-off-by: Bo Yan 
---
 drivers/irqchip/irq-gic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0bceb10 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, 
void __iomem **base)
 {
struct resource cpuif_res;
 
-   of_address_to_resource(node, 1, _res);
+   if (of_address_to_resource(node, 1, _res))
+   return false;
 
if (!is_hyp_mode_available())
return false;
-- 
2.7.4



[PATCH] irqchip/gic: check return value of of_address_to_resource

2018-07-05 Thread Bo Yan
The of_address_to_resource returns 0 if successful. gic_check_eoimode
calls it without checking the return value. This induces Coverity
warning: "Unchecked return value".

Return false from gic_check_eoimode if of_address_to_resource returns
non-0 value.

Signed-off-by: Bo Yan 
---
 drivers/irqchip/irq-gic.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c
index ced10c4..0bceb10 100644
--- a/drivers/irqchip/irq-gic.c
+++ b/drivers/irqchip/irq-gic.c
@@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, 
void __iomem **base)
 {
struct resource cpuif_res;
 
-   of_address_to_resource(node, 1, _res);
+   if (of_address_to_resource(node, 1, _res))
+   return false;
 
if (!is_hyp_mode_available())
return false;
-- 
2.7.4



[PATCH] coresight: etm4x: fix bit shifting

2018-03-05 Thread Bo Yan
ctxid_pid and vmid_val in config are of type u64. When an integer
0xFF is being left shifted more than 32 bits, the behavior is
undefined. The fix is to specify 0xFF as an unsigned long.

Detected by Coverity scan: CID 37650, 37651 (Bad bit shift operation)

Signed-off-by: Bo Yan <b...@nvidia.com>
---
 drivers/hwtracing/coresight/coresight-etm4x-sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c 
b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
index 4e6eab53e34e..d21961710713 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
@@ -1780,7 +1780,7 @@ static ssize_t ctxid_masks_store(struct device *dev,
 */
for (j = 0; j < 8; j++) {
if (maskbyte & 1)
-   config->ctxid_pid[i] &= ~(0xFF << (j * 8));
+   config->ctxid_pid[i] &= ~(0xFFUL << (j * 8));
maskbyte >>= 1;
}
/* Select the next ctxid comparator mask value */
@@ -1963,7 +1963,7 @@ static ssize_t vmid_masks_store(struct device *dev,
 */
for (j = 0; j < 8; j++) {
if (maskbyte & 1)
-   config->vmid_val[i] &= ~(0xFF << (j * 8));
+   config->vmid_val[i] &= ~(0xFFUL << (j * 8));
maskbyte >>= 1;
}
/* Select the next vmid comparator mask value */
-- 
2.7.4



[PATCH] coresight: etm4x: fix bit shifting

2018-03-05 Thread Bo Yan
ctxid_pid and vmid_val in config are of type u64. When an integer
0xFF is being left shifted more than 32 bits, the behavior is
undefined. The fix is to specify 0xFF as an unsigned long.

Detected by Coverity scan: CID 37650, 37651 (Bad bit shift operation)

Signed-off-by: Bo Yan 
---
 drivers/hwtracing/coresight/coresight-etm4x-sysfs.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c 
b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
index 4e6eab53e34e..d21961710713 100644
--- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
+++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c
@@ -1780,7 +1780,7 @@ static ssize_t ctxid_masks_store(struct device *dev,
 */
for (j = 0; j < 8; j++) {
if (maskbyte & 1)
-   config->ctxid_pid[i] &= ~(0xFF << (j * 8));
+   config->ctxid_pid[i] &= ~(0xFFUL << (j * 8));
maskbyte >>= 1;
}
/* Select the next ctxid comparator mask value */
@@ -1963,7 +1963,7 @@ static ssize_t vmid_masks_store(struct device *dev,
 */
for (j = 0; j < 8; j++) {
if (maskbyte & 1)
-   config->vmid_val[i] &= ~(0xFF << (j * 8));
+   config->vmid_val[i] &= ~(0xFFUL << (j * 8));
maskbyte >>= 1;
}
/* Select the next vmid comparator mask value */
-- 
2.7.4



Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

2018-02-02 Thread Bo Yan

On 02/02/2018 11:34 AM, Saravana Kannan wrote:

On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:

On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:


On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:

   drivers/cpufreq/cpufreq.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
   if (!cpufreq_driver)
   return;

+    if (unlikely(!cpufreq_suspended)) {
+    pr_warn("%s: resume after failing suspend\n", __func__);
+    return;
+    }
   cpufreq_suspended = false;

   if (!has_target() && !cpufreq_driver->resume)


Good catch, but rather than doing this it would be better to avoid
calling cpufreq_resume() at all if cpufreq_suspend() has not been 
called.

Yes, I thought about that, but there is no good way to skip over it
without introducing another flag. cpufreq_resume is called by
dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
case, dpm_resume is called, but dpm_suspend is not. So on a higher 
level

it's already unbalanced.

One possibility is to rely on the pm_transition flag. So something 
like:



diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
cookie)
   void dpm_resume(pm_message_t state)
   {
  struct device *dev;
+   bool suspended = (pm_transition.event != PM_EVENT_ON);
  ktime_t starttime = ktime_get();

  trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
  async_synchronize_full();
  dpm_show_time(starttime, state, NULL);

-   cpufreq_resume();
+   if (likely(suspended))
+   cpufreq_resume();
  trace_suspend_resume(TPS("dpm_resume"), state.event, false);
   }


I was thinking about something else.

Anyway, I think your original patch is OK too, but without printing the
message.  Just combine the cpufreq_suspended check with the 
cpufreq_driver

one and the unlikely() thing is not necessary.



I rather have this fixed in the dpm_suspend/resume() code. This is 
just masking the first issue that's being caused by unbalanced error 
handling. If that means adding flags in dpm_suspend/resume() then 
that's what we should do right now and clean it up later if it can be 
improved. Making cpufreq more messy doesn't seem like the right answer.


Thanks,
Saravana


dpm_suspend and dpm_resume by themselves are not balanced in this 
particular case. As it's currently structured, dpm_resume can't be 
omitted even if dpm_suspend is skipped due to earlier failure.  I think 
checking cpufreq_suspended flag is a reasonable compromise. If we can 
find a way to make dpm_suspend/dpm_resume also balanced, that will be best.




Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

2018-02-02 Thread Bo Yan

On 02/02/2018 11:34 AM, Saravana Kannan wrote:

On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote:

On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote:


On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:

   drivers/cpufreq/cpufreq.c | 4 
   1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
   if (!cpufreq_driver)
   return;

+    if (unlikely(!cpufreq_suspended)) {
+    pr_warn("%s: resume after failing suspend\n", __func__);
+    return;
+    }
   cpufreq_suspended = false;

   if (!has_target() && !cpufreq_driver->resume)


Good catch, but rather than doing this it would be better to avoid
calling cpufreq_resume() at all if cpufreq_suspend() has not been 
called.

Yes, I thought about that, but there is no good way to skip over it
without introducing another flag. cpufreq_resume is called by
dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure
case, dpm_resume is called, but dpm_suspend is not. So on a higher 
level

it's already unbalanced.

One possibility is to rely on the pm_transition flag. So something 
like:



diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t
cookie)
   void dpm_resume(pm_message_t state)
   {
  struct device *dev;
+   bool suspended = (pm_transition.event != PM_EVENT_ON);
  ktime_t starttime = ktime_get();

  trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
  async_synchronize_full();
  dpm_show_time(starttime, state, NULL);

-   cpufreq_resume();
+   if (likely(suspended))
+   cpufreq_resume();
  trace_suspend_resume(TPS("dpm_resume"), state.event, false);
   }


I was thinking about something else.

Anyway, I think your original patch is OK too, but without printing the
message.  Just combine the cpufreq_suspended check with the 
cpufreq_driver

one and the unlikely() thing is not necessary.



I rather have this fixed in the dpm_suspend/resume() code. This is 
just masking the first issue that's being caused by unbalanced error 
handling. If that means adding flags in dpm_suspend/resume() then 
that's what we should do right now and clean it up later if it can be 
improved. Making cpufreq more messy doesn't seem like the right answer.


Thanks,
Saravana


dpm_suspend and dpm_resume by themselves are not balanced in this 
particular case. As it's currently structured, dpm_resume can't be 
omitted even if dpm_suspend is skipped due to earlier failure.  I think 
checking cpufreq_suspended flag is a reasonable compromise. If we can 
find a way to make dpm_suspend/dpm_resume also balanced, that will be best.




[PATCH v2] cpufreq: skip cpufreq resume if it's not suspended

2018-01-25 Thread Bo Yan
cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

suspend_devices_and_enter
   --> dpm_suspend_start
  --> dpm_prepare
  --> device_prepare : this function errors out
  --> dpm_suspend: this is skipped due to dpm_prepare failure
   this means cpufreq_suspend is skipped over
   --> goto Recover_platform, due to previous error
   --> goto Resume_devices
   --> dpm_resume_end
   --> dpm_resume
   --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

memset(sg_cpu, 0, sizeof(*sg_cpu));


This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check the pm_transition status in dpm_resume. if
pm_transition.event == PMSG_ON, we know for sure dpm_suspend has not
been called, so do not call cpufreq_resume.

Signed-off-by: Bo Yan <b...@nvidia.com>
---
 drivers/base/power/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 08744b572af6..39829d7a9311 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -921,6 +921,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
struct device *dev;
+   bool suspended = (pm_transition.event != PM_EVENT_ON);
ktime_t starttime = ktime_get();
 
trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -964,7 +965,8 @@ void dpm_resume(pm_message_t state)
async_synchronize_full();
dpm_show_time(starttime, state, 0, NULL);
 
-   cpufreq_resume();
+   if (likely(suspended))
+   cpufreq_resume();
trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }
 
-- 
2.7.4



[PATCH v2] cpufreq: skip cpufreq resume if it's not suspended

2018-01-25 Thread Bo Yan
cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

suspend_devices_and_enter
   --> dpm_suspend_start
  --> dpm_prepare
  --> device_prepare : this function errors out
  --> dpm_suspend: this is skipped due to dpm_prepare failure
   this means cpufreq_suspend is skipped over
   --> goto Recover_platform, due to previous error
   --> goto Resume_devices
   --> dpm_resume_end
   --> dpm_resume
   --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

memset(sg_cpu, 0, sizeof(*sg_cpu));


This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check the pm_transition status in dpm_resume. if
pm_transition.event == PMSG_ON, we know for sure dpm_suspend has not
been called, so do not call cpufreq_resume.

Signed-off-by: Bo Yan 
---
 drivers/base/power/main.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index 08744b572af6..39829d7a9311 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -921,6 +921,7 @@ static void async_resume(void *data, async_cookie_t cookie)
 void dpm_resume(pm_message_t state)
 {
struct device *dev;
+   bool suspended = (pm_transition.event != PM_EVENT_ON);
ktime_t starttime = ktime_get();
 
trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -964,7 +965,8 @@ void dpm_resume(pm_message_t state)
async_synchronize_full();
dpm_show_time(starttime, state, 0, NULL);
 
-   cpufreq_resume();
+   if (likely(suspended))
+   cpufreq_resume();
trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }
 
-- 
2.7.4



Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

2018-01-24 Thread Bo Yan


On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:

  drivers/cpufreq/cpufreq.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
if (!cpufreq_driver)
return;
  
+	if (unlikely(!cpufreq_suspended)) {

+   pr_warn("%s: resume after failing suspend\n", __func__);
+   return;
+   }
cpufreq_suspended = false;
  
  	if (!has_target() && !cpufreq_driver->resume)



Good catch, but rather than doing this it would be better to avoid
calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
Yes, I thought about that, but there is no good way to skip over it 
without introducing another flag. cpufreq_resume is called by 
dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure 
case, dpm_resume is called, but dpm_suspend is not. So on a higher level 
it's already unbalanced.


One possibility is to rely on the pm_transition flag. So something like:


diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t 
cookie)

 void dpm_resume(pm_message_t state)
 {
    struct device *dev;
+   bool suspended = (pm_transition.event != PM_EVENT_ON);
    ktime_t starttime = ktime_get();

    trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
    async_synchronize_full();
    dpm_show_time(starttime, state, NULL);

-   cpufreq_resume();
+   if (likely(suspended))
+   cpufreq_resume();
    trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }

This relies on the fact that the pm_transition will stay as PMSG_ON if 
dpm_prepare failed, in which case dpm_suspend will be skipped over, 
pm_transition will remain as 0 until dpm_resume.


dpm_suspend changes pm_transition to whatever state it receives, which 
is never PMSG_ON. pm_transition is not changing to PMSG_ON before 
dpm_resume. This is my understanding. does this make sense?





Thanks,
Rafael






Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended

2018-01-24 Thread Bo Yan


On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote:

On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote:

  drivers/cpufreq/cpufreq.c | 4 
  1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
if (!cpufreq_driver)
return;
  
+	if (unlikely(!cpufreq_suspended)) {

+   pr_warn("%s: resume after failing suspend\n", __func__);
+   return;
+   }
cpufreq_suspended = false;
  
  	if (!has_target() && !cpufreq_driver->resume)



Good catch, but rather than doing this it would be better to avoid
calling cpufreq_resume() at all if cpufreq_suspend() has not been called.
Yes, I thought about that, but there is no good way to skip over it 
without introducing another flag. cpufreq_resume is called by 
dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure 
case, dpm_resume is called, but dpm_suspend is not. So on a higher level 
it's already unbalanced.


One possibility is to rely on the pm_transition flag. So something like:


diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c
index dc259d20c967..8469e6fc2b2c 100644
--- a/drivers/base/power/main.c
+++ b/drivers/base/power/main.c
@@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t 
cookie)

 void dpm_resume(pm_message_t state)
 {
    struct device *dev;
+   bool suspended = (pm_transition.event != PM_EVENT_ON);
    ktime_t starttime = ktime_get();

    trace_suspend_resume(TPS("dpm_resume"), state.event, true);
@@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state)
    async_synchronize_full();
    dpm_show_time(starttime, state, NULL);

-   cpufreq_resume();
+   if (likely(suspended))
+   cpufreq_resume();
    trace_suspend_resume(TPS("dpm_resume"), state.event, false);
 }

This relies on the fact that the pm_transition will stay as PMSG_ON if 
dpm_prepare failed, in which case dpm_suspend will be skipped over, 
pm_transition will remain as 0 until dpm_resume.


dpm_suspend changes pm_transition to whatever state it receives, which 
is never PMSG_ON. pm_transition is not changing to PMSG_ON before 
dpm_resume. This is my understanding. does this make sense?





Thanks,
Rafael






[PATCH] cpufreq: skip cpufreq resume if it's not suspended

2018-01-23 Thread Bo Yan
cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

suspend_devices_and_enter
   --> dpm_suspend_start
  --> dpm_prepare
  --> device_prepare : this function errors out
  --> dpm_suspend: this is skipped due to dpm_prepare failure
   this means cpufreq_suspend is skipped over
   --> goto Recover_platform, due to previous error
   --> goto Resume_devices
   --> dpm_resume_end
   --> dpm_resume
   --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

memset(sg_cpu, 0, sizeof(*sg_cpu));


This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check cpufreq_suspended first, if it's false, that means
cpufreq_suspend was not called in the first place, so do not resume
cpufreq.

Signed-off-by: Bo Yan <b...@nvidia.com>
---
 drivers/cpufreq/cpufreq.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
if (!cpufreq_driver)
return;
 
+   if (unlikely(!cpufreq_suspended)) {
+   pr_warn("%s: resume after failing suspend\n", __func__);
+   return;
+   }
cpufreq_suspended = false;
 
if (!has_target() && !cpufreq_driver->resume)
-- 
2.7.4



[PATCH] cpufreq: skip cpufreq resume if it's not suspended

2018-01-23 Thread Bo Yan
cpufreq_resume can be called even without preceding cpufreq_suspend.
This can happen in following scenario:

suspend_devices_and_enter
   --> dpm_suspend_start
  --> dpm_prepare
  --> device_prepare : this function errors out
  --> dpm_suspend: this is skipped due to dpm_prepare failure
   this means cpufreq_suspend is skipped over
   --> goto Recover_platform, due to previous error
   --> goto Resume_devices
   --> dpm_resume_end
   --> dpm_resume
   --> cpufreq_resume

In case schedutil is used as frequency governor, cpufreq_resume will
eventually call sugov_start, which does following:

memset(sg_cpu, 0, sizeof(*sg_cpu));


This effectively erases function pointer for frequency update, causing
crash later on. The function pointer would have been set correctly if
subsequent cpufreq_add_update_util_hook runs successfully, but that
function returns earlier because cpufreq_suspend was not called:

if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu)))
return;

Ideally, suspend should succeed, then things will be fine. But even
in case of suspend failure, system should not crash.

The fix is to check cpufreq_suspended first, if it's false, that means
cpufreq_suspend was not called in the first place, so do not resume
cpufreq.

Signed-off-by: Bo Yan 
---
 drivers/cpufreq/cpufreq.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c
index 41d148af7748..95b1c4afe14e 100644
--- a/drivers/cpufreq/cpufreq.c
+++ b/drivers/cpufreq/cpufreq.c
@@ -1680,6 +1680,10 @@ void cpufreq_resume(void)
if (!cpufreq_driver)
return;
 
+   if (unlikely(!cpufreq_suspended)) {
+   pr_warn("%s: resume after failing suspend\n", __func__);
+   return;
+   }
cpufreq_suspended = false;
 
if (!has_target() && !cpufreq_driver->resume)
-- 
2.7.4



Re: [PATCH] tracing: erase irqsoff trace with empty write

2017-09-19 Thread Bo Yan

Steven,

Thanks for review. I have posted the following update

https://lkml.org/lkml/2017/9/18/715

Bo

On 09/18/2017 07:38 AM, Steven Rostedt wrote:

On Mon, 11 Sep 2017 11:16:35 -0700
Bo Yan <b...@nvidia.com> wrote:


One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.


Hi Bo,

I have no problems with the logic of the patch. But just one nit. The
tr->trace_buffer has been called "trace_buf" elsewhere in the file. Can
you resend with using the variable name "trace_buf" instead of "tr_buf"
just to keep it consistent in the file. It's also been called 'buf' but
trace_buf would probably be more appropriate.

Thanks!

-- Steve



Signed-off-by: Bo Yan <b...@nvidia.com>
---
  kernel/trace/trace.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5360b7aec57a..1f91bdd9365b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct 
file *file)
/* If this file was open for write, then erase contents */
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
int cpu = tracing_get_cpu(inode);
+   struct trace_buffer *tr_buf = >trace_buffer;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+   if (tr->current_trace->print_max)
+   tr_buf = >max_buffer;
+#endif
  
  		if (cpu == RING_BUFFER_ALL_CPUS)

-   tracing_reset_online_cpus(>trace_buffer);
+   tracing_reset_online_cpus(tr_buf);
else
-   tracing_reset(>trace_buffer, cpu);
+   tracing_reset(tr_buf, cpu);
}
  
  	if (file->f_mode & FMODE_READ) {






Re: [PATCH] tracing: erase irqsoff trace with empty write

2017-09-19 Thread Bo Yan

Steven,

Thanks for review. I have posted the following update

https://lkml.org/lkml/2017/9/18/715

Bo

On 09/18/2017 07:38 AM, Steven Rostedt wrote:

On Mon, 11 Sep 2017 11:16:35 -0700
Bo Yan  wrote:


One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.


Hi Bo,

I have no problems with the logic of the patch. But just one nit. The
tr->trace_buffer has been called "trace_buf" elsewhere in the file. Can
you resend with using the variable name "trace_buf" instead of "tr_buf"
just to keep it consistent in the file. It's also been called 'buf' but
trace_buf would probably be more appropriate.

Thanks!

-- Steve



Signed-off-by: Bo Yan 
---
  kernel/trace/trace.c | 10 --
  1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5360b7aec57a..1f91bdd9365b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct 
file *file)
/* If this file was open for write, then erase contents */
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
int cpu = tracing_get_cpu(inode);
+   struct trace_buffer *tr_buf = >trace_buffer;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+   if (tr->current_trace->print_max)
+   tr_buf = >max_buffer;
+#endif
  
  		if (cpu == RING_BUFFER_ALL_CPUS)

-   tracing_reset_online_cpus(>trace_buffer);
+   tracing_reset_online_cpus(tr_buf);
else
-   tracing_reset(>trace_buffer, cpu);
+   tracing_reset(tr_buf, cpu);
}
  
  	if (file->f_mode & FMODE_READ) {






Re: [PATCH] cpufreq: cpufreq_stats: make last_index signed int

2017-09-18 Thread Bo Yan

On 09/17/2017 06:50 PM, Viresh Kumar wrote:

On 15-09-17, 13:13, Bo Yan wrote:

It is possible for last_index to get a -1 if current frequency
is not found in the freq table when stats is created. If the
function "cpufreq_stats_update" is called before last_index is
updated with a valid value, the "-1" will be used as index to
update stats->time_in_state, triggering an exception.


No, that's not how it works AFAIK and if it did, then can you explain how your
solution fixes it?

AFAIK, what happens right now is that stats->last_index eventually stores
2147483647 (uint max) and the exception comes while accessing that value.

While with your change, it will become -1 and accessing array[-1] is fine by C
standards, though it is still the wrong thing to do as you are accessing
something outside of the array.

We should just check last_index == -1 before calling cpufreq_stats_update(),
which is already done by one of the callers.



Currently, the "last_index" is being checked before 
cpufreq_stats_update(stats) inside function 
"cpufreq_stats_record_transition", so it's taken care of.


However, the function "show_time_in_state" also calls 
cpufreq_stats_update, the similar check should be done there too, like this:


diff --git a/drivers/cpufreq/cpufreq_stats.c 
b/drivers/cpufreq/cpufreq_stats.c

index e75880eb037d..15305b5ec322 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -62,7 +62,8 @@ static ssize_t show_time_in_state(struct 
cpufreq_policy *policy, char *buf)

if (policy->fast_switch_enabled)
return 0;

-   cpufreq_stats_update(stats);
+   if ((int)stats->last_index >= 0)
+   cpufreq_stats_update(stats);
for (i = 0; i < stats->state_num; i++) {
len += sprintf(buf + len, "%u %llu\n", 
stats->freq_table[i],

(unsigned long long)


This is only needed when policy->cur is not in frequency table when 
stats table is created, in which case, stats->last_index will get -1, 
then user does a "cat time_in_state" before any frequency transition.


Does this make sense?



Re: [PATCH] cpufreq: cpufreq_stats: make last_index signed int

2017-09-18 Thread Bo Yan

On 09/17/2017 06:50 PM, Viresh Kumar wrote:

On 15-09-17, 13:13, Bo Yan wrote:

It is possible for last_index to get a -1 if current frequency
is not found in the freq table when stats is created. If the
function "cpufreq_stats_update" is called before last_index is
updated with a valid value, the "-1" will be used as index to
update stats->time_in_state, triggering an exception.


No, that's not how it works AFAIK and if it did, then can you explain how your
solution fixes it?

AFAIK, what happens right now is that stats->last_index eventually stores
2147483647 (uint max) and the exception comes while accessing that value.

While with your change, it will become -1 and accessing array[-1] is fine by C
standards, though it is still the wrong thing to do as you are accessing
something outside of the array.

We should just check last_index == -1 before calling cpufreq_stats_update(),
which is already done by one of the callers.



Currently, the "last_index" is being checked before 
cpufreq_stats_update(stats) inside function 
"cpufreq_stats_record_transition", so it's taken care of.


However, the function "show_time_in_state" also calls 
cpufreq_stats_update, the similar check should be done there too, like this:


diff --git a/drivers/cpufreq/cpufreq_stats.c 
b/drivers/cpufreq/cpufreq_stats.c

index e75880eb037d..15305b5ec322 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -62,7 +62,8 @@ static ssize_t show_time_in_state(struct 
cpufreq_policy *policy, char *buf)

if (policy->fast_switch_enabled)
return 0;

-   cpufreq_stats_update(stats);
+   if ((int)stats->last_index >= 0)
+   cpufreq_stats_update(stats);
for (i = 0; i < stats->state_num; i++) {
len += sprintf(buf + len, "%u %llu\n", 
stats->freq_table[i],

(unsigned long long)


This is only needed when policy->cur is not in frequency table when 
stats table is created, in which case, stats->last_index will get -1, 
then user does a "cat time_in_state" before any frequency transition.


Does this make sense?



[PATCH V2] tracing: erase irqsoff trace with empty write

2017-09-18 Thread Bo Yan
One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.

Signed-off-by: Bo Yan <b...@nvidia.com>
---
Changes in v2: rename tr_buf to trace_buf

 kernel/trace/trace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5360b7aec57a..a7fb136da891 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct 
file *file)
/* If this file was open for write, then erase contents */
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
int cpu = tracing_get_cpu(inode);
+   struct trace_buffer *trace_buf = >trace_buffer;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+   if (tr->current_trace->print_max)
+   trace_buf = >max_buffer;
+#endif
 
if (cpu == RING_BUFFER_ALL_CPUS)
-   tracing_reset_online_cpus(>trace_buffer);
+   tracing_reset_online_cpus(trace_buf);
else
-   tracing_reset(>trace_buffer, cpu);
+   tracing_reset(trace_buf, cpu);
}
 
if (file->f_mode & FMODE_READ) {
-- 
2.7.4



[PATCH V2] tracing: erase irqsoff trace with empty write

2017-09-18 Thread Bo Yan
One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.

Signed-off-by: Bo Yan 
---
Changes in v2: rename tr_buf to trace_buf

 kernel/trace/trace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5360b7aec57a..a7fb136da891 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct 
file *file)
/* If this file was open for write, then erase contents */
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
int cpu = tracing_get_cpu(inode);
+   struct trace_buffer *trace_buf = >trace_buffer;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+   if (tr->current_trace->print_max)
+   trace_buf = >max_buffer;
+#endif
 
if (cpu == RING_BUFFER_ALL_CPUS)
-   tracing_reset_online_cpus(>trace_buffer);
+   tracing_reset_online_cpus(trace_buf);
else
-   tracing_reset(>trace_buffer, cpu);
+   tracing_reset(trace_buf, cpu);
}
 
if (file->f_mode & FMODE_READ) {
-- 
2.7.4



[PATCH] cpufreq: cpufreq_stats: make last_index signed int

2017-09-15 Thread Bo Yan
It is possible for last_index to get a -1 if current frequency
is not found in the freq table when stats is created. If the
function "cpufreq_stats_update" is called before last_index is
updated with a valid value, the "-1" will be used as index to
update stats->time_in_state, triggering an exception.

Change the type of last_index of cpufreq_stats from unsigned int
to signed int.

Move last_time to location right before time_in_state. This helps
save 4 bytes in a LP64 programming model.

Signed-off-by: Bo Yan <b...@nvidia.com>
---
 drivers/cpufreq/cpufreq_stats.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index e75880eb037d..a6838f5a2004 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -18,10 +18,10 @@ static DEFINE_SPINLOCK(cpufreq_stats_lock);
 
 struct cpufreq_stats {
unsigned int total_trans;
-   unsigned long long last_time;
unsigned int max_state;
unsigned int state_num;
-   unsigned int last_index;
+   int last_index;
+   unsigned long long last_time;
u64 *time_in_state;
unsigned int *freq_table;
unsigned int *trans_table;
-- 
2.7.4



[PATCH] cpufreq: cpufreq_stats: make last_index signed int

2017-09-15 Thread Bo Yan
It is possible for last_index to get a -1 if current frequency
is not found in the freq table when stats is created. If the
function "cpufreq_stats_update" is called before last_index is
updated with a valid value, the "-1" will be used as index to
update stats->time_in_state, triggering an exception.

Change the type of last_index of cpufreq_stats from unsigned int
to signed int.

Move last_time to location right before time_in_state. This helps
save 4 bytes in a LP64 programming model.

Signed-off-by: Bo Yan 
---
 drivers/cpufreq/cpufreq_stats.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c
index e75880eb037d..a6838f5a2004 100644
--- a/drivers/cpufreq/cpufreq_stats.c
+++ b/drivers/cpufreq/cpufreq_stats.c
@@ -18,10 +18,10 @@ static DEFINE_SPINLOCK(cpufreq_stats_lock);
 
 struct cpufreq_stats {
unsigned int total_trans;
-   unsigned long long last_time;
unsigned int max_state;
unsigned int state_num;
-   unsigned int last_index;
+   int last_index;
+   unsigned long long last_time;
u64 *time_in_state;
unsigned int *freq_table;
unsigned int *trans_table;
-- 
2.7.4



[PATCH] tracing: erase irqsoff trace with empty write

2017-09-11 Thread Bo Yan
One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.

Signed-off-by: Bo Yan <b...@nvidia.com>
---
 kernel/trace/trace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5360b7aec57a..1f91bdd9365b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct 
file *file)
/* If this file was open for write, then erase contents */
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
int cpu = tracing_get_cpu(inode);
+   struct trace_buffer *tr_buf = >trace_buffer;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+   if (tr->current_trace->print_max)
+   tr_buf = >max_buffer;
+#endif
 
if (cpu == RING_BUFFER_ALL_CPUS)
-   tracing_reset_online_cpus(>trace_buffer);
+   tracing_reset_online_cpus(tr_buf);
else
-   tracing_reset(>trace_buffer, cpu);
+   tracing_reset(tr_buf, cpu);
}
 
if (file->f_mode & FMODE_READ) {
-- 
2.7.4



[PATCH] tracing: erase irqsoff trace with empty write

2017-09-11 Thread Bo Yan
One convenient way to erase trace is "echo > trace". However, this
is currently broken if the current tracer is irqsoff tracer. This
is because irqsoff tracer use max_buffer as the default trace
buffer.

Set the max_buffer as the one to be cleared when it's the trace
buffer currently in use.

Signed-off-by: Bo Yan 
---
 kernel/trace/trace.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c
index 5360b7aec57a..1f91bdd9365b 100644
--- a/kernel/trace/trace.c
+++ b/kernel/trace/trace.c
@@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct 
file *file)
/* If this file was open for write, then erase contents */
if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) {
int cpu = tracing_get_cpu(inode);
+   struct trace_buffer *tr_buf = >trace_buffer;
+
+#ifdef CONFIG_TRACER_MAX_TRACE
+   if (tr->current_trace->print_max)
+   tr_buf = >max_buffer;
+#endif
 
if (cpu == RING_BUFFER_ALL_CPUS)
-   tracing_reset_online_cpus(>trace_buffer);
+   tracing_reset_online_cpus(tr_buf);
else
-   tracing_reset(>trace_buffer, cpu);
+   tracing_reset(tr_buf, cpu);
}
 
if (file->f_mode & FMODE_READ) {
-- 
2.7.4