Re: [PATCH V3] arm64: tegra: add topology data for Tegra194 cpu
The patch V3 adopted changes suggested by Thierry. On 2/13/19 8:33 AM, Bo Yan wrote: The xavier CPU architecture includes 8 CPU cores organized in 4 clusters. Add cpu-map data for topology initialization, this fixes the topology information in /sys/devices/system/cpu/cpu[n]/topology Signed-off-by: Bo Yan --- V3: Replaced phandles with full path to CPU node V2: remove cache nodes, add topology data only arch/arm64/boot/dts/nvidia/tegra194.dtsi | 42 1 file changed, 42 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6dfa1ca..708d20c 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -870,6 +870,48 @@ #address-cells = <1>; #size-cells = <0>; + cpu-map { + cluster0 { + core0 { + cpu = <&{/cpus/cpu@0}>; + }; + + core1 { + cpu = <&{/cpus/cpu@1}>; + }; + }; + + cluster1 { + core0 { + cpu = <&{/cpus/cpu@2}>; + }; + + core1 { + cpu = <&{/cpus/cpu@3}>; + }; + }; + + cluster2 { + core0 { + cpu = <&{/cpus/cpu@4}>; + }; + + core1 { + cpu = <&{/cpus/cpu@5}>; + }; + }; + + cluster3 { + core0 { + cpu = <&{/cpus/cpu@6}>; + }; + + core1 { + cpu = <&{/cpus/cpu@7}>; + }; + }; + }; + cpu@0 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu";
[PATCH V3] arm64: tegra: add topology data for Tegra194 cpu
The xavier CPU architecture includes 8 CPU cores organized in 4 clusters. Add cpu-map data for topology initialization, this fixes the topology information in /sys/devices/system/cpu/cpu[n]/topology Signed-off-by: Bo Yan --- V3: Replaced phandles with full path to CPU node V2: remove cache nodes, add topology data only arch/arm64/boot/dts/nvidia/tegra194.dtsi | 42 1 file changed, 42 insertions(+) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6dfa1ca..708d20c 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -870,6 +870,48 @@ #address-cells = <1>; #size-cells = <0>; + cpu-map { + cluster0 { + core0 { + cpu = <&{/cpus/cpu@0}>; + }; + + core1 { + cpu = <&{/cpus/cpu@1}>; + }; + }; + + cluster1 { + core0 { + cpu = <&{/cpus/cpu@2}>; + }; + + core1 { + cpu = <&{/cpus/cpu@3}>; + }; + }; + + cluster2 { + core0 { + cpu = <&{/cpus/cpu@4}>; + }; + + core1 { + cpu = <&{/cpus/cpu@5}>; + }; + }; + + cluster3 { + core0 { + cpu = <&{/cpus/cpu@6}>; + }; + + core1 { + cpu = <&{/cpus/cpu@7}>; + }; + }; + }; + cpu@0 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; -- 2.7.4
Re: [PATCH V2] arm64: tegra: add topology data for Tegra194 cpu
I agree, will update this patch with full paths replacing phandles. On 2/13/19 12:12 AM, Thierry Reding wrote: On Mon, Feb 11, 2019 at 03:47:07PM -0800, Bo Yan wrote: The xavier CPU architecture includes 8 CPU cores organized in 4 clusters. Add cpu-map data for topology initialization, this fixes the topology information in /sys/devices/system/cpu/cpu[n]/topology Signed-off-by: Bo Yan --- V2: remove cache nodes, add topology data only arch/arm64/boot/dts/nvidia/tegra194.dtsi | 58 +++- 1 file changed, 50 insertions(+), 8 deletions(-) This mostly looks good to me. One minor comment below. diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6dfa1ca..35e6e76 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -870,56 +870,98 @@ #address-cells = <1>; #size-cells = <0>; - cpu@0 { + cpu-map { + cluster0 { + core0 { + cpu = <_0>; I wonder if perhaps in this case it would be better to use the full path to refer to the phandle here. That way we can avoid the labels, which are somewhat cumbersome to write and the hierarchy, in my opinion, is a much more natural way to reference these. What I'm suggesting would look roughly like this: cpu-map { cluster0 { core0 { cpu = <&{/cpus/cpu@0}>; }; core1 { cpu = <&{/cpus/cpu@1}>; }; }; cluster1 { core0 { cpu = <&{/cpus/cpu@2}>; }; core1 { cpu = <&{/cpus/cpu@3}>; }; }; ... }; That's slightly more characters, but I think it's much easier to read than the labels. I don't feel very strongly about it, though, so feel free to keep this as-is if you prefer. Thierry
[PATCH V2] arm64: tegra: add topology data for Tegra194 cpu
The xavier CPU architecture includes 8 CPU cores organized in 4 clusters. Add cpu-map data for topology initialization, this fixes the topology information in /sys/devices/system/cpu/cpu[n]/topology Signed-off-by: Bo Yan --- V2: remove cache nodes, add topology data only arch/arm64/boot/dts/nvidia/tegra194.dtsi | 58 +++- 1 file changed, 50 insertions(+), 8 deletions(-) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6dfa1ca..35e6e76 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -870,56 +870,98 @@ #address-cells = <1>; #size-cells = <0>; - cpu@0 { + cpu-map { + cluster0 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + + cluster1 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + + cluster2 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + + cluster3 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + }; + + cl0_0: cpu@0 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x1>; enable-method = "psci"; }; - cpu@1 { + cl0_1: cpu@1 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x10001>; enable-method = "psci"; }; - cpu@2 { + cl1_0: cpu@2 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x100>; enable-method = "psci"; }; - cpu@3 { + cl1_1: cpu@3 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x101>; enable-method = "psci"; }; - cpu@4 { + cl2_0: cpu@4 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x200>; enable-method = "psci"; }; - cpu@5 { + cl2_1: cpu@5 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x201>; enable-method = "psci"; }; - cpu@6 { + cl3_0: cpu@6 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x10300>; enable-method = "psci"; }; - cpu@7 { + cl3_1: cpu@7 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x10301>; -- 2.7.4
Re: [PATCH] arm64: tegra: add topology data for Tegra194 cpu
To make this simpler, I think it's best to isolate the cache information in its own patch. So I will amend this patch to include topology information only. On 1/31/19 3:29 PM, Bo Yan wrote: On 1/31/19 2:25 PM, Thierry Reding wrote: On Thu, Jan 31, 2019 at 10:35:54AM -0800, Bo Yan wrote: The xavier CPU architecture includes 8 CPU cores organized in 4 clusters. Add cpu-map data for topology initialization, add cache data for cache node creation in sysfs. Signed-off-by: Bo Yan --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 148 +-- 1 file changed, 140 insertions(+), 8 deletions(-) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6dfa1ca..7c2a1fb 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -870,63 +870,195 @@ #address-cells = <1>; #size-cells = <0>; These don't seem to be well-defined. They are mentioned in a very weird locations (Documentation/devicetree/booting-without-of.txt) but there seem to be examples and other device tree files that use them so maybe those are all valid. It might be worth mentioning these in other places where people can more easily find them. It might be logical to place a reference to this document (booting-without-of.txt) in architecture specific documents, for example, arm/cpus.txt. I see the need for improved documentation, but this probably should be best done in a separate change. According to the above document, {i,d}-cache-line-size are deprecated in favour of {i,d}-cache-block-size. Mostly, this seems to be derived from the oddity of PowerPC, which might have different cache-line-size and cache-block-size. I don't know if there are other examples? It looks like the {i,d}-cache-line-size are being used in dts files for almost all architectures, the only exception is arch/sh/boot/dts/j2_mimas_v2.dts. On ARM and ARM64, cache-line-size is the same as cache-block-size. So I am wondering whether the booting-without-of.txt should be fixed instead? just to keep it consistent among dts files, especially in arm64. I also don't see any mention of {i,d}-cache_sets in the device tree bindings, though riscv/cpus.txt mentions {i,d}-cache-sets (note the hyphen instead of underscore) in the examples. arm/l2c2x0.txt and arm/uniphier/cache-unifier.txt describe cache-sets, though that's slightly different. Might make sense to document all these in more standard places. Maybe adding them to arm/cpus.txt. For consistency with other properties, I think there should be called {i,d}-cache-sets like for RISC-V. + l2-cache = <_0>; This seems to be called next-level-cache everywhere else, though it's only formally described in arm/uniphier/cache-uniphier.txt. So might also make sense to add this to arm/cpus.txt. the improved documentation is certainly desired, I agree. }; - cpu@1 { + cl0_1: cpu@1 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x10001>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_0>; }; - cpu@2 { + cl1_0: cpu@2 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x100>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_1>; }; - cpu@3 { + cl1_1: cpu@3 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x101>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_1>; }; - cpu@4 { + cl2_0: cpu@4 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x200>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-
Re: [PATCH] arm64: tegra: add topology data for Tegra194 cpu
On 1/31/19 2:25 PM, Thierry Reding wrote: On Thu, Jan 31, 2019 at 10:35:54AM -0800, Bo Yan wrote: The xavier CPU architecture includes 8 CPU cores organized in 4 clusters. Add cpu-map data for topology initialization, add cache data for cache node creation in sysfs. Signed-off-by: Bo Yan --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 148 +-- 1 file changed, 140 insertions(+), 8 deletions(-) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6dfa1ca..7c2a1fb 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -870,63 +870,195 @@ #address-cells = <1>; #size-cells = <0>; These don't seem to be well-defined. They are mentioned in a very weird locations (Documentation/devicetree/booting-without-of.txt) but there seem to be examples and other device tree files that use them so maybe those are all valid. It might be worth mentioning these in other places where people can more easily find them. It might be logical to place a reference to this document (booting-without-of.txt) in architecture specific documents, for example, arm/cpus.txt. I see the need for improved documentation, but this probably should be best done in a separate change. According to the above document, {i,d}-cache-line-size are deprecated in favour of {i,d}-cache-block-size. Mostly, this seems to be derived from the oddity of PowerPC, which might have different cache-line-size and cache-block-size. I don't know if there are other examples? It looks like the {i,d}-cache-line-size are being used in dts files for almost all architectures, the only exception is arch/sh/boot/dts/j2_mimas_v2.dts. On ARM and ARM64, cache-line-size is the same as cache-block-size. So I am wondering whether the booting-without-of.txt should be fixed instead? just to keep it consistent among dts files, especially in arm64. I also don't see any mention of {i,d}-cache_sets in the device tree bindings, though riscv/cpus.txt mentions {i,d}-cache-sets (note the hyphen instead of underscore) in the examples. arm/l2c2x0.txt and arm/uniphier/cache-unifier.txt describe cache-sets, though that's slightly different. Might make sense to document all these in more standard places. Maybe adding them to arm/cpus.txt. For consistency with other properties, I think there should be called {i,d}-cache-sets like for RISC-V. + l2-cache = <_0>; This seems to be called next-level-cache everywhere else, though it's only formally described in arm/uniphier/cache-uniphier.txt. So might also make sense to add this to arm/cpus.txt. the improved documentation is certainly desired, I agree. }; - cpu@1 { + cl0_1: cpu@1 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x10001>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_0>; }; - cpu@2 { + cl1_0: cpu@2 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x100>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_1>; }; - cpu@3 { + cl1_1: cpu@3 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x101>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_1>; }; - cpu@4 { + cl2_0: cpu@4 { compatible = "nvidia,tegra194-carmel&qu
[PATCH] arm64: tegra: add topology data for Tegra194 cpu
The xavier CPU architecture includes 8 CPU cores organized in 4 clusters. Add cpu-map data for topology initialization, add cache data for cache node creation in sysfs. Signed-off-by: Bo Yan --- arch/arm64/boot/dts/nvidia/tegra194.dtsi | 148 +-- 1 file changed, 140 insertions(+), 8 deletions(-) diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi b/arch/arm64/boot/dts/nvidia/tegra194.dtsi index 6dfa1ca..7c2a1fb 100644 --- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi +++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi @@ -870,63 +870,195 @@ #address-cells = <1>; #size-cells = <0>; - cpu@0 { + cpu-map { + cluster0 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + + cluster1 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + + cluster2 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + + cluster3 { + core0 { + cpu = <_0>; + }; + + core1 { + cpu = <_1>; + }; + }; + }; + + cl0_0: cpu@0 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x1>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_0>; }; - cpu@1 { + cl0_1: cpu@1 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x10001>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_0>; }; - cpu@2 { + cl1_0: cpu@2 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x100>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_1>; }; - cpu@3 { + cl1_1: cpu@3 { compatible = "nvidia,tegra194-carmel", "arm,armv8"; device_type = "cpu"; reg = <0x101>; enable-method = "psci"; + i-cache-size = <131072>; + i-cache-line-size = <64>; + i-cache-sets = <512>; + d-cache-size = <65536>; + d-cache-line-size = <64>; + d-cache_sets = <256>; + l2-cache = <_1>; }; - cpu@4 { + cl2_0: cpu@4 { compatible = "nvidia,tegra194-carmel", "arm
[PATCH] soc/tegra: remove duplicated function declaration
The function tegra_read_chipid is declared twice in fuse.h. Remove the redundant declaration. Signed-off-by: Bo Yan --- include/soc/tegra/fuse.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/soc/tegra/fuse.h b/include/soc/tegra/fuse.h index 9b6ea0c..8fb2f8a 100644 --- a/include/soc/tegra/fuse.h +++ b/include/soc/tegra/fuse.h @@ -60,7 +60,6 @@ struct tegra_sku_info { u32 tegra_read_straps(void); u32 tegra_read_ram_code(void); -u32 tegra_read_chipid(void); int tegra_fuse_readl(unsigned long offset, u32 *value); extern struct tegra_sku_info tegra_sku_info; -- 2.7.4
[PATCH] soc/tegra: remove duplicated function declaration
The function tegra_read_chipid is declared twice in fuse.h. Remove the redundant declaration. Signed-off-by: Bo Yan --- include/soc/tegra/fuse.h | 1 - 1 file changed, 1 deletion(-) diff --git a/include/soc/tegra/fuse.h b/include/soc/tegra/fuse.h index 9b6ea0c..8fb2f8a 100644 --- a/include/soc/tegra/fuse.h +++ b/include/soc/tegra/fuse.h @@ -60,7 +60,6 @@ struct tegra_sku_info { u32 tegra_read_straps(void); u32 tegra_read_ram_code(void); -u32 tegra_read_chipid(void); int tegra_fuse_readl(unsigned long offset, u32 *value); extern struct tegra_sku_info tegra_sku_info; -- 2.7.4
Re: a question about IP checksum helper for arm64
Hi Robin, That UBSAN error prompted me to check the generated instructions. The error by itself doesn't make sense to me because there is no requirement for 128b alignment on ldp/stp. With 4.18-rc3, when I build for the default "defconfig" in arch/arm64/configs/, I see the disassembled code shows two ldr instead of a ldp. One example is in function "ip_rcv" (/net/ipv4/ip_input.c). After disassembling vmlinux, I see following: if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) 089bcdb8: 38776b05ldrbw5, [x24,x23] static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl) { __uint128_t tmp; u64 sum; tmp = *(const __uint128_t *)iph; 089bcdbc: f9400481ldr x1, [x4,#8] 089bcdc0: f8776b00ldr x0, [x24,x23] 089bcdc4: 12000ca5and w5, w5, #0xf 089bcdc8: 510014a3sub w3, w5, #0x5 This is done with "make ARCH=arm64 CROSS_COMPILE=... defconfig", so the default optimization level is -O2. I tried the same test as you did: aarch64-linux-gnu-objdump -S -d *.o in net/ipv4. The result is inconsistent. In some instances, I do see ldp instruction being generated, in some other cases, it's two ldr. For example, in inet_gro_receive and ip_mc_check_igmp, it's compiled as I expected. For ip_rcv, it's not. So it looks like this is not very consistent, but it also looks like in majority of cases it generates the ldp instructions. On 07/09/2018 04:54 AM, Robin Murphy wrote: Hi Bo, On 06/07/18 17:27, Bo Yan wrote: Hi Robin, Luke, Recently I bumped into an error when running GCC undefined behavior sanitizer: UBSAN: Undefined behaviour in kernel-4.9/arch/arm64/include/asm/checksum.h:34:6 load of misaligned address ffc198c8b254 for type 'const __int128 unsigned' which requires 16 byte alignment What's your config and reproducer here? I've had UBSan enabled a few times since that patch went in and never noticed anything. I've just tried it with 4.18-rc3 and indeed don't see anything from just booting the machine and making some network traffic. It does indeed fire if I also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a million other warnings for all manner of types - that's to be expected since, as the help text says, "Enabling this option on architectures that support unaligned accesses may produce a lot of false positives." The relevant code: tmp = *(const __uint128_t *)iph; iph += 16; ihl -= 4; tmp += ((tmp >> 64) | (tmp << 64)); sum = tmp >> 64; do { sum += *(const u32 *)iph; iph += 4; } while (--ihl); But, I checked the generated disassembly, it doesn't look like anything special is generated taking advantage of that. I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be emitted, but don't see it. My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I see LDP emitted as expected for most of the identifiable int128 accesses (both in a standalone test harness and a quick survey of kernel code via 'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well be places where the compiler gets clever enough to elide all or part of that load where data is already held in registers - I've not audited *that* closely - but the whole point of having a pure C implementation is that it can be aggressively inlined more than inline asm ever could. There were some prior discussions about GCC behavior, like this thread: https://patchwork.kernel.org/patch/9081911/ , in which you talked about the difference between GCC4 and GCC5.3. It looks to me this is regressed in Linaro GCC6.4 build. I have not checked newer GCC versions. Will it be more stable to just do this with inline assembly instead of relying on __uint128_t data type? GCC documentation says __int128 is supported for targets which have an integer mode wide enough to hold 128 bits. aarch64 doesn't have such an integer mode. Yet AArch64 GCC definitely does support __uint128_t, or this code wouldn't even build ;) Robin. Thanks Bo
Re: a question about IP checksum helper for arm64
Hi Robin, That UBSAN error prompted me to check the generated instructions. The error by itself doesn't make sense to me because there is no requirement for 128b alignment on ldp/stp. With 4.18-rc3, when I build for the default "defconfig" in arch/arm64/configs/, I see the disassembled code shows two ldr instead of a ldp. One example is in function "ip_rcv" (/net/ipv4/ip_input.c). After disassembling vmlinux, I see following: if (unlikely(ip_fast_csum((u8 *)iph, iph->ihl))) 089bcdb8: 38776b05ldrbw5, [x24,x23] static inline __sum16 ip_fast_csum(const void *iph, unsigned int ihl) { __uint128_t tmp; u64 sum; tmp = *(const __uint128_t *)iph; 089bcdbc: f9400481ldr x1, [x4,#8] 089bcdc0: f8776b00ldr x0, [x24,x23] 089bcdc4: 12000ca5and w5, w5, #0xf 089bcdc8: 510014a3sub w3, w5, #0x5 This is done with "make ARCH=arm64 CROSS_COMPILE=... defconfig", so the default optimization level is -O2. I tried the same test as you did: aarch64-linux-gnu-objdump -S -d *.o in net/ipv4. The result is inconsistent. In some instances, I do see ldp instruction being generated, in some other cases, it's two ldr. For example, in inet_gro_receive and ip_mc_check_igmp, it's compiled as I expected. For ip_rcv, it's not. So it looks like this is not very consistent, but it also looks like in majority of cases it generates the ldp instructions. On 07/09/2018 04:54 AM, Robin Murphy wrote: Hi Bo, On 06/07/18 17:27, Bo Yan wrote: Hi Robin, Luke, Recently I bumped into an error when running GCC undefined behavior sanitizer: UBSAN: Undefined behaviour in kernel-4.9/arch/arm64/include/asm/checksum.h:34:6 load of misaligned address ffc198c8b254 for type 'const __int128 unsigned' which requires 16 byte alignment What's your config and reproducer here? I've had UBSan enabled a few times since that patch went in and never noticed anything. I've just tried it with 4.18-rc3 and indeed don't see anything from just booting the machine and making some network traffic. It does indeed fire if I also turn on CONFIG_UBSAN_ALIGNMENT, but then it's almost lost among a million other warnings for all manner of types - that's to be expected since, as the help text says, "Enabling this option on architectures that support unaligned accesses may produce a lot of false positives." The relevant code: tmp = *(const __uint128_t *)iph; iph += 16; ihl -= 4; tmp += ((tmp >> 64) | (tmp << 64)); sum = tmp >> 64; do { sum += *(const u32 *)iph; iph += 4; } while (--ihl); But, I checked the generated disassembly, it doesn't look like anything special is generated taking advantage of that. I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be emitted, but don't see it. My regular toolchain is currently Linaro 7.2.1-2017.11, but I also tried the last GCC 6 I had installed (6.3.1-2017.05), and for both at -O2 I see LDP emitted as expected for most of the identifiable int128 accesses (both in a standalone test harness and a quick survey of kernel code via 'aarch64-linux-gnu-objdump -S net/ipv4/*.o'). Of course, there may well be places where the compiler gets clever enough to elide all or part of that load where data is already held in registers - I've not audited *that* closely - but the whole point of having a pure C implementation is that it can be aggressively inlined more than inline asm ever could. There were some prior discussions about GCC behavior, like this thread: https://patchwork.kernel.org/patch/9081911/ , in which you talked about the difference between GCC4 and GCC5.3. It looks to me this is regressed in Linaro GCC6.4 build. I have not checked newer GCC versions. Will it be more stable to just do this with inline assembly instead of relying on __uint128_t data type? GCC documentation says __int128 is supported for targets which have an integer mode wide enough to hold 128 bits. aarch64 doesn't have such an integer mode. Yet AArch64 GCC definitely does support __uint128_t, or this code wouldn't even build ;) Robin. Thanks Bo
a question about IP checksum helper for arm64
Hi Robin, Luke, Recently I bumped into an error when running GCC undefined behavior sanitizer: UBSAN: Undefined behaviour in kernel-4.9/arch/arm64/include/asm/checksum.h:34:6 load of misaligned address ffc198c8b254 for type 'const __int128 unsigned' which requires 16 byte alignment The relevant code: tmp = *(const __uint128_t *)iph; iph += 16; ihl -= 4; tmp += ((tmp >> 64) | (tmp << 64)); sum = tmp >> 64; do { sum += *(const u32 *)iph; iph += 4; } while (--ihl); But, I checked the generated disassembly, it doesn't look like anything special is generated taking advantage of that. I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be emitted, but don't see it. There were some prior discussions about GCC behavior, like this thread: https://patchwork.kernel.org/patch/9081911/ , in which you talked about the difference between GCC4 and GCC5.3. It looks to me this is regressed in Linaro GCC6.4 build. I have not checked newer GCC versions. Will it be more stable to just do this with inline assembly instead of relying on __uint128_t data type? GCC documentation says __int128 is supported for targets which have an integer mode wide enough to hold 128 bits. aarch64 doesn't have such an integer mode. Thanks Bo
a question about IP checksum helper for arm64
Hi Robin, Luke, Recently I bumped into an error when running GCC undefined behavior sanitizer: UBSAN: Undefined behaviour in kernel-4.9/arch/arm64/include/asm/checksum.h:34:6 load of misaligned address ffc198c8b254 for type 'const __int128 unsigned' which requires 16 byte alignment The relevant code: tmp = *(const __uint128_t *)iph; iph += 16; ihl -= 4; tmp += ((tmp >> 64) | (tmp << 64)); sum = tmp >> 64; do { sum += *(const u32 *)iph; iph += 4; } while (--ihl); But, I checked the generated disassembly, it doesn't look like anything special is generated taking advantage of that. I'm using Linaro GCC 6.4-2017.08, expecting ldp instructions to be emitted, but don't see it. There were some prior discussions about GCC behavior, like this thread: https://patchwork.kernel.org/patch/9081911/ , in which you talked about the difference between GCC4 and GCC5.3. It looks to me this is regressed in Linaro GCC6.4 build. I have not checked newer GCC versions. Will it be more stable to just do this with inline assembly instead of relying on __uint128_t data type? GCC documentation says __int128 is supported for targets which have an integer mode wide enough to hold 128 bits. aarch64 doesn't have such an integer mode. Thanks Bo
Re: [PATCH] irqchip/gic: check return value of of_address_to_resource
Marc, Sorry for the previous reply. My email settings were not correct, so it inserted those confidentiality text, which was not what I intended. This is what I think: diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0b60bb0 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,7 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + (void)of_address_to_resource(node, 1, _res); if (!is_hyp_mode_available()) return false; We are 100% sure of_address_to_resource will succeed in this particular case, so "(void)" will help suppress Coverity warning. On 07/05/2018 12:18 PM, Bo Yan wrote: Marc, I'm also wondering if of_address_to_resource can really fail in this particular case? What if we just explicitly discard the return value like this: (void)of_address_to_resource(node, 1, _res); This suppresses Coverity warning by explicitly stating we are 100% sure the function call will always return success. On 07/05/2018 12:13 PM, Marc Zyngier wrote: Hi Bo, On Thu, 5 Jul 2018 11:20:59 -0700 Bo Yan wrote: The of_address_to_resource returns 0 if successful. gic_check_eoimode calls it without checking the return value. This induces Coverity warning: "Unchecked return value". Return false from gic_check_eoimode if of_address_to_resource returns non-0 value. Signed-off-by: Bo Yan --- drivers/irqchip/irq-gic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0bceb10 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + if (of_address_to_resource(node, 1, _res)) + return false; We've just done an of_iomap() on this resource, which succeeded. How can the same thing now fail? It would mean that the device tree has been pulled from under our feet... And if it could happen, why is returning false the right thing to do? Why would we say we want EOImode==0 instead of 1? if (!is_hyp_mode_available()) return false; As it stands, I'm not taking such a patch. It either papers over a bigger problem, or just keeps a warning quiet for the sake of it. Thanks, M.
Re: [PATCH] irqchip/gic: check return value of of_address_to_resource
Marc, Sorry for the previous reply. My email settings were not correct, so it inserted those confidentiality text, which was not what I intended. This is what I think: diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0b60bb0 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,7 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + (void)of_address_to_resource(node, 1, _res); if (!is_hyp_mode_available()) return false; We are 100% sure of_address_to_resource will succeed in this particular case, so "(void)" will help suppress Coverity warning. On 07/05/2018 12:18 PM, Bo Yan wrote: Marc, I'm also wondering if of_address_to_resource can really fail in this particular case? What if we just explicitly discard the return value like this: (void)of_address_to_resource(node, 1, _res); This suppresses Coverity warning by explicitly stating we are 100% sure the function call will always return success. On 07/05/2018 12:13 PM, Marc Zyngier wrote: Hi Bo, On Thu, 5 Jul 2018 11:20:59 -0700 Bo Yan wrote: The of_address_to_resource returns 0 if successful. gic_check_eoimode calls it without checking the return value. This induces Coverity warning: "Unchecked return value". Return false from gic_check_eoimode if of_address_to_resource returns non-0 value. Signed-off-by: Bo Yan --- drivers/irqchip/irq-gic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0bceb10 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + if (of_address_to_resource(node, 1, _res)) + return false; We've just done an of_iomap() on this resource, which succeeded. How can the same thing now fail? It would mean that the device tree has been pulled from under our feet... And if it could happen, why is returning false the right thing to do? Why would we say we want EOImode==0 instead of 1? if (!is_hyp_mode_available()) return false; As it stands, I'm not taking such a patch. It either papers over a bigger problem, or just keeps a warning quiet for the sake of it. Thanks, M.
Re: [PATCH] irqchip/gic: check return value of of_address_to_resource
Marc, I'm also wondering if of_address_to_resource can really fail in this particular case? What if we just explicitly discard the return value like this: (void)of_address_to_resource(node, 1, _res); This suppresses Coverity warning by explicitly stating we are 100% sure the function call will always return success. On 07/05/2018 12:13 PM, Marc Zyngier wrote: Hi Bo, On Thu, 5 Jul 2018 11:20:59 -0700 Bo Yan wrote: The of_address_to_resource returns 0 if successful. gic_check_eoimode calls it without checking the return value. This induces Coverity warning: "Unchecked return value". Return false from gic_check_eoimode if of_address_to_resource returns non-0 value. Signed-off-by: Bo Yan --- drivers/irqchip/irq-gic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0bceb10 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + if (of_address_to_resource(node, 1, _res)) + return false; We've just done an of_iomap() on this resource, which succeeded. How can the same thing now fail? It would mean that the device tree has been pulled from under our feet... And if it could happen, why is returning false the right thing to do? Why would we say we want EOImode==0 instead of 1? if (!is_hyp_mode_available()) return false; As it stands, I'm not taking such a patch. It either papers over a bigger problem, or just keeps a warning quiet for the sake of it. Thanks, M.
Re: [PATCH] irqchip/gic: check return value of of_address_to_resource
Marc, I'm also wondering if of_address_to_resource can really fail in this particular case? What if we just explicitly discard the return value like this: (void)of_address_to_resource(node, 1, _res); This suppresses Coverity warning by explicitly stating we are 100% sure the function call will always return success. On 07/05/2018 12:13 PM, Marc Zyngier wrote: Hi Bo, On Thu, 5 Jul 2018 11:20:59 -0700 Bo Yan wrote: The of_address_to_resource returns 0 if successful. gic_check_eoimode calls it without checking the return value. This induces Coverity warning: "Unchecked return value". Return false from gic_check_eoimode if of_address_to_resource returns non-0 value. Signed-off-by: Bo Yan --- drivers/irqchip/irq-gic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0bceb10 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + if (of_address_to_resource(node, 1, _res)) + return false; We've just done an of_iomap() on this resource, which succeeded. How can the same thing now fail? It would mean that the device tree has been pulled from under our feet... And if it could happen, why is returning false the right thing to do? Why would we say we want EOImode==0 instead of 1? if (!is_hyp_mode_available()) return false; As it stands, I'm not taking such a patch. It either papers over a bigger problem, or just keeps a warning quiet for the sake of it. Thanks, M.
[PATCH] irqchip/gic: check return value of of_address_to_resource
The of_address_to_resource returns 0 if successful. gic_check_eoimode calls it without checking the return value. This induces Coverity warning: "Unchecked return value". Return false from gic_check_eoimode if of_address_to_resource returns non-0 value. Signed-off-by: Bo Yan --- drivers/irqchip/irq-gic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0bceb10 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + if (of_address_to_resource(node, 1, _res)) + return false; if (!is_hyp_mode_available()) return false; -- 2.7.4
[PATCH] irqchip/gic: check return value of of_address_to_resource
The of_address_to_resource returns 0 if successful. gic_check_eoimode calls it without checking the return value. This induces Coverity warning: "Unchecked return value". Return false from gic_check_eoimode if of_address_to_resource returns non-0 value. Signed-off-by: Bo Yan --- drivers/irqchip/irq-gic.c | 3 ++- 1 file changed, 2 insertions(+), 1 deletion(-) diff --git a/drivers/irqchip/irq-gic.c b/drivers/irqchip/irq-gic.c index ced10c4..0bceb10 100644 --- a/drivers/irqchip/irq-gic.c +++ b/drivers/irqchip/irq-gic.c @@ -1284,7 +1284,8 @@ static bool gic_check_eoimode(struct device_node *node, void __iomem **base) { struct resource cpuif_res; - of_address_to_resource(node, 1, _res); + if (of_address_to_resource(node, 1, _res)) + return false; if (!is_hyp_mode_available()) return false; -- 2.7.4
[PATCH] coresight: etm4x: fix bit shifting
ctxid_pid and vmid_val in config are of type u64. When an integer 0xFF is being left shifted more than 32 bits, the behavior is undefined. The fix is to specify 0xFF as an unsigned long. Detected by Coverity scan: CID 37650, 37651 (Bad bit shift operation) Signed-off-by: Bo Yan <b...@nvidia.com> --- drivers/hwtracing/coresight/coresight-etm4x-sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c index 4e6eab53e34e..d21961710713 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c @@ -1780,7 +1780,7 @@ static ssize_t ctxid_masks_store(struct device *dev, */ for (j = 0; j < 8; j++) { if (maskbyte & 1) - config->ctxid_pid[i] &= ~(0xFF << (j * 8)); + config->ctxid_pid[i] &= ~(0xFFUL << (j * 8)); maskbyte >>= 1; } /* Select the next ctxid comparator mask value */ @@ -1963,7 +1963,7 @@ static ssize_t vmid_masks_store(struct device *dev, */ for (j = 0; j < 8; j++) { if (maskbyte & 1) - config->vmid_val[i] &= ~(0xFF << (j * 8)); + config->vmid_val[i] &= ~(0xFFUL << (j * 8)); maskbyte >>= 1; } /* Select the next vmid comparator mask value */ -- 2.7.4
[PATCH] coresight: etm4x: fix bit shifting
ctxid_pid and vmid_val in config are of type u64. When an integer 0xFF is being left shifted more than 32 bits, the behavior is undefined. The fix is to specify 0xFF as an unsigned long. Detected by Coverity scan: CID 37650, 37651 (Bad bit shift operation) Signed-off-by: Bo Yan --- drivers/hwtracing/coresight/coresight-etm4x-sysfs.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c index 4e6eab53e34e..d21961710713 100644 --- a/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c +++ b/drivers/hwtracing/coresight/coresight-etm4x-sysfs.c @@ -1780,7 +1780,7 @@ static ssize_t ctxid_masks_store(struct device *dev, */ for (j = 0; j < 8; j++) { if (maskbyte & 1) - config->ctxid_pid[i] &= ~(0xFF << (j * 8)); + config->ctxid_pid[i] &= ~(0xFFUL << (j * 8)); maskbyte >>= 1; } /* Select the next ctxid comparator mask value */ @@ -1963,7 +1963,7 @@ static ssize_t vmid_masks_store(struct device *dev, */ for (j = 0; j < 8; j++) { if (maskbyte & 1) - config->vmid_val[i] &= ~(0xFF << (j * 8)); + config->vmid_val[i] &= ~(0xFFUL << (j * 8)); maskbyte >>= 1; } /* Select the next vmid comparator mask value */ -- 2.7.4
Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
On 02/02/2018 11:34 AM, Saravana Kannan wrote: On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote: On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote: On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote: On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote: drivers/cpufreq/cpufreq.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 41d148af7748..95b1c4afe14e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1680,6 +1680,10 @@ void cpufreq_resume(void) if (!cpufreq_driver) return; + if (unlikely(!cpufreq_suspended)) { + pr_warn("%s: resume after failing suspend\n", __func__); + return; + } cpufreq_suspended = false; if (!has_target() && !cpufreq_driver->resume) Good catch, but rather than doing this it would be better to avoid calling cpufreq_resume() at all if cpufreq_suspend() has not been called. Yes, I thought about that, but there is no good way to skip over it without introducing another flag. cpufreq_resume is called by dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure case, dpm_resume is called, but dpm_suspend is not. So on a higher level it's already unbalanced. One possibility is to rely on the pm_transition flag. So something like: diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index dc259d20c967..8469e6fc2b2c 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie) void dpm_resume(pm_message_t state) { struct device *dev; + bool suspended = (pm_transition.event != PM_EVENT_ON); ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume"), state.event, true); @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state) async_synchronize_full(); dpm_show_time(starttime, state, NULL); - cpufreq_resume(); + if (likely(suspended)) + cpufreq_resume(); trace_suspend_resume(TPS("dpm_resume"), state.event, false); } I was thinking about something else. Anyway, I think your original patch is OK too, but without printing the message. Just combine the cpufreq_suspended check with the cpufreq_driver one and the unlikely() thing is not necessary. I rather have this fixed in the dpm_suspend/resume() code. This is just masking the first issue that's being caused by unbalanced error handling. If that means adding flags in dpm_suspend/resume() then that's what we should do right now and clean it up later if it can be improved. Making cpufreq more messy doesn't seem like the right answer. Thanks, Saravana dpm_suspend and dpm_resume by themselves are not balanced in this particular case. As it's currently structured, dpm_resume can't be omitted even if dpm_suspend is skipped due to earlier failure. I think checking cpufreq_suspended flag is a reasonable compromise. If we can find a way to make dpm_suspend/dpm_resume also balanced, that will be best.
Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
On 02/02/2018 11:34 AM, Saravana Kannan wrote: On 02/02/2018 03:54 AM, Rafael J. Wysocki wrote: On Wednesday, January 24, 2018 9:53:14 PM CET Bo Yan wrote: On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote: On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote: drivers/cpufreq/cpufreq.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 41d148af7748..95b1c4afe14e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1680,6 +1680,10 @@ void cpufreq_resume(void) if (!cpufreq_driver) return; + if (unlikely(!cpufreq_suspended)) { + pr_warn("%s: resume after failing suspend\n", __func__); + return; + } cpufreq_suspended = false; if (!has_target() && !cpufreq_driver->resume) Good catch, but rather than doing this it would be better to avoid calling cpufreq_resume() at all if cpufreq_suspend() has not been called. Yes, I thought about that, but there is no good way to skip over it without introducing another flag. cpufreq_resume is called by dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure case, dpm_resume is called, but dpm_suspend is not. So on a higher level it's already unbalanced. One possibility is to rely on the pm_transition flag. So something like: diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index dc259d20c967..8469e6fc2b2c 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie) void dpm_resume(pm_message_t state) { struct device *dev; + bool suspended = (pm_transition.event != PM_EVENT_ON); ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume"), state.event, true); @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state) async_synchronize_full(); dpm_show_time(starttime, state, NULL); - cpufreq_resume(); + if (likely(suspended)) + cpufreq_resume(); trace_suspend_resume(TPS("dpm_resume"), state.event, false); } I was thinking about something else. Anyway, I think your original patch is OK too, but without printing the message. Just combine the cpufreq_suspended check with the cpufreq_driver one and the unlikely() thing is not necessary. I rather have this fixed in the dpm_suspend/resume() code. This is just masking the first issue that's being caused by unbalanced error handling. If that means adding flags in dpm_suspend/resume() then that's what we should do right now and clean it up later if it can be improved. Making cpufreq more messy doesn't seem like the right answer. Thanks, Saravana dpm_suspend and dpm_resume by themselves are not balanced in this particular case. As it's currently structured, dpm_resume can't be omitted even if dpm_suspend is skipped due to earlier failure. I think checking cpufreq_suspended flag is a reasonable compromise. If we can find a way to make dpm_suspend/dpm_resume also balanced, that will be best.
[PATCH v2] cpufreq: skip cpufreq resume if it's not suspended
cpufreq_resume can be called even without preceding cpufreq_suspend. This can happen in following scenario: suspend_devices_and_enter --> dpm_suspend_start --> dpm_prepare --> device_prepare : this function errors out --> dpm_suspend: this is skipped due to dpm_prepare failure this means cpufreq_suspend is skipped over --> goto Recover_platform, due to previous error --> goto Resume_devices --> dpm_resume_end --> dpm_resume --> cpufreq_resume In case schedutil is used as frequency governor, cpufreq_resume will eventually call sugov_start, which does following: memset(sg_cpu, 0, sizeof(*sg_cpu)); This effectively erases function pointer for frequency update, causing crash later on. The function pointer would have been set correctly if subsequent cpufreq_add_update_util_hook runs successfully, but that function returns earlier because cpufreq_suspend was not called: if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu))) return; Ideally, suspend should succeed, then things will be fine. But even in case of suspend failure, system should not crash. The fix is to check the pm_transition status in dpm_resume. if pm_transition.event == PMSG_ON, we know for sure dpm_suspend has not been called, so do not call cpufreq_resume. Signed-off-by: Bo Yan <b...@nvidia.com> --- drivers/base/power/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 08744b572af6..39829d7a9311 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -921,6 +921,7 @@ static void async_resume(void *data, async_cookie_t cookie) void dpm_resume(pm_message_t state) { struct device *dev; + bool suspended = (pm_transition.event != PM_EVENT_ON); ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume"), state.event, true); @@ -964,7 +965,8 @@ void dpm_resume(pm_message_t state) async_synchronize_full(); dpm_show_time(starttime, state, 0, NULL); - cpufreq_resume(); + if (likely(suspended)) + cpufreq_resume(); trace_suspend_resume(TPS("dpm_resume"), state.event, false); } -- 2.7.4
[PATCH v2] cpufreq: skip cpufreq resume if it's not suspended
cpufreq_resume can be called even without preceding cpufreq_suspend. This can happen in following scenario: suspend_devices_and_enter --> dpm_suspend_start --> dpm_prepare --> device_prepare : this function errors out --> dpm_suspend: this is skipped due to dpm_prepare failure this means cpufreq_suspend is skipped over --> goto Recover_platform, due to previous error --> goto Resume_devices --> dpm_resume_end --> dpm_resume --> cpufreq_resume In case schedutil is used as frequency governor, cpufreq_resume will eventually call sugov_start, which does following: memset(sg_cpu, 0, sizeof(*sg_cpu)); This effectively erases function pointer for frequency update, causing crash later on. The function pointer would have been set correctly if subsequent cpufreq_add_update_util_hook runs successfully, but that function returns earlier because cpufreq_suspend was not called: if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu))) return; Ideally, suspend should succeed, then things will be fine. But even in case of suspend failure, system should not crash. The fix is to check the pm_transition status in dpm_resume. if pm_transition.event == PMSG_ON, we know for sure dpm_suspend has not been called, so do not call cpufreq_resume. Signed-off-by: Bo Yan --- drivers/base/power/main.c | 4 +++- 1 file changed, 3 insertions(+), 1 deletion(-) diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index 08744b572af6..39829d7a9311 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -921,6 +921,7 @@ static void async_resume(void *data, async_cookie_t cookie) void dpm_resume(pm_message_t state) { struct device *dev; + bool suspended = (pm_transition.event != PM_EVENT_ON); ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume"), state.event, true); @@ -964,7 +965,8 @@ void dpm_resume(pm_message_t state) async_synchronize_full(); dpm_show_time(starttime, state, 0, NULL); - cpufreq_resume(); + if (likely(suspended)) + cpufreq_resume(); trace_suspend_resume(TPS("dpm_resume"), state.event, false); } -- 2.7.4
Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote: On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote: drivers/cpufreq/cpufreq.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 41d148af7748..95b1c4afe14e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1680,6 +1680,10 @@ void cpufreq_resume(void) if (!cpufreq_driver) return; + if (unlikely(!cpufreq_suspended)) { + pr_warn("%s: resume after failing suspend\n", __func__); + return; + } cpufreq_suspended = false; if (!has_target() && !cpufreq_driver->resume) Good catch, but rather than doing this it would be better to avoid calling cpufreq_resume() at all if cpufreq_suspend() has not been called. Yes, I thought about that, but there is no good way to skip over it without introducing another flag. cpufreq_resume is called by dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure case, dpm_resume is called, but dpm_suspend is not. So on a higher level it's already unbalanced. One possibility is to rely on the pm_transition flag. So something like: diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index dc259d20c967..8469e6fc2b2c 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie) void dpm_resume(pm_message_t state) { struct device *dev; + bool suspended = (pm_transition.event != PM_EVENT_ON); ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume"), state.event, true); @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state) async_synchronize_full(); dpm_show_time(starttime, state, NULL); - cpufreq_resume(); + if (likely(suspended)) + cpufreq_resume(); trace_suspend_resume(TPS("dpm_resume"), state.event, false); } This relies on the fact that the pm_transition will stay as PMSG_ON if dpm_prepare failed, in which case dpm_suspend will be skipped over, pm_transition will remain as 0 until dpm_resume. dpm_suspend changes pm_transition to whatever state it receives, which is never PMSG_ON. pm_transition is not changing to PMSG_ON before dpm_resume. This is my understanding. does this make sense? Thanks, Rafael
Re: [PATCH] cpufreq: skip cpufreq resume if it's not suspended
On 01/23/2018 06:02 PM, Rafael J. Wysocki wrote: On Tuesday, January 23, 2018 10:57:55 PM CET Bo Yan wrote: drivers/cpufreq/cpufreq.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 41d148af7748..95b1c4afe14e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1680,6 +1680,10 @@ void cpufreq_resume(void) if (!cpufreq_driver) return; + if (unlikely(!cpufreq_suspended)) { + pr_warn("%s: resume after failing suspend\n", __func__); + return; + } cpufreq_suspended = false; if (!has_target() && !cpufreq_driver->resume) Good catch, but rather than doing this it would be better to avoid calling cpufreq_resume() at all if cpufreq_suspend() has not been called. Yes, I thought about that, but there is no good way to skip over it without introducing another flag. cpufreq_resume is called by dpm_resume, cpufreq_suspend is called by dpm_suspend. In the failure case, dpm_resume is called, but dpm_suspend is not. So on a higher level it's already unbalanced. One possibility is to rely on the pm_transition flag. So something like: diff --git a/drivers/base/power/main.c b/drivers/base/power/main.c index dc259d20c967..8469e6fc2b2c 100644 --- a/drivers/base/power/main.c +++ b/drivers/base/power/main.c @@ -842,6 +842,7 @@ static void async_resume(void *data, async_cookie_t cookie) void dpm_resume(pm_message_t state) { struct device *dev; + bool suspended = (pm_transition.event != PM_EVENT_ON); ktime_t starttime = ktime_get(); trace_suspend_resume(TPS("dpm_resume"), state.event, true); @@ -885,7 +886,8 @@ void dpm_resume(pm_message_t state) async_synchronize_full(); dpm_show_time(starttime, state, NULL); - cpufreq_resume(); + if (likely(suspended)) + cpufreq_resume(); trace_suspend_resume(TPS("dpm_resume"), state.event, false); } This relies on the fact that the pm_transition will stay as PMSG_ON if dpm_prepare failed, in which case dpm_suspend will be skipped over, pm_transition will remain as 0 until dpm_resume. dpm_suspend changes pm_transition to whatever state it receives, which is never PMSG_ON. pm_transition is not changing to PMSG_ON before dpm_resume. This is my understanding. does this make sense? Thanks, Rafael
[PATCH] cpufreq: skip cpufreq resume if it's not suspended
cpufreq_resume can be called even without preceding cpufreq_suspend. This can happen in following scenario: suspend_devices_and_enter --> dpm_suspend_start --> dpm_prepare --> device_prepare : this function errors out --> dpm_suspend: this is skipped due to dpm_prepare failure this means cpufreq_suspend is skipped over --> goto Recover_platform, due to previous error --> goto Resume_devices --> dpm_resume_end --> dpm_resume --> cpufreq_resume In case schedutil is used as frequency governor, cpufreq_resume will eventually call sugov_start, which does following: memset(sg_cpu, 0, sizeof(*sg_cpu)); This effectively erases function pointer for frequency update, causing crash later on. The function pointer would have been set correctly if subsequent cpufreq_add_update_util_hook runs successfully, but that function returns earlier because cpufreq_suspend was not called: if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu))) return; Ideally, suspend should succeed, then things will be fine. But even in case of suspend failure, system should not crash. The fix is to check cpufreq_suspended first, if it's false, that means cpufreq_suspend was not called in the first place, so do not resume cpufreq. Signed-off-by: Bo Yan <b...@nvidia.com> --- drivers/cpufreq/cpufreq.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 41d148af7748..95b1c4afe14e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1680,6 +1680,10 @@ void cpufreq_resume(void) if (!cpufreq_driver) return; + if (unlikely(!cpufreq_suspended)) { + pr_warn("%s: resume after failing suspend\n", __func__); + return; + } cpufreq_suspended = false; if (!has_target() && !cpufreq_driver->resume) -- 2.7.4
[PATCH] cpufreq: skip cpufreq resume if it's not suspended
cpufreq_resume can be called even without preceding cpufreq_suspend. This can happen in following scenario: suspend_devices_and_enter --> dpm_suspend_start --> dpm_prepare --> device_prepare : this function errors out --> dpm_suspend: this is skipped due to dpm_prepare failure this means cpufreq_suspend is skipped over --> goto Recover_platform, due to previous error --> goto Resume_devices --> dpm_resume_end --> dpm_resume --> cpufreq_resume In case schedutil is used as frequency governor, cpufreq_resume will eventually call sugov_start, which does following: memset(sg_cpu, 0, sizeof(*sg_cpu)); This effectively erases function pointer for frequency update, causing crash later on. The function pointer would have been set correctly if subsequent cpufreq_add_update_util_hook runs successfully, but that function returns earlier because cpufreq_suspend was not called: if (WARN_ON(per_cpu(cpufreq_update_util_data, cpu))) return; Ideally, suspend should succeed, then things will be fine. But even in case of suspend failure, system should not crash. The fix is to check cpufreq_suspended first, if it's false, that means cpufreq_suspend was not called in the first place, so do not resume cpufreq. Signed-off-by: Bo Yan --- drivers/cpufreq/cpufreq.c | 4 1 file changed, 4 insertions(+) diff --git a/drivers/cpufreq/cpufreq.c b/drivers/cpufreq/cpufreq.c index 41d148af7748..95b1c4afe14e 100644 --- a/drivers/cpufreq/cpufreq.c +++ b/drivers/cpufreq/cpufreq.c @@ -1680,6 +1680,10 @@ void cpufreq_resume(void) if (!cpufreq_driver) return; + if (unlikely(!cpufreq_suspended)) { + pr_warn("%s: resume after failing suspend\n", __func__); + return; + } cpufreq_suspended = false; if (!has_target() && !cpufreq_driver->resume) -- 2.7.4
Re: [PATCH] tracing: erase irqsoff trace with empty write
Steven, Thanks for review. I have posted the following update https://lkml.org/lkml/2017/9/18/715 Bo On 09/18/2017 07:38 AM, Steven Rostedt wrote: On Mon, 11 Sep 2017 11:16:35 -0700 Bo Yan <b...@nvidia.com> wrote: One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Hi Bo, I have no problems with the logic of the patch. But just one nit. The tr->trace_buffer has been called "trace_buf" elsewhere in the file. Can you resend with using the variable name "trace_buf" instead of "tr_buf" just to keep it consistent in the file. It's also been called 'buf' but trace_buf would probably be more appropriate. Thanks! -- Steve Signed-off-by: Bo Yan <b...@nvidia.com> --- kernel/trace/trace.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5360b7aec57a..1f91bdd9365b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct file *file) /* If this file was open for write, then erase contents */ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { int cpu = tracing_get_cpu(inode); + struct trace_buffer *tr_buf = >trace_buffer; + +#ifdef CONFIG_TRACER_MAX_TRACE + if (tr->current_trace->print_max) + tr_buf = >max_buffer; +#endif if (cpu == RING_BUFFER_ALL_CPUS) - tracing_reset_online_cpus(>trace_buffer); + tracing_reset_online_cpus(tr_buf); else - tracing_reset(>trace_buffer, cpu); + tracing_reset(tr_buf, cpu); } if (file->f_mode & FMODE_READ) {
Re: [PATCH] tracing: erase irqsoff trace with empty write
Steven, Thanks for review. I have posted the following update https://lkml.org/lkml/2017/9/18/715 Bo On 09/18/2017 07:38 AM, Steven Rostedt wrote: On Mon, 11 Sep 2017 11:16:35 -0700 Bo Yan wrote: One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Hi Bo, I have no problems with the logic of the patch. But just one nit. The tr->trace_buffer has been called "trace_buf" elsewhere in the file. Can you resend with using the variable name "trace_buf" instead of "tr_buf" just to keep it consistent in the file. It's also been called 'buf' but trace_buf would probably be more appropriate. Thanks! -- Steve Signed-off-by: Bo Yan --- kernel/trace/trace.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5360b7aec57a..1f91bdd9365b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct file *file) /* If this file was open for write, then erase contents */ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { int cpu = tracing_get_cpu(inode); + struct trace_buffer *tr_buf = >trace_buffer; + +#ifdef CONFIG_TRACER_MAX_TRACE + if (tr->current_trace->print_max) + tr_buf = >max_buffer; +#endif if (cpu == RING_BUFFER_ALL_CPUS) - tracing_reset_online_cpus(>trace_buffer); + tracing_reset_online_cpus(tr_buf); else - tracing_reset(>trace_buffer, cpu); + tracing_reset(tr_buf, cpu); } if (file->f_mode & FMODE_READ) {
Re: [PATCH] cpufreq: cpufreq_stats: make last_index signed int
On 09/17/2017 06:50 PM, Viresh Kumar wrote: On 15-09-17, 13:13, Bo Yan wrote: It is possible for last_index to get a -1 if current frequency is not found in the freq table when stats is created. If the function "cpufreq_stats_update" is called before last_index is updated with a valid value, the "-1" will be used as index to update stats->time_in_state, triggering an exception. No, that's not how it works AFAIK and if it did, then can you explain how your solution fixes it? AFAIK, what happens right now is that stats->last_index eventually stores 2147483647 (uint max) and the exception comes while accessing that value. While with your change, it will become -1 and accessing array[-1] is fine by C standards, though it is still the wrong thing to do as you are accessing something outside of the array. We should just check last_index == -1 before calling cpufreq_stats_update(), which is already done by one of the callers. Currently, the "last_index" is being checked before cpufreq_stats_update(stats) inside function "cpufreq_stats_record_transition", so it's taken care of. However, the function "show_time_in_state" also calls cpufreq_stats_update, the similar check should be done there too, like this: diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index e75880eb037d..15305b5ec322 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -62,7 +62,8 @@ static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf) if (policy->fast_switch_enabled) return 0; - cpufreq_stats_update(stats); + if ((int)stats->last_index >= 0) + cpufreq_stats_update(stats); for (i = 0; i < stats->state_num; i++) { len += sprintf(buf + len, "%u %llu\n", stats->freq_table[i], (unsigned long long) This is only needed when policy->cur is not in frequency table when stats table is created, in which case, stats->last_index will get -1, then user does a "cat time_in_state" before any frequency transition. Does this make sense?
Re: [PATCH] cpufreq: cpufreq_stats: make last_index signed int
On 09/17/2017 06:50 PM, Viresh Kumar wrote: On 15-09-17, 13:13, Bo Yan wrote: It is possible for last_index to get a -1 if current frequency is not found in the freq table when stats is created. If the function "cpufreq_stats_update" is called before last_index is updated with a valid value, the "-1" will be used as index to update stats->time_in_state, triggering an exception. No, that's not how it works AFAIK and if it did, then can you explain how your solution fixes it? AFAIK, what happens right now is that stats->last_index eventually stores 2147483647 (uint max) and the exception comes while accessing that value. While with your change, it will become -1 and accessing array[-1] is fine by C standards, though it is still the wrong thing to do as you are accessing something outside of the array. We should just check last_index == -1 before calling cpufreq_stats_update(), which is already done by one of the callers. Currently, the "last_index" is being checked before cpufreq_stats_update(stats) inside function "cpufreq_stats_record_transition", so it's taken care of. However, the function "show_time_in_state" also calls cpufreq_stats_update, the similar check should be done there too, like this: diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index e75880eb037d..15305b5ec322 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -62,7 +62,8 @@ static ssize_t show_time_in_state(struct cpufreq_policy *policy, char *buf) if (policy->fast_switch_enabled) return 0; - cpufreq_stats_update(stats); + if ((int)stats->last_index >= 0) + cpufreq_stats_update(stats); for (i = 0; i < stats->state_num; i++) { len += sprintf(buf + len, "%u %llu\n", stats->freq_table[i], (unsigned long long) This is only needed when policy->cur is not in frequency table when stats table is created, in which case, stats->last_index will get -1, then user does a "cat time_in_state" before any frequency transition. Does this make sense?
[PATCH V2] tracing: erase irqsoff trace with empty write
One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Signed-off-by: Bo Yan <b...@nvidia.com> --- Changes in v2: rename tr_buf to trace_buf kernel/trace/trace.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5360b7aec57a..a7fb136da891 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct file *file) /* If this file was open for write, then erase contents */ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { int cpu = tracing_get_cpu(inode); + struct trace_buffer *trace_buf = >trace_buffer; + +#ifdef CONFIG_TRACER_MAX_TRACE + if (tr->current_trace->print_max) + trace_buf = >max_buffer; +#endif if (cpu == RING_BUFFER_ALL_CPUS) - tracing_reset_online_cpus(>trace_buffer); + tracing_reset_online_cpus(trace_buf); else - tracing_reset(>trace_buffer, cpu); + tracing_reset(trace_buf, cpu); } if (file->f_mode & FMODE_READ) { -- 2.7.4
[PATCH V2] tracing: erase irqsoff trace with empty write
One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Signed-off-by: Bo Yan --- Changes in v2: rename tr_buf to trace_buf kernel/trace/trace.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5360b7aec57a..a7fb136da891 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct file *file) /* If this file was open for write, then erase contents */ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { int cpu = tracing_get_cpu(inode); + struct trace_buffer *trace_buf = >trace_buffer; + +#ifdef CONFIG_TRACER_MAX_TRACE + if (tr->current_trace->print_max) + trace_buf = >max_buffer; +#endif if (cpu == RING_BUFFER_ALL_CPUS) - tracing_reset_online_cpus(>trace_buffer); + tracing_reset_online_cpus(trace_buf); else - tracing_reset(>trace_buffer, cpu); + tracing_reset(trace_buf, cpu); } if (file->f_mode & FMODE_READ) { -- 2.7.4
[PATCH] cpufreq: cpufreq_stats: make last_index signed int
It is possible for last_index to get a -1 if current frequency is not found in the freq table when stats is created. If the function "cpufreq_stats_update" is called before last_index is updated with a valid value, the "-1" will be used as index to update stats->time_in_state, triggering an exception. Change the type of last_index of cpufreq_stats from unsigned int to signed int. Move last_time to location right before time_in_state. This helps save 4 bytes in a LP64 programming model. Signed-off-by: Bo Yan <b...@nvidia.com> --- drivers/cpufreq/cpufreq_stats.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index e75880eb037d..a6838f5a2004 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -18,10 +18,10 @@ static DEFINE_SPINLOCK(cpufreq_stats_lock); struct cpufreq_stats { unsigned int total_trans; - unsigned long long last_time; unsigned int max_state; unsigned int state_num; - unsigned int last_index; + int last_index; + unsigned long long last_time; u64 *time_in_state; unsigned int *freq_table; unsigned int *trans_table; -- 2.7.4
[PATCH] cpufreq: cpufreq_stats: make last_index signed int
It is possible for last_index to get a -1 if current frequency is not found in the freq table when stats is created. If the function "cpufreq_stats_update" is called before last_index is updated with a valid value, the "-1" will be used as index to update stats->time_in_state, triggering an exception. Change the type of last_index of cpufreq_stats from unsigned int to signed int. Move last_time to location right before time_in_state. This helps save 4 bytes in a LP64 programming model. Signed-off-by: Bo Yan --- drivers/cpufreq/cpufreq_stats.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/drivers/cpufreq/cpufreq_stats.c b/drivers/cpufreq/cpufreq_stats.c index e75880eb037d..a6838f5a2004 100644 --- a/drivers/cpufreq/cpufreq_stats.c +++ b/drivers/cpufreq/cpufreq_stats.c @@ -18,10 +18,10 @@ static DEFINE_SPINLOCK(cpufreq_stats_lock); struct cpufreq_stats { unsigned int total_trans; - unsigned long long last_time; unsigned int max_state; unsigned int state_num; - unsigned int last_index; + int last_index; + unsigned long long last_time; u64 *time_in_state; unsigned int *freq_table; unsigned int *trans_table; -- 2.7.4
[PATCH] tracing: erase irqsoff trace with empty write
One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Signed-off-by: Bo Yan <b...@nvidia.com> --- kernel/trace/trace.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5360b7aec57a..1f91bdd9365b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct file *file) /* If this file was open for write, then erase contents */ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { int cpu = tracing_get_cpu(inode); + struct trace_buffer *tr_buf = >trace_buffer; + +#ifdef CONFIG_TRACER_MAX_TRACE + if (tr->current_trace->print_max) + tr_buf = >max_buffer; +#endif if (cpu == RING_BUFFER_ALL_CPUS) - tracing_reset_online_cpus(>trace_buffer); + tracing_reset_online_cpus(tr_buf); else - tracing_reset(>trace_buffer, cpu); + tracing_reset(tr_buf, cpu); } if (file->f_mode & FMODE_READ) { -- 2.7.4
[PATCH] tracing: erase irqsoff trace with empty write
One convenient way to erase trace is "echo > trace". However, this is currently broken if the current tracer is irqsoff tracer. This is because irqsoff tracer use max_buffer as the default trace buffer. Set the max_buffer as the one to be cleared when it's the trace buffer currently in use. Signed-off-by: Bo Yan --- kernel/trace/trace.c | 10 -- 1 file changed, 8 insertions(+), 2 deletions(-) diff --git a/kernel/trace/trace.c b/kernel/trace/trace.c index 5360b7aec57a..1f91bdd9365b 100644 --- a/kernel/trace/trace.c +++ b/kernel/trace/trace.c @@ -4020,11 +4020,17 @@ static int tracing_open(struct inode *inode, struct file *file) /* If this file was open for write, then erase contents */ if ((file->f_mode & FMODE_WRITE) && (file->f_flags & O_TRUNC)) { int cpu = tracing_get_cpu(inode); + struct trace_buffer *tr_buf = >trace_buffer; + +#ifdef CONFIG_TRACER_MAX_TRACE + if (tr->current_trace->print_max) + tr_buf = >max_buffer; +#endif if (cpu == RING_BUFFER_ALL_CPUS) - tracing_reset_online_cpus(>trace_buffer); + tracing_reset_online_cpus(tr_buf); else - tracing_reset(>trace_buffer, cpu); + tracing_reset(tr_buf, cpu); } if (file->f_mode & FMODE_READ) { -- 2.7.4