Re: MFC: different h264 profile and level output the same size encoded result
On 08/29/2016 01:49 PM, Andrzej Hajda wrote:
> Hi,
>
> On 08/27/2016 11:55 AM, Randy Li wrote:
>> Hi:
>>
>> It has been reported to me that setting the profile, level and bitrate
>> through the v4l2 extra controls does not make the encoded result
>> different. I tried it recently, and it is true. Although the h264 parser
>> tells me that a different h264 profile and level have been applied,
>> the size is the same.
>>
>> You may try this in GStreamer.
>>
>> gst-launch-1.0 -v \
>> videotestsrc num-buffers=500 ! video/x-raw, width=1920,height=1080 ! \
>> videoconvert ! \
>> v4l2video4h264enc extra-controls="controls,h264_profile=1,video_bitrate=100;" ! \
>> h264parse ! matroskamux ! filesink location=/tmp/1.mkv
>>
>> Is there any way to reduce the size of MFC encoded data?
>
> There is a control called rc_enable (rate control enable); it must be set
> to one if you want to control the bitrate.
> This control confuses many users. I guess it cannot be removed, as it is
> already part of the UAPI, but enabling it internally in the driver when
> the user sets bitrate, profile, etc. would be saner.

I see, thank you so much. Someone told me that adding
"frame_level_rate_control_enable=1", as in
extra-controls="encode,h264_level=10,h264_profile=4,frame_level_rate_control_enable=1,video_bitrate=2097152"
would also make it work. But now I really know there is a switch that
needs to be turned on.

> Regards
> Andrzej

--
Randy Li
The third produce department
Re: MFC: different h264 profile and level output the same size encoded result
Hi,

On 08/27/2016 11:55 AM, Randy Li wrote:
> Hi:
>
> It has been reported to me that setting the profile, level and bitrate
> through the v4l2 extra controls does not make the encoded result
> different. I tried it recently, and it is true. Although the h264 parser
> tells me that a different h264 profile and level have been applied,
> the size is the same.
>
> You may try this in GStreamer.
>
> gst-launch-1.0 -v \
> videotestsrc num-buffers=500 ! video/x-raw, width=1920,height=1080 ! \
> videoconvert ! \
> v4l2video4h264enc extra-controls="controls,h264_profile=1,video_bitrate=100;" ! \
> h264parse ! matroskamux ! filesink location=/tmp/1.mkv
>
> Is there any way to reduce the size of MFC encoded data?
>

There is a control called rc_enable (rate control enable); it must be set
to one if you want to control the bitrate.
This control confuses many users. I guess it cannot be removed, as it is
already part of the UAPI, but enabling it internally in the driver when
the user sets bitrate, profile, etc. would be saner.

Regards
Andrzej
[GIT] Networking
1) Segregate namespaces properly in conntrack dumps, from Liping Zhang.

2) tcp listener refcount fix in netfilter tproxy, from Eric Dumazet.

3) Fix timeouts in qed driver due to xmit_more, from Yuval Mintz.

4) Fix use-after-free in tcp_xmit_retransmit_queue().

5) Userspace header fixups (use of __u32, missing includes, etc.) from
   Mikko Rapeli.

6) Further refinements to fragmentation wrt. gso and tunnels, from
   Shmulik Ladkani.

7) Trigger poll correctly for zero length UDP packets, from Eric Dumazet.

8) TCP window scaling fix, also from Eric Dumazet.

9) SLAB_DESTROY_BY_RCU is not relevant any more for UDP sockets.

10) Module refcount leak in qdisc_create_dflt(), from Eric Dumazet.

11) Fix deadlock in cp_rx_poll() of 8139cp driver, from Gao Feng.

12) Memory leak in rhashtable's alloc_bucket_locks(), from Eric Dumazet.

13) Add new device ID to alx driver, from Owen Lin.

Please pull, thanks a lot!

The following changes since commit 184ca823481c99dadd7d946e5afd4bb921eab30d:

  Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/net (2016-08-17 17:26:58 -0700)

are available in the git repository at:

  git://git.kernel.org/pub/scm/linux/kernel/git/davem/net.git

for you to fetch changes up to b99b43bb4bdf1d361f7487cf03d803082bbf9101:

  Add Killer E2500 device ID in alx driver. (2016-08-29 00:23:50 -0400)

Alexander Duyck (1):
      ixgbe: Do not clear RAR entry when clearing VMDq for SAN MAC

Amir Vadai (1):
      net/mlx5: Update last-use statistics for flow rules

Andrew Rybchenko (1):
      sfc: fix potential stack corruption from running past stat bitmask

Anjali Singhai Jain (1):
      i40e: Change some init flow for the client

Colin Ian King (2):
      net: tehuti: fix typo: "eneble" -> "enable"
      net: hns: dereference ppe_cb->ppe_common_cb if it is non-null

Daniel Borkmann (1):
      Bluetooth: split sk_filter in l2cap_sock_recv_cb

Daniel Romell (1):
      net: xilinx: emaclite: Fallback to random MAC address.

David Ahern (1):
      net: diag: Fix refcnt leak in error path destroying socket

David Daney (1):
      net: thunderx: Fix OOPs with ethtool --register-dump

David S. Miller (5):
      Merge git://git.kernel.org/.../pablo/nf
      Merge branch 'kaweth-oopses'
      Merge branch 'mlx5-fixes'
      Merge branch 'for-upstream' of git://git.kernel.org/.../bluetooth/bluetooth
      Merge branch 'mlx5-series'

Eran Ben Elisha (2):
      net/mlx5e: Fix ethtool -g/G rx ring parameter report with striding RQ
      net/mlx5: Add error prints when validate ETS failed

Eric Dumazet (7):
      netfilter: tproxy: properly refcount tcp listeners
      tcp: fix use after free in tcp_xmit_retransmit_queue()
      udp: fix poll() issue with zero sized packets
      tcp: properly scale window in tcp_v[46]_reqsk_send_ack()
      udp: get rid of SLAB_DESTROY_BY_RCU allocations
      qdisc: fix a module refcount leak in qdisc_create_dflt()
      rhashtable: fix a memory leak in alloc_bucket_locks()

Fabio Estevam (1):
      net: lpc_eth: Check clk_prepare_enable() error

Florian Fainelli (2):
      net: dsa: bcm_sf2: Fix race condition while unmasking interrupts
      Documentation: networking: dsa: Remove platform device TODO

Frederic Dalleau (1):
      Bluetooth: Fix memory leak at end of hci requests

Gao Feng (2):
      l2tp: Fix the connect status check in pppol2tp_getname
      8139cp: Fix one possible deadloop in cp_rx_poll

Hadar Hen Zion (2):
      net/mlx5e: Use correct flow dissector key on flower offloading
      net/mlx5e: Retrieve the switchdev id from the firmware only once

Hariprasad Shenai (1):
      cxgb4: Fixes resource allocation for ULD's in kdump kernel

Ido Schimmel (1):
      mlxsw: spectrum: Add missing flood to router port

Jamal Hadi Salim (1):
      net sched: fix encoding to use real length

Jamie Lentin (1):
      net: mv88e6xxx: Fix ingress rate removal for mv6131 chips

Jiri Pirko (2):
      mlxsw: spectrum_buffers: Fix pool value handling in mlxsw_sp_sb_tc_pool_bind_set
      team: loadbalance: push lacpdus to exact delivery

Kamal Heib (1):
      net/mlx5e: Fix memory leak if refreshing TIRs fails

Lance Richardson (1):
      sctp: fix overrun in sctp_diag_dump_one()

Liping Zhang (5):
      netfilter: conntrack: do not dump other netns's conntrack entries via proc
      netfilter: nfnetlink_log: add "nf-logger-3-1" module alias name
      netfilter: nfnetlink_acct: report overquota to the right netns
      netfilter: nfnetlink_acct: fix race between nfacct del and xt_nfacct destroy
      netfilter: cttimeout: fix use after free error when delete netns

Luiz Augusto von Dentz (2):
      Bluetooth: Fix bt_sock_recvmsg when MSG_TRUNC is not set
      Bluetooth: Fix hci_sock_recvmsg when MSG_TRUNC is not set

Maor Gottlieb (1):
      net/mlx5: Increase number of ethtool steering priorities

Marcelo Ricardo Leitner (1):
      sctp: linearize early if it's not GSO

Mike
[GIT PULL] platform-drivers-x86 for 4.8-4
Hi Linus,

The following changes since commit 3eab887a55424fc2c27553b7bfe32330df83f7b8:

  Linux 4.8-rc4 (2016-08-28 15:04:33 -0700)

are available in the git repository at:

  git://git.infradead.org/users/dvhart/linux-platform-drivers-x86.git tags/platform-drivers-x86-v4.8-4

for you to fetch changes up to da43bf0c21e57fff0221da5de0a9a388ec0d27cd:

  intel_pmic_gpio: Make explicitly non-modular (2016-08-28 22:31:52 -0700)

Thanks,

Darren Hart
Intel Open Source Technology Center

platform-drivers-x86 for 4.8-4

Remove module related code from two drivers that are only configurable as
built-in.

intel_pmic_gpio:
 - Make explicitly non-modular

platform/olpc:
 - Make ec explicitly non-modular

Paul Gortmaker (2):
      platform/olpc: Make ec explicitly non-modular
      intel_pmic_gpio: Make explicitly non-modular

 drivers/platform/olpc/olpc-ec.c        | 8 +++-
 drivers/platform/x86/intel_pmic_gpio.c | 8 ++--
 2 files changed, 5 insertions(+), 11 deletions(-)

--
Darren Hart
Intel Open Source Technology Center
Re: [PATCH] mm: Use zonelist name instead of using hardcoded index
On 08/26/2016 09:27 PM, Aneesh Kumar K.V wrote:
> This use the existing enums instead of hardcoded index when looking at the

Small nit. 'use' --> 'uses'

> zonelist. This makes it more readable. No functionality change by this
> patch.

Came across this some time back, yeah it really makes sense to replace
those hard coded indices.

>
> Signed-off-by: Aneesh Kumar K.V

Reviewed-by: Anshuman Khandual
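The patch body is not quoted in this reply, so the following is only a user-space sketch of the idea being reviewed: indexing an array of zonelists by an enum name instead of a bare number. The enum, struct and function names are illustrative stand-ins chosen for this example and should not be read as the kernel's actual definitions.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative stand-ins, not the kernel's definitions. */
enum zonelist_index {
    ZONELIST_FALLBACK,    /* list that may fall back to other nodes */
    ZONELIST_NOFALLBACK,  /* list restricted to the local node */
    MAX_ZONELISTS
};

struct zonelist {
    const char *description;
};

struct node_data {
    struct zonelist node_zonelists[MAX_ZONELISTS];
};

/* Before: the reader has to know what index 0 means. */
const struct zonelist *fallback_list_hardcoded(const struct node_data *nd)
{
    assert(nd != NULL);
    return &nd->node_zonelists[0];
}

/* After: the enum name documents the intent; behaviour is identical. */
const struct zonelist *fallback_list_named(const struct node_data *nd)
{
    assert(nd != NULL);
    return &nd->node_zonelists[ZONELIST_FALLBACK];
}
```

Both functions return the same element, which matches the "no functionality change" claim: only readability differs.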
[PATCH 2/3] ARM: dts: imx7-colibri: add basic supply regulators
Colibri modules need to be powered using the power pins 3V3 and
AVDD_AUDIO. Add fixed regulators which represent these power rails.

Potentially, those power rails could be switched on a carrier board.
A carrier board device tree could add its own regulator with a GPIO,
and reference that regulator in a vin-supply property of those new
module level system regulators.

This also synchronizes the name of the +3.3V regulator with the one
used in the Colibri VF50/VF61 device tree.

Signed-off-by: Stefan Agner
---
 arch/arm/boot/dts/imx7-colibri.dtsi | 12 +---
 1 file changed, 9 insertions(+), 3 deletions(-)

diff --git a/arch/arm/boot/dts/imx7-colibri.dtsi b/arch/arm/boot/dts/imx7-colibri.dtsi
index 044b83e..06fb567 100644
--- a/arch/arm/boot/dts/imx7-colibri.dtsi
+++ b/arch/arm/boot/dts/imx7-colibri.dtsi
@@ -46,12 +46,18 @@
 		pwms = < 0 500>;
 	};
 
-	reg_3p3v: regulator-3p3v {
+	reg_module_3v3: regulator-module-3v3 {
 		compatible = "regulator-fixed";
-		regulator-name = "3P3V";
+		regulator-name = "+V3.3";
+		regulator-min-microvolt = <330>;
+		regulator-max-microvolt = <330>;
+	};
+
+	reg_module_3v3_avdd: regulator-module-3v3-avdd {
+		compatible = "regulator-fixed";
+		regulator-name = "+V3.3_AVDD_AUDIO";
 		regulator-min-microvolt = <330>;
 		regulator-max-microvolt = <330>;
-		regulator-always-on;
 	};
 
 	reg_vref_1v8: regulator-vref-1v8 {
--
2.9.0
[PATCH 3/3] ARM: dts: imx7-colibri: add Audio support
Add audio support via on module I2S SGTL5000 codec.

Signed-off-by: Stefan Agner
---
 arch/arm/boot/dts/imx7-colibri.dtsi | 41 -
 1 file changed, 40 insertions(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/imx7-colibri.dtsi b/arch/arm/boot/dts/imx7-colibri.dtsi
index 06fb567..a9cc657 100644
--- a/arch/arm/boot/dts/imx7-colibri.dtsi
+++ b/arch/arm/boot/dts/imx7-colibri.dtsi
@@ -66,6 +66,22 @@
 		regulator-min-microvolt = <180>;
 		regulator-max-microvolt = <180>;
 	};
+
+	sound {
+		compatible = "simple-audio-card";
+		simple-audio-card,name = "imx7-sgtl5000";
+		simple-audio-card,format = "i2s";
+		simple-audio-card,bitclock-master = <_master>;
+		simple-audio-card,frame-master = <_master>;
+		simple-audio-card,cpu {
+			sound-dai = <>;
+		};
+
+		dailink_master: simple-audio-card,codec {
+			sound-dai = <>;
+			clocks = < IMX7D_AUDIO_MCLK_ROOT_CLK>;
+		};
+	};
 };
 
  {
@@ -103,6 +119,18 @@
 	pinctrl-0 = <_i2c1 _i2c1_int>;
 	status = "okay";
 
+	codec: sgtl5000@0a {
+		compatible = "fsl,sgtl5000";
+		#sound-dai-cells = <0>;
+		reg = <0x0a>;
+		clocks = < IMX7D_AUDIO_MCLK_ROOT_CLK>;
+		pinctrl-names = "default";
+		pinctrl-0 = <_sai1_mclk>;
+		VDDA-supply = <_module_3v3_avdd>;
+		VDDIO-supply = <_module_3v3>;
+		VDDD-supply = <_DCDC3>;
+	};
+
 	ad7879@2c {
 		compatible = "adi,ad7879-1";
 		reg = <0x2c>;
@@ -223,6 +251,12 @@
 	vin-supply = <_DCDC3>;
 };
 
+ {
+	pinctrl-names = "default";
+	pinctrl-0 = <_sai1>;
+	status = "okay";
+};
+
 _pwrkey {
 	status = "disabled";
 };
@@ -542,13 +576,18 @@
 
 	pinctrl_sai1: sai1-grp {
 		fsl,pins = <
-			MX7D_PAD_SAI1_MCLK__SAI1_MCLK		0x1f
 			MX7D_PAD_ENET1_RX_CLK__SAI1_TX_BCLK	0x1f
 			MX7D_PAD_SAI1_TX_SYNC__SAI1_TX_SYNC	0x1f
 			MX7D_PAD_ENET1_COL__SAI1_TX_DATA0	0x30
 			MX7D_PAD_ENET1_TX_CLK__SAI1_RX_DATA0	0x1f
 		>;
 	};
+
+	pinctrl_sai1_mclk: sai1grp_mclk {
+		fsl,pins = <
+			MX7D_PAD_SAI1_MCLK__SAI1_MCLK		0x1f
+		>;
+	};
 };
 
 _lpsr {
--
2.9.0
[PATCH 1/3] ARM: dts: imx7-colibri: move SD-card to module level
Move SD-card definition to module level. While at it, also disable
write-protect since the Colibri standard does not define a pin for
SD-Card write-protection.

Signed-off-by: Stefan Agner
---
 arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi | 4
 arch/arm/boot/dts/imx7-colibri.dtsi         | 8
 2 files changed, 8 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi b/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi
index 1545661..373ee19 100644
--- a/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi
+++ b/arch/arm/boot/dts/imx7-colibri-eval-v3.dtsi
@@ -138,10 +138,6 @@
 };
 
  {
-	pinctrl-names = "default";
-	pinctrl-0 = <_usdhc1 _cd_usdhc1>;
-	no-1-8-v;
-	cd-gpios = < 0 GPIO_ACTIVE_LOW>;
 	keep-power-in-suspend;
 	wakeup-source;
 	status = "okay";
diff --git a/arch/arm/boot/dts/imx7-colibri.dtsi b/arch/arm/boot/dts/imx7-colibri.dtsi
index 0a9d3a8..044b83e 100644
--- a/arch/arm/boot/dts/imx7-colibri.dtsi
+++ b/arch/arm/boot/dts/imx7-colibri.dtsi
@@ -251,6 +251,14 @@
 	dr_mode = "host";
 };
 
+ {
+	pinctrl-names = "default";
+	pinctrl-0 = <_usdhc1 _cd_usdhc1>;
+	no-1-8-v;
+	cd-gpios = < 0 GPIO_ACTIVE_LOW>;
+	disable-wp;
+};
+
  {
 	pinctrl-names = "default";
 	pinctrl-0 = <_gpio1 _gpio2 _gpio3 _gpio4>;
--
2.9.0
[PATCH v5 6/6] mm/cma: remove per zone CMA stat
From: Joonsoo Kim

Now, all reserved pages for the CMA region belong to ZONE_CMA, so we
don't need to maintain the CMA stat in other zones. Remove it.

Acked-by: Vlastimil Babka
Signed-off-by: Joonsoo Kim
---
 fs/proc/meminfo.c      |  2 +-
 include/linux/cma.h    |  6 ++
 include/linux/mmzone.h |  1 -
 mm/cma.c               | 15 +++
 mm/page_alloc.c        |  7 +++
 mm/vmstat.c            |  1 -
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8a42849..0ca6f38 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -151,7 +151,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #ifdef CONFIG_CMA
 	show_val_kb(m, "CmaTotal: ", totalcma_pages);
 	show_val_kb(m, "CmaFree:",
-		    global_page_state(NR_FREE_CMA_PAGES));
+		    cma_get_free());
 #endif
 
 	hugetlb_report_meminfo(m);
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..816290c 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,10 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
+
+#ifdef CONFIG_CMA
+extern unsigned long cma_get_free(void);
+#else
+static inline unsigned long cma_get_free(void) { return 0; }
+#endif
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 24e46ca..8bc2611 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -113,7 +113,6 @@ enum zone_stat_item {
 	NUMA_LOCAL,		/* allocation from local node */
 	NUMA_OTHER,		/* allocation from other node */
 #endif
-	NR_FREE_CMA_PAGES,
 	NR_VM_ZONE_STAT_ITEMS };
 
 enum node_stat_item {
diff --git a/mm/cma.c b/mm/cma.c
index c1bae7f..981633b 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,6 +54,21 @@ unsigned long cma_get_size(const struct cma *cma)
 	return cma->count << PAGE_SHIFT;
 }
 
+unsigned long cma_get_free(void)
+{
+	struct zone *zone;
+	unsigned long freecma = 0;
+
+	for_each_populated_zone(zone) {
+		if (!is_zone_cma(zone))
+			continue;
+
+		freecma += zone_page_state(zone, NR_FREE_PAGES);
+	}
+
+	return freecma;
+}
+
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
 					     int align_order)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca17de9..587d542 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -65,6 +65,7 @@
 #include
 #include
 #include
+#include
 #include
 #include
 
@@ -4206,7 +4207,7 @@ void show_free_areas(unsigned int filter)
 		global_page_state(NR_BOUNCE),
 		global_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_page_state(NR_FREE_CMA_PAGES));
+		cma_get_free());
 
 	for_each_online_pgdat(pgdat) {
 		printk("Node %d"
@@ -4287,7 +4288,6 @@ void show_free_areas(unsigned int filter)
 			" bounce:%lukB"
 			" free_pcp:%lukB"
 			" local_pcp:%ukB"
-			" free_cma:%lukB"
 			"\n",
 			zone->name,
 			K(zone_page_state(zone, NR_FREE_PAGES)),
@@ -4309,8 +4309,7 @@ void show_free_areas(unsigned int filter)
 			K(zone_page_state(zone, NR_PAGETABLE)),
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
-			K(this_cpu_read(zone->pageset->pcp.count)),
-			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
+			K(this_cpu_read(zone->pageset->pcp.count)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
 			printk(" %ld", zone->lowmem_reserve[i]);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index ce5838b..93dfd9d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -951,7 +951,6 @@ const char * const vmstat_text[] = {
 	"numa_local",
 	"numa_other",
 #endif
-	"nr_free_cma",
 
 	/* Node-based counters */
 	"nr_inactive_anon",
--
1.9.1
[PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
From: Joonsoo Kim

Attached cover-letter:

This series tries to solve problems of the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. But the current implementation
works like the previous reserved memory approach, because freepages
on the CMA region are used only if there is no movable freepage. In other
words, freepages on the CMA region are only used as a fallback. In that
situation, kswapd would be woken up easily since there is no unmovable and
reclaimable freepage, either. Once kswapd starts to reclaim memory, fallback
allocation to MIGRATE_CMA doesn't occur any more since movable freepages
are already refilled by kswapd, and then most of the freepages on CMA are
left free. This situation looks like the exclusively reserved memory case.

In my experiment, I found that if the system has 1024 MB of memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly 512 MB
of free memory is left. The detailed reason is that, to keep enough free
memory for unmovable and reclaimable allocation, kswapd uses the equation
below when calculating free memory, and it easily goes under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property of CMA freepages that they can't be
used for unmovable and reclaimable allocation.

Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA)
is lower than the low watermark and tries to make free memory until
(FreeTotal - FreeCMA) is higher than the high watermark. As a result,
FreeTotal consistently hovers around the 512 MB boundary, which
means that we can't utilize the full memory capacity.

To fix this problem, I submitted some patches [1] about 10 months ago,
but found some more problems to be fixed before solving this problem.
It requires many hooks in the allocator hotpath, so some developers don't
like it.
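The equation in the cover letter can be illustrated with a few lines of C. This is only a simplified sketch under stated assumptions, not kernel code: the function names usable_free and kswapd_should_wake are hypothetical, values are in MB instead of pages, and the real watermark check considers more inputs than shown here.

```c
#include <assert.h>
#include <stdbool.h>

/* Free memory usable for unmovable and reclaimable allocations:
 * Free total - Free CMA pages (the equation quoted above). */
unsigned long usable_free(unsigned long free_total, unsigned long free_cma)
{
    assert(free_cma <= free_total);
    return free_total - free_cma;
}

/* Simplified wake-up rule assumed for this sketch: kswapd is woken
 * when the usable free memory drops below the low watermark. */
bool kswapd_should_wake(unsigned long free_total, unsigned long free_cma,
                        unsigned long low_wmark)
{
    return usable_free(free_total, free_cma) < low_wmark;
}
```

With a hypothetical low watermark of 16 MB, kswapd_should_wake(520, 512, 16) is true although 520 MB are free in total, because only 8 MB of that is outside CMA. The same call with no free CMA pages, kswapd_should_wake(520, 0, 16), is false. This is the effect described above: FreeTotal stays near the size of the CMA reservation.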
Instead, some of them suggest different approach [2] to fix all the problems related to CMA, that is, introducing a new zone to deal with free CMA pages. I agree that it is the best way to go so implement here. Although properties of ZONE_MOVABLE and ZONE_CMA is similar, I decide to add a new zone rather than piggyback on ZONE_MOVABLE since they have some differences. First, reserved CMA pages should not be offlined. If freepage for CMA is managed by ZONE_MOVABLE, we need to keep MIGRATE_CMA migratetype and insert many hooks on memory hotplug code to distiguish hotpluggable memory and reserved memory for CMA in the same zone. It would make memory hotplug code which is already complicated more complicated. Second, cma_alloc() can be called more frequently than memory hotplug operation and possibly we need to control allocation rate of ZONE_CMA to optimize latency in the future. In this case, separate zone approach is easy to modify. Third, I'd like to see statistics for CMA, separately. Sometimes, we need to debug why cma_alloc() is failed and separate statistics would be more helpful in this situtaion. Anyway, this patchset solves four problems related to CMA implementation. 1) Utilization problem As mentioned above, we can't utilize full memory capacity due to the limitation of CMA freepage and fallback policy. This patchset implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE request. This typed allocation is used for page cache and anonymous pages which occupies most of memory usage in normal case so we can utilize full memory capacity. Below is the experiment result about this problem. 8 CPUs, 1024 MB, VIRTUAL MACHINE make -j16 CMA reserve:0 MB512 MB Elapsed-time: 92.4186.5 pswpin: 82 18647 pswpout:160 69839 CMA reserve:0 MB512 MB Elapsed-time: 93.193.4 pswpin: 84 46 pswpout:183 92 FYI, there is another attempt [3] trying to solve this problem in lkml. And, as far as I know, Qualcomm also has out-of-tree solution for this problem. 
2) Reclaim problem Currently, there is no logic to distinguish CMA pages in reclaim path. If reclaim is initiated for unmovable and reclaimable allocation, reclaiming CMA pages doesn't help to satisfy the request and reclaiming CMA page is just waste. By managing CMA pages in the new zone, we can skip to reclaim ZONE_CMA completely if it is unnecessary. 3) Atomic allocation failure problem Kswapd isn't started to reclaim pages when allocation request is movable type and there is enough free page in the CMA region. After bunch of consecutive movable allocation requests, free pages in ordinary region (not CMA region) would be exhausted without waking up kswapd. At that time, if atomic unmovable allocation comes, it can't be successful since there is not enough page in ordinary region. This problem
[PATCH v5 5/6] mm/cma: remove MIGRATE_CMA
From: Joonsoo KimNow, all reserved pages for CMA region are belong to the ZONE_CMA and there is no other type of pages. Therefore, we don't need to use MIGRATE_CMA to distinguish and handle differently for CMA pages and ordinary pages. Remove MIGRATE_CMA. Unfortunately, this patch make free CMA counter incorrect because we count it when pages are on the MIGRATE_CMA. It will be fixed by next patch. I can squash next patch here but it makes changes complicated and hard to review so I separate that. Acked-by: Vlastimil Babka Signed-off-by: Joonsoo Kim --- include/linux/gfp.h| 3 +- include/linux/mmzone.h | 24 include/linux/page-isolation.h | 5 +-- include/linux/vmstat.h | 8 mm/cma.c | 2 +- mm/compaction.c| 10 + mm/hugetlb.c | 2 +- mm/memory_hotplug.c| 7 ++-- mm/page_alloc.c| 89 -- mm/page_isolation.c| 15 +++ mm/page_owner.c| 6 +-- mm/usercopy.c | 4 +- 12 files changed, 43 insertions(+), 132 deletions(-) diff --git a/include/linux/gfp.h b/include/linux/gfp.h index b86e0c2..815d756 100644 --- a/include/linux/gfp.h +++ b/include/linux/gfp.h @@ -553,8 +553,7 @@ static inline bool pm_suspended_storage(void) #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA) /* The below functions must be run on a range from a single zone. */ -extern int alloc_contig_range(unsigned long start, unsigned long end, - unsigned migratetype); +extern int alloc_contig_range(unsigned long start, unsigned long end); extern void free_contig_range(unsigned long pfn, unsigned nr_pages); #endif diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index 87b344e..24e46ca 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -41,22 +41,6 @@ enum { MIGRATE_RECLAIMABLE, MIGRATE_PCPTYPES, /* the number of types on the pcp lists */ MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES, -#ifdef CONFIG_CMA - /* -* MIGRATE_CMA migration type is designed to mimic the way -* ZONE_MOVABLE works. 
Only movable pages can be allocated -* from MIGRATE_CMA pageblocks and page allocator never -* implicitly change migration type of MIGRATE_CMA pageblock. -* -* The way to use it is to change migratetype of a range of -* pageblocks to MIGRATE_CMA which can be done by -* __free_pageblock_cma() function. What is important though -* is that a range of pageblocks must be aligned to -* MAX_ORDER_NR_PAGES should biggest page be bigger then -* a single pageblock. -*/ - MIGRATE_CMA, -#endif #ifdef CONFIG_MEMORY_ISOLATION MIGRATE_ISOLATE,/* can't allocate from here */ #endif @@ -66,14 +50,6 @@ enum { /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */ extern char * const migratetype_names[MIGRATE_TYPES]; -#ifdef CONFIG_CMA -# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA) -# define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_CMA) -#else -# define is_migrate_cma(migratetype) false -# define is_migrate_cma_page(_page) false -#endif - #define for_each_migratetype_order(order, type) \ for (order = 0; order < MAX_ORDER; order++) \ for (type = 0; type < MIGRATE_TYPES; type++) diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h index 047d647..1db9759 100644 --- a/include/linux/page-isolation.h +++ b/include/linux/page-isolation.h @@ -49,15 +49,14 @@ int move_freepages(struct zone *zone, */ int start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, -unsigned migratetype, bool skip_hwpoisoned_pages); + bool skip_hwpoisoned_pages); /* * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE. * target range is [start_pfn, end_pfn) */ int -undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn, - unsigned migratetype); +undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn); /* * Test all pages in [start_pfn, end_pfn) are isolated or not. 
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h index 6137719..ac6db88 100644 --- a/include/linux/vmstat.h +++ b/include/linux/vmstat.h @@ -341,14 +341,6 @@ static inline void drain_zonestat(struct zone *zone, struct per_cpu_pageset *pset) { } #endif /* CONFIG_SMP */ -static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages, -int migratetype) -{ - __mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages); - if
[PATCH v5 6/6] mm/cma: remove per zone CMA stat
From: Joonsoo Kim

Now, all reserved pages for the CMA region belong to ZONE_CMA, so we
don't need to maintain CMA stats in other zones. Remove them.

Acked-by: Vlastimil Babka
Signed-off-by: Joonsoo Kim
---
 fs/proc/meminfo.c      |  2 +-
 include/linux/cma.h    |  6 ++
 include/linux/mmzone.h |  1 -
 mm/cma.c               | 15 +++
 mm/page_alloc.c        |  7 +++
 mm/vmstat.c            |  1 -
 6 files changed, 25 insertions(+), 7 deletions(-)

diff --git a/fs/proc/meminfo.c b/fs/proc/meminfo.c
index 8a42849..0ca6f38 100644
--- a/fs/proc/meminfo.c
+++ b/fs/proc/meminfo.c
@@ -151,7 +151,7 @@ static int meminfo_proc_show(struct seq_file *m, void *v)
 #ifdef CONFIG_CMA
 	show_val_kb(m, "CmaTotal: ", totalcma_pages);
 	show_val_kb(m, "CmaFree:",
-		global_page_state(NR_FREE_CMA_PAGES));
+		cma_get_free());
 #endif

 	hugetlb_report_meminfo(m);
diff --git a/include/linux/cma.h b/include/linux/cma.h
index 29f9e77..816290c 100644
--- a/include/linux/cma.h
+++ b/include/linux/cma.h
@@ -28,4 +28,10 @@ extern int cma_init_reserved_mem(phys_addr_t base, phys_addr_t size,
 					struct cma **res_cma);
 extern struct page *cma_alloc(struct cma *cma, size_t count, unsigned int align);
 extern bool cma_release(struct cma *cma, const struct page *pages, unsigned int count);
+
+#ifdef CONFIG_CMA
+extern unsigned long cma_get_free(void);
+#else
+static inline unsigned long cma_get_free(void) { return 0; }
+#endif
 #endif
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 24e46ca..8bc2611 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -113,7 +113,6 @@ enum zone_stat_item {
 	NUMA_LOCAL,		/* allocation from local node */
 	NUMA_OTHER,		/* allocation from other node */
 #endif
-	NR_FREE_CMA_PAGES,
 	NR_VM_ZONE_STAT_ITEMS };

 enum node_stat_item {
diff --git a/mm/cma.c b/mm/cma.c
index c1bae7f..981633b 100644
--- a/mm/cma.c
+++ b/mm/cma.c
@@ -54,6 +54,21 @@ unsigned long cma_get_size(const struct cma *cma)
 	return cma->count << PAGE_SHIFT;
 }

+unsigned long cma_get_free(void)
+{
+	struct zone *zone;
+	unsigned long freecma = 0;
+
+	for_each_populated_zone(zone) {
+		if (!is_zone_cma(zone))
+			continue;
+
+		freecma += zone_page_state(zone, NR_FREE_PAGES);
+	}
+
+	return freecma;
+}
+
 static unsigned long cma_bitmap_aligned_mask(const struct cma *cma,
 					     int align_order)
 {
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index ca17de9..587d542 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -65,6 +65,7 @@
 #include
 #include
 #include
+#include
 #include
 #include

@@ -4206,7 +4207,7 @@ void show_free_areas(unsigned int filter)
 		global_page_state(NR_BOUNCE),
 		global_page_state(NR_FREE_PAGES),
 		free_pcp,
-		global_page_state(NR_FREE_CMA_PAGES));
+		cma_get_free());

 	for_each_online_pgdat(pgdat) {
 		printk("Node %d"
@@ -4287,7 +4288,6 @@ void show_free_areas(unsigned int filter)
 			" bounce:%lukB"
 			" free_pcp:%lukB"
 			" local_pcp:%ukB"
-			" free_cma:%lukB"
 			"\n",
 			zone->name,
 			K(zone_page_state(zone, NR_FREE_PAGES)),
@@ -4309,8 +4309,7 @@ void show_free_areas(unsigned int filter)
 			K(zone_page_state(zone, NR_PAGETABLE)),
 			K(zone_page_state(zone, NR_BOUNCE)),
 			K(free_pcp),
-			K(this_cpu_read(zone->pageset->pcp.count)),
-			K(zone_page_state(zone, NR_FREE_CMA_PAGES)));
+			K(this_cpu_read(zone->pageset->pcp.count)));
 		printk("lowmem_reserve[]:");
 		for (i = 0; i < MAX_NR_ZONES; i++)
 			printk(" %ld", zone->lowmem_reserve[i]);
diff --git a/mm/vmstat.c b/mm/vmstat.c
index ce5838b..93dfd9d 100644
--- a/mm/vmstat.c
+++ b/mm/vmstat.c
@@ -951,7 +951,6 @@ const char * const vmstat_text[] = {
 	"numa_local",
 	"numa_other",
 #endif
-	"nr_free_cma",

 	/* Node-based counters */
 	"nr_inactive_anon",
--
1.9.1
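
As a reading aid, the summation done by cma_get_free() in the patch above can be modelled in plain userspace C. This is only a sketch under stated assumptions: struct model_zone and model_cma_get_free() are hypothetical stand-ins invented for illustration, with the kernel's populated-zone iteration, is_zone_cma() and zone_page_state() replaced by plain struct fields.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct zone (illustration only). */
struct model_zone {
	const char *name;
	bool is_cma;           /* models is_zone_cma(zone) */
	bool populated;        /* models the populated-zone filter of the iterator */
	unsigned long nr_free; /* models zone_page_state(zone, NR_FREE_PAGES) */
};

/* Sum the free pages of every populated CMA zone, skipping all other zones. */
static unsigned long model_cma_get_free(const struct model_zone *zones, size_t n)
{
	unsigned long freecma = 0;

	assert(zones != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		if (!zones[i].populated || !zones[i].is_cma)
			continue;

		freecma += zones[i].nr_free;
	}

	return freecma;
}
```

The point of the design is that the global free-CMA figure is now derived from the ordinary per-zone free counter of the CMA zone, so no dedicated per-zone CMA counter has to be kept in sync.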
[PATCH v5 2/6] mm/cma: introduce new zone, ZONE_CMA
From: Joonsoo Kim

Attached cover-letter:

This series tries to solve problems of the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. But the current
implementation works like the previous reserved-memory approach,
because freepages in the CMA region are used only if there is no
movable freepage. In other words, freepages in the CMA region are only
used as a fallback. In that situation, kswapd is woken up easily, since
there is no unmovable or reclaimable freepage either. Once kswapd
starts to reclaim memory, fallback allocation to MIGRATE_CMA no longer
occurs, since movable freepages have already been refilled by kswapd,
and most of the freepages in the CMA region are left free. This
situation looks like the exclusively reserved memory case.

In my experiment, I found that if the system has 1024 MB of memory and
512 MB is reserved for CMA, kswapd is mostly woken up when roughly
512 MB of free memory is left. The detailed reason is that, to keep
enough free memory for unmovable and reclaimable allocations, kswapd
uses the equation below when calculating free memory, and the result
easily goes under the watermark.

Free memory for unmovable and reclaimable = Free total - Free CMA pages

This is derived from the property that CMA freepages can't be used for
unmovable and reclaimable allocations.

Anyway, in this case, kswapd is woken up when (FreeTotal - FreeCMA) is
lower than the low watermark, and it tries to free memory until
(FreeTotal - FreeCMA) is higher than the high watermark. As a result,
FreeTotal constantly hovers around the 512 MB boundary, which means
that we can't utilize the full memory capacity.

To fix this problem, I submitted some patches [1] about 10 months ago,
but found some more problems to be fixed before solving this one. That
approach requires many hooks in the allocator hotpath, so some
developers don't like it. Instead, some of them suggested a different
approach [2] to fix all the problems related to CMA, that is,
introducing a new zone to deal with free CMA pages. I agree that it is
the best way to go, so I implement it here.

Although the properties of ZONE_MOVABLE and ZONE_CMA are similar, I
decided to add a new zone rather than piggyback on ZONE_MOVABLE since
they have some differences. First, reserved CMA pages should not be
offlined. If freepages for CMA were managed by ZONE_MOVABLE, we would
need to keep the MIGRATE_CMA migratetype and insert many hooks in the
memory hotplug code to distinguish hotpluggable memory from memory
reserved for CMA in the same zone. It would make the memory hotplug
code, which is already complicated, even more complicated. Second,
cma_alloc() can be called more frequently than memory hotplug
operations, and we may need to control the allocation rate of ZONE_CMA
to optimize latency in the future. In this case, the separate zone
approach is easy to modify. Third, I'd like to see statistics for CMA
separately. Sometimes we need to debug why cma_alloc() failed, and
separate statistics would be more helpful in this situation.

Anyway, this patchset solves four problems related to the CMA
implementation.

1) Utilization problem

As mentioned above, we can't utilize the full memory capacity due to
the limitation of CMA freepages and the fallback policy. This patchset
implements a new zone for CMA and uses it for GFP_HIGHUSER_MOVABLE
requests. This type of allocation is used for page cache and anonymous
pages, which occupy most of the memory in the normal case, so we can
utilize the full memory capacity. Below is the experiment result for
this problem.

8 CPUs, 1024 MB, VIRTUAL MACHINE
make -j16

CMA reserve:    0 MB    512 MB
Elapsed-time:   92.4    186.5
pswpin:         82      18647
pswpout:        160     69839

CMA reserve:    0 MB    512 MB
Elapsed-time:   93.1    93.4
pswpin:         84      46
pswpout:        183     92

FYI, there is another attempt [3] on lkml trying to solve this problem.
And, as far as I know, Qualcomm also has an out-of-tree solution for
this problem.
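
To make the arithmetic behind the utilization problem concrete, here is a minimal userspace sketch in C of the equation quoted in the cover letter. It is not kernel code: usable_free_pages() and kswapd_should_wake() are hypothetical helpers invented for illustration, and the numbers in the note after the block are assumed, not measured.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Free memory usable for unmovable and reclaimable allocations:
 * Free total - Free CMA pages. Any consistent unit (pages, MB) works.
 */
static unsigned long usable_free_pages(unsigned long free_total,
				       unsigned long free_cma)
{
	assert(free_cma <= free_total);
	return free_total - free_cma;
}

/* kswapd is woken once the usable free memory is below the low watermark. */
static bool kswapd_should_wake(unsigned long free_total,
			       unsigned long free_cma,
			       unsigned long low_wmark)
{
	return usable_free_pages(free_total, free_cma) < low_wmark;
}
```

With an assumed 520 MB free in total, 512 MB of it in the CMA region, and an assumed low watermark of 16 MB, kswapd_should_wake(520, 512, 16) is true although about half of a 1024 MB system is still free. With the same total and no free CMA pages, the check is false.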
2) Reclaim problem

Currently, there is no logic to distinguish CMA pages in the reclaim
path. If reclaim is initiated for an unmovable or reclaimable
allocation, reclaiming CMA pages doesn't help to satisfy the request
and is just a waste. By managing CMA pages in the new zone, we can skip
reclaiming ZONE_CMA completely when it is unnecessary.

3) Atomic allocation failure problem

Kswapd isn't started to reclaim pages when the allocation request is of
movable type and there are enough free pages in the CMA region. After a
bunch of consecutive movable allocation requests, free pages in the
ordinary region (not the CMA region) would be exhausted without waking
up kswapd. At that time, if an atomic unmovable allocation comes, it
can't succeed since there are not enough pages in the ordinary region.
This problem is reported by Aneesh
[PATCH v5 5/6] mm/cma: remove MIGRATE_CMA
From: Joonsoo Kim

Now, all reserved pages for the CMA region belong to ZONE_CMA and there
are no other types of pages. Therefore, we don't need MIGRATE_CMA to
distinguish CMA pages from ordinary pages and handle them differently.
Remove MIGRATE_CMA.

Unfortunately, this patch makes the free CMA counter incorrect because
we count it when pages are on MIGRATE_CMA. It will be fixed by the next
patch. I could squash the next patch into this one, but that would make
the changes complicated and hard to review, so I keep them separate.

Acked-by: Vlastimil Babka
Signed-off-by: Joonsoo Kim
---
 include/linux/gfp.h            |  3 +-
 include/linux/mmzone.h         | 24
 include/linux/page-isolation.h |  5 +--
 include/linux/vmstat.h         |  8
 mm/cma.c                       |  2 +-
 mm/compaction.c                | 10 +
 mm/hugetlb.c                   |  2 +-
 mm/memory_hotplug.c            |  7 ++--
 mm/page_alloc.c                | 89 --
 mm/page_isolation.c            | 15 +++
 mm/page_owner.c                |  6 +--
 mm/usercopy.c                  |  4 +-
 12 files changed, 43 insertions(+), 132 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index b86e0c2..815d756 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -553,8 +553,7 @@ static inline bool pm_suspended_storage(void)

 #if (defined(CONFIG_MEMORY_ISOLATION) && defined(CONFIG_COMPACTION)) || defined(CONFIG_CMA)
 /* The below functions must be run on a range from a single zone. */
-extern int alloc_contig_range(unsigned long start, unsigned long end,
-			      unsigned migratetype);
+extern int alloc_contig_range(unsigned long start, unsigned long end);
 extern void free_contig_range(unsigned long pfn, unsigned nr_pages);
 #endif

diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 87b344e..24e46ca 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -41,22 +41,6 @@ enum {
 	MIGRATE_RECLAIMABLE,
 	MIGRATE_PCPTYPES,	/* the number of types on the pcp lists */
 	MIGRATE_HIGHATOMIC = MIGRATE_PCPTYPES,
-#ifdef CONFIG_CMA
-	/*
-	 * MIGRATE_CMA migration type is designed to mimic the way
-	 * ZONE_MOVABLE works. Only movable pages can be allocated
-	 * from MIGRATE_CMA pageblocks and page allocator never
-	 * implicitly change migration type of MIGRATE_CMA pageblock.
-	 *
-	 * The way to use it is to change migratetype of a range of
-	 * pageblocks to MIGRATE_CMA which can be done by
-	 * __free_pageblock_cma() function. What is important though
-	 * is that a range of pageblocks must be aligned to
-	 * MAX_ORDER_NR_PAGES should biggest page be bigger then
-	 * a single pageblock.
-	 */
-	MIGRATE_CMA,
-#endif
 #ifdef CONFIG_MEMORY_ISOLATION
 	MIGRATE_ISOLATE,	/* can't allocate from here */
 #endif
@@ -66,14 +50,6 @@ enum {
 /* In mm/page_alloc.c; keep in sync also with show_migration_types() there */
 extern char * const migratetype_names[MIGRATE_TYPES];

-#ifdef CONFIG_CMA
-# define is_migrate_cma(migratetype) unlikely((migratetype) == MIGRATE_CMA)
-# define is_migrate_cma_page(_page) (get_pageblock_migratetype(_page) == MIGRATE_CMA)
-#else
-# define is_migrate_cma(migratetype) false
-# define is_migrate_cma_page(_page) false
-#endif
-
 #define for_each_migratetype_order(order, type) \
 	for (order = 0; order < MAX_ORDER; order++) \
 		for (type = 0; type < MIGRATE_TYPES; type++)
diff --git a/include/linux/page-isolation.h b/include/linux/page-isolation.h
index 047d647..1db9759 100644
--- a/include/linux/page-isolation.h
+++ b/include/linux/page-isolation.h
@@ -49,15 +49,14 @@ int move_freepages(struct zone *zone,
  */
 int
 start_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			 unsigned migratetype, bool skip_hwpoisoned_pages);
+				bool skip_hwpoisoned_pages);

 /*
  * Changes MIGRATE_ISOLATE to MIGRATE_MOVABLE.
  * target range is [start_pfn, end_pfn)
  */
 int
-undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn,
-			unsigned migratetype);
+undo_isolate_page_range(unsigned long start_pfn, unsigned long end_pfn);

 /*
  * Test all pages in [start_pfn, end_pfn) are isolated or not.
diff --git a/include/linux/vmstat.h b/include/linux/vmstat.h
index 6137719..ac6db88 100644
--- a/include/linux/vmstat.h
+++ b/include/linux/vmstat.h
@@ -341,14 +341,6 @@ static inline void drain_zonestat(struct zone *zone,
 			struct per_cpu_pageset *pset) { }
 #endif		/* CONFIG_SMP */

-static inline void __mod_zone_freepage_state(struct zone *zone, int nr_pages,
-					     int migratetype)
-{
-	__mod_zone_page_state(zone, NR_FREE_PAGES, nr_pages);
-	if (is_migrate_cma(migratetype))
-		__mod_zone_page_state(zone,
[PATCH v5 0/6] Introduce ZONE_CMA
From: Joonsoo Kim

Hello,

Changes from v4
o Rebase on next-20160825
o Add general fix patch for lowmem reserve
o Fix lowmem reserve ratio
o Fix zone span optimization per Vlastimil
o Fix pageset initialization
o Change invocation timing on cma_init_reserved_areas()

Changes from v3
o Rebase on next-20160805
o Split first patch per Vlastimil
o Remove useless function parameter per Vlastimil
o Add code comment per Vlastimil
o Add following description on cover-letter

This is the 5th version of the ZONE_CMA patchset. Most of the changes
are due to the rebase and some minor fixes.

CMA has many problems, and I list them at the bottom of the cover
letter. These problems come from the limitation of CMA memory that it
should always be migratable for device usage. I think that introducing
a new zone is the best approach to solve them. Here are the reasons.

Zones were introduced to solve some issues due to H/W addressing
limitations. The MM subsystem is implemented to work efficiently with
these zones. The allocation/reclaim logic in MM takes this limitation
into account very much. What I did in this patchset is introduce a new
zone and extend the zone concept slightly. The new concept is that a
zone can have not only a H/W addressing limitation but also a S/W
limitation to guarantee page migration. This concept originated from
ZONE_MOVABLE, and it has worked well for a long time. So, ZONE_CMA
should not be special at this moment.

There is a major concern from Mel that ZONE_MOVABLE, which has a S/W
limitation, causes the highmem/lowmem problem. The highmem/lowmem
problem is that some memory cannot be used as kernel memory due to the
limitation of the zone. It breaks LRU ordering and makes it hard to
find kernel-usable memory under memory pressure.

However, the important point is that this problem doesn't come from the
implementation detail (ZONE_MOVABLE/MIGRATETYPE). Even if we implement
it by MIGRATETYPE instead of by ZONE_MOVABLE, we cannot use that type
of memory for kernel allocation because kernel memory isn't migratable.
So, it will break LRU ordering, too. We cannot avoid the problem in any
case. Therefore, we should focus on which solution is better for
maintenance and less intrusive for the MM subsystem.

From this viewpoint, I think that the zone approach is better. As
mentioned earlier, the MM subsystem already has a lot of infrastructure
to deal with a zone's H/W addressing limitation. Adding a S/W
limitation to the zone concept and adding a new zone doesn't change
anything. It will work by itself. My patchset can remove many hooks
related to CMA area management in MM while solving the problems. More
hooks are required to solve the problems if we choose the MIGRATETYPE
approach.

Although Mel withdrew the review, Vlastimil expressed agreement with
this new zone approach [6].

"I realize I differ here from much more experienced mm guys, and will
probably deservingly regret it later on, but I think that the ZONE_CMA
approach could work indeed better than current MIGRATE_CMA pageblocks."

If anyone has a different opinion, please let me know.

Thanks.

Changes from v2
o Rebase on next-20160525
o No other changes except following description

There was a discussion with Mel [5] after LSF/MM 2016. I could
summarise it to help the merge decision, but it's better to read it
yourself, since any summary of mine would be biased toward my view.
But if anyone wants the summary, I will do it. :)

Anyway, Mel's position on this patchset seems to be neutral. He says:
"I'm not going to outright NAK your series but I won't ACK it either"

We can fix the problems with any approach, but I hope to go with the
new zone approach because it is less error-prone. It reduces some
corner case handling for now and removes the need for potential corner
case handling to fix problems.

Note that our company is already using ZONE_CMA and there is no
problem.

If anyone has a different opinion, please let me know and let's discuss
together.

Andrew, if there is something to do for merge, please let me know.

Changes from v1
o Separate some patches which deserve to submit independently
o Modify description to reflect current kernel state
  (e.g. high-order watermark problem disappeared by Mel's work)
o Don't increase SECTION_SIZE_BITS to make a room in page flags
  (detailed reason is on the patch that adds ZONE_CMA)
o Adjust ZONE_CMA population code

This series tries to solve problems of the current CMA implementation.

CMA was introduced to provide physically contiguous pages at runtime
without an exclusively reserved memory area. But the current
implementation works like the previous reserved-memory approach,
because freepages in the CMA region are used only if there is no
movable freepage. In other words, freepages in the CMA region are only
used as a fallback. In that situation, kswapd is woken up easily, since
there is no unmovable or reclaimable freepage either. If kswapd starts
to reclaim memory, fallback allocation to MIGRATE_CMA doesn't occur any
more since movable freepages are already refilled by
[PATCH v5 4/6] mm/cma: remove ALLOC_CMA
From: Joonsoo KimNow, all reserved pages for CMA region are belong to the ZONE_CMA and it only serves for GFP_HIGHUSER_MOVABLE. Therefore, we don't need to consider ALLOC_CMA at all. Acked-by: Vlastimil Babka Signed-off-by: Joonsoo Kim --- mm/compaction.c | 4 +--- mm/internal.h | 1 - mm/page_alloc.c | 28 +++- 3 files changed, 4 insertions(+), 29 deletions(-) diff --git a/mm/compaction.c b/mm/compaction.c index 29f6c49..4532905 100644 --- a/mm/compaction.c +++ b/mm/compaction.c @@ -1401,14 +1401,12 @@ static enum compact_result __compaction_suitable(struct zone *zone, int order, * if compaction succeeds. * For costly orders, we require low watermark instead of min for * compaction to proceed to increase its chances. -* ALLOC_CMA is used, as pages in CMA pageblocks are considered -* suitable migration targets */ watermark = (order > PAGE_ALLOC_COSTLY_ORDER) ? low_wmark_pages(zone) : min_wmark_pages(zone); watermark += compact_gap(order); if (!__zone_watermark_ok(zone, 0, watermark, classzone_idx, - ALLOC_CMA, wmark_target)) + 0, wmark_target)) return COMPACT_SKIPPED; /* diff --git a/mm/internal.h b/mm/internal.h index 3d3f052..01d06bb 100644 --- a/mm/internal.h +++ b/mm/internal.h @@ -466,7 +466,6 @@ unsigned long reclaim_clean_pages_from_list(struct zone *zone, #define ALLOC_HARDER 0x10 /* try to alloc harder */ #define ALLOC_HIGH 0x20 /* __GFP_HIGH set */ #define ALLOC_CPUSET 0x40 /* check for correct cpuset */ -#define ALLOC_CMA 0x80 /* allow allocations from CMA areas */ enum ttu_flags; struct tlbflush_unmap_batch; diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 91fb172..16ba1fe 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -2565,7 +2565,7 @@ int __isolate_free_page(struct page *page, unsigned int order) * exists. 
*/ watermark = min_wmark_pages(zone) + (1UL << order); - if (!zone_watermark_ok(zone, 0, watermark, 0, ALLOC_CMA)) + if (!zone_watermark_ok(zone, 0, watermark, 0, 0)) return 0; __mod_zone_freepage_state(zone, -(1UL << order), mt); @@ -2808,12 +2808,6 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, else min -= min / 4; -#ifdef CONFIG_CMA - /* If allocation can't use CMA areas don't use free CMA pages */ - if (!(alloc_flags & ALLOC_CMA)) - free_pages -= zone_page_state(z, NR_FREE_CMA_PAGES); -#endif - /* * Check watermarks for an order-0 allocation request. If these * are not met, then a high-order request also cannot go ahead @@ -2843,10 +2837,8 @@ bool __zone_watermark_ok(struct zone *z, unsigned int order, unsigned long mark, } #ifdef CONFIG_CMA - if ((alloc_flags & ALLOC_CMA) && - !list_empty(>free_list[MIGRATE_CMA])) { + if (!list_empty(>free_list[MIGRATE_CMA])) return true; - } #endif } return false; @@ -2863,13 +2855,6 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order, unsigned long mark, int classzone_idx, unsigned int alloc_flags) { long free_pages = zone_page_state(z, NR_FREE_PAGES); - long cma_pages = 0; - -#ifdef CONFIG_CMA - /* If allocation can't use CMA areas don't use free CMA pages */ - if (!(alloc_flags & ALLOC_CMA)) - cma_pages = zone_page_state(z, NR_FREE_CMA_PAGES); -#endif /* * Fast check for order-0 only. If this fails then the reserves @@ -2878,7 +2863,7 @@ static inline bool zone_watermark_fast(struct zone *z, unsigned int order, * the caller is !atomic then it'll uselessly search the free * list. That corner case is then slower but it is harmless. 
*/ - if (!order && (free_pages - cma_pages) > mark + z->lowmem_reserve[classzone_idx]) + if (!order && free_pages > mark + z->lowmem_reserve[classzone_idx]) return true; return __zone_watermark_ok(z, order, mark, classzone_idx, alloc_flags, @@ -3355,10 +3340,6 @@ gfp_to_alloc_flags(gfp_t gfp_mask) } else if (unlikely(rt_task(current)) && !in_interrupt()) alloc_flags |= ALLOC_HARDER; -#ifdef CONFIG_CMA - if (gfpflags_to_migratetype(gfp_mask) == MIGRATE_MOVABLE) - alloc_flags |= ALLOC_CMA; -#endif return alloc_flags; } @@ -3727,9 +3708,6 @@ __alloc_pages_nodemask(gfp_t gfp_mask, unsigned int order, if (unlikely(!zonelist->_zonerefs->zone))
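To make the effect of dropping ALLOC_CMA on the order-0 fast path easier to see, here is a minimal userspace sketch of the check before and after the patch. It is only a model under stated assumptions: `struct zone_model`, `MODEL_ALLOC_CMA`, and both function names are hypothetical stand-ins, not kernel symbols, and the single `lowmem_reserve` field stands in for `z->lowmem_reserve[classzone_idx]`.

```c
#include <stdbool.h>

/* Hypothetical model of the per-zone counters the fast check reads. */
struct zone_model {
    long free_pages;      /* stands in for NR_FREE_PAGES */
    long free_cma_pages;  /* stands in for NR_FREE_CMA_PAGES */
    long lowmem_reserve;  /* stands in for z->lowmem_reserve[classzone_idx] */
};

/* Hypothetical flag value, mirroring the removed ALLOC_CMA bit. */
#define MODEL_ALLOC_CMA 0x80u

/* Order-0 fast check as it looked before the patch (modelled). */
static bool watermark_fast_old(const struct zone_model *z, long mark,
                               unsigned int alloc_flags)
{
    long cma_pages = 0;

    /* Requests that may not use CMA must not count free CMA pages. */
    if (!(alloc_flags & MODEL_ALLOC_CMA))
        cma_pages = z->free_cma_pages;

    return (z->free_pages - cma_pages) > mark + z->lowmem_reserve;
}

/* Order-0 fast check after the patch (modelled): no CMA special case. */
static bool watermark_fast_new(const struct zone_model *z, long mark)
{
    return z->free_pages > mark + z->lowmem_reserve;
}
```

Once the reserved pages live only in ZONE_CMA, an ordinary zone has no free CMA pages to subtract, so in this model the two functions agree whenever `free_cma_pages` is zero.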
[PATCH v5 1/6] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request
From: Joonsoo Kim Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that important to reserve. When ZONE_MOVABLE is used, this problem would theoretically decrease usable memory for GFP_HIGHUSER_MOVABLE allocation requests, which are mainly used for page cache and anon page allocation. So, fix it. Also, defining the sysctl_lowmem_reserve_ratio array with MAX_NR_ZONES - 1 entries makes the code complex. For example, on a highmem system, the following reserve ratio is activated for *NORMAL ZONE*, which could easily mislead people. #ifdef CONFIG_HIGHMEM 32 #endif This patch also fixes this situation by defining the sysctl_lowmem_reserve_ratio array with MAX_NR_ZONES entries and placing the "#ifdef" in the right place. Signed-off-by: Joonsoo Kim --- include/linux/mmzone.h | 2 +- mm/page_alloc.c| 7 --- 2 files changed, 5 insertions(+), 4 deletions(-) diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h index d572b78..e3f39af 100644 --- a/include/linux/mmzone.h +++ b/include/linux/mmzone.h @@ -877,7 +877,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); int watermark_scale_factor_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); -extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1]; +extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES]; int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int, void __user *, size_t *, loff_t *); int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int, diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 4f7d5d7..a8310de 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -198,17 +198,18 @@ static void __free_pages_ok(struct page *page, unsigned int order); * TBD: should special case ZONE_DMA32 machines here - in those we normally * don't need any ZONE_NORMAL reservation */ -int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = { +int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = { #ifdef CONFIG_ZONE_DMA 256, #endif #ifdef CONFIG_ZONE_DMA32 
#endif -#ifdef CONFIG_HIGHMEM 32, +#ifdef CONFIG_HIGHMEM +INT_MAX, #endif -32, +INT_MAX, }; EXPORT_SYMBOL(totalram_pages); -- 1.9.1
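As a rough illustration of why INT_MAX works as a "do not reserve" ratio, the sketch below models the reserve as the number of pages in the higher zones divided by the lower zone's ratio. This is a simplified userspace model, assuming the reserve is computed by plain integer division; the zone enum, the `reserve_ratio` table, and `lowmem_reserve_pages()` are hypothetical names for this sketch, and the table assumes a configuration with DMA, DMA32 and HIGHMEM all enabled.

```c
#include <limits.h>

/* Hypothetical zone indices; a real kernel derives these from its config. */
enum zone_idx { Z_DMA, Z_DMA32, Z_NORMAL, Z_HIGHMEM, Z_MOVABLE, NR_Z };

/* One ratio per zone, laid out like the patched MAX_NR_ZONES-sized array. */
static const int reserve_ratio[NR_Z] = { 256, 256, 32, INT_MAX, INT_MAX };

/*
 * Pages that `lower_zone` holds back from requests that could also have
 * been served by higher zones containing `pages_above` pages in total.
 */
static unsigned long lowmem_reserve_pages(enum zone_idx lower_zone,
                                          unsigned long pages_above)
{
    int ratio = reserve_ratio[lower_zone];

    if (ratio < 1)          /* guard against a nonsensical ratio */
        return 0;
    return pages_above / (unsigned long)ratio;
}
```

With a ratio of INT_MAX the division yields zero for any realistic page count, so in this model ZONE_HIGHMEM keeps nothing back from ZONE_MOVABLE requests, while ZONE_NORMAL still reserves 1/32 of the pages above it.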
[PATCH v5 3/6] mm/cma: populate ZONE_CMA
From: Joonsoo Kim Until now, reserved pages for CMA are managed in the ordinary zones that their pfns belong to. This approach has numerous problems and fixing them isn't easy. (They are mentioned in the previous patch.) To fix this situation, ZONE_CMA is introduced in the previous patch, but it is not yet populated. This patch implements population of ZONE_CMA by stealing reserved pages from the ordinary zones. Unlike the previous implementation, where a kernel allocation request with __GFP_MOVABLE could be serviced from the CMA region, only allocation requests with GFP_HIGHUSER_MOVABLE can be serviced from the CMA region in the new approach. This is an inevitable design decision when using the zone implementation, because ZONE_CMA could contain highmem. Due to this decision, ZONE_CMA will work like ZONE_HIGHMEM or ZONE_MOVABLE. I don't think it would be a problem because most file cache pages and anonymous pages are requested with GFP_HIGHUSER_MOVABLE. This is supported by the fact that there are many systems with ZONE_HIGHMEM and they work fine. A notable disadvantage is that we cannot use these pages for blockdev file cache pages, because they usually have __GFP_MOVABLE but not __GFP_HIGHMEM and __GFP_USER. But in this case there are pros and cons. In my experience, blockdev file cache pages are one of the top reasons that cma_alloc() fails temporarily. So, we can get a stronger guarantee of cma_alloc() success by discarding that case. The implementation itself is very easy to understand: steal the pages when the cma area is initialized and recalculate various per-zone stats/thresholds. 
Signed-off-by: Joonsoo Kim --- include/linux/memory_hotplug.h | 3 --- include/linux/mm.h | 1 + mm/cma.c | 56 ++ mm/internal.h | 3 +++ mm/page_alloc.c| 29 +++--- 5 files changed, 80 insertions(+), 12 deletions(-) diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h index 01033fa..ea5af47 100644 --- a/include/linux/memory_hotplug.h +++ b/include/linux/memory_hotplug.h @@ -198,9 +198,6 @@ void put_online_mems(void); void mem_hotplug_begin(void); void mem_hotplug_done(void); -extern void set_zone_contiguous(struct zone *zone); -extern void clear_zone_contiguous(struct zone *zone); - #else /* ! CONFIG_MEMORY_HOTPLUG */ /* * Stub functions for when hotplug is off diff --git a/include/linux/mm.h b/include/linux/mm.h index 9d85402..f45e0e4 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -1933,6 +1933,7 @@ extern void setup_per_cpu_pageset(void); extern void zone_pcp_update(struct zone *zone); extern void zone_pcp_reset(struct zone *zone); +extern void setup_zone_pageset(struct zone *zone); /* page_alloc.c */ extern int min_free_kbytes; diff --git a/mm/cma.c b/mm/cma.c index 384c2cb..d69bdf7 100644 --- a/mm/cma.c +++ b/mm/cma.c @@ -38,6 +38,7 @@ #include #include "cma.h" +#include "internal.h" struct cma cma_areas[MAX_CMA_AREAS]; unsigned cma_area_count; @@ -116,10 +117,9 @@ static int __init cma_activate_area(struct cma *cma) for (j = pageblock_nr_pages; j; --j, pfn++) { WARN_ON_ONCE(!pfn_valid(pfn)); /* -* alloc_contig_range requires the pfn range -* specified to be in the same zone. Make this -* simple by forcing the entire CMA resv range -* to be in the same zone. +* In init_cma_reserved_pageblock(), present_pages is +* adjusted with assumption that all pages come from +* a single zone. It could be fixed but not yet done. 
*/ if (page_zone(pfn_to_page(pfn)) != zone) goto err; @@ -145,6 +145,28 @@ err: static int __init cma_init_reserved_areas(void) { int i; + struct zone *zone; + unsigned long start_pfn = UINT_MAX, end_pfn = 0; + + if (!cma_area_count) + return 0; + + for (i = 0; i < cma_area_count; i++) { + if (start_pfn > cma_areas[i].base_pfn) + start_pfn = cma_areas[i].base_pfn; + if (end_pfn < cma_areas[i].base_pfn + cma_areas[i].count) + end_pfn = cma_areas[i].base_pfn + cma_areas[i].count; + } + + for_each_zone(zone) { + if (!is_zone_cma(zone)) + continue; + + /* ZONE_CMA doesn't need to exceed CMA region */ + zone->zone_start_pfn = max(zone->zone_start_pfn, start_pfn); + zone->spanned_pages = min(zone_end_pfn(zone), end_pfn) - + zone->zone_start_pfn; + } for (i = 0; i < cma_area_count; i++) { int ret = cma_activate_area(&cma_areas[i]); @@
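The span calculation added to cma_init_reserved_areas() can be modelled in userspace roughly as follows. This is a sketch, not the patch itself: `struct cma_model`, `struct pfn_span`, `cma_span()` and `clamp_to_cma()` are hypothetical names, and the running minimum starts from ULONG_MAX here, whereas the hunk above initialises an unsigned long with UINT_MAX.

```c
#include <limits.h>

/* Hypothetical stand-in for the two struct cma fields the hunk reads. */
struct cma_model {
    unsigned long base_pfn;
    unsigned long count;    /* number of pages in the area */
};

/* Half-open pfn range [start_pfn, end_pfn). */
struct pfn_span {
    unsigned long start_pfn;
    unsigned long end_pfn;
};

/* Smallest range covering every CMA area; {0, 0} if there are none. */
static struct pfn_span cma_span(const struct cma_model *areas, unsigned int n)
{
    struct pfn_span s = { ULONG_MAX, 0 };
    unsigned int i;

    if (n == 0) {
        s.start_pfn = 0;
        return s;
    }
    for (i = 0; i < n; i++) {
        unsigned long end = areas[i].base_pfn + areas[i].count;

        if (areas[i].base_pfn < s.start_pfn)
            s.start_pfn = areas[i].base_pfn;
        if (end > s.end_pfn)
            s.end_pfn = end;
    }
    return s;
}

/* Shrink a zone's span so it does not extend beyond the CMA span. */
static struct pfn_span clamp_to_cma(struct pfn_span zone, struct pfn_span cma)
{
    struct pfn_span out;

    out.start_pfn = zone.start_pfn > cma.start_pfn ? zone.start_pfn
                                                   : cma.start_pfn;
    out.end_pfn = zone.end_pfn < cma.end_pfn ? zone.end_pfn : cma.end_pfn;
    return out;
}
```

The clamped span's length, `end_pfn - start_pfn`, corresponds to what the patch stores in `spanned_pages`.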
Re: [PATCH][v8] PM / hibernate: Verify the consistent of e820 memory map by md5 value
On Mon, Aug 29, 2016 at 12:35:40AM +0800, Chen Yu wrote: > On some platforms, there is occasional panic triggered when trying to > resume from hibernation, a typical panic looks like: > > "BUG: unable to handle kernel paging request at 880085894000 > IP: [] load_image_lzo+0x8c2/0xe70" > > This is because e820 map has been changed by BIOS across > hibernation, and one of the page frames from first kernel > is right located in second kernel's unmapped region, so panic > comes out when accessing unmapped kernel address. > > In order to expose this issue earlier, the md5 hash of e820 map > is passed from suspend kernel to resume kernel, and the system will > trigger panic once it finds the md5 value of previous kernel is not > the same as current resume kernel. ... so basically now even the cases where it managed to resume would panic because the digests differ, even if the original panic condition doesn't trigger the bug, i.e. your Note 1 below. The more important question IMHO would be, can we resume our system successfully *even* if BIOS fiddled with the e820 map? We'd still warn the hell out of it and even make the md5 digest comparison a default-enabled thing without even having a config option to disable it, but can we try harder not to panic and deal with this next BIOS f*ckup more intelligently than throwing our hands in the air and giving up? Thanks. -- Regards/Gruss, Boris. ECO tip #101: Trim your mails when you reply. --
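For readers who want the mechanism under discussion spelled out, the sketch below shows the general shape of "digest the memory map at suspend, compare at resume". It is deliberately not the patch's implementation: it swaps MD5 for a 64-bit FNV-1a hash to stay short and self-contained, and `struct e820_entry_model`, `map_digest()` and `map_unchanged()` are hypothetical names. What the caller does on a mismatch (panic, warn, or try to continue) is exactly the policy question raised above and is left open here.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified view of one memory map entry. */
struct e820_entry_model {
    uint64_t addr;
    uint64_t size;
    uint32_t type;
};

/* Feed the low `nbytes` bytes of `v` into a running FNV-1a hash. */
static uint64_t fnv1a_feed(uint64_t h, uint64_t v, int nbytes)
{
    int i;

    for (i = 0; i < nbytes; i++) {
        h ^= (v >> (8 * i)) & 0xffu;
        h *= 0x100000001b3ULL;      /* 64-bit FNV prime */
    }
    return h;
}

/* Digest the map field by field so struct padding never enters the hash. */
static uint64_t map_digest(const struct e820_entry_model *map, size_t n)
{
    uint64_t h = 0xcbf29ce484222325ULL;   /* 64-bit FNV offset basis */
    size_t i;

    for (i = 0; i < n; i++) {
        h = fnv1a_feed(h, map[i].addr, 8);
        h = fnv1a_feed(h, map[i].size, 8);
        h = fnv1a_feed(h, map[i].type, 4);
    }
    return h;
}

/* True if the map seen at resume matches the digest saved at suspend. */
static bool map_unchanged(uint64_t saved_digest,
                          const struct e820_entry_model *now, size_t n)
{
    return map_digest(now, n) == saved_digest;
}
```

A matching digest only says the map is byte-for-byte the same; a mismatch does not by itself say whether any image page actually lands in an unmapped region.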
Re: [PART2 PATCH v7 00/12] iommu/AMD: Introduce IOMMU AVIC support
Hi Joerg, Radim Any other concerns? Thanks, Suravee On 8/24/16 01:52, Suravee Suthikulpanit wrote: From: Suravee Suthikulpanit CHANGES FROM V6 === Per Radim: * No longer expose struct amd_ir_data to SVM. * Introduce struct amd_svm_iommu_ir (amd_ir_data wrapper). * Fix logic to manage ir_list where we need to remove the posted interrupt from the previous ir_list before mapping it to a new vcpu. Tested running smp VM with: - Using irqbalance - No irqbalance (manually set /proc/irq/smp_affinity) Misc: * 08/12: Only set ga_root_ptr in amd_ir_set_vcpu_affinity(). * 10/12: Fix bug in #define AVIC_GATAG_TO_VCPUID. GITHUB == Latest git tree can be found at: http://github.com/ssuthiku/linux.git avic_part2_v7 OVERVIEW This patch set is the second part of the two-part patch series to introduce the new AMD Advanced Virtual Interrupt Controller (AVIC) support. In addition to the SVM AVIC, AMD IOMMU also extends the AVIC capability to allow I/O interrupt injection directly into the virtualized guest local APIC without the need for hypervisor intervention. This patch series introduces a new hardware interrupt remapping (IR) mode in the AMD IOMMU driver, the Guest Virtual APIC (GA) mode. This is in contrast to the existing "legacy" mode. The IR mode can be specified with a new kernel parameter: amd_iommu_guest_ir=[vapic (default) | legacy] When enabling GA mode, the AMD IOMMU driver will configure device interrupt remapping in GA mode when possible (i.e. SVM AVIC must be enabled, and if the interrupt types are supported). Otherwise, the driver will fall back to using the legacy IR mode. This patch series also introduces new interfaces between SVM and IOMMU to allow: * SVM driver to communicate to IOMMU with updated vcpu scheduling information. * IOMMU driver to notify SVM driver to schedule vcpu on to physical core handle IOMMU GALog entry. DOCUMENTATIONS == More information about SVM AVIC can be found in the AMD64 Architecture Programmer’s Manual Volume 2 - System Programming. 
http://support.amd.com/TechDocs/24593.pdf More information about IOMMU AVIC can be found in the AMD I/O Virtualization Technology (IOMMU) Specification - Rev 2.62. http://support.amd.com/TechDocs/48882_IOMMU.pdf Any feedback and comments are very much appreciated. Thank you, Suravee Suravee Suthikulpanit (12): iommu/amd: Detect and enable guest vAPIC support iommu/amd: Move and introduce new IRTE-related unions and structures iommu/amd: Introduce interrupt remapping ops structure iommu/amd: Add support for multiple IRTE formats iommu/amd: Detect and initialize guest vAPIC log iommu/amd: Adding GALOG interrupt handler iommu/amd: Introduce amd_iommu_update_ga() iommu/amd: Implements irq_set_vcpu_affinity() hook to setup vapic mode for pass-through devices iommu/amd: Enable vAPIC interrupt remapping mode by default svm: Introduces AVIC per-VM ID svm: Introduce AMD IOMMU avic_ga_log_notifier svm: Implements update_pi_irte hook to setup posted interrupt Documentation/kernel-parameters.txt | 9 + arch/x86/include/asm/kvm_host.h | 2 + arch/x86/kvm/svm.c | 406 -- drivers/iommu/amd_iommu.c | 484 +++- drivers/iommu/amd_iommu_init.c | 181 +- drivers/iommu/amd_iommu_proto.h | 1 + drivers/iommu/amd_iommu_types.h | 149 +++ include/linux/amd-iommu.h | 43 +++- 8 files changed, 1188 insertions(+), 87 deletions(-)
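The mode selection described in the overview (vapic by default, legacy on request, and a fallback to legacy when GA mode is not possible) might be sketched as below. This is an illustrative userspace model only: `enum ir_mode`, `parse_guest_ir()` and `effective_ir_mode()` are hypothetical names, not the driver's, and treating an unrecognised or missing value as the default is an assumption of this sketch.

```c
#include <stdbool.h>
#include <string.h>

enum ir_mode { IR_MODE_LEGACY, IR_MODE_VAPIC };

/*
 * Map the value of amd_iommu_guest_ir= to a requested mode.
 * Assumption: anything other than "legacy" (including no value at all)
 * selects the documented default, vapic.
 */
static enum ir_mode parse_guest_ir(const char *arg)
{
    if (arg != NULL && strcmp(arg, "legacy") == 0)
        return IR_MODE_LEGACY;
    return IR_MODE_VAPIC;
}

/*
 * GA (vapic) mode is used only when it was requested, SVM AVIC is
 * enabled, and the interrupt type is supported; otherwise fall back
 * to legacy interrupt remapping.
 */
static enum ir_mode effective_ir_mode(enum ir_mode requested,
                                      bool avic_enabled,
                                      bool irq_type_supported)
{
    if (requested == IR_MODE_VAPIC && avic_enabled && irq_type_supported)
        return IR_MODE_VAPIC;
    return IR_MODE_LEGACY;
}
```

For example, booting with the default parameter on a host where AVIC is disabled would, in this model, end up in legacy mode.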
[GIT PULL] Please pull powerpc/linux.git powerpc-4.8-4 tag
Hi Linus ! So my apologies for being a lousy replacement maintainer while Michael is on vacation ... this was meant to be sent early last week, but I had a change pending on one of the fixes and other things made me forget all about it. Ugh. This is my first signed-tag and use of 2fa so I hope I got it all right... I tried to use the same format Michael uses for the tag etc... We have some misc fixes for powerpc 4.8. Some trivial bits and some regressions, and a trivial cleanup or two that I saw no point in letting rot in patchwork. Cheers, Ben. The following changes since commit fa8410b355251fd30341662a40ac6b22d3e38468: Linux 4.8-rc3 (2016-08-21 16:14:10 -0700) are available in the git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/powerpc/linux.git tags/powerpc-4.8-4 for you to fetch changes up to 78a3e8889b4b6b99775ed954696ff3e017f5d19b: powerpc: signals: Discard transaction state from signal frames (2016-08-29 12:48:40 +1000) Andrew Donnellan (1): cxl: use pcibios_free_controller_deferred() when removing vPHBs Andrzej Hajda (1): powerpc/powernv/pci: fix iterator signedness Boqun Feng (1): powerpc, hotplug: Avoid to touch non-existent cpumasks. 
Christophe Leroy (1): powerpc: sysdev: cpm: fix gpio save_regs functions Cyril Bur (1): powerpc: signals: Discard transaction state from signal frames Guenter Roeck (1): powerpc: cputhreads: Add missing include file Markus Elfring (3): drivers/macintosh: Delete owner assignment powerpc/512x: Delete unnecessary assignment for the field "owner" powerpc: mpc8349emitx: Delete unnecessary assignment for the field "owner" Mauricio Faria de Oliveira (1): powerpc/pseries: use pci_host_bridge.release_fn() to kfree(phb) Michael Ellerman (1): powerpc/prom: Fix sub-processor option passed to ibm, client-architecture-support Mukesh Ojha (1): powerpc/powernv : Drop reference added by kset_find_obj() Nicholas Piggin (3): powerpc/pseries: PACA save area fix for general exception vs MCE powerpc/pseries: PACA save area fix for MCE vs MCE powerpc/tm: do not use r13 for tabort_syscall Paolo Bonzini (1): powerpc: move hmi.c to arch/powerpc/kvm/ Paul Gortmaker (1): powerpc: migrate exception table users off module.h and onto extable.h Documentation/powerpc/transactional_memory.txt | 2 ++ arch/powerpc/include/asm/cputhreads.h | 1 + arch/powerpc/include/asm/hmi.h | 2 +- arch/powerpc/include/asm/paca.h| 12 +--- arch/powerpc/include/asm/pci-bridge.h | 1 + arch/powerpc/kernel/Makefile | 2 +- arch/powerpc/kernel/entry_64.S | 12 arch/powerpc/kernel/exceptions-64s.S | 29 ++--- arch/powerpc/kernel/kprobes.c | 2 +- arch/powerpc/kernel/pci-common.c | 36 ++ arch/powerpc/kernel/prom_init.c| 9 -- arch/powerpc/kernel/signal_32.c| 14 + arch/powerpc/kernel/signal_64.c| 14 + arch/powerpc/kernel/smp.c | 2 +- arch/powerpc/kernel/traps.c| 3 +- arch/powerpc/kvm/Makefile | 1 + arch/powerpc/{kernel/hmi.c => kvm/book3s_hv_hmi.c} | 0 arch/powerpc/mm/fault.c| 2 +- arch/powerpc/platforms/512x/mpc512x_lpbfifo.c | 1 - arch/powerpc/platforms/83xx/mcu_mpc8349emitx.c | 1 - arch/powerpc/platforms/embedded6xx/holly.c | 2 +- arch/powerpc/platforms/embedded6xx/mpc7448_hpc2.c | 2 +- 
arch/powerpc/platforms/powernv/opal-dump.c | 7 - arch/powerpc/platforms/powernv/opal-elog.c | 7 - arch/powerpc/platforms/powernv/pci-ioda.c | 2 +- arch/powerpc/platforms/pseries/pci.c | 4 +++ arch/powerpc/platforms/pseries/pci_dlpar.c | 7 +++-- arch/powerpc/sysdev/cpm1.c | 6 ++-- arch/powerpc/sysdev/cpm_common.c | 3 +- arch/powerpc/sysdev/fsl_rio.c | 2 +- drivers/macintosh/ams/ams-i2c.c| 1 - drivers/macintosh/windfarm_pm112.c | 1 - drivers/macintosh/windfarm_pm72.c | 1 - drivers/macintosh/windfarm_rm31.c | 1 - drivers/misc/cxl/vphb.c| 10 +- drivers/pci/host-bridge.c | 1 + 36 files changed, 160 insertions(+), 43 deletions(-) rename arch/powerpc/{kernel/hmi.c => kvm/book3s_hv_hmi.c} (100%)
Build error in timer-atmel-pit.c
Daniel,

After updating to linux-4.8-rc4, I got the following build error:

linux-x.yy/drivers/clocksource/timer-atmel-pit.c: In function 'at91sam926x_pit_dt_init':
linux-x.yy/drivers/clocksource/timer-atmel-pit.c:264:2: error: 'ret' undeclared (first use in this function)
  ret = clk_prepare_enable(data->mck);
  ^~~
linux-x.yy/drivers/clocksource/timer-atmel-pit.c:264:2: note: each undeclared identifier is reported only once for each function it appears in

This was introduced in commit 699e36e5b8e9f77b2be4c23f0b309e53be4b2880.

Regards,
Brent Taylor
Re: imx-drm: Possible regression after update to atomic
Hi Thorsten, On Sun, Aug 28, 2016 at 6:17 PM, Thorsten Leemhuis wrote: > Lo! Dave, below report made it to the list of regression for 4.8, but > afaics nothing happened after the initial report. Was it discussed (and > maybe even fixed?) elsewhere? Or is there some reason why it shouldn't > be on the list of regressions at all? We've got a patch set[1] to fix this. [1] http://www.spinics.net/lists/dri-devel/msg116491.html Regards, Liu Ying > > Ciao, Thorsten > > On 13.08.2016 14:37, Peter Senna Tschudin wrote: >> >> d7868cb7ac58640e9c0383205ba31bd6a985cc6f is the last commit that works for >> me. I'm experiencing black screen after Weston starts in two different i.MX >> based devices: >> >> - i.MX6 -> arch/arm/boot/dts/imx6q-b850v3.dts >> - i.MX53 based device >> >> Weston starts, but nothing is shown on screen. fb works fine. Disabling fb >> on Kconfig or simply commenting out drm_fbdev_cma_init() solves the black >> screen issue. >> >> The tests that are causing the black screen: >> >> diff --git a/drivers/gpu/drm/imx/ipuv3-plane.c >> b/drivers/gpu/drm/imx/ipuv3-plane.c >> index 4ad67d0..52dc1b7 100644 >> --- a/drivers/gpu/drm/imx/ipuv3-plane.c >> +++ b/drivers/gpu/drm/imx/ipuv3-plane.c >> @@ -325,7 +325,7 @@ static int ipu_plane_atomic_check(struct drm_plane >> *plane, >> if (old_fb && (state->src_w != old_state->src_w || >> state->src_h != old_state->src_h || >> fb->pixel_format != old_fb->pixel_format)) >> - return -EINVAL; >> >> eba = drm_plane_state_to_eba(state); >> >> @@ -336,7 +336,7 @@ static int ipu_plane_atomic_check(struct drm_plane >> *plane, >> return -EINVAL; >> >> if (old_fb && fb->pitches[0] != old_fb->pitches[0]) >> - return -EINVAL; >> >> switch (fb->pixel_format) { >> case DRM_FORMAT_YUV420: >> @@ -372,7 +372,7 @@ static int ipu_plane_atomic_check(struct drm_plane >> *plane, >> return -EINVAL; >> >> if (old_fb && old_fb->pitches[1] != fb->pitches[1]) >> - return -EINVAL; >> } >> >> I tried to replace the return -EINVAL by 
crtc_state->mode_changed = true >> with no positive results. >> >> I'm trying to understand what is the difference with and without fb, but I >> have no conclusions yet. >> >> Hints on what could be the cause here? >> >> Thank you, >> >> Peter >> >> P.S. This is what I get after replacing the return -EINVAL(the mode is >> correct): https://goo.gl/photos/1eRdcco9GpszgvzM8
Re: [PATCH 5/5] net/xgene: fix error handling during reset
From: Arnd Bergmann
Date: Fri, 26 Aug 2016 17:25:46 +0200

> The newly added reset logic uses helper functions for the MMIO that
> may fail. However, when the read operation fails, we end up writing
> back uninitialized data to the register, as gcc warns:
>
> drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c: In function 'xgene_enet_link_state':
> drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:213:2: error: 'data' may be used uninitialized in this function [-Werror=maybe-uninitialized]
> drivers/net/ethernet/apm/xgene/xgene_enet_xgmac.c:209:6: note: 'data' was declared here
>   u32 data;
>
> We already print a warning to the console log if that happens,
> the best alternative that I can see is skip the rest of the reset
> sequence if the register value cannot be read: Most likely the
> write would fail as well, and if it succeeded, worse things could
> happen.
>
> Signed-off-by: Arnd Bergmann
> Fixes: 3eb7cb9dc946 ("drivers: net: xgene: XFI PCS reset when link is down")

Applied.
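
For readers following the thread, the pattern described in the commit message (bail out of a read-modify-write when the read helper fails, so nothing uninitialized is written back) might look roughly like the sketch below. This is not the xgene driver code: `struct fake_dev`, `fake_rd()`, `fake_wr()`, `fake_reset()` and `FAKE_RESET_BIT` are hypothetical names made up for illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical fake device: one register plus a flag to simulate MMIO failure. */
struct fake_dev {
	uint32_t reg;
	bool read_fails;
};

/* Hypothetical read helper: returns false and leaves *val untouched on failure. */
static bool fake_rd(const struct fake_dev *dev, uint32_t *val)
{
	if (dev->read_fails)
		return false;
	*val = dev->reg;
	return true;
}

/* Hypothetical write helper. */
static void fake_wr(struct fake_dev *dev, uint32_t val)
{
	dev->reg = val;
}

#define FAKE_RESET_BIT 0x1u

/*
 * Read-modify-write that returns early when the read fails, so 'data'
 * is never written back uninitialized. Returns true if the bit was set.
 */
static bool fake_reset(struct fake_dev *dev)
{
	uint32_t data;

	if (!fake_rd(dev, &data))
		return false;	/* skip the rest of the sequence */

	fake_wr(dev, data | FAKE_RESET_BIT);
	return true;
}
```

With this shape the compiler can see that `data` is only used on the path where the read succeeded, which is what silences -Wmaybe-uninitialized.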
Re: [PATCH 4/5] net_sched: fix use of uninitialized ethertype variable in cls_flower
From: Arnd Bergmann
Date: Fri, 26 Aug 2016 17:25:45 +0200

> The addition of VLAN support caused a possible use of uninitialized
> data if we encounter a zero TCA_FLOWER_KEY_ETH_TYPE key, as pointed
> out by "gcc -Wmaybe-uninitialized":
>
> net/sched/cls_flower.c: In function 'fl_change':
> net/sched/cls_flower.c:366:22: error: 'ethertype' may be used uninitialized in this function [-Werror=maybe-uninitialized]
>
> This changes the code to only set the ethertype field if it
> was nonzero, as before the patch.
>
> Signed-off-by: Arnd Bergmann
> Fixes: 9399ae9a6cb2 ("net_sched: flower: Add vlan support")

Applied.
Re: [PATCH v3 1/2] input: misc: Add generic input driver to read encoded GPIO lines
On Thursday 25 August 2016 10:26 PM, Dmitry Torokhov wrote: > On Wed, Aug 24, 2016 at 01:28:58PM +0530, Vignesh R wrote: >> Add a driver to read group of GPIO lines and provide its status as a >> numerical value as input event to the system. This will help in >> interfacing devices, that can be connected over GPIOs, that provide >> input to the system by driving GPIO lines connected to them like a >> rotary dial or a switch. >> >> For example, a rotary switch can be connected to four GPIO lines. The >> status of the GPIO lines reflect the actual position of the rotary >> switch dial. For example, if dial points to 9, then the four GPIO lines >> connected to the switch will read HLLH(0b'1001 = 9). This value >> can be reported as an ABS_* event to the input subsystem. >> >> Signed-off-by: Vignesh R >> Acked-by: Rob Herring >> --- >> >> v3: Fix comments by Andrew and Dmitry >> Link to v2: https://lkml.org/lkml/2016/8/23/79 >> >> .../devicetree/bindings/input/gpio-decoder.txt | 23 >> drivers/input/misc/Kconfig | 12 ++ >> drivers/input/misc/Makefile| 1 + >> drivers/input/misc/gpio_decoder.c | 134 >> + >> 4 files changed, 170 insertions(+) >> create mode 100644 Documentation/devicetree/bindings/input/gpio-decoder.txt >> create mode 100644 drivers/input/misc/gpio_decoder.c >> >> diff --git a/Documentation/devicetree/bindings/input/gpio-decoder.txt >> b/Documentation/devicetree/bindings/input/gpio-decoder.txt >> new file mode 100644 >> index ..14a77fb96cf0 >> --- /dev/null >> +++ b/Documentation/devicetree/bindings/input/gpio-decoder.txt >> @@ -0,0 +1,23 @@ >> +* GPIO Decoder DT bindings >> + >> +Required Properties: >> +- compatible: should be "gpio-decoder" >> +- gpios: a spec of gpios (at least two) to be decoded to a number with >> + first entry representing the MSB. >> + >> +Optional Properties: >> +- decoder-max-value: Maximum possible value that can be reported by >> + the gpios. >> +- linux,axis: the input subsystem axis to map to (ABS_X/ABS_Y). 
>> + Defaults to 0 (ABS_X). >> + >> +Example: >> +gpio-decoder0 { >> +compatible = "gpio-decoder"; >> +gpios = < 3 GPIO_ACTIVE_HIGH>, >> +< 2 GPIO_ACTIVE_HIGH>, >> +< 1 GPIO_ACTIVE_HIGH>, >> +< 0 GPIO_ACTIVE_HIGH>; >> +linux,axis = <0>; /* ABS_X */ >> +decoder-max-value = <9>; >> +}; >> diff --git a/drivers/input/misc/Kconfig b/drivers/input/misc/Kconfig >> index efb0ca871327..7cdb89397d18 100644 >> --- a/drivers/input/misc/Kconfig >> +++ b/drivers/input/misc/Kconfig >> @@ -292,6 +292,18 @@ config INPUT_GPIO_TILT_POLLED >>To compile this driver as a module, choose M here: the >>module will be called gpio_tilt_polled. >> >> +config INPUT_GPIO_DECODER >> +tristate "Polled GPIO Decoder Input driver" >> +depends on GPIOLIB || COMPILE_TEST >> +select INPUT_POLLDEV >> +help >> + Say Y here if you want driver to read status of multiple GPIO >> + lines and report the encoded value as an absolute integer to >> + input subsystem. >> + >> + To compile this driver as a module, choose M here: the module >> + will be called gpio_decoder. 
>> + >> config INPUT_IXP4XX_BEEPER >> tristate "IXP4XX Beeper support" >> depends on ARCH_IXP4XX >> diff --git a/drivers/input/misc/Makefile b/drivers/input/misc/Makefile >> index 6a1e5e20fc1c..0b6d025f0487 100644 >> --- a/drivers/input/misc/Makefile >> +++ b/drivers/input/misc/Makefile >> @@ -35,6 +35,7 @@ obj-$(CONFIG_INPUT_DRV2667_HAPTICS)+= drv2667.o >> obj-$(CONFIG_INPUT_GP2A)+= gp2ap002a00f.o >> obj-$(CONFIG_INPUT_GPIO_BEEPER) += gpio-beeper.o >> obj-$(CONFIG_INPUT_GPIO_TILT_POLLED)+= gpio_tilt_polled.o >> +obj-$(CONFIG_INPUT_GPIO_DECODER)+= gpio_decoder.o >> obj-$(CONFIG_INPUT_HISI_POWERKEY) += hisi_powerkey.o >> obj-$(CONFIG_HP_SDC_RTC)+= hp_sdc_rtc.o >> obj-$(CONFIG_INPUT_IMS_PCU) += ims-pcu.o >> diff --git a/drivers/input/misc/gpio_decoder.c >> b/drivers/input/misc/gpio_decoder.c >> new file mode 100644 >> index ..1c2191d4b143 >> --- /dev/null >> +++ b/drivers/input/misc/gpio_decoder.c >> @@ -0,0 +1,134 @@ >> +/* >> + * Copyright (C) 2016 Texas Instruments Incorporated - http://www.ti.com/ >> + * >> + * This program is free software; you can redistribute it and/or >> + * modify it under the terms of the GNU General Public License as >> + * published by the Free Software Foundation version 2. >> + * >> + * This program is distributed "as is" WITHOUT ANY WARRANTY of any >> + * kind, whether express or implied; without even the implied warranty >> + * of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the >> + * GNU General Public License for more details. >> + * >> + * A generic driver to read multiple gpio lines and translate the >> + *
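
The decoding described in the quoted commit message (first GPIO entry is the MSB, so HLLH reads as 0b1001 = 9) can be sketched in a few lines of standalone C. This is only an illustration of the arithmetic, not the driver code from the patch; `decode_lines()` is a hypothetical name and the line states are passed in as a plain array instead of being read from GPIO descriptors.

```c
#include <stddef.h>

/*
 * Fold an array of line states (zero = low, nonzero = high) into a
 * number, treating the first entry as the most significant bit.
 */
static unsigned int decode_lines(const int *states, size_t n)
{
	unsigned int value = 0;
	size_t i;

	for (i = 0; i < n; i++)
		value = (value << 1) | (states[i] ? 1u : 0u);

	return value;
}
```

For the four-line rotary switch example, the states { 1, 0, 0, 1 } decode to 9.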
Re: [PATCH 3/3] scsi/ncr5380: Improve interrupt latency during PIO tranfers
On Sun, 28 Aug 2016, Geert Uytterhoeven wrote:

> Hi Finn,
>
> On Sat, Aug 27, 2016 at 4:30 AM, Finn Thain wrote:
> > Large PIO transfers are broken up into chunks to try to avoid
> > disabling local IRQs for long periods. But IRQs are still disabled for
> > too long and this causes SCC FIFO overruns during serial port
> > transfers. This patch fixes the problem by halving the PIO chunk size.
> >
> > Testing with mac_scsi shows that the extra NCR5380_main() loop
> > iterations have negligible performance impact on SCSI transfers (about
> > 1% slower). On a faster system (using the dmx3191d module) transfers
> > showed no measurable change.
> >
> > Signed-off-by: Finn Thain
> >
> > ---
> >  drivers/scsi/NCR5380.c |    6 +++---
> >  1 file changed, 3 insertions(+), 3 deletions(-)
> >
> > Index: linux/drivers/scsi/NCR5380.c
> > ===
> > --- linux.orig/drivers/scsi/NCR5380.c 2016-08-27 12:29:57.0 +1000
> > +++ linux/drivers/scsi/NCR5380.c 2016-08-27 12:29:58.0 +1000
> > @@ -1847,11 +1847,11 @@ static void NCR5380_information_transfer
> >                          /* XXX - need to source or sink data here, as appropriate */
> >                  }
> >          } else {
> > -                /* Break up transfer into 3 ms chunks,
> > -                 * presuming 6 accesses per handshake.
> > +                /* Transfer a small chunk so that the
> > +                 * irq mode lock is not held too long.
> >                   */
> >                  transfersize = min((unsigned long)cmd->SCp.this_residual,
> > -                                   hostdata->accesses_per_ms / 2);
> > +                                   hostdata->accesses_per_ms >> 2);
>
> I think it's easier to read if you use "/ 4".

I think the factor, "1/4 byte milliseconds per access" is not very
meaningful. The PIO transfersize can be understood as,

pio_bytes_until_scc_fifo_overflow =
        accesses_per_ms / (accesses_per_pio_byte / ms_until_fifo_overflow)

This loop seemed like a good place to avoid a DIV instruction (though I
didn't try to confirm that) and so I used a bit shift to indicate that
intention.

The shift amount was an empirical result that happened to work for the
hardware I tested it on, at the baud rate I was using. Admittedly, if we
want to avoid further tweaks to this then I'll have to do more testing
and find a better approximation.

--

> Gr{oetje,eeting}s,
>
> Geert
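
As a small illustration of the point under discussion: for unsigned operands, `x >> 2` and `x / 4` yield the same value, and compilers typically emit a shift for either form, so the choice is mostly about readability. The sketch below mirrors the `min()` in the quoted hunk; `pio_chunk()` is a hypothetical helper, not a function in NCR5380.c.

```c
/*
 * Hypothetical helper: cap the remaining transfer at a quarter of the
 * number of accesses that fit in one millisecond.
 */
static unsigned long pio_chunk(unsigned long residual,
			       unsigned long accesses_per_ms)
{
	unsigned long limit = accesses_per_ms >> 2;	/* same value as / 4 */

	return residual < limit ? residual : limit;
}
```

Assuming, purely for illustration, 4000 accesses per millisecond, a large transfer would be capped at 1000 bytes per chunk, while a 100-byte residual would go out in one piece.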
Re: [RFC 1/1] drivers: i2c: omap: Add slave support
On 28 August 2016 at 07:35, Wolfram Sang wrote:

> Well, I2C is simple, what could go wrong? :/

Actually I2C is elegant and *seems* simple, but in all its asynchronicity
there are actually a surprising number of fine details you can trip over.
Maybe that's why so many i2c controllers suck: since i2c looks simple
enough manufacturers are easily tempted to roll their own instead of
licensing a good implementation.

Having said that, most of the inconsistency and obnoxiousness of the TI
I2C controller is not even excusable by that argument. For example its
irq registers *look* like the usual set { rawstatus, status, en, dis }
that's their current standard ("Highlander") for peripherals. They do not
however *behave* like the standard set however:

1. status isn't always (rawstatus & enabled)
2. status != 0 does not always imply the irq output is asserted
3. some enable-bits also change the behaviour of rawstatus

All of these misbehaviours are unprecedented afaik.

Normally you'd also expect each irq (raw)status bit to either
a. be an event, set by hw and can be cleared by software any time, or
b. be a level status, unaffected by software attempts to set/clear.
Again the i2c controller decided this is far too little diversity.

> So, it is possible to make a proper I2C slave with OMAP, but you need
> to know those 100 gory details?

Mostly. There are some limitations such as:

* No ability to selectively ACK/NACK when addressed as slave. If you're
  unable to respond for some time then you'd end up blocking the bus with
  clock stretching. You could temporarily deconfigure your slave address
  but the TRM states changing slave address is forbidden while bus busy.

* According to my notes it always ACKs a General Call and this cannot
  even be stalled using the SBLOCK register. Since I don't care about GC
  there's no more details in my notes, but if this is true then on any
  bus where GC is used, irq handling will have real-time deadlines to
  avoid losing track of transaction boundaries and misinterpreting data.

Finally, as my first link pointed out, various protocol errors can lock
up the peripheral's internal state machine. When operating as slave this
is basically undetectable: all registers look normal and the bus-busy bit
will continue to track start/stop, but the peripheral will not ACK any
slave address anymore until you reset it.

You could argue "well, but that requires bus protocol errors" but it is
nevertheless a direct violation of the I2C standard:

    I2C-bus compatible devices must reset their bus logic on receipt of
    a START or repeated START condition such that they all anticipate
    the sending of a slave address, even if these START conditions are
    not positioned according to the proper format.

Also, my testing showed pulsing SDA low on an idle bus sufficed to
trigger this state. It needs to pass the glitch filter of course, but
this filter is implemented by sampling the bus requiring two consecutive
samples to agree. Two small glitches with just the right timing would
therefore suffice. Rather unlikely for random noise, but having lots of
signals on your pcb that ultimately derive from the same clock source
probably makes the odds a lot more favorable.

Matthijs
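
To make the expectation for the conventional { rawstatus, status, en, dis } register set concrete, here is a minimal software model of how such a set normally behaves according to the description above. It is a sketch of the expected semantics only, not of the OMAP I2C hardware; `struct irq_regs` and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical software model of the conventional register set. */
struct irq_regs {
	uint32_t rawstatus;	/* events latched by hardware */
	uint32_t enabled;	/* modified through the en/dis registers */
};

/* Writing bits to "en" sets them in the enable mask. */
static void irq_enable(struct irq_regs *r, uint32_t bits)
{
	r->enabled |= bits;
}

/* Writing bits to "dis" clears them from the enable mask. */
static void irq_disable(struct irq_regs *r, uint32_t bits)
{
	r->enabled &= ~bits;
}

/* Expectation 1: status is always the enabled subset of rawstatus. */
static uint32_t irq_status(const struct irq_regs *r)
{
	return r->rawstatus & r->enabled;
}

/* Expectation 2: the irq output is asserted exactly when status != 0. */
static bool irq_asserted(const struct irq_regs *r)
{
	return irq_status(r) != 0;
}
```

In this model enabling or disabling a bit never touches `rawstatus`, which is the third expectation listed in the message.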
Re: [PATCH] cxgb4/cxgb4vf: fix spelling mistake "provissioned" -> "provisioned"
From: Colin King
Date: Sun, 28 Aug 2016 12:07:02 +0100

> From: Colin Ian King
>
> Trivial fix to spelling mistake in dev_warn message.
>
> Signed-off-by: Colin Ian King

Applied.
Re: [PATCH] wan/fsl_ucc_hdlc: fix spelling mistake "prameter" -> "parameter"
From: Colin King
Date: Sun, 28 Aug 2016 11:40:41 +0100

> From: Colin Ian King
>
> Trivial fix to spelling mistake in dev_err message.
>
> Signed-off-by: Colin Ian King

Applied.
Re: [PATCH] net: ucc_geth: fix spelling mistake "propperty" -> "property"
From: Colin King
Date: Sun, 28 Aug 2016 12:03:27 +0100

> From: Colin Ian King
>
> Trivial fix to spelling mistake in dev_warn message.
>
> Signed-off-by: Colin Ian King

Applied.
[PATCH] mount: dont execute propagate_umount() many times for same mounts
In the worst case the current complexity of umount_tree() is O(n^3):

* Enumerate all mounts in a target tree (propagate_umount)
* Enumerate mounts to find where these changes have to be propagated
  (mark_umount_candidates)
* Enumerate mounts to find a required mount by parent and dentry
  (__lookup_mnt_last)

The worst case is when all mounts from the tree live in the same shared
group. In this case we have to enumerate all mounts on each step.

Here we can optimize the second step. We don't need to do it for mounts
which we already met when we did this step for previous mounts. It reduces
the complexity of umount_tree() to O(n^2).

Here is a script to generate such a mount tree:

$ cat run.sh
mount -t tmpfs xxx /mnt
mount --make-shared /mnt
for i in `seq $1`; do
	mount --bind /mnt `mktemp -d /mnt/test.XXXXXX`
done
time umount -l /mnt
$ for i in `seq 10 16`; do echo $i; unshare -Urm bash ./run.sh $i; done

Here are performance measurements with and without this patch:

mounts | after | before (sec)
-------+-------+-------------
  1024 | 0.024 |   0.084
  2048 | 0.041 |   0.39
  4096 | 0.059 |   3.198
  8192 | 0.227 |  50.794
 16384 | 1.015 | 810

This patch is a first step to fix CVE-2016-6213. The next step will be to
add ucount (user namespace limit) for mounts.
Signed-off-by: Andrei Vagin
---
 fs/mount.h     |  2 ++
 fs/namespace.c | 19 ---
 fs/pnode.c     | 23 +--
 3 files changed, 39 insertions(+), 5 deletions(-)

diff --git a/fs/mount.h b/fs/mount.h
index 14db05d..b5631bd 100644
--- a/fs/mount.h
+++ b/fs/mount.h
@@ -87,6 +87,8 @@ static inline int is_mounted(struct vfsmount *mnt)

 extern struct mount *__lookup_mnt(struct vfsmount *, struct dentry *);
 extern struct mount *__lookup_mnt_last(struct vfsmount *, struct dentry *);
+extern struct mount *__lookup_mnt_cont(struct mount *,
+				struct vfsmount *, struct dentry *);

 extern int __legitimize_mnt(struct vfsmount *, unsigned);
 extern bool legitimize_mnt(struct vfsmount *, unsigned);
diff --git a/fs/namespace.c b/fs/namespace.c
index 7bb2cda..924cea7 100644
--- a/fs/namespace.c
+++ b/fs/namespace.c
@@ -649,9 +649,7 @@ struct mount *__lookup_mnt_last(struct vfsmount *mnt, struct dentry *dentry)
 		goto out;
 	if (!(p->mnt.mnt_flags & MNT_UMOUNT))
 		res = p;
-	hlist_for_each_entry_continue(p, mnt_hash) {
-		if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
-			break;
+	for (; p != NULL; p = __lookup_mnt_cont(p, mnt, dentry)) {
 		if (!(p->mnt.mnt_flags & MNT_UMOUNT))
 			res = p;
 	}
@@ -659,6 +657,21 @@ out:
 	return res;
 }

+struct mount *__lookup_mnt_cont(struct mount *p,
+				struct vfsmount *mnt, struct dentry *dentry)
+{
+	struct hlist_node *node = p->mnt_hash.next;
+
+	if (!node)
+		return NULL;
+
+	p = hlist_entry(node, struct mount, mnt_hash);
+	if (&p->mnt_parent->mnt != mnt || p->mnt_mountpoint != dentry)
+		return NULL;
+
+	return p;
+}
+
 /*
  * lookup_mnt - Return the first child mount mounted at path
  *
diff --git a/fs/pnode.c b/fs/pnode.c
index 9989970..2242aad 100644
--- a/fs/pnode.c
+++ b/fs/pnode.c
@@ -399,10 +399,24 @@ static void mark_umount_candidates(struct mount *mnt)

 	BUG_ON(parent == mnt);

+	if (IS_MNT_MARKED(mnt))
+		return;
+
 	for (m = propagation_next(parent, parent); m;
 			m = propagation_next(m, parent)) {
-		struct mount *child = __lookup_mnt_last(&m->mnt,
+		struct mount *child = __lookup_mnt(&m->mnt,
 						mnt->mnt_mountpoint);
+
+		while (child && child->mnt.mnt_flags & MNT_UMOUNT) {
+			/*
+			 * Mark umounted mounts to not call
+			 * __propagate_umount for them again.
+			 */
+			SET_MNT_MARK(child);
+			child = __lookup_mnt_cont(child, &m->mnt,
+							mnt->mnt_mountpoint);
+		}
+
 		if (child && (!IS_MNT_LOCKED(child) || IS_MNT_MARKED(m))) {
 			SET_MNT_MARK(child);
 		}
@@ -420,6 +434,9 @@ static void __propagate_umount(struct mount *mnt)

 	BUG_ON(parent == mnt);

+	if (IS_MNT_MARKED(mnt))
+		return;
+
 	for (m = propagation_next(parent, parent); m;
 			m = propagation_next(m, parent)) {

@@ -431,6 +448,8 @@ static void __propagate_umount(struct mount *mnt)
 		 */
 		if (!child || !IS_MNT_MARKED(child))
 			continue;
+		if (child->mnt.mnt_flags & MNT_UMOUNT)
+			continue;
 		CLEAR_MNT_MARK(child);
 		if
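The effect of the "skip mounts that are already marked" check can be illustrated with a toy model outside the kernel. The sketch below is not the kernel code: `toy_mount`, `toy_propagate` and `toy_umount_tree` are hypothetical names, and a single flat array stands in for one shared peer group. It only counts how many peer visits happen with and without the early return.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-in for a mount that belongs to one shared peer group. */
struct toy_mount {
	bool marked;
};

/*
 * Walk all peers on behalf of group[self] and mark them.
 * With use_marks set, a mount that was already marked by an earlier
 * walk returns immediately, which is the idea behind the added
 * IS_MNT_MARKED() checks. Returns the number of peers visited.
 */
static size_t toy_propagate(struct toy_mount *group, size_t n,
			    size_t self, bool use_marks)
{
	size_t visited = 0;

	if (use_marks && group[self].marked)
		return 0;

	for (size_t i = 0; i < n; i++) {
		group[i].marked = true;
		visited++;
	}
	return visited;
}

/* Run the walk once per mount and return the total number of visits. */
static size_t toy_umount_tree(struct toy_mount *group, size_t n,
			      bool use_marks)
{
	size_t total = 0;

	assert(group != NULL || n == 0);

	for (size_t i = 0; i < n; i++)
		group[i].marked = false;
	for (size_t i = 0; i < n; i++)
		total += toy_propagate(group, n, i, use_marks);
	return total;
}
```

In this model the walk costs n * n visits without the check and n visits with it, which is the same factor-of-n saving the patch description claims for the second enumeration step.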
[PATCH 1/2] f2fs: set encryption name flag in add inline entry path
This patch sets encryption name flag in the add inline entry path
if filename is encrypted.

Signed-off-by: Shuoran Liu
---
 fs/f2fs/inline.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
index ccea873..f9ce04a7 100644
--- a/fs/f2fs/inline.c
+++ b/fs/f2fs/inline.c
@@ -524,6 +524,8 @@ int f2fs_add_inline_entry(struct inode *dir, const struct qstr *name,
 			err = PTR_ERR(page);
 			goto fail;
 		}
+		if (f2fs_encrypted_inode(dir))
+			file_set_enc_name(inode);
 	}

 	f2fs_wait_on_page_writeback(ipage, NODE, true);
--
1.9.1
[PATCH 2/2] f2fs: add roll-forward recovery process for encrypted dentry
Add roll-forward recovery process for encrypted dentry, so the first fsync
issued to an encrypted file does not need writing checkpoint.

This improves the performance of the following test at thousands of small
files: open -> write -> fsync -> close

Signed-off-by: Shuoran Liu
---
 fs/f2fs/dir.c      | 75 ++
 fs/f2fs/f2fs.h     |  4 +++
 fs/f2fs/file.c     |  2 --
 fs/f2fs/recovery.c | 16 +---
 4 files changed, 58 insertions(+), 39 deletions(-)

diff --git a/fs/f2fs/dir.c b/fs/f2fs/dir.c
index 9054aea..8eca6dd 100644
--- a/fs/f2fs/dir.c
+++ b/fs/f2fs/dir.c
@@ -212,31 +212,17 @@ static struct f2fs_dir_entry *find_in_level(struct inode *dir,
 	return de;
 }

-/*
- * Find an entry in the specified directory with the wanted name.
- * It returns the page where the entry was found (as a parameter - res_page),
- * and the entry itself. Page is returned mapped and unlocked.
- * Entry is guaranteed to be valid.
- */
-struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
-			const struct qstr *child, struct page **res_page)
+struct f2fs_dir_entry *__f2fs_find_entry(struct inode *dir,
+			struct fscrypt_name *fname, struct page **res_page)
 {
 	unsigned long npages = dir_blocks(dir);
 	struct f2fs_dir_entry *de = NULL;
 	unsigned int max_depth;
 	unsigned int level;
-	struct fscrypt_name fname;
-	int err;
-
-	err = fscrypt_setup_filename(dir, child, 1, &fname);
-	if (err) {
-		*res_page = ERR_PTR(err);
-		return NULL;
-	}

 	if (f2fs_has_inline_dentry(dir)) {
 		*res_page = NULL;
-		de = find_in_inline_dir(dir, &fname, res_page);
+		de = find_in_inline_dir(dir, fname, res_page);
 		goto out;
 	}

@@ -256,11 +242,35 @@ struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,

 	for (level = 0; level < max_depth; level++) {
 		*res_page = NULL;
-		de = find_in_level(dir, level, &fname, res_page);
+		de = find_in_level(dir, level, fname, res_page);
 		if (de || IS_ERR(*res_page))
 			break;
 	}
 out:
+	return de;
+}
+
+/*
+ * Find an entry in the specified directory with the wanted name.
+ * It returns the page where the entry was found (as a parameter - res_page),
+ * and the entry itself. Page is returned mapped and unlocked.
+ * Entry is guaranteed to be valid.
+ */
+struct f2fs_dir_entry *f2fs_find_entry(struct inode *dir,
+			const struct qstr *child, struct page **res_page)
+{
+	struct f2fs_dir_entry *de = NULL;
+	struct fscrypt_name fname;
+	int err;
+
+	err = fscrypt_setup_filename(dir, child, 1, &fname);
+	if (err) {
+		*res_page = ERR_PTR(err);
+		return NULL;
+	}
+
+	de = __f2fs_find_entry(dir, &fname, res_page);
+
 	fscrypt_free_filename(&fname);
 	return de;
 }
@@ -599,6 +609,24 @@ fail:
 	return err;
 }

+int __f2fs_do_add_link(struct inode *dir, struct fscrypt_name *fname,
+				struct inode *inode, nid_t ino, umode_t mode)
+{
+	struct qstr new_name;
+	int err = -EAGAIN;
+
+	new_name.name = fname_name(fname);
+	new_name.len = fname_len(fname);
+
+	if (f2fs_has_inline_dentry(dir))
+		err = f2fs_add_inline_entry(dir, &new_name, inode, ino, mode);
+	if (err == -EAGAIN)
+		err = f2fs_add_regular_entry(dir, &new_name, inode, ino, mode);
+
+	f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
+	return err;
+}
+
 /*
  * Caller should grab and release a rwsem by calling f2fs_lock_op() and
  * f2fs_unlock_op().
@@ -607,24 +635,15 @@ int __f2fs_add_link(struct inode *dir, const struct qstr *name,
 				struct inode *inode, nid_t ino, umode_t mode)
 {
 	struct fscrypt_name fname;
-	struct qstr new_name;
 	int err;

 	err = fscrypt_setup_filename(dir, name, 0, &fname);
 	if (err)
 		return err;

-	new_name.name = fname_name(&fname);
-	new_name.len = fname_len(&fname);
-
-	err = -EAGAIN;
-	if (f2fs_has_inline_dentry(dir))
-		err = f2fs_add_inline_entry(dir, &new_name, inode, ino, mode);
-	if (err == -EAGAIN)
-		err = f2fs_add_regular_entry(dir, &new_name, inode, ino, mode);
+	err = __f2fs_do_add_link(dir, &fname, inode, ino, mode);

 	fscrypt_free_filename(&fname);
-	f2fs_update_time(F2FS_I_SB(dir), REQ_TIME);
 	return err;
 }

diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
index 14f5fe2..78d7641 100644
--- a/fs/f2fs/f2fs.h
+++ b/fs/f2fs/f2fs.h
@@ -1914,6 +1914,8 @@ struct page *init_inode_metadata(struct inode *, struct inode *,
 void update_parent_metadata(struct inode *, struct inode *, unsigned int);
 int room_for_filename(const void *, int, int);
 void f2fs_drop_nlink(struct inode *, struct inode *);
Re: [PATCH 2/2] arm64: dts: rockchip: add eMMC's power domain support for rk3399
On 2016/8/29 10:50, Elaine Zhang wrote:
On 08/27/2016 11:05 PM, Shawn Lin wrote:
On 2016/8/27 21:41, Ziyuan Xu wrote:

Control power domain for eMMC via genpd to reduce power consumption.

Signed-off-by: Elaine Zhang
Signed-off-by: Ziyuan Xu

It looks nice to me. But this should be merged after applying that[0], as your patch will break the bind/unbind test for sdhci-of-arasan on rk3399 without it[0].

Moreover, Elaine should make sure that the upstreamed rockchip power domain stuff would not power off the pd for emmc, *otherwise*, I should update my patch to make sure we update clkmul every time when doing suspend 2 resume..

Forgot to say: if the pd is used, although there is no call to power off the pd_emmc, it will be powered off when the system is doing suspend 2 resume. (Because the system calls __device_suspend_noirq->pm_genpd_suspend_noirq->rockchip_pd_power_off)

Thanks for explaining this. I checked the code a bit and actually I don't need to update clkmul since it was recorded, although it is still reset to 0x10 reading from syscon. So for that, we can now pick it up without waiting for my sdhci-of-arasan's update.

Reviewed-by: Shawn Lin

And it's important to note: if the pd has been powered off, some grf regs will be back to the default value (the grf regs in this pd). So if the pd supports power off, these grf regs need to be saved and restored, or reinitialized. For example: pd_emmc aclk_emmc_grf

If the pd is always on, and this pd has a wakeup function, the device needs to add device_init_wakeup() to keep the pd always on when the system is doing suspend 2 resume.

[0]: https://patchwork.kernel.org/patch/9300971/

---
 arch/arm64/boot/dts/rockchip/rk3399.dtsi | 11 +++
 1 file changed, 11 insertions(+)

diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
index 32aebc8..71733d4 100644
--- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
+++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
@@ -239,6 +239,7 @@
 		#clock-cells = <0>;
 		phys = <&emmc_phy>;
 		phy-names = "phy_arasan";
+		power-domains = <&power RK3399_PD_EMMC>;
 		status = "disabled";
 	};

@@ -611,6 +612,11 @@
 		status = "disabled";
 	};

+	qos_emmc: qos@ffa58000 {
+		compatible = "syscon";
+		reg = <0x0 0xffa58000 0x0 0x20>;
+	};
+
 	qos_hdcp: qos@ffa90000 {
 		compatible = "syscon";
 		reg = <0x0 0xffa90000 0x0 0x20>;
@@ -739,6 +745,11 @@
 			};

 			/* These power domains are grouped by VD_LOGIC */
+			pd_emmc@RK3399_PD_EMMC {
+				reg = <RK3399_PD_EMMC>;
+				clocks = <&cru ACLK_EMMC>;
+				pm_qos = <&qos_emmc>;
+			};
 			pd_vio@RK3399_PD_VIO {
 				reg = <RK3399_PD_VIO>;
 				#address-cells = <1>;

--
Best Regards
Shawn Lin
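The save/restore requirement mentioned in the thread (registers inside a power domain fall back to their defaults once the domain has been powered off) can be sketched in plain C. This is a toy model under stated assumptions, not the Rockchip driver: `toy_pd`, `toy_pd_power_off`, `toy_pd_power_on` and the reset values are invented for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define TOY_NREGS 2

/* Toy power domain holding a couple of registers. */
struct toy_pd {
	bool powered;
	uint32_t regs[TOY_NREGS];	/* live register contents */
	uint32_t saved[TOY_NREGS];	/* software copy kept across power off */
};

/* Arbitrary default values the registers take after losing power. */
static const uint32_t toy_reset_values[TOY_NREGS] = { 0x10, 0x0 };

/* Save the live contents, then model the loss of state. */
static void toy_pd_power_off(struct toy_pd *pd)
{
	assert(pd->powered);
	memcpy(pd->saved, pd->regs, sizeof(pd->saved));
	memcpy(pd->regs, toy_reset_values, sizeof(pd->regs));
	pd->powered = false;
}

/* Power back on and put the saved contents back. */
static void toy_pd_power_on(struct toy_pd *pd)
{
	assert(!pd->powered);
	pd->powered = true;
	memcpy(pd->regs, pd->saved, sizeof(pd->regs));
}
```

Without the restore step in `toy_pd_power_on()`, a consumer would silently see the reset values after resume, which is the situation the thread warns about.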
RE: [PATCH] omapdrm: dss: drop unneeded of_node_put() on ref passed to of_get_next_parent()
>Sent: Saturday, August 27, 2016 8:07 PM
>To: Tomi Valkeinen ; Tony Lindgren ;
>Sean Paul ; Peter Chen ;
>Andrey Utkin
>Cc: David Airlie ; Peter Ujfalusi ; Dave
>Airlie ; Rob Clark ; Dr. H. Nikolaus
>Schaller ; Andrew Bradford ;
>ker...@pyra-handheld.com; Discussions about the Letux Kernel ker...@openphoenux.org>; dri-de...@lists.freedesktop.org; lkml ker...@vger.kernel.org>; linux-o...@vger.kernel.org
>Subject: Re: [PATCH] omapdrm: dss: drop unneeded of_node_put() on ref passed to
>of_get_next_parent()
>
>> [8.842806] OF: ERROR: Bad of_node_put() on /encoder/ports/port@1/endpoint
>> [8.843014] [] (omapdss_of_find_source_for_first_ep [omapdss])
>
>I can confirm that reverting 2ab9f5879162 fixes this regression, tested on
>omap5-uevm.
>

It was careless of me to introduce this regression. The revert patch is already in linux-next. Sorry for the inconvenience.

https://git.kernel.org/cgit/linux/kernel/git/next/linux-next.git/commit/?id=5a78ff7bf7e25191144b550961001bbf6c734da4

Peter
Re: [PATCH v7 11/14] arm64/numa: support HAVE_MEMORYLESS_NODES
On 2016/8/27 19:05, Leizhen (ThunderTown) wrote: > > > On 2016/8/26 23:43, Will Deacon wrote: >> On Wed, Aug 24, 2016 at 03:44:50PM +0800, Zhen Lei wrote: >>> Some numa nodes may have no memory. For example: >>> 1. cpu0 on node0 >>> 2. cpu1 on node1 >>> 3. device0 access the momory from node0 and node1 take the same time. >>> >>> So, we can not simply classify device0 to node0 or node1, but we can >>> define a node2 which distances to node0 and node1 are the same. >>> >>> Signed-off-by: Zhen Lei >>> --- >>> arch/arm64/Kconfig | 4 >>> arch/arm64/kernel/smp.c | 1 + >>> arch/arm64/mm/numa.c| 43 +-- >>> 3 files changed, 46 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig >>> index 2815af6..3a2b6ed 100644 >>> --- a/arch/arm64/Kconfig >>> +++ b/arch/arm64/Kconfig >>> @@ -611,6 +611,10 @@ config NEED_PER_CPU_EMBED_FIRST_CHUNK >>> def_bool y >>> depends on NUMA >>> >>> +config HAVE_MEMORYLESS_NODES >>> + def_bool y >>> + depends on NUMA >>> + >>> source kernel/Kconfig.preempt >>> source kernel/Kconfig.hz >>> >>> diff --git a/arch/arm64/kernel/smp.c b/arch/arm64/kernel/smp.c >>> index d93d433..4879085 100644 >>> --- a/arch/arm64/kernel/smp.c >>> +++ b/arch/arm64/kernel/smp.c >>> @@ -619,6 +619,7 @@ static void __init of_parse_and_init_cpus(void) >>> } >>> >>> bootcpu_valid = true; >>> + early_map_cpu_to_node(0, of_node_to_nid(dn)); >> >> This seems unrelated? > I will get off my work soon. Maybe I need put it into patch 12. > >> >>> /* >>> * cpu_logical_map has already been >>> diff --git a/arch/arm64/mm/numa.c b/arch/arm64/mm/numa.c >>> index 6853db7..114180f 100644 >>> --- a/arch/arm64/mm/numa.c >>> +++ b/arch/arm64/mm/numa.c >>> @@ -129,6 +129,14 @@ void __init early_map_cpu_to_node(unsigned int cpu, >>> int nid) >>> nid = 0; >>> >>> cpu_to_node_map[cpu] = nid; >>> + >>> + /* >>> +* We should set the numa node of cpu0 as soon as possible, because it >>> +* has already been set up online before. 
cpu_to_node(0) will soon be >>> +* called. >>> +*/ >>> + if (!cpu) >>> + set_cpu_numa_node(cpu, nid); >> >> Likewise. >> >>> } >>> >>> #ifdef CONFIG_HAVE_SETUP_PER_CPU_AREA >>> @@ -211,6 +219,35 @@ int __init numa_add_memblk(int nid, u64 start, u64 end) >>> return ret; >>> } >>> >>> +static u64 __init alloc_node_data_from_nearest_node(int nid, const size_t >>> size) >>> +{ >>> + int i, best_nid, distance; >>> + u64 pa; >>> + DECLARE_BITMAP(nodes_map, MAX_NUMNODES); >>> + >>> + bitmap_zero(nodes_map, MAX_NUMNODES); >>> + bitmap_set(nodes_map, nid, 1); >>> + >>> +find_nearest_node: >>> + best_nid = NUMA_NO_NODE; >>> + distance = INT_MAX; >>> + >>> + for_each_clear_bit(i, nodes_map, MAX_NUMNODES) >>> + if (numa_distance[nid][i] < distance) { >>> + best_nid = i; >>> + distance = numa_distance[nid][i]; >>> + } >>> + >>> + pa = memblock_alloc_nid(size, SMP_CACHE_BYTES, best_nid); >>> + if (!pa) { >>> + BUG_ON(best_nid == NUMA_NO_NODE); >>> + bitmap_set(nodes_map, best_nid, 1); >>> + goto find_nearest_node; >>> + } >>> + >>> + return pa; >>> +} >>> + >>> /** >>> * Initialize NODE_DATA for a node on the local memory >>> */ >>> @@ -224,7 +261,9 @@ static void __init setup_node_data(int nid, u64 >>> start_pfn, u64 end_pfn) >>> pr_info("Initmem setup node %d [mem %#010Lx-%#010Lx]\n", >>> nid, start_pfn << PAGE_SHIFT, (end_pfn << PAGE_SHIFT) - 1); >>> >>> - nd_pa = memblock_alloc_try_nid(nd_size, SMP_CACHE_BYTES, nid); >>> + nd_pa = memblock_alloc_nid(nd_size, SMP_CACHE_BYTES, nid); >>> + if (!nd_pa) >>> + nd_pa = alloc_node_data_from_nearest_node(nid, nd_size); >> >> Why not add memblock_alloc_near_nid to the core code, and make it do >> what you need there? > I'm thinking about it next week. But some ARCHs like X86/IA64 have their own > implementation. Do you mean directly and only call alloc_node_data_from_nearest_node? OK, that's fine. Thanks. > >> >> Will >> >> . >>
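The fallback in alloc_node_data_from_nearest_node() above can be sketched in plain userspace C. This is only an illustration under stated assumptions: the three-node distance table, the free_bytes array and the NO_NODE return value are hypothetical stand-ins for numa_distance[][], memblock and NUMA_NO_NODE, and where the patch calls BUG_ON() the sketch simply returns NO_NODE.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 3
#define NO_NODE   (-1)

/* Hypothetical distances: node 2 has no memory and is equally far from 0 and 1. */
static const int distance[MAX_NODES][MAX_NODES] = {
	{ 10, 20, 15 },
	{ 20, 10, 15 },
	{ 15, 15, 10 },
};

/* Return the closest node to nid that has not been tried yet, or NO_NODE. */
static int nearest_untried_node(int nid, const bool tried[MAX_NODES])
{
	int best = NO_NODE;
	int best_dist = INT_MAX;

	for (int i = 0; i < MAX_NODES; i++) {
		if (tried[i])
			continue;
		if (distance[nid][i] < best_dist) {
			best = i;
			best_dist = distance[nid][i];
		}
	}
	return best;
}

/*
 * Try nodes in order of increasing distance from nid until one can
 * satisfy the request. Returns the node used, or NO_NODE if none can.
 */
static int alloc_from_nearest_node(int nid, long size, long free_bytes[MAX_NODES])
{
	bool tried[MAX_NODES] = { false };

	tried[nid] = true;	/* the memoryless node itself is never a candidate */

	for (;;) {
		int best = nearest_untried_node(nid, tried);

		if (best == NO_NODE)
			return NO_NODE;
		if (free_bytes[best] >= size) {
			free_bytes[best] -= size;
			return best;
		}
		tried[best] = true;	/* exhausted, move on to the next nearest */
	}
}
```

With node 0 out of memory, a request made on behalf of node 2 falls through to node 1 even though both are at the same distance, which is the retry behaviour the goto loop in the patch provides.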
Re: [PATCH v6 0/8] power: add power sequence library
On Wed, Aug 24, 2016 at 04:53:35PM +0800, Peter Chen wrote: > On Tue, Aug 23, 2016 at 04:02:48PM +0530, Vaibhav Hiremath wrote: > > > > > > On Monday 15 August 2016 02:43 PM, Peter Chen wrote: > > >Hi all, > > > > > >This is a follow-up for my last power sequence framework patch set [1]. > > >According to Rob Herring and Ulf Hansson's comments[2], I use a generic > > >power sequence library for parsing the power sequence elements on DT, > > >and implement generic power sequence on library. The host driver > > >can allocate power sequence instance, and calls pwrseq APIs accordingly. > > > > > >In future, if there are special power sequence requirements, the special > > >power sequence library can be created. > > > > > >This patch set is tested on i.mx6 sabresx evk using a dts change, I use > > >two hot-plug devices to simulate this use case, the related binding > > >change is updated at patch [1/6], The udoo board changes were tested > > >using my last power sequence patch set.[3] > > > > > >Except for hard-wired MMC and USB devices, I find the USB ULPI PHY also > > >need to power on itself before it can be found by ULPI bus. > > > > > >[1] http://www.spinics.net/lists/linux-usb/msg142755.html > > >[2] http://www.spinics.net/lists/linux-usb/msg143106.html > > >[3] http://www.spinics.net/lists/linux-usb/msg142815.html > > (Please ignore my response on V2) > > > > Sorry being so late in the discussion... > > > > If I am not missing anything, then I am afraid to say that the > > generic library > > implementation in this patch series is not going to solve many of > > the custom > > requirement of power on, off, etc... > > I know you mentioned about adding another library when we come > > across such platforms, but should we not keep provision (or easy > > hooks/path) > > to enable that ? 
> > > > Let me bring in the use case I am dealing with, > > > > > > Host > >| > >V > >USB port > > > >| > >V > > USB HUB device (May need custom on/off seq) > >| > >V > > = > > | | > > V V > > Device-1 Device-2 > > (Needs special power (Needs special power > > on/off sequence. on/off sequence. > > Also may need custom Also, may need custom > > sequence for sequence for > > suspend/resume)suspend/resume) > > > > > > Note: Both Devices are connected to HUB via HSIC and may differ > > in terms of functionality, features they support. > > > > In the above case, both Device-1 and Device-2, need separate > > power on/off sequence. So generic library currently we have in this > > patch series is not going to satisfy the need here. > > > > I looked at all 6 revisions of this patch-series, went through the > > review comments, and looked at MMC power sequence code; > > what I can say here is, we need something similar to > > MMC power sequence here, where every device can have its own > > power sequence (if needed). > > > > I know Rob is not in favor of creating platform device for > > this, and I understand his comment. > > If not platform device, but atleast we need mechanism to > > connect each device back to its of_node and its respective > > driver/library fns. For example, the Devices may support different > > boot modes, and platform driver needs to make sure that > > the right sequence is followed for booting. > > > > Peter, My apologies for taking you back again on this series. > > I am OK, if you wish to address this in incremental addition, > > but my point is, we know that the current generic way is not > > enough for us, so I think we should try to fix it in initial phase only. > > > > Rob, it seems generic power sequence can't cover all cases. > Without information from DT, we can't know which power sequence > for which device. 
> Vaibhav, do you agree that I create pwrseq library list using postcore_initcall for each library, and choose pwrseq library according to compatible string first, if there is no compatible string for this library, just use generic pwrseq library. -- Best Regards, Peter Chen
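The selection rule proposed above (match on the compatible string first, otherwise fall back to the generic library) might look roughly like the following userspace sketch. Everything here is hypothetical: struct pwrseq_lib, the registry contents and the "vendor,..." strings are invented for illustration and are not taken from the patch set, which would build its list from postcore_initcall rather than a static table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical descriptor for one power sequence library. */
struct pwrseq_lib {
	const char *compatible;	/* NULL marks the generic fallback */
	const char *name;
};

/* Hypothetical registry standing in for the list built at initcall time. */
static const struct pwrseq_lib pwrseq_libs[] = {
	{ "vendor,hub-x",   "hub-x-pwrseq" },
	{ "vendor,modem-y", "modem-y-pwrseq" },
	{ NULL,             "generic-pwrseq" },
};

/*
 * Prefer a library whose compatible string matches the device;
 * otherwise return the generic library.
 */
static const struct pwrseq_lib *pwrseq_find(const char *compatible)
{
	const struct pwrseq_lib *generic = NULL;
	size_t n = sizeof(pwrseq_libs) / sizeof(pwrseq_libs[0]);

	for (size_t i = 0; i < n; i++) {
		const struct pwrseq_lib *lib = &pwrseq_libs[i];

		if (!lib->compatible) {
			generic = lib;	/* remember the fallback */
			continue;
		}
		if (compatible && strcmp(lib->compatible, compatible) == 0)
			return lib;
	}
	return generic;
}
```

A device with no dedicated library, or with no compatible string at all, ends up on the generic sequence, so existing users of the generic library would not need to change.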
[PATCH] ftrace: Access ret_stack->subtime only in the function profiler

The subtime is used only for function profiler with function graph
tracer enabled. Move the definition of subtime under
CONFIG_FUNCTION_PROFILER to reduce the memory usage. Also move the
initialization of subtime into the graph entry callback.

Cc: Josh Poimboeuf
Signed-off-by: Namhyung Kim
---
 include/linux/ftrace.h               | 2 ++
 kernel/trace/ftrace.c                | 6 ++
 kernel/trace/trace_functions_graph.c | 1 -
 3 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h
index 6f93ac46e7f0..b3d34d3e0e7e 100644
--- a/include/linux/ftrace.h
+++ b/include/linux/ftrace.h
@@ -794,7 +794,9 @@ struct ftrace_ret_stack {
 	unsigned long ret;
 	unsigned long func;
 	unsigned long long calltime;
+#ifdef CONFIG_FUNCTION_PROFILER
 	unsigned long long subtime;
+#endif
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	unsigned long fp;
 #endif
diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 84752c8e28b5..2050a7652a86 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -872,7 +872,13 @@ function_profile_call(unsigned long ip, unsigned long parent_ip,
 #ifdef CONFIG_FUNCTION_GRAPH_TRACER
 static int profile_graph_entry(struct ftrace_graph_ent *trace)
 {
+	int index = trace->depth;
+
 	function_profile_call(trace->func, 0, NULL, NULL);
+
+	if (index >= 0 && index < FTRACE_RETFUNC_DEPTH)
+		current->ret_stack[index].subtime = 0;
+
 	return 1;
 }
 
diff --git a/kernel/trace/trace_functions_graph.c b/kernel/trace/trace_functions_graph.c
index 0cbe38a844fa..9c7ffa4df5a8 100644
--- a/kernel/trace/trace_functions_graph.c
+++ b/kernel/trace/trace_functions_graph.c
@@ -170,7 +170,6 @@ ftrace_push_return_trace(unsigned long ret, unsigned long func, int *depth,
 	current->ret_stack[index].ret = ret;
 	current->ret_stack[index].func = func;
 	current->ret_stack[index].calltime = calltime;
-	current->ret_stack[index].subtime = 0;
 #ifdef HAVE_FUNCTION_GRAPH_FP_TEST
 	current->ret_stack[index].fp = frame_pointer;
 #endif
-- 
2.9.3
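For readers unfamiliar with why a per-frame subtime is reset on entry, here is a simplified userspace model. It assumes (this is not stated in the patch) that the profiler uses subtime to accumulate time spent in callees so that it can be subtracted from a function's total duration; struct frame, on_entry() and on_return() are hypothetical names, not kernel APIs.

```c
#define DEPTH 8

/* Hypothetical per-depth record, loosely modelled on ftrace_ret_stack. */
struct frame {
	unsigned long long calltime;	/* timestamp at function entry */
	unsigned long long subtime;	/* time accumulated by callees */
};

static struct frame stack[DEPTH];

/* Entry hook: record the start time and reset the callee time. */
static int on_entry(int depth, unsigned long long now)
{
	if (depth < 0 || depth >= DEPTH)
		return 0;	/* out of range, nothing recorded */

	stack[depth].calltime = now;
	stack[depth].subtime = 0;
	return 1;
}

/*
 * Return hook: charge the full duration to the parent's subtime and
 * report only the time spent in this function itself.
 */
static unsigned long long on_return(int depth, unsigned long long now)
{
	unsigned long long total;

	if (depth < 0 || depth >= DEPTH)
		return 0;

	total = now - stack[depth].calltime;
	if (depth > 0)
		stack[depth - 1].subtime += total;

	return total > stack[depth].subtime ? total - stack[depth].subtime : 0;
}
```

In this model a stale subtime left over from an earlier call at the same depth would be subtracted from the wrong function, which is why the reset has to happen somewhere on every entry; the patch only changes where.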
Re: kcm: use-after-free in fput of kcm socket
On Sun, Aug 28, 2016 at 3:10 AM, Dmitry Vyukov wrote: > Hello, > > The following program triggers use-after-free: > > // autogenerated by syzkaller (http://github.com/google/syzkaller) > #include > #include > > int main() > { > int fd = syscall(SYS_socket, 0x29ul, 0x5ul, 0x0ul, 0, 0, 0); > syscall(SYS_ioctl, fd, 0x89e2ul, 0x20a98000ul, 0, 0, 0); > return 0; > } > > > [ 367.240184] > == > [ 367.240784] BUG: KASAN: use-after-free in __fput+0x65a/0x780 at > addr 880069bc4b30 > [ 367.241034] Read of size 2 by task a.out/4045 > [ 367.241034] CPU: 3 PID: 4045 Comm: a.out Not tainted 4.8.0-rc3+ #34 > [ 367.241034] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS Bochs 01/01/2011 > [ 367.241034] 884b8280 880038fb7bc0 82d1b1d9 > 00622e00 > [ 367.241034] fbfff1097050 88003e198900 880069bc4b00 > 880069bc4ec0 > [ 367.241034] 880069bc4b30 859e90a0 880038fb7be8 > 817da1fc > [ 367.241034] Call Trace: > [ 367.241034] [] dump_stack+0x12e/0x185 > [ 367.241034] [] ? sock_release+0x1d0/0x1d0 > [ 367.241034] [] kasan_object_err+0x1c/0x70 > [ 367.241034] [] kasan_report_error+0x1ae/0x490 > [ 367.241034] [] ? sock_release+0x1d0/0x1d0 > [ 367.241034] [] __asan_report_load2_noabort+0x3e/0x40 > [ 367.241034] [] ? __fput+0x65a/0x780 > [ 367.241034] [] __fput+0x65a/0x780 > [ 367.241034] [] fput+0x15/0x20 > [ 367.241034] [] task_work_run+0xf3/0x170 > [ 367.241034] [] do_exit+0x868/0x2c10 > [ 367.241034] [] ? sock_ioctl+0x1db/0x3d0 > [ 367.241034] [] ? sock_do_ioctl+0xb0/0xb0 > [ 367.241034] [] ? do_vfs_ioctl+0x430/0x1080 > [ 367.241034] [] ? mm_update_next_owner+0x640/0x640 > [ 367.241034] [] ? ioctl_preallocate+0x210/0x210 > [ 367.241034] [] ? bad_area+0x69/0x80 > [ 367.241034] [] ? exit_to_usermode_loop+0x3e/0x210 > [ 367.241034] [] ? entry_SYSCALL_64_fastpath+0x5/0xc1 > [ 367.241034] [] do_group_exit+0x108/0x330 > [ 367.241034] [] SyS_exit_group+0x1d/0x20 > [ 367.241034] [] entry_SYSCALL_64_fastpath+0x23/0xc1 Hmm, we have a double free here. 
I have a patch to fix it, will send it out very soon. Thanks! > [ 367.241034] Object at 880069bc4b00, in cache sock_inode_cache size: 960 > [ 367.241034] Allocated: > [ 367.241034] PID = 4045 > [ 367.241034] [] save_stack_trace+0x26/0x50 > [ 367.241034] [] save_stack+0x46/0xd0 > [ 367.241034] [] kasan_kmalloc+0xad/0xe0 > [ 367.241034] [] kasan_slab_alloc+0x12/0x20 > [ 367.241034] [] kmem_cache_alloc+0x12b/0x710 > [ 367.241034] [] sock_alloc_inode+0x1d/0x250 > [ 367.241034] [] alloc_inode+0x61/0x180 > [ 367.241034] [] new_inode_pseudo+0x17/0xe0 > [ 367.241034] [] sock_alloc+0x41/0x280 > [ 367.241034] [] kcm_ioctl+0x9b3/0x13e0 > [ 367.241034] [] sock_do_ioctl+0x65/0xb0 > [ 367.241034] [] sock_ioctl+0x2d2/0x3d0 > [ 367.241034] [] do_vfs_ioctl+0x18c/0x1080 > [ 367.241034] [] SyS_ioctl+0x8f/0xc0 > [ 367.241034] [] entry_SYSCALL_64_fastpath+0x23/0xc1 > [ 367.241034] Freed: > [ 367.241034] PID = 4045 > [ 367.241034] [] save_stack_trace+0x26/0x50 > [ 367.241034] [] save_stack+0x46/0xd0 > [ 367.241034] [] kasan_slab_free+0x72/0xc0 > [ 367.241034] [] kmem_cache_free+0x76/0x300 > [ 367.241034] [] sock_destroy_inode+0x56/0x70 > [ 367.241034] [] destroy_inode+0xc7/0x130 > [ 367.241034] [] evict+0x329/0x500 > [ 367.241034] [] iput+0x495/0x930 > [ 367.241034] [] sock_release+0x164/0x1d0 > [ 367.241034] [] sock_close+0x16/0x20 > [ 367.241034] [] __fput+0x236/0x780 > [ 367.241034] [] fput+0x15/0x20 > [ 367.241034] [] task_work_run+0xf3/0x170 > [ 367.241034] [] do_exit+0x868/0x2c10 > [ 367.241034] [] do_group_exit+0x108/0x330 > [ 367.241034] [] SyS_exit_group+0x1d/0x20 > [ 367.241034] [] entry_SYSCALL_64_fastpath+0x23/0xc1 > [ 367.241034] Memory state around the buggy address: > [ 367.241034] 880069bc4a00: fc fc fc fc fc fc fc fc fc fc fc fc > fc fc fc fc > [ 367.241034] 880069bc4a80: fc fc fc fc fc fc fc fc fc fc fc fc > fc fc fc fc > [ 367.241034] >880069bc4b00: fb fb fb fb fb fb fb fb fb fb fb fb > fb fb fb fb > [ 367.241034] ^ > [ 367.241034] 880069bc4b80: fb fb fb fb fb fb fb 
fb fb fb fb fb > fb fb fb fb > [ 367.241034] 880069bc4c00: fb fb fb fb fb fb fb fb fb fb fb fb > fb fb fb fb > [ 367.241034] > == > > > It is then followed by a bunch of other bugs, full log is here: > https://gist.githubusercontent.com/dvyukov/b9884388bee40b792ae7900928358484/raw/ace2fa242468d584fa61bf753a5891faa71b0932/gistfile1.txt > > > On commit 61c04572de404e52a655a36752e696bbcb483cf5 (Aug 25).
Re: [PATCH 2/2] arm64: dts: rockchip: add eMMC's power domain support for rk3399
On 08/27/2016 11:05 PM, Shawn Lin wrote: On 2016/8/27 21:41, Ziyuan Xu wrote: Control power domain for eMMC via genpd to reduce power consumption. Signed-off-by: Elaine Zhang Signed-off-by: Ziyuan Xu It looks nice to me. But this should be merged after applying that[0] as your patch will break bind/unbind test for sdhci-of-arasan on rk3399 without it[0]. Moreover, Elaine should make sure that upstreamed rockchip power domain stuff would not off pd for emmc, *otherwise*, I should update my patch to make sure we update clkmul every time when doing suspend 2 resume.. Forgot to say: If use pd, Although there is no call to power odd the pd_emmc, it will be power off when the system doing suspend 2 resume. (Because the system call __device_suspend_noirq->pm_genpd_suspend_noirq->rockchip_pd_power_off) And it's important to note: If the pd has been power off, some grf regs will be back to the default value.(which grf regs in this pd) So if the pd support power off , this grf regs need to save and restore or reinit. For example: pd_emmc aclk_emmc_grf If the pd is always on,and this pd have wakeup func. The device need to add device_init_wakeup() to make the pd always on when the system doing suspend 2 resume. 
[0]: https://patchwork.kernel.org/patch/9300971/ --- arch/arm64/boot/dts/rockchip/rk3399.dtsi | 11 +++ 1 file changed, 11 insertions(+) diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi index 32aebc8..71733d4 100644 --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi @@ -239,6 +239,7 @@ #clock-cells = <0>; phys = <_phy>; phy-names = "phy_arasan"; +power-domains = < RK3399_PD_EMMC>; status = "disabled"; }; @@ -611,6 +612,11 @@ status = "disabled"; }; +qos_emmc: qos@ffa58000 { +compatible = "syscon"; +reg = <0x0 0xffa58000 0x0 0x20>; +}; + qos_hdcp: qos@ffa9 { compatible = "syscon"; reg = <0x0 0xffa9 0x0 0x20>; @@ -739,6 +745,11 @@ }; /* These power domains are grouped by VD_LOGIC */ +pd_emmc@RK3399_PD_EMMC { +reg = ; +clocks = < ACLK_EMMC>; +pm_qos = <_emmc>; +}; pd_vio@RK3399_PD_VIO { reg = ; #address-cells = <1>;
Re: [PATCH] drm/rockchip: vop: make vop register setting take effect

On 2016-08-27 11:39, Chris Zhong wrote:
> The vop register settings need a reg_done write to take effect. In
> vop_enable the vop returns to work by restoring registers, but the
> registers do not take effect immediately; a vop_cfg_done is needed
> after it. The same thing is needed by windows_disabled in
> vop_crtc_disable.
>
> Signed-off-by: Chris Zhong
> ---
>  drivers/gpu/drm/rockchip/rockchip_drm_vop.c | 4
>  1 file changed, 4 insertions(+)

Thanks for your fix. Applied to my drm-fixes.

> diff --git a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> index efbc41a..a0bfcff 100644
> --- a/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> +++ b/drivers/gpu/drm/rockchip/rockchip_drm_vop.c
> @@ -464,6 +464,8 @@ static int vop_enable(struct drm_crtc *crtc)
>  	}
>  
>  	memcpy(vop->regs, vop->regsbak, vop->len);
> +	vop_cfg_done(vop);
> +
>  	/*
>  	 * At here, vop clock & iommu is enable, R/W vop regs would be safe.
>  	 */
> @@ -513,6 +515,8 @@ static void vop_crtc_disable(struct drm_crtc *crtc)
>  		spin_unlock(&vop->reg_lock);
>  	}
>  
> +	vop_cfg_done(vop);
> +
>  	drm_crtc_vblank_off(crtc);
>  
>  	/*

-- 
Mark Yao
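The behaviour the commit message describes, where written values are latched only by a separate "done" write, can be pictured with a toy model. This is not the real VOP register interface: struct toy_vop, its three arrays and the toy_* helpers are hypothetical and exist only to show why restoring the backup without a final commit leaves the hardware on stale settings.

```c
#include <stdint.h>
#include <string.h>

#define NREGS 4

/* Toy device: writes land in pending[] and are used only after a commit. */
struct toy_vop {
	uint32_t pending[NREGS];	/* values written by the driver */
	uint32_t active[NREGS];		/* values the hardware is using */
	uint32_t regsbak[NREGS];	/* software backup of the registers */
};

/* Write a register and keep the backup in sync. */
static void toy_write(struct toy_vop *v, int reg, uint32_t val)
{
	v->pending[reg] = val;
	v->regsbak[reg] = val;
}

/* Commit step: pending values take effect. */
static void toy_cfg_done(struct toy_vop *v)
{
	memcpy(v->active, v->pending, sizeof(v->active));
}

/* Enable path: restore the backup, then commit so it actually applies. */
static void toy_enable(struct toy_vop *v)
{
	memcpy(v->pending, v->regsbak, sizeof(v->pending));
	toy_cfg_done(v);
}
```

Dropping the toy_cfg_done() call from toy_enable() reproduces the symptom in the model: the restored values sit in pending[] while active[] stays at its reset state.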
[PATCH] iio: fix pressure data output unit in hid-sensor-attributes

From: "Kweh, Hock Leong"

According to the IIO ABI definition, the IIO_PRESSURE data output unit is
kilopascal:
http://lxr.free-electrons.com/source/Documentation/ABI/testing/sysfs-bus-iio

This patch fixes the output unit of the HID pressure sensor IIO driver
from pascal to kilopascal to follow the IIO ABI definition.

Signed-off-by: Kweh, Hock Leong
---
 .../iio/common/hid-sensors/hid-sensor-attributes.c |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/iio/common/hid-sensors/hid-sensor-attributes.c b/drivers/iio/common/hid-sensors/hid-sensor-attributes.c
index e81f434..dc33c1d 100644
--- a/drivers/iio/common/hid-sensors/hid-sensor-attributes.c
+++ b/drivers/iio/common/hid-sensors/hid-sensor-attributes.c
@@ -56,8 +56,8 @@ static struct {
 	{HID_USAGE_SENSOR_ALS, 0, 1, 0},
 	{HID_USAGE_SENSOR_ALS, HID_USAGE_SENSOR_UNITS_LUX, 1, 0},
 
-	{HID_USAGE_SENSOR_PRESSURE, 0, 10, 0},
-	{HID_USAGE_SENSOR_PRESSURE, HID_USAGE_SENSOR_UNITS_PASCAL, 1, 0},
+	{HID_USAGE_SENSOR_PRESSURE, 0, 100, 0},
+	{HID_USAGE_SENSOR_PRESSURE, HID_USAGE_SENSOR_UNITS_PASCAL, 0, 1000},
 };
 
 static int pow_10(unsigned power)
-- 
1.7.9.5
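As a worked example of the unit change, the conversion from pascal to kilopascal is a division by 1000. The helper below is a hypothetical illustration of that arithmetic only; it does not reproduce how the driver's scale table is interpreted, and it assumes a non-negative reading.

```c
/* Hypothetical fixed-point result: whole kilopascals plus thousandths. */
struct kpa {
	long whole;	/* integer kilopascals */
	long milli;	/* remaining thousandths of a kilopascal */
};

/* Convert a non-negative reading in pascal to kilopascal (1 kPa = 1000 Pa). */
static struct kpa pascal_to_kilopascal(long pascal)
{
	struct kpa out = { pascal / 1000, pascal % 1000 };

	return out;
}
```

Standard atmospheric pressure, 101325 Pa, comes out as 101 whole kilopascals and 325 thousandths, that is 101.325 kPa.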
Re: [PATCH] thermal: hisilicon: fix COMPILE_TEST dependencies
On Mon, Aug 29, 2016 at 10:00:52AM +0800, Zhang Rui wrote:
> On Fri, 2016-08-26 at 17:44 +0200, Arnd Bergmann wrote:
> > As we now 'select STUB_CLK_HI6220', all dependencies for that driver
> > have to be present in order to enable HISI_THERMAL, as pointed out by
> > Kconfig:
> >
> > warning: (HISI_THERMAL) selects STUB_CLK_HI6220 which has unmet
> > direct dependencies (COMMON_CLK && COMMON_CLK_HI6220 && MAILBOX)
> >
> > This rearranges the dependencies for this symbol, so all the
> > dependencies aside from ARCH_HISI are always met when building it for
> > compile testing.
> > This mainly helps for randconfig testing, as an "allmodconfig" kernel
> > will enable them anyway.
> >
> > Signed-off-by: Arnd Bergmann
> > Fixes: 5f63581ce68e ("thermal: hisilicon: Add dependency on the clock
> > driver to allow frequency scaling")
>
> As commit 5f63581ce68e has not been shipped in upstream yet, please
> fold this patch into the original one. I'd prefer one good patch
> instead of a broken patch + a fix.

Amit and I have discussed this, and we have a clearer method to enable the
Hisilicon thermal driver [1]: we are planning to enable CONFIG_CPU_THERMAL
in defconfig, and to make the stub clock driver and the thermal driver
depend on ARCH_HISI, which should resolve all the dependency issues.

I will prepare the related patches and send them out for review ASAP; sorry
for the delay.

[1] https://lkml.org/lkml/2016/8/8/879

Thanks,
Leo Yan

> > ---
> >  drivers/thermal/Kconfig | 3 ++-
> >  1 file changed, 2 insertions(+), 1 deletion(-)
> >
> > diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> > index 5cba072c3a62..3c8607c07352 100644
> > --- a/drivers/thermal/Kconfig
> > +++ b/drivers/thermal/Kconfig
> > @@ -177,7 +177,8 @@ config THERMAL_EMULATION
> >
> >  config HISI_THERMAL
> >  	tristate "Hisilicon thermal driver"
> > -	depends on (ARCH_HISI && CPU_THERMAL && OF) || COMPILE_TEST
> > +	depends on ARCH_HISI || COMPILE_TEST
> > +	depends on CPU_THERMAL && OF && COMMON_CLK_HI6220 && MAILBOX
> >  	depends on HAS_IOMEM
> >  	select STUB_CLK_HI6220
> >  	help
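Editorial note: the Kconfig warning quoted in this thread can be modelled as boolean logic. The sketch below is a simplification made for illustration: every symbol is treated as a plain boolean (real symbols may be tristate), and the struct and function names are hypothetical. It shows why the old "depends on" line let COMPILE_TEST enable HISI_THERMAL while the selected STUB_CLK_HI6220 still had unmet dependencies, and why the rearranged lines do not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Simplified model: each Kconfig symbol is just on or off. */
struct kconfig_state {
	bool arch_hisi, compile_test, cpu_thermal, of;
	bool common_clk, common_clk_hi6220, mailbox, has_iomem;
};

/* Dependencies of STUB_CLK_HI6220, as listed in the Kconfig warning. */
static bool stub_clk_hi6220_deps_met(const struct kconfig_state *c)
{
	assert(c != NULL);
	return c->common_clk && c->common_clk_hi6220 && c->mailbox;
}

/* HISI_THERMAL dependencies before the patch. */
static bool hisi_thermal_deps_old(const struct kconfig_state *c)
{
	assert(c != NULL);
	return ((c->arch_hisi && c->cpu_thermal && c->of) || c->compile_test) &&
	       c->has_iomem;
}

/* HISI_THERMAL dependencies after the patch. */
static bool hisi_thermal_deps_new(const struct kconfig_state *c)
{
	assert(c != NULL);
	return (c->arch_hisi || c->compile_test) &&
	       c->cpu_thermal && c->of &&
	       c->common_clk_hi6220 && c->mailbox &&
	       c->has_iomem;
}
```

With only COMPILE_TEST and HAS_IOMEM enabled, the old expression is true although the stub clock dependencies are not met; the new expression is false in that configuration.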
Re: chipidea: udc: kernel panic in isr_setup_status_phase
On Sun, Aug 28, 2016 at 08:15:02PM +0200, Clemens Gruber wrote:
> On Sat, Aug 27, 2016 at 01:21:52AM +0800, Peter Chen wrote:
> > The gadget triggers UI interrupt due to host sends packet.
> >
> > I really can't understand that, why host does not send bus reset
> > before sending packet (eg, GET_DESCRIPTOR)? It violates USB spec.
> >
> > Are you sure the first interrupt is UI when the vbus from off to on?
>
> Yes, if the error is present, the first interrupt is intr=0x1 (USBi_UI)
> and then the NULL pointer dereference would occur.
> (Also: Checking for ci->status == NULL and avoiding the dereference does
> not make the gadget work. It just avoids the kernel panic.)
>
> But I also observed a situation where the first interrupt is intr=0x100
> (USBi_SLI) followed by 0x40 (USBi_URI), 0x4 (USBi_PCI) and three times
> 0x1 (USBi_UI).
> After this "g_ether gadget: suspend" appears and the sequence repeats,
> starting again with intr=0x100, followed by 0x40, ... until three times
> 0x1 and the g_ether gadget: suspend message.
> On the host, every 500ms a new message with incrementing device number
> appears:
> usb 1-4: new high-speed USB device number 41 using xhci_hcd
> usb 1-4: new high-speed USB device number 42 using xhci_hcd
> ...
>
> In the case where everything works, it looks like this:
> intr=0x100 (USBi_SLI)
> intr=0x40 (USBi_URI)
> intr=0x4 (USBi_PCI)
> intr=0x1 (USBi_UI)
> intr=0x1 (USBi_UI)
> ci_hdrc ci_hdrc.0: freeing queued_request
> intr=0x41 (USBi_URI + USBi_UI)
> intr=0x4 (USBi_PCI)
> intr=0x1 (USBi_UI) <-- appears 17 times
> g_ether gadget: high-speed config #1: CDC Ethernet (EEM)
> intr=0x1 (USBi_UI) <-- appears 5 times
> IPv6: ADDRCONF(NETDEV_CHANGE): usb0: link becomes ready
>
> --
>
> Do you think this could be a hardware problem? We used the same method
> as in the MCIMX6Q-SDB schematics (SPF-27516_C5.pdf) to avoid any current
> flow through OTG VBUS to the inside when the board is powered off but a
> host PC is still connected via OTG.
> So we not just pass the VBUS signal through, there are two MOSFETs,
> which prevent that (if the internal 3.3V is low).
> Mostly the same logic as in said document on page 11 (top-left area).
>
> Another possibility, I am investigating now, is a ground loop and a
> main-supply voltage-dependency, although the whole USB OTG part is
> on a completely different supply rail, the GNDs are shared.
>
> I am investigating in all directions at the moment ;-)
>

Would you please measure the vbus voltage within 1 s under the two
conditions below:
- Just after connecting the cable
- Just after disconnecting the cable

>
> I also switched to CDC/EEM to make sure it has nothing to do with RNDIS,
> and the problem is still present. So the error must be on a lower level.
>
> --
>
> You could try to reproduce it with a MCIMX6Q-SDB and varying the main
> supply voltage between minimum and maximum allowed voltage levels. For
> example: Plug OTG in once at the minimum and once at the maximum level,
> see if it behaves differently.
> But this is just one of my desperate theories at the moment..
>

Sorry, I currently have no equipment that can vary the main supply voltage.

-- 
Best Regards,
Peter Chen
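Editorial note: the intr= values in the log above are bit masks, so a value such as 0x41 is the combination of 0x40 and 0x1. A minimal userspace sketch of that decoding follows. The bit values and names are taken from the log annotations in this thread; the helper function decode_intr() is hypothetical and not part of the chipidea driver.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Bit values as annotated in the quoted log. */
#define USBi_UI  0x001u
#define USBi_PCI 0x004u
#define USBi_URI 0x040u
#define USBi_SLI 0x100u

/*
 * Hypothetical helper: write the names of all known bits set in intr
 * into buf, joined by " + ". Unknown bits are ignored in this sketch.
 */
static const char *decode_intr(unsigned int intr, char *buf, size_t len)
{
	static const struct {
		unsigned int bit;
		const char *name;
	} bits[] = {
		{ USBi_SLI, "USBi_SLI" },
		{ USBi_URI, "USBi_URI" },
		{ USBi_PCI, "USBi_PCI" },
		{ USBi_UI,  "USBi_UI"  },
	};
	size_t i, used = 0;

	assert(buf != NULL && len > 0);
	buf[0] = '\0';
	for (i = 0; i < sizeof(bits) / sizeof(bits[0]); i++) {
		int n;

		if (!(intr & bits[i].bit))
			continue;
		n = snprintf(buf + used, len - used, "%s%s",
			     used ? " + " : "", bits[i].name);
		if (n < 0 || (size_t)n >= len - used)
			break; /* output truncated, stop appending */
		used += (size_t)n;
	}
	return buf;
}
```

Decoding 0x41 with this helper yields the same "USBi_URI + USBi_UI" combination that appears in the working-case log.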
Re: checkkpatch (in)sanity ?
On Sun, Aug 28, 2016 at 07:20:52PM -0400, Joe Perches wrote:
> On Sun, 2016-08-28 at 18:37 -0400, Levin, Alexander wrote:
> > On Sun, Aug 28, 2016 at 01:15:57PM -0400, Joe Perches wrote:
> > > On Sat, 2016-08-27 at 22:47 -0400, Levin, Alexander wrote:
> > > > Would you agree that by default we shouldn't show anything that's
> > > > not an error/defect?
> > > Not particularly, no.
> > I think that we need to figure out this disagreement first then. My
> > claim is that checkpatch's output isn't useful.
> []
> > It'll be interesting to hear from these people about their view of
> > checkpatch, but IMO when on average there are more issues than commits
> > I can suggest two possible causes:
> >
> > 1. People are used to ignore checkpatch warnings.
> > 2. People aren't using checkpatch.
> >
> > Can you really make the claim that this is how checkpatch is supposed
> > to be working?
>
> I make no particular claims about checkpatch.
>
> I think checkpatch isn't particularly useful for those
> thoroughly inculcated in what style the kernel uses and
> is more useful for infrequent or new submitters.
>
> The long time submitters and key maintainers are already
> pretty consistent about coding style.

I did the same test for authors of 5-9 commits (just an arbitrary choice
of numbers for "infrequent"), the results there are much worse: 3981
commits, 7175 issues.

The only big subsystem that seems to be forcing checkpatch "correctness"
is mm/, where akpm is fixing up checkpatch issues himself. Otherwise, it
looks like maintainers are neither running checkpatch nor making sure that
the commits they merge in don't have checkpatch issues.

> It would be good to examine the specific messages though.

What for? The point is that with that amount of issues it's evident that
people don't actually use checkpatch to begin with. We can discuss whether
the output it produces makes sense all we want, but the fact is that
people just don't use it - and I've tried to give my opinion of why I
think it happens.

-- 
Thanks,
Sasha
Re: [PATCH v6 1/2] clk: uniphier: add core support code for UniPhier clock driver
Hi Stephen,

2016-08-20 4:16 GMT+09:00 Stephen Boyd:
>> >> >> +
>> >> + parent = of_get_parent(dev->of_node); /* parent should be syscon
>> >> node */
>> >> + regmap = syscon_node_to_regmap(parent);
>> >> + of_node_put(parent);
>> >
>> > devm_get_regmap(dev->parent) should work then? Why do we need to
>> > use OF APIs?
>>
>> "git grep devm_get_regmap" did not hit anything.
>>
>> Where is it defined?
>>
>
> Sorry I meant dev_get_regmap().
>

I tried this, but it did not work.

To make dev_get_regmap() work, the parent device needs to call
devm_regmap_init_mmio() beforehand. Since commit
bdb0066df96e74a4002125467ebe459feff1ebef (mfd: syscon: Decouple syscon
interface from platform devices), syscon_probe() is not called for
platform devices, so that never happens.

-- 
Best Regards
Masahiro Yamada
Re: [PATCH 2/2] arm64: dts: rockchip: add eMMC's power domain support for rk3399
On 08/27/2016 11:05 PM, Shawn Lin wrote:
> On 2016/8/27 21:41, Ziyuan Xu wrote:
>> Control power domain for eMMC via genpd to reduce power consumption.
>>
>> Signed-off-by: Elaine Zhang
>> Signed-off-by: Ziyuan Xu
>
> It looks nice to me. But this should be merged after applying that[0],
> as your patch will break bind/unbind test for sdhci-of-arasan on rk3399
> without it[0].
>
> Moreover, Elaine should make sure that upstreamed rockchip power domain
> stuff would not off pd for emmc, *otherwise*, I should update my patch
> to make sure we update clkmul every time when doing suspend 2 resume..

It looks nice to me. I was going to submit it along with the other PDs.

> [0]: https://patchwork.kernel.org/patch/9300971/
>
>> ---
>>  arch/arm64/boot/dts/rockchip/rk3399.dtsi | 11 +++
>>  1 file changed, 11 insertions(+)
>>
>> diff --git a/arch/arm64/boot/dts/rockchip/rk3399.dtsi b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
>> index 32aebc8..71733d4 100644
>> --- a/arch/arm64/boot/dts/rockchip/rk3399.dtsi
>> +++ b/arch/arm64/boot/dts/rockchip/rk3399.dtsi
>> @@ -239,6 +239,7 @@
>>  		#clock-cells = <0>;
>>  		phys = <_phy>;
>>  		phy-names = "phy_arasan";
>> +		power-domains = < RK3399_PD_EMMC>;
>>  		status = "disabled";
>>  	};
>>
>> @@ -611,6 +612,11 @@
>>  		status = "disabled";
>>  	};
>>
>> +	qos_emmc: qos@ffa58000 {
>> +		compatible = "syscon";
>> +		reg = <0x0 0xffa58000 0x0 0x20>;
>> +	};
>> +
>>  	qos_hdcp: qos@ffa9 {
>>  		compatible = "syscon";
>>  		reg = <0x0 0xffa9 0x0 0x20>;
>> @@ -739,6 +745,11 @@
>>  			};
>>
>>  			/* These power domains are grouped by VD_LOGIC */
>> +			pd_emmc@RK3399_PD_EMMC {
>> +				reg = <RK3399_PD_EMMC>;
>> +				clocks = < ACLK_EMMC>;
>> +				pm_qos = <&qos_emmc>;
>> +			};
>>  			pd_vio@RK3399_PD_VIO {
>>  				reg = <RK3399_PD_VIO>;
>>  				#address-cells = <1>;
Re: [PATCH] thermal: hisilicon: fix COMPILE_TEST dependencies
On Fri, 2016-08-26 at 17:44 +0200, Arnd Bergmann wrote:
> As we now 'select STUB_CLK_HI6220', all dependencies for that driver
> have to be present in order to enable HISI_THERMAL, as pointed out by
> Kconfig:
>
> warning: (HISI_THERMAL) selects STUB_CLK_HI6220 which has unmet
> direct dependencies (COMMON_CLK && COMMON_CLK_HI6220 && MAILBOX)
>
> This rearranges the dependencies for this symbol, so all the
> dependencies aside from ARCH_HISI are always met when building it for
> compile testing.
> This mainly helps for randconfig testing, as an "allmodconfig" kernel
> will enable them anyway.
>
> Signed-off-by: Arnd Bergmann
> Fixes: 5f63581ce68e ("thermal: hisilicon: Add dependency on the clock
> driver to allow frequency scaling")

As commit 5f63581ce68e has not been shipped in upstream yet, please
fold this patch into the original one. I'd prefer one good patch
instead of a broken patch + a fix.

thanks,
rui

> ---
>  drivers/thermal/Kconfig | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/thermal/Kconfig b/drivers/thermal/Kconfig
> index 5cba072c3a62..3c8607c07352 100644
> --- a/drivers/thermal/Kconfig
> +++ b/drivers/thermal/Kconfig
> @@ -177,7 +177,8 @@ config THERMAL_EMULATION
>
>  config HISI_THERMAL
>  	tristate "Hisilicon thermal driver"
> -	depends on (ARCH_HISI && CPU_THERMAL && OF) || COMPILE_TEST
> +	depends on ARCH_HISI || COMPILE_TEST
> +	depends on CPU_THERMAL && OF && COMMON_CLK_HI6220 && MAILBOX
>  	depends on HAS_IOMEM
>  	select STUB_CLK_HI6220
>  	help
Re: [PATCH][v6] PM / hibernate: Print the possible panic reason when resuming with inconsistent e820 map
Hi,

[no properly binding reference via In-Reply-To: available
thus manually re-creating, sorry]

> > So we can print warning in hibernation_die_notifier without
> > introducing a global variable?
> >
>
> Actually, I'd kill the machine right away.
>
> if (memcmp(result, buf, MD5_DIGEST_SIZE)) {
> 	pr_err("PM: e820 map conflict detected!\n");
> 	panic("BIOS is playing funny tricks with us.\n");
> }
> Best regards,
> Pavel

+1.

I would tend to think that it's rather preferable to kill an affected
environment scope (in this case: whole system), hard (except for perhaps
some emergency epilogue state saving), in case of its state having become
suspicious/unpredictable/dangerous, rather than having things carry on in
a merry-go-round manner and thus making improper state progress which
translates into *continued activity* of *corrupting things* (possibly even
leading to *persistent* i.e. storage-recorded corruption!) willy-nilly.

Plus, killing a machine hard in such a questionable case would increase
our influx of valuable reports due to users being hard-pressed to report
this rather than silently ignoring / not even knowing this issue (in those
cases where we already know that no further development fixes will be
determinable, carrying on with a dire warning probably still is preferable
to making the machine completely unusable, eternally, though).

Thus, think robustness.

Andreas Mohr
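Editorial note: the check Pavel quotes compares two fixed-size digests with memcmp(). A minimal userspace sketch of that comparison follows, under stated assumptions: both digests are computed elsewhere (no hashing is done here), MD5_DIGEST_SIZE is taken as 16 bytes, and the function name e820_digest_matches() is hypothetical. What to do on a mismatch (warn or halt) is exactly the policy question discussed in this thread and is left to the caller.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MD5_DIGEST_SIZE 16 /* an MD5 digest is 128 bits */

/*
 * Return true when the digest saved before hibernation equals the digest
 * computed on resume. memcmp() returns 0 for identical buffers.
 */
static bool e820_digest_matches(const unsigned char *saved,
				const unsigned char *current)
{
	assert(saved != NULL && current != NULL);
	return memcmp(saved, current, MD5_DIGEST_SIZE) == 0;
}
```

A caller following the "kill the machine right away" position would treat a false return as fatal; one following the "print a warning" position would log it and continue.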
Re: [PATCH 1/1] ceph: do not modify fi->frag in need_reset_readdir()
> On Aug 29, 2016, at 00:47, Nicolas Iooss wrote:
>
> Commit f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
> modified "if (fpos_frag(new_pos) != fi->frag)" to "if (fi->frag |=
> fpos_frag(new_pos))" in need_reset_readdir(), thus replacing a
> comparison operator with an assignment one.
>
> This looks like a typo which is reported by clang when building the
> kernel with some warning flags:
>
>    fs/ceph/dir.c:600:22: error: using the result of an assignment as a
>    condition without parentheses [-Werror,-Wparentheses]
>    } else if (fi->frag |= fpos_frag(new_pos)) {
>                       ~^
>    fs/ceph/dir.c:600:22: note: place parentheses around the assignment
>    to silence this warning
>    } else if (fi->frag |= fpos_frag(new_pos)) {
>                        ^
>               (                             )
>    fs/ceph/dir.c:600:22: note: use '!=' to turn this compound
>    assignment into an inequality comparison
>    } else if (fi->frag |= fpos_frag(new_pos)) {
>                        ^~
>                        !=
>
> Fixes: f3c4ebe65ea1 ("ceph: using hash value to compose dentry offset")
> Cc: sta...@vger.kernel.org # 4.7.x
> Signed-off-by: Nicolas Iooss
> ---
>  fs/ceph/dir.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/fs/ceph/dir.c b/fs/ceph/dir.c
> index c64a0b794d49..df4b3e6fa563 100644
> --- a/fs/ceph/dir.c
> +++ b/fs/ceph/dir.c
> @@ -597,7 +597,7 @@ static bool need_reset_readdir(struct ceph_file_info *fi, loff_t new_pos)
>  	if (is_hash_order(new_pos)) {
>  		/* no need to reset last_name for a forward seek when
>  		 * dentries are sotred in hash order */
> -	} else if (fi->frag |= fpos_frag(new_pos)) {
> +	} else if (fi->frag != fpos_frag(new_pos)) {
>  		return true;
>  	}
>  	rinfo = fi->last_readdir ? &fi->last_readdir->r_reply_info : NULL;

Applied, thanks

Yan, Zheng

> --
> 2.9.3
>
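Editorial note: the difference between the "|=" typo and the intended "!=" can be reproduced in a few lines of userspace C. The struct and function names below are hypothetical stand-ins, not the ceph code; the sketch only demonstrates the two side effects of the typo: the condition is true whenever the OR result is non-zero, and the stored value is modified.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the structure holding the current frag. */
struct file_info {
	unsigned int frag;
};

/*
 * Buggy form: "|=" ORs new_frag into fi->frag and then tests the result,
 * so it both changes fi->frag and reports "different" for equal values.
 */
static bool frag_differs_buggy(struct file_info *fi, unsigned int new_frag)
{
	assert(fi != NULL);
	return (fi->frag |= new_frag) != 0;
}

/* Fixed form: a pure comparison that leaves fi->frag untouched. */
static bool frag_differs_fixed(const struct file_info *fi,
			       unsigned int new_frag)
{
	assert(fi != NULL);
	return fi->frag != new_frag;
}
```

With frag equal to 3 and a new value of 3, the fixed form reports no difference while the buggy form reports one; with frag equal to 1 and a new value of 2, the buggy form also leaves frag set to 3.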