linux-next: Tree for Sep 6

2017-09-05 Thread Stephen Rothwell
Hi all,

Please do not add any v4.15 related material to your linux-next
included branches until after v4.14-rc1 has been released.

Changes since 20170905:

The devicetree tree gained conflicts against Linus' and the sound trees.

The akpm-current tree lost its build failure.

Non-merge commits (relative to Linus' tree): 7233
 7400 files changed, 308697 insertions(+), 105511 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And
finally, a simple boot test of the powerpc pseries_le_defconfig kernel
in qemu.

Below is a summary of the state of the merge.

I am currently merging 268 trees (counting Linus' and 41 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (e7d0c41ecc2e Merge tag 'devprop-4.14-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fixes/master (b4b8cbf679c4 Cavium CNN55XX: fix broken default Kconfig 
entry)
Merging kbuild-current/fixes (64236e315955 kbuild: update comments of 
Makefile.asm-generic)
Merging arc-current/for-curr (1ee55a8f7f6b ARC: Re-enable MMU upon Machine 
Check exception)
Merging arm-current/fixes (746a272e4414 ARM: 8692/1: mm: abort uaccess retries 
upon fatal signal)
Merging m68k-current/for-linus (558d5ad276c9 m68k/mac: Avoid soft-lockup 
warning after mach_power_off)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (1a92a80ad386 powerpc/mm: Ensure cpumask update is 
ordered)
Merging sparc/master (6470812e2226 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (6d9c153a0b84 net: dsa: loop: Do not unregister invalid 
fixed PHY)
Merging ipsec/master (f581a0dd744f wl1251: add a missing spin_lock_init())
Merging netfilter/master (6d9c153a0b84 net: dsa: loop: Do not unregister 
invalid fixed PHY)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (10a54d8196d1 iwlwifi: pcie: move rx workqueue 
initialization to iwl_trans_pcie_alloc())
Merging mac80211/master (04306f4dcf90 mac80211: Complete ampdu work schedule 
during session tear down)
Merging sound-current/for-linus (ee5f38a4459a Merge tag 'asoc-v4.14-cs43130' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging pci-current/for-linus (8e1101d25164 PCI/MSI: Don't warn when 
irq_create_affinity_masks() returns NULL)
Merging driver-core.current/driver-core-linus (ef954844c7ac Linux 4.13-rc5)
Merging tty.current/tty-linus (ef954844c7ac Linux 4.13-rc5)
Merging usb.current/usb-linus (ef954844c7ac Linux 4.13-rc5)
Merging usb-gadget-fixes/fixes (b7d44c36a6f6 usb: renesas_usbhs: gadget: fix 
unused-but-set-variable warning)
Merging usb-serial-fixes/usb-linus (fd1b8668af59 USB: serial: option: add 
D-Link DWM-222 device ID)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (5771a8c08880 Linux v4.13-rc1)
Merging staging.current/staging-linus (cc4a41fe5541 Linux 4.13-rc7)
Merging char-misc.current/char-misc-linus (cc4a41fe5541 Linux 4.13-rc7)
Merging input-current/for-linus (a6cbfa1e6d38 Merge branch 'next' into 
for-linus)
Merging crypto-current/master (2d45a7e89833 crypto: af_alg - get_page upon 
reassignment to TX SGL)
Merging ide/master (b671e170339

linux-next: Tree for Sep 6

2017-09-05 Thread Stephen Rothwell
Hi all,

Please do not add any v4.15 related material to your linux-next
included branches until after v4.14-rc1 has been released.

Changes since 20170905:

The devicetree tree gained conflicts against Linus' and the sound trees.

The akpm-current tree lost its build failure.

Non-merge commits (relative to Linus' tree): 7233
 7400 files changed, 308697 insertions(+), 105511 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc and an allmodconfig (with
CONFIG_BUILD_DOCSRC=n) for x86_64, a multi_v7_defconfig for arm and a
native build of tools/perf. After the final fixups (if any), I do an
x86_64 modules_install followed by builds for x86_64 allnoconfig,
powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig
and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And
finally, a simple boot test of the powerpc pseries_le_defconfig kernel
in qemu.

Below is a summary of the state of the merge.

I am currently merging 268 trees (counting Linus' and 41 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (e7d0c41ecc2e Merge tag 'devprop-4.14-rc1' of 
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm)
Merging fixes/master (b4b8cbf679c4 Cavium CNN55XX: fix broken default Kconfig 
entry)
Merging kbuild-current/fixes (64236e315955 kbuild: update comments of 
Makefile.asm-generic)
Merging arc-current/for-curr (1ee55a8f7f6b ARC: Re-enable MMU upon Machine 
Check exception)
Merging arm-current/fixes (746a272e4414 ARM: 8692/1: mm: abort uaccess retries 
upon fatal signal)
Merging m68k-current/for-linus (558d5ad276c9 m68k/mac: Avoid soft-lockup 
warning after mach_power_off)
Merging metag-fixes/fixes (b884a190afce metag/usercopy: Add missing fixups)
Merging powerpc-fixes/fixes (1a92a80ad386 powerpc/mm: Ensure cpumask update is 
ordered)
Merging sparc/master (6470812e2226 Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc)
Merging fscrypt-current/for-stable (42d97eb0ade3 fscrypt: fix renaming and 
linking special files)
Merging net/master (6d9c153a0b84 net: dsa: loop: Do not unregister invalid 
fixed PHY)
Merging ipsec/master (f581a0dd744f wl1251: add a missing spin_lock_init())
Merging netfilter/master (6d9c153a0b84 net: dsa: loop: Do not unregister 
invalid fixed PHY)
Merging ipvs/master (f7fb77fc1235 netfilter: nft_compat: check extension hook 
mask only if set)
Merging wireless-drivers/master (10a54d8196d1 iwlwifi: pcie: move rx workqueue 
initialization to iwl_trans_pcie_alloc())
Merging mac80211/master (04306f4dcf90 mac80211: Complete ampdu work schedule 
during session tear down)
Merging sound-current/for-linus (ee5f38a4459a Merge tag 'asoc-v4.14-cs43130' of 
git://git.kernel.org/pub/scm/linux/kernel/git/broonie/sound into for-linus)
Merging pci-current/for-linus (8e1101d25164 PCI/MSI: Don't warn when 
irq_create_affinity_masks() returns NULL)
Merging driver-core.current/driver-core-linus (ef954844c7ac Linux 4.13-rc5)
Merging tty.current/tty-linus (ef954844c7ac Linux 4.13-rc5)
Merging usb.current/usb-linus (ef954844c7ac Linux 4.13-rc5)
Merging usb-gadget-fixes/fixes (b7d44c36a6f6 usb: renesas_usbhs: gadget: fix 
unused-but-set-variable warning)
Merging usb-serial-fixes/usb-linus (fd1b8668af59 USB: serial: option: add 
D-Link DWM-222 device ID)
Merging usb-chipidea-fixes/ci-for-usb-stable (cbb22ebcfb99 usb: chipidea: core: 
check before accessing ci_role in ci_role_show)
Merging phy/fixes (5771a8c08880 Linux v4.13-rc1)
Merging staging.current/staging-linus (cc4a41fe5541 Linux 4.13-rc7)
Merging char-misc.current/char-misc-linus (cc4a41fe5541 Linux 4.13-rc7)
Merging input-current/for-linus (a6cbfa1e6d38 Merge branch 'next' into 
for-linus)
Merging crypto-current/master (2d45a7e89833 crypto: af_alg - get_page upon 
reassignment to TX SGL)
Merging ide/master (b671e170339

[PATCH] drivers/iommu/tegra: Optimize mutex_unlock function call

2017-09-05 Thread Pushkar Jambhlekar
Code can be optimized more by making single exit point of function and calling 
mutex_unlock at the end

Signed-off-by: Pushkar Jambhlekar 
---
 drivers/iommu/tegra-smmu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 3b6449e..da38833 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -232,20 +232,22 @@ static inline void smmu_flush(struct tegra_smmu *smmu)
 static int tegra_smmu_alloc_asid(struct tegra_smmu *smmu, unsigned int *idp)
 {
unsigned long id;
+   int ret = 0;
 
mutex_lock(>lock);
 
id = find_first_zero_bit(smmu->asids, smmu->soc->num_asids);
if (id >= smmu->soc->num_asids) {
-   mutex_unlock(>lock);
-   return -ENOSPC;
+   ret = -ENOSPC;
+   goto Exit;
}
 
set_bit(id, smmu->asids);
*idp = id;
 
+Exit:
mutex_unlock(>lock);
-   return 0;
+   return ret;
 }
 
 static void tegra_smmu_free_asid(struct tegra_smmu *smmu, unsigned int id)
-- 
2.7.4



[PATCH] drivers/iommu/tegra: Optimize mutex_unlock function call

2017-09-05 Thread Pushkar Jambhlekar
Code can be optimized more by making single exit point of function and calling 
mutex_unlock at the end

Signed-off-by: Pushkar Jambhlekar 
---
 drivers/iommu/tegra-smmu.c | 8 +---
 1 file changed, 5 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/tegra-smmu.c b/drivers/iommu/tegra-smmu.c
index 3b6449e..da38833 100644
--- a/drivers/iommu/tegra-smmu.c
+++ b/drivers/iommu/tegra-smmu.c
@@ -232,20 +232,22 @@ static inline void smmu_flush(struct tegra_smmu *smmu)
 static int tegra_smmu_alloc_asid(struct tegra_smmu *smmu, unsigned int *idp)
 {
unsigned long id;
+   int ret = 0;
 
mutex_lock(>lock);
 
id = find_first_zero_bit(smmu->asids, smmu->soc->num_asids);
if (id >= smmu->soc->num_asids) {
-   mutex_unlock(>lock);
-   return -ENOSPC;
+   ret = -ENOSPC;
+   goto Exit;
}
 
set_bit(id, smmu->asids);
*idp = id;
 
+Exit:
mutex_unlock(>lock);
-   return 0;
+   return ret;
 }
 
 static void tegra_smmu_free_asid(struct tegra_smmu *smmu, unsigned int id)
-- 
2.7.4



Re: [PATCH v2] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Shaohua Li
On Wed, Sep 06, 2017 at 11:02:35AM +0800, Dennis Yang wrote:
> In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
> set, it indicates that this stripe_head is already in the raid5_plug_cb
> list and release_stripe() would be called instead to drop a reference
> count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
> stripe_head and it will get queued into the raid5_plug_cb list.
> 
> Since break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST,
> A stripe could be re-added to plug list while it is still on that list
> in the following situation. If stripe_head A is added to another
> stripe_head B's batch list, in this case A will have its
> batch_head != NULL and be added into the plug list. After that,
> stripe_head B gets handled and called break_stripe_batch_list() to
> reset all the batched stripe_head(including A which is still on
> the plug list)'s state and reset their batch_head to NULL.
> Before the plug list gets processed, if there is another write request
> comes in and get stripe_head A, A will have its batch_head == NULL
> (cleared by calling break_stripe_batch_list() on B) and be added to
> plug list once again.

applied, thanks!
 
> Signed-off-by: Dennis Yang 
> ---
>  drivers/md/raid5.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index e92dd2d..faf3cfd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4611,7 +4611,8 @@ static void break_stripe_batch_list(struct stripe_head 
> *head_sh,
>  
>   set_mask_bits(>state, ~(STRIPE_EXPAND_SYNC_FLAGS |
>   (1 << STRIPE_PREREAD_ACTIVE) |
> - (1 << STRIPE_DEGRADED)),
> + (1 << STRIPE_DEGRADED) |
> + (1 << STRIPE_ON_UNPLUG_LIST)),
> head_sh->state & (1 << STRIPE_INSYNC));
>  
>   sh->check_state = head_sh->check_state;
> -- 
> 1.9.1
> 


Re: [PATCH v2] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Shaohua Li
On Wed, Sep 06, 2017 at 11:02:35AM +0800, Dennis Yang wrote:
> In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
> set, it indicates that this stripe_head is already in the raid5_plug_cb
> list and release_stripe() would be called instead to drop a reference
> count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
> stripe_head and it will get queued into the raid5_plug_cb list.
> 
> Since break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST,
> A stripe could be re-added to plug list while it is still on that list
> in the following situation. If stripe_head A is added to another
> stripe_head B's batch list, in this case A will have its
> batch_head != NULL and be added into the plug list. After that,
> stripe_head B gets handled and called break_stripe_batch_list() to
> reset all the batched stripe_head(including A which is still on
> the plug list)'s state and reset their batch_head to NULL.
> Before the plug list gets processed, if there is another write request
> comes in and get stripe_head A, A will have its batch_head == NULL
> (cleared by calling break_stripe_batch_list() on B) and be added to
> plug list once again.

applied, thanks!
 
> Signed-off-by: Dennis Yang 
> ---
>  drivers/md/raid5.c | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
> index e92dd2d..faf3cfd 100644
> --- a/drivers/md/raid5.c
> +++ b/drivers/md/raid5.c
> @@ -4611,7 +4611,8 @@ static void break_stripe_batch_list(struct stripe_head 
> *head_sh,
>  
>   set_mask_bits(>state, ~(STRIPE_EXPAND_SYNC_FLAGS |
>   (1 << STRIPE_PREREAD_ACTIVE) |
> - (1 << STRIPE_DEGRADED)),
> + (1 << STRIPE_DEGRADED) |
> + (1 << STRIPE_ON_UNPLUG_LIST)),
> head_sh->state & (1 << STRIPE_INSYNC));
>  
>   sh->check_state = head_sh->check_state;
> -- 
> 1.9.1
> 


[PATCH 1/2] dt-bindings: watchdog: Add Spreadtrum watchdog documentation

2017-09-05 Thread Eric Long
This patch adds the documentation for Spreadtrum watchdog driver.

Signed-off-by: Eric Long 
---
 .../devicetree/bindings/watchdog/sprd-wdt.txt | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/sprd-wdt.txt

diff --git a/Documentation/devicetree/bindings/watchdog/sprd-wdt.txt 
b/Documentation/devicetree/bindings/watchdog/sprd-wdt.txt
new file mode 100644
index 000..aeaf3e0
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/sprd-wdt.txt
@@ -0,0 +1,19 @@
+Spreadtrum SoCs Watchdog timer
+
+Required properties:
+- compatible : Should be "sprd,sp9860-wdt".
+- reg : Specifies base physical address and size of the registers.
+- interrupts : Exactly one interrupt specifier.
+- timeout-sec : Contain the default watchdog timeout in seconds.
+- clock-names : Contain the input clock names.
+- clocks : Phandles to input clocks.
+
+Example:
+   watchdog: watchdog@4031 {
+   compatible = "sprd,sp9860-wdt";
+   reg = <0 0x4031 0 0x1000>;
+   interrupts = ;
+   timeout-sec = <12>;
+   clock-names = "enable", "rtc_enable";
+   clocks = <_aon_apb_gates1 8>, <_aon_apb_rtc_gates 9>;
+   };
-- 
1.9.1



[PATCH 2/2] watchdog: Add Spreadtrum watchdog driver

2017-09-05 Thread Eric Long
This patch adds the watchdog driver for Spreadtrum SC9860 platform.

Signed-off-by: Eric Long 
---
 drivers/watchdog/Kconfig|   8 +
 drivers/watchdog/Makefile   |   1 +
 drivers/watchdog/sprd_wdt.c | 366 
 3 files changed, 375 insertions(+)
 create mode 100644 drivers/watchdog/sprd_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index c722cbf..ea07718 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -787,6 +787,14 @@ config UNIPHIER_WATCHDOG
  To compile this driver as a module, choose M here: the
  module will be called uniphier_wdt.
 
+config SPRD_WATCHDOG
+   tristate "Spreadtrum watchdog support"
+   depends on ARCH_SPRD
+   select WATCHDOG_CORE
+   help
+ Say Y here to include support watchdog timer embedded
+ into the Spreadtrum system.
+
 # AVR32 Architecture
 
 config AT32AP700X_WDT
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 56adf9f..187cca2 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_ASPEED_WATCHDOG) += aspeed_wdt.o
 obj-$(CONFIG_ZX2967_WATCHDOG) += zx2967_wdt.o
 obj-$(CONFIG_STM32_WATCHDOG) += stm32_iwdg.o
 obj-$(CONFIG_UNIPHIER_WATCHDOG) += uniphier_wdt.o
+obj-$(CONFIG_SPRD_WATCHDOG) += sprd_wdt.o
 
 # AVR32 Architecture
 obj-$(CONFIG_AT32AP700X_WDT) += at32ap700x_wdt.o
diff --git a/drivers/watchdog/sprd_wdt.c b/drivers/watchdog/sprd_wdt.c
new file mode 100644
index 000..6006bb4
--- /dev/null
+++ b/drivers/watchdog/sprd_wdt.c
@@ -0,0 +1,366 @@
+/*
+ * Spreadtrum watchdog driver
+ * Copyright (C) 2017 Spreadtrum - http://www.spreadtrum.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define WDT_LOAD_LOW   0x0
+#define WDT_LOAD_HIGH  0x4
+#define WDT_CTRL   0x8
+#define WDT_INT_CLR0xc
+#define WDT_INT_RAW0x10
+#define WDT_INT_MSK0x14
+#define WDT_CNT_LOW0x18
+#define WDT_CNT_HIGH   0x1c
+#define WDT_LOCK   0x20
+#define WDT_IRQ_LOAD_LOW   0x2c
+#define WDT_IRQ_LOAD_HIGH  0x30
+
+/* WDT_CTRL */
+#define WDT_INT_EN_BIT BIT(0)
+#define WDT_CNT_EN_BIT BIT(1)
+#define WDT_NEW_VER_EN BIT(2)
+#define WDT_RST_EN_BIT BIT(3)
+
+/* WDT_INT_CLR */
+#define WDT_INT_CLEAR_BIT  BIT(0)
+#define WDT_RST_CLEAR_BIT  BIT(3)
+
+/* WDT_INT_RAW */
+#define WDT_INT_RAW_BITBIT(0)
+#define WDT_RST_RAW_BITBIT(3)
+#define WDT_LD_BUSY_BITBIT(4)
+
+#define WDT_CLK32768
+#define WDT_UNLOCK_KEY 0xe551
+#define WDT_IRQ_TMROUT_OFFSET  0x3
+
+#define WDT_CNT_VALUE_SIZE 16
+#define WDT_CNT_VALUE_MASK GENMASK(15, 0)
+#define WDT_LOAD_TIMEOUT_NUM   1
+
+struct sprd_wdt {
+   void __iomem *base;
+   struct watchdog_device wdd;
+   struct clk *enable;
+   struct clk *rtc_enable;
+   u32 irq_tmr_out;
+   u32 rst_tmr_out;
+   unsigned int irq;
+};
+
+static inline struct sprd_wdt *to_sprd_wdt(struct watchdog_device *wdd)
+{
+   return container_of(wdd, struct sprd_wdt, wdd);
+}
+
+static inline void sprd_wdt_lock(void __iomem *addr)
+{
+   writel_relaxed(0x0, addr + WDT_LOCK);
+}
+
+static inline void sprd_wdt_unlock(void __iomem *addr)
+{
+   writel_relaxed(WDT_UNLOCK_KEY, addr + WDT_LOCK);
+}
+
+static irqreturn_t sprd_wdt_isr(int irq, void *dev_id)
+{
+   struct sprd_wdt *wdt = (struct sprd_wdt *)dev_id;
+
+   sprd_wdt_unlock(wdt->base);
+   writel_relaxed(WDT_INT_CLEAR_BIT, wdt->base + WDT_INT_CLR);
+   sprd_wdt_lock(wdt->base);
+   watchdog_notify_pretimeout(>wdd);
+   return IRQ_HANDLED;
+}
+
+static u32 sprd_wdt_get_cnt_value(struct sprd_wdt *wdt)
+{
+   u32 val;
+
+   val = readl_relaxed(wdt->base + WDT_CNT_HIGH) << WDT_CNT_VALUE_SIZE;
+   val |= readl_relaxed(wdt->base + WDT_CNT_LOW) & WDT_CNT_VALUE_MASK;
+
+   return val;
+}
+
+static int sprd_wdt_load_value(struct sprd_wdt *wdt, u32 rst_value,
+  u32 irq_value)
+{
+   u32 val, cnt = 0;
+
+   sprd_wdt_unlock(wdt->base);
+   writel_relaxed((rst_value >> WDT_CNT_VALUE_SIZE) & WDT_CNT_VALUE_MASK,
+  wdt->base + WDT_LOAD_HIGH);
+   writel_relaxed((rst_value & WDT_CNT_VALUE_MASK),
+  wdt->base + WDT_LOAD_LOW);
+   

[PATCH 1/2] dt-bindings: watchdog: Add Spreadtrum watchdog documentation

2017-09-05 Thread Eric Long
This patch adds the documentation for Spreadtrum watchdog driver.

Signed-off-by: Eric Long 
---
 .../devicetree/bindings/watchdog/sprd-wdt.txt | 19 +++
 1 file changed, 19 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/watchdog/sprd-wdt.txt

diff --git a/Documentation/devicetree/bindings/watchdog/sprd-wdt.txt 
b/Documentation/devicetree/bindings/watchdog/sprd-wdt.txt
new file mode 100644
index 000..aeaf3e0
--- /dev/null
+++ b/Documentation/devicetree/bindings/watchdog/sprd-wdt.txt
@@ -0,0 +1,19 @@
+Spreadtrum SoCs Watchdog timer
+
+Required properties:
+- compatible : Should be "sprd,sp9860-wdt".
+- reg : Specifies base physical address and size of the registers.
+- interrupts : Exactly one interrupt specifier.
+- timeout-sec : Contain the default watchdog timeout in seconds.
+- clock-names : Contain the input clock names.
+- clocks : Phandles to input clocks.
+
+Example:
+   watchdog: watchdog@4031 {
+   compatible = "sprd,sp9860-wdt";
+   reg = <0 0x4031 0 0x1000>;
+   interrupts = ;
+   timeout-sec = <12>;
+   clock-names = "enable", "rtc_enable";
+   clocks = <_aon_apb_gates1 8>, <_aon_apb_rtc_gates 9>;
+   };
-- 
1.9.1



[PATCH 2/2] watchdog: Add Spreadtrum watchdog driver

2017-09-05 Thread Eric Long
This patch adds the watchdog driver for Spreadtrum SC9860 platform.

Signed-off-by: Eric Long 
---
 drivers/watchdog/Kconfig|   8 +
 drivers/watchdog/Makefile   |   1 +
 drivers/watchdog/sprd_wdt.c | 366 
 3 files changed, 375 insertions(+)
 create mode 100644 drivers/watchdog/sprd_wdt.c

diff --git a/drivers/watchdog/Kconfig b/drivers/watchdog/Kconfig
index c722cbf..ea07718 100644
--- a/drivers/watchdog/Kconfig
+++ b/drivers/watchdog/Kconfig
@@ -787,6 +787,14 @@ config UNIPHIER_WATCHDOG
  To compile this driver as a module, choose M here: the
  module will be called uniphier_wdt.
 
+config SPRD_WATCHDOG
+   tristate "Spreadtrum watchdog support"
+   depends on ARCH_SPRD
+   select WATCHDOG_CORE
+   help
+ Say Y here to include support watchdog timer embedded
+ into the Spreadtrum system.
+
 # AVR32 Architecture
 
 config AT32AP700X_WDT
diff --git a/drivers/watchdog/Makefile b/drivers/watchdog/Makefile
index 56adf9f..187cca2 100644
--- a/drivers/watchdog/Makefile
+++ b/drivers/watchdog/Makefile
@@ -87,6 +87,7 @@ obj-$(CONFIG_ASPEED_WATCHDOG) += aspeed_wdt.o
 obj-$(CONFIG_ZX2967_WATCHDOG) += zx2967_wdt.o
 obj-$(CONFIG_STM32_WATCHDOG) += stm32_iwdg.o
 obj-$(CONFIG_UNIPHIER_WATCHDOG) += uniphier_wdt.o
+obj-$(CONFIG_SPRD_WATCHDOG) += sprd_wdt.o
 
 # AVR32 Architecture
 obj-$(CONFIG_AT32AP700X_WDT) += at32ap700x_wdt.o
diff --git a/drivers/watchdog/sprd_wdt.c b/drivers/watchdog/sprd_wdt.c
new file mode 100644
index 000..6006bb4
--- /dev/null
+++ b/drivers/watchdog/sprd_wdt.c
@@ -0,0 +1,366 @@
+/*
+ * Spreadtrum watchdog driver
+ * Copyright (C) 2017 Spreadtrum - http://www.spreadtrum.com
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * version 2 as published by the Free Software Foundation.
+ *
+ * This program is distributed in the hope that it will be useful, but
+ * WITHOUT ANY WARRANTY; without even the implied warranty of
+ * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
+ * General Public License for more details.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#define WDT_LOAD_LOW   0x0
+#define WDT_LOAD_HIGH  0x4
+#define WDT_CTRL   0x8
+#define WDT_INT_CLR0xc
+#define WDT_INT_RAW0x10
+#define WDT_INT_MSK0x14
+#define WDT_CNT_LOW0x18
+#define WDT_CNT_HIGH   0x1c
+#define WDT_LOCK   0x20
+#define WDT_IRQ_LOAD_LOW   0x2c
+#define WDT_IRQ_LOAD_HIGH  0x30
+
+/* WDT_CTRL */
+#define WDT_INT_EN_BIT BIT(0)
+#define WDT_CNT_EN_BIT BIT(1)
+#define WDT_NEW_VER_EN BIT(2)
+#define WDT_RST_EN_BIT BIT(3)
+
+/* WDT_INT_CLR */
+#define WDT_INT_CLEAR_BIT  BIT(0)
+#define WDT_RST_CLEAR_BIT  BIT(3)
+
+/* WDT_INT_RAW */
+#define WDT_INT_RAW_BITBIT(0)
+#define WDT_RST_RAW_BITBIT(3)
+#define WDT_LD_BUSY_BITBIT(4)
+
+#define WDT_CLK32768
+#define WDT_UNLOCK_KEY 0xe551
+#define WDT_IRQ_TMROUT_OFFSET  0x3
+
+#define WDT_CNT_VALUE_SIZE 16
+#define WDT_CNT_VALUE_MASK GENMASK(15, 0)
+#define WDT_LOAD_TIMEOUT_NUM   1
+
+struct sprd_wdt {
+   void __iomem *base;
+   struct watchdog_device wdd;
+   struct clk *enable;
+   struct clk *rtc_enable;
+   u32 irq_tmr_out;
+   u32 rst_tmr_out;
+   unsigned int irq;
+};
+
+static inline struct sprd_wdt *to_sprd_wdt(struct watchdog_device *wdd)
+{
+   return container_of(wdd, struct sprd_wdt, wdd);
+}
+
+static inline void sprd_wdt_lock(void __iomem *addr)
+{
+   writel_relaxed(0x0, addr + WDT_LOCK);
+}
+
+static inline void sprd_wdt_unlock(void __iomem *addr)
+{
+   writel_relaxed(WDT_UNLOCK_KEY, addr + WDT_LOCK);
+}
+
+static irqreturn_t sprd_wdt_isr(int irq, void *dev_id)
+{
+   struct sprd_wdt *wdt = (struct sprd_wdt *)dev_id;
+
+   sprd_wdt_unlock(wdt->base);
+   writel_relaxed(WDT_INT_CLEAR_BIT, wdt->base + WDT_INT_CLR);
+   sprd_wdt_lock(wdt->base);
+   watchdog_notify_pretimeout(>wdd);
+   return IRQ_HANDLED;
+}
+
+static u32 sprd_wdt_get_cnt_value(struct sprd_wdt *wdt)
+{
+   u32 val;
+
+   val = readl_relaxed(wdt->base + WDT_CNT_HIGH) << WDT_CNT_VALUE_SIZE;
+   val |= readl_relaxed(wdt->base + WDT_CNT_LOW) & WDT_CNT_VALUE_MASK;
+
+   return val;
+}
+
+static int sprd_wdt_load_value(struct sprd_wdt *wdt, u32 rst_value,
+  u32 irq_value)
+{
+   u32 val, cnt = 0;
+
+   sprd_wdt_unlock(wdt->base);
+   writel_relaxed((rst_value >> WDT_CNT_VALUE_SIZE) & WDT_CNT_VALUE_MASK,
+  wdt->base + WDT_LOAD_HIGH);
+   writel_relaxed((rst_value & WDT_CNT_VALUE_MASK),
+  wdt->base + WDT_LOAD_LOW);
+   writel_relaxed((irq_value >> 

[PATCH] drivers/staging/pi433: Fixing coding guidelines

2017-09-05 Thread Pushkar Jambhlekar
Fix brace style of if-else case

Signed-off-by: Pushkar Jambhlekar 
---
 drivers/staging/pi433/pi433_if.c | 162 +--
 1 file changed, 54 insertions(+), 108 deletions(-)

diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c
index 93c0168..c070cf3 100644
--- a/drivers/staging/pi433/pi433_if.c
+++ b/drivers/staging/pi433/pi433_if.c
@@ -133,19 +133,16 @@ static irqreturn_t DIO0_irq_handler(int irq, void *dev_id)
 {
struct pi433_device *device = dev_id;
 
-   if  (device->irq_state[DIO0] == DIO_PacketSent)
-   {
+   if  (device->irq_state[DIO0] == DIO_PacketSent) {
device->free_in_fifo = FIFO_SIZE;
printk("DIO0 irq: Packet sent\n"); // TODO: printk() should 
include KERN_ facility level
wake_up_interruptible(>fifo_wait_queue);
}
-   else if (device->irq_state[DIO0] == DIO_Rssi_DIO0)
-   {
+   else if (device->irq_state[DIO0] == DIO_Rssi_DIO0) {
printk("DIO0 irq: RSSI level over threshold\n");
wake_up_interruptible(>rx_wait_queue);
}
-   else if (device->irq_state[DIO0] == DIO_PayloadReady)
-   {
+   else if (device->irq_state[DIO0] == DIO_PayloadReady) {
printk("DIO0 irq: PayloadReady\n");
device->free_in_fifo = 0;
wake_up_interruptible(>fifo_wait_queue);
@@ -158,12 +155,10 @@ static irqreturn_t DIO1_irq_handler(int irq, void *dev_id)
 {
struct pi433_device *device = dev_id;
 
-   if  (device->irq_state[DIO1] == DIO_FifoNotEmpty_DIO1)
-   {
+   if  (device->irq_state[DIO1] == DIO_FifoNotEmpty_DIO1) {
device->free_in_fifo = FIFO_SIZE;
}
-   else if (device->irq_state[DIO1] == DIO_FifoLevel)
-   {
+   else if (device->irq_state[DIO1] == DIO_FifoLevel) {
if (device->rx_active)  device->free_in_fifo = FIFO_THRESHOLD - 
1;
elsedevice->free_in_fifo = FIFO_SIZE - 
FIFO_THRESHOLD - 1;
}
@@ -197,12 +192,10 @@ rf69_set_rx_cfg(struct pi433_device *dev, struct 
pi433_rx_cfg *rx_cfg)
/* packet config */
/* enable */
SET_CHECKED(rf69_set_sync_enable(dev->spi, rx_cfg->enable_sync));
-   if (rx_cfg->enable_sync == optionOn)
-   {
+   if (rx_cfg->enable_sync == optionOn) {
SET_CHECKED(rf69_set_fifo_fill_condition(dev->spi, 
afterSyncInterrupt));
}
-   else
-   {
+   else {
SET_CHECKED(rf69_set_fifo_fill_condition(dev->spi, always));
}
if (rx_cfg->enable_length_byte == optionOn) {
@@ -219,29 +212,24 @@ rf69_set_rx_cfg(struct pi433_device *dev, struct 
pi433_rx_cfg *rx_cfg)
 
/* lengths */
SET_CHECKED(rf69_set_sync_size(dev->spi, rx_cfg->sync_length));
-   if (rx_cfg->enable_length_byte == optionOn)
-   {
+   if (rx_cfg->enable_length_byte == optionOn) {
SET_CHECKED(rf69_set_payload_length(dev->spi, 0xff));
}
-   else if (rx_cfg->fixed_message_length != 0)
-   {
+   else if (rx_cfg->fixed_message_length != 0) {
payload_length = rx_cfg->fixed_message_length;
if (rx_cfg->enable_length_byte  == optionOn) payload_length++;
if (rx_cfg->enable_address_filtering != filteringOff) 
payload_length++;
SET_CHECKED(rf69_set_payload_length(dev->spi, payload_length));
}
-   else
-   {
+   else {
SET_CHECKED(rf69_set_payload_length(dev->spi, 0));
}
 
/* values */
-   if (rx_cfg->enable_sync == optionOn)
-   {
+   if (rx_cfg->enable_sync == optionOn) {
SET_CHECKED(rf69_set_sync_values(dev->spi, 
rx_cfg->sync_pattern));
}
-   if (rx_cfg->enable_address_filtering != filteringOff)
-   {
+   if (rx_cfg->enable_address_filtering != filteringOff) {
SET_CHECKED(rf69_set_node_address (dev->spi, 
rx_cfg->node_address));
SET_CHECKED(rf69_set_broadcast_address(dev->spi, 
rx_cfg->broadcast_address));
}
@@ -263,12 +251,10 @@ rf69_set_tx_cfg(struct pi433_device *dev, struct 
pi433_tx_cfg *tx_cfg)
SET_CHECKED(rf69_set_tx_start_condition(dev->spi, 
tx_cfg->tx_start_condition));
 
/* packet format enable */
-   if (tx_cfg->enable_preamble == optionOn)
-   {
+   if (tx_cfg->enable_preamble == optionOn) {
SET_CHECKED(rf69_set_preamble_length(dev->spi, 
tx_cfg->preamble_length));
}
-   else
-   {
+   else {
SET_CHECKED(rf69_set_preamble_length(dev->spi, 0));
}
SET_CHECKED(rf69_set_sync_enable  (dev->spi, tx_cfg->enable_sync));
@@ -284,8 +270,7 @@ rf69_set_tx_cfg(struct pi433_device *dev, struct 
pi433_tx_cfg *tx_cfg)
SET_CHECKED(rf69_set_crc_enable   (dev->spi, tx_cfg->enable_crc));
 
/* 

[PATCH] drivers/staging/pi433: Fixing coding guidelines

2017-09-05 Thread Pushkar Jambhlekar
Fix brace style of if-else case

Signed-off-by: Pushkar Jambhlekar 
---
 drivers/staging/pi433/pi433_if.c | 162 +--
 1 file changed, 54 insertions(+), 108 deletions(-)

diff --git a/drivers/staging/pi433/pi433_if.c b/drivers/staging/pi433/pi433_if.c
index 93c0168..c070cf3 100644
--- a/drivers/staging/pi433/pi433_if.c
+++ b/drivers/staging/pi433/pi433_if.c
@@ -133,19 +133,16 @@ static irqreturn_t DIO0_irq_handler(int irq, void *dev_id)
 {
struct pi433_device *device = dev_id;
 
-   if  (device->irq_state[DIO0] == DIO_PacketSent)
-   {
+   if  (device->irq_state[DIO0] == DIO_PacketSent) {
device->free_in_fifo = FIFO_SIZE;
printk("DIO0 irq: Packet sent\n"); // TODO: printk() should 
include KERN_ facility level
wake_up_interruptible(>fifo_wait_queue);
}
-   else if (device->irq_state[DIO0] == DIO_Rssi_DIO0)
-   {
+   else if (device->irq_state[DIO0] == DIO_Rssi_DIO0) {
printk("DIO0 irq: RSSI level over threshold\n");
wake_up_interruptible(>rx_wait_queue);
}
-   else if (device->irq_state[DIO0] == DIO_PayloadReady)
-   {
+   else if (device->irq_state[DIO0] == DIO_PayloadReady) {
printk("DIO0 irq: PayloadReady\n");
device->free_in_fifo = 0;
wake_up_interruptible(>fifo_wait_queue);
@@ -158,12 +155,10 @@ static irqreturn_t DIO1_irq_handler(int irq, void *dev_id)
 {
struct pi433_device *device = dev_id;
 
-   if  (device->irq_state[DIO1] == DIO_FifoNotEmpty_DIO1)
-   {
+   if  (device->irq_state[DIO1] == DIO_FifoNotEmpty_DIO1) {
device->free_in_fifo = FIFO_SIZE;
}
-   else if (device->irq_state[DIO1] == DIO_FifoLevel)
-   {
+   else if (device->irq_state[DIO1] == DIO_FifoLevel) {
if (device->rx_active)  device->free_in_fifo = FIFO_THRESHOLD - 
1;
elsedevice->free_in_fifo = FIFO_SIZE - 
FIFO_THRESHOLD - 1;
}
@@ -197,12 +192,10 @@ rf69_set_rx_cfg(struct pi433_device *dev, struct 
pi433_rx_cfg *rx_cfg)
/* packet config */
/* enable */
SET_CHECKED(rf69_set_sync_enable(dev->spi, rx_cfg->enable_sync));
-   if (rx_cfg->enable_sync == optionOn)
-   {
+   if (rx_cfg->enable_sync == optionOn) {
SET_CHECKED(rf69_set_fifo_fill_condition(dev->spi, 
afterSyncInterrupt));
}
-   else
-   {
+   else {
SET_CHECKED(rf69_set_fifo_fill_condition(dev->spi, always));
}
if (rx_cfg->enable_length_byte == optionOn) {
@@ -219,29 +212,24 @@ rf69_set_rx_cfg(struct pi433_device *dev, struct 
pi433_rx_cfg *rx_cfg)
 
/* lengths */
SET_CHECKED(rf69_set_sync_size(dev->spi, rx_cfg->sync_length));
-   if (rx_cfg->enable_length_byte == optionOn)
-   {
+   if (rx_cfg->enable_length_byte == optionOn) {
SET_CHECKED(rf69_set_payload_length(dev->spi, 0xff));
}
-   else if (rx_cfg->fixed_message_length != 0)
-   {
+   else if (rx_cfg->fixed_message_length != 0) {
payload_length = rx_cfg->fixed_message_length;
if (rx_cfg->enable_length_byte  == optionOn) payload_length++;
if (rx_cfg->enable_address_filtering != filteringOff) 
payload_length++;
SET_CHECKED(rf69_set_payload_length(dev->spi, payload_length));
}
-   else
-   {
+   else {
SET_CHECKED(rf69_set_payload_length(dev->spi, 0));
}
 
/* values */
-   if (rx_cfg->enable_sync == optionOn)
-   {
+   if (rx_cfg->enable_sync == optionOn) {
SET_CHECKED(rf69_set_sync_values(dev->spi, 
rx_cfg->sync_pattern));
}
-   if (rx_cfg->enable_address_filtering != filteringOff)
-   {
+   if (rx_cfg->enable_address_filtering != filteringOff) {
SET_CHECKED(rf69_set_node_address (dev->spi, 
rx_cfg->node_address));
SET_CHECKED(rf69_set_broadcast_address(dev->spi, 
rx_cfg->broadcast_address));
}
@@ -263,12 +251,10 @@ rf69_set_tx_cfg(struct pi433_device *dev, struct 
pi433_tx_cfg *tx_cfg)
SET_CHECKED(rf69_set_tx_start_condition(dev->spi, 
tx_cfg->tx_start_condition));
 
/* packet format enable */
-   if (tx_cfg->enable_preamble == optionOn)
-   {
+   if (tx_cfg->enable_preamble == optionOn) {
SET_CHECKED(rf69_set_preamble_length(dev->spi, 
tx_cfg->preamble_length));
}
-   else
-   {
+   else {
SET_CHECKED(rf69_set_preamble_length(dev->spi, 0));
}
SET_CHECKED(rf69_set_sync_enable  (dev->spi, tx_cfg->enable_sync));
@@ -284,8 +270,7 @@ rf69_set_tx_cfg(struct pi433_device *dev, struct 
pi433_tx_cfg *tx_cfg)
SET_CHECKED(rf69_set_crc_enable   (dev->spi, tx_cfg->enable_crc));
 
/* configure sync, if 

Re: [PATCH v8 06/13] x86/apic: Mark the apic_intr_mode extern for sanity check cleanup

2017-09-05 Thread Dou Liyang

Hi Baoquan,

At 09/06/2017 01:26 PM, Baoquan He wrote:
[...]

 static void __init smp_cpu_index_default(void)
@@ -1335,19 +1295,20 @@ void __init native_smp_prepare_cpus(unsigned int 
max_cpus)


Please also cleanup the passed in max_cpus since it's not used here any
more. And up to the caller:


Oops, I just checked x86 code, other arch also have this hook. Please
ingore this comment.



Yes, you are right.

As I also found the comments for native_smp_prepare_cpus() is out of
date. I will update the comment and explain it like:

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4f63afc..9f8479c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1260,8 +1260,10 @@ static void __init smp_get_logical_apicid(void)
 }

 /*
- * Prepare for SMP bootup.  The MP table or ACPI has been read
- * earlier.  Just do some sanity checking here and enable APIC mode.
+ * Prepare for SMP bootup.
+ *
+ * @max_cpus: configured maximum number of CPUs
+ * It don't be used, but other arch also have this hook, so keep it.
  */
 void __init native_smp_prepare_cpus(unsigned int max_cpus)
 {

Thanks,
dou.


static noinline void __init kernel_init_freeable(void)
{
...
smp_prepare_cpus(setup_max_cpus);
...
}



apic_intr_mode_init();

-   switch (smp_sanity_check(max_cpus)) {
-   case SMP_NO_CONFIG:
-   disable_smp();
-   return;
-   case SMP_NO_APIC:
+   smp_sanity_check();
+
+   switch (apic_intr_mode) {
+   case APIC_PIC:
+   case APIC_VIRTUAL_WIRE_NO_CONFIG:
disable_smp();
return;
-   case SMP_FORCE_UP:
+   case APIC_SYMMETRIC_IO_NO_ROUTING:
disable_smp();
/* Setup local timer */
x86_init.timers.setup_percpu_clockev();
return;
-   case SMP_OK:
+   case APIC_VIRTUAL_WIRE:
+   case APIC_SYMMETRIC_IO:
break;
}

--
2.5.5












Re: [PATCH v8 06/13] x86/apic: Mark the apic_intr_mode extern for sanity check cleanup

2017-09-05 Thread Dou Liyang

Hi Baoquan,

At 09/06/2017 01:26 PM, Baoquan He wrote:
[...]

 static void __init smp_cpu_index_default(void)
@@ -1335,19 +1295,20 @@ void __init native_smp_prepare_cpus(unsigned int 
max_cpus)


Please also cleanup the passed in max_cpus since it's not used here any
more. And up to the caller:


Oops, I just checked x86 code, other arch also have this hook. Please
ingore this comment.



Yes, you are right.

As I also found the comments for native_smp_prepare_cpus() is out of
date. I will update the comment and explain it like:

diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
index 4f63afc..9f8479c 100644
--- a/arch/x86/kernel/smpboot.c
+++ b/arch/x86/kernel/smpboot.c
@@ -1260,8 +1260,10 @@ static void __init smp_get_logical_apicid(void)
 }

 /*
- * Prepare for SMP bootup.  The MP table or ACPI has been read
- * earlier.  Just do some sanity checking here and enable APIC mode.
+ * Prepare for SMP bootup.
+ *
+ * @max_cpus: configured maximum number of CPUs
+ * It don't be used, but other arch also have this hook, so keep it.
  */
 void __init native_smp_prepare_cpus(unsigned int max_cpus)
 {

Thanks,
dou.


static noinline void __init kernel_init_freeable(void)
{
...
smp_prepare_cpus(setup_max_cpus);
...
}



apic_intr_mode_init();

-   switch (smp_sanity_check(max_cpus)) {
-   case SMP_NO_CONFIG:
-   disable_smp();
-   return;
-   case SMP_NO_APIC:
+   smp_sanity_check();
+
+   switch (apic_intr_mode) {
+   case APIC_PIC:
+   case APIC_VIRTUAL_WIRE_NO_CONFIG:
disable_smp();
return;
-   case SMP_FORCE_UP:
+   case APIC_SYMMETRIC_IO_NO_ROUTING:
disable_smp();
/* Setup local timer */
x86_init.timers.setup_percpu_clockev();
return;
-   case SMP_OK:
+   case APIC_VIRTUAL_WIRE:
+   case APIC_SYMMETRIC_IO:
break;
}

--
2.5.5












Re: [PATCH v2 0/8] MT2712 IOMMU SUPPORT

2017-09-05 Thread Yong Wu
On Tue, 2017-08-22 at 22:38 +0800, Joerg Roedel wrote:
> On Mon, Aug 21, 2017 at 07:00:13PM +0800, Yong Wu wrote:
> > Yong Wu (8):
> >   dt-bindings: mediatek: Add binding for mt2712 IOMMU and SMI
> >   iommu/mediatek: Move MTK_M4U_TO_LARB/PORT into mtk_iommu.c
> >   iommu/mediatek: Add mt2712 IOMMU support
> >   iommu/mediatek: Merge 2 M4U HWs into one iommu domain
> >   iommu/mediatek: Move pgtable allocation into domain_alloc
> >   iommu/mediatek: Disable iommu clock when system suspend
> >   iommu/mediatek: Enlarge the validate PA range for 4GB mode
> >   memory: mtk-smi: Degrade SMI init to module_init
> 
> Applied patches 2-7. Patch 1 is something for Matthias.

Hi Matthias,

   Could you help review the patch[1/8]? If it's ok for you too,Could
you help apply that one via your tree.
   Thanks.




Re: [PATCH v2 0/8] MT2712 IOMMU SUPPORT

2017-09-05 Thread Yong Wu
On Tue, 2017-08-22 at 22:38 +0800, Joerg Roedel wrote:
> On Mon, Aug 21, 2017 at 07:00:13PM +0800, Yong Wu wrote:
> > Yong Wu (8):
> >   dt-bindings: mediatek: Add binding for mt2712 IOMMU and SMI
> >   iommu/mediatek: Move MTK_M4U_TO_LARB/PORT into mtk_iommu.c
> >   iommu/mediatek: Add mt2712 IOMMU support
> >   iommu/mediatek: Merge 2 M4U HWs into one iommu domain
> >   iommu/mediatek: Move pgtable allocation into domain_alloc
> >   iommu/mediatek: Disable iommu clock when system suspend
> >   iommu/mediatek: Enlarge the validate PA range for 4GB mode
> >   memory: mtk-smi: Degrade SMI init to module_init
> 
> Applied patches 2-7. Patch 1 is something for Matthias.

Hi Matthias,

   Could you help review the patch[1/8]? If it's ok for you too,Could
you help apply that one via your tree.
   Thanks.




Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-05 Thread gengdongjiu
CC Catalin


On 2017/9/6 2:58, gengdongjiu wrote:
> when exit from guest, some host PSTATE bits may be lost, such as
> PSTATE.PAN or PSTATE.UAO. It is because host and hypervisor all run
> in the EL2, host PSTATE value cannot be saved and restored via
> SPSR_EL2. So if guest has changed the PSTATE, host continues with
> a wrong value guest has set.
> 
> Signed-off-by: Dongjiu Geng 
> Signed-off-by: Haibin Zhang 
> ---
>  arch/arm64/include/asm/kvm_host.h |  8 +++
>  arch/arm64/include/asm/kvm_hyp.h  |  2 ++
>  arch/arm64/include/asm/sysreg.h   | 23 +++
>  arch/arm64/kvm/hyp/entry.S|  2 --
>  arch/arm64/kvm/hyp/switch.c   | 24 ++--
>  arch/arm64/kvm/hyp/sysreg-sr.c| 48 
> ---
>  6 files changed, 100 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e923b58..cba7d3e 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -193,6 +193,12 @@ struct kvm_cpu_context {
>   };
>  };
>  
> +struct kvm_cpu_host_pstate {
> + u64 daif;
> + u64 uao;
> + u64 pan;
> +};
> +
>  typedef struct kvm_cpu_context kvm_cpu_context_t;
>  
>  struct kvm_vcpu_arch {
> @@ -227,6 +233,8 @@ struct kvm_vcpu_arch {
>  
>   /* Pointer to host CPU context */
>   kvm_cpu_context_t *host_cpu_context;
> + /* Host PSTATE value */
> + struct kvm_cpu_host_pstate host_pstate;
>   struct {
>   /* {Break,watch}point registers */
>   struct kvm_guest_debug_arch regs;
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index 4572a9b..a75587a 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -134,6 +134,8 @@
>  
>  void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
> +void __sysreg_save_host_pstate(struct kvm_vcpu *vcpu);
> +void __sysreg_restore_host_pstate(struct kvm_vcpu *vcpu);
>  void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
>  void __sysreg32_save_state(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 248339e..efdcf40 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -295,6 +295,29 @@
>  #define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
>  #define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
>  
> +#define REG_PSTATE_PAN   sys_reg(3, 0, 4, 2, 3)
> +#define REG_PSTATE_UAO   sys_reg(3, 0, 4, 2, 4)
> +
> +#define GET_PSTATE_PAN   \
> + ({  \
> + u64 reg;\
> + asm volatile(ALTERNATIVE("mov %0, #0",  \
> + "mrs_s %0, " __stringify(REG_PSTATE_PAN),\
> + ARM64_HAS_PAN)\
> + : "=r" (reg));\
> + reg;\
> + })
> +
> +#define GET_PSTATE_UAO   \
> + ({  \
> + u64 reg;\
> + asm volatile(ALTERNATIVE("mov %0, #0",\
> + "mrs_s %0, " __stringify(REG_PSTATE_UAO),\
> + ARM64_HAS_UAO)\
> + : "=r" (reg));\
> + reg;\
> + })
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_EE(1 << 25)
>  #define SCTLR_ELx_I  (1 << 12)
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 12ee62d..7662ef5 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -96,8 +96,6 @@ ENTRY(__guest_exit)
>  
>   add x1, x1, #VCPU_CONTEXT
>  
> - ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
> -
>   // Store the guest regs x2 and x3
>   stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 945e79c..9b380a1 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -278,6 +278,26 @@ static void __hyp_text __skip_instr(struct kvm_vcpu 
> *vcpu)
>   write_sysreg_el2(*vcpu_pc(vcpu), elr);
>  }
>  
> +static void __hyp_text __save_host_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpu_context *host_ctxt;
> +
> + host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +
> + __sysreg_save_host_state(host_ctxt);
> + __sysreg_save_host_pstate(vcpu);
> +}
> +
> +static void __hyp_text __restore_host_state(struct kvm_vcpu *vcpu)
> +{
> + struct 

Re: [PATCH] arm64: KVM: VHE: save and restore some PSTATE bits

2017-09-05 Thread gengdongjiu
CC Catalin


On 2017/9/6 2:58, gengdongjiu wrote:
> when exit from guest, some host PSTATE bits may be lost, such as
> PSTATE.PAN or PSTATE.UAO. It is because host and hypervisor all run
> in the EL2, host PSTATE value cannot be saved and restored via
> SPSR_EL2. So if guest has changed the PSTATE, host continues with
> a wrong value guest has set.
> 
> Signed-off-by: Dongjiu Geng 
> Signed-off-by: Haibin Zhang 
> ---
>  arch/arm64/include/asm/kvm_host.h |  8 +++
>  arch/arm64/include/asm/kvm_hyp.h  |  2 ++
>  arch/arm64/include/asm/sysreg.h   | 23 +++
>  arch/arm64/kvm/hyp/entry.S|  2 --
>  arch/arm64/kvm/hyp/switch.c   | 24 ++--
>  arch/arm64/kvm/hyp/sysreg-sr.c| 48 
> ---
>  6 files changed, 100 insertions(+), 7 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/kvm_host.h 
> b/arch/arm64/include/asm/kvm_host.h
> index e923b58..cba7d3e 100644
> --- a/arch/arm64/include/asm/kvm_host.h
> +++ b/arch/arm64/include/asm/kvm_host.h
> @@ -193,6 +193,12 @@ struct kvm_cpu_context {
>   };
>  };
>  
> +struct kvm_cpu_host_pstate {
> + u64 daif;
> + u64 uao;
> + u64 pan;
> +};
> +
>  typedef struct kvm_cpu_context kvm_cpu_context_t;
>  
>  struct kvm_vcpu_arch {
> @@ -227,6 +233,8 @@ struct kvm_vcpu_arch {
>  
>   /* Pointer to host CPU context */
>   kvm_cpu_context_t *host_cpu_context;
> + /* Host PSTATE value */
> + struct kvm_cpu_host_pstate host_pstate;
>   struct {
>   /* {Break,watch}point registers */
>   struct kvm_guest_debug_arch regs;
> diff --git a/arch/arm64/include/asm/kvm_hyp.h 
> b/arch/arm64/include/asm/kvm_hyp.h
> index 4572a9b..a75587a 100644
> --- a/arch/arm64/include/asm/kvm_hyp.h
> +++ b/arch/arm64/include/asm/kvm_hyp.h
> @@ -134,6 +134,8 @@
>  
>  void __sysreg_save_host_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_host_state(struct kvm_cpu_context *ctxt);
> +void __sysreg_save_host_pstate(struct kvm_vcpu *vcpu);
> +void __sysreg_restore_host_pstate(struct kvm_vcpu *vcpu);
>  void __sysreg_save_guest_state(struct kvm_cpu_context *ctxt);
>  void __sysreg_restore_guest_state(struct kvm_cpu_context *ctxt);
>  void __sysreg32_save_state(struct kvm_vcpu *vcpu);
> diff --git a/arch/arm64/include/asm/sysreg.h b/arch/arm64/include/asm/sysreg.h
> index 248339e..efdcf40 100644
> --- a/arch/arm64/include/asm/sysreg.h
> +++ b/arch/arm64/include/asm/sysreg.h
> @@ -295,6 +295,29 @@
>  #define SYS_ICH_LR14_EL2 __SYS__LR8_EL2(6)
>  #define SYS_ICH_LR15_EL2 __SYS__LR8_EL2(7)
>  
> +#define REG_PSTATE_PAN   sys_reg(3, 0, 4, 2, 3)
> +#define REG_PSTATE_UAO   sys_reg(3, 0, 4, 2, 4)
> +
> +#define GET_PSTATE_PAN   \
> + ({  \
> + u64 reg;\
> + asm volatile(ALTERNATIVE("mov %0, #0",  \
> + "mrs_s %0, " __stringify(REG_PSTATE_PAN),\
> + ARM64_HAS_PAN)\
> + : "=r" (reg));\
> + reg;\
> + })
> +
> +#define GET_PSTATE_UAO   \
> + ({  \
> + u64 reg;\
> + asm volatile(ALTERNATIVE("mov %0, #0",\
> + "mrs_s %0, " __stringify(REG_PSTATE_UAO),\
> + ARM64_HAS_UAO)\
> + : "=r" (reg));\
> + reg;\
> + })
> +
>  /* Common SCTLR_ELx flags. */
>  #define SCTLR_ELx_EE(1 << 25)
>  #define SCTLR_ELx_I  (1 << 12)
> diff --git a/arch/arm64/kvm/hyp/entry.S b/arch/arm64/kvm/hyp/entry.S
> index 12ee62d..7662ef5 100644
> --- a/arch/arm64/kvm/hyp/entry.S
> +++ b/arch/arm64/kvm/hyp/entry.S
> @@ -96,8 +96,6 @@ ENTRY(__guest_exit)
>  
>   add x1, x1, #VCPU_CONTEXT
>  
> - ALTERNATIVE(nop, SET_PSTATE_PAN(1), ARM64_HAS_PAN, CONFIG_ARM64_PAN)
> -
>   // Store the guest regs x2 and x3
>   stp x2, x3,   [x1, #CPU_XREG_OFFSET(2)]
>  
> diff --git a/arch/arm64/kvm/hyp/switch.c b/arch/arm64/kvm/hyp/switch.c
> index 945e79c..9b380a1 100644
> --- a/arch/arm64/kvm/hyp/switch.c
> +++ b/arch/arm64/kvm/hyp/switch.c
> @@ -278,6 +278,26 @@ static void __hyp_text __skip_instr(struct kvm_vcpu 
> *vcpu)
>   write_sysreg_el2(*vcpu_pc(vcpu), elr);
>  }
>  
> +static void __hyp_text __save_host_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpu_context *host_ctxt;
> +
> + host_ctxt = kern_hyp_va(vcpu->arch.host_cpu_context);
> +
> + __sysreg_save_host_state(host_ctxt);
> + __sysreg_save_host_pstate(vcpu);
> +}
> +
> +static void __hyp_text __restore_host_state(struct kvm_vcpu *vcpu)
> +{
> + struct kvm_cpu_context *host_ctxt;
> +
> + host_ctxt = 

Re: [RFC v2 6/6] platform/x86: intel_pmc_ipc: Use generic Intel IPC device calls

2017-09-05 Thread Kuppuswamy, Sathyanarayanan

Hi Lee,


On 9/5/2017 12:38 AM, Lee Jones wrote:

On Sat, 02 Sep 2017, sathyanarayanan.kuppusw...@linux.intel.com wrote:


From: Kuppuswamy Sathyanarayanan 

Removed redundant IPC helper functions and refactored the driver to use
generic IPC device driver APIs.

This patch also cleans-up PMC IPC user drivers to use APIs provided
by generic IPC driver.

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
  arch/x86/include/asm/intel_pmc_ipc.h  |  37 +--
  drivers/mfd/intel_soc_pmic_bxtwc.c|  24 +-
  include/linux/mfd/intel_soc_pmic.h|   2 +

I'm a bit concerned by the API.
This is not a new change. Even before refactoring this driver, we have 
been using u32 bit variable to pass the DPTR and SPTR address.

   Any reason why you're not using
pointers for addresses?
I think the main reason is,  this is the expected format defined by 
SCU/PMC spec. According to the spec document, SPTR and DPTR registers 
are used to program the 32 bit SRAM address from which the PMC/SCU 
firmware can read/write the data of an IPC command. if we are not using 
SPTR or DPTR, we need to leave the value at zero.

pointers, you should be using NULL, instead of 0.


  drivers/platform/x86/intel_pmc_ipc.c  | 364 +-
  drivers/platform/x86/intel_telemetry_pltdrv.c | 114 
  5 files changed, 215 insertions(+), 326 deletions(-)

Changes since v1:
  * Removed custom APIs.
  * Cleaned up PMC IPC user drivers to use APIs provided by generic
IPC driver.




Re: [RFC v2 6/6] platform/x86: intel_pmc_ipc: Use generic Intel IPC device calls

2017-09-05 Thread Kuppuswamy, Sathyanarayanan

Hi Lee,


On 9/5/2017 12:38 AM, Lee Jones wrote:

On Sat, 02 Sep 2017, sathyanarayanan.kuppusw...@linux.intel.com wrote:


From: Kuppuswamy Sathyanarayanan 

Removed redundant IPC helper functions and refactored the driver to use
generic IPC device driver APIs.

This patch also cleans-up PMC IPC user drivers to use APIs provided
by generic IPC driver.

Signed-off-by: Kuppuswamy Sathyanarayanan 

---
  arch/x86/include/asm/intel_pmc_ipc.h  |  37 +--
  drivers/mfd/intel_soc_pmic_bxtwc.c|  24 +-
  include/linux/mfd/intel_soc_pmic.h|   2 +

I'm a bit concerned by the API.
This is not a new change. Even before refactoring this driver, we have 
been using u32 bit variable to pass the DPTR and SPTR address.

   Any reason why you're not using
pointers for addresses?
I think the main reason is,  this is the expected format defined by 
SCU/PMC spec. According to the spec document, SPTR and DPTR registers 
are used to program the 32 bit SRAM address from which the PMC/SCU 
firmware can read/write the data of an IPC command. if we are not using 
SPTR or DPTR, we need to leave the value at zero.

pointers, you should be using NULL, instead of 0.


  drivers/platform/x86/intel_pmc_ipc.c  | 364 +-
  drivers/platform/x86/intel_telemetry_pltdrv.c | 114 
  5 files changed, 215 insertions(+), 326 deletions(-)

Changes since v1:
  * Removed custom APIs.
  * Cleaned up PMC IPC user drivers to use APIs provided by generic
IPC driver.




[PATCH 3/3] locking/rwsem/x86: Add stack frame dependency for __downgrade_write()

2017-09-05 Thread Miguel Bernal Marin
kernel/locking/rwsem.o: warning: objtool: downgrade_write()+0x22: call without 
frame pointer save/setup

The warning means gcc 7.2.0 placed the __downgrade_write() inline asm (and
its call instruction) before the frame pointer setup in downgrade_write(),
which breaks frame pointer convention and can result in incorrect
stack traces.

Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.

Signed-off-by: Miguel Bernal Marin 
---
 arch/x86/include/asm/rwsem.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index d26b6916b935..a749dc6a3103 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -205,8 +205,10 @@ static inline void __up_write(struct rw_semaphore *sem)
  */
 static inline void __downgrade_write(struct rw_semaphore *sem)
 {
+   register void *__sp asm(_ASM_SP);
+
asm volatile("# beginning __downgrade_write\n\t"
-LOCK_PREFIX _ASM_ADD "%2,(%1)\n\t"
+LOCK_PREFIX _ASM_ADD "%2,(%2)\n\t"
 /*
  * transitions 0x0001 -> 0x0001 (i386)
  * 0x0001 -> 0x0001 (x86_64)
@@ -215,7 +217,7 @@ static inline void __downgrade_write(struct rw_semaphore 
*sem)
 "  call call_rwsem_downgrade_wake\n"
 "1:\n\t"
 "# ending __downgrade_write\n"
-: "+m" (sem->count)
+: "+m" (sem->count), "+r" (__sp)
 : "a" (sem), "er" (-RWSEM_WAITING_BIAS)
 : "memory", "cc");
 }
-- 
2.14.1



[PATCH 0/3] locking/rwsem/x86: Add stack frame dependency for some inline asm

2017-09-05 Thread Miguel Bernal Marin
Some warning were showed by objtool using gcc 7.2.0

kernel/locking/rwsem.o: warning: objtool: up_read()+0x11: call without frame 
pointer save/setup
kernel/locking/rwsem.o: warning: objtool: up_write()+0x17: call without frame 
pointer save/setup
kernel/locking/rwsem.o: warning: objtool: downgrade_write()+0x22: call without 
frame pointer save/setup

which means gcc placed an inline asm function and its call instruction before
the frame pointer setup.
 
This series forces a stack frame to be created before the call instruction
by listing the stack pointer as an output operand in the inline asm statement.

Miguel Bernal Marin (3):
  locking/rwsem/x86: Add stack frame dependency for __up_read()
  locking/rwsem/x86: Add stack frame dependency for __up_write()
  locking/rwsem/x86: Add stack frame dependency for __downgrade_write()

 arch/x86/include/asm/rwsem.h | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

-- 
2.14.1



[PATCH 3/3] locking/rwsem/x86: Add stack frame dependency for __downgrade_write()

2017-09-05 Thread Miguel Bernal Marin
kernel/locking/rwsem.o: warning: objtool: downgrade_write()+0x22: call without 
frame pointer save/setup

The warning means gcc 7.2.0 placed the __downgrade_write() inline asm (and
its call instruction) before the frame pointer setup in downgrade_write(),
which breaks frame pointer convention and can result in incorrect
stack traces.

Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.

Signed-off-by: Miguel Bernal Marin 
---
 arch/x86/include/asm/rwsem.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index d26b6916b935..a749dc6a3103 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -205,8 +205,10 @@ static inline void __up_write(struct rw_semaphore *sem)
  */
 static inline void __downgrade_write(struct rw_semaphore *sem)
 {
+   register void *__sp asm(_ASM_SP);
+
asm volatile("# beginning __downgrade_write\n\t"
-LOCK_PREFIX _ASM_ADD "%2,(%1)\n\t"
+LOCK_PREFIX _ASM_ADD "%2,(%2)\n\t"
 /*
  * transitions 0x0001 -> 0x0001 (i386)
  * 0x0001 -> 0x0001 (x86_64)
@@ -215,7 +217,7 @@ static inline void __downgrade_write(struct rw_semaphore 
*sem)
 "  call call_rwsem_downgrade_wake\n"
 "1:\n\t"
 "# ending __downgrade_write\n"
-: "+m" (sem->count)
+: "+m" (sem->count), "+r" (__sp)
 : "a" (sem), "er" (-RWSEM_WAITING_BIAS)
 : "memory", "cc");
 }
-- 
2.14.1



[PATCH 0/3] locking/rwsem/x86: Add stack frame dependency for some inline asm

2017-09-05 Thread Miguel Bernal Marin
Some warning were showed by objtool using gcc 7.2.0

kernel/locking/rwsem.o: warning: objtool: up_read()+0x11: call without frame 
pointer save/setup
kernel/locking/rwsem.o: warning: objtool: up_write()+0x17: call without frame 
pointer save/setup
kernel/locking/rwsem.o: warning: objtool: downgrade_write()+0x22: call without 
frame pointer save/setup

which means gcc placed an inline asm function and its call instruction before
the frame pointer setup.
 
This series forces a stack frame to be created before the call instruction
by listing the stack pointer as an output operand in the inline asm statement.

Miguel Bernal Marin (3):
  locking/rwsem/x86: Add stack frame dependency for __up_read()
  locking/rwsem/x86: Add stack frame dependency for __up_write()
  locking/rwsem/x86: Add stack frame dependency for __downgrade_write()

 arch/x86/include/asm/rwsem.h | 18 --
 1 file changed, 12 insertions(+), 6 deletions(-)

-- 
2.14.1



[PATCH 2/3] locking/rwsem/x86: Add stack frame dependency for __up_write()

2017-09-05 Thread Miguel Bernal Marin
kernel/locking/rwsem.o: warning: objtool: up_write()+0x17: call without frame 
pointer save/setup

The warning means gcc 7.2.0 placed the __up_write() inline asm (and its
call instruction) before the frame pointer setup in up_write(),
which breaks frame pointer convention and can result in incorrect
stack traces.

Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.

Signed-off-by: Miguel Bernal Marin 
---
 arch/x86/include/asm/rwsem.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 762167afaec0..d26b6916b935 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -186,14 +186,16 @@ static inline void __up_read(struct rw_semaphore *sem)
 static inline void __up_write(struct rw_semaphore *sem)
 {
long tmp;
+   register void *__sp asm(_ASM_SP);
+
asm volatile("# beginning __up_write\n\t"
-LOCK_PREFIX "  xadd  %1,(%2)\n\t"
+LOCK_PREFIX "  xadd  %1,(%3)\n\t"
 /* subtracts 0x0001, returns the old value */
 "  jns1f\n\t"
 "  call call_rwsem_wake\n" /* expects old value in %edx */
 "1:\n\t"
 "# ending __up_write\n"
-: "+m" (sem->count), "=d" (tmp)
+: "+m" (sem->count), "=d" (tmp), "+r" (__sp)
 : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
 : "memory", "cc");
 }
-- 
2.14.1



[PATCH 1/3] locking/rwsem/x86: Add stack frame dependency for __up_read()

2017-09-05 Thread Miguel Bernal Marin
kernel/locking/rwsem.o: warning: objtool: up_read()+0x11: call without frame 
pointer save/setup

The warning means gcc 7.2.0 placed the __up_read() inline asm (and its
call instruction) before the frame pointer setup in up_read(),
which breaks frame pointer convention and can result in incorrect
stack traces.

Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.

Signed-off-by: Miguel Bernal Marin 
---
 arch/x86/include/asm/rwsem.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..762167afaec0 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -166,14 +166,16 @@ static inline bool __down_write_trylock(struct 
rw_semaphore *sem)
 static inline void __up_read(struct rw_semaphore *sem)
 {
long tmp;
+   register void *__sp asm(_ASM_SP);
+
asm volatile("# beginning __up_read\n\t"
-LOCK_PREFIX "  xadd  %1,(%2)\n\t"
+LOCK_PREFIX "  xadd  %1,(%3)\n\t"
 /* subtracts 1, returns the old value */
 "  jns1f\n\t"
 "  call call_rwsem_wake\n" /* expects old value in %edx */
 "1:\n"
 "# ending __up_read\n"
-: "+m" (sem->count), "=d" (tmp)
+: "+m" (sem->count), "=d" (tmp), "+r" (__sp)
 : "a" (sem), "1" (-RWSEM_ACTIVE_READ_BIAS)
 : "memory", "cc");
 }
-- 
2.14.1



Re: [PATCH v8 06/13] x86/apic: Mark the apic_intr_mode extern for sanity check cleanup

2017-09-05 Thread Baoquan He
On 09/06/17 at 12:25pm, Baoquan He wrote:
> On 08/28/17 at 11:20am, Dou Liyang wrote:
> > -   /*
> >  * Should not be necessary because the MP table should list the boot
> >  * CPU too, but we do it for the sake of robustness anyway.
> >  */
> > @@ -1254,29 +1237,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
> > physid_set(hard_smp_processor_id(), phys_cpu_present_map);
> > }
> > preempt_enable();
> > -
> > -   /*
> > -* If we couldn't find a local APIC, then get out of here now!
> > -*/
> > -   if (APIC_INTEGRATED(boot_cpu_apic_version) &&
> > -   !boot_cpu_has(X86_FEATURE_APIC)) {
> > -   if (!disable_apic) {
> > -   pr_err("BIOS bug, local APIC #%d not detected!...\n",
> > -   boot_cpu_physical_apicid);
> > -   pr_err("... forcing use of dummy APIC emulation (tell 
> > your hw vendor)\n");
> > -   }
> > -   return SMP_NO_APIC;
> > -   }
> > -
> > -   /*
> > -* If SMP should be disabled, then really disable it!
> > -*/
> > -   if (!max_cpus) {
> > -   pr_info("SMP mode deactivated\n");
> > -   return SMP_FORCE_UP;
> > -   }
> > -
> > -   return SMP_OK;
> >  }
> >  
> >  static void __init smp_cpu_index_default(void)
> > @@ -1335,19 +1295,20 @@ void __init native_smp_prepare_cpus(unsigned int 
> > max_cpus)
> 
> Please also cleanup the passed in max_cpus since it's not used here any
> more. And up to the caller:

Oops, I just checked x86 code, other arch also have this hook. Please
ingore this comment.

> 
> static noinline void __init kernel_init_freeable(void)
> {
>   ...
>   smp_prepare_cpus(setup_max_cpus);
>   ...
> }
> 
> >  
> > apic_intr_mode_init();
> >  
> > -   switch (smp_sanity_check(max_cpus)) {
> > -   case SMP_NO_CONFIG:
> > -   disable_smp();
> > -   return;
> > -   case SMP_NO_APIC:
> > +   smp_sanity_check();
> > +
> > +   switch (apic_intr_mode) {
> > +   case APIC_PIC:
> > +   case APIC_VIRTUAL_WIRE_NO_CONFIG:
> > disable_smp();
> > return;
> > -   case SMP_FORCE_UP:
> > +   case APIC_SYMMETRIC_IO_NO_ROUTING:
> > disable_smp();
> > /* Setup local timer */
> > x86_init.timers.setup_percpu_clockev();
> > return;
> > -   case SMP_OK:
> > +   case APIC_VIRTUAL_WIRE:
> > +   case APIC_SYMMETRIC_IO:
> > break;
> > }
> >  
> > -- 
> > 2.5.5
> > 
> > 
> > 


[PATCH 2/3] locking/rwsem/x86: Add stack frame dependency for __up_write()

2017-09-05 Thread Miguel Bernal Marin
kernel/locking/rwsem.o: warning: objtool: up_write()+0x17: call without frame 
pointer save/setup

The warning means gcc 7.2.0 placed the __up_write() inline asm (and its
call instruction) before the frame pointer setup in up_write(),
which breaks frame pointer convention and can result in incorrect
stack traces.

Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.

Signed-off-by: Miguel Bernal Marin 
---
 arch/x86/include/asm/rwsem.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index 762167afaec0..d26b6916b935 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -186,14 +186,16 @@ static inline void __up_read(struct rw_semaphore *sem)
 static inline void __up_write(struct rw_semaphore *sem)
 {
long tmp;
+   register void *__sp asm(_ASM_SP);
+
asm volatile("# beginning __up_write\n\t"
-LOCK_PREFIX "  xadd  %1,(%2)\n\t"
+LOCK_PREFIX "  xadd  %1,(%3)\n\t"
 /* subtracts 0x0001, returns the old value */
 "  jns1f\n\t"
 "  call call_rwsem_wake\n" /* expects old value in %edx */
 "1:\n\t"
 "# ending __up_write\n"
-: "+m" (sem->count), "=d" (tmp)
+: "+m" (sem->count), "=d" (tmp), "+r" (__sp)
 : "a" (sem), "1" (-RWSEM_ACTIVE_WRITE_BIAS)
 : "memory", "cc");
 }
-- 
2.14.1



[PATCH 1/3] locking/rwsem/x86: Add stack frame dependency for __up_read()

2017-09-05 Thread Miguel Bernal Marin
kernel/locking/rwsem.o: warning: objtool: up_read()+0x11: call without frame 
pointer save/setup

The warning means gcc 7.2.0 placed the __up_read() inline asm (and its
call instruction) before the frame pointer setup in up_read(),
which breaks frame pointer convention and can result in incorrect
stack traces.

Force a stack frame to be created before the call instruction by listing
the stack pointer as an output operand in the inline asm statement.

Signed-off-by: Miguel Bernal Marin 
---
 arch/x86/include/asm/rwsem.h | 6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

diff --git a/arch/x86/include/asm/rwsem.h b/arch/x86/include/asm/rwsem.h
index a34e0d4b957d..762167afaec0 100644
--- a/arch/x86/include/asm/rwsem.h
+++ b/arch/x86/include/asm/rwsem.h
@@ -166,14 +166,16 @@ static inline bool __down_write_trylock(struct 
rw_semaphore *sem)
 static inline void __up_read(struct rw_semaphore *sem)
 {
long tmp;
+   register void *__sp asm(_ASM_SP);
+
asm volatile("# beginning __up_read\n\t"
-LOCK_PREFIX "  xadd  %1,(%2)\n\t"
+LOCK_PREFIX "  xadd  %1,(%3)\n\t"
 /* subtracts 1, returns the old value */
 "  jns1f\n\t"
 "  call call_rwsem_wake\n" /* expects old value in %edx */
 "1:\n"
 "# ending __up_read\n"
-: "+m" (sem->count), "=d" (tmp)
+: "+m" (sem->count), "=d" (tmp), "+r" (__sp)
 : "a" (sem), "1" (-RWSEM_ACTIVE_READ_BIAS)
 : "memory", "cc");
 }
-- 
2.14.1



Re: [PATCH v8 06/13] x86/apic: Mark the apic_intr_mode extern for sanity check cleanup

2017-09-05 Thread Baoquan He
On 09/06/17 at 12:25pm, Baoquan He wrote:
> On 08/28/17 at 11:20am, Dou Liyang wrote:
> > -   /*
> >  * Should not be necessary because the MP table should list the boot
> >  * CPU too, but we do it for the sake of robustness anyway.
> >  */
> > @@ -1254,29 +1237,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
> > physid_set(hard_smp_processor_id(), phys_cpu_present_map);
> > }
> > preempt_enable();
> > -
> > -   /*
> > -* If we couldn't find a local APIC, then get out of here now!
> > -*/
> > -   if (APIC_INTEGRATED(boot_cpu_apic_version) &&
> > -   !boot_cpu_has(X86_FEATURE_APIC)) {
> > -   if (!disable_apic) {
> > -   pr_err("BIOS bug, local APIC #%d not detected!...\n",
> > -   boot_cpu_physical_apicid);
> > -   pr_err("... forcing use of dummy APIC emulation (tell 
> > your hw vendor)\n");
> > -   }
> > -   return SMP_NO_APIC;
> > -   }
> > -
> > -   /*
> > -* If SMP should be disabled, then really disable it!
> > -*/
> > -   if (!max_cpus) {
> > -   pr_info("SMP mode deactivated\n");
> > -   return SMP_FORCE_UP;
> > -   }
> > -
> > -   return SMP_OK;
> >  }
> >  
> >  static void __init smp_cpu_index_default(void)
> > @@ -1335,19 +1295,20 @@ void __init native_smp_prepare_cpus(unsigned int 
> > max_cpus)
> 
> Please also cleanup the passed in max_cpus since it's not used here any
> more. And up to the caller:

Oops, I just checked x86 code, other arch also have this hook. Please
ingore this comment.

> 
> static noinline void __init kernel_init_freeable(void)
> {
>   ...
>   smp_prepare_cpus(setup_max_cpus);
>   ...
> }
> 
> >  
> > apic_intr_mode_init();
> >  
> > -   switch (smp_sanity_check(max_cpus)) {
> > -   case SMP_NO_CONFIG:
> > -   disable_smp();
> > -   return;
> > -   case SMP_NO_APIC:
> > +   smp_sanity_check();
> > +
> > +   switch (apic_intr_mode) {
> > +   case APIC_PIC:
> > +   case APIC_VIRTUAL_WIRE_NO_CONFIG:
> > disable_smp();
> > return;
> > -   case SMP_FORCE_UP:
> > +   case APIC_SYMMETRIC_IO_NO_ROUTING:
> > disable_smp();
> > /* Setup local timer */
> > x86_init.timers.setup_percpu_clockev();
> > return;
> > -   case SMP_OK:
> > +   case APIC_VIRTUAL_WIRE:
> > +   case APIC_SYMMETRIC_IO:
> > break;
> > }
> >  
> > -- 
> > 2.5.5
> > 
> > 
> > 


Re: [PATCH v3 2/3] arm: dts: add Nuvoton NPCM750 device tree

2017-09-05 Thread Joel Stanley
On Wed, Sep 6, 2017 at 10:00 AM, Brendan Higgins
 wrote:
> +++ b/Documentation/devicetree/bindings/arm/npcm/npcm.txt
> @@ -0,0 +1,6 @@
> +NPCM Platforms Device Tree Bindings
> +---
> +NPCM750 SoC
> +Required root node properties:
> +   - compatible = "nuvoton,npcm750";
> +

This is minimal. I assume there will be more content added as more
support is added?

Does it need it's own directory?


> diff --git a/arch/arm/boot/dts/nuvoton-npcm750-evb.dts 
> b/arch/arm/boot/dts/nuvoton-npcm750-evb.dts
> new file mode 100644
> index ..54df32cff21b
> --- /dev/null
> +++ b/arch/arm/boot/dts/nuvoton-npcm750-evb.dts

> +
> +/dts-v1/;
> +#include "nuvoton-npcm750.dtsi"
> +
> +/ {
> +   model = "Nuvoton npcm750 Development Board (Device Tree)";
> +   compatible = "nuvoton,npcm750";
> +
> +   chosen {
> +   stdout-path = 
> +   bootargs = "earlyprintk=serial,serial3,115200";
> +   };
> +
> +   memory {
> +   reg = <0 0x4000>;
> +   };
> +
> +   cpus {
> +   enable-method = "nuvoton,npcm7xx-smp";
> +   };
> +
> +   clk: clock-controller@f0801000 {
> +   status = "okay";
> +   };
> +
> +   apb {
> +   watchdog1: watchdog@f0009000 {
> +   status = "okay";
> +   };

You've already got the label for the node, is there are reason you
don't use a phandle to set the status?

 {
   status = "okay";
};

Same with the serial nodes below.

> +
> +   serial0: serial0@f0001000 {
> +   status = "okay";
> +   };
> +
> +   serial1: serial1@f0002000 {
> +   status = "okay";
> +   };
> +
> +   serial2: serial2@f0003000 {
> +   status = "okay";
> +   };
> +
> +   serial3: serial3@f0004000 {
> +   status = "okay";
> +   };
> +   };


Re: [PATCH v3 2/3] arm: dts: add Nuvoton NPCM750 device tree

2017-09-05 Thread Joel Stanley
On Wed, Sep 6, 2017 at 10:00 AM, Brendan Higgins
 wrote:
> +++ b/Documentation/devicetree/bindings/arm/npcm/npcm.txt
> @@ -0,0 +1,6 @@
> +NPCM Platforms Device Tree Bindings
> +---
> +NPCM750 SoC
> +Required root node properties:
> +   - compatible = "nuvoton,npcm750";
> +

This is minimal. I assume there will be more content added as more
support is added?

Does it need it's own directory?


> diff --git a/arch/arm/boot/dts/nuvoton-npcm750-evb.dts 
> b/arch/arm/boot/dts/nuvoton-npcm750-evb.dts
> new file mode 100644
> index ..54df32cff21b
> --- /dev/null
> +++ b/arch/arm/boot/dts/nuvoton-npcm750-evb.dts

> +
> +/dts-v1/;
> +#include "nuvoton-npcm750.dtsi"
> +
> +/ {
> +   model = "Nuvoton npcm750 Development Board (Device Tree)";
> +   compatible = "nuvoton,npcm750";
> +
> +   chosen {
> +   stdout-path = 
> +   bootargs = "earlyprintk=serial,serial3,115200";
> +   };
> +
> +   memory {
> +   reg = <0 0x4000>;
> +   };
> +
> +   cpus {
> +   enable-method = "nuvoton,npcm7xx-smp";
> +   };
> +
> +   clk: clock-controller@f0801000 {
> +   status = "okay";
> +   };
> +
> +   apb {
> +   watchdog1: watchdog@f0009000 {
> +   status = "okay";
> +   };

You've already got the label for the node, is there are reason you
don't use a phandle to set the status?

 {
   status = "okay";
};

Same with the serial nodes below.

> +
> +   serial0: serial0@f0001000 {
> +   status = "okay";
> +   };
> +
> +   serial1: serial1@f0002000 {
> +   status = "okay";
> +   };
> +
> +   serial2: serial2@f0003000 {
> +   status = "okay";
> +   };
> +
> +   serial3: serial3@f0004000 {
> +   status = "okay";
> +   };
> +   };


Abysmal scheduler performance in Linus' tree?

2017-09-05 Thread Andy Lutomirski
I'm running e7d0c41ecc2e372a81741a30894f556afec24315 from Linus' tree
today, and I'm seeing abysmal scheduler performance.  Running make -j4
ends up with all the tasks on CPU 3 most of the time (on my
4-logical-thread laptop).  taskset -c 0 whatever puts whatever on CPU
0, but plain while true; do true; done puts the infinite loop on CPU 3
right along with the make -j4 tasks.

This is on Fedora 26, and I don't think I'm doing anything weird.
systemd has enabled the cpu controller, but it doesn't seem to have
configured anything or created any non-root cgroups.

Just a heads up.  I haven't tried to diagnose it at all.


Abysmal scheduler performance in Linus' tree?

2017-09-05 Thread Andy Lutomirski
I'm running e7d0c41ecc2e372a81741a30894f556afec24315 from Linus' tree
today, and I'm seeing abysmal scheduler performance.  Running make -j4
ends up with all the tasks on CPU 3 most of the time (on my
4-logical-thread laptop).  taskset -c 0 whatever puts whatever on CPU
0, but plain while true; do true; done puts the infinite loop on CPU 3
right along with the make -j4 tasks.

This is on Fedora 26, and I don't think I'm doing anything weird.
systemd has enabled the cpu controller, but it doesn't seem to have
configured anything or created any non-root cgroups.

Just a heads up.  I haven't tried to diagnose it at all.


[PATCH v2] arm: dts: artpec6: Remove unnecessary interrupt-parent property from sub-nodes

2017-09-05 Thread surenderp
From: Surender Polsani <surend...@techveda.org>

"interrupt-parent" property is declared in root node, so it is global
to all nodes. This property is re-declared in few sub-nodes. To avoid
duplication this property is removed from following sub-nodes: pmu,
amba@0, amba@0/ethernet.

Signed-off-by: Surender Polsani <surend...@techveda.org>
Acked-by: Niklas Cassel <niklas.cas...@axis.com>
---
Changes for v2:

- Done few changes as suggested by Niklas Cassel.
- Patch is rebased and built(ARCH=arm) on latest next-20170905.
---
 arch/arm/boot/dts/artpec6.dtsi | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm/boot/dts/artpec6.dtsi b/arch/arm/boot/dts/artpec6.dtsi
index 767cbe8..2ed1177 100644
--- a/arch/arm/boot/dts/artpec6.dtsi
+++ b/arch/arm/boot/dts/artpec6.dtsi
@@ -151,7 +151,6 @@
interrupts = ,
;
interrupt-affinity = <>, <>;
-   interrupt-parent = <>;
};
 
pcie: pcie@f805 {
@@ -185,7 +184,6 @@
compatible = "simple-bus";
#address-cells = <0x1>;
#size-cells = <0x1>;
-   interrupt-parent = <>;
ranges;
dma-ranges = <0x8000 0x 0x4000>;
dma-coherent;
@@ -195,7 +193,6 @@
clocks = <_phy_ref_clk>,
< ARTPEC6_CLK_ETH_ACLK>;
compatible = "snps,dwc-qos-ethernet-4.10";
-   interrupt-parent = <>;
interrupts = ;
reg = <0xf801 0x4000>;
 
-- 
1.9.1



[PATCH v2] arm: dts: artpec6: Remove unnecessary interrupt-parent property from sub-nodes

2017-09-05 Thread surenderp
From: Surender Polsani 

"interrupt-parent" property is declared in root node, so it is global
to all nodes. This property is re-declared in few sub-nodes. To avoid
duplication this property is removed from following sub-nodes: pmu,
amba@0, amba@0/ethernet.

Signed-off-by: Surender Polsani 
Acked-by: Niklas Cassel 
---
Changes for v2:

- Done few changes as suggested by Niklas Cassel.
- Patch is rebased and built(ARCH=arm) on latest next-20170905.
---
 arch/arm/boot/dts/artpec6.dtsi | 3 ---
 1 file changed, 3 deletions(-)

diff --git a/arch/arm/boot/dts/artpec6.dtsi b/arch/arm/boot/dts/artpec6.dtsi
index 767cbe8..2ed1177 100644
--- a/arch/arm/boot/dts/artpec6.dtsi
+++ b/arch/arm/boot/dts/artpec6.dtsi
@@ -151,7 +151,6 @@
interrupts = ,
;
interrupt-affinity = <>, <>;
-   interrupt-parent = <>;
};
 
pcie: pcie@f805 {
@@ -185,7 +184,6 @@
compatible = "simple-bus";
#address-cells = <0x1>;
#size-cells = <0x1>;
-   interrupt-parent = <>;
ranges;
dma-ranges = <0x8000 0x 0x4000>;
dma-coherent;
@@ -195,7 +193,6 @@
clocks = <_phy_ref_clk>,
< ARTPEC6_CLK_ETH_ACLK>;
compatible = "snps,dwc-qos-ethernet-4.10";
-   interrupt-parent = <>;
interrupts = ;
reg = <0xf801 0x4000>;
 
-- 
1.9.1



UBSAN: Undefined error in log2.h

2017-09-05 Thread Shankara Pailoor
Hi,

I am hitting this bug when running the syzkaller fuzzer on kernel 4.13-rc7

Syzkaller hit 'UBSAN: Undefined behaviour in ./include/linux/log2.h:LINE' bug.

Guilty file: fs/pipe.c

Maintainers: []

UBSAN: Undefined behaviour in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 3 PID: 2712 Comm: syz-executor0 Not tainted 4.13.0-rc7 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0xf7/0x1ae lib/dump_stack.c:52
 ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
 __ubsan_handle_shift_out_of_bounds+0x2b2/0x32c lib/ubsan.c:421
 __roundup_pow_of_two include/linux/log2.h:57 [inline]
 round_pipe_size fs/pipe.c:1027 [inline]
 pipe_set_size fs/pipe.c:1041 [inline]
 pipe_fcntl+0x6b6/0x7b0 fs/pipe.c:1158
 do_fcntl+0x362/0x1250 fs/fcntl.c:416
 SYSC_fcntl fs/fcntl.c:462 [inline]
 SyS_fcntl+0xd6/0x110 fs/fcntl.c:447
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x451e59
RSP: 002b:7ffe1469d358 EFLAGS: 0212 ORIG_RAX: 0048
RAX: ffda RBX: 00718000 RCX: 00451e59
RDX:  RSI: 0407 RDI: 0003
RBP: 0046 R08:  R09: 
R10:  R11: 0212 R12: 20011ff8
R13: 00718000 R14:  R15: 


I have the full reproducer program here: https://pastebin.com/mLZx0ySS

The main code is below:

long r[4];
void loop()
{
syscall(SYS_write, 1, "executing program\n", strlen("executing program\n"));
memset(r, -1, sizeof(r));
r[0] = syscall(__NR_mmap, 0x2000ul, 0x12000ul, 0x3ul, 0x32ul,
0xul, 0x0ul);
r[1] = syscall(__NR_pipe, 0x20011ff8ul);
if (r[1] != -1)
NONFAILING(r[2] = *(uint32_t*)0x20011ff8);
r[3] = syscall(__NR_fcntl, r[2], 0x407ul, 0x0ul);
}



UBSAN: Undefined error in log2.h

2017-09-05 Thread Shankara Pailoor
Hi,

I am hitting this bug when running the syzkaller fuzzer on kernel 4.13-rc7

Syzkaller hit 'UBSAN: Undefined behaviour in ./include/linux/log2.h:LINE' bug.

Guilty file: fs/pipe.c

Maintainers: []

UBSAN: Undefined behaviour in ./include/linux/log2.h:57:13
shift exponent 64 is too large for 64-bit type 'long unsigned int'
CPU: 3 PID: 2712 Comm: syz-executor0 Not tainted 4.13.0-rc7 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0xf7/0x1ae lib/dump_stack.c:52
 ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
 __ubsan_handle_shift_out_of_bounds+0x2b2/0x32c lib/ubsan.c:421
 __roundup_pow_of_two include/linux/log2.h:57 [inline]
 round_pipe_size fs/pipe.c:1027 [inline]
 pipe_set_size fs/pipe.c:1041 [inline]
 pipe_fcntl+0x6b6/0x7b0 fs/pipe.c:1158
 do_fcntl+0x362/0x1250 fs/fcntl.c:416
 SYSC_fcntl fs/fcntl.c:462 [inline]
 SyS_fcntl+0xd6/0x110 fs/fcntl.c:447
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x451e59
RSP: 002b:7ffe1469d358 EFLAGS: 0212 ORIG_RAX: 0048
RAX: ffda RBX: 00718000 RCX: 00451e59
RDX:  RSI: 0407 RDI: 0003
RBP: 0046 R08:  R09: 
R10:  R11: 0212 R12: 20011ff8
R13: 00718000 R14:  R15: 


I have the full reproducer program here: https://pastebin.com/mLZx0ySS

The main code is below:

long r[4];
void loop()
{
syscall(SYS_write, 1, "executing program\n", strlen("executing program\n"));
memset(r, -1, sizeof(r));
r[0] = syscall(__NR_mmap, 0x2000ul, 0x12000ul, 0x3ul, 0x32ul,
0xul, 0x0ul);
r[1] = syscall(__NR_pipe, 0x20011ff8ul);
if (r[1] != -1)
NONFAILING(r[2] = *(uint32_t*)0x20011ff8);
r[3] = syscall(__NR_fcntl, r[2], 0x407ul, 0x0ul);
}



Re: UBSAN: Undefined error in time.h signed integer overflow

2017-09-05 Thread John Stultz
On Tue, Sep 5, 2017 at 9:30 PM, Shankara Pailoor  wrote:
> Hi,
>
> I encountered this bug while fuzzing linux kernel 4.13-rc7 with syzkaller.
>
> 
> UBSAN: Undefined behaviour in ./include/linux/time.h:233:27
> signed integer overflow:
> 8391720337152500783 * 10 cannot be represented in type 'long int'
> CPU: 0 PID: 31798 Comm: syz-executor2 Not tainted 4.13.0-rc7 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0xf7/0x1ae lib/dump_stack.c:52
>  ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
>  handle_overflow+0x21e/0x292 lib/ubsan.c:195
>  __ubsan_handle_mul_overflow+0x2a/0x3e lib/ubsan.c:219
>  timespec_to_ns include/linux/time.h:233 [inline]
>  posix_cpu_timer_set+0xb5c/0xf20 kernel/time/posix-cpu-timers.c:686
>  do_timer_settime+0x1f4/0x390 kernel/time/posix-timers.c:890
>  SYSC_timer_settime kernel/time/posix-timers.c:916 [inline]
>  SyS_timer_settime+0xea/0x170 kernel/time/posix-timers.c:902
>  entry_SYSCALL_64_fastpath+0x18/0xad
> RIP: 0033:0x451e59
> RSP: 002b:7fb62af4fc08 EFLAGS: 0216 ORIG_RAX: 00df
> RAX: ffda RBX: 00718000 RCX: 00451e59
> RDX: 20006000 RSI:  RDI: 
> RBP: 0046 R08:  R09: 
> R10: 20003fe0 R11: 0216 R12: 004be920
> R13:  R14:  R15: 
> 

Looks similar to the issue Thomas fixed here:
   https://patchwork.kernel.org/patch/9799827/


Thomas: Should we change timespec_to_ns() to use the same transition
internally rather then trying to track all the callers?

thanks
-john


Re: UBSAN: Undefined error in time.h signed integer overflow

2017-09-05 Thread John Stultz
On Tue, Sep 5, 2017 at 9:30 PM, Shankara Pailoor  wrote:
> Hi,
>
> I encountered this bug while fuzzing linux kernel 4.13-rc7 with syzkaller.
>
> 
> UBSAN: Undefined behaviour in ./include/linux/time.h:233:27
> signed integer overflow:
> 8391720337152500783 * 10 cannot be represented in type 'long int'
> CPU: 0 PID: 31798 Comm: syz-executor2 Not tainted 4.13.0-rc7 #2
> Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
> Ubuntu-1.8.2-1ubuntu1 04/01/2014
> Call Trace:
>  __dump_stack lib/dump_stack.c:16 [inline]
>  dump_stack+0xf7/0x1ae lib/dump_stack.c:52
>  ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
>  handle_overflow+0x21e/0x292 lib/ubsan.c:195
>  __ubsan_handle_mul_overflow+0x2a/0x3e lib/ubsan.c:219
>  timespec_to_ns include/linux/time.h:233 [inline]
>  posix_cpu_timer_set+0xb5c/0xf20 kernel/time/posix-cpu-timers.c:686
>  do_timer_settime+0x1f4/0x390 kernel/time/posix-timers.c:890
>  SYSC_timer_settime kernel/time/posix-timers.c:916 [inline]
>  SyS_timer_settime+0xea/0x170 kernel/time/posix-timers.c:902
>  entry_SYSCALL_64_fastpath+0x18/0xad
> RIP: 0033:0x451e59
> RSP: 002b:7fb62af4fc08 EFLAGS: 0216 ORIG_RAX: 00df
> RAX: ffda RBX: 00718000 RCX: 00451e59
> RDX: 20006000 RSI:  RDI: 
> RBP: 0046 R08:  R09: 
> R10: 20003fe0 R11: 0216 R12: 004be920
> R13:  R14:  R15: 
> 

Looks similar to the issue Thomas fixed here:
   https://patchwork.kernel.org/patch/9799827/


Thomas: Should we change timespec_to_ns() to use the same transition
internally rather then trying to track all the callers?

thanks
-john


[PATCH 1/2] mm/slub: wake up kswapd for initial high order allocation

2017-09-05 Thread js1304
From: Joonsoo Kim 

slub uses higher order allocation than it actually needs. In this case,
we don't want to do direct reclaim to make such a high order page since
it causes a big latency to the user. Instead, we would like to fallback
lower order allocation that it actually needs.

However, we also want to get this higher order page in the next time
in order to get the best performance and it would be a role of
the background thread like as kswapd and kcompactd. To wake up them,
we should not clear __GFP_KSWAPD_RECLAIM.

Unlike this intention, current code clears __GFP_KSWAPD_RECLAIM so fix it.
Current unintended code is done by Mel's commit 444eb2a449ef ("mm: thp:
set THP defrag by default to madvise and add a stall-free defrag option")
for slub part. It removes a special case in __alloc_page_slowpath()
where including __GFP_THISNODE and lacking ~__GFP_DIRECT_RECLAIM
effectively means also lacking __GFP_KSWAPD_RECLAIM. However, slub
doesn't use __GFP_THISNODE so it is not the case for this purpose. So,
partially reverting this code in slub doesn't hurt Mel's intention.

Note that this patch does some clean up, too.
__GFP_NOFAIL is cleared twice so remove one.

Signed-off-by: Joonsoo Kim 
---
 mm/slub.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 163352c..45f4a4b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, 
gfp_t flags, int node)
 * so we fall-back to the minimum order allocation.
 */
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
-   if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > 
oo_order(s->min))
-   alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & 
~(__GFP_RECLAIM|__GFP_NOFAIL);
+   if (oo_order(oo) > oo_order(s->min)) {
+   if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
+   alloc_gfp |= __GFP_NOMEMALLOC;
+   alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
+   }
+   }
 
page = alloc_slab_page(s, alloc_gfp, node, oo);
if (unlikely(!page)) {
-- 
2.7.4



[PATCH 2/2] mm/slub: don't use reserved memory for optimistic try

2017-09-05 Thread js1304
From: Joonsoo Kim 

High-order atomic allocation is difficult to succeed since we cannot
reclaim anything in this context. So, we reserves the pageblock for
this kind of request.

In slub, we try to allocate higher-order page more than it actually
needs in order to get the best performance. If this optimistic try is
used with GFP_ATOMIC, alloc_flags will be set as ALLOC_HARDER and
the pageblock reserved for high-order atomic allocation would be used.
Moreover, this request would reserve the MIGRATE_HIGHATOMIC pageblock
,if succeed, to prepare further request. It would not be good to use
MIGRATE_HIGHATOMIC pageblock in terms of fragmentation management
since it unconditionally set a migratetype to request's migratetype
when unreserving the pageblock without considering the migratetype of
used pages in the pageblock.

This is not what we don't intend so fix it by unconditionally masking
out __GFP_ATOMIC in order to not set ALLOC_HARDER.

And, it is also undesirable to use reserved memory for optimistic try
so mask out __GFP_HIGH. This patch also adds __GFP_NOMEMALLOC since
we don't want to use the reserved memory for optimistic try even if
the user has PF_MEMALLOC flag.

Signed-off-by: Joonsoo Kim 
---
 include/linux/gfp.h | 1 +
 mm/page_alloc.c | 8 
 mm/slub.c   | 6 ++
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f780718..1f5658e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -568,6 +568,7 @@ extern gfp_t gfp_allowed_mask;
 
 /* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
 bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
+gfp_t gfp_drop_reserves(gfp_t gfp_mask);
 
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6dbc49e..0f34356 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3720,6 +3720,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
return !!__gfp_pfmemalloc_flags(gfp_mask);
 }
 
+gfp_t gfp_drop_reserves(gfp_t gfp_mask)
+{
+   gfp_mask &= ~(__GFP_HIGH | __GFP_ATOMIC);
+   gfp_mask |= __GFP_NOMEMALLOC;
+
+   return gfp_mask;
+}
+
 /*
  * Checks whether it makes sense to retry the reclaim to make a forward 
progress
  * for the given allocation request.
diff --git a/mm/slub.c b/mm/slub.c
index 45f4a4b..3d75d30 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1579,10 +1579,8 @@ static struct page *allocate_slab(struct kmem_cache *s, 
gfp_t flags, int node)
 */
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
if (oo_order(oo) > oo_order(s->min)) {
-   if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
-   alloc_gfp |= __GFP_NOMEMALLOC;
-   alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
-   }
+   alloc_gfp = gfp_drop_reserves(alloc_gfp);
+   alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
}
 
page = alloc_slab_page(s, alloc_gfp, node, oo);
-- 
2.7.4



[PATCH 1/2] mm/slub: wake up kswapd for initial high order allocation

2017-09-05 Thread js1304
From: Joonsoo Kim 

slub uses higher order allocation than it actually needs. In this case,
we don't want to do direct reclaim to make such a high order page since
it causes a big latency to the user. Instead, we would like to fallback
lower order allocation that it actually needs.

However, we also want to get this higher order page in the next time
in order to get the best performance and it would be a role of
the background thread like as kswapd and kcompactd. To wake up them,
we should not clear __GFP_KSWAPD_RECLAIM.

Unlike this intention, current code clears __GFP_KSWAPD_RECLAIM so fix it.
Current unintended code is done by Mel's commit 444eb2a449ef ("mm: thp:
set THP defrag by default to madvise and add a stall-free defrag option")
for slub part. It removes a special case in __alloc_page_slowpath()
where including __GFP_THISNODE and lacking ~__GFP_DIRECT_RECLAIM
effectively means also lacking __GFP_KSWAPD_RECLAIM. However, slub
doesn't use __GFP_THISNODE so it is not the case for this purpose. So,
partially reverting this code in slub doesn't hurt Mel's intention.

Note that this patch does some clean up, too.
__GFP_NOFAIL is cleared twice so remove one.

Signed-off-by: Joonsoo Kim 
---
 mm/slub.c | 8 ++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/mm/slub.c b/mm/slub.c
index 163352c..45f4a4b 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1578,8 +1578,12 @@ static struct page *allocate_slab(struct kmem_cache *s, 
gfp_t flags, int node)
 * so we fall-back to the minimum order allocation.
 */
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
-   if ((alloc_gfp & __GFP_DIRECT_RECLAIM) && oo_order(oo) > 
oo_order(s->min))
-   alloc_gfp = (alloc_gfp | __GFP_NOMEMALLOC) & 
~(__GFP_RECLAIM|__GFP_NOFAIL);
+   if (oo_order(oo) > oo_order(s->min)) {
+   if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
+   alloc_gfp |= __GFP_NOMEMALLOC;
+   alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
+   }
+   }
 
page = alloc_slab_page(s, alloc_gfp, node, oo);
if (unlikely(!page)) {
-- 
2.7.4



[PATCH 2/2] mm/slub: don't use reserved memory for optimistic try

2017-09-05 Thread js1304
From: Joonsoo Kim 

High-order atomic allocation is difficult to succeed since we cannot
reclaim anything in this context. So, we reserves the pageblock for
this kind of request.

In slub, we try to allocate higher-order page more than it actually
needs in order to get the best performance. If this optimistic try is
used with GFP_ATOMIC, alloc_flags will be set as ALLOC_HARDER and
the pageblock reserved for high-order atomic allocation would be used.
Moreover, this request would reserve the MIGRATE_HIGHATOMIC pageblock
,if succeed, to prepare further request. It would not be good to use
MIGRATE_HIGHATOMIC pageblock in terms of fragmentation management
since it unconditionally set a migratetype to request's migratetype
when unreserving the pageblock without considering the migratetype of
used pages in the pageblock.

This is not what we don't intend so fix it by unconditionally masking
out __GFP_ATOMIC in order to not set ALLOC_HARDER.

And, it is also undesirable to use reserved memory for optimistic try
so mask out __GFP_HIGH. This patch also adds __GFP_NOMEMALLOC since
we don't want to use the reserved memory for optimistic try even if
the user has PF_MEMALLOC flag.

Signed-off-by: Joonsoo Kim 
---
 include/linux/gfp.h | 1 +
 mm/page_alloc.c | 8 
 mm/slub.c   | 6 ++
 3 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/gfp.h b/include/linux/gfp.h
index f780718..1f5658e 100644
--- a/include/linux/gfp.h
+++ b/include/linux/gfp.h
@@ -568,6 +568,7 @@ extern gfp_t gfp_allowed_mask;
 
 /* Returns true if the gfp_mask allows use of ALLOC_NO_WATERMARK */
 bool gfp_pfmemalloc_allowed(gfp_t gfp_mask);
+gfp_t gfp_drop_reserves(gfp_t gfp_mask);
 
 extern void pm_restrict_gfp_mask(void);
 extern void pm_restore_gfp_mask(void);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 6dbc49e..0f34356 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -3720,6 +3720,14 @@ bool gfp_pfmemalloc_allowed(gfp_t gfp_mask)
return !!__gfp_pfmemalloc_flags(gfp_mask);
 }
 
+gfp_t gfp_drop_reserves(gfp_t gfp_mask)
+{
+   gfp_mask &= ~(__GFP_HIGH | __GFP_ATOMIC);
+   gfp_mask |= __GFP_NOMEMALLOC;
+
+   return gfp_mask;
+}
+
 /*
  * Checks whether it makes sense to retry the reclaim to make a forward 
progress
  * for the given allocation request.
diff --git a/mm/slub.c b/mm/slub.c
index 45f4a4b..3d75d30 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -1579,10 +1579,8 @@ static struct page *allocate_slab(struct kmem_cache *s, 
gfp_t flags, int node)
 */
alloc_gfp = (flags | __GFP_NOWARN | __GFP_NORETRY) & ~__GFP_NOFAIL;
if (oo_order(oo) > oo_order(s->min)) {
-   if (alloc_gfp & __GFP_DIRECT_RECLAIM) {
-   alloc_gfp |= __GFP_NOMEMALLOC;
-   alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
-   }
+   alloc_gfp = gfp_drop_reserves(alloc_gfp);
+   alloc_gfp &= ~__GFP_DIRECT_RECLAIM;
}
 
page = alloc_slab_page(s, alloc_gfp, node, oo);
-- 
2.7.4



[PATCH] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

2017-09-05 Thread js1304
From: Joonsoo Kim 

Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
important to reserve. When ZONE_MOVABLE is used, this problem would
theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
allocation request which is mainly used for page cache and anon page
allocation. So, fix it by setting 0 to
sysctl_lowmem_reserve_ratio[ZONE_HIGHMEM].

And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
makes code complex. For example, if there is highmem system, following
reserve ratio is activated for *NORMAL ZONE* which would be easyily
misleading people.

 #ifdef CONFIG_HIGHMEM
 32
 #endif

This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
array by MAX_NR_ZONES and place "#ifdef" to right place.

Reviewed-by: Aneesh Kumar K.V 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 Documentation/sysctl/vm.txt |  5 ++---
 include/linux/mmzone.h  |  2 +-
 mm/page_alloc.c | 25 ++---
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9baf66a..e9059d3 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -336,8 +336,6 @@ The lowmem_reserve_ratio is an array. You can see them by 
reading this file.
 % cat /proc/sys/vm/lowmem_reserve_ratio
 256 256 32
 -
-Note: # of this elements is one fewer than number of zones. Because the highest
-  zone's value is not necessary for following calculation.
 
 But, these values are not used directly. The kernel calculates # of protection
 pages for each zones from them. These are shown as array of protection pages
@@ -388,7 +386,8 @@ As above expression, they are reciprocal number of ratio.
 pages of higher zones on the node.
 
 If you would like to protect more pages, smaller values are effective.
-The minimum value is 1 (1/1 -> 100%).
+The minimum value is 1 (1/1 -> 100%). The value less than 1 completely
+disables protection of the pages.
 
 ==
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 356a814..d549c4e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -890,7 +890,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
-extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
+extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0f34356..2a7f7e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -203,17 +203,18 @@ static void __free_pages_ok(struct page *page, unsigned 
int order);
  * TBD: should special case ZONE_DMA32 machines here - in those we normally
  * don't need any ZONE_NORMAL reservation
  */
-int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA
-256,
+   [ZONE_DMA] = 256,
 #endif
 #ifdef CONFIG_ZONE_DMA32
-256,
+   [ZONE_DMA32] = 256,
 #endif
+   [ZONE_NORMAL] = 32,
 #ifdef CONFIG_HIGHMEM
-32,
+   [ZONE_HIGHMEM] = 0,
 #endif
-32,
+   [ZONE_MOVABLE] = 0,
 };
 
 EXPORT_SYMBOL(totalram_pages);
@@ -6921,13 +6922,15 @@ static void setup_per_zone_lowmem_reserve(void)
struct zone *lower_zone;
 
idx--;
-
-   if (sysctl_lowmem_reserve_ratio[idx] < 1)
-   sysctl_lowmem_reserve_ratio[idx] = 1;
-
lower_zone = pgdat->node_zones + idx;
-   lower_zone->lowmem_reserve[j] = managed_pages /
-   sysctl_lowmem_reserve_ratio[idx];
+
+   if (sysctl_lowmem_reserve_ratio[idx] < 1) {
+   sysctl_lowmem_reserve_ratio[idx] = 0;
+   lower_zone->lowmem_reserve[j] = 0;
+   } else {
+   lower_zone->lowmem_reserve[j] =
+   managed_pages / 
sysctl_lowmem_reserve_ratio[idx];
+   }
managed_pages += lower_zone->managed_pages;
}
}
-- 
2.7.4



[PATCH] mm/page_alloc: don't reserve ZONE_HIGHMEM for ZONE_MOVABLE request

2017-09-05 Thread js1304
From: Joonsoo Kim 

Freepage on ZONE_HIGHMEM doesn't work for kernel memory so it's not that
important to reserve. When ZONE_MOVABLE is used, this problem would
theorectically cause to decrease usable memory for GFP_HIGHUSER_MOVABLE
allocation request which is mainly used for page cache and anon page
allocation. So, fix it by setting 0 to
sysctl_lowmem_reserve_ratio[ZONE_HIGHMEM].

And, defining sysctl_lowmem_reserve_ratio array by MAX_NR_ZONES - 1 size
makes code complex. For example, if there is highmem system, following
reserve ratio is activated for *NORMAL ZONE* which would be easyily
misleading people.

 #ifdef CONFIG_HIGHMEM
 32
 #endif

This patch also fix this situation by defining sysctl_lowmem_reserve_ratio
array by MAX_NR_ZONES and place "#ifdef" to right place.

Reviewed-by: Aneesh Kumar K.V 
Acked-by: Vlastimil Babka 
Signed-off-by: Joonsoo Kim 
---
 Documentation/sysctl/vm.txt |  5 ++---
 include/linux/mmzone.h  |  2 +-
 mm/page_alloc.c | 25 ++---
 3 files changed, 17 insertions(+), 15 deletions(-)

diff --git a/Documentation/sysctl/vm.txt b/Documentation/sysctl/vm.txt
index 9baf66a..e9059d3 100644
--- a/Documentation/sysctl/vm.txt
+++ b/Documentation/sysctl/vm.txt
@@ -336,8 +336,6 @@ The lowmem_reserve_ratio is an array. You can see them by 
reading this file.
 % cat /proc/sys/vm/lowmem_reserve_ratio
 256 256 32
 -
-Note: # of this elements is one fewer than number of zones. Because the highest
-  zone's value is not necessary for following calculation.
 
 But, these values are not used directly. The kernel calculates # of protection
 pages for each zones from them. These are shown as array of protection pages
@@ -388,7 +386,8 @@ As above expression, they are reciprocal number of ratio.
 pages of higher zones on the node.
 
 If you would like to protect more pages, smaller values are effective.
-The minimum value is 1 (1/1 -> 100%).
+The minimum value is 1 (1/1 -> 100%). The value less than 1 completely
+disables protection of the pages.
 
 ==
 
diff --git a/include/linux/mmzone.h b/include/linux/mmzone.h
index 356a814..d549c4e 100644
--- a/include/linux/mmzone.h
+++ b/include/linux/mmzone.h
@@ -890,7 +890,7 @@ int min_free_kbytes_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 int watermark_scale_factor_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
-extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1];
+extern int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES];
 int lowmem_reserve_ratio_sysctl_handler(struct ctl_table *, int,
void __user *, size_t *, loff_t *);
 int percpu_pagelist_fraction_sysctl_handler(struct ctl_table *, int,
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 0f34356..2a7f7e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -203,17 +203,18 @@ static void __free_pages_ok(struct page *page, unsigned 
int order);
  * TBD: should special case ZONE_DMA32 machines here - in those we normally
  * don't need any ZONE_NORMAL reservation
  */
-int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES-1] = {
+int sysctl_lowmem_reserve_ratio[MAX_NR_ZONES] = {
 #ifdef CONFIG_ZONE_DMA
-256,
+   [ZONE_DMA] = 256,
 #endif
 #ifdef CONFIG_ZONE_DMA32
-256,
+   [ZONE_DMA32] = 256,
 #endif
+   [ZONE_NORMAL] = 32,
 #ifdef CONFIG_HIGHMEM
-32,
+   [ZONE_HIGHMEM] = 0,
 #endif
-32,
+   [ZONE_MOVABLE] = 0,
 };
 
 EXPORT_SYMBOL(totalram_pages);
@@ -6921,13 +6922,15 @@ static void setup_per_zone_lowmem_reserve(void)
struct zone *lower_zone;
 
idx--;
-
-   if (sysctl_lowmem_reserve_ratio[idx] < 1)
-   sysctl_lowmem_reserve_ratio[idx] = 1;
-
lower_zone = pgdat->node_zones + idx;
-   lower_zone->lowmem_reserve[j] = managed_pages /
-   sysctl_lowmem_reserve_ratio[idx];
+
+   if (sysctl_lowmem_reserve_ratio[idx] < 1) {
+   sysctl_lowmem_reserve_ratio[idx] = 0;
+   lower_zone->lowmem_reserve[j] = 0;
+   } else {
+   lower_zone->lowmem_reserve[j] =
+   managed_pages / 
sysctl_lowmem_reserve_ratio[idx];
+   }
managed_pages += lower_zone->managed_pages;
}
}
-- 
2.7.4



UBSAN: Undefined error in time.h signed integer overflow

2017-09-05 Thread Shankara Pailoor
Hi,

I encountered this bug while fuzzing linux kernel 4.13-rc7 with syzkaller.


UBSAN: Undefined behaviour in ./include/linux/time.h:233:27
signed integer overflow:
8391720337152500783 * 10 cannot be represented in type 'long int'
CPU: 0 PID: 31798 Comm: syz-executor2 Not tainted 4.13.0-rc7 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0xf7/0x1ae lib/dump_stack.c:52
 ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
 handle_overflow+0x21e/0x292 lib/ubsan.c:195
 __ubsan_handle_mul_overflow+0x2a/0x3e lib/ubsan.c:219
 timespec_to_ns include/linux/time.h:233 [inline]
 posix_cpu_timer_set+0xb5c/0xf20 kernel/time/posix-cpu-timers.c:686
 do_timer_settime+0x1f4/0x390 kernel/time/posix-timers.c:890
 SYSC_timer_settime kernel/time/posix-timers.c:916 [inline]
 SyS_timer_settime+0xea/0x170 kernel/time/posix-timers.c:902
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x451e59
RSP: 002b:7fb62af4fc08 EFLAGS: 0216 ORIG_RAX: 00df
RAX: ffda RBX: 00718000 RCX: 00451e59
RDX: 20006000 RSI:  RDI: 
RBP: 0046 R08:  R09: 
R10: 20003fe0 R11: 0216 R12: 004be920
R13:  R14:  R15: 



Here is the full reproducer program: https://pastebin.com/xucAtmbD

Below is the core of the reproducer:

long r[16];
void *thr(void *arg)
{
switch ((long)arg) {
case 0:
r[0] = syscall(__NR_mmap, 0x2000ul, 0xd000ul, 0x3ul, 0x32ul,
0xul, 0x0ul);
break;
case 1:
NONFAILING(*(uint64_t*)0x20001fb0 = (uint64_t)0x0);
NONFAILING(*(uint32_t*)0x20001fb8 = (uint32_t)0x1);
NONFAILING(*(uint32_t*)0x20001fbc = (uint32_t)0x0);
NONFAILING(*(uint64_t*)0x20001fc0 = (uint64_t)0x20007fcd);
NONFAILING(*(uint64_t*)0x20001fc8 = (uint64_t)0x20005000);
r[6] = syscall(__NR_timer_create, 0x3ul, 0x20001fb0ul, 0x2000ul);
break;
case 2:
r[7] = syscall(__NR_clock_gettime, 0x0ul, 0x20004000ul);
if (r[7] != -1)
NONFAILING(r[8] = *(uint64_t*)0x20004008);
break;
case 3:
NONFAILING(*(uint64_t*)0x20006000 = (uint64_t)0x0);
NONFAILING(*(uint64_t*)0x20006008 = (uint64_t)0x0);
NONFAILING(*(uint64_t*)0x20006010 = (uint64_t)0x0);
NONFAILING(*(uint64_t*)0x20006018 = r[8]+1000);
r[13] = syscall(__NR_timer_settime, 0x0ul, 0x0ul, 0x20006000ul, 0x20003fe0ul);
break;
case 4:
NONFAILING(memcpy((void*)0x20006000,
"\x2f\x64\x65\x76\x2f\x61\x75\x74\x6f\x66\x73\x00", 12));
r[15] = syscall(__NR_openat, 0xff9cul, 0x20006000ul,
0x8000ul, 0x0ul);
break;
}
return 0;
}



UBSAN: Undefined error in time.h signed integer overflow

2017-09-05 Thread Shankara Pailoor
Hi,

I encountered this bug while fuzzing linux kernel 4.13-rc7 with syzkaller.


UBSAN: Undefined behaviour in ./include/linux/time.h:233:27
signed integer overflow:
8391720337152500783 * 10 cannot be represented in type 'long int'
CPU: 0 PID: 31798 Comm: syz-executor2 Not tainted 4.13.0-rc7 #2
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
Call Trace:
 __dump_stack lib/dump_stack.c:16 [inline]
 dump_stack+0xf7/0x1ae lib/dump_stack.c:52
 ubsan_epilogue+0x12/0x8f lib/ubsan.c:164
 handle_overflow+0x21e/0x292 lib/ubsan.c:195
 __ubsan_handle_mul_overflow+0x2a/0x3e lib/ubsan.c:219
 timespec_to_ns include/linux/time.h:233 [inline]
 posix_cpu_timer_set+0xb5c/0xf20 kernel/time/posix-cpu-timers.c:686
 do_timer_settime+0x1f4/0x390 kernel/time/posix-timers.c:890
 SYSC_timer_settime kernel/time/posix-timers.c:916 [inline]
 SyS_timer_settime+0xea/0x170 kernel/time/posix-timers.c:902
 entry_SYSCALL_64_fastpath+0x18/0xad
RIP: 0033:0x451e59
RSP: 002b:7fb62af4fc08 EFLAGS: 0216 ORIG_RAX: 00df
RAX: ffda RBX: 00718000 RCX: 00451e59
RDX: 20006000 RSI:  RDI: 
RBP: 0046 R08:  R09: 
R10: 20003fe0 R11: 0216 R12: 004be920
R13:  R14:  R15: 



Here is the full reproducer program: https://pastebin.com/xucAtmbD

Below is the core of the reproducer:

long r[16];
void *thr(void *arg)
{
switch ((long)arg) {
case 0:
r[0] = syscall(__NR_mmap, 0x2000ul, 0xd000ul, 0x3ul, 0x32ul,
0xul, 0x0ul);
break;
case 1:
NONFAILING(*(uint64_t*)0x20001fb0 = (uint64_t)0x0);
NONFAILING(*(uint32_t*)0x20001fb8 = (uint32_t)0x1);
NONFAILING(*(uint32_t*)0x20001fbc = (uint32_t)0x0);
NONFAILING(*(uint64_t*)0x20001fc0 = (uint64_t)0x20007fcd);
NONFAILING(*(uint64_t*)0x20001fc8 = (uint64_t)0x20005000);
r[6] = syscall(__NR_timer_create, 0x3ul, 0x20001fb0ul, 0x2000ul);
break;
case 2:
r[7] = syscall(__NR_clock_gettime, 0x0ul, 0x20004000ul);
if (r[7] != -1)
NONFAILING(r[8] = *(uint64_t*)0x20004008);
break;
case 3:
NONFAILING(*(uint64_t*)0x20006000 = (uint64_t)0x0);
NONFAILING(*(uint64_t*)0x20006008 = (uint64_t)0x0);
NONFAILING(*(uint64_t*)0x20006010 = (uint64_t)0x0);
NONFAILING(*(uint64_t*)0x20006018 = r[8]+1000);
r[13] = syscall(__NR_timer_settime, 0x0ul, 0x0ul, 0x20006000ul, 0x20003fe0ul);
break;
case 4:
NONFAILING(memcpy((void*)0x20006000,
"\x2f\x64\x65\x76\x2f\x61\x75\x74\x6f\x66\x73\x00", 12));
r[15] = syscall(__NR_openat, 0xff9cul, 0x20006000ul,
0x8000ul, 0x0ul);
break;
}
return 0;
}



Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the idlest CPU

2017-09-05 Thread Yu Chen
On Wed, Sep 06, 2017 at 12:57:41AM +0200, Thomas Gleixner wrote:
> On Sun, 3 Sep 2017, Thomas Gleixner wrote:
> 
> > On Fri, 1 Sep 2017, Chen Yu wrote:
> > 
> > > This is the major logic to spread the vectors on different CPUs.
> > > The main idea is to choose the 'idlest' CPU which has assigned
> > > the least number of vectors as the candidate/hint for the vector
> > > allocation domain, in the hope that the vector allocation domain
> > > could leverage this hint to generate corresponding cpumask.
> > > 
> > > One of the requirements to do this vector spreading work comes from the
> > > following hibernation problem found on a 16 cores server:
> > > 
> > > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > > are only 0 available.
> 
> Thinking more about this, this makes no sense whatsoever.
> 
> The total number of interrupts on a system is the same whether they are
> all on CPU 0 or evenly spread over all CPUs.
> 
> As this machine is using physcial destination mode, the number of vectors
> used is the same as the number of interrupts, except for the case where a
> move of an interrupt is in progress and the interrupt which cleans up the
> old vector has not yet arrived. Lets ignore that for now.
> 
> The available vector space is 204 per CPU on such a system.
> 
> 256 - SYSTEM[0-31, 32, 128, 239-255] - LEGACY[50] = 204
> 
> > > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > > are only 0 available.
> 
> CPU31 is the last AP going offline (CPU0 is still online).
> 
> It wants to move 62 vectors to CPU0, but it can't because CPU0 has 0
> available vectors. That means CPU0 has 204 vectors used. I doubt that, but
> what I doubt even more is that this interrupt spreading helps in any way.
> 
> Assumed that we have a total of 204 + 62 = 266 device interrupt vectors in
> use and they are evenly spread over 32 CPUs, so each CPU has either 8 or
> nine vectors. Fine.
> 
> Now if you unplug all CPUs except CPU0 starting from CPU1 up to CPU31 then
> at the point where CPU31 is about to be unplugged, CPU0 holds 133 vectors
> and CPU31 holds 133 vectors as well - assumed that the spread is exactly
> even.
> 
> I have a hard time to figure out how the 133 vectors on CPU31 are now
> magically fitting in the empty space on CPU0, which is 204 - 133 = 71. In
> my limited understanding of math 133 is greater than 71, but your patch
> might make that magically be wrong.
>
The problem is reproduced when the network cable is not plugged in,
because this driver looks like this:

step 1. Reserved enough irq vectors and corresponding IRQs.
step 2. If the network is activated, invoke request_irq() to
register the handler.
step 3. Invoke set_affinity() to spread the IRQs onto different
CPUs, thus to spread the vectors too.

Here's my understanding for why spreading vectors might help for this
special case: 
As step 2 will not get invoked, the IRQs of this driver
has not been enabled, thus in migrate_one_irq() this IRQ
will not be considered because there is a check of
irqd_is_started(d), thus there should only be 8 vectors
allocated by this driver on CPU0, and 8 vectors left on
CPU31, and the 8 vectors on CPU31 will not be migrated
to CPU0 neither, so there is room for other 'valid' vectors
to be migrated to CPU0.
> Can you please provide detailed information about how many device
> interrupts are actually in use/allocated on that system?
> 
> Please enable CONFIG_GENERIC_IRQ_DEBUGFS and provide the output of 
> 
> # cat /sys/kernel/debug/irq/domains/*
>
> and
> 
> # ls /sys/kernel/debug/irq/irqs
>
Ok, here's the information after system bootup on top
of 4.13:

# cat /sys/kernel/debug/irq/domains/*
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-0
 size:   24
 mapped: 16
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-1
 size:   8
 mapped: 2
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-2
 size:   8
 mapped: 0
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-3
 size:   8
 mapped: 0
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-4
 size:   8
 mapped: 5
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   PCI-HT
 size:   0
 mapped: 0
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   PCI-MSI-2
 size:   0
 mapped: 365
 flags:  0x0051
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041


# ls /sys/kernel/debug/irq/irqs
ls /sys/kernel/debug/irq/irqs
0  10   11  13  142  184  217  259  292  31  33   337  339
340  

Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the idlest CPU

2017-09-05 Thread Yu Chen
On Wed, Sep 06, 2017 at 12:57:41AM +0200, Thomas Gleixner wrote:
> On Sun, 3 Sep 2017, Thomas Gleixner wrote:
> 
> > On Fri, 1 Sep 2017, Chen Yu wrote:
> > 
> > > This is the major logic to spread the vectors on different CPUs.
> > > The main idea is to choose the 'idlest' CPU which has assigned
> > > the least number of vectors as the candidate/hint for the vector
> > > allocation domain, in the hope that the vector allocation domain
> > > could leverage this hint to generate corresponding cpumask.
> > > 
> > > One of the requirements to do this vector spreading work comes from the
> > > following hibernation problem found on a 16 cores server:
> > > 
> > > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > > are only 0 available.
> 
> Thinking more about this, this makes no sense whatsoever.
> 
> The total number of interrupts on a system is the same whether they are
> all on CPU 0 or evenly spread over all CPUs.
> 
> As this machine is using physcial destination mode, the number of vectors
> used is the same as the number of interrupts, except for the case where a
> move of an interrupt is in progress and the interrupt which cleans up the
> old vector has not yet arrived. Lets ignore that for now.
> 
> The available vector space is 204 per CPU on such a system.
> 
> 256 - SYSTEM[0-31, 32, 128, 239-255] - LEGACY[50] = 204
> 
> > > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > > are only 0 available.
> 
> CPU31 is the last AP going offline (CPU0 is still online).
> 
> It wants to move 62 vectors to CPU0, but it can't because CPU0 has 0
> available vectors. That means CPU0 has 204 vectors used. I doubt that, but
> what I doubt even more is that this interrupt spreading helps in any way.
> 
> Assumed that we have a total of 204 + 62 = 266 device interrupt vectors in
> use and they are evenly spread over 32 CPUs, so each CPU has either 8 or
> nine vectors. Fine.
> 
> Now if you unplug all CPUs except CPU0 starting from CPU1 up to CPU31 then
> at the point where CPU31 is about to be unplugged, CPU0 holds 133 vectors
> and CPU31 holds 133 vectors as well - assumed that the spread is exactly
> even.
> 
> I have a hard time to figure out how the 133 vectors on CPU31 are now
> magically fitting in the empty space on CPU0, which is 204 - 133 = 71. In
> my limited understanding of math 133 is greater than 71, but your patch
> might make that magically be wrong.
>
The problem is reproduced when the network cable is not plugged in,
because this driver looks like this:

step 1. Reserved enough irq vectors and corresponding IRQs.
step 2. If the network is activated, invoke request_irq() to
register the handler.
step 3. Invoke set_affinity() to spread the IRQs onto different
CPUs, thus to spread the vectors too.

Here's my understanding for why spreading vectors might help for this
special case: 
As step 2 will not get invoked, the IRQs of this driver
has not been enabled, thus in migrate_one_irq() this IRQ
will not be considered because there is a check of
irqd_is_started(d), thus there should only be 8 vectors
allocated by this driver on CPU0, and 8 vectors left on
CPU31, and the 8 vectors on CPU31 will not be migrated
to CPU0 neither, so there is room for other 'valid' vectors
to be migrated to CPU0.
> Can you please provide detailed information about how many device
> interrupts are actually in use/allocated on that system?
> 
> Please enable CONFIG_GENERIC_IRQ_DEBUGFS and provide the output of 
> 
> # cat /sys/kernel/debug/irq/domains/*
>
> and
> 
> # ls /sys/kernel/debug/irq/irqs
>
Ok, here's the information after system bootup on top
of 4.13:

# cat /sys/kernel/debug/irq/domains/*
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-0
 size:   24
 mapped: 16
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-1
 size:   8
 mapped: 2
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-2
 size:   8
 mapped: 0
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-3
 size:   8
 mapped: 0
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   IO-APIC-4
 size:   8
 mapped: 5
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   PCI-HT
 size:   0
 mapped: 0
 flags:  0x0041
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   PCI-MSI-2
 size:   0
 mapped: 365
 flags:  0x0051
 parent: VECTOR
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041
name:   VECTOR
 size:   0
 mapped: 388
 flags:  0x0041


# ls /sys/kernel/debug/irq/irqs
ls /sys/kernel/debug/irq/irqs
0  10   11  13  142  184  217  259  292  31  33   337  339
340  

Re: [PATCH 23/25 v3] ALSA/dummy: Replace tasklet with softirq hrtimer

2017-09-05 Thread Takashi Sakamoto

On Sep 6 2017 01:18, Sebastian Andrzej Siewior wrote:

From: Thomas Gleixner 

The tasklet is used to defer the execution of snd_pcm_period_elapsed() to
the softirq context. Using the CLOCK_MONOTONIC_SOFT base invokes the timer
callback in softirq context as well which renders the tasklet useless.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Anna-Maria Gleixner 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Takashi Sakamoto 
Cc: alsa-de...@alsa-project.org
[o-takashi: avoid stall due to a call of hrtimer_cancel() on a callback
 of hrtimer]
Signed-off-by: Takashi Sakamoto 
Signed-off-by: Sebastian Andrzej Siewior 
---
On 2017-09-06 01:05:43 [+0900], Takashi Sakamoto wrote:

As Iwai-san mentioned, in this function, .trigger can be called in two
cases; XRUN occurs and draining is done. Thus, let you change the comment as
'In cases of XRUN and draining, this calls .trigger to stop PCM substream.'.
I'm sorry to trouble you.

snd_pcm_period_elapsed()
->snd_pcm_update_hw_ptr0()
   ->snd_pcm_update_state()
 ->snd_pcm_drain_done()
   ...
   ->.trigger(TRIGGER_STOP)
 ->xrun()
   ->snd_pcm_stop()
 ...
 ->.trigger(TRIGGER_STOP)



I think you asked me just to update the comment. Did I do it right?

v2…v3: updated the comment as per Takashi Sakamoto's suggestion.
v1…v2: merged Takashi Sakamoto fixup of the original patch into v2.

  sound/drivers/dummy.c | 25 +++--
  1 file changed, 11 insertions(+), 14 deletions(-)


This Looks good to me. I did quick test and confirmed that it brings no 
stalls.


Tested-by: Takashi Sakamoto 


diff --git a/sound/drivers/dummy.c b/sound/drivers/dummy.c
index dd5ed037adf2..cdd851286f92 100644
--- a/sound/drivers/dummy.c
+++ b/sound/drivers/dummy.c
@@ -376,17 +376,9 @@ struct dummy_hrtimer_pcm {
ktime_t period_time;
atomic_t running;
struct hrtimer timer;
-   struct tasklet_struct tasklet;
struct snd_pcm_substream *substream;
  };
  
-static void dummy_hrtimer_pcm_elapsed(unsigned long priv)

-{
-   struct dummy_hrtimer_pcm *dpcm = (struct dummy_hrtimer_pcm *)priv;
-   if (atomic_read(>running))
-   snd_pcm_period_elapsed(dpcm->substream);
-}
-
  static enum hrtimer_restart dummy_hrtimer_callback(struct hrtimer *timer)
  {
struct dummy_hrtimer_pcm *dpcm;
@@ -394,7 +386,14 @@ static enum hrtimer_restart dummy_hrtimer_callback(struct 
hrtimer *timer)
dpcm = container_of(timer, struct dummy_hrtimer_pcm, timer);
if (!atomic_read(>running))
return HRTIMER_NORESTART;
-   tasklet_schedule(>tasklet);
+   /*
+* In cases of XRUN and draining, this calls .trigger to stop PCM
+* substream.
+*/
+   snd_pcm_period_elapsed(dpcm->substream);
+   if (!atomic_read(>running))
+   return HRTIMER_NORESTART;
+
hrtimer_forward_now(timer, dpcm->period_time);
return HRTIMER_RESTART;
  }
@@ -414,14 +413,14 @@ static int dummy_hrtimer_stop(struct snd_pcm_substream 
*substream)
struct dummy_hrtimer_pcm *dpcm = substream->runtime->private_data;
  
  	atomic_set(>running, 0);

-   hrtimer_cancel(>timer);
+   if (!hrtimer_callback_running(>timer))
+   hrtimer_cancel(>timer);
return 0;
  }
  
  static inline void dummy_hrtimer_sync(struct dummy_hrtimer_pcm *dpcm)

  {
hrtimer_cancel(>timer);
-   tasklet_kill(>tasklet);
  }
  
  static snd_pcm_uframes_t

@@ -466,12 +465,10 @@ static int dummy_hrtimer_create(struct snd_pcm_substream 
*substream)
if (!dpcm)
return -ENOMEM;
substream->runtime->private_data = dpcm;
-   hrtimer_init(>timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer_init(>timer, CLOCK_MONOTONIC_SOFT, HRTIMER_MODE_REL);
dpcm->timer.function = dummy_hrtimer_callback;
dpcm->substream = substream;
atomic_set(>running, 0);
-   tasklet_init(>tasklet, dummy_hrtimer_pcm_elapsed,
-(unsigned long)dpcm);
return 0;
  }



Thanks

Takashi Sakamoto


Re: [PATCH 23/25 v3] ALSA/dummy: Replace tasklet with softirq hrtimer

2017-09-05 Thread Takashi Sakamoto

On Sep 6 2017 01:18, Sebastian Andrzej Siewior wrote:

From: Thomas Gleixner 

The tasklet is used to defer the execution of snd_pcm_period_elapsed() to
the softirq context. Using the CLOCK_MONOTONIC_SOFT base invokes the timer
callback in softirq context as well which renders the tasklet useless.

Signed-off-by: Thomas Gleixner 
Signed-off-by: Anna-Maria Gleixner 
Cc: Jaroslav Kysela 
Cc: Takashi Iwai 
Cc: Takashi Sakamoto 
Cc: alsa-de...@alsa-project.org
[o-takashi: avoid stall due to a call of hrtimer_cancel() on a callback
 of hrtimer]
Signed-off-by: Takashi Sakamoto 
Signed-off-by: Sebastian Andrzej Siewior 
---
On 2017-09-06 01:05:43 [+0900], Takashi Sakamoto wrote:

As Iwai-san mentioned, in this function, .trigger can be called in two
cases; XRUN occurs and draining is done. Thus, let you change the comment as
'In cases of XRUN and draining, this calls .trigger to stop PCM substream.'.
I'm sorry to trouble you.

snd_pcm_period_elapsed()
->snd_pcm_update_hw_ptr0()
   ->snd_pcm_update_state()
 ->snd_pcm_drain_done()
   ...
   ->.trigger(TRIGGER_STOP)
 ->xrun()
   ->snd_pcm_stop()
 ...
 ->.trigger(TRIGGER_STOP)



I think you asked me just to update the comment. Did I do it right?

v2…v3: updated the comment as per Takashi Sakamoto's suggestion.
v1…v2: merged Takashi Sakamoto fixup of the original patch into v2.

  sound/drivers/dummy.c | 25 +++--
  1 file changed, 11 insertions(+), 14 deletions(-)


This Looks good to me. I did quick test and confirmed that it brings no 
stalls.


Tested-by: Takashi Sakamoto 


diff --git a/sound/drivers/dummy.c b/sound/drivers/dummy.c
index dd5ed037adf2..cdd851286f92 100644
--- a/sound/drivers/dummy.c
+++ b/sound/drivers/dummy.c
@@ -376,17 +376,9 @@ struct dummy_hrtimer_pcm {
ktime_t period_time;
atomic_t running;
struct hrtimer timer;
-   struct tasklet_struct tasklet;
struct snd_pcm_substream *substream;
  };
  
-static void dummy_hrtimer_pcm_elapsed(unsigned long priv)

-{
-   struct dummy_hrtimer_pcm *dpcm = (struct dummy_hrtimer_pcm *)priv;
-   if (atomic_read(>running))
-   snd_pcm_period_elapsed(dpcm->substream);
-}
-
  static enum hrtimer_restart dummy_hrtimer_callback(struct hrtimer *timer)
  {
struct dummy_hrtimer_pcm *dpcm;
@@ -394,7 +386,14 @@ static enum hrtimer_restart dummy_hrtimer_callback(struct 
hrtimer *timer)
dpcm = container_of(timer, struct dummy_hrtimer_pcm, timer);
if (!atomic_read(>running))
return HRTIMER_NORESTART;
-   tasklet_schedule(>tasklet);
+   /*
+* In cases of XRUN and draining, this calls .trigger to stop PCM
+* substream.
+*/
+   snd_pcm_period_elapsed(dpcm->substream);
+   if (!atomic_read(>running))
+   return HRTIMER_NORESTART;
+
hrtimer_forward_now(timer, dpcm->period_time);
return HRTIMER_RESTART;
  }
@@ -414,14 +413,14 @@ static int dummy_hrtimer_stop(struct snd_pcm_substream 
*substream)
struct dummy_hrtimer_pcm *dpcm = substream->runtime->private_data;
  
  	atomic_set(>running, 0);

-   hrtimer_cancel(>timer);
+   if (!hrtimer_callback_running(>timer))
+   hrtimer_cancel(>timer);
return 0;
  }
  
  static inline void dummy_hrtimer_sync(struct dummy_hrtimer_pcm *dpcm)

  {
hrtimer_cancel(>timer);
-   tasklet_kill(>tasklet);
  }
  
  static snd_pcm_uframes_t

@@ -466,12 +465,10 @@ static int dummy_hrtimer_create(struct snd_pcm_substream 
*substream)
if (!dpcm)
return -ENOMEM;
substream->runtime->private_data = dpcm;
-   hrtimer_init(>timer, CLOCK_MONOTONIC, HRTIMER_MODE_REL);
+   hrtimer_init(>timer, CLOCK_MONOTONIC_SOFT, HRTIMER_MODE_REL);
dpcm->timer.function = dummy_hrtimer_callback;
dpcm->substream = substream;
atomic_set(>running, 0);
-   tasklet_init(>tasklet, dummy_hrtimer_pcm_elapsed,
-(unsigned long)dpcm);
return 0;
  }



Thanks

Takashi Sakamoto


Re: [PATCH v8 06/13] x86/apic: Mark the apic_intr_mode extern for sanity check cleanup

2017-09-05 Thread Baoquan He
On 08/28/17 at 11:20am, Dou Liyang wrote:
> Calling native_smp_prepare_cpus() to prepare for SMP bootup, does
> some sanity checking, enables APIC mode and disables SMP feature.
> 
> Now, APIC mode setup has been unified to apic_intr_mode_init(),
> some sanity checks are redundant and need to be cleanup.
> 
> Mark the apic_intr_mode extern to refine the switch and remove
> the redundant sanity check.
> 
> Signed-off-by: Dou Liyang 
> ---
>  arch/x86/include/asm/apic.h |  9 +++
>  arch/x86/kernel/apic/apic.c | 12 --
>  arch/x86/kernel/smpboot.c   | 57 
> +++--
>  3 files changed, 22 insertions(+), 56 deletions(-)
> 
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index 4e550c7..01f3fc8 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -53,6 +53,15 @@ extern int local_apic_timer_c2_ok;
>  extern int disable_apic;
>  extern unsigned int lapic_timer_frequency;
>  
> +extern enum apic_intr_mode_id apic_intr_mode;
> +enum apic_intr_mode_id {
> + APIC_PIC,
> + APIC_VIRTUAL_WIRE,
> + APIC_VIRTUAL_WIRE_NO_CONFIG,
> + APIC_SYMMETRIC_IO,
> + APIC_SYMMETRIC_IO_NO_ROUTING
> +};
> +
>  #ifdef CONFIG_SMP
>  extern void __inquire_remote_apic(int apicid);
>  #else /* CONFIG_SMP */
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 9038c5f..97bd47b 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1235,13 +1235,7 @@ void __init sync_Arb_IDs(void)
>   APIC_INT_LEVELTRIG | APIC_DM_INIT);
>  }
>  
> -enum apic_intr_mode {
> - APIC_PIC,
> - APIC_VIRTUAL_WIRE,
> - APIC_VIRTUAL_WIRE_NO_CONFIG,
> - APIC_SYMMETRIC_IO,
> - APIC_SYMMETRIC_IO_NO_ROUTING,
> -};
> +enum apic_intr_mode_id apic_intr_mode;
>  
>  static int __init apic_intr_mode_select(void)
>  {
> @@ -1367,7 +1361,9 @@ void __init apic_intr_mode_init(void)
>  {
>   bool upmode = false;
>  
> - switch (apic_intr_mode_select()) {
> + apic_intr_mode = apic_intr_mode_select();
> +
> + switch (apic_intr_mode) {
>   case APIC_PIC:
>   pr_info("APIC: Keep in PIC mode(8259)\n");
>   return;
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 8301b75..d2160f7 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1187,17 +1187,10 @@ static __init void disable_smp(void)
>   cpumask_set_cpu(0, topology_core_cpumask(0));
>  }
>  
> -enum {
> - SMP_OK,
> - SMP_NO_CONFIG,
> - SMP_NO_APIC,
> - SMP_FORCE_UP,
> -};
> -
>  /*
>   * Various sanity checks.
>   */
> -static int __init smp_sanity_check(unsigned max_cpus)
> +static void __init smp_sanity_check(void)
>  {
>   preempt_disable();
>  
> @@ -1235,16 +1228,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
>   }
>  
>   /*
> -  * If we couldn't find an SMP configuration at boot time,
> -  * get out of here now!
> -  */
> - if (!smp_found_config && !acpi_lapic) {
> - preempt_enable();
> - pr_notice("SMP motherboard not detected\n");
> - return SMP_NO_CONFIG;
> - }
> -
> - /*
>* Should not be necessary because the MP table should list the boot
>* CPU too, but we do it for the sake of robustness anyway.
>*/
> @@ -1254,29 +1237,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
>   physid_set(hard_smp_processor_id(), phys_cpu_present_map);
>   }
>   preempt_enable();
> -
> - /*
> -  * If we couldn't find a local APIC, then get out of here now!
> -  */
> - if (APIC_INTEGRATED(boot_cpu_apic_version) &&
> - !boot_cpu_has(X86_FEATURE_APIC)) {
> - if (!disable_apic) {
> - pr_err("BIOS bug, local APIC #%d not detected!...\n",
> - boot_cpu_physical_apicid);
> - pr_err("... forcing use of dummy APIC emulation (tell 
> your hw vendor)\n");
> - }
> - return SMP_NO_APIC;
> - }
> -
> - /*
> -  * If SMP should be disabled, then really disable it!
> -  */
> - if (!max_cpus) {
> - pr_info("SMP mode deactivated\n");
> - return SMP_FORCE_UP;
> - }
> -
> - return SMP_OK;
>  }
>  
>  static void __init smp_cpu_index_default(void)
> @@ -1335,19 +1295,20 @@ void __init native_smp_prepare_cpus(unsigned int 
> max_cpus)

Please also cleanup the passed in max_cpus since it's not used here any
more. And up to the caller:

static noinline void __init kernel_init_freeable(void)
{
...
smp_prepare_cpus(setup_max_cpus);
...
}

>  
>   apic_intr_mode_init();
>  
> - switch (smp_sanity_check(max_cpus)) {
> - case SMP_NO_CONFIG:
> - disable_smp();
> - return;
> - case SMP_NO_APIC:
> + smp_sanity_check();
> +
> + switch 

Re: [PATCH v8 06/13] x86/apic: Mark the apic_intr_mode extern for sanity check cleanup

2017-09-05 Thread Baoquan He
On 08/28/17 at 11:20am, Dou Liyang wrote:
> Calling native_smp_prepare_cpus() to prepare for SMP bootup, does
> some sanity checking, enables APIC mode and disables SMP feature.
> 
> Now, APIC mode setup has been unified to apic_intr_mode_init(),
> some sanity checks are redundant and need to be cleanup.
> 
> Mark the apic_intr_mode extern to refine the switch and remove
> the redundant sanity check.
> 
> Signed-off-by: Dou Liyang 
> ---
>  arch/x86/include/asm/apic.h |  9 +++
>  arch/x86/kernel/apic/apic.c | 12 --
>  arch/x86/kernel/smpboot.c   | 57 
> +++--
>  3 files changed, 22 insertions(+), 56 deletions(-)
> 
> diff --git a/arch/x86/include/asm/apic.h b/arch/x86/include/asm/apic.h
> index 4e550c7..01f3fc8 100644
> --- a/arch/x86/include/asm/apic.h
> +++ b/arch/x86/include/asm/apic.h
> @@ -53,6 +53,15 @@ extern int local_apic_timer_c2_ok;
>  extern int disable_apic;
>  extern unsigned int lapic_timer_frequency;
>  
> +extern enum apic_intr_mode_id apic_intr_mode;
> +enum apic_intr_mode_id {
> + APIC_PIC,
> + APIC_VIRTUAL_WIRE,
> + APIC_VIRTUAL_WIRE_NO_CONFIG,
> + APIC_SYMMETRIC_IO,
> + APIC_SYMMETRIC_IO_NO_ROUTING
> +};
> +
>  #ifdef CONFIG_SMP
>  extern void __inquire_remote_apic(int apicid);
>  #else /* CONFIG_SMP */
> diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
> index 9038c5f..97bd47b 100644
> --- a/arch/x86/kernel/apic/apic.c
> +++ b/arch/x86/kernel/apic/apic.c
> @@ -1235,13 +1235,7 @@ void __init sync_Arb_IDs(void)
>   APIC_INT_LEVELTRIG | APIC_DM_INIT);
>  }
>  
> -enum apic_intr_mode {
> - APIC_PIC,
> - APIC_VIRTUAL_WIRE,
> - APIC_VIRTUAL_WIRE_NO_CONFIG,
> - APIC_SYMMETRIC_IO,
> - APIC_SYMMETRIC_IO_NO_ROUTING,
> -};
> +enum apic_intr_mode_id apic_intr_mode;
>  
>  static int __init apic_intr_mode_select(void)
>  {
> @@ -1367,7 +1361,9 @@ void __init apic_intr_mode_init(void)
>  {
>   bool upmode = false;
>  
> - switch (apic_intr_mode_select()) {
> + apic_intr_mode = apic_intr_mode_select();
> +
> + switch (apic_intr_mode) {
>   case APIC_PIC:
>   pr_info("APIC: Keep in PIC mode(8259)\n");
>   return;
> diff --git a/arch/x86/kernel/smpboot.c b/arch/x86/kernel/smpboot.c
> index 8301b75..d2160f7 100644
> --- a/arch/x86/kernel/smpboot.c
> +++ b/arch/x86/kernel/smpboot.c
> @@ -1187,17 +1187,10 @@ static __init void disable_smp(void)
>   cpumask_set_cpu(0, topology_core_cpumask(0));
>  }
>  
> -enum {
> - SMP_OK,
> - SMP_NO_CONFIG,
> - SMP_NO_APIC,
> - SMP_FORCE_UP,
> -};
> -
>  /*
>   * Various sanity checks.
>   */
> -static int __init smp_sanity_check(unsigned max_cpus)
> +static void __init smp_sanity_check(void)
>  {
>   preempt_disable();
>  
> @@ -1235,16 +1228,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
>   }
>  
>   /*
> -  * If we couldn't find an SMP configuration at boot time,
> -  * get out of here now!
> -  */
> - if (!smp_found_config && !acpi_lapic) {
> - preempt_enable();
> - pr_notice("SMP motherboard not detected\n");
> - return SMP_NO_CONFIG;
> - }
> -
> - /*
>* Should not be necessary because the MP table should list the boot
>* CPU too, but we do it for the sake of robustness anyway.
>*/
> @@ -1254,29 +1237,6 @@ static int __init smp_sanity_check(unsigned max_cpus)
>   physid_set(hard_smp_processor_id(), phys_cpu_present_map);
>   }
>   preempt_enable();
> -
> - /*
> -  * If we couldn't find a local APIC, then get out of here now!
> -  */
> - if (APIC_INTEGRATED(boot_cpu_apic_version) &&
> - !boot_cpu_has(X86_FEATURE_APIC)) {
> - if (!disable_apic) {
> - pr_err("BIOS bug, local APIC #%d not detected!...\n",
> - boot_cpu_physical_apicid);
> - pr_err("... forcing use of dummy APIC emulation (tell 
> your hw vendor)\n");
> - }
> - return SMP_NO_APIC;
> - }
> -
> - /*
> -  * If SMP should be disabled, then really disable it!
> -  */
> - if (!max_cpus) {
> - pr_info("SMP mode deactivated\n");
> - return SMP_FORCE_UP;
> - }
> -
> - return SMP_OK;
>  }
>  
>  static void __init smp_cpu_index_default(void)
> @@ -1335,19 +1295,20 @@ void __init native_smp_prepare_cpus(unsigned int 
> max_cpus)

Please also cleanup the passed in max_cpus since it's not used here any
more. And up to the caller:

static noinline void __init kernel_init_freeable(void)
{
...
smp_prepare_cpus(setup_max_cpus);
...
}

>  
>   apic_intr_mode_init();
>  
> - switch (smp_sanity_check(max_cpus)) {
> - case SMP_NO_CONFIG:
> - disable_smp();
> - return;
> - case SMP_NO_APIC:
> + smp_sanity_check();
> +
> + switch (apic_intr_mode) {
> + 

Re: [PATCH] gpio: thunderx: select IRQ_DOMAIN_HIERARCHY instead of depends on

2017-09-05 Thread Masahiro Yamada
Hi David,


2017-09-06 11:09 GMT+09:00 David Daney :
> On 09/05/2017 06:40 PM, Masahiro Yamada wrote:
>>
>> IRQ_DOMAIN_HIERARCHY is not user-configurable, but supposed to be
>> selected by drivers that need IRQ domain hierarchy support.
>>
>> GPIO_THUNDERX is the only user of "depends on IRQ_DOMAIN_HIERARCHY".
>> This means, we can not enable GPIO_THUNDERX unless other drivers
>> select IRQ_DOMAIN_HIERARCHY elsewhere.  This is odd.  Flip the logic.
>>
>> Signed-off-by: Masahiro Yamada 
>
>
> IRQ_DOMAIN_HIERARCHY is set as a result of ARCH_THUNDER (this SoC hardware),
> so it actually works as-is.


Right, ARCH_THUNDER does not select it directly,
but does it indirectly.  (this is not so clear...)

ARCH_THUNDER  -> ARM64 -> ARM_GIC -> IRQ_DOMAIN_HIERARCHY



>  That said, this looks like a reasonable
> improvement, and will allow the COMPILE_TEST to enable it, so...
>
> Acked-by: David Daney 


BTW, I could not understand your intention of
(64BIT && COMPILE_TEST)

Why can COMPILE_TEST be enabled when 64BIT?


-- 
Best Regards
Masahiro Yamada


Re: [PATCH] gpio: thunderx: select IRQ_DOMAIN_HIERARCHY instead of depends on

2017-09-05 Thread Masahiro Yamada
Hi David,


2017-09-06 11:09 GMT+09:00 David Daney :
> On 09/05/2017 06:40 PM, Masahiro Yamada wrote:
>>
>> IRQ_DOMAIN_HIERARCHY is not user-configurable, but supposed to be
>> selected by drivers that need IRQ domain hierarchy support.
>>
>> GPIO_THUNDERX is the only user of "depends on IRQ_DOMAIN_HIERARCHY".
>> This means, we can not enable GPIO_THUNDERX unless other drivers
>> select IRQ_DOMAIN_HIERARCHY elsewhere.  This is odd.  Flip the logic.
>>
>> Signed-off-by: Masahiro Yamada 
>
>
> IRQ_DOMAIN_HIERARCHY is set as a result of ARCH_THUNDER (this SoC hardware),
> so it actually works as-is.


Right, ARCH_THUNDER does not select it directly,
but does it indirectly.  (this is not so clear...)

ARCH_THUNDER  -> ARM64 -> ARM_GIC -> IRQ_DOMAIN_HIERARCHY



>  That said, this looks like a reasonable
> improvement, and will allow the COMPILE_TEST to enable it, so...
>
> Acked-by: David Daney 


BTW, I could not understand your intention of
(64BIT && COMPILE_TEST)

Why can COMPILE_TEST be enabled when 64BIT?


-- 
Best Regards
Masahiro Yamada


Re: [PATCH v8 01/13] x86/apic: Construct a selector for the interrupt delivery mode

2017-09-05 Thread Dou Liyang

Hi Baoquan,

Thanks for your reply!

My answer is in below.

At 09/06/2017 08:55 AM, Baoquan He wrote:

Hi Liyang,

On 08/28/17 at 11:20am, Dou Liyang wrote:

Now, there are many switches in kernel which are used to determine
the final interrupt delivery mode, as shown below:

1) kconfig:
   CONFIG_X86_64; CONFIG_X86_LOCAL_APIC; CONFIG_x86_IO_APIC
2) kernel option: disable_apic; skip_ioapic_setup
3) CPU Capability: boot_cpu_has(X86_FEATURE_APIC)
4) MP table: smp_found_config
5) ACPI: acpi_lapic; acpi_ioapic; nr_ioapic

These switches are disordered and scattered and there are also some
dependencies with each other. These make the code difficult to
maintain and read.

Construct a selector to unify them into a single function, then,
Use this selector to get an interrupt delivery mode directly.

Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/apic/apic.c | 59 +
 1 file changed, 59 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 98b3dd8..01bde03 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1235,6 +1235,65 @@ void __init sync_Arb_IDs(void)
APIC_INT_LEVELTRIG | APIC_DM_INIT);
 }

+enum apic_intr_mode {
+   APIC_PIC,
+   APIC_VIRTUAL_WIRE,
+   APIC_SYMMETRIC_IO,
+};
+
+static int __init apic_intr_mode_select(void)
+{
+   /* Check kernel option */
+   if (disable_apic) {
+   pr_info("APIC disabled via kernel command line\n");
+   return APIC_PIC;
+   }
+
+   /* Check BIOS */
+#ifdef CONFIG_X86_64
+   /* On 64-bit, the APIC must be integrated, Check local APIC only */
+   if (!boot_cpu_has(X86_FEATURE_APIC)) {
+   disable_apic = 1;
+   pr_info("APIC disabled by BIOS\n");
+   return APIC_PIC;
+   }
+#else
+   /*
+* On 32-bit, check whether there is a separate chip or integrated


Change the comment to:

On 32-bit, the APIC may be a separated chip(82489DX) or integrated chip.
if BSP doesn't has APIC feature, we can sure there is no integrated
chip, but can not be sure there is no independent chip. So check two
situation when BSP doesn't has APIC feature.


+* APIC
+*/
+
+   /* Has a separate chip ? */


If there is also no SMP configuration, we can be sure there is no
separated chip. Switch the interrupt delivery node to APIC_PIC directly.


+   if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
+   disable_apic = 1;
+
+   return APIC_PIC;
+   }
+

Here you said several times you are checking if APIC is integrated, but
you always check boot_cpu_has(X86_FEATURE_APIC), and you also check
smp_found_config in above case. Can you make the comment match the code?



Yes.


E.g if (!boot_cpu_has(X86_FEATURE_APIC)), cpu doesn't support lapic,
just return, you can remove the CONFIG_X86_64 check to make it a common
check. And we have lapic_is_integrated() to check if lapic is integrated.


I am sorry my comment may confuse you. our target is checking if BIOS
supports APIC, no matter what type(separated/integrated) it is.

The new logic 1) as you said may like :

if (!boot_cpu_has(X86_FEATURE_APIC))
return ...
if (lapic_is_integrated())
return ...
here we miss (!boot_cpu_has(X86_FEATURE_APIC) && smp_found_config) for
a separated chip.


Besides, we are saying lapic is integrated with ioapic in a single chip,
right? I found MP-Spec mention it. If yes, could you add more words to


Yes, 82489DX – it was a discrete chip that functioned both as local and
I/O APIC


make it more specific and precise? Then people can get the exact


Indeed, I will. Please see the modification of comments


information from the comment and code.

Thanks
Baoquan


+   /* Has a local APIC ? */


Sanity check if the BIOS pretends there is one local APIC.


Thanks,
dou.


+   if (!boot_cpu_has(X86_FEATURE_APIC) &&
+   APIC_INTEGRATED(boot_cpu_apic_version)) {
+   disable_apic = 1;
+   pr_err(FW_BUG "Local APIC %d not detected, force emulation\n",
+  boot_cpu_physical_apicid);
+
+   return APIC_PIC;
+   }
+#endif
+
+   /* Check MP table or ACPI MADT configuration */
+   if (!smp_found_config) {
+   disable_ioapic_support();
+
+   if (!acpi_lapic)
+   pr_info("APIC: ACPI MADT or MP tables are not 
detected\n");
+
+   return APIC_VIRTUAL_WIRE;
+   }
+
+   return APIC_SYMMETRIC_IO;
+}
+
 /*
  * An initial setup of the virtual wire mode.
  */
--
2.5.5












Re: [PATCH v8 01/13] x86/apic: Construct a selector for the interrupt delivery mode

2017-09-05 Thread Dou Liyang

Hi Baoquan,

Thanks for your reply!

My answer is in below.

At 09/06/2017 08:55 AM, Baoquan He wrote:

Hi Liyang,

On 08/28/17 at 11:20am, Dou Liyang wrote:

Now, there are many switches in kernel which are used to determine
the final interrupt delivery mode, as shown below:

1) kconfig:
   CONFIG_X86_64; CONFIG_X86_LOCAL_APIC; CONFIG_x86_IO_APIC
2) kernel option: disable_apic; skip_ioapic_setup
3) CPU Capability: boot_cpu_has(X86_FEATURE_APIC)
4) MP table: smp_found_config
5) ACPI: acpi_lapic; acpi_ioapic; nr_ioapic

These switches are disordered and scattered and there are also some
dependencies with each other. These make the code difficult to
maintain and read.

Construct a selector to unify them into a single function, then,
Use this selector to get an interrupt delivery mode directly.

Signed-off-by: Dou Liyang 
---
 arch/x86/kernel/apic/apic.c | 59 +
 1 file changed, 59 insertions(+)

diff --git a/arch/x86/kernel/apic/apic.c b/arch/x86/kernel/apic/apic.c
index 98b3dd8..01bde03 100644
--- a/arch/x86/kernel/apic/apic.c
+++ b/arch/x86/kernel/apic/apic.c
@@ -1235,6 +1235,65 @@ void __init sync_Arb_IDs(void)
APIC_INT_LEVELTRIG | APIC_DM_INIT);
 }

+enum apic_intr_mode {
+   APIC_PIC,
+   APIC_VIRTUAL_WIRE,
+   APIC_SYMMETRIC_IO,
+};
+
+static int __init apic_intr_mode_select(void)
+{
+   /* Check kernel option */
+   if (disable_apic) {
+   pr_info("APIC disabled via kernel command line\n");
+   return APIC_PIC;
+   }
+
+   /* Check BIOS */
+#ifdef CONFIG_X86_64
+   /* On 64-bit, the APIC must be integrated, Check local APIC only */
+   if (!boot_cpu_has(X86_FEATURE_APIC)) {
+   disable_apic = 1;
+   pr_info("APIC disabled by BIOS\n");
+   return APIC_PIC;
+   }
+#else
+   /*
+* On 32-bit, check whether there is a separate chip or integrated


Change the comment to:

On 32-bit, the APIC may be a separated chip(82489DX) or integrated chip.
if BSP doesn't has APIC feature, we can sure there is no integrated
chip, but can not be sure there is no independent chip. So check two
situation when BSP doesn't has APIC feature.


+* APIC
+*/
+
+   /* Has a separate chip ? */


If there is also no SMP configuration, we can be sure there is no
separated chip. Switch the interrupt delivery node to APIC_PIC directly.


+   if (!boot_cpu_has(X86_FEATURE_APIC) && !smp_found_config) {
+   disable_apic = 1;
+
+   return APIC_PIC;
+   }
+

Here you said several times you are checking if APIC is integrated, but
you always check boot_cpu_has(X86_FEATURE_APIC), and you also check
smp_found_config in above case. Can you make the comment match the code?



Yes.


E.g if (!boot_cpu_has(X86_FEATURE_APIC)), cpu doesn't support lapic,
just return, you can remove the CONFIG_X86_64 check to make it a common
check. And we have lapic_is_integrated() to check if lapic is integrated.


I am sorry my comment may confuse you. our target is checking if BIOS
supports APIC, no matter what type(separated/integrated) it is.

The new logic 1) as you said may like :

if (!boot_cpu_has(X86_FEATURE_APIC))
return ...
if (lapic_is_integrated())
return ...
here we miss (!boot_cpu_has(X86_FEATURE_APIC) && smp_found_config) for
a separated chip.


Besides, we are saying lapic is integrated with ioapic in a single chip,
right? I found MP-Spec mention it. If yes, could you add more words to


Yes, 82489DX – it was a discrete chip that functioned both as local and
I/O APIC


make it more specific and precise? Then people can get the exact


Indeed, I will. Please see the modification of comments


information from the comment and code.

Thanks
Baoquan


+   /* Has a local APIC ? */


Sanity check if the BIOS pretends there is one local APIC.


Thanks,
dou.


+   if (!boot_cpu_has(X86_FEATURE_APIC) &&
+   APIC_INTEGRATED(boot_cpu_apic_version)) {
+   disable_apic = 1;
+   pr_err(FW_BUG "Local APIC %d not detected, force emulation\n",
+  boot_cpu_physical_apicid);
+
+   return APIC_PIC;
+   }
+#endif
+
+   /* Check MP table or ACPI MADT configuration */
+   if (!smp_found_config) {
+   disable_ioapic_support();
+
+   if (!acpi_lapic)
+   pr_info("APIC: ACPI MADT or MP tables are not 
detected\n");
+
+   return APIC_VIRTUAL_WIRE;
+   }
+
+   return APIC_SYMMETRIC_IO;
+}
+
 /*
  * An initial setup of the virtual wire mode.
  */
--
2.5.5












Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the idlest CPU

2017-09-05 Thread Yu Chen
Thanks for looking at this,
(please bear my slow response as I need to check related code
before replying.)
On Sun, Sep 03, 2017 at 08:17:04PM +0200, Thomas Gleixner wrote:
> On Fri, 1 Sep 2017, Chen Yu wrote:
> 
> > This is the major logic to spread the vectors on different CPUs.
> > The main idea is to choose the 'idlest' CPU which has assigned
> > the least number of vectors as the candidate/hint for the vector
> > allocation domain, in the hope that the vector allocation domain
> > could leverage this hint to generate corresponding cpumask.
> > 
> > One of the requirements to do this vector spreading work comes from the
> > following hibernation problem found on a 16 cores server:
> > 
> > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > are only 0 available.
> > 
> > 2 issues were found after investigation:
> > 
> > 1. The network driver has declared many vector resources via
> >pci_enable_msix_range(), say, this driver might likely want
> >to reserve 6 per logical CPU, then there would be 192 of them.
> > 2. Besides, most of the vectors reserved by this driver are assigned
> >on CPU0 due to the current code strategy, so there would be
> >insufficient slots left on CPU0 to receive any migrated IRQs
> >during CPU offine.
> > 
> > In theory after the commit c5cb83bb337c ("genirq/cpuhotplug: Handle
> > managed IRQs on CPU hotplug") the issue 1 has been solved with the
> > managed interrupt mechanism for multi queue devices, which no longer
> > migrate these interrupts. It simply shuts them down and restarts
> > them when the CPU comes back online.
> > 
> > However, according to the implementation of this network driver,
> > the commit mentioned above does not fully fix the issue 1.
> > Here's the framework of the network driver:
> > 
> > step 1. Reserved enough irq vectors and corresponding IRQs.
> > step 2. If the network is activated, invoke request_irq() to
> > register the handler.
> > step 3. Invoke set_affinity() to spread the IRQs onto different
> > CPUs, thus to spread the vectors too.
> >
> > The problem is, if the network cable is not connected, step 2
> > and 3 will not get invoked, thus the IRQ vectors will not spread
> > on different CPUs and will still be allocated on CPU0. As a result
> > the CPUs will still get the offline failure.
> 
> So the proper solution for this network driver is to switch to managed
> interrupts instead of trying to work around it in some other place. It's
> using the wrong mechanism - end of story.
> 
> Why are you insisting on implementing a band aid for this particular driver
> instead of fixing the underlying problem of that driver which requires to
> have 32 queues and interrupts open even if only a single CPU is online?
> 
I agree, the driver could be rewritten, but it might take some time, so
meanwhile I'm looking at also other possible optimization.
> > Previously there were some discussion in the thread [1] about the
> > vector spread, and here are the suggestion from Thomas:
> > 
> > Q1:
> > Rewrite that allocation code from scratch, use per cpu bitmaps,
> > so we can do fast search over masks and keep track of
> > the vectors which are used/free per cpu.
> > A1:
> > per cpu bitmap was onced considered but this might not be
> > as fast as the solution proposed here. That is, if we introduce the
> > per cpu bitmap, we need to count the number of vectors assigned on
> > each CPUs by cpumask_weight() and sort them in order to get the
> > CPUs who have the least number of vectors assigned. By contrast,
> > if we introduce the bitmaps for each vector number, we can get the
> > 'idle' CPU at the cost of nearly O(1).
> 
> The weight accounting with the cpumasks is an orthogonal issue to the per
> cpu vector bitmaps.
> 
> Once you selected a CPU the current code still loops in circles instead of
> just searching a bitmap.
Yes, I agree.
> 
> And if the required affinity mask needs more than one CPU then this is
> still not giving you the right answer. You assign perhaps a vector to the
> least busy CPU, but the other CPUs which happen to be in the mask are going
> to be randomly selected.
Yes, for multi cpumask, it might depends on how vector domain behaves.
> 
> I'm not against making the vector allocation better, but certainly not by
> adding yet more duct tape to something which is well known as one of the
> dumbest search algorithms on the planet with a worst case of
> 
>nvectors * nr_online_cpus * nr_online_cpus_in_affinity_mask
> 
> and your mechanism nests another loop of potentially NR_VECTORS into
> that. Which is pointless as the actual assignable vector space is
> smaller. While x86 has 256 vectors the assignable vector space is way
> smaller and non continous:
> 
> Vectors   0- 31 are reserved for CPU traps and exceptions
> Vectors 239-255 are reserved by the kernel for IPIs, local timer etc.
> Vector 32 can be reserved by the kernel for the 

Re: [PATCH 4/4][RFC v2] x86/apic: Spread the vectors by choosing the idlest CPU

2017-09-05 Thread Yu Chen
Thanks for looking at this,
(please bear my slow response as I need to check related code
before replying.)
On Sun, Sep 03, 2017 at 08:17:04PM +0200, Thomas Gleixner wrote:
> On Fri, 1 Sep 2017, Chen Yu wrote:
> 
> > This is the major logic to spread the vectors on different CPUs.
> > The main idea is to choose the 'idlest' CPU which has assigned
> > the least number of vectors as the candidate/hint for the vector
> > allocation domain, in the hope that the vector allocation domain
> > could leverage this hint to generate corresponding cpumask.
> > 
> > One of the requirements to do this vector spreading work comes from the
> > following hibernation problem found on a 16 cores server:
> > 
> > CPU 31 disable failed: CPU has 62 vectors assigned and there
> > are only 0 available.
> > 
> > 2 issues were found after investigation:
> > 
> > 1. The network driver has declared many vector resources via
> >pci_enable_msix_range(), say, this driver might likely want
> >to reserve 6 per logical CPU, then there would be 192 of them.
> > 2. Besides, most of the vectors reserved by this driver are assigned
> >on CPU0 due to the current code strategy, so there would be
> >insufficient slots left on CPU0 to receive any migrated IRQs
> >during CPU offine.
> > 
> > In theory after the commit c5cb83bb337c ("genirq/cpuhotplug: Handle
> > managed IRQs on CPU hotplug") the issue 1 has been solved with the
> > managed interrupt mechanism for multi queue devices, which no longer
> > migrate these interrupts. It simply shuts them down and restarts
> > them when the CPU comes back online.
> > 
> > However, according to the implementation of this network driver,
> > the commit mentioned above does not fully fix the issue 1.
> > Here's the framework of the network driver:
> > 
> > step 1. Reserved enough irq vectors and corresponding IRQs.
> > step 2. If the network is activated, invoke request_irq() to
> > register the handler.
> > step 3. Invoke set_affinity() to spread the IRQs onto different
> > CPUs, thus to spread the vectors too.
> >
> > The problem is, if the network cable is not connected, step 2
> > and 3 will not get invoked, thus the IRQ vectors will not spread
> > on different CPUs and will still be allocated on CPU0. As a result
> > the CPUs will still get the offline failure.
> 
> So the proper solution for this network driver is to switch to managed
> interrupts instead of trying to work around it in some other place. It's
> using the wrong mechanism - end of story.
> 
> Why are you insisting on implementing a band aid for this particular driver
> instead of fixing the underlying problem of that driver which requires to
> have 32 queues and interrupts open even if only a single CPU is online?
> 
I agree, the driver could be rewritten, but it might take some time, so
meanwhile I'm looking at also other possible optimization.
> > Previously there were some discussion in the thread [1] about the
> > vector spread, and here are the suggestion from Thomas:
> > 
> > Q1:
> > Rewrite that allocation code from scratch, use per cpu bitmaps,
> > so we can do fast search over masks and keep track of
> > the vectors which are used/free per cpu.
> > A1:
> > per cpu bitmap was onced considered but this might not be
> > as fast as the solution proposed here. That is, if we introduce the
> > per cpu bitmap, we need to count the number of vectors assigned on
> > each CPUs by cpumask_weight() and sort them in order to get the
> > CPUs who have the least number of vectors assigned. By contrast,
> > if we introduce the bitmaps for each vector number, we can get the
> > 'idle' CPU at the cost of nearly O(1).
> 
> The weight accounting with the cpumasks is an orthogonal issue to the per
> cpu vector bitmaps.
> 
> Once you selected a CPU the current code still loops in circles instead of
> just searching a bitmap.
Yes, I agree.
> 
> And if the required affinity mask needs more than one CPU then this is
> still not giving you the right answer. You assign perhaps a vector to the
> least busy CPU, but the other CPUs which happen to be in the mask are going
> to be randomly selected.
Yes, for multi cpumask, it might depends on how vector domain behaves.
> 
> I'm not against making the vector allocation better, but certainly not by
> adding yet more duct tape to something which is well known as one of the
> dumbest search algorithms on the planet with a worst case of
> 
>nvectors * nr_online_cpus * nr_online_cpus_in_affinity_mask
> 
> and your mechanism nests another loop of potentially NR_VECTORS into
> that. Which is pointless as the actual assignable vector space is
> smaller. While x86 has 256 vectors the assignable vector space is way
> smaller and non continous:
> 
> Vectors   0- 31 are reserved for CPU traps and exceptions
> Vectors 239-255 are reserved by the kernel for IPIs, local timer etc.
> Vector 32 can be reserved by the kernel for the 

[PATCH 1/1] workqueue: use type int instead of bool to index array

2017-09-05 Thread zijun_hu
From: zijun_hu 

type bool is used to index three arrays in alloc_and_link_pwqs()
it doesn't look like conventional.

it is fixed by using type int to index the relevant arrays.

Signed-off-by: zijun_hu 
---
 kernel/workqueue.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ab3c0dc8c7ed..0c7a16792949 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3917,7 +3917,7 @@ static void wq_update_unbound_numa(struct 
workqueue_struct *wq, int cpu,
 
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
-   bool highpri = wq->flags & WQ_HIGHPRI;
+   int pool_idx = (wq->flags & WQ_HIGHPRI) ? 1 : 0;
int cpu, ret;
 
if (!(wq->flags & WQ_UNBOUND)) {
@@ -3931,7 +3931,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
struct worker_pool *cpu_pools =
per_cpu(cpu_worker_pools, cpu);
 
-   init_pwq(pwq, wq, _pools[highpri]);
+   init_pwq(pwq, wq, _pools[pool_idx]);
 
mutex_lock(>mutex);
link_pwq(pwq);
@@ -3939,14 +3939,15 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
}
return 0;
} else if (wq->flags & __WQ_ORDERED) {
-   ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
+   ret = apply_workqueue_attrs(wq, ordered_wq_attrs[pool_idx]);
/* there should only be single pwq for ordering guarantee */
WARN(!ret && (wq->pwqs.next != >dfl_pwq->pwqs_node ||
  wq->pwqs.prev != >dfl_pwq->pwqs_node),
 "ordering guarantee broken for workqueue %s\n", wq->name);
return ret;
} else {
-   return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
+   return apply_workqueue_attrs(wq,
+   unbound_std_wq_attrs[pool_idx]);
}
 }
 
-- 
1.9.1



[PATCH 1/1] workqueue: use type int instead of bool to index array

2017-09-05 Thread zijun_hu
From: zijun_hu 

type bool is used to index three arrays in alloc_and_link_pwqs()
it doesn't look like conventional.

it is fixed by using type int to index the relevant arrays.

Signed-off-by: zijun_hu 
---
 kernel/workqueue.c | 9 +
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/kernel/workqueue.c b/kernel/workqueue.c
index ab3c0dc8c7ed..0c7a16792949 100644
--- a/kernel/workqueue.c
+++ b/kernel/workqueue.c
@@ -3917,7 +3917,7 @@ static void wq_update_unbound_numa(struct 
workqueue_struct *wq, int cpu,
 
 static int alloc_and_link_pwqs(struct workqueue_struct *wq)
 {
-   bool highpri = wq->flags & WQ_HIGHPRI;
+   int pool_idx = (wq->flags & WQ_HIGHPRI) ? 1 : 0;
int cpu, ret;
 
if (!(wq->flags & WQ_UNBOUND)) {
@@ -3931,7 +3931,7 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
struct worker_pool *cpu_pools =
per_cpu(cpu_worker_pools, cpu);
 
-   init_pwq(pwq, wq, _pools[highpri]);
+   init_pwq(pwq, wq, _pools[pool_idx]);
 
mutex_lock(>mutex);
link_pwq(pwq);
@@ -3939,14 +3939,15 @@ static int alloc_and_link_pwqs(struct workqueue_struct 
*wq)
}
return 0;
} else if (wq->flags & __WQ_ORDERED) {
-   ret = apply_workqueue_attrs(wq, ordered_wq_attrs[highpri]);
+   ret = apply_workqueue_attrs(wq, ordered_wq_attrs[pool_idx]);
/* there should only be single pwq for ordering guarantee */
WARN(!ret && (wq->pwqs.next != >dfl_pwq->pwqs_node ||
  wq->pwqs.prev != >dfl_pwq->pwqs_node),
 "ordering guarantee broken for workqueue %s\n", wq->name);
return ret;
} else {
-   return apply_workqueue_attrs(wq, unbound_std_wq_attrs[highpri]);
+   return apply_workqueue_attrs(wq,
+   unbound_std_wq_attrs[pool_idx]);
}
 }
 
-- 
1.9.1



Re: [PATCH 3/3] f2fs: support flexible inline xattr size

2017-09-05 Thread Chao Yu
Hi Jaegeuk,

Do we have time to test and stabilize this new feature before merge window?

Thanks,

On 2017/9/4 18:58, Chao Yu wrote:
> Now, in product, more and more features based on file encryption were
> introduced, their demand of xattr space is increasing, however, inline
> xattr has fixed-size of 200 bytes, once inline xattr space is full, new
> increased xattr data would occupy additional xattr block which may bring
> us more space usage and performance regression during persisting.
> 
> In order to resolve above issue, it's better to expand inline xattr size
> flexibly according to user's requirement.
> 
> So this patch introduces new filesystem feature 'flexible inline xattr',
> and new mount option 'inline_xattr_size=%u', once mkfs enables the
> feature, we can use the option to make f2fs supporting flexible inline
> xattr size.
> 
> To support this feature, we add extra attribute i_inline_xattr_size in
> inode layout, indicating that how many space inline xattr borrows from
> block address mapping space in inode layout, by this, we can easily
> locate and store flexible-sized inline xattr data in inode.
> 
> Inode disk layout:
>   +--+
>   | .i_mode  |
>   | ...  |
>   | .i_ext   |
>   +--+
>   | .i_extra_isize   |
>   | .i_inline_xattr_size |---+
>   | ...  |   |
>   +--+   |
>   | .i_addr  |   |
>   |  - block address or  |   |
>   |  - inline data   |   |
>   +--+<---+  v
>   |inline xattr  |+---inline xattr range
>   +--+<---+
>   | .i_nid   |
>   +--+
>   |   node_footer|
>   | (nid, ino, offset)   |
>   +--+
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/f2fs.h  | 43 ++-
>  fs/f2fs/inode.c | 12 
>  fs/f2fs/namei.c |  6 ++
>  fs/f2fs/node.c  |  4 ++--
>  fs/f2fs/super.c | 32 +++-
>  fs/f2fs/sysfs.c |  7 +++
>  fs/f2fs/xattr.c | 18 +-
>  include/linux/f2fs_fs.h |  5 +++--
>  8 files changed, 100 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 1f24ad4ca1bb..168ad51b7fb9 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -93,6 +93,7 @@ extern char *fault_name[FAULT_MAX];
>  #define F2FS_MOUNT_GRPQUOTA  0x0010
>  #define F2FS_MOUNT_PRJQUOTA  0x0020
>  #define F2FS_MOUNT_QUOTA 0x0040
> +#define F2FS_MOUNT_INLINE_XATTR_SIZE 0x0080
>  
>  #define clear_opt(sbi, option)   ((sbi)->mount_opt.opt &= 
> ~F2FS_MOUNT_##option)
>  #define set_opt(sbi, option) ((sbi)->mount_opt.opt |= F2FS_MOUNT_##option)
> @@ -118,6 +119,7 @@ struct f2fs_mount_info {
>  #define F2FS_FEATURE_EXTRA_ATTR  0x0008
>  #define F2FS_FEATURE_PRJQUOTA0x0010
>  #define F2FS_FEATURE_INODE_CHKSUM0x0020
> +#define F2FS_FEATURE_FLEXIBLE_INLINE_XATTR   0x0040
>  
>  #define F2FS_HAS_FEATURE(sb, mask)   \
>   ((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
> @@ -379,11 +381,14 @@ struct f2fs_flush_device {
>  
>  /* for inline stuff */
>  #define DEF_INLINE_RESERVED_SIZE 1
> +#define DEF_MIN_INLINE_SIZE  1
>  static inline int get_extra_isize(struct inode *inode);
> -#define MAX_INLINE_DATA(inode)   (sizeof(__le32) * \
> - (CUR_ADDRS_PER_INODE(inode) - \
> - DEF_INLINE_RESERVED_SIZE - \
> - F2FS_INLINE_XATTR_ADDRS))
> +static inline int get_inline_xattr_addrs(struct inode *inode);
> +#define F2FS_INLINE_XATTR_ADDRS(inode)   get_inline_xattr_addrs(inode)
> +#define MAX_INLINE_DATA(inode)   (sizeof(__le32) *   
> \
> + (CUR_ADDRS_PER_INODE(inode) -   \
> + F2FS_INLINE_XATTR_ADDRS(inode) -\
> + DEF_INLINE_RESERVED_SIZE))
>  
>  /* for inline dir */
>  #define NR_INLINE_DENTRY(inode)  (MAX_INLINE_DATA(inode) * BITS_PER_BYTE 
> / \
> @@ -592,6 +597,7 @@ struct f2fs_inode_info {
>  
>   int i_extra_isize;  /* size of extra space located in 
> i_addr */
>   kprojid_t i_projid; /* id for project quota */
> + int i_inline_xattr_size;/* inline xattr size */
>  };
>  
>  static inline void get_extent_info(struct extent_info *ext,
> @@ -1043,6 +1049,7 @@ struct f2fs_sb_info {
>   loff_t max_file_blocks; /* max block index of file */
>   int active_logs;/* # of active logs */
>   int dir_level;  /* directory level */
> + int inline_xattr_size;

SME/32-bit regression

2017-09-05 Thread Boris Ostrovsky

It appears there is a regression for 32-bit kernels due to SME changes.

I bisected my particular problem (Xen PV guest) to 
21729f81ce8ae76a6995681d40e16f7ce8075db4 but I also saw pmd_clear_bad() 
errors on baremetal. This seems to be caused by sme_me_mask being an 
unsigned long as opposed to phys_addr_t (the actual problem is that 
__PHYSICAL_MASK is truncated). When I declare it as u64 and drop 
unsigned long cast in __sme_set()/__sme_clr() the problem goes way. 
(This presumably won't work for non-PAE which I haven't tried).



-boris


Re: [PATCH 3/3] f2fs: support flexible inline xattr size

2017-09-05 Thread Chao Yu
Hi Jaegeuk,

Do we have time to test and stabilize this new feature before merge window?

Thanks,

On 2017/9/4 18:58, Chao Yu wrote:
> Now, in product, more and more features based on file encryption were
> introduced, their demand of xattr space is increasing, however, inline
> xattr has fixed-size of 200 bytes, once inline xattr space is full, new
> increased xattr data would occupy additional xattr block which may bring
> us more space usage and performance regression during persisting.
> 
> In order to resolve above issue, it's better to expand inline xattr size
> flexibly according to user's requirement.
> 
> So this patch introduces new filesystem feature 'flexible inline xattr',
> and new mount option 'inline_xattr_size=%u', once mkfs enables the
> feature, we can use the option to make f2fs supporting flexible inline
> xattr size.
> 
> To support this feature, we add extra attribute i_inline_xattr_size in
> inode layout, indicating that how many space inline xattr borrows from
> block address mapping space in inode layout, by this, we can easily
> locate and store flexible-sized inline xattr data in inode.
> 
> Inode disk layout:
>   +--+
>   | .i_mode  |
>   | ...  |
>   | .i_ext   |
>   +--+
>   | .i_extra_isize   |
>   | .i_inline_xattr_size |---+
>   | ...  |   |
>   +--+   |
>   | .i_addr  |   |
>   |  - block address or  |   |
>   |  - inline data   |   |
>   +--+<---+  v
>   |inline xattr  |+---inline xattr range
>   +--+<---+
>   | .i_nid   |
>   +--+
>   |   node_footer|
>   | (nid, ino, offset)   |
>   +--+
> 
> Signed-off-by: Chao Yu 
> ---
>  fs/f2fs/f2fs.h  | 43 ++-
>  fs/f2fs/inode.c | 12 
>  fs/f2fs/namei.c |  6 ++
>  fs/f2fs/node.c  |  4 ++--
>  fs/f2fs/super.c | 32 +++-
>  fs/f2fs/sysfs.c |  7 +++
>  fs/f2fs/xattr.c | 18 +-
>  include/linux/f2fs_fs.h |  5 +++--
>  8 files changed, 100 insertions(+), 27 deletions(-)
> 
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 1f24ad4ca1bb..168ad51b7fb9 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -93,6 +93,7 @@ extern char *fault_name[FAULT_MAX];
>  #define F2FS_MOUNT_GRPQUOTA  0x0010
>  #define F2FS_MOUNT_PRJQUOTA  0x0020
>  #define F2FS_MOUNT_QUOTA 0x0040
> +#define F2FS_MOUNT_INLINE_XATTR_SIZE 0x0080
>  
>  #define clear_opt(sbi, option)   ((sbi)->mount_opt.opt &= 
> ~F2FS_MOUNT_##option)
>  #define set_opt(sbi, option) ((sbi)->mount_opt.opt |= F2FS_MOUNT_##option)
> @@ -118,6 +119,7 @@ struct f2fs_mount_info {
>  #define F2FS_FEATURE_EXTRA_ATTR  0x0008
>  #define F2FS_FEATURE_PRJQUOTA0x0010
>  #define F2FS_FEATURE_INODE_CHKSUM0x0020
> +#define F2FS_FEATURE_FLEXIBLE_INLINE_XATTR   0x0040
>  
>  #define F2FS_HAS_FEATURE(sb, mask)   \
>   ((F2FS_SB(sb)->raw_super->feature & cpu_to_le32(mask)) != 0)
> @@ -379,11 +381,14 @@ struct f2fs_flush_device {
>  
>  /* for inline stuff */
>  #define DEF_INLINE_RESERVED_SIZE 1
> +#define DEF_MIN_INLINE_SIZE  1
>  static inline int get_extra_isize(struct inode *inode);
> -#define MAX_INLINE_DATA(inode)   (sizeof(__le32) * \
> - (CUR_ADDRS_PER_INODE(inode) - \
> - DEF_INLINE_RESERVED_SIZE - \
> - F2FS_INLINE_XATTR_ADDRS))
> +static inline int get_inline_xattr_addrs(struct inode *inode);
> +#define F2FS_INLINE_XATTR_ADDRS(inode)   get_inline_xattr_addrs(inode)
> +#define MAX_INLINE_DATA(inode)   (sizeof(__le32) *   
> \
> + (CUR_ADDRS_PER_INODE(inode) -   \
> + F2FS_INLINE_XATTR_ADDRS(inode) -\
> + DEF_INLINE_RESERVED_SIZE))
>  
>  /* for inline dir */
>  #define NR_INLINE_DENTRY(inode)  (MAX_INLINE_DATA(inode) * BITS_PER_BYTE 
> / \
> @@ -592,6 +597,7 @@ struct f2fs_inode_info {
>  
>   int i_extra_isize;  /* size of extra space located in 
> i_addr */
>   kprojid_t i_projid; /* id for project quota */
> + int i_inline_xattr_size;/* inline xattr size */
>  };
>  
>  static inline void get_extent_info(struct extent_info *ext,
> @@ -1043,6 +1049,7 @@ struct f2fs_sb_info {
>   loff_t max_file_blocks; /* max block index of file */
>   int active_logs;/* # of active logs */
>   int dir_level;  /* directory level */
> + int inline_xattr_size;  /* inline xattr 

SME/32-bit regression

2017-09-05 Thread Boris Ostrovsky

It appears there is a regression for 32-bit kernels due to SME changes.

I bisected my particular problem (Xen PV guest) to 
21729f81ce8ae76a6995681d40e16f7ce8075db4 but I also saw pmd_clear_bad() 
errors on baremetal. This seems to be caused by sme_me_mask being an 
unsigned long as opposed to phys_addr_t (the actual problem is that 
__PHYSICAL_MASK is truncated). When I declare it as u64 and drop 
unsigned long cast in __sme_set()/__sme_clr() the problem goes way. 
(This presumably won't work for non-PAE which I haven't tried).



-boris


[PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC

2017-09-05 Thread Suravee Suthikulpanit
SVM AVIC hardware accelerates guest write to APIC_EOI register
(for edge-trigger interrupt), which means it does not trap to KVM.

So, only enable SVM AVIC only in split irqchip mode.
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').

Suggested-by: Paolo Bonzini 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kvm/svm.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1b3eb4..7ce191b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1176,7 +1176,6 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
vmcb->control.avic_physical_id = ppa & AVIC_HPA_MASK;
vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID_COUNT;
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
-   svm->vcpu.arch.apicv_active = true;
 }
 
 static void init_vmcb(struct vcpu_svm *svm)
@@ -1292,7 +1291,7 @@ static void init_vmcb(struct vcpu_svm *svm)
set_intercept(svm, INTERCEPT_PAUSE);
}
 
-   if (avic)
+   if (kvm_vcpu_apicv_active(>vcpu))
avic_init_vmcb(svm);
 
/*
@@ -1594,6 +1593,12 @@ static int avic_init_vcpu(struct vcpu_svm *svm)
if (!avic)
return 0;
 
+   if (!irqchip_split(svm->vcpu.kvm)) {
+   pr_debug("%s: Disable AVIC due to non-split irqchip.\n",
+__func__);
+   return 0;
+   }
+
ret = avic_init_backing_page(>vcpu);
if (ret)
return ret;
@@ -4388,7 +4393,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
 
 static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
 {
-   return avic;
+   return avic && irqchip_split(vcpu->kvm);
 }
 
 static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
@@ -4405,7 +4410,7 @@ static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu 
*vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
 
-   if (!avic)
+   if (!avic || !irqchip_split(svm->vcpu.kvm))
return;
 
vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
-- 
1.8.3.1



[PATCH 3/3] KVM: SVM: Add irqchip_split() checks before enabling AVIC

2017-09-05 Thread Suravee Suthikulpanit
SVM AVIC hardware accelerates guest write to APIC_EOI register
(for edge-trigger interrupt), which means it does not trap to KVM.

So, only enable SVM AVIC only in split irqchip mode.
(e.g. launching qemu w/ option '-machine kernel_irqchip=split').

Suggested-by: Paolo Bonzini 
Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kvm/svm.c | 13 +
 1 file changed, 9 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index d1b3eb4..7ce191b 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1176,7 +1176,6 @@ static void avic_init_vmcb(struct vcpu_svm *svm)
vmcb->control.avic_physical_id = ppa & AVIC_HPA_MASK;
vmcb->control.avic_physical_id |= AVIC_MAX_PHYSICAL_ID_COUNT;
vmcb->control.int_ctl |= AVIC_ENABLE_MASK;
-   svm->vcpu.arch.apicv_active = true;
 }
 
 static void init_vmcb(struct vcpu_svm *svm)
@@ -1292,7 +1291,7 @@ static void init_vmcb(struct vcpu_svm *svm)
set_intercept(svm, INTERCEPT_PAUSE);
}
 
-   if (avic)
+   if (kvm_vcpu_apicv_active(>vcpu))
avic_init_vmcb(svm);
 
/*
@@ -1594,6 +1593,12 @@ static int avic_init_vcpu(struct vcpu_svm *svm)
if (!avic)
return 0;
 
+   if (!irqchip_split(svm->vcpu.kvm)) {
+   pr_debug("%s: Disable AVIC due to non-split irqchip.\n",
+__func__);
+   return 0;
+   }
+
ret = avic_init_backing_page(>vcpu);
if (ret)
return ret;
@@ -4388,7 +4393,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
 
 static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
 {
-   return avic;
+   return avic && irqchip_split(vcpu->kvm);
 }
 
 static void svm_hwapic_irr_update(struct kvm_vcpu *vcpu, int max_irr)
@@ -4405,7 +4410,7 @@ static void svm_refresh_apicv_exec_ctrl(struct kvm_vcpu 
*vcpu)
struct vcpu_svm *svm = to_svm(vcpu);
struct vmcb *vmcb = svm->vmcb;
 
-   if (!avic)
+   if (!avic || !irqchip_split(svm->vcpu.kvm))
return;
 
vmcb->control.int_ctl &= ~AVIC_ENABLE_MASK;
-- 
1.8.3.1



kvm: use-after-free in irq_bypass_register_consumer

2017-09-05 Thread idaifish
Got the following report while fuzzing 4.9.47.

It seems that this bug has been reported by Dmitry Vyukov
[https://lkml.org/lkml/2017/1/27/828]
But I still can reproduce the bug  on latest Ubuntu1604 (4.4.0-94-generic)

PoC:  
https://gist.githubusercontent.com/dvyukov/3411518fa22baff19ae204cc46a6/raw/7826cbcd1cbc472dfa4972fe56371df3c94b70c7/gistfile1.txt


===

BUG: KASAN: use-after-free in __list_add include/linux/list.h:43
[inline] at addr 88007ab9dc80
BUG: KASAN: use-after-free in list_add include/linux/list.h:63
[inline] at addr 88007ab9dc80
BUG: KASAN: use-after-free in irq_bypass_register_consumer+0x3a3/0x420
virt/lib/irqbypass.c:217 at addr 88007ab9dc80
Write of size 8 by task syz-executor4/2970
CPU: 0 PID: 2970 Comm: syz-executor4 Tainted: GB   4.9.47 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
 8800760df9c0 81ad97d9 88007f803080 88007ab9db80
 88007ab9dd80 88007ab9da00 8800760df9e8 8153892c
 8800760dfa78 88007f803080 8800776f7d20 8800760dfa68
Call Trace:
 [] __dump_stack lib/dump_stack.c:15 [inline]
 [] dump_stack+0x83/0xba lib/dump_stack.c:51
 [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:160
 [] print_address_description mm/kasan/report.c:198 [inline]
 [] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:287
 [] kasan_report mm/kasan/report.c:309 [inline]
 [] __asan_report_store8_noabort+0x3e/0x40
mm/kasan/report.c:335
 [] __list_add include/linux/list.h:43 [inline]
 [] list_add include/linux/list.h:63 [inline]
 [] irq_bypass_register_consumer+0x3a3/0x420
virt/lib/irqbypass.c:217
 [] kvm_irqfd_assign
arch/x86/kvm/../../../virt/kvm/eventfd.c:417 [inline]
 [] kvm_irqfd+0x1095/0x1840
arch/x86/kvm/../../../virt/kvm/eventfd.c:572
 [] kvm_vm_ioctl+0x2bc/0x1580
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2999
 [] vfs_ioctl fs/ioctl.c:43 [inline]
 [] do_vfs_ioctl+0x18c/0xf80 fs/ioctl.c:679
 [] SYSC_ioctl fs/ioctl.c:694 [inline]
 [] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
 [] entry_SYSCALL_64_fastpath+0x1a/0xa9
Object at 88007ab9db80, in cache kmalloc-512 size: 512
Allocated:
PID = 2970
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 set_track mm/kasan/kasan.c:507 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:598
 kmem_cache_alloc_trace include/linux/slab.h:391 [inline]
 kmalloc include/linux/slab.h:490 [inline]
 kzalloc include/linux/slab.h:636 [inline]
 kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:296 [inline]
 kvm_irqfd+0xbd/0x1840 arch/x86/kvm/../../../virt/kvm/eventfd.c:572
 kvm_vm_ioctl+0x2bc/0x1580 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2999
 vfs_ioctl fs/ioctl.c:43 [inline]
 do_vfs_ioctl+0x18c/0xf80 fs/ioctl.c:679
 SYSC_ioctl fs/ioctl.c:694 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
 entry_SYSCALL_64_fastpath+0x1a/0xa9
Freed:
PID = 1098
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 set_track mm/kasan/kasan.c:507 [inline]
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
 slab_free_hook mm/slub.c:1355 [inline]
 slab_free_freelist_hook mm/slub.c:1377 [inline]
 slab_free mm/slub.c:2958 [inline]
 kfree+0xa0/0x150 mm/slub.c:3878
 irqfd_shutdown+0x137/0x1a0 arch/x86/kvm/../../../virt/kvm/eventfd.c:148
 process_one_work+0x87c/0x1170 kernel/workqueue.c:2096
 worker_thread+0xed/0x14e0 kernel/workqueue.c:2230
 kthread+0x220/0x2a0 kernel/kthread.c:211
 ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 88007ab9db80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 88007ab9dc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>88007ab9dc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ^
 88007ab9dd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 88007ab9dd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc




-- 
Regards,
idaifish


kvm: use-after-free in irq_bypass_register_consumer

2017-09-05 Thread idaifish
Got the following report while fuzzing 4.9.47.

It seems that this bug has been reported by Dmitry Vyukov
[https://lkml.org/lkml/2017/1/27/828]
But I still can reproduce the bug  on latest Ubuntu1604 (4.4.0-94-generic)

PoC:  
https://gist.githubusercontent.com/dvyukov/3411518fa22baff19ae204cc46a6/raw/7826cbcd1cbc472dfa4972fe56371df3c94b70c7/gistfile1.txt


===

BUG: KASAN: use-after-free in __list_add include/linux/list.h:43
[inline] at addr 88007ab9dc80
BUG: KASAN: use-after-free in list_add include/linux/list.h:63
[inline] at addr 88007ab9dc80
BUG: KASAN: use-after-free in irq_bypass_register_consumer+0x3a3/0x420
virt/lib/irqbypass.c:217 at addr 88007ab9dc80
Write of size 8 by task syz-executor4/2970
CPU: 0 PID: 2970 Comm: syz-executor4 Tainted: GB   4.9.47 #1
Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS
Ubuntu-1.8.2-1ubuntu1 04/01/2014
 8800760df9c0 81ad97d9 88007f803080 88007ab9db80
 88007ab9dd80 88007ab9da00 8800760df9e8 8153892c
 8800760dfa78 88007f803080 8800776f7d20 8800760dfa68
Call Trace:
 [] __dump_stack lib/dump_stack.c:15 [inline]
 [] dump_stack+0x83/0xba lib/dump_stack.c:51
 [] kasan_object_err+0x1c/0x70 mm/kasan/report.c:160
 [] print_address_description mm/kasan/report.c:198 [inline]
 [] kasan_report_error+0x1f0/0x4e0 mm/kasan/report.c:287
 [] kasan_report mm/kasan/report.c:309 [inline]
 [] __asan_report_store8_noabort+0x3e/0x40
mm/kasan/report.c:335
 [] __list_add include/linux/list.h:43 [inline]
 [] list_add include/linux/list.h:63 [inline]
 [] irq_bypass_register_consumer+0x3a3/0x420
virt/lib/irqbypass.c:217
 [] kvm_irqfd_assign
arch/x86/kvm/../../../virt/kvm/eventfd.c:417 [inline]
 [] kvm_irqfd+0x1095/0x1840
arch/x86/kvm/../../../virt/kvm/eventfd.c:572
 [] kvm_vm_ioctl+0x2bc/0x1580
arch/x86/kvm/../../../virt/kvm/kvm_main.c:2999
 [] vfs_ioctl fs/ioctl.c:43 [inline]
 [] do_vfs_ioctl+0x18c/0xf80 fs/ioctl.c:679
 [] SYSC_ioctl fs/ioctl.c:694 [inline]
 [] SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
 [] entry_SYSCALL_64_fastpath+0x1a/0xa9
Object at 88007ab9db80, in cache kmalloc-512 size: 512
Allocated:
PID = 2970
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 set_track mm/kasan/kasan.c:507 [inline]
 kasan_kmalloc+0xad/0xe0 mm/kasan/kasan.c:598
 kmem_cache_alloc_trace include/linux/slab.h:391 [inline]
 kmalloc include/linux/slab.h:490 [inline]
 kzalloc include/linux/slab.h:636 [inline]
 kvm_irqfd_assign arch/x86/kvm/../../../virt/kvm/eventfd.c:296 [inline]
 kvm_irqfd+0xbd/0x1840 arch/x86/kvm/../../../virt/kvm/eventfd.c:572
 kvm_vm_ioctl+0x2bc/0x1580 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2999
 vfs_ioctl fs/ioctl.c:43 [inline]
 do_vfs_ioctl+0x18c/0xf80 fs/ioctl.c:679
 SYSC_ioctl fs/ioctl.c:694 [inline]
 SyS_ioctl+0x8f/0xc0 fs/ioctl.c:685
 entry_SYSCALL_64_fastpath+0x1a/0xa9
Freed:
PID = 1098
 save_stack_trace+0x16/0x20 arch/x86/kernel/stacktrace.c:57
 save_stack+0x46/0xd0 mm/kasan/kasan.c:495
 set_track mm/kasan/kasan.c:507 [inline]
 kasan_slab_free+0x73/0xc0 mm/kasan/kasan.c:571
 slab_free_hook mm/slub.c:1355 [inline]
 slab_free_freelist_hook mm/slub.c:1377 [inline]
 slab_free mm/slub.c:2958 [inline]
 kfree+0xa0/0x150 mm/slub.c:3878
 irqfd_shutdown+0x137/0x1a0 arch/x86/kvm/../../../virt/kvm/eventfd.c:148
 process_one_work+0x87c/0x1170 kernel/workqueue.c:2096
 worker_thread+0xed/0x14e0 kernel/workqueue.c:2230
 kthread+0x220/0x2a0 kernel/kthread.c:211
 ret_from_fork+0x25/0x30 arch/x86/entry/entry_64.S:433
Memory state around the buggy address:
 88007ab9db80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 88007ab9dc00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
>88007ab9dc80: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
   ^
 88007ab9dd00: fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb fb
 88007ab9dd80: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc




-- 
Regards,
idaifish


[PATCH 2/3] KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()

2017-09-05 Thread Suravee Suthikulpanit
Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu
pointer as parameter in preparation to subsequent changes.

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/svm.c  | 2 +-
 arch/x86/kvm/vmx.c  | 2 +-
 arch/x86/kvm/x86.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f4d120a..3cdde44 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -969,7 +969,7 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
-   bool (*get_enable_apicv)(void);
+   bool (*get_enable_apicv)(struct kvm_vcpu *vcpu);
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 316edbf..d1b3eb4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4386,7 +4386,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
-static bool svm_get_enable_apicv(void)
+static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
 {
return avic;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6ef294..8a7ab16 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4983,7 +4983,7 @@ static void vmx_disable_intercept_msr_x2apic(u32 msr, int 
type, bool apicv_activ
}
 }
 
-static bool vmx_get_enable_apicv(void)
+static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
 {
return enable_apicv;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a5e57..b8c40ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7946,7 +7946,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
 
-   vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv();
+   vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
vcpu->arch.pv.pv_unhalted = false;
vcpu->arch.emulate_ctxt.ops = _ops;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_reset_bsp(vcpu))
-- 
1.8.3.1



[PATCH 2/3] KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()

2017-09-05 Thread Suravee Suthikulpanit
Modify struct kvm_x86_ops.arch.apicv_active() to take struct kvm_vcpu
pointer as parameter in preparation to subsequent changes.

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/include/asm/kvm_host.h | 2 +-
 arch/x86/kvm/svm.c  | 2 +-
 arch/x86/kvm/vmx.c  | 2 +-
 arch/x86/kvm/x86.c  | 2 +-
 4 files changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index f4d120a..3cdde44 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -969,7 +969,7 @@ struct kvm_x86_ops {
void (*enable_nmi_window)(struct kvm_vcpu *vcpu);
void (*enable_irq_window)(struct kvm_vcpu *vcpu);
void (*update_cr8_intercept)(struct kvm_vcpu *vcpu, int tpr, int irr);
-   bool (*get_enable_apicv)(void);
+   bool (*get_enable_apicv)(struct kvm_vcpu *vcpu);
void (*refresh_apicv_exec_ctrl)(struct kvm_vcpu *vcpu);
void (*hwapic_irr_update)(struct kvm_vcpu *vcpu, int max_irr);
void (*hwapic_isr_update)(struct kvm_vcpu *vcpu, int isr);
diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index 316edbf..d1b3eb4 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -4386,7 +4386,7 @@ static void svm_set_virtual_x2apic_mode(struct kvm_vcpu 
*vcpu, bool set)
return;
 }
 
-static bool svm_get_enable_apicv(void)
+static bool svm_get_enable_apicv(struct kvm_vcpu *vcpu)
 {
return avic;
 }
diff --git a/arch/x86/kvm/vmx.c b/arch/x86/kvm/vmx.c
index c6ef294..8a7ab16 100644
--- a/arch/x86/kvm/vmx.c
+++ b/arch/x86/kvm/vmx.c
@@ -4983,7 +4983,7 @@ static void vmx_disable_intercept_msr_x2apic(u32 msr, int 
type, bool apicv_activ
}
 }
 
-static bool vmx_get_enable_apicv(void)
+static bool vmx_get_enable_apicv(struct kvm_vcpu *vcpu)
 {
return enable_apicv;
 }
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 05a5e57..b8c40ca 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -7946,7 +7946,7 @@ int kvm_arch_vcpu_init(struct kvm_vcpu *vcpu)
BUG_ON(vcpu->kvm == NULL);
kvm = vcpu->kvm;
 
-   vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv();
+   vcpu->arch.apicv_active = kvm_x86_ops->get_enable_apicv(vcpu);
vcpu->arch.pv.pv_unhalted = false;
vcpu->arch.emulate_ctxt.ops = _ops;
if (!irqchip_in_kernel(kvm) || kvm_vcpu_is_reset_bsp(vcpu))
-- 
1.8.3.1



[PATCH 1/3] KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()

2017-09-05 Thread Suravee Suthikulpanit
Preparing the base code for subsequent changes. This does not change
existing logic.

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kvm/svm.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index af256b7..316edbf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1587,6 +1587,23 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
 }
 
+static int avic_init_vcpu(struct vcpu_svm *svm)
+{
+   int ret;
+
+   if (!avic)
+   return 0;
+
+   ret = avic_init_backing_page(>vcpu);
+   if (ret)
+   return ret;
+
+   INIT_LIST_HEAD(>ir_list);
+   spin_lock_init(>ir_list_lock);
+
+   return ret;
+}
+
 static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 {
struct vcpu_svm *svm;
@@ -1623,14 +1640,9 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!hsave_page)
goto free_page3;
 
-   if (avic) {
-   err = avic_init_backing_page(>vcpu);
-   if (err)
-   goto free_page4;
-
-   INIT_LIST_HEAD(>ir_list);
-   spin_lock_init(>ir_list_lock);
-   }
+   err = avic_init_vcpu(svm);
+   if (err)
+   goto free_page4;
 
/* We initialize this flag to true to make sure that the is_running
 * bit would be set the first time the vcpu is loaded.
-- 
1.8.3.1



[PATCH 1/3] KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()

2017-09-05 Thread Suravee Suthikulpanit
Preparing the base code for subsequent changes. This does not change
existing logic.

Signed-off-by: Suravee Suthikulpanit 
---
 arch/x86/kvm/svm.c | 28 
 1 file changed, 20 insertions(+), 8 deletions(-)

diff --git a/arch/x86/kvm/svm.c b/arch/x86/kvm/svm.c
index af256b7..316edbf 100644
--- a/arch/x86/kvm/svm.c
+++ b/arch/x86/kvm/svm.c
@@ -1587,6 +1587,23 @@ static void svm_vcpu_reset(struct kvm_vcpu *vcpu, bool 
init_event)
avic_update_vapic_bar(svm, APIC_DEFAULT_PHYS_BASE);
 }
 
+static int avic_init_vcpu(struct vcpu_svm *svm)
+{
+   int ret;
+
+   if (!avic)
+   return 0;
+
+   ret = avic_init_backing_page(>vcpu);
+   if (ret)
+   return ret;
+
+   INIT_LIST_HEAD(>ir_list);
+   spin_lock_init(>ir_list_lock);
+
+   return ret;
+}
+
 static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, unsigned int id)
 {
struct vcpu_svm *svm;
@@ -1623,14 +1640,9 @@ static struct kvm_vcpu *svm_create_vcpu(struct kvm *kvm, 
unsigned int id)
if (!hsave_page)
goto free_page3;
 
-   if (avic) {
-   err = avic_init_backing_page(>vcpu);
-   if (err)
-   goto free_page4;
-
-   INIT_LIST_HEAD(>ir_list);
-   spin_lock_init(>ir_list_lock);
-   }
+   err = avic_init_vcpu(svm);
+   if (err)
+   goto free_page4;
 
/* We initialize this flag to true to make sure that the is_running
 * bit would be set the first time the vcpu is loaded.
-- 
1.8.3.1



[PATCH 0/3] KVM: SVM: Fix guest not booting w/ AVIC enabled

2017-09-05 Thread Suravee Suthikulpanit
Certain QEMU options fails to boot VM guest w/ SVM AVIC enabled
(e.g. modprobe kvm_amd avic=1). Investigation shows that this mainly
due to AVIC hardware does not trap into hypervisor when guest OS
writes to APIC_EOI register. 

The boot hang is caused by missing timer interrupt when using in-kernel
PIT model (e.g. launch qemu w/ '-no-hpet' option) since it requires
irq acknowledgmen before injecting another interrupt in case
irq re-injection is enabled (normally default).

Suravee Suthikulpanit (3):
  KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
  KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
  KVM: SVM: Add irqchip_split() checks before enabling AVIC

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c  | 43 -
 arch/x86/kvm/vmx.c  |  2 +-
 arch/x86/kvm/x86.c  |  2 +-
 4 files changed, 33 insertions(+), 16 deletions(-)

-- 
1.8.3.1



[PATCH 0/3] KVM: SVM: Fix guest not booting w/ AVIC enabled

2017-09-05 Thread Suravee Suthikulpanit
Certain QEMU options fails to boot VM guest w/ SVM AVIC enabled
(e.g. modprobe kvm_amd avic=1). Investigation shows that this mainly
due to AVIC hardware does not trap into hypervisor when guest OS
writes to APIC_EOI register. 

The boot hang is caused by missing timer interrupt when using in-kernel
PIT model (e.g. launch qemu w/ '-no-hpet' option) since it requires
irq acknowledgmen before injecting another interrupt in case
irq re-injection is enabled (normally default).

Suravee Suthikulpanit (3):
  KVM: SVM: Refactor AVIC vcpu initialization into avic_init_vcpu()
  KVM: Add struct kvm_vcpu pointer parameter to get_enable_apicv()
  KVM: SVM: Add irqchip_split() checks before enabling AVIC

 arch/x86/include/asm/kvm_host.h |  2 +-
 arch/x86/kvm/svm.c  | 43 -
 arch/x86/kvm/vmx.c  |  2 +-
 arch/x86/kvm/x86.c  |  2 +-
 4 files changed, 33 insertions(+), 16 deletions(-)

-- 
1.8.3.1



Re: [PATCH RFC v3 0/4] Add cross-compilation support to eBPF samples

2017-09-05 Thread Joel Fernandes
Hi Alexei,

On Tue, Sep 5, 2017 at 8:24 PM, Alexei Starovoitov
 wrote:
> On Sun, Sep 03, 2017 at 11:23:21AM -0700, Joel Fernandes wrote:
>> These patches fix issues seen when cross-compiling eBPF samples on arm64.
>> Compared to [1], I dropped the controversial inline-asm patch pending further
>> discussion on the right way to do it. However these patches are still a step 
>> in
>> the right direction and I wanted them to get in before the more controversial
>> bit.
>
> All patches in this set look good to me.
> When you repost without changes after net-next reopens pls keep my:
> Acked-by: Alexei Starovoitov 

Thanks a lot! will do.

Regards,
Joel


Re: [PATCH RFC v3 0/4] Add cross-compilation support to eBPF samples

2017-09-05 Thread Joel Fernandes
Hi Alexei,

On Tue, Sep 5, 2017 at 8:24 PM, Alexei Starovoitov
 wrote:
> On Sun, Sep 03, 2017 at 11:23:21AM -0700, Joel Fernandes wrote:
>> These patches fix issues seen when cross-compiling eBPF samples on arm64.
>> Compared to [1], I dropped the controversial inline-asm patch pending further
>> discussion on the right way to do it. However these patches are still a step 
>> in
>> the right direction and I wanted them to get in before the more controversial
>> bit.
>
> All patches in this set look good to me.
> When you repost without changes after net-next reopens pls keep my:
> Acked-by: Alexei Starovoitov 

Thanks a lot! will do.

Regards,
Joel


Re: [PATCH RFC v3 0/4] Add cross-compilation support to eBPF samples

2017-09-05 Thread Alexei Starovoitov
On Sun, Sep 03, 2017 at 11:23:21AM -0700, Joel Fernandes wrote:
> These patches fix issues seen when cross-compiling eBPF samples on arm64.
> Compared to [1], I dropped the controversial inline-asm patch pending further
> discussion on the right way to do it. However these patches are still a step 
> in
> the right direction and I wanted them to get in before the more controversial
> bit.

All patches in this set look good to me.
When you repost without changes after net-next reopens pls keep my:
Acked-by: Alexei Starovoitov 



Re: [PATCH RFC v3 0/4] Add cross-compilation support to eBPF samples

2017-09-05 Thread Alexei Starovoitov
On Sun, Sep 03, 2017 at 11:23:21AM -0700, Joel Fernandes wrote:
> These patches fix issues seen when cross-compiling eBPF samples on arm64.
> Compared to [1], I dropped the controversial inline-asm patch pending further
> discussion on the right way to do it. However these patches are still a step 
> in
> the right direction and I wanted them to get in before the more controversial
> bit.

All patches in this set look good to me.
When you repost without changes after net-next reopens pls keep my:
Acked-by: Alexei Starovoitov 



Re: [PATCH] f2fs: fix wrong bfree and bavail

2017-09-05 Thread Chao Yu
On 2017/9/6 11:10, Jaegeuk Kim wrote:
> On 09/06, Chao Yu wrote:
>> On 2017/9/6 5:06, Jaegeuk Kim wrote:
>>> This patch fixes bavail and bfree finally.
>>>
>>> longf_bfree;/* free blocks in fs */
>>> longf_bavail;   /* free blocks avail to non-superuser */
>>>
>>> So, bfree represents all the reserved blocks, while bavail does the space 
>>> only
>>> visible to normal user.
>>>
>>> Fix: 3e6d0b4d9c1c ("f2fs: fix incorrect f_bfree calculation in ->statfs")
>>> Cc:  # 4.8+
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/super.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 7e29db227910..17cca4cb93af 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -955,8 +955,8 @@ static int f2fs_statfs(struct dentry *dentry, struct 
>>> kstatfs *buf)
>>> buf->f_bsize = sbi->blocksize;
>>>  
>>> buf->f_blocks = total_count - start_count;
>>> -   buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
>>> -   buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
>>> +   buf->f_bfree = user_block_count - valid_user_blocks(sbi);
>>> +   buf->f_bavail = user_block_count - valid_user_blocks(sbi) - ovp_count -
>>
>>  set_cp(user_block_count, ((get_cp(free_segment_count) + 6 -
>>  get_cp(overprov_segment_count)) * c.blks_per_seg));
>>
>> As we calculated in mkfs, we have removed ovp_count in user_block_count, so
>> shouldn't we add ovp_count for f_bfree here?
> 
> Hehe, I already dropped this. :P

Alright. :)

> It seems like something wrong with df in android.

Maybe its version is too old?

Thanks,

> 
>>
>> Thanks,
>>
>>> sbi->reserved_blocks;
>>>  
>>> avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
>>>
> 
> .
> 



Re: [PATCH] f2fs: fix wrong bfree and bavail

2017-09-05 Thread Chao Yu
On 2017/9/6 11:10, Jaegeuk Kim wrote:
> On 09/06, Chao Yu wrote:
>> On 2017/9/6 5:06, Jaegeuk Kim wrote:
>>> This patch fixes bavail and bfree finally.
>>>
>>> longf_bfree;/* free blocks in fs */
>>> longf_bavail;   /* free blocks avail to non-superuser */
>>>
>>> So, bfree represents all the reserved blocks, while bavail does the space 
>>> only
>>> visible to normal user.
>>>
>>> Fix: 3e6d0b4d9c1c ("f2fs: fix incorrect f_bfree calculation in ->statfs")
>>> Cc:  # 4.8+
>>> Signed-off-by: Jaegeuk Kim 
>>> ---
>>>  fs/f2fs/super.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
>>> index 7e29db227910..17cca4cb93af 100644
>>> --- a/fs/f2fs/super.c
>>> +++ b/fs/f2fs/super.c
>>> @@ -955,8 +955,8 @@ static int f2fs_statfs(struct dentry *dentry, struct 
>>> kstatfs *buf)
>>> buf->f_bsize = sbi->blocksize;
>>>  
>>> buf->f_blocks = total_count - start_count;
>>> -   buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
>>> -   buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
>>> +   buf->f_bfree = user_block_count - valid_user_blocks(sbi);
>>> +   buf->f_bavail = user_block_count - valid_user_blocks(sbi) - ovp_count -
>>
>>  set_cp(user_block_count, ((get_cp(free_segment_count) + 6 -
>>  get_cp(overprov_segment_count)) * c.blks_per_seg));
>>
>> As we calculated in mkfs, we have removed ovp_count in user_block_count, so
>> shouldn't we add ovp_count for f_bfree here?
> 
> Hehe, I already dropped this. :P

Alright. :)

> It seems like something wrong with df in android.

Maybe its version is too old?

Thanks,

> 
>>
>> Thanks,
>>
>>> sbi->reserved_blocks;
>>>  
>>> avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
>>>
> 
> .
> 



Re: [f2fs-dev] [PATCH 2/2] f2fs: use generic terms used for encrypted block management

2017-09-05 Thread Chao Yu
On 2017/9/6 8:15, Jaegeuk Kim wrote:
> This patch renames functions regarding to buffer management via META_MAPPING
> used for encrypted blocks especially. We can actually use them in generic way.

Meta inode cache is more like bd_inode cache now. ;)

Reviewed-by: Chao Yu 

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/data.c|  6 +++---
>  fs/f2fs/f2fs.h|  3 +--
>  fs/f2fs/file.c|  2 +-
>  fs/f2fs/gc.c  | 13 +
>  fs/f2fs/segment.c |  3 +--
>  5 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index e6c683e7a10e..ee6801fdbdec 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1163,7 +1163,7 @@ static struct bio *f2fs_grab_bio(struct inode *inode, 
> block_t blkaddr,
>   return ERR_CAST(ctx);
>  
>   /* wait the page to be moved by cleaning */
> - f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr);
> + f2fs_wait_on_block_writeback(sbi, blkaddr);
>   }
>  
>   bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
> @@ -1349,7 +1349,7 @@ static int encrypt_one_page(struct f2fs_io_info *fio)
>   return 0;
>  
>   /* wait for GCed encrypted page writeback */
> - f2fs_wait_on_encrypted_page_writeback(fio->sbi, fio->old_blkaddr);
> + f2fs_wait_on_block_writeback(fio->sbi, fio->old_blkaddr);
>  
>  retry_encrypt:
>   fio->encrypted_page = fscrypt_encrypt_page(inode, fio->page,
> @@ -1975,7 +1975,7 @@ static int f2fs_write_begin(struct file *file, struct 
> address_space *mapping,
>  
>   /* wait for GCed encrypted page writeback */
>   if (f2fs_encrypted_file(inode))
> - f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr);
> + f2fs_wait_on_block_writeback(sbi, blkaddr);
>  
>   if (len == PAGE_SIZE || PageUptodate(page))
>   return 0;
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 417c28f01fd5..91fb749686f2 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2550,8 +2550,7 @@ void allocate_data_block(struct f2fs_sb_info *sbi, 
> struct page *page,
>   struct f2fs_io_info *fio, bool add_list);
>  void f2fs_wait_on_page_writeback(struct page *page,
>   enum page_type type, bool ordered);
> -void f2fs_wait_on_encrypted_page_writeback(struct f2fs_sb_info *sbi,
> - block_t blkaddr);
> +void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr);
>  void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
>  void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
>  int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f8cedaba7ce4..224379a9848c 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -107,7 +107,7 @@ static int f2fs_vm_page_mkwrite(struct vm_fault *vmf)
>  
>   /* wait for GCed encrypted page writeback */
>   if (f2fs_encrypted_file(inode))
> - f2fs_wait_on_encrypted_page_writeback(sbi, dn.data_blkaddr);
> + f2fs_wait_on_block_writeback(sbi, dn.data_blkaddr);
>  
>  out_sem:
>   up_read(_I(inode)->i_mmap_sem);
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 6a12f33d0cdd..bfe6a8ccc3a0 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -599,8 +599,12 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct 
> f2fs_summary *sum,
>   return true;
>  }
>  
> -static void move_encrypted_block(struct inode *inode, block_t bidx,
> - unsigned int segno, int 
> off)
> +/*
> + * Move data block via META_MAPPING while keeping locked data page.
> + * This can be used to move blocks, aka LBAs, directly on disk.
> + */
> +static void move_data_block(struct inode *inode, block_t bidx,
> + unsigned int segno, int off)
>  {
>   struct f2fs_io_info fio = {
>   .sbi = F2FS_I_SB(inode),
> @@ -873,9 +877,10 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, 
> struct f2fs_summary *sum,
>   start_bidx = start_bidx_of_node(nofs, inode)
>   + ofs_in_node;
>   if (f2fs_encrypted_file(inode))
> - move_encrypted_block(inode, start_bidx, segno, 
> off);
> + move_data_block(inode, start_bidx, segno, off);
>   else
> - move_data_page(inode, start_bidx, gc_type, 
> segno, off);
> + move_data_page(inode, start_bidx, gc_type,
> + segno, off);
>  
>   if (locked) {
>   up_write(>dio_rwsem[WRITE]);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 

Re: [PATCH] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Dennis Yang
> On Fri, Sep 01, 2017 at 05:26:48PM +0800, Dennis Yang wrote:
> > >On Mon, Aug 28, 2017 at 08:01:59PM +0800, Dennis Yang wrote:
> > >> break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST which is
> > >> set when a stripe_head gets queued to the stripe_head list maintained by
> > >> raid5_plug_cb and waiting for releasing after blk_unplug().
> > >> 
> > >> In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
> > >> set, it indicates that this stripe_head is already in the raid5_plug_cb
> > >> list and release_stripe() would be called instead to drop a reference
> > >> count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
> > >> stripe_head and it will get queued into the raid5_plug_cb list.
> > >> 
> > >> Without preserving STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list(),
> > >> a stripe_head could be re-added to the raid5_plug_cb list while it is
> > >> currently on that list and waiting to be released. This would mess up
> > >> the raid5_plug_cb and leads to soft/hard lockup in raid5_unplug() or
> > >> kernel crash.
> > >
> > >Did you observe this? From my understanding, this is impossible. If a 
> > >stripe is
> > >in batch list (that's why we have break_stripe_batch_list), add_stripe_bio 
> > >will
> > >bail out (because sh->batch_head != NULL), so the stripe is never added 
> > >into
> > >plug list, hence has the STRIPE_ON_UNPLUG_LIST bit set. Am I missing 
> > >anything?
> > >
> > >Thanks,
> > >Shaohua
> > 
> > It is hard to reproduce. Our storage exports 4 iscsi LUNs which are built 
> > on 
> > top of a dm-thin pool with 8 x SSD RAID6. Each two of these iscsi LUNs are 
> > connected to a windows server which runs 4 fio jobs with 16 threads and 
> > iodepth = 16. In this setup, we can get the following call trace after 
> > couples 
> > of hours. The stack dump basically shows that the raid5_unplug is holding 
> > the 
> > conf->device_lock for a very long time. After adding some debug traces, we 
> > can 
> > confirm that the raid5_plug_cb->list is corrupted which sometimes leads to 
> > infinite loop in raid5_unplug. 
> > 
> > I didn't notice that add_stripe_bio will bail out if the stripe_head is in 
> > someone's batch list, but I am wondering if the following scenario will 
> > still 
> > happen. If a stripe_head A is added to another stripe_head B's batch list, 
> > in 
> > this case A will have its batch_head != NULL and be added into the plug 
> > list. 
> > After that, stripe_head B gets handled and called break_stripe_batch_list() 
> > to 
> > reset all the batched stripe_head(including A which is still on the plug 
> > list)'s 
> > state and reset their batch_head to NULL. Before the plug list gets 
> > processed, 
> > if there is another write request comes in and get stripe_head A, A will 
> > have 
> > its batch_head == NULL (cleared by calling break_stripe_batch_list() on B) 
> > and 
> > be added to plug list once again. 
> > 
> > Since we have the environment which can reproduce this issue in 6 hours, if 
> > there is anything we can do to help resolve this issue, we are more than 
> > willing to help. 
> 
> yes, I think you are right. If the stripe is on one thread's plug list, the
> thread doesn't do unplug till the stripe is handled (because it's part of a
> batch), we will have the stripe in plug list but not have the bit set. Can you
> resend the patch with description what happens? BTW, does the patch fix your
> issue?
> 
> Thanks,
> Shaohua

Sure, I will resend a patch to you with updated commit message. 

We have already run the same stress test on the same environment for 6 days 
without any error. Since the test cannot run more than 6 hours before this 
patch, 
I think it does fix my issue. 

Thanks,
Dennis
 
> > 
> > Thanks,
> > Dennis
> > 
> > [ 4481.493998] BUG: spinlock lockup suspected on CPU#10, md1_raid6/5497
> > [ 4481.501109]  lock: 0x88102b242888, .magic: dead4ead, .owner: 
> > kworker/u24:2/22322, .owner_cpu: 0
> > [ 4481.511231] CPU: 10 PID: 5497 Comm: md1_raid6 Tainted: PW  O
> > 4.2.8 #7
> > [ 4481.519504] Hardware name: Default string Default string/Default string, 
> > BIOS QY38AR06 01/24/2017
> > [ 4481.529429]  88102b242888 881014967c78 81c933b6 
> > 0007
> > [ 4481.537739]  8805c9720a00 881014967c98 810d2c20 
> > 88102b242888
> > [ 4481.546048]  82d0f7c8 881014967cc8 810d2d9b 
> > 88102b242888
> > [ 4481.554357] Call Trace:
> > [ 4481.557093]  [] dump_stack+0x4c/0x65
> > [ 4481.562840]  [] spin_dump+0x80/0xd0
> > [ 4481.568489]  [] do_raw_spin_lock+0x9b/0x120
> > [ 4481.574916]  [] _raw_spin_lock_irq+0x41/0x50
> > [ 4481.581443]  [] ? raid5d+0x64/0x790
> > [ 4481.587091]  [] raid5d+0x64/0x790
> > [ 4481.592546]  [] ? finish_wait+0x46/0x80
> > [ 4481.598583]  [] ? finish_wait+0x68/0x80
> > [ 4481.604621]  [] md_thread+0x13e/0x150
> > [ 4481.610464]  [] ? wait_woken+0xb0/0xb0
> > [ 4481.616405]  [] ? 

Re: [f2fs-dev] [PATCH 2/2] f2fs: use generic terms used for encrypted block management

2017-09-05 Thread Chao Yu
On 2017/9/6 8:15, Jaegeuk Kim wrote:
> This patch renames functions regarding to buffer management via META_MAPPING
> used for encrypted blocks especially. We can actually use them in generic way.

Meta inode cache is more like bd_inode cache now. ;)

Reviewed-by: Chao Yu 

Thanks,

> 
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/data.c|  6 +++---
>  fs/f2fs/f2fs.h|  3 +--
>  fs/f2fs/file.c|  2 +-
>  fs/f2fs/gc.c  | 13 +
>  fs/f2fs/segment.c |  3 +--
>  5 files changed, 15 insertions(+), 12 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index e6c683e7a10e..ee6801fdbdec 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -1163,7 +1163,7 @@ static struct bio *f2fs_grab_bio(struct inode *inode, 
> block_t blkaddr,
>   return ERR_CAST(ctx);
>  
>   /* wait the page to be moved by cleaning */
> - f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr);
> + f2fs_wait_on_block_writeback(sbi, blkaddr);
>   }
>  
>   bio = bio_alloc(GFP_KERNEL, min_t(int, nr_pages, BIO_MAX_PAGES));
> @@ -1349,7 +1349,7 @@ static int encrypt_one_page(struct f2fs_io_info *fio)
>   return 0;
>  
>   /* wait for GCed encrypted page writeback */
> - f2fs_wait_on_encrypted_page_writeback(fio->sbi, fio->old_blkaddr);
> + f2fs_wait_on_block_writeback(fio->sbi, fio->old_blkaddr);
>  
>  retry_encrypt:
>   fio->encrypted_page = fscrypt_encrypt_page(inode, fio->page,
> @@ -1975,7 +1975,7 @@ static int f2fs_write_begin(struct file *file, struct 
> address_space *mapping,
>  
>   /* wait for GCed encrypted page writeback */
>   if (f2fs_encrypted_file(inode))
> - f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr);
> + f2fs_wait_on_block_writeback(sbi, blkaddr);
>  
>   if (len == PAGE_SIZE || PageUptodate(page))
>   return 0;
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 417c28f01fd5..91fb749686f2 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2550,8 +2550,7 @@ void allocate_data_block(struct f2fs_sb_info *sbi, 
> struct page *page,
>   struct f2fs_io_info *fio, bool add_list);
>  void f2fs_wait_on_page_writeback(struct page *page,
>   enum page_type type, bool ordered);
> -void f2fs_wait_on_encrypted_page_writeback(struct f2fs_sb_info *sbi,
> - block_t blkaddr);
> +void f2fs_wait_on_block_writeback(struct f2fs_sb_info *sbi, block_t blkaddr);
>  void write_data_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
>  void write_node_summaries(struct f2fs_sb_info *sbi, block_t start_blk);
>  int lookup_journal_in_cursum(struct f2fs_journal *journal, int type,
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index f8cedaba7ce4..224379a9848c 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -107,7 +107,7 @@ static int f2fs_vm_page_mkwrite(struct vm_fault *vmf)
>  
>   /* wait for GCed encrypted page writeback */
>   if (f2fs_encrypted_file(inode))
> - f2fs_wait_on_encrypted_page_writeback(sbi, dn.data_blkaddr);
> + f2fs_wait_on_block_writeback(sbi, dn.data_blkaddr);
>  
>  out_sem:
>   up_read(_I(inode)->i_mmap_sem);
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index 6a12f33d0cdd..bfe6a8ccc3a0 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -599,8 +599,12 @@ static bool is_alive(struct f2fs_sb_info *sbi, struct 
> f2fs_summary *sum,
>   return true;
>  }
>  
> -static void move_encrypted_block(struct inode *inode, block_t bidx,
> - unsigned int segno, int 
> off)
> +/*
> + * Move data block via META_MAPPING while keeping locked data page.
> + * This can be used to move blocks, aka LBAs, directly on disk.
> + */
> +static void move_data_block(struct inode *inode, block_t bidx,
> + unsigned int segno, int off)
>  {
>   struct f2fs_io_info fio = {
>   .sbi = F2FS_I_SB(inode),
> @@ -873,9 +877,10 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, 
> struct f2fs_summary *sum,
>   start_bidx = start_bidx_of_node(nofs, inode)
>   + ofs_in_node;
>   if (f2fs_encrypted_file(inode))
> - move_encrypted_block(inode, start_bidx, segno, 
> off);
> + move_data_block(inode, start_bidx, segno, off);
>   else
> - move_data_page(inode, start_bidx, gc_type, 
> segno, off);
> + move_data_page(inode, start_bidx, gc_type,
> + segno, off);
>  
>   if (locked) {
>   up_write(>dio_rwsem[WRITE]);
> diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
> index 265c3bc44f2d..9e708e525ba8 100644
> --- 

Re: [PATCH] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Dennis Yang
> On Fri, Sep 01, 2017 at 05:26:48PM +0800, Dennis Yang wrote:
> > >On Mon, Aug 28, 2017 at 08:01:59PM +0800, Dennis Yang wrote:
> > >> break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST which is
> > >> set when a stripe_head gets queued to the stripe_head list maintained by
> > >> raid5_plug_cb and waiting for releasing after blk_unplug().
> > >> 
> > >> In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
> > >> set, it indicates that this stripe_head is already in the raid5_plug_cb
> > >> list and release_stripe() would be called instead to drop a reference
> > >> count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
> > >> stripe_head and it will get queued into the raid5_plug_cb list.
> > >> 
> > >> Without preserving STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list(),
> > >> a stripe_head could be re-added to the raid5_plug_cb list while it is
> > >> currently on that list and waiting to be released. This would mess up
> > >> the raid5_plug_cb and leads to soft/hard lockup in raid5_unplug() or
> > >> kernel crash.
> > >
> > >Did you observe this? From my understanding, this is impossible. If a 
> > >stripe is
> > >in batch list (that's why we have break_stripe_batch_list), add_stripe_bio 
> > >will
> > >bail out (because sh->batch_head != NULL), so the stripe is never added 
> > >into
> > >plug list, hence has the STRIPE_ON_UNPLUG_LIST bit set. Am I missing 
> > >anything?
> > >
> > >Thanks,
> > >Shaohua
> > 
> > It is hard to reproduce. Our storage exports 4 iscsi LUNs which are built 
> > on 
> > top of a dm-thin pool with 8 x SSD RAID6. Each two of these iscsi LUNs are 
> > connected to a windows server which runs 4 fio jobs with 16 threads and 
> > iodepth = 16. In this setup, we can get the following call trace after 
> > couples 
> > of hours. The stack dump basically shows that the raid5_unplug is holding 
> > the 
> > conf->device_lock for a very long time. After adding some debug traces, we 
> > can 
> > confirm that the raid5_plug_cb->list is corrupted which sometimes leads to 
> > infinite loop in raid5_unplug. 
> > 
> > I didn't notice that add_stripe_bio will bail out if the stripe_head is in 
> > someone's batch list, but I am wondering if the following scenario will 
> > still 
> > happen. If a stripe_head A is added to another stripe_head B's batch list, 
> > in 
> > this case A will have its batch_head != NULL and be added into the plug 
> > list. 
> > After that, stripe_head B gets handled and called break_stripe_batch_list() 
> > to 
> > reset all the batched stripe_head(including A which is still on the plug 
> > list)'s 
> > state and reset their batch_head to NULL. Before the plug list gets 
> > processed, 
> > if there is another write request comes in and get stripe_head A, A will 
> > have 
> > its batch_head == NULL (cleared by calling break_stripe_batch_list() on B) 
> > and 
> > be added to plug list once again. 
> > 
> > Since we have the environment which can reproduce this issue in 6 hours, if 
> > there is anything we can do to help resolve this issue, we are more than 
> > willing to help. 
> 
> yes, I think you are right. If the stripe is on one thread's plug list, the
> thread doesn't do unplug till the stripe is handled (because it's part of a
> batch), we will have the stripe in plug list but not have the bit set. Can you
> resend the patch with description what happens? BTW, does the patch fix your
> issue?
> 
> Thanks,
> Shaohua

Sure, I will resend a patch to you with updated commit message. 

We have already run the same stress test on the same environment for 6 days 
without any error. Since the test cannot run more than 6 hours before this 
patch, 
I think it does fix my issue. 

Thanks,
Dennis
 
> > 
> > Thanks,
> > Dennis
> > 
> > [ 4481.493998] BUG: spinlock lockup suspected on CPU#10, md1_raid6/5497
> > [ 4481.501109]  lock: 0x88102b242888, .magic: dead4ead, .owner: 
> > kworker/u24:2/22322, .owner_cpu: 0
> > [ 4481.511231] CPU: 10 PID: 5497 Comm: md1_raid6 Tainted: PW  O
> > 4.2.8 #7
> > [ 4481.519504] Hardware name: Default string Default string/Default string, 
> > BIOS QY38AR06 01/24/2017
> > [ 4481.529429]  88102b242888 881014967c78 81c933b6 
> > 0007
> > [ 4481.537739]  8805c9720a00 881014967c98 810d2c20 
> > 88102b242888
> > [ 4481.546048]  82d0f7c8 881014967cc8 810d2d9b 
> > 88102b242888
> > [ 4481.554357] Call Trace:
> > [ 4481.557093]  [] dump_stack+0x4c/0x65
> > [ 4481.562840]  [] spin_dump+0x80/0xd0
> > [ 4481.568489]  [] do_raw_spin_lock+0x9b/0x120
> > [ 4481.574916]  [] _raw_spin_lock_irq+0x41/0x50
> > [ 4481.581443]  [] ? raid5d+0x64/0x790
> > [ 4481.587091]  [] raid5d+0x64/0x790
> > [ 4481.592546]  [] ? finish_wait+0x46/0x80
> > [ 4481.598583]  [] ? finish_wait+0x68/0x80
> > [ 4481.604621]  [] md_thread+0x13e/0x150
> > [ 4481.610464]  [] ? wait_woken+0xb0/0xb0
> > [ 4481.616405]  [] ? 

Re: [PATCH v3 2/2] ip6_tunnel: fix ip6 tunnel lookup in collect_md mode

2017-09-05 Thread Alexei Starovoitov

On 9/4/17 1:36 AM, Haishuang Yan wrote:

In collect_md mode, if the tun dev is down, it still can call
__ip6_tnl_rcv to receive on packets, and the rx statistics increase
improperly.

Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Cc: Alexei Starovoitov 
Signed-off-by: Haishuang Yan 

---
Change since v3:
  * Increment rx_dropped if tunnel device is not up, suggested by
  Pravin B Shelar
  * Fix wrong recipient address
---
 net/ipv6/ip6_tunnel.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 10a693a..e91d3b6 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -171,8 +171,11 @@ static struct net_device_stats *ip6_get_stats(struct 
net_device *dev)
}

t = rcu_dereference(ip6n->collect_md_tun);
-   if (t)
-   return t;
+   if (t) {
+   if (t->dev->flags & IFF_UP)
+   return t;
+   t->dev->stats.rx_dropped++;
+   }


Why increment the stats only for this drop case?
There are plenty of other conditions where packet
will be dropped in ip6 tunnel. I think it's important
to present consistent behavior to the users,
so I'd increment drop stats either for all drop cases
or for none. And today it's none.
The ! IFF_UP case should probably be return NULL too.



Re: [PATCH v3 2/2] ip6_tunnel: fix ip6 tunnel lookup in collect_md mode

2017-09-05 Thread Alexei Starovoitov

On 9/4/17 1:36 AM, Haishuang Yan wrote:

In collect_md mode, if the tun dev is down, it still can call
__ip6_tnl_rcv to receive on packets, and the rx statistics increase
improperly.

Fixes: 8d79266bc48c ("ip6_tunnel: add collect_md mode to IPv6 tunnels")
Cc: Alexei Starovoitov 
Signed-off-by: Haishuang Yan 

---
Change since v3:
  * Increment rx_dropped if tunnel device is not up, suggested by
  Pravin B Shelar
  * Fix wrong recipient address
---
 net/ipv6/ip6_tunnel.c | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/net/ipv6/ip6_tunnel.c b/net/ipv6/ip6_tunnel.c
index 10a693a..e91d3b6 100644
--- a/net/ipv6/ip6_tunnel.c
+++ b/net/ipv6/ip6_tunnel.c
@@ -171,8 +171,11 @@ static struct net_device_stats *ip6_get_stats(struct 
net_device *dev)
}

t = rcu_dereference(ip6n->collect_md_tun);
-   if (t)
-   return t;
+   if (t) {
+   if (t->dev->flags & IFF_UP)
+   return t;
+   t->dev->stats.rx_dropped++;
+   }


Why increment the stats only for this drop case?
There are plenty of other conditions where packet
will be dropped in ip6 tunnel. I think it's important
to present consistent behavior to the users,
so I'd increment drop stats either for all drop cases
or for none. And today it's none.
The ! IFF_UP case should probably be return NULL too.



Re: [f2fs-dev] [PATCH 1/2] f2fs: introduce f2fs_encrypted_file for clean-up

2017-09-05 Thread Chao Yu
On 2017/9/6 8:15, Jaegeuk Kim wrote:
> This patch replaces (f2fs_encrypted_inode() && S_ISREG()) with
> f2fs_encrypted_file(), which gives no functional change.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 

Thanks,

> ---
>  fs/f2fs/data.c   | 10 +-
>  fs/f2fs/f2fs.h   |  5 +
>  fs/f2fs/file.c   |  2 +-
>  fs/f2fs/gc.c |  5 ++---
>  fs/f2fs/inline.c |  2 +-
>  5 files changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 67da4f6eeaa0..e6c683e7a10e 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -581,7 +581,7 @@ struct page *get_read_data_page(struct inode *inode, 
> pgoff_t index,
>   .encrypted_page = NULL,
>   };
>  
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   return read_mapping_page(mapping, index, NULL);
>  
>   page = f2fs_grab_cache_page(mapping, index, for_write);
> @@ -786,7 +786,7 @@ static int __allocate_data_block(struct dnode_of_data *dn)
>  
>  static inline bool __force_buffered_io(struct inode *inode, int rw)
>  {
> - return ((f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) ||
> + return (f2fs_encrypted_file(inode) ||
>   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
>   F2FS_I_SB(inode)->s_ndevs);
>  }
> @@ -1157,7 +1157,7 @@ static struct bio *f2fs_grab_bio(struct inode *inode, 
> block_t blkaddr,
>   struct fscrypt_ctx *ctx = NULL;
>   struct bio *bio;
>  
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) {
> + if (f2fs_encrypted_file(inode)) {
>   ctx = fscrypt_get_ctx(inode, GFP_NOFS);
>   if (IS_ERR(ctx))
>   return ERR_CAST(ctx);
> @@ -1345,7 +1345,7 @@ static int encrypt_one_page(struct f2fs_io_info *fio)
>   struct inode *inode = fio->page->mapping->host;
>   gfp_t gfp_flags = GFP_NOFS;
>  
> - if (!f2fs_encrypted_inode(inode) || !S_ISREG(inode->i_mode))
> + if (!f2fs_encrypted_file(inode))
>   return 0;
>  
>   /* wait for GCed encrypted page writeback */
> @@ -1974,7 +1974,7 @@ static int f2fs_write_begin(struct file *file, struct 
> address_space *mapping,
>   f2fs_wait_on_page_writeback(page, DATA, false);
>  
>   /* wait for GCed encrypted page writeback */
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr);
>  
>   if (len == PAGE_SIZE || PageUptodate(page))
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4b993961d81d..417c28f01fd5 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2939,6 +2939,11 @@ static inline bool f2fs_encrypted_inode(struct inode 
> *inode)
>   return file_is_encrypt(inode);
>  }
>  
> +static inline bool f2fs_encrypted_file(struct inode *inode)
> +{
> + return f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode);
> +}
> +
>  static inline void f2fs_set_encrypted_inode(struct inode *inode)
>  {
>  #ifdef CONFIG_F2FS_FS_ENCRYPTION
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 63e394907808..f8cedaba7ce4 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -106,7 +106,7 @@ static int f2fs_vm_page_mkwrite(struct vm_fault *vmf)
>   f2fs_wait_on_page_writeback(page, DATA, false);
>  
>   /* wait for GCed encrypted page writeback */
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   f2fs_wait_on_encrypted_page_writeback(sbi, dn.data_blkaddr);
>  
>  out_sem:
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index b226760afba8..6a12f33d0cdd 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -831,8 +831,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, 
> struct f2fs_summary *sum,
>   continue;
>  
>   /* if encrypted inode, let's go phase 3 */
> - if (f2fs_encrypted_inode(inode) &&
> - S_ISREG(inode->i_mode)) {
> + if (f2fs_encrypted_file(inode)) {
>   add_gc_inode(gc_list, inode);
>   continue;
>   }
> @@ -873,7 +872,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, 
> struct f2fs_summary *sum,
>  
>   start_bidx = start_bidx_of_node(nofs, inode)
>   + ofs_in_node;
> - if (f2fs_encrypted_inode(inode) && 
> S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   move_encrypted_block(inode, start_bidx, segno, 
> off);
>   else
>   move_data_page(inode, start_bidx, gc_type, 
> segno, off);
> diff --git 

Re: [f2fs-dev] [PATCH 1/2] f2fs: introduce f2fs_encrypted_file for clean-up

2017-09-05 Thread Chao Yu
On 2017/9/6 8:15, Jaegeuk Kim wrote:
> This patch replaces (f2fs_encrypted_inode() && S_ISREG()) with
> f2fs_encrypted_file(), which gives no functional change.
> 
> Signed-off-by: Jaegeuk Kim 

Reviewed-by: Chao Yu 

Thanks,

> ---
>  fs/f2fs/data.c   | 10 +-
>  fs/f2fs/f2fs.h   |  5 +
>  fs/f2fs/file.c   |  2 +-
>  fs/f2fs/gc.c |  5 ++---
>  fs/f2fs/inline.c |  2 +-
>  5 files changed, 14 insertions(+), 10 deletions(-)
> 
> diff --git a/fs/f2fs/data.c b/fs/f2fs/data.c
> index 67da4f6eeaa0..e6c683e7a10e 100644
> --- a/fs/f2fs/data.c
> +++ b/fs/f2fs/data.c
> @@ -581,7 +581,7 @@ struct page *get_read_data_page(struct inode *inode, 
> pgoff_t index,
>   .encrypted_page = NULL,
>   };
>  
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   return read_mapping_page(mapping, index, NULL);
>  
>   page = f2fs_grab_cache_page(mapping, index, for_write);
> @@ -786,7 +786,7 @@ static int __allocate_data_block(struct dnode_of_data *dn)
>  
>  static inline bool __force_buffered_io(struct inode *inode, int rw)
>  {
> - return ((f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) ||
> + return (f2fs_encrypted_file(inode) ||
>   (rw == WRITE && test_opt(F2FS_I_SB(inode), LFS)) ||
>   F2FS_I_SB(inode)->s_ndevs);
>  }
> @@ -1157,7 +1157,7 @@ static struct bio *f2fs_grab_bio(struct inode *inode, 
> block_t blkaddr,
>   struct fscrypt_ctx *ctx = NULL;
>   struct bio *bio;
>  
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode)) {
> + if (f2fs_encrypted_file(inode)) {
>   ctx = fscrypt_get_ctx(inode, GFP_NOFS);
>   if (IS_ERR(ctx))
>   return ERR_CAST(ctx);
> @@ -1345,7 +1345,7 @@ static int encrypt_one_page(struct f2fs_io_info *fio)
>   struct inode *inode = fio->page->mapping->host;
>   gfp_t gfp_flags = GFP_NOFS;
>  
> - if (!f2fs_encrypted_inode(inode) || !S_ISREG(inode->i_mode))
> + if (!f2fs_encrypted_file(inode))
>   return 0;
>  
>   /* wait for GCed encrypted page writeback */
> @@ -1974,7 +1974,7 @@ static int f2fs_write_begin(struct file *file, struct 
> address_space *mapping,
>   f2fs_wait_on_page_writeback(page, DATA, false);
>  
>   /* wait for GCed encrypted page writeback */
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   f2fs_wait_on_encrypted_page_writeback(sbi, blkaddr);
>  
>   if (len == PAGE_SIZE || PageUptodate(page))
> diff --git a/fs/f2fs/f2fs.h b/fs/f2fs/f2fs.h
> index 4b993961d81d..417c28f01fd5 100644
> --- a/fs/f2fs/f2fs.h
> +++ b/fs/f2fs/f2fs.h
> @@ -2939,6 +2939,11 @@ static inline bool f2fs_encrypted_inode(struct inode 
> *inode)
>   return file_is_encrypt(inode);
>  }
>  
> +static inline bool f2fs_encrypted_file(struct inode *inode)
> +{
> + return f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode);
> +}
> +
>  static inline void f2fs_set_encrypted_inode(struct inode *inode)
>  {
>  #ifdef CONFIG_F2FS_FS_ENCRYPTION
> diff --git a/fs/f2fs/file.c b/fs/f2fs/file.c
> index 63e394907808..f8cedaba7ce4 100644
> --- a/fs/f2fs/file.c
> +++ b/fs/f2fs/file.c
> @@ -106,7 +106,7 @@ static int f2fs_vm_page_mkwrite(struct vm_fault *vmf)
>   f2fs_wait_on_page_writeback(page, DATA, false);
>  
>   /* wait for GCed encrypted page writeback */
> - if (f2fs_encrypted_inode(inode) && S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   f2fs_wait_on_encrypted_page_writeback(sbi, dn.data_blkaddr);
>  
>  out_sem:
> diff --git a/fs/f2fs/gc.c b/fs/f2fs/gc.c
> index b226760afba8..6a12f33d0cdd 100644
> --- a/fs/f2fs/gc.c
> +++ b/fs/f2fs/gc.c
> @@ -831,8 +831,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, 
> struct f2fs_summary *sum,
>   continue;
>  
>   /* if encrypted inode, let's go phase 3 */
> - if (f2fs_encrypted_inode(inode) &&
> - S_ISREG(inode->i_mode)) {
> + if (f2fs_encrypted_file(inode)) {
>   add_gc_inode(gc_list, inode);
>   continue;
>   }
> @@ -873,7 +872,7 @@ static void gc_data_segment(struct f2fs_sb_info *sbi, 
> struct f2fs_summary *sum,
>  
>   start_bidx = start_bidx_of_node(nofs, inode)
>   + ofs_in_node;
> - if (f2fs_encrypted_inode(inode) && 
> S_ISREG(inode->i_mode))
> + if (f2fs_encrypted_file(inode))
>   move_encrypted_block(inode, start_bidx, segno, 
> off);
>   else
>   move_data_page(inode, start_bidx, gc_type, 
> segno, off);
> diff --git a/fs/f2fs/inline.c b/fs/f2fs/inline.c
> index 

Re: [PATCH] f2fs: fix wrong bfree and bavail

2017-09-05 Thread Jaegeuk Kim
On 09/06, Chao Yu wrote:
> On 2017/9/6 5:06, Jaegeuk Kim wrote:
> > This patch fixes bavail and bfree finally.
> > 
> > longf_bfree;/* free blocks in fs */
> > longf_bavail;   /* free blocks avail to non-superuser */
> > 
> > So, bfree represents all the reserved blocks, while bavail does the space 
> > only
> > visible to normal user.
> > 
> > Fix: 3e6d0b4d9c1c ("f2fs: fix incorrect f_bfree calculation in ->statfs")
> > Cc:  # 4.8+
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/super.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 7e29db227910..17cca4cb93af 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -955,8 +955,8 @@ static int f2fs_statfs(struct dentry *dentry, struct 
> > kstatfs *buf)
> > buf->f_bsize = sbi->blocksize;
> >  
> > buf->f_blocks = total_count - start_count;
> > -   buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
> > -   buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
> > +   buf->f_bfree = user_block_count - valid_user_blocks(sbi);
> > +   buf->f_bavail = user_block_count - valid_user_blocks(sbi) - ovp_count -
> 
>   set_cp(user_block_count, ((get_cp(free_segment_count) + 6 -
>   get_cp(overprov_segment_count)) * c.blks_per_seg));
> 
> As we calculated in mkfs, we have removed ovp_count in user_block_count, so
> shouldn't we add ovp_count for f_bfree here?

Hehe, I already dropped this. :P
It seems like something wrong with df in android.

> 
> Thanks,
> 
> > sbi->reserved_blocks;
> >  
> > avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
> > 


Re: [PATCH] f2fs: fix wrong bfree and bavail

2017-09-05 Thread Jaegeuk Kim
On 09/06, Chao Yu wrote:
> On 2017/9/6 5:06, Jaegeuk Kim wrote:
> > This patch fixes bavail and bfree finally.
> > 
> > longf_bfree;/* free blocks in fs */
> > longf_bavail;   /* free blocks avail to non-superuser */
> > 
> > So, bfree represents all the reserved blocks, while bavail does the space 
> > only
> > visible to normal user.
> > 
> > Fix: 3e6d0b4d9c1c ("f2fs: fix incorrect f_bfree calculation in ->statfs")
> > Cc:  # 4.8+
> > Signed-off-by: Jaegeuk Kim 
> > ---
> >  fs/f2fs/super.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> > 
> > diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> > index 7e29db227910..17cca4cb93af 100644
> > --- a/fs/f2fs/super.c
> > +++ b/fs/f2fs/super.c
> > @@ -955,8 +955,8 @@ static int f2fs_statfs(struct dentry *dentry, struct 
> > kstatfs *buf)
> > buf->f_bsize = sbi->blocksize;
> >  
> > buf->f_blocks = total_count - start_count;
> > -   buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
> > -   buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
> > +   buf->f_bfree = user_block_count - valid_user_blocks(sbi);
> > +   buf->f_bavail = user_block_count - valid_user_blocks(sbi) - ovp_count -
> 
>   set_cp(user_block_count, ((get_cp(free_segment_count) + 6 -
>   get_cp(overprov_segment_count)) * c.blks_per_seg));
> 
> As we calculated in mkfs, we have removed ovp_count in user_block_count, so
> shouldn't we add ovp_count for f_bfree here?

Hehe, I already dropped this. :P
It seems like something wrong with df in android.

> 
> Thanks,
> 
> > sbi->reserved_blocks;
> >  
> > avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
> > 


Re: [PATCH] f2fs: fix wrong bfree and bavail

2017-09-05 Thread Chao Yu
On 2017/9/6 5:06, Jaegeuk Kim wrote:
> This patch fixes bavail and bfree finally.
> 
> longf_bfree;/* free blocks in fs */
> longf_bavail;   /* free blocks avail to non-superuser */
> 
> So, bfree represents all the reserved blocks, while bavail does the space only
> visible to normal user.
> 
> Fix: 3e6d0b4d9c1c ("f2fs: fix incorrect f_bfree calculation in ->statfs")
> Cc:  # 4.8+
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/super.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 7e29db227910..17cca4cb93af 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -955,8 +955,8 @@ static int f2fs_statfs(struct dentry *dentry, struct 
> kstatfs *buf)
>   buf->f_bsize = sbi->blocksize;
>  
>   buf->f_blocks = total_count - start_count;
> - buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
> - buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
> + buf->f_bfree = user_block_count - valid_user_blocks(sbi);
> + buf->f_bavail = user_block_count - valid_user_blocks(sbi) - ovp_count -

set_cp(user_block_count, ((get_cp(free_segment_count) + 6 -
get_cp(overprov_segment_count)) * c.blks_per_seg));

As we calculated in mkfs, we have removed ovp_count in user_block_count, so
shouldn't we add ovp_count for f_bfree here?

Thanks,

>   sbi->reserved_blocks;
>  
>   avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
> 



Re: [PATCH] f2fs: fix wrong bfree and bavail

2017-09-05 Thread Chao Yu
On 2017/9/6 5:06, Jaegeuk Kim wrote:
> This patch fixes bavail and bfree finally.
> 
> longf_bfree;/* free blocks in fs */
> longf_bavail;   /* free blocks avail to non-superuser */
> 
> So, bfree represents all the reserved blocks, while bavail does the space only
> visible to normal user.
> 
> Fix: 3e6d0b4d9c1c ("f2fs: fix incorrect f_bfree calculation in ->statfs")
> Cc:  # 4.8+
> Signed-off-by: Jaegeuk Kim 
> ---
>  fs/f2fs/super.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/fs/f2fs/super.c b/fs/f2fs/super.c
> index 7e29db227910..17cca4cb93af 100644
> --- a/fs/f2fs/super.c
> +++ b/fs/f2fs/super.c
> @@ -955,8 +955,8 @@ static int f2fs_statfs(struct dentry *dentry, struct 
> kstatfs *buf)
>   buf->f_bsize = sbi->blocksize;
>  
>   buf->f_blocks = total_count - start_count;
> - buf->f_bfree = user_block_count - valid_user_blocks(sbi) + ovp_count;
> - buf->f_bavail = user_block_count - valid_user_blocks(sbi) -
> + buf->f_bfree = user_block_count - valid_user_blocks(sbi);
> + buf->f_bavail = user_block_count - valid_user_blocks(sbi) - ovp_count -

set_cp(user_block_count, ((get_cp(free_segment_count) + 6 -
get_cp(overprov_segment_count)) * c.blks_per_seg));

As we calculated in mkfs, we have removed ovp_count in user_block_count, so
shouldn't we add ovp_count for f_bfree here?

Thanks,

>   sbi->reserved_blocks;
>  
>   avail_node_count = sbi->total_node_count - F2FS_RESERVED_NODE_NUM;
> 



[PATCH v2] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Dennis Yang
In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
set, it indicates that this stripe_head is already in the raid5_plug_cb
list and release_stripe() would be called instead to drop a reference
count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
stripe_head and it will get queued into the raid5_plug_cb list.

Since break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST,
A stripe could be re-added to plug list while it is still on that list
in the following situation. If stripe_head A is added to another
stripe_head B's batch list, in this case A will have its
batch_head != NULL and be added into the plug list. After that,
stripe_head B gets handled and called break_stripe_batch_list() to
reset all the batched stripe_head(including A which is still on
the plug list)'s state and reset their batch_head to NULL.
Before the plug list gets processed, if there is another write request
comes in and get stripe_head A, A will have its batch_head == NULL
(cleared by calling break_stripe_batch_list() on B) and be added to
plug list once again.

Signed-off-by: Dennis Yang 
---
 drivers/md/raid5.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e92dd2d..faf3cfd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4611,7 +4611,8 @@ static void break_stripe_batch_list(struct stripe_head 
*head_sh,
 
set_mask_bits(>state, ~(STRIPE_EXPAND_SYNC_FLAGS |
(1 << STRIPE_PREREAD_ACTIVE) |
-   (1 << STRIPE_DEGRADED)),
+   (1 << STRIPE_DEGRADED) |
+   (1 << STRIPE_ON_UNPLUG_LIST)),
  head_sh->state & (1 << STRIPE_INSYNC));
 
sh->check_state = head_sh->check_state;
-- 
1.9.1



[PATCH v2] md/raid5: preserve STRIPE_ON_UNPLUG_LIST in break_stripe_batch_list

2017-09-05 Thread Dennis Yang
In release_stripe_plug(), if a stripe_head has its STRIPE_ON_UNPLUG_LIST
set, it indicates that this stripe_head is already in the raid5_plug_cb
list and release_stripe() would be called instead to drop a reference
count. Otherwise, the STRIPE_ON_UNPLUG_LIST bit would be set for this
stripe_head and it will get queued into the raid5_plug_cb list.

Since break_stripe_batch_list() did not preserve STRIPE_ON_UNPLUG_LIST,
A stripe could be re-added to plug list while it is still on that list
in the following situation. If stripe_head A is added to another
stripe_head B's batch list, in this case A will have its
batch_head != NULL and be added into the plug list. After that,
stripe_head B gets handled and called break_stripe_batch_list() to
reset all the batched stripe_head(including A which is still on
the plug list)'s state and reset their batch_head to NULL.
Before the plug list gets processed, if there is another write request
comes in and get stripe_head A, A will have its batch_head == NULL
(cleared by calling break_stripe_batch_list() on B) and be added to
plug list once again.

Signed-off-by: Dennis Yang 
---
 drivers/md/raid5.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index e92dd2d..faf3cfd 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -4611,7 +4611,8 @@ static void break_stripe_batch_list(struct stripe_head 
*head_sh,
 
set_mask_bits(>state, ~(STRIPE_EXPAND_SYNC_FLAGS |
(1 << STRIPE_PREREAD_ACTIVE) |
-   (1 << STRIPE_DEGRADED)),
+   (1 << STRIPE_DEGRADED) |
+   (1 << STRIPE_ON_UNPLUG_LIST)),
  head_sh->state & (1 << STRIPE_INSYNC));
 
sh->check_state = head_sh->check_state;
-- 
1.9.1



linux-next: manual merge of the devicetree tree with Linus' tree

2017-09-05 Thread Stephen Rothwell
Hi Rob,

Today's linux-next merge of the devicetree tree got a conflict in:

  Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt

between commit:

  ea6c3077678f ("dt-bindings: net: Remove duplicate NSP Ethernet MAC binding 
document")

from Linus' tree and commit:

  4da722ca19f3 ("dt-bindings: Remove "status" from examples")

from the devicetree tree.

I fixed it up (the former removed the file, so I did that) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.



-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the devicetree tree with Linus' tree

2017-09-05 Thread Stephen Rothwell
Hi Rob,

Today's linux-next merge of the devicetree tree got a conflict in:

  Documentation/devicetree/bindings/net/brcm,bgmac-nsp.txt

between commit:

  ea6c3077678f ("dt-bindings: net: Remove duplicate NSP Ethernet MAC binding 
document")

from Linus' tree and commit:

  4da722ca19f3 ("dt-bindings: Remove "status" from examples")

from the devicetree tree.

I fixed it up (the former removed the file, so I did that) and can
carry the fix as necessary. This is now fixed as far as linux-next is
concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.



-- 
Cheers,
Stephen Rothwell


linux-next: manual merge of the devicetree tree with the sound tree

2017-09-05 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the devicetree tree got a conflict in:

  Documentation/devicetree/bindings/sound/rockchip,pdm.txt

between commit:

  fce7358b54c6 ("ASoC: rockchip: separate pinctrl pins from each other")

from the sound tree and commit:

  4da722ca19f3 ("dt-bindings: Remove "status" from examples")

from the devicetree tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc Documentation/devicetree/bindings/sound/rockchip,pdm.txt
index 2ad66f649a28,beb32ca26e5d..
--- a/Documentation/devicetree/bindings/sound/rockchip,pdm.txt
+++ b/Documentation/devicetree/bindings/sound/rockchip,pdm.txt
@@@ -33,10 -34,5 +33,9 @@@ pdm: pdm@ff04 
 _sdi1
 _sdi2
 _sdi3>;
 -  pinctrl-1 = <_sleep>;
 +  pinctrl-1 = <_clk_sleep
 +   _sdi0_sleep
 +   _sdi1_sleep
 +   _sdi2_sleep
 +   _sdi3_sleep>;
-   status = "disabled";
  };


linux-next: manual merge of the devicetree tree with the sound tree

2017-09-05 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the devicetree tree got a conflict in:

  Documentation/devicetree/bindings/sound/rockchip,pdm.txt

between commit:

  fce7358b54c6 ("ASoC: rockchip: separate pinctrl pins from each other")

from the sound tree and commit:

  4da722ca19f3 ("dt-bindings: Remove "status" from examples")

from the devicetree tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc Documentation/devicetree/bindings/sound/rockchip,pdm.txt
index 2ad66f649a28,beb32ca26e5d..
--- a/Documentation/devicetree/bindings/sound/rockchip,pdm.txt
+++ b/Documentation/devicetree/bindings/sound/rockchip,pdm.txt
@@@ -33,10 -34,5 +33,9 @@@ pdm: pdm@ff04 
 _sdi1
 _sdi2
 _sdi3>;
 -  pinctrl-1 = <_sleep>;
 +  pinctrl-1 = <_clk_sleep
 +   _sdi0_sleep
 +   _sdi1_sleep
 +   _sdi2_sleep
 +   _sdi3_sleep>;
-   status = "disabled";
  };


  1   2   3   4   5   6   7   8   9   10   >