Re: [PATCH v4 5/6] bus: fsl-mc: supoprt dma configure for devices on fsl-mc bus
Hi Nipun, Thank you for the patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.17-rc3 next-20180430] [cannot apply to iommu/next glikely/devicetree/next] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Nipun-Gupta/Support-for-fsl-mc-bus-and-its-devices-in-SMMU/20180501-125745 config: x86_64-allmodconfig (attached as .config) compiler: gcc-7 (Debian 7.3.0-16) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=x86_64 All errors (new ones prefixed by >>): drivers/bus/fsl-mc/fsl-mc-bus.c: In function 'fsl_mc_dma_configure': >> drivers/bus/fsl-mc/fsl-mc-bus.c:137:9: error: too many arguments to function >> 'of_dma_configure' return of_dma_configure(dev, dma_dev->of_node, 0); ^~~~ In file included from drivers/bus/fsl-mc/fsl-mc-bus.c:13:0: include/linux/of_device.h:58:5: note: declared here int of_dma_configure(struct device *dev, struct device_node *np); ^~~~ drivers/bus/fsl-mc/fsl-mc-bus.c: At top level: >> drivers/bus/fsl-mc/fsl-mc-bus.c:161:3: error: 'struct bus_type' has no >> member named 'dma_configure' .dma_configure = fsl_mc_dma_configure, ^ vim +/of_dma_configure +137 drivers/bus/fsl-mc/fsl-mc-bus.c 129 130 static int fsl_mc_dma_configure(struct device *dev) 131 { 132 struct device *dma_dev = dev; 133 134 while (dev_is_fsl_mc(dma_dev)) 135 dma_dev = dma_dev->parent; 136 > 137 return of_dma_configure(dev, dma_dev->of_node, 0); 138 } 139 140 static ssize_t modalias_show(struct device *dev, struct device_attribute *attr, 141 char *buf) 142 { 143 struct fsl_mc_device *mc_dev = to_fsl_mc_device(dev); 144 145 return sprintf(buf, "fsl-mc:v%08Xd%s\n", mc_dev->obj_desc.vendor, 146 mc_dev->obj_desc.type); 147 } 148 static DEVICE_ATTR_RO(modalias); 149 150 static struct attribute *fsl_mc_dev_attrs[] = { 151 &dev_attr_modalias.attr, 152 NULL, 153 }; 154 155 ATTRIBUTE_GROUPS(fsl_mc_dev); 156 157 struct 
bus_type fsl_mc_bus_type = { 158 .name = "fsl-mc", 159 .match = fsl_mc_bus_match, 160 .uevent = fsl_mc_bus_uevent, > 161 .dma_configure = fsl_mc_dma_configure, 162 .dev_groups = fsl_mc_dev_groups, 163 }; 164 EXPORT_SYMBOL_GPL(fsl_mc_bus_type); 165 --- 0-DAY kernel test infrastructure Open Source Technology Center https://lists.01.org/pipermail/kbuild-all Intel Corporation
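The parent-walk logic quoted in the build report can be exercised outside the kernel. The sketch below uses hypothetical minimal stand-ins for `struct device` and `dev_is_fsl_mc()` (not the kernel definitions) to show what the loop in `fsl_mc_dma_configure()` computes: the first ancestor that is not on the fsl-mc bus, whose `of_node` is then used for DMA configuration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the kernel's struct device. */
struct device {
	struct device *parent;
	bool is_fsl_mc;		/* stand-in for dev_is_fsl_mc() */
};

static bool dev_is_fsl_mc(const struct device *dev)
{
	return dev->is_fsl_mc;
}

/*
 * Mirror of the loop in fsl_mc_dma_configure(): walk up the device
 * tree until we leave the fsl-mc bus.
 */
static struct device *find_dma_parent(struct device *dev)
{
	struct device *dma_dev = dev;

	while (dev_is_fsl_mc(dma_dev))
		dma_dev = dma_dev->parent;

	return dma_dev;
}
```

The returned device is the one whose firmware node carries the DMA/IOMMU properties, which is why the patch passes `dma_dev->of_node` rather than `dev->of_node`.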
linux-next: Tree for May 1
Hi all, Changes since 20180430: The rdma tree gained a conflict against the rdma-fixes tree. Non-merge commits (relative to Linus' tree): 3348 3188 files changed, 126791 insertions(+), 59113 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 258 trees (counting Linus' and 44 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. 
-- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (8188fc8bef8c Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc) Merging fixes/master (147a89bc71e7 Merge tag 'kconfig-v4.17' of git://git.kernel.org/pub/scm/linux/kernel/git/masahiroy/linux-kbuild) Merging kbuild-current/fixes (6d08b06e67cd Linux 4.17-rc2) Merging arc-current/for-curr (661e50bc8532 Linux 4.16-rc4) Merging arm-current/fixes (30cfae461581 ARM: replace unnecessary perl with sed and the shell $(( )) operator) Merging arm64-fixes/for-next/fixes (3789c122d0a0 arm64: avoid instrumenting atomic_ll_sc.o) Merging m68k-current/for-linus (ecd685580c8f m68k/mac: Remove bogus "FIXME" comment) Merging powerpc-fixes/fixes (b2d7ecbe3556 powerpc/kvm/booke: Fix altivec related build break) Merging sparc/master (00ad691ab140 sparc: vio: use put_device() instead of kfree()) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (f944ad1b2b66 net: ethernet: ucc: fix spelling mistake: "tx-late-collsion" -> "tx-late-collision") Merging bpf/master (815425567dea bpf: fix uninitialized variable in bpf tools) Merging ipsec/master (b4331a681822 vti6: Change minimum MTU to IPV4_MIN_MTU, vti6 can carry IPv4 too) Merging netfilter/master (2f99aa31cd7a netfilter: nf_tables: skip synchronize_rcu if transaction log is empty) Merging ipvs/master (765cca91b895 netfilter: conntrack: include kmemleak.h for kmemleak_not_leak()) Merging wireless-drivers/master (af8a41cccf8f rtlwifi: cleanup 8723be ant_sel definition) Merging mac80211/master (2f0605a697f4 nl80211: Free connkeys on external authentication failure) Merging rdma-fixes/for-rc (db82476f3741 IB/core: Make ib_mad_client_id atomic) Merging sound-current/for-linus (76b3421b39bd ALSA: aloop: Add missing cable lock to ctl API callbacks) Merging pci-current/for-linus (0cf22d6b317c PCI: Add "PCIe" to pcie_print_link_status() messages) Merging driver-core.current/driver-core-linus (6da6c0db5316 
Linux v4.17-rc3) Merging tty.current/tty-linus (6da6c0db5316 Linux v4.17-rc3) Merging usb.current/usb-linus (9aea9b6cc78d usb: musb: trace: fix NULL pointer dereference in musb_g_tx()) Merging usb-gadget-fixes/fixes (ed769520727e usb: gadget: composite Allow for larger configuration descriptors) Merging usb-serial-fixes/usb-linus (470b5d6f0cf4 USB: serial: ftdi_sio: use jtag quirk for Arrow USB Blaster) Merging usb-chipidea-fixes/ci-for-usb-stable (964728f9f407 USB: chipidea: msm: fix ulpi-node lookup) Merging phy/fixes (60cc43fc8884 Linux 4.17-rc1) Merging staging.current/staging-linus (6da6c0db5316 Linux v4.17-rc3) Merging char-misc.current/char-misc-linus (6da6c0db5316 Linux v4.17-rc3) Merging input-current/for-linus (596ea7aad431 MAINTAINERS: Rakesh Iyer can't be reached anymore) Merging crypto-current/master (eea0d3ea7546 crypto: drbg - set freed buffers to NULL) Merging ide/master (8e44e6600caa Merge branch 'KASAN-read_word_at_a_time') Mer
RE: [PATCH 2/2] Use bit-wise majority to recover the contents of ONFI parameter
Hi Miquèl, Thank you for your response and feedback. I've modified the fix based on your comments. Please see the updated patch file at the end of this message (also in attachment). My answers to your comments/questions are inline in the previous message. The new patch is rebased on top of v4.17-rc1. Best regards, Jane Updated patch: From e14ed7dc08296a52f81d14781dee2f455dd90bbd Mon Sep 17 00:00:00 2001 From: Jane Wan Date: Mon, 30 Apr 2018 14:05:40 -0700 Subject: [PATCH 2/2] mtd: rawnand: fsl_ifc: use bit-wise majority to recover the contents of ONFI parameter Per ONFI specification (Rev. 4.0), if all parameter pages have invalid CRC values, the bit-wise majority may be used to recover the contents of the parameter pages from the parameter page copies present. Signed-off-by: Jane Wan --- drivers/mtd/nand/raw/nand_base.c | 36 ++-- 1 file changed, 30 insertions(+), 6 deletions(-) diff --git a/drivers/mtd/nand/raw/nand_base.c b/drivers/mtd/nand/raw/nand_base.c index 72f3a89..464c4fb 100644 --- a/drivers/mtd/nand/raw/nand_base.c +++ b/drivers/mtd/nand/raw/nand_base.c @@ -5086,6 +5086,8 @@ static int nand_flash_detect_ext_param_page(struct nand_chip *chip, return ret; } +#define GET_BIT(bit, val) (((val) >> (bit)) & 0x01) + /* * Check if the NAND chip is ONFI compliant, returns 1 if it is, 0 otherwise. 
*/ @@ -5094,7 +5096,8 @@ static int nand_flash_detect_onfi(struct nand_chip *chip) struct mtd_info *mtd = nand_to_mtd(chip); struct nand_onfi_params *p; char id[4]; - int i, ret, val; + int i, ret, val, pagesize; + u8 *buf; /* Try ONFI for unknown chip or LP */ ret = nand_readid_op(chip, 0x20, id, sizeof(id)); @@ -5102,8 +5105,9 @@ static int nand_flash_detect_onfi(struct nand_chip *chip) return 0; /* ONFI chip: allocate a buffer to hold its parameter page */ - p = kzalloc(sizeof(*p), GFP_KERNEL); - if (!p) + pagesize = sizeof(*p); + buf = kzalloc((pagesize * 3), GFP_KERNEL); + if (!buf) return -ENOMEM; ret = nand_read_param_page_op(chip, 0, NULL, 0); @@ -5113,7 +5117,8 @@ static int nand_flash_detect_onfi(struct nand_chip *chip) } for (i = 0; i < 3; i++) { - ret = nand_read_data_op(chip, p, sizeof(*p), true); + p = (struct nand_onfi_params *)&buf[i*pagesize]; + ret = nand_read_data_op(chip, p, pagesize, true); if (ret) { ret = 0; goto free_onfi_param_page; @@ -5126,8 +5131,27 @@ static int nand_flash_detect_onfi(struct nand_chip *chip) } if (i == 3) { - pr_err("Could not find valid ONFI parameter page; aborting\n"); - goto free_onfi_param_page; + int j, k, l; + u8 v, m; + + pr_err("Could not find valid ONFI parameter page\n"); + pr_info("Recover ONFI params with bit-wise majority\n"); + for (j = 0; j < pagesize; j++) { + v = 0; + for (k = 0; k < 8; k++) { + m = 0; + for (l = 0; l < 3; l++) + m += GET_BIT(k, buf[l*pagesize + j]); + if (m > 1) + v |= BIT(k); + } + ((u8 *)p)[j] = v; + } + if (onfi_crc16(ONFI_CRC_BASE, (uint8_t *)p, 254) != + le16_to_cpu(p->crc)) { + pr_err("ONFI parameter recovery failed, aborting\n"); + goto free_onfi_param_page; + } } /* Check version */ -- 1.7.9.5 -Original Message- From: Miquel Raynal [mailto:miquel.ray...@bootlin.com] Sent: Saturday, April 28, 2018 5:07 AM To: Wan, Jane (Nokia - US/Sunnyvale) Cc: dw...@infradead.org; computersforpe...@gmail.com; Bos, Ties (Nokia - US/Sunnyvale) ; linux-...@lists.infradead.org; 
linux-kernel@vger.kernel.org; Boris Brezillon Subject: Re: [PATCH 2/2] Use bit-wise majority to recover the contents of ONFI parameter Hi Jane, Same comments as before, please: get the right maintainers, add a commit log, rebase and fix the title prefix. [Jane] Added. Thanks. Have you ever needed/tried this algorithm before? [Jane] Yes, we got a batch of particularly bad NAND chips recently and we needed these changes to make them work reliably over temperature. The patch was verified using these bad chips. On Thu, 26 Apr 2018 17:19:56 -0700, Jane Wan wrote: > Signed-off-by: Jane Wan > --- > drivers/mtd/nand/nand_base.c | 35 +++ > 1 file changed, 31 insertions(+), 4 deletions(-) > > diff --git a/drivers/mtd/nand/nand_base.c > b/drivers/mtd/nand/nand_base.c index c2e1232..161b523
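The recovery loop in the patch can be exercised as ordinary user-space C. The sketch below reproduces the per-bit majority vote over three parameter-page copies, using the same `GET_BIT` macro and the same back-to-back buffer layout (`copies[l*pagesize + j]`) as the patch; the function name `bitwise_majority` and the standalone framing are mine, not the kernel code.

```c
#include <stdint.h>
#include <stddef.h>

#define GET_BIT(bit, val) (((val) >> (bit)) & 0x01)

/*
 * Recover one parameter page of `pagesize` bytes from 3 copies by
 * per-bit majority vote, as ONFI 4.0 permits when every copy fails
 * its CRC check. `copies` holds the 3 copies back to back.
 */
static void bitwise_majority(const uint8_t *copies, size_t pagesize,
			     uint8_t *out)
{
	for (size_t j = 0; j < pagesize; j++) {
		uint8_t v = 0;

		for (int k = 0; k < 8; k++) {
			int m = 0;

			/* Count how many of the 3 copies have bit k set. */
			for (int l = 0; l < 3; l++)
				m += GET_BIT(k, copies[l * pagesize + j]);
			if (m > 1)	/* at least 2 of 3 agree: bit is 1 */
				v |= (uint8_t)(1U << k);
		}
		out[j] = v;
	}
}
```

Because the vote is taken bit by bit, each copy may carry errors in different positions and the page is still recovered, as long as no single bit is wrong in two copies at once; the recovered page is then re-checked against its CRC, exactly as the patch does.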
[PATCH v3 2/2] nvmem: Add RAVE SP EEPROM driver
Add driver providing access to EEPROMs connected to RAVE SP devices Cc: Srinivas Kandagatla Cc: linux-kernel@vger.kernel.org Cc: Chris Healy Cc: Lucas Stach Cc: Aleksander Morgado Signed-off-by: Andrey Smirnov --- drivers/nvmem/Kconfig | 6 + drivers/nvmem/Makefile | 3 + drivers/nvmem/rave-sp-eeprom.c | 357 + 3 files changed, 366 insertions(+) create mode 100644 drivers/nvmem/rave-sp-eeprom.c diff --git a/drivers/nvmem/Kconfig b/drivers/nvmem/Kconfig index 1090924efdb1..54a3c298247b 100644 --- a/drivers/nvmem/Kconfig +++ b/drivers/nvmem/Kconfig @@ -175,4 +175,10 @@ config NVMEM_SNVS_LPGPR This driver can also be built as a module. If so, the module will be called nvmem-snvs-lpgpr. +config RAVE_SP_EEPROM + tristate "Rave SP EEPROM Support" + depends on RAVE_SP_CORE + help + Say y here to enable Rave SP EEPROM support. + endif diff --git a/drivers/nvmem/Makefile b/drivers/nvmem/Makefile index e54dcfa6565a..27e96a8efd1c 100644 --- a/drivers/nvmem/Makefile +++ b/drivers/nvmem/Makefile @@ -37,3 +37,6 @@ obj-$(CONFIG_MESON_MX_EFUSE) += nvmem_meson_mx_efuse.o nvmem_meson_mx_efuse-y := meson-mx-efuse.o obj-$(CONFIG_NVMEM_SNVS_LPGPR) += nvmem_snvs_lpgpr.o nvmem_snvs_lpgpr-y := snvs_lpgpr.o +obj-$(CONFIG_RAVE_SP_EEPROM) += nvmem-rave-sp-eeprom.o +nvmem-rave-sp-eeprom-y := rave-sp-eeprom.o + diff --git a/drivers/nvmem/rave-sp-eeprom.c b/drivers/nvmem/rave-sp-eeprom.c new file mode 100644 index ..50aeea6ec6cc --- /dev/null +++ b/drivers/nvmem/rave-sp-eeprom.c @@ -0,0 +1,357 @@ +// SPDX-License-Identifier: GPL-2.0+ + +/* + * EEPROM driver for RAVE SP + * + * Copyright (C) 2018 Zodiac Inflight Innovations + * + */ +#include +#include +#include +#include +#include +#include +#include + +/** + * enum rave_sp_eeprom_access_type - Supported types of EEPROM access + * + * @RAVE_SP_EEPROM_WRITE: EEPROM write + * @RAVE_SP_EEPROM_READ: EEPROM read + */ +enum rave_sp_eeprom_access_type { + RAVE_SP_EEPROM_WRITE = 0, + RAVE_SP_EEPROM_READ = 1, +}; + +/** + * enum rave_sp_eeprom_header_size 
- EEPROM command header sizes + * + * @RAVE_SP_EEPROM_HEADER_SMALL: EEPROM header size for "small" devices (< 8K) + * @RAVE_SP_EEPROM_HEADER_BIG: EEPROM header size for "big" devices (> 8K) + */ +enum rave_sp_eeprom_header_size { + RAVE_SP_EEPROM_HEADER_SMALL = 4U, + RAVE_SP_EEPROM_HEADER_BIG = 5U, +}; + +#define RAVE_SP_EEPROM_PAGE_SIZE 32U + +/** + * struct rave_sp_eeprom_page - RAVE SP EEPROM page + * + * @type: Access type (see enum rave_sp_eeprom_access_type) + * @success: Success flag (Success = 1, Failure = 0) + * @data: Read data + * + * Note this structure corresponds to RSP_*_EEPROM payload from RAVE + * SP ICD + */ +struct rave_sp_eeprom_page { + u8 type; + u8 success; + u8 data[RAVE_SP_EEPROM_PAGE_SIZE]; +} __packed; + +/** + * struct rave_sp_eeprom - RAVE SP EEPROM device + * + * @sp: Pointer to parent RAVE SP device + * @mutex: Lock protecting access to EEPROM + * @address: EEPROM device address + * @header_size: Size of EEPROM command header for this device + * @dev: Pointer to corresponding struct device used for logging + */ +struct rave_sp_eeprom { + struct rave_sp *sp; + struct mutex mutex; + u8 address; + unsigned int header_size; + struct device *dev; +}; + +/** + * rave_sp_eeprom_io - Low-level part of EEPROM page access + * + * @eeprom: EEPROM device to write to + * @type: EEPROM access type (read or write) + * @idx: number of the EEPROM page + * @page: Data to write or buffer to store result (via page->data) + * + * This function does all of the low-level work required to perform a + * EEPROM access. This includes formatting correct command payload, + * sending it and checking received results. + * + * Returns zero in case of success or negative error code in + * case of failure. + */ +static int rave_sp_eeprom_io(struct rave_sp_eeprom *eeprom, +enum rave_sp_eeprom_access_type type, +u16 idx, +struct rave_sp_eeprom_page *page) +{ + const bool is_write = type == RAVE_SP_EEPROM_WRITE; + const unsigned int data_size = is_write ? 
sizeof(page->data) : 0; + const unsigned int cmd_size = eeprom->header_size + data_size; + const unsigned int rsp_size = + is_write ? sizeof(*page) - sizeof(page->data) : sizeof(*page); + unsigned int offset = 0; + u8 cmd[cmd_size]; + int ret; + + cmd[offset++] = eeprom->address; + cmd[offset++] = 0; + cmd[offset++] = type; + cmd[offset++] = idx; + + /* +* If there's still room in this command's header it means we +* are talking to EEPROM that uses
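The size arithmetic in rave_sp_eeprom_io() can be checked in isolation. The sketch below re-derives cmd_size and rsp_size from the constants in the patch (page size 32, header sizes 4 and 5); the struct mirrors rave_sp_eeprom_page, but this is an illustrative user-space model, not the kernel code, and `__attribute__((packed))` stands in for the kernel's `__packed`.

```c
#include <stdint.h>
#include <stddef.h>

#define RAVE_SP_EEPROM_PAGE_SIZE 32U

/* Header sizes from the patch: 4 bytes for "small" EEPROMs, 5 for "big". */
enum { HEADER_SMALL = 4U, HEADER_BIG = 5U };

struct rave_sp_eeprom_page {
	uint8_t type;
	uint8_t success;
	uint8_t data[RAVE_SP_EEPROM_PAGE_SIZE];
} __attribute__((packed));

/* Command size: header, plus a page of payload only on writes. */
static size_t cmd_size(size_t header_size, int is_write)
{
	size_t data_size = is_write ? RAVE_SP_EEPROM_PAGE_SIZE : 0;

	return header_size + data_size;
}

/* Response size: full page on reads; on writes only type + success. */
static size_t rsp_size(int is_write)
{
	struct rave_sp_eeprom_page page;

	return is_write ? sizeof(page) - sizeof(page.data) : sizeof(page);
}
```

The asymmetry matches the protocol: a write carries the page data in the command and gets back only a status, while a read sends a bare header and gets the page data back in the response.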
[PATCH v3 0/2] RAVE SP EEPROM driver
Srinivas: This series is a third iteration of the patchset adding NVMEM support for EEPROMs connected to RAVE SP MFD device (support for which landed in 4.15). Changes since [v2]: - Added verbiage about data cells, fixed capital case hex number as well as lack of address in node name in zii,rave-sp-eeprom.txt - Added optional DT property "zii,eeprom-name", to allow giving more descriptive names to NVMEM devices created by the driver Changes since [v1]: - Patchset rebased on latest master from Linus which contains all necessary dependencies - Added sizes.h to include in patch 1/2 to avoid build breaks reported by build-bot - Added missing #size-cells, #address-cells as well as example cell to DT bindings documentation (pointed out by Rob) Feedback is welcome! Thanks, Andrey Smirnov [v2] lkml.kernel.org/r/20180411015948.19562-1-andrew.smir...@gmail.com [v1] lkml.kernel.org/r/20180321134710.29757-1-andrew.smir...@gmail.com Andrey Smirnov (2): dt-bindings: nvmem: Add binding for RAVE SP EEPROM driver nvmem: Add RAVE SP EEPROM driver .../bindings/nvmem/zii,rave-sp-eeprom.txt | 40 +++ drivers/nvmem/Kconfig | 6 + drivers/nvmem/Makefile | 3 + drivers/nvmem/rave-sp-eeprom.c | 357 + 4 files changed, 406 insertions(+) create mode 100644 Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt create mode 100644 drivers/nvmem/rave-sp-eeprom.c -- 2.14.3
[PATCH v3 1/2] dt-bindings: nvmem: Add binding for RAVE SP EEPROM driver
Add Device Tree bindings for RAVE SP EEPROM driver - an MFD cell of parent RAVE SP driver (documented in Documentation/devicetree/bindings/mfd/zii,rave-sp.txt). Cc: Srinivas KandagatlaCc: linux-kernel@vger.kernel.org Cc: Chris Healy Cc: Lucas Stach Cc: Aleksander Morgado Cc: Rob Herring Cc: Mark Rutland Cc: devicet...@vger.kernel.org Signed-off-by: Andrey Smirnov --- .../bindings/nvmem/zii,rave-sp-eeprom.txt | 40 ++ 1 file changed, 40 insertions(+) create mode 100644 Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt diff --git a/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt b/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt new file mode 100644 index ..d5e22fc67d66 --- /dev/null +++ b/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt @@ -0,0 +1,40 @@ +Zodiac Inflight Innovations RAVE EEPROM Bindings + +RAVE SP EEPROM device is a "MFD cell" device exposing physical EEPROM +attached to RAVE Supervisory Processor. It is expected that its Device +Tree node is specified as a child of the node corresponding to the +parent RAVE SP device (as documented in +Documentation/devicetree/bindings/mfd/zii,rave-sp.txt) + +Required properties: + +- compatible: Should be "zii,rave-sp-eeprom" + +Optional properties: + +- zii,eeprom-name: Unique EEPROM identifier describing its function in the + system. Will be used as created NVMEM deivce's name. + +Data cells: + +Data cells are child nodes of eerpom node, bindings for which are +documented in Documentation/bindings/nvmem/nvmem.txt + +Example: + + rave-sp { + compatible = "zii,rave-sp-rdu1"; + current-speed = <38400>; + + eeprom@a4 { + compatible = "zii,rave-sp-eeprom"; + reg = <0xa4 0x4000>; + #address-cells = <1>; + #size-cells = <1>; + zii,eeprom-name = "main-eeprom"; + + wdt_timeout: wdt-timeout@81 { + reg = <0x81 2>; + }; + }; + } -- 2.14.3
[PATCH v3 0/2] RAVE SP EEPROM driver
Srinivas: This series is a third iteration of the patchset adding NVMEM support for EEPROMs connected to RAVE SP MFD device (support for which landed in 4.15). Changes since [v2]: - Added verbiage about data cells, fixed capital-case hex number as well as lack of address in node name in zii,rave-sp-eeprom.txt - Added optional DT property "zii,eeprom-name", to allow giving more descriptive names to NVMEM devices created by the driver Changes since [v1]: - Patchset rebased on latest master from Linus which contains all necessary dependencies - Added sizes.h to include in patch 1/2 to avoid build breaks reported by build-bot - Added missing #size-cells, #address-cells as well as example cell to DT bindings documentation (pointed out by Rob) Feedback is welcome! Thanks, Andrey Smirnov [v2] lkml.kernel.org/r/20180411015948.19562-1-andrew.smir...@gmail.com [v1] lkml.kernel.org/r/20180321134710.29757-1-andrew.smir...@gmail.com Andrey Smirnov (2): dt-bindings: nvmem: Add binding for RAVE SP EEPROM driver nvmem: Add RAVE SP EEPROM driver .../bindings/nvmem/zii,rave-sp-eeprom.txt | 40 +++ drivers/nvmem/Kconfig | 6 + drivers/nvmem/Makefile | 3 + drivers/nvmem/rave-sp-eeprom.c | 357 + 4 files changed, 406 insertions(+) create mode 100644 Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt create mode 100644 drivers/nvmem/rave-sp-eeprom.c -- 2.14.3
[PATCH v3 1/2] dt-bindings: nvmem: Add binding for RAVE SP EEPROM driver
Add Device Tree bindings for RAVE SP EEPROM driver - an MFD cell of parent RAVE SP driver (documented in Documentation/devicetree/bindings/mfd/zii,rave-sp.txt). Cc: Srinivas Kandagatla Cc: linux-kernel@vger.kernel.org Cc: Chris Healy Cc: Lucas Stach Cc: Aleksander Morgado Cc: Rob Herring Cc: Mark Rutland Cc: devicet...@vger.kernel.org Signed-off-by: Andrey Smirnov --- .../bindings/nvmem/zii,rave-sp-eeprom.txt | 40 ++ 1 file changed, 40 insertions(+) create mode 100644 Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt diff --git a/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt b/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt new file mode 100644 index ..d5e22fc67d66 --- /dev/null +++ b/Documentation/devicetree/bindings/nvmem/zii,rave-sp-eeprom.txt @@ -0,0 +1,40 @@ +Zodiac Inflight Innovations RAVE EEPROM Bindings + +RAVE SP EEPROM device is a "MFD cell" device exposing physical EEPROM +attached to RAVE Supervisory Processor. It is expected that its Device +Tree node is specified as a child of the node corresponding to the +parent RAVE SP device (as documented in +Documentation/devicetree/bindings/mfd/zii,rave-sp.txt) + +Required properties: + +- compatible: Should be "zii,rave-sp-eeprom" + +Optional properties: + +- zii,eeprom-name: Unique EEPROM identifier describing its function in the + system. Will be used as created NVMEM device's name. + +Data cells: + +Data cells are child nodes of eeprom node, bindings for which are +documented in Documentation/bindings/nvmem/nvmem.txt + +Example: + + rave-sp { + compatible = "zii,rave-sp-rdu1"; + current-speed = <38400>; + + eeprom@a4 { + compatible = "zii,rave-sp-eeprom"; + reg = <0xa4 0x4000>; + #address-cells = <1>; + #size-cells = <1>; + zii,eeprom-name = "main-eeprom"; + + wdt_timeout: wdt-timeout@81 { + reg = <0x81 2>; + }; + }; + } -- 2.14.3
Re: [PATCH v2 2/2] mm: vmalloc: Pass proper vm_start into debugobjects
On 5/1/2018 4:34 AM, Andrew Morton wrote: should check for it and do a WARN_ONCE so it gets fixed. Yes, that idea came up in discussion, but it was suggested to me that it could be intentional. But since you are raising this, I will try to dig once again and share a patch with WARN_ONCE if passing the intermediate 'addr' is absolutely not the right thing to do. Chintan -- Qualcomm India Private Limited, on behalf of Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum, a Linux Foundation Collaborative Project
[PATCH v1] clk: qcom: gdsc: Add support to poll CFG register to check GDSC state
From: Amit Nischal The default behavior of the GDSC enable/disable sequence is to poll the status bits of either the actual GDSCR or the corresponding HW_CTRL registers. On targets which have support for a CFG_GDSCR register, the status bits might not show the correct state of the GDSC, especially in the disable sequence, where the status bit will be cleared even before the core is completely power collapsed. On targets with this issue, poll the power on/off bits in the CFG_GDSCR register instead to correctly determine the GDSC state. Signed-off-by: Amit Nischal Signed-off-by: Taniya Das --- drivers/clk/qcom/gdsc.c | 42 ++ drivers/clk/qcom/gdsc.h | 1 + 2 files changed, 27 insertions(+), 16 deletions(-) diff --git a/drivers/clk/qcom/gdsc.c b/drivers/clk/qcom/gdsc.c index cb61c15..2a6b0ff 100644 --- a/drivers/clk/qcom/gdsc.c +++ b/drivers/clk/qcom/gdsc.c @@ -33,6 +33,11 @@ #define GMEM_CLAMP_IO_MASK BIT(0) #define GMEM_RESET_MASKBIT(4) +/* CFG_GDSCR */ +#define GDSC_POWER_UP_COMPLETE BIT(16) +#define GDSC_POWER_DOWN_COMPLETE BIT(15) +#define CFG_GDSCR_OFFSET 0x4 + /* Wait 2^n CXO cycles between all states. Here, n=2 (4 cycles). */ #define EN_REST_WAIT_VAL (0x2 << 20) #define EN_FEW_WAIT_VAL(0x8 << 16) @@ -45,15 +50,28 @@ #define domain_to_gdsc(domain) container_of(domain, struct gdsc, pd) -static int gdsc_is_enabled(struct gdsc *sc, unsigned int reg) +static int gdsc_is_enabled(struct gdsc *sc, bool en) { + unsigned int reg; u32 val; int ret; + if (sc->flags & POLL_CFG_GDSCR) + reg = sc->gdscr + CFG_GDSCR_OFFSET; + else + reg = sc->gds_hw_ctrl ? 
sc->gds_hw_ctrl : sc->gdscr; + ret = regmap_read(sc->regmap, reg, &val); if (ret) return ret; + if (sc->flags & POLL_CFG_GDSCR) { + if (en) + return !!(val & GDSC_POWER_UP_COMPLETE); + else + return !(val & GDSC_POWER_DOWN_COMPLETE); + } + return !!(val & PWR_ON_MASK); } @@ -64,17 +82,17 @@ static int gdsc_hwctrl(struct gdsc *sc, bool en) return regmap_update_bits(sc->regmap, sc->gdscr, HW_CONTROL_MASK, val); } -static int gdsc_poll_status(struct gdsc *sc, unsigned int reg, bool en) +static int gdsc_poll_status(struct gdsc *sc, bool en) { ktime_t start; start = ktime_get(); do { - if (gdsc_is_enabled(sc, reg) == en) + if (gdsc_is_enabled(sc, en) == en) return 0; } while (ktime_us_delta(ktime_get(), start) < TIMEOUT_US); - if (gdsc_is_enabled(sc, reg) == en) + if (gdsc_is_enabled(sc, en) == en) return 0; return -ETIMEDOUT; @@ -84,7 +102,6 @@ static int gdsc_toggle_logic(struct gdsc *sc, bool en) { int ret; u32 val = en ? 0 : SW_COLLAPSE_MASK; - unsigned int status_reg = sc->gdscr; ret = regmap_update_bits(sc->regmap, sc->gdscr, SW_COLLAPSE_MASK, val); if (ret) @@ -101,8 +118,7 @@ static int gdsc_toggle_logic(struct gdsc *sc, bool en) return 0; } - if (sc->gds_hw_ctrl) { - status_reg = sc->gds_hw_ctrl; + if (sc->gds_hw_ctrl) /* * The gds hw controller asserts/de-asserts the status bit soon * after it receives a power on/off request from a master. @@ -114,9 +130,8 @@ static int gdsc_toggle_logic(struct gdsc *sc, bool en) * and polling the status bit. */ udelay(1); - } - return gdsc_poll_status(sc, status_reg, en); + return gdsc_poll_status(sc, en); } static inline int gdsc_deassert_reset(struct gdsc *sc) @@ -240,8 +255,6 @@ static int gdsc_disable(struct generic_pm_domain *domain) /* Turn off HW trigger mode if supported */ if (sc->flags & HW_CTRL) { - unsigned int reg; - ret = gdsc_hwctrl(sc, false); if (ret < 0) return ret; @@ -253,8 +266,7 @@ static int gdsc_disable(struct generic_pm_domain *domain) */ udelay(1); - reg = sc->gds_hw_ctrl ?
sc->gds_hw_ctrl : sc->gdscr; - ret = gdsc_poll_status(sc, reg, true); + ret = gdsc_poll_status(sc, true); if (ret) return ret; } @@ -276,7 +288,6 @@ static int gdsc_init(struct gdsc *sc) { u32 mask, val; int on, ret; - unsigned int reg; /* * Disable HW trigger: collapse/restore occur based on registers writes. @@ -297,8 +308,7 @@ static int gdsc_init(struct gdsc *sc) return ret; } - reg = sc->gds_hw_ctrl ? sc->gds_hw_ctrl : sc->gdscr; - on = gdsc_is_enabled(sc, reg); + on =
Re: [PATCH] clk: qcom: gdsc: Add support to poll CFG register to check GDSC state
Hello Doug, Thanks for the comments, I have based my latest patch on top of the earlier patches (clk-qcom-sdm845 branch of clk-next). On 5/1/2018 12:12 AM, Doug Anderson wrote: Hi, On Fri, Apr 27, 2018 at 1:19 AM, Taniya Das wrote: -static int gdsc_is_enabled(struct gdsc *sc, unsigned int reg) +static int gdsc_is_enabled(struct gdsc *sc, bool en) { + unsigned int reg; u32 val; int ret; + if (sc->flags & POLL_CFG_GDSCR) + reg = sc->gdscr + CFG_GDSCR_OFFSET; + else + reg = sc->gds_hw_ctrl ? sc->gds_hw_ctrl : sc->gdscr; nit: why two spaces after the "?" in this new patch? Should be just one. diff --git a/drivers/clk/qcom/gdsc.h b/drivers/clk/qcom/gdsc.h index 3964834..ac5f844 100644 --- a/drivers/clk/qcom/gdsc.h +++ b/drivers/clk/qcom/gdsc.h @@ -53,6 +53,7 @@ struct gdsc { #define VOTABLEBIT(0) #define CLAMP_IO BIT(1) #define HW_CTRLBIT(2) +#define POLL_CFG_GDSCR BIT(5) This doesn't apply cleanly to clk-next because clk-next already has the old patch #1 and patch #2 from your series. You should have applied your patch to clk-next before sending out. Also a nit here is that you have two spaces before "BIT(5)" but all other entries in this list have a tab before them. You should be consistent and use a tab. In general I'd tend to assume that Stephen could handle this small merge conflict and fixing the whitespace issues when applying, but if he tells you to spin then you certainly should. I'll also say that I'm nowhere near an expert on gdsc but it looks like Stephen's previous comments were addressed and the patch seems sane in general. Stephen: feel free to add my Reviewed-by: if you wish when applying (or Taniya, if you end up spinning). -Doug -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation. --
RE: [PATCH 1/2] Fix FSL NAND driver to read all ONFI parameter pages
Hi Miquèl and Boris, Thank you for your response and feedback. I've modified the fix based on your comments. Please see the updated patch file at the end of this message (also in attachment). My answers to your comments/questions are inline in the previous message. Here is the answer to Boris question in another email thread: > What if some NANDs have 4 or more copies of the param page? [Jane] The ONFI spec defines that the parameter page and its two redundant copies are mandatory. The additional redundant pages are optional. Currently, the FSL NAND driver only reads the first parameter page. This patch is to fix the driver to meet the mandatory requirement in the spec. We got a batch of particularly bad NAND chips recently and we needed these changes to make them work reliably over temperature. The patch was verified using these bad chips. Best regards, Jane Updated patch: From 701de4146aa6355c951e97a77476e12d2da56d42 Mon Sep 17 00:00:00 2001 From: Jane Wan Date: Mon, 30 Apr 2018 13:30:46 -0700 Subject: [PATCH 1/2] mtd: rawnand: fsl_ifc: fix FSL NAND driver to read all ONFI parameter pages Per ONFI specification (Rev. 4.0), if the CRC of the first parameter page read is not valid, the host should read redundant parameter page copies. Fix FSL NAND driver to read the two redundant copies which are mandatory in the specification. Signed-off-by: Jane Wan --- drivers/mtd/nand/raw/fsl_ifc_nand.c | 17 ++--- 1 file changed, 10 insertions(+), 7 deletions(-) diff --git a/drivers/mtd/nand/raw/fsl_ifc_nand.c b/drivers/mtd/nand/raw/fsl_ifc_nand.c index 61aae02..98aac1f 100644 --- a/drivers/mtd/nand/raw/fsl_ifc_nand.c +++ b/drivers/mtd/nand/raw/fsl_ifc_nand.c @@ -342,9 +342,16 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command, case NAND_CMD_READID: case NAND_CMD_PARAM: { + /* +* For READID, read 8 bytes that are currently used. +* For PARAM, read all 3 copies of 256-bytes pages. 
+*/ + int len = 8; int timing = IFC_FIR_OP_RB; - if (command == NAND_CMD_PARAM) + if (command == NAND_CMD_PARAM) { timing = IFC_FIR_OP_RBCD; + len = 256 * 3; + } ifc_out32((IFC_FIR_OP_CW0 << IFC_NAND_FIR0_OP0_SHIFT) | (IFC_FIR_OP_UA << IFC_NAND_FIR0_OP1_SHIFT) | @@ -354,12 +361,8 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, unsigned int command, >ifc_nand.nand_fcr0); ifc_out32(column, >ifc_nand.row3); - /* -* although currently it's 8 bytes for READID, we always read -* the maximum 256 bytes(for PARAM) -*/ - ifc_out32(256, >ifc_nand.nand_fbcr); - ifc_nand_ctrl->read_bytes = 256; + ifc_out32(len, >ifc_nand.nand_fbcr); + ifc_nand_ctrl->read_bytes = len; set_addr(mtd, 0, 0, 0); fsl_ifc_run_command(mtd); -- 1.7.9.5 -Original Message- From: Miquel Raynal [mailto:miquel.ray...@bootlin.com] Sent: Saturday, April 28, 2018 4:42 AM To: Wan, Jane (Nokia - US/Sunnyvale) Cc: dw...@infradead.org; computersforpe...@gmail.com; Bos, Ties (Nokia - US/Sunnyvale) ; linux-...@lists.infradead.org; linux-kernel@vger.kernel.org; Boris Brezillon Subject: Re: [PATCH 1/2] Fix FSL NAND driver to read all ONFI parameter pages Hi Jane, You forgot to Cc the right maintainers, please use ./scripts/get_maintainer.pl for that. [Jane] Added through 4.17-rc1 get_maintainer.pl. I was running the script from an older kernel version we're using. > Signed-off-by: Jane Wan Please add a description of what your are doing in the commit message. The description in the cover letter is good, you can copy the relevant section here. [Jane] Added. > --- > drivers/mtd/nand/fsl_ifc_nand.c | 10 ++ Also, just for you to know, files have moved in a raw/ subdirectory, so please rebase on top of 4.17-rc1 and prefix the commit title with "mtd: rawnand: fsl_ifc:". [Jane] Thank you for the info. I've rebased the change on top of 4.17-rc1 and modified the commit title as you suggested. 
> 1 file changed, 6 insertions(+), 4 deletions(-) > > diff --git a/drivers/mtd/nand/fsl_ifc_nand.c > b/drivers/mtd/nand/fsl_ifc_nand.c index ca36b35..a3cf6ca 100644 > --- a/drivers/mtd/nand/fsl_ifc_nand.c > +++ b/drivers/mtd/nand/fsl_ifc_nand.c > @@ -413,6 +413,7 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, > unsigned int command, > struct fsl_ifc_mtd *priv = chip->priv; > struct fsl_ifc_ctrl *ctrl = priv->ctrl; > struct fsl_ifc_runtime __iomem *ifc = ctrl->rregs; > + int len; > > /* clear the read buffer */ > ifc_nand_ctrl->read_bytes = 0; > @@ -462,11 +463,12 @@ static void fsl_ifc_cmdfunc(struct mtd_info *mtd, >
general protection fault in n_tty_set_termios
Hello, syzbot found the following crash on: HEAD commit:8188fc8bef8c Merge git://git.kernel.org/pub/scm/linux/kerne... git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?id=5093449355231232 kernel config: https://syzkaller.appspot.com/x/.config?id=6493557782959164711 dashboard link: https://syzkaller.appspot.com/bug?extid=ed02be0ad5f26ef4e31b compiler: gcc (GCC) 8.0.1 20180413 (experimental) syzkaller repro:https://syzkaller.appspot.com/x/repro.syz?id=6543533393575936 C reproducer: https://syzkaller.appspot.com/x/repro.c?id=5754063643738112 IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+ed02be0ad5f26ef4e...@syzkaller.appspotmail.com kasan: CONFIG_KASAN_INLINE enabled kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] SMP KASAN Dumping ftrace buffer: (ftrace buffer empty) Modules linked in: CPU: 1 PID: 4509 Comm: syz-executor654 Not tainted 4.17.0-rc3+ #26 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:n_tty_set_termios+0x2d9/0xe80 drivers/tty/n_tty.c:1782 RSP: 0018:8801b42df698 EFLAGS: 00010203 RAX: 0001 RBX: RCX: 000b RDX: dc00 RSI: RDI: 0005 RBP: 8801b42df6d0 R08: 8801d97aa000 R09: 0002 R10: 8801d97aa888 R11: 8801d97aa000 R12: 8801d9bea500 R13: 8801d9bea8b4 R14: 005d R15: 8801b42df730 FS: 7f082d3d7700() GS:8801daf0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7f082d3b5e78 CR3: 0001ac8aa000 CR4: 001406e0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: tty_set_termios+0x7a0/0xac0 drivers/tty/tty_ioctl.c:341 set_termios+0x41e/0x7d0 drivers/tty/tty_ioctl.c:414 tty_mode_ioctl+0x855/0xb50 drivers/tty/tty_ioctl.c:749 n_tty_ioctl_helper+0x54/0x3b0 drivers/tty/tty_ioctl.c:940 n_tty_ioctl+0x54/0x320 drivers/tty/n_tty.c:2441 tty_ioctl+0x5e1/0x1870 drivers/tty/tty_io.c:2655 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:500 [inline] do_vfs_ioctl+0x1cf/0x16a0 fs/ioctl.c:684 
ksys_ioctl+0xa9/0xd0 fs/ioctl.c:701 __do_sys_ioctl fs/ioctl.c:708 [inline] __se_sys_ioctl fs/ioctl.c:706 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:706 do_syscall_64+0x1b1/0x800 arch/x86/entry/common.c:287 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x445d19 RSP: 002b:7f082d3d6da8 EFLAGS: 0297 ORIG_RAX: 0010 RAX: ffda RBX: 006dac3c RCX: 00445d19 RDX: 2040 RSI: 5402 RDI: 0033 RBP: R08: R09: R10: R11: 0297 R12: 006dac38 R13: 6d74702f7665642f R14: 7f082d3d79c0 R15: 0007 Code: 8b 45 d0 31 ff 83 e0 02 89 c6 89 45 d0 e8 50 4a e1 fd 8b 45 d0 4c 89 f1 48 ba 00 00 00 00 00 fc ff df 85 c0 0f 95 c0 48 c1 e9 03 <0f> b6 14 11 4c 89 f1 83 e1 07 38 ca 7f 08 84 d2 0f 85 96 09 00 RIP: n_tty_set_termios+0x2d9/0xe80 drivers/tty/n_tty.c:1782 RSP: 8801b42df698 ---[ end trace b89be7398398fc5c ]--- --- This bug is generated by a bot. It may contain errors. See https://goo.gl/tpsmEJ for more information about syzbot. syzbot engineers can be reached at syzkal...@googlegroups.com. syzbot will keep track of this bug report. If you forgot to add the Reported-by tag, once the fix for this bug is merged into any tree, please reply to this email with: #syz fix: exact-commit-title If you want to test a patch for this bug, please reply with: #syz test: git://repo/address.git branch and provide the patch inline or as an attachment. To mark this as a duplicate of another syzbot report, please reply with: #syz dup: exact-subject-of-another-report If it's a one-off invalid bug report, please reply with: #syz invalid Note: if the crash happens again, it will cause creation of a new bug report. Note: all commands must start from beginning of the line in the email body.
Re: [PATCH] IB/core: Make ib_mad_client_id atomic
On Mon, 30 Apr 2018 13:10:49 -0400 Doug Ledford wrote: Looks good! -Jack > On Mon, 2018-04-30 at 08:49 -0600, Jason Gunthorpe wrote: > > On Mon, Apr 23, 2018 at 10:16:18PM +0300, jackm wrote: > > > > > > > TIDs need to be globally unique on the entire machine. > > > Jason, that is not exactly correct. > > > > The expecation for /dev/umad users is that they all receive locally > > unique TID prefixes. The kernel may be OK to keep things > > port-specific but it is slightly breaking the API we are presenting > > to userspace to allow them to alias.. > > > > Jason > > Would people be happier with this commit message then: > > IB/core: Make ib_mad_client_id atomic > > Currently, the kernel protects access to the agent ID allocator on a > per port basis using a spinlock, so it is impossible for two > apps/threads on the same port to get the same TID, but it is entirely > possible for two threads on different ports to end up with the same > TID. > > As this can be confusing (regardless of it being legal according to > the IB Spec 1.3, C13-18.1.1, in section 13.4.6.4 - TransactionID > usage), and as the rdma-core user space API for /dev/umad devices > implies unique TIDs even across ports, make the TID an atomic type so > that no two allocations, regardless of port number, will be the same. > > Signed-off-by: Håkon Bugge > Reviewed-by: Jack Morgenstein > Reviewed-by: Ira Weiny > Reviewed-by: Zhu Yanjun > Signed-off-by: Doug Ledford > >
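The switch described in the commit message — from a per-port, spinlock-protected allocator to a single atomic counter — can be sketched in a few lines. This is a minimal userspace model, not the kernel code; the names `client_id` and `alloc_tid_prefix` are invented for illustration.

```c
#include <stdatomic.h>

/* Process-wide atomic counter: two allocations can never collide,
 * regardless of which "port" asks, mirroring the patch's change from
 * per-port spinlocked state to one atomic_t. */
static atomic_uint client_id = 1;

/* Atomic fetch-and-add hands out a unique value to every caller,
 * even when callers race from different threads (or ports). */
static unsigned int alloc_tid_prefix(void)
{
	return atomic_fetch_add(&client_id, 1);
}
```

Because the counter is shared rather than per-port, the uniqueness guarantee that rdma-core's /dev/umad API implies holds across ports, not just within one.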
[PATCH] tipc: fix a potential missing-check bug
In tipc_link_xmit(), the member field "len" of l->backlog[imp] must be less than the member field "limit" of l->backlog[imp] when imp is equal to TIPC_SYSTEM_IMPORTANCE. Otherwise, an error code, i.e., -ENOBUFS, is returned. This is enforced by the security check. However, at the end of tipc_link_xmit(), the length of "list" is added to l->backlog[imp].len without any further check. This can potentially cause unexpected values for l->backlog[imp].len. If imp is equal to TIPC_SYSTEM_IMPORTANCE and the original value of l->backlog[imp].len is less than l->backlog[imp].limit, after this addition, l->backlog[imp] could be larger than l->backlog[imp].limit. That means the security check can potentially be bypassed, especially when an adversary can control the length of "list". This patch performs such a check after the modification to l->backlog[imp].len (if imp is TIPC_SYSTEM_IMPORTANCE) to avoid such security issues. An error code will be returned if an unexpected value of l->backlog[imp].len is generated. Signed-off-by: Wenwen Wang --- net/tipc/link.c | 5 + 1 file changed, 5 insertions(+) diff --git a/net/tipc/link.c b/net/tipc/link.c index 695acb7..62972fa 100644 --- a/net/tipc/link.c +++ b/net/tipc/link.c @@ -948,6 +948,11 @@ int tipc_link_xmit(struct tipc_link *l, struct sk_buff_head *list, continue; } l->backlog[imp].len += skb_queue_len(list); + if (imp == TIPC_SYSTEM_IMPORTANCE && + l->backlog[imp].len >= l->backlog[imp].limit) { + pr_warn("%s<%s>, link overflow", link_rst_msg, l->name); + return -ENOBUFS; + } skb_queue_splice_tail_init(list, backlogq); } l->snd_nxt = seqno; -- 2.7.4
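The pattern the patch adds — re-validating the backlog length against its limit after the update — can be modeled in isolation. This is an illustrative userspace sketch with invented names (`struct backlog`, `backlog_add`); note that the rollback on failure is an addition of this sketch, while the patch as posted returns -ENOBUFS with the length already updated.

```c
#include <errno.h>

/* A bounded backlog whose limit is re-checked *after* the length is
 * updated, as the patch does for TIPC_SYSTEM_IMPORTANCE messages. */
struct backlog {
	int len;
	int limit;
};

static int backlog_add(struct backlog *b, int n)
{
	b->len += n;			/* update first, as tipc_link_xmit() does */
	if (b->len >= b->limit) {	/* the added post-update check */
		b->len -= n;		/* sketch-only rollback, not in the patch */
		return -ENOBUFS;
	}
	return 0;
}
```

The post-update check closes the window in which an adversary-controlled "list" length could push the backlog past its limit without the original pre-check ever seeing it.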
Re: [PATCH][next] pinctrl: actions: Fix Kconfig dependency and help text
On 04/30/2018 08:00 PM, Manivannan Sadhasivam wrote: > 1. Fix Kconfig dependency for Actions Semi S900 pinctrl driver which > generates below warning in x86: > > WARNING: unmet direct dependencies detected for PINCTRL_OWL > Depends on [n]: PINCTRL [=y] && (ARCH_ACTIONS || COMPILE_TEST [=n]) && OF > [=n] > Selected by [y]: > - PINCTRL_S900 [=y] && PINCTRL [=y] > > 2. Add help text for OWL pinctrl driver > > Signed-off-by: Manivannan Sadhasivam Reported-by: Randy Dunlap Tested-by: Randy Dunlap Acked-by: Randy Dunlap Thanks. > --- > drivers/pinctrl/actions/Kconfig | 6 -- > 1 file changed, 4 insertions(+), 2 deletions(-) > > diff --git a/drivers/pinctrl/actions/Kconfig b/drivers/pinctrl/actions/Kconfig > index 1c7309c90f0d..ede97cdbbc12 100644 > --- a/drivers/pinctrl/actions/Kconfig > +++ b/drivers/pinctrl/actions/Kconfig > @@ -1,12 +1,14 @@ > config PINCTRL_OWL > - bool > + bool "Actions Semi OWL pinctrl driver" > depends on (ARCH_ACTIONS || COMPILE_TEST) && OF > select PINMUX > select PINCONF > select GENERIC_PINCONF > + help > + Say Y here to enable Actions Semi OWL pinctrl driver > > config PINCTRL_S900 > bool "Actions Semi S900 pinctrl driver" > - select PINCTRL_OWL > + depends on PINCTRL_OWL > help > Say Y here to enable Actions Semi S900 pinctrl driver > -- ~Randy
Re: [PATCH 03/10] staging: lustre: lu_object: discard extra lru count.
On Apr 30, 2018, at 21:52, NeilBrown wrote: > > lu_object maintains 2 lru counts. > One is a per-bucket lsb_lru_len. > The other is the per-cpu ls_lru_len_counter. > > The only times the per-bucket counters are use are: > - a debug message when an object is added > - in lu_site_stats_get when all the counters are combined. > > The debug message is not essential, and the per-cpu counter > can be used to get the combined total. > > So discard the per-bucket lsb_lru_len. > > Signed-off-by: NeilBrown Looks reasonable, though it would also be possible to fix the percpu functions rather than adding a workaround in this code. Reviewed-by: Andreas Dilger > --- > drivers/staging/lustre/lustre/obdclass/lu_object.c | 24 > 1 file changed, 9 insertions(+), 15 deletions(-) > > diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c > b/drivers/staging/lustre/lustre/obdclass/lu_object.c > index 2a8a25d6edb5..2bf089817157 100644 > --- a/drivers/staging/lustre/lustre/obdclass/lu_object.c > +++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c > @@ -57,10 +57,6 @@ > #include > > struct lu_site_bkt_data { > - /** > - * number of object in this bucket on the lsb_lru list. > - */ > - longlsb_lru_len; > /** >* LRU list, updated on each access to object. Protected by >* bucket lock of lu_site::ls_obj_hash. > @@ -187,10 +183,9 @@ void lu_object_put(const struct lu_env *env, struct > lu_object *o) > if (!lu_object_is_dying(top)) { > LASSERT(list_empty(>loh_lru)); > list_add_tail(>loh_lru, >lsb_lru); > - bkt->lsb_lru_len++; > percpu_counter_inc(>ls_lru_len_counter); > - CDEBUG(D_INODE, "Add %p to site lru. hash: %p, bkt: %p, > lru_len: %ld\n", > -o, site->ls_obj_hash, bkt, bkt->lsb_lru_len); > + CDEBUG(D_INODE, "Add %p to site lru. 
hash: %p, bkt: %p\n", > +o, site->ls_obj_hash, bkt); > cfs_hash_bd_unlock(site->ls_obj_hash, , 1); > return; > } > @@ -238,7 +233,6 @@ void lu_object_unhash(const struct lu_env *env, struct > lu_object *o) > > list_del_init(>loh_lru); > bkt = cfs_hash_bd_extra_get(obj_hash, ); > - bkt->lsb_lru_len--; > percpu_counter_dec(>ls_lru_len_counter); > } > cfs_hash_bd_del_locked(obj_hash, , >loh_hash); > @@ -422,7 +416,6 @@ int lu_site_purge_objects(const struct lu_env *env, > struct lu_site *s, > cfs_hash_bd_del_locked(s->ls_obj_hash, > , >loh_hash); > list_move(>loh_lru, ); > - bkt->lsb_lru_len--; > percpu_counter_dec(>ls_lru_len_counter); > if (did_sth == 0) > did_sth = 1; > @@ -621,7 +614,6 @@ static struct lu_object *htable_lookup(struct lu_site *s, > lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT); > if (!list_empty(>loh_lru)) { > list_del_init(>loh_lru); > - bkt->lsb_lru_len--; > percpu_counter_dec(>ls_lru_len_counter); > } > return lu_object_top(h); > @@ -1834,19 +1826,21 @@ struct lu_site_stats { > unsigned intlss_busy; > }; > > -static void lu_site_stats_get(struct cfs_hash *hs, > +static void lu_site_stats_get(const struct lu_site *s, > struct lu_site_stats *stats, int populated) > { > + struct cfs_hash *hs = s->ls_obj_hash; > struct cfs_hash_bd bd; > unsigned int i; > + /* percpu_counter_read_positive() won't accept a const pointer */ > + struct lu_site *s2 = (struct lu_site *)s; It would seem worthwhile to change the percpu_counter_read_positive() and percpu_counter_read() arguments to be "const struct percpu_counter *fbc", rather than doing this cast here. I can't see any reason that would be bad, since both implementations just access fbc->count, and do not modify anything. 
> + stats->lss_busy += cfs_hash_size_get(hs) - > + percpu_counter_read_positive(>ls_lru_len_counter); > cfs_hash_for_each_bucket(hs, , i) { > - struct lu_site_bkt_data *bkt = cfs_hash_bd_extra_get(hs, ); > struct hlist_head *hhead; > > cfs_hash_bd_lock(hs, , 1); > - stats->lss_busy += > - cfs_hash_bd_count_get() - bkt->lsb_lru_len; > stats->lss_total += cfs_hash_bd_count_get(); > stats->lss_max_search = max((int)stats->lss_max_search, > cfs_hash_bd_depmax_get()); > @@ -2039,7 +2033,7 @@ int lu_site_stats_print(const struct lu_site *s, struct > seq_file *m) > struct lu_site_stats stats; > > memset(, 0,
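The API change Andreas suggests — const-qualifying the read-side helper so callers holding a const object need no cast — is easy to see in miniature. `struct counter` and `counter_read_positive` below are illustrative stand-ins, not the kernel's percpu_counter interface.

```c
/* A read-only accessor can take a const pointer, because it only loads
 * the count and never writes; callers with a const object then compile
 * without the (struct lu_site *) cast used as a workaround above. */
struct counter {
	long count;
};

static long counter_read_positive(const struct counter *c)
{
	/* clamp negative transient values to zero, as
	 * percpu_counter_read_positive() does */
	return c->count > 0 ? c->count : 0;
}
```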
Re: [lustre-devel] [PATCH 02/10] staging: lustre: make struct lu_site_bkt_data private
On Apr 30, 2018, at 21:52, NeilBrown wrote: > > This data structure only needs to be public so that > various modules can access a wait queue to wait for object > destruction. > If we provide a function to get the wait queue, rather than the > whole bucket, the structure can be made private. > > Signed-off-by: NeilBrown Nice cleanup. Reviewed-by: Andreas Dilger > --- > drivers/staging/lustre/lustre/include/lu_object.h | 36 +- > drivers/staging/lustre/lustre/llite/lcommon_cl.c |8 ++- > drivers/staging/lustre/lustre/lov/lov_object.c |8 ++- > drivers/staging/lustre/lustre/obdclass/lu_object.c | 50 +--- > 4 files changed, 54 insertions(+), 48 deletions(-) > > diff --git a/drivers/staging/lustre/lustre/include/lu_object.h > b/drivers/staging/lustre/lustre/include/lu_object.h > index c3b0ed518819..f29bbca5af65 100644 > --- a/drivers/staging/lustre/lustre/include/lu_object.h > +++ b/drivers/staging/lustre/lustre/include/lu_object.h > @@ -549,31 +549,7 @@ struct lu_object_header { > }; > > struct fld; > - > -struct lu_site_bkt_data { > - /** > - * number of object in this bucket on the lsb_lru list. > - */ > - longlsb_lru_len; > - /** > - * LRU list, updated on each access to object. Protected by > - * bucket lock of lu_site::ls_obj_hash. > - * > - * "Cold" end of LRU is lu_site::ls_lru.next. Accessed object are > - * moved to the lu_site::ls_lru.prev (this is due to the non-existence > - * of list_for_each_entry_safe_reverse()). > - */ > - struct list_headlsb_lru; > - /** > - * Wait-queue signaled when an object in this site is ultimately > - * destroyed (lu_object_free()). It is used by lu_object_find() to > - * wait before re-trying when object in the process of destruction is > - * found in the hash table. > - * > - * \see htable_lookup(). 
> - */ > - wait_queue_head_t lsb_marche_funebre; > -}; > +struct lu_site_bkt_data; > > enum { > LU_SS_CREATED= 0, > @@ -642,14 +618,8 @@ struct lu_site { > struct percpu_counterls_lru_len_counter; > }; > > -static inline struct lu_site_bkt_data * > -lu_site_bkt_from_fid(struct lu_site *site, struct lu_fid *fid) > -{ > - struct cfs_hash_bd bd; > - > - cfs_hash_bd_get(site->ls_obj_hash, fid, ); > - return cfs_hash_bd_extra_get(site->ls_obj_hash, ); > -} > +wait_queue_head_t * > +lu_site_wq_from_fid(struct lu_site *site, struct lu_fid *fid); > > static inline struct seq_server_site *lu_site2seq(const struct lu_site *s) > { > diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c > b/drivers/staging/lustre/lustre/llite/lcommon_cl.c > index df5c0c0ae703..d5b42fb1d601 100644 > --- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c > +++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c > @@ -211,12 +211,12 @@ static void cl_object_put_last(struct lu_env *env, > struct cl_object *obj) > > if (unlikely(atomic_read(>loh_ref) != 1)) { > struct lu_site *site = obj->co_lu.lo_dev->ld_site; > - struct lu_site_bkt_data *bkt; > + wait_queue_head_t *wq; > > - bkt = lu_site_bkt_from_fid(site, >loh_fid); > + wq = lu_site_wq_from_fid(site, >loh_fid); > > init_waitqueue_entry(, current); > - add_wait_queue(>lsb_marche_funebre, ); > + add_wait_queue(wq, ); > > while (1) { > set_current_state(TASK_UNINTERRUPTIBLE); > @@ -226,7 +226,7 @@ static void cl_object_put_last(struct lu_env *env, struct > cl_object *obj) > } > > set_current_state(TASK_RUNNING); > - remove_wait_queue(>lsb_marche_funebre, ); > + remove_wait_queue(wq, ); > } > > cl_object_put(env, obj); > diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c > b/drivers/staging/lustre/lustre/lov/lov_object.c > index f7c69680cb7d..adc90f310fd7 100644 > --- a/drivers/staging/lustre/lustre/lov/lov_object.c > +++ b/drivers/staging/lustre/lustre/lov/lov_object.c > @@ -370,7 +370,7 @@ static void lov_subobject_kill(const 
struct lu_env *env, > struct lov_object *lov, > struct cl_object*sub; > struct lov_layout_raid0 *r0; > struct lu_site*site; > - struct lu_site_bkt_data *bkt; > + wait_queue_head_t *wq; > wait_queue_entry_t*waiter; > > r0 = >u.raid0; > @@ -378,7 +378,7 @@ static void lov_subobject_kill(const struct lu_env *env, > struct lov_object *lov, > > sub = lovsub2cl(los); > site = sub->co_lu.lo_dev->ld_site; > - bkt = lu_site_bkt_from_fid(site, >co_lu.lo_header->loh_fid); > + wq = lu_site_wq_from_fid(site, >co_lu.lo_header->loh_fid); > > cl_object_kill(env, sub); > /* release a reference to the sub-object and ... */ > @@ -391,7 +391,7 @@ static void lov_subobject_kill(const
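The information-hiding pattern this patch applies — a forward declaration plus a narrow accessor, with the full struct definition private to one translation unit — looks like this in miniature. All names here (`struct bucket`, `bucket_waitq`) are invented; in the patch the accessor is lu_site_wq_from_fid() and the hidden member is a wait_queue_head_t.

```c
/* What the public header would carry: callers see only an opaque type
 * and an accessor for the one member they need. */
struct bucket;				/* opaque to callers */
static long *bucket_waitq(struct bucket *b);

/* Private definition, visible only in the implementing .c file. */
struct bucket {
	long lru_len;
	long waitq;			/* stand-in for wait_queue_head_t */
};

static long *bucket_waitq(struct bucket *b)
{
	return &b->waitq;
}
```

Since no caller can name the other members, the layout can later change (for example, to rhashtable-friendly buckets) without touching any user of the accessor.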
Re: [PATCH v2] staging: lustre: llite: fix potential missing-check bug when copying lumv
On Apr 30, 2018, at 16:56, Wenwen Wang wrote: > > In ll_dir_ioctl(), the object lumv3 is firstly copied from the user space > using Its address, i.e., lumv1 = If the lmm_magic field of lumv3 is > LOV_USER_MAGIC_V3, lumv3 will be modified by the second copy from the user > space. The second copy is necessary, because the two versions (i.e., > lov_user_md_v1 and lov_user_md_v3) have different data formats and lengths. > However, given that the user data resides in the user space, a malicious > user-space process can race to change the data between the two copies. By > doing so, the attacker can provide a data with an inconsistent version, > e.g., v1 version + v3 data. This can lead to logical errors in the > following execution in ll_dir_setstripe(), which performs different actions > according to the version specified by the field lmm_magic. > > This patch rechecks the version field lmm_magic in the second copy. If the > version is not as expected, i.e., LOV_USER_MAGIC_V3, an error code will be > returned: -EINVAL. > > Signed-off-by: Wenwen Wang Thanks for the updated patch. Reviewed-by: Andreas Dilger > --- > drivers/staging/lustre/lustre/llite/dir.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/drivers/staging/lustre/lustre/llite/dir.c > b/drivers/staging/lustre/lustre/llite/dir.c > index d10d272..80d44ca 100644 > --- a/drivers/staging/lustre/lustre/llite/dir.c > +++ b/drivers/staging/lustre/lustre/llite/dir.c > @@ -1185,6 +1185,8 @@ static long ll_dir_ioctl(struct file *file, unsigned > int cmd, unsigned long arg) > if (lumv1->lmm_magic == LOV_USER_MAGIC_V3) { > if (copy_from_user(, lumv3p, sizeof(lumv3))) > return -EFAULT; > + if (lumv3.lmm_magic != LOV_USER_MAGIC_V3) > + return -EINVAL; > } > > if (is_root_inode(inode)) > -- > 2.7.4 > Cheers, Andreas -- Andreas Dilger Lustre Principal Architect Intel Corporation
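The double-fetch hardening above can be modeled in userspace. The two pointers below simulate what a racing writer could make the two copy_from_user() calls observe; the second fetch is re-validated so a v1 header cannot be smuggled in after the v3 branch was taken. Names and structure are a simplified sketch, not the lustre code.

```c
#include <errno.h>
#include <string.h>

#define MAGIC_V1 0x0bd10bd0u
#define MAGIC_V3 0x0bd30bd0u

struct lum { unsigned int lmm_magic; };

/* fetch1/fetch2 stand in for the two reads of user memory; memcpy
 * stands in for copy_from_user(). */
static int setstripe(const struct lum *fetch1, const struct lum *fetch2)
{
	struct lum lum;

	memcpy(&lum, fetch1, sizeof(lum));		/* first copy */
	if (lum.lmm_magic == MAGIC_V3) {
		memcpy(&lum, fetch2, sizeof(lum));	/* second copy */
		if (lum.lmm_magic != MAGIC_V3)		/* the added re-check */
			return -EINVAL;
	}
	return 0;
}
```

Without the re-check, a caller that flips the magic between the two copies gets the v3 handling applied to v1-shaped data, which is exactly the inconsistency the patch rejects with -EINVAL.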
Re: [PATCH 01/10] staging: lustre: ldlm: store name directly in namespace.
On Apr 30, 2018, at 21:52, NeilBrown wrote: > > Rather than storing the name of a namespace in the > hash table, store it directly in the namespace. > This will allow the hashtable to be changed to use > rhashtable. > > Signed-off-by: NeilBrown Reviewed-by: Andreas Dilger > --- > drivers/staging/lustre/lustre/include/lustre_dlm.h |5 - > drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |5 + > 2 files changed, 9 insertions(+), 1 deletion(-) > > diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h > b/drivers/staging/lustre/lustre/include/lustre_dlm.h > index d668d86423a4..b3532adac31c 100644 > --- a/drivers/staging/lustre/lustre/include/lustre_dlm.h > +++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h > @@ -362,6 +362,9 @@ struct ldlm_namespace { > /** Flag indicating if namespace is on client instead of server */ > enum ldlm_side ns_client; > > + /** name of this namespace */ > + char*ns_name; > + > /** Resource hash table for namespace. */ > struct cfs_hash *ns_rs_hash; > > @@ -878,7 +881,7 @@ static inline bool ldlm_has_layout(struct ldlm_lock *lock) > static inline char * > ldlm_ns_name(struct ldlm_namespace *ns) > { > - return ns->ns_rs_hash->hs_name; > + return ns->ns_name; > } > > static inline struct ldlm_namespace * > diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c > b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c > index 6c615b6e9bdc..43bbc5fd94cc 100644 > --- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c > +++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c > @@ -688,6 +688,9 @@ struct ldlm_namespace *ldlm_namespace_new(struct > obd_device *obd, char *name, > ns->ns_obd = obd; > ns->ns_appetite = apt; > ns->ns_client = client; > + ns->ns_name = kstrdup(name, GFP_KERNEL); > + if (!ns->ns_name) > + goto out_hash; > > INIT_LIST_HEAD(>ns_list_chain); > INIT_LIST_HEAD(>ns_unused_list); > @@ -730,6 +733,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct > obd_device *obd, char *name, >
ldlm_namespace_sysfs_unregister(ns); > ldlm_namespace_cleanup(ns, 0); > out_hash: > + kfree(ns->ns_name); > cfs_hash_putref(ns->ns_rs_hash); > out_ns: > kfree(ns); > @@ -993,6 +997,7 @@ void ldlm_namespace_free_post(struct ldlm_namespace *ns) > ldlm_namespace_debugfs_unregister(ns); > ldlm_namespace_sysfs_unregister(ns); > cfs_hash_putref(ns->ns_rs_hash); > + kfree(ns->ns_name); > /* Namespace \a ns should be not on list at this time, otherwise >* this will cause issues related to using freed \a ns in poold >* thread. > > Cheers, Andreas -- Andreas Dilger Lustre Principal Architect Intel Corporation
[PATCH 10/10] staging: lustre: fix error deref in ll_splice_alias().
d_splice_alias() can return an ERR_PTR(). If it does while debugging is
enabled, the following CDEBUG() will dereference that error and crash.
So add appropriate checking, and provide a separate debug message for
the error case.

Reported-by: James Simmons
Fixes: e9d4f0b9f559 ("staging: lustre: llite: use d_splice_alias for directories.")
Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/llite/namei.c | 8 ++++++--
 1 file changed, 6 insertions(+), 2 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/namei.c b/drivers/staging/lustre/lustre/llite/namei.c
index 6c9ec462eb41..24a6873d86a2 100644
--- a/drivers/staging/lustre/lustre/llite/namei.c
+++ b/drivers/staging/lustre/lustre/llite/namei.c
@@ -442,11 +442,15 @@ struct dentry *ll_splice_alias(struct inode *inode, struct dentry *de)
 	} else {
 		struct dentry *new = d_splice_alias(inode, de);
 
+		if (IS_ERR(new))
+			CDEBUG(D_DENTRY, "splice inode %p as %pd gives error %lu\n",
+			       inode, de, PTR_ERR(new));
 		if (new)
 			de = new;
 	}
-	CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
-	       de, d_inode(de), d_count(de), de->d_flags);
+	if (!IS_ERR(de))
+		CDEBUG(D_DENTRY, "Add dentry %p inode %p refc %d flags %#x\n",
+		       de, d_inode(de), d_count(de), de->d_flags);
 	return de;
 }
[PATCH 09/10] staging: lustre: move remaining code from linux-module.c to module.c
There is no longer any need to keep this code separate, and now we can
remove linux-module.c.

Signed-off-by: NeilBrown
---
 .../staging/lustre/include/linux/libcfs/libcfs.h |   4 -
 drivers/staging/lustre/lnet/libcfs/Makefile      |   1 -
 .../lustre/lnet/libcfs/linux/linux-module.c      | 168 ----------------
 drivers/staging/lustre/lnet/libcfs/module.c      | 131 ++++++++++++++
 4 files changed, 131 insertions(+), 173 deletions(-)
 delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-module.c

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index 9263e151451b..d420449b620e 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -138,10 +138,6 @@ struct libcfs_ioctl_handler {
 int libcfs_register_ioctl(struct libcfs_ioctl_handler *hand);
 int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);
 
-int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
-			 const struct libcfs_ioctl_hdr __user *uparam);
-int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
-
 #define _LIBCFS_H
 
 /**
diff --git a/drivers/staging/lustre/lnet/libcfs/Makefile b/drivers/staging/lustre/lnet/libcfs/Makefile
index e6fda27fdabd..e73515789a11 100644
--- a/drivers/staging/lustre/lnet/libcfs/Makefile
+++ b/drivers/staging/lustre/lnet/libcfs/Makefile
@@ -5,7 +5,6 @@ subdir-ccflags-y += -I$(srctree)/drivers/staging/lustre/lustre/include
 obj-$(CONFIG_LNET) += libcfs.o
 
 libcfs-linux-objs := linux-tracefile.o linux-debug.o
-libcfs-linux-objs += linux-module.o
 libcfs-linux-objs += linux-crypto.o
 libcfs-linux-objs += linux-crypto-adler.o
 
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
deleted file mode 100644
index 954b681f9db7..000000000000
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
+++ /dev/null
@@ -1,168 +0,0 @@
-// SPDX-License-Identifier: GPL-2.0
-/*
- * GPL HEADER START
- *
- * DO NOT ALTER OR REMOVE COPYRIGHT NOTICES OR THIS FILE HEADER.
- *
- * This program is free software; you can redistribute it and/or modify
- * it under the terms of the GNU General Public License version 2 only,
- * as published by the Free Software Foundation.
- *
- * This program is distributed in the hope that it will be useful, but
- * WITHOUT ANY WARRANTY; without even the implied warranty of
- * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
- * General Public License version 2 for more details (a copy is included
- * in the LICENSE file that accompanied this code).
- *
- * You should have received a copy of the GNU General Public License
- * version 2 along with this program; If not, see
- * http://www.gnu.org/licenses/gpl-2.0.html
- *
- * GPL HEADER END
- */
-/*
- * Copyright (c) 2008, 2010, Oracle and/or its affiliates. All rights reserved.
- * Use is subject to license terms.
- *
- * Copyright (c) 2012, Intel Corporation.
- */
-/*
- * This file is part of Lustre, http://www.lustre.org/
- * Lustre is a trademark of Sun Microsystems, Inc.
- */
-
-#define DEBUG_SUBSYSTEM S_LNET
-
-#include
-#include
-
-static inline size_t libcfs_ioctl_packlen(struct libcfs_ioctl_data *data)
-{
-	size_t len = sizeof(*data);
-
-	len += cfs_size_round(data->ioc_inllen1);
-	len += cfs_size_round(data->ioc_inllen2);
-	return len;
-}
-
-static inline bool libcfs_ioctl_is_invalid(struct libcfs_ioctl_data *data)
-{
-	if (data->ioc_hdr.ioc_len > BIT(30)) {
-		CERROR("LIBCFS ioctl: ioc_len larger than 1<<30\n");
-		return true;
-	}
-	if (data->ioc_inllen1 > BIT(30)) {
-		CERROR("LIBCFS ioctl: ioc_inllen1 larger than 1<<30\n");
-		return true;
-	}
-	if (data->ioc_inllen2 > BIT(30)) {
-		CERROR("LIBCFS ioctl: ioc_inllen2 larger than 1<<30\n");
-		return true;
-	}
-	if (data->ioc_inlbuf1 && !data->ioc_inllen1) {
-		CERROR("LIBCFS ioctl: inlbuf1 pointer but 0 length\n");
-		return true;
-	}
-	if (data->ioc_inlbuf2 && !data->ioc_inllen2) {
-		CERROR("LIBCFS ioctl: inlbuf2 pointer but 0 length\n");
-		return true;
-	}
-	if (data->ioc_pbuf1 && !data->ioc_plen1) {
-		CERROR("LIBCFS ioctl: pbuf1 pointer but 0 length\n");
-		return true;
-	}
-	if (data->ioc_pbuf2 && !data->ioc_plen2) {
-		CERROR("LIBCFS ioctl: pbuf2 pointer but 0 length\n");
-		return true;
-	}
-	if (data->ioc_plen1 && !data->ioc_pbuf1) {
-		CERROR("LIBCFS ioctl: plen1 nonzero but no pbuf1 pointer\n");
-		return true;
-	}
-	if (data->ioc_plen2 && !data->ioc_pbuf2) {
-		CERROR("LIBCFS ioctl: plen2 nonzero but no pbuf2
[PATCH 07/10] staging: lustre: llite: remove redundant lookup in dump_pgcache
Both the 'next' and the 'show' functions for the dump_page_cache seqfile
perform a lookup based on the current file index. This is needless
duplication.

The reason appears to be that the state that needs to be communicated
from "next" to "show" is two pointers, but seq_file only provides for a
single pointer to be returned from next and passed to show.

So make use of the new 'seq_private' structure to store the extra
pointer. When 'next' (or 'start') finds something, it returns the page
and stores the clob in the private area. 'show' accepts the page as an
argument, and finds the clob where it was stored.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/llite/vvp_dev.c | 97 +++++++++----------
 1 file changed, 41 insertions(+), 56 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index a2619dc04a7f..39a85e967368 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -394,6 +394,7 @@ struct seq_private {
 	struct ll_sb_info	*sbi;
 	struct lu_env		*env;
 	u16			refcheck;
+	struct cl_object	*clob;
 };
 
 static void vvp_pgcache_id_unpack(loff_t pos, struct vvp_pgcache_id *id)
@@ -458,19 +459,20 @@ static struct cl_object *vvp_pgcache_obj(const struct lu_env *env,
 	return NULL;
 }
 
-static loff_t vvp_pgcache_find(const struct lu_env *env,
-			       struct lu_device *dev, loff_t pos)
+static struct page *vvp_pgcache_find(const struct lu_env *env,
+				     struct lu_device *dev,
+				     struct cl_object **clobp, loff_t *pos)
 {
 	struct cl_object *clob;
 	struct lu_site *site;
 	struct vvp_pgcache_id id;
 
 	site = dev->ld_site;
-	vvp_pgcache_id_unpack(pos, &id);
+	vvp_pgcache_id_unpack(*pos, &id);
 
 	while (1) {
 		if (id.vpi_bucket >= CFS_HASH_NHLIST(site->ls_obj_hash))
-			return ~0ULL;
+			return NULL;
 		clob = vvp_pgcache_obj(env, dev, &id);
 		if (clob) {
 			struct inode *inode = vvp_object_inode(clob);
@@ -482,20 +484,22 @@ static loff_t vvp_pgcache_find(const struct lu_env *env,
 			if (nr > 0) {
 				id.vpi_index = vmpage->index;
 				/* Cant support over 16T file */
-				nr = !(vmpage->index > 0xffffffff);
+				if (vmpage->index <= 0xffffffff) {
+					*clobp = clob;
+					*pos = vvp_pgcache_id_pack(&id);
+					return vmpage;
+				}
 				put_page(vmpage);
 			}
 			lu_object_ref_del(&clob->co_lu, "dump", current);
 			cl_object_put(env, clob);
-			if (nr > 0)
-				return vvp_pgcache_id_pack(&id);
 		}
 		/* to the next object. */
 		++id.vpi_depth;
 		id.vpi_depth &= 0xf;
 		if (id.vpi_depth == 0 && ++id.vpi_bucket == 0)
-			return ~0ULL;
+			return NULL;
 		id.vpi_index = 0;
 	}
 }
@@ -538,71 +542,52 @@ static void vvp_pgcache_page_show(const struct lu_env *env,
 static int vvp_pgcache_show(struct seq_file *f, void *v)
 {
 	struct seq_private *priv = f->private;
-	loff_t pos;
-	struct cl_object *clob;
-	struct vvp_pgcache_id id;
-
-	pos = *(loff_t *)v;
-	vvp_pgcache_id_unpack(pos, &id);
-	clob = vvp_pgcache_obj(priv->env, &priv->sbi->ll_cl->cd_lu_dev, &id);
-	if (clob) {
-		struct inode *inode = vvp_object_inode(clob);
-		struct cl_page *page = NULL;
-		struct page *vmpage;
-		int result;
-
-		result = find_get_pages_contig(inode->i_mapping,
-					       id.vpi_index, 1, &vmpage);
-		if (result > 0) {
-			lock_page(vmpage);
-			page = cl_vmpage_page(vmpage, clob);
-			unlock_page(vmpage);
-			put_page(vmpage);
-		}
-
-		seq_printf(f, "%8x@" DFID ": ", id.vpi_index,
-			   PFID(lu_object_fid(&clob->co_lu)));
-		if (page) {
-			vvp_pgcache_page_show(priv->env, f, page);
-			cl_page_put(priv->env, page);
-		} else {
-			seq_puts(f, "missing\n");
-		}
-		lu_object_ref_del(&clob->co_lu, "dump", current);
-
[PATCH 08/10] staging: lustre: move misc-device registration closer to related code.
The ioctl handler for the misc device is in lnet/libcfs/module.c, but it
is registered in lnet/libcfs/linux/linux-module.c. Keeping related code
together makes maintenance easier, so move the code.

Signed-off-by: NeilBrown
---
 .../staging/lustre/include/linux/libcfs/libcfs.h |  2 -
 .../lustre/lnet/libcfs/linux/linux-module.c      | 28 ----------
 drivers/staging/lustre/lnet/libcfs/module.c      | 31 +++++++++-
 3 files changed, 30 insertions(+), 31 deletions(-)

diff --git a/drivers/staging/lustre/include/linux/libcfs/libcfs.h b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
index 6e7754b2f296..9263e151451b 100644
--- a/drivers/staging/lustre/include/linux/libcfs/libcfs.h
+++ b/drivers/staging/lustre/include/linux/libcfs/libcfs.h
@@ -141,11 +141,9 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand);
 int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
			 const struct libcfs_ioctl_hdr __user *uparam);
 int libcfs_ioctl_data_adjust(struct libcfs_ioctl_data *data);
-int libcfs_ioctl(unsigned long cmd, void __user *arg);
 
 #define _LIBCFS_H
 
-extern struct miscdevice libcfs_dev;
 /**
  * The path of debug log dump upcall script.
  */
diff --git a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
index c8908e816c4c..954b681f9db7 100644
--- a/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
+++ b/drivers/staging/lustre/lnet/libcfs/linux/linux-module.c
@@ -166,31 +166,3 @@ int libcfs_ioctl_getdata(struct libcfs_ioctl_hdr **hdr_pp,
 	kvfree(*hdr_pp);
 	return err;
 }
-
-static long
-libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
-{
-	if (!capable(CAP_SYS_ADMIN))
-		return -EACCES;
-
-	if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
-	    _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
-	    _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
-		CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
-		       _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
-		return -EINVAL;
-	}
-
-	return libcfs_ioctl(cmd, (void __user *)arg);
-}
-
-static const struct file_operations libcfs_fops = {
-	.owner		= THIS_MODULE,
-	.unlocked_ioctl	= libcfs_psdev_ioctl,
-};
-
-struct miscdevice libcfs_dev = {
-	.minor = MISC_DYNAMIC_MINOR,
-	.name = "lnet",
-	.fops = &libcfs_fops,
-};
diff --git a/drivers/staging/lustre/lnet/libcfs/module.c b/drivers/staging/lustre/lnet/libcfs/module.c
index 4b9acd7bc5cf..3fb150a57f49 100644
--- a/drivers/staging/lustre/lnet/libcfs/module.c
+++ b/drivers/staging/lustre/lnet/libcfs/module.c
@@ -95,7 +95,7 @@ int libcfs_deregister_ioctl(struct libcfs_ioctl_handler *hand)
 }
 EXPORT_SYMBOL(libcfs_deregister_ioctl);
 
-int libcfs_ioctl(unsigned long cmd, void __user *uparam)
+static int libcfs_ioctl(unsigned long cmd, void __user *uparam)
 {
 	struct libcfs_ioctl_data *data = NULL;
 	struct libcfs_ioctl_hdr *hdr;
@@ -161,6 +161,35 @@ int libcfs_ioctl(unsigned long cmd, void __user *uparam)
 	return err;
 }
 
+static long
+libcfs_psdev_ioctl(struct file *file, unsigned int cmd, unsigned long arg)
+{
+	if (!capable(CAP_SYS_ADMIN))
+		return -EACCES;
+
+	if (_IOC_TYPE(cmd) != IOC_LIBCFS_TYPE ||
+	    _IOC_NR(cmd) < IOC_LIBCFS_MIN_NR ||
+	    _IOC_NR(cmd) > IOC_LIBCFS_MAX_NR) {
+		CDEBUG(D_IOCTL, "invalid ioctl ( type %d, nr %d, size %d )\n",
+		       _IOC_TYPE(cmd), _IOC_NR(cmd), _IOC_SIZE(cmd));
+		return -EINVAL;
+	}
+
+	return libcfs_ioctl(cmd, (void __user *)arg);
+}
+
+static const struct file_operations libcfs_fops = {
+	.owner		= THIS_MODULE,
+	.unlocked_ioctl	= libcfs_psdev_ioctl,
+};
+
+struct miscdevice libcfs_dev = {
+	.minor = MISC_DYNAMIC_MINOR,
+	.name = "lnet",
+	.fops = &libcfs_fops,
+};
+
 int lprocfs_call_handler(void *data, int write, loff_t *ppos,
			 void __user *buffer, size_t *lenp,
			 int (*handler)(void *data, int write, loff_t pos,
[PATCH 05/10] staging: lustre: fold lu_object_new() into lu_object_find_at()
lu_object_new() duplicates a lot of code that is in lu_object_find_at().
There is no real need for a separate function; it is simpler just to
skip the bits of lu_object_find_at() that we don't want in the
LOC_F_NEW case.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/obdclass/lu_object.c | 44 ++++------------
 1 file changed, 12 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 93daa52e2535..9721b3af8ea8 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -678,29 +678,6 @@ static void lu_object_limit(const struct lu_env *env, struct lu_device *dev)
			      false);
 }
 
-static struct lu_object *lu_object_new(const struct lu_env *env,
-				       struct lu_device *dev,
-				       const struct lu_fid *f,
-				       const struct lu_object_conf *conf)
-{
-	struct lu_object *o;
-	struct cfs_hash *hs;
-	struct cfs_hash_bd bd;
-
-	o = lu_object_alloc(env, dev, f, conf);
-	if (IS_ERR(o))
-		return o;
-
-	hs = dev->ld_site->ls_obj_hash;
-	cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
-	cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
-	cfs_hash_bd_unlock(hs, &bd, 1);
-
-	lu_object_limit(env, dev);
-
-	return o;
-}
-
 /**
  * Much like lu_object_find(), but top level device of object is specifically
  * \a dev rather than top level device of the site. This interface allows
@@ -736,18 +713,18 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
	 * just alloc and insert directly.
	 */
-	if (conf && conf->loc_flags & LOC_F_NEW)
-		return lu_object_new(env, dev, f, conf);
-
 	s = dev->ld_site;
 	hs = s->ls_obj_hash;
-	cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
-	o = htable_lookup(s, &bd, f, &version);
-	cfs_hash_bd_unlock(hs, &bd, 0);
-	if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
-		return o;
+	cfs_hash_bd_get(hs, f, &bd);
+	if (!(conf && conf->loc_flags & LOC_F_NEW)) {
+		cfs_hash_bd_lock(hs, &bd, 0);
+		o = htable_lookup(s, &bd, f, &version);
+		cfs_hash_bd_unlock(hs, &bd, 0);
+		if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
+			return o;
+	}
 	/*
	 * Allocate new object. This may result in rather complicated
	 * operations, including fld queries, inode loading, etc.
@@ -760,7 +737,10 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
 
 	cfs_hash_bd_lock(hs, &bd, 1);
 
-	shadow = htable_lookup(s, &bd, f, &version);
+	if (conf && conf->loc_flags & LOC_F_NEW)
+		shadow = ERR_PTR(-ENOENT);
+	else
+		shadow = htable_lookup(s, &bd, f, &version);
 	if (likely(PTR_ERR(shadow) == -ENOENT)) {
 		cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
 		cfs_hash_bd_unlock(hs, &bd, 1);
[PATCH 05/10] staging: lustre: fold lu_object_new() into lu_object_find_at()
lu_object_new() duplicates a lot of code that is in
lu_object_find_at().  There is no real need for a separate function,
it is simpler just to skip the bits of lu_object_find_at() that we
don't want in the LOC_F_NEW case.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/obdclass/lu_object.c | 44 ++++++------------------
 1 file changed, 12 insertions(+), 32 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 93daa52e2535..9721b3af8ea8 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -678,29 +678,6 @@ static void lu_object_limit(const struct lu_env *env, struct lu_device *dev)
 			      false);
 }
 
-static struct lu_object *lu_object_new(const struct lu_env *env,
-				       struct lu_device *dev,
-				       const struct lu_fid *f,
-				       const struct lu_object_conf *conf)
-{
-	struct lu_object *o;
-	struct cfs_hash *hs;
-	struct cfs_hash_bd bd;
-
-	o = lu_object_alloc(env, dev, f, conf);
-	if (IS_ERR(o))
-		return o;
-
-	hs = dev->ld_site->ls_obj_hash;
-	cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
-	cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
-	cfs_hash_bd_unlock(hs, &bd, 1);
-
-	lu_object_limit(env, dev);
-
-	return o;
-}
-
 /**
  * Much like lu_object_find(), but top level device of object is specifically
  * \a dev rather than top level device of the site. This interface allows
@@ -736,18 +713,18 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
 	 * just alloc and insert directly.
 	 */
-	if (conf && conf->loc_flags & LOC_F_NEW)
-		return lu_object_new(env, dev, f, conf);
-
 	s  = dev->ld_site;
 	hs = s->ls_obj_hash;
-	cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
-	o = htable_lookup(s, &bd, f, &version);
-	cfs_hash_bd_unlock(hs, &bd, 0);
-	if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
-		return o;
+	cfs_hash_bd_get(hs, f, &bd);
+	if (!(conf && conf->loc_flags & LOC_F_NEW)) {
+		cfs_hash_bd_lock(hs, &bd, 0);
+		o = htable_lookup(s, &bd, f, &version);
+		cfs_hash_bd_unlock(hs, &bd, 0);
+		if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
+			return o;
+	}
 	/*
 	 * Allocate new object. This may result in rather complicated
 	 * operations, including fld queries, inode loading, etc.
@@ -760,7 +737,10 @@ struct lu_object *lu_object_find_at(const struct lu_env *env,
 	cfs_hash_bd_lock(hs, &bd, 1);
 
-	shadow = htable_lookup(s, &bd, f, &version);
+	if (conf && conf->loc_flags & LOC_F_NEW)
+		shadow = ERR_PTR(-ENOENT);
+	else
+		shadow = htable_lookup(s, &bd, f, &version);
 	if (likely(PTR_ERR(shadow) == -ENOENT)) {
 		cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
 		cfs_hash_bd_unlock(hs, &bd, 1);
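The pattern this patch folds into lu_object_find_at() — skip the initial
lookup when the caller guarantees the object is new, but still share the
guarded insert path — can be sketched in plain userspace C. This is a
minimal, single-threaded illustration; the hash table, `obj_find`, and
`LOC_F_NEW` stand-ins here are invented for the sketch, not the kernel API:

```c
/* Sketch of find-or-create where a "new" flag skips the initial lookup
 * but the insert is still guarded by a second lookup; in the NEW case a
 * NULL ("-ENOENT") sentinel substitutes for that lookup so the insert
 * path stays shared.  All names are illustrative.
 */
#include <assert.h>
#include <stdlib.h>

#define NBUCKET 8
#define LOC_F_NEW 1

struct obj { int fid; struct obj *next; };
static struct obj *table[NBUCKET];

static struct obj *bucket_lookup(int fid)
{
	struct obj *o;

	for (o = table[fid % NBUCKET]; o; o = o->next)
		if (o->fid == fid)
			return o;
	return NULL;                    /* stands in for ERR_PTR(-ENOENT) */
}

/* find-or-create; *created reports whether a new object was inserted */
static struct obj *obj_find(int fid, int flags, int *created)
{
	struct obj *o, *shadow;

	*created = 0;
	if (!(flags & LOC_F_NEW)) {     /* caller knows it's new: skip lookup */
		o = bucket_lookup(fid);
		if (o)
			return o;
	}

	o = malloc(sizeof(*o));
	o->fid = fid;

	/* re-check before insert; NEW callers assert absence instead */
	shadow = (flags & LOC_F_NEW) ? NULL : bucket_lookup(fid);
	if (!shadow) {
		o->next = table[fid % NBUCKET];
		table[fid % NBUCKET] = o;
		*created = 1;
		return o;
	}
	free(o);                        /* lost the race: keep existing */
	return shadow;
}
```

The point of the fold is visible here: both the NEW and the normal path
end at the same insert-under-check, so no second function is needed.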
[PATCH 02/10] staging: lustre: make struct lu_site_bkt_data private
This data structure only needs to be public so that various modules can
access a wait queue to wait for object destruction.  If we provide a
function to get the wait queue, rather than the whole bucket, the
structure can be made private.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/include/lu_object.h  | 36 ++-----------
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |  8 ++-
 drivers/staging/lustre/lustre/lov/lov_object.c     |  8 ++-
 drivers/staging/lustre/lustre/obdclass/lu_object.c | 50 +++++++++++++---
 4 files changed, 54 insertions(+), 48 deletions(-)

diff --git a/drivers/staging/lustre/lustre/include/lu_object.h b/drivers/staging/lustre/lustre/include/lu_object.h
index c3b0ed518819..f29bbca5af65 100644
--- a/drivers/staging/lustre/lustre/include/lu_object.h
+++ b/drivers/staging/lustre/lustre/include/lu_object.h
@@ -549,31 +549,7 @@ struct lu_object_header {
 };
 
 struct fld;
-
-struct lu_site_bkt_data {
-	/**
-	 * number of object in this bucket on the lsb_lru list.
-	 */
-	long			lsb_lru_len;
-	/**
-	 * LRU list, updated on each access to object. Protected by
-	 * bucket lock of lu_site::ls_obj_hash.
-	 *
-	 * "Cold" end of LRU is lu_site::ls_lru.next. Accessed object are
-	 * moved to the lu_site::ls_lru.prev (this is due to the non-existence
-	 * of list_for_each_entry_safe_reverse()).
-	 */
-	struct list_head	lsb_lru;
-	/**
-	 * Wait-queue signaled when an object in this site is ultimately
-	 * destroyed (lu_object_free()). It is used by lu_object_find() to
-	 * wait before re-trying when object in the process of destruction is
-	 * found in the hash table.
-	 *
-	 * \see htable_lookup().
-	 */
-	wait_queue_head_t	lsb_marche_funebre;
-};
+struct lu_site_bkt_data;
 
 enum {
 	LU_SS_CREATED	= 0,
@@ -642,14 +618,8 @@ struct lu_site {
 	struct percpu_counter	ls_lru_len_counter;
 };
 
-static inline struct lu_site_bkt_data *
-lu_site_bkt_from_fid(struct lu_site *site, struct lu_fid *fid)
-{
-	struct cfs_hash_bd bd;
-
-	cfs_hash_bd_get(site->ls_obj_hash, fid, &bd);
-	return cfs_hash_bd_extra_get(site->ls_obj_hash, &bd);
-}
+wait_queue_head_t *
+lu_site_wq_from_fid(struct lu_site *site, struct lu_fid *fid);
 
 static inline struct seq_server_site *lu_site2seq(const struct lu_site *s)
 {
diff --git a/drivers/staging/lustre/lustre/llite/lcommon_cl.c b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
index df5c0c0ae703..d5b42fb1d601 100644
--- a/drivers/staging/lustre/lustre/llite/lcommon_cl.c
+++ b/drivers/staging/lustre/lustre/llite/lcommon_cl.c
@@ -211,12 +211,12 @@ static void cl_object_put_last(struct lu_env *env, struct cl_object *obj)
 
 	if (unlikely(atomic_read(&header->loh_ref) != 1)) {
 		struct lu_site *site = obj->co_lu.lo_dev->ld_site;
-		struct lu_site_bkt_data *bkt;
+		wait_queue_head_t *wq;
 
-		bkt = lu_site_bkt_from_fid(site, &header->loh_fid);
+		wq = lu_site_wq_from_fid(site, &header->loh_fid);
 
 		init_waitqueue_entry(&waiter, current);
-		add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
+		add_wait_queue(wq, &waiter);
 
 		while (1) {
 			set_current_state(TASK_UNINTERRUPTIBLE);
@@ -226,7 +226,7 @@ static void cl_object_put_last(struct lu_env *env, struct cl_object *obj)
 		}
 
 		set_current_state(TASK_RUNNING);
-		remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
+		remove_wait_queue(wq, &waiter);
 	}
 
 	cl_object_put(env, obj);
diff --git a/drivers/staging/lustre/lustre/lov/lov_object.c b/drivers/staging/lustre/lustre/lov/lov_object.c
index f7c69680cb7d..adc90f310fd7 100644
--- a/drivers/staging/lustre/lustre/lov/lov_object.c
+++ b/drivers/staging/lustre/lustre/lov/lov_object.c
@@ -370,7 +370,7 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov,
 	struct cl_object	*sub;
 	struct lov_layout_raid0	*r0;
 	struct lu_site		*site;
-	struct lu_site_bkt_data	*bkt;
+	wait_queue_head_t	*wq;
 	wait_queue_entry_t	*waiter;
 
 	r0 = &lov->u.raid0;
@@ -378,7 +378,7 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov,
 
 	sub  = lovsub2cl(los);
 	site = sub->co_lu.lo_dev->ld_site;
-	bkt  = lu_site_bkt_from_fid(site, &sub->co_lu.lo_header->loh_fid);
+	wq   = lu_site_wq_from_fid(site, &sub->co_lu.lo_header->loh_fid);
 
 	cl_object_kill(env, sub);
 	/* release a reference to the sub-object and ... */
@@ -391,7 +391,7 @@ static void lov_subobject_kill(const struct lu_env *env, struct lov_object *lov,
 	if (r0->lo_sub[idx] == los) {
 		waiter = &lov_env_info(env)->lti_waiter;
 		init_waitqueue_entry(waiter, current);
-
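The information-hiding move in this patch is a standard C idiom: the
header keeps only a forward declaration plus an accessor, so the structure
layout becomes private to one compilation unit. A minimal userspace sketch
(the `bkt_data`/`site_counter_from_id` names are invented for illustration,
not the kernel API):

```c
/* Opaque-struct pattern: callers get the one member they need through an
 * accessor; the layout is private to the implementation file.
 */
#include <assert.h>

/* what would remain in the public header: */
struct bkt_data;                         /* only a forward declaration */
int *site_counter_from_id(int id);       /* accessor replacing direct access */

/* private to the implementation file: */
struct bkt_data {
	int counter;                     /* the one member callers needed */
	int internals[4];                /* details no longer exposed */
};

static struct bkt_data buckets[4];

int *site_counter_from_id(int id)
{
	return &buckets[id % 4].counter; /* hand out only the needed member */
}
```

Callers can no longer reach `internals`, so the structure can change
freely without touching any other module, which is exactly why
lu_site_wq_from_fid() lets struct lu_site_bkt_data move out of the header.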
[PATCH 06/10] staging: lustre: llite: use more private data in dump_pgcache
The dump_page_cache debugfs file allocates and frees an 'env' in each
call to vvp_pgcache_start,next,show.  This is likely to be fast, but
does introduce the need to check for errors.

It is reasonable to allocate a single 'env' when the file is opened,
and use that throughout.

So create a 'seq_private' structure which stores the sbi, env, and
refcheck, and attach this to the seqfile.  Then use it throughout
instead of allocating 'env' repeatedly.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/llite/vvp_dev.c | 150 ++++++++++++-------------
 1 file changed, 72 insertions(+), 78 deletions(-)

diff --git a/drivers/staging/lustre/lustre/llite/vvp_dev.c b/drivers/staging/lustre/lustre/llite/vvp_dev.c
index 987c03b058e6..a2619dc04a7f 100644
--- a/drivers/staging/lustre/lustre/llite/vvp_dev.c
+++ b/drivers/staging/lustre/lustre/llite/vvp_dev.c
@@ -390,6 +390,12 @@ struct vvp_pgcache_id {
 	struct lu_object_header *vpi_obj;
 };
 
+struct seq_private {
+	struct ll_sb_info	*sbi;
+	struct lu_env		*env;
+	u16			refcheck;
+};
+
 static void vvp_pgcache_id_unpack(loff_t pos, struct vvp_pgcache_id *id)
 {
 	BUILD_BUG_ON(sizeof(pos) != sizeof(__u64));
@@ -531,95 +537,71 @@ static void vvp_pgcache_page_show(const struct lu_env *env,
 
 static int vvp_pgcache_show(struct seq_file *f, void *v)
 {
+	struct seq_private *priv = f->private;
 	loff_t pos;
-	struct ll_sb_info *sbi;
 	struct cl_object *clob;
-	struct lu_env *env;
 	struct vvp_pgcache_id id;
-	u16 refcheck;
-	int result;
 
-	env = cl_env_get(&refcheck);
-	if (!IS_ERR(env)) {
-		pos = *(loff_t *)v;
-		vvp_pgcache_id_unpack(pos, &id);
-		sbi = f->private;
-		clob = vvp_pgcache_obj(env, &sbi->ll_cl->cd_lu_dev, &id);
-		if (clob) {
-			struct inode *inode = vvp_object_inode(clob);
-			struct cl_page *page = NULL;
-			struct page *vmpage;
-
-			result = find_get_pages_contig(inode->i_mapping,
-						       id.vpi_index, 1,
-						       &vmpage);
-			if (result > 0) {
-				lock_page(vmpage);
-				page = cl_vmpage_page(vmpage, clob);
-				unlock_page(vmpage);
-				put_page(vmpage);
-			}
+	pos = *(loff_t *)v;
+	vvp_pgcache_id_unpack(pos, &id);
+	clob = vvp_pgcache_obj(priv->env, &priv->sbi->ll_cl->cd_lu_dev, &id);
+	if (clob) {
+		struct inode *inode = vvp_object_inode(clob);
+		struct cl_page *page = NULL;
+		struct page *vmpage;
+		int result;
+
+		result = find_get_pages_contig(inode->i_mapping,
+					       id.vpi_index, 1, &vmpage);
+		if (result > 0) {
+			lock_page(vmpage);
+			page = cl_vmpage_page(vmpage, clob);
+			unlock_page(vmpage);
+			put_page(vmpage);
+		}
 
-			seq_printf(f, "%8x@" DFID ": ", id.vpi_index,
-				   PFID(lu_object_fid(&clob->co_lu)));
-			if (page) {
-				vvp_pgcache_page_show(env, f, page);
-				cl_page_put(env, page);
-			} else {
-				seq_puts(f, "missing\n");
-			}
-			lu_object_ref_del(&clob->co_lu, "dump", current);
-			cl_object_put(env, clob);
+		seq_printf(f, "%8x@" DFID ": ", id.vpi_index,
+			   PFID(lu_object_fid(&clob->co_lu)));
+		if (page) {
+			vvp_pgcache_page_show(priv->env, f, page);
+			cl_page_put(priv->env, page);
 		} else {
-			seq_printf(f, "%llx missing\n", pos);
+			seq_puts(f, "missing\n");
 		}
-		cl_env_put(env, &refcheck);
-		result = 0;
+		lu_object_ref_del(&clob->co_lu, "dump", current);
+		cl_object_put(priv->env, clob);
 	} else {
-		result = PTR_ERR(env);
+		seq_printf(f, "%llx missing\n", pos);
 	}
-	return result;
+	return 0;
 }
 
 static void *vvp_pgcache_start(struct seq_file *f, loff_t *pos)
 {
-	struct ll_sb_info *sbi;
-	struct lu_env *env;
-	u16 refcheck;
-
-	sbi = f->private;
+	struct seq_private *priv = f->private;
 
-	env = cl_env_get(&refcheck);
-	if (!IS_ERR(env)) {
-		sbi = f->private;
-
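The allocate-once pattern the patch applies can be sketched outside the
kernel: acquire the expensive context at open time, stash it in the
iterator's private data, and release it once at close, so the per-record
show step has no allocation and no error path. A hypothetical userspace
analog (`env_get`/`env_put` stand in for `cl_env_get`/`cl_env_put`; all
names are invented for the sketch):

```c
/* Userspace analog of the seq_file change: one context per open, reused
 * by every show call, released once.
 */
#include <assert.h>
#include <stdlib.h>

struct env { int handle; };

struct seq_private {
	struct env *env;     /* acquired once at open */
	int shows;           /* how many records were emitted */
};

static struct env *env_get(void)
{
	struct env *e = malloc(sizeof(*e));

	e->handle = 1;
	return e;
}

static void env_put(struct env *e) { free(e); }

static struct seq_private *dump_open(void)
{
	struct seq_private *p = malloc(sizeof(*p));

	p->env = env_get();   /* the only place that can now fail */
	p->shows = 0;
	return p;
}

static int dump_show(struct seq_private *p, int record)
{
	/* uses p->env; no per-call env_get()/error handling needed */
	p->shows += p->env->handle;
	(void)record;
	return 0;
}

static void dump_release(struct seq_private *p)
{
	env_put(p->env);      /* matching single release */
	free(p);
}
```

Moving acquisition to open also means show can unconditionally return 0,
which is exactly the simplification visible in the diff above.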
[PATCH 04/10] staging: lustre: lu_object: move retry logic inside htable_lookup
The current retry logic, to wait when a 'dying' object is found, spans
multiple functions.  The process is attached to a waitqueue and set
TASK_UNINTERRUPTIBLE in htable_lookup, and this status is passed back
through lu_object_find_try() to lu_object_find_at() where schedule()
is called and the process is removed from the queue.

This can be simplified by moving all the logic (including hashtable
locking) inside htable_lookup(), which now never returns EAGAIN.

Note that htable_lookup() is called with the hash bucket lock held,
and will drop and retake it if it needs to schedule.

I made this a 'goto' loop rather than a 'while(1)' loop as the diff
is easier to read.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/obdclass/lu_object.c | 73 ++++++++++------------
 1 file changed, 27 insertions(+), 46 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 2bf089817157..93daa52e2535 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -586,16 +586,21 @@ EXPORT_SYMBOL(lu_object_print);
 static struct lu_object *htable_lookup(struct lu_site *s,
 				       struct cfs_hash_bd *bd,
 				       const struct lu_fid *f,
-				       wait_queue_entry_t *waiter,
 				       __u64 *version)
 {
+	struct cfs_hash *hs = s->ls_obj_hash;
 	struct lu_site_bkt_data *bkt;
 	struct lu_object_header *h;
 	struct hlist_node *hnode;
-	__u64 ver = cfs_hash_bd_version_get(bd);
+	__u64 ver;
+	wait_queue_entry_t waiter;
 
-	if (*version == ver)
+retry:
+	ver = cfs_hash_bd_version_get(bd);
+
+	if (*version == ver) {
 		return ERR_PTR(-ENOENT);
+	}
 
 	*version = ver;
 	bkt = cfs_hash_bd_extra_get(s->ls_obj_hash, bd);
@@ -625,11 +630,15 @@ static struct lu_object *htable_lookup(struct lu_site *s,
 	 * drained), and moreover, lookup has to wait until object is freed.
 	 */
-	init_waitqueue_entry(waiter, current);
-	add_wait_queue(&bkt->lsb_marche_funebre, waiter);
+	init_waitqueue_entry(&waiter, current);
+	add_wait_queue(&bkt->lsb_marche_funebre, &waiter);
 	set_current_state(TASK_UNINTERRUPTIBLE);
 	lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_DEATH_RACE);
-	return ERR_PTR(-EAGAIN);
+	cfs_hash_bd_unlock(hs, bd, 1);
+	schedule();
+	remove_wait_queue(&bkt->lsb_marche_funebre, &waiter);
+	cfs_hash_bd_lock(hs, bd, 1);
+	goto retry;
 }
 
 /**
@@ -693,13 +702,14 @@ static struct lu_object *lu_object_new(const struct lu_env *env,
 }
 
 /**
- * Core logic of lu_object_find*() functions.
+ * Much like lu_object_find(), but top level device of object is specifically
+ * \a dev rather than top level device of the site. This interface allows
+ * objects of different "stacking" to be created within the same site.
 */
-static struct lu_object *lu_object_find_try(const struct lu_env *env,
-					    struct lu_device *dev,
-					    const struct lu_fid *f,
-					    const struct lu_object_conf *conf,
-					    wait_queue_entry_t *waiter)
+struct lu_object *lu_object_find_at(const struct lu_env *env,
+				    struct lu_device *dev,
+				    const struct lu_fid *f,
+				    const struct lu_object_conf *conf)
 {
 	struct lu_object *o;
 	struct lu_object *shadow;
@@ -725,17 +735,16 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
 	 * It is unnecessary to perform lookup-alloc-lookup-insert, instead,
 	 * just alloc and insert directly.
 	 *
-	 * If dying object is found during index search, add @waiter to the
-	 * site wait-queue and return ERR_PTR(-EAGAIN).
 	 */
 	if (conf && conf->loc_flags & LOC_F_NEW)
 		return lu_object_new(env, dev, f, conf);
 
 	s  = dev->ld_site;
 	hs = s->ls_obj_hash;
-	cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 1);
-	o = htable_lookup(s, &bd, f, waiter, &version);
-	cfs_hash_bd_unlock(hs, &bd, 1);
+	cfs_hash_bd_get_and_lock(hs, (void *)f, &bd, 0);
+	o = htable_lookup(s, &bd, f, &version);
+	cfs_hash_bd_unlock(hs, &bd, 0);
+
 	if (!IS_ERR(o) || PTR_ERR(o) != -ENOENT)
 		return o;
 
@@ -751,7 +760,7 @@ static struct lu_object *lu_object_find_try(const struct lu_env *env,
 
 	cfs_hash_bd_lock(hs, &bd, 1);
 
-	shadow = htable_lookup(s, &bd, f, waiter, &version);
+	shadow = htable_lookup(s, &bd, f, &version);
 	if (likely(PTR_ERR(shadow) == -ENOENT)) {
 		cfs_hash_bd_add_locked(hs, &bd, &o->lo_header->loh_hash);
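The restructuring can be sketched single-threaded in userspace: the
wait-and-retry now lives entirely inside the lookup, which drops the
"lock", waits for the dying object, retakes the lock, and retries via
goto, instead of handing -EAGAIN back to the caller. Here `wait_for_free()`
stands in for the waitqueue plus schedule(); the table, flags, and names
are all invented for the sketch:

```c
/* Sketch of a lookup that retries internally when it finds a dying
 * entry, rather than returning an EAGAIN-style error to its caller.
 */
#include <assert.h>
#include <stddef.h>

struct obj { int fid; int dying; };

static int lock_held;
static int retries;

static void wait_for_free(struct obj *o)
{
	o->dying = 0;                   /* pretend the free completed while we slept */
}

static struct obj *table_lookup(struct obj *tbl, size_t n, int fid)
{
	size_t i;

retry:
	for (i = 0; i < n; i++) {
		if (tbl[i].fid != fid)
			continue;
		if (!tbl[i].dying)
			return &tbl[i];
		lock_held = 0;          /* like cfs_hash_bd_unlock() */
		wait_for_free(&tbl[i]); /* like schedule() until lu_object_free() */
		lock_held = 1;          /* like cfs_hash_bd_lock() */
		retries++;
		goto retry;             /* caller never sees -EAGAIN */
	}
	return NULL;
}
```

As in the patch, the function is entered and left with the lock held,
and the drop/retake is invisible to every caller.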
[PATCH 03/10] staging: lustre: lu_object: discard extra lru count.
lu_object maintains 2 lru counts.  One is a per-bucket lsb_lru_len.
The other is the per-cpu ls_lru_len_counter.

The only times the per-bucket counters are used are:
 - a debug message when an object is added
 - in lu_site_stats_get when all the counters are combined.

The debug message is not essential, and the per-cpu counter can be
used to get the combined total.  So discard the per-bucket
lsb_lru_len.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/obdclass/lu_object.c | 24 ++++++++-------------
 1 file changed, 9 insertions(+), 15 deletions(-)

diff --git a/drivers/staging/lustre/lustre/obdclass/lu_object.c b/drivers/staging/lustre/lustre/obdclass/lu_object.c
index 2a8a25d6edb5..2bf089817157 100644
--- a/drivers/staging/lustre/lustre/obdclass/lu_object.c
+++ b/drivers/staging/lustre/lustre/obdclass/lu_object.c
@@ -57,10 +57,6 @@
 #include
 
 struct lu_site_bkt_data {
-	/**
-	 * number of object in this bucket on the lsb_lru list.
-	 */
-	long			lsb_lru_len;
 	/**
 	 * LRU list, updated on each access to object. Protected by
 	 * bucket lock of lu_site::ls_obj_hash.
@@ -187,10 +183,9 @@ void lu_object_put(const struct lu_env *env, struct lu_object *o)
 		if (!lu_object_is_dying(top)) {
 			LASSERT(list_empty(&top->loh_lru));
 			list_add_tail(&top->loh_lru, &bkt->lsb_lru);
-			bkt->lsb_lru_len++;
 			percpu_counter_inc(&site->ls_lru_len_counter);
-			CDEBUG(D_INODE, "Add %p to site lru. hash: %p, bkt: %p, lru_len: %ld\n",
-			       o, site->ls_obj_hash, bkt, bkt->lsb_lru_len);
+			CDEBUG(D_INODE, "Add %p to site lru. hash: %p, bkt: %p\n",
+			       o, site->ls_obj_hash, bkt);
 			cfs_hash_bd_unlock(site->ls_obj_hash, &bd, 1);
 			return;
 		}
@@ -238,7 +233,6 @@ void lu_object_unhash(const struct lu_env *env, struct lu_object *o)
 			list_del_init(&top->loh_lru);
 			bkt = cfs_hash_bd_extra_get(obj_hash, &bd);
-			bkt->lsb_lru_len--;
 			percpu_counter_dec(&site->ls_lru_len_counter);
 		}
 		cfs_hash_bd_del_locked(obj_hash, &bd, &top->loh_hash);
@@ -422,7 +416,6 @@ int lu_site_purge_objects(const struct lu_env *env, struct lu_site *s,
 			cfs_hash_bd_del_locked(s->ls_obj_hash,
 					       &bd2, &h->loh_hash);
 			list_move(&h->loh_lru, &dispose);
-			bkt->lsb_lru_len--;
 			percpu_counter_dec(&s->ls_lru_len_counter);
 			if (did_sth == 0)
 				did_sth = 1;
@@ -621,7 +614,6 @@ static struct lu_object *htable_lookup(struct lu_site *s,
 	lprocfs_counter_incr(s->ls_stats, LU_SS_CACHE_HIT);
 	if (!list_empty(&h->loh_lru)) {
 		list_del_init(&h->loh_lru);
-		bkt->lsb_lru_len--;
 		percpu_counter_dec(&s->ls_lru_len_counter);
 	}
 	return lu_object_top(h);
@@ -1834,19 +1826,21 @@ struct lu_site_stats {
 	unsigned int	lss_busy;
 };
 
-static void lu_site_stats_get(struct cfs_hash *hs,
+static void lu_site_stats_get(const struct lu_site *s,
 			      struct lu_site_stats *stats, int populated)
 {
+	struct cfs_hash *hs = s->ls_obj_hash;
 	struct cfs_hash_bd bd;
 	unsigned int i;
+	/* percpu_counter_read_positive() won't accept a const pointer */
+	struct lu_site *s2 = (struct lu_site *)s;
 
+	stats->lss_busy += cfs_hash_size_get(hs) -
+		percpu_counter_read_positive(&s2->ls_lru_len_counter);
 	cfs_hash_for_each_bucket(hs, &bd, i) {
-		struct lu_site_bkt_data *bkt = cfs_hash_bd_extra_get(hs, &bd);
 		struct hlist_head *hhead;
 
 		cfs_hash_bd_lock(hs, &bd, 1);
-		stats->lss_busy +=
-			cfs_hash_bd_count_get(&bd) - bkt->lsb_lru_len;
 		stats->lss_total += cfs_hash_bd_count_get(&bd);
 		stats->lss_max_search = max((int)stats->lss_max_search,
 					    cfs_hash_bd_depmax_get(&bd));
@@ -2039,7 +2033,7 @@ int lu_site_stats_print(const struct lu_site *s, struct seq_file *m)
 	struct lu_site_stats stats;
 
 	memset(&stats, 0, sizeof(stats));
-	lu_site_stats_get(s->ls_obj_hash, &stats, 1);
+	lu_site_stats_get(s, &stats, 1);
 
 	seq_printf(m, "%d/%d %d/%ld %d %d %d %d %d %d %d\n",
 		   stats.lss_busy,
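The accounting change can be sketched in userspace: with one aggregate
counter for LRU length (approximated here by a small per-"cpu" array that
is summed on read), the per-bucket copies become redundant, and
busy = total - lru_len needs no bucket walk. Counter names and sizes are
invented for the sketch:

```c
/* Sketch of replacing per-bucket LRU counts with one summed counter. */
#include <assert.h>

#define NCPU 4

static long lru_len[NCPU];        /* stand-in for the percpu counter */
static long total_objects;

static void lru_add(int cpu) { lru_len[cpu]++; }
static void lru_del(int cpu) { lru_len[cpu]--; }

static long lru_len_read(void)
{
	long sum = 0;
	int i;

	for (i = 0; i < NCPU; i++)
		sum += lru_len[i];
	return sum > 0 ? sum : 0; /* like percpu_counter_read_positive() */
}

/* what the stats path can now compute without per-bucket counts */
static long busy_count(void)
{
	return total_objects - lru_len_read();
}
```

The trade-off is the one the commit message relies on: reads sum over all
counters, but updates touch only one, and the stats path needed only the
combined total anyway.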
[PATCH 00/10] staging: lustre: assorted improvements.
First 6 patches are clean-up patches that I pulled out of my rhashtable series. I think these stand alone as good cleanups, and having them upstream makes the rhashtable series shorter to ease further review.

Second 2 are revised versions of patches I sent previously that had conflicts with other patches that landed first.

Last is a bugfix for an issue James mentioned a while back.

Thanks,
NeilBrown

---

NeilBrown (10):
      staging: lustre: ldlm: store name directly in namespace.
      staging: lustre: make struct lu_site_bkt_data private
      staging: lustre: lu_object: discard extra lru count.
      staging: lustre: lu_object: move retry logic inside htable_lookup
      staging: lustre: fold lu_object_new() into lu_object_find_at()
      staging: lustre: llite: use more private data in dump_pgcache
      staging: lustre: llite: remove redundant lookup in dump_pgcache
      staging: lustre: move misc-device registration closer to related code.
      staging: lustre: move remaining code from linux-module.c to module.c
      staging: lustre: fix error deref in ll_splice_alias().

 .../staging/lustre/include/linux/libcfs/libcfs.h   |    6 -
 drivers/staging/lustre/lnet/libcfs/Makefile        |    1
 .../lustre/lnet/libcfs/linux/linux-module.c        |  196
 drivers/staging/lustre/lnet/libcfs/module.c        |  162 -
 drivers/staging/lustre/lustre/include/lu_object.h  |   36
 drivers/staging/lustre/lustre/include/lustre_dlm.h |    5 -
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |    5 +
 drivers/staging/lustre/lustre/llite/lcommon_cl.c   |    8 -
 drivers/staging/lustre/lustre/llite/namei.c        |    8 +
 drivers/staging/lustre/lustre/llite/vvp_dev.c      |  169 -
 drivers/staging/lustre/lustre/lov/lov_object.c     |    8 -
 drivers/staging/lustre/lustre/obdclass/lu_object.c |  169 -
 12 files changed, 341 insertions(+), 432 deletions(-)
 delete mode 100644 drivers/staging/lustre/lnet/libcfs/linux/linux-module.c

--
Signature
[PATCH 01/10] staging: lustre: ldlm: store name directly in namespace.
Rather than storing the name of a namespace in the hash table, store it directly in the namespace. This will allow the hashtable to be changed to use rhashtable.

Signed-off-by: NeilBrown
---
 drivers/staging/lustre/lustre/include/lustre_dlm.h |    5 -
 drivers/staging/lustre/lustre/ldlm/ldlm_resource.c |    5 +
 2 files changed, 9 insertions(+), 1 deletion(-)

diff --git a/drivers/staging/lustre/lustre/include/lustre_dlm.h b/drivers/staging/lustre/lustre/include/lustre_dlm.h
index d668d86423a4..b3532adac31c 100644
--- a/drivers/staging/lustre/lustre/include/lustre_dlm.h
+++ b/drivers/staging/lustre/lustre/include/lustre_dlm.h
@@ -362,6 +362,9 @@ struct ldlm_namespace {
 	/** Flag indicating if namespace is on client instead of server */
 	enum ldlm_side		ns_client;

+	/** name of this namespace */
+	char			*ns_name;
+
 	/** Resource hash table for namespace. */
 	struct cfs_hash		*ns_rs_hash;

@@ -878,7 +881,7 @@ static inline bool ldlm_has_layout(struct ldlm_lock *lock)
 static inline char *
 ldlm_ns_name(struct ldlm_namespace *ns)
 {
-	return ns->ns_rs_hash->hs_name;
+	return ns->ns_name;
 }

 static inline struct ldlm_namespace *
diff --git a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
index 6c615b6e9bdc..43bbc5fd94cc 100644
--- a/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
+++ b/drivers/staging/lustre/lustre/ldlm/ldlm_resource.c
@@ -688,6 +688,9 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	ns->ns_obd = obd;
 	ns->ns_appetite = apt;
 	ns->ns_client = client;
+	ns->ns_name = kstrdup(name, GFP_KERNEL);
+	if (!ns->ns_name)
+		goto out_hash;

 	INIT_LIST_HEAD(&ns->ns_list_chain);
 	INIT_LIST_HEAD(&ns->ns_unused_list);
@@ -730,6 +733,7 @@ struct ldlm_namespace *ldlm_namespace_new(struct obd_device *obd, char *name,
 	ldlm_namespace_sysfs_unregister(ns);
 	ldlm_namespace_cleanup(ns, 0);
 out_hash:
+	kfree(ns->ns_name);
 	cfs_hash_putref(ns->ns_rs_hash);
 out_ns:
 	kfree(ns);
@@ -993,6 +997,7 @@ void ldlm_namespace_free_post(struct ldlm_namespace *ns)
 	ldlm_namespace_debugfs_unregister(ns);
 	ldlm_namespace_sysfs_unregister(ns);
 	cfs_hash_putref(ns->ns_rs_hash);
+	kfree(ns->ns_name);

 	/* Namespace \a ns should be not on list at this time, otherwise
 	 * this will cause issues related to using freed \a ns in poold
 	 * thread.
Re: [kernel-team] Re: [PATCH RFC v5 4/6] trace/irqsoff: Split reset into seperate functions
On Mon, Apr 30, 2018 at 8:46 PM Randy Dunlap wrote: > On 04/30/2018 06:42 PM, Joel Fernandes wrote: > > Split reset functions into seperate functions in preparation > > of future patches that need to do tracer specific reset. > > > Hi, > Since you are updating patches anyway, please > s/seperate/separate/. Thanks Randy, will do. - Joel
Re: [PATCH RFC v5 4/6] trace/irqsoff: Split reset into seperate functions
On 04/30/2018 06:42 PM, Joel Fernandes wrote: > Split reset functions into seperate functions in preparation > of future patches that need to do tracer specific reset. > Hi, Since you are updating patches anyway, please s/seperate/separate/. thanks, -- ~Randy
Warning for driver i915 for 4.17.0-rcX
With kernel 4.17.0-rc3, I noted the following warning from driver i915. kernel: [ cut here ] kernel: Could not determine valid watermarks for inherited state kernel: WARNING: CPU: 3 PID: 224 at drivers/gpu/drm/i915/intel_display.c:14584 intel_modeset_init+0x3be/0x1060 [i915] kernel: Modules linked in: i915(+) xhci_pci i2c_algo_bit ehci_pci xhci_hcd serio_raw drm_kms_helper ehci_hcd syscopyarea sysfillrect sysimgblt kernel: CPU: 3 PID: 224 Comm: systemd-udevd Not tainted 4.17.0-rc0-08000-g2f39cfca0161-dirty #188 kernel: Hardware name: TOSHIBA TECRA A50-A/TECRA A50-A, BIOS Version 4.50 09/29/2014 kernel: RIP: 0010:intel_modeset_init+0x3be/0x1060 [i915] kernel: RSP: 0018:c9000112fab8 EFLAGS: 00010296 kernel: RAX: 0038 RBX: 88021dc1 RCX: 0006 kernel: RDX: 0007 RSI: 0082 RDI: 88022f2d5bd0 kernel: RBP: 88021c8b3000 R08: 0289 R09: 0004 kernel: R10: c9000112f948 R11: 0001 R12: 88021ef3d800 kernel: R13: ffea R14: R15: 88021dc10358 kernel: FS: 7f830913c940() GS:88022f2c() knlGS: kernel: CS: 0010 DS: ES: CR0: 80050033 kernel: CR2: 7ffc91722f58 CR3: 00021e940003 CR4: 001606e0 kernel: Call Trace: kernel: i915_driver_load+0xa87/0xed0 [i915] kernel: local_pci_probe+0x42/0xa0 kernel: pci_device_probe+0x125/0x190 kernel: driver_probe_device+0x30b/0x480 kernel: __driver_attach+0xb8/0xe0 kernel: ? driver_probe_device+0x40/0x480 kernel: ? driver_probe_device+0x480/0x480 kernel: bus_for_each_dev+0x65/0x90 kernel: bus_add_driver+0x161/0x260 kernel: ? 0xa0149000 kernel: driver_register+0x57/0xc0 kernel: ? 0xa0149000 kernel: do_one_initcall+0x4e/0x18d kernel: ? kmem_cache_alloc_trace+0xfe/0x210 kernel: ? do_init_module+0x22/0x20a kernel: do_init_module+0x5b/0x20a kernel: load_module+0x18a1/0x1e20 kernel: ? 
SYSC_finit_module+0xb7/0xd0 kernel: SYSC_finit_module+0xb7/0xd0 kernel: do_syscall_64+0x6e/0x120 kernel: entry_SYSCALL_64_after_hwframe+0x3d/0xa2 kernel: RIP: 0033:0x7f8307f67529 kernel: RSP: 002b:7ffc91737ab8 EFLAGS: 0246 ORIG_RAX: 0139 kernel: RAX: ffda RBX: 559b3cf6f350 RCX: 7f8307f67529 kernel: RDX: RSI: 7f83088cd83d RDI: 0014 kernel: RBP: 7f83088cd83d R08: R09: 559b3cf44f50 kernel: R10: 0014 R11: 0246 R12: 0002 kernel: R13: 559b3cf634a0 R14: R15: 03938700 kernel: Code: 00 f7 c6 00 00 18 00 0f 84 d7 0a 00 00 85 c9 0f 94 c1 41 88 8c 24 8d 02 00 00 e9 f6 08 00 00 48 c7 c7 b0 2f 41 a0 e8 f2 9b cd e0 kernel: ---[ end trace c0feea6402f4c999 ]--- This warning was bisected to commit a2936e3d9a9cb2ce192455cdec3a8cfccc26b486 (refs/bisect/bad) Author: Ville Syrjälä Date: Thu Nov 23 21:04:49 2017 +0200 drm/i915: Use drm_mode_get_hv_timing() to populate plane clip rectangle Cc: Laurent Pinchart Signed-off-by: Ville Syrjälä Reviewed-by: Daniel Vetter Reviewed-by: Thierry Reding The output of 'lspci -nn -vvv' for the graphics device is as follows: 00:02.0 VGA compatible controller [0300]: Intel Corporation 4th Gen Core Processor Integrated Graphics Controller [8086:0416] (rev 06) (prog-if 00 [VGA controller]) Subsystem: Toshiba America Info Systems Device [1179:0002] Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B- DisINTx+ Status: Cap+ 66MHz- UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- SERR- Latency: 0 Interrupt: pin A routed to IRQ 28 Region 0: Memory at e000 (64-bit, non-prefetchable) [size=4M] Region 2: Memory at d000 (64-bit, prefetchable) [size=256M] Region 4: I/O ports at 4000 [size=64] [virtual] Expansion ROM at 000c [disabled] [size=128K] Capabilities: [90] MSI: Enable+ Count=1/1 Maskable- 64bit- Address: fee00018 Data: Capabilities: [d0] Power Management version 2 Flags: PMEClk- DSI+ D1- D2- AuxCurrent=0mA PME(D0-,D1-,D2-,D3hot-,D3cold-) Status: D0 NoSoftRst- PME-Enable- DSel=0 DScale=0 PME- Capabilities: [a4] PCI Advanced
Features AFCap: TP+ FLR+ AFCtrl: FLR- AFStatus: TP- Kernel driver in use: i915 Kernel modules: i915 This warning does not seem to cause any problems. Thanks, Larry
Re: [PATCH v2] libata: blacklist Micron SSD
Sudip, > v1: Only M500IT MU01 was blacklisted. > > v2: Whitelist M500IT BG02 and M500DC and then blacklist all other Micron. I think my preference would be to blacklist M500IT with the MU01 firmware (which Micron said was affected) and rely on the "Micron*" fallthrough further down for the rest. I have not gotten firm confirmation on ZRAT behavior so for now we should probably just do: + { "Micron_M500IT_*", "MU01", ATA_HORKAGE_NO_NCQ_TRIM, }, -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH 2/2] backlight: Remove ld9040 driver
> -Original Message- > From: Krzysztof Kozlowski> Sent: Monday, April 30, 2018 1:30 PM > To: Lee Jones ; Daniel Thompson > ; Jingoo Han ; > Bartlomiej Zolnierkiewicz ; linux- > ker...@vger.kernel.org; dri-de...@lists.freedesktop.org; linux- > fb...@vger.kernel.org > Cc: Krzysztof Kozlowski ; Marek Szyprowski > ; Inki Dae > Subject: [PATCH 2/2] backlight: Remove ld9040 driver > > The driver for LD9040 AMOLED LCD panel was superseded with DRM driver > panel-samsung-ld9040.c. It does not support DeviceTree and respective > possible user (Exynos4210 Universal C210) is DeviceTree-only and uses > DRM version of driver.. > > Suggested-by: Marek Szyprowski > Cc: Marek Szyprowski > Cc: Inki Dae > Signed-off-by: Krzysztof Kozlowski > --- > drivers/video/backlight/Kconfig| 8 - > drivers/video/backlight/Makefile | 1 - > drivers/video/backlight/ld9040.c | 811 - > > drivers/video/backlight/ld9040_gamma.h | 202 > 4 files changed, 1022 deletions(-) > delete mode 100644 drivers/video/backlight/ld9040.c > delete mode 100644 drivers/video/backlight/ld9040_gamma.h Acked-by: Jingoo Han Best regards, Jingoo Han [.]
Re: [PATCH 1/2] backlight: Remove s6e63m0 driver
On Monday, April 30, 2018 1:30 PM, Krzysztof Kozlowski wrote: > > The driver for S6E63M0 AMOLED LCD panel is not used. It does not > support DeviceTree and respective possible users (S5Pv210 Aquila and > Goni boards) are DeviceTree-only. > > Suggested-by: Marek Szyprowski > Cc: Marek Szyprowski > Cc: Inki Dae > Signed-off-by: Krzysztof Kozlowski > --- > drivers/video/backlight/Kconfig | 8 - > drivers/video/backlight/Makefile| 1 - > drivers/video/backlight/s6e63m0.c | 857 > Acked-by: Jingoo Han Best regards, Jingoo Han [.]
[PATCH 3/4 v2] rculist: add list_for_each_entry_from_rcu()
list_for_each_entry_from_rcu() is an RCU version of list_for_each_entry_from(). It walks a linked list under rcu protection, from a given start point. It is similar to list_for_each_entry_continue_rcu() but starts *at* the given position rather than *after* it. Naturally, the start point must be known to be in the list.

Also update the documentation for list_for_each_entry_continue_rcu() to match the documentation for the new list_for_each_entry_from_rcu(), and add list_for_each_entry_from_rcu() and the already existing hlist_for_each_entry_from_rcu() to section 7 of whatisRCU.txt.

Signed-off-by: NeilBrown
---
 Documentation/RCU/whatisRCU.txt |  2 ++
 include/linux/rculist.h         | 32 +++-
 2 files changed, 33 insertions(+), 1 deletion(-)

diff --git a/Documentation/RCU/whatisRCU.txt b/Documentation/RCU/whatisRCU.txt
index a27fbfb0efb8..b7d38bd212d2 100644
--- a/Documentation/RCU/whatisRCU.txt
+++ b/Documentation/RCU/whatisRCU.txt
@@ -814,11 +814,13 @@ RCU list traversal:
 	list_next_rcu
 	list_for_each_entry_rcu
 	list_for_each_entry_continue_rcu
+	list_for_each_entry_from_rcu
 	hlist_first_rcu
 	hlist_next_rcu
 	hlist_pprev_rcu
 	hlist_for_each_entry_rcu
 	hlist_for_each_entry_rcu_bh
+	hlist_for_each_entry_from_rcu
 	hlist_for_each_entry_continue_rcu
 	hlist_for_each_entry_continue_rcu_bh
 	hlist_nulls_first_rcu
diff --git a/include/linux/rculist.h b/include/linux/rculist.h
index 127f534fec94..4786c2235b98 100644
--- a/include/linux/rculist.h
+++ b/include/linux/rculist.h
@@ -396,13 +396,43 @@ static inline void list_splice_tail_init_rcu(struct list_head *list,
  * @member:	the name of the list_head within the struct.
  *
  * Continue to iterate over list of given type, continuing after
- * the current position.
+ * the current position which must have been in the list when the RCU read
+ * lock was taken.
+ * This would typically require either that you obtained the node from a
+ * previous walk of the list in the same RCU read-side critical section, or
+ * that you held some sort of non-RCU reference (such as a reference count)
+ * to keep the node alive *and* in the list.
+ *
+ * This iterator is similar to list_for_each_entry_from_rcu() except
+ * this starts after the given position and that one starts at the given
+ * position.
  */
 #define list_for_each_entry_continue_rcu(pos, head, member)		\
 	for (pos = list_entry_rcu(pos->member.next, typeof(*pos), member); \
 	     &pos->member != (head);					\
 	     pos = list_entry_rcu(pos->member.next, typeof(*pos), member))

+/**
+ * list_for_each_entry_from_rcu - iterate over a list from current point
+ * @pos:	the type * to use as a loop cursor.
+ * @head:	the head for your list.
+ * @member:	the name of the list_node within the struct.
+ *
+ * Iterate over the tail of a list starting from a given position,
+ * which must have been in the list when the RCU read lock was taken.
+ * This would typically require either that you obtained the node from a
+ * previous walk of the list in the same RCU read-side critical section, or
+ * that you held some sort of non-RCU reference (such as a reference count)
+ * to keep the node alive *and* in the list.
+ *
+ * This iterator is similar to list_for_each_entry_continue_rcu() except
+ * this starts from the given position and that one starts from the position
+ * after the given position.
+ */
+#define list_for_each_entry_from_rcu(pos, head, member)			\
+	for (; &(pos)->member != (head);				\
+	     pos = list_entry_rcu(pos->member.next, typeof(*(pos)), member))
+
 /**
  * hlist_del_rcu - deletes entry from hash list without re-initialization
  * @n:	the element to delete from the hash list.
--
2.14.0.rc0.dirty

signature.asc
Description: PGP signature
Re: [PATCH 3/5] ib_srpt: depend on INFINIBAND_ADDR_TRANS
On Mon, Apr 30, 2018 at 4:35 PM Jason Gunthorpe wrote: > On Wed, Apr 25, 2018 at 03:33:39PM -0700, Greg Thelen wrote: > > INFINIBAND_SRPT code depends on INFINIBAND_ADDR_TRANS provided symbols. > > So declare the kconfig dependency. This is necessary to allow for > > enabling INFINIBAND without INFINIBAND_ADDR_TRANS. > > > > Signed-off-by: Greg Thelen > > Cc: Tarick Bedeir > > drivers/infiniband/ulp/srpt/Kconfig | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/drivers/infiniband/ulp/srpt/Kconfig b/drivers/infiniband/ulp/srpt/Kconfig > > index 31ee83d528d9..fb8b7182f05e 100644 > > +++ b/drivers/infiniband/ulp/srpt/Kconfig > > @@ -1,6 +1,6 @@ > > config INFINIBAND_SRPT > > tristate "InfiniBand SCSI RDMA Protocol target support" > > - depends on INFINIBAND && TARGET_CORE > > + depends on INFINIBAND && INFINIBAND_ADDR_TRANS && TARGET_CORE > Isn't INFINIBAND && INFINIBAND_ADDR_TRANS a bit redundant? Can't have > INFINIBAND_ADDR_TRANS without INFINIBAND. By kconfig INFINIBAND_ADDR_TRANS depends on INFINIBAND. So yes, it seems redundant. I don't know if anyone has designs to break this dependency and allow for ADDR_TRANS without INFINIBAND. Assuming not, I'd be willing to amend my series removing redundant INFINIBAND and a followup series to remove it from similar depends. Though I'm not familiar with rdma dev tree lifecycle. Is rdma/for-rc a throw away branch (akin to linux-next), or will it be merged into linus/master? If throwaway, then we can amend its patches, otherwise followups will be needed. Let me know what you'd prefer. Thanks. FYI from v4.17-rc3: drivers/staging/lustre/lnet/Kconfig: depends on LNET && PCI && INFINIBAND && INFINIBAND_ADDR_TRANS net/9p/Kconfig: depends on INET && INFINIBAND && INFINIBAND_ADDR_TRANS net/rds/Kconfig: depends on RDS && INFINIBAND && INFINIBAND_ADDR_TRANS net/sunrpc/Kconfig: depends on SUNRPC && INFINIBAND && INFINIBAND_ADDR_TRANS
[PATCH][next] pinctrl: actions: Fix Kconfig dependency and help text
1. Fix Kconfig dependency for Actions Semi S900 pinctrl driver which generates below warning in x86: WARNING: unmet direct dependencies detected for PINCTRL_OWL Depends on [n]: PINCTRL [=y] && (ARCH_ACTIONS || COMPILE_TEST [=n]) && OF [=n] Selected by [y]: - PINCTRL_S900 [=y] && PINCTRL [=y] 2. Add help text for OWL pinctrl driver Signed-off-by: Manivannan Sadhasivam --- drivers/pinctrl/actions/Kconfig | 6 -- 1 file changed, 4 insertions(+), 2 deletions(-) diff --git a/drivers/pinctrl/actions/Kconfig b/drivers/pinctrl/actions/Kconfig index 1c7309c90f0d..ede97cdbbc12 100644 --- a/drivers/pinctrl/actions/Kconfig +++ b/drivers/pinctrl/actions/Kconfig @@ -1,12 +1,14 @@ config PINCTRL_OWL - bool + bool "Actions Semi OWL pinctrl driver" depends on (ARCH_ACTIONS || COMPILE_TEST) && OF select PINMUX select PINCONF select GENERIC_PINCONF + help + Say Y here to enable Actions Semi OWL pinctrl driver config PINCTRL_S900 bool "Actions Semi S900 pinctrl driver" - select PINCTRL_OWL + depends on PINCTRL_OWL help Say Y here to enable Actions Semi S900 pinctrl driver -- 2.14.1
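The underlying Kconfig rule is that `select` forces a symbol on without evaluating that symbol's own `depends on` line, which is exactly how the unmet-dependency warning arises on x86. A minimal sketch of the resulting structure (simplified from the patch above):

```kconfig
config PINCTRL_OWL
	bool "Actions Semi OWL pinctrl driver"
	depends on (ARCH_ACTIONS || COMPILE_TEST) && OF

config PINCTRL_S900
	bool "Actions Semi S900 pinctrl driver"
	# "select PINCTRL_OWL" would force OWL on even where ARCH_ACTIONS
	# and OF are unset; "depends on" honors those constraints instead.
	depends on PINCTRL_OWL
```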
[GIT] XArray v12
I've made version 12 of the XArray and page cache conversion available at git://git.infradead.org/users/willy/linux-dax.git xarray-20180430 Changes since v11: - At Goldwyn's request, renamed xas_for_each_tag -> xas_for_each_tagged, xas_find_tag -> xas_find_tagged and xas_next_tag -> xas_next_tagged - Fix performance regression (relative to radix_tree_tag_clear) when using xas_clear_tag to clear an already-cleared tag. - Use __test_and_set_bit in node_set_tag() rather than testing node_get_tag() before calling node_set_tag(). - Added asm-generic/bitops/non-atomic.h to tools/include - Removed xas_create() from the exported API. All callers can use xas_load instead. It makes the callers more understandable and it reduces the size of the API. - Documented xas_create_range(). - Improved the documentation for xas_store(), explaining the return value for a multi-index xa_state. - Re-re-did the memfd patches on top of the current state of play. - Used xas_set_order() to zero out all entries for a THP page instead of a loop in page_cache_delete(). Goldwyn pointed out the loop was ugly, and then so did everybody at LSFMM. - Rewrote the nilfs patch to be closer to the original radix tree-based code since I have no way of verifying it and the maintainer isn't responding to requests to see if it works. - f2fs dropped its copy of __set_page_dirty_buffers, so dropped my modification of it. - Fixed a missing irq-disable in shmem_free_swap().
Matthew Wilcox (63): xarray: Replace exceptional entries xarray: Change definition of sibling entries xarray: Add definition of struct xarray xarray: Define struct xa_node xarray: Add documentation xarray: Add xa_load xarray: Add XArray tags xarray: Add xa_store xarray: Add xa_cmpxchg and xa_insert xarray: Add xa_for_each xarray: Add xa_extract xarray: Add xa_destroy xarray: Add xas_next and xas_prev xarray: Add xas_create_range xarray: Add MAINTAINERS entry page cache: Rearrange address_space page cache: Convert hole search to XArray page cache: Add and replace pages using the XArray page cache: Convert page deletion to XArray page cache: Convert page cache lookups to XArray page cache: Convert delete_batch to XArray page cache: Remove stray radix comment page cache: Convert filemap_range_has_page to XArray mm: Convert page-writeback to XArray mm: Convert workingset to XArray mm: Convert truncate to XArray mm: Convert add_to_swap_cache to XArray mm: Convert delete_from_swap_cache to XArray mm: Convert __do_page_cache_readahead to XArray mm: Convert page migration to XArray mm: Convert huge_memory to XArray mm: Convert collapse_shmem to XArray mm: Convert khugepaged_scan_shmem to XArray pagevec: Use xa_tag_t shmem: Convert replace to XArray shmem: Convert shmem_confirm_swap to XArray shmem: Convert find_swap_entry to XArray shmem: Convert shmem_add_to_page_cache to XArray shmem: Convert shmem_alloc_hugepage to XArray shmem: Convert shmem_free_swap to XArray shmem: Convert shmem_partial_swap_usage to XArray memfd: Convert memfd_wait_for_pins to XArray memfd: Convert memfd_tag_pins to XArray shmem: Comment fixups btrfs: Convert page cache to XArray fs: Convert buffer to XArray fs: Convert writeback to XArray nilfs2: Convert to XArray f2fs: Convert to XArray lustre: Convert to XArray dax: Fix use of zero page dax: dax_insert_mapping_entry always succeeds dax: Rename some functions dax: Hash on XArray instead of mapping dax: Convert dax_insert_pfn_mkwrite to XArray 
dax: Convert __dax_invalidate_entry to XArray dax: Convert dax writeback to XArray dax: Convert page fault handlers to XArray dax: Return fault code from dax_load_hole page cache: Finish XArray conversion radix tree: Remove unused functions radix tree: Remove radix_tree_update_node_t radix tree: Remove radix_tree_clear_tags .clang-format |1 - Documentation/core-api/index.rst|1 + Documentation/core-api/xarray.rst | 360 + MAINTAINERS | 12 + arch/powerpc/include/asm/book3s/64/pgtable.h|4 +- arch/powerpc/include/asm/nohash/64/pgtable.h|4 +- drivers/gpu/drm/i915/i915_gem.c | 17 +- drivers/staging/lustre/lustre/llite/glimpse.c | 12 +- drivers/staging/lustre/lustre/mdc/mdc_request.c | 16 +- fs/btrfs/compression.c |6 +- fs/btrfs/extent_io.c| 12 +- fs/buffer.c | 14 +- fs/dax.c| 725 -- fs/ext
Re: [PATCH RFC v5 5/6] tracepoint: Make rcuidle tracepoint callers use SRCU
On Mon, Apr 30, 2018 at 6:42 PM Joel Fernandes wrote: > In recent tests with IRQ on/off tracepoints, a large performance > overhead ~10% is noticed when running hackbench. This is root caused to > calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the > tracepoint code. Following a long discussion on the list [1] about this, > we concluded that srcu is a better alternative for use during rcu idle. > Although it does involve extra barriers, it's lighter than the sched-rcu > version which has to do additional RCU calls to notify RCU idle about > entry into RCU sections. > In this patch, we change the underlying implementation of the > trace_*_rcuidle API to use SRCU. This has shown to improve performance > a lot for the high frequency irq enable/disable tracepoints. > Test: Tested idle and preempt/irq tracepoints. > [1] https://patchwork.kernel.org/patch/10344297/ > Cc: Steven Rostedt > Cc: Peter Zilstra > Cc: Ingo Molnar > Cc: Mathieu Desnoyers > Cc: Tom Zanussi > Cc: Namhyung Kim > Cc: Thomas Glexiner > Cc: Boqun Feng > Cc: Paul McKenney > Cc: Frederic Weisbecker > Cc: Randy Dunlap > Cc: Masami Hiramatsu > Cc: Fenguang Wu > Cc: Baohong Liu > Cc: Vedang Patel > Cc: kernel-t...@android.com > Signed-off-by: Joel Fernandes > --- > include/linux/tracepoint.h | 46 +++--- > kernel/tracepoint.c| 10 - > 2 files changed, 47 insertions(+), 9 deletions(-) > diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h > index c94f466d57ef..4135e08fb5f1 100644 > --- a/include/linux/tracepoint.h > +++ b/include/linux/tracepoint.h > @@ -15,6 +15,7 @@ >*/ > #include > +#include > #include > #include > #include > @@ -33,6 +34,8 @@ struct trace_eval_map { > #define TRACEPOINT_DEFAULT_PRIO10 > +extern struct srcu_struct tracepoint_srcu; > + > extern int > tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data); > extern int > @@ -77,6 +80,9 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb) >*/ > static inline void 
tracepoint_synchronize_unregister(void) > { > +#ifdef CONFIG_TRACEPOINTS > + synchronize_srcu(&tracepoint_srcu); > +#endif > synchronize_sched(); > } > @@ -129,18 +135,38 @@ extern void syscall_unregfunc(void); >* as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just >* "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto". >*/ > -#define __DO_TRACE(tp, proto, args, cond, rcucheck)\ > +#define __DO_TRACE(tp, proto, args, cond, rcuidle) \ > do {\ > struct tracepoint_func *it_func_ptr;\ > void *it_func; \ > void *__data; \ > + int __maybe_unused idx = 0; \ > \ > if (!(cond))\ > return; \ > - if (rcucheck) \ > - rcu_irq_enter_irqson(); \ > - rcu_read_lock_sched_notrace(); \ > - it_func_ptr = rcu_dereference_sched((tp)->funcs); \ > + \ > + /* \ > +* For rcuidle callers, use srcu since sched-rcu\ > +* doesn't work from the idle path. \ > +*/ \ > + if (rcuidle) { \ > + if (in_nmi()) { \ > + WARN_ON_ONCE(1);\ > + return; /* no srcu from nmi */ \ > + } \ > + \ > + /* To keep it consistent with !rcuidle path */ \ > + preempt_disable_notrace(); \ > + \ > + idx = srcu_read_lock_notrace(&tracepoint_srcu); \ > + it_func_ptr = srcu_dereference((tp)->funcs, \ > + &tracepoint_srcu); \ This last bit is supposed to be srcu_dereference_notrace. The hunk to use that is actually in patch 6/6, sorry about that. I've fixed it in my tree and it means patches
[PATCH] regulator: ltc3676: Assure PGOOD mask is set before changing voltage
Make sure the DVBxB bit 5, PGOOD mask, is set before changing voltage on the buck converters. If the PGOOD mask bit is not set, the PMIC may deassert the PGOOD signal during the voltage transition. On systems that use the PGOOD signal as a power OK indication for the board or SoC, which should be the case on correct designs, deasserting the PGOOD signal will lead to system reset or shutdown, which is not the expected behavior when changing PMIC buck converter voltage. Signed-off-by: Marek Vasut Cc: Javier Martinez Canillas Cc: Mark Brown --- NOTE: This was observed on an iMX6Q design during DVFS. When the cpufreq driver changed the frequency and scaled the voltage up, the system froze. This was because the PGOOD went down and the voltage rails lost power shortly after. --- drivers/regulator/ltc3676.c | 20 +++- 1 file changed, 19 insertions(+), 1 deletion(-) diff --git a/drivers/regulator/ltc3676.c b/drivers/regulator/ltc3676.c index 662ee05ea44d..9dec1609ff66 100644 --- a/drivers/regulator/ltc3676.c +++ b/drivers/regulator/ltc3676.c @@ -52,6 +52,7 @@ #define LTC3676_CLIRQ 0x1F #define LTC3676_DVBxA_REF_SELECT BIT(5) +#define LTC3676_DVBxB_PGOOD_MASK BIT(5) #define LTC3676_IRQSTAT_PGOOD_TIMEOUT BIT(3) #define LTC3676_IRQSTAT_UNDERVOLT_WARN BIT(4) @@ -123,6 +124,23 @@ static int ltc3676_set_suspend_mode(struct regulator_dev *rdev, mask, val); } +static int ltc3676_set_voltage_sel(struct regulator_dev *rdev, unsigned selector) +{ + struct ltc3676 *ltc3676 = rdev_get_drvdata(rdev); + struct device *dev = ltc3676->dev; + int ret, dcdc = rdev_get_id(rdev); + + dev_dbg(dev, "%s id=%d selector=%d\n", __func__, dcdc, selector); + + ret = regmap_update_bits(ltc3676->regmap, rdev->desc->vsel_reg + 1, +LTC3676_DVBxB_PGOOD_MASK, +LTC3676_DVBxB_PGOOD_MASK); + if (ret) + return ret; + + return regulator_set_voltage_sel_regmap(rdev, selector); +} + static inline unsigned int ltc3676_scale(unsigned int uV, u32 r1, u32 r2) { uint64_t tmp; @@ -166,7 +184,7 @@ static const struct 
regulator_ops ltc3676_linear_regulator_ops = { .disable = regulator_disable_regmap, .is_enabled = regulator_is_enabled_regmap, .list_voltage = regulator_list_voltage_linear, - .set_voltage_sel = regulator_set_voltage_sel_regmap, + .set_voltage_sel = ltc3676_set_voltage_sel, .get_voltage_sel = regulator_get_voltage_sel_regmap, .set_suspend_voltage = ltc3676_set_suspend_voltage, .set_suspend_mode = ltc3676_set_suspend_mode, -- 2.16.2
Re: [PATCH] z3fold: fix reclaim lock-ups
Hi Vitaly, On 04/30/2018 03:58 AM, Vitaly Wool wrote: Do not try to optimize in-page object layout while the page is under reclaim. This fixes lock-ups on reclaim and improves reclaim performance at the same time. A heads-up: z3fold is still crashing (due to a NULL pointer access) under heavy memory pressure with this patch applied. That doesn't mean the patch should not be applied - the new crash is different - but there is more work to do. See https://bugs.chromium.org/p/chromium/issues/detail?id=822360#c21 for a crash log. This was seen with chromeos-4.14 with (I hope) all relevant z3fold patches applied. I am trying to reproduce the problem on top of mainline. Guenter Reported-by: Guenter Roeck Signed-off-by: Vitaly Wool --- mm/z3fold.c | 42 ++ 1 file changed, 30 insertions(+), 12 deletions(-) diff --git a/mm/z3fold.c b/mm/z3fold.c index c0bca6153b95..901c0b07cbda 100644 --- a/mm/z3fold.c +++ b/mm/z3fold.c @@ -144,7 +144,8 @@ enum z3fold_page_flags { PAGE_HEADLESS = 0, MIDDLE_CHUNK_MAPPED, NEEDS_COMPACTING, - PAGE_STALE + PAGE_STALE, + UNDER_RECLAIM }; /* @@ -173,6 +174,7 @@ static struct z3fold_header *init_z3fold_page(struct page *page, clear_bit(MIDDLE_CHUNK_MAPPED, &page->private); clear_bit(NEEDS_COMPACTING, &page->private); clear_bit(PAGE_STALE, &page->private); + clear_bit(UNDER_RECLAIM, &page->private); spin_lock_init(&zhdr->page_lock); kref_init(&zhdr->refcount); @@ -756,6 +758,10 @@ static void z3fold_free(struct z3fold_pool *pool, unsigned long handle) atomic64_dec(&pool->pages_nr); return; } + if (test_bit(UNDER_RECLAIM, &page->private)) { + z3fold_page_unlock(zhdr); + return; + } if (test_and_set_bit(NEEDS_COMPACTING, &page->private)) { z3fold_page_unlock(zhdr); return; @@ -840,6 +846,8 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries) kref_get(&zhdr->refcount); list_del_init(&zhdr->buddy); zhdr->cpu = -1; + set_bit(UNDER_RECLAIM, &page->private); + break; } list_del_init(&page->lru); @@ -887,25 +895,35 @@ static int z3fold_reclaim_page(struct z3fold_pool *pool, unsigned int retries) goto 
next; } next: - spin_lock(&pool->lock); if (test_bit(PAGE_HEADLESS, &page->private)) { if (ret == 0) { - spin_unlock(&pool->lock); free_z3fold_page(page); return 0; } - } else if (kref_put(&zhdr->refcount, release_z3fold_page)) { - atomic64_dec(&pool->pages_nr); + spin_lock(&pool->lock); + list_add(&page->lru, &pool->lru); + spin_unlock(&pool->lock); + } else { + z3fold_page_lock(zhdr); + clear_bit(UNDER_RECLAIM, &page->private); + if (kref_put(&zhdr->refcount, + release_z3fold_page_locked)) { + atomic64_dec(&pool->pages_nr); + return 0; + } + /* +* if we are here, the page is still not completely +* free. Take the global pool lock then to be able +* to add it back to the lru list +*/ + spin_lock(&pool->lock); + list_add(&page->lru, &pool->lru); spin_unlock(&pool->lock); - return 0; + z3fold_page_unlock(zhdr); } - /* -* Add to the beginning of LRU. -* Pool lock has to be kept here to ensure the page has -* not already been released -*/ - list_add(&page->lru, &pool->lru); + /* We started off locked so we need to lock the pool back */ + spin_lock(&pool->lock); } spin_unlock(&pool->lock); return -EAGAIN;
[PATCH RFC v5 1/6] softirq: reorder trace_softirqs_on to prevent lockdep splat
I'm able to reproduce a lockdep splat when CONFIG_PROVE_LOCKING=y and CONFIG_PREEMPTIRQ_EVENTS=y. $ echo 1 > /d/tracing/events/preemptirq/preempt_enable/enable Cc: Steven Rostedt Cc: Peter Zilstra Cc: Ingo Molnar Cc: Mathieu Desnoyers Cc: Tom Zanussi Cc: Namhyung Kim Cc: Thomas Glexiner Cc: Boqun Feng Cc: Paul McKenney Cc: Frederic Weisbecker Cc: Randy Dunlap Cc: Masami Hiramatsu Cc: Fenguang Wu Cc: Baohong Liu Cc: Vedang Patel Cc: kernel-t...@android.com Signed-off-by: Joel Fernandes --- kernel/softirq.c | 6 +- 1 file changed, 5 insertions(+), 1 deletion(-) diff --git a/kernel/softirq.c b/kernel/softirq.c index 24d243ef8e71..47e2f61938c0 100644 --- a/kernel/softirq.c +++ b/kernel/softirq.c @@ -139,9 +139,13 @@ static void __local_bh_enable(unsigned int cnt) { lockdep_assert_irqs_disabled(); + if (preempt_count() == cnt) + trace_preempt_on(CALLER_ADDR0, get_lock_parent_ip()); + if (softirq_count() == (cnt & SOFTIRQ_MASK)) trace_softirqs_on(_RET_IP_); - preempt_count_sub(cnt); + + __preempt_count_sub(cnt); } /* -- 2.17.0.441.gb46fe60e1d-goog
[PATCH RFC v5 2/6] srcu: Add notrace variants of srcu_read_{lock,unlock}
From: "Paul E. McKenney" This is needed for a future tracepoint patch that uses srcu, and to make sure it doesn't call into lockdep. tracepoint code already calls notrace variants for rcu_read_lock_sched so this patch does the same for srcu which will be used in a later patch. Keeps it consistent with rcu-sched. [Joel: Added commit message] Cc: Steven Rostedt Cc: Peter Zilstra Cc: Ingo Molnar Cc: Mathieu Desnoyers Cc: Tom Zanussi Cc: Namhyung Kim Cc: Thomas Glexiner Cc: Boqun Feng Cc: Paul McKenney Cc: Frederic Weisbecker Cc: Randy Dunlap Cc: Masami Hiramatsu Cc: Fenguang Wu Cc: Baohong Liu Cc: Vedang Patel Cc: kernel-t...@android.com Signed-off-by: Paul McKenney Signed-off-by: Joel Fernandes --- include/linux/srcu.h | 17 + 1 file changed, 17 insertions(+) diff --git a/include/linux/srcu.h b/include/linux/srcu.h index 33c1c698df09..2ec618979b20 100644 --- a/include/linux/srcu.h +++ b/include/linux/srcu.h @@ -161,6 +161,16 @@ static inline int srcu_read_lock(struct srcu_struct *sp) __acquires(sp) return retval; } +/* Used by tracing, cannot be traced and cannot invoke lockdep. */ +static inline notrace int +srcu_read_lock_notrace(struct srcu_struct *sp) __acquires(sp) +{ + int retval; + + retval = __srcu_read_lock(sp); + return retval; +} + /** * srcu_read_unlock - unregister a old reader from an SRCU-protected structure. * @sp: srcu_struct in which to unregister the old reader. @@ -175,6 +185,13 @@ static inline void srcu_read_unlock(struct srcu_struct *sp, int idx) __srcu_read_unlock(sp, idx); } +/* Used by tracing, cannot be traced and cannot call lockdep. */ +static inline notrace void +srcu_read_unlock_notrace(struct srcu_struct *sp, int idx) __releases(sp) +{ + __srcu_read_unlock(sp, idx); +} + /** * smp_mb__after_srcu_read_unlock - ensure full ordering after srcu_read_unlock * -- 2.17.0.441.gb46fe60e1d-goog
[PATCH RFC v5 0/6] Centralize and unify usage of preempt/irq tracepoints
This is the next revision of preempt/irq tracepoint centralization and unified usage across the kernel [1]. The preempt/irq tracepoints exist, but not everything in the kernel uses them, so the different users cannot all work simultaneously (for example, only one of lockdep or the irqsoff events can be used at a time). This series is an attempt to solve that, and also results in a nice cleanup of the kernel in general. Several ifdefs are simpler, and the design is more unified and better. As a result, we also sped up all rcuidle tracepoints, since their handling is simpler. v5: - Fixed performance issues due to rcu-idle handling Joel Fernandes (5): softirq: reorder trace_softirqs_on to prevent lockdep splat srcu: Add notrace variant of srcu_dereference trace/irqsoff: Split reset into separate functions tracepoint: Make rcuidle tracepoint callers use SRCU tracing: Centralize preemptirq tracepoints and unify their usage Paul E. McKenney (1): srcu: Add notrace variants of srcu_read_{lock,unlock} include/linux/ftrace.h| 11 +- include/linux/irqflags.h | 11 +- include/linux/lockdep.h | 8 +- include/linux/preempt.h | 2 +- include/linux/srcu.h | 22 +++ include/linux/tracepoint.h| 47 +- include/trace/events/preemptirq.h | 23 +-- init/main.c | 5 +- kernel/locking/lockdep.c | 35 ++--- kernel/sched/core.c | 2 +- kernel/softirq.c | 6 +- kernel/trace/Kconfig | 22 ++- kernel/trace/Makefile | 2 +- kernel/trace/trace_irqsoff.c | 235 +- kernel/trace/trace_preemptirq.c | 71 + kernel/tracepoint.c | 10 +- 16 files changed, 283 insertions(+), 229 deletions(-) create mode 100644 kernel/trace/trace_preemptirq.c Cc: Steven Rostedt Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mathieu Desnoyers Cc: Tom Zanussi Cc: Namhyung Kim Cc: Thomas Gleixner Cc: Boqun Feng Cc: Paul McKenney Cc: Frederic Weisbecker Cc: Randy Dunlap Cc: Masami Hiramatsu Cc: Fengguang Wu Cc: Baohong Liu Cc: Vedang Patel Cc: kernel-t...@android.com Signed-off-by: Joel Fernandes -- 2.17.0.441.gb46fe60e1d-goog
[PATCH RFC v5 3/6] srcu: Add notrace variant of srcu_dereference
In this series, we are making lockdep use an rcuidle tracepoint. For this reason we need a notrace variant of srcu_dereference; otherwise we get lockdep splats, because the lockdep hooks may not have run yet. This patch adds the needed variant. Cc: Steven Rostedt Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mathieu Desnoyers Cc: Tom Zanussi Cc: Namhyung Kim Cc: Thomas Gleixner Cc: Boqun Feng Cc: Paul McKenney Cc: Frederic Weisbecker Cc: Randy Dunlap Cc: Masami Hiramatsu Cc: Fengguang Wu Cc: Baohong Liu Cc: Vedang Patel Cc: kernel-t...@android.com Signed-off-by: Joel Fernandes --- include/linux/srcu.h | 5 + 1 file changed, 5 insertions(+) diff --git a/include/linux/srcu.h b/include/linux/srcu.h index 2ec618979b20..a1c4947be877 100644 --- a/include/linux/srcu.h +++ b/include/linux/srcu.h @@ -135,6 +135,11 @@ static inline int srcu_read_lock_held(const struct srcu_struct *sp) */ #define srcu_dereference(p, sp) srcu_dereference_check((p), (sp), 0) +/** + * srcu_dereference_notrace - no tracing and no lockdep calls from here + */ +#define srcu_dereference_notrace(p, sp) srcu_dereference_check((p), (sp), 1) + /** * srcu_read_lock - register a new reader for an SRCU-protected structure. * @sp: srcu_struct in which to register the new reader. -- 2.17.0.441.gb46fe60e1d-goog
[PATCH RFC v5 5/6] tracepoint: Make rcuidle tracepoint callers use SRCU
In recent tests with IRQ on/off tracepoints, a large performance overhead of roughly 10% was noticed when running hackbench. This was root-caused to the calls to rcu_irq_enter_irqson and rcu_irq_exit_irqson from the tracepoint code. Following a long discussion on the list [1] about this, we concluded that srcu is a better alternative for use during rcu idle. Although it does involve extra barriers, it's lighter than the sched-rcu version, which has to do additional RCU calls to notify RCU idle about entry into RCU sections. In this patch, we change the underlying implementation of the trace_*_rcuidle API to use SRCU. This has been shown to improve performance a lot for the high-frequency irq enable/disable tracepoints. Test: Tested idle and preempt/irq tracepoints. [1] https://patchwork.kernel.org/patch/10344297/ Cc: Steven Rostedt Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mathieu Desnoyers Cc: Tom Zanussi Cc: Namhyung Kim Cc: Thomas Gleixner Cc: Boqun Feng Cc: Paul McKenney Cc: Frederic Weisbecker Cc: Randy Dunlap Cc: Masami Hiramatsu Cc: Fengguang Wu Cc: Baohong Liu Cc: Vedang Patel Cc: kernel-t...@android.com Signed-off-by: Joel Fernandes --- include/linux/tracepoint.h | 46 +++--- kernel/tracepoint.c| 10 - 2 files changed, 47 insertions(+), 9 deletions(-) diff --git a/include/linux/tracepoint.h b/include/linux/tracepoint.h index c94f466d57ef..4135e08fb5f1 100644 --- a/include/linux/tracepoint.h +++ b/include/linux/tracepoint.h @@ -15,6 +15,7 @@ */ #include +#include <linux/srcu.h> #include #include #include @@ -33,6 +34,8 @@ struct trace_eval_map { #define TRACEPOINT_DEFAULT_PRIO 10 +extern struct srcu_struct tracepoint_srcu; + extern int tracepoint_probe_register(struct tracepoint *tp, void *probe, void *data); extern int @@ -77,6 +80,9 @@ int unregister_tracepoint_module_notifier(struct notifier_block *nb) */ static inline void tracepoint_synchronize_unregister(void) { +#ifdef CONFIG_TRACEPOINTS + synchronize_srcu(&tracepoint_srcu); +#endif synchronize_sched(); } @@ -129,18 +135,38 @@ extern void
syscall_unregfunc(void); * as "(void *, void)". The DECLARE_TRACE_NOARGS() will pass in just * "void *data", where as the DECLARE_TRACE() will pass in "void *data, proto". */ -#define __DO_TRACE(tp, proto, args, cond, rcucheck)\ +#define __DO_TRACE(tp, proto, args, cond, rcuidle) \ do {\ struct tracepoint_func *it_func_ptr;\ void *it_func; \ void *__data; \ + int __maybe_unused idx = 0; \ \ if (!(cond))\ return; \ - if (rcucheck) \ - rcu_irq_enter_irqson(); \ - rcu_read_lock_sched_notrace(); \ - it_func_ptr = rcu_dereference_sched((tp)->funcs); \ + \ + /* \ +* For rcuidle callers, use srcu since sched-rcu\ +* doesn't work from the idle path. \ +*/ \ + if (rcuidle) { \ + if (in_nmi()) { \ + WARN_ON_ONCE(1);\ + return; /* no srcu from nmi */ \ + } \ + \ + /* To keep it consistent with !rcuidle path */ \ + preempt_disable_notrace(); \ + \ + idx = srcu_read_lock_notrace(&tracepoint_srcu); \ + it_func_ptr = srcu_dereference((tp)->funcs, \ + &tracepoint_srcu); \ + } else {\ + rcu_read_lock_sched_notrace(); \ + it_func_ptr = \ + rcu_dereference_sched((tp)->funcs); \ + } \ +
[PATCH RFC v5 4/6] trace/irqsoff: Split reset into separate functions
Split reset functions into separate functions in preparation for future patches that need to do tracer-specific reset. Cc: Steven Rostedt Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mathieu Desnoyers Cc: Tom Zanussi Cc: Namhyung Kim Cc: Thomas Gleixner Cc: Boqun Feng Cc: Paul McKenney Cc: Frederic Weisbecker Cc: Randy Dunlap Cc: Masami Hiramatsu Cc: Fengguang Wu Cc: Baohong Liu Cc: Vedang Patel Cc: kernel-t...@android.com Signed-off-by: Joel Fernandes --- kernel/trace/trace_irqsoff.c | 22 +++--- 1 file changed, 19 insertions(+), 3 deletions(-) diff --git a/kernel/trace/trace_irqsoff.c b/kernel/trace/trace_irqsoff.c index 03ecb4465ee4..f8daa754cce2 100644 --- a/kernel/trace/trace_irqsoff.c +++ b/kernel/trace/trace_irqsoff.c @@ -634,7 +634,7 @@ static int __irqsoff_tracer_init(struct trace_array *tr) return 0; } -static void irqsoff_tracer_reset(struct trace_array *tr) +static void __irqsoff_tracer_reset(struct trace_array *tr) { int lat_flag = save_flags & TRACE_ITER_LATENCY_FMT; int overwrite_flag = save_flags & TRACE_ITER_OVERWRITE; @@ -665,6 +665,12 @@ static int irqsoff_tracer_init(struct trace_array *tr) return __irqsoff_tracer_init(tr); } + +static void irqsoff_tracer_reset(struct trace_array *tr) +{ + __irqsoff_tracer_reset(tr); +} + static struct tracer irqsoff_tracer __read_mostly = { .name = "irqsoff", @@ -697,11 +703,16 @@ static int preemptoff_tracer_init(struct trace_array *tr) return __irqsoff_tracer_init(tr); } +static void preemptoff_tracer_reset(struct trace_array *tr) +{ + __irqsoff_tracer_reset(tr); +} + static struct tracer preemptoff_tracer __read_mostly = { .name = "preemptoff", .init = preemptoff_tracer_init, - .reset = irqsoff_tracer_reset, + .reset = preemptoff_tracer_reset, .start = irqsoff_tracer_start, .stop = irqsoff_tracer_stop, .print_max = true, @@ -731,11 +742,16 @@ static int preemptirqsoff_tracer_init(struct trace_array *tr) return __irqsoff_tracer_init(tr); } +static void preemptirqsoff_tracer_reset(struct trace_array *tr) +{ +
__irqsoff_tracer_reset(tr); +} + static struct tracer preemptirqsoff_tracer __read_mostly = { .name = "preemptirqsoff", .init = preemptirqsoff_tracer_init, - .reset = irqsoff_tracer_reset, + .reset = preemptirqsoff_tracer_reset, .start = irqsoff_tracer_start, .stop = irqsoff_tracer_stop, .print_max = true, -- 2.17.0.441.gb46fe60e1d-goog
[PATCH RFC v5 6/6] tracing: Centralize preemptirq tracepoints and unify their usage
This patch detaches the preemptirq tracepoints from the tracers and keeps them separate. Advantages: * Lockdep and the irqsoff events can now run in parallel, since they no longer have their own calls. * This unifies the use case of adding hooks to an irqsoff and irqson event, and a preemptoff and preempton event. Three users of the events exist: - Lockdep - irqsoff and preemptoff tracers - irqs and preempt trace events The unification cleans up several ifdefs and makes the code in the preempt tracer and irqsoff tracers simpler. It gets rid of all the horrific ifdeferry around PROVE_LOCKING and makes configuration of the different users of the tracepoints easier to understand. It also gets rid of the time_* function calls from the lockdep hooks used to call into the preemptirq tracer, which are not needed anymore. The negative delta in lines of code in this patch is quite large too. In the patch we introduce a new CONFIG option, PREEMPTIRQ_TRACEPOINTS, as a single point for registering probes onto the tracepoints. With this, the web of config options for preempt/irq toggle tracepoints and its users becomes:

       PREEMPT_TRACER   PREEMPTIRQ_EVENTS  IRQSOFF_TRACER  PROVE_LOCKING
            |                 |     \          |               |
             \   (selects)   /       \          \   (selects) /
              TRACE_PREEMPT_TOGGLE    ----->   TRACE_IRQFLAGS
                            \                  /
                             \  (depends on)  /
                          PREEMPTIRQ_TRACEPOINTS

One note: I have to check for lockdep recursion in the code that calls the trace events API and bail out if we're in lockdep recursion protection, to prevent something like the following case: a spin_lock is taken. Then lockdep_acquired is called. That does a raw_local_irq_save and then sets lockdep_recursion, and then calls __lockdep_acquired. In this function, a call to get_lock_stats happens, which calls preempt_disable, which calls trace IRQS off somewhere, which enters my tracepoint code and sets the tracing_irq_cpu flag to prevent recursion. This flag is then never cleared, causing lockdep paths to never be entered and thus causing splats and other bad things.
Cc: Steven Rostedt Cc: Peter Zijlstra Cc: Ingo Molnar Cc: Mathieu Desnoyers Cc: Tom Zanussi Cc: Namhyung Kim Cc: Thomas Gleixner Cc: Boqun Feng Cc: Paul McKenney Cc: Frederic Weisbecker Cc: Randy Dunlap Cc: Masami Hiramatsu Cc: Fengguang Wu Cc: Baohong Liu Cc: Vedang Patel Cc: kernel-t...@android.com Signed-off-by: Joel Fernandes --- include/linux/ftrace.h| 11 +- include/linux/irqflags.h | 11 +- include/linux/lockdep.h | 8 +- include/linux/preempt.h | 2 +- include/linux/tracepoint.h| 3 +- include/trace/events/preemptirq.h | 23 ++-- init/main.c | 5 +- kernel/locking/lockdep.c | 35 ++--- kernel/sched/core.c | 2 +- kernel/trace/Kconfig | 22 ++- kernel/trace/Makefile | 2 +- kernel/trace/trace_irqsoff.c | 213 -- kernel/trace/trace_preemptirq.c | 71 ++ 13 files changed, 191 insertions(+), 217 deletions(-) create mode 100644 kernel/trace/trace_preemptirq.c diff --git a/include/linux/ftrace.h b/include/linux/ftrace.h index 9c3c9a319e48..5191030af0c0 100644 --- a/include/linux/ftrace.h +++ b/include/linux/ftrace.h @@ -709,16 +709,7 @@ static inline unsigned long get_lock_parent_ip(void) return CALLER_ADDR2; } -#ifdef CONFIG_IRQSOFF_TRACER - extern void time_hardirqs_on(unsigned long a0, unsigned long a1); - extern void time_hardirqs_off(unsigned long a0, unsigned long a1); -#else - static inline void time_hardirqs_on(unsigned long a0, unsigned long a1) { } - static inline void time_hardirqs_off(unsigned long a0, unsigned long a1) { } -#endif - -#if defined(CONFIG_PREEMPT_TRACER) || \ - (defined(CONFIG_DEBUG_PREEMPT) && defined(CONFIG_PREEMPTIRQ_EVENTS)) +#ifdef CONFIG_TRACE_PREEMPT_TOGGLE extern void trace_preempt_on(unsigned long a0, unsigned long a1); extern void trace_preempt_off(unsigned long a0, unsigned long a1); #else diff --git a/include/linux/irqflags.h b/include/linux/irqflags.h index 9700f00bbc04..50edb9cbbd26 100644 --- a/include/linux/irqflags.h +++ b/include/linux/irqflags.h @@ -15,9 +15,16 @@ #include #include -#ifdef CONFIG_TRACE_IRQFLAGS +/* Currently
trace_softirqs_on/off is used only by lockdep */ +#ifdef CONFIG_PROVE_LOCKING extern void trace_softirqs_on(unsigned long ip); extern void trace_softirqs_off(unsigned long ip); +#else +# define trace_softirqs_on(ip) do { } while (0) +# define trace_softirqs_off(ip) do { } while (0) +#endif + +#ifdef CONFIG_TRACE_IRQFLAGS extern void trace_hardirqs_on(void); extern void trace_hardirqs_off(void); # define trace_hardirq_context(p) ((p)->hardirq_context) @@ -43,8 +50,6 @@ do { \ #else # define trace_hardirqs_on() do { } while (0)
Re: [RFC v4 3/4] irqflags: Avoid unnecessary calls to trace_ if you can
On Wed, Apr 18, 2018 at 2:02 AM Masami Hiramatsu wrote: > On Mon, 16 Apr 2018 21:07:47 -0700 > Joel Fernandes wrote: > > With TRACE_IRQFLAGS, we call trace_ API too many times. We don't need > > to if local_irq_restore or local_irq_save didn't actually do anything. > > > > This gives around a 4% improvement in performance when doing the > > following command: "time find / > /dev/null" > > > > Also its best to avoid these calls where possible, since in this series, > > the RCU code in tracepoint.h seems to be call these quite a bit and I'd > > like to keep this overhead low. > Can we assume that the "flags" has only 1 bit irq-disable flag? > Since it skips calling raw_local_irq_restore(flags); too, > if there is any state in the flags on any arch, it may change the > result. In that case, we can do it as below (just skipping trace_hardirqs_*) > int disabled = irqs_disabled(); > if (!raw_irqs_disabled_flags(flags) && disabled) > trace_hardirqs_on(); > raw_local_irq_restore(flags); > if (raw_irqs_disabled_flags(flags) && !disabled) > trace_hardirqs_off(); With changes to the tracepoint implementation which uses srcu instead of rcu, I'm not able to see a performance improvement with the above patch. For this reason, I will drop this patch from next series. thanks, - Joel