[PATCH] i2c: stm32f7: Make structure stm32f7_i2c_algo constant
Static structure stm32f7_i2c_algo, of type i2c_algorithm, is used only
when it is assigned to constant field algo of a variable having type
i2c_adapter. As stm32f7_i2c_algo is therefore never modified, make it
const as well to protect it from unintended modification.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta
---
 drivers/i2c/busses/i2c-stm32f7.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/i2c/busses/i2c-stm32f7.c b/drivers/i2c/busses/i2c-stm32f7.c
index 266d1c269b83..d36cf08461f7 100644
--- a/drivers/i2c/busses/i2c-stm32f7.c
+++ b/drivers/i2c/busses/i2c-stm32f7.c
@@ -1809,7 +1809,7 @@ static u32 stm32f7_i2c_func(struct i2c_adapter *adap)
 		I2C_FUNC_SMBUS_I2C_BLOCK;
 }
 
-static struct i2c_algorithm stm32f7_i2c_algo = {
+static const struct i2c_algorithm stm32f7_i2c_algo = {
 	.master_xfer = stm32f7_i2c_xfer,
 	.smbus_xfer = stm32f7_i2c_smbus_xfer,
 	.functionality = stm32f7_i2c_func,
-- 
2.19.1
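A minimal userspace sketch of the pattern this patch relies on: an operations table declared `static const` can be stored in a pointer-to-const field (as `i2c_adapter::algo` is described above) without any cast, and every call still dispatches through it. The `demo_*` types and functions are hypothetical stand-ins, not the kernel's `struct i2c_algorithm` or `struct i2c_adapter`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for struct i2c_algorithm. */
struct demo_algorithm {
	int (*master_xfer)(int nmsgs);
	unsigned int (*functionality)(void);
};

/* Hypothetical stand-in for struct i2c_adapter: note the pointer-to-const. */
struct demo_adapter {
	const struct demo_algorithm *algo;
};

static int demo_xfer(int nmsgs) { return nmsgs; }
static unsigned int demo_func(void) { return 0x1u; }

/* const lets the compiler place the table in read-only data. */
static const struct demo_algorithm demo_algo = {
	.master_xfer = demo_xfer,
	.functionality = demo_func,
};

static void demo_adapter_init(struct demo_adapter *adap)
{
	/* &demo_algo is a const pointer; the field accepts it as-is. */
	adap->algo = &demo_algo;
}

static int demo_transfer(const struct demo_adapter *adap, int nmsgs)
{
	return adap->algo->master_xfer(nmsgs);
}
```

Any statement such as `demo_algo.master_xfer = NULL;` would now be rejected at compile time, which is the "protection from unintended modification" the commit message refers to.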
Re: [RFC PATCH 1/5] x86: tsc: add tsc to art helpers
Hi,

Thomas Gleixner writes:
> Felipe,
>
> On Tue, 16 Jul 2019, Felipe Balbi wrote:
>
> -ENOCHANGELOG
>
> As you said in the cover letter:
>
>> (3) The change in arch/x86/kernel/tsc.c needs to be reviewed at length
>> before going in.
>
> So some information what those interfaces are used for and why they are
> needed would be really helpful.

Okay, I have some more details about this. The TGPIO device itself uses
ART since TSC is not directly available to anything other than the
CPU. The 'problem' here is that reading ART incurs extra latency which
we would like to avoid. Therefore, we use TSC and scale it to
nanoseconds, which would be the same as ART to ns.

>> +void get_tsc_ns(struct system_counterval_t *tsc_counterval, u64 *tsc_ns)
>> +{
>> +	u64 tmp, res, rem;
>> +	u64 cycles;
>> +
>> +	tsc_counterval->cycles = clocksource_tsc.read(NULL);
>> +	cycles = tsc_counterval->cycles;
>> +	tsc_counterval->cs = art_related_clocksource;
>> +
>> +	rem = do_div(cycles, tsc_khz);
>> +
>> +	res = cycles * USEC_PER_SEC;
>> +	tmp = rem * USEC_PER_SEC;
>> +
>> +	do_div(tmp, tsc_khz);
>> +	res += tmp;
>> +
>> +	*tsc_ns = res;
>> +}
>> +EXPORT_SYMBOL(get_tsc_ns);
>> +
>> +u64 get_art_ns_now(void)
>> +{
>> +	struct system_counterval_t tsc_cycles;
>> +	u64 tsc_ns;
>> +
>> +	get_tsc_ns(&tsc_cycles, &tsc_ns);
>> +
>> +	return tsc_ns;
>> +}
>> +EXPORT_SYMBOL(get_art_ns_now);
>
> While the changes look innocuous I'm missing the big picture why this needs
> to emulate ART instead of simply using TSC directly.

I don't think we're emulating ART here (other than the name in the
function). We're just reading TSC and converting to nanoseconds, right?

Cheers
-- 
balbi
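The conversion in get_tsc_ns() divides before it multiplies. In the kernel, do_div(n, base) divides the 64-bit n in place and returns the remainder, so `cycles` becomes whole milliseconds (cycles / tsc_khz) and `rem` holds the leftover cycles; each part is then scaled by USEC_PER_SEC (10^6), because milliseconds times 10^6 gives nanoseconds. A userspace sketch of the same arithmetic, with plain `/` and `%` standing in for do_div() and a hypothetical function name:

```c
#include <assert.h>
#include <stdint.h>

#define DEMO_USEC_PER_SEC 1000000ULL

/*
 * cycles / tsc_khz is a count of milliseconds; ms * 10^6 = ns.
 * Scaling quotient and remainder separately avoids forming
 * cycles * 10^6, which would overflow 64 bits far sooner.
 */
static uint64_t demo_cycles_to_ns(uint64_t cycles, uint64_t tsc_khz)
{
	uint64_t quot = cycles / tsc_khz;	/* whole milliseconds */
	uint64_t rem = cycles % tsc_khz;	/* leftover cycles, < tsc_khz */
	uint64_t res;

	res = quot * DEMO_USEC_PER_SEC;			/* ms -> ns */
	res += (rem * DEMO_USEC_PER_SEC) / tsc_khz;	/* fraction of a ms -> ns */
	return res;
}
```

Multiplying first would overflow once `cycles` exceeds roughly 1.8 * 10^13, i.e. in under two hours at 3 GHz; the split form only overflows when the nanosecond result itself no longer fits in 64 bits.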
[PATCH] Bluetooth: 6lowpan: Make variable header_ops constant
Static variable header_ops, of type header_ops, is used only once, when
it is assigned to field header_ops of a variable having type net_device.
This corresponding field is declared as const in the definition of
net_device. Hence make header_ops constant as well to protect it from
unnecessary modification.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta
---
 net/bluetooth/6lowpan.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/net/bluetooth/6lowpan.c b/net/bluetooth/6lowpan.c
index 9d41de1ec90f..bb55d92691b0 100644
--- a/net/bluetooth/6lowpan.c
+++ b/net/bluetooth/6lowpan.c
@@ -583,7 +583,7 @@ static const struct net_device_ops netdev_ops = {
 	.ndo_start_xmit = bt_xmit,
 };
 
-static struct header_ops header_ops = {
+static const struct header_ops header_ops = {
 	.create = header_create,
 };
 
-- 
2.19.1
[PATCH] Bluetooth: hci_qca: Make structure qca_proto constant
Static structure qca_proto, of type hci_uart_proto, is used four times:
as the last argument in function hci_uart_register_device(), and as the
only argument to functions hci_uart_register_proto() and
hci_uart_unregister_proto(). In all three of these functions, the
parameter corresponding to qca_proto is declared as constant. Therefore,
make qca_proto itself constant as well in order to protect it from
unintended modification.
Issue found with Coccinelle.

Signed-off-by: Nishka Dasgupta
---
 drivers/bluetooth/hci_qca.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bluetooth/hci_qca.c b/drivers/bluetooth/hci_qca.c
index 82a0a3691a63..80923fc9418f 100644
--- a/drivers/bluetooth/hci_qca.c
+++ b/drivers/bluetooth/hci_qca.c
@@ -1326,7 +1326,7 @@ static int qca_setup(struct hci_uart *hu)
 	return ret;
 }
 
-static struct hci_uart_proto qca_proto = {
+static const struct hci_uart_proto qca_proto = {
 	.id		= HCI_UART_QCA,
 	.name		= "QCA",
 	.manufacturer	= 29,
-- 
2.19.1
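The argument above is that every consumer of `qca_proto` receives it through a `const` parameter. A hypothetical registry sketch shows why that is sufficient: the register/unregister functions and the table they fill only ever hold pointers-to-const, so a `static const` protocol object passes through them unchanged. The `demo_*` names are invented for illustration and are not the hci_uart API.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

struct demo_proto {
	int id;
	const char *name;
};

#define DEMO_MAX_PROTOS 4

/* The registry stores pointers-to-const and never writes through them. */
static const struct demo_proto *demo_protos[DEMO_MAX_PROTOS];

static int demo_register_proto(const struct demo_proto *p)
{
	if (p->id < 0 || p->id >= DEMO_MAX_PROTOS)
		return -EINVAL;
	if (demo_protos[p->id])
		return -EEXIST;		/* slot already taken */
	demo_protos[p->id] = p;
	return 0;
}

static int demo_unregister_proto(const struct demo_proto *p)
{
	if (p->id < 0 || p->id >= DEMO_MAX_PROTOS || demo_protos[p->id] != p)
		return -EINVAL;
	demo_protos[p->id] = NULL;
	return 0;
}

/* Like qca_proto after the patch: read-only for its whole lifetime. */
static const struct demo_proto demo_qca = { .id = 2, .name = "QCA" };
```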
Re: [PATCH] lsilogic mpt fusion: mptctl: Fixed race condition around mptctl_id variable using mutexes
Hi Mark,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on linus/master]
[cannot apply to v5.3-rc4 next-20190814]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:    https://github.com/0day-ci/linux/commits/Mark-Balantzyan/lsilogic-mpt-fusion-mptctl-Fixed-race-condition-around-mptctl_id-variable-using-mutexes/20190815-115822
config: x86_64-lkp (attached as .config)
compiler: gcc-7 (Debian 7.4.0-10) 7.4.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=x86_64

If you fix the issue, kindly add following tag
Reported-by: kbuild test robot

All warnings (new ones prefixed by >>):

   drivers/message/fusion/mptctl.c: In function 'mptctl_do_fw_download':
>> drivers/message/fusion/mptctl.c:820:3: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
     if ((mf = mpt_get_msg_frame(mptctl_id, iocp)) == NULL)
     ^~
   drivers/message/fusion/mptctl.c:822:4: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
      return -EAGAIN;
      ^~
   drivers/message/fusion/mptctl.c: In function 'mptctl_do_mpt_command':
   drivers/message/fusion/mptctl.c:1898:9: warning: this 'if' clause does not guard... [-Wmisleading-indentation]
           if ((mf = mpt_get_msg_frame(mptctl_id, ioc)) == NULL)
           ^~
   drivers/message/fusion/mptctl.c:1900:17: note: ...this statement, but the latter is misleadingly indented as if it were guarded by the 'if'
                   return -EAGAIN;
                   ^~

vim +/if +820 drivers/message/fusion/mptctl.c

^1da177e4c3f41 Linus Torvalds 2005-04-16  771  
^1da177e4c3f41 Linus Torvalds 2005-04-16  772  /*=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=-=*/
^1da177e4c3f41 Linus Torvalds 2005-04-16  773  /*
^1da177e4c3f41 Linus Torvalds 2005-04-16  774   * FW Download engine.
^1da177e4c3f41 Linus Torvalds 2005-04-16  775   * Outputs:None.
^1da177e4c3f41 Linus Torvalds 2005-04-16  776   * Return: 0 if successful
^1da177e4c3f41 Linus Torvalds 2005-04-16  777   *		-EFAULT if data unavailable
^1da177e4c3f41 Linus Torvalds 2005-04-16  778   *		-ENXIO if no such device
^1da177e4c3f41 Linus Torvalds 2005-04-16  779   *		-EAGAIN if resource problem
^1da177e4c3f41 Linus Torvalds 2005-04-16  780   *		-ENOMEM if no memory for SGE
^1da177e4c3f41 Linus Torvalds 2005-04-16  781   *		-EMLINK if too many chain buffers required
^1da177e4c3f41 Linus Torvalds 2005-04-16  782   *		-EBADRQC if adapter does not support FW download
^1da177e4c3f41 Linus Torvalds 2005-04-16  783   *		-EBUSY if adapter is busy
^1da177e4c3f41 Linus Torvalds 2005-04-16  784   *		-ENOMSG if FW upload returned bad status
^1da177e4c3f41 Linus Torvalds 2005-04-16  785   */
^1da177e4c3f41 Linus Torvalds 2005-04-16  786  static int
^1da177e4c3f41 Linus Torvalds 2005-04-16  787  mptctl_do_fw_download(int ioc, char __user *ufwbuf, size_t fwlen)
^1da177e4c3f41 Linus Torvalds 2005-04-16  788  {
^1da177e4c3f41 Linus Torvalds 2005-04-16  789  	FWDownload_t		*dlmsg;
^1da177e4c3f41 Linus Torvalds 2005-04-16  790  	MPT_FRAME_HDR		*mf;
^1da177e4c3f41 Linus Torvalds 2005-04-16  791  	MPT_ADAPTER		*iocp;
^1da177e4c3f41 Linus Torvalds 2005-04-16  792  	FWDownloadTCSGE_t	*ptsge;
^1da177e4c3f41 Linus Torvalds 2005-04-16  793  	MptSge_t		*sgl, *sgIn;
^1da177e4c3f41 Linus Torvalds 2005-04-16  794  	char			*sgOut;
^1da177e4c3f41 Linus Torvalds 2005-04-16  795  	struct buflist		*buflist;
^1da177e4c3f41 Linus Torvalds 2005-04-16  796  	struct buflist		*bl;
^1da177e4c3f41 Linus Torvalds 2005-04-16  797  	dma_addr_t		 sgl_dma;
^1da177e4c3f41 Linus Torvalds 2005-04-16  798  	int			 ret;
^1da177e4c3f41 Linus Torvalds 2005-04-16  799  	int			 numfrags = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16  800  	int			 maxfrags;
^1da177e4c3f41 Linus Torvalds 2005-04-16  801  	int			 n = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16  802  	u32			 sgdir;
^1da177e4c3f41 Linus Torvalds 2005-04-16  803  	u32			 nib;
^1da177e4c3f41 Linus Torvalds 2005-04-16  804  	int			 fw_bytes_copied = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16  805  	int			 i;
^1da177e4c3f41 Linus Torvalds 2005-04-16  806  	int			 sge_offset = 0;
^1da177e4c3f41 Linus Torvalds 2005-04-16  807  	u16			 iocstat;
^1da177e4c3f41 Linus Torvalds 2005-04-16  808  
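The warning means the `if` on line 820 controls only the statement that follows it, while `return -EAGAIN;` on line 822 is indented as though it were controlled too; without braces that return runs unconditionally. The report does not show line 821, so the sketch below assumes, for illustration only, that the patch added a mutex unlock on this failure path. It uses POSIX threads and a hypothetical `demo_get_msg_frame()` in place of `mpt_get_msg_frame()`; the point is the braces, which make both statements conditional.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t demo_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for mpt_get_msg_frame(): NULL means no free frame. */
static void *demo_get_msg_frame(int available)
{
	static int frame;

	return available ? &frame : NULL;
}

static int demo_send(int available)
{
	void *mf;

	pthread_mutex_lock(&demo_lock);
	mf = demo_get_msg_frame(available);
	if (mf == NULL) {
		/* Both statements belong to the failure path. */
		pthread_mutex_unlock(&demo_lock);
		return -EAGAIN;
	}
	/* ... build and post the request using mf ... */
	pthread_mutex_unlock(&demo_lock);
	return 0;
}
```

The same shape applies to the second warning at lines 1898 and 1900.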
[PATCH v4] bus: ti-sysc: Change return types of functions
Change return type of functions sysc_check_one_child() and
sysc_check_children() from int to void as neither ever returns an error.
Modify call sites of both functions accordingly.

Signed-off-by: Nishka Dasgupta
---
Changes in v4:
- Merge into a single patch the two patches for sysc_check_one_child()
  and sysc_check_children().
Changes in v3:
- Add patch for sysc_check_children().
- Remove return statement in sysc_check_one_child().
- Remove braces at call site.
Changes in v2:
- Remove error variable entirely.
- Change return type of sysc_check_one_child().

 drivers/bus/ti-sysc.c | 22 ++
 1 file changed, 6 insertions(+), 16 deletions(-)

diff --git a/drivers/bus/ti-sysc.c b/drivers/bus/ti-sysc.c
index e6deabd8305d..a2eae8f36ef8 100644
--- a/drivers/bus/ti-sysc.c
+++ b/drivers/bus/ti-sysc.c
@@ -615,8 +615,8 @@ static void sysc_check_quirk_stdout(struct sysc *ddata,
  * node but children have "ti,hwmods". These belong to the interconnect
  * target node and are managed by this driver.
  */
-static int sysc_check_one_child(struct sysc *ddata,
-				struct device_node *np)
+static void sysc_check_one_child(struct sysc *ddata,
+				 struct device_node *np)
 {
 	const char *name;
 
@@ -626,22 +626,14 @@ static int sysc_check_one_child(struct sysc *ddata,
 
 	sysc_check_quirk_stdout(ddata, np);
 	sysc_parse_dts_quirks(ddata, np, true);
-
-	return 0;
 }
 
-static int sysc_check_children(struct sysc *ddata)
+static void sysc_check_children(struct sysc *ddata)
 {
 	struct device_node *child;
-	int error;
-
-	for_each_child_of_node(ddata->dev->of_node, child) {
-		error = sysc_check_one_child(ddata, child);
-		if (error)
-			return error;
-	}
 
-	return 0;
+	for_each_child_of_node(ddata->dev->of_node, child)
+		sysc_check_one_child(ddata, child);
 }
 
 /*
@@ -794,9 +786,7 @@ static int sysc_map_and_check_registers(struct sysc *ddata)
 	if (error)
 		return error;
 
-	error = sysc_check_children(ddata);
-	if (error)
-		return error;
+	sysc_check_children(ddata);
 
 	error = sysc_parse_registers(ddata);
 	if (error)
-- 
2.19.1
[PATCH v2 2/2] iommu/arm-smmu-v3: add nr_ats_masters for quickly check
When (smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS) is true, even if
an smmu domain does not contain any ATS master, the operations of
arm_smmu_atc_inv_to_cmd() and the lock protection in
arm_smmu_atc_inv_domain() are always executed. This impacts performance,
especially in multi-core and stress scenarios. In my FIO test scenario,
performance dropped by about 8%.

In fact, we can use a struct member to record how many ATS masters the
smmu domain contains, and check that count instead of traversing the
list and checking every master one by one under the lock.

Fixes: 9ce27afc0830 ("iommu/arm-smmu-v3: Add support for PCI ATS")
Signed-off-by: Zhen Lei
---
 drivers/iommu/arm-smmu-v3.c | 14 +-
 1 file changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/arm-smmu-v3.c b/drivers/iommu/arm-smmu-v3.c
index 29056d9bb12aa01..154334d3310c9b8 100644
--- a/drivers/iommu/arm-smmu-v3.c
+++ b/drivers/iommu/arm-smmu-v3.c
@@ -631,6 +631,7 @@ struct arm_smmu_domain {
 
 	struct io_pgtable_ops		*pgtbl_ops;
 	bool				non_strict;
+	int				nr_ats_masters;
 
 	enum arm_smmu_domain_stage	stage;
 	union {
@@ -1531,7 +1532,16 @@ static int arm_smmu_atc_inv_domain(struct arm_smmu_domain *smmu_domain,
 	struct arm_smmu_cmdq_ent cmd;
 	struct arm_smmu_master *master;
 
-	if (!(smmu_domain->smmu->features & ARM_SMMU_FEAT_ATS))
+	/*
+	 * The protectiom of spinlock(&smmu_domain->devices_lock) is omitted.
+	 * Because for a given master, its map/unmap operations should only be
+	 * happened after it has been attached and before it has been detached.
+	 * So that, if at least one master need to be atc invalidated, the
+	 * value of smmu_domain->nr_ats_masters can not be zero.
+	 *
+	 * This can alleviate performance loss in multi-core scenarios.
+	 */
+	if (!smmu_domain->nr_ats_masters)
 		return 0;
 
 	arm_smmu_atc_inv_to_cmd(ssid, iova, size, &cmd);
@@ -1913,6 +1923,7 @@ static void arm_smmu_detach_dev(struct arm_smmu_master *master)
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_del(&master->domain_head);
+	smmu_domain->nr_ats_masters--;
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 
 	master->domain = NULL;
@@ -1968,6 +1979,7 @@ static int arm_smmu_attach_dev(struct iommu_domain *domain, struct device *dev)
 
 	spin_lock_irqsave(&smmu_domain->devices_lock, flags);
 	list_add(&master->domain_head, &smmu_domain->devices);
+	smmu_domain->nr_ats_masters++;
 	spin_unlock_irqrestore(&smmu_domain->devices_lock, flags);
 out_unlock:
 	mutex_unlock(&smmu_domain->init_mutex);
-- 
1.8.3
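The patch's reasoning (map/unmap only happen between attach and detach, so the count is non-zero whenever an invalidation is actually needed) can be modelled in userspace C. These are assumptions of the sketch, not of the patch: a POSIX mutex stands in for `devices_lock`; the counter is a C11 `atomic_int` so that the lockless read is well defined in portable C (the diff uses a plain `int`); and only masters flagged as ATS-enabled are counted, as the field name suggests. All `demo_*` names are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

struct demo_master {
	bool ats_enabled;
	struct demo_master *next;
};

struct demo_domain {
	pthread_mutex_t devices_lock;	/* protects the devices list */
	struct demo_master *devices;
	atomic_int nr_ats_masters;	/* read without the lock on the fast path */
	int invalidations;		/* counts slow-path ATC invalidations */
};

static void demo_domain_init(struct demo_domain *d)
{
	pthread_mutex_init(&d->devices_lock, NULL);
	d->devices = NULL;
	atomic_init(&d->nr_ats_masters, 0);
	d->invalidations = 0;
}

static void demo_attach(struct demo_domain *d, struct demo_master *m)
{
	pthread_mutex_lock(&d->devices_lock);
	m->next = d->devices;
	d->devices = m;
	if (m->ats_enabled)
		atomic_fetch_add(&d->nr_ats_masters, 1);
	pthread_mutex_unlock(&d->devices_lock);
}

static void demo_detach(struct demo_domain *d, struct demo_master *m)
{
	struct demo_master **pp;

	pthread_mutex_lock(&d->devices_lock);
	for (pp = &d->devices; *pp; pp = &(*pp)->next) {
		if (*pp == m) {
			*pp = m->next;
			if (m->ats_enabled)
				atomic_fetch_sub(&d->nr_ats_masters, 1);
			break;
		}
	}
	pthread_mutex_unlock(&d->devices_lock);
}

static int demo_atc_inv_domain(struct demo_domain *d)
{
	struct demo_master *m;

	/* Fast path: no ATS master attached, skip the lock and the walk. */
	if (atomic_load_explicit(&d->nr_ats_masters, memory_order_relaxed) == 0)
		return 0;

	pthread_mutex_lock(&d->devices_lock);
	for (m = d->devices; m; m = m->next)
		if (m->ats_enabled)
			d->invalidations++;	/* stands in for issuing an ATC_INV */
	pthread_mutex_unlock(&d->devices_lock);
	return 0;
}
```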
[PATCH v4 0/2] mm,thp: Add filemap_huge_fault() for THP
This set of patches is the first step towards a mechanism for
automatically mapping read-only text areas of appropriate size and
alignment to THPs whenever possible.

For now, the central routine, filemap_huge_fault(), and various support
routines are only included if the experimental kernel configuration
option RO_EXEC_FILEMAP_HUGE_FAULT_THP is enabled.

This is because filemap_huge_fault() is dependent upon the
address_space_operations vector readpage() pointing to a routine that
will read and fill an entire large page at a time without polluting the
page cache with PAGESIZE entries for the large page being mapped or
performing readahead that would pollute the page cache entries for
succeeding large pages. Unfortunately, there is no good way to determine
how many bytes were read by readpage(). At present, if
filemap_huge_fault() were to call a conventional readpage() routine, it
would only fill the first PAGESIZE bytes of the large page, which is
definitely NOT the desired behavior.

However, by making the code available now it is hoped that filesystem
maintainers who have pledged to provide such a mechanism will do so more
rapidly.

The first part of the patch adds an order argument to
__page_cache_alloc(), allowing callers to directly request page cache
pages of various sizes. This code was provided by Matthew Wilcox.

The second part of the patch implements the filemap_huge_fault()
mechanism as described above.

There are still some issues that need to be resolved, but as this code
only runs when the experimental config option is set, this is a good step
that will enable further development.

Changes since v3:
1. Multiple code review comments addressed
2. filemap_huge_fault() now does rcu locking when possible
3. filemap_huge_fault() now properly adds the THP to the page cache before
   calling readpage()

Changes since v2:
1. FGP changes were pulled out to enable submission as an independent patch
2. Inadvertent tab spacing and comment changes were reverted

Changes since v1:
1. Fix improperly generated patch for v1 PATCH 1/2

Matthew Wilcox (1):
  Add an 'order' argument to __page_cache_alloc() and
    do_read_cache_page(). Ensure the allocated pages are compound pages.

William Kucharski (1):
  Add filemap_huge_fault() to attempt to satisfy page faults on
    memory-mapped read-only text pages using THP when possible.

 fs/afs/dir.c            |   2 +-
 fs/btrfs/compression.c  |   2 +-
 fs/cachefiles/rdwr.c    |   4 +-
 fs/ceph/addr.c          |   2 +-
 fs/ceph/file.c          |   2 +-
 include/linux/mm.h      |   2 +
 include/linux/pagemap.h |  10 +-
 mm/Kconfig              |  15 ++
 mm/filemap.c            | 357 ++--
 mm/huge_memory.c        |   3 +
 mm/mmap.c               |  38 -
 mm/readahead.c          |   2 +-
 mm/rmap.c               |   4 +-
 net/ceph/pagelist.c     |   4 +-
 net/ceph/pagevec.c      |   2 +-
 15 files changed, 413 insertions(+), 36 deletions(-)

-- 
2.21.0
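The "appropriate size and alignment" condition can be made concrete with a little arithmetic. Assuming 4 KiB base pages and PMD-sized THPs of order 9 (HPAGE_PMD_ORDER on x86-64, i.e. 2 MiB), an allocation of a given order is PAGE_SIZE << order bytes, and a fault can only be served by one huge page if the 2 MiB-aligned range around the faulting address lies inside the mapping and the matching file page index is a multiple of 512. The helper names are hypothetical, and this is a sketch of the alignment rule, not the logic of filemap_huge_fault() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Sketch assumptions: 4 KiB pages, order-9 (2 MiB) PMD THPs. */
#define DEMO_PAGE_SHIFT		12
#define DEMO_HPAGE_PMD_ORDER	9
#define DEMO_HPAGE_PMD_NR	(1ULL << DEMO_HPAGE_PMD_ORDER)
#define DEMO_HPAGE_PMD_SIZE	(1ULL << (DEMO_PAGE_SHIFT + DEMO_HPAGE_PMD_ORDER))

/* Bytes covered by one allocation of the given order: PAGE_SIZE << order. */
static uint64_t demo_order_bytes(unsigned int order)
{
	return (1ULL << DEMO_PAGE_SHIFT) << order;
}

/*
 * vm_pgoff is the file page index mapped at vm_start, in pages.
 * Returns true if the fault at addr could be served by one PMD THP.
 */
static bool demo_can_map_huge(uint64_t addr, uint64_t vm_start,
			      uint64_t vm_end, uint64_t vm_pgoff)
{
	uint64_t haddr = addr & ~(DEMO_HPAGE_PMD_SIZE - 1);
	uint64_t pgoff;

	/* The whole huge page must lie inside the mapping. */
	if (haddr < vm_start || haddr + DEMO_HPAGE_PMD_SIZE > vm_end)
		return false;

	/* The file index backing haddr must also be huge-page aligned. */
	pgoff = vm_pgoff + ((haddr - vm_start) >> DEMO_PAGE_SHIFT);
	return (pgoff & (DEMO_HPAGE_PMD_NR - 1)) == 0;
}
```

Requiring both the virtual address and the file offset to be aligned is what lets a single compound page in the page cache be mapped by one PMD entry.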
[PATCH v4 1/2] mm: Allow the page cache to allocate large pages
Add an 'order' argument to __page_cache_alloc() and do_read_cache_page(). Ensure the allocated pages are compound pages. Signed-off-by: Matthew Wilcox (Oracle) Signed-off-by: William Kucharski Reported-by: kbuild test robot --- fs/afs/dir.c| 2 +- fs/btrfs/compression.c | 2 +- fs/cachefiles/rdwr.c| 4 ++-- fs/ceph/addr.c | 2 +- fs/ceph/file.c | 2 +- include/linux/pagemap.h | 10 ++ mm/filemap.c| 20 +++- mm/readahead.c | 2 +- net/ceph/pagelist.c | 4 ++-- net/ceph/pagevec.c | 2 +- 10 files changed, 27 insertions(+), 23 deletions(-) diff --git a/fs/afs/dir.c b/fs/afs/dir.c index e640d67274be..0a392214f71e 100644 --- a/fs/afs/dir.c +++ b/fs/afs/dir.c @@ -274,7 +274,7 @@ static struct afs_read *afs_read_dir(struct afs_vnode *dvnode, struct key *key) afs_stat_v(dvnode, n_inval); ret = -ENOMEM; - req->pages[i] = __page_cache_alloc(gfp); + req->pages[i] = __page_cache_alloc(gfp, 0); if (!req->pages[i]) goto error; ret = add_to_page_cache_lru(req->pages[i], diff --git a/fs/btrfs/compression.c b/fs/btrfs/compression.c index 60c47b417a4b..5280e7477b7e 100644 --- a/fs/btrfs/compression.c +++ b/fs/btrfs/compression.c @@ -466,7 +466,7 @@ static noinline int add_ra_bio_pages(struct inode *inode, } page = __page_cache_alloc(mapping_gfp_constraint(mapping, -~__GFP_FS)); +~__GFP_FS), 0); if (!page) break; diff --git a/fs/cachefiles/rdwr.c b/fs/cachefiles/rdwr.c index 44a3ce1e4ce4..11d30212745f 100644 --- a/fs/cachefiles/rdwr.c +++ b/fs/cachefiles/rdwr.c @@ -259,7 +259,7 @@ static int cachefiles_read_backing_file_one(struct cachefiles_object *object, goto backing_page_already_present; if (!newpage) { - newpage = __page_cache_alloc(cachefiles_gfp); + newpage = __page_cache_alloc(cachefiles_gfp, 0); if (!newpage) goto nomem_monitor; } @@ -495,7 +495,7 @@ static int cachefiles_read_backing_file(struct cachefiles_object *object, goto backing_page_already_present; if (!newpage) { - newpage = __page_cache_alloc(cachefiles_gfp); + newpage = __page_cache_alloc(cachefiles_gfp, 0); if (!newpage) 
goto nomem; } diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c index e078cc55b989..bcb41fbee533 100644 --- a/fs/ceph/addr.c +++ b/fs/ceph/addr.c @@ -1707,7 +1707,7 @@ int ceph_uninline_data(struct file *filp, struct page *locked_page) if (len > PAGE_SIZE) len = PAGE_SIZE; } else { - page = __page_cache_alloc(GFP_NOFS); + page = __page_cache_alloc(GFP_NOFS, 0); if (!page) { err = -ENOMEM; goto out; diff --git a/fs/ceph/file.c b/fs/ceph/file.c index 685a03cc4b77..ae58d7c31aa4 100644 --- a/fs/ceph/file.c +++ b/fs/ceph/file.c @@ -1305,7 +1305,7 @@ static ssize_t ceph_read_iter(struct kiocb *iocb, struct iov_iter *to) struct page *page = NULL; loff_t i_size; if (retry_op == READ_INLINE) { - page = __page_cache_alloc(GFP_KERNEL); + page = __page_cache_alloc(GFP_KERNEL, 0); if (!page) return -ENOMEM; } diff --git a/include/linux/pagemap.h b/include/linux/pagemap.h index c7552459a15f..92e026d9a6b7 100644 --- a/include/linux/pagemap.h +++ b/include/linux/pagemap.h @@ -208,17 +208,19 @@ static inline int page_cache_add_speculative(struct page *page, int count) } #ifdef CONFIG_NUMA -extern struct page *__page_cache_alloc(gfp_t gfp); +extern struct page *__page_cache_alloc(gfp_t gfp, unsigned int order); #else -static inline struct page *__page_cache_alloc(gfp_t gfp) +static inline struct page *__page_cache_alloc(gfp_t gfp, unsigned int order) { - return alloc_pages(gfp, 0); + if (order > 0) + gfp |= __GFP_COMP; + return alloc_pages(gfp, order); } #endif static inline struct page *page_cache_alloc(struct address_space *x) { - return __page_cache_alloc(mapping_gfp_mask(x)); + return __page_cache_alloc(mapping_gfp_mask(x), 0); } static inline gfp_t readahead_gfp_mask(struct address_space *x) diff --git a/mm/filemap.c b/mm/filemap.c index
[PATCH v4 2/2] mm,thp: Add experimental config option RO_EXEC_FILEMAP_HUGE_FAULT_THP
Add filemap_huge_fault() to attempt to satisfy page faults on memory-mapped read-only text pages using THP when possible. Signed-off-by: William Kucharski --- include/linux/mm.h | 2 + mm/Kconfig | 15 ++ mm/filemap.c | 337 +++-- mm/huge_memory.c | 3 + mm/mmap.c | 38 - mm/rmap.c | 4 +- 6 files changed, 386 insertions(+), 13 deletions(-) diff --git a/include/linux/mm.h b/include/linux/mm.h index 0334ca97c584..2a5311721739 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -2433,6 +2433,8 @@ extern void truncate_inode_pages_final(struct address_space *); /* generic vm_area_ops exported for stackable file systems */ extern vm_fault_t filemap_fault(struct vm_fault *vmf); +extern vm_fault_t filemap_huge_fault(struct vm_fault *vmf, + enum page_entry_size pe_size); extern void filemap_map_pages(struct vm_fault *vmf, pgoff_t start_pgoff, pgoff_t end_pgoff); extern vm_fault_t filemap_page_mkwrite(struct vm_fault *vmf); diff --git a/mm/Kconfig b/mm/Kconfig index 56cec636a1fc..2debaded0e4d 100644 --- a/mm/Kconfig +++ b/mm/Kconfig @@ -736,4 +736,19 @@ config ARCH_HAS_PTE_SPECIAL config ARCH_HAS_HUGEPD bool +config RO_EXEC_FILEMAP_HUGE_FAULT_THP + bool "read-only exec filemap_huge_fault THP support (EXPERIMENTAL)" + depends on TRANSPARENT_HUGE_PAGECACHE && SHMEM + + help + Introduce filemap_huge_fault() to automatically map executable + read-only pages of mapped files of suitable size and alignment + using THP if possible. + + This is marked experimental because it is a new feature and is + dependent upon filesystmes implementing readpages() in a way + that will recognize large THP pages and read file content to + them without polluting the pagecache with PAGESIZE pages due + to readahead. 
+ endmenu diff --git a/mm/filemap.c b/mm/filemap.c index 38b46fc00855..aebf2f54f52e 100644 --- a/mm/filemap.c +++ b/mm/filemap.c @@ -199,13 +199,12 @@ static void unaccount_page_cache_page(struct address_space *mapping, nr = hpage_nr_pages(page); __mod_node_page_state(page_pgdat(page), NR_FILE_PAGES, -nr); - if (PageSwapBacked(page)) { + + if (PageSwapBacked(page)) __mod_node_page_state(page_pgdat(page), NR_SHMEM, -nr); - if (PageTransHuge(page)) - __dec_node_page_state(page, NR_SHMEM_THPS); - } else { - VM_BUG_ON_PAGE(PageTransHuge(page), page); - } + + if (PageTransHuge(page)) + __dec_node_page_state(page, NR_SHMEM_THPS); /* * At this point page must be either written or cleaned by @@ -1663,7 +1662,8 @@ struct page *pagecache_get_page(struct address_space *mapping, pgoff_t offset, no_page: if (!page && (fgp_flags & FGP_CREAT)) { int err; - if ((fgp_flags & FGP_WRITE) && mapping_cap_account_dirty(mapping)) + if ((fgp_flags & FGP_WRITE) && + mapping_cap_account_dirty(mapping)) gfp_mask |= __GFP_WRITE; if (fgp_flags & FGP_NOFS) gfp_mask &= ~__GFP_FS; @@ -2643,6 +2643,326 @@ vm_fault_t filemap_fault(struct vm_fault *vmf) } EXPORT_SYMBOL(filemap_fault); +#ifdef CONFIG_RO_EXEC_FILEMAP_HUGE_FAULT_THP +/* + * Check for an entry in the page cache which would conflict with the address + * range we wish to map using a THP or is otherwise unusable to map a large + * cached page. + * + * The routine will return true if a usable page is found in the page cache + * (and *pagep will be set to the address of the cached page), or if no + * cached page is found (and *pagep will be set to NULL). 
+ */ +static bool +filemap_huge_check_pagecache_usable(struct xa_state *xas, + struct page **pagep, pgoff_t hindex, pgoff_t hindex_max) +{ + struct page *page; + + while (1) { + page = xas_find(xas, hindex_max); + + if (xas_retry(xas, page)) { + xas_set(xas, hindex); + continue; + } + + /* +* A found entry is unusable if: +* + the entry is an Xarray value, not a pointer +* + the entry is an internal Xarray node +* + the entry is not a Transparent Huge Page +* + the entry is not a compound page +* + the entry is not the head of a compound page +* + the entry is a page page with an order other than +*HPAGE_PMD_ORDER +* + the page's index is not what we expect it to be +* + the page is not up-to-date +*/ + if (!page) + break; + + if
Re: [PATCH] PCI: dwc: Add map irq callback
On 8/14/2019 6:59 PM, Christoph Hellwig wrote:
> On Wed, Aug 14, 2019 at 04:31:14PM +0800, Dilip Kota wrote:
>> callback. pp->map_irq() must assign the callback along with the
>> platform specific configuration. In Intel PCIe driver pp->map_irq()
>> does the same. (Driver is not yet present in mainline, i will submit
>> for review once this change is approved).
>
> And that's what I meant. The standard procedure is to submit your core
> changes together with the user, not separately.

Sure, will submit the driver change along with this change.
Sorry for missing it.

Thanks,
Dilip
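For readers following the discussion, an optional per-platform callback with a default fallback generally has the shape sketched below. Everything in it is hypothetical: `demo_pcie_port` stands in for the DesignWare port structure, the default mapping uses the conventional INTx swizzle ((pin - 1 + slot) mod 4), and the real pp->map_irq() signature is not shown in this thread.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical port structure with an optional platform hook. */
struct demo_pcie_port {
	int base_irq;
	int (*map_irq)(const struct demo_pcie_port *pp, int slot, int pin);
};

/* Default INTx mapping: swizzle pin (1..4) by slot onto four IRQs. */
static int demo_default_map_irq(const struct demo_pcie_port *pp,
				int slot, int pin)
{
	return pp->base_irq + ((pin - 1 + slot) % 4);
}

/* Core code: use the platform hook if one was assigned. */
static int demo_map_irq(const struct demo_pcie_port *pp, int slot, int pin)
{
	if (pp->map_irq)
		return pp->map_irq(pp, slot, pin);
	return demo_default_map_irq(pp, slot, pin);
}

/* Example platform override: all INTx routed to a single line. */
static int demo_platform_map_irq(const struct demo_pcie_port *pp,
				 int slot, int pin)
{
	(void)slot;
	(void)pin;
	return pp->base_irq;
}
```

This also illustrates Christoph's point: a hook like this is hard to review without the platform driver that assigns it.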
Re: [PATCH v5 1/2] dt-bindings: mmc: Document Aspeed SD controller
On Thu, 15 Aug 2019, at 15:06, Joel Stanley wrote: > On Wed, 7 Aug 2019 at 00:38, Andrew Jeffery wrote: > > > > The ASPEED SD/SDIO/MMC controller exposes two slots implementing the > > SDIO Host Specification v2.00, with 1 or 4 bit data buses, or an 8 bit > > data bus if only a single slot is enabled. > > > > Signed-off-by: Andrew Jeffery > > Reviewed-by: Joel Stanley > > Two minor comments below. > > > +++ b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml > > @@ -0,0 +1,105 @@ > > +# SPDX-License-Identifier: GPL-2.0-or-later > > No "Copyright IBM" ? I'm going rogue. That reminds me I should chase up where we got to with the binding licensing. > > > +%YAML 1.2 > > +--- > > > + > > +examples: > > + - | > > +#include > > +sdc@1e74 { > > +compatible = "aspeed,ast2500-sd-controller"; > > +reg = <0x1e74 0x100>; > > +#address-cells = <1>; > > +#size-cells = <1>; > > +ranges = <0 0x1e74 0x1>; > > According to the datasheet this could be 0x2. It does not matter > though, as there's nothing in it past 0x300. Good catch. Andrew
[PATCH v2] regulator: core: Add label to collate of_node_put() statements
In function of_get_child_regulator(), the loop for_each_child_of_node()
contains two mid-loop return statements, each preceded by a statement
putting child. In order to reduce this repetition, create a new label,
err_node_put, that puts child and then returns the required value; edit
the mid-loop return blocks to instead go to this new label.

Signed-off-by: Nishka Dasgupta
---
Changes in v2:
- Submit this as a separate patch instead of updating a previous patch.

 drivers/regulator/core.c | 13 +++--
 1 file changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 7a5d52948703..4a27a46ec6e7 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -380,16 +380,17 @@ static struct device_node *of_get_child_regulator(struct device_node *parent,
 
 		if (!regnode) {
 			regnode = of_get_child_regulator(child, prop_name);
-			if (regnode) {
-				of_node_put(child);
-				return regnode;
-			}
+			if (regnode)
+				goto err_node_put;
 		} else {
-			of_node_put(child);
-			return regnode;
+			goto err_node_put;
 		}
 	}
 	return NULL;
+
+err_node_put:
+	of_node_put(child);
+	return regnode;
 }
 
 /**
-- 
2.19.1
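The same single-exit cleanup can be exercised outside the kernel. In this sketch a hypothetical `demo_node` with a plain reference count stands in for `struct device_node`, and the loop holds a reference on the current child as for_each_child_of_node() does. Both early exits jump to one label that drops the loop's reference exactly once, so the counts balance whether or not a match is found.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical refcounted node standing in for struct device_node. */
struct demo_node {
	const char *name;
	int refcount;
	struct demo_node *children[4];
	int nr_children;
};

static struct demo_node *demo_get(struct demo_node *n)
{
	n->refcount++;
	return n;
}

static void demo_put(struct demo_node *n)
{
	n->refcount--;
}

/* Depth-first search; on success the caller owns one reference. */
static struct demo_node *demo_find_child(struct demo_node *parent,
					 const char *want)
{
	struct demo_node *child = NULL;
	struct demo_node *found = NULL;
	int i;

	for (i = 0; i < parent->nr_children; i++) {
		child = demo_get(parent->children[i]);	/* loop's reference */

		if (strcmp(child->name, want) == 0) {
			found = demo_get(child);	/* reference for the caller */
			goto put_child;
		}
		found = demo_find_child(child, want);
		if (found)
			goto put_child;
		demo_put(child);	/* normal advance drops the loop's ref */
	}
	return NULL;

put_child:
	demo_put(child);	/* every early exit drops it here, once */
	return found;
}
```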
Re: [PATCH v5 1/2] dt-bindings: mmc: Document Aspeed SD controller
On Wed, 7 Aug 2019 at 00:38, Andrew Jeffery wrote: > > The ASPEED SD/SDIO/MMC controller exposes two slots implementing the > SDIO Host Specification v2.00, with 1 or 4 bit data buses, or an 8 bit > data bus if only a single slot is enabled. > > Signed-off-by: Andrew Jeffery Reviewed-by: Joel Stanley Two minor comments below. > +++ b/Documentation/devicetree/bindings/mmc/aspeed,sdhci.yaml > @@ -0,0 +1,105 @@ > +# SPDX-License-Identifier: GPL-2.0-or-later No "Copyright IBM" ? > +%YAML 1.2 > +--- > + > +examples: > + - | > +#include > +sdc@1e74 { > +compatible = "aspeed,ast2500-sd-controller"; > +reg = <0x1e74 0x100>; > +#address-cells = <1>; > +#size-cells = <1>; > +ranges = <0 0x1e74 0x1>; According to the datasheet this could be 0x2. It does not matter though, as there's nothing in it past 0x300. Cheers, Joel
Re: [PATCH v5 2/2] mmc: Add support for the ASPEED SD controller
On Wed, 7 Aug 2019 at 00:38, Andrew Jeffery wrote: > > Add a minimal driver for ASPEED's SD controller, which exposes two > SDHCIs. > > The ASPEED design implements a common register set for the SDHCIs, and > moves some of the standard configuration elements out to this common > area (e.g. 8-bit mode, and card detect configuration which is not > currently supported). > > The SD controller has a dedicated hardware interrupt that is shared > between the slots. The common register set exposes information on which > slot triggered the interrupt; early revisions of the patch introduced an > irqchip for the register, but reality is it doesn't behave as an > irqchip, and the result fits awkwardly into the irqchip APIs. Instead > I've taken the simple approach of using the IRQ as a shared IRQ with > some minor performance impact for the second slot. > > Ryan was the original author of the patch - I've taken his work and > massaged it to drop the irqchip support and rework the devicetree > integration. The driver has been smoke tested under qemu against a > minimal SD controller model and lightly tested on an ast2500-evb. 
> > Signed-off-by: Ryan Chen > Signed-off-by: Andrew Jeffery > Acked-by: Adrian Hunter Reviewed-by: Joel Stanley > > --- > > v5: > * Cleanup sdhci driver on registration failure > > v4: No change > > v2: > * Add AST2600 compatible > * Drop SDHCI_QUIRK2_CLOCK_DIV_ZERO_BROKEN > * Ensure slot number is valid > * Fix build with CONFIG_MODULES > * Fix module license string > * Non-PCI devices won't die > * Rename aspeed_sdc_configure_8bit_mode() > * Rename aspeed_sdhci_pdata > * Switch to sdhci_enable_clk() > * Use PTR_ERR() on the right `struct platform_device *` > --- > drivers/mmc/host/Kconfig | 12 ++ > drivers/mmc/host/Makefile | 1 + > drivers/mmc/host/sdhci-of-aspeed.c | 332 + > 3 files changed, 345 insertions(+) > create mode 100644 drivers/mmc/host/sdhci-of-aspeed.c > > diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig > index 14d89a108edd..0f8a230de2f3 100644 > --- a/drivers/mmc/host/Kconfig > +++ b/drivers/mmc/host/Kconfig > @@ -154,6 +154,18 @@ config MMC_SDHCI_OF_ARASAN > > If unsure, say N. > > +config MMC_SDHCI_OF_ASPEED > + tristate "SDHCI OF support for the ASPEED SDHCI controller" > + depends on MMC_SDHCI_PLTFM > + depends on OF > + help > + This selects the ASPEED Secure Digital Host Controller Interface. > + > + If you have a controller with this interface, say Y or M here. You > + also need to enable an appropriate bus interface. > + > + If unsure, say N. 
> + > config MMC_SDHCI_OF_AT91 > tristate "SDHCI OF support for the Atmel SDMMC controller" > depends on MMC_SDHCI_PLTFM > diff --git a/drivers/mmc/host/Makefile b/drivers/mmc/host/Makefile > index 73578718f119..390ee162fe71 100644 > --- a/drivers/mmc/host/Makefile > +++ b/drivers/mmc/host/Makefile > @@ -84,6 +84,7 @@ obj-$(CONFIG_MMC_SDHCI_ESDHC_IMX) += sdhci-esdhc-imx.o > obj-$(CONFIG_MMC_SDHCI_DOVE) += sdhci-dove.o > obj-$(CONFIG_MMC_SDHCI_TEGRA) += sdhci-tegra.o > obj-$(CONFIG_MMC_SDHCI_OF_ARASAN) += sdhci-of-arasan.o > +obj-$(CONFIG_MMC_SDHCI_OF_ASPEED) += sdhci-of-aspeed.o > obj-$(CONFIG_MMC_SDHCI_OF_AT91) += sdhci-of-at91.o > obj-$(CONFIG_MMC_SDHCI_OF_ESDHC) += sdhci-of-esdhc.o > obj-$(CONFIG_MMC_SDHCI_OF_HLWD) += sdhci-of-hlwd.o > diff --git a/drivers/mmc/host/sdhci-of-aspeed.c > b/drivers/mmc/host/sdhci-of-aspeed.c > new file mode 100644 > index ..8bb095ca2fa9 > --- /dev/null > +++ b/drivers/mmc/host/sdhci-of-aspeed.c > @@ -0,0 +1,332 @@ > +// SPDX-License-Identifier: GPL-2.0-or-later > +/* Copyright (C) 2019 ASPEED Technology Inc. */ > +/* Copyright (C) 2019 IBM Corp. 
*/ > + > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > +#include > + > +#include "sdhci-pltfm.h" > + > +#define ASPEED_SDC_INFO 0x00 > +#define ASPEED_SDC_S1MMC8 BIT(25) > +#define ASPEED_SDC_S0MMC8 BIT(24) > + > +struct aspeed_sdc { > + struct clk *clk; > + struct resource *res; > + > + spinlock_t lock; > + void __iomem *regs; > +}; > + > +struct aspeed_sdhci { > + struct aspeed_sdc *parent; > + u32 width_mask; > +}; > + > +static void aspeed_sdc_configure_8bit_mode(struct aspeed_sdc *sdc, > + struct aspeed_sdhci *sdhci, > + bool bus8) > +{ > + u32 info; > + > + /* Set/clear 8 bit mode */ > + spin_lock(&sdc->lock); > + info = readl(sdc->regs + ASPEED_SDC_INFO); > + if (bus8) > + info |= sdhci->width_mask; > + else > + info &= ~sdhci->width_mask; > + writel(info, sdc->regs + ASPEED_SDC_INFO); > + spin_unlock(&sdc->lock); > +}
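aspeed_sdc_configure_8bit_mode() performs a read-modify-write on a register shared by both slots, which is why it takes the controller lock. The sketch below models that in userspace: a plain `uint32_t` stands in for the MMIO register (so readl()/writel() become ordinary loads and stores) and a pthread mutex stands in for the spinlock. The bit positions follow the ASPEED_SDC_S0MMC8/S1MMC8 defines in the patch; pairing slot 0 with bit 24 and slot 1 with bit 25 is inferred from the names, and the `demo_*` identifiers are hypothetical.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stdint.h>

#define DEMO_BIT(n)		(1u << (n))
#define DEMO_SDC_S1MMC8		DEMO_BIT(25)	/* slot 1: 8-bit bus */
#define DEMO_SDC_S0MMC8		DEMO_BIT(24)	/* slot 0: 8-bit bus */

struct demo_sdc {
	pthread_mutex_t lock;	/* stands in for the driver's spinlock */
	uint32_t info;		/* stands in for the shared INFO register */
};

static void demo_configure_8bit(struct demo_sdc *sdc, uint32_t width_mask,
				bool bus8)
{
	uint32_t info;

	/* Both slots share this register, so the RMW must not interleave. */
	pthread_mutex_lock(&sdc->lock);
	info = sdc->info;		/* readl() in the driver */
	if (bus8)
		info |= width_mask;
	else
		info &= ~width_mask;
	sdc->info = info;		/* writel() in the driver */
	pthread_mutex_unlock(&sdc->lock);
}
```

Without the lock, two slots configuring themselves concurrently could each read the old value and one slot's bit would be lost on write-back.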
Re: [PATCH v5 5/7] PCI/ATS: Add PASID support for PCIe VF devices
On Tue, Aug 13, 2019 at 03:19:58PM -0700, Kuppuswamy Sathyanarayanan wrote: > On Mon, Aug 12, 2019 at 03:05:08PM -0500, Bjorn Helgaas wrote: > > On Thu, Aug 01, 2019 at 05:06:02PM -0700, > > sathyanarayanan.kuppusw...@linux.intel.com wrote: > > > From: Kuppuswamy Sathyanarayanan > > > > > > > > > When IOMMU tries to enable PASID for VF device in > > > iommu_enable_dev_iotlb(), it always fails because PASID support for PCIe > > > VF device is currently broken in PCIE driver. Current implementation > > > expects the given PCIe device (PF & VF) to implement PASID capability > > > before enabling the PASID support. But this assumption is incorrect. As > > > per PCIe spec r4.0, sec 9.3.7.14, all VFs associated with PF can only > > > use the PASID of the PF and not implement it. > > > > > > Also, since PASID is a shared resource between PF/VF, following rules > > > should apply. > > > > > > 1. Use proper locking before accessing/modifying PF resources in VF > > >PASID enable/disable call. > > > 2. Use reference count logic to track the usage of PASID resource. > > > 3. Disable PASID only if the PASID reference count (pasid_ref_cnt) is > > > zero. 
> > > > > > Cc: Ashok Raj > > > Cc: Keith Busch > > > Suggested-by: Ashok Raj > > > Signed-off-by: Kuppuswamy Sathyanarayanan > > > > > > --- > > > drivers/pci/ats.c | 113 ++-- > > > include/linux/pci.h | 2 + > > > 2 files changed, 90 insertions(+), 25 deletions(-) > > > > > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c > > > index 079dc544..9384afd7d00e 100644 > > > --- a/drivers/pci/ats.c > > > +++ b/drivers/pci/ats.c > > > @@ -402,6 +402,8 @@ void pci_pasid_init(struct pci_dev *pdev) > > > if (pdev->is_virtfn) > > > return; > > > > > > + mutex_init(>pasid_lock); > > > + > > > pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_PASID); > > > if (!pos) > > > return; > > > @@ -436,32 +438,57 @@ void pci_pasid_init(struct pci_dev *pdev) > > > int pci_enable_pasid(struct pci_dev *pdev, int features) > > > { > > > u16 control, supported; > > > + int ret = 0; > > > + struct pci_dev *pf = pci_physfn(pdev); > > > > > > - if (WARN_ON(pdev->pasid_enabled)) > > > - return -EBUSY; > > > + mutex_lock(>pasid_lock); > > > > > > - if (!pdev->eetlp_prefix_path) > > > - return -EINVAL; > > > + if (WARN_ON(pdev->pasid_enabled)) { > > > + ret = -EBUSY; > > > + goto pasid_unlock; > > > + } > > > > > > - if (!pdev->pasid_cap) > > > - return -EINVAL; > > > + if (!pdev->eetlp_prefix_path) { > > > + ret = -EINVAL; > > > + goto pasid_unlock; > > > + } > > > > > > - pci_read_config_word(pdev, pdev->pasid_cap + PCI_PASID_CAP, > > > - ); > > > + if (!pf->pasid_cap) { > > > + ret = -EINVAL; > > > + goto pasid_unlock; > > > + } > > > + > > > + if (pdev->is_virtfn && pf->pasid_enabled) > > > + goto update_status; > > > + > > > + pci_read_config_word(pf, pf->pasid_cap + PCI_PASID_CAP, ); > > > supported &= PCI_PASID_CAP_EXEC | PCI_PASID_CAP_PRIV; > > > > > > /* User wants to enable anything unsupported? 
*/ > > > - if ((supported & features) != features) > > > - return -EINVAL; > > > + if ((supported & features) != features) { > > > + ret = -EINVAL; > > > + goto pasid_unlock; > > > + } > > > > > > control = PCI_PASID_CTRL_ENABLE | features; > > > - pdev->pasid_features = features; > > > - > > > + pf->pasid_features = features; > > > pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control); > > > > > > - pdev->pasid_enabled = 1; > > > + /* > > > + * If PASID is not already enabled in PF, increment pasid_ref_cnt > > > + * to count PF PASID usage. > > > + */ > > > + if (pdev->is_virtfn && !pf->pasid_enabled) { > > > + atomic_inc(>pasid_ref_cnt); > > > + pf->pasid_enabled = 1; > > > + } > > > > > > - return 0; > > > +update_status: > > > + atomic_inc(>pasid_ref_cnt); > > > + pdev->pasid_enabled = 1; > > > +pasid_unlock: > > > + mutex_unlock(>pasid_lock); > > > + return ret; > > > } > > > EXPORT_SYMBOL_GPL(pci_enable_pasid); > > > > > > @@ -472,16 +499,29 @@ EXPORT_SYMBOL_GPL(pci_enable_pasid); > > > void pci_disable_pasid(struct pci_dev *pdev) > > > { > > > u16 control = 0; > > > + struct pci_dev *pf = pci_physfn(pdev); > > > + > > > + mutex_lock(>pasid_lock); > > > > > > if (WARN_ON(!pdev->pasid_enabled)) > > > - return; > > > + goto pasid_unlock; > > > > > > - if (!pdev->pasid_cap) > > > - return; > > > + if (!pf->pasid_cap) > > > + goto pasid_unlock; > > > > > > - pci_write_config_word(pdev, pdev->pasid_cap + PCI_PASID_CTRL, control); > > > + atomic_dec(>pasid_ref_cnt); > > > > > > + if (atomic_read(>pasid_ref_cnt)) > > > + goto done; > > > + > > > + /* Disable PASID only if pasid_ref_cnt is zero */ > > > + pci_write_config_word(pf, pf->pasid_cap +
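The rules in the commit message (lock the shared PF state, reference-count PASID users, and disable only when the count drops to zero) follow a common shared-resource pattern. Here is a minimal single-threaded sketch that assumes one reference per enabled function. It is deliberately simpler than the patch, which also takes an extra reference on behalf of the PF when a VF enables PASID first. It also omits the mutex the patch holds around this logic. All names are hypothetical.

```c
#include <stdbool.h>

/* Hypothetical model of a capability owned by a PF and shared with its VFs. */
struct shared_cap {
	int ref_cnt;       /* functions currently using the capability */
	bool hw_enabled;   /* models the enable bit in the PF's control register */
};

struct func {
	struct shared_cap *pf_cap;  /* PF and VFs all point at the PF's state */
	bool enabled;               /* per-function flag, like pasid_enabled */
};

/* Returns 0 on success, -1 if this function has already enabled it. */
static int func_enable(struct func *f)
{
	if (f->enabled)
		return -1;
	if (f->pf_cap->ref_cnt++ == 0)
		f->pf_cap->hw_enabled = true;   /* first user programs the hardware */
	f->enabled = true;
	return 0;
}

static void func_disable(struct func *f)
{
	if (!f->enabled)
		return;
	f->enabled = false;
	if (--f->pf_cap->ref_cnt == 0)
		f->pf_cap->hw_enabled = false;  /* last user turns it off */
}
```

Only the first enable and the last disable touch the (modelled) hardware, so a VF disabling PASID cannot turn it off underneath the PF or another VF.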
Re: [PATCH] x86/apic: Handle missing global clockevent gracefully
On Mon, Aug 12, 2019 at 2:16 PM Daniel Drake wrote:
> I can do a bit of testing on other platforms too. Are there any
> specific tests I should run, other than checking that the system boots
> and doesn't have any timer watchdog complaints in the log?

Tested this on 2 AMD platforms that were not affected by the crash here.
In addition to confirming that they boot fine without timer complaints
in the logs, I checked the calibrate_APIC_clock() result before and
after this patch. I repeated each test twice.

Asus E402YA (AMD E2-7015)
Before: 99811, 99811
After: 99812, 99812

Acer Aspire A315-21G (AMD A9-9420e)
Before: 99811, 99811
After: 99807, 99820

Those new numbers seem very close to the previous ones and I didn't
observe any problems.

Thanks
Daniel
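To put "very close" in numbers, the largest deviation from the "before" value can be expressed in parts per million. This small sketch uses the calibration figures reported above; `max_dev_ppm()` is a hypothetical helper.

```c
#include <stdlib.h>

/* Largest |after[i] - before_ref|, in parts per million of before_ref. */
static double max_dev_ppm(long before_ref, const long *after, int n)
{
	long worst = 0;

	for (int i = 0; i < n; i++) {
		long d = labs(after[i] - before_ref);
		if (d > worst)
			worst = d;
	}
	return 1e6 * (double)worst / (double)before_ref;
}
```

For the Acer machine the worst case is 9 counts (about 90 ppm, or 0.009%). For the Asus machine it is 1 count (about 10 ppm).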
Re: [PATCH v5 3/7] PCI/ATS: Initialize PASID in pci_ats_init()
On Thu, Aug 01, 2019 at 05:06:00PM -0700, sathyanarayanan.kuppusw...@linux.intel.com wrote: > From: Kuppuswamy Sathyanarayanan > > Currently, PASID Capability checks are repeated across all PASID API's. > Instead, cache the capability check result in pci_pasid_init() and use > it in other PASID API's. Also, since PASID is a shared resource between > PF/VF, initialize PASID features with default values in pci_pasid_init(). > > Signed-off-by: Kuppuswamy Sathyanarayanan > > + * TODO: Since PASID is a shared resource between PF/VF, don't update > + * PASID features in the same API as a per device feature. This comment is slightly misleading (at least, it misled *me* :)) because it hints that PASID might be specific to SR-IOV. But I don't think that's true, so if you keep a comment like this, please reword it along the lines of "for SR-IOV devices, the PF's PASID is shared between the PF and all VFs" so it leaves open the possibility of non-SR-IOV devices using PASID as well. Bjorn
[PATCH] powerpc: Allow flush_(inval_)dcache_range to work across ranges >4GB
From: Alastair D'Silva

Heads Up: This patch cannot be submitted to Linus's tree, as the affected
assembler functions have already been converted to C.

When calling flush_(inval_)dcache_range with a size >4GB, we were masking
off the upper 32 bits, so we would incorrectly flush a range smaller
than intended.

This patch replaces the 32 bit shifts with 64 bit ones, so that
the full size is accounted for.

Signed-off-by: Alastair D'Silva
---
 arch/powerpc/kernel/misc_64.S | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S
index 1ad4089dd110..d4d096f80f4b 100644
--- a/arch/powerpc/kernel/misc_64.S
+++ b/arch/powerpc/kernel/misc_64.S
@@ -130,7 +130,7 @@ _GLOBAL_TOC(flush_dcache_range)
 	subf	r8,r6,r4		/* compute length */
 	add	r8,r8,r5		/* ensure we get enough */
 	lwz	r9,DCACHEL1LOGBLOCKSIZE(r10)	/* Get log-2 of dcache block size */
-	srw.	r8,r8,r9		/* compute line count */
+	srd.	r8,r8,r9		/* compute line count */
 	beqlr				/* nothing to do? */
 	mtctr	r8
 0:	dcbst	0,r6
@@ -148,7 +148,7 @@ _GLOBAL(flush_inval_dcache_range)
 	subf	r8,r6,r4		/* compute length */
 	add	r8,r8,r5		/* ensure we get enough */
 	lwz	r9,DCACHEL1LOGBLOCKSIZE(r10)	/* Get log-2 of dcache block size */
-	srw.	r8,r8,r9		/* compute line count */
+	srd.	r8,r8,r9		/* compute line count */
 	beqlr				/* nothing to do? */
 	sync
 	isync
--
2.21.0
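The difference between `srw.` and `srd.` can be reproduced in C. `srw.` shifts only the low-order 32 bits of the source register, so a length of 4GB or more is truncated before the line count is computed. This sketch assumes 128-byte cache lines (log2 = 7) and a hypothetical 5 GiB range; the function names are illustrative only.

```c
#include <stdint.h>

/* Like srw.: only the low 32 bits of the length take part in the shift. */
static uint64_t line_count_srw(uint64_t len, unsigned int log2_block)
{
	return (uint32_t)len >> log2_block;
}

/* Like srd.: the full 64-bit length is shifted. */
static uint64_t line_count_srd(uint64_t len, unsigned int log2_block)
{
	return len >> log2_block;
}
```

For a 5 GiB length, the 32-bit version yields 8388608 lines (only 1 GiB worth) instead of 41943040.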
Re: [RESEND PATCH 1/2 -mm] mm: account lazy free pages separately
On 8/14/19 5:55 AM, Vlastimil Babka wrote: On 8/12/19 7:00 PM, Yang Shi wrote: I can see that memcg rss size was the primary problem David was looking at. But MemAvailable will not help with that, right? Yes, but David actually would like to have a per-memcg MemAvailable (accounted like the global one), which should account per-memcg deferred split THPs properly. Moreover, is accounting the full THP correct? What if subpages are still mapped? "Deferred split" definitely doesn't mean they are free. When memory pressure is hit, they would be split, and then the unmapped normal pages would be freed. So, when calculating MemAvailable, they are not accounted 100%, but like "available += lazyfree - min(lazyfree / 2, wmark_low)", just like how page cache is accounted. We could get a more accurate account, i.e. by checking each subpage's mapcount when accounting, but it may change before the shrinker starts scanning. So, just use the ballpark estimation to trade accuracy for lower complexity. If we know the mapcounts at the moment the deferred split is initiated (I suppose there has to be an iteration over all subpages already?), we could get the exact number to adjust the counter with, and also store the number somewhere (e.g. an unused field in the first/second tail page; I think we already do that for something). Then in the shrinker we just read that number to adjust the counter back. Then we can ignore the subpage mapping changes before shrinking happens; they shouldn't change the situation significantly, and importantly we will be safe from counter imbalance thanks to the stored number. Thanks, I'm going to look into this approach. Thanks for the suggestion again.
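The ballpark formula quoted above mirrors how the kernel's si_mem_available() discounts page cache. It assumes min(lazyfree / 2, wmark_low) pages cannot be freed and counts the rest as available. Below is a sketch of just that arithmetic, using hypothetical page counts. It does not model the per-subpage mapcount tracking proposed in the thread.

```c
#include <stdint.h>

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/*
 * Pages of deferred-split ("lazyfree") THP memory to add to MemAvailable:
 * hold back half of them, but never more than the low watermark.
 */
static uint64_t lazyfree_available(uint64_t lazyfree, uint64_t wmark_low)
{
	return lazyfree - min_u64(lazyfree / 2, wmark_low);
}
```

With a large lazyfree pool the watermark term dominates (1000 pages with wmark_low = 100 gives 900). With a small pool, half is held back (100 pages gives 50).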
Re: [RESEND PATCH 1/2 -mm] mm: account lazy free pages separately
On 8/14/19 5:49 AM, Vlastimil Babka wrote: On 8/9/19 8:26 PM, Yang Shi wrote: Here the new counter is introduced for patch 2/2 to account deferred split THPs into available memory since NR_ANON_THPS may contain non-deferred split THPs. I could use an internal counter for deferred split THPs, but if it is accounted by mod_node_page_state, why not just show it in /proc/meminfo? The answer to "Why not" is that it becomes part of the userspace API (btw, this patchset should have CC'd linux-api@; please do so for further iterations), and even if the implementation details of deferred splitting change in the future, we'll basically have to keep the counter (even with a 0 value) in /proc/meminfo forever. Also, quite recently we added the following counter: KReclaimable: Kernel allocations that the kernel will attempt to reclaim under memory pressure. Includes SReclaimable (below), and other direct allocations with a shrinker. Although THP allocations are not exactly "kernel allocations", once they are unmapped, they are in fact kernel-only, so IMHO it wouldn't be a big stretch to add the lazy THP pages there? Thanks a lot for the suggestion. I agree it may be a good fit. I hope "kernel allocations" does not cause confusion, but we can explain it in the documentation. Or should we fix NR_ANON_THPS and show deferred split THPs in /proc/meminfo?
Re: [RESEND PATCH 1/2 -mm] mm: account lazy free pages separately
On 8/14/19 4:08 AM, Michal Hocko wrote: On Mon 12-08-19 10:00:17, Yang Shi wrote: On 8/12/19 2:34 AM, Michal Hocko wrote: On Fri 09-08-19 16:54:43, Yang Shi wrote: On 8/9/19 11:26 AM, Yang Shi wrote: On 8/9/19 11:02 AM, Michal Hocko wrote: [...] I have to study the code some more, but is there any reason why those pages are not accounted as proper THPs anymore? Sure, they are partially unmapped, but they are still THPs, so why can't we keep them accounted like that? Having a new counter to reflect that sounds like papering over the problem to me. But as I've said, I might be missing something important here. I think we could keep those pages accounted in NR_ANON_THPS, since they are still THPs although they are unmapped as you mentioned, if we just want to fix the improper accounting. Double-checking what NR_ANON_THPS really means: Documentation/filesystems/proc.txt says "Non-file backed huge pages mapped into userspace page tables". So it makes some sense to decrement NR_ANON_THPS when removing the rmap, even though they are still THPs. I don't think we would like to change the definition; if so, a new counter may make more sense. Yes, changing the NR_ANON_THPS semantics sounds like a bad idea. Let me check whether I understand the problem. So we have some THPs in limbo, waiting to be split and for their unmapped parts to be freed, right? I can see that page_remove_anon_compound_rmap does correctly decrement NR_ANON_MAPPED for subpages that are no longer mapped by anybody. LRU pages seem to be accounted properly as well. As you've said, NR_ANON_THPS reflects the number of mapped THPs, and that should already reflect reality IIUC. So the only problem seems to be that deferred THPs might aggregate a lot of immediately freeable memory (if none of the subpages are mapped), and that can confuse MemAvailable because it doesn't know about it. Has a skewed counter resulted in user-observable behavior/failures? No.
But the skewed counter may make a big difference for a large-scale cluster. MemAvailable is an important factor for the cluster scheduler to determine capacity. But MemAvailable is a very rough estimate. Is relying on it really a good measure? I mean there is a lot of reclaimable memory that is not reflected there (some fs-internal data structures, networking buffers, etc.). Yes, I agree there are other freeable objects not accounted in MemAvailable. Their size depends on the workload. But deferred split THPs seem more common with common workloads. A simple run of the MariaDB test from mmtests shows it could generate over fifteen thousand deferred split THPs (around 30G accumulated in a one-hour run, 75% of the 40G of memory in my VM). So, it may be worth accounting deferred split THPs in MemAvailable. [...] accounting the full THP correct? What if subpages are still mapped? "Deferred split" definitely doesn't mean they are free. When memory pressure is hit, they would be split, and then the unmapped normal pages would be freed. So, when calculating MemAvailable, they are not accounted 100%, but like "available += lazyfree - min(lazyfree / 2, wmark_low)", just like how page cache is accounted. Then this is even more dubious IMHO. We could get a more accurate account, i.e. by checking each subpage's mapcount when accounting, but it may change before the shrinker starts scanning. So, just use the ballpark estimation to trade accuracy for lower complexity. I do not see much point in fixing up one particular counter when there is a whole lot that is not even considered. I would rather live with the fact that MemAvailable is only a very rough estimate than play whack-a-mole with every memory consumer that is freeable directly or indirectly via memory reclaim, because this is likely to always be subtly broken and only visible under very specific workloads, so there is no way to test for it. I saw Vlastimil suggested KReclaimable; it seems a good fit.
If so, we don't need to create a new counter anymore.
Re: [PATCH v5 3/7] PCI/ATS: Initialize PASID in pci_ats_init()
On Thu, Aug 01, 2019 at 05:06:00PM -0700, sathyanarayanan.kuppusw...@linux.intel.com wrote: > From: Kuppuswamy Sathyanarayanan > > Currently, PASID Capability checks are repeated across all PASID API's. > Instead, cache the capability check result in pci_pasid_init() and use > it in other PASID API's. Also, since PASID is a shared resource between > PF/VF, initialize PASID features with default values in pci_pasid_init(). > > Signed-off-by: Kuppuswamy Sathyanarayanan > > --- > drivers/pci/ats.c | 74 + > include/linux/pci-ats.h | 5 +++ > include/linux/pci.h | 1 + > 3 files changed, 59 insertions(+), 21 deletions(-) > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c > index 280be911f190..1f4be27a071d 100644 > --- a/drivers/pci/ats.c > +++ b/drivers/pci/ats.c > @@ -30,6 +30,8 @@ void pci_ats_init(struct pci_dev *dev) > dev->ats_cap = pos; > > pci_pri_init(dev); > + > + pci_pasid_init(dev); > } > > /** > @@ -315,6 +317,40 @@ EXPORT_SYMBOL_GPL(pci_reset_pri); > #endif /* CONFIG_PCI_PRI */ > > #ifdef CONFIG_PCI_PASID > + > +void pci_pasid_init(struct pci_dev *pdev) > +{ > ... > +} > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h > index 33653d4ca94f..bc7f815d38ff 100644 > --- a/include/linux/pci-ats.h > +++ b/include/linux/pci-ats.h > @@ -40,6 +40,7 @@ static inline int pci_reset_pri(struct pci_dev *pdev) > > #ifdef CONFIG_PCI_PASID > > +void pci_pasid_init(struct pci_dev *pdev); This also looks like it should be static in ats.c. > int pci_enable_pasid(struct pci_dev *pdev, int features); > void pci_disable_pasid(struct pci_dev *pdev); > void pci_restore_pasid_state(struct pci_dev *pdev); > @@ -48,6 +49,10 @@ int pci_max_pasids(struct pci_dev *pdev); > > #else /* CONFIG_PCI_PASID */ > > +static inline void pci_pasid_init(struct pci_dev *pdev) > +{ > +} > + > static inline int pci_enable_pasid(struct pci_dev *pdev, int features) > { > return -EINVAL;
Re: [PATCH] Fix a stack buffer overflow bug check_input_term
The stack trace differs from test to test; the attached trace1 file is taken from one of the tests. The bug was confirmed by adding some printk statements in `check_input_term`; the trace with the printk output is attached as the trace2 file. This patch is a tentative fix for the bug; please give me feedback. On 8/15/19 12:35 AM, Hui Peng wrote: > `check_input_term` recursively calls itself with input > from device side (e.g., uac_input_terminal_descriptor.bCSourceID) > as argument (id). In `check_input_term`, if `check_input_term` > is called with the same `id` argument as the caller, it triggers > endless recursive call, resulting kernel space stack overflow. > > This patch fixes the bug by adding a bitmap to `struct mixer_build` > to keep track of the checked ids by `check_input_term` and stop > the execution if some id has been checked (similar to how > parse_audio_unit handles unitid argument). > > Reported-by: Hui Peng > Reported-by: Mathias Payer > Signed-off-by: Hui Peng > --- > sound/usb/mixer.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c > index ea487378be17..1f6c8213df82 100644 > --- a/sound/usb/mixer.c > +++ b/sound/usb/mixer.c > @@ -68,6 +68,7 @@ struct mixer_build { > unsigned char *buffer; > unsigned int buflen; > DECLARE_BITMAP(unitbitmap, MAX_ID_ELEMS); > + DECLARE_BITMAP(termbitmap, MAX_ID_ELEMS); > struct usb_audio_term oterm; > const struct usbmix_name_map *map; > const struct usbmix_selector_map *selector_map; > @@ -782,6 +783,8 @@ static int check_input_term(struct mixer_build *state, > int id, > int err; > void *p1; > > + if (test_and_set_bit(id, state->termbitmap)) > + return 0; > memset(term, 0, sizeof(*term)); > while ((p1 = find_audio_control_unit(state, id)) != NULL) { > unsigned char *hdr = p1; [7.839002] usb 1-1: new high-speed USB device number 2 using xhci_hcd [7.966787] usb 1-1: Using ep0 maxpacket: 16 [7.969898] usb 1-1: string descriptor 0 read error: -22 [7.971507] usb 1-1: New USB
device found, idVendor=046d, idProduct=0a44, bcdDevice= 1.27 [7.973874] usb 1-1: New USB device strings: Mfr=1, Product=2, SerialNumber=0 [7.979864] usb 1-1: current rate 16732531 is different from the runtime rate 48000 [7.982029] usb 1-1: current rate 11254477 is different from the runtime rate 48000 [7.987190] [parse_audio_unit:2797] reached [7.987741] [parse_audio_unit:2814] reached [7.988310] [parse_audio_feature_unit:1855] reached [7.988982] [parse_audio_feature_unit:1915] reached [7.989605] [parse_audio_unit:2797] reached [7.990173] [parse_audio_unit:2804] reached [7.999615] [parse_audio_unit:2797] reached [8.000166] [parse_audio_unit:2814] reached [8.000734] [parse_audio_feature_unit:1855] reached [8.001361] [parse_audio_feature_unit:1915] reached [8.002011] [parse_audio_unit:2797] reached [8.002590] [parse_audio_unit:2811] reached [8.003180] [parse_audio_unit:2797] reached [8.003754] [parse_audio_unit:2814] reached [8.004342] [parse_audio_feature_unit:1855] reached [8.004992] [parse_audio_feature_unit:1915] reached [8.005656] [parse_audio_feature_unit:1921] reached [8.006324] [check_input_term:804] reached [8.006881] [check_input_term:860] reached [8.007451] [check_input_term:804] reached [8.008038] [check_input_term:860] reached [8.008596] [check_input_term:804] reached [8.009129] [check_input_term:860] reached [8.009686] [check_input_term:804] reached [8.010217] [check_input_term:860] reached [8.010834] [check_input_term:804] reached [8.011430] [check_input_term:860] reached [8.011945] [check_input_term:804] reached [8.012577] [check_input_term:860] reached [8.013109] [check_input_term:804] reached [8.013704] [check_input_term:860] reached [8.014319] [check_input_term:804] reached [8.014848] [check_input_term:860] reached [8.015415] [check_input_term:804] reached [8.015964] [check_input_term:860] reached [8.016525] [check_input_term:804] reached [8.017055] [check_input_term:860] reached [8.017612] [check_input_term:804] reached [8.018168] 
[check_input_term:860] reached [8.018701] [check_input_term:804] reached [8.019273] [check_input_term:860] reached [8.019805] [check_input_term:804] reached [8.020362] [check_input_term:860] reached [8.020893] [check_input_term:804] reached [8.021440] [check_input_term:860] reached [8.021972] [check_input_term:804] reached [8.022536] [check_input_term:860] reached [8.023127] [check_input_term:804] reached [8.023664] [check_input_term:860] reached [8.024213] [check_input_term:804] reached [8.024791] [check_input_term:860] reached [8.025351] [check_input_term:804] reached [8.025886] [check_input_term:860] reached [8.026435] [check_input_term:804] reached [
RE: [EXT] Re: [v1 1/3] clk: ls1028a: Add clock driver for Display output interface
> -Original Message- > From: Stephen Boyd > Sent: August 15, 2019 1:27 > To: linux-...@vger.kernel.org; linux-de...@linux.nxdi.nxp.com; > linux-kernel@vger.kernel.org; liviu.du...@arm.com; Leo Li > ; Michael Turquette ; Wen > He > Subject: RE: [EXT] Re: [v1 1/3] clk: ls1028a: Add clock driver for Display > output > interface > > > Quoting Wen He (2019-08-14 02:38:21) > > > > > > > -Original Message- > > > From: Stephen Boyd > > > Sent: August 14, 2019 2:25 > > > To: Michael Turquette ; Wen He > > > ; Leo Li ; > > > linux-...@vger.kernel.org; linux-de...@linux.nxdi.nxp.com; > > > linux-kernel@vger.kernel.org; liviu.du...@arm.com > > > Cc: Wen He > > > Subject: [EXT] Re: [v1 1/3] clk: ls1028a: Add clock driver for > > > Display output interface > > > > > > > > > Quoting Wen He (2019-08-12 03:01:03) > > > > diff --git a/drivers/clk/Kconfig b/drivers/clk/Kconfig index > > > > 801fa1cd0321..0e6c7027d637 100644 > > > > --- a/drivers/clk/Kconfig > > > > +++ b/drivers/clk/Kconfig > > > > @@ -223,6 +223,15 @@ config CLK_QORIQ > > > > This adds the clock driver support for Freescale QorIQ > platforms > > > > using common clock framework. > > > > > > > > +config CLK_PLLDIG > > > > +bool "Clock driver for LS1028A Display output" > > > > +depends on ARCH_LAYERSCAPE && OF > > > > > > Does it actually depend on either of these to build? Probably not, so > > > maybe just default ARCH_LAYERSCAPE && OF? Also, can your Kconfig > > > variable be named something more specific like CLK_LS1028A_PLLDIG? > > > > Actually, it also depends on the display modules, but we allow building > > the display drivers as modules. So should we add a dependency on the > > display modules here, and also allow building the clock driver as a module? > > Wouldn't it be better to reduce the number of modules to insert? I think > > the clock driver should always be available to the system. > > I'm asking if it actually requires ARCH_LAYERSCAPE or OF to successfully > compile the file. Is that true?
I don't see any asm/ includes or anything > that's > going to fail if either of these configs aren't enabled. > So it seems safe to change this to > > depends on ARCH_LAYERSCAPE || COMPILE_TEST > default ARCH_LAYERSCAPE > > so that it's compiled by default on this architecture and is available to be > compile tested by various test builders. Understood, I will send the next patch version. Best Regards, Wen > > > > > Renaming the Kconfig variable to 'CLK_LS1028A_PLLDIG' looks great. > > > > >
Re: [PATCH v5 2/7] PCI/ATS: Initialize PRI in pci_ats_init()
On Thu, Aug 01, 2019 at 05:05:59PM -0700, sathyanarayanan.kuppusw...@linux.intel.com wrote: > From: Kuppuswamy Sathyanarayanan > > Currently, PRI Capability checks are repeated across all PRI API's. > Instead, cache the capability check result in pci_pri_init() and use it > in other PRI API's. Also, since PRI is a shared resource between PF/VF, > initialize default values for common PRI features in pci_pri_init(). > > Signed-off-by: Kuppuswamy Sathyanarayanan > > --- > drivers/pci/ats.c | 80 - > include/linux/pci-ats.h | 5 +++ > include/linux/pci.h | 1 + > 3 files changed, 61 insertions(+), 25 deletions(-) > > diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c > index cdd936d10f68..280be911f190 100644 > --- a/drivers/pci/ats.c > +++ b/drivers/pci/ats.c > @@ -28,6 +28,8 @@ void pci_ats_init(struct pci_dev *dev) > return; > > dev->ats_cap = pos; > + > + pci_pri_init(dev); > } > > /** > @@ -170,36 +172,72 @@ int pci_ats_page_aligned(struct pci_dev *pdev) > EXPORT_SYMBOL_GPL(pci_ats_page_aligned); > > #ifdef CONFIG_PCI_PRI > + > +void pci_pri_init(struct pci_dev *pdev) > +{ > ... > +} > diff --git a/include/linux/pci-ats.h b/include/linux/pci-ats.h > index 1a0bdaee2f32..33653d4ca94f 100644 > --- a/include/linux/pci-ats.h > +++ b/include/linux/pci-ats.h > @@ -6,6 +6,7 @@ > > #ifdef CONFIG_PCI_PRI > > +void pci_pri_init(struct pci_dev *pdev); pci_pri_init() is implemented and called in drivers/pci/ats.c. Unless there's a need to call this from outside ats.c, it should be static and should not be declared here. If you can make it static, please also reorder the code so you don't need a forward declaration in ats.c. 
> int pci_enable_pri(struct pci_dev *pdev, u32 reqs); > void pci_disable_pri(struct pci_dev *pdev); > void pci_restore_pri_state(struct pci_dev *pdev); > @@ -13,6 +14,10 @@ int pci_reset_pri(struct pci_dev *pdev); > > #else /* CONFIG_PCI_PRI */ > > +static inline void pci_pri_init(struct pci_dev *pdev) > +{ > +} > + > static inline int pci_enable_pri(struct pci_dev *pdev, u32 reqs) > { > return -ENODEV;
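Here is a sketch of the layout being requested, using hypothetical names. The helper gets internal linkage with `static`, so no prototype is needed in a header. It is also defined above its only caller, so no forward declaration is needed either.

```c
/* Hypothetical per-device state standing in for struct pci_dev. */
struct demo_dev {
	int ats_cap;
	int pri_cap;
};

/* File-local helper: 'static' keeps it out of the public header. */
static void demo_pri_init(struct demo_dev *dev, int pri_pos)
{
	dev->pri_cap = pri_pos;   /* cache the capability offset once */
}

/* Defined after the helper, so the call needs no forward declaration. */
void demo_ats_init(struct demo_dev *dev, int ats_pos, int pri_pos)
{
	dev->ats_cap = ats_pos;
	demo_pri_init(dev, pri_pos);
}
```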
[PATCH] Fix a stack buffer overflow bug check_input_term
`check_input_term` recursively calls itself with input from the device
side (e.g., uac_input_terminal_descriptor.bCSourceID) as the argument (id).
If `check_input_term` is called with the same `id` argument as the caller,
it recurses endlessly, resulting in a kernel-space stack overflow.

This patch fixes the bug by adding a bitmap to `struct mixer_build`
to keep track of the ids already checked by `check_input_term` and stop
the execution if an id has already been checked (similar to how
parse_audio_unit handles the unitid argument).

Reported-by: Hui Peng
Reported-by: Mathias Payer
Signed-off-by: Hui Peng
---
 sound/usb/mixer.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/sound/usb/mixer.c b/sound/usb/mixer.c
index ea487378be17..1f6c8213df82 100644
--- a/sound/usb/mixer.c
+++ b/sound/usb/mixer.c
@@ -68,6 +68,7 @@ struct mixer_build {
 	unsigned char *buffer;
 	unsigned int buflen;
 	DECLARE_BITMAP(unitbitmap, MAX_ID_ELEMS);
+	DECLARE_BITMAP(termbitmap, MAX_ID_ELEMS);
 	struct usb_audio_term oterm;
 	const struct usbmix_name_map *map;
 	const struct usbmix_selector_map *selector_map;
@@ -782,6 +783,8 @@ static int check_input_term(struct mixer_build *state, int id,
 	int err;
 	void *p1;
 
+	if (test_and_set_bit(id, state->termbitmap))
+		return 0;
 	memset(term, 0, sizeof(*term));
 	while ((p1 = find_audio_control_unit(state, id)) != NULL) {
 		unsigned char *hdr = p1;
--
2.22.1
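The guard in this patch is a classic visited-set check. Before recursing on an id, the code atomically tests and sets its bit, and it stops if the bit was already set, so a descriptor chain that loops back on itself terminates. Below is a minimal userspace sketch. It assumes a hypothetical `MAX_IDS` in place of MAX_ID_ELEMS, a plain array of "next id" links in place of the USB descriptors, and a non-atomic stand-in for `test_and_set_bit()`.

```c
#include <limits.h>
#include <stdbool.h>
#include <string.h>

#define MAX_IDS 256   /* hypothetical stand-in for MAX_ID_ELEMS */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)

struct walk_state {
	unsigned long visited[(MAX_IDS + BITS_PER_WORD - 1) / BITS_PER_WORD];
	const int *source_of;   /* source_of[id]: id to follow next, -1 = terminal */
	int depth;              /* ids actually visited, for illustration */
};

static void walk_init(struct walk_state *st, const int *links)
{
	memset(st, 0, sizeof(*st));
	st->source_of = links;
}

/* Non-atomic stand-in for test_and_set_bit(): sets the bit, returns its old value. */
static bool test_and_set(unsigned long *map, int bit)
{
	unsigned long mask = 1UL << (bit % BITS_PER_WORD);
	bool old = (map[bit / BITS_PER_WORD] & mask) != 0;

	map[bit / BITS_PER_WORD] |= mask;
	return old;
}

/* Follows id links recursively; every id must be in [0, MAX_IDS). */
static int follow(struct walk_state *st, int id)
{
	if (test_and_set(st->visited, id))
		return 0;   /* already seen: stop instead of recursing forever */
	st->depth++;
	if (st->source_of[id] < 0)
		return 0;   /* reached a terminal */
	return follow(st, st->source_of[id]);
}
```

As in the patch, revisiting an id ends the walk with a return value of 0; a cycle 0 -> 1 -> 2 -> 0 visits three ids and stops.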
[PATCH] clk: composite: Drop unused clk.h include
This include isn't used. Drop it. Signed-off-by: Stephen Boyd --- drivers/clk/clk-composite.c | 1 - 1 file changed, 1 deletion(-) diff --git a/drivers/clk/clk-composite.c b/drivers/clk/clk-composite.c index b06038b8f658..4f13a681ddfc 100644 --- a/drivers/clk/clk-composite.c +++ b/drivers/clk/clk-composite.c @@ -3,7 +3,6 @@ * Copyright (c) 2013 NVIDIA CORPORATION. All rights reserved. */ -#include #include #include #include -- Sent by a computer through tubes
RE: [EXT] Re: rtc: pcf85363/pcf85263: fix error that failed to run hwclock -w
> Hi, > > On 14/08/2019 17:32:49+0800, Biwen Li wrote: > > Issue: > > # hwclock -w > > hwclock: RTC_SET_TIME: Invalid argument > > > > The patch fixes error when run command hwclock -w with rtc > > pcf85363/pcf85263 > > > > Could you describe a bit more the issue and what causes it? 1. Related patch: https://lkml.org/lkml/2019/4/3/55 . With this patch, regmap always checks for unwritable registers by comparing reg with max_register in regmap_writeable(). 2. In drivers/rtc/rtc-pcf85363.c, CTRL_STOP_EN is 0x2e and DT_100THS is 0x00, while max_register is 0x2f. The bulk write starting at CTRL_STOP_EN reaches reg 0x30, which is greater than max_register (0x2f), so regmap_writeable() returns false. > > IIRC I wrote that code and it works on my pcf85363. > > > Signed-off-by: Biwen Li > > --- > > drivers/rtc/rtc-pcf85363.c | 7 ++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/rtc/rtc-pcf85363.c b/drivers/rtc/rtc-pcf85363.c > > index a075e77617dc..3450d615974d 100644 > > --- a/drivers/rtc/rtc-pcf85363.c > > +++ b/drivers/rtc/rtc-pcf85363.c > > @@ -166,7 +166,12 @@ static int pcf85363_rtc_set_time(struct device *dev, > struct rtc_time *tm) > > buf[DT_YEARS] = bin2bcd(tm->tm_year % 100); > > > > ret = regmap_bulk_write(pcf85363->regmap, CTRL_STOP_EN, > > - tmp, sizeof(tmp)); > > + tmp, 2); > > + if (ret) > > + return ret; > > + > > + ret = regmap_bulk_write(pcf85363->regmap, DT_100THS, > > + buf, sizeof(tmp) - 2); > > if (ret) > > return ret; > > > > -- > > 2.17.1 > > > > -- > Alexandre Belloni, Bootlin > Embedded Linux and Kernel engineering > https://bootlin.com
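The failure in point 2 can be modelled as a bounds check over the registers a bulk write touches. This sketch assumes, as the regmap check does, that any register above max_register is rejected. The register values come from the message above. The byte counts in the test are hypothetical, since sizeof(tmp) is not shown.

```c
#include <stdbool.h>

#define MAX_REGISTER 0x2f   /* max_register, per the message above */
#define CTRL_STOP_EN 0x2e
#define DT_100THS    0x00

/* True if every register in [first, first + count) is <= MAX_REGISTER. */
static bool range_writeable(unsigned int first, unsigned int count)
{
	for (unsigned int reg = first; reg < first + count; reg++)
		if (reg > MAX_REGISTER)
			return false;   /* e.g. 0x30 when starting at 0x2e */
	return true;
}
```

Splitting the write, as the patch does, keeps each bulk write inside the 0x00 to 0x2f range: 2 bytes at CTRL_STOP_EN, then the remaining bytes starting at DT_100THS.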
RE: rtc: pcf85363/pcf85263: fix error that failed to run hwclock -w
> > Subject: rtc: pcf85363/pcf85263: fix error that failed to run hwclock > > -w > > > > Issue: > > # hwclock -w > > hwclock: RTC_SET_TIME: Invalid argument > > > > The patch fixes error when run command hwclock -w with rtc > > pcf85363/pcf85263 > > Can you explain a little bit more in the commit message on how the changes fix > the above issue? It is not that clear just from the code. 1. Related patch: https://lkml.org/lkml/2019/4/3/55 . With this patch, regmap always checks for unwritable registers by comparing reg with max_register in regmap_writeable(). 2. In drivers/rtc/rtc-pcf85363.c, CTRL_STOP_EN is 0x2e and DT_100THS is 0x00, while max_register is 0x2f. The bulk write starting at CTRL_STOP_EN reaches reg 0x30, which is greater than max_register (0x2f), so regmap_writeable() returns false. > > > > > Signed-off-by: Biwen Li > > --- > > drivers/rtc/rtc-pcf85363.c | 7 ++- > > 1 file changed, 6 insertions(+), 1 deletion(-) > > > > diff --git a/drivers/rtc/rtc-pcf85363.c b/drivers/rtc/rtc-pcf85363.c > > index a075e77617dc..3450d615974d 100644 > > --- a/drivers/rtc/rtc-pcf85363.c > > +++ b/drivers/rtc/rtc-pcf85363.c > > @@ -166,7 +166,12 @@ static int pcf85363_rtc_set_time(struct device > > *dev, struct rtc_time *tm) > > buf[DT_YEARS] = bin2bcd(tm->tm_year % 100); > > > > ret = regmap_bulk_write(pcf85363->regmap, CTRL_STOP_EN, > > - tmp, sizeof(tmp)); > > + tmp, 2); > > + if (ret) > > + return ret; > > + > > + ret = regmap_bulk_write(pcf85363->regmap, DT_100THS, > > + buf, sizeof(tmp) - 2); > > if (ret) > > return ret; > > > > -- > > 2.17.1
[PATCH 0/6] powerpc: convert cache asm to C
From: Alastair D'Silva This series addresses a few issues discovered in how we flush caches: 1. Flushes were truncated at 4GB, so larger flushes were incorrect. 2. Flushing the dcache in arch_add_memory was unnecessary This series also converts much of the cache assembler to C, with the aim of making it easier to maintain. Alastair D'Silva (6): powerpc: Allow flush_icache_range to work across ranges >4GB powerpc: define helpers to get L1 icache sizes powerpc: Convert flush_icache_range & friends to C powerpc: Chunk calls to flush_dcache_range in arch_*_memory powerpc: Remove 'extern' from func prototypes in cache headers powerpc: Don't flush caches when adding memory arch/powerpc/include/asm/cache.h | 63 +- arch/powerpc/include/asm/cacheflush.h | 49 ++- arch/powerpc/kernel/misc_32.S | 117 -- arch/powerpc/kernel/misc_64.S | 97 - arch/powerpc/mm/mem.c | 80 +- 5 files changed, 146 insertions(+), 260 deletions(-) -- 2.21.0
[PATCH 6/6] powerpc: Don't flush caches when adding memory
From: Alastair D'Silva This operation takes a significant amount of time when hotplugging large amounts of memory (~50 seconds with 890GB of persistent memory). It was originally added in commit fb5924fddf9e ("powerpc/mm: Flush cache on memory hot(un)plug") to support memtrace, but the flush on add is not needed because the range is already flushed on remove. Signed-off-by: Alastair D'Silva --- arch/powerpc/mm/mem.c | 6 -- 1 file changed, 6 deletions(-) diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index fb0d5e9aa11b..43be99de7c9a 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -111,7 +111,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; - unsigned long i; int rc; resize_hpt_for_hotplug(memblock_phys_mem_size()); @@ -124,11 +123,6 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, return -EFAULT; } - for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) { - flush_dcache_range(start + i, min(start + size, start + i + FLUSH_CHUNK_SIZE)); - cond_resched(); - } - return __add_pages(nid, start_pfn, nr_pages, restrictions); } -- 2.21.0
[PATCH 5/6] powerpc: Remove 'extern' from func prototypes in cache headers
From: Alastair D'Silva The 'extern' keyword adds no value to function prototypes. Signed-off-by: Alastair D'Silva --- arch/powerpc/include/asm/cache.h | 8 arch/powerpc/include/asm/cacheflush.h | 6 +++--- 2 files changed, 7 insertions(+), 7 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index 728f154204db..c5c096e968e0 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -102,10 +102,10 @@ static inline u32 l1_icache_bytes(void) #define __read_mostly __attribute__((__section__(".data..read_mostly"))) #ifdef CONFIG_PPC_BOOK3S_32 -extern long _get_L2CR(void); -extern long _get_L3CR(void); -extern void _set_L2CR(unsigned long); -extern void _set_L3CR(unsigned long); +long _get_L2CR(void); +long _get_L3CR(void); +void _set_L2CR(unsigned long val); +void _set_L3CR(unsigned long val); #else #define _get_L2CR() 0L #define _get_L3CR() 0L diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h index 4c3377aff8ed..1826bf2cc137 100644 --- a/arch/powerpc/include/asm/cacheflush.h +++ b/arch/powerpc/include/asm/cacheflush.h @@ -38,15 +38,15 @@ static inline void flush_cache_vmap(unsigned long start, unsigned long end) { } #endif #define ARCH_IMPLEMENTS_FLUSH_DCACHE_PAGE 1 -extern void flush_dcache_page(struct page *page); +void flush_dcache_page(struct page *page); #define flush_dcache_mmap_lock(mapping) do { } while (0) #define flush_dcache_mmap_unlock(mapping) do { } while (0) void flush_icache_range(unsigned long start, unsigned long stop); -extern void flush_icache_user_range(struct vm_area_struct *vma, +void flush_icache_user_range(struct vm_area_struct *vma, struct page *page, unsigned long addr, int len); -extern void flush_dcache_icache_page(struct page *page); +void flush_dcache_icache_page(struct page *page); /** * flush_dcache_range(): Write any modified data cache blocks out to memory and invalidate them. -- 2.21.0
[PATCH 3/6] powerpc: Convert flush_icache_range & friends to C
From: Alastair D'Silva Similar to commit 22e9c88d486a ("powerpc/64: reuse PPC32 static inline flush_dcache_range()"), this patch converts flush_icache_range() to C and reimplements the following functions as wrappers around it: __flush_dcache_icache __flush_dcache_icache_phys This was done because we discovered a long-standing bug where the length of the range was truncated due to the use of a 32-bit shift instead of a 64-bit one. Converting these functions to C also makes them easier to maintain. Signed-off-by: Alastair D'Silva --- arch/powerpc/include/asm/cache.h | 26 +++--- arch/powerpc/include/asm/cacheflush.h | 32 --- arch/powerpc/kernel/misc_32.S | 117 -- arch/powerpc/kernel/misc_64.S | 97 - arch/powerpc/mm/mem.c | 71 +++- 5 files changed, 102 insertions(+), 241 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index f852d5cd746c..728f154204db 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -98,20 +98,7 @@ static inline u32 l1_icache_bytes(void) #endif #endif /* ! __ASSEMBLY__ */ -#if defined(__ASSEMBLY__) -/* - * For a snooping icache, we still need a dummy icbi to purge all the - * prefetched instructions from the ifetch buffers. We also need a sync - * before the icbi to order the the actual stores to memory that might - * have modified instructions with the icbi.
 - */ -#define PURGE_PREFETCHED_INS \ - sync; \ - icbi 0,r3; \ - sync; \ - isync - -#else +#if !defined(__ASSEMBLY__) #define __read_mostly __attribute__((__section__(".data..read_mostly"))) #ifdef CONFIG_PPC_BOOK3S_32 @@ -145,6 +132,17 @@ static inline void dcbst(void *addr) { __asm__ __volatile__ ("dcbst %y0" : : "Z"(*(u8 *)addr) : "memory"); } + +static inline void icbi(void *addr) +{ + __asm__ __volatile__ ("icbi 0, %0" : : "r"(addr) : "memory"); +} + +static inline void iccci(void) +{ + __asm__ __volatile__ ("iccci 0, r0"); +} + #endif /* !__ASSEMBLY__ */ #endif /* __KERNEL__ */ #endif /* _ASM_POWERPC_CACHE_H */ diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h index ed57843ef452..4c3377aff8ed 100644 --- a/arch/powerpc/include/asm/cacheflush.h +++ b/arch/powerpc/include/asm/cacheflush.h @@ -42,24 +42,18 @@ extern void flush_dcache_page(struct page *page); #define flush_dcache_mmap_lock(mapping) do { } while (0) #define flush_dcache_mmap_unlock(mapping) do { } while (0) -extern void flush_icache_range(unsigned long, unsigned long); +void flush_icache_range(unsigned long start, unsigned long stop); extern void flush_icache_user_range(struct vm_area_struct *vma, struct page *page, unsigned long addr, int len); -extern void __flush_dcache_icache(void *page_va); extern void flush_dcache_icache_page(struct page *page); -#if defined(CONFIG_PPC32) && !defined(CONFIG_BOOKE) -extern void __flush_dcache_icache_phys(unsigned long physaddr); -#else -static inline void __flush_dcache_icache_phys(unsigned long physaddr) -{ - BUG(); -} -#endif -/* - * Write any modified data cache blocks out to memory and invalidate them. +/** + * flush_dcache_range(): Write any modified data cache blocks out to memory and invalidate them. * Does not invalidate the corresponding instruction cache blocks.
+ * + * @start: the start address + * @stop: the stop address (exclusive) */ static inline void flush_dcache_range(unsigned long start, unsigned long stop) { @@ -82,6 +76,20 @@ static inline void flush_dcache_range(unsigned long start, unsigned long stop) isync(); } +/** + * __flush_dcache_icache(): Flush a particular page from the data cache to RAM. + * Note: this is necessary because the instruction cache does *not* + * snoop from the data cache. + * + * @page: the address of the page to flush + */ +static inline void __flush_dcache_icache(void *page) +{ + unsigned long page_addr = (unsigned long)page; + + flush_icache_range(page_addr, page_addr + PAGE_SIZE); +} + /* * Write any modified data cache blocks out to memory. * Does not invalidate the corresponding cache lines (especially for diff --git a/arch/powerpc/kernel/misc_32.S b/arch/powerpc/kernel/misc_32.S index fe4bd321730e..12b95e6799d4 100644 --- a/arch/powerpc/kernel/misc_32.S +++ b/arch/powerpc/kernel/misc_32.S @@ -318,123 +318,6 @@ END_FTR_SECTION_IFSET(CPU_FTR_UNIFIED_ID_CACHE) EXPORT_SYMBOL(flush_instruction_cache) #endif /* CONFIG_PPC_8xx */ -/* - * Write any modified data cache blocks out to memory - * and invalidate the corresponding instruction cache blocks. - * This is a no-op on the 601. - * - * flush_icache_range(unsigned long start, unsigned long stop) - */ -_GLOBAL(flush_icache_range) -BEGIN_FTR_SECTION -
[PATCH 4/6] powerpc: Chunk calls to flush_dcache_range in arch_*_memory
From: Alastair D'Silva When presented with large amounts of memory being hotplugged (in my test case, ~890GB), the call to flush_dcache_range takes a long time (~50 seconds), triggering RCU stalls. This patch breaks the call up into 16GB chunks, calling cond_resched() in between to allow the scheduler to run. Signed-off-by: Alastair D'Silva --- arch/powerpc/mm/mem.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/mm/mem.c b/arch/powerpc/mm/mem.c index 5400da87a804..fb0d5e9aa11b 100644 --- a/arch/powerpc/mm/mem.c +++ b/arch/powerpc/mm/mem.c @@ -104,11 +104,14 @@ int __weak remove_section_mapping(unsigned long start, unsigned long end) return -ENODEV; } +#define FLUSH_CHUNK_SIZE (16ull * 1024ull * 1024ull * 1024ull) + int __ref arch_add_memory(int nid, u64 start, u64 size, struct mhp_restrictions *restrictions) { unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; + unsigned long i; int rc; resize_hpt_for_hotplug(memblock_phys_mem_size()); @@ -120,7 +123,11 @@ int __ref arch_add_memory(int nid, u64 start, u64 size, start, start + size, rc); return -EFAULT; } - flush_dcache_range(start, start + size); + + for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) { + flush_dcache_range(start + i, min(start + size, start + i + FLUSH_CHUNK_SIZE)); + cond_resched(); + } return __add_pages(nid, start_pfn, nr_pages, restrictions); } @@ -131,13 +138,18 @@ void __ref arch_remove_memory(int nid, u64 start, u64 size, unsigned long start_pfn = start >> PAGE_SHIFT; unsigned long nr_pages = size >> PAGE_SHIFT; struct page *page = pfn_to_page(start_pfn) + vmem_altmap_offset(altmap); + unsigned long i; int ret; __remove_pages(page_zone(page), start_pfn, nr_pages, altmap); /* Remove htab bolted mappings for this section of memory */ start = (unsigned long)__va(start); - flush_dcache_range(start, start + size); + for (i = 0; i < size; i += FLUSH_CHUNK_SIZE) { + flush_dcache_range(start + i, min(start + size, start + i +
FLUSH_CHUNK_SIZE)); + cond_resched(); + } + ret = remove_section_mapping(start, start + size); WARN_ON_ONCE(ret); -- 2.21.0
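The chunking loop in this patch can be exercised outside the kernel with a small C sketch. fake_flush() and struct flush_log are hypothetical stand-ins that only record the ranges flush_dcache_range() would receive; the cond_resched() call is represented by a comment. The loop bounds follow the patch: [start + i, min(start + size, start + i + FLUSH_CHUNK_SIZE)).

```c
#include <assert.h>
#include <stdint.h>

/* Chunk size used by the patch: 16 GiB. */
#define FLUSH_CHUNK_SIZE (16ull * 1024ull * 1024ull * 1024ull)

/* What the hypothetical flush stub observed. */
struct flush_log {
	uint64_t calls;   /* number of flush calls */
	uint64_t bytes;   /* total bytes covered */
	uint64_t max_len; /* largest single range */
};

static uint64_t min_u64(uint64_t a, uint64_t b)
{
	return a < b ? a : b;
}

/* Stand-in for flush_dcache_range(start, stop): records the range only. */
static void fake_flush(struct flush_log *fl, uint64_t start, uint64_t stop)
{
	uint64_t len = stop - start;

	fl->calls++;
	fl->bytes += len;
	if (len > fl->max_len)
		fl->max_len = len;
}

static void chunked_flush(struct flush_log *fl, uint64_t start, uint64_t size)
{
	assert(start + size >= start); /* range must not wrap around */
	for (uint64_t i = 0; i < size; i += FLUSH_CHUNK_SIZE) {
		fake_flush(fl, start + i,
			   min_u64(start + size, start + i + FLUSH_CHUNK_SIZE));
		/* the kernel code calls cond_resched() here */
	}
}
```

For the ~890GB case from the commit message, 890 GiB split into 16 GiB chunks gives 56 calls: 55 full chunks and a final 10 GiB chunk clamped by min().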
[PATCH 2/6] powerpc: define helpers to get L1 icache sizes
From: Alastair D'Silva This patch adds helpers to retrieve icache sizes, and renames the existing helpers to make it clear that they are for dcache. Signed-off-by: Alastair D'Silva --- arch/powerpc/include/asm/cache.h | 29 +++ arch/powerpc/include/asm/cacheflush.h | 12 +-- 2 files changed, 31 insertions(+), 10 deletions(-) diff --git a/arch/powerpc/include/asm/cache.h b/arch/powerpc/include/asm/cache.h index b3388d95f451..f852d5cd746c 100644 --- a/arch/powerpc/include/asm/cache.h +++ b/arch/powerpc/include/asm/cache.h @@ -55,25 +55,46 @@ struct ppc64_caches { extern struct ppc64_caches ppc64_caches; -static inline u32 l1_cache_shift(void) +static inline u32 l1_dcache_shift(void) { return ppc64_caches.l1d.log_block_size; } -static inline u32 l1_cache_bytes(void) +static inline u32 l1_dcache_bytes(void) { return ppc64_caches.l1d.block_size; } + +static inline u32 l1_icache_shift(void) +{ + return ppc64_caches.l1i.log_block_size; +} + +static inline u32 l1_icache_bytes(void) +{ + return ppc64_caches.l1i.block_size; +} #else -static inline u32 l1_cache_shift(void) +static inline u32 l1_dcache_shift(void) { return L1_CACHE_SHIFT; } -static inline u32 l1_cache_bytes(void) +static inline u32 l1_dcache_bytes(void) { return L1_CACHE_BYTES; } + +static inline u32 l1_icache_shift(void) +{ + return L1_CACHE_SHIFT; +} + +static inline u32 l1_icache_bytes(void) +{ + return L1_CACHE_BYTES; +} + #endif #endif /* ! 
__ASSEMBLY__ */ diff --git a/arch/powerpc/include/asm/cacheflush.h b/arch/powerpc/include/asm/cacheflush.h index eef388f2659f..ed57843ef452 100644 --- a/arch/powerpc/include/asm/cacheflush.h +++ b/arch/powerpc/include/asm/cacheflush.h @@ -63,8 +63,8 @@ static inline void __flush_dcache_icache_phys(unsigned long physaddr) */ static inline void flush_dcache_range(unsigned long start, unsigned long stop) { - unsigned long shift = l1_cache_shift(); - unsigned long bytes = l1_cache_bytes(); + unsigned long shift = l1_dcache_shift(); + unsigned long bytes = l1_dcache_bytes(); void *addr = (void *)(start & ~(bytes - 1)); unsigned long size = stop - (unsigned long)addr + (bytes - 1); unsigned long i; @@ -89,8 +89,8 @@ static inline void flush_dcache_range(unsigned long start, unsigned long stop) */ static inline void clean_dcache_range(unsigned long start, unsigned long stop) { - unsigned long shift = l1_cache_shift(); - unsigned long bytes = l1_cache_bytes(); + unsigned long shift = l1_dcache_shift(); + unsigned long bytes = l1_dcache_bytes(); void *addr = (void *)(start & ~(bytes - 1)); unsigned long size = stop - (unsigned long)addr + (bytes - 1); unsigned long i; @@ -108,8 +108,8 @@ static inline void clean_dcache_range(unsigned long start, unsigned long stop) static inline void invalidate_dcache_range(unsigned long start, unsigned long stop) { - unsigned long shift = l1_cache_shift(); - unsigned long bytes = l1_cache_bytes(); + unsigned long shift = l1_dcache_shift(); + unsigned long bytes = l1_dcache_bytes(); void *addr = (void *)(start & ~(bytes - 1)); unsigned long size = stop - (unsigned long)addr + (bytes - 1); unsigned long i; -- 2.21.0
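The range arithmetic in flush_dcache_range() rounds start down to a block boundary and the length up, then shifts by log2 of the block size. A small C sketch of just that arithmetic follows; cache_blocks_in_range() is a hypothetical name and takes the shift directly, whereas the kernel obtains it from l1_dcache_shift() (or from l1_icache_shift() for the instruction cache, whose block size is reported separately, which is why this patch adds distinct helpers).

```c
#include <assert.h>

/*
 * Number of cache blocks touched by [start, stop) with blocks of
 * (1 << shift) bytes, using the same rounding as flush_dcache_range().
 */
static unsigned long cache_blocks_in_range(unsigned long start, unsigned long stop,
					   unsigned long shift)
{
	unsigned long bytes = 1UL << shift;
	unsigned long addr = start & ~(bytes - 1);              /* round start down */
	unsigned long size = stop - addr + (bytes - 1);         /* round length up */

	assert(stop >= start);
	return size >> shift;
}
```

For example, with 128-byte blocks (shift 7) the range [0x1001, 0x1081) touches two blocks (0x1000 and 0x1080), and an empty range touches none.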
[PATCH 1/6] powerpc: Allow flush_icache_range to work across ranges >4GB
From: Alastair D'Silva When calling flush_icache_range with a size >4GB, we were masking off the upper 32 bits, so we would incorrectly flush a range smaller than intended. This patch replaces the 32-bit shifts with 64-bit ones, so that the full size is accounted for. Signed-off-by: Alastair D'Silva Cc: sta...@vger.kernel.org --- arch/powerpc/kernel/misc_64.S | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/misc_64.S b/arch/powerpc/kernel/misc_64.S index b55a7b4cb543..9bc0aa9aeb65 100644 --- a/arch/powerpc/kernel/misc_64.S +++ b/arch/powerpc/kernel/misc_64.S @@ -82,7 +82,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) subf r8,r6,r4 /* compute length */ add r8,r8,r5 /* ensure we get enough */ lwz r9,DCACHEL1LOGBLOCKSIZE(r10) /* Get log-2 of cache block size */ - srw. r8,r8,r9 /* compute line count */ + srd. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? */ mtctr r8 1: dcbst 0,r6 @@ -98,7 +98,7 @@ END_FTR_SECTION_IFSET(CPU_FTR_COHERENT_ICACHE) subf r8,r6,r4 /* compute length */ add r8,r8,r5 lwz r9,ICACHEL1LOGBLOCKSIZE(r10) /* Get log-2 of Icache block size */ - srw. r8,r8,r9 /* compute line count */ + srd. r8,r8,r9 /* compute line count */ beqlr /* nothing to do? */ mtctr r8 2: icbi 0,r6 -- 2.21.0
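The effect of the srw.-to-srd. change can be modelled in C: srw. shifts only the low 32 bits of the source register and zero-extends the result, while srd. shifts all 64 bits. The function names below are hypothetical, and the model covers only the "compute line count" step, taking the already-adjusted length as input.

```c
#include <assert.h>
#include <stdint.h>

/* Models "srw.": only the low 32 bits are shifted; upper length bits are lost. */
static uint64_t line_count_srw(uint64_t len, unsigned int log_block_size)
{
	assert(log_block_size < 32);
	return (uint64_t)((uint32_t)len >> log_block_size);
}

/* Models "srd.": the full 64-bit length is shifted. */
static uint64_t line_count_srd(uint64_t len, unsigned int log_block_size)
{
	assert(log_block_size < 64);
	return len >> log_block_size;
}
```

For a 5 GiB range with 128-byte blocks (log2 = 7), line_count_srw() yields 8388608 lines (only 1 GiB worth), whereas line_count_srd() yields the intended 41943040. Below 4 GiB the two agree.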
[PATCH] clk: sunxi: Don't call clk_hw_get_name() on a hw that isn't registered
The implementation of clk_hw_get_name() relies on the clk_core associated with the clk_hw pointer existing. If of_clk_hw_register() fails, there isn't a clk_core created yet, so calling clk_hw_get_name() here fails. Extract the name first so we can print it later. Fixes: 1d80c14248d6 ("clk: sunxi-ng: Add common infrastructure") Cc: Maxime Ripard Cc: Chen-Yu Tsai Signed-off-by: Stephen Boyd --- drivers/clk/sunxi-ng/ccu_common.c | 5 +++-- 1 file changed, 3 insertions(+), 2 deletions(-) diff --git a/drivers/clk/sunxi-ng/ccu_common.c b/drivers/clk/sunxi-ng/ccu_common.c index 7fe3ac980e5f..2e20e650b6c0 100644 --- a/drivers/clk/sunxi-ng/ccu_common.c +++ b/drivers/clk/sunxi-ng/ccu_common.c @@ -97,14 +97,15 @@ int sunxi_ccu_probe(struct device_node *node, void __iomem *reg, for (i = 0; i < desc->hw_clks->num ; i++) { struct clk_hw *hw = desc->hw_clks->hws[i]; + const char *name; if (!hw) continue; + name = hw->init->name; ret = of_clk_hw_register(node, hw); if (ret) { - pr_err("Couldn't register clock %d - %s\n", - i, clk_hw_get_name(hw)); + pr_err("Couldn't register clock %d - %s\n", i, name); goto err_clk_unreg; } } base-commit: 5f9e832c137075045d15cd6899ab0505cfb2ca4b -- Sent by a computer through tubes
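The ordering fix can be illustrated with a simplified, hypothetical model of the clk structures (these are not the real clk API): registration clears hw->init, as the clk core now does, and creates no core on failure. The real clk_hw_get_name() dereferences hw->core, so the name has to be captured before registering.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical, heavily simplified stand-ins for the clk structures. */
struct fake_clk_init_data {
	const char *name;
};

struct fake_clk_core {
	const char *name;
};

struct fake_clk_hw {
	const struct fake_clk_init_data *init;
	struct fake_clk_core *core; /* NULL until registration succeeds */
};

static struct fake_clk_core fake_core_storage;

/* init is cleared during registration; no core exists if it fails. */
static int fake_register(struct fake_clk_hw *hw, int fail)
{
	const struct fake_clk_init_data *init = hw->init;

	hw->init = NULL;
	if (fail)
		return -1;
	fake_core_storage.name = init->name;
	hw->core = &fake_core_storage;
	return 0;
}

/* Needs a core; returns NULL here instead of dereferencing a NULL core. */
static const char *fake_get_name(const struct fake_clk_hw *hw)
{
	return hw->core ? hw->core->name : NULL;
}

/* Pattern from the fix: read the name first, then register. */
static int register_one(struct fake_clk_hw *hw, int index, int fail,
			const char **reported)
{
	const char *name;
	int ret;

	assert(hw->init != NULL);
	name = hw->init->name;
	ret = fake_register(hw, fail);
	if (ret) {
		fprintf(stderr, "Couldn't register clock %d - %s\n", index, name);
		*reported = name;
	}
	return ret;
}
```

On failure the saved name is still available for the error message even though hw->init is NULL and no core exists.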
Re: [PATCH] KVM: LAPIC: Periodically revaluate appropriate lapic_timer_advance_ns
On Wed, 14 Aug 2019 at 20:50, Paolo Bonzini wrote: > > On 12/08/19 11:06, Wanpeng Li wrote: > > On Fri, 9 Aug 2019 at 18:24, Paolo Bonzini wrote: > >> > >> On 09/08/19 07:45, Wanpeng Li wrote: > >>> From: Wanpeng Li > >>> > >>> Even if for realtime CPUs, cache line bounces, frequency scaling, presence > >>> of higher-priority RT tasks, etc can cause different response. These > >>> interferences should be considered and periodically revaluate whether > >>> or not the lapic_timer_advance_ns value is the best, do nothing if it is, > >>> otherwise recaluate again. > >> > >> How much fluctuation do you observe between different runs? > > > > Sometimes can ~1000 cycles after converting to guest tsc freq. > > Hmm, I wonder if we need some kind of continuous smoothing. Something like Actually, during testing (running a Linux guest instead of kvm-unit-tests), the value can fluctuate drastically rather than change smoothly. > > if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) { > /* no update for random fluctuations */ > return; > } > > if (unlikely(timer_advance_ns > 5000)) > timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT; > apic->lapic_timer.timer_advance_ns = timer_advance_ns; > > and removing all the timer_advance_adjust_done stuff. What do you think? I just sent out v2, which periodically re-evaluates the value and keeps the minimal conservative value from these re-evaluation points. Please have a look. :) Regards, Wanpeng Li
[PATCH v2] KVM: LAPIC: Periodically revaluate to get conservative lapic_timer_advance_ns
From: Wanpeng Li Even on realtime CPUs, cache line bounces, frequency scaling, the presence of higher-priority RT tasks, etc. can still cause varying response times. These interferences should be taken into account by periodically re-evaluating whether the lapic_timer_advance_ns value is still the best: do nothing if it is, otherwise recalculate it. Set lapic_timer_advance_ns to the minimal conservative value among all the estimated values. Testing on Skylake server, cat vcpu*/lapic_timer_advance_ns, before patch: 1628 4161 4321 3236 ... Testing on Skylake server, cat vcpu*/lapic_timer_advance_ns, after patch: 1553 1499 1509 1489 ... Testing on Haswell desktop, cat vcpu*/lapic_timer_advance_ns, before patch: 4617 3641 4102 4577 ... Testing on Haswell desktop, cat vcpu*/lapic_timer_advance_ns, after patch: 2775 2892 2764 2775 ... Cc: Paolo Bonzini Cc: Radim Krčmář Signed-off-by: Wanpeng Li --- arch/x86/kvm/lapic.c | 34 -- arch/x86/kvm/lapic.h | 2 ++ 2 files changed, 30 insertions(+), 6 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index df5cd07..8487d9c 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -69,6 +69,7 @@ #define LAPIC_TIMER_ADVANCE_ADJUST_INIT 1000 /* step-by-step approximation to mitigate fluctuation */ #define LAPIC_TIMER_ADVANCE_ADJUST_STEP 8 +#define LAPIC_TIMER_ADVANCE_RECALC_PERIOD (600 * HZ) static inline int apic_test_vector(int vec, void *bitmap) { @@ -1480,10 +1481,21 @@ static inline void __wait_lapic_expire(struct kvm_vcpu *vcpu, u64 guest_cycles) static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu, s64 advance_expire_delta) { - struct kvm_lapic *apic = vcpu->arch.apic; - u32 timer_advance_ns = apic->lapic_timer.timer_advance_ns; + struct kvm_timer *ktimer = &vcpu->arch.apic->lapic_timer; + u32 timer_advance_ns = ktimer->timer_advance_ns; u64 ns; + /* periodic revaluate */ + if (unlikely(ktimer->timer_advance_adjust_done)) { + ktimer->recalc_timer_advance_ns = jiffies +
if (abs(advance_expire_delta) > LAPIC_TIMER_ADVANCE_ADJUST_DONE) { + timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT; + ktimer->timer_advance_adjust_done = false; + } else + return; + } + /* too early */ if (advance_expire_delta < 0) { ns = -advance_expire_delta * 100ULL; @@ -1499,12 +1511,18 @@ static inline void adjust_lapic_timer_advance(struct kvm_vcpu *vcpu, } if (abs(advance_expire_delta) < LAPIC_TIMER_ADVANCE_ADJUST_DONE) - apic->lapic_timer.timer_advance_adjust_done = true; + ktimer->timer_advance_adjust_done = true; if (unlikely(timer_advance_ns > 5000)) { timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT; - apic->lapic_timer.timer_advance_adjust_done = false; + ktimer->timer_advance_adjust_done = false; + } + ktimer->timer_advance_ns = timer_advance_ns; + + if (ktimer->timer_advance_adjust_done) { + if (ktimer->min_timer_advance_ns > timer_advance_ns) + ktimer->min_timer_advance_ns = timer_advance_ns; + ktimer->timer_advance_ns = ktimer->min_timer_advance_ns; } - apic->lapic_timer.timer_advance_ns = timer_advance_ns; } static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu) @@ -1523,7 +1541,8 @@ static void __kvm_wait_lapic_expire(struct kvm_vcpu *vcpu) if (guest_tsc < tsc_deadline) __wait_lapic_expire(vcpu, tsc_deadline - guest_tsc); - if (unlikely(!apic->lapic_timer.timer_advance_adjust_done)) + if (unlikely(!apic->lapic_timer.timer_advance_adjust_done) || + time_before(apic->lapic_timer.recalc_timer_advance_ns, jiffies)) adjust_lapic_timer_advance(vcpu, apic->lapic_timer.advance_expire_delta); } @@ -2301,9 +2320,12 @@ int kvm_create_lapic(struct kvm_vcpu *vcpu, int timer_advance_ns) if (timer_advance_ns == -1) { apic->lapic_timer.timer_advance_ns = LAPIC_TIMER_ADVANCE_ADJUST_INIT; apic->lapic_timer.timer_advance_adjust_done = false; + apic->lapic_timer.recalc_timer_advance_ns = jiffies; + apic->lapic_timer.min_timer_advance_ns = UINT_MAX; } else { apic->lapic_timer.timer_advance_ns = timer_advance_ns; apic->lapic_timer.timer_advance_adjust_done 
= true; + apic->lapic_timer.recalc_timer_advance_ns = MAX_JIFFY_OFFSET; } diff --git a/arch/x86/kvm/lapic.h b/arch/x86/kvm/lapic.h index 50053d2..56a05eb 100644 --- a/arch/x86/kvm/lapic.h +++ b/arch/x86/kvm/lapic.h @@ -31,6 +31,8 @@ struct kvm_timer { u32 timer_mode_mask;
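A simplified C model of the v2 idea, under stated assumptions: it keeps only the minimum converged estimate, schedules the next re-evaluation a fixed period ahead, and uses a wraparound-safe comparison in the spirit of the kernel's time_before() on jiffies. The struct and function names are hypothetical, the convergence step itself is omitted, and the sample estimates in the usage note are arbitrary.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Wraparound-safe "a is before b" for a free-running unsigned counter. */
static bool counter_before(unsigned long a, unsigned long b)
{
	return (long)(a - b) < 0;
}

/* Hypothetical per-vCPU state, loosely modelled on the new kvm_timer fields. */
struct advance_state {
	uint32_t timer_advance_ns;     /* value actually used */
	uint32_t min_timer_advance_ns; /* smallest converged estimate so far */
	unsigned long recalc_deadline; /* counter value at which to re-check */
};

static void state_init(struct advance_state *s, unsigned long now)
{
	s->timer_advance_ns = 1000;           /* LAPIC_TIMER_ADVANCE_ADJUST_INIT */
	s->min_timer_advance_ns = UINT32_MAX; /* the patch starts from UINT_MAX */
	s->recalc_deadline = now;
}

/* Record a converged estimate, keep the minimum, schedule the next check. */
static void record_estimate(struct advance_state *s, uint32_t estimate,
			    unsigned long now, unsigned long period)
{
	assert(estimate > 0);
	if (estimate < s->min_timer_advance_ns)
		s->min_timer_advance_ns = estimate;
	s->timer_advance_ns = s->min_timer_advance_ns;
	s->recalc_deadline = now + period;
}

/* Mirrors time_before(recalc_timer_advance_ns, jiffies) in the patch. */
static bool recalc_due(const struct advance_state *s, unsigned long now)
{
	return counter_before(s->recalc_deadline, now);
}
```

After recording the estimates 3000, 1800 and 2400 at counter values 0, 100 and 200 with a period of 600, timer_advance_ns stays at 1800 and a re-check becomes due once the counter passes 800.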
Re: clk/clk-next boot bisection: v5.3-rc1-79-g31f58d2f58cb on sun8i-h3-libretech-all-h3-cc
Quoting kernelci.org bot (2019-08-14 20:35:25) > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * > * This automated bisection report was sent to you on the basis * > * that you may be involved with the breaking commit it has * > * found. No manual investigation has been done to verify it, * > * and the root cause of the problem may be somewhere else. * > * * > * If you do send a fix, please include this trailer:* > * Reported-by: "kernelci.org bot" * > * * > * Hope this helps! * > * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * > > clk/clk-next boot bisection: v5.3-rc1-79-g31f58d2f58cb on > sun8i-h3-libretech-all-h3-cc If this is the only board that failed, great! Must be something in a sun8i driver that uses the init structure after registration. > > Summary: > Start: 31f58d2f58cb Merge branch 'clk-meson' into clk-next > Details:https://kernelci.org/boot/id/5d54b9d159b514324cf1226e > Plain log: > https://storage.kernelci.org//clk/clk-next/v5.3-rc1-79-g31f58d2f58cb/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-8/lab-baylibre/boot-sun8i-h3-libretech-all-h3-cc.txt > HTML log: > https://storage.kernelci.org//clk/clk-next/v5.3-rc1-79-g31f58d2f58cb/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-8/lab-baylibre/boot-sun8i-h3-libretech-all-h3-cc.html > Result: c82987e740d1 clk: Overwrite clk_hw::init with NULL during > clk_register() > > Checks: > revert: PASS > verify: PASS > > Parameters: > Tree: clk > URL:https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git > Branch: clk-next > Target: sun8i-h3-libretech-all-h3-cc > CPU arch: arm > Lab:lab-baylibre > Compiler: gcc-8 > Config: multi_v7_defconfig+CONFIG_SMP=n > Test suite: boot > > Breaking commit found: > > --- > commit c82987e740d12be98b8ae8aa9221b8b9e2541271 > Author: Stephen Boyd > Date: Wed Jul 31 12:35:17 2019 -0700 > > clk: Overwrite clk_hw::init with NULL during clk_register() > > We don't want clk provider drivers to use the init structure after clk > registration time, but we 
leave a dangling reference to it by means of > clk_hw::init. Let's overwrite the member with NULL during clk_register() > so that this can't be used anymore after registration time. > > Cc: Bjorn Andersson > Cc: Doug Anderson > Signed-off-by: Stephen Boyd > Link: https://lkml.kernel.org/r/20190731193517.237136-10-sb...@kernel.org > Reviewed-by: Sylwester Nawrocki > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index c0990703ce54..efac620264a2 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -3484,9 +3484,9 @@ static int clk_cpy_name(const char **dst_p, const char > *src, bool must_exist) > return 0; > } > > -static int clk_core_populate_parent_map(struct clk_core *core) > +static int clk_core_populate_parent_map(struct clk_core *core, > + const struct clk_init_data *init) > { > - const struct clk_init_data *init = core->hw->init; > u8 num_parents = init->num_parents; > const char * const *parent_names = init->parent_names; > const struct clk_hw **parent_hws = init->parent_hws; > @@ -3566,6 +3566,14 @@ __clk_register(struct device *dev, struct device_node > *np, struct clk_hw *hw) > { > int ret; > struct clk_core *core; > + const struct clk_init_data *init = hw->init; > + > + /* > +* The init data is not supposed to be used outside of registration > path. > +* Set it to NULL so that provider drivers can't use it either and so > that > +* we catch use of hw->init early on in the core. 
> +*/ > + hw->init = NULL; > > core = kzalloc(sizeof(*core), GFP_KERNEL); > if (!core) { > @@ -3573,17 +3581,17 @@ __clk_register(struct device *dev, struct device_node > *np, struct clk_hw *hw) > goto fail_out; > } > > - core->name = kstrdup_const(hw->init->name, GFP_KERNEL); > + core->name = kstrdup_const(init->name, GFP_KERNEL); > if (!core->name) { > ret = -ENOMEM; > goto fail_name; > } > > - if (WARN_ON(!hw->init->ops)) { > + if (WARN_ON(!init->ops)) { > ret = -EINVAL; > goto fail_ops; > } > - core->ops = hw->init->ops; > + core->ops = init->ops; > > if (dev && pm_runtime_enabled(dev)) > core->rpm_enabled = true; > @@ -3592,13 +3600,13 @@ __clk_register(struct device *dev, struct device_node > *np, struct clk_hw *hw) > if (dev && dev->driver) >
[PATCH] perf vendor events intel: Add Tremontx event file v1.02
Add an Intel event file for perf. Signed-off-by: Haiyan Song --- tools/perf/pmu-events/arch/x86/mapfile.csv | 1 + tools/perf/pmu-events/arch/x86/tremontx/cache.json | 111 ++ .../pmu-events/arch/x86/tremontx/frontend.json | 26 ++ .../perf/pmu-events/arch/x86/tremontx/memory.json | 26 ++ tools/perf/pmu-events/arch/x86/tremontx/other.json | 26 ++ .../pmu-events/arch/x86/tremontx/pipeline.json | 111 ++ .../arch/x86/tremontx/uncore-memory.json | 73 .../pmu-events/arch/x86/tremontx/uncore-other.json | 431 + .../pmu-events/arch/x86/tremontx/uncore-power.json | 11 + .../arch/x86/tremontx/virtual-memory.json | 86 10 files changed, 902 insertions(+) create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/cache.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/frontend.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/memory.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/other.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/pipeline.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-memory.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-other.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/uncore-power.json create mode 100644 tools/perf/pmu-events/arch/x86/tremontx/virtual-memory.json diff --git a/tools/perf/pmu-events/arch/x86/mapfile.csv b/tools/perf/pmu-events/arch/x86/mapfile.csv index b90e5fec2f32..745ced083844 100644 --- a/tools/perf/pmu-events/arch/x86/mapfile.csv +++ b/tools/perf/pmu-events/arch/x86/mapfile.csv @@ -35,4 +35,5 @@ GenuineIntel-6-55-[01234],v1,skylakex,core GenuineIntel-6-55-[56789ABCDEF],v1,cascadelakex,core GenuineIntel-6-7D,v1,icelake,core GenuineIntel-6-7E,v1,icelake,core +GenuineIntel-6-86,v1,tremontx,core AuthenticAMD-23-[[:xdigit:]]+,v1,amdfam17h,core diff --git a/tools/perf/pmu-events/arch/x86/tremontx/cache.json b/tools/perf/pmu-events/arch/x86/tremontx/cache.json new file mode 100644 index ..f88040171b4d --- /dev/null +++
b/tools/perf/pmu-events/arch/x86/tremontx/cache.json @@ -0,0 +1,111 @@ +[ +{ +"CollectPEBSRecord": "2", +"PublicDescription": "Counts cacheable memory requests that miss in the the Last Level Cache. Requests include Demand Loads, Reads for Ownership(RFO), Instruction fetches and L1 HW prefetches. If the platform has an L3 cache, last level cache is the L3, otherwise it is the L2.", +"EventCode": "0x2e", +"Counter": "0,1,2,3", +"UMask": "0x41", +"PEBScounters": "0,1,2,3", +"EventName": "LONGEST_LAT_CACHE.MISS", +"PDIR_COUNTER": "na", +"SampleAfterValue": "23", +"BriefDescription": "Counts memory requests originating from the core that miss in the last level cache. If the platform has an L3 cache, last level cache is the L3, otherwise it is the L2." +}, +{ +"CollectPEBSRecord": "2", +"PublicDescription": "Counts cacheable memory requests that access the Last Level Cache. Requests include Demand Loads, Reads for Ownership(RFO), Instruction fetches and L1 HW prefetches. If the platform has an L3 cache, last level cache is the L3, otherwise it is the L2.", +"EventCode": "0x2e", +"Counter": "0,1,2,3", +"UMask": "0x4f", +"PEBScounters": "0,1,2,3", +"EventName": "LONGEST_LAT_CACHE.REFERENCE", +"PDIR_COUNTER": "na", +"SampleAfterValue": "23", +"BriefDescription": "Counts memory requests originating from the core that reference a cache line in the last level cache. If the platform has an L3 cache, last level cache is the L3, otherwise it is the L2." +}, +{ +"PEBS": "1", +"CollectPEBSRecord": "2", +"PublicDescription": "Counts the number of load uops retired. This event is Precise Event capable", +"EventCode": "0xd0", +"Counter": "0,1,2,3", +"UMask": "0x81", +"PEBScounters": "0,1,2,3", +"EventName": "MEM_UOPS_RETIRED.ALL_LOADS", +"SampleAfterValue": "23", +"BriefDescription": "Counts the number of load uops retired.", +"Data_LA": "1" +}, +{ +"PEBS": "1", +"CollectPEBSRecord": "2", +"PublicDescription": "Counts the number of store uops retired. 
This event is Precise Event capable", +"EventCode": "0xd0", +"Counter": "0,1,2,3", +"UMask": "0x82", +"PEBScounters": "0,1,2,3", +"EventName": "MEM_UOPS_RETIRED.ALL_STORES", +"SampleAfterValue": "23", +"BriefDescription": "Counts the number of store uops retired.", +"Data_LA": "1" +}, +{ +"PEBS": "1", +"CollectPEBSRecord": "2", +"EventCode": "0xd1", +"Counter": "0,1,2,3", +"UMask": "0x1", +"PEBScounters": "0,1,2,3", +"EventName":
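As a rough illustration of how the new mapfile.csv row could be matched, here is a C sketch that formats a vendor-family-model string (model in upper-case hex, so model 0x86 becomes GenuineIntel-6-86) and matches it against a mapfile pattern as an anchored POSIX extended regex. This is not perf's implementation: the exact CPUID string perf builds on x86 (for instance whether a stepping suffix is appended) is not shown in this patch, and the helper names are hypothetical.

```c
#include <assert.h>
#include <regex.h>
#include <stdbool.h>
#include <stdio.h>

/* Format "vendor-family-model" with the model in upper-case hex. */
static void format_cpuid(char *buf, size_t len, const char *vendor,
			 unsigned int family, unsigned int model)
{
	assert(buf != NULL && len > 0);
	snprintf(buf, len, "%s-%u-%X", vendor, family, model);
}

/* True if the mapfile pattern matches the whole CPUID string. */
static bool cpuid_matches(const char *map_pattern, const char *cpuid)
{
	char anchored[256];
	regex_t re;
	bool ok;
	int n = snprintf(anchored, sizeof(anchored), "^(%s)$", map_pattern);

	if (n < 0 || (size_t)n >= sizeof(anchored))
		return false; /* pattern too long for this sketch */
	if (regcomp(&re, anchored, REG_EXTENDED | REG_NOSUB) != 0)
		return false; /* invalid pattern */
	ok = regexec(&re, cpuid, 0, NULL, 0) == 0;
	regfree(&re);
	return ok;
}
```

Anchoring the pattern keeps a row such as GenuineIntel-6-55-[01234] from matching a string that merely contains it.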
Re: [5.3.0-rc4-next][bisected 882632][qla2xxx] WARNING: CPU: 10 PID: 425 at drivers/scsi/qla2xxx/qla_isr.c:2784 qla2x00_status_entry.isra
On 8/14/19 10:18 AM, Abdul Haleem wrote: On Wed, 2019-08-14 at 10:05 -0700, Bart Van Assche wrote: On 8/14/19 9:52 AM, Abdul Haleem wrote: Greetings, Today's linux-next kernel (5.3.0-rc4-next-20190813) booted with warning on my powerpc power 8 lpar The WARN_ON_ONCE() was introduced by commit 88263208 (scsi: qla2xxx: Complain if sp->done() is not...) boot logs: WARNING: CPU: 10 PID: 425 at drivers/scsi/qla2xxx/qla_isr.c:2784 Hi Abdul, Thank you for having reported this. Is that the only warning reported on your setup by the qla2xxx driver? If that warning is commented out, does the qla2xxx driver work as expected? boot warning did not show up when the commit is reverted. should I comment out only the WARN_ON_ONCE() which is causing the issue, and not the other one ? Yes please. Commit 88263208 introduced five kernel warnings but I think only one of these should be removed again, e.g. as follows: diff --git a/drivers/scsi/qla2xxx/qla_isr.c b/drivers/scsi/qla2xxx/qla_isr.c index cd39ac18c5fd..d81b5ecce24b 100644 --- a/drivers/scsi/qla2xxx/qla_isr.c +++ b/drivers/scsi/qla2xxx/qla_isr.c @@ -2780,8 +2780,6 @@ qla2x00_status_entry(scsi_qla_host_t *vha, struct rsp_que *rsp, void *pkt) if (rsp->status_srb == NULL) sp->done(sp, res); - else - WARN_ON_ONCE(true); } /**
[PATCH v5] arm64: dts: ls1028a: Add esdhc node in dts
From: Ashish Kumar This patch is to add esdhc node and enable SD UHS-I, eMMC HS200 for ls1028ardb/ls1028aqds board. Signed-off-by: Ashish Kumar Signed-off-by: Yangbo Lu Signed-off-by: Yinbo Zhu --- Change in v5: Fix indent. arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts | 8 +++ arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts | 13 +++ arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi| 27 +++ 3 files changed, 48 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts index de6ef39..5e14e5a 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-qds.dts @@ -95,6 +95,14 @@ status = "okay"; }; + { + status = "okay"; +}; + + { + status = "okay"; +}; + { status = "okay"; diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts index 9fb9113..1a69221 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts @@ -83,6 +83,19 @@ }; }; + { + sd-uhs-sdr104; + sd-uhs-sdr50; + sd-uhs-sdr25; + sd-uhs-sdr12; + status = "okay"; +}; + + { + mmc-hs200-1_8v; + status = "okay"; +}; + { status = "okay"; diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi index 7975519..f299075 100644 --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi @@ -245,6 +245,33 @@ status = "disabled"; }; + esdhc: mmc@214 { + compatible = "fsl,ls1028a-esdhc", "fsl,esdhc"; + reg = <0x0 0x214 0x0 0x1>; + interrupts = ; + clock-frequency = <0>; /* fixed up by bootloader */ + clocks = < 2 1>; + voltage-ranges = <1800 1800 3300 3300>; + sdhci,auto-cmd12; + little-endian; + bus-width = <4>; + status = "disabled"; + }; + + esdhc1: mmc@215 { + compatible = "fsl,ls1028a-esdhc", "fsl,esdhc"; + reg = <0x0 0x215 0x0 0x1>; + interrupts = ; + clock-frequency = <0>; /* fixed 
up by bootloader */ + clocks = < 2 1>; + voltage-ranges = <1800 1800 3300 3300>; + sdhci,auto-cmd12; + broken-cd; + little-endian; + bus-width = <4>; + status = "disabled"; + }; + duart0: serial@21c0500 { compatible = "fsl,ns16550", "ns16550a"; reg = <0x00 0x21c0500 0x0 0x100>; -- 2.9.5
clk/clk-next boot bisection: v5.3-rc1-79-g31f58d2f58cb on sun8i-h3-libretech-all-h3-cc
* * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * This automated bisection report was sent to you on the basis * * that you may be involved with the breaking commit it has * * found. No manual investigation has been done to verify it, * * and the root cause of the problem may be somewhere else. * * * * If you do send a fix, please include this trailer:* * Reported-by: "kernelci.org bot" * * * * Hope this helps! * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * * clk/clk-next boot bisection: v5.3-rc1-79-g31f58d2f58cb on sun8i-h3-libretech-all-h3-cc Summary: Start: 31f58d2f58cb Merge branch 'clk-meson' into clk-next Details:https://kernelci.org/boot/id/5d54b9d159b514324cf1226e Plain log: https://storage.kernelci.org//clk/clk-next/v5.3-rc1-79-g31f58d2f58cb/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-8/lab-baylibre/boot-sun8i-h3-libretech-all-h3-cc.txt HTML log: https://storage.kernelci.org//clk/clk-next/v5.3-rc1-79-g31f58d2f58cb/arm/multi_v7_defconfig+CONFIG_SMP=n/gcc-8/lab-baylibre/boot-sun8i-h3-libretech-all-h3-cc.html Result: c82987e740d1 clk: Overwrite clk_hw::init with NULL during clk_register() Checks: revert: PASS verify: PASS Parameters: Tree: clk URL:https://git.kernel.org/pub/scm/linux/kernel/git/clk/linux.git Branch: clk-next Target: sun8i-h3-libretech-all-h3-cc CPU arch: arm Lab:lab-baylibre Compiler: gcc-8 Config: multi_v7_defconfig+CONFIG_SMP=n Test suite: boot Breaking commit found: --- commit c82987e740d12be98b8ae8aa9221b8b9e2541271 Author: Stephen Boyd Date: Wed Jul 31 12:35:17 2019 -0700 clk: Overwrite clk_hw::init with NULL during clk_register() We don't want clk provider drivers to use the init structure after clk registration time, but we leave a dangling reference to it by means of clk_hw::init. Let's overwrite the member with NULL during clk_register() so that this can't be used anymore after registration time. 
Cc: Bjorn Andersson Cc: Doug Anderson Signed-off-by: Stephen Boyd Link: https://lkml.kernel.org/r/20190731193517.237136-10-sb...@kernel.org Reviewed-by: Sylwester Nawrocki diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c index c0990703ce54..efac620264a2 100644 --- a/drivers/clk/clk.c +++ b/drivers/clk/clk.c @@ -3484,9 +3484,9 @@ static int clk_cpy_name(const char **dst_p, const char *src, bool must_exist) return 0; } -static int clk_core_populate_parent_map(struct clk_core *core) +static int clk_core_populate_parent_map(struct clk_core *core, + const struct clk_init_data *init) { - const struct clk_init_data *init = core->hw->init; u8 num_parents = init->num_parents; const char * const *parent_names = init->parent_names; const struct clk_hw **parent_hws = init->parent_hws; @@ -3566,6 +3566,14 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw) { int ret; struct clk_core *core; + const struct clk_init_data *init = hw->init; + + /* +* The init data is not supposed to be used outside of registration path. +* Set it to NULL so that provider drivers can't use it either and so that +* we catch use of hw->init early on in the core. 
+*/ + hw->init = NULL; core = kzalloc(sizeof(*core), GFP_KERNEL); if (!core) { @@ -3573,17 +3581,17 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw) goto fail_out; } - core->name = kstrdup_const(hw->init->name, GFP_KERNEL); + core->name = kstrdup_const(init->name, GFP_KERNEL); if (!core->name) { ret = -ENOMEM; goto fail_name; } - if (WARN_ON(!hw->init->ops)) { + if (WARN_ON(!init->ops)) { ret = -EINVAL; goto fail_ops; } - core->ops = hw->init->ops; + core->ops = init->ops; if (dev && pm_runtime_enabled(dev)) core->rpm_enabled = true; @@ -3592,13 +3600,13 @@ __clk_register(struct device *dev, struct device_node *np, struct clk_hw *hw) if (dev && dev->driver) core->owner = dev->driver->owner; core->hw = hw; - core->flags = hw->init->flags; - core->num_parents = hw->init->num_parents; + core->flags = init->flags; + core->num_parents = init->num_parents; core->min_rate = 0; core->max_rate = ULONG_MAX; hw->core = core; - ret = clk_core_populate_parent_map(core); + ret =
Re: [PATCH v3 2/2] pwm: sprd: Add Spreadtrum PWM support
Hi Uwe, On Wed, 14 Aug 2019 at 23:03, Uwe Kleine-König wrote: > > On Wed, Aug 14, 2019 at 08:46:11PM +0800, Baolin Wang wrote: > > + > > + /* > > + * The hardware provides a counter that is feed by the source clock. > > + * The period length is (PRESCALE + 1) * MOD counter steps. > > + * The duty cycle length is (PRESCALE + 1) * DUTY counter steps. > > + * Thus the period_ns and duty_ns calculation formula should be: > > + * period_ns = NSEC_PER_SEC * (prescale + 1) * mod / clk_rate > > + * duty_ns = NSEC_PER_SEC * (prescale + 1) * duty / clk_rate > > + */ > > + val = sprd_pwm_read(spc, pwm->hwpwm, SPRD_PWM_PRESCALE); > > + prescale = val & SPRD_PWM_PRESCALE_MSK; > > + tmp = (prescale + 1) * NSEC_PER_SEC * SPRD_PWM_MOD_MAX; > > + state->period = DIV_ROUND_CLOSEST_ULL(tmp, chn->clk_rate); > > + > > + val = sprd_pwm_read(spc, pwm->hwpwm, SPRD_PWM_DUTY); > > + duty = val & SPRD_PWM_DUTY_MSK; > > + tmp = (prescale + 1) * NSEC_PER_SEC * duty; > > + state->duty_cycle = DIV_ROUND_CLOSEST_ULL(tmp, chn->clk_rate); > > + > > + /* Disable PWM clocks if the PWM channel is not in enable state. */ > > + if (!state->enabled) > > + clk_bulk_disable_unprepare(SPRD_PWM_CHN_CLKS_NUM, chn->clks); > > +} > > + > > +static int sprd_pwm_config(struct sprd_pwm_chip *spc, struct pwm_device > > *pwm, > > +int duty_ns, int period_ns) > > +{ > > + struct sprd_pwm_chn *chn = >chn[pwm->hwpwm]; > > + u32 prescale, duty; > > + u64 tmp; > > + > > + /* > > + * The hardware provides a counter that is feed by the source clock. > > + * The period length is (PRESCALE + 1) * MOD counter steps. > > + * The duty cycle length is (PRESCALE + 1) * DUTY counter steps. > > + * > > + * To keep the maths simple we're always using MOD = SPRD_PWM_MOD_MAX. > > Did you spend some thoughts about how wrong your period can get because > of that "lazyness"? > > Let's assume a clk rate of 100/3 MHz. Then the available period lengths > are: > > PRESCALE = 0 -> period = 7.65 µs > PRESCALE = 1 -> period = 15.30 µs > ... 
> PRESCALE = 17 -> period = 137.70 µs > PRESCALE = 18 -> period = 145.35 µs > > So the error can be up to (nearly) 7.65 µs (or in general Yes, but for our use case (pwm backlight), the precision can meet our requirement. Moreover, we usually do not change the period, just adjust the duty to change the back light. > 255 / clk_rate) because if 145.34 µs is requested you configure > PRESCALE = 17 and so yield a period of 137.70 µs. If however you'd pick I did not get you here, if period is 145.34, we still get the corresponding PRESCALE = 18 by below formula: tmp = (u64)chn->clk_rate * period_ns; do_div(tmp, NSEC_PER_SEC); prescale = DIV_ROUND_CLOSEST_ULL(tmp, SPRD_PWM_MOD_MAX) - 1; > PRESCALE = 18 and MOD = 254 you get a period of 144.78 µs and so the > error is only 0.56 µs which is a factor of 13 better. > > Hmm. > > > + * The value for PRESCALE is selected such that the resulting period > > + * gets the maximal length not bigger than the requested one with the > > + * given settings (MOD = SPRD_PWM_MOD_MAX and input clock). > > + */ > > + duty = duty_ns * SPRD_PWM_MOD_MAX / period_ns; > > I wonder if you loose some precision here as you use period_ns but might > actually implement a shorter period. > > Quick example, again consider clk_rate = 100 / 3 MHz, > period_ns = 145340, duty_ns = 72670. Then you end up with > > PRESCALE = 17 > MOD = 255 > DUTY = 127 Incorrect, we will get PRESCALE = 18, MOD = 255, DUTY = 127. > That corresponds to period_ns = 137700, duty_ns = 68580. With DUTY = 134 > you get 72360 ns which is still smaller than the requested 72670 ns. Incorrect, with DUTY = 134 (PRESCALE = 18 -> period = 145.35 µs), duty_ns = 76380ns > (But then again it is not obvious which of the two is the "better" > approximation because Thierry doesn't seem to see the necessity to > discuss or define a policy here.) Like I said, this is the simple calculation formula which can meet our requirement (we limit our DUTY value can only be 0 - 254). 
duty = duty_ns * SPRD_PWM_MOD_MAX / period_ns; > > (And to pick up the thoughts about not using SPRD_PWM_MOD_MAX > unconditionally, you could also use > > PRESCALE = 18 > MOD = 254 > DUTY = 127 > > yielding period_ns = 144780 and duty_ns = 72390. Summary: > > Request: 72670 / 145340 > your result: 68580 / 137700 > also possible: 72390 / 144780 > > Judge yourself.) > > > + tmp = (u64)chn->clk_rate * period_ns; > > + do_div(tmp, NSEC_PER_SEC); > > + prescale = DIV_ROUND_CLOSEST_ULL(tmp, SPRD_PWM_MOD_MAX) - 1; > > Now that you use DIV_ROUND_CLOSEST_ULL the comment is wrong because you > might provide a period bigger than the requested one. Also you
Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
On 2019/8/14 12:41 AM, Christoph Hellwig wrote:
> On Tue, Aug 13, 2019 at 08:57:07AM -0300, Jason Gunthorpe wrote:
>> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
>>> What kind of issues do you see? Spinlock is to synchronize GUP with MMU notifier in this series.
>> A GUP that can't sleep can't pagefault which makes it a really weird pattern
> get_user_pages/get_user_pages_fast must not be called under a spinlock. We have the somewhat misnamed __get_user_page_fast that just does a lookup for existing pages and never faults for a few places that need to do that lookup from contexts where we can't sleep.

Yes, I do use __get_user_pages_fast() in the code.

Thanks
Re: [PATCH V5 0/9] Fixes for vhost metadata acceleration
On 2019/8/13 7:57 PM, Jason Gunthorpe wrote:
> On Tue, Aug 13, 2019 at 04:31:07PM +0800, Jason Wang wrote:
>> What kind of issues do you see? Spinlock is to synchronize GUP with MMU notifier in this series.
> A GUP that can't sleep can't pagefault which makes it a really weird pattern

My understanding is __get_user_pages_fast() assumes caller can fail or have fallback. And we have graceful fallback to copy_{to|from}_user().

>> Btw, back to the original question. May I know why synchronize_rcu() is not suitable? Consider:
> We already went over this. You'd need to determine it doesn't somehow deadlock the mm on reclaim paths. Maybe it is OK, the rcq_gq_wq is marked WQ_MEM_RECLAIM at least..

Yes, will take a look at this.

> I also think Michael was concerned about the latency spikes a long RCU delay would cause.

I don't think it's a real problem, considering the MMU notifier could be preempted or blocked.

Thanks
Jason
Re: [PATCH] virtio-net: lower min ring num_free for efficiency
On 2019/8/15 11:11 AM, 冉 jiang wrote: On 2019/8/15 11:01, Jason Wang wrote: On 2019/8/14 10:06 AM, 冉 jiang wrote: This change lowers ring buffer reclaim threshold from 1/2*queue to budget for better performance. According to our test with qemu + dpdk, packet dropping happens when the guest is not able to provide free buffer in avail ring timely with default 1/2*queue. The value in the patch has been tested and does show better performance. Please add your tests setup and result here. Thanks Signed-off-by: jiangkidd --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0d4115c9e20b..bc08be7925eb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget, } } - if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) { + if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) { if (!try_fill_recv(vi, rq, GFP_ATOMIC)) schedule_delayed_work(>refill, 0); } Sure, here are the details: Thanks for the details, but I meant it's better if you could summarize your test result in the commit log in a compact way. Btw, some comments, see below: Test setup & result: Below is the snippet from our test result. Test1 was done with default driver with the value of 1/2 * queue, while test2 is with my patch. We can see average drop packets do decrease a lot in test2.
test1Time  avgDropPackets  test2Time  avgDropPackets  pps
16:21.0    12.295          56:50.4    0               300k
17:19.1    15.244          56:50.4    0               300k
18:17.5    18.789          56:50.4    0               300k
19:15.1    14.208          56:50.4    0               300k
20:13.2    20.818          56:50.4    0.267           300k
21:11.2    12.397          56:50.4    0               300k
22:09.3    12.599          56:50.4    0               300k
23:07.3    15.531          57:48.4    0               300k
24:05.5    13.664          58:46.5    0               300k
25:03.7    13.158          59:44.5    4.73            300k
26:01.1    2.486           00:42.6    0               300k
26:59.1    11.241          01:40.6    0               300k
27:57.2    20.521          02:38.6    0               300k
28:55.2    30.094          03:36.7    0               300k
29:53.3    16.828          04:34.7    0.963           300k
30:51.3    46.916          05:32.8    0               400k
31:49.3    56.214          05:32.8    0               400k
32:47.3    58.69           05:32.8    0               400k
33:45.3    61.486          05:32.8    0               400k
34:43.3    72.175          05:32.8    0.598           400k
35:41.3    56.699          05:32.8    0               400k
36:39.3    61.071          05:32.8    0               400k
37:37.3    43.355          06:30.8    0               400k
38:35.4    44.644          06:30.8    0               400k
39:33.4    72.336          06:30.8    0               400k
40:31.4    70.676          06:30.8    0               400k
41:29.4    108.009         06:30.8    0               400k
42:27.4    65.216          06:30.8    0               400k

Why there're difference in test time? Could you summarize them like:

Test setup: e.g testpmd or pktgen to generate packets to guest

avg packets drop before: XXX

avg packets drop after: YYY(-ZZZ%)

Thanks

Data to prove why the patch helps: We did have completed several rounds of test with setting the value to budget (64 as the default value). It does improve a lot with pps is below 400pps for a single stream.
We are confident that it runs out of free buffer in avail ring when packet dropping happens with below systemtap:

Just a snippet:

probe module("virtio_ring").function("virtqueue_get_buf") {
	x = (@cast($_vq, "vring_virtqueue")->vring->used->idx) -
	    (@cast($_vq, "vring_virtqueue")->last_used_idx)
	// ---> we use this one to verify if the queue is full, which means guest is not able to take buffer from the queue timely
	if (x<0 && (x+65535)<4096)
		x = x+65535
	if((x==1024) && @cast($_vq, "vring_virtqueue")->vq->callback == callback_addr)
		netrxcount[x] <<< gettimeofday_s()
}

probe module("virtio_ring").function("virtqueue_add_inbuf") {
	y = (@cast($vq, "vring_virtqueue")->vring->avail->idx) -
	    (@cast($vq, "vring_virtqueue")->vring->used->idx)
	// ---> we use this one to verify if we run out of free buffer in avail ring
	if (y<0 && (y+65535)<4096)
		y = y+65535
	if(@2=="debugon") {
		if(y==0 && @cast($vq, "vring_virtqueue")->vq->callback == callback_addr) {
			netrxfreecount[y] <<< gettimeofday_s()
			printf("no avail ring left seen, printing most recent 5 num free, vq: %lx, current index: %d\n", $vq, recentfreecount)
			for(i=recentfreecount; i!=((recentfreecount+4) % 5); i=((i+1) % 5)) {
				printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
			}
			printf("index: %d, num free: %d\n", i, recentfree[$vq, i])
			//exit()
		}
	}
}

probe
[PATCH] arm64: dts: imx8mn: Add gpio-ranges property
From: Anson Huang Add "gpio-ranges" property to establish connections between GPIOs and PINs on i.MX8MN pinctrl driver. Signed-off-by: Anson Huang --- arch/arm64/boot/dts/freescale/imx8mn.dtsi | 5 + 1 file changed, 5 insertions(+) diff --git a/arch/arm64/boot/dts/freescale/imx8mn.dtsi b/arch/arm64/boot/dts/freescale/imx8mn.dtsi index f5eff35..1d8899b 100644 --- a/arch/arm64/boot/dts/freescale/imx8mn.dtsi +++ b/arch/arm64/boot/dts/freescale/imx8mn.dtsi @@ -173,6 +173,7 @@ #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <2>; + gpio-ranges = < 0 10 30>; }; gpio2: gpio@3021 { @@ -185,6 +186,7 @@ #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <2>; + gpio-ranges = < 0 40 21>; }; gpio3: gpio@3022 { @@ -197,6 +199,7 @@ #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <2>; + gpio-ranges = < 0 61 26>; }; gpio4: gpio@3023 { @@ -209,6 +212,7 @@ #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <2>; + gpio-ranges = < 21 108 11>; }; gpio5: gpio@3024 { @@ -221,6 +225,7 @@ #gpio-cells = <2>; interrupt-controller; #interrupt-cells = <2>; + gpio-ranges = < 0 119 30>; }; wdog1: watchdog@3028 { -- 2.7.4
Re: [PATCH] sh: Drop -Werror from kernel Makefile
On 8/14/19 5:59 PM, Gustavo A. R. Silva wrote: Guenter, On 8/13/19 8:18 AM, Guenter Roeck wrote: Please note that _mainline_ builds are currently broken. This should be fixed now: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=41de59634046b19cd53a1983594a95135c656997 Yes, it is. Thanks! Guenter
Re: [PATCH 1/2] riscv: Add memmove string operation.
Hi Paul, On Wed, Aug 14, 2019 at 10:03:39AM -0700, Paul Walmsley wrote: > Hi Nick, > > On Wed, 14 Aug 2019, Nick Hu wrote: > > > On Wed, Aug 14, 2019 at 10:22:15AM +0800, Paul Walmsley wrote: > > > On Tue, 13 Aug 2019, Palmer Dabbelt wrote: > > > > > > > On Mon, 12 Aug 2019 08:04:46 PDT (-0700), Christoph Hellwig wrote: > > > > > On Wed, Aug 07, 2019 at 03:19:14PM +0800, Nick Hu wrote: > > > > > > There are some features which need this string operation for > > > > > > compilation, > > > > > > like KASAN. So the purpose of this porting is for the features like > > > > > > KASAN > > > > > > which cannot be compiled without it. > > > > > > > > > > > > KASAN's string operations would replace the original string > > > > > > operations and > > > > > > call for the architecture defined string operations. Since we don't > > > > > > have > > > > > > this in current kernel, this patch provides the implementation. > > > > > > > > > > > > This porting refers to the 'arch/nds32/lib/memmove.S'. > > > > > > > > > > This looks sensible to me, although my stringop asm is rather rusty, > > > > > so just an ack and not a real review-by: > > > > > > > > FWIW, we just write this in C everywhere else and rely on the compiler > > > > to > > > > unroll the loops. I always prefer C to assembly when possible, so I'd > > > > prefer > > > > if we just adopt the string code from newlib. We have a RISC-V-specific > > > > memcpy in there, but just use the generic memmove. > > > > > > > > Maybe the best bet here would be to adopt the newlib memcpy/memmove as > > > > generic > > > > Linux functions? They're both in C so they should be fine, and they > > > > both look > > > > faster than what's in lib/string.c. Then everyone would benefit and we > > > > don't > > > > need this tricky RISC-V assembly. Also, from the look of it the newlib > > > > code > > > > is faster because the inner loop is unrolled. 
> > > > > > There's a generic memmove implementation in the kernel already: > > > > > > https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/include/linux/string.h#n362 > > > > > > Nick, could you tell us more about why the generic memmove() isn't > > > suitable? > > > > KASAN has its own string operations(memcpy/memmove/memset) because it needs > > to > > hook some code to check memory region. It would undefined the original > > string > > operations and called the string operations with the prefix '__'. But the > > generic string operations didn't declare with the prefix. Other archs with > > KASAN support like arm64 and xtensa all have their own string operations and > > defined with the prefix. > > Thanks for the explanation. What do you think about Palmer's idea to > define a generic C set of KASAN string operations, derived from the newlib > code? > > > - Paul That sounds good to me. But it should be another topic. We need to investigate it further about replacing something generic and fundamental in lib/string.c with newlib C functions. Some blind spots may exist. So I suggest, let's consider KASAN for now. Nick
Re: [PATCH] nbd: add a missed nbd_config_put() in nbd_xmit_timeout()
Josef had acked my patch for the same thing here:

https://www.spinics.net/lists/linux-block/msg43800.html

so maybe Jens will pick that up with the rest of the set Josef had acked:

https://www.spinics.net/lists/linux-block/msg43809.html

to make it easier.

On 08/14/2019 08:27 PM, sunke (E) wrote:
> ping
>
> On 2019/8/12 20:31, Sun Ke wrote:
>> When try to get the lock failed, before return, execute the
>> nbd_config_put() to decrease the nbd->config_refs.
>>
>> If the nbd->config_refs is added but not decreased. Then will not
>> execute nbd_clear_sock() in nbd_config_put(). bd->task_setup will
>> not be cleared away. Finally, print"Device being setup by another
>> task" in nbd_add_sock() and nbd device can not be reused.
>>
>> Fixes: 8f3ea35929a0 ("nbd: handle unexpected replies better")
>> Signed-off-by: Sun Ke
>> ---
>> drivers/block/nbd.c | 4 +++-
>> 1 file changed, 3 insertions(+), 1 deletion(-)
>>
>> diff --git a/drivers/block/nbd.c b/drivers/block/nbd.c
>> index e21d2de..a69a90a 100644
>> --- a/drivers/block/nbd.c
>> +++ b/drivers/block/nbd.c
>> @@ -357,8 +357,10 @@ static enum blk_eh_timer_return
>> nbd_xmit_timeout(struct request *req,
>> }
>> config = nbd->config;
>> -if (!mutex_trylock(>lock))
>> +if (!mutex_trylock(>lock)) {
>> +nbd_config_put(nbd);
>> return BLK_EH_RESET_TIMER;
>> +}
>> if (config->num_connections > 1) {
>> dev_err_ratelimited(nbd_to_dev(nbd),
>>
>
Re: [PATCH] cxgb4: fix a memory leak bug
From: Wenwen Wang Date: Tue, 13 Aug 2019 04:18:52 -0500 > In blocked_fl_write(), 't' is not deallocated if bitmap_parse_user() fails, > leading to a memory leak bug. To fix this issue, free t before returning > the error. > > Signed-off-by: Wenwen Wang Applied.
Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
On 8/14/19 5:02 PM, John Hubbard wrote: On 8/14/19 4:50 PM, Ira Weiny wrote: On Tue, Aug 13, 2019 at 05:56:31PM -0700, John Hubbard wrote: On 8/13/19 5:51 PM, John Hubbard wrote: On 8/13/19 2:08 PM, Ira Weiny wrote: On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote: On 8/12/19 4:49 PM, Ira Weiny wrote: On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubb...@gmail.com wrote: From: John Hubbard ... Finally, I struggle with converting everyone to a new call. It is more overhead to use vaddr_pin in the call above because now the GUP code is going to associate a file pin object with that file when in ODP we don't need that because the pages can move around. What if the pages in ODP are file-backed? oops, strike that, you're right: in that case, even the file system case is covered. Don't mind me. :) Ok so are we agreed we will drop the patch to the ODP code? I'm going to keep the FOLL_PIN flag and addition in the vaddr_pin_pages. Yes. I hope I'm not overlooking anything, but it all seems to make sense to let ODP just rely on the MMU notifiers. Hold on, I *was* forgetting something: this was a two part thing, and you're conflating the two points, but they need to remain separate and distinct. There were: 1. FOLL_PIN is necessary because the caller is clearly in the use case that requires it--however briefly they might be there. As Jan described it, "Anything that gets page reference and then touches page data (e.g. direct IO) needs the new kind of tracking so that filesystem knows someone is messing with the page data." [1] 2. Releasing the pin: for ODP, we can use MMU notifiers instead of requiring a lease. This second point does not invalidate the first point. Therefore, I still see the need for the call within ODP, to something that sets FOLL_PIN. And that means either vaddr_pin_[user?]_pages_remote, or some other wrapper of your choice. :) I guess shows that the API might need to be refined. 
We're trying to solve two closely related issues, but they're not identical. thanks, -- John Hubbard NVIDIA
Re: [PATCH] virtio-net: lower min ring num_free for efficiency
On 2019/8/14 10:06 AM, 冉 jiang wrote: This change lowers ring buffer reclaim threshold from 1/2*queue to budget for better performance. According to our test with qemu + dpdk, packet dropping happens when the guest is not able to provide free buffer in avail ring timely with default 1/2*queue. The value in the patch has been tested and does show better performance. Please add your tests setup and result here. Thanks Signed-off-by: jiangkidd --- drivers/net/virtio_net.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/drivers/net/virtio_net.c b/drivers/net/virtio_net.c index 0d4115c9e20b..bc08be7925eb 100644 --- a/drivers/net/virtio_net.c +++ b/drivers/net/virtio_net.c @@ -1331,7 +1331,7 @@ static int virtnet_receive(struct receive_queue *rq, int budget, } } - if (rq->vq->num_free > virtqueue_get_vring_size(rq->vq) / 2) { + if (rq->vq->num_free > min((unsigned int)budget, virtqueue_get_vring_size(rq->vq)) / 2) { if (!try_fill_recv(vi, rq, GFP_ATOMIC)) schedule_delayed_work(>refill, 0); }
[PATCH] arm: dts: rockchip: fix vcc_host_5v regulator for usb3 host
According to rock64 schematic V2 and V3, the VCC_HOST_5V output is controlled by USB_20_HOST_DRV, which is the same as VCC_HOST1_5V. Signed-off-by: Kever Yang --- arch/arm64/boot/dts/rockchip/rk3328-rock64.dts | 10 ++ 1 file changed, 2 insertions(+), 8 deletions(-) diff --git a/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts b/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts index 7cfd5ca6cc85..bd4ad1635e0b 100644 --- a/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts +++ b/arch/arm64/boot/dts/rockchip/rk3328-rock64.dts @@ -35,9 +35,9 @@ vcc_host_5v: vcc-host-5v-regulator { compatible = "regulator-fixed"; enable-active-high; - gpio = < RK_PA0 GPIO_ACTIVE_HIGH>; + gpio = < RK_PA2 GPIO_ACTIVE_LOW>; pinctrl-names = "default"; - pinctrl-0 = <_host_drv>; + pinctrl-0 = <_host_drv>; regulator-name = "vcc_host_5v"; regulator-always-on; regulator-boot-on; @@ -320,12 +320,6 @@ rockchip,pins = <0 RK_PA2 RK_FUNC_GPIO _pull_none>; }; }; - - usb3 { - usb30_host_drv: usb30-host-drv { - rockchip,pins = <0 RK_PA0 RK_FUNC_GPIO _pull_none>; - }; - }; }; { -- 2.17.1
Re: [PATCH v2 2/2] riscv: Make __fstate_clean() work correctly.
On Thu, Aug 15, 2019 at 6:17 AM Andreas Schwab wrote: > > On Aug 14 2019, Palmer Dabbelt wrote: > > > On Wed, 14 Aug 2019 13:32:50 PDT (-0700), Paul Walmsley wrote: > >> On Wed, 14 Aug 2019, Vincent Chen wrote: > >> > >>> Make the __fstate_clean() function correctly set the > >>> state of sstatus.FS in pt_regs to SR_FS_CLEAN. > >>> > >>> Fixes: 7db91e5 ("RISC-V: Task implementation") > >>> Cc: linux-stable > >>> Signed-off-by: Vincent Chen > >>> Reviewed-by: Anup Patel > >>> Reviewed-by: Christoph Hellwig > >> > >> Thanks, I extended the "Fixes" commit ID to 12 digits, as is the usual > >> practice here, and have queued the following for v5.3-rc. > > Thank Paul for correcting my mistake. > > For reference, something like "git config core.abbrev=12" (or whatever you > > write to get this in your .gitconfig) > > > >https://github.com/palmer-dabbelt/home/blob/master/.gitconfig.in#L23 > > > > causes git to do the right thing. > > Actually, the right setting is core.abbrev=auto (or leaving it unset). > It lets git chose the appropriate length depending on the repository > contents. For the linux repository it will chose 13 right now. > > Andreas. > Thanks to Palmer and Andreas for sharing this useful information. > -- > Andreas Schwab, sch...@linux-m68k.org > GPG Key fingerprint = 7578 EB47 D4E5 4D69 2510 2552 DF73 E780 A9DA AEC1 > "And now for something completely different."
Re: [PATCH] Makefile: Convert -Wimplicit-fallthrough=3 to just -Wimplicit-fallthrough for clang
On Tue, 2019-08-13 at 14:44 +0200, Miguel Ojeda wrote:
> Hm... I would go for either __fallthrough as the rest of attributes,
> or simply fallthrough -- FALLTHROUGH seems wrong. If you want it that
> way for visibility, then I would choose __fallthrough, since the
> underscores are quite prominent and anyway IDEs typically highlight
> macros in a different color than keywords (return etc.).

Just fyi: I added this line to my .emacs and "fallthrough" is now
syntax highlighted like every other keyword.

(font-lock-add-keywords 'c-mode
			'(("\\<\\(fallthrough\\)\\>" . font-lock-keyword-face)))

So now my linux-c-mode block is:

(defun linux-c-mode ()
  "C mode with adjusted defaults for use with the Linux kernel."
  (interactive)
  (font-lock-add-keywords 'c-mode
			  '(("\\<\\(fallthrough\\)\\>" . font-lock-keyword-face)))
  (c-mode)
  (c-set-style "K")
  (setq c-basic-offset 8)
  (setq c-indent-level 8)
  (setq c-brace-imaginary-offset 0)
  (setq c-brace-offset -8)
  (setq c-argdecl-indent 8)
  (setq c-label-offset -8)
  (setq c-continued-statement-offset 8)
  (setq indent-tabs-mode t)
  (setq tab-width 8)
  (setq show-trailing-whitespace t)
)

I don't know how to do that for vim or any other IDE, but I trust
someone will know and show how it's done.
Re: [PATCH] x86/fixmap: update stale comments
Hi,

I would like to know whether the patch makes sense.

On 8/9/19 7:46 PM, Cao jin wrote:
> Signed-off-by: Cao jin
> ---
> arch/x86/include/asm/fixmap.h | 5 ++---
> 1 file changed, 2 insertions(+), 3 deletions(-)
>
> diff --git a/arch/x86/include/asm/fixmap.h b/arch/x86/include/asm/fixmap.h
> index 9da8cccdf3fb..0c47aa82e2e2 100644
> --- a/arch/x86/include/asm/fixmap.h
> +++ b/arch/x86/include/asm/fixmap.h
> @@ -42,8 +42,7 @@
> * Because of this, FIXADDR_TOP x86 integration was left as later work.
> */
> #ifdef CONFIG_X86_32
> -/* used by vmalloc.c, vsyscall.lds.S.
> - *
> +/*

I don't see the variable __FIXADDR_TOP or the macro FIXADDR_TOP under CONFIG_X86_32 referenced in vmalloc.c, and there is no vsyscall.lds.S anymore.

> * Leave one empty page between vmalloc'ed areas and
> * the start of the fixmap.
> */
> @@ -120,7 +119,7 @@ enum fixed_addresses {
>* before ioremap() is functional.
>*
>* If necessary we round it up to the next 512 pages boundary so
> - * that we can have a single pgd entry and a single pte table:
> + * that we can have a single pmd entry and a single pte table:

This comment seems to have been left stale by the ancient commit 551889a6e2a24.

>*/
> #define NR_FIX_BTMAPS		64
> #define FIX_BTMAPS_SLOTS 8
> --

Sincerely,
Cao jin
Re: [PATCH V3 4/4] dt-bindings: iio: adc: ad7192: Add binding documentation for AD7192
On Wed, Aug 14, 2019 at 1:32 AM Mircea Caprioru wrote: > > This patch add device tree binding documentation for AD7192 adc in YAML > format. > > Signed-off-by: Mircea Caprioru > --- > Changelog V2: > - remove description from spi and interrupt properties > - changed the name of the device from ad7192 to adc in the example > > Changelog V3: > - added semicolon at the end of the dt example > > .../bindings/iio/adc/adi,ad7192.yaml | 121 ++ > 1 file changed, 121 insertions(+) > create mode 100644 Documentation/devicetree/bindings/iio/adc/adi,ad7192.yaml Reviewed-by: Rob Herring
[PATCH] afs: Move comments after /* fallthrough */
Make the code a bit easier for a script to appropriately convert case statement blocks with /* fallthrough */ comments to a macro by moving comments describing the next case block to the case statement. Signed-off-by: Joe Perches --- fs/afs/cmservice.c | 10 +++--- fs/afs/fsclient.c | 51 +-- fs/afs/vlclient.c | 50 +- fs/afs/yfsclient.c | 51 +-- 4 files changed, 62 insertions(+), 100 deletions(-) diff --git a/fs/afs/cmservice.c b/fs/afs/cmservice.c index b86195e4dc6c..2270fe9325da 100644 --- a/fs/afs/cmservice.c +++ b/fs/afs/cmservice.c @@ -282,10 +282,8 @@ static int afs_deliver_cb_callback(struct afs_call *call) case 0: afs_extract_to_tmp(call); call->unmarshall++; - - /* extract the FID array and its count in two steps */ /* fall through */ - case 1: + case 1: /* extract the FID array and its count in two steps */ _debug("extract FID count"); ret = afs_extract_data(call, true); if (ret < 0) @@ -329,9 +327,8 @@ static int afs_deliver_cb_callback(struct afs_call *call) afs_extract_to_tmp(call); call->unmarshall++; - /* extract the callback array and its count in two steps */ /* fall through */ - case 3: + case 3: /* extract the callback array & count in two steps */ _debug("extract CB count"); ret = afs_extract_data(call, true); if (ret < 0) @@ -651,9 +648,8 @@ static int afs_deliver_yfs_cb_callback(struct afs_call *call) afs_extract_to_tmp(call); call->unmarshall++; - /* extract the FID array and its count in two steps */ /* Fall through */ - case 1: + case 1: /* extract the FID array and its count in two steps */ _debug("extract FID count"); ret = afs_extract_data(call, true); if (ret < 0) diff --git a/fs/afs/fsclient.c b/fs/afs/fsclient.c index 114f281f3687..d9dc1bdfa695 100644 --- a/fs/afs/fsclient.c +++ b/fs/afs/fsclient.c @@ -341,8 +341,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) } /* Fall through */ - /* extract the returned data length */ - case 1: + case 1: /* extract the returned data length */ _debug("extract data length"); ret = 
afs_extract_data(call, true); if (ret < 0) @@ -369,8 +368,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) ASSERTCMP(size, <=, PAGE_SIZE); /* Fall through */ - /* extract the returned data */ - case 2: + case 2: /* extract the returned data */ _debug("extract data %zu/%llu", iov_iter_count(>iter), req->remain); @@ -411,8 +409,7 @@ static int afs_deliver_fs_fetch_data(struct afs_call *call) afs_extract_to_buf(call, (21 + 3 + 6) * 4); /* Fall through */ - /* extract the metadata */ - case 4: + case 4: /* extract the metadata */ ret = afs_extract_data(call, false); if (ret < 0) return ret; @@ -1476,8 +1473,7 @@ static int afs_deliver_fs_get_volume_status(struct afs_call *call) afs_extract_to_buf(call, 12 * 4); /* Fall through */ - /* extract the returned status record */ - case 1: + case 1: /* extract the returned status record */ _debug("extract status"); ret = afs_extract_data(call, true); if (ret < 0) @@ -1489,8 +1485,7 @@ static int afs_deliver_fs_get_volume_status(struct afs_call *call) afs_extract_to_tmp(call); /* Fall through */ - /* extract the volume name length */ - case 2: + case 2: /* extract the volume name length */ ret = afs_extract_data(call, true); if (ret < 0) return ret; @@ -1505,8 +1500,7 @@ static int afs_deliver_fs_get_volume_status(struct afs_call *call) call->unmarshall++; /* Fall through */ - /* extract the volume name */ - case 3: + case 3: /* extract the volume name */ _debug("extract volname"); ret = afs_extract_data(call, true); if (ret < 0) @@ -1519,8 +1513,7 @@ static int afs_deliver_fs_get_volume_status(struct afs_call *call) call->unmarshall++; /* Fall through */ - /* extract the offline message length */ -
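All of the functions touched here follow one pattern: a switch on call->unmarshall that resumes where the previous delivery stopped and falls through each phase as more data arrives, which is why the phase description reads best on the case label itself. The minimal sketch below shows that pattern with the comments placed as this patch arranges them; struct demo_call, demo_deliver() and the byte counts are hypothetical stand-ins for struct afs_call and afs_extract_data(), not the AFS code.

```c
/* Hypothetical stand-in for struct afs_call: a phase counter, the number
 * of bytes received so far, and a record of which phases completed. */
struct demo_call {
	unsigned int unmarshall;	/* current phase */
	unsigned int have;		/* bytes available so far */
	unsigned int phases_done;	/* bitmask of completed phases */
};

#define DEMO_NEED_COUNT 4	/* assumed size of the count word */
#define DEMO_NEED_DATA  8	/* assumed size of the payload */

/* Returns 0 when every phase is done, 1 when more data is needed.
 * Each call resumes at the saved phase and falls through the rest. */
static int demo_deliver(struct demo_call *call)
{
	switch (call->unmarshall) {
	case 0:
		call->phases_done |= 1u << 0;
		call->unmarshall++;
		/* Fall through */
	case 1: /* extract the count */
		if (call->have < DEMO_NEED_COUNT)
			return 1;
		call->phases_done |= 1u << 1;
		call->unmarshall++;
		/* Fall through */
	case 2: /* extract the data */
		if (call->have < DEMO_NEED_COUNT + DEMO_NEED_DATA)
			return 1;
		call->phases_done |= 1u << 2;
		call->unmarshall++;
		/* Fall through */
	case 3:
		break;
	}
	return 0;
}
```

With the description on the case line, each /* Fall through */ comment sits directly before a case label, which is the shape a script can mechanically turn into a fallthrough macro.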
[GIT PULL] Devicetree fixes for 5.3-rc, take 3
Linus, Please pull DT fixes for 5.3. Rob The following changes since commit 609488bc979f99f805f34e9a32c1e3b71179d10b: Linux 5.3-rc2 (2019-07-28 12:47:02 -0700) are available in the Git repository at: git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git tags/devicetree-fixes-for-5.3-3 for you to fetch changes up to 83f82d7a42583e93d0f0dde3d61ed10f75c0f4d8: of: irq: fix a trivial typo in a doc comment (2019-08-14 20:12:16 -0600) Devicetree fixes for 5.3: - Fix building DT binding examples for in tree builds - Correct some refcounting in adjust_local_phandle_references() - Update FSL FEC binding with deprecated properties - Schema fix in stm32 pinctrl - Fix typo in of_irq_parse_one docbook comment Lubomir Rintel (1): of: irq: fix a trivial typo in a doc comment Nishka Dasgupta (1): of: resolver: Add of_node_put() before return and break Rob Herring (2): dt-bindings: Fix generated example files getting added to schemas dt-bindings: pinctrl: stm32: Fix 'st,syscfg' schema Sven Van Asbroeck (1): dt-bindings: fec: explicitly mark deprecated properties Documentation/devicetree/bindings/Makefile | 4 ++- Documentation/devicetree/bindings/net/fsl-fec.txt | 30 -- .../bindings/pinctrl/st,stm32-pinctrl.yaml | 3 ++- drivers/of/irq.c | 2 +- drivers/of/resolver.c | 12 ++--- 5 files changed, 32 insertions(+), 19 deletions(-)
Re: [PATCH] scsi: fnic: remove redundant assignment of variable rc
Colin, > Variable rc is initialized to a value that is never read and it is > re-assigned later and immediately returned. Clean up the code by > removing rc and just returning 0. Applied to 5.4/scsi-queue. Thanks! -- Martin K. Petersen Oracle Linux Engineering
Re: [PATCH] of: irq: fix a trivial typo in a doc comment
On Wed, 7 Aug 2019 15:22:31 +0200, Lubomir Rintel wrote: > Diverged from what the code does with commit 530210c7814e ("of/irq: Replace > of_irq with of_phandle_args"). > > Signed-off-by: Lubomir Rintel > --- > drivers/of/irq.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > Applied, thanks. Rob
Re: [PATCH] scsi: ufs: Fix NULL pointer dereference in ufshcd_config_vreg_hpm()
Adrian, > Fix the following BUG: > > [ 187.065689] BUG: kernel NULL pointer dereference, address: > 001c > [ 187.065790] RIP: 0010:ufshcd_vreg_set_hpm+0x3c/0x110 [ufshcd_core] > [ 187.065938] Call Trace: > [ 187.065959] ufshcd_resume+0x72/0x290 [ufshcd_core] > [ 187.065980] ufshcd_system_resume+0x54/0x140 [ufshcd_core] > [ 187.065993] ? pci_pm_restore+0xb0/0xb0 > [ 187.066005] ufshcd_pci_resume+0x15/0x20 [ufshcd_pci] > [ 187.066017] pci_pm_thaw+0x4c/0x90 > [ 187.066030] dpm_run_callback+0x5b/0x150 > [ 187.066043] device_resume+0x11b/0x220 > > Voltage regulators are optional, so functions must check they exist > before dereferencing. > > Note this issue is hidden if CONFIG_REGULATORS is not set, because the > offending code is optimised away. Applied to 5.3/scsi-fixes, thanks! -- Martin K. Petersen Oracle Linux Engineering
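The rule stated in the changelog, that optional regulators must be checked before they are dereferenced, can be sketched as below. struct demo_vreg, struct demo_vreg_info and the demo_* functions are hypothetical and deliberately simplified; they do not reproduce the ufshcd code, they only show a missing optional regulator being treated as "nothing to do" rather than as an error.

```c
#include <stddef.h>

/* Hypothetical, trimmed-down regulator descriptor. */
struct demo_vreg {
	const char *name;
	int enabled;
};

/* Hypothetical set of supplies; the q-rails may be absent (NULL). */
struct demo_vreg_info {
	struct demo_vreg *vcc;
	struct demo_vreg *vccq;		/* optional */
	struct demo_vreg *vccq2;	/* optional */
};

/* An absent optional regulator is not an error: return success without
 * touching the NULL pointer. */
static int demo_vreg_set_hpm(struct demo_vreg *vreg)
{
	if (!vreg)
		return 0;
	vreg->enabled = 1;
	return 0;
}

/* Apply the high-power mode to every supply, stopping on the first error. */
static int demo_set_all_hpm(struct demo_vreg_info *info)
{
	int ret = demo_vreg_set_hpm(info->vcc);

	if (!ret)
		ret = demo_vreg_set_hpm(info->vccq);
	if (!ret)
		ret = demo_vreg_set_hpm(info->vccq2);
	return ret;
}
```

Putting the check inside the per-regulator helper keeps every caller safe, which also covers paths like resume that run long after probe.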
Re: [PATCH v2] psi: get poll_work to run when calling poll syscall next time
Hello, This patch has been waiting for a couple of days without any response. Any comments and suggestions on this patch V2 would be appreciated. Thanks, Jason On 2019/7/30 1:16 PM, Jason Xing wrote: Only the first time the poll syscall is called can the user receive POLLPRI correctly. After that, the user always fails to receive the event signal. Steps to reproduce: 1. Get the monitor code in Documentation/accounting/psi.txt 2. Run it, and wait for the event to be triggered. 3. Kill and restart the process. If the user doesn't kill the monitor process, poll_work seems to work fine. After the monitor is killed and restarted, however, poll_work in the kernel never runs again because poll_scheduled is left with the wrong value. Therefore, we should reset the value, as group_init() does, after the last trigger is destroyed. [PATCH V2] In patch v2, I moved the atomic_set(>poll_scheduled, 0); into the right place. Here I quote Johannes, whose explanation is the clearest: "The question is why we can end up with poll_scheduled = 1 but the work not running (which would reset it to 0). And the answer is because the scheduling side sees group->poll_kworker under RCU protection and then schedules it, but here we cancel the work and destroy the worker. The cancel needs to pair with resetting the poll_scheduled flag." Signed-off-by: Jason Xing Reviewed-by: Caspar Zhang Reviewed-by: Joseph Qi Reviewed-by: Suren Baghdasaryan Acked-by: Johannes Weiner --- kernel/sched/psi.c | 7 +++ 1 file changed, 7 insertions(+) diff --git a/kernel/sched/psi.c b/kernel/sched/psi.c index 7acc632..acdada0 100644 --- a/kernel/sched/psi.c +++ b/kernel/sched/psi.c @@ -1131,7 +1131,14 @@ static void psi_trigger_destroy(struct kref *ref) * deadlock while waiting for psi_poll_work to acquire trigger_lock */ if (kworker_to_destroy) { + /* +* After the RCU grace period has expired, the worker +* can no longer be found through group->poll_kworker. +* But it might have been already scheduled before +* that - deschedule it cleanly before destroying it.
+*/ kthread_cancel_delayed_work_sync(>poll_work); + atomic_set(>poll_scheduled, 0); kthread_destroy_worker(kworker_to_destroy); } kfree(t);
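The failure Johannes describes can be modelled in a few lines of C11. This is a single-threaded sketch under stated assumptions: struct demo_group and the demo_* functions are hypothetical, and a compare-and-exchange on poll_scheduled stands in for the scheduling side's "only queue if not already queued" check. It shows why cancelling the pending work without resetting the flag turns every later scheduling attempt into a no-op.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model of the per-group polling state. */
struct demo_group {
	atomic_int poll_scheduled;	/* 1 while a poll work item is pending */
	int queued_works;		/* how many times work was queued */
};

/* Scheduling side: queue the work only if nothing is pending yet. */
static bool demo_schedule_poll(struct demo_group *g)
{
	int expected = 0;

	if (!atomic_compare_exchange_strong(&g->poll_scheduled, &expected, 1))
		return false;	/* already pending; the worker will clear it */
	g->queued_works++;
	return true;
}

/* Worker side: clears the flag when the queued work actually runs. */
static void demo_poll_work(struct demo_group *g)
{
	atomic_store(&g->poll_scheduled, 0);
}

/* Destroy side: once the pending work is cancelled, the worker will never
 * clear the flag, so the cancel must be paired with an explicit reset. */
static void demo_trigger_destroy(struct demo_group *g)
{
	/* (cancelling the pending work would happen here) */
	atomic_store(&g->poll_scheduled, 0);
}
```

Dropping the atomic_store() in demo_trigger_destroy() reproduces the reported symptom: after a restart, demo_schedule_poll() keeps returning false and no work is ever queued again.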
Re: [PATCH v2 2/2] riscv: Make __fstate_clean() work correctly.
On Wed, 14 Aug 2019 15:17:18 PDT (-0700), sch...@linux-m68k.org wrote: On Aug 14 2019, Palmer Dabbelt wrote: On Wed, 14 Aug 2019 13:32:50 PDT (-0700), Paul Walmsley wrote: On Wed, 14 Aug 2019, Vincent Chen wrote: Make the __fstate_clean() function correctly set the state of sstatus.FS in pt_regs to SR_FS_CLEAN. Fixes: 7db91e5 ("RISC-V: Task implementation") Cc: linux-stable Signed-off-by: Vincent Chen Reviewed-by: Anup Patel Reviewed-by: Christoph Hellwig Thanks, I extended the "Fixes" commit ID to 12 digits, as is the usual practice here, and have queued the following for v5.3-rc. For reference, something like "git config core.abbrev=12" (or whatever you write to get this in your .gitconfig) https://github.com/palmer-dabbelt/home/blob/master/.gitconfig.in#L23 causes git to do the right thing. Actually, the right setting is core.abbrev=auto (or leaving it unset). It lets git choose the appropriate length depending on the repository contents. For the linux repository it will choose 13 right now. Awesome, thanks! I've updated my config :)
Re: [PATCH 5.2 000/144] 5.2.9-stable review
On Wed, 14 Aug 2019 at 22:33, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 5.2.9 release. > There are 144 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri 16 Aug 2019 04:55:34 PM UTC. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v5.x/stable-review/patch-5.2.9-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-5.2.y > and the diffstat can be found below. > > thanks, > > greg k-h > Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 5.2.9-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-5.2.y git commit: 2440e485aeda5f36eaf2050eb1bb61be46275b39 git describe: v5.2.8-145-g2440e485aeda Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.2-oe/build/v5.2.8-145-g2440e485aeda No regressions (compared to build v5.2.8) No fixes (compared to build v5.2.8) Ran 22959 total tests in the following environments and test suites. 
Environments -- - dragonboard-410c - hi6220-hikey - i386 - juno-r2 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - x86 Test Suites --- * build * install-android-platform-tools-r2600 * kselftest * libgpiod * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-securebits-tests * ltp-timers-tests * spectre-meltdown-checker-test * ltp-sched-tests * ltp-syscalls-tests * perf * v4l2-compliance * ltp-fs-tests * ltp-open-posix-tests * network-basic-tests * kvm-unit-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none -- Linaro LKFT https://lkft.linaro.org
[PATCH 3/3] libnvdimm/security: Consolidate 'security' operations
The security operations are exported from libnvdimm/security.c to libnvdimm/dimm_devs.c, and libnvdimm/security.c is optionally compiled based on the CONFIG_NVDIMM_KEYS config symbol. Rather than export the operations across compile objects, just move the __security_store() entry point to live with the helpers. Cc: Dave Jiang Signed-off-by: Dan Williams --- drivers/nvdimm/dimm_devs.c | 84 - drivers/nvdimm/nd-core.h | 30 +-- drivers/nvdimm/security.c | 90 ++-- 3 files changed, 90 insertions(+), 114 deletions(-) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index d837cb9be83d..196aa44c4936 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -393,88 +393,6 @@ static ssize_t frozen_show(struct device *dev, } static DEVICE_ATTR_RO(frozen); -#define OPS\ - C( OP_FREEZE, "freeze", 1), \ - C( OP_DISABLE, "disable", 2), \ - C( OP_UPDATE, "update", 3), \ - C( OP_ERASE,"erase",2), \ - C( OP_OVERWRITE,"overwrite",2), \ - C( OP_MASTER_UPDATE,"master_update",3), \ - C( OP_MASTER_ERASE, "master_erase", 2) -#undef C -#define C(a, b, c) a -enum nvdimmsec_op_ids { OPS }; -#undef C -#define C(a, b, c) { b, c } -static struct { - const char *name; - int args; -} ops[] = { OPS }; -#undef C - -#define SEC_CMD_SIZE 32 -#define KEY_ID_SIZE 10 - -static ssize_t __security_store(struct device *dev, const char *buf, size_t len) -{ - struct nvdimm *nvdimm = to_nvdimm(dev); - ssize_t rc; - char cmd[SEC_CMD_SIZE+1], keystr[KEY_ID_SIZE+1], - nkeystr[KEY_ID_SIZE+1]; - unsigned int key, newkey; - int i; - - rc = sscanf(buf, "%"__stringify(SEC_CMD_SIZE)"s" - " %"__stringify(KEY_ID_SIZE)"s" - " %"__stringify(KEY_ID_SIZE)"s", - cmd, keystr, nkeystr); - if (rc < 1) - return -EINVAL; - for (i = 0; i < ARRAY_SIZE(ops); i++) - if (sysfs_streq(cmd, ops[i].name)) - break; - if (i >= ARRAY_SIZE(ops)) - return -EINVAL; - if (ops[i].args > 1) - rc = kstrtouint(keystr, 0, ); - if (rc >= 0 && ops[i].args > 2) - rc = kstrtouint(nkeystr, 0, ); - if (rc < 0) - return 
rc; - - if (i == OP_FREEZE) { - dev_dbg(dev, "freeze\n"); - rc = nvdimm_security_freeze(nvdimm); - } else if (i == OP_DISABLE) { - dev_dbg(dev, "disable %u\n", key); - rc = nvdimm_security_disable(nvdimm, key); - } else if (i == OP_UPDATE || i == OP_MASTER_UPDATE) { - dev_dbg(dev, "%s %u %u\n", ops[i].name, key, newkey); - rc = nvdimm_security_update(nvdimm, key, newkey, i == OP_UPDATE - ? NVDIMM_USER : NVDIMM_MASTER); - } else if (i == OP_ERASE || i == OP_MASTER_ERASE) { - dev_dbg(dev, "%s %u\n", ops[i].name, key); - if (atomic_read(>busy)) { - dev_dbg(dev, "Unable to secure erase while DIMM active.\n"); - return -EBUSY; - } - rc = nvdimm_security_erase(nvdimm, key, i == OP_ERASE - ? NVDIMM_USER : NVDIMM_MASTER); - } else if (i == OP_OVERWRITE) { - dev_dbg(dev, "overwrite %u\n", key); - if (atomic_read(>busy)) { - dev_dbg(dev, "Unable to overwrite while DIMM active.\n"); - return -EBUSY; - } - rc = nvdimm_security_overwrite(nvdimm, key); - } else - return -EINVAL; - - if (rc == 0) - rc = len; - return rc; -} - static ssize_t security_store(struct device *dev, struct device_attribute *attr, const char *buf, size_t len) @@ -489,7 +407,7 @@ static ssize_t security_store(struct device *dev, nd_device_lock(dev); nvdimm_bus_lock(dev); wait_nvdimm_bus_probe_idle(dev); - rc = __security_store(dev, buf, len); + rc = nvdimm_security_store(dev, buf, len); nvdimm_bus_unlock(dev); nd_device_unlock(dev); diff --git a/drivers/nvdimm/nd-core.h b/drivers/nvdimm/nd-core.h index da2bbfd56d9f..454454ba1738 100644 --- a/drivers/nvdimm/nd-core.h +++ b/drivers/nvdimm/nd-core.h @@ -68,35 +68,11 @@ static inline unsigned long nvdimm_security_flags( } int nvdimm_security_freeze(struct nvdimm *nvdimm); #if IS_ENABLED(CONFIG_NVDIMM_KEYS) -int nvdimm_security_disable(struct nvdimm *nvdimm, unsigned int keyid); -int nvdimm_security_update(struct nvdimm
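The OPS list in the moved code is an "X-macro": a single list is expanded twice, once with C() producing enum ids and once producing { name, args } table entries, so the ids and the table cannot drift apart. A reduced user-space sketch of the same technique follows; the DEMO_OPS list, demo_lookup_op(), and the use of strcmp() in place of the kernel's sysfs_streq() (which also tolerates a trailing newline) are assumptions for illustration.

```c
#include <stddef.h>
#include <string.h>

/* One list of operations: enum id, command string, expected word count. */
#define DEMO_OPS \
	C(OP_FREEZE,  "freeze",  1), \
	C(OP_DISABLE, "disable", 2), \
	C(OP_UPDATE,  "update",  3), \
	C(OP_ERASE,   "erase",   2)

/* First expansion: the enum of operation ids. */
#define C(a, b, c) a
enum demo_op_ids { DEMO_OPS, OP_COUNT };
#undef C

/* Second expansion: a table indexed by those same ids. */
#define C(a, b, c) { b, c }
static const struct {
	const char *name;
	int args;
} demo_ops[] = { DEMO_OPS };
#undef C

/* Return the op id matching 'cmd', or -1 if the command is unknown. */
static int demo_lookup_op(const char *cmd)
{
	size_t i;

	for (i = 0; i < sizeof(demo_ops) / sizeof(demo_ops[0]); i++)
		if (strcmp(cmd, demo_ops[i].name) == 0)
			return (int)i;
	return -1;
}
```

Because the table index equals the enum value, the handler can compare the looked-up index directly against OP_FREEZE, OP_UPDATE and so on, as __security_store() does.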
[PATCH 1/3] libnvdimm/security: Introduce a 'frozen' attribute
In the process of debugging a system with an NVDIMM that was failing to unlock, it was found that the kernel was reporting 'locked' while the DIMM security interface was 'frozen'. Unfortunately the security state is tracked internally as an enum which prevents it from communicating the difference between 'locked' and 'locked + frozen'. It follows that the enum also prevents the kernel from communicating 'unlocked + frozen' which would be useful for debugging why security operations like 'change passphrase' are disabled. Ditch the security state enum for a set of flags and introduce a new sysfs attribute explicitly for the 'frozen' state. The regression risk is low because the 'frozen' state was already blocked behind the 'locked' state, but this will need to be revisited if there are cases where applications need 'frozen' to show up in the primary 'security' attribute. The expectation is that communicating 'frozen' is mostly a helper for debug and status monitoring. Cc: Dave Jiang Reported-by: Jeff Moyer Signed-off-by: Dan Williams --- drivers/acpi/nfit/intel.c| 65 ++--- drivers/nvdimm/bus.c |2 - drivers/nvdimm/dimm_devs.c | 59 +-- drivers/nvdimm/nd-core.h | 21 ++-- drivers/nvdimm/security.c| 99 ++ include/linux/libnvdimm.h|9 ++- tools/testing/nvdimm/dimm_devs.c | 19 ++- 7 files changed, 146 insertions(+), 128 deletions(-) diff --git a/drivers/acpi/nfit/intel.c b/drivers/acpi/nfit/intel.c index cddd0fcf622c..2c51ca4155dc 100644 --- a/drivers/acpi/nfit/intel.c +++ b/drivers/acpi/nfit/intel.c @@ -7,10 +7,11 @@ #include "intel.h" #include "nfit.h" -static enum nvdimm_security_state intel_security_state(struct nvdimm *nvdimm, +static unsigned long intel_security_flags(struct nvdimm *nvdimm, enum nvdimm_passphrase_type ptype) { struct nfit_mem *nfit_mem = nvdimm_provider_data(nvdimm); + unsigned long security_flags = 0; struct { struct nd_cmd_pkg pkg; struct nd_intel_get_security_state cmd; @@ -27,46 +28,54 @@ static enum nvdimm_security_state intel_security_state(struct nvdimm
*nvdimm, int rc; if (!test_bit(NVDIMM_INTEL_GET_SECURITY_STATE, _mem->dsm_mask)) - return -ENXIO; + return 0; /* * Short circuit the state retrieval while we are doing overwrite. * The DSM spec states that the security state is indeterminate * until the overwrite DSM completes. */ - if (nvdimm_in_overwrite(nvdimm) && ptype == NVDIMM_USER) - return NVDIMM_SECURITY_OVERWRITE; + if (nvdimm_in_overwrite(nvdimm) && ptype == NVDIMM_USER) { + set_bit(NVDIMM_SECURITY_OVERWRITE, _flags); + return security_flags; + } rc = nvdimm_ctl(nvdimm, ND_CMD_CALL, _cmd, sizeof(nd_cmd), NULL); - if (rc < 0) - return rc; - if (nd_cmd.cmd.status) - return -EIO; + if (rc < 0 || nd_cmd.cmd.status) { + pr_err("%s: security state retrieval failed (%d:%#x)\n", + nvdimm_name(nvdimm), rc, nd_cmd.cmd.status); + return 0; + } /* check and see if security is enabled and locked */ if (ptype == NVDIMM_MASTER) { if (nd_cmd.cmd.extended_state & ND_INTEL_SEC_ESTATE_ENABLED) - return NVDIMM_SECURITY_UNLOCKED; - else if (nd_cmd.cmd.extended_state & - ND_INTEL_SEC_ESTATE_PLIMIT) - return NVDIMM_SECURITY_FROZEN; - } else { - if (nd_cmd.cmd.state & ND_INTEL_SEC_STATE_UNSUPPORTED) - return -ENXIO; - else if (nd_cmd.cmd.state & ND_INTEL_SEC_STATE_ENABLED) { - if (nd_cmd.cmd.state & ND_INTEL_SEC_STATE_LOCKED) - return NVDIMM_SECURITY_LOCKED; - else if (nd_cmd.cmd.state & ND_INTEL_SEC_STATE_FROZEN - || nd_cmd.cmd.state & - ND_INTEL_SEC_STATE_PLIMIT) - return NVDIMM_SECURITY_FROZEN; - else - return NVDIMM_SECURITY_UNLOCKED; - } + set_bit(NVDIMM_SECURITY_UNLOCKED, _flags); + else + set_bit(NVDIMM_SECURITY_DISABLED, _flags); + if (nd_cmd.cmd.extended_state & ND_INTEL_SEC_ESTATE_PLIMIT) + set_bit(NVDIMM_SECURITY_FROZEN, _flags); + return security_flags; } - /* this should cover master security disabled as well */ - return NVDIMM_SECURITY_DISABLED; + + if (nd_cmd.cmd.state & ND_INTEL_SEC_STATE_UNSUPPORTED) + return 0; + + if
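The motivation for replacing the enum is that "locked" and "frozen" are independent conditions: an enum can report only one value, while a bitmask can report both at once. The sketch below illustrates this with hypothetical names; demo_set_bit() and demo_test_bit() are simplified, non-atomic stand-ins for the kernel's set_bit() and test_bit(), and the bit list is illustrative rather than the actual libnvdimm definitions.

```c
#include <stdbool.h>

/* Hypothetical bit numbers; as enum *values*, only one could be reported
 * at a time, but as bit positions they can be combined. */
enum demo_sec_bits {
	DEMO_SEC_DISABLED,
	DEMO_SEC_UNLOCKED,
	DEMO_SEC_LOCKED,
	DEMO_SEC_FROZEN,
	DEMO_SEC_OVERWRITE,
};

/* Simplified, non-atomic stand-ins for set_bit()/test_bit(). */
static void demo_set_bit(int nr, unsigned long *flags)
{
	*flags |= 1UL << nr;
}

static bool demo_test_bit(int nr, unsigned long flags)
{
	return (flags & (1UL << nr)) != 0;
}

/* Build the flags from two independent hardware conditions. */
static unsigned long demo_security_flags(bool locked, bool frozen)
{
	unsigned long flags = 0;

	demo_set_bit(locked ? DEMO_SEC_LOCKED : DEMO_SEC_UNLOCKED, &flags);
	if (frozen)
		demo_set_bit(DEMO_SEC_FROZEN, &flags);
	return flags;
}
```

A sysfs 'security' attribute can keep reporting the lock state from one bit while a separate 'frozen' attribute reports the other, which is the split this patch introduces.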
[PATCH 2/3] libnvdimm/security: Tighten scope of nvdimm->busy vs security operations
The blanket blocking of all security operations while the DIMM is in active use in a region is too restrictive. The only security operations that need to be aware of the ->busy state are those that mutate the state of data, i.e. erase and overwrite. Refactor the ->busy checks to be applied at the common entry point in __security_store() rather than in each of the helper routines. Cc: Dave Jiang Signed-off-by: Dan Williams --- drivers/nvdimm/dimm_devs.c | 33 - drivers/nvdimm/security.c | 10 -- 2 files changed, 16 insertions(+), 27 deletions(-) diff --git a/drivers/nvdimm/dimm_devs.c b/drivers/nvdimm/dimm_devs.c index 53330625fe07..d837cb9be83d 100644 --- a/drivers/nvdimm/dimm_devs.c +++ b/drivers/nvdimm/dimm_devs.c @@ -424,9 +424,6 @@ static ssize_t __security_store(struct device *dev, const char *buf, size_t len) unsigned int key, newkey; int i; - if (atomic_read(>busy)) - return -EBUSY; - rc = sscanf(buf, "%"__stringify(SEC_CMD_SIZE)"s" " %"__stringify(KEY_ID_SIZE)"s" " %"__stringify(KEY_ID_SIZE)"s", @@ -451,23 +448,25 @@ static ssize_t __security_store(struct device *dev, const char *buf, size_t len) } else if (i == OP_DISABLE) { dev_dbg(dev, "disable %u\n", key); rc = nvdimm_security_disable(nvdimm, key); - } else if (i == OP_UPDATE) { - dev_dbg(dev, "update %u %u\n", key, newkey); - rc = nvdimm_security_update(nvdimm, key, newkey, NVDIMM_USER); - } else if (i == OP_ERASE) { - dev_dbg(dev, "erase %u\n", key); - rc = nvdimm_security_erase(nvdimm, key, NVDIMM_USER); + } else if (i == OP_UPDATE || i == OP_MASTER_UPDATE) { + dev_dbg(dev, "%s %u %u\n", ops[i].name, key, newkey); + rc = nvdimm_security_update(nvdimm, key, newkey, i == OP_UPDATE + ? NVDIMM_USER : NVDIMM_MASTER); + } else if (i == OP_ERASE || i == OP_MASTER_ERASE) { + dev_dbg(dev, "%s %u\n", ops[i].name, key); + if (atomic_read(>busy)) { + dev_dbg(dev, "Unable to secure erase while DIMM active.\n"); + return -EBUSY; + } + rc = nvdimm_security_erase(nvdimm, key, i == OP_ERASE + ?
NVDIMM_USER : NVDIMM_MASTER); } else if (i == OP_OVERWRITE) { dev_dbg(dev, "overwrite %u\n", key); + if (atomic_read(>busy)) { + dev_dbg(dev, "Unable to overwrite while DIMM active.\n"); + return -EBUSY; + } rc = nvdimm_security_overwrite(nvdimm, key); - } else if (i == OP_MASTER_UPDATE) { - dev_dbg(dev, "master_update %u %u\n", key, newkey); - rc = nvdimm_security_update(nvdimm, key, newkey, - NVDIMM_MASTER); - } else if (i == OP_MASTER_ERASE) { - dev_dbg(dev, "master_erase %u\n", key); - rc = nvdimm_security_erase(nvdimm, key, - NVDIMM_MASTER); } else return -EINVAL; diff --git a/drivers/nvdimm/security.c b/drivers/nvdimm/security.c index 5862d0eee9db..2166e627383a 100644 --- a/drivers/nvdimm/security.c +++ b/drivers/nvdimm/security.c @@ -334,11 +334,6 @@ int nvdimm_security_erase(struct nvdimm *nvdimm, unsigned int keyid, || !nvdimm->sec.flags) return -EOPNOTSUPP; - if (atomic_read(>busy)) { - dev_dbg(dev, "Unable to secure erase while DIMM active.\n"); - return -EBUSY; - } - rc = check_security_state(nvdimm); if (rc) return rc; @@ -380,11 +375,6 @@ int nvdimm_security_overwrite(struct nvdimm *nvdimm, unsigned int keyid) || !nvdimm->sec.flags) return -EOPNOTSUPP; - if (atomic_read(>busy)) { - dev_dbg(dev, "Unable to overwrite while DIMM active.\n"); - return -EBUSY; - } - if (dev->driver == NULL) { dev_dbg(dev, "Unable to overwrite while DIMM active.\n"); return -EINVAL;
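After this patch the ->busy policy reduces to a small predicate: only operations that destroy data require an idle DIMM. The sketch below is hypothetical (demo_* names, and a plain int standing in for the atomic ->busy counter) and shows only which operations are gated: erase, master erase and overwrite get -EBUSY while the DIMM is in use, and everything else, including passphrase update and freeze, proceeds.

```c
#include <errno.h>
#include <stdbool.h>

/* Hypothetical operation ids mirroring the commands in __security_store(). */
enum demo_op {
	DEMO_FREEZE,
	DEMO_DISABLE,
	DEMO_UPDATE,
	DEMO_ERASE,
	DEMO_OVERWRITE,
	DEMO_MASTER_UPDATE,
	DEMO_MASTER_ERASE,
};

/* Only operations that destroy data need the DIMM to be idle. */
static bool demo_op_mutates_data(enum demo_op op)
{
	return op == DEMO_ERASE || op == DEMO_MASTER_ERASE ||
	       op == DEMO_OVERWRITE;
}

/* Returns 0 if the operation may proceed, -EBUSY if the DIMM is active. */
static int demo_check_busy(enum demo_op op, int busy)
{
	if (busy && demo_op_mutates_data(op))
		return -EBUSY;
	return 0;
}
```

Keeping the check at the single entry point means the helper routines no longer need to know about ->busy at all.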
[PATCH 0/3] libnvdimm/security: Enumerate the frozen state and other cleanups
Jeff reported a scenario where ndctl was failing to unlock DIMMs [1]. In the course of debugging it was discovered that the security interface on the DIMMs was in the 'frozen' state, disallowing unlock or any other security operation. Unfortunately the kernel only showed that the DIMMs were 'locked', not 'locked' and 'frozen'. Introduce a new sysfs 'frozen' attribute so that ndctl can reflect the "security-operations-allowed" state independently of the lock status. Then, follow up with cleanups related to replacing a security-state-enum with a set of flags. [1]: https://lists.01.org/pipermail/linux-nvdimm/2019-August/022856.html --- Dan Williams (3): libnvdimm/security: Introduce a 'frozen' attribute libnvdimm/security: Tighten scope of nvdimm->busy vs security operations libnvdimm/security: Consolidate 'security' operations drivers/acpi/nfit/intel.c| 65 +++- drivers/nvdimm/bus.c |2 drivers/nvdimm/dimm_devs.c | 134 ++ drivers/nvdimm/nd-core.h | 51 -- drivers/nvdimm/security.c| 199 +- include/linux/libnvdimm.h|9 +- tools/testing/nvdimm/dimm_devs.c | 19 +--- 7 files changed, 231 insertions(+), 248 deletions(-)
Re: [PATCH 4.19 00/91] 4.19.67-stable review
On Wed, 14 Aug 2019 at 22:38, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.19.67 release. > There are 91 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri 16 Aug 2019 04:55:34 PM UTC. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.19.67-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.19.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 4.19.67-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.19.y git commit: f777613d3df0e7226d30d0e0ba97e9419e3064f2 git describe: v4.19.66-92-gf777613d3df0 Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.66-92-gf777613d3df0 No regressions (compared to build v4.19.66) No fixes (compared to build v4.19.66) Ran 25307 total tests in the following environments and test suites. 
Environments -- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites --- * build * install-android-platform-tools-r2600 * kselftest * libgpiod * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-securebits-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * network-basic-tests * ltp-open-posix-tests * kvm-unit-tests * ssuite * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none -- Linaro LKFT https://lkft.linaro.org
Re: WARNING in cgroup_rstat_updated
syzbot has bisected this bug to: commit e9db4ef6bf4ca9894bb324c76e01b8f1a16b2650 Author: John Fastabend Date: Sat Jun 30 13:17:47 2018 + bpf: sockhash fix omitted bucket lock in sock_close bisection log: https://syzkaller.appspot.com/x/bisect.txt?x=143286e260 start commit: 31cc088a Merge tag 'drm-next-2019-07-19' of git://anongit... git tree: net-next final crash: https://syzkaller.appspot.com/x/report.txt?x=163286e260 console output: https://syzkaller.appspot.com/x/log.txt?x=123286e260 kernel config: https://syzkaller.appspot.com/x/.config?x=4dba67bf8b8c9ad7 dashboard link: https://syzkaller.appspot.com/bug?extid=370e4739fa489334a4ef syz repro: https://syzkaller.appspot.com/x/repro.syz?x=16dd57dc60 Reported-by: syzbot+370e4739fa489334a...@syzkaller.appspotmail.com Fixes: e9db4ef6bf4c ("bpf: sockhash fix omitted bucket lock in sock_close") For information about bisection process see: https://goo.gl/tpsmEJ#bisection
Re: [PATCH 4.14 00/69] 4.14.139-stable review
On Wed, 14 Aug 2019 at 22:42, Greg Kroah-Hartman wrote: > > This is the start of the stable review cycle for the 4.14.139 release. > There are 69 patches in this series, all will be posted as a response > to this one. If anyone has any issues with these being applied, please > let me know. > > Responses should be made by Fri 16 Aug 2019 04:55:34 PM UTC. > Anything received after that time might be too late. > > The whole patch series can be found in one patch at: > > https://www.kernel.org/pub/linux/kernel/v4.x/stable-review/patch-4.14.139-rc1.gz > or in the git tree and branch at: > > git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git > linux-4.14.y > and the diffstat can be found below. > > thanks, > > greg k-h Results from Linaro’s test farm. No regressions on arm64, arm, x86_64, and i386. Summary kernel: 4.14.139-rc1 git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git git branch: linux-4.14.y git commit: 736c2f07319a323c55007bcf8fca70481e9c7175 git describe: v4.14.138-70-g736c2f07319a Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.14-oe/build/v4.14.138-70-g736c2f07319a No regressions (compared to build v4.14.138) No fixes (compared to build v4.14.138) Ran 23727 total tests in the following environments and test suites. 
Environments -- - dragonboard-410c - arm64 - hi6220-hikey - arm64 - i386 - juno-r2 - arm64 - qemu_arm - qemu_arm64 - qemu_i386 - qemu_x86_64 - x15 - arm - x86_64 Test Suites --- * build * install-android-platform-tools-r2600 * kselftest * libhugetlbfs * ltp-cap_bounds-tests * ltp-commands-tests * ltp-containers-tests * ltp-cpuhotplug-tests * ltp-cve-tests * ltp-dio-tests * ltp-fcntl-locktests-tests * ltp-filecaps-tests * ltp-fs_bind-tests * ltp-fs_perms_simple-tests * ltp-fsx-tests * ltp-hugetlb-tests * ltp-io-tests * ltp-ipc-tests * ltp-math-tests * ltp-mm-tests * ltp-nptl-tests * ltp-pty-tests * ltp-sched-tests * ltp-syscalls-tests * ltp-timers-tests * perf * spectre-meltdown-checker-test * v4l2-compliance * ltp-fs-tests * ltp-securebits-tests * network-basic-tests * ltp-open-posix-tests * kvm-unit-tests * kselftest-vsyscall-mode-native * kselftest-vsyscall-mode-none * ssuite -- Linaro LKFT https://lkft.linaro.org
Re: [PATCH] sh: Drop -Werror from kernel Makefile
Guenter, On 8/13/19 8:18 AM, Guenter Roeck wrote: > > Please note that _mainline_ builds are currently broken. > This should be fixed now: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=41de59634046b19cd53a1983594a95135c656997 Thanks -- Gustavo
Re: [PATCH v8 05/14] media: rkisp1: add Rockchip ISP1 subdev driver
Hi Sakari, Thanks for your review. I just have some comments/questions below. On 8/8/19 6:14 AM, Sakari Ailus wrote: > Hi Helen, > > On Tue, Jul 30, 2019 at 03:42:47PM -0300, Helen Koike wrote: >> From: Jacob Chen >> >> Add the subdev driver for rockchip isp1. >> >> Signed-off-by: Jacob Chen >> Signed-off-by: Shunqian Zheng >> Signed-off-by: Yichong Zhong >> Signed-off-by: Jacob Chen >> Signed-off-by: Eddie Cai >> Signed-off-by: Jeffy Chen >> Signed-off-by: Allon Huang >> Signed-off-by: Tomasz Figa >> [fixed unknown entity type / switched to PIXEL_RATE] >> Signed-off-by: Ezequiel Garcia >> [update for upstream] >> Signed-off-by: Helen Koike >> >> --- >> >> Changes in v8: None >> Changes in v7: >> - fixed warning because of unknown entity type >> - fixed v4l2-compliance errors regarding rkisp1 formats, try formats >> and default values >> - fix typo riksp1/rkisp1 >> - redesign: remove mipi/csi subdevice, sensors connect directly to the >> isp subdevice in the media topology now. As a consequence, remove the >> hack in mipidphy_g_mbus_config() where information from the sensor was >> being propagated through the topology. 
>> - From the old dphy: >> * cache get_remote_sensor() in s_stream >> * use V4L2_CID_PIXEL_RATE instead of V4L2_CID_LINK_FREQ >> - Replace stream state with a boolean >> - code styling and checkpatch fixes >> - fix stop_stream (return after calling stop, do not reenable the stream) >> - fix rkisp1_isp_sd_get_selection when V4L2_SUBDEV_FORMAT_TRY is set >> - fix get format in output (isp_sd->out_fmt.mbus_code was being ignored) >> - s/intput/input >> - remove #define sd_to_isp_sd(_sd), add a static inline as it will be >> reused by the capture >> >> drivers/media/platform/rockchip/isp1/rkisp1.c | 1286 + >> drivers/media/platform/rockchip/isp1/rkisp1.h | 111 ++ >> 2 files changed, 1397 insertions(+) >> create mode 100644 drivers/media/platform/rockchip/isp1/rkisp1.c >> create mode 100644 drivers/media/platform/rockchip/isp1/rkisp1.h >> >> diff --git a/drivers/media/platform/rockchip/isp1/rkisp1.c >> b/drivers/media/platform/rockchip/isp1/rkisp1.c >> new file mode 100644 >> index ..6d0c0ffb5e03 >> --- /dev/null >> +++ b/drivers/media/platform/rockchip/isp1/rkisp1.c >> @@ -0,0 +1,1286 @@ >> +// SPDX-License-Identifier: (GPL-2.0+ OR MIT) >> +/* >> + * Rockchip isp1 driver >> + * >> + * Copyright (C) 2017 Rockchip Electronics Co., Ltd. >> + */ >> + >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> +#include >> + >> +#include "common.h" >> +#include "regs.h" >> + >> +#define CIF_ISP_INPUT_W_MAX 4032 >> +#define CIF_ISP_INPUT_H_MAX 3024 >> +#define CIF_ISP_INPUT_W_MIN 32 >> +#define CIF_ISP_INPUT_H_MIN 32 >> +#define CIF_ISP_OUTPUT_W_MAX CIF_ISP_INPUT_W_MAX >> +#define CIF_ISP_OUTPUT_H_MAX CIF_ISP_INPUT_H_MAX >> +#define CIF_ISP_OUTPUT_W_MIN CIF_ISP_INPUT_W_MIN >> +#define CIF_ISP_OUTPUT_H_MIN CIF_ISP_INPUT_H_MIN >> + >> +/* >> + * NOTE: MIPI controller and input MUX are also configured in this file, >> + * because ISP Subdev is not only describe ISP submodule(input size,format, >> + * output size, format), but also a virtual route device.
>> + */ >> + >> +/* >> + * There are many variables named with format/frame in below code, >> + * please see here for their meaning. >> + * >> + * Cropping regions of ISP >> + * >> + * +-+ >> + * | Sensor image| >> + * | +---+ | >> + * | | ISP_ACQ (for black level) | | >> + * | | in_frm| | >> + * | | ++| | >> + * | | |ISP_OUT || | >> + * | | |in_crop || | >> + * | | |+-+ || | >> + * | | || ISP_IS| || | >> + * | | || rkisp1_isp_subdev: out_crop | || | >> + * | | |+-+ || | >> + * | | ++| | >> + * | +---+ | >> + * +-+ >> + */ >> + >> +static inline struct rkisp1_device *sd_to_isp_dev(struct v4l2_subdev *sd) >> +{ >> +return container_of(sd->v4l2_dev, struct rkisp1_device, v4l2_dev); >> +} >> + >> +/* Get sensor by enabled media link */ >> +static struct v4l2_subdev *get_remote_sensor(struct v4l2_subdev *sd) >> +{ >> +struct media_pad *local, *remote; >> +struct media_entity *sensor_me; >> + >> +local = &sd->entity.pads[RKISP1_ISP_PAD_SINK]; >> +remote = media_entity_remote_pad(local); >> +if (!remote) {
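The CIF_ISP_INPUT_* limits quoted above bound what the ISP sink pad can accept. A minimal sketch of how a try-format path might clamp a requested size to those limits follows; `struct isp_size`, `clamp_u32()` and `isp_try_sink_size()` are hypothetical names for illustration, not code from the driver (in-kernel code would normally use the `clamp()` / `clamp_t()` helpers instead).

```c
#include <stdint.h>

/* Limits as defined in the quoted rkisp1.c. */
#define CIF_ISP_INPUT_W_MAX 4032
#define CIF_ISP_INPUT_H_MAX 3024
#define CIF_ISP_INPUT_W_MIN 32
#define CIF_ISP_INPUT_H_MIN 32

/* Hypothetical stand-in for the width/height fields of a media bus format. */
struct isp_size {
	uint32_t width;
	uint32_t height;
};

/* Clamp v into [lo, hi]; kernel code would use clamp() or clamp_t(). */
static uint32_t clamp_u32(uint32_t v, uint32_t lo, uint32_t hi)
{
	if (v < lo)
		return lo;
	if (v > hi)
		return hi;
	return v;
}

/* Adjust a requested sink size in place, as a try-format path might. */
static void isp_try_sink_size(struct isp_size *sz)
{
	sz->width = clamp_u32(sz->width, CIF_ISP_INPUT_W_MIN, CIF_ISP_INPUT_W_MAX);
	sz->height = clamp_u32(sz->height, CIF_ISP_INPUT_H_MIN, CIF_ISP_INPUT_H_MAX);
}
```

For example, an 8000x16 request would come back as 4032x32, while 1920x1080 is left untouched.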
RE: [PATCH] scsi: fnic: remove redundant assignment of variable rc
Acked-by: Karan Tilak Kumar -----Original Message----- From: Colin King Sent: Tuesday, August 13, 2019 6:24 AM To: Satish Kharat (satishkh) ; Sesidhar Baddela (sebaddel) ; Karan Tilak Kumar (kartilak) ; James E . J . Bottomley ; Martin K . Petersen ; linux-s...@vger.kernel.org Cc: kernel-janit...@vger.kernel.org; linux-kernel@vger.kernel.org Subject: [PATCH] scsi: fnic: remove redundant assignment of variable rc From: Colin Ian King Variable rc is initialized to a value that is never read; it is later reassigned and then immediately returned. Clean up the code by removing rc and just returning 0. Addresses-Coverity: ("Unused value") Signed-off-by: Colin Ian King --- drivers/scsi/fnic/fnic_debugfs.c | 4 +--- 1 file changed, 1 insertion(+), 3 deletions(-) diff --git a/drivers/scsi/fnic/fnic_debugfs.c b/drivers/scsi/fnic/fnic_debugfs.c index 21991c99db7c..13f7d88d6e57 100644 --- a/drivers/scsi/fnic/fnic_debugfs.c +++ b/drivers/scsi/fnic/fnic_debugfs.c @@ -52,7 +52,6 @@ static struct fc_trace_flag_type *fc_trc_flag; */ int fnic_debugfs_init(void) { - int rc = -1; fnic_trace_debugfs_root = debugfs_create_dir("fnic", NULL); fnic_stats_debugfs_root = debugfs_create_dir("statistics", @@ -70,8 +69,7 @@ int fnic_debugfs_init(void) fc_trc_flag->fc_clear = 4; } - rc = 0; - return rc; + return 0; } /* -- 2.20.1
[PATCH v10 5/7] powerpc/memcpy: Add memcpy_mcsafe for pmem
From: Balbir Singh The pmem layer uses memcpy_mcsafe() so that a machine check exception encountered during the copy is turned into a failure return value instead of a panic. The return value is the number of bytes remaining to be copied. This patch largely borrows from the copyuser_power7 logic but omits the VMX optimizations to keep the patch simple. If needed, those optimizations can be folded in later. Signed-off-by: Balbir Singh [ar...@linux.ibm.com: Added symbol export] Co-developed-by: Santosh Sivaraj Signed-off-by: Santosh Sivaraj --- arch/powerpc/include/asm/string.h | 2 + arch/powerpc/lib/Makefile | 2 +- arch/powerpc/lib/memcpy_mcsafe_64.S | 242 3 files changed, 245 insertions(+), 1 deletion(-) create mode 100644 arch/powerpc/lib/memcpy_mcsafe_64.S diff --git a/arch/powerpc/include/asm/string.h b/arch/powerpc/include/asm/string.h index 9bf6dffb4090..b72692702f35 100644 --- a/arch/powerpc/include/asm/string.h +++ b/arch/powerpc/include/asm/string.h @@ -53,7 +53,9 @@ void *__memmove(void *to, const void *from, __kernel_size_t n); #ifndef CONFIG_KASAN #define __HAVE_ARCH_MEMSET32 #define __HAVE_ARCH_MEMSET64 +#define __HAVE_ARCH_MEMCPY_MCSAFE +extern int memcpy_mcsafe(void *dst, const void *src, __kernel_size_t sz); extern void *__memset16(uint16_t *, uint16_t v, __kernel_size_t); extern void *__memset32(uint32_t *, uint32_t v, __kernel_size_t); extern void *__memset64(uint64_t *, uint64_t v, __kernel_size_t); diff --git a/arch/powerpc/lib/Makefile b/arch/powerpc/lib/Makefile index eebc782d89a5..fa6b1b657b43 100644 --- a/arch/powerpc/lib/Makefile +++ b/arch/powerpc/lib/Makefile @@ -39,7 +39,7 @@ obj-$(CONFIG_PPC_BOOK3S_64) += copyuser_power7.o copypage_power7.o \ memcpy_power7.o obj64-y += copypage_64.o copyuser_64.o mem_64.o hweight_64.o \ - memcpy_64.o pmem.o + memcpy_64.o pmem.o memcpy_mcsafe_64.o obj64-$(CONFIG_SMP) += locks.o obj64-$(CONFIG_ALTIVEC) += vmx-helper.o diff --git
a/arch/powerpc/lib/memcpy_mcsafe_64.S b/arch/powerpc/lib/memcpy_mcsafe_64.S new file mode 100644 index ..949976dc115d --- /dev/null +++ b/arch/powerpc/lib/memcpy_mcsafe_64.S @@ -0,0 +1,242 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +/* + * Copyright (C) IBM Corporation, 2011 + * Derived from copyuser_power7.s by Anton Blanchard + * Author - Balbir Singh + */ +#include +#include +#include + + .macro err1 +100: + EX_TABLE(100b,.Ldo_err1) + .endm + + .macro err2 +200: + EX_TABLE(200b,.Ldo_err2) + .endm + + .macro err3 +300: EX_TABLE(300b,.Ldone) + .endm + +.Ldo_err2: + ld r22,STK_REG(R22)(r1) + ld r21,STK_REG(R21)(r1) + ld r20,STK_REG(R20)(r1) + ld r19,STK_REG(R19)(r1) + ld r18,STK_REG(R18)(r1) + ld r17,STK_REG(R17)(r1) + ld r16,STK_REG(R16)(r1) + ld r15,STK_REG(R15)(r1) + ld r14,STK_REG(R14)(r1) + addi r1,r1,STACKFRAMESIZE +.Ldo_err1: + /* Do a byte by byte copy to get the exact remaining size */ + mtctr r7 +46: +err3; lbz r0,0(r4) + addi r4,r4,1 +err3; stb r0,0(r3) + addi r3,r3,1 + bdnz 46b + li r3,0 + blr + +.Ldone: + mfctr r3 + blr + + +_GLOBAL(memcpy_mcsafe) + mr r7,r5 + cmpldi r5,16 + blt .Lshort_copy + +.Lcopy: + /* Get the source 8B aligned */ + neg r6,r4 + mtocrf 0x01,r6 + clrldi r6,r6,(64-3) + + bf cr7*4+3,1f +err1; lbz r0,0(r4) + addi r4,r4,1 +err1; stb r0,0(r3) + addi r3,r3,1 + subi r7,r7,1 + +1: bf cr7*4+2,2f +err1; lhz r0,0(r4) + addi r4,r4,2 +err1; sth r0,0(r3) + addi r3,r3,2 + subi r7,r7,2 + +2: bf cr7*4+1,3f +err1; lwz r0,0(r4) + addi r4,r4,4 +err1; stw r0,0(r3) + addi r3,r3,4 + subi r7,r7,4 + +3: sub r5,r5,r6 + cmpldi r5,128 + blt 5f + + mflr r0 + stdu r1,-STACKFRAMESIZE(r1) + std r14,STK_REG(R14)(r1) + std r15,STK_REG(R15)(r1) + std r16,STK_REG(R16)(r1) + std r17,STK_REG(R17)(r1) + std r18,STK_REG(R18)(r1) + std r19,STK_REG(R19)(r1) + std r20,STK_REG(R20)(r1) + std r21,STK_REG(R21)(r1) + std r22,STK_REG(R22)(r1) + std r0,STACKFRAMESIZE+16(r1) + + srdi r6,r5,7 + mtctr r6 + + /* Now do cacheline (128B) sized loads and stores.
*/ + .align 5 +4: +err2; ld r0,0(r4) +err2; ld r6,8(r4) +err2; ld r8,16(r4) +err2; ld r9,24(r4) +err2; ld r10,32(r4) +err2; ld r11,40(r4) +err2; ld r12,48(r4) +err2; ld r14,56(r4) +err2; ld r15,64(r4) +err2; ld
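The .Ldo_err1/.Ldo_err2 paths above fall back to a byte-by-byte copy so that the returned count of uncopied bytes is exact. A user-space model of that strategy follows, assuming a simulated fault: `struct fake_src`, its `poison` offset and `mcsafe_copy_sketch()` are hypothetical, and the "machine check" is just an offset comparison rather than a real exception-table fixup.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical fault model: any source byte at offset >= poison "faults". */
struct fake_src {
	const unsigned char *base;
	size_t poison;
};

static int load_byte(const struct fake_src *s, size_t off, unsigned char *out)
{
	if (off >= s->poison)
		return -1; /* simulated machine check on this load */
	*out = s->base[off];
	return 0;
}

/*
 * Copy n bytes in 8-byte blocks. A block is fully loaded before it is
 * stored, so a fault leaves off at the start of that block; the slow path
 * then retries byte by byte to find exactly where the copy stopped.
 * Returns 0 on success, else the number of bytes not copied.
 */
static size_t mcsafe_copy_sketch(unsigned char *dst, const struct fake_src *src,
				 size_t n)
{
	size_t off = 0;

	while (n - off >= 8) {
		unsigned char tmp[8];
		size_t i;

		for (i = 0; i < 8; i++)
			if (load_byte(src, off + i, &tmp[i]))
				goto slow;
		memcpy(dst + off, tmp, 8);
		off += 8;
	}
slow:
	for (; off < n; off++) {
		unsigned char c;

		if (load_byte(src, off, &c))
			return n - off; /* exact remaining count */
		dst[off] = c;
	}
	return 0;
}
```

With a 20-byte copy and the first bad byte at offset 13, the first block succeeds, the second faults, and the byte-wise retry copies offsets 8..12 before reporting 7 bytes remaining.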
[PATCH v10 6/7] powerpc/mce: Handle UE event for memcpy_mcsafe
From: Balbir Singh If we take a UE on one of the instructions with a fixup entry, set nip to continue execution at the fixup entry. Stop processing the event further and do not print it. Co-developed-by: Reza Arbab Signed-off-by: Reza Arbab Signed-off-by: Balbir Singh Signed-off-by: Santosh Sivaraj Reviewed-by: Mahesh Salgaonkar --- arch/powerpc/include/asm/mce.h | 4 +++- arch/powerpc/kernel/mce.c | 16 arch/powerpc/kernel/mce_power.c | 15 +-- 3 files changed, 32 insertions(+), 3 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index f3a6036b6bc0..e1931c8c2743 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -122,7 +122,8 @@ struct machine_check_event { enum MCE_UeErrorType ue_error_type:8; u8 effective_address_provided; u8 physical_address_provided; - u8 reserved_1[5]; + u8 ignore_event; + u8 reserved_1[4]; u64 effective_address; u64 physical_address; u8 reserved_2[8]; @@ -193,6 +194,7 @@ struct mce_error_info { enum MCE_Initiator initiator:8; enum MCE_ErrorClass error_class:8; bool sync_error; + bool ignore_event; }; #define MAX_MC_EVT 100 diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index a3b122a685a5..ec4b3e1087be 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -149,6 +149,7 @@ void save_mce_event(struct pt_regs *regs, long handled, if (phys_addr != ULONG_MAX) { mce->u.ue_error.physical_address_provided = true; mce->u.ue_error.physical_address = phys_addr; + mce->u.ue_error.ignore_event = mce_err->ignore_event; machine_check_ue_event(mce); } } @@ -266,8 +267,17 @@ static void machine_process_ue_event(struct work_struct *work) /* * This should probably queued elsewhere, but * oh! well +* +* Don't report this machine check because the caller has +* asked us to ignore the event; it has a fixup handler which +* will do the appropriate error handling and reporting.
*/ if (evt->error_type == MCE_ERROR_TYPE_UE) { + if (evt->u.ue_error.ignore_event) { + __this_cpu_dec(mce_ue_count); + continue; + } + if (evt->u.ue_error.physical_address_provided) { unsigned long pfn; @@ -301,6 +311,12 @@ static void machine_check_process_queued_event(struct irq_work *work) while (__this_cpu_read(mce_queue_count) > 0) { index = __this_cpu_read(mce_queue_count) - 1; evt = this_cpu_ptr(&mce_event_queue[index]); + + if (evt->error_type == MCE_ERROR_TYPE_UE && + evt->u.ue_error.ignore_event) { + __this_cpu_dec(mce_queue_count); + continue; + } machine_check_print_event_info(evt, false, false); __this_cpu_dec(mce_queue_count); } diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index e74816f045f8..1dd87f6f5186 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -11,6 +11,7 @@ #include #include +#include #include #include #include @@ -18,6 +19,7 @@ #include #include #include +#include /* * Convert an address related to an mm to a physical address. @@ -559,9 +561,18 @@ static int mce_handle_derror(struct pt_regs *regs, return 0; } -static long mce_handle_ue_error(struct pt_regs *regs) +static long mce_handle_ue_error(struct pt_regs *regs, + struct mce_error_info *mce_err) { long handled = 0; + const struct exception_table_entry *entry; + + entry = search_kernel_exception_table(regs->nip); + if (entry) { + mce_err->ignore_event = true; + regs->nip = extable_fixup(entry); + return 1; + } /* * On specific SCOM read via MMIO we may get a machine check @@ -594,7 +605,7 @@ static long mce_handle_error(struct pt_regs *regs, &phys_addr); if (!handled && mce_err.error_type == MCE_ERROR_TYPE_UE) - handled = mce_handle_ue_error(regs); + handled = mce_handle_ue_error(regs, &mce_err); save_mce_event(regs,
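The queue-draining change above can be modelled outside the kernel. In this sketch (all names are hypothetical; only `MAX_MC_EVT` comes from the quoted mce.h), events flagged `ignore_event` are consumed without being reported. Using `continue` rather than `return` matters here: returning early would leave the remaining queued events unprocessed, which matches the v8 note in the series changelog.

```c
#include <stdbool.h>
#include <stdio.h>

#define MAX_MC_EVT 100 /* same bound as in the quoted mce.h */

/* Hypothetical, trimmed-down event record. */
struct mce_evt_sketch {
	bool is_ue;
	bool ignore_event; /* set when a fixup handler took over */
	unsigned long addr;
};

static struct mce_evt_sketch queue[MAX_MC_EVT];
static int queue_count;

static bool queue_event(const struct mce_evt_sketch *evt)
{
	if (queue_count >= MAX_MC_EVT)
		return false; /* queue full: event dropped */
	queue[queue_count++] = *evt;
	return true;
}

/* Drain from the top, as the patched loop does; return how many were reported. */
static int drain_events(void)
{
	int reported = 0;

	while (queue_count > 0) {
		const struct mce_evt_sketch *evt = &queue[queue_count - 1];

		if (evt->is_ue && evt->ignore_event) {
			queue_count--; /* consume silently, keep draining */
			continue;
		}
		/* Stand-in for machine_check_print_event_info(). */
		printf("MCE at 0x%lx\n", evt->addr);
		reported++;
		queue_count--;
	}
	return reported;
}
```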
[PATCH v10 7/7] powerpc: add machine check safe copy_to_user
Use memcpy_mcsafe() implementation to define copy_to_user_mcsafe() Signed-off-by: Santosh Sivaraj --- arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/uaccess.h | 14 ++ 2 files changed, 15 insertions(+) diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig index 77f6ebf97113..4316e36095a2 100644 --- a/arch/powerpc/Kconfig +++ b/arch/powerpc/Kconfig @@ -137,6 +137,7 @@ config PPC select ARCH_HAS_STRICT_KERNEL_RWX if ((PPC_BOOK3S_64 || PPC32) && !RELOCATABLE && !HIBERNATION) select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST select ARCH_HAS_UACCESS_FLUSHCACHE if PPC64 + select ARCH_HAS_UACCESS_MCSAFE if PPC64 select ARCH_HAS_UBSAN_SANITIZE_ALL select ARCH_HAVE_NMI_SAFE_CMPXCHG select ARCH_KEEP_MEMBLOCK diff --git a/arch/powerpc/include/asm/uaccess.h b/arch/powerpc/include/asm/uaccess.h index 8b03eb44e876..15002b51ff18 100644 --- a/arch/powerpc/include/asm/uaccess.h +++ b/arch/powerpc/include/asm/uaccess.h @@ -387,6 +387,20 @@ static inline unsigned long raw_copy_to_user(void __user *to, return ret; } +static __always_inline unsigned long __must_check +copy_to_user_mcsafe(void __user *to, const void *from, unsigned long n) +{ + if (likely(check_copy_size(from, n, true))) { + if (access_ok(to, n)) { + allow_write_to_user(to, n); + n = memcpy_mcsafe((void *)to, from, n); + prevent_write_to_user(to, n); + } + } + + return n; +} + extern unsigned long __clear_user(void __user *addr, unsigned long size); static inline unsigned long clear_user(void __user *addr, unsigned long size) -- 2.21.0
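copy_to_user_mcsafe() returns the number of bytes it could not copy, so callers have to check for a non-zero result. A sketch of one caller policy follows, turning any shortfall into -EIO as the series cover letter describes for fsdax reads. `fake_copy_mcsafe()` (which pretends every byte from offset `bad` onward is poisoned) and `read_chunk()` are hypothetical helpers, not kernel functions.

```c
#include <errno.h>
#include <string.h>

/*
 * Hypothetical stand-in for copy_to_user_mcsafe(): copies up to the first
 * "poisoned" offset and returns the number of bytes NOT copied.
 */
static unsigned long fake_copy_mcsafe(void *to, const void *from,
				      unsigned long n, unsigned long bad)
{
	unsigned long ok = bad < n ? bad : n;

	memcpy(to, from, ok);
	return n - ok;
}

/* Any uncopied byte means the destination cannot be trusted: report -EIO. */
static int read_chunk(void *to, const void *from, unsigned long n,
		      unsigned long bad)
{
	return fake_copy_mcsafe(to, from, n, bad) ? -EIO : 0;
}
```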
[PATCH v10 4/7] extable: Add function to search only kernel exception table
In certain architecture-specific operating modes (e.g., the powerpc machine check handler, which is unable to access vmalloc memory), search_exception_tables() cannot be called, because it also searches the module exception tables if an entry is not found in the kernel exception table. Cc: Thomas Gleixner Cc: Ingo Molnar Cc: Nicholas Piggin Signed-off-by: Santosh Sivaraj Reviewed-by: Nicholas Piggin --- include/linux/extable.h | 2 ++ kernel/extable.c| 11 +-- 2 files changed, 11 insertions(+), 2 deletions(-) diff --git a/include/linux/extable.h b/include/linux/extable.h index 41c5b3a25f67..81ecfaa83ad3 100644 --- a/include/linux/extable.h +++ b/include/linux/extable.h @@ -19,6 +19,8 @@ void trim_init_extable(struct module *m); /* Given an address, look for it in the exception tables */ const struct exception_table_entry *search_exception_tables(unsigned long add); +const struct exception_table_entry * +search_kernel_exception_table(unsigned long addr); #ifdef CONFIG_MODULES /* For extable.c to search modules' exception tables. */ diff --git a/kernel/extable.c b/kernel/extable.c index e23cce6e6092..f6c9406eec7d 100644 --- a/kernel/extable.c +++ b/kernel/extable.c @@ -40,13 +40,20 @@ void __init sort_main_extable(void) } } +/* Given an address, look for it in the kernel exception table */ +const +struct exception_table_entry *search_kernel_exception_table(unsigned long addr) +{ + return search_extable(__start___ex_table, + __stop___ex_table - __start___ex_table, addr); +} + /* Given an address, look for it in the exception tables. */ const struct exception_table_entry *search_exception_tables(unsigned long addr) { const struct exception_table_entry *e; - e = search_extable(__start___ex_table, - __stop___ex_table - __start___ex_table, addr); + e = search_kernel_exception_table(addr); if (!e) e = search_module_extables(addr); return e; -- 2.21.0
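search_extable() performs a binary search over a table sorted by faulting-instruction address, which is why the kernel-only variant needs nothing beyond the built-in `__start___ex_table`..`__stop___ex_table` range. A simplified sketch using the C library's bsearch() follows. It assumes absolute addresses in a hypothetical `struct extable_entry`; the real entries may store relative offsets instead, and all names here are illustrative.

```c
#include <stdlib.h>

/* Hypothetical simplified entry: absolute addresses. */
struct extable_entry {
	unsigned long insn;  /* address of an instruction allowed to fault */
	unsigned long fixup; /* where execution resumes if it does */
};

/* bsearch() comparator: key is an address, element is a table entry. */
static int cmp_insn(const void *key, const void *elt)
{
	unsigned long k = *(const unsigned long *)key;
	const struct extable_entry *e = elt;

	if (k < e->insn)
		return -1;
	return k > e->insn;
}

/* Look addr up in one table, which must already be sorted by insn. */
static const struct extable_entry *
search_table(const struct extable_entry *tbl, size_t n, unsigned long addr)
{
	return bsearch(&addr, tbl, n, sizeof(*tbl), cmp_insn);
}
```

On a hit, a handler like the one in patch 6/7 would set the saved nip to `fixup`; a miss returns NULL and the event is handled as before.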
[PATCH v10 2/7] powerpc/mce: Fix MCE handling for huge pages
From: Balbir Singh The current code would fail on huge page addresses, since the shift would be incorrect. Use the correct page shift value returned by __find_linux_pte() to get the correct physical address. The code is more generic and can handle both regular and compound pages. Fixes: ba41e1e1ccb9 ("powerpc/mce: Hookup derror (load/store) UE errors") Signed-off-by: Balbir Singh [ar...@linux.ibm.com: Fixup pseries_do_memory_failure()] Signed-off-by: Reza Arbab Co-developed-by: Santosh Sivaraj Signed-off-by: Santosh Sivaraj Tested-by: Mahesh Salgaonkar Cc: sta...@vger.kernel.org # v4.15+ --- arch/powerpc/include/asm/mce.h | 2 +- arch/powerpc/kernel/mce_power.c | 55 ++-- arch/powerpc/platforms/pseries/ras.c | 9 ++--- 3 files changed, 32 insertions(+), 34 deletions(-) diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h index a4c6a74ad2fb..f3a6036b6bc0 100644 --- a/arch/powerpc/include/asm/mce.h +++ b/arch/powerpc/include/asm/mce.h @@ -209,7 +209,7 @@ extern void release_mce_event(void); extern void machine_check_queue_event(void); extern void machine_check_print_event_info(struct machine_check_event *evt, bool user_mode, bool in_guest); -unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr); +unsigned long addr_to_phys(struct pt_regs *regs, unsigned long addr); #ifdef CONFIG_PPC_BOOK3S_64 void flush_and_reload_slb(void); #endif /* CONFIG_PPC_BOOK3S_64 */ diff --git a/arch/powerpc/kernel/mce_power.c b/arch/powerpc/kernel/mce_power.c index a814d2dfb5b0..e74816f045f8 100644 --- a/arch/powerpc/kernel/mce_power.c +++ b/arch/powerpc/kernel/mce_power.c @@ -20,13 +20,14 @@ #include /* - * Convert an address related to an mm to a PFN. NOTE: we are in real - * mode, we could potentially race with page table updates. + * Convert an address related to an mm to a physical address. + * NOTE: we are in real mode, we could potentially race with page table updates.
*/ -unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) +unsigned long addr_to_phys(struct pt_regs *regs, unsigned long addr) { - pte_t *ptep; - unsigned long flags; + pte_t *ptep, pte; + unsigned int shift; + unsigned long flags, phys_addr; struct mm_struct *mm; if (user_mode(regs)) @@ -35,14 +36,21 @@ unsigned long addr_to_pfn(struct pt_regs *regs, unsigned long addr) mm = &init_mm; local_irq_save(flags); - if (mm == current->mm) - ptep = find_current_mm_pte(mm->pgd, addr, NULL, NULL); - else - ptep = find_init_mm_pte(addr, NULL); + ptep = __find_linux_pte(mm->pgd, addr, NULL, &shift); local_irq_restore(flags); + if (!ptep || pte_special(*ptep)) return ULONG_MAX; - return pte_pfn(*ptep); + + pte = *ptep; + if (shift > PAGE_SHIFT) { + unsigned long rpnmask = (1ul << shift) - PAGE_SIZE; + + pte = __pte(pte_val(pte) | (addr & rpnmask)); + } + phys_addr = pte_pfn(pte) << PAGE_SHIFT; + + return phys_addr; } /* flush SLBs and reload */ @@ -344,7 +352,7 @@ static const struct mce_derror_table mce_p9_derror_table[] = { MCE_INITIATOR_CPU, MCE_SEV_SEVERE, true }, { 0, false, 0, 0, 0, 0, 0 } }; -static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, +static int mce_find_instr_ea_and_phys(struct pt_regs *regs, uint64_t *addr, uint64_t *phys_addr) { /* @@ -354,18 +362,16 @@ static int mce_find_instr_ea_and_pfn(struct pt_regs *regs, uint64_t *addr, * faults */ int instr; - unsigned long pfn, instr_addr; + unsigned long instr_addr; struct instruction_op op; struct pt_regs tmp = *regs; - pfn = addr_to_pfn(regs, regs->nip); - if (pfn != ULONG_MAX) { - instr_addr = (pfn << PAGE_SHIFT) + (regs->nip & ~PAGE_MASK); + instr_addr = addr_to_phys(regs, regs->nip) + (regs->nip & ~PAGE_MASK); + if (instr_addr != ULONG_MAX) { instr = *(unsigned int *)(instr_addr); if (!analyse_instr(&op, &tmp, instr)) { - pfn = addr_to_pfn(regs, op.ea); *addr = op.ea; - *phys_addr = (pfn << PAGE_SHIFT); + *phys_addr = addr_to_phys(regs, op.ea); return 0; } /* @@ 
-440,15 +446,9 @@ static
int mce_handle_ierror(struct pt_regs *regs, *addr = regs->nip; if (mce_err->sync_error && table[i].error_type == MCE_ERROR_TYPE_UE) { - unsigned long pfn; - - if (get_paca()->in_mce < MAX_MCE_DEPTH) { - pfn = addr_to_pfn(regs, regs->nip); -
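The huge-page fix in addr_to_phys() comes down to mask arithmetic: the address bits between PAGE_SHIFT and the mapping's shift select which base page inside the huge page is meant. A sketch of that calculation follows, assuming 4K base pages purely for the example; `EX_PAGE_SHIFT`, `EX_PAGE_SIZE` and `huge_addr_to_phys()` are hypothetical names. As in the patch, the result is page aligned, and the caller adds the in-page offset separately.

```c
#include <stdint.h>

#define EX_PAGE_SHIFT 12 /* assumption: 4K base pages for this example */
#define EX_PAGE_SIZE (UINT64_C(1) << EX_PAGE_SHIFT)

/*
 * base_pfn: PFN held in the (huge) PTE, i.e. the first base page.
 * shift:    mapping size reported by the page-table walk (21 for 2M).
 * Returns the page-aligned physical address of the base page containing addr.
 */
static uint64_t huge_addr_to_phys(uint64_t base_pfn, unsigned int shift,
				  uint64_t addr)
{
	uint64_t phys = base_pfn << EX_PAGE_SHIFT;

	if (shift > EX_PAGE_SHIFT) {
		/* Keep the address bits in [EX_PAGE_SHIFT, shift). */
		uint64_t rpnmask = (UINT64_C(1) << shift) - EX_PAGE_SIZE;

		phys |= addr & rpnmask;
	}
	return phys;
}
```

For a 2M page starting at physical 0x200000, the address 0x7f0000123456 selects base page 0x123000 within it, giving 0x323000; the 0x456 offset is added by the caller.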
[PATCH v10 1/7] powerpc/mce: Schedule work from irq_work
schedule_work() cannot be called from MCE exception context, because an MCE can arrive even while interrupts are disabled. Fixes: 733e4a4c ("powerpc/mce: hookup memory_failure for UE errors") Suggested-by: Mahesh Salgaonkar Signed-off-by: Santosh Sivaraj Reviewed-by: Mahesh Salgaonkar Acked-by: Balbir Singh Cc: sta...@vger.kernel.org # v4.15+ --- arch/powerpc/kernel/mce.c | 11 ++- 1 file changed, 10 insertions(+), 1 deletion(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index b18df633eae9..cff31d4a501f 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -33,6 +33,7 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], mce_ue_event_queue); static void machine_check_process_queued_event(struct irq_work *work); +static void machine_check_ue_irq_work(struct irq_work *work); void machine_check_ue_event(struct machine_check_event *evt); static void machine_process_ue_event(struct work_struct *work); @@ -40,6 +41,10 @@ static struct irq_work mce_event_process_work = { .func = machine_check_process_queued_event, }; +static struct irq_work mce_ue_event_irq_work = { + .func = machine_check_ue_irq_work, +}; + DECLARE_WORK(mce_ue_event_work, machine_process_ue_event); static void mce_set_error_info(struct machine_check_event *mce, @@ -199,6 +204,10 @@ void release_mce_event(void) get_mce_event(NULL, true); } +static void machine_check_ue_irq_work(struct irq_work *work) +{ + schedule_work(&mce_ue_event_work); +} /* * Queue up the MCE event which then can be handled later. @@ -216,7 +225,7 @@ void machine_check_ue_event(struct machine_check_event *evt) memcpy(this_cpu_ptr(&mce_ue_event_queue[index]), evt, sizeof(*evt)); /* Queue work to process this event later. */ - schedule_work(&mce_ue_event_work); + irq_work_queue(&mce_ue_event_irq_work); } /* -- 2.21.0
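The patch splits the deferral into two stages: the MCE path only queues an irq_work, and the irq_work callback later calls schedule_work() from a context where that is allowed. A simplified model of the first stage follows, assuming C11 atomics: queuing is a single atomic flag exchange (no locks, no allocation), and a later, safe context runs the callback once. `struct deferred_work` and its helpers are hypothetical and leave out how the real irq_work arranges for the callback to run soon afterwards.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical model, not the kernel's struct irq_work. */
struct deferred_work {
	atomic_bool pending;
	void (*func)(void);
};

static int ue_work_runs; /* counts how often the second stage ran */

/* Stand-in for the callback that would call schedule_work(). */
static void schedule_ue_work_stub(void)
{
	ue_work_runs++;
}

static void deferred_init(struct deferred_work *w, void (*func)(void))
{
	atomic_init(&w->pending, false);
	w->func = func;
}

/* "Exception context": one atomic exchange; false if already queued. */
static bool deferred_queue(struct deferred_work *w)
{
	return !atomic_exchange(&w->pending, true);
}

/* Later, from a context where sleeping-safe APIs may be used. */
static void deferred_run(struct deferred_work *w)
{
	if (atomic_exchange(&w->pending, false))
		w->func();
}
```

Queuing twice before the run collapses into a single callback, which is why each event is first copied into the per-CPU queue and the work item only drains it.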
[PATCH v10 3/7] powerpc/mce: Make machine_check_ue_event() static
From: Reza Arbab The function doesn't get used outside this file, so make it static. Signed-off-by: Reza Arbab Signed-off-by: Santosh Sivaraj Reviewed-by: Nicholas Piggin --- arch/powerpc/kernel/mce.c | 4 ++-- 1 file changed, 2 insertions(+), 2 deletions(-) diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c index cff31d4a501f..a3b122a685a5 100644 --- a/arch/powerpc/kernel/mce.c +++ b/arch/powerpc/kernel/mce.c @@ -34,7 +34,7 @@ static DEFINE_PER_CPU(struct machine_check_event[MAX_MC_EVT], static void machine_check_process_queued_event(struct irq_work *work); static void machine_check_ue_irq_work(struct irq_work *work); -void machine_check_ue_event(struct machine_check_event *evt); +static void machine_check_ue_event(struct machine_check_event *evt); static void machine_process_ue_event(struct work_struct *work); static struct irq_work mce_event_process_work = { @@ -212,7 +212,7 @@ static void machine_check_ue_irq_work(struct irq_work *work) /* * Queue up the MCE event which then can be handled later. */ -void machine_check_ue_event(struct machine_check_event *evt) +static void machine_check_ue_event(struct machine_check_event *evt) { int index; -- 2.21.0
[PATCH v10 0/7] powerpc: implement machine check safe memcpy
During a memcpy from a pmem device, if a machine check exception is generated we end up in a panic. In the case of an fsdax read, this should only result in -EIO. Avoid the panic by implementing memcpy_mcsafe. Before this patch series: ``` bash-4.4# mount -o dax /dev/pmem0 /mnt/pmem/ [ 7621.714094] Disabling lock debugging due to kernel taint [ 7621.714099] MCE: CPU0: machine check (Severe) Host UE Load/Store [Not recovered] [ 7621.714104] MCE: CPU0: NIP: [c0088978] memcpy_power7+0x418/0x7e0 [ 7621.714107] MCE: CPU0: Hardware error [ 7621.714112] opal: Hardware platform error: Unrecoverable Machine Check exception [ 7621.714118] CPU: 0 PID: 1368 Comm: mount Tainted: G M 5.2.0-rc5-00239-g241e39004581 #50 [ 7621.714123] NIP: c0088978 LR: c08e16f8 CTR: 01de [ 7621.714129] REGS: c000fffbfd70 TRAP: 0200 Tainted: G M (5.2.0-rc5-00239-g241e39004581) [ 7621.714131] MSR: 92209033 CR: 24428840 XER: 0004 [ 7621.714160] CFAR: c00889a8 DAR: deadbeefdeadbeef DSISR: 8000 IRQMASK: 0 [ 7621.714171] GPR00: 0e00 c000f0b8b1e0 c12cf100 c000ed8e1100 [ 7621.714186] GPR04: c2001100 0001 0200 03fff1272000 [ 7621.714201] GPR08: 8000 0010 0020 0030 [ 7621.714216] GPR12: 0040 7fffb8c6d390 0050 0060 [ 7621.714232] GPR16: 0070 0001 c000f0b8b960 [ 7621.714247] GPR20: 0001 c000f0b8b940 0001 0001 [ 7621.714262] GPR24: c1382560 c00c003b6380 c00c003b6380 0001 [ 7621.714277] GPR28: 0001 c200 0001 [ 7621.714294] NIP [c0088978] memcpy_power7+0x418/0x7e0 [ 7621.714298] LR [c08e16f8] pmem_do_bvec+0xf8/0x430 ... ... ``` After this patch series: ``` bash-4.4# mount -o dax /dev/pmem0 /mnt/pmem/ [25302.883978] Buffer I/O error on dev pmem0, logical block 0, async page read [25303.020816] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [25303.021236] EXT4-fs (pmem0): Can't read superblock on 2nd try [25303.152515] EXT4-fs (pmem0): DAX enabled. Warning: EXPERIMENTAL, use at your own risk [25303.284031] EXT4-fs (pmem0): DAX enabled.
Warning: EXPERIMENTAL, use at your own risk [25304.084100] UDF-fs: bad mount option "dax" or missing value mount: /mnt/pmem: wrong fs type, bad option, bad superblock on /dev/pmem0, missing codepage or helper program, or other error. ``` MCE is injected on a pmem address using mambo. The last patch which adds a nop is only for testing on mambo, where r13 is not restored upon hitting vector 200. The memcpy code can be optimised with VMX, and GAS macros can be used to improve code reusability; I will send those changes as another series. -- v10: Fix authorship; add reviewed-bys and acks. v9: * Add a new IRQ work for UE events [mahesh] * Reorder patches, and copy stable v8: * While ignoring UE events, return was used instead of continue. * Checkpatch fixups for commit log v7: * Move schedule_work to be called from irq_work. v6: * Don't return pfn, all callees are expecting physical address anyway [nick] * Patch re-ordering: move exception table patch before memcpy_mcsafe patch [nick] * Reword commit log for search_exception_tables patch [nick] v5: * Don't use search_exception_tables since it searches for module exception tables also [Nicholas] * Fix commit message for patch 2 [Nicholas] v4: * Squash return remaining bytes patch to memcpy_mcsafe implementation patch [christophe] * Access ok should be checked for copy_to_user_mcsafe() [christophe] v3: * Drop patch which enables DR/IR for external modules * Drop notifier call chain, we don't want to do that in real mode * Return remaining bytes from memcpy_mcsafe correctly * We no longer restore r13 for simulator tests, rather use a nop at vector 0x200 [workaround for simulator; not to be merged] v2: * Don't set RI bit explicitly [mahesh] * Re-ordered series to get r13 workaround as the last patch -- Balbir Singh (3): powerpc/mce: Fix MCE handling for huge pages powerpc/memcpy: Add memcpy_mcsafe for pmem powerpc/mce: Handle UE event for memcpy_mcsafe Reza Arbab (1): powerpc/mce: Make machine_check_ue_event() static
Santosh Sivaraj (3): powerpc/mce: Schedule work from irq_work extable: Add function to search only kernel exception table powerpc: add machine check safe copy_to_user arch/powerpc/Kconfig | 1 + arch/powerpc/include/asm/mce.h | 6 +- arch/powerpc/include/asm/string.h| 2 + arch/powerpc/include/asm/uaccess.h | 14 ++ arch/powerpc/kernel/mce.c| 31 +++- arch/powerpc/kernel/mce_power.c | 70 arch/powerpc/lib/Makefile| 2 +- arch/powerpc/lib/memcpy_mcsafe_64.S | 242
Re: [PATCH v6 5/8] clk: mediatek: Add MT6765 clock support
Quoting Macpaul Lin (2019-07-12 02:43:41) > diff --git a/drivers/clk/mediatek/clk-mt6765-audio.c > b/drivers/clk/mediatek/clk-mt6765-audio.c > new file mode 100644 > index ..41f19343dfb9 > --- /dev/null > +++ b/drivers/clk/mediatek/clk-mt6765-audio.c > @@ -0,0 +1,109 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (c) 2018 MediaTek Inc. > + * Author: Owen Chen > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. Please use SPDX tags. > + */ > + > +#include > +#include > + > +#include "clk-mtk.h" > +#include "clk-gate.h" > + > diff --git a/drivers/clk/mediatek/clk-mt6765-vcodec.c > b/drivers/clk/mediatek/clk-mt6765-vcodec.c > new file mode 100644 > index ..eb9ae1c2c99c > --- /dev/null > +++ b/drivers/clk/mediatek/clk-mt6765-vcodec.c > @@ -0,0 +1,79 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (c) 2018 MediaTek Inc. > + * Author: Owen Chen > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. > + */ SPDX tags. 
> diff --git a/drivers/clk/mediatek/clk-mt6765.c > b/drivers/clk/mediatek/clk-mt6765.c > new file mode 100644 > index ..f716a48a926d > --- /dev/null > +++ b/drivers/clk/mediatek/clk-mt6765.c > @@ -0,0 +1,961 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (c) 2018 MediaTek Inc. > + * Author: Owen Chen > + * > + * This program is free software; you can redistribute it and/or modify > + * it under the terms of the GNU General Public License version 2 as > + * published by the Free Software Foundation. > + * > + * This program is distributed in the hope that it will be useful, > + * but WITHOUT ANY WARRANTY; without even the implied warranty of > + * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the > + * GNU General Public License for more details. SPDX tags. > + */ > + > +#include > +#include > +#include > +#include > +#include Is this used? Maybe I deleted it. > +#include > +#include [...] > + > +static const char * const axi_parents[] = { > + "clk26m", > + "syspll_d7", > + "syspll1_d4", > + "syspll3_d2" > +}; > + > +static const char * const mem_parents[] = { > + "clk26m", > + "dmpll_ck", > + "apll1_ck" > +}; > + > +static const char * const mm_parents[] = { > + "clk26m", > + "mmpll_ck", > + "syspll1_d2", > + "syspll_d5", > + "syspll1_d4", > + "univpll_d5", > + "univpll1_d2", > + "mmpll_d2" > +}; > + > +static const char * const scp_parents[] = { > + "clk26m", > + "syspll4_d2", > + "univpll2_d2", > + "syspll1_d2", > + "univpll1_d2", > + "syspll_d3", > + "univpll_d3" > +}; > + > +static const char * const mfg_parents[] = { > + "clk26m", > + "mfgpll_ck", > + "syspll_d3", > + "univpll_d3" > +}; > + > +static const char * const atb_parents[] = { > + "clk26m", > + "syspll1_d4", > + "syspll1_d2" > +}; > + > +static const char * const camtg_parents[] = { > + "clk26m", > + "usb20_192m_d8", > + "univpll2_d8", > + "usb20_192m_d4", > + "univpll2_d32", > + "usb20_192m_d16", > + "usb20_192m_d32" > +}; > + > +static const char * const uart_parents[] = 
{ > + "clk26m", > + "univpll2_d8" > +}; > + > +static const char * const spi_parents[] = { > + "clk26m", > + "syspll3_d2", > + "syspll4_d2", > + "syspll2_d4" > +}; > + > +static const char * const msdc5hclk_parents[] = { > + "clk26m", > + "syspll1_d2", > + "univpll1_d4", > + "syspll2_d2" > +}; > + > +static const char * const msdc50_0_parents[] = { > + "clk26m", > + "msdcpll_ck", > + "syspll2_d2", > + "syspll4_d2", > + "univpll1_d2", > + "syspll1_d2", > + "univpll_d5", > + "univpll1_d4" > +}; > + > +static const char * const msdc30_1_parents[] = { > + "clk26m", > + "msdcpll_d2", > + "univpll2_d2", > + "syspll2_d2", > + "syspll1_d4", > + "univpll1_d4", > + "usb20_192m_d4", > + "syspll2_d4" > +}; > + > +static const char * const audio_parents[] = { > + "clk26m", > +
Re: [PATCH v4 2/7] x86: kvm: svm: propagate errors from skip_emulated_instruction()
On Wed, Aug 14, 2019 at 11:34:52AM +0200, Vitaly Kuznetsov wrote: > Sean Christopherson writes: > > > x86_emulate_instruction() doesn't set vcpu->run->exit_reason when emulation > > fails with EMULTYPE_SKIP, i.e. this will exit to userspace with garbage in > > the exit_reason. > > Oh, nice catch, will take a look! Don't worry about addressing this. Paolo has already queued the series, and I've got a patch set waiting that purges emulation_result entirely that I'll post once your series hits kvm/queue.
[PATCH v4 3/3] x86/kasan: support KASAN_VMALLOC
In the case where KASAN directly allocates memory to back vmalloc space, don't map the early shadow page over it. We prepopulate pgds/p4ds for the range that would otherwise be empty. This is required to get it synced to hardware on boot, allowing the lower levels of the page tables to be filled dynamically. Acked-by: Dmitry Vyukov Signed-off-by: Daniel Axtens --- v2: move from faulting in shadow pgds to prepopulating --- arch/x86/Kconfig| 1 + arch/x86/mm/kasan_init_64.c | 61 + 2 files changed, 62 insertions(+) diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig index 222855cc0158..40562cc3771f 100644 --- a/arch/x86/Kconfig +++ b/arch/x86/Kconfig @@ -134,6 +134,7 @@ config X86 select HAVE_ARCH_JUMP_LABEL select HAVE_ARCH_JUMP_LABEL_RELATIVE select HAVE_ARCH_KASAN if X86_64 + select HAVE_ARCH_KASAN_VMALLOC if X86_64 select HAVE_ARCH_KGDB select HAVE_ARCH_MMAP_RND_BITS if MMU select HAVE_ARCH_MMAP_RND_COMPAT_BITS if MMU && COMPAT diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c index 296da58f3013..2f57c4ddff61 100644 --- a/arch/x86/mm/kasan_init_64.c +++ b/arch/x86/mm/kasan_init_64.c @@ -245,6 +245,52 @@ static void __init kasan_map_early_shadow(pgd_t *pgd) } while (pgd++, addr = next, addr != end); } +static void __init kasan_shallow_populate_p4ds(pgd_t *pgd, + unsigned long addr, + unsigned long end, + int nid) +{ + p4d_t *p4d; + unsigned long next; + void *p; + + p4d = p4d_offset(pgd, addr); + do { + next = p4d_addr_end(addr, end); + + if (p4d_none(*p4d)) { + p = early_alloc(PAGE_SIZE, nid, true); + p4d_populate(&init_mm, p4d, p); + } + } while (p4d++, addr = next, addr != end); +} + +static void __init kasan_shallow_populate_pgds(void *start, void *end) +{ + unsigned long addr, next; + pgd_t *pgd; + void *p; + int nid = early_pfn_to_nid((unsigned long)start); + + addr = (unsigned long)start; + pgd = pgd_offset_k(addr); + do { + next = pgd_addr_end(addr, (unsigned long)end); + + if (pgd_none(*pgd)) { + p = early_alloc(PAGE_SIZE, nid, true); +
pgd_populate(&init_mm, pgd, p); + } + + /* +* we need to populate p4ds to be synced when running in +* four level mode - see sync_global_pgds_l4() +*/ + kasan_shallow_populate_p4ds(pgd, addr, next, nid); + } while (pgd++, addr = next, addr != (unsigned long)end); +} + + #ifdef CONFIG_KASAN_INLINE static int kasan_die_handler(struct notifier_block *self, unsigned long val, @@ -352,9 +398,24 @@ void __init kasan_init(void) shadow_cpu_entry_end = (void *)round_up( (unsigned long)shadow_cpu_entry_end, PAGE_SIZE); + /* +* If we're in full vmalloc mode, don't back vmalloc space with early +* shadow pages. Instead, prepopulate pgds/p4ds so they are synced to +* the global table and we can populate the lower levels on demand. +*/ +#ifdef CONFIG_KASAN_VMALLOC + kasan_shallow_populate_pgds( + kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), + kasan_mem_to_shadow((void *)VMALLOC_END)); + + kasan_populate_early_shadow( + kasan_mem_to_shadow((void *)VMALLOC_END + 1), + shadow_cpu_entry_begin); +#else kasan_populate_early_shadow( kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM), shadow_cpu_entry_begin); +#endif kasan_populate_shadow((unsigned long)shadow_cpu_entry_begin, (unsigned long)shadow_cpu_entry_end, 0); -- 2.20.1
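The do/while walk in kasan_shallow_populate_pgds() and kasan_shallow_populate_p4ds() advances through the shadow range one table entry at a time, with pgd_addr_end()/p4d_addr_end() clamping each step to the end of the range. A minimal userspace sketch of that iteration pattern, assuming a 512 GiB span per top-level entry (x86-64 4-level paging) and start < end; SPAN_SIZE, span_addr_end() and count_spans() are hypothetical names, and the kernel's helpers are macros rather than functions:

```c
#include <stdint.h>

/* Assumed span of one top-level (PGD) entry with x86-64 4-level paging:
 * 1 << 39 bytes = 512 GiB. */
#define SPAN_SHIFT 39
#define SPAN_SIZE  (UINT64_C(1) << SPAN_SHIFT)
#define SPAN_MASK  (~(SPAN_SIZE - 1))

/* Same shape as pgd_addr_end(): the next entry boundary, clamped to end.
 * Comparing "boundary - 1" with "end - 1" keeps the clamp correct when the
 * range runs to the very top of the address space (end == 0). */
static uint64_t span_addr_end(uint64_t addr, uint64_t end)
{
    uint64_t boundary = (addr + SPAN_SIZE) & SPAN_MASK;

    return (boundary - 1 < end - 1) ? boundary : end;
}

/* Counts the top-level entries touched by [start, end) using the same
 * do/while shape as kasan_shallow_populate_pgds(); the kernel would
 * allocate and populate an empty entry where this loop increments n. */
static unsigned count_spans(uint64_t start, uint64_t end)
{
    uint64_t addr = start, next;
    unsigned n = 0;

    do {
        next = span_addr_end(addr, end);
        n++;
    } while (addr = next, addr != end);

    return n;
}
```

A range that straddles one 512 GiB boundary touches two entries, so both would be prepopulated and later synced by sync_global_pgds_l4().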
[PATCH v4 1/3] kasan: support backing vmalloc space with real shadow memory
Hook into vmalloc and vmap, and dynamically allocate real shadow memory to back the mappings. Most mappings in vmalloc space are small, requiring less than a full page of shadow space. Allocating a full shadow page per mapping would therefore be wasteful. Furthermore, to ensure that different mappings use different shadow pages, mappings would have to be aligned to KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE. Instead, share backing space across multiple mappings. Allocate a backing page the first time a mapping in vmalloc space uses a particular page of the shadow region. Keep this page around regardless of whether the mapping is later freed - in the meantime the page could have become shared by another vmalloc mapping. This can in theory lead to unbounded memory growth, but the vmalloc allocator is pretty good at reusing addresses, so the practical memory usage grows at first but then stays fairly stable. This requires architecture support to actually use: arches must stop mapping the read-only zero page over the portion of the shadow region that covers the vmalloc space and instead leave it unmapped. This allows KASAN with VMAP_STACK, and will be needed for architectures that do not have a separate module space (e.g. powerpc64, which I am currently working on). It also allows relaxing the module alignment back to PAGE_SIZE. Link: https://bugzilla.kernel.org/show_bug.cgi?id=202009 Acked-by: Vasily Gorbik Signed-off-by: Daniel Axtens [Mark: rework shadow allocation] Signed-off-by: Mark Rutland -- v2: let kasan_unpoison_shadow deal with ranges that do not use a full shadow byte. v3: relax module alignment rename to kasan_populate_vmalloc which is a much better name deal with concurrency correctly v4: Integrate Mark's rework Poison pages on vfree Handle allocation failures. I've tested this by inserting artificial failures and using test_vmalloc to stress it.
I haven't handled the per-cpu case: it looked like it would require a messy hacking-up of the function to deal with an OOM failure case in a debug feature. --- Documentation/dev-tools/kasan.rst | 60 +++ include/linux/kasan.h | 24 +++ include/linux/moduleloader.h | 2 +- include/linux/vmalloc.h | 12 ++ lib/Kconfig.kasan | 16 lib/test_kasan.c | 26 mm/kasan/common.c | 67 +++ mm/kasan/generic_report.c | 3 ++ mm/kasan/kasan.h | 1 + mm/vmalloc.c | 28 - 10 files changed, 237 insertions(+), 2 deletions(-) diff --git a/Documentation/dev-tools/kasan.rst b/Documentation/dev-tools/kasan.rst index b72d07d70239..35fda484a672 100644 --- a/Documentation/dev-tools/kasan.rst +++ b/Documentation/dev-tools/kasan.rst @@ -215,3 +215,63 @@ brk handler is used to print bug reports. A potential expansion of this mode is a hardware tag-based mode, which would use hardware memory tagging support instead of compiler instrumentation and manual shadow memory manipulation. + +What memory accesses are sanitised by KASAN? + + +The kernel maps memory in a number of different parts of the address +space. This poses something of a problem for KASAN, which requires +that all addresses accessed by instrumented code have a valid shadow +region. + +The range of kernel virtual addresses is large: there is not enough +real memory to support a real shadow region for every address that +could be accessed by the kernel. + +By default +~~ + +By default, architectures only map real memory over the shadow region +for the linear mapping (and potentially other small areas). For all +other areas - such as vmalloc and vmemmap space - a single read-only +page is mapped over the shadow area. This read-only shadow page +declares all memory accesses as permitted. + +This presents a problem for modules: they do not live in the linear +mapping, but in a dedicated module space. By hooking in to the module +allocator, KASAN can temporarily map real shadow memory to cover +them. 
This allows detection of invalid accesses to module globals, for +example. + +This also creates an incompatibility with ``VMAP_STACK``: if the stack +lives in vmalloc space, it will be shadowed by the read-only page, and +the kernel will fault when trying to set up the shadow data for stack +variables. + +CONFIG_KASAN_VMALLOC + + +With ``CONFIG_KASAN_VMALLOC``, KASAN can cover vmalloc space at the +cost of greater memory usage. Currently this is only supported on x86. + +This works by hooking into vmalloc and vmap, and dynamically +allocating real shadow memory to back the mappings. + +Most mappings in vmalloc space are small, requiring less than a full +page of shadow space. Allocating a full shadow page per mapping would +therefore be wasteful. Furthermore, to ensure that different mappings
[PATCH v4 0/3] kasan: support backing vmalloc space with real shadow memory
Currently, vmalloc space is backed by the early shadow page. This means that kasan is incompatible with VMAP_STACK, and it also presents a hurdle for architectures that do not have a dedicated module space (like powerpc64). This series provides a mechanism to back vmalloc space with real, dynamically allocated memory. I have only wired up x86, because that's the only currently supported arch I can work with easily, but it's very easy to wire up other architectures. This has been discussed before in the context of VMAP_STACK: - https://bugzilla.kernel.org/show_bug.cgi?id=202009 - https://lkml.org/lkml/2018/7/22/198 - https://lkml.org/lkml/2019/7/19/822 In terms of implementation details: Most mappings in vmalloc space are small, requiring less than a full page of shadow space. Allocating a full shadow page per mapping would therefore be wasteful. Furthermore, to ensure that different mappings use different shadow pages, mappings would have to be aligned to KASAN_SHADOW_SCALE_SIZE * PAGE_SIZE. Instead, share backing space across multiple mappings. Allocate a backing page the first time a mapping in vmalloc space uses a particular page of the shadow region. Keep this page around regardless of whether the mapping is later freed - in the meantime the page could have become shared by another vmalloc mapping. This can in theory lead to unbounded memory growth, but the vmalloc allocator is pretty good at reusing addresses, so the practical memory usage appears to grow at first but then stay fairly stable. If we run into practical memory exhaustion issues, I'm happy to consider hooking into the book-keeping that vmap does, but I am not convinced that it will be an issue.
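To see why sharing helps, note that generic KASAN uses one shadow byte per 8 bytes of memory, so one 4 KiB shadow page covers 32 KiB of vmalloc space. The sketch below is a hypothetical userspace model, not the patch's code: it tracks which shadow pages of a small simulated region have been backed, allocates only on first use, and never frees, so several small mappings end up sharing one page. The page size, region size and all names are assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_PAGE_SHIFT   12               /* 4 KiB pages (assumption) */
#define MODEL_PAGE_SIZE    (1UL << MODEL_PAGE_SHIFT)
#define MODEL_SCALE_SHIFT  3                /* 1 shadow byte per 8 bytes */
#define MODEL_SHADOW_PAGES 64               /* size of the simulated shadow */

/* Hypothetical bookkeeping: which shadow pages have been backed so far. */
static bool shadow_backed[MODEL_SHADOW_PAGES];
static unsigned shadow_pages_allocated;

/* Offset into the simulated vmalloc area -> index of its shadow page. */
static size_t shadow_page_index(uintptr_t offset)
{
    return (size_t)((offset >> MODEL_SCALE_SHIFT) >> MODEL_PAGE_SHIFT);
}

/* Back the shadow of [offset, offset + size), size > 0. Pages that are
 * already backed (possibly by another mapping) are reused, and nothing is
 * ever released, mirroring "keep this page around" in the text above. */
static void model_populate_vmalloc(uintptr_t offset, size_t size)
{
    size_t first = shadow_page_index(offset);
    size_t last = shadow_page_index(offset + size - 1);

    for (size_t i = first; i <= last && i < MODEL_SHADOW_PAGES; i++) {
        if (!shadow_backed[i]) {
            shadow_backed[i] = true;
            shadow_pages_allocated++;
        }
    }
}
```

With this model, a one-page mapping at offset 0 and a two-page mapping at offset 4 KiB share shadow page 0; a mapping at offset 32 KiB is the first to need shadow page 1.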
v1: https://lore.kernel.org/linux-mm/20190725055503.19507-1-...@axtens.net/ v2: https://lore.kernel.org/linux-mm/20190729142108.23343-1-...@axtens.net/ Address review comments: - Patch 1: use kasan_unpoison_shadow's built-in handling of ranges that do not align to a full shadow byte - Patch 3: prepopulate pgds rather than faulting things in v3: https://lore.kernel.org/linux-mm/20190731071550.31814-1-...@axtens.net/ Address comments from Mark Rutland: - kasan_populate_vmalloc is a better name - handle concurrency correctly - various nits and cleanups - relax module alignment in KASAN_VMALLOC case v4: Changes to patch 1 only: - Integrate Mark's rework, thanks Mark! - handle the case where kasan_populate_shadow might fail - poison shadow on free, allowing the alloc path to just unpoison memory that it uses Daniel Axtens (3): kasan: support backing vmalloc space with real shadow memory fork: support VMAP_STACK with KASAN_VMALLOC x86/kasan: support KASAN_VMALLOC Documentation/dev-tools/kasan.rst | 60 +++ arch/Kconfig | 9 +++-- arch/x86/Kconfig | 1 + arch/x86/mm/kasan_init_64.c | 61 include/linux/kasan.h | 24 +++ include/linux/moduleloader.h | 2 +- include/linux/vmalloc.h | 12 ++ kernel/fork.c | 4 ++ lib/Kconfig.kasan | 16 lib/test_kasan.c | 26 mm/kasan/common.c | 67 +++ mm/kasan/generic_report.c | 3 ++ mm/kasan/kasan.h | 1 + mm/vmalloc.c | 28 - 14 files changed, 308 insertions(+), 6 deletions(-) -- 2.20.1
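The v4 note about poisoning shadow on free relies on how generic KASAN encodes shadow bytes: 0 means all 8 bytes of the granule are accessible, a value k in 1..7 means only the first k bytes are, and a negative value marks the whole granule as poisoned. The same encoding is what lets kasan_unpoison_shadow() handle ranges that do not fill a whole shadow byte (the v2 note). A userspace sketch of that encoding; POISON_MARK is a hypothetical value chosen for illustration, not the kernel's constant, and the model_* functions are not kernel APIs.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GRANULE     8      /* bytes covered by one shadow byte */
#define POISON_MARK 0xF9   /* hypothetical "freed" marker (negative as int8_t) */

/* Free path: mark every granule of [0, size) as inaccessible. */
static void model_poison(uint8_t *shadow, size_t size)
{
    memset(shadow, POISON_MARK, (size + GRANULE - 1) / GRANULE);
}

/* Alloc path: full granules become 0; a trailing partial granule records
 * how many of its bytes are valid. */
static void model_unpoison(uint8_t *shadow, size_t size)
{
    memset(shadow, 0, size / GRANULE);
    if (size % GRANULE)
        shadow[size / GRANULE] = (uint8_t)(size % GRANULE);
}

/* One-byte access check at offset off, following the generic KASAN rule. */
static bool model_access_ok(const uint8_t *shadow, size_t off)
{
    int8_t v = (int8_t)shadow[off / GRANULE];

    return v == 0 || (v > 0 && (int8_t)(off % GRANULE) < v);
}
```

After poisoning 32 bytes and unpoisoning the first 13, byte 12 is accessible, byte 13 is not (the second shadow byte holds 5), and byte 20 still falls in a poisoned granule.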
[PATCH v4 2/3] fork: support VMAP_STACK with KASAN_VMALLOC
Supporting VMAP_STACK with KASAN_VMALLOC is straightforward: - clear the shadow region of vmapped stacks when swapping them in - tweak Kconfig to allow VMAP_STACK to be turned on with KASAN Reviewed-by: Dmitry Vyukov Signed-off-by: Daniel Axtens --- arch/Kconfig | 9 + kernel/fork.c | 4 2 files changed, 9 insertions(+), 4 deletions(-) diff --git a/arch/Kconfig b/arch/Kconfig index a7b57dd42c26..e791196005e1 100644 --- a/arch/Kconfig +++ b/arch/Kconfig @@ -825,16 +825,17 @@ config HAVE_ARCH_VMAP_STACK config VMAP_STACK default y bool "Use a virtually-mapped stack" - depends on HAVE_ARCH_VMAP_STACK && !KASAN + depends on HAVE_ARCH_VMAP_STACK + depends on !KASAN || KASAN_VMALLOC ---help--- Enable this if you want the use virtually-mapped kernel stacks with guard pages. This causes kernel stack overflows to be caught immediately rather than causing difficult-to-diagnose corruption. - This is presently incompatible with KASAN because KASAN expects - the stack to map directly to the KASAN shadow map using a formula - that is incorrect if the stack is in vmalloc space. + To use this with KASAN, the architecture must support backing + virtual mappings with real shadow memory, and KASAN_VMALLOC must + be enabled. config ARCH_OPTIONAL_KERNEL_RWX def_bool n diff --git a/kernel/fork.c b/kernel/fork.c index d8ae0f1b4148..ce3150fe8ff2 100644 --- a/kernel/fork.c +++ b/kernel/fork.c @@ -94,6 +94,7 @@ #include #include #include +#include #include #include @@ -215,6 +216,9 @@ static unsigned long *alloc_thread_stack_node(struct task_struct *tsk, int node) if (!s) continue; + /* Clear the KASAN shadow of the stack. */ + kasan_unpoison_shadow(s->addr, THREAD_SIZE); + /* Clear stale pointers from reused stack. */ memset(s->addr, 0, THREAD_SIZE); -- 2.20.1
Re: [PATCH v9 6/7] powerpc/mce: Handle UE event for memcpy_mcsafe
Hi Balbir, Balbir Singh writes: > On 12/8/19 7:22 pm, Santosh Sivaraj wrote: >> If we take a UE on one of the instructions with a fixup entry, set nip >> to continue execution at the fixup entry. Stop processing the event >> further or print it. >> >> Co-developed-by: Reza Arbab >> Signed-off-by: Reza Arbab >> Cc: Mahesh Salgaonkar >> Signed-off-by: Santosh Sivaraj >> --- > > Isn't this based on https://patchwork.ozlabs.org/patch/895294/? If so it > should still have my author tag and signed-off-by Originally when I received the series for posting, I had Reza's authorship and signed-off-by, since the patch changed significantly I added co-developed-by as Reza. I will update in the next spin. https://lore.kernel.org/linuxppc-dev/20190702051932.511-1-sant...@fossix.org/ Santosh > > Balbir Singh > >> arch/powerpc/include/asm/mce.h | 4 +++- >> arch/powerpc/kernel/mce.c | 16 >> arch/powerpc/kernel/mce_power.c | 15 +-- >> 3 files changed, 32 insertions(+), 3 deletions(-) >> >> diff --git a/arch/powerpc/include/asm/mce.h b/arch/powerpc/include/asm/mce.h >> index f3a6036b6bc0..e1931c8c2743 100644 >> --- a/arch/powerpc/include/asm/mce.h >> +++ b/arch/powerpc/include/asm/mce.h >> @@ -122,7 +122,8 @@ struct machine_check_event { >> enum MCE_UeErrorType ue_error_type:8; >> u8 effective_address_provided; >> u8 physical_address_provided; >> -u8 reserved_1[5]; >> +u8 ignore_event; >> +u8 reserved_1[4]; >> u64 effective_address; >> u64 physical_address; >> u8 reserved_2[8]; >> @@ -193,6 +194,7 @@ struct mce_error_info { >> enum MCE_Initiator initiator:8; >> enum MCE_ErrorClass error_class:8; >> boolsync_error; >> +boolignore_event; >> }; >> >> #define MAX_MC_EVT 100 >> diff --git a/arch/powerpc/kernel/mce.c b/arch/powerpc/kernel/mce.c >> index a3b122a685a5..ec4b3e1087be 100644 >> --- a/arch/powerpc/kernel/mce.c >> +++ b/arch/powerpc/kernel/mce.c >> @@ -149,6 +149,7 @@ void save_mce_event(struct pt_regs *regs, long handled, >> if (phys_addr != ULONG_MAX) { >> 
mce->u.ue_error.physical_address_provided = true; >> mce->u.ue_error.physical_address = phys_addr; >> +mce->u.ue_error.ignore_event = mce_err->ignore_event; >> machine_check_ue_event(mce); >> } >> } >> @@ -266,8 +267,17 @@ static void machine_process_ue_event(struct work_struct >> *work) >> /* >> * This should probably queued elsewhere, but >> * oh! well >> + * >> + * Don't report this machine check because the caller has >> + * asked us to ignore the event, it has a fixup handler which >> + * will do the appropriate error handling and reporting. >> */ >> if (evt->error_type == MCE_ERROR_TYPE_UE) { >> +if (evt->u.ue_error.ignore_event) { >> +__this_cpu_dec(mce_ue_count); >> +continue; >> +} >> + >> if (evt->u.ue_error.physical_address_provided) { >> unsigned long pfn; >> >> @@ -301,6 +311,12 @@ static void machine_check_process_queued_event(struct >> irq_work *work) >> while (__this_cpu_read(mce_queue_count) > 0) { >> index = __this_cpu_read(mce_queue_count) - 1; >> evt = this_cpu_ptr(&mce_event_queue[index]); >> + >> +if (evt->error_type == MCE_ERROR_TYPE_UE && >> +evt->u.ue_error.ignore_event) { >> +__this_cpu_dec(mce_queue_count); >> +continue; >> +} >> machine_check_print_event_info(evt, false, false); >> __this_cpu_dec(mce_queue_count); >> } >> diff --git a/arch/powerpc/kernel/mce_power.c >> b/arch/powerpc/kernel/mce_power.c >> index e74816f045f8..1dd87f6f5186 100644 >> --- a/arch/powerpc/kernel/mce_power.c >> +++ b/arch/powerpc/kernel/mce_power.c >> @@ -11,6 +11,7 @@ >> >> #include >> #include >> +#include >> #include >> #include >> #include >> @@ -18,6 +19,7 @@ >> #include >> #include >> #include >> +#include >> >> /* >> * Convert an address related to an mm to a physical address.
>> @@ -559,9 +561,18 @@ static int mce_handle_derror(struct pt_regs *regs, >> return 0; >> } >> >> -static long mce_handle_ue_error(struct pt_regs *regs) >> +static long mce_handle_ue_error(struct pt_regs *regs, >> +struct mce_error_info *mce_err) >> { >> long handled = 0; >> +const struct
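The new ignore_event flag lets the deferred processing paths drop UE events that a fixup handler will report instead. Below is a simplified, hypothetical userspace model of the loop added to machine_check_process_queued_event(): the queue is drained from the top, ignored UE events are popped without being printed, and everything else would be passed to machine_check_print_event_info(). The types are stand-ins, not the kernel's struct machine_check_event.

```c
#include <stdbool.h>

/* Simplified stand-ins for the kernel's MCE event types. */
enum model_err_type { MODEL_ERR_UE, MODEL_ERR_OTHER };

struct model_event {
    enum model_err_type error_type;
    bool ignore_event;
};

/* Drains *count queued events from the top, as the patched
 * machine_check_process_queued_event() does, and returns how many would
 * have been printed. Ignored UE events are popped silently. */
static int model_drain_queue(const struct model_event *queue, int *count)
{
    int printed = 0;

    while (*count > 0) {
        const struct model_event *evt = &queue[*count - 1];

        (*count)--;
        if (evt->error_type == MODEL_ERR_UE && evt->ignore_event)
            continue;   /* the fixup handler reports this one */
        printed++;      /* machine_check_print_event_info() in the patch */
    }
    return printed;
}
```

Note that ignore_event only suppresses UE events; a non-UE event with the flag set is still printed, matching the error_type check in the patch.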
[PATCH] ipvlan: set hw_enc_features like macvlan
Allow encapsulated packets sent to tunnels layered over ipvlan to use offloads rather than forcing SW fallbacks. Since commit f21e5077010acda73a60 ("macvlan: add offload features for encapsulation"), macvlan has set dev->hw_enc_features to include everything in dev->features; do likewise in ipvlan. Signed-off-by: Bill Sommerfeld --- drivers/net/ipvlan/ipvlan_main.c | 1 + 1 file changed, 1 insertion(+) diff --git a/drivers/net/ipvlan/ipvlan_main.c b/drivers/net/ipvlan/ipvlan_main.c index 1c96bed5a7c4..887bbba4631e 100644 --- a/drivers/net/ipvlan/ipvlan_main.c +++ b/drivers/net/ipvlan/ipvlan_main.c @@ -126,6 +126,7 @@ static int ipvlan_init(struct net_device *dev) (phy_dev->state & IPVLAN_STATE_MASK); dev->features = phy_dev->features & IPVLAN_FEATURES; dev->features |= NETIF_F_LLTX | NETIF_F_VLAN_CHALLENGED; + dev->hw_enc_features |= dev->features; dev->gso_max_size = phy_dev->gso_max_size; dev->gso_max_segs = phy_dev->gso_max_segs; dev->hard_header_len = phy_dev->hard_header_len; -- 2.23.0.rc1.153.gdeed80330f-goog
Re: [PATCH v9 7/7] powerpc: add machine check safe copy_to_user
Hi Balbir, Balbir Singh writes: > On 12/8/19 7:22 pm, Santosh Sivaraj wrote: >> Use memcpy_mcsafe() implementation to define copy_to_user_mcsafe() >> >> Signed-off-by: Santosh Sivaraj >> --- >> arch/powerpc/Kconfig | 1 + >> arch/powerpc/include/asm/uaccess.h | 14 ++ >> 2 files changed, 15 insertions(+) >> >> diff --git a/arch/powerpc/Kconfig b/arch/powerpc/Kconfig >> index 77f6ebf97113..4316e36095a2 100644 >> --- a/arch/powerpc/Kconfig >> +++ b/arch/powerpc/Kconfig >> @@ -137,6 +137,7 @@ config PPC >> select ARCH_HAS_STRICT_KERNEL_RWX if ((PPC_BOOK3S_64 || PPC32) && >> !RELOCATABLE && !HIBERNATION) >> select ARCH_HAS_TICK_BROADCAST if GENERIC_CLOCKEVENTS_BROADCAST >> select ARCH_HAS_UACCESS_FLUSHCACHE if PPC64 >> +select ARCH_HAS_UACCESS_MCSAFE if PPC64 >> select ARCH_HAS_UBSAN_SANITIZE_ALL >> select ARCH_HAVE_NMI_SAFE_CMPXCHG >> select ARCH_KEEP_MEMBLOCK >> diff --git a/arch/powerpc/include/asm/uaccess.h >> b/arch/powerpc/include/asm/uaccess.h >> index 8b03eb44e876..15002b51ff18 100644 >> --- a/arch/powerpc/include/asm/uaccess.h >> +++ b/arch/powerpc/include/asm/uaccess.h >> @@ -387,6 +387,20 @@ static inline unsigned long raw_copy_to_user(void >> __user *to, >> return ret; >> } >> >> +static __always_inline unsigned long __must_check >> +copy_to_user_mcsafe(void __user *to, const void *from, unsigned long n) >> +{ >> +if (likely(check_copy_size(from, n, true))) { >> +if (access_ok(to, n)) { >> +allow_write_to_user(to, n); >> +n = memcpy_mcsafe((void *)to, from, n); >> +prevent_write_to_user(to, n); >> +} >> +} >> + >> +return n; > > Do we always return n independent of the check_copy_size return value and > access_ok return values? Yes we always return the remaining bytes not copied even if check_copy_size or access_ok fails. Santosh > > Balbir Singh. > >> +} >> + >> extern unsigned long __clear_user(void __user *addr, unsigned long size); >> >> static inline unsigned long clear_user(void __user *addr, unsigned long >> size) >>
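Santosh's answer follows the usual uaccess convention: the return value is the number of bytes that were not copied, so a failed check_copy_size() or access_ok() leaves n untouched and reports everything as uncopied. A hypothetical userspace model of that control flow; fake_memcpy_mcsafe() simulates a machine check at byte offset bad and stands in for memcpy_mcsafe() (which, as used in the patch, returns the bytes left uncopied), and the two checks are passed in as booleans.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Stand-in for memcpy_mcsafe(): copies up to the simulated poisoned byte
 * at offset bad and returns how many bytes were NOT copied. */
static size_t fake_memcpy_mcsafe(char *dst, const char *src, size_t n, size_t bad)
{
    size_t done = bad < n ? bad : n;

    memcpy(dst, src, done);
    return n - done;
}

/* Same shape as the proposed copy_to_user_mcsafe(): n is only updated when
 * both checks pass, so a failed check returns the original n. */
static size_t model_copy_to_user_mcsafe(char *to, const char *from, size_t n,
                                        bool size_ok, bool range_ok, size_t bad)
{
    if (size_ok) {
        if (range_ok)
            n = fake_memcpy_mcsafe(to, from, n, bad);
    }
    return n;
}
```

A caller therefore treats any non-zero return as a short copy, whether it came from a failed check or from a machine check part-way through.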
[PATCH bpf-next] tools: libbpf: update extended attributes version of bpf_object__open()
Update the bpf_object_open_attr structure and corresponding code so that the bpf_object__open_xattr function can be used to open objects from buffers as well as from files. The reason for this change is that the existing bpf_object__open_buffer function doesn't provide a way to specify either the needs_kver or the flags parameter of the internal call to __bpf_object__open, which makes it inconvenient for loading BPF objects that don't require a kernel version. Two new fields, obj_buf and obj_buf_sz, were added to the structure, and the file field was union'ed with obj_name so that one can open an object like this: struct bpf_object_open_attr attr = { .obj_name = name, .obj_buf = obj_buf, .obj_buf_sz = obj_buf_sz, .prog_type = BPF_PROG_TYPE_UNSPEC, }; return bpf_object__open_xattr(&attr); while still being able to use the file semantics: struct bpf_object_open_attr attr = { .file = path, .prog_type = BPF_PROG_TYPE_UNSPEC, }; return bpf_object__open_xattr(&attr); Another thing to note is that since commit c034a177d3c8 ("bpf: bpftool, add flag to allow non-compat map definitions"), which introduced a MAPS_RELAX_COMPAT flag to load objects with non-compat map definitions, bpf_object__open_buffer was called with this flag enabled (it was passed as the boolean true value in the flags argument to __bpf_object__open). The default behaviour for all open functions is to clear this flag, and this patch changes bpf_object__open_buffer to clear this flag as well. It can be enabled, if needed, by opening an object from a buffer using __bpf_object__open_xattr.
Signed-off-by: Anton Protopopov --- tools/lib/bpf/libbpf.c | 45 ++ tools/lib/bpf/libbpf.h | 7 ++- 2 files changed, 34 insertions(+), 18 deletions(-) diff --git a/tools/lib/bpf/libbpf.c b/tools/lib/bpf/libbpf.c index 2233f919dd88..7c8054afd901 100644 --- a/tools/lib/bpf/libbpf.c +++ b/tools/lib/bpf/libbpf.c @@ -3630,13 +3630,31 @@ __bpf_object__open(const char *path, void *obj_buf, size_t obj_buf_sz, struct bpf_object *__bpf_object__open_xattr(struct bpf_object_open_attr *attr, int flags) { + char tmp_name[64]; + /* param validation */ - if (!attr->file) + if (!attr) return NULL; - pr_debug("loading %s\n", attr->file); + if (attr->obj_buf) { + if (attr->obj_buf_sz <= 0) + return NULL; + if (!attr->file) { + snprintf(tmp_name, sizeof(tmp_name), "%lx-%lx", +(unsigned long)attr->obj_buf, +(unsigned long)attr->obj_buf_sz); + attr->obj_name = tmp_name; + } + pr_debug("loading object '%s' from buffer\n", attr->obj_name); + } else if (!attr->file) { + return NULL; + } else { + attr->obj_buf_sz = 0; - return __bpf_object__open(attr->file, NULL, 0, + pr_debug("loading object file '%s'\n", attr->file); + } + + return __bpf_object__open(attr->file, attr->obj_buf, attr->obj_buf_sz, bpf_prog_type__needs_kver(attr->prog_type), flags); } @@ -3660,21 +3678,14 @@ struct bpf_object *bpf_object__open_buffer(void *obj_buf, size_t obj_buf_sz, const char *name) { - char tmp_name[64]; - - /* param validation */ - if (!obj_buf || obj_buf_sz <= 0) - return NULL; - - if (!name) { - snprintf(tmp_name, sizeof(tmp_name), "%lx-%lx", -(unsigned long)obj_buf, -(unsigned long)obj_buf_sz); - name = tmp_name; - } - pr_debug("loading object '%s' from buffer\n", name); + struct bpf_object_open_attr attr = { + .obj_name = name, + .obj_buf= obj_buf, + .obj_buf_sz = obj_buf_sz, + .prog_type = BPF_PROG_TYPE_UNSPEC, + }; - return __bpf_object__open(name, obj_buf, obj_buf_sz, true, true); + return bpf_object__open_xattr(&attr); } int bpf_object__unload(struct bpf_object *obj) diff --git a/tools/lib/bpf/libbpf.h
b/tools/lib/bpf/libbpf.h index e8f70977d137..634f278578dd 100644 --- a/tools/lib/bpf/libbpf.h +++ b/tools/lib/bpf/libbpf.h @@ -63,8 +63,13 @@ LIBBPF_API libbpf_print_fn_t libbpf_set_print(libbpf_print_fn_t fn); struct bpf_object; struct bpf_object_open_attr { - const char *file; + union { + const char *file; + const char *obj_name; + }; enum bpf_prog_type prog_type; + void *obj_buf; + size_t obj_buf_sz; }; LIBBPF_API struct bpf_object *bpf_object__open(const char
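When __bpf_object__open_xattr() is given a buffer but no name, the patch synthesizes one from the buffer's address and size with the "%lx-%lx" format. A small standalone sketch of just that naming step; make_buf_obj_name() is a hypothetical helper, not part of libbpf.

```c
#include <stddef.h>
#include <stdio.h>

/* Formats "<buffer address>-<buffer size>" in hex, like the patch's
 * fallback name. Returns snprintf()'s result (the untruncated length). */
static int make_buf_obj_name(char *out, size_t out_sz,
                             const void *buf, size_t buf_sz)
{
    return snprintf(out, out_sz, "%lx-%lx",
                    (unsigned long)buf, (unsigned long)buf_sz);
}
```

In the patch, tmp_name is a local array, so the attr->obj_name pointer it stores is only valid until __bpf_object__open_xattr() returns; the name is consumed by __bpf_object__open() within that call.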
Re: [PATCH v5 1/4] clk: core: link consumer with clock driver
Quoting Miquel Raynal (2019-05-21 05:51:10) > One major concern when, for instance, suspending/resuming a platform > is to never access registers before the underlying clock has been > resumed, otherwise most of the time the kernel will just crash. One > solution is to use syscore operations when registering clock drivers' > suspend/resume callbacks. One problem of using syscore_ops is that the > suspend/resume scheduling will depend on the order of the > registrations, which brings (unacceptable) randomness into the process. > > A feature called device links has been introduced to handle such > situations. It creates dependencies between consumers and providers, > enforcing e.g. the suspend/resume order when needed. Such a feature is > already in use for regulators. > > Add device links support in the clock subsystem by creating/deleting > the links at get/put time. > > Example of a boot (ESPRESSObin, A3700 SoC) with devices linked to clocks: > > marvell-armada-3700-tbg-clock d0013200.tbg: Linked as a consumer to > d0013800.pinctrl:xtal-clk > marvell-armada-3700-tbg-clock d0013200.tbg: Dropping the link to > d0013800.pinctrl:xtal-clk > marvell-armada-3700-tbg-clock d0013200.tbg: Linked as a consumer to > d0013800.pinctrl:xtal-clk > marvell-armada-3700-periph-clock d0013000.nb-periph-clk: Linked as a consumer > to d0013200.tbg > marvell-armada-3700-periph-clock d0013000.nb-periph-clk: Linked as a consumer > to d0013800.pinctrl:xtal-clk > marvell-armada-3700-periph-clock d0018000.sb-periph-clk: Linked as a consumer > to d0013200.tbg > mvneta d003.ethernet: Linked as a consumer to d0018000.sb-periph-clk > xhci-hcd d0058000.usb: Linked as a consumer to d0018000.sb-periph-clk > xenon-sdhci d00d.sdhci: Linked as a consumer to d0013000.nb-periph-clk > xenon-sdhci d00d.sdhci: Dropping the link to d0013000.nb-periph-clk > mvebu-uart d0012000.serial: Linked as a consumer to d0013800.pinctrl:xtal-clk > advk-pcie d007.pcie: Linked as a consumer to d0018000.sb-periph-clk >
xenon-sdhci d00d.sdhci: Linked as a consumer to d0013000.nb-periph-clk > xenon-sdhci d00d.sdhci: Linked as a consumer to regulator.1 > cpu cpu0: Linked as a consumer to d0013000.nb-periph-clk > cpu cpu0: Dropping the link to d0013000.nb-periph-clk > cpu cpu0: Linked as a consumer to d0013000.nb-periph-clk > > Signed-off-by: Miquel Raynal > --- This patch doesn't apply. Things have changed upstream. > > diff --git a/drivers/clk/clk.c b/drivers/clk/clk.c > index ec6f04dcf5e6..e6b84ab43f9f 100644 > --- a/drivers/clk/clk.c > +++ b/drivers/clk/clk.c > @@ -1676,6 +1710,8 @@ static void clk_reparent(struct clk_core *core, struct > clk_core *new_parent) > > if (was_orphan != becomes_orphan) > clk_core_update_orphan_status(core, becomes_orphan); > + > + clk_link_hierarchy(core, new_parent); This isn't going to work. BUG: sleeping function called from invalid context at kernel/locking/mutex.c:909 in_atomic(): 1, irqs_disabled(): 128, pid: 1, name: swapper/0 3 locks held by swapper/0/1: #0: (ptrval) (&dev->mutex){}, at: __device_driver_lock+0x40/0x4c #1: (ptrval) (prepare_lock){+.+.}, at: clk_prepare_lock+0x18/0x94 #2: (ptrval) (enable_lock){}, at: clk_enable_lock+0x34/0xdc irq event stamp: 311516 hardirqs last enabled at (311515): [] _raw_spin_unlock_irqrestore+0x54/0x90 hardirqs last disabled at (311516): [] clk_enable_lock+0x28/0xdc softirqs last enabled at (311348): [] __do_softirq+0x4cc/0x514 softirqs last disabled at (311341): [] irq_exit+0xd8/0xf8 CPU: 4 PID: 1 Comm: swapper/0 Tainted: GW 5.3.0-rc4-5-g6be06bbec80ef #10 Hardware name: Google Cheza (rev3+) (DT) Call trace: dump_backtrace+0x0/0x13c show_stack+0x20/0x2c dump_stack+0xc4/0x12c ___might_sleep+0x1b4/0x1c4 __might_sleep+0x50/0x88 __mutex_lock_common+0x5c/0xbfc mutex_lock_nested+0x40/0x50 device_link_add+0x88/0x3ac clk_reparent+0xc4/0x114 __clk_set_parent_before+0x74/0x90 clk_change_rate+0x98/0x854 clk_core_set_rate_nolock+0x1b0/0x21c clk_set_rate+0x3c/0x6c of_clk_set_defaults+0x29c/0x364 platform_drv_probe+0x28/0xb0
really_probe+0x130/0x2b4 driver_probe_device+0x64/0xfc device_driver_attach+0x4c/0x6c __driver_attach+0xb0/0xc4 bus_for_each_dev+0x84/0xcc driver_attach+0x2c/0x38 bus_add_driver+0xfc/0x1d0 driver_register+0x64/0xf0 __platform_driver_register+0x4c/0x58 msm_drm_register+0x5c/0x60 do_one_initcall+0x1e0/0x478 do_initcall_level+0x21c/0x25c do_basic_setup+0x60/0x78 kernel_init_freeable+0x128/0x1b0 kernel_init+0x14/0x100 ret_from_fork+0x10/0x18 > } else { > hlist_add_head(&core->child_node, &clk_orphan_list); > if (!was_orphan) > @@ -2402,6 +2438,8 @@ __clk_init_parent(struct clk_core *core, bool > update_orphan) > if (!parent_hw) > return NULL; > > + clk_link_hierarchy(core, parent_hw->core); > + This is
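The splat arises because clk_reparent() runs with enable_lock held (a spinlock taken with interrupts disabled, per the "hardirqs last disabled ... clk_enable_lock" line), while device_link_add() acquires a mutex and may therefore sleep. The check that fired, might_sleep(), can be modelled in userspace roughly as follows; the counter standing in for the kernel's preempt/IRQ state and all function names are hypothetical.

```c
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical stand-in for the kernel's atomic-context state
 * (preempt count / IRQs disabled): the number of spinlocks held. */
static int atomic_depth;

static void model_spin_lock_irqsave(void)      { atomic_depth++; }
static void model_spin_unlock_irqrestore(void) { atomic_depth--; }

/* Models might_sleep(): sleeping (e.g. taking a mutex) is only legal
 * when no spinlock is held. Returns false and reports otherwise. */
static bool model_might_sleep(const char *caller)
{
    if (atomic_depth > 0) {
        fprintf(stderr, "BUG: sleeping function %s called from invalid context\n",
                caller);
        return false;
    }
    return true;
}
```

Calling model_might_sleep("device_link_add") between the lock and unlock reports the bug, matching the call chain clk_reparent() -> device_link_add() -> mutex_lock_nested() in the trace; the same call after unlocking is accepted.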
Re: [RFC PATCH 2/2] mm/gup: introduce vaddr_pin_pages_remote()
On 8/14/19 4:50 PM, Ira Weiny wrote: On Tue, Aug 13, 2019 at 05:56:31PM -0700, John Hubbard wrote: On 8/13/19 5:51 PM, John Hubbard wrote: On 8/13/19 2:08 PM, Ira Weiny wrote: On Mon, Aug 12, 2019 at 05:07:32PM -0700, John Hubbard wrote: On 8/12/19 4:49 PM, Ira Weiny wrote: On Sun, Aug 11, 2019 at 06:50:44PM -0700, john.hubb...@gmail.com wrote: From: John Hubbard ... Finally, I struggle with converting everyone to a new call. It is more overhead to use vaddr_pin in the call above because now the GUP code is going to associate a file pin object with that file when in ODP we don't need that because the pages can move around. What if the pages in ODP are file-backed? oops, strike that, you're right: in that case, even the file system case is covered. Don't mind me. :) Ok so are we agreed we will drop the patch to the ODP code? I'm going to keep the FOLL_PIN flag and addition in the vaddr_pin_pages. Yes. I hope I'm not overlooking anything, but it all seems to make sense to let ODP just rely on the MMU notifiers. thanks, -- John Hubbard NVIDIA