Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
Hi John, Thanks for having documentation as a part of the patch. Some kernel-doc nits below. On Mon, Dec 03, 2018 at 04:17:19PM -0800, john.hubb...@gmail.com wrote: > From: John Hubbard > > Introduces put_user_page(), which simply calls put_page(). > This provides a way to update all get_user_pages*() callers, > so that they call put_user_page(), instead of put_page(). > > Also introduces put_user_pages(), and a few dirty/locked variations, > as a replacement for release_pages(), and also as a replacement > for open-coded loops that release multiple pages. > These may be used for subsequent performance improvements, > via batching of pages to be released. > > This is the first step of fixing the problem described in [1]. The steps > are: > > 1) (This patch): provide put_user_page*() routines, intended to be used >for releasing pages that were pinned via get_user_pages*(). > > 2) Convert all of the call sites for get_user_pages*(), to >invoke put_user_page*(), instead of put_page(). This involves dozens of >call sites, and will take some time. > > 3) After (2) is complete, use get_user_pages*() and put_user_page*() to >implement tracking of these pages. This tracking will be separate from >the existing struct page refcounting. > > 4) Use the tracking and identification of these pages, to implement >special handling (especially in writeback paths) when the pages are >backed by a filesystem. Again, [1] provides details as to why that is >desirable. 
> > [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" > > Reviewed-by: Jan Kara > > Cc: Matthew Wilcox > Cc: Michal Hocko > Cc: Christopher Lameter > Cc: Jason Gunthorpe > Cc: Dan Williams > Cc: Jan Kara > Cc: Al Viro > Cc: Jerome Glisse > Cc: Christoph Hellwig > Cc: Ralph Campbell > Signed-off-by: John Hubbard > --- > include/linux/mm.h | 20 > mm/swap.c | 80 ++ > 2 files changed, 100 insertions(+) > > diff --git a/include/linux/mm.h b/include/linux/mm.h > index 5411de93a363..09fbb2c81aba 100644 > --- a/include/linux/mm.h > +++ b/include/linux/mm.h > @@ -963,6 +963,26 @@ static inline void put_page(struct page *page) > __put_page(page); > } > > +/* > + * put_user_page() - release a page that had previously been acquired via > + * a call to one of the get_user_pages*() functions. Please add @page parameter description, otherwise kernel-doc is unhappy > + * > + * Pages that were pinned via get_user_pages*() must be released via > + * either put_user_page(), or one of the put_user_pages*() routines > + * below. This is so that eventually, pages that are pinned via > + * get_user_pages*() can be separately tracked and uniquely handled. In > + * particular, interactions with RDMA and filesystems need special > + * handling. 
> + */ > +static inline void put_user_page(struct page *page) > +{ > + put_page(page); > +} > + > +void put_user_pages_dirty(struct page **pages, unsigned long npages); > +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); > +void put_user_pages(struct page **pages, unsigned long npages); > + > #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) > #define SECTION_IN_PAGE_FLAGS > #endif > diff --git a/mm/swap.c b/mm/swap.c > index aa483719922e..bb8c32595e5f 100644 > --- a/mm/swap.c > +++ b/mm/swap.c > @@ -133,6 +133,86 @@ void put_pages_list(struct list_head *pages) > } > EXPORT_SYMBOL(put_pages_list); > > +typedef int (*set_dirty_func)(struct page *page); > + > +static void __put_user_pages_dirty(struct page **pages, > +unsigned long npages, > +set_dirty_func sdf) > +{ > + unsigned long index; > + > + for (index = 0; index < npages; index++) { > + struct page *page = compound_head(pages[index]); > + > + if (!PageDirty(page)) > + sdf(page); > + > + put_user_page(page); > + } > +} > + > +/* > + * put_user_pages_dirty() - for each page in the @pages array, make > + * that page (or its head page, if a compound page) dirty, if it was > + * previously listed as clean. Then, release the page using > + * put_user_page(). > + * > + * Please see the put_user_page() documentation for details. > + * > + * set_page_dirty(), which does not lock the page, is used here. > + * Therefore, it is the caller's responsibility to ensure that this is > + * safe. If not, then put_user_pages_dirty_lock() should be called instead. > + * > + * @pages: array of pages to be marked dirty and released. > + * @npages: number of pages in the @pages array. Please put the parameters description next to the brief function description, as described in [1] [1] https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function-documentation > + * > + */ > +void put_user_pages_dirty(struct page **pages, unsigned long npages) > +{ >
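For reference, the kernel-doc layout requested in the two review comments above might look roughly like this. It is a minimal userspace sketch, not the patch as merged: `struct page` and `put_page()` are stand-ins with a hypothetical `refcount` field, and the wording of the `@page` description is an assumption. The point is only the comment shape: `/**` opener, brief line, parameter lines directly after it, then the longer description.

```c
#include <assert.h>

/* Userspace stand-in for the kernel's struct page (hypothetical field). */
struct page {
	int refcount;
};

/* Stand-in for the kernel's put_page(): drops one reference. */
static inline void put_page(struct page *page)
{
	assert(page->refcount > 0);
	page->refcount--;
}

/**
 * put_user_page() - release a page acquired via get_user_pages*()
 * @page: pointer to the page to be released
 *
 * Pages that were pinned via get_user_pages*() must be released via
 * either put_user_page(), or one of the put_user_pages*() routines.
 */
static inline void put_user_page(struct page *page)
{
	put_page(page);
}
```

With the parameter lines placed right under the brief description, kernel-doc has a description for every argument and the long-form text stays separate.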
RE: rcu_preempt caused oom
Hi, Paul: the enclosed is the log trigger the 120s hung_task_panic without other debug patches, the hung task is blocked at __wait_rcu_gp, it means the rcu_cpu_stall can't detect the scenario: echo 1 > /proc/sys/kernel/panic_on_rcu_stall echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout -Original Message- From: Paul E. McKenney Sent: Monday, December 3, 2018 9:57 PM To: He, Bo Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin Subject: Re: rcu_preempt caused oom On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote: > Thanks, we have run the test for the whole weekend and not reproduce the > issue, so we confirm the CONFIG_RCU_BOOST can fix the issue. Very good, that is encouraging. Perhaps I should think about making CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for architectures for which rt_mutexes are implemented. > We have enabled the rcupdate.rcu_cpu_stall_timeout=7 and also set panic on > rcu stall and will see if we can see the panic, will keep you posed with the > test results. > echo 1 > /proc/sys/kernel/panic_on_rcu_stall Looking forward to seeing what is going on! Of course, to reproduce, you will need to again build with CONFIG_RCU_BOOST=n. Thanx, Paul > -Original Message- > From: Paul E. McKenney > Sent: Saturday, December 1, 2018 12:49 AM > To: He, Bo > Cc: Steven Rostedt ; > linux-kernel@vger.kernel.org; j...@joshtriplett.org; > mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun > ; Xiao, Jin ; Zhang, Yanmin > > Subject: Re: rcu_preempt caused oom > > On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote: > > Here is the kernel cmdline: > > Thank you! 
> > > Kernel command line: androidboot.acpio_idx=0 > > androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_0 > > 3- userdebug androidboot.diskbus=00.0 > > androidboot.verifiedbootstate=green > > androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb > > g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves > > reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti > > ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt > > loglevel=4 androidboot.hardware=gordon_peak > > firmware_class.path=/vendor/firmware relative_sleep_states=1 > > enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 > > androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/pro > > pe rties/android/ pstore.backend=ramoops memmap=0x140$0x5000 > > ramoops.mem_address=0x5000 ramoops.mem_size=0x140 > > ramoops.record_size=0x4000 ramoops.console_size=0x100 > > ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current > > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 > > drm.vblankoffdelay= > > And no sign of any suppression of RCU CPU stall warnings. Hmmm... > It does take more than 21 seconds to OOM? Or do things happen faster than > that? If they do happen faster than that, then on approach would be to add > something like this to the kernel command line: > > rcupdate.rcu_cpu_stall_timeout=7 > > This would set the stall timeout to seven seconds. Note that timeouts less > than three seconds are silently interpreted as three seconds. > > Thanx, Paul > > > -Original Message- > > From: Steven Rostedt > > Sent: Friday, November 30, 2018 11:17 PM > > To: Paul E. McKenney > > Cc: He, Bo ; linux-kernel@vger.kernel.org; > > j...@joshtriplett.org; mathieu.desnoy...@efficios.com; > > jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin > > ; Zhang, Yanmin > > Subject: Re: rcu_preempt caused oom > > > > On Fri, 30 Nov 2018 06:43:17 -0800 > > "Paul E. McKenney" wrote: > > > > > Could you please send me your list of kernel boot parameters? 
> > > They usually appear near the start of your console output. > > > > Or just: cat /proc/cmdline > > > > -- Steve > > > apanic_console Description: apanic_console
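The lower bound Paul describes for rcupdate.rcu_cpu_stall_timeout can be pictured as a simple clamp. This is only a sketch of the behaviour stated in the thread; `effective_stall_timeout()` and `STALL_TIMEOUT_MIN_SECS` are hypothetical names, not kernel symbols, and no upper bound is modelled.

```c
#include <assert.h>

/* Lower bound stated in the thread: requests below three seconds
 * are silently treated as three seconds. */
#define STALL_TIMEOUT_MIN_SECS 3

/* Hypothetical helper: returns the timeout that would actually be used. */
static int effective_stall_timeout(int requested_secs)
{
	if (requested_secs < STALL_TIMEOUT_MIN_SECS)
		return STALL_TIMEOUT_MIN_SECS;
	return requested_secs;
}
```

So the value of 7 used in this thread is taken as-is, while a request of 1 would behave like 3.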
[PATCH 1/2] mm: introduce put_user_page*(), placeholder versions
From: John Hubbard Introduces put_user_page(), which simply calls put_page(). This provides a way to update all get_user_pages*() callers, so that they call put_user_page(), instead of put_page(). Also introduces put_user_pages(), and a few dirty/locked variations, as a replacement for release_pages(), and also as a replacement for open-coded loops that release multiple pages. These may be used for subsequent performance improvements, via batching of pages to be released. This is the first step of fixing the problem described in [1]. The steps are: 1) (This patch): provide put_user_page*() routines, intended to be used for releasing pages that were pinned via get_user_pages*(). 2) Convert all of the call sites for get_user_pages*(), to invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time. 3) After (2) is complete, use get_user_pages*() and put_user_page*() to implement tracking of these pages. This tracking will be separate from the existing struct page refcounting. 4) Use the tracking and identification of these pages, to implement special handling (especially in writeback paths) when the pages are backed by a filesystem. Again, [1] provides details as to why that is desirable. 
[1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" Reviewed-by: Jan Kara Cc: Matthew Wilcox Cc: Michal Hocko Cc: Christopher Lameter Cc: Jason Gunthorpe Cc: Dan Williams Cc: Jan Kara Cc: Al Viro Cc: Jerome Glisse Cc: Christoph Hellwig Cc: Ralph Campbell Signed-off-by: John Hubbard --- include/linux/mm.h | 20 mm/swap.c | 80 ++ 2 files changed, 100 insertions(+) diff --git a/include/linux/mm.h b/include/linux/mm.h index 5411de93a363..09fbb2c81aba 100644 --- a/include/linux/mm.h +++ b/include/linux/mm.h @@ -963,6 +963,26 @@ static inline void put_page(struct page *page) __put_page(page); } +/* + * put_user_page() - release a page that had previously been acquired via + * a call to one of the get_user_pages*() functions. + * + * Pages that were pinned via get_user_pages*() must be released via + * either put_user_page(), or one of the put_user_pages*() routines + * below. This is so that eventually, pages that are pinned via + * get_user_pages*() can be separately tracked and uniquely handled. In + * particular, interactions with RDMA and filesystems need special + * handling. 
+ */ +static inline void put_user_page(struct page *page) +{ + put_page(page); +} + +void put_user_pages_dirty(struct page **pages, unsigned long npages); +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages); +void put_user_pages(struct page **pages, unsigned long npages); + #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP) #define SECTION_IN_PAGE_FLAGS #endif diff --git a/mm/swap.c b/mm/swap.c index aa483719922e..bb8c32595e5f 100644 --- a/mm/swap.c +++ b/mm/swap.c @@ -133,6 +133,86 @@ void put_pages_list(struct list_head *pages) } EXPORT_SYMBOL(put_pages_list); +typedef int (*set_dirty_func)(struct page *page); + +static void __put_user_pages_dirty(struct page **pages, + unsigned long npages, + set_dirty_func sdf) +{ + unsigned long index; + + for (index = 0; index < npages; index++) { + struct page *page = compound_head(pages[index]); + + if (!PageDirty(page)) + sdf(page); + + put_user_page(page); + } +} + +/* + * put_user_pages_dirty() - for each page in the @pages array, make + * that page (or its head page, if a compound page) dirty, if it was + * previously listed as clean. Then, release the page using + * put_user_page(). + * + * Please see the put_user_page() documentation for details. + * + * set_page_dirty(), which does not lock the page, is used here. + * Therefore, it is the caller's responsibility to ensure that this is + * safe. If not, then put_user_pages_dirty_lock() should be called instead. + * + * @pages: array of pages to be marked dirty and released. + * @npages: number of pages in the @pages array. + * + */ +void put_user_pages_dirty(struct page **pages, unsigned long npages) +{ + __put_user_pages_dirty(pages, npages, set_page_dirty); +} +EXPORT_SYMBOL(put_user_pages_dirty); + +/* + * put_user_pages_dirty_lock() - for each page in the @pages array, make + * that page (or its head page, if a compound page) dirty, if it was + * previously listed as clean. Then, release the page using + * put_user_page(). 
+ * + * Please see the put_user_page() documentation for details. + * + * This is just like put_user_pages_dirty(), except that it invokes + * set_page_dirty_lock(), instead of set_page_dirty(). + * + * @pages: array of pages to be marked dirty and released. + * @npages: number of pages in the @pages array. + * + */ +void
[PATCH 2/2] infiniband/mm: convert put_page() to put_user_page*()
From: John Hubbard For infiniband code that retains pages via get_user_pages*(), release those pages via the new put_user_page(), or put_user_pages*(), instead of put_page() This is a tiny part of the second step of fixing the problem described in [1]. The steps are: 1) Provide put_user_page*() routines, intended to be used for releasing pages that were pinned via get_user_pages*(). 2) Convert all of the call sites for get_user_pages*(), to invoke put_user_page*(), instead of put_page(). This involves dozens of call sites, and will take some time. 3) After (2) is complete, use get_user_pages*() and put_user_page*() to implement tracking of these pages. This tracking will be separate from the existing struct page refcounting. 4) Use the tracking and identification of these pages, to implement special handling (especially in writeback paths) when the pages are backed by a filesystem. Again, [1] provides details as to why that is desirable. [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()" Reviewed-by: Jan Kara Reviewed-by: Dennis Dalessandro Acked-by: Jason Gunthorpe Cc: Doug Ledford Cc: Jason Gunthorpe Cc: Mike Marciniszyn Cc: Dennis Dalessandro Cc: Christian Benvenuti Signed-off-by: John Hubbard --- drivers/infiniband/core/umem.c | 7 --- drivers/infiniband/core/umem_odp.c | 2 +- drivers/infiniband/hw/hfi1/user_pages.c | 11 --- drivers/infiniband/hw/mthca/mthca_memfree.c | 6 +++--- drivers/infiniband/hw/qib/qib_user_pages.c | 11 --- drivers/infiniband/hw/qib/qib_user_sdma.c | 6 +++--- drivers/infiniband/hw/usnic/usnic_uiom.c| 7 --- 7 files changed, 23 insertions(+), 27 deletions(-) diff --git a/drivers/infiniband/core/umem.c b/drivers/infiniband/core/umem.c index c6144df47ea4..c2898bc7b3b2 100644 --- a/drivers/infiniband/core/umem.c +++ b/drivers/infiniband/core/umem.c @@ -58,9 +58,10 @@ static void __ib_umem_release(struct ib_device *dev, struct ib_umem *umem, int d for_each_sg(umem->sg_head.sgl, sg, umem->npages, i) { page = sg_page(sg); - 
if (!PageDirty(page) && umem->writable && dirty) - set_page_dirty_lock(page); - put_page(page); + if (umem->writable && dirty) + put_user_pages_dirty_lock(, 1); + else + put_user_page(page); } sg_free_table(>sg_head); diff --git a/drivers/infiniband/core/umem_odp.c b/drivers/infiniband/core/umem_odp.c index 676c1fd1119d..99715049cd3b 100644 --- a/drivers/infiniband/core/umem_odp.c +++ b/drivers/infiniband/core/umem_odp.c @@ -659,7 +659,7 @@ int ib_umem_odp_map_dma_pages(struct ib_umem_odp *umem_odp, u64 user_virt, ret = -EFAULT; break; } - put_page(local_page_list[j]); + put_user_page(local_page_list[j]); continue; } diff --git a/drivers/infiniband/hw/hfi1/user_pages.c b/drivers/infiniband/hw/hfi1/user_pages.c index e341e6dcc388..99ccc0483711 100644 --- a/drivers/infiniband/hw/hfi1/user_pages.c +++ b/drivers/infiniband/hw/hfi1/user_pages.c @@ -121,13 +121,10 @@ int hfi1_acquire_user_pages(struct mm_struct *mm, unsigned long vaddr, size_t np void hfi1_release_user_pages(struct mm_struct *mm, struct page **p, size_t npages, bool dirty) { - size_t i; - - for (i = 0; i < npages; i++) { - if (dirty) - set_page_dirty_lock(p[i]); - put_page(p[i]); - } + if (dirty) + put_user_pages_dirty_lock(p, npages); + else + put_user_pages(p, npages); if (mm) { /* during close after signal, mm can be NULL */ down_write(>mmap_sem); diff --git a/drivers/infiniband/hw/mthca/mthca_memfree.c b/drivers/infiniband/hw/mthca/mthca_memfree.c index cc9c0c8ccba3..b8b12effd009 100644 --- a/drivers/infiniband/hw/mthca/mthca_memfree.c +++ b/drivers/infiniband/hw/mthca/mthca_memfree.c @@ -481,7 +481,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, ret = pci_map_sg(dev->pdev, _tab->page[i].mem, 1, PCI_DMA_TODEVICE); if (ret < 0) { - put_page(pages[0]); + put_user_page(pages[0]); goto out; } @@ -489,7 +489,7 @@ int mthca_map_user_db(struct mthca_dev *dev, struct mthca_uar *uar, mthca_uarc_virt(dev, uar, i)); if (ret) { pci_unmap_sg(dev->pdev, _tab->page[i].mem, 1, 
PCI_DMA_TODEVICE); - put_page(sg_page(_tab->page[i].mem)); + put_user_page(sg_page(_tab->page[i].mem)); goto out; } @@ -555,7 +555,7 @@ void
[PATCH 0/2] put_user_page*(): start converting the call sites
From: John Hubbard

Hi,

Summary: I'd like these two patches to go into the next convenient cycle. I *think* that means 4.21.

Details

At the Linux Plumbers Conference, we talked about this approach [1], and the primary lingering concern was over performance. Tom Talpey helped me through a much more accurate run of the fio performance test, and now it's looking like an under 1% performance cost, to add and remove pages from the LRU (this is only paid when dealing with get_user_pages) [2]. So we should be fine to start converting call sites.

This patchset gets the conversion started. Both patches already had a fair amount of review.

(Tom, I'll add your Tested-by to the actual implementation that moves pages on and off the LRU. These first two patches don't do that.)

[1] https://linuxplumbersconf.org/event/2/contributions/126/ "RDMA and get_user_pages"
[2] https://lore.kernel.org/r/79d1ee27-9ea0-3d15-3fc4-97c1bd79c...@talpey.com

John Hubbard (2):
  mm: introduce put_user_page*(), placeholder versions
  infiniband/mm: convert put_page() to put_user_page*()

 drivers/infiniband/core/umem.c              |  7 +-
 drivers/infiniband/core/umem_odp.c          |  2 +-
 drivers/infiniband/hw/hfi1/user_pages.c     | 11 ++-
 drivers/infiniband/hw/mthca/mthca_memfree.c |  6 +-
 drivers/infiniband/hw/qib/qib_user_pages.c  | 11 ++-
 drivers/infiniband/hw/qib/qib_user_sdma.c   |  6 +-
 drivers/infiniband/hw/usnic/usnic_uiom.c    |  7 +-
 include/linux/mm.h                          | 20 ++
 mm/swap.c                                   | 80 +
 9 files changed, 123 insertions(+), 27 deletions(-)

--
2.19.2
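The call-site conversion this series starts can be sketched in userspace as follows. It is modelled on the hfi1_release_user_pages() change in patch 2/2, but everything here is a stand-in: `struct page` carries hypothetical `refcount` and `dirty` fields, the helpers only mimic the placeholder versions from patch 1/2, compound_head() handling is left out, and `release_pages_old()` / `release_pages_new()` are illustrative names.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Userspace stand-in for struct page (hypothetical fields). */
struct page {
	int refcount;
	bool dirty;
};

static void put_page(struct page *page) { page->refcount--; }

/* Placeholder version, as in patch 1/2: simply calls put_page(). */
static void put_user_page(struct page *page) { put_page(page); }

/* Stand-in for set_page_dirty_lock(); the real one also locks the page. */
static int set_page_dirty_lock(struct page *page)
{
	page->dirty = true;
	return 0;
}

static void put_user_pages(struct page **pages, unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++)
		put_user_page(pages[i]);
}

static void put_user_pages_dirty_lock(struct page **pages, unsigned long npages)
{
	for (unsigned long i = 0; i < npages; i++) {
		if (!pages[i]->dirty)
			set_page_dirty_lock(pages[i]);
		put_user_page(pages[i]);
	}
}

/* Before: open-coded release loop at the call site. */
static void release_pages_old(struct page **p, size_t npages, bool dirty)
{
	for (size_t i = 0; i < npages; i++) {
		if (dirty)
			set_page_dirty_lock(p[i]);
		put_page(p[i]);
	}
}

/* After: the call site only chooses which batch helper to use. */
static void release_pages_new(struct page **p, size_t npages, bool dirty)
{
	if (dirty)
		put_user_pages_dirty_lock(p, npages);
	else
		put_user_pages(p, npages);
}
```

In this model both variants leave the pages in the same state; what changes is that every release of a get_user_pages*() page now goes through one family of functions, which is what later tracking would hook into.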
RE: [PATCH V6 0/9] clk: add imx7ulp clk support
> -----Original Message-----
> From: Stephen Boyd [mailto:sb...@kernel.org]
> Sent: Tuesday, December 4, 2018 3:32 AM
> To: Aisheng DONG ; linux-...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org; linux-arm-ker...@lists.infradead.org;
> mturque...@baylibre.com; shawn...@kernel.org; Anson Huang
> ; Jacky Bai ; dl-linux-imx
> ; Aisheng DONG
> Subject: Re: [PATCH V6 0/9] clk: add imx7ulp clk support
>
> Quoting A.s. Dong (2018-11-14 05:01:31)
> > This patch series intends to add imx7ulp clk support.
> >
> > i.MX7ULP Clock functions are under joint control of the System Clock
> > Generation (SCG) modules, Peripheral Clock Control (PCC) modules, and
> > Core Mode Controller (CMC) blocks.
> >
> > The clocking scheme provides clear separation between M4 domain and A7
> > domain. Except for a few clock sources shared between two domains,
> > such as the System Oscillator clock, the Slow IRC (SIRC), and the
> > Fast IRC clock (FIRCLK), clock sources and clock management are
> > separated and contained within each domain.
> >
> > M4 clock management consists of SCG0, PCC0, PCC1, and CMC0 modules.
> > A7 clock management consists of SCG1, PCC2, PCC3, and CMC1 modules.
> >
> > Note: this series only adds A7 clock domain support as M4 clock domain
> > will be handled by M4 separately.
> >
>
> I got:
>
> drivers/clk/imx/clk-pllv4.c:152:15: warning: symbol 'imx_clk_pllv4' was not
> declared. Should it be static?
> drivers/clk/imx/clk-pfdv2.c:166:15: warning: symbol 'imx_clk_pfdv2' was not
> declared. Should it be static?
> drivers/clk/imx/clk-divider-gate.c:174:15: warning: symbol
> 'imx_clk_divider_gate' was not declared. Should it be static?
> drivers/clk/imx/clk-composite-7ulp.c:22:15: warning: symbol
> 'imx7ulp_clk_composite' was not declared. Should it be static?
>
> which I can fix easily by throwing in clk.h into each file.

Thanks, I will double-check it when I am back in the office.

Regards
Dong Aisheng
Re: [PATCH] net: stmmac: convert to DEFINE_SHOW_ATTRIBUTE
From: Yangtao Li Date: Mon, 3 Dec 2018 09:22:09 -0500 > Use DEFINE_SHOW_ATTRIBUTE macro to simplify the code. > > Signed-off-by: Yangtao Li Applied.
Re: [PATCH v9 2/4] seccomp: switch system call argument type to void *
On Mon, Dec 03, 2018 at 07:17:26PM -0700, Tycho Andersen wrote: > On Tue, Dec 04, 2018 at 10:07:38AM +0800, kbuild test robot wrote: > > Hi Tycho, > > > > I love your patch! Yet something to improve: > > > > [auto build test ERROR on linus/master] > > [also build test ERROR on v4.20-rc5 next-20181203] > > [if your patch is applied to the wrong git tree, please drop us a note to > > help improve the system] > > > > url: > > https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-hoist-struct-seccomp_data-recalculation-higher/20181204-013450 > > config: i386-randconfig-x005-201848 (attached as .config) > > compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 > > reproduce: > > # save the attached .config to linux build tree > > make ARCH=i386 > > > > All errors (new ones prefixed by >>): > > > >In file included from kernel/seccomp.c:28:0: > > >> include/linux/syscalls.h:239:18: error: conflicting types for > > >> 'sys_seccomp' > > asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ > > ^ > >include/linux/syscalls.h:225:2: note: in expansion of macro > > '__SYSCALL_DEFINEx' > > __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) > > ^ > >include/linux/syscalls.h:216:36: note: in expansion of macro > > 'SYSCALL_DEFINEx' > > #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, > > __VA_ARGS__) > >^~~ > >kernel/seccomp.c:946:1: note: in expansion of macro 'SYSCALL_DEFINE3' > > SYSCALL_DEFINE3(seccomp, unsigned int, op, unsigned int, flags, > > ^~~ > >In file included from kernel/seccomp.c:28:0: > >include/linux/syscalls.h:881:17: note: previous declaration of > > 'sys_seccomp' was here > > asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, > > ^~~ > > Huh, I have no idea why I don't see this, but even with the attached > config it still doesn't cause a problem for me. Anyway, I'll fix it up > and do some more investigating... Oh, because it's "make ARCH=i386". Whoosh :) Anyway, it's fixed for v10. Tycho
Query regarding Spectre fixes for qemu-kvm...4.4 LTS Kernel.
Hi,

I have a few queries regarding qemu-kvm support for the Spectre-related fixes in the 4.4.* LTS kernel.

In upstream kernels, svm_vcpu_run() calls x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host(). These call into x86_virt_spec_ctrl(), which sets the IBRS/IBPB/SSBD bits accordingly for the guest context. The related commit IDs are:

commit 5cf687548705412da47c9cec342fd952d71ed3d5
commit ccbcd2674472a978b48c91c1fbfb66c0ff959f24

It looks like this change has not been fully ported to 4.4 LTS yet. The x86_spec_ctrl_set_guest() and x86_spec_ctrl_restore_host() interfaces are available, but svm_vcpu_run() does not appear to call them. So qemu-kvm running on 4.4 kernels may not have SPEC_CTRL set properly in the guest context.

Is there a plan to backport the above changes fully into the 4.4 LTS kernel?

Thanks,
Paulose.
Re: [PATCH v5 8/8] soc: qcom: rpmhpd: Mark mx as a parent for cx
On 04-12-18, 10:51, Rajendra Nayak wrote:
> Specify the active + sleep and active-only MX power domains as
> the parents of the corresponding CX power domains. This will ensure that
> performance state requests on CX automatically generate equivalent requests
> on MX power domains.
>
> This is used to enforce a requirement that exists for various
> hardware blocks on SDM845 that MX performance state >= CX performance
> state for a given operating frequency.
>
> Signed-off-by: Rajendra Nayak
> ---
> This patch is dependent on the series from
> Viresh [1] which adds support to propagate performance states across the
> power domain hierarchy which is still being reviewed.
>
> [1] https://lkml.org/lkml/2018/11/26/333
>
> drivers/soc/qcom/rpmhpd.c | 11 +++
> 1 file changed, 11 insertions(+)

Acked-by: Viresh Kumar

--
viresh
Re: [PATCH v2 1/5] devfreq: refactor set_target frequency function
Hi, On 2018년 12월 04일 13:39, Chanwoo Choi wrote: > Hi Lukasz, > > On 2018년 12월 03일 23:31, Lukasz Luba wrote: >> The refactoring is needed for the new client in devfreq: suspend. >> To avoid code duplication, move it to the new local function >> devfreq_set_target. >> >> The patch is based on earlier work by Tobias Jakobi. > > As I already commented, Please remove it. You already mentioned it on > cover-letter. > If you want to contain the contribution history of Tobias, you might better > to add 'Signed-off-by' or others. If you will fix it, feel free to add my tag: Reviewed-by: Chanwoo Choi > >> >> Suggested-by: Tobias Jakobi >> Suggested-by: Chanwoo Choi >> Signed-off-by: Lukasz Luba >> --- >> drivers/devfreq/devfreq.c | 62 >> +++ >> 1 file changed, 36 insertions(+), 26 deletions(-) >> >> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c >> index 1414130..a9fd61b 100644 >> --- a/drivers/devfreq/devfreq.c >> +++ b/drivers/devfreq/devfreq.c >> @@ -285,6 +285,40 @@ static int devfreq_notify_transition(struct devfreq >> *devfreq, >> return 0; >> } >> >> +static int devfreq_set_target(struct devfreq *devfreq, unsigned long >> new_freq, >> + u32 flags) >> +{ >> +struct devfreq_freqs freqs; >> +unsigned long cur_freq; >> +int err = 0; >> + >> +if (devfreq->profile->get_cur_freq) >> +devfreq->profile->get_cur_freq(devfreq->dev.parent, _freq); >> +else >> +cur_freq = devfreq->previous_freq; >> + >> +freqs.old = cur_freq; >> +freqs.new = new_freq; >> +devfreq_notify_transition(devfreq, , DEVFREQ_PRECHANGE); >> + >> +err = devfreq->profile->target(devfreq->dev.parent, _freq, flags); >> +if (err) { >> +freqs.new = cur_freq; >> +devfreq_notify_transition(devfreq, , DEVFREQ_POSTCHANGE); >> +return err; >> +} >> + >> +freqs.new = new_freq; >> +devfreq_notify_transition(devfreq, , DEVFREQ_POSTCHANGE); >> + >> +if (devfreq_update_status(devfreq, new_freq)) >> +dev_err(>dev, >> +"Couldn't update frequency transition information.\n"); >> + >> +devfreq->previous_freq 
= new_freq; >> +return err; >> +} >> + >> /* Load monitoring helper functions for governors use */ >> >> /** >> @@ -296,8 +330,7 @@ static int devfreq_notify_transition(struct devfreq >> *devfreq, >> */ >> int update_devfreq(struct devfreq *devfreq) >> { >> -struct devfreq_freqs freqs; >> -unsigned long freq, cur_freq, min_freq, max_freq; >> +unsigned long freq, min_freq, max_freq; >> int err = 0; >> u32 flags = 0; >> >> @@ -333,31 +366,8 @@ int update_devfreq(struct devfreq *devfreq) >> flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */ >> } >> >> -if (devfreq->profile->get_cur_freq) >> -devfreq->profile->get_cur_freq(devfreq->dev.parent, _freq); >> -else >> -cur_freq = devfreq->previous_freq; >> - >> -freqs.old = cur_freq; >> -freqs.new = freq; >> -devfreq_notify_transition(devfreq, , DEVFREQ_PRECHANGE); >> +return devfreq_set_target(devfreq, freq, flags); >> >> -err = devfreq->profile->target(devfreq->dev.parent, , flags); >> -if (err) { >> -freqs.new = cur_freq; >> -devfreq_notify_transition(devfreq, , DEVFREQ_POSTCHANGE); >> -return err; >> -} >> - >> -freqs.new = freq; >> -devfreq_notify_transition(devfreq, , DEVFREQ_POSTCHANGE); >> - >> -if (devfreq_update_status(devfreq, freq)) >> -dev_err(>dev, >> -"Couldn't update frequency transition information.\n"); >> - >> -devfreq->previous_freq = freq; >> -return err; >> } >> EXPORT_SYMBOL(update_devfreq); >> >> > > -- Best Regards, Chanwoo Choi Samsung Electronics
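The control flow that the new devfreq_set_target() factors out (notify before the change, call the driver, notify afterwards with either the new or the old frequency) can be modelled in userspace like this. It is a sketch under assumptions, not the driver code: `struct dev_model`, `notify_transition()`, `set_target()` and the `last_post_new` bookkeeping field are all hypothetical stand-ins, and the get_cur_freq and devfreq_update_status() steps are omitted.

```c
#include <assert.h>

enum { PRECHANGE, POSTCHANGE };

/* Stand-in for struct devfreq_freqs. */
struct freqs {
	unsigned long old_freq;
	unsigned long new_freq;
};

/* Minimal model of a devfreq device (hypothetical fields). */
struct dev_model {
	unsigned long previous_freq;
	int (*target)(unsigned long *freq); /* stand-in for profile->target() */
	unsigned long last_post_new;        /* new_freq seen by last POSTCHANGE */
};

/* Stand-in for devfreq_notify_transition(): just records what it was told. */
static void notify_transition(struct dev_model *d, struct freqs *f, int state)
{
	if (state == POSTCHANGE)
		d->last_post_new = f->new_freq;
}

static int set_target(struct dev_model *d, unsigned long new_freq)
{
	struct freqs freqs;
	unsigned long cur_freq = d->previous_freq;
	int err;

	freqs.old_freq = cur_freq;
	freqs.new_freq = new_freq;
	notify_transition(d, &freqs, PRECHANGE);

	err = d->target(&new_freq);
	if (err) {
		/* Change failed: tell listeners we are still at cur_freq. */
		freqs.new_freq = cur_freq;
		notify_transition(d, &freqs, POSTCHANGE);
		return err;
	}

	freqs.new_freq = new_freq;
	notify_transition(d, &freqs, POSTCHANGE);
	d->previous_freq = new_freq;
	return 0;
}

/* Two toy drivers: one always succeeds, one always fails. */
static int target_ok(unsigned long *freq) { (void)freq; return 0; }
static int target_fail(unsigned long *freq) { (void)freq; return -1; }
```

Keeping this sequence in one helper means a second caller, such as the suspend path mentioned in the commit message, would get the same notification and rollback behaviour as update_devfreq().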
[PATCH v3] panic: Avoid the extra noise dmesg
When a kernel panic happens, it will first print the panic call stack, then the ending msg like:

[ 35.743249] ---[ end Kernel panic - not syncing: Fatal exception
[ 35.749975] ------------[ cut here ]------------

The above messages are very useful for debugging.

But if the system is configured to not reboot on panic, say the "panic_timeout" parameter equals 0, it will likely print out many noisy messages, such as a WARN() call stack for each and every CPU except the panicking one, like below:

WARNING: CPU: 1 PID: 280 at kernel/sched/core.c:1198 set_task_cpu+0x183/0x190
Call Trace:
 try_to_wake_up
 default_wake_function
 autoremove_wake_function
 __wake_up_common
 __wake_up_common_lock
 __wake_up
 wake_up_klogd_work_func
 irq_work_run_list
 irq_work_tick
 update_process_times
 tick_sched_timer
 __hrtimer_run_queues
 hrtimer_interrupt
 smp_apic_timer_interrupt
 apic_timer_interrupt

For people working in console mode, the screen will first show the panic call stack, which is then immediately overwritten by these noisy extra messages. This makes debugging much more difficult, as the original context gets lost on screen.

These noisy messages also confuse some users: I have seen many bug reporters post the noisy messages to bugzilla instead of the real panic call stack and context.

Removing the "local_irq_enable" will avoid the noisy messages.

The justification for the removal is: when the code runs to this point, it means the user has chosen not to reboot or do any special handling via the panic notifier method, so there is not much point in re-enabling interrupts.

Signed-off-by: Feng Tang
Cc: Thomas Gleixner
Cc: Kees Cook
Cc: Borislav Petkov
Cc: sta...@kernel.org
---
 kernel/panic.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/kernel/panic.c b/kernel/panic.c
index f6d549a..a616e55 100644
--- a/kernel/panic.c
+++ b/kernel/panic.c
@@ -295,7 +295,6 @@ void panic(const char *fmt, ...)
 	}
 #endif
 	pr_emerg("---[ end Kernel panic - not syncing: %s ]---\n", buf);
-	local_irq_enable();
 	for (i = 0; ; i += PANIC_TIMER_STEP) {
 		touch_softlockup_watchdog();
 		if (i >= i_next) {
--
2.7.4
linux-next: manual merge of the akpm tree with the pm tree
Hi Andrew, Today's linux-next merge of the akpm tree got a conflict in: fs/exec.c between commit: 67fe1224adc5 ("Revert "exec: make de_thread() freezable"") from the pm tree and patch: "fs/: remove caller signal_pending branch predictions" from the akpm tree. I fixed it up (see below) and can carry the fix as necessary. This is now fixed as far as linux-next is concerned, but any non trivial conflicts should be mentioned to your upstream maintainer when your tree is submitted for merging. You may also want to consider cooperating with the maintainer of the conflicting tree to minimise any particularly complex conflicts. -- Cheers, Stephen Rothwell diff --cc fs/exec.c index ea7d439cf79e,044e296f2381.. --- a/fs/exec.c +++ b/fs/exec.c @@@ -1086,8 -1087,8 +1086,8 @@@ static int de_thread(struct task_struc while (sig->notify_count) { __set_current_state(TASK_KILLABLE); spin_unlock_irq(lock); - freezable_schedule(); + schedule(); - if (unlikely(__fatal_signal_pending(tsk))) + if (__fatal_signal_pending(tsk)) goto killed; spin_lock_irq(lock); } @@@ -1114,8 -1115,8 +1114,8 @@@ __set_current_state(TASK_KILLABLE); write_unlock_irq(_lock); cgroup_threadgroup_change_end(tsk); - freezable_schedule(); + schedule(); - if (unlikely(__fatal_signal_pending(tsk))) + if (__fatal_signal_pending(tsk)) goto killed; }
Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline
On Tue, Dec 4, 2018 at 11:53 AM David Rientjes wrote: > > On Tue, 4 Dec 2018, Pingfan Liu wrote: > > > diff --git a/include/linux/gfp.h b/include/linux/gfp.h > > index 76f8db0..8324953 100644 > > --- a/include/linux/gfp.h > > +++ b/include/linux/gfp.h > > @@ -453,6 +453,8 @@ static inline int gfp_zonelist(gfp_t flags) > > */ > > static inline struct zonelist *node_zonelist(int nid, gfp_t flags) > > { > > + if (unlikely(!node_online(nid))) > > + nid = first_online_node; > > return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags); > > } > > > > So we're passing the node id from dev_to_node() to kmalloc which > interprets that as the preferred node and then does node_zonelist() to > find the zonelist at allocation time. > > What happens if we fix this in alloc_dr()? Does anything else cause > problems? > I think it is better to fix it in mm, since that also protects against any similar new bug in the future, whereas fixing it in alloc_dr() only covers the present case. > And rather than using first_online_node, would next_online_node() work? > What is the gain? Is it to relieve memory pressure on node0? Thanks, Pingfan > I'm thinking about this: > > diff --git a/drivers/base/devres.c b/drivers/base/devres.c > --- a/drivers/base/devres.c > +++ b/drivers/base/devres.c > @@ -100,6 +100,8 @@ static __always_inline struct devres * > alloc_dr(dr_release_t release, > _size))) > return NULL; > > + if (unlikely(!node_online(nid))) > + nid = next_online_node(nid); > dr = kmalloc_node_track_caller(tot_size, gfp, nid); > if (unlikely(!dr)) > return NULL;
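Editorial sketch: the two fallbacks discussed above can be compared in a small userspace model. This is not kernel code. The bitmask, MAX_NODES, and every function name below are assumptions made for illustration; they only stand in for node_online(), first_online_node and next_online_node(). Under those assumptions, the "next" variant can run off the end of the mask when no higher node is online, which may be worth checking against the real helper.

```c
#include <stdbool.h>

#define MAX_NODES 8 /* hypothetical; stands in for the kernel's node limit */

/* Bit n set in online_mask means node n is online (toy stand-in). */
static bool node_is_online(unsigned int online_mask, int nid)
{
	return nid >= 0 && nid < MAX_NODES && (online_mask & (1u << nid)) != 0;
}

/* Lowest-numbered online node, or MAX_NODES if none is online. */
static int first_online(unsigned int online_mask)
{
	for (int n = 0; n < MAX_NODES; n++)
		if (online_mask & (1u << n))
			return n;
	return MAX_NODES;
}

/* First online node strictly above nid, or MAX_NODES if there is none. */
static int next_online(unsigned int online_mask, int nid)
{
	for (int n = nid + 1; n < MAX_NODES; n++)
		if (online_mask & (1u << n))
			return n;
	return MAX_NODES;
}

/* Fallback in the style of the node_zonelist() hunk. */
static int pick_node_first(unsigned int online_mask, int nid)
{
	return node_is_online(online_mask, nid) ? nid : first_online(online_mask);
}

/* Fallback in the style of the alloc_dr() hunk. */
static int pick_node_next(unsigned int online_mask, int nid)
{
	return node_is_online(online_mask, nid) ? nid : next_online(online_mask, nid);
}
```

With nodes 0 and 2 online, an offline node 1 maps to node 0 under the first variant and to node 2 under the second; an offline node 3 has no online successor in this model, so the second variant returns the out-of-range value MAX_NODES.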
Re: [PATCH 3/9] tools/lib/traceevent: Install trace-seq.h API header file
On Fri, Nov 30, 2018 at 10:44:06AM -0500, Steven Rostedt wrote: > From: Tzvetomir Stoyanov > > This patch installs trace-seq.h header file on "make install". > > Signed-off-by: Tzvetomir Stoyanov > Signed-off-by: Steven Rostedt (VMware) > --- > tools/lib/traceevent/Makefile | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile > index adb16f845ab3..67fe5d7ef190 100644 > --- a/tools/lib/traceevent/Makefile > +++ b/tools/lib/traceevent/Makefile > @@ -285,7 +285,7 @@ define do_install_pkgconfig_file > fi > endef > > -install_lib: all_cmd install_plugins install_pkgconfig > +install_lib: all_cmd install_plugins install_headers install_pkgconfig > $(call QUIET_INSTALL, $(LIB_TARGET)) \ > $(call do_install_mkdir,$(libdir_SQ)); \ > cp -fpR $(LIB_INSTALL) $(DESTDIR)$(libdir_SQ) > @@ -302,6 +302,7 @@ install_headers: > $(call QUIET_INSTALL, headers) \ > $(call > do_install,event-parse.h,$(prefix)/include/traceevent,644); \ > $(call > do_install,event-utils.h,$(prefix)/include/traceevent,644); \ > + $(call > do_install,trace-seq.h,$(prefix)/include/traceevent,644); \ > $(call do_install,kbuffer.h,$(prefix)/include/traceevent,644) Do you still wanna have 'traceevent' directory prefix? I just sometimes feel a bit annoying to type it. ;-) Or you can rename it something like 'tep' or 'libtep' - and hopefully having only single header file to include.. Thanks, Namhyung > > install: install_lib > -- > 2.19.1 > >
Re: [PATCH v11 1/2] dt-bindings: cpufreq: Introduce QCOM CPUFREQ Firmware bindings
Quoting Rob Herring (2018-12-03 15:09:07) > On Sun, Dec 02, 2018 at 09:25:02AM +0530, Taniya Das wrote: > > Add QCOM cpufreq firmware device bindings for Qualcomm Technology Inc's > > SoCs. This is required for managing the cpu frequency transitions which are > > controlled by the hardware engine. > > > > Signed-off-by: Taniya Das > > --- > > .../bindings/cpufreq/cpufreq-qcom-hw.txt | 172 > > + > > 1 file changed, 172 insertions(+) > > create mode 100644 > > Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.txt > > > > diff --git a/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.txt > > b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.txt > > new file mode 100644 > > index 000..2b82965 > > --- /dev/null > > +++ b/Documentation/devicetree/bindings/cpufreq/cpufreq-qcom-hw.txt > > @@ -0,0 +1,172 @@ > > +Qualcomm Technologies, Inc. CPUFREQ Bindings > > + > > +CPUFREQ HW is a hardware engine used by some Qualcomm Technologies, Inc. > > (QTI) > > +SoCs to manage frequency in hardware. It is capable of controlling > > frequency > > +for multiple clusters. > > + > > +Properties: > > +- compatible > > + Usage: required > > + Value type: > > + Definition: must be "qcom,cpufreq-hw". > > + > > +- clocks > > + Usage: required > > + Value type: From common clock binding. > > + Definition: clock handle for XO clock and GPLL0 clock. > > + > > +- clock-names > > + Usage: required > > + Value type: From common clock binding. > > + Definition: must be "xo", "alternate". > > + > > +- reg > > + Usage: required > > + Value type: > > + Definition: Addresses and sizes for the memory of the HW bases in > > + each frequency domain. > > +- reg-names > > + Usage: Optional > > + Value type: > > + Definition: Frequency domain name i.e. > > + "freq-domain0", "freq-domain1". > > + > > +- freq-domain-cells: > > #freq-domain-cells > > Otherwise, > > Reviewed-by: Rob Herring Or should it be #qcom,freq-domain-cells? 
That would match the same stem of the property used in the cpu node.
Re: [PATCH] power: reset: gpio-poweroff: add ability to specific active and inactive delays
On Mon, Dec 3, 2018 at 3:49 PM Rob Herring wrote: > > On Sun, 11 Nov 2018 22:45:38 +0100, Heiko Stuebner wrote: > > From: Heiko Stuebner > > > > Similar to gpio-reset allow to specify active and inactive delays > > while keeping the 100ms defaults that were used previously all the time. > > > > The dt-properties are named the same as in gpio-reset but get an "-ms" > > suffix as properties should contain such a suffix specifying its unit. > > > > Signed-off-by: Heiko Stuebner > > --- > > .../devicetree/bindings/power/reset/gpio-poweroff.txt | 2 ++ > > drivers/power/reset/gpio-poweroff.c| 10 -- > > 2 files changed, 10 insertions(+), 2 deletions(-) > > > > Reviewed-by: Rob Herring Reviewed-by: Moritz Fischer
Re: [PATCH linux-next v2 6/6] ASoC: rsnd: add avb clocks
Hi Jiada > There are AVB Counter Clocks in ADG, each clock has 12bits integral > and 8 bits fractional dividers which operates with S0D1ϕ clock. > > This patch registers 8 AVB Counter Clocks when clock-cells of > rcar_sound node is 2, > > Signed-off-by: Jiada Wang > --- > sound/soc/sh/rcar/adg.c | 306 +-- > sound/soc/sh/rcar/gen.c | 9 ++ > sound/soc/sh/rcar/rsnd.h | 9 ++ > 3 files changed, 315 insertions(+), 9 deletions(-) Please update the DT binding txt, too. Best regards --- Kuninori Morimoto
[PATCH v5 03/13] iov_iter: pass void csum pointer to csum_and_copy_to_iter
From: Sagi Grimberg The single caller to csum_and_copy_to_iter is skb_copy_and_csum_datagram and we are trying to unite its logic with skb_copy_datagram_iter by passing a callback to the copy function that we want to apply. Thus, we need to make the checksum pointer private to the function. Acked-by: David S. Miller Signed-off-by: Sagi Grimberg --- include/linux/uio.h | 2 +- lib/iov_iter.c | 3 ++- 2 files changed, 3 insertions(+), 2 deletions(-) diff --git a/include/linux/uio.h b/include/linux/uio.h index 55ce99ddb912..41d1f8d3313d 100644 --- a/include/linux/uio.h +++ b/include/linux/uio.h @@ -266,7 +266,7 @@ static inline void iov_iter_reexpand(struct iov_iter *i, size_t count) { i->count = count; } -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); +size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i); size_t csum_and_copy_from_iter(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, struct iov_iter *i); diff --git a/lib/iov_iter.c b/lib/iov_iter.c index 7ebccb5c1637..db93531ca3e3 100644 --- a/lib/iov_iter.c +++ b/lib/iov_iter.c @@ -1432,10 +1432,11 @@ bool csum_and_copy_from_iter_full(void *addr, size_t bytes, __wsum *csum, } EXPORT_SYMBOL(csum_and_copy_from_iter_full); -size_t csum_and_copy_to_iter(const void *addr, size_t bytes, __wsum *csum, +size_t csum_and_copy_to_iter(const void *addr, size_t bytes, void *csump, struct iov_iter *i) { const char *from = addr; + __wsum *csum = csump; __wsum sum, next; size_t off = 0; sum = *csum; -- 2.17.1
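Editorial sketch: the reason for switching the checksum argument to void * is that a single callback type can then serve both a plain copy and a copy that carries private state. The userspace model below shows that pattern only. The callback type, the toy additive checksum, and all names are assumptions for illustration and are not the kernel's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* One callback shape for every copy flavour; private state travels as void *. */
typedef size_t (*copy_cb_t)(const void *src, size_t len, void *priv, uint8_t *dst);

/* Plain copy: needs no private state, so priv is ignored. */
static size_t plain_copy(const void *src, size_t len, void *priv, uint8_t *dst)
{
	(void)priv;
	memcpy(dst, src, len);
	return len;
}

/* Copy plus a toy additive checksum; priv must point to a uint32_t. */
static size_t sum_and_copy(const void *src, size_t len, void *priv, uint8_t *dst)
{
	uint32_t *sum = priv; /* same move as "__wsum *csum = csump;" in the patch */
	const uint8_t *p = src;

	for (size_t i = 0; i < len; i++) {
		dst[i] = p[i];
		*sum += p[i];
	}
	return len;
}

/* Generic driver: it never needs to know what priv points to. */
static size_t copy_with(copy_cb_t cb, const void *src, size_t len,
			void *priv, uint8_t *dst)
{
	return cb(src, len, priv, dst);
}
```

The cost of this design is that the compiler can no longer check the pointed-to type, so each callback has to document what it expects behind priv.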
[PATCH v5 02/13] datagram: open-code copy_page_to_iter
From: Sagi Grimberg This will be useful to consolidate skb_copy_and_hash_datagram_iter and skb_copy_and_csum_datagram to a single code path. Acked-by: David S. Miller Signed-off-by: Sagi Grimberg --- net/core/datagram.c | 9 ++--- 1 file changed, 6 insertions(+), 3 deletions(-) diff --git a/net/core/datagram.c b/net/core/datagram.c index 57f3a6fcfc1e..abe642181b64 100644 --- a/net/core/datagram.c +++ b/net/core/datagram.c @@ -445,11 +445,14 @@ int skb_copy_datagram_iter(const struct sk_buff *skb, int offset, end = start + skb_frag_size(frag); if ((copy = end - offset) > 0) { + struct page *page = skb_frag_page(frag); + u8 *vaddr = kmap(page); + if (copy > len) copy = len; - n = copy_page_to_iter(skb_frag_page(frag), - frag->page_offset + offset - - start, copy, to); + n = copy_to_iter(vaddr + frag->page_offset + +offset - start, copy, to); + kunmap(page); offset += n; if (n != copy) goto short_copy; -- 2.17.1
Re: [PATCH v11 2/2] cpufreq: qcom-hw: Add support for QCOM cpufreq HW driver
Hi Taniya, Sorry that I haven't been reviewing it much for the last few iterations, as I was letting others get this into better shape. Thanks for your efforts. On 02-12-18, 09:25, Taniya Das wrote: > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c > +struct cpufreq_qcom { > + struct cpufreq_frequency_table *table; > + void __iomem *perf_state_reg; > + cpumask_t related_cpus; > +}; > + > +static struct cpufreq_qcom *qcom_freq_domain_map[NR_CPUS]; Now that the code is much simpler, I am not sure you need this per-cpu structure at all. The only places where you use it are qcom_cpufreq_hw_cpu_init() and probe(). Why not merge qcom_cpu_resources_init() completely into qcom_cpufreq_hw_cpu_init() and get rid of this structure entirely? -- viresh
Re: [PATCH] x86/boot: clear rsdp address in boot_params for broken loaders
On 04/12/2018 00:07, h...@zytor.com wrote: > On December 3, 2018 2:38:11 AM PST, Juergen Gross wrote: >> In case a broken boot loader doesn't clear its struct boot_params clear >> rsdp_addr in sanitize_boot_params(). >> >> This fixes commit e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP >> address from boot params if available") e.g. for the case of a boot via >> systemd-boot. >> >> Fixes: e6e094e053af75 ("x86/acpi, x86/boot: Take RSDP address from boot >> params if available") >> Reported-by: Gunnar Krueger >> Tested-by: Gunnar Krueger >> Signed-off-by: Juergen Gross >> --- >> arch/x86/include/asm/bootparam_utils.h | 1 + >> 1 file changed, 1 insertion(+) >> >> diff --git a/arch/x86/include/asm/bootparam_utils.h >> b/arch/x86/include/asm/bootparam_utils.h >> index a07ffd23e4dd..f6f6ef436599 100644 >> --- a/arch/x86/include/asm/bootparam_utils.h >> +++ b/arch/x86/include/asm/bootparam_utils.h >> @@ -36,6 +36,7 @@ static void sanitize_boot_params(struct boot_params >> *boot_params) >> */ >> if (boot_params->sentinel) { >> /* fields in boot_params are left uninitialized, clear them */ >> +boot_params->acpi_rsdp_addr = 0; >> memset(_params->ext_ramdisk_image, 0, >> (char *)_params->efi_info - >> (char *)_params->ext_ramdisk_image); > > Isn't this already covered by the memset()? If not, we should extend the > memset() to maximal coverage. I'd like to send a followup patch doing that. And I'd like to not only test sentinel for being non-zero, but all padding fields as well. This should be 4.21 material, though. Juergen
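Editorial sketch: the question of whether a field is "already covered by the memset()" comes down to comparing the field's offset with the cleared byte range. The userspace model below uses a made-up layout, so it says nothing about where acpi_rsdp_addr sits in the real struct boot_params. The struct, the field order, and the function names are all assumptions for illustration.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical layout for illustration; NOT the real struct boot_params. */
struct demo_params {
	unsigned char sentinel;
	unsigned int rsdp_addr;     /* sits before the cleared range here */
	unsigned int ramdisk_image; /* first field that gets cleared */
	unsigned int ramdisk_size;
	unsigned int efi_info;      /* first field that is preserved */
};

/* True if [field_off, field_off + field_len) lies inside the cleared range. */
static bool field_is_cleared(size_t field_off, size_t field_len)
{
	size_t start = offsetof(struct demo_params, ramdisk_image);
	size_t end = offsetof(struct demo_params, efi_info);

	return field_off >= start && field_off + field_len <= end;
}

/* Mirrors the shape of the patched helper: explicit clear plus range clear. */
static void demo_sanitize(struct demo_params *p)
{
	size_t start = offsetof(struct demo_params, ramdisk_image);
	size_t end = offsetof(struct demo_params, efi_info);

	if (!p->sentinel)
		return;

	p->rsdp_addr = 0; /* not reached by the range below in this toy layout */
	memset((char *)p + start, 0, end - start);
}
```

In this toy layout the explicit assignment is needed because the field precedes the cleared range; widening the range instead would be the other option raised in the thread.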
[PATCH] spi: lpspi: Add i.MX8 boards support for lpspi
Add both ipg and per clock for lpspi to support i.MX8QM/QXP boards. Signed-off-by: Clark Wang --- drivers/spi/spi-fsl-lpspi.c | 52 + 1 file changed, 41 insertions(+), 11 deletions(-) diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c index 08dcc3c22e88..5802f188051b 100644 --- a/drivers/spi/spi-fsl-lpspi.c +++ b/drivers/spi/spi-fsl-lpspi.c @@ -80,7 +80,8 @@ struct lpspi_config { struct fsl_lpspi_data { struct device *dev; void __iomem *base; - struct clk *clk; + struct clk *clk_ipg; + struct clk *clk_per; bool is_slave; void *rx_buf; @@ -147,8 +148,19 @@ static int lpspi_prepare_xfer_hardware(struct spi_controller *controller) { struct fsl_lpspi_data *fsl_lpspi = spi_controller_get_devdata(controller); + int ret; + + ret = clk_prepare_enable(fsl_lpspi->clk_ipg); + if (ret) + return ret; + + ret = clk_prepare_enable(fsl_lpspi->clk_per); + if (ret) { + clk_disable_unprepare(fsl_lpspi->clk_ipg); + return ret; + } - return clk_prepare_enable(fsl_lpspi->clk); + return 0; } static int lpspi_unprepare_xfer_hardware(struct spi_controller *controller) @@ -156,7 +168,8 @@ static int lpspi_unprepare_xfer_hardware(struct spi_controller *controller) struct fsl_lpspi_data *fsl_lpspi = spi_controller_get_devdata(controller); - clk_disable_unprepare(fsl_lpspi->clk); + clk_disable_unprepare(fsl_lpspi->clk_ipg); + clk_disable_unprepare(fsl_lpspi->clk_per); return 0; } @@ -249,7 +262,7 @@ static int fsl_lpspi_set_bitrate(struct fsl_lpspi_data *fsl_lpspi) unsigned int perclk_rate, scldiv; u8 prescale; - perclk_rate = clk_get_rate(fsl_lpspi->clk); + perclk_rate = clk_get_rate(fsl_lpspi->clk_per); for (prescale = 0; prescale < 8; prescale++) { scldiv = perclk_rate / (clkdivs[prescale] * config.speed_hz) - 2; @@ -522,15 +535,30 @@ static int fsl_lpspi_probe(struct platform_device *pdev) goto out_controller_put; } - fsl_lpspi->clk = devm_clk_get(>dev, "ipg"); - if (IS_ERR(fsl_lpspi->clk)) { - ret = PTR_ERR(fsl_lpspi->clk); + fsl_lpspi->clk_per = devm_clk_get(>dev, 
"per"); + if (IS_ERR(fsl_lpspi->clk_per)) { + ret = PTR_ERR(fsl_lpspi->clk_per); + goto out_controller_put; + } + + fsl_lpspi->clk_ipg = devm_clk_get(>dev, "ipg"); + if (IS_ERR(fsl_lpspi->clk_ipg)) { + ret = PTR_ERR(fsl_lpspi->clk_ipg); + goto out_controller_put; + } + + ret = clk_prepare_enable(fsl_lpspi->clk_ipg); + if (ret) { + dev_err(>dev, + "can't enable lpspi ipg clock, ret=%d\n", ret); goto out_controller_put; } - ret = clk_prepare_enable(fsl_lpspi->clk); + ret = clk_prepare_enable(fsl_lpspi->clk_per); if (ret) { - dev_err(>dev, "can't enable lpspi clock, ret=%d\n", ret); + dev_err(>dev, + "can't enable lpspi per clock, ret=%d\n", ret); + clk_disable_unprepare(fsl_lpspi->clk_ipg); goto out_controller_put; } @@ -538,7 +566,8 @@ static int fsl_lpspi_probe(struct platform_device *pdev) fsl_lpspi->txfifosize = 1 << (temp & 0x0f); fsl_lpspi->rxfifosize = 1 << ((temp >> 8) & 0x0f); - clk_disable_unprepare(fsl_lpspi->clk); + clk_disable_unprepare(fsl_lpspi->clk_per); + clk_disable_unprepare(fsl_lpspi->clk_ipg); ret = devm_spi_register_controller(>dev, controller); if (ret < 0) { @@ -560,7 +589,8 @@ static int fsl_lpspi_remove(struct platform_device *pdev) struct fsl_lpspi_data *fsl_lpspi = spi_controller_get_devdata(controller); - clk_disable_unprepare(fsl_lpspi->clk); + clk_disable_unprepare(fsl_lpspi->clk_per); + clk_disable_unprepare(fsl_lpspi->clk_ipg); return 0; } -- 2.17.1
Re: [PATCH 5/5] i2c: mediatek: Add i2c compatible for MediaTek MT8183
On Mon, Dec 3, 2018, at 5:34 AM, wrote: > > From: qii wang > > Add i2c compatible for MT8183. Compare to 2712 i2c controller, MT8183 has > different registers, offsets, clock, and multi-user function. > > Signed-off-by: qii wang > --- > drivers/i2c/busses/i2c-mt65xx.c | 136 > +-- > 1 file changed, 130 insertions(+), 6 deletions(-) > > diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c > index 428ac99..6b979ab 100644 > --- a/drivers/i2c/busses/i2c-mt65xx.c > +++ b/drivers/i2c/busses/i2c-mt65xx.c > @@ -35,17 +35,23 @@ > #include > > #define I2C_RS_TRANSFER (1 << 4) > +#define I2C_ARB_LOST (1 << 3) Nothing in this patch seems to reference this macro, so it would be better to remove it. > #define I2C_HS_NACKERR (1 << 2) > #define I2C_ACKERR (1 << 1) > #define I2C_TRANSAC_COMP (1 << 0) > #define I2C_TRANSAC_START (1 << 0) > +#define I2C_RESUME_ARBIT (1 << 1) > #define I2C_RS_MUL_CNFG (1 << 15) > #define I2C_RS_MUL_TRIG (1 << 14) > +#define I2C_HS_TIME_EN (1 << 7) > #define I2C_DCM_DISABLE 0x > #define I2C_IO_CONFIG_OPEN_DRAIN 0x0003 > #define I2C_IO_CONFIG_PUSH_PULL 0x > #define I2C_SOFT_RST 0x0001 > #define I2C_FIFO_ADDR_CLR 0x0001 > +#define I2C_FIFO_ADDR_CLRH 0x0002 > +#define I2C_FIFO_ADDR_CLR_MCH 0x0004 > +#define I2C_HFIFO_DATA 0x8208 > #define I2C_DELAY_LEN 0x0002 > #define I2C_ST_START_CON 0x8001 > #define I2C_FS_START_CON 0x1800 > @@ -76,6 +82,8 @@ > #define I2C_CONTROL_DIR_CHANGE (0x1 << 4) > #define I2C_CONTROL_ACKERR_DET_EN (0x1 << 5) > #define I2C_CONTROL_TRANSFER_LEN_CHANGE (0x1 << 6) > +#define I2C_CONTROL_DMAACK_EN (0x1 << 8) > +#define I2C_CONTROL_ASYNC_MODE (0x1 << 9) > #define I2C_CONTROL_WRAPPER (0x1 << 0) > > #define I2C_DRV_NAME "i2c-mt65xx" > @@ -130,6 +138,15 @@ enum I2C_REGS_OFFSET { > OFFSET_DEBUGCTRL, > OFFSET_TRANSFER_LEN_AUX, > OFFSET_CLOCK_DIV, > + /* MT8183 only regs */ > + OFFSET_LTIMING, > + OFFSET_DATA_TIMING, > + OFFSET_MCU_INTR, > + OFFSET_HW_TIMEOUT, > + OFFSET_HFIFO_DATA, > + OFFSET_HFIFO_STAT, > + OFFSET_MULTI_DMA, > + 
OFFSET_ROLLBACK, > }; > > static const u16 mt_i2c_regs_v1[] = { > @@ -159,6 +176,39 @@ enum I2C_REGS_OFFSET { > [OFFSET_CLOCK_DIV] = 0x70, > }; > > +static const u16 mt_i2c_regs_v2[] = { > + [OFFSET_DATA_PORT] = 0x0, > + [OFFSET_SLAVE_ADDR] = 0x4, > + [OFFSET_INTR_MASK] = 0x8, > + [OFFSET_INTR_STAT] = 0xc, > + [OFFSET_CONTROL] = 0x10, > + [OFFSET_TRANSFER_LEN] = 0x14, > + [OFFSET_TRANSAC_LEN] = 0x18, > + [OFFSET_DELAY_LEN] = 0x1c, > + [OFFSET_TIMING] = 0x20, > + [OFFSET_START] = 0x24, > + [OFFSET_EXT_CONF] = 0x28, > + [OFFSET_LTIMING] = 0x2c, > + [OFFSET_HS] = 0x30, > + [OFFSET_IO_CONFIG] = 0x34, > + [OFFSET_FIFO_ADDR_CLR] = 0x38, > + [OFFSET_DATA_TIMING] = 0x3c, > + [OFFSET_MCU_INTR] = 0x40, > + [OFFSET_TRANSFER_LEN_AUX] = 0x44, > + [OFFSET_CLOCK_DIV] = 0x48, > + [OFFSET_HW_TIMEOUT] = 0x4c, > + [OFFSET_SOFTRESET] = 0x50, > + [OFFSET_HFIFO_DATA] = 0x70, > + [OFFSET_DEBUGSTAT] = 0xe0, > + [OFFSET_DEBUGCTRL] = 0xe8, > + [OFFSET_FIFO_STAT] = 0xf4, > + [OFFSET_FIFO_THRESH] = 0xf8, > + [OFFSET_HFIFO_STAT] = 0xfc, > + [OFFSET_DCM_EN] = 0xf88, > + [OFFSET_MULTI_DMA] = 0xf8c, > + [OFFSET_ROLLBACK] = 0xf98, > +}; > + > struct mtk_i2c_compatible { > const struct i2c_adapter_quirks *quirks; > const u16 *regs; > @@ -168,6 +218,7 @@ struct mtk_i2c_compatible { > unsigned char aux_len_reg: 1; > unsigned char support_33bits: 1; > unsigned char timing_adjust: 1; > + unsigned char dma_sync: 1; > }; > > struct mtk_i2c { > @@ -181,8 +232,11 @@ struct mtk_i2c { > struct clk *clk_main; /* main clock for i2c bus */ > struct clk *clk_dma;/* DMA clock for i2c via DMA */ > struct clk *clk_pmic; /* PMIC clock for i2c from PMIC */ > + struct clk *clk_arb;/* Arbitrator clock for i2c */ > bool have_pmic; /* can use i2c pins from PMIC */ > bool use_push_pull; /* IO config push-pull mode */ > + bool share_i3c; /* share i3c IP*/ > + u32 ch_offset; /* i2c multi-user channel offset */ > > u16 irq_stat; /* interrupt status */ > unsigned int clk_src_div; > @@ -190,6 +244,7 @@ struct mtk_i2c { > enum
Re: [PATCH 3/3] arm64: ftrace: add cond_resched() to func ftrace_make_(call|nop)
On Mon, 3 Dec 2018 22:51:52 +0100 Arnd Bergmann wrote: > On Mon, Dec 3, 2018 at 8:22 PM Will Deacon wrote: > > > > Hi Anders, > > > > On Fri, Nov 30, 2018 at 04:09:56PM +0100, Anders Roxell wrote: > > > Both of those functions end up calling ftrace_modify_code(), which is > > > expensive because it changes the page tables and flush caches. > > > Microseconds add up because this is called in a loop for each dyn_ftrace > > > record, and this triggers the softlockup watchdog unless we let it sleep > > > occasionally. > > > Rework so that we call cond_resched() before going into the > > > ftrace_modify_code() function. > > > > > > Co-developed-by: Arnd Bergmann > > > Signed-off-by: Arnd Bergmann > > > Signed-off-by: Anders Roxell > > > --- > > > arch/arm64/kernel/ftrace.c | 10 ++ > > > 1 file changed, 10 insertions(+) > > > > It sounds like you're running into issues with the existing code, but I'd > > like to understand a bit more about exactly what you're seeing. Which part > > of the ftrace patching is proving to be expensive? > > > > The page table manipulation only happens once per module when using PLTs, > > and the cache maintenance is just a single line per patch site without an > > IPI. > > > > Is it the loop in ftrace_replace_code() that is causing the hassle? 
> > Yes: with an allmodconfig kernel, the ftrace selftest calls > ftrace_replace_code > to look >4 through ftrace_make_call/ftrace_make_nop, and these > end up calling > > static int __kprobes __aarch64_insn_write(void *addr, __le32 insn) > { > void *waddr = addr; > unsigned long flags = 0; > int ret; > > raw_spin_lock_irqsave(_lock, flags); > waddr = patch_map(addr, FIX_TEXT_POKE0); > > ret = probe_kernel_write(waddr, , AARCH64_INSN_SIZE); > > patch_unmap(FIX_TEXT_POKE0); > raw_spin_unlock_irqrestore(_lock, flags); > > return ret; > } > int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn) > { > u32 *tp = addr; > int ret; > > /* A64 instructions must be word aligned */ > if ((uintptr_t)tp & 0x3) > return -EINVAL; > > ret = aarch64_insn_write(tp, insn); > if (ret == 0) > __flush_icache_range((uintptr_t)tp, > (uintptr_t)tp + AARCH64_INSN_SIZE); > > return ret; > } > > which seems to be where the main cost is. This is with inside of > qemu, and with lots of debugging options (in particular > kcov and ubsan) enabled, that make each function call > more expensive. I was thinking more about this. Would something like this work? -- Steve diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c index 8ef9fc226037..42e89397778b 100644 --- a/kernel/trace/ftrace.c +++ b/kernel/trace/ftrace.c @@ -2393,11 +2393,14 @@ void __weak ftrace_replace_code(int enable) { struct dyn_ftrace *rec; struct ftrace_page *pg; + bool schedulable; int failed; if (unlikely(ftrace_disabled)) return; + schedulable = !irqs_disabled() & !preempt_count(); + do_for_each_ftrace_rec(pg, rec) { if (rec->flags & FTRACE_FL_DISABLED) @@ -2409,6 +2412,8 @@ void __weak ftrace_replace_code(int enable) /* Stop processing */ return; } + if (schedulable) + cond_resched(); } while_for_each_ftrace_rec(); }
Re: [PATCH] x86/boot: clear rsdp address in boot_params for broken loaders
On 12/3/18 9:32 PM, Juergen Gross wrote: > > I'd like to send a followup patch doing that. And I'd like to not only > test sentinel for being non-zero, but all padding fields as well. This > should be 4.21 material, though. > No, you can't do that. That breaks backwards compatibility. -hpa
[PATCH] spi: lpspi: Add cs-gpio support
Add cs-gpio feature for LPSPI. Use fsl_lpspi_prepare_message() and fsl_lpspi_unprepare_message() to enable and control cs line. These two functions will be only called at the beginning and the ending of a message transfer. Still support using the mode without cs-gpio. It depends on if attribute cs-gpio has been configured in dts file. Signed-off-by: Clark Wang --- drivers/spi/spi-fsl-lpspi.c | 79 - 1 file changed, 78 insertions(+), 1 deletion(-) diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c index a7d01b79827b..c6fe3f94de19 100644 --- a/drivers/spi/spi-fsl-lpspi.c +++ b/drivers/spi/spi-fsl-lpspi.c @@ -9,6 +9,7 @@ #include #include #include +#include #include #include #include @@ -16,7 +17,9 @@ #include #include #include +#include #include +#include #include #include #include @@ -28,6 +31,10 @@ #define FSL_LPSPI_RPM_TIMEOUT 50 /* 50ms */ +#define LPSPI_CS_ACTIVE1 +#define LPSPI_CS_INACTIVE 0 +#define LPSPI_CS_DELAY 100 + /* i.MX7ULP LPSPI registers */ #define IMX7ULP_VERID 0x0 #define IMX7ULP_PARAM 0x4 @@ -104,6 +111,8 @@ struct fsl_lpspi_data { struct completion xfer_done; bool slave_aborted; + + int chipselect[0]; }; static const struct of_device_id fsl_lpspi_dt_ids[] = { @@ -176,6 +185,48 @@ static int lpspi_unprepare_xfer_hardware(struct spi_controller *controller) return 0; } +static void fsl_lpspi_chipselect(struct spi_device *spi, bool enable) +{ + struct fsl_lpspi_data *fsl_lpspi = + spi_controller_get_devdata(spi->controller); + int gpio = fsl_lpspi->chipselect[spi->chip_select]; + + enable = (!!(spi->mode & SPI_CS_HIGH) == enable); + + if (!gpio_is_valid(gpio)) + return; + + gpio_set_value_cansleep(gpio, enable); +} + +static int fsl_lpspi_prepare_message(struct spi_controller *controller, + struct spi_message *msg) +{ + struct fsl_lpspi_data *fsl_lpspi = + spi_controller_get_devdata(controller); + struct spi_device *spi = msg->spi; + int gpio = fsl_lpspi->chipselect[spi->chip_select]; + + if (gpio_is_valid(gpio)) { + 
gpio_direction_output(gpio, + fsl_lpspi->config.mode & SPI_CS_HIGH ? 0 : 1); + } + + fsl_lpspi_chipselect(spi, LPSPI_CS_ACTIVE); + + return 0; +} + +static int fsl_lpspi_unprepare_message(struct spi_controller *controller, + struct spi_message *msg) +{ + struct spi_device *spi = msg->spi; + + fsl_lpspi_chipselect(spi, LPSPI_CS_INACTIVE); + + return 0; +} + static void fsl_lpspi_write_tx_fifo(struct fsl_lpspi_data *fsl_lpspi) { u8 txfifo_cnt; @@ -512,10 +563,13 @@ static int fsl_lpspi_init_rpm(struct fsl_lpspi_data *fsl_lpspi) static int fsl_lpspi_probe(struct platform_device *pdev) { + struct device_node *np = pdev->dev.of_node; struct fsl_lpspi_data *fsl_lpspi; struct spi_controller *controller; + struct spi_imx_master *lpspi_platform_info = + dev_get_platdata(>dev); struct resource *res; - int ret, irq; + int i, ret, irq; u32 temp; if (of_property_read_bool((>dev)->of_node, "spi-slave")) @@ -539,6 +593,29 @@ static int fsl_lpspi_probe(struct platform_device *pdev) fsl_lpspi->is_slave = of_property_read_bool((>dev)->of_node, "spi-slave"); + if (!fsl_lpspi->is_slave) { + for (i = 0; i < controller->num_chipselect; i++) { + int cs_gpio = of_get_named_gpio(np, "cs-gpios", i); + + if (!gpio_is_valid(cs_gpio) && lpspi_platform_info) + cs_gpio = lpspi_platform_info->chipselect[i]; + + fsl_lpspi->chipselect[i] = cs_gpio; + if (!gpio_is_valid(cs_gpio)) + continue; + + ret = devm_gpio_request(>dev, + fsl_lpspi->chipselect[i], DRIVER_NAME); + if (ret) { + dev_err(>dev, "can't get cs gpios\n"); + goto out_controller_put; + } + } + + controller->prepare_message = fsl_lpspi_prepare_message; + controller->unprepare_message = fsl_lpspi_unprepare_message; + } + controller->transfer_one_message = fsl_lpspi_transfer_one_msg; controller->prepare_transfer_hardware = lpspi_prepare_xfer_hardware; controller->unprepare_transfer_hardware = lpspi_unprepare_xfer_hardware; -- 2.17.1
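Editorial sketch: the expression "enable = (!!(spi->mode & SPI_CS_HIGH) == enable)" in fsl_lpspi_chipselect() turns a logical request (assert or deassert the chip select) into a physical GPIO level. The userspace model below isolates that mapping. The helper name is an assumption, and SPI_CS_HIGH is defined locally for the sketch rather than taken from a kernel header.

```c
#include <stdbool.h>

#define SPI_CS_HIGH 0x04 /* local definition for this sketch */

/*
 * Physical level to drive on the chip-select GPIO.
 * Active-low device (default): asserted -> low, deasserted -> high.
 * Active-high device (SPI_CS_HIGH set): asserted -> high, deasserted -> low.
 */
static bool cs_gpio_level(unsigned int mode, bool assert_cs)
{
	bool active_high = (mode & SPI_CS_HIGH) != 0;

	return active_high == assert_cs;
}
```

Comparing two booleans for equality is a compact way to express "invert the request unless the line is active-high", which is why the normalising "!!" matters in the original expression.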
general protection fault in kvm_arch_vcpu_ioctl_run
Hello, syzbot found the following crash on: HEAD commit:4b78317679c4 Merge branch 'x86-pti-for-linus' of git://git.. git tree: upstream console output: https://syzkaller.appspot.com/x/log.txt?x=15e979f540 kernel config: https://syzkaller.appspot.com/x/.config?x=4602730af4f872ef dashboard link: https://syzkaller.appspot.com/bug?extid=39810e6c400efadfef71 compiler: gcc (GCC) 8.0.1 20180413 (experimental) Unfortunately, I don't have any reproducer for this crash yet. IMPORTANT: if you fix the bug, please add the following tag to the commit: Reported-by: syzbot+39810e6c400efadfe...@syzkaller.appspotmail.com kasan: GPF could be caused by NULL-ptr deref or user memory access general protection fault: [#1] PREEMPT SMP KASAN CPU: 0 PID: 14932 Comm: syz-executor0 Not tainted 4.20.0-rc4+ #138 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 01/01/2011 RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Code: 03 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 b4 1e 00 00 49 8b 9f e0 03 00 00 48 8d bb 88 00 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 8a 1e 00 00 48 8b 9b 88 00 00 00 48 8d bb d8 RSP: 0018:88818b0bf530 EFLAGS: 00010206 RAX: 0011 RBX: RCX: c9001302b000 RDX: 00cf RSI: 81103a68 RDI: 0088 RBP: 88818b0bf8d0 R08: 8881bfe8e0c0 R09: 0008 R10: 0028 R11: 810feb0f R12: dc00 R13: R14: c90007ddfdb8 R15: 888188a18400 FS: 7ff919977700() GS:8881dae0() knlGS: CS: 0010 DS: ES: CR0: 80050033 CR2: 7fc94713c518 CR3: 0001cdea CR4: 001426f0 DR0: DR1: DR2: DR3: DR6: fffe0ff0 DR7: 0400 Call Trace: kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596 vfs_ioctl fs/ioctl.c:46 [inline] file_ioctl fs/ioctl.c:509 [inline] do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696 ksys_ioctl+0xa9/0xd0 
fs/ioctl.c:713 __do_sys_ioctl fs/ioctl.c:720 [inline] __se_sys_ioctl fs/ioctl.c:718 [inline] __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290 entry_SYSCALL_64_after_hwframe+0x49/0xbe RIP: 0033:0x457569 Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7 48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 RSP: 002b:7ff919976c78 EFLAGS: 0246 ORIG_RAX: 0010 RAX: ffda RBX: 0003 RCX: 00457569 RDX: RSI: ae80 RDI: 0005 RBP: 0072bf00 R08: R09: R10: R11: 0246 R12: 7ff9199776d4 R13: 004c034e R14: 004d0d60 R15: Modules linked in: kobject: 'loop5' (4f26f0d5): kobject_uevent_env kobject: 'loop5' (4f26f0d5): fill_kobj_path: path = '/devices/virtual/block/loop5' kobject: 'loop1' (10db8550): kobject_uevent_env kobject: 'loop1' (10db8550): fill_kobj_path: path = '/devices/virtual/block/loop1' kobject: 'loop3' (941a4e7a): kobject_uevent_env kobject: 'loop3' (941a4e7a): fill_kobj_path: path = '/devices/virtual/block/loop3' ---[ end trace d7fab4e7c1a70214 ]--- RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline] RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline] RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline] RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline] RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074 Code: 03 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 b4 1e 00 00 49 8b 9f e0 03 00 00 48 8d bb 88 00 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00 0f 85 8a 1e 00 00 48 8b 9b 88 00 00 00 48 8d bb d8 kobject: 'loop4' (4a89aba1): kobject_uevent_env kobject: 'loop4' (4a89aba1): fill_kobj_path: path = '/devices/virtual/block/loop4' kobject: 'loop2' (704d7e59): kobject_uevent_env kobject: 'loop2' (704d7e59): fill_kobj_path: path = '/devices/virtual/block/loop2' kobject: 'loop4' (4a89aba1): kobject_uevent_env kobject: 'loop4' (4a89aba1): fill_kobj_path: path = '/devices/virtual/block/loop4' 
kobject: 'loop2' (704d7e59): kobject_uevent_env RSP: 0018:88818b0bf530 EFLAGS: 00010206 RAX: 0011 RBX: RCX: c9001302b000 kobject: 'loop2' (704d7e59): fill_kobj_path: path = '/devices/virtual/block/loop2' RDX: 00cf RSI: 81103a68 RDI:
Re: Strange hang with gcc 8 of kprobe multiple_kprobes test
Hi Steve, On Mon, 3 Dec 2018 21:18:07 -0500 Steven Rostedt wrote: > Hi Masami, > > I started testing some of my new code and the system got into a > strange state. Debugging further, I found the cause came from the > kprobe tests. It became stranger to me that I could reproduce it with > older kernels. I went back as far as 4.16 and it triggered. I thought > this very strange because I ran this test on all those kernels in the > past. > > After a bit of hair pulling, I figured out what changed. I upgraded to > gcc 8.1 (and I reproduce it with 8.2 as well). I convert back to gcc 7 > and the tests pass without issue. OK, let me see. > The issue that I notice when the system gets into this strange state is > that I can't log into the box. Nor can I reboot. Basically it's > anything to do with systemd just doesn't work (insert your jokes here > now, and then let's move on). > > I was able to narrow down what the exact function was that caused the > issues and it is: update_vsyscall() > > gcc 7 looks like this: > > 81004bf0 : > 81004bf0: e8 0b cc 9f 00 callq 81a01800 > <__fentry__> > 81004bf1: R_X86_64_PC32 __fentry__-0x4 > 81004bf5: 48 8b 07mov(%rdi),%rax > 81004bf8: 8b 15 96 5f 34 01 mov0x1345f96(%rip),%edx > # 8234ab94 > 81004bfa: R_X86_64_PC32 vclocks_used-0x4 > 81004bfe: 83 05 7b 84 6f 01 01addl $0x1,0x16f847b(%rip) > # 826fd080 > 81004c00: R_X86_64_PC32 vsyscall_gtod_data-0x5 > 81004c05: 8b 48 24mov0x24(%rax),%ecx > 81004c08: b8 01 00 00 00 mov$0x1,%eax > 81004c0d: d3 e0 shl%cl,%eax > > And gcc 8 looks like this: > > 81004c90 : > 81004c90: e8 6b cb 9f 00 callq 81a01800 > <__fentry__> > 81004c91: R_X86_64_PC32 __fentry__-0x4 > 81004c95: 48 8b 07mov(%rdi),%rax > 81004c98: 83 05 e1 93 6f 01 01addl $0x1,0x16f93e1(%rip) > # 826fe080 Hm this is a RIP relative instruction, it should be modified by kprobes. 
> 81004c9a: R_X86_64_PC32 vsyscall_gtod_data-0x5
> 81004c9f: 8b 50 24                mov    0x24(%rax),%edx
> 81004ca2: 8b 05 ec 5e 34 01       mov    0x1345eec(%rip),%eax
> # 8234ab94
> 81004ca4: R_X86_64_PC32 vclocks_used-0x4
>
> The test adds a kprobe (optimized) at update_vsyscall+5. And will
> insert a jump on the two instructions after fentry. The difference
> between v7 and v8 is that v7 is touching vclocks_used and v8 is
> touching vsyscall_gtod_data.
>
> Is there some black magic going on with the vsyscall area with
> vsyscall_gtod_data that is causing havoc when a kprobe is added there?

I think it might miss something when preprocessing the RIP-relative
instruction. Could you disable jump optimization as below and test what
happens on update_vsyscall+5 AND update_vsyscall+8?
(RIP-relative preprocessing must happen even if the jump optimization is
disabled.)

# echo 0 > /proc/sys/debug/kprobes-optimization

> I can dig a little more into this, but I'm currently at my HQ office
> with a lot of other objectives that I must get done, and I can't work
> on this much more this week.

OK, let me try to reproduce it in my environment.

>
> I included my config (for my virt machine, which I was also able to
> trigger it with).

Thanks, but I think it should not depend on the kconfig.

>
> The test that triggers this bug is:
>
> tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
>
> It runs the test fine, but other things just start to act up after I
> run it.

Yeah, thank you for digging into it. It is now much easier for me.

>
> I notice that when I get into the state, journald and the dbus_daemon
> are constantly running. Perhaps the userspace time keeping went bad?

Yeah, I think so. Maybe the addl instruction becomes broken.

Thank you,

--
Masami Hiramatsu
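For readers following the RIP-relative point in this thread: when an instruction such as the addl above is copied to another address (as a probe's out-of-line copy would be), its 32-bit displacement has to be recomputed so that it still refers to the same target. The following is only a minimal userspace sketch of that arithmetic, not the kernel's kprobes code. The helper names are hypothetical, the 0xffffffff address prefix and the copy address are assumptions (the addresses quoted in this thread appear truncated), and the instruction length of 7 is read off the quoted bytes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Target of a RIP-relative operand: the address of the *next*
 * instruction (insn_addr + insn_len) plus the signed displacement.
 */
static uint64_t riprel_target(uint64_t insn_addr, unsigned int insn_len,
			      int64_t disp)
{
	return insn_addr + insn_len + (uint64_t)disp;
}

/*
 * Displacement to use after moving the same instruction from
 * orig_addr to copy_addr. The instruction length is unchanged, so it
 * cancels out and only the distance between the two addresses matters.
 */
static int64_t riprel_new_disp(uint64_t orig_addr, uint64_t copy_addr,
			       int32_t old_disp)
{
	return (int64_t)old_disp + (int64_t)(orig_addr - copy_addr);
}

/* The recomputed displacement must still fit in the 32-bit field. */
static bool fits_rel32(int64_t disp)
{
	return disp >= INT32_MIN && disp <= INT32_MAX;
}
```

With the assumed prefix, the addl at 81004c98 with displacement 0x16f93e1 resolves to 826fe080, which matches the address shown in the gcc 8 listing. If the displacement were copied unchanged, the relocated instruction would increment some unrelated memory instead of the intended variable.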
Re: [PATCH] ubi: fastmap: Check each mapping only once
On 02.12.18 16:02, Richard Weinberger wrote:

Sasha,

On Sunday, 2 December 2018, 15:35:43 CET, Sasha Levin wrote:

On Sun, Dec 02, 2018 at 11:50:33AM +, Sudip Mukherjee wrote:

Now queued up for 4.14.y, thanks.

can you *please* slow a little down?

True. It will really help if you can have some sort of fixed schedule for stable release, like maybe stablerc is ready on Thursday or Friday and release the stable on Monday. Having a weekend in stablerc will be helpful for people like me who only get the time in weekends for upstream or stable kernel.

Any sort of schedule will never work for everyone (for example, if it's part of your paid job - you don't necessarily want to review stuff over the weekend).

a schedule is not needed, but please give maintainers at least a chance to react to stable inclusion requests. In this case Martin asked for inclusion on Monday and the patch was applied two days later.

True, especially when the maintainer is asked a question as part of the patch. I already had the feeling that we'd need the other patch too, but in this case at least I should have searched for Fixes tags.

Greg, how about reminding people of Fixes tags in Documentation/process/stable-kernel-rules.rst ?

martin
[PATCH] pstore/ram: Avoid NULL deref in ftrace merging failure path
Given corruption in the ftrace records, it might be possible to allocate tmp_prz without assigning prz to it, but still marking it as needing to be freed, which would cause at least a NULL dereference. smatch warnings: fs/pstore/ram.c:340 ramoops_pstore_read() error: we previously assumed 'prz' could be null (see line 255) https://lists.01.org/pipermail/kbuild-all/2018-December/055528.html Reported-by: Dan Carpenter Fixes: 2fbea82bbb89 ("pstore: Merge per-CPU ftrace records into one") Cc: "Joel Fernandes (Google)" Signed-off-by: Kees Cook --- fs/pstore/ram.c | 2 +- 1 file changed, 1 insertion(+), 1 deletion(-) diff --git a/fs/pstore/ram.c b/fs/pstore/ram.c index e6d9560ea455..96f7d32cd184 100644 --- a/fs/pstore/ram.c +++ b/fs/pstore/ram.c @@ -291,6 +291,7 @@ static ssize_t ramoops_pstore_read(struct pstore_record *record) GFP_KERNEL); if (!tmp_prz) return -ENOMEM; + prz = tmp_prz; free_prz = true; while (cxt->ftrace_read_cnt < cxt->max_ftrace_cnt) { @@ -309,7 +310,6 @@ static ssize_t ramoops_pstore_read(struct pstore_record *record) goto out; } record->id = 0; - prz = tmp_prz; } } -- 2.17.1 -- Kees Cook
Re: [PATCH v9 2/4] seccomp: switch system call argument type to void *
Hi Tycho, I love your patch! Yet something to improve: [auto build test ERROR on linus/master] [also build test ERROR on v4.20-rc5 next-20181203] [if your patch is applied to the wrong git tree, please drop us a note to help improve the system] url: https://github.com/0day-ci/linux/commits/Tycho-Andersen/seccomp-hoist-struct-seccomp_data-recalculation-higher/20181204-013450 config: i386-randconfig-x005-201848 (attached as .config) compiler: gcc-7 (Debian 7.3.0-1) 7.3.0 reproduce: # save the attached .config to linux build tree make ARCH=i386 All errors (new ones prefixed by >>): In file included from kernel/seccomp.c:28:0: >> include/linux/syscalls.h:239:18: error: conflicting types for 'sys_seccomp' asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ ^ include/linux/syscalls.h:225:2: note: in expansion of macro '__SYSCALL_DEFINEx' __SYSCALL_DEFINEx(x, sname, __VA_ARGS__) ^ include/linux/syscalls.h:216:36: note: in expansion of macro 'SYSCALL_DEFINEx' #define SYSCALL_DEFINE3(name, ...) SYSCALL_DEFINEx(3, _##name, __VA_ARGS__) ^~~ kernel/seccomp.c:946:1: note: in expansion of macro 'SYSCALL_DEFINE3' SYSCALL_DEFINE3(seccomp, unsigned int, op, unsigned int, flags, ^~~ In file included from kernel/seccomp.c:28:0: include/linux/syscalls.h:881:17: note: previous declaration of 'sys_seccomp' was here asmlinkage long sys_seccomp(unsigned int op, unsigned int flags, ^~~ vim +/sys_seccomp +239 include/linux/syscalls.h 1bd21c6c2 Dominik Brodowski 2018-04-05 228 e145242ea Dominik Brodowski 2018-04-09 229 /* e145242ea Dominik Brodowski 2018-04-09 230 * The asmlinkage stub is aliased to a function named __se_sys_*() which e145242ea Dominik Brodowski 2018-04-09 231 * sign-extends 32-bit ints to longs whenever needed. The actual work is e145242ea Dominik Brodowski 2018-04-09 232 * done within __do_sys_*(). 
e145242ea Dominik Brodowski 2018-04-09 233 */ 1bd21c6c2 Dominik Brodowski 2018-04-05 234 #ifndef __SYSCALL_DEFINEx bed1ffca0 Frederic Weisbecker 2009-03-13 235 #define __SYSCALL_DEFINEx(x, name, ...) \ bee200317 Arnd Bergmann 2018-06-19 236 __diag_push(); \ bee200317 Arnd Bergmann 2018-06-19 237 __diag_ignore(GCC, 8, "-Wattribute-alias", \ bee200317 Arnd Bergmann 2018-06-19 238 "Type aliasing is used to sanitize syscall arguments");\ 83460ec8d Andi Kleen 2013-11-12 @239 asmlinkage long sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) \ e145242ea Dominik Brodowski 2018-04-09 240 __attribute__((alias(__stringify(__se_sys##name;\ c9a211951 Howard McLauchlan 2018-03-21 241 ALLOW_ERROR_INJECTION(sys##name, ERRNO);\ e145242ea Dominik Brodowski 2018-04-09 242 static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__));\ e145242ea Dominik Brodowski 2018-04-09 243 asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)); \ e145242ea Dominik Brodowski 2018-04-09 244 asmlinkage long __se_sys##name(__MAP(x,__SC_LONG,__VA_ARGS__)) \ 1a94bc347 Heiko Carstens 2009-01-14 245 { \ e145242ea Dominik Brodowski 2018-04-09 246 long ret = __do_sys##name(__MAP(x,__SC_CAST,__VA_ARGS__));\ 07fe6e00f Al Viro 2013-01-21 247 __MAP(x,__SC_TEST,__VA_ARGS__); \ 2cf096668 Al Viro 2013-01-21 248 __PROTECT(x, ret,__MAP(x,__SC_ARGS,__VA_ARGS__)); \ 2cf096668 Al Viro 2013-01-21 249 return ret; \ 1a94bc347 Heiko Carstens 2009-01-14 250 } \ bee200317 Arnd Bergmann 2018-06-19 251 __diag_pop(); \ e145242ea Dominik Brodowski 2018-04-09 252 static inline long __do_sys##name(__MAP(x,__SC_DECL,__VA_ARGS__)) 1bd21c6c2 Dominik Brodowski 2018-04-05 253 #endif /* __SYSCALL_DEFINEx */ 1a94bc347 Heiko Carstens 2009-01-14 254 :: The code at line 239 was first introduced by commit :: 83460ec8dcac14142e7860a01fa59c267ac4657c syscalls.h: use gcc alias instead of assembler aliases for syscalls :: TO: Andi Kleen :: CC: Linus Torvalds --- 0-DAY kernel test infrastructureOpen Source Technology Center 
https://lists.01.org/pipermail/kbuild-all Intel Corporation
Re: [PATCH v4 0/7] zram idle page writeback
On (12/03/18 11:40), Minchan Kim wrote:
>
> Minchan Kim (7):
>   zram: fix lockdep warning of free block handling
>   zram: fix double free backing device
>   zram: refactoring flags and writeback stuff
>   zram: introduce ZRAM_IDLE flag
>   zram: support idle/huge page writeback
>   zram: add bd_stat statistics
>   zram: writeback throttle

Looks good to me.

Reviewed-by: Sergey Senozhatsky

-ss
Re: [PATCH v7 2/4] clk: meson: add DT documentation for emmc clock controller
Hi Stephen,

On 2018/12/4 6:45, Stephen Boyd wrote:
> Quoting Jianxin Pan (2018-11-15 04:18:30)
>> diff --git a/include/dt-bindings/clock/amlogic,mmc-clkc.h b/include/dt-bindings/clock/amlogic,mmc-clkc.h
>> new file mode 100644
>> index 000..162b949
>> --- /dev/null
>> +++ b/include/dt-bindings/clock/amlogic,mmc-clkc.h
>> @@ -0,0 +1,17 @@
>> +/* SPDX-License-Identifier: (GPL-2.0+ OR MIT) */
>> +/*
>> + * Meson MMC sub clock tree IDs
>> + *
>> + * Copyright (c) 2018 Amlogic, Inc. All rights reserved.
>> + * Author: Yixun Lan
>> + */
>> +
>> +#ifndef __MMC_CLKC_H
>> +#define __MMC_CLKC_H
>> +
>> +#define CLKID_MMC_DIV 1
>
> Why does the define numbering start with 1 instead of 0?
>

Clock ID 0 is used by CLKID_MMC_MUX. CLKID_MMC_MUX is an internal clock that is defined in drivers/clk/meson/mmc-clkc.c, and it is the parent of CLKID_MMC_DIV.

>> +#define CLKID_MMC_PHASE_CORE 2
>> +#define CLKID_MMC_PHASE_TX 3
>> +#define CLKID_MMC_PHASE_RX 4
>> +
>
> .
>
Re: [PATCH v5 2/2] phy: qualcomm: Add Synopsys High-Speed USB PHY driver
Hi Kishon,

On Tue, Dec 04, 2018 at 10:38:19AM +0530, Kishon Vijay Abraham I wrote:
> Hi,
>
> On 27/11/18 3:37 PM, Shawn Guo wrote:
> > It adds Synopsys 28nm Femto High-Speed USB PHY driver support, which
> > is usually paired with Synopsys DWC3 USB controllers on Qualcomm SoCs.
>
> Is this Synopsys PHY specific to Qualcomm or could it be used by other vendors
> (with just changing tuning parameters)? If it could be used by other vendors
> then it would make sense to add this PHY driver in synopsys directory.

My knowledge is that this Synopsys PHY is specific to Qualcomm SoCs.

@Sriharsha, correct me if I'm wrong.

Shawn
[PATCH] spi: lpspi: Fix CLK pin becomes low before one transfer
Remove Reset operation in fsl_lpspi_config(). This RST may cause both CLK and CS pins go from high to low level under cs-gpio mode. Add fsl_lpspi_reset() function after one message transfer to clear all flags in use. Signed-off-by: Clark Wang Reviewed-by: Fugang Duan --- drivers/spi/spi-fsl-lpspi.c | 24 1 file changed, 20 insertions(+), 4 deletions(-) diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c index f32a2e0d7ae1..a7d01b79827b 100644 --- a/drivers/spi/spi-fsl-lpspi.c +++ b/drivers/spi/spi-fsl-lpspi.c @@ -279,10 +279,6 @@ static int fsl_lpspi_config(struct fsl_lpspi_data *fsl_lpspi) u32 temp; int ret; - temp = CR_RST; - writel(temp, fsl_lpspi->base + IMX7ULP_CR); - writel(0, fsl_lpspi->base + IMX7ULP_CR); - if (!fsl_lpspi->is_slave) { ret = fsl_lpspi_set_bitrate(fsl_lpspi); if (ret) @@ -373,6 +369,24 @@ static int fsl_lpspi_wait_for_completion(struct spi_controller *controller) return 0; } +static int fsl_lpspi_reset(struct fsl_lpspi_data *fsl_lpspi) +{ + u32 temp; + + /* Disable all interrupt */ + fsl_lpspi_intctrl(fsl_lpspi, 0); + + /* W1C for all flags in SR */ + temp = 0x3F << 8; + writel(temp, fsl_lpspi->base + IMX7ULP_SR); + + /* Clear FIFO and disable module */ + temp = CR_RRF | CR_RTF; + writel(temp, fsl_lpspi->base + IMX7ULP_CR); + + return 0; +} + static int fsl_lpspi_transfer_one(struct spi_controller *controller, struct spi_device *spi, struct spi_transfer *t) @@ -394,6 +408,8 @@ static int fsl_lpspi_transfer_one(struct spi_controller *controller, if (ret) return ret; + fsl_lpspi_reset(fsl_lpspi); + return 0; } -- 2.17.1
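The "W1C for all flags in SR" comment in the patch above relies on write-1-to-clear semantics: writing a 1 to a flag bit clears it, while writing a 0 leaves it alone. The following is a minimal software model of that behaviour, under the assumption (taken only from the patch's `0x3F << 8` constant) that bits 8 to 13 of the status register are the W1C flags. The function name is hypothetical and this is not driver code.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Model of a write to a register whose bits in w1c_mask are
 * write-1-to-clear: each such bit that is written as 1 is cleared in
 * the register, every other bit keeps its current value.
 */
static uint32_t w1c_apply(uint32_t current, uint32_t written,
			  uint32_t w1c_mask)
{
	return current & ~(written & w1c_mask);
}
```

In this model, writing `0x3F << 8` clears all six flags in one access and leaves the low status bits untouched, while writing a single bit (as the ISR patch later in this digest does with SR_FCF) acknowledges only that one flag.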
[PATCH v19 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant
qcom,smmu-v2 is an arm,smmu-v2 implementation with specific clock and power requirements. On msm8996, multiple cores, viz. mdss, video, etc. use this smmu. On sdm845, this smmu is used with gpu. Add bindings for the same. Signed-off-by: Vivek Gautam Reviewed-by: Rob Herring Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla Reviewed-by: Robin Murphy --- Changes since v18: None. drivers/iommu/arm-smmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index b6b11642b3a9..ba18d89d4732 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -120,6 +120,7 @@ enum arm_smmu_implementation { GENERIC_SMMU, ARM_MMU500, CAVIUM_SMMUV2, + QCOM_SMMUV2, }; struct arm_smmu_s2cr { @@ -2030,6 +2031,7 @@ ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU); ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500); ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2); +ARM_SMMU_MATCH_DATA(qcom_smmuv2, ARM_SMMU_V2, QCOM_SMMUV2); static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,smmu-v1", .data = _generic_v1 }, @@ -2038,6 +2040,7 @@ static const struct of_device_id arm_smmu_of_match[] = { { .compatible = "arm,mmu-401", .data = _mmu401 }, { .compatible = "arm,mmu-500", .data = _mmu500 }, { .compatible = "cavium,smmu-v2", .data = _smmuv2 }, + { .compatible = "qcom,smmu-v2", .data = _smmuv2 }, { }, }; MODULE_DEVICE_TABLE(of, arm_smmu_of_match); -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v19 3/5] iommu/arm-smmu: Add the device_link between masters and smmu
From: Sricharan R Finally add the device link between the master device and smmu, so that the smmu gets runtime enabled/disabled only when the master needs it. This is done from add_device callback which gets called once when the master is added to the smmu. Signed-off-by: Sricharan R Signed-off-by: Vivek Gautam Reviewed-by: Tomasz Figa Tested-by: Srinivas Kandagatla Reviewed-by: Robin Murphy --- Changes since v18: None. drivers/iommu/arm-smmu.c | 3 +++ 1 file changed, 3 insertions(+) diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c index 1917d214c4d9..b6b11642b3a9 100644 --- a/drivers/iommu/arm-smmu.c +++ b/drivers/iommu/arm-smmu.c @@ -1500,6 +1500,9 @@ static int arm_smmu_add_device(struct device *dev) iommu_device_link(>iommu, dev); + device_link_add(dev, smmu->dev, + DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER); + return 0; out_cfg_free: -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH] spi: lpspi: Improve the stability of lpspi data transmission
Use SR_TDF to judge if need to send data, and SR_FCF is to judge if transmission end and to replace the waiting after transmission end. This waiting has no actual meaning, for module will set the FCF flag at the real end. The changes of interrupt flag and ISR function reduce the times of calling ISR. The use of the FCF flag improves the stability of the data transmission. These two points generally improve the data transfer speed of lpspi, especially when it is set to slave mode it can support higher transfer speed of the host. After making these changes, there is no need to use fsl_lpspi_txfifo_empty(), so remove it. Signed-off-by: Clark Wang --- drivers/spi/spi-fsl-lpspi.c | 61 - 1 file changed, 20 insertions(+), 41 deletions(-) diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c index 3e935db5ff02..f32a2e0d7ae1 100644 --- a/drivers/spi/spi-fsl-lpspi.c +++ b/drivers/spi/spi-fsl-lpspi.c @@ -53,9 +53,11 @@ #define CR_RST BIT(1) #define CR_MEN BIT(0) #define SR_TCF BIT(10) +#define SR_FCF BIT(9) #define SR_RDF BIT(1) #define SR_TDF BIT(0) #define IER_TCIE BIT(10) +#define IER_FCIE BIT(9) #define IER_RDIE BIT(1) #define IER_TDIE BIT(0) #define CFGR1_PCSCFG BIT(27) @@ -174,28 +176,10 @@ static int lpspi_unprepare_xfer_hardware(struct spi_controller *controller) return 0; } -static int fsl_lpspi_txfifo_empty(struct fsl_lpspi_data *fsl_lpspi) -{ - u32 txcnt; - unsigned long orig_jiffies = jiffies; - - do { - txcnt = readl(fsl_lpspi->base + IMX7ULP_FSR) & 0xff; - - if (time_after(jiffies, orig_jiffies + msecs_to_jiffies(500))) { - dev_dbg(fsl_lpspi->dev, "txfifo empty timeout\n"); - return -ETIMEDOUT; - } - cond_resched(); - - } while (txcnt); - - return 0; -} - static void fsl_lpspi_write_tx_fifo(struct fsl_lpspi_data *fsl_lpspi) { u8 txfifo_cnt; + u32 temp; txfifo_cnt = readl(fsl_lpspi->base + IMX7ULP_FSR) & 0xff; @@ -206,9 +190,15 @@ static void fsl_lpspi_write_tx_fifo(struct fsl_lpspi_data *fsl_lpspi) txfifo_cnt++; } - if (!fsl_lpspi->remain && 
(txfifo_cnt < fsl_lpspi->txfifosize)) - writel(0, fsl_lpspi->base + IMX7ULP_TDR); - else + if (txfifo_cnt < fsl_lpspi->txfifosize) { + if (!fsl_lpspi->is_slave) { + temp = readl(fsl_lpspi->base + IMX7ULP_TCR); + temp &= ~TCR_CONTC; + writel(temp, fsl_lpspi->base + IMX7ULP_TCR); + } + + fsl_lpspi_intctrl(fsl_lpspi, IER_FCIE); + } else fsl_lpspi_intctrl(fsl_lpspi, IER_TDIE); } @@ -404,12 +394,6 @@ static int fsl_lpspi_transfer_one(struct spi_controller *controller, if (ret) return ret; - ret = fsl_lpspi_txfifo_empty(fsl_lpspi); - if (ret) - return ret; - - fsl_lpspi_read_rx_fifo(fsl_lpspi); - return 0; } @@ -421,7 +405,6 @@ static int fsl_lpspi_transfer_one_msg(struct spi_controller *controller, struct spi_device *spi = msg->spi; struct spi_transfer *xfer; bool is_first_xfer = true; - u32 temp; int ret = 0; msg->status = 0; @@ -441,13 +424,6 @@ static int fsl_lpspi_transfer_one_msg(struct spi_controller *controller, } complete: - if (!fsl_lpspi->is_slave) { - /* de-assert SS, then finalize current message */ - temp = readl(fsl_lpspi->base + IMX7ULP_TCR); - temp &= ~TCR_CONTC; - writel(temp, fsl_lpspi->base + IMX7ULP_TCR); - } - msg->status = ret; spi_finalize_current_message(controller); @@ -456,20 +432,23 @@ static int fsl_lpspi_transfer_one_msg(struct spi_controller *controller, static irqreturn_t fsl_lpspi_isr(int irq, void *dev_id) { + u32 temp_SR, temp_IER; struct fsl_lpspi_data *fsl_lpspi = dev_id; - u32 temp; + temp_IER = readl(fsl_lpspi->base + IMX7ULP_IER); fsl_lpspi_intctrl(fsl_lpspi, 0); - temp = readl(fsl_lpspi->base + IMX7ULP_SR); + temp_SR = readl(fsl_lpspi->base + IMX7ULP_SR); fsl_lpspi_read_rx_fifo(fsl_lpspi); - if (temp & SR_TDF) { + if ((temp_SR & SR_TDF) && (temp_IER & IER_TDIE)) { fsl_lpspi_write_tx_fifo(fsl_lpspi); + return IRQ_HANDLED; + } - if (!fsl_lpspi->remain) + if (temp_SR & SR_FCF && (temp_IER & IER_FCIE)) { + writel(SR_FCF, fsl_lpspi->base + IMX7ULP_SR); complete(_lpspi->xfer_done); - return IRQ_HANDLED; } -- 2.17.1
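The reworked ISR in the patch above acts on a status flag only when the matching interrupt source is also enabled. A plausible reason, stated here as an assumption rather than something the patch says, is that a flag like TDF can be set in SR whenever the TX FIFO has room, even when that interrupt was not the one requested. The decision logic can be sketched as a pure function; the bit positions follow the BIT(0) and BIT(9) definitions in the patch, and the enum and function name are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SR_TDF   (1u << 0)	/* transmit data flag */
#define SR_FCF   (1u << 9)	/* frame complete flag */
#define IER_TDIE (1u << 0)	/* transmit data interrupt enable */
#define IER_FCIE (1u << 9)	/* frame complete interrupt enable */

enum lpspi_event {
	LPSPI_EV_NONE,		/* nothing this handler asked for */
	LPSPI_EV_REFILL_TX,	/* push more words into the TX FIFO */
	LPSPI_EV_FRAME_DONE,	/* acknowledge FCF and complete the transfer */
};

/*
 * Classify an interrupt from the status register and the interrupt
 * enable register as sampled on entry to the handler. A flag counts
 * only if its interrupt was enabled; TX refill is checked first, as
 * in the patch.
 */
static enum lpspi_event lpspi_classify_irq(uint32_t sr, uint32_t ier)
{
	if ((sr & SR_TDF) && (ier & IER_TDIE))
		return LPSPI_EV_REFILL_TX;
	if ((sr & SR_FCF) && (ier & IER_FCIE))
		return LPSPI_EV_FRAME_DONE;
	return LPSPI_EV_NONE;
}
```

Sampling IER before the handler masks all interrupts is what makes this check possible, which matches the order of the readl() and fsl_lpspi_intctrl() calls in the patch.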
Re: [PATCH v9 2/4] seccomp: switch system call argument type to void *
On Mon, Dec 3, 2018 at 12:01 AM Serge E. Hallyn wrote: > On Sun, Dec 02, 2018 at 08:28:25PM -0700, Tycho Andersen wrote: > > The const qualifier causes problems for any code that wants to write to the > > third argument of the seccomp syscall, as we will do in a future patch in > > this series. > > > > The third argument to the seccomp syscall is documented as void *, so > > rather than just dropping the const, let's switch everything to use void * > > as well. > > > > I believe this is safe because of 1. the documentation above, 2. there's no > > real type information exported about syscalls anywhere besides the man > > pages. > > > > Signed-off-by: Tycho Andersen > > CC: Kees Cook > > CC: Andy Lutomirski > > CC: Oleg Nesterov > > CC: Eric W. Biederman > > CC: "Serge E. Hallyn" > > Acked-by: Serge Hallyn > > Though I'm not entirely convinced there will be no ill effects of changing > the argument type. I'll feel comfortable when Michael and Paul say it's > fine :) Well, looking at the seccomp(2) manpage on my system (dated 2018-02-02) the third argument is already shown as a "void *args": SYNOPSIS #include #include #include #include #include int seccomp(unsigned int operation, unsigned int flags, void *args); ... so I think we're safe :) >From a libseccomp perspective, we always call seccomp(2) via syscall(2) so it is unlikely we would ever run into problems, not too mention that we are just talking about the pointer type used in the kernel; from a syscall ABI perspective it is still a pointer value and that is the important part. 
> > CC: Christian Brauner > > CC: Tyler Hicks > > CC: Akihiro Suda > > --- > > include/linux/seccomp.h | 2 +- > > kernel/seccomp.c| 8 > > 2 files changed, 5 insertions(+), 5 deletions(-) > > > > diff --git a/include/linux/seccomp.h b/include/linux/seccomp.h > > index e5320f6c8654..b5103c019cf4 100644 > > --- a/include/linux/seccomp.h > > +++ b/include/linux/seccomp.h > > @@ -43,7 +43,7 @@ extern void secure_computing_strict(int this_syscall); > > #endif > > > > extern long prctl_get_seccomp(void); > > -extern long prctl_set_seccomp(unsigned long, char __user *); > > +extern long prctl_set_seccomp(unsigned long, void __user *); > > > > static inline int seccomp_mode(struct seccomp *s) > > { > > diff --git a/kernel/seccomp.c b/kernel/seccomp.c > > index 96afc32e041d..393e029f778a 100644 > > --- a/kernel/seccomp.c > > +++ b/kernel/seccomp.c > > @@ -924,7 +924,7 @@ static long seccomp_get_action_avail(const char __user > > *uaction) > > > > /* Common entry point for both prctl and syscall. */ > > static long do_seccomp(unsigned int op, unsigned int flags, > > -const char __user *uargs) > > +void __user *uargs) > > { > > switch (op) { > > case SECCOMP_SET_MODE_STRICT: > > @@ -944,7 +944,7 @@ static long do_seccomp(unsigned int op, unsigned int > > flags, > > } > > > > SYSCALL_DEFINE3(seccomp, unsigned int, op, unsigned int, flags, > > - const char __user *, uargs) > > + void __user *, uargs) > > { > > return do_seccomp(op, flags, uargs); > > } > > @@ -956,10 +956,10 @@ SYSCALL_DEFINE3(seccomp, unsigned int, op, unsigned > > int, flags, > > * > > * Returns 0 on success or -EINVAL on failure. > > */ > > -long prctl_set_seccomp(unsigned long seccomp_mode, char __user *filter) > > +long prctl_set_seccomp(unsigned long seccomp_mode, void __user *filter) > > { > > unsigned int op; > > - char __user *uargs; > > + void __user *uargs; > > > > switch (seccomp_mode) { > > case SECCOMP_MODE_STRICT: > > -- > > 2.19.1 -- paul moore www.paul-moore.com
Re: [PATCH 2/2] dt-bindings: rtc: Move trivial RTCs to rtc.txt
On 03/12/2018 17:45:58-0600, Rob Herring wrote:
> On Sun, Nov 11, 2018 at 08:31:14PM +0100, Alexandre Belloni wrote:
> > Move trivial RTCs to the rtc generic binding documentation as they all also
> > support at least 'start-year'.
> >
> > Signed-off-by: Alexandre Belloni
> > ---
> > Documentation/devicetree/bindings/rtc/rtc.txt | 34 +++
> > .../devicetree/bindings/trivial-devices.txt | 24 -
> > 2 files changed, 34 insertions(+), 24 deletions(-)
>
> Okay if I take these 2 to avoid conflicts with trivial-devices.txt?
>

Sure, I was actually planning for them to go through your tree for this reason.

--
Alexandre Belloni, Bootlin
Embedded Linux and Kernel engineering
https://bootlin.com
Re: linux-next: build failure after merge of the rdma tree
On Tue, Dec 04, 2018 at 11:47:31AM +1100, Stephen Rothwell wrote:
> Hi all,
>
> After merging the rdma tree, today's linux-next build (x86_64
> allmodconfig) failed like this:
>
> ERROR: "mlx5_get_send_wqe" [drivers/infiniband/hw/mlx5/mlx5_ib.ko] undefined!
>
> Caused by commit
>
> 34f4c9554d8b ("IB/mlx5: Use fragmented QP's buffer for in-kernel users")
>
> mlx5_get_send_wqe() is still used in drivers/infiniband/hw/mlx5/cq.c
> and declared in drivers/infiniband/hw/mlx5/mlx5_ib.h ...
>
> I have used the version of the rdma tree from next-20181203 for today.

Huh. So apparently every compiler that tested this patch (0-day, mine, the submitters) optimized this call away because is_atomic_response() always returns 0: meaning mlx5_get_atomic_laddr is never callable and can be deleted entirely, including the call to mlx5_get_send_wqe.

Not sure what compiler setup will hit this, but it is clearly wrong code..

Guy/Leon, please send a fixup.. Maybe just delete all this handle_atomics stuff?

Thanks,
Jason
Re: [PATCH linux-next v2 6/6] ASoC: rsnd: add avb clocks
Hi Jiada > There are AVB Counter Clocks in ADG, each clock has 12bits integral > and 8 bits fractional dividers which operates with S0D1ϕ clock. > > This patch registers 8 AVB Counter Clocks when clock-cells of > rcar_sound node is 2, > > Signed-off-by: Jiada Wang > --- (snip) > +struct clk_avb { > + struct clk_hw hw; > + unsigned int idx; > + struct rsnd_mod *mod; > + /* lock reg access */ > + spinlock_t *lock; > +}; > + > +#define to_clk_avb(_hw) container_of(_hw, struct clk_avb, hw) I like "hw_to_avb()" > +static struct clk *rsnd_adg_clk_src_twocell_get(struct of_phandle_args > *clkspec, > + void *data) > +{ > + unsigned int clkidx = clkspec->args[1]; > + struct rsnd_adg *adg = data; > + const char *type; > + struct clk *clk; > + > + switch (clkspec->args[0]) { > + case ADG_FIX: > + type = "fixed"; > + if (clkidx >= CLKOUTMAX) { > + pr_err("Invalid %s clock index %u\n", type, > +clkidx); > + return ERR_PTR(-EINVAL); > + } > + clk = adg->clkout[clkidx]; > + break; > + case ADG_AVB: > + type = "avb"; > + if (clkidx >= AVB_CLK_NUM) { > + pr_err("Invalid %s clock index %u\n", type, > +clkidx); > + return ERR_PTR(-EINVAL); > + } > + clk = adg->clkavb[clkidx]; > + break; > + default: > + pr_err("Invalid ADG clock type %u\n", clkspec->args[0]); > + return ERR_PTR(-EINVAL); > + } > + > + return clk; > +} In this function 1) I don't think you need to use "char *type". 2) If you use "clkidx = clkspec->args[1]", having same name for "clkspec->args[0]" is readable. 
3) Please use dev_err() instead of pr_err().

I think data can be priv, and you can use rsnd_priv_to_adg() and rsnd_priv_to_dev().

> +static void clk_avb_div_write(struct rsnd_mod *mod, u32 data, int idx)
(snip)
> +static u32 clk_avb_div_read(struct rsnd_mod *mod, int idx)

To reduce confusion and make the code more readable, I think these functions can be

clk_avb_div_write(struct rsnd_adg *adg, u32 data);
clk_avb_div_read(struct rsnd_adg *adg);

> +static int clk_avb_set_rate(struct clk_hw *hw, unsigned long rate,
> +			    unsigned long parent_rate)
> +{
> +	struct clk_avb *avb = to_clk_avb(hw);
> +	unsigned int div = clk_avb_calc_div(rate, parent_rate);
> +	u32 val;
> +
> +	val = clk_avb_div_read(avb->mod, avb->idx) & ~AVB_DIV_MASK;
> +	clk_avb_div_write(avb->mod, val | div, avb->idx);
> +
> +	return 0;
> +}

Why do we need to care about the ~AVB_DIV_MASK area? These are 0 Reserved, I think.

> +static const struct clk_ops clk_avb_ops = {
> +	.enable = clk_avb_enable,
> +	.disable = clk_avb_disable,
> +	.is_enabled = clk_avb_is_enabled,
> +	.recalc_rate = clk_avb_recalc_rate,
> +	.round_rate = clk_avb_round_rate,
> +	.set_rate = clk_avb_set_rate,
> +};

This is not a big deal, but I like tab-aligned ops.

> +static struct clk *clk_register_avb(struct device *dev, struct rsnd_mod *mod,
> +				    unsigned int id, spinlock_t *lock)
> +{
> +	struct clk_init_data init;
> +	struct clk_avb *avb;
> +	struct clk *clk;
> +	char name[AVB_CLK_NAME_SIZE];
> +	const char *parent_name = ADG_CLK_NAME;
> +
> +	avb = devm_kzalloc(dev, sizeof(*avb), GFP_KERNEL);
> +	if (!avb)
> +		return ERR_PTR(-ENOMEM);
> +
> +	snprintf(name, AVB_CLK_NAME_SIZE, "%s%u", AVB_CLK_NAME, id);
> +
> +	avb->idx = id;
> +	avb->lock = lock;
> +	avb->mod = mod;
> +
> +	/* Register the clock.
*/ > + init.name = name; > + init.ops = _avb_ops; > + init.flags = CLK_IS_BASIC; > + init.parent_names = _name; > + init.num_parents = 1; > + > + avb->hw.init = > + > + /* init DIV to a valid state */ > + clk_avb_div_write(avb->mod, avb->idx, AVB_MAX_DIV); Please check parameter, I think you want to do is - clk_avb_div_write(avb->mod, avb->idx, AVB_MAX_DIV); + clk_avb_div_write(avb->mod, AVB_MAX_DIV, avb->idx); Best regards --- Kuninori Morimoto
[PATCH v5 10/13] nvme-tcp: Add protocol header
From: Sagi Grimberg Signed-off-by: Sagi Grimberg --- include/linux/nvme-tcp.h | 189 +++ include/linux/nvme.h | 1 + 2 files changed, 190 insertions(+) create mode 100644 include/linux/nvme-tcp.h diff --git a/include/linux/nvme-tcp.h b/include/linux/nvme-tcp.h new file mode 100644 index ..03d87c0550a9 --- /dev/null +++ b/include/linux/nvme-tcp.h @@ -0,0 +1,189 @@ +// SPDX-License-Identifier: GPL-2.0 +/* + * NVMe over Fabrics TCP protocol header. + * Copyright (c) 2018 Lightbits Labs. All rights reserved. + */ + +#ifndef _LINUX_NVME_TCP_H +#define _LINUX_NVME_TCP_H + +#include + +#define NVME_TCP_DISC_PORT 8009 +#define NVME_TCP_ADMIN_CCSZSZ_8K +#define NVME_TCP_DIGEST_LENGTH 4 + +enum nvme_tcp_pfv { + NVME_TCP_PFV_1_0 = 0x0, +}; + +enum nvme_tcp_fatal_error_status { + NVME_TCP_FES_INVALID_PDU_HDR= 0x01, + NVME_TCP_FES_PDU_SEQ_ERR= 0x02, + NVME_TCP_FES_HDR_DIGEST_ERR = 0x03, + NVME_TCP_FES_DATA_OUT_OF_RANGE = 0x04, + NVME_TCP_FES_R2T_LIMIT_EXCEEDED = 0x05, + NVME_TCP_FES_DATA_LIMIT_EXCEEDED= 0x05, + NVME_TCP_FES_UNSUPPORTED_PARAM = 0x06, +}; + +enum nvme_tcp_digest_option { + NVME_TCP_HDR_DIGEST_ENABLE = (1 << 0), + NVME_TCP_DATA_DIGEST_ENABLE = (1 << 1), +}; + +enum nvme_tcp_pdu_type { + nvme_tcp_icreq = 0x0, + nvme_tcp_icresp = 0x1, + nvme_tcp_h2c_term = 0x2, + nvme_tcp_c2h_term = 0x3, + nvme_tcp_cmd= 0x4, + nvme_tcp_rsp= 0x5, + nvme_tcp_h2c_data = 0x6, + nvme_tcp_c2h_data = 0x7, + nvme_tcp_r2t= 0x9, +}; + +enum nvme_tcp_pdu_flags { + NVME_TCP_F_HDGST= (1 << 0), + NVME_TCP_F_DDGST= (1 << 1), + NVME_TCP_F_DATA_LAST= (1 << 2), + NVME_TCP_F_DATA_SUCCESS = (1 << 3), +}; + +/** + * struct nvme_tcp_hdr - nvme tcp pdu common header + * + * @type: pdu type + * @flags: pdu specific flags + * @hlen: pdu header length + * @pdo: pdu data offset + * @plen: pdu wire byte length + */ +struct nvme_tcp_hdr { + __u8type; + __u8flags; + __u8hlen; + __u8pdo; + __le32 plen; +}; + +/** + * struct nvme_tcp_icreq_pdu - nvme tcp initialize connection request pdu + * + * @hdr: pdu generic 
header + * @pfv: pdu version format + * @hpda: host pdu data alignment (dwords, 0's based) + * @digest:digest types enabled + * @maxr2t:maximum r2ts per request supported + */ +struct nvme_tcp_icreq_pdu { + struct nvme_tcp_hdr hdr; + __le16 pfv; + __u8hpda; + __u8digest; + __le32 maxr2t; + __u8rsvd2[112]; +}; + +/** + * struct nvme_tcp_icresp_pdu - nvme tcp initialize connection response pdu + * + * @hdr: pdu common header + * @pfv: pdu version format + * @cpda: controller pdu data alignment (dowrds, 0's based) + * @digest:digest types enabled + * @maxdata: maximum data capsules per r2t supported + */ +struct nvme_tcp_icresp_pdu { + struct nvme_tcp_hdr hdr; + __le16 pfv; + __u8cpda; + __u8digest; + __le32 maxdata; + __u8rsvd[112]; +}; + +/** + * struct nvme_tcp_term_pdu - nvme tcp terminate connection pdu + * + * @hdr: pdu common header + * @fes: fatal error status + * @fei: fatal error information + */ +struct nvme_tcp_term_pdu { + struct nvme_tcp_hdr hdr; + __le16 fes; + __le32 fei; + __u8rsvd[8]; +}; + +/** + * struct nvme_tcp_cmd_pdu - nvme tcp command capsule pdu + * + * @hdr: pdu common header + * @cmd: nvme command + */ +struct nvme_tcp_cmd_pdu { + struct nvme_tcp_hdr hdr; + struct nvme_command cmd; +}; + +/** + * struct nvme_tcp_rsp_pdu - nvme tcp response capsule pdu + * + * @hdr: pdu common header + * @hdr: nvme-tcp generic header + * @cqe: nvme completion queue entry + */ +struct nvme_tcp_rsp_pdu { + struct nvme_tcp_hdr hdr; + struct nvme_completion cqe; +}; + +/** + * struct nvme_tcp_r2t_pdu - nvme tcp ready-to-transfer pdu + * + * @hdr: pdu common header + * @command_id:nvme command identifier which this relates to + * @ttag: transfer tag (controller generated) + * @r2t_offset:offset from the start of the command data + * @r2t_length:length the host is allowed to send + */ +struct nvme_tcp_r2t_pdu { + struct nvme_tcp_hdr hdr; + __u16
Re: [PATCH] printk: Add caller information to printk() output.
On (12/02/18 20:23), Tetsuo Handa wrote:
>
> Some examples for console output:
>
> [0.919699]@T1 x86: Booting SMP configuration:
> [4.152681]@T271 Fusion MPT base driver 3.04.20
> [5.070470]@C0 random: fast init done
> [6.587900]@C3 random: crng init done

This is hard to read. Let's have a fixed width space for from_id.

> +#ifdef CONFIG_PRINTK_FROM
> +	if (in_task())
> +		msg->from_id = current->pid;

Let's use task_pid_nr().

> -static size_t print_time(u64 ts, char *buf)
> -{
> -	unsigned long rem_nsec = do_div(ts, 1000000000);
> -
> -	if (!buf)
> -		return snprintf(NULL, 0, "[%5lu.000000] ", (unsigned long)ts);
> -
> -	return sprintf(buf, "[%5lu.%06lu] ",
> -		       (unsigned long)ts, rem_nsec / 1000);
> -}

OK, this patch depends on printk_time patch.

> +config PRINTK_FROM
> +	bool "Show caller information on printks"
> +	depends on PRINTK

Wasn't it supposed to also depend on DEBUG_AID_FOR_SYZBOT?

	-ss
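The fixed-width suggestion can be sketched in plain userspace C as below. This is only an illustration: the field width of 6, the bracketed layout and the helper name `format_caller` are assumptions rather than anything from the patch, and reading "T" as a task id and "C" as a CPU number is inferred from the example output.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Assumed field width; the thread does not settle on a number. */
#define CALLER_FIELD_WIDTH 6

/*
 * Format a caller tag such as "T271" or "C3" right-aligned in a
 * fixed-width field, so the message text always starts in the same column.
 */
static int format_caller(char *buf, size_t len, int in_task, unsigned int id)
{
	char tag[16];

	/* 'T' for task context, 'C' for a CPU number otherwise. */
	snprintf(tag, sizeof(tag), "%c%u", in_task ? 'T' : 'C', id);

	/* "%*s" pads the tag on the left up to CALLER_FIELD_WIDTH characters. */
	return snprintf(buf, len, "[%*s] ", CALLER_FIELD_WIDTH, tag);
}
```

With a padded field, "T1", "T271" and "C3" all occupy the same number of columns, which addresses the readability complaint about the unpadded "@T271" form.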
Re: [PATCH 3/3] mm/memcg: Avoid reclaiming below hard protection
On 2018/12/3 PM 7:57, Michal Hocko wrote:
> On Mon 03-12-18 16:01:19, Xunlei Pang wrote:
>> When memcgs get reclaimed after its usage exceeds min, some
>> usages below the min may also be reclaimed in the current
>> implementation, the amount is considerably large during kswapd
>> reclaim according to my ftrace results.
>
> And here again. Describe the setup and the behavior please?
>

step 1
mkdir -p /sys/fs/cgroup/memory/online
cd /sys/fs/cgroup/memory/online
echo 512M > memory.max
echo 40960 > memory.min
echo $$ > tasks
dd if=/dev/sda of=/dev/null

while true; do sleep 1; cat memory.current ; cat memory.min; done

step 2
Create global memory pressure by allocating anonymous and cached
pages to constantly trigger kswapd: dd if=/dev/sdb of=/dev/null

step 3
Then observe the "online" group: usage that is only a few hundred
kbytes over memory.min can cause tens of MiB to be reclaimed by kswapd.

Here is one of the test results I got:
cat memory.current; cat memory.min; echo;
409485312   // current
40960       // min

385052672   // See current got over reclaimed for 23MB
40960       // min

Its corresponding ftrace output I monitored:
kswapd_0-281   [000]   304.706632: shrink_node_memcg: min_excess=24, nr_reclaimed=6013, sc->nr_to_reclaim=147, exceeds 5989pages
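The numbers in the trace can be cross-checked with a small calculation. The sketch below assumes a 4 KiB page size, which the report does not state, and the helper names are invented for illustration.

```c
#include <assert.h>

/* Assumed page size; the report does not say, 4 KiB is typical on x86. */
#define ASSUMED_PAGE_SIZE 4096UL

/* Pages reclaimed beyond what was needed to bring usage back to min. */
static unsigned long over_reclaimed_pages(unsigned long nr_reclaimed,
					  unsigned long min_excess)
{
	return nr_reclaimed > min_excess ? nr_reclaimed - min_excess : 0;
}

/* Convert a page count to bytes under the assumed page size. */
static unsigned long pages_to_bytes(unsigned long pages)
{
	return pages * ASSUMED_PAGE_SIZE;
}
```

With the traced values, 6013 - 24 gives 5989 pages, or about 23.4 MiB at 4 KiB per page. That is in line with the drop between the two memory.current samples (409485312 - 385052672 = 24432640 bytes, roughly 23.3 MiB).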
Re: [PATCH V3 1/3] mmc: sdhci: add support for using external DMA devices
Hi Faiz,

On Mon, 3 Dec 2018 at 21:55, Faiz Abbas wrote:
>
> Hi,
>
> On 03/12/18 5:45 PM, Faiz Abbas wrote:
> > Hi,
> >
> >> +static void sdhci_external_dma_prepare_data(struct sdhci_host *host,
> >> +					    struct mmc_command *cmd)
> >> +{
> >
> > Please add a condition for data == NULL here. This was already pointed
> > out by Adrian in v2.

The check for data is added in sdhci_external_dma_setup().

> >
> > My test with an am335x-evm failed with these patches. Looks like the
> > very first SDIO commands failing.

I guess you didn't add 'dmas' in the device tree, like patch 3 shows.

> >
> > https://pastebin.ubuntu.com/p/Y2RDjSKpgd/
> >
> > Currently am335x-evm is using omap_hsmmc driver. I added the following
> > patch to make it work with sdhci_omap.
> >
> > https://pastebin.ubuntu.com/p/VTGrCbJxY3/
> >
> > Will look deeper into this. Please ping if you need any more information.
> >
>
> So I disabled DMA in the driver altogether and still got the same
> messages on am335x-evm in PIO mode. Looks like something more is
> required for it to be supported.
>
> I instead shifted to a dra71-evm which supports both ADMA and external
> DMA. Here is the log:
>
> https://pastebin.ubuntu.com/p/mcJmgcjQsp/
>
> The interface fundamentally works but it complains with the following error:
>

Yes, it switched back to ADMA/PIO since sdhci couldn't find the 'dmas'
property in the devicetree.

> [3.111693] Failed to request TX DMA channel.

Ok, I will add a check in sdhci-omap.c before switching to external dma;
that should avoid these error logs.

Thanks for the review and test!

Chunyan
[RFC PATCH] Add IPA clock support for clk-rpmh
This patch extends the existing clk-rpmh driver to support a different
type of RPMh resource known as Bus Clock Manager(BCM) in order to scale
performance for the Qualcomm IP Accelerator(IPA) core clock.

David Dai (1):
  clk: qcom: clk-rpmh: Add IPA clock support

 drivers/clk/qcom/clk-rpmh.c           | 142 ++
 include/dt-bindings/clock/qcom,rpmh.h |   1 +
 2 files changed, 143 insertions(+)

--
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project
[RFC PATCH] clk: qcom: clk-rpmh: Add IPA clock support
Add IPA clock support by extending the current clk rpmh driver to support clocks that are managed by a different type of RPMh resource known as Bus Clock Manager(BCM). Signed-off-by: David Dai --- drivers/clk/qcom/clk-rpmh.c | 142 ++ include/dt-bindings/clock/qcom,rpmh.h | 1 + 2 files changed, 143 insertions(+) diff --git a/drivers/clk/qcom/clk-rpmh.c b/drivers/clk/qcom/clk-rpmh.c index 9f4fc77..42e2cd2 100644 --- a/drivers/clk/qcom/clk-rpmh.c +++ b/drivers/clk/qcom/clk-rpmh.c @@ -18,6 +18,32 @@ #define CLK_RPMH_ARC_EN_OFFSET 0 #define CLK_RPMH_VRM_EN_OFFSET 4 +#define BCM_TCS_CMD_COMMIT_MASK0x4000 +#define BCM_TCS_CMD_VALID_SHIFT29 +#define BCM_TCS_CMD_VOTE_MASK 0x3fff +#define BCM_TCS_CMD_VOTE_SHIFT 0 + +#define BCM_TCS_CMD(valid, vote) \ + (BCM_TCS_CMD_COMMIT_MASK |\ + ((valid) << BCM_TCS_CMD_VALID_SHIFT) |\ + ((cpu_to_le32(vote) &\ + BCM_TCS_CMD_VOTE_MASK) << BCM_TCS_CMD_VOTE_SHIFT)) + +/** + * struct bcm_db - Auxiliary data pertaining to each Bus Clock Manager(BCM) + * @unit: divisor used to convert Hz value to an RPMh msg + * @width: multiplier used to convert Hz value to an RPMh msg + * @vcd: virtual clock domain that this bcm belongs to + * @reserved: reserved to pad the struct + */ + +struct bcm_db { + u32 unit; + u16 width; + u8 vcd; + u8 reserved; +}; + /** * struct clk_rpmh - individual rpmh clock data structure * @hw:handle between common and hardware-specific interfaces @@ -29,6 +55,7 @@ * @aggr_state:rpmh clock aggregated state * @last_sent_aggr_state: rpmh clock last aggr state sent to RPMh * @valid_state_mask: mask to determine the state of the rpmh clock + * @aux_data: data specific to the bcm rpmh resource * @dev: device to which it is attached * @peer: pointer to the clock rpmh sibling */ @@ -42,6 +69,7 @@ struct clk_rpmh { u32 aggr_state; u32 last_sent_aggr_state; u32 valid_state_mask; + struct bcm_db aux_data; struct device *dev; struct clk_rpmh *peer; }; @@ -98,6 +126,17 @@ struct clk_rpmh_desc { __DEFINE_CLK_RPMH(_platform, _name, 
_name_active, _res_name,\ CLK_RPMH_VRM_EN_OFFSET, 1, _div) +#define DEFINE_CLK_RPMH_BCM(_platform, _name, _res_name) \ + static struct clk_rpmh _platform##_##_name = { \ + .res_name = _res_name, \ + .valid_state_mask = BIT(RPMH_ACTIVE_ONLY_STATE),\ + .div = 1, \ + .hw.init = &(struct clk_init_data){ \ + .ops = _rpmh_bcm_ops, \ + .name = #_name, \ + }, \ + } + static inline struct clk_rpmh *to_clk_rpmh(struct clk_hw *_hw) { return container_of(_hw, struct clk_rpmh, hw); @@ -210,6 +249,91 @@ static unsigned long clk_rpmh_recalc_rate(struct clk_hw *hw, .recalc_rate= clk_rpmh_recalc_rate, }; +static int clk_rpmh_bcm_send_cmd(struct clk_rpmh *c, bool enable) +{ + struct tcs_cmd cmd = { 0 }; + u32 cmd_state; + int ret; + + cmd_state = enable ? (c->aggr_state ? c->aggr_state : 1) : 0; + + if (c->last_sent_aggr_state == cmd_state) + return 0; + + cmd.addr = c->res_addr; + cmd.data = BCM_TCS_CMD(enable, cmd_state); + + ret = rpmh_write_async(c->dev, RPMH_ACTIVE_ONLY_STATE, , 1); + if (ret) { + dev_err(c->dev, "set active state of %s failed: (%d)\n", + c->res_name, ret); + return ret; + } + + c->last_sent_aggr_state = cmd_state; + + return 0; +} + +static int clk_rpmh_bcm_prepare(struct clk_hw *hw) +{ + struct clk_rpmh *c = to_clk_rpmh(hw); + int ret = 0; + + mutex_lock(_clk_lock); + ret = clk_rpmh_bcm_send_cmd(c, true); + mutex_unlock(_clk_lock); + + return ret; +}; + +static void clk_rpmh_bcm_unprepare(struct clk_hw *hw) +{ + struct clk_rpmh *c = to_clk_rpmh(hw); + + mutex_lock(_clk_lock); + clk_rpmh_bcm_send_cmd(c, false); + mutex_unlock(_clk_lock); +}; + +static int clk_rpmh_bcm_set_rate(struct clk_hw *hw, unsigned long rate, +unsigned long parent_rate) +{ + struct clk_rpmh *c = to_clk_rpmh(hw); + + c->aggr_state = rate / (c->aux_data.unit * 1000); + + if (clk_hw_is_prepared(hw)) { + mutex_lock(_clk_lock); + clk_rpmh_bcm_send_cmd(c, true); + mutex_unlock(_clk_lock); + } + + return 0; +}; + +static long clk_rpmh_round_rate(struct
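As a rough illustration of what the BCM_TCS_CMD() macro and the set_rate hunk compute, here is a userspace sketch. The valid shift (29) and vote mask (0x3fff) follow the values shown in the patch text; the commit bit is ASSUMED to be bit 30, directly above the valid bit, because the mask value in the archived text looks truncated. The endianness conversion is left out, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

#define SKETCH_COMMIT_MASK 0x40000000u /* ASSUMPTION: bit 30, see lead-in */
#define SKETCH_VALID_SHIFT 29          /* from the patch text */
#define SKETCH_VOTE_MASK   0x3fffu     /* from the patch text */
#define SKETCH_VOTE_SHIFT  0           /* from the patch text */

/* Pack commit bit, valid bit and the masked vote into one command word. */
static uint32_t sketch_bcm_cmd(int valid, uint32_t vote)
{
	return SKETCH_COMMIT_MASK |
	       ((uint32_t)(valid ? 1 : 0) << SKETCH_VALID_SHIFT) |
	       ((vote & SKETCH_VOTE_MASK) << SKETCH_VOTE_SHIFT);
}

/* Hz to vote, mirroring "rate / (unit * 1000)" from the set_rate hunk. */
static uint32_t sketch_rate_to_vote(uint64_t rate_hz, uint32_t unit)
{
	if (unit == 0)
		return 0; /* guard against division by zero if unit is unset */

	return (uint32_t)(rate_hz / ((uint64_t)unit * 1000u));
}
```

One thing the sketch makes visible: a vote larger than the mask is silently truncated, so a caller would probably want to clamp the computed vote before packing it.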
Re: [PATCH] ext4: clean up indentation issues, remove extraneous tabs
On Tue, Nov 27, 2018 at 10:28:36AM +0100, Jan Kara wrote:
> On Fri 23-11-18 16:30:43, Colin King wrote:
> > From: Colin Ian King
> >
> > There are several lines that are indented too far, clean these
> > up by removing the tabs.
> >
> > Signed-off-by: Colin Ian King
>
> The patch looks good. You can add:
>
> Reviewed-by: Jan Kara

Thanks, applied.

					- Ted
Re: [PATCH v2 2/3] phy: sr-usb: Add stingray usb phy driver
Hi, On 03/12/18 2:06 PM, Srinath Mannam wrote: > This driver supports all versions of stingray SS and HS > USB phys. > In version 1 is combo phy contain both SS and HS phys > in a common IO space. > In version 2 a single HS phy. > These phys support both xHCI host driver and > BDC Broadcom device controller driver. > > Signed-off-by: Srinath Mannam > Reviewed-by: Florian Fainelli > Reviewed-by: Scott Branden > --- > drivers/phy/broadcom/Kconfig | 11 + > drivers/phy/broadcom/Makefile | 1 + > drivers/phy/broadcom/phy-bcm-sr-usb.c | 373 > ++ > 3 files changed, 385 insertions(+) > create mode 100644 drivers/phy/broadcom/phy-bcm-sr-usb.c > > diff --git a/drivers/phy/broadcom/Kconfig b/drivers/phy/broadcom/Kconfig > index 8786a96..c1e4dd5 100644 > --- a/drivers/phy/broadcom/Kconfig > +++ b/drivers/phy/broadcom/Kconfig > @@ -10,6 +10,17 @@ config PHY_CYGNUS_PCIE > Enable this to support the Broadcom Cygnus PCIe PHY. > If unsure, say N. > > +config PHY_BCM_SR_USB > + tristate "Broadcom Stingray USB PHY driver" > + depends on OF && (ARCH_BCM_IPROC || COMPILE_TEST) > + select GENERIC_PHY > + default ARCH_BCM_IPROC > + help > + Enable this to support the Broadcom Stingray USB PHY > + driver. It supports all versions of Superspeed and > + Highspeed PHYs. > + If unsure, say N. 
> + > config BCM_KONA_USB2_PHY > tristate "Broadcom Kona USB2 PHY Driver" > depends on HAS_IOMEM > diff --git a/drivers/phy/broadcom/Makefile b/drivers/phy/broadcom/Makefile > index 0f60184..f453c7d 100644 > --- a/drivers/phy/broadcom/Makefile > +++ b/drivers/phy/broadcom/Makefile > @@ -11,3 +11,4 @@ obj-$(CONFIG_PHY_BRCM_USB) += phy-brcm-usb-dvr.o > phy-brcm-usb-dvr-objs := phy-brcm-usb.o phy-brcm-usb-init.o > > obj-$(CONFIG_PHY_BCM_SR_PCIE)+= phy-bcm-sr-pcie.o > +obj-$(CONFIG_PHY_BCM_SR_USB) += phy-bcm-sr-usb.o > diff --git a/drivers/phy/broadcom/phy-bcm-sr-usb.c > b/drivers/phy/broadcom/phy-bcm-sr-usb.c > new file mode 100644 > index 000..52484b3 > --- /dev/null > +++ b/drivers/phy/broadcom/phy-bcm-sr-usb.c > @@ -0,0 +1,373 @@ > +// SPDX-License-Identifier: GPL-2.0 > +/* > + * Copyright (C) 2016-2018 Broadcom > + */ > + > +#include > +#include > +#include > +#include > +#include > +#include > + > +enum bcm_usb_phy_version { > + BCM_USB_PHY_V1, > + BCM_USB_PHY_V2, > +}; > + > +enum bcm_usb_phy_reg { > + PLL_NDIV_FRAC, > + PLL_NDIV_INT, > + PLL_CTRL, > + PHY_CTRL, > + PHY_PLL_CTRL, > +}; > + > +/* USB PHY registers */ > + > +static const u8 bcm_usb_u3phy_v1[] = { > + [PLL_CTRL] = 0x18, > + [PHY_CTRL] = 0x14, > +}; > + > +static const u8 bcm_usb_u2phy_v1[] = { > + [PLL_NDIV_FRAC] = 0x04, > + [PLL_NDIV_INT] = 0x08, > + [PLL_CTRL] = 0x0c, > + [PHY_CTRL] = 0x10, > +}; > + > +#define HSPLL_NDIV_INT_VAL 0x13 > +#define HSPLL_NDIV_FRAC_VAL 0x1005 > + > +static const u8 bcm_usb_u2phy_v2[] = { > + [PLL_NDIV_FRAC] = 0x0, > + [PLL_NDIV_INT] = 0x4, > + [PLL_CTRL] = 0x8, > + [PHY_CTRL] = 0xc, > +}; > + > +enum pll_ctrl_bits { > + PLL_RESETB, > + SSPLL_SUSPEND_EN, > + PLL_SEQ_START, > + PLL_LOCK, > + PLL_PDIV, > +}; > + > +static const u8 u3pll_ctrl[] = { > + [PLL_RESETB]= 0, > + [SSPLL_SUSPEND_EN] = 1, > + [PLL_SEQ_START] = 2, > + [PLL_LOCK] = 3, > +}; > + > +#define HSPLL_PDIV_MASK 0xF > +#define HSPLL_PDIV_VAL 0x1 > + > +static const u8 u2pll_ctrl[] = { > + [PLL_PDIV] = 1, > 
+ [PLL_RESETB]= 5, > + [PLL_LOCK] = 6, > +}; > + > +enum bcm_usb_phy_ctrl_bits { > + CORERDY, > + AFE_LDO_PWRDWNB, > + AFE_PLL_PWRDWNB, > + AFE_BG_PWRDWNB, > + PHY_ISO, > + PHY_RESETB, > + PHY_PCTL, > +}; > + > +#define PHY_PCTL_MASK0x > +/* > + * 0x0806 of PCTL_VAL has below bits set > + * BIT-8 : refclk divider 1 > + * BIT-3:2: device mode; mode is not effect > + * BIT-1: soft reset active low > + */ > +#define HSPHY_PCTL_VAL 0x0806 > +#define SSPHY_PCTL_VAL 0x0006 > + > +static const u8 u3phy_ctrl[] = { > + [PHY_RESETB]= 1, > + [PHY_PCTL] = 2, > +}; > + > +static const u8 u2phy_ctrl[] = { > + [CORERDY] = 0, > + [AFE_LDO_PWRDWNB] = 1, > + [AFE_PLL_PWRDWNB] = 2, > + [AFE_BG_PWRDWNB]= 3, > + [PHY_ISO] = 4, > + [PHY_RESETB]= 5, > + [PHY_PCTL] = 6, > +}; > + > +struct bcm_usb_phy_cfg { > + uint32_t type; > + uint32_t ver; > + void __iomem *regs; > + struct phy *phy; > + const u8 *offset; > +}; > + > +#define PLL_LOCK_RETRY_COUNT 1000 > + > +enum bcm_usb_phy_type { > + USB_HS_PHY, > + USB_SS_PHY, > +}; > + > +static inline void bcm_usb_reg32_clrbits(void __iomem *addr,
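The driver's approach of indexing per-version offset tables by a register enum, then setting or clearing bits at the resolved address, can be sketched in userspace as follows. A plain array stands in for the MMIO window, the two offsets are copied from the v2 HS PHY table quoted above, and every `sketch_`/`SK_` name is hypothetical. Real driver code would go through readl()/writel() on an `__iomem` pointer instead of dereferencing directly.

```c
#include <assert.h>
#include <stdint.h>

enum sketch_reg { SK_PLL_CTRL, SK_PHY_CTRL, SK_REG_COUNT };

/* Byte offsets per register, taken from the v2 HS PHY table in the patch. */
static const uint8_t sketch_u2phy_v2[SK_REG_COUNT] = {
	[SK_PLL_CTRL] = 0x8,
	[SK_PHY_CTRL] = 0xc,
};

/* Resolve a register address inside a fake register window. */
static volatile uint32_t *sketch_reg_addr(uint32_t *base,
					  const uint8_t *offset,
					  enum sketch_reg reg)
{
	return (volatile uint32_t *)((uint8_t *)base + offset[reg]);
}

/* Read-modify-write helpers: clear or set the given bits. */
static void sketch_clrbits(volatile uint32_t *addr, uint32_t clear)
{
	*addr &= ~clear;
}

static void sketch_setbits(volatile uint32_t *addr, uint32_t set)
{
	*addr |= set;
}
```

Keeping the offsets in a table means the same init code can serve both PHY versions by swapping the table pointer, which appears to be the purpose of the `offset` member in struct bcm_usb_phy_cfg.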
Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()
On Mon, 3 Dec 2018 11:52:41 +0530 Ravi Bangoria wrote:

> Hi Steve,
>
> Please pull this patch.
>

Please send a v2 version of the patch with the updated change log.

And should it have a Fixes and be tagged for stable?

-- Steve

> Thanks.
>
> On 11/15/18 6:13 PM, Oleg Nesterov wrote:
> > On 11/15, Ravi Bangoria wrote:
> >>
> >> There could be a race between task exit and probe unregister:
> >>
> >>   exit_mm()
> >>   mmput()
> >>   __mmput()                     uprobe_unregister()
> >>   uprobe_clear_state()          put_uprobe()
> >>   delayed_uprobe_remove()       delayed_uprobe_remove()
> >>
> >> put_uprobe() is calling delayed_uprobe_remove() without taking
> >> delayed_uprobe_lock and thus the race sometimes results in a
> >> kernel crash. Fix this by taking delayed_uprobe_lock before
> >> calling delayed_uprobe_remove() from put_uprobe().
> >>
> >> Detailed crash log can be found at:
> >>   https://lkml.org/lkml/2018/11/1/1244
> >
> > Thanks, looks good,
> >
> > Oleg.
> >
Re: [PATCH v2 2/5] devfreq: add support for suspend/resume of a devfreq device
Hi Lukasz,

I'm adding a comment about 'suspend_count'.

On 2018-12-04 14:43, Chanwoo Choi wrote:
> Hi,
>
> On 2018-12-04 14:36, Chanwoo Choi wrote:
>> Hi Lukasz,
>>
>> Looks good to me. But, I add the some comments.
>> If you will fix it, feel free to add my tag:
>> Reviewed-by: Chanwoo choi
>
> Sorry. Fix typo 'choi' to 'Choi' as following.
> Reviewed-by: Chanwoo Choi
>
>>
>> On 2018-12-03 23:31, Lukasz Luba wrote:
>>> The patch prepares devfreq device for handling suspend/resume
>>> functionality.  The new fields will store needed information during this
>>
>> nitpick. Remove unneeded space. There are two spaces between '.' and 'The
>> new'.
>>
>>> process.  Devfreq framework handles opp-suspend DT entry and there is no
>>
>> ditto.
>>
>>> need of modyfications in the drivers code.  It uses atomic variables to
>>
>> ditto.
>>
>>> make sure no race condition affects the process.
>>>
>>> The patch is based on earlier work by Tobias Jakobi.
>>
>> Please remove it from each patch description.
>>
>>>
>>> Suggested-by: Tobias Jakobi
>>> Suggested-by: Chanwoo Choi
>>> Signed-off-by: Lukasz Luba
>>> ---
>>>  drivers/devfreq/devfreq.c | 51 +++
>>>  include/linux/devfreq.h   |  7 +++
>>>  2 files changed, 50 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>>> index a9fd61b..36bed24 100644
>>> --- a/drivers/devfreq/devfreq.c
>>> +++ b/drivers/devfreq/devfreq.c
>>> @@ -316,6 +316,10 @@ static int devfreq_set_target(struct devfreq *devfreq, unsigned long new_freq,
>>>  			"Couldn't update frequency transition information.\n");
>>>
>>>  	devfreq->previous_freq = new_freq;
>>> +
>>> +	if (devfreq->suspend_freq)
>>> +		devfreq->resume_freq = cur_freq;
>>> +
>>>  	return err;
>>>  }
>>>
>>> @@ -667,6 +671,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
>>>  	}
>>>  	devfreq->max_freq = devfreq->scaling_max_freq;
>>>
>>> +	devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev);
>>> +	atomic_set(&devfreq->suspend_count, 0);
>>> +
>>>  	dev_set_name(&devfreq->dev, "devfreq%d",
>>>  				atomic_inc_return(&devfreq_no));
>>>  	err = device_register(&devfreq->dev);
>>> @@ -867,14 +874,28 @@ EXPORT_SYMBOL(devm_devfreq_remove_device);
>>>   */
>>>  int devfreq_suspend_device(struct devfreq *devfreq)
>>>  {
>>> +	int ret;
>>> +
>>>  	if (!devfreq)
>>>  		return -EINVAL;
>>>
>>> -	if (!devfreq->governor)
>>> -		return 0;
>>> +	if (devfreq->governor) {
>>> +		ret = devfreq->governor->event_handler(devfreq,
>>> +					DEVFREQ_GOV_SUSPEND, NULL);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	if (devfreq->suspend_freq) {
>>> +		if (atomic_inc_return(&devfreq->suspend_count) > 1)
>>> +			return 0;
>>> +
>>> +		ret = devfreq_set_target(devfreq, devfreq->suspend_freq, 0);
>>> +		if (ret)
>>> +			return ret;
>>> +	}

In this patch, if a user calls devfreq_suspend_device() twice,
'devfreq->governor->event_handler(devfreq, DEVFREQ_GOV_SUSPEND, NULL)'
is called twice but devfreq_set_target() is called only once.
I know this causes no problem in operation.

But I think it would be better to use 'suspend_count' as the reference
count of devfreq_suspend/resume_device(). If 'suspend_count' is used to
check whether this devfreq is already suspended, we can avoid the
redundant call when it is called twice. The clock and regulator
frameworks use the same 'reference count' method to avoid redundant calls.

>>>
>>> -	return devfreq->governor->event_handler(devfreq,
>>> -				DEVFREQ_GOV_SUSPEND, NULL);
>>> +	return 0;
>>>  }
>>>  EXPORT_SYMBOL(devfreq_suspend_device);
>>>
>>> @@ -888,14 +909,28 @@ EXPORT_SYMBOL(devfreq_suspend_device);
>>>   */
>>>  int devfreq_resume_device(struct devfreq *devfreq)
>>>  {
>>> +	int ret;
>>> +
>>>  	if (!devfreq)
>>>  		return -EINVAL;
>>>
>>> -	if (!devfreq->governor)
>>> -		return 0;
>>> +	if (devfreq->resume_freq) {
>>> +		if (atomic_dec_return(&devfreq->suspend_count) >= 1)
>>> +			return 0;

ditto.

>>>
>>> -	return devfreq->governor->event_handler(devfreq,
>>> -				DEVFREQ_GOV_RESUME, NULL);
>>> +		ret = devfreq_set_target(devfreq, devfreq->resume_freq, 0);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	if (devfreq->governor) {
>>> +		ret = devfreq->governor->event_handler(devfreq,
>>> +					DEVFREQ_GOV_RESUME, NULL);
>>> +		if (ret)
>>> +			return ret;
>>> +	}
>>> +
>>> +	return 0;
>>>  }
>>>  EXPORT_SYMBOL(devfreq_resume_device);
>>>
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index e4963b0..d985199 100644
>>> ---
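The reference-count idea raised in the review can be sketched in userspace with C11 atomics. Everything here is hypothetical (`fake_devfreq`, `fake_suspend`, `fake_resume`); the counters merely stand in for the governor event handler and devfreq_set_target() calls. The point is that the count gates all of the suspend work, not just the frequency change.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-in for a devfreq device; only what the sketch needs. */
struct fake_devfreq {
	atomic_int suspend_count; /* nesting depth of suspend requests */
	int governor_suspends;    /* times the suspend work actually ran */
	int governor_resumes;     /* times the resume work actually ran */
};

/* Only the first suspend request does the real work. */
static int fake_suspend(struct fake_devfreq *df)
{
	/* atomic_fetch_add() returns the old value, so old + 1 is the new count. */
	if (atomic_fetch_add(&df->suspend_count, 1) + 1 > 1)
		return 0;

	df->governor_suspends++;
	return 0;
}

/* Only the resume that balances the first suspend does the real work. */
static int fake_resume(struct fake_devfreq *df)
{
	if (atomic_fetch_sub(&df->suspend_count, 1) - 1 >= 1)
		return 0;

	df->governor_resumes++;
	return 0;
}
```

Two nested suspend calls followed by two resume calls run the underlying work exactly once each. The sketch does not guard against an unbalanced resume, which a real implementation would need to handle.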
linux-next: Tree for Dec 4
Hi all, Changes since 20181203: The rdma tree gained a build failure so I used the version from next-20181203. The bpf-next tree gained conflicts against the bpf tree. The char-misc tree gained a conflict against the char-misc.current tree. The akpm tree lost its build failure but gained a conflict against the pm tree. Non-merge commits (relative to Linus' tree): 5958 6050 files changed, 291311 insertions(+), 163581 deletions(-) I have created today's linux-next tree at git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git (patches at http://www.kernel.org/pub/linux/kernel/next/ ). If you are tracking the linux-next tree using git, you should not use "git pull" to do so as that will try to merge the new linux-next release with the old one. You should use "git fetch" and checkout or reset to the new master. You can see which trees have been included by looking in the Next/Trees file in the source. There are also quilt-import.log and merge.log files in the Next directory. Between each merge, the tree was built with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a multi_v7_defconfig for arm and a native build of tools/perf. After the final fixups (if any), I do an x86_64 modules_install followed by builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit), ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc and sparc64 defconfig. And finally, a simple boot test of the powerpc pseries_le_defconfig kernel in qemu (with and without kvm enabled). Below is a summary of the state of the merge. I am currently merging 286 trees (counting Linus' and 67 trees of bug fix patches pending for the current merge release). Stats about the size of the tree over time can be seen at http://neuling.org/linux-next-size.html . Status of my local build tests will be at http://kisskb.ellerman.id.au/linux-next . If maintainers want to give advice about cross compilers/configs that work, we are always open to add more builds. 
Thanks to Randy Dunlap for doing many randconfig builds. And to Paul Gortmaker for triage and bug fixes. -- Cheers, Stephen Rothwell $ git checkout master $ git reset --hard stable Merging origin/master (0072a0c14d5b Merge tag 'media/v4.20-4' of git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media) Merging fixes/master (d8c137546ef8 powerpc: tag implicit fall throughs) Merging kbuild-current/fixes (ccda4af0f4b9 Linux 4.20-rc2) Merging arc-current/for-curr (10d443431dc2 ARC: io.h: Implement reads{x}()/writes{x}()) Merging arm-current/fixes (e46daee53bb5 ARM: 8806/1: kprobes: Fix false positive with FORTIFY_SOURCE) Merging arm64-fixes/for-next/fixes (ea2412dc21cc ACPI/IORT: Fix iort_get_platform_device_domain() uninitialized pointer value) Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and m68k_pgtable_cachemode) Merging powerpc-fixes/fixes (bf3d6afbb234 powerpc: Look for "stdout-path" when setting up legacy consoles) Merging sparc/master (f3f950dba37b Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide) Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2) Merging net/master (35b827b6d061 tun: forbid iface creation with rtnl ops) Merging bpf/master (dcb40590e69e bpf: refactor bpf_test_run() to separate own failures and test program result) Merging ipsec/master (4a135e538962 xfrm_user: fix freeing of xfrm states on acquire) Merging netfilter/master (d78a5ebd8b18 Merge branch '1GbE' of git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue) Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates of non-anonymous set) Merging wireless-drivers/master (2e6e902d1850 Linux 4.20-rc4) Merging mac80211/master (113f3aaa81bd cfg80211: Prevent regulatory restore during STA disconnect in concurrent interfaces) Merging rdma-fixes/for-rc (7bca603a69c0 RDMA/mlx5: Initialize return variable in case pagefault was skipped) Merging sound-current/for-linus (5f8cf7125826 ALSA: usb-audio: Fix UAF 
decrement if card has no live interfaces in card.c) Merging sound-asoc-fixes/for-linus (280ea4299e05 Merge branch 'asoc-4.20' into asoc-linus) Merging regmap-fixes/for-linus (9ff01193a20d Linux 4.20-rc3) Merging regulator-fixes/for-linus (fea4962497d8 Merge branch 'regulator-4.20' into regulator-linus) Merging spi-fixes/for-linus (9ea83d4c2b9a Merge branch 'spi-4.20' into spi-linus) Merging pci-current/for-linus (c74eadf881ad Merge remote-tracking branch 'lorenzo/pci/controller-fixes' into for-linus) Merging driver-core.current/driver-core-linus (2595646791c3 Linux 4.20-rc5) Merging tty.current/tty-linus (2a48602615e0 tty: do not set TTY_IO_ERROR flag if console port) Merging usb.current/usb-linus (2595646791c3 Linux 4.20-rc5) Merging usb-gadget-fixes/fixes (069caf5950df USB: omap_udc: fix rejection of out transfers when DMA is used) Merging usb-serial
Re: [PATCH AUTOSEL 4.14 25/35] iomap: sub-block dio needs to zeroout beyond EOF
On Mon, Dec 03, 2018 at 11:22:46PM +0159, Thomas Backlund wrote:
> On 2018-12-03 at 11:22, Sasha Levin wrote:
> >
> > This is a case where theory collides with the real world. Yes, our QA is
> > lacking, but we don't have the option of not doing the current process.
> > If we stop backporting until a future data where our QA problem is
> > solved we'll end up with what we had before: users stuck on ancient
> > kernels without a way to upgrade.
> >
>
> Sorry, but you seem to be living in a different "real world"...
>
> People stay on "ancient kernels" that "just works" instead of updating
> to a newer one that "hopefully/maybe/... works"

That's not good as those "ancient kernels" really just are "kernels with
lots of known security bugs".

It's your systems, I can't tell you what to do, but I will tell you that
running older, unfixed kernels, is a known liability.

Good luck!

greg k-h
Re: [patch 1/2 for-4.20] mm, thp: restore node-local hugepage allocations
On Mon 03-12-18 15:50:24, David Rientjes wrote:
> This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
> MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp:
> consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").
>
> By not setting __GFP_THISNODE, applications can allocate remote hugepages
> when the local node is fragmented or low on memory when either the thp
> defrag setting is "always" or the vma has been madvised with
> MADV_HUGEPAGE.
>
> Remote access to hugepages often has much higher latency than local pages
> of the native page size. On Haswell, ac5b2c18911f was shown to have a
> 13.9% access regression after this commit for binaries that remap their
> text segment to be backed by transparent hugepages.
>
> The intent of ac5b2c18911f is to address an issue where a local node is
> low on memory or fragmented such that a hugepage cannot be allocated. In
> every scenario where this was described as a fix, there is abundant and
> unfragmented remote memory available to allocate from, even with a greater
> access latency.
>
> If remote memory is also low or fragmented, not setting __GFP_THISNODE was
> also measured on Haswell to have a 40% regression in allocation latency.
>
> Restore __GFP_THISNODE for thp allocations.
>
> Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE mappings")
> Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into alloc_hugepage_direct_gfpmask")

At minimum do not remove the cleanup part which consolidates the gfp
handling to a single place. There is no real reason to have the
__GFP_THISNODE ugliness outside of alloc_hugepage_direct_gfpmask.

I still hate the __GFP_THISNODE part as mentioned before. It is an ugly
hack but I can learn to live with it if this is indeed the only option
for the short term workaround until we find a proper solution.

> Signed-off-by: David Rientjes > --- > include/linux/mempolicy.h | 2 -- > mm/huge_memory.c | 42 +++ > mm/mempolicy.c| 7 --- > 3 files changed, 20 insertions(+), 31 deletions(-) > > diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h > --- a/include/linux/mempolicy.h > +++ b/include/linux/mempolicy.h > @@ -139,8 +139,6 @@ struct mempolicy *mpol_shared_policy_lookup(struct > shared_policy *sp, > struct mempolicy *get_task_policy(struct task_struct *p); > struct mempolicy *__get_vma_policy(struct vm_area_struct *vma, > unsigned long addr); > -struct mempolicy *get_vma_policy(struct vm_area_struct *vma, > - unsigned long addr); > bool vma_policy_mof(struct vm_area_struct *vma); > > extern void numa_default_policy(void); > diff --git a/mm/huge_memory.c b/mm/huge_memory.c > --- a/mm/huge_memory.c > +++ b/mm/huge_memory.c > @@ -632,37 +632,27 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct > vm_fault *vmf, > static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct > *vma, unsigned long addr) > { > const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE); > - gfp_t this_node = 0; > - > -#ifdef CONFIG_NUMA > - struct mempolicy *pol; > - /* > - * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not > - * specified, to express a general desire to stay on the current > - * node for optimistic allocation attempts. If the defrag mode > - * and/or madvise hint requires the direct reclaim then we prefer > - * to fallback to other node rather than node reclaim because that > - * can lead to excessive reclaim even though there is free memory > - * on other nodes. We expect that NUMA preferences are specified > - * by memory policies. 
> - */ > - pol = get_vma_policy(vma, addr); > - if (pol->mode != MPOL_BIND) > - this_node = __GFP_THISNODE; > - mpol_cond_put(pol); > -#endif > + const gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT | __GFP_THISNODE; > > + /* Always do synchronous compaction */ > if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, > _hugepage_flags)) > - return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY); > + return GFP_TRANSHUGE | __GFP_THISNODE | > +(vma_madvised ? 0 : __GFP_NORETRY); > + > + /* Kick kcompactd and fail quickly */ > if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, > _hugepage_flags)) > - return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM | this_node; > + return gfp_mask | __GFP_KSWAPD_RECLAIM; > + > + /* Synchronous compaction if madvised, otherwise kick kcompactd */ > if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, > _hugepage_flags)) > - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? > __GFP_DIRECT_RECLAIM : > - > __GFP_KSWAPD_RECLAIM | this_node); > +
Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()
On 12/4/18 5:04 AM, jgli...@redhat.com wrote: From: Jérôme Glisse Heterogeneous memory system are becoming more and more the norm, in those system there is not only the main system memory for each node, but also device memory and|or memory hierarchy to consider. Device memory can comes from a device like GPU, FPGA, ... or from a memory only device (persistent memory, or high density memory device). Memory hierarchy is when you not only have the main memory but also other type of memory like HBM (High Bandwidth Memory often stack up on CPU die or GPU die), peristent memory or high density memory (ie something slower then regular DDR DIMM but much bigger). On top of this diversity of memories you also have to account for the system bus topology ie how all CPUs and devices are connected to each others. Userspace do not care about the exact physical topology but care about topology from behavior point of view ie what are all the paths between an initiator (anything that can initiate memory access like CPU, GPU, FGPA, network controller ...) and a target memory and what are all the properties of each of those path (bandwidth, latency, granularity, ...). This means that it is no longer sufficient to consider a flat view for each node in a system but for maximum performance we need to account for all of this new memory but also for system topology. This is why this proposal is unlike the HMAT proposal [1] which tries to extend the existing NUMA for new type of memory. Here we are tackling a much more profound change that depart from NUMA. One of the reasons for radical change is the advance of accelerator like GPU or FPGA means that CPU is no longer the only piece where computation happens. It is becoming more and more common for an application to use a mix and match of different accelerator to perform its computation. So we can no longer satisfy our self with a CPU centric and flat view of a system like NUMA and NUMA distance. 
This patchset is a proposal to tackle these problems through three
aspects:
    1 - Expose complex system topology and the various kinds of memory
        to user space so that applications have a standard way and a
        single place to get all the information they care about.
    2 - A new API for user space to bind/provide hints to the kernel on
        which memory to use for a range of virtual addresses (a new
        mbind() syscall).
    3 - Kernel side changes for vm policy to handle these changes.

This patchset is not an end-to-end solution but it provides enough
pieces to be useful against nouveau (upstream open source driver for
NVidia GPU). It is intended as a starting point for discussion so that
we can figure out what to do. To avoid having too many topics to discuss
I am not considering memory cgroup for now but it is definitely
something we will want to integrate with.

The rest of this email is split into 3 sections. The first section talks
about complex system topology: what it is, how it is used today and how
to describe it tomorrow. The second section talks about the new API to
bind/provide hints to the kernel for a range of virtual addresses. The
third section talks about the new mechanism to track binds/hints
provided by user space or device drivers inside the kernel.


1) Complex system topology and representing them

Inside a node you can have a complex topology of memory, for instance
you can have multiple HBM memories in a node, each HBM memory tied to a
set of CPUs (all of which are in the same node). This means that you
have a hierarchy of memory for CPUs: the local fast HBM, which is
expected to be relatively small compared to main memory, and then the
main memory. New memory technology might also deepen this hierarchy with
another level of yet slower memory but gigantic in size (some persistent
memory technology might fall into that category). Another example is
device memory, and devices themselves can have a hierarchy like HBM on
top of the device core and main device memory.

On top of that you can have multiple paths to access each memory and each path can have different properties (latency, bandwidth, ...). Also there is not always symmetry, i.e. some memory might only be accessible by some devices or CPUs and not by everyone. So a flat hierarchy for each node is not capable of representing this kind of complexity.

To simplify discussion and because we do not want to single out CPUs from devices, from here on out we will use initiator to refer to either a CPU or a device. An initiator is any kind of CPU or device that can access memory (i.e. initiate memory access).

At this point an example of such a system might help:
  - 2 nodes and for each node:
      - 1 CPU per node with 2 complexes of CPU cores per CPU
      - one HBM memory for each complex of CPU cores (200GB/s)
      - CPU core complexes are linked to each other (100GB/s)
      - main memory is (90GB/s)
      - 4 GPUs each with:
          - HBM memory for
Re: [RFC/RFT][PATCH v6] cpuidle: New timer events oriented governor for tickless systems
On Thursday, November 29, 2018 12:20:07 AM CET Doug Smythies wrote:
> On 2018.11.23 02:36 Rafael J. Wysocki wrote:
>
> v5 -> v6:
> * Avoid applying poll_time_limit to non-polling idle states by mistake.
> * Use idle duration measured by the governor for everything (as it likely is
>   more accurate than the one measured by the core).
>
> -- above missing-- (see follow up e-mail from Rafael)
>
> * Rename SPIKE to PULSE.
> * Do not run pattern detection upfront. Instead, use recent idle duration
>   values to refine the state selection after finding a candidate idle state.
> * Do not use the expected idle duration as an extra latency constraint
>   (exit latency is less than the target residency for all of the idle states
>   known to me anyway, so this doesn't change anything in practice).
>
> Hi Rafael,
>
> I did some minimal testing on teov6, using kernel 4.20-rc3 as my baseline
> reference kernel.
>
> Test 1: Phoronix dbench test, all options: 1, 6, 12, 48, 128, 256 clients.
>
> Note: because it uses the disk, the dbench test is somewhat non-repeatable.
> However, if particular attention is paid to not doing anything else with
> the disk between tests, then it seems to be repeatable to within about 6%.
>
> Anyway no significant difference observed between kernel 4.20-rc3 and the
> same with the teov6 patch.
>
> Test 2: Pipe test, non cross core. (And idle state 0 test, really)
> I ran 4 pipe tests, 1 for each of my 4 cores, @2 CPUs per core.
> Thus, pretty much only idle state 0 was ever used.
> Processor package power was similar for both kernels.
> teov6 entered/exited idle state 0 about 60,984 times/second/cpu.
> -rc3 entered/exited idle state 0 about 62,806 times/second/cpu.
> There was a difference in percentage time spent in idle state 0,
> with kernel 4.20-rc3 spending 0.2441% in idle state 0 versus
> teov6 at 0.0641%.
>
> For throughput, teov6 was 1.4% faster.

This may indicate that teov6 is somewhat too aggressive.

> Test 3: was an attempt to sweep through a preference for
> all idle states.
>
> 40 threads were launched with nothing to do except sleep
> for a variable duration of 1 to 500 uSec, each step was
> run for 1 minute. With 1 minute idle before the test and a few
> minutes idle after, the total test duration was about 505 minutes.
> Recall that when one asks for a short sleep of 1 uSec, they actually
> get about 50 uSec, due to overheads. So I use 40 threads in an attempt
> to get the average time between wakeup events per CPU down somewhat.
>
> The results are here:
> http://fast.smythies.com/linux-pm/k420/k420-pn-sweep-teo6-2.htm

And, so long as my understanding of the graphs is correct, the results here indicate that teov6 tends to prefer relatively shallow idle states, which is good for performance (at least with some workloads), but not necessarily for energy-efficiency.

I will send a v7 of TEO with some changes to make it a bit more energy-efficient than the v6.

Thanks,
Rafael
Re: [PATCH memory-model 0/3] Updates to the formal memory model
On Tue, Dec 04, 2018 at 08:28:03AM +0900, Akira Yokosawa wrote: > On 2018/12/03 15:04:11 -0800, Paul E. McKenney wrote: > > Hello, Ingo! > > > > This series contains updates to the Linux kernel's formal memory model > > in tools/memory-model. These patches are ready for inclusion into -tip. > > > > 1. Model smp_mb__after_unlock_lock(), courtesy of Andrea Parri. > > > > 2. Add scripts to check github litmus tests. > > > > 3. Make scripts take "-j" abbreviation for "--jobs". > > > > There is another series in preparation to model SRCU, but this series > > requires hot-off-the presses changes to the herd tool that have not yet > > been released. This SRCU series is therefore targeting the merge window > > after the upcoming one. People wishing to experiment with the prototype > > SRCU model may obtain it from my -rcu tree at branch "dev", and use > > a bleeding-edge herd7 built from https://github.com/herd/herdtools7/, > > version 7.51+2(dev), which is (commit 10403b24070c) or later. > > On the master branch of herdtools7, SRCU support was added in version > 7.51+4(dev), which is commit 6ec9da1f4d58, or later. It has been working for me with version 7.51+2(dev), but perhaps I have just been getting lucky. It wouldn't be the first time! ;-) Thanx, Paul > Thanks, Akira > > > > > Thanx, Paul > > > > > > > > .gitignore |1 > > README |2 > > linux-kernel.bell |3 > > linux-kernel.cat |4 - > > linux-kernel.def |1 > > scripts/README | 70 ++ > > scripts/checkalllitmus.sh | 53 +++-- > > scripts/checkghlitmus.sh | 65 > > scripts/checklitmus.sh | 74 +++ > > scripts/checklitmushist.sh | 60 +++ > > scripts/cmplitmushist.sh | 87 +++ > > scripts/initlitmushist.sh | 68 + > > scripts/judgelitmus.sh | 78 + > > scripts/newlitmushist.sh | 61 +++ > > scripts/parseargs.sh | 140 > > - > > scripts/runlitmushist.sh | 87 +++ > > 16 files changed, 757 insertions(+), 97 deletions(-) > > >
Re: [PATCH] arm64: dts: qcom: msm8998: Fix compatible of scm node
Quoting Bjorn Andersson (2018-11-29 22:56:55)
> The scm binding and driver was updated to rely on the fallback to the
> default qcom,scm for any modern SoC and as such both are required. Add
> the default compatible to make the scm instance probe.
>
> Fixes: d850156a226a ("arm64: dts: qcom: msm8998: Add firmware node")
> Signed-off-by: Bjorn Andersson
> ---

Reviewed-by: Stephen Boyd
Re: [PATCH v1 2/2] mtd: spi-nor: add NPCM FIU controller driver
Hi Tomer,

I love your patch! Yet something to improve:

[auto build test ERROR on mtd/spi-nor/next]
[also build test ERROR on v4.20-rc5 next-20181203]
[if your patch is applied to the wrong git tree, please drop us a note to help improve the system]

url:      https://github.com/0day-ci/linux/commits/Tomer-Maimon/dt-binding-mtd-add-NPCM-FIU-controller/20181203-201804
base:     git://git.infradead.org/linux-mtd.git spi-nor/next
config:   i386-allmodconfig (attached as .config)
compiler: gcc-7 (Debian 7.3.0-1) 7.3.0
reproduce:
        # save the attached .config to linux build tree
        make ARCH=i386

All errors (new ones prefixed by >>):

>> drivers/mtd/spi-nor/npcm-fiu.c:25:10: fatal error: asm/sizes.h: No such file or directory
    #include <asm/sizes.h>
             ^
   compilation terminated.

vim +25 drivers/mtd/spi-nor/npcm-fiu.c

    24
  > 25	#include <asm/sizes.h>
    26	#include
    27

---
0-DAY kernel test infrastructure                Open Source Technology Center
https://lists.01.org/pipermail/kbuild-all                   Intel Corporation

.config.gz
Description: application/gzip
Re: [PATCH] x86/mpx: pass 'mm' to kernel_managing_mpx_tables() in mpx_notify_unmap()
On Mon, Dec 03, 2018 at 12:49:44PM -0800, Dave Hansen wrote:
> On 12/3/18 12:43 PM, Jarkko Sakkinen wrote:
> > If mm is not the same as current->mm, mpx_notify_unmap() will yield
> > invalid results and at worst will lead to a crash if it gets called by
> > a kthread.
>
> It's also worth noting that this does not fix any actual,
> end-user-visible bug today. It really only prepares the code for the
> case where it is called for a different mm than current->mm.
>
> > --- a/arch/x86/mm/mpx.c
> > +++ b/arch/x86/mm/mpx.c
> > @@ -882,7 +882,7 @@ static int mpx_unmap_tables(struct mm_struct *mm,
> >   * necessary, and the 'vma' is the first vma in this range (start -> end).
> >   */
> >  void mpx_notify_unmap(struct mm_struct *mm, struct vm_area_struct *vma,
> > -		unsigned long start, unsigned long end)
> > +		unsigned long start, unsigned long end)
> >  {
> >  	int ret;
>
> Please leave superfluous whitespace changes out of these things.
>
> But, otherwise, this looks fine.
>
> > Fixes: 1de4fa14ee25 ("x86, mpx: Cleanup unused bound tables")
>
> FWIW, I'm not sure you should be submitting this separately from your
> SGX series. The deferred unmapping is really the thing that requires
> the code to be changed.

Thank you for the feedback. I'll include this in the next revision of the SGX patch set and explain why the change is needed.

/Jarkko
Re: [PATCH 2/2] irqchip/meson-gpio: Add support for Meson-G12A SoC
On 2018/12/3 18:06, Neil Armstrong wrote: On 03/12/2018 10:28, Xingyu Chen wrote: On 2018/12/3 17:19, Jerome Brunet wrote: On Mon, 2018-12-03 at 14:13 +0800, Xingyu Chen wrote: The Meson-G12A SoC uses the same GPIO interrupt controller IP block as the other Meson SoCs, A totle of 100 pins can be spied on, which is the sum of: - 223:100 undefined (no interrupt) - 99:97 3 pins on bank GPIOE - 96:77 20 pins on bank GPIOX - 76:61 16 pins on bank GPIOA - 60:53 8 pins on bank GPIOC - 52:37 16 pins on bank BOOT - 36:28 9 pins on bank GPIOH - 27:12 16 pins on bank GPIOZ - 11:0 12 pins in the AO domain Signed-off-by: Xingyu Chen Signed-off-by: Jianxin Pan --- drivers/irqchip/irq-meson-gpio.c | 5 + 1 file changed, 5 insertions(+) diff --git a/drivers/irqchip/irq-meson-gpio.c b/drivers/irqchip/irq-meson- gpio.c index 7b531fd075b8..971e8dea069a 100644 --- a/drivers/irqchip/irq-meson-gpio.c +++ b/drivers/irqchip/irq-meson-gpio.c @@ -67,12 +67,17 @@ static const struct meson_gpio_irq_params axg_params = { .nr_hwirq = 100, }; +static const struct meson_gpio_irq_params g12a_params = { + .nr_hwirq = 100, +}; + Same comment as on i2c, the g12 seems compatible with the axg. Is this patchset patchset really necessary ? Although the total number of pins is the same as the Meson-AXG SoC, the gpio banks and irq numbers are different. To avoid confusion on use, i think the new compatible string is needed. OK for the new compatible, but you can re-use the same struct like for i2c. Neil Thanks for your comment, I will fix it in the next version. 
 static const struct of_device_id meson_irq_gpio_matches[] = {
 	{ .compatible = "amlogic,meson8-gpio-intc", .data = &meson8_params },
 	{ .compatible = "amlogic,meson8b-gpio-intc", .data = &meson8b_params },
 	{ .compatible = "amlogic,meson-gxbb-gpio-intc", .data = &gxbb_params },
 	{ .compatible = "amlogic,meson-gxl-gpio-intc", .data = &gxl_params },
 	{ .compatible = "amlogic,meson-axg-gpio-intc", .data = &axg_params },
+	{ .compatible = "amlogic,meson-g12a-gpio-intc", .data = &g12a_params },
 	{ }
 };

___
linux-amlogic mailing list
linux-amlo...@lists.infradead.org
http://lists.infradead.org/mailman/listinfo/linux-amlogic
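Neil's suggestion (keep the new compatible string but point it at the existing parameters) can be sketched in plain user-space C. This is only an illustration of the idea, not the driver code: the struct names, the `lookup_params()` helper and the table below are hypothetical stand-ins for `struct meson_gpio_irq_params` and the `of_device_id` table; only the two compatible strings and the value 100 come from the thread.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for struct meson_gpio_irq_params. */
struct irq_params_sketch {
	unsigned int nr_hwirq;
};

/* Hypothetical stand-in for one of_device_id entry. */
struct match_sketch {
	const char *compatible;
	const struct irq_params_sketch *data;
};

/* One parameter block, shared by both SoCs since both expose 100 lines. */
static const struct irq_params_sketch axg_params_sketch = { .nr_hwirq = 100 };

static const struct match_sketch matches_sketch[] = {
	{ "amlogic,meson-axg-gpio-intc",  &axg_params_sketch },
	/* New compatible string, same data: no second struct is needed. */
	{ "amlogic,meson-g12a-gpio-intc", &axg_params_sketch },
	{ NULL, NULL }	/* sentinel */
};

/* Return the parameters for a compatible string, or NULL if it is unknown. */
static const struct irq_params_sketch *lookup_params(const char *compatible)
{
	const struct match_sketch *m;

	for (m = matches_sketch; m->compatible; m++)
		if (strcmp(m->compatible, compatible) == 0)
			return m->data;
	return NULL;
}
```

With this layout a G12A lookup and an AXG lookup return the same pointer, while device trees can still tell the two SoCs apart by their compatible strings.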
Re: [PATCH 2/3] mm/vmscan: Enable kswapd to reclaim low-protected memory
On 2018/12/4 AM 1:22, Michal Hocko wrote:
> On Mon 03-12-18 23:20:31, Xunlei Pang wrote:
>> On 2018/12/3 PM 7:56, Michal Hocko wrote:
>>> On Mon 03-12-18 16:01:18, Xunlei Pang wrote:
>>>> There may be cgroup memory overcommitment, it will become even more
>>>> common in the future.
>>>>
>>>> Let's enable kswapd to reclaim low-protected memory in case
>>>> of memory pressure, to mitigate the global direct reclaim
>>>> pressures which could cause jitters to the response time of
>>>> latency-sensitive groups.
>>>
>>> Please be more descriptive about the problem you are trying to handle
>>> here. I haven't actually read the patch but let me emphasise that the
>>> low limit protection is important isolation tool. And allowing kswapd to
>>> reclaim protected memcgs is going to break the semantic as it has been
>>> introduced and designed.
>>
>> We have two types of memcgs: online groups(important business)
>> and offline groups(unimportant business). Online groups are
>> all configured with MAX low protection, while offline groups
>> are not at all protected(with default 0 low).
>>
>> When offline groups are overcommitted, the global memory pressure
>> suffers. This will cause the memory allocations from online groups
>> constantly go to the slow global direct reclaim in order to reclaim
>> online's page caches, as kswapd is not able to reclaim low-protection
>> memory. low is not hard limit, it's reasonable to be reclaimed by
>> kswapd if there's no other reclaimable memory.
>
> I am sorry I still do not follow. What role do offline cgroups play?
> Those are certainly not low mem protected because mem_cgroup_css_offline
> will reset them to 0.
>

Oh, I meant "offline groups" to be "offline-business groups"; the memcgs referred to here are all in the "online state" from the kernel's perspective.
Re: [PATCH v2 2/5] devfreq: add support for suspend/resume of a devfreq device
Hi Lukasz, Looks good to me. But, I add the some comments. If you will fix it, feel free to add my tag: Reviewed-by: Chanwoo choi On 2018년 12월 03일 23:31, Lukasz Luba wrote: > The patch prepares devfreq device for handling suspend/resume > functionality. The new fields will store needed information during this nitpick. Remove unneeded space. There are two spaces between '.' and 'The new'. > process. Devfreq framework handles opp-suspend DT entry and there is no ditto. > need of modyfications in the drivers code. It uses atomic variables to ditto. > make sure no race condition affects the process. > > The patch is based on earlier work by Tobias Jakobi. Please remove it from each patch description. > > Suggested-by: Tobias Jakobi > Suggested-by: Chanwoo Choi > Signed-off-by: Lukasz Luba > --- > drivers/devfreq/devfreq.c | 51 > +++ > include/linux/devfreq.h | 7 +++ > 2 files changed, 50 insertions(+), 8 deletions(-) > > diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c > index a9fd61b..36bed24 100644 > --- a/drivers/devfreq/devfreq.c > +++ b/drivers/devfreq/devfreq.c > @@ -316,6 +316,10 @@ static int devfreq_set_target(struct devfreq *devfreq, > unsigned long new_freq, > "Couldn't update frequency transition information.\n"); > > devfreq->previous_freq = new_freq; > + > + if (devfreq->suspend_freq) > + devfreq->resume_freq = cur_freq; > + > return err; > } > > @@ -667,6 +671,9 @@ struct devfreq *devfreq_add_device(struct device *dev, > } > devfreq->max_freq = devfreq->scaling_max_freq; > > + devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev); > + atomic_set(>suspend_count, 0); > + > dev_set_name(>dev, "devfreq%d", > atomic_inc_return(_no)); > err = device_register(>dev); > @@ -867,14 +874,28 @@ EXPORT_SYMBOL(devm_devfreq_remove_device); > */ > int devfreq_suspend_device(struct devfreq *devfreq) > { > + int ret; > + > if (!devfreq) > return -EINVAL; > > - if (!devfreq->governor) > - return 0; > + if (devfreq->governor) { > + ret = 
devfreq->governor->event_handler(devfreq, > + DEVFREQ_GOV_SUSPEND, NULL); > + if (ret) > + return ret; > + } > + > + if (devfreq->suspend_freq) { > + if (atomic_inc_return(>suspend_count) > 1) > + return 0; > + > + ret = devfreq_set_target(devfreq, devfreq->suspend_freq, 0); > + if (ret) > + return ret; > + } > > - return devfreq->governor->event_handler(devfreq, > - DEVFREQ_GOV_SUSPEND, NULL); > + return 0; > } > EXPORT_SYMBOL(devfreq_suspend_device); > > @@ -888,14 +909,28 @@ EXPORT_SYMBOL(devfreq_suspend_device); > */ > int devfreq_resume_device(struct devfreq *devfreq) > { > + int ret; > + > if (!devfreq) > return -EINVAL; > > - if (!devfreq->governor) > - return 0; > + if (devfreq->resume_freq) { > + if (atomic_dec_return(>suspend_count) >= 1) > + return 0; > > - return devfreq->governor->event_handler(devfreq, > - DEVFREQ_GOV_RESUME, NULL); > + ret = devfreq_set_target(devfreq, devfreq->resume_freq, 0); > + if (ret) > + return ret; > + } > + > + if (devfreq->governor) { > + ret = devfreq->governor->event_handler(devfreq, > + DEVFREQ_GOV_RESUME, NULL); > + if (ret) > + return ret; > + } > + > + return 0; > } > EXPORT_SYMBOL(devfreq_resume_device); > > diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h > index e4963b0..d985199 100644 > --- a/include/linux/devfreq.h > +++ b/include/linux/devfreq.h > @@ -131,6 +131,9 @@ struct devfreq_dev_profile { > * @scaling_min_freq:Limit minimum frequency requested by OPP > interface > * @scaling_max_freq:Limit maximum frequency requested by OPP > interface > * @stop_polling: devfreq polling status of a device. > + * @suspend_freq: frequency of a device set during suspend phase. > + * @resume_freq: frequency of a device set in resume phase. > + * @suspend_count:suspend requests counter for a device. 
> * @total_trans: Number of devfreq transitions > * @trans_table: Statistics of devfreq transitions > * @time_in_state: Statistics of devfreq states > @@ -167,6 +170,10 @@ struct devfreq { > unsigned long scaling_max_freq; > bool stop_polling; > > + unsigned long suspend_freq; > + unsigned long resume_freq; > + atomic_t suspend_count; > + > /* information for device frequency transition */ > unsigned int
Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline
On Tue 04-12-18 11:05:57, Pingfan Liu wrote:
> During my test on some AMD machine, with kexec -l nr_cpus=x option, the
> kernel failed to bootup, because some node's data struct can not be allocated,
> e.g, on x86, initialized by init_cpu_to_node()->init_memory_less_node(). But
> device->numa_node info is used as preferred_nid param for
> __alloc_pages_nodemask(), which causes NULL reference
>   ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
> This patch tries to fix the issue by falling back to the first online node,
> when encountering such corner case.

We have seen similar issues already and the bug was usually that the zonelists were not initialized yet or the node is completely bogus. Zonelists should be initialized by build_all_zonelists quite early so I am wondering whether the latter is the case. What is the actual node number the device is associated with?

Your patch is not correct btw, because we want to fall back to nodes in distance order rather than to the first online node.
-- 
Michal Hocko
SUSE Labs
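To illustrate the difference between "first online node" and "fall back in distance order", here is a minimal user-space sketch. It is not kernel code and not the fix that was eventually merged: the distance table, `MAX_NODES` and `nearest_online_node()` are made-up names and values chosen for the example.

```c
#include <limits.h>
#include <stdbool.h>

#define MAX_NODES 4

/* Hypothetical NUMA distance table (smaller means closer). */
static const int node_distance_sketch[MAX_NODES][MAX_NODES] = {
	{ 10, 20, 30, 40 },
	{ 20, 10, 40, 30 },
	{ 30, 40, 10, 20 },
	{ 40, 30, 20, 10 },
};

/*
 * Return the online node closest to 'preferred' according to the table,
 * or -1 if 'preferred' is out of range or no node is online.
 */
static int nearest_online_node(int preferred, const bool online[MAX_NODES])
{
	int best = -1;
	int best_dist = INT_MAX;
	int n;

	if (preferred < 0 || preferred >= MAX_NODES)
		return -1;

	for (n = 0; n < MAX_NODES; n++) {
		if (!online[n])
			continue;
		if (node_distance_sketch[preferred][n] < best_dist) {
			best_dist = node_distance_sketch[preferred][n];
			best = n;
		}
	}
	return best;
}
```

With nodes 0 and 3 online and node 2 preferred, a "first online node" policy would pick node 0 (distance 30 in this table), whereas the distance-ordered choice is node 3 (distance 20).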
Re: [PATCH 2/9] tools/lib/traceevent: Added support for pkg-config
Hi Steve,

On Fri, Nov 30, 2018 at 10:44:05AM -0500, Steven Rostedt wrote:
> From: Tzvetomir Stoyanov
>
> This patch implements integration with pkg-config framework.
> pkg-config can be used by the library users to determine
> required CFLAGS and LDFLAGS in order to use the library
>
> Signed-off-by: Tzvetomir Stoyanov
> Signed-off-by: Steven Rostedt (VMware)
> ---

[SNIP]
> diff --git a/tools/lib/traceevent/libtraceevent.pc.template b/tools/lib/traceevent/libtraceevent.pc.template
> new file mode 100644
> index ..42e4d6cb6b9e
> --- /dev/null
> +++ b/tools/lib/traceevent/libtraceevent.pc.template
> @@ -0,0 +1,10 @@
> +prefix=INSTALL_PREFIX
> +libdir=${prefix}/lib64

Don't we care about 32-bit systems anymore? :)

Thanks,
Namhyung

> +includedir=${prefix}/include/traceevent
> +
> +Name: libtraceevent
> +URL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> +Description: Linux kernel trace event library
> +Version: LIB_VERSION
> +Cflags: -I${includedir}
> +Libs: -L${libdir} -ltraceevent
> --
> 2.19.1
>
Re: ext4 file system corruption with v4.19.3 / v4.19.4
After upgrading my kernel to 4.19 I got a corruption on nearly every reboot or resume from suspend on my Acer s7-391 [UEFI boot]. Going to my UEFI setup and changing IDE mode from IDE to ATA seems to have resolved the issue for me. Don't know, though, if that is a valid data point or if it was a mere accident (tested only on one computer) or just avoids the Bad Timing by a few nanoseconds
Re: [PATCH 1/3] dt-bindings: soc: qcom: Add AOSS QMP binding
On Mon, Nov 12, 2018 at 12:05:55AM -0800, Bjorn Andersson wrote: > Add binding for the QMP based side-channel communication mechanism to > the AOSS, which is used to control resources not exposed through the > RPMh interface. > > Signed-off-by: Bjorn Andersson > --- > .../bindings/soc/qcom/qcom,aoss-qmp.txt | 63 +++ > include/dt-bindings/power/qcom-aoss-qmp.h | 15 + > 2 files changed, 78 insertions(+) > create mode 100644 > Documentation/devicetree/bindings/soc/qcom/qcom,aoss-qmp.txt > create mode 100644 include/dt-bindings/power/qcom-aoss-qmp.h > > diff --git a/Documentation/devicetree/bindings/soc/qcom/qcom,aoss-qmp.txt > b/Documentation/devicetree/bindings/soc/qcom/qcom,aoss-qmp.txt > new file mode 100644 > index ..e3c8cb4372f2 > --- /dev/null > +++ b/Documentation/devicetree/bindings/soc/qcom/qcom,aoss-qmp.txt > @@ -0,0 +1,63 @@ > +Qualcomm Always-On Subsystem side channel binding > + > +This binding describes the hardware component responsible for side channel > +requests to the always-on subsystem (AOSS), used for certain power management > +requests that is not handled by the standard RPMh interface. Each client in > the > +SoC has it's own block of message RAM and IRQ for communication with the > AOSS. > +The protocol used to communicate in the message RAM is known as QMP. 
> + > +- compatible: > + Usage: required > + Value type: > + Definition: must be "qcom,sdm845-aoss-qmp" > + > +- reg: > + Usage: required > + Value type: > + Definition: the base address and size of the message RAM for this > + client's communication with the AOSS > + > +- interrupts: > + Usage: required > + Value type: > + Definition: should specify the AOSS message IRQ for this client > + > +- mboxes: > + Usage: required > + Value type: > + Definition: reference to the mailbox representing the outgoing doorbell > + in APCS for this client, as described in mailbox/mailbox.txt > + > += AOSS Power-domains > +The AOSS side channel exposes control over a set of resources, used to > control > +a set of debug related clocks and to affect the low power state of resources > +related to the secondary subsystems. These resources are described as a set > of > +power-domains in a subnode of hte AOSS side channel node. Why does this need to be a sub-node? Are there other sub-nodes needed? > + > +- compatible: > + Usage: required > + Value type: > + Definition: must be "qcom,sdm845-aoss-qmp-pd" > + > +- #power-domain-cells: > + Usage: required > + Value type: > + Definition: must be 1
[PATCH] perf, tools: Support srccode output
From: Andi Kleen When looking at PT or brstackinsn traces with perf script it can be very useful to see the source code. This adds a simple facility to print them with perf script, if the information is available through dwarf % perf record ... % perf script -F insn,ip,sym,srccode ... 4004c6 main 5 for (i = 0; i < 1000; i++) 4004cd main 5 for (i = 0; i < 1000; i++) 4004c6 main 5 for (i = 0; i < 1000; i++) 4004cd main 5 for (i = 0; i < 1000; i++) 4004cd main 5 for (i = 0; i < 1000; i++) 4004cd main 5 for (i = 0; i < 1000; i++) 4004cd main 5 for (i = 0; i < 1000; i++) 4004cd main 5 for (i = 0; i < 1000; i++) 4004b3 main 6 v++; % perf record -b ... % perf script -F insn,ip,sym,srccode,brstackinsn ... main+22: 00400543insn: e8 ca ff ff ff# PRED |18 f1(); f1: 00400512insn: 55 |10 { 00400513insn: 48 89 e5 00400516insn: b8 00 00 00 00 |11 f2(); 0040051binsn: e8 d6 ff ff ff# PRED f2: 004004f6insn: 55 |5{ 004004f7insn: 48 89 e5 004004fainsn: 8b 05 2c 0b 20 00 |6 c = a / b; 00400500insn: 8b 0d 2a 0b 20 00 00400506insn: 99 00400507insn: f7 f9 00400509insn: 89 05 29 0b 20 00 0040050finsn: 90 |7} 00400510insn: 5d 00400511insn: c3# PRED f1+14: 00400520insn: b8 00 00 00 00 |12 f2(); 00400525insn: e8 cc ff ff ff# PRED f2: 004004f6insn: 55 |5{ 004004f7insn: 48 89 e5 004004fainsn: 8b 05 2c 0b 20 00 |6 c = a / b; Not supported for callchains currently, would need some layout changes there. 
Signed-off-by: Andi Kleen --- tools/perf/Documentation/perf-script.txt | 2 +- tools/perf/builtin-script.c | 47 +- tools/perf/util/Build| 1 + tools/perf/util/evsel_fprintf.c | 1 + tools/perf/util/map.c| 49 ++ tools/perf/util/map.h| 16 ++ tools/perf/util/srccode.c| 186 +++ tools/perf/util/srccode.h| 7 + tools/perf/util/srcline.c| 28 tools/perf/util/srcline.h| 1 + tools/perf/util/thread.c | 2 + tools/perf/util/thread.h | 2 + 12 files changed, 339 insertions(+), 3 deletions(-) create mode 100644 tools/perf/util/srccode.c create mode 100644 tools/perf/util/srccode.h diff --git a/tools/perf/Documentation/perf-script.txt b/tools/perf/Documentation/perf-script.txt index a2b37ce48094..9e4def08d569 100644 --- a/tools/perf/Documentation/perf-script.txt +++ b/tools/perf/Documentation/perf-script.txt @@ -117,7 +117,7 @@ OPTIONS Comma separated list of fields to print. Options are: comm, tid, pid, time, cpu, event, trace, ip, sym, dso, addr, symoff, srcline, period, iregs, uregs, brstack, brstacksym, flags, bpf-output, brstackinsn, -brstackoff, callindent, insn, insnlen, synth, phys_addr, metric, misc. +brstackoff, callindent, insn, insnlen, synth, phys_addr, metric, misc, srccode. Field list can be prepended with the type, trace, sw or hw, to indicate to which event type the field list applies. 
e.g., -F sw:comm,tid,time,ip,sym and -F trace:time,cpu,trace diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c index 04913136bac9..fa6f86b98c76 100644 --- a/tools/perf/builtin-script.c +++ b/tools/perf/builtin-script.c @@ -96,6 +96,7 @@ enum perf_output_field { PERF_OUTPUT_UREGS = 1U << 27, PERF_OUTPUT_METRIC = 1U << 28, PERF_OUTPUT_MISC= 1U << 29, + PERF_OUTPUT_SRCCODE = 1U << 30, }; struct output_option { @@ -132,6 +133,7 @@ struct output_option { {.str = "phys_addr", .field = PERF_OUTPUT_PHYS_ADDR}, {.str = "metric", .field = PERF_OUTPUT_METRIC}, {.str = "misc", .field = PERF_OUTPUT_MISC}, + {.str = "srccode", .field = PERF_OUTPUT_SRCCODE}, }; enum { @@ -424,7 +426,7 @@ static int perf_evsel__check_attr(struct perf_evsel *evsel, pr_err("Display of DSO requested but no address to convert.\n"); return -EINVAL; } - if (PRINT_FIELD(SRCLINE) &&
[PATCH v2 04/24] lockdep tests: Run lockdep tests a second time under Valgrind
This improves test coverage.

Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 tools/lib/lockdep/run_tests.sh | 14 ++
 1 file changed, 14 insertions(+)

diff --git a/tools/lib/lockdep/run_tests.sh b/tools/lib/lockdep/run_tests.sh
index bc36178329a8..c8fbd0306960 100755
--- a/tools/lib/lockdep/run_tests.sh
+++ b/tools/lib/lockdep/run_tests.sh
@@ -31,3 +31,17 @@ find tests -name '*.c' | sort | while read -r i; do
 	fi
 	rm -f "tests/$testname"
 done
+
+find tests -name '*.c' | sort | while read -r i; do
+	testname=$(basename "$i" .c)
+	echo -ne "(PRELOAD + Valgrind) $testname... "
+	if gcc -o "tests/$testname" -pthread -Iinclude "$i" &&
+		{ timeout 10 valgrind --read-var-info=yes ./lockdep "./tests/$testname" >& "tests/${testname}.vg.out"; true; } &&
+		"tests/${testname}.sh" < "tests/${testname}.vg.out" &&
+		! grep -Eq '(^==[0-9]*== (Invalid |Uninitialised ))|Mismatched free|Source and destination overlap| UME ' "tests/${testname}.vg.out"; then
+		echo "PASSED!"
+	else
+		echo "FAILED!"
+	fi
+	rm -f "tests/$testname"
+done
-- 
2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 00/24] locking/lockdep: Add support for dynamic keys
Hi Ingo and Peter,

A known shortcoming of the current lockdep implementation is that it requires lock keys to be allocated statically. This forces certain unrelated synchronization objects to share keys and this key sharing can cause false positive deadlock reports. This patch series adds support for dynamic keys in the lockdep code and eliminates a class of false positive reports from the workqueue implementation.

The changes compared to v1 are:
- Addressed Peter's review comments: remove the list_head that I had added to
  struct lock_list again, replaced all_list_entries and free_list_entries by
  two bitmaps, use call_rcu() to free lockdep objects, add a BUILD_BUG_ON()
  that compares the size of struct lock_class_key and raw_spin_lock_t.
- Addressed the "unknown symbol" errors reported by the build bot by adding a
  few #ifdef / #endif directives. Addressed the 32-bit warnings by using %d
  instead of %ld for array indices and by casting the array indices to
  unsigned int.
- Removed several WARN_ON_ONCE(!class->hash_entry.pprev) statements since
  these duplicate the code in check_data_structures().
- Left out the patch that causes lockdep to complain if no name has been
  assigned to a lock object. That patch causes the build bot to complain
  about certain lock objects but I have not yet had the time to figure out
  the identity of these lock objects.

Bart.
Bart Van Assche (24): lockdep tests: Display compiler warning and error messages lockdep tests: Fix shellcheck warnings lockdep tests: Improve testing accuracy lockdep tests: Run lockdep tests a second time under Valgrind liblockdep: Rename "trywlock" into "trywrlock" liblockdep: Add dummy print_irqtrace_events() implementation lockdep tests: Test the lockdep_reset_lock() implementation locking/lockdep: Declare local symbols static locking/lockdep: Inline __lockdep_init_map() locking/lockdep: Introduce lock_class_cache_is_registered() locking/lockdep: Remove a superfluous INIT_LIST_HEAD() statement locking/lockdep: Make concurrent lockdep_reset_lock() calls safe locking/lockdep: Stop using RCU primitives to access all_lock_classes locking/lockdep: Make zap_class() remove all matching lock order entries locking/lockdep: Reorder struct lock_class members locking/lockdep: Retain the class key and name while freeing a lock class locking/lockdep: Free lock classes that are no longer in use locking/lockdep: Reuse list entries that are no longer in use locking/lockdep: Check data structure consistency locking/lockdep: Introduce __lockdep_free_key_range() locking/lockdep: Verify whether lock objects are small enough to be used as class keys locking/lockdep: Add support for dynamic keys kernel/workqueue: Use dynamic lockdep keys for workqueues lockdep tests: Test dynamic key registration include/linux/lockdep.h | 37 +- include/linux/workqueue.h | 28 +- kernel/locking/lockdep.c | 660 +++--- kernel/workqueue.c| 60 +- tools/lib/lockdep/include/liblockdep/common.h | 3 + tools/lib/lockdep/include/liblockdep/mutex.h | 12 +- tools/lib/lockdep/include/liblockdep/rwlock.h | 6 +- tools/lib/lockdep/lockdep.c | 5 + tools/lib/lockdep/run_tests.sh| 39 +- tools/lib/lockdep/tests/AA.sh | 2 + tools/lib/lockdep/tests/ABA.sh| 2 + tools/lib/lockdep/tests/ABBA.c| 12 + tools/lib/lockdep/tests/ABBA.sh | 2 + tools/lib/lockdep/tests/ABBA_2threads.sh | 2 + tools/lib/lockdep/tests/ABBCCA.c | 4 + 
tools/lib/lockdep/tests/ABBCCA.sh | 2 + tools/lib/lockdep/tests/ABBCCDDA.c| 5 + tools/lib/lockdep/tests/ABBCCDDA.sh | 2 + tools/lib/lockdep/tests/ABCABC.c | 4 + tools/lib/lockdep/tests/ABCABC.sh | 2 + tools/lib/lockdep/tests/ABCDBCDA.c| 5 + tools/lib/lockdep/tests/ABCDBCDA.sh | 2 + tools/lib/lockdep/tests/ABCDBDDA.c| 5 + tools/lib/lockdep/tests/ABCDBDDA.sh | 2 + tools/lib/lockdep/tests/WW.sh | 2 + tools/lib/lockdep/tests/unlock_balance.c | 2 + tools/lib/lockdep/tests/unlock_balance.sh | 2 + 27 files changed, 744 insertions(+), 165 deletions(-) create mode 100755 tools/lib/lockdep/tests/AA.sh create mode 100755 tools/lib/lockdep/tests/ABA.sh create mode 100755 tools/lib/lockdep/tests/ABBA.sh create mode 100755 tools/lib/lockdep/tests/ABBA_2threads.sh create mode 100755 tools/lib/lockdep/tests/ABBCCA.sh create mode 100755 tools/lib/lockdep/tests/ABBCCDDA.sh create mode 100755 tools/lib/lockdep/tests/ABCABC.sh create mode 100755 tools/lib/lockdep/tests/ABCDBCDA.sh create mode 100755 tools/lib/lockdep/tests/ABCDBDDA.sh create mode 100755 tools/lib/lockdep/tests/WW.sh create mode 100755
[PATCH v2 14/24] locking/lockdep: Make zap_class() remove all matching lock order entries
Make sure that all lock order entries that refer to a class are removed from
the list_entries[] array when a kernel module is unloaded.

Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 include/linux/lockdep.h  |  1 +
 kernel/locking/lockdep.c | 17 +++--
 2 files changed, 12 insertions(+), 6 deletions(-)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index 1fd82ff99c65..6d0f8d1c2bee 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -180,6 +180,7 @@ static inline void lockdep_copy_map(struct lockdep_map *to,
 struct lock_list {
 	struct list_head		entry;
 	struct lock_class		*class;
+	struct lock_class		*links_to;
 	struct stack_trace		trace;
 	int				distance;
 
diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 5c837a537273..ecd92969674c 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -859,7 +859,8 @@ static struct lock_list *alloc_list_entry(void)
 /*
  * Add a new dependency to the head of the list:
  */
-static int add_lock_to_list(struct lock_class *this, struct list_head *head,
+static int add_lock_to_list(struct lock_class *this,
+			    struct lock_class *links_to, struct list_head *head,
 			    unsigned long ip, int distance,
 			    struct stack_trace *trace)
 {
@@ -872,7 +873,9 @@ static int add_lock_to_list(struct lock_class *this, struct list_head *head,
 	if (!entry)
 		return 0;
 
+	WARN_ON_ONCE(this == links_to);
 	entry->class = this;
+	entry->links_to = links_to;
 	entry->distance = distance;
 	entry->trace = *trace;
 	/*
@@ -1918,14 +1921,14 @@ check_prev_add(struct task_struct *curr, struct held_lock *prev,
 	 * Ok, all validations passed, add the new lock
 	 * to the previous lock's dependency list:
 	 */
-	ret = add_lock_to_list(hlock_class(next),
+	ret = add_lock_to_list(hlock_class(next), hlock_class(prev),
 			       &hlock_class(prev)->locks_after,
 			       next->acquire_ip, distance, trace);
 
 	if (!ret)
 		return 0;
 
-	ret = add_lock_to_list(hlock_class(prev),
+	ret = add_lock_to_list(hlock_class(prev), hlock_class(next),
 			       &hlock_class(next)->locks_before,
 			       next->acquire_ip, distance, trace);
 	if (!ret)
@@ -4128,15 +4131,17 @@ void lockdep_reset(void)
  */
 static void zap_class(struct lock_class *class)
 {
+	struct lock_list *entry;
 	int i;
 
 	/*
 	 * Remove all dependencies this lock is
 	 * involved in:
 	 */
-	for (i = 0; i < nr_list_entries; i++) {
-		if (list_entries[i].class == class)
-			list_del_rcu(&list_entries[i].entry);
+	for (i = 0, entry = list_entries; i < nr_list_entries; i++, entry++) {
+		if (entry->class != class && entry->links_to != class)
+			continue;
+		list_del_rcu(&entry->entry);
 	}
 	/*
 	 * Unhash the class and remove it from the all_lock_classes list:
-- 
2.20.0.rc1.387.gf8505762e3-goog
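For readers who want to see the effect of the new links_to member in isolation, here is a minimal userspace sketch. It is not the kernel code: struct edge, zap_edges() and the integer class ids are hypothetical stand-ins for struct lock_list, zap_class() and struct lock_class pointers, and a boolean replaces real list membership. The point is only that an entry is dropped when the zapped class appears in either role.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for struct lock_list: one dependency edge. */
struct edge {
	int class;     /* class this entry points at */
	int links_to;  /* class whose locks_after/locks_before list holds it */
	bool linked;   /* stands in for "still on a list" */
};

/*
 * Unlink every edge that refers to @victim, either as the target class
 * or as the owner of the list. Returns the number of edges unlinked.
 */
static size_t zap_edges(struct edge *edges, size_t n, int victim)
{
	size_t removed = 0;

	for (size_t i = 0; i < n; i++) {
		struct edge *e = &edges[i];

		if (!e->linked)
			continue;
		if (e->class != victim && e->links_to != victim)
			continue;
		e->linked = false;
		removed++;
	}
	return removed;
}
```

Checking only the class field, as the code did before this patch, would leave behind the entries that sit on other classes' lists but were added on behalf of the zapped class.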
[PATCH v2 08/24] locking/lockdep: Declare local symbols static
This patch prevents sparse from complaining about a missing declaration for
the lock_classes array.

Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 kernel/locking/lockdep.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 1efada2dd9dd..7434a00b2b2f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -138,6 +138,9 @@ static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES];
  * get freed - this significantly simplifies the debugging code.
  */
 unsigned long nr_lock_classes;
+#ifndef CONFIG_DEBUG_LOCKDEP
+static
+#endif
 struct lock_class lock_classes[MAX_LOCKDEP_KEYS];
 
 static inline struct lock_class *hlock_class(struct held_lock *hlock)
-- 
2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 12/24] locking/lockdep: Make concurrent lockdep_reset_lock() calls safe
Since zap_class() removes items from the all_lock_classes list and the
classhash_table, protect all zap_class() calls with the graph lock.

Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 kernel/locking/lockdep.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 346b5a1fd062..e78623819184 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4229,6 +4229,7 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	int j, locked;
 
 	raw_local_irq_save(flags);
+	locked = graph_lock();
 
 	/*
 	 * Remove all classes this lock might have:
@@ -4245,7 +4246,6 @@ void lockdep_reset_lock(struct lockdep_map *lock)
 	 * Debug check: in the end all mapped classes should
 	 * be gone.
 	 */
-	locked = graph_lock();
 	if (unlikely(lock_class_cache_is_registered(lock))) {
 		if (debug_locks_off_graph_unlock()) {
 			/*
-- 
2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 02/24] lockdep tests: Fix shellcheck warnings
Use find instead of ls to avoid splitting filenames that contain spaces.
Use rm -f instead of if ... then rm ...; fi. This patch addresses all
shellcheck complaints about the run_tests.sh shell script.

Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 tools/lib/lockdep/run_tests.sh | 12
 1 file changed, 4 insertions(+), 8 deletions(-)

diff --git a/tools/lib/lockdep/run_tests.sh b/tools/lib/lockdep/run_tests.sh
index 9f31f84e7fac..253719ee6377 100755
--- a/tools/lib/lockdep/run_tests.sh
+++ b/tools/lib/lockdep/run_tests.sh
@@ -7,7 +7,7 @@ if ! make >/dev/null; then
 	exit 1
 fi
 
-for i in `ls tests/*.c`; do
+find tests -name '*.c' | sort | while read -r i; do
 	testname=$(basename "$i" .c)
 	echo -ne "$testname... "
 	if gcc -o "tests/$testname" -pthread "$i" liblockdep.a -Iinclude -D__USE_LIBLOCKDEP &&
@@ -16,12 +16,10 @@ for i in `ls tests/*.c`; do
 	else
 		echo "FAILED!"
 	fi
-	if [ -f "tests/$testname" ]; then
-		rm tests/$testname
-	fi
+	rm -f "tests/$testname"
 done
 
-for i in `ls tests/*.c`; do
+find tests -name '*.c' | sort | while read -r i; do
 	testname=$(basename "$i" .c)
 	echo -ne "(PRELOAD) $testname... "
 	if gcc -o "tests/$testname" -pthread -Iinclude "$i" &&
@@ -30,7 +28,5 @@ for i in `ls tests/*.c`; do
 	else
 		echo "FAILED!"
 	fi
-	if [ -f "tests/$testname" ]; then
-		rm tests/$testname
-	fi
+	rm -f "tests/$testname"
 done
-- 
2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 17/24] locking/lockdep: Free lock classes that are no longer in use
Instead of leaving lock classes that are no longer in use in the lock_classes array, reuse entries from that array that are no longer in use. Maintain a linked list of free lock classes with list head 'free_lock_class'. Initialize that list from inside register_lock_class() instead of from inside lockdep_init() because register_lock_class() can be called before lockdep_init() has been called. Only add freed lock classes to the free_lock_classes list after a grace period to avoid that a lock_classes[] element would be reused while an RCU reader is accessing it. Cc: Peter Zijlstra Cc: Waiman Long Cc: Johannes Berg Signed-off-by: Bart Van Assche --- include/linux/lockdep.h | 9 +- kernel/locking/lockdep.c | 237 --- 2 files changed, 205 insertions(+), 41 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 9421f028c26c..02a1469c46e1 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -63,7 +63,8 @@ extern struct lock_class_key __lockdep_no_validate__; #define LOCKSTAT_POINTS4 /* - * The lock-class itself: + * The lock-class itself. The order of the structure members matters. + * reinit_class() zeroes the key member and all subsequent members. */ struct lock_class { /* @@ -72,7 +73,9 @@ struct lock_class { struct hlist_node hash_entry; /* -* global list of all lock-classes: +* Entry in all_lock_classes when in use. Entry in free_lock_classes +* when not in use. Instances that are being freed are briefly on +* neither list. 
*/ struct list_headlock_entry; @@ -106,7 +109,7 @@ struct lock_class { unsigned long contention_point[LOCKSTAT_POINTS]; unsigned long contending_point[LOCKSTAT_POINTS]; #endif -}; +} __no_randomize_layout; #ifdef CONFIG_LOCK_STAT struct lock_time { diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index 92bdb187987f..d907d8bfefdf 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -134,8 +134,8 @@ static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES]; /* * All data structures here are protected by the global debug_lock. * - * Mutex key structs only get allocated, once during bootup, and never - * get freed - this significantly simplifies the debugging code. + * nr_lock_classes is the number of elements of lock_classes[] that is + * in use. */ unsigned long nr_lock_classes; #ifndef CONFIG_DEBUG_LOCKDEP @@ -277,11 +277,18 @@ static inline void lock_release_holdtime(struct held_lock *hlock) #endif /* - * We keep a global list of all lock classes. The list only grows, - * never shrinks. The list is only accessed with the lockdep - * spinlock lock held. + * We keep a global list of all lock classes. The list is only accessed with + * the lockdep spinlock lock held. The zapped_classes list contains lock + * classes that are about to be freed but that may still be accessed by an RCU + * reader. free_lock_classes is a list with free elements. These elements are + * linked together by the lock_entry member in struct lock_class. */ LIST_HEAD(all_lock_classes); +static LIST_HEAD(zapped_classes); +static LIST_HEAD(free_lock_classes); +static bool initialization_happened; +static bool rcu_callback_scheduled; +static struct rcu_head free_zapped_classes_rcu_head; /* * The lockdep classes are in a hash-table as well, for fast lookup: @@ -735,6 +742,21 @@ static bool assign_lock_key(struct lockdep_map *lock) return true; } +/* + * Initialize the lock_classes[] array elements and also the free_lock_classes + * list. 
+ */ +static void init_data_structures(void) +{ + int i; + + for (i = 0; i < ARRAY_SIZE(lock_classes); i++) { + list_add_tail(_classes[i].lock_entry, _lock_classes); + INIT_LIST_HEAD(_classes[i].locks_after); + INIT_LIST_HEAD(_classes[i].locks_before); + } +} + /* * Register a lock's class in the hash-table, if the class is not present * yet. Otherwise we look it up. We cache the result in the lock object @@ -775,11 +797,14 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force) goto out_unlock_set; } - /* -* Allocate a new key from the static array, and add it to -* the hash: -*/ - if (nr_lock_classes >= MAX_LOCKDEP_KEYS) { + /* Allocate a new lock class and add it to the hash. */ + if (unlikely(!initialization_happened)) { + initialization_happened = true; + init_data_structures(); + } + class = list_first_entry_or_null(_lock_classes, typeof(*class), +lock_entry); + if (!class) { if (!debug_locks_off_graph_unlock()) { return NULL; } @@
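The free-list scheme described in the commit message above can be sketched in userspace as follows. This is only an illustration under stated assumptions: struct cls, alloc_cls(), zap_cls() and grace_period_elapsed() are hypothetical names, singly linked lists replace the kernel's list_head, and an explicit function call stands in for the RCU grace period. What it shows is lazy initialization on first allocation and the rule that a zapped element only becomes allocatable again after the grace period.

```c
#include <stddef.h>

#define NR_CLASSES 4

/* Hypothetical stand-in for struct lock_class. */
struct cls {
	struct cls *next;  /* link on the free list or the zapped list */
	int in_use;
};

static struct cls pool[NR_CLASSES];
static struct cls *free_list;    /* elements available for allocation */
static struct cls *zapped_list;  /* freed, but readers may still see them */
static int initialized;

/* Put every pool element on the free list, pool[0] first. */
static void init_pool(void)
{
	for (int i = NR_CLASSES - 1; i >= 0; i--) {
		pool[i].next = free_list;
		free_list = &pool[i];
	}
}

/* Take the first free element; initialize the pool on first use. */
static struct cls *alloc_cls(void)
{
	struct cls *c;

	if (!initialized) {
		initialized = 1;
		init_pool();
	}
	c = free_list;
	if (!c)
		return NULL;
	free_list = c->next;
	c->next = NULL;
	c->in_use = 1;
	return c;
}

/* Retire an element; it is not reusable until the grace period ends. */
static void zap_cls(struct cls *c)
{
	c->in_use = 0;
	c->next = zapped_list;
	zapped_list = c;
}

/* Stand-in for the RCU callback: move zapped elements to the free list. */
static void grace_period_elapsed(void)
{
	while (zapped_list) {
		struct cls *c = zapped_list;

		zapped_list = c->next;
		c->next = free_list;
		free_list = c;
	}
}
```

Initializing from the allocation path rather than from a separate init function mirrors the reason given in the commit message: the first allocation can happen before any dedicated init call has run.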
[PATCH v2 16/24] locking/lockdep: Retain the class key and name while freeing a lock class
The next patch in this series uses the class name in code that detects lock
class use-after-free. Hence retain the class name for lock classes that are
being freed.

Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 kernel/locking/lockdep.c | 4 +---
 1 file changed, 1 insertion(+), 3 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index ecd92969674c..92bdb187987f 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4147,10 +4147,8 @@ static void zap_class(struct lock_class *class)
 	 * Unhash the class and remove it from the all_lock_classes list:
 	 */
 	hlist_del_rcu(&class->hash_entry);
+	class->hash_entry.pprev = NULL;
 	list_del(&class->lock_entry);
-
-	RCU_INIT_POINTER(class->key, NULL);
-	RCU_INIT_POINTER(class->name, NULL);
 }
 
 static inline int within(const void *addr, void *start, unsigned long size)
-- 
2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 05/24] liblockdep: Rename "trywlock" into "trywrlock"
This patch fixes the following compiler warning, reported while compiling the
lockdep unit tests:

include/liblockdep/rwlock.h: In function 'liblockdep_pthread_rwlock_trywlock':
include/liblockdep/rwlock.h:66:9: warning: implicit declaration of function 'pthread_rwlock_trywlock'; did you mean 'pthread_rwlock_trywrlock'? [-Wimplicit-function-declaration]
  return pthread_rwlock_trywlock(&lock->rwlock) == 0 ? 1 : 0;
         ^~~
         pthread_rwlock_trywrlock

Fixes: 5a52c9b480e0 ("liblockdep: Add public headers for pthread_rwlock_t implementation")
Cc: Sasha Levin
Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 tools/lib/lockdep/include/liblockdep/rwlock.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/tools/lib/lockdep/include/liblockdep/rwlock.h b/tools/lib/lockdep/include/liblockdep/rwlock.h
index a96c3bf0fef1..365762e3a1ea 100644
--- a/tools/lib/lockdep/include/liblockdep/rwlock.h
+++ b/tools/lib/lockdep/include/liblockdep/rwlock.h
@@ -60,10 +60,10 @@ static inline int liblockdep_pthread_rwlock_tryrdlock(liblockdep_pthread_rwlock_
 	return pthread_rwlock_tryrdlock(&lock->rwlock) == 0 ? 1 : 0;
 }
 
-static inline int liblockdep_pthread_rwlock_trywlock(liblockdep_pthread_rwlock_t *lock)
+static inline int liblockdep_pthread_rwlock_trywrlock(liblockdep_pthread_rwlock_t *lock)
 {
 	lock_acquire(&lock->dep_map, 0, 1, 0, 1, NULL, (unsigned long)_RET_IP_);
-	return pthread_rwlock_trywlock(&lock->rwlock) == 0 ? 1 : 0;
+	return pthread_rwlock_trywrlock(&lock->rwlock) == 0 ? 1 : 0;
 }
 
 static inline int liblockdep_rwlock_destroy(liblockdep_pthread_rwlock_t *lock)
@@ -79,7 +79,7 @@ static inline int liblockdep_rwlock_destroy(liblockdep_pthread_rwlock_t *lock)
 #define pthread_rwlock_unlock		liblockdep_pthread_rwlock_unlock
 #define pthread_rwlock_wrlock		liblockdep_pthread_rwlock_wrlock
 #define pthread_rwlock_tryrdlock	liblockdep_pthread_rwlock_tryrdlock
-#define pthread_rwlock_trywlock		liblockdep_pthread_rwlock_trywlock
+#define pthread_rwlock_trywrlock	liblockdep_pthread_rwlock_trywrlock
 #define pthread_rwlock_destroy		liblockdep_rwlock_destroy
 
 #endif
-- 
2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 22/24] locking/lockdep: Add support for dynamic keys
A shortcoming of the current lockdep implementation is that it requires lock keys to be allocated statically. That forces certain lock objects to share lock keys. Since lock dependency analysis groups lock objects per key sharing lock keys can cause false positive lockdep reports. Make it possible to avoid such false positive reports by allowing lock keys to be allocated dynamically. Require that dynamically allocated lock keys are registered before use by calling lockdep_register_key(). Complain about attempts to register the same lock key pointer twice without calling lockdep_unregister_key() between successive registration calls. The purpose of the new lock_keys_hash[] data structure that keeps track of all dynamic keys is twofold: - Verify whether the lockdep_register_key() and lockdep_unregister_key() functions are used correctly. - Avoid that lockdep_init_map() complains when encountering a dynamically allocated key. Cc: Peter Zijlstra Cc: Waiman Long Cc: Johannes Berg Signed-off-by: Bart Van Assche --- include/linux/lockdep.h | 13 - kernel/locking/lockdep.c | 120 --- 2 files changed, 122 insertions(+), 11 deletions(-) diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h index 02a1469c46e1..ea09048b6e1c 100644 --- a/include/linux/lockdep.h +++ b/include/linux/lockdep.h @@ -46,15 +46,19 @@ extern int lock_stat; #define NR_LOCKDEP_CACHING_CLASSES 2 /* - * Lock-classes are keyed via unique addresses, by embedding the - * lockclass-key into the kernel (or module) .data section. (For - * static locks we use the lock address itself as the key.) + * A lockdep key is associated with each lock object. For static locks we use + * the lock address itself as the key. Dynamically allocated lock objects can + * have a statically or dynamically allocated key. Dynamically allocated lock + * keys must be registered before being used and must be unregistered before + * the key memory is freed. 
*/ struct lockdep_subclass_key { char __one_byte; } __attribute__ ((__packed__)); +/* hash_entry is used to keep track of dynamically allocated keys. */ struct lock_class_key { + struct hlist_node hash_entry; struct lockdep_subclass_key subkeys[MAX_LOCKDEP_SUBCLASSES]; }; @@ -274,6 +278,9 @@ extern asmlinkage void lockdep_sys_exit(void); extern void lockdep_off(void); extern void lockdep_on(void); +extern void lockdep_register_key(struct lock_class_key *key); +extern void lockdep_unregister_key(struct lock_class_key *key); + /* * These methods are used by specific locking variants (spinlocks, * rwlocks, mutexes and rwsems) to pass init/acquire/release events diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index b4772e5fc176..d8ceed4f2900 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -141,6 +141,9 @@ static DECLARE_BITMAP(list_entries_being_freed, MAX_LOCKDEP_ENTRIES); * nr_lock_classes is the number of elements of lock_classes[] that is * in use. */ +#define KEYHASH_BITS (MAX_LOCKDEP_KEYS_BITS - 1) +#define KEYHASH_SIZE (1UL << KEYHASH_BITS) +static struct hlist_head lock_keys_hash[KEYHASH_SIZE]; unsigned long nr_lock_classes; #ifndef CONFIG_DEBUG_LOCKDEP static @@ -610,7 +613,7 @@ static int very_verbose(struct lock_class *class) * Is this the address of a static object: */ #ifdef __KERNEL__ -static int static_obj(void *obj) +static int static_obj(const void *obj) { unsigned long start = (unsigned long) &_stext, end = (unsigned long) &_end, @@ -912,6 +915,70 @@ static void init_data_structures(void) } } +static inline struct hlist_head *keyhashentry(const struct lock_class_key *key) +{ + unsigned long hash = hash_long((uintptr_t)key, KEYHASH_BITS); + + return lock_keys_hash + hash; +} + +/* + * Register a dynamically allocated key. 
+ */ +void lockdep_register_key(struct lock_class_key *key) +{ + struct hlist_head *hash_head; + struct lock_class_key *k; + unsigned long flags; + + if (WARN_ON_ONCE(static_obj(key))) + return; + hash_head = keyhashentry(key); + raw_local_irq_save(flags); + if (!graph_lock()) + goto restore_irqs; + hlist_for_each_entry_rcu(k, hash_head, hash_entry) { + if (WARN_ON_ONCE(k == key)) + goto out_unlock; + } + hlist_add_head_rcu(>hash_entry, hash_head); +out_unlock: + graph_unlock(); +restore_irqs: + raw_local_irq_restore(flags); +} +EXPORT_SYMBOL_GPL(lockdep_register_key); + +/* + * Check whether a key has been registered as a dynamic key. Must not be called + * from interrupt context. + */ +static bool is_dynamic_key(const struct lock_class_key *key) +{ + struct hlist_head *hash_head; + struct lock_class_key *k; + unsigned long flags; + bool found = false; + + if
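As a rough illustration of the bookkeeping described in the commit message above (register before use, complain about double registration, allow re-registration after unregistering), here is a small userspace sketch. It is an assumption-laden model, not the kernel implementation: struct dyn_key, register_key(), unregister_key() and the bucket count are hypothetical, a plain singly linked chain replaces the RCU-protected hlist, and there is no locking.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define KEY_BUCKETS 16

/* Hypothetical stand-in for a dynamically allocated lock_class_key. */
struct dyn_key {
	struct dyn_key *next;  /* chain within one hash bucket */
};

static struct dyn_key *buckets[KEY_BUCKETS];

/* Hash on the key's address, as the key pointer itself is the identity. */
static size_t bucket_of(const struct dyn_key *key)
{
	return ((uintptr_t)key >> 4) % KEY_BUCKETS;
}

static bool key_is_registered(const struct dyn_key *key)
{
	const struct dyn_key *k;

	for (k = buckets[bucket_of(key)]; k; k = k->next)
		if (k == key)
			return true;
	return false;
}

/* Returns false when the same key pointer is already registered. */
static bool register_key(struct dyn_key *key)
{
	size_t b = bucket_of(key);

	if (key_is_registered(key))
		return false;
	key->next = buckets[b];
	buckets[b] = key;
	return true;
}

/* Returns false when the key was not registered. */
static bool unregister_key(struct dyn_key *key)
{
	struct dyn_key **pp;

	for (pp = &buckets[bucket_of(key)]; *pp; pp = &(*pp)->next) {
		if (*pp == key) {
			*pp = key->next;
			key->next = NULL;
			return true;
		}
	}
	return false;
}
```

Where this sketch returns false, the patch uses WARN_ON_ONCE() to flag the misuse.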
[PATCH v2 23/24] kernel/workqueue: Use dynamic lockdep keys for workqueues
Commit 87915adc3f0a ("workqueue: re-add lockdep dependencies for flushing") improved deadlock checking in the workqueue implementation. Unfortunately that patch also introduced a few false positive lockdep complaints. This patch suppresses these false positives by allocating the workqueue mutex lockdep key dynamically. An example of a false positive lockdep complaint suppressed by this report can be found below. The root cause of the lockdep complaint shown below is that the direct I/O code can call alloc_workqueue() from inside a work item created by another alloc_workqueue() call and that both workqueues share the same lockdep key. This patch avoids that that lockdep complaint is triggered by allocating the work queue lockdep keys dynamically. In other words, this patch guarantees that a unique lockdep key is associated with each work queue mutex. == WARNING: possible circular locking dependency detected 4.19.0-dbg+ #1 Not tainted -- fio/4129 is trying to acquire lock: a01cfe1a ((wq_completion)"dio/%s"sb->s_id){+.+.}, at: flush_workqueue+0xd0/0x970 but task is already holding lock: a0acecf9 (>s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 which lock already depends on the new lock. 
the existing dependency chain (in reverse order) is: -> #2 (>s_type->i_mutex_key#14){+.+.}: down_write+0x3d/0x80 __generic_file_fsync+0x77/0xf0 ext4_sync_file+0x3c9/0x780 vfs_fsync_range+0x66/0x100 dio_complete+0x2f5/0x360 dio_aio_complete_work+0x1c/0x20 process_one_work+0x481/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #1 ((work_completion)(>complete_work)){+.+.}: process_one_work+0x447/0x9f0 worker_thread+0x63/0x5a0 kthread+0x1cf/0x1f0 ret_from_fork+0x24/0x30 -> #0 ((wq_completion)"dio/%s"sb->s_id){+.+.}: lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe other info that might help us debug this: Chain exists of: (wq_completion)"dio/%s"sb->s_id --> (work_completion)(>complete_work) --> >s_type->i_mutex_key#14 Possible unsafe locking scenario: CPU0CPU1 lock(>s_type->i_mutex_key#14); lock((work_completion)(>complete_work)); lock(>s_type->i_mutex_key#14); lock((wq_completion)"dio/%s"sb->s_id); *** DEADLOCK *** 1 lock held by fio/4129: #0: a0acecf9 (>s_type->i_mutex_key#14){+.+.}, at: ext4_file_write_iter+0x154/0x710 stack backtrace: CPU: 3 PID: 4129 Comm: fio Not tainted 4.19.0-dbg+ #1 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.10.2-1 04/01/2014 Call Trace: dump_stack+0x86/0xc5 print_circular_bug.isra.32+0x20a/0x218 __lock_acquire+0x1c68/0x1cf0 lock_acquire+0xc5/0x200 flush_workqueue+0xf3/0x970 drain_workqueue+0xec/0x220 destroy_workqueue+0x23/0x350 sb_init_dio_done_wq+0x6a/0x80 do_blockdev_direct_IO+0x1f33/0x4be0 __blockdev_direct_IO+0x79/0x86 ext4_direct_IO+0x5df/0xbb0 
generic_file_direct_write+0x119/0x220 __generic_file_write_iter+0x131/0x2d0 ext4_file_write_iter+0x3fa/0x710 aio_write+0x235/0x330 io_submit_one+0x510/0xeb0 __x64_sys_io_submit+0x122/0x340 do_syscall_64+0x71/0x220 entry_SYSCALL_64_after_hwframe+0x49/0xbe Cc: Ingo Molnar Cc: Peter Zijlstra Cc: Will Deacon Cc: Tejun Heo Cc: Waiman Long Cc: Johannes Berg Signed-off-by: Bart Van Assche --- include/linux/workqueue.h | 28 +++--- kernel/workqueue.c| 60 +-- 2 files changed, 55 insertions(+), 33 deletions(-) diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h index 60d673e15632..d9a1a480e920 100644 --- a/include/linux/workqueue.h +++ b/include/linux/workqueue.h @@ -390,43 +390,23 @@ extern struct workqueue_struct *system_freezable_wq; extern struct workqueue_struct *system_power_efficient_wq; extern struct workqueue_struct *system_freezable_power_efficient_wq; -extern struct workqueue_struct * -__alloc_workqueue_key(const char *fmt, unsigned int flags, int max_active, - struct lock_class_key *key, const char *lock_name, ...) __printf(1, 6); - /** * alloc_workqueue - allocate a workqueue * @fmt: printf format for the name of the workqueue * @flags: WQ_* flags *
[PATCH v2 21/24] locking/lockdep: Verify whether lock objects are small enough to be used as class keys
Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 kernel/locking/lockdep.c | 9 +
 1 file changed, 9 insertions(+)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index c936fce5b9d7..b4772e5fc176 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -727,6 +727,15 @@ static bool assign_lock_key(struct lockdep_map *lock)
 {
 	unsigned long can_addr, addr = (unsigned long)lock;
 
+	/*
+	 * lockdep_free_key_range() assumes that struct lock_class_key
+	 * objects do not overlap. Since we use the address of lock
+	 * objects as class key for static objects, check whether the
+	 * size of lock_class_key objects does not exceed the size of
+	 * the smallest lock object.
+	 */
+	BUILD_BUG_ON(sizeof(struct lock_class_key) > sizeof(raw_spinlock_t));
+
 	if (__is_kernel_percpu_address(addr, &can_addr))
 		lock->key = (void *)can_addr;
 	else if (__is_module_percpu_address(addr, &can_addr))
-- 
2.20.0.rc1.387.gf8505762e3-goog
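Outside the kernel, the same kind of compile-time size check can be written with C11's _Static_assert. The sketch below uses two hypothetical types, demo_key and demo_lock, in place of struct lock_class_key and raw_spinlock_t; it only shows the shape of the check, not the real structure sizes.

```c
/* Hypothetical stand-ins; the real types are far more involved. */
struct demo_key {
	char subkeys[8];
};

struct demo_lock {
	long raw[2];
};

/*
 * Fail the build if a key could be larger than the smallest lock object,
 * since keys derived from lock addresses must then not overlap.
 */
_Static_assert(sizeof(struct demo_key) <= sizeof(struct demo_lock),
	       "demo_key must not be larger than demo_lock");
```

If the condition were false, compilation would stop with the given message, which is the behaviour BUILD_BUG_ON() provides in the kernel.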
[PATCH v2 24/24] lockdep tests: Test dynamic key registration
Make sure that the lockdep_register_key() and lockdep_unregister_key() code is tested when running the lockdep tests. Cc: Peter Zijlstra Cc: Waiman Long Cc: Johannes Berg Signed-off-by: Bart Van Assche --- tools/lib/lockdep/include/liblockdep/common.h | 2 ++ tools/lib/lockdep/include/liblockdep/mutex.h | 11 ++- tools/lib/lockdep/tests/ABBA.c| 9 + 3 files changed, 17 insertions(+), 5 deletions(-) diff --git a/tools/lib/lockdep/include/liblockdep/common.h b/tools/lib/lockdep/include/liblockdep/common.h index d640a9761f09..a81d91d4fc78 100644 --- a/tools/lib/lockdep/include/liblockdep/common.h +++ b/tools/lib/lockdep/include/liblockdep/common.h @@ -45,6 +45,8 @@ void lock_acquire(struct lockdep_map *lock, unsigned int subclass, void lock_release(struct lockdep_map *lock, int nested, unsigned long ip); void lockdep_reset_lock(struct lockdep_map *lock); +void lockdep_register_key(struct lock_class_key *key); +void lockdep_unregister_key(struct lock_class_key *key); extern void debug_check_no_locks_freed(const void *from, unsigned long len); #define STATIC_LOCKDEP_MAP_INIT(_name, _key) \ diff --git a/tools/lib/lockdep/include/liblockdep/mutex.h b/tools/lib/lockdep/include/liblockdep/mutex.h index 2073d4e1f2f0..783dd0df06f9 100644 --- a/tools/lib/lockdep/include/liblockdep/mutex.h +++ b/tools/lib/lockdep/include/liblockdep/mutex.h @@ -7,6 +7,7 @@ struct liblockdep_pthread_mutex { pthread_mutex_t mutex; + struct lock_class_key key; struct lockdep_map dep_map; }; @@ -27,11 +28,10 @@ static inline int __mutex_init(liblockdep_pthread_mutex_t *lock, return pthread_mutex_init(>mutex, __mutexattr); } -#define liblockdep_pthread_mutex_init(mutex, mutexattr)\ -({ \ - static struct lock_class_key __key; \ - \ - __mutex_init((mutex), #mutex, &__key, (mutexattr)); \ +#define liblockdep_pthread_mutex_init(mutex, mutexattr) \ +({ \ + lockdep_register_key(&(mutex)->key);\ + __mutex_init((mutex), #mutex, &(mutex)->key, (mutexattr)); \ }) static inline int 
liblockdep_pthread_mutex_lock(liblockdep_pthread_mutex_t *lock) @@ -55,6 +55,7 @@ static inline int liblockdep_pthread_mutex_trylock(liblockdep_pthread_mutex_t *l static inline int liblockdep_pthread_mutex_destroy(liblockdep_pthread_mutex_t *lock) { lockdep_reset_lock(>dep_map); + lockdep_unregister_key(>key); return pthread_mutex_destroy(>mutex); } diff --git a/tools/lib/lockdep/tests/ABBA.c b/tools/lib/lockdep/tests/ABBA.c index 623313f54720..543789bc3e37 100644 --- a/tools/lib/lockdep/tests/ABBA.c +++ b/tools/lib/lockdep/tests/ABBA.c @@ -14,4 +14,13 @@ void main(void) pthread_mutex_destroy(); pthread_mutex_destroy(); + + pthread_mutex_init(, NULL); + pthread_mutex_init(, NULL); + + LOCK_UNLOCK_2(a, b); + LOCK_UNLOCK_2(b, a); + + pthread_mutex_destroy(); + pthread_mutex_destroy(); } -- 2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 20/24] locking/lockdep: Introduce __lockdep_free_key_range()
This patch does not change any functionality but makes the next patch in this
series easier to read.

Cc: Peter Zijlstra
Cc: Waiman Long
Cc: Johannes Berg
Signed-off-by: Bart Van Assche
---
 kernel/locking/lockdep.c | 27 +--
 1 file changed, 17 insertions(+), 10 deletions(-)

diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c
index 6d99f3f0757c..c936fce5b9d7 100644
--- a/kernel/locking/lockdep.c
+++ b/kernel/locking/lockdep.c
@@ -4478,14 +4478,12 @@ static void schedule_free_zapped_classes(void)
 }
 
 /*
- * Used in module.c to remove lock classes from memory that is going to be
- * freed; and possibly re-used by other modules.
- *
- * We will have had one sync_sched() before getting here, so we're guaranteed
- * nobody will look up these exact classes -- they're properly dead but still
- * allocated.
+ * Remove all lock classes from the class hash table and from the
+ * all_lock_classes list whose key or name is in the address range
+ * [start, start + size). Move these lock classes to the
+ * @zapped_classes list.
  */
-void lockdep_free_key_range(void *start, unsigned long size)
+static void __lockdep_free_key_range(void *start, unsigned long size)
 {
 	struct lock_class *class;
 	struct hlist_head *head;
@@ -4496,9 +4494,6 @@ void lockdep_free_key_range(void *start, unsigned long size)
 	raw_local_irq_save(flags);
 	locked = graph_lock();
 
-	/*
-	 * Unhash all classes that were created by this module:
-	 */
 	for (i = 0; i < CLASSHASH_SIZE; i++) {
 		head = classhash_table + i;
 		hlist_for_each_entry_rcu(class, head, hash_entry) {
@@ -4513,7 +4508,19 @@ void lockdep_free_key_range(void *start, unsigned long size)
 	if (locked)
 		graph_unlock();
 	raw_local_irq_restore(flags);
+}
 
+/*
+ * Used in module.c to remove lock classes from memory that is going to be
+ * freed; and possibly re-used by other modules.
+ *
+ * We will have had one sync_sched() before getting here, so we're guaranteed
+ * nobody will look up these exact classes -- they're properly dead but still
+ * allocated.
+ */
+void lockdep_free_key_range(void *start, unsigned long size)
+{
+	__lockdep_free_key_range(start, size);
 	/*
 	 * Do not wait for concurrent look_up_lock_class() calls. If any such
 	 * concurrent call would return a pointer to one of the lock classes
-- 
2.20.0.rc1.387.gf8505762e3-goog
[PATCH v2 19/24] locking/lockdep: Check data structure consistency
Debugging lockdep data structure inconsistencies is challenging. Add disabled code that verifies data structure consistency at runtime. Cc: Peter Zijlstra Cc: Waiman Long Cc: Johannes Berg Signed-off-by: Bart Van Assche --- kernel/locking/lockdep.c | 147 +++ 1 file changed, 147 insertions(+) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index f343e7612a3a..6d99f3f0757c 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -72,6 +72,8 @@ module_param(lock_stat, int, 0644); #define lock_stat 0 #endif +static bool check_data_structure_consistency; + /* * lockdep_lock: protects the lockdep graph, the hashes and the * class/list/hash allocators. @@ -744,6 +746,148 @@ static bool assign_lock_key(struct lockdep_map *lock) return true; } +/* Check whether element @e occurs in list @h */ +static bool in_list(struct list_head *e, struct list_head *h) +{ + struct list_head *f; + + list_for_each(f, h) { + if (e == f) + return true; + } + + return false; +} + +/* + * Check whether entry @e occurs in any of the locks_after or locks_before + * lists. + */ +static bool in_any_class_list(struct list_head *e) +{ + struct lock_class *class; + int i; + + for (i = 0; i < ARRAY_SIZE(lock_classes); i++) { + class = _classes[i]; + if (in_list(e, >locks_after) || + in_list(e, >locks_before)) + return true; + } + return false; +} + +static bool class_lock_list_valid(struct lock_class *c, struct list_head *h) +{ + struct lock_list *e; + + list_for_each_entry(e, h, entry) { + if (e->links_to != c) { + printk(KERN_INFO "class %s: mismatch for lock entry %ld; class %s <> %s", + c->name ? : "(?)", + (unsigned long)(e - list_entries), + e->links_to && e->links_to->name ? + e->links_to->name : "(?)", + e->class && e->class->name ? 
e->class->name : + "(?)"); + return false; + } + } + return true; +} + +static u16 chain_hlocks[]; + +static bool check_lock_chain_key(struct lock_chain *chain) +{ +#ifdef CONFIG_PROVE_LOCKING + u64 chain_key = 0; + int i; + + for (i = chain->base; i < chain->base + chain->depth; i++) + chain_key = iterate_chain_key(chain_key, chain_hlocks[i] + 1); + /* +* The 'unsigned long long' casts avoid that a compiler warning +* is reported when building tools/lib/lockdep. +*/ + if (chain->chain_key != chain_key) + printk(KERN_INFO "chain %lld: key %#llx <> %#llx\n", + (unsigned long long)(chain - lock_chains), + (unsigned long long)chain->chain_key, + (unsigned long long)chain_key); + return chain->chain_key == chain_key; +#else + return true; +#endif +} + +static bool check_data_structures(void) +{ + struct lock_class *class; + struct lock_chain *chain; + struct hlist_head *head; + struct lock_list *e; + int i; + + /* +* Check whether all list entries that are in use occur in a class +* lock list. +*/ + for_each_set_bit(i, list_entries_in_use, ARRAY_SIZE(list_entries)) { + if (test_bit(i, list_entries_being_freed)) + continue; + e = list_entries + i; + if (!in_any_class_list(>entry)) { + printk(KERN_INFO "list entry %d is not in any class list; class %s <> %s\n", + (unsigned int)(e - list_entries), + e->class->name ? : "(?)", + e->links_to->name ? : "(?)"); + return false; + } + } + + /* +* Check whether all list entries that are not in use do not occur in +* a class lock list. +*/ + for_each_clear_bit(i, list_entries_in_use, ARRAY_SIZE(list_entries)) { + e = list_entries + i; + if (WARN_ON_ONCE(test_bit(i, list_entries_being_freed))) + return false; + if (in_any_class_list(>entry)) { + printk(KERN_INFO "list entry %d occurs in a class list; class %s <> %s\n", + (unsigned int)(e - list_entries), + e->class && e->class->name ? e->class->name : + "(?)", + e->links_to && e->links_to->name ? + e->links_to->name : "(?)"); + return false; + } + } + +
[PATCH v2 18/24] locking/lockdep: Reuse list entries that are no longer in use
Instead of abandoning elements of list_entries[] that are no longer in use, make alloc_list_entry() reuse array elements that have been freed. Cc: Peter Zijlstra Cc: Waiman Long Cc: Johannes Berg Signed-off-by: Bart Van Assche --- kernel/locking/lockdep.c | 27 --- 1 file changed, 20 insertions(+), 7 deletions(-) diff --git a/kernel/locking/lockdep.c b/kernel/locking/lockdep.c index d907d8bfefdf..f343e7612a3a 100644 --- a/kernel/locking/lockdep.c +++ b/kernel/locking/lockdep.c @@ -130,6 +130,8 @@ static inline int debug_locks_off_graph_unlock(void) unsigned long nr_list_entries; static struct lock_list list_entries[MAX_LOCKDEP_ENTRIES]; +static DECLARE_BITMAP(list_entries_in_use, MAX_LOCKDEP_ENTRIES); +static DECLARE_BITMAP(list_entries_being_freed, MAX_LOCKDEP_ENTRIES); /* * All data structures here are protected by the global debug_lock. @@ -871,7 +873,10 @@ register_lock_class(struct lockdep_map *lock, unsigned int subclass, int force) */ static struct lock_list *alloc_list_entry(void) { - if (nr_list_entries >= MAX_LOCKDEP_ENTRIES) { + int idx = find_first_zero_bit(list_entries_in_use, + ARRAY_SIZE(list_entries)); + + if (idx >= ARRAY_SIZE(list_entries)) { if (!debug_locks_off_graph_unlock()) return NULL; @@ -879,7 +884,8 @@ static struct lock_list *alloc_list_entry(void) dump_stack(); return NULL; } - return list_entries + nr_list_entries++; + __set_bit(idx, list_entries_in_use); + return list_entries + idx; } /* @@ -984,7 +990,7 @@ static inline void mark_lock_accessed(struct lock_list *lock, unsigned long nr; nr = lock - list_entries; - WARN_ON(nr >= nr_list_entries); /* Out-of-bounds, input fail */ + WARN_ON(nr >= ARRAY_SIZE(list_entries)); /* Out-of-bounds, input fail */ lock->parent = parent; lock->class->dep_gen_id = lockdep_dependency_gen_id; } @@ -994,7 +1000,7 @@ static inline unsigned long lock_accessed(struct lock_list *lock) unsigned long nr; nr = lock - list_entries; - WARN_ON(nr >= nr_list_entries); /* Out-of-bounds, input fail */ + WARN_ON(nr >= 
ARRAY_SIZE(list_entries)); /* Out-of-bounds, input fail */ return lock->class->dep_gen_id == lockdep_dependency_gen_id; } @@ -4250,9 +4256,12 @@ static void zap_class(struct lock_class *class) * Remove all dependencies this lock is * involved in: */ - for (i = 0, entry = list_entries; i < nr_list_entries; i++, entry++) { + for_each_set_bit(i, list_entries_in_use, ARRAY_SIZE(list_entries)) { + entry = list_entries + i; if (entry->class != class && entry->links_to != class) continue; + if (__test_and_set_bit(i, list_entries_being_freed)) + continue; links_to = entry->links_to; WARN_ON_ONCE(entry->class == links_to); list_del_rcu(>entry); @@ -4286,8 +4295,9 @@ static inline int within(const void *addr, void *start, unsigned long size) } /* - * Free all lock classes that are on the zapped_classes list. Called as an - * RCU callback function. + * Free all lock classes that are on the zapped_classes list and also all list + * entries that have been marked as being freed. Called as an RCU callback + * function. */ static void free_zapped_classes(struct callback_head *ch) { @@ -4303,6 +4313,9 @@ static void free_zapped_classes(struct callback_head *ch) nr_lock_classes--; } list_splice_init(_classes, _lock_classes); + bitmap_andnot(list_entries_in_use, list_entries_in_use, + list_entries_being_freed, ARRAY_SIZE(list_entries)); + bitmap_clear(list_entries_being_freed, 0, ARRAY_SIZE(list_entries)); if (locked) graph_unlock(); raw_local_irq_restore(flags); -- 2.20.0.rc1.387.gf8505762e3-goog
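Editorial note: the two-bitmap scheme in the patch above (one bitmap for slots in use, one for slots waiting to be freed after a grace period) can be illustrated outside the kernel. The following is a minimal userspace sketch, not the lockdep code itself: MAX_ENTRIES, struct entry and all function names are hypothetical, the bit helpers are hand-rolled stand-ins for the kernel bitmap API, and no locking or RCU is modelled.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define MAX_ENTRIES 8 /* hypothetical; stands in for MAX_LOCKDEP_ENTRIES */
#define BITS_PER_WORD (sizeof(unsigned long) * CHAR_BIT)
#define WORDS ((MAX_ENTRIES + BITS_PER_WORD - 1) / BITS_PER_WORD)

struct entry {
	int payload;
};

static struct entry entries[MAX_ENTRIES];
static unsigned long in_use[WORDS];      /* slot is allocated */
static unsigned long being_freed[WORDS]; /* slot is waiting to be reclaimed */

static int bit_test(const unsigned long *map, size_t i)
{
	return (map[i / BITS_PER_WORD] >> (i % BITS_PER_WORD)) & 1UL;
}

static void bit_set(unsigned long *map, size_t i)
{
	map[i / BITS_PER_WORD] |= 1UL << (i % BITS_PER_WORD);
}

/* Hand out the lowest-numbered free slot, or NULL when every slot is taken. */
static struct entry *alloc_entry(void)
{
	size_t i;

	for (i = 0; i < MAX_ENTRIES; i++) {
		if (!bit_test(in_use, i)) {
			bit_set(in_use, i);
			return &entries[i];
		}
	}
	return NULL;
}

/* Phase 1: mark the slot. It stays "in use", so it cannot be handed out yet. */
static void mark_for_free(struct entry *e)
{
	size_t i = (size_t)(e - entries);

	assert(i < MAX_ENTRIES);
	bit_set(being_freed, i);
}

/* Phase 2 (after the grace period): in_use &= ~being_freed, then clear marks. */
static void reclaim_marked(void)
{
	size_t w;

	for (w = 0; w < WORDS; w++) {
		in_use[w] &= ~being_freed[w];
		being_freed[w] = 0;
	}
}
```

The point of the split is that a marked slot is not reused until reclaim_marked() runs, which in the patch corresponds to the RCU callback.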
[PATCH V2] sdhci: fix the timeout check window for clock and reset
From 87692fc090978bde8fe872f02d0023a57af6b492 Mon Sep 17 00:00:00 2001
From: Alek Du
Date: Fri, 30 Nov 2018 14:02:28 +0800
Subject: [PATCH] sdhci: fix the timeout check window for clock and reset

We observed some fake timeouts on some devices. The log looks like this:

case 1:
[159525.255629] mmc1: Internal clock never stabilised.
[159525.255818] mmc1: sdhci: SDHCI REGISTER DUMP ===
[159525.256049] mmc1: sdhci: Sys addr: 0x | Version: 0x1002
[159525.256277] mmc1: sdhci: Blk size: 0x | Blk cnt: 0x
[159525.256523] mmc1: sdhci: Argument: 0x | Trn mode: 0x
[159525.256752] mmc1: sdhci: Present: 0x1fff | Host ctl: 0x
[159525.256979] mmc1: sdhci: Power: 0x000b | Blk gap: 0x0080
[159525.257205] mmc1: sdhci: Wake-up: 0x | Clock:0xfa03

From the clock control register dump, we are pretty sure the clock was stabilized.

case 2:
[ 914.550127] mmc1: Reset 0x2 never completed.
[ 914.550321] mmc1: sdhci: SDHCI REGISTER DUMP ===
[ 914.550608] mmc1: sdhci: Sys addr: 0x0010 | Version: 0x1002

After checking the sdhci code, we found that the timeout check has a small window: the CPU can be scheduled out between the status check and the time check, and when it comes back, the earlier status reading is no longer valid.
Signed-off-by: Alek Du --- drivers/mmc/host/sdhci.c | 16 ++-- 1 file changed, 14 insertions(+), 2 deletions(-) diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c index 99bdae53fa2e..af01f7d16eae 100644 --- a/drivers/mmc/host/sdhci.c +++ b/drivers/mmc/host/sdhci.c @@ -218,12 +218,17 @@ void sdhci_reset(struct sdhci_host *host, u8 mask) /* hw clears the bit when it's done */ while (sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask) { if (ktime_after(ktime_get(), timeout)) { + /* check it again, since there is a window between + bit check and time check */ + if (!(sdhci_readb(host, SDHCI_SOFTWARE_RESET) & mask)) + break; pr_err("%s: Reset 0x%x never completed.\n", mmc_hostname(host->mmc), (int)mask); sdhci_dumpregs(host); return; + } else { + udelay(10); } - udelay(10); } } EXPORT_SYMBOL_GPL(sdhci_reset); @@ -1611,12 +1616,19 @@ void sdhci_enable_clk(struct sdhci_host *host, u16 clk) while (!((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL)) & SDHCI_CLOCK_INT_STABLE)) { if (ktime_after(ktime_get(), timeout)) { + /* check it again since there is a window between + status check and time check */ + if ((clk = sdhci_readw(host, SDHCI_CLOCK_CONTROL)) + & SDHCI_CLOCK_INT_STABLE) + break; pr_err("%s: Internal clock never stabilised.\n", mmc_hostname(host->mmc)); sdhci_dumpregs(host); return; } - udelay(10); + else { + udelay(10); + } } clk |= SDHCI_CLOCK_CARD_EN; -- 2.17.1
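Editorial note: the race described above can be reproduced in plain C with a scripted clock. This is a minimal sketch of the "check again after the deadline" idea from the patch, not the sdhci code: poll_until_ready(), struct fake_dev and the fake_* helpers are hypothetical names, and every access to the fake device simply consumes the next scripted timestamp to model time passing between the status read and the clock read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef bool (*ready_fn)(void *ctx);
typedef uint64_t (*clock_fn)(void *ctx);

/*
 * Poll until ready() reports true or the deadline passes. When the deadline
 * has passed, look at the status once more: the caller may have been
 * scheduled out between the last status read and the clock read.
 */
static bool poll_until_ready(ready_fn ready, clock_fn now, void *ctx,
			     uint64_t timeout_us)
{
	uint64_t deadline = now(ctx) + timeout_us;

	while (!ready(ctx)) {
		if (now(ctx) > deadline)
			return ready(ctx); /* final re-check closes the window */
		/* A real driver would delay here before polling again. */
	}
	return true;
}

/* Fake device: each access observes the next scripted timestamp. */
struct fake_dev {
	const uint64_t *times;
	size_t count;
	size_t pos;
	uint64_t ready_at_us;
};

static uint64_t fake_tick(struct fake_dev *dev)
{
	uint64_t t;

	assert(dev->count > 0);
	t = dev->times[dev->pos];
	if (dev->pos + 1 < dev->count)
		dev->pos++;
	return t;
}

static bool fake_ready(void *ctx)
{
	struct fake_dev *dev = ctx;

	return fake_tick(dev) >= dev->ready_at_us;
}

static uint64_t fake_now(void *ctx)
{
	return fake_tick(ctx);
}
```

With timestamps 0, 90, 150, 151 and a device that becomes ready at 95, the status read at 90 fails, the clock read at 150 is past a deadline of 100, and only the final re-check avoids reporting a fake timeout.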
[PATCH V3] kvm:x86 :remove unnecessary recalculate_apic_map
In the previous code, the variable apic_sw_disabled influences recalculate_apic_map. But in "KVM: x86: simplify kvm_apic_map" (commit:3b5a5ffa928a3f875b0d5dd284eeb7c322e1688a), the access to apic_sw_disabled in recalculate_apic_map has been deleted. Signed-off-by: Peng Hao --- arch/x86/kvm/lapic.c | 5 ++--- 1 file changed, 2 insertions(+), 3 deletions(-) diff --git a/arch/x86/kvm/lapic.c b/arch/x86/kvm/lapic.c index fbb0e6d..a11fbf9 100644 --- a/arch/x86/kvm/lapic.c +++ b/arch/x86/kvm/lapic.c @@ -246,10 +246,9 @@ static inline void apic_set_spiv(struct kvm_lapic *apic, u32 val) if (enabled != apic->sw_enabled) { apic->sw_enabled = enabled; - if (enabled) { + if (enabled) static_key_slow_dec_deferred(_sw_disabled); - recalculate_apic_map(apic->vcpu->kvm); - } else + else static_key_slow_inc(_sw_disabled.key); } } -- 1.8.3.1
[PATCH] squashfs: enable __GFP_FS in ->readpage to prevent hang in mem alloc
There is no need to disable __GFP_FS in ->readpage: * It's a read-only fs, so there will be no dirty/writeback page and there will be no deadlock against the caller's locked page * It just allocates one page, so compaction will not be invoked * It doesn't take any inode lock, so the reclamation of inode will be fine And no __GFP_FS may lead to hang in __alloc_pages_slowpath() if a squashfs page fault occurs in the context of a memory hogger, because the hogger will not be killed due to the logic in __alloc_pages_may_oom(). Signed-off-by: Hou Tao --- fs/squashfs/file.c | 3 ++- fs/squashfs/file_direct.c | 4 +++- fs/squashfs/squashfs_fs_f.h | 25 + 3 files changed, 30 insertions(+), 2 deletions(-) create mode 100644 fs/squashfs/squashfs_fs_f.h diff --git a/fs/squashfs/file.c b/fs/squashfs/file.c index f1c1430ae721..8603dda4a719 100644 --- a/fs/squashfs/file.c +++ b/fs/squashfs/file.c @@ -51,6 +51,7 @@ #include "squashfs_fs.h" #include "squashfs_fs_sb.h" #include "squashfs_fs_i.h" +#include "squashfs_fs_f.h" #include "squashfs.h" /* @@ -414,7 +415,7 @@ void squashfs_copy_cache(struct page *page, struct squashfs_cache_entry *buffer, TRACE("bytes %d, i %d, available_bytes %d\n", bytes, i, avail); push_page = (i == page->index) ? page : - grab_cache_page_nowait(page->mapping, i); + squashfs_grab_cache_page_nowait(page->mapping, i); if (!push_page) continue; diff --git a/fs/squashfs/file_direct.c b/fs/squashfs/file_direct.c index 80db1b86a27c..a0fdd6215348 100644 --- a/fs/squashfs/file_direct.c +++ b/fs/squashfs/file_direct.c @@ -17,6 +17,7 @@ #include "squashfs_fs.h" #include "squashfs_fs_sb.h" #include "squashfs_fs_i.h" +#include "squashfs_fs_f.h" #include "squashfs.h" #include "page_actor.h" @@ -60,7 +61,8 @@ int squashfs_readpage_block(struct page *target_page, u64 block, int bsize, /* Try to grab all the pages covered by the Squashfs block */ for (missing_pages = 0, i = 0, n = start_index; i < pages; i++, n++) { page[i] = (n == target_page->index) ? 
target_page : - grab_cache_page_nowait(target_page->mapping, n); + squashfs_grab_cache_page_nowait( + target_page->mapping, n); if (page[i] == NULL) { missing_pages++; diff --git a/fs/squashfs/squashfs_fs_f.h b/fs/squashfs/squashfs_fs_f.h new file mode 100644 index ..fc5fb7aeb27d --- /dev/null +++ b/fs/squashfs/squashfs_fs_f.h @@ -0,0 +1,25 @@ +/* SPDX-License-Identifier: GPL-2.0 */ +#ifndef SQUASHFS_FS_F +#define SQUASHFS_FS_F + +/* + * No need to use FGP_NOFS here: + * 1. It's a read-only fs, so there will be no dirty/writeback page and + *there will be no deadlock against the caller's locked page. + * 2. It just allocates one page, so compaction will not be invoked. + * 3. It doesn't take any inode lock, so the reclamation of inode + *will be fine. + * + * And GFP_NOFS may lead to infinite loop in __alloc_pages_slowpath() if a + * squashfs page fault occurs in the context of a memory hogger, because + * the hogger will not be killed due to the logic in __alloc_pages_may_oom(). + */ +static inline struct page * +squashfs_grab_cache_page_nowait(struct address_space *mapping, pgoff_t index) +{ + return pagecache_get_page(mapping, index, + FGP_LOCK|FGP_CREAT|FGP_NOWAIT, + mapping_gfp_mask(mapping)); +} +#endif + -- 2.16.2.dirty
Re: [PATCH] pinctrl: meson: fix G12A ao pull registers base address
On 2018/12/3 18:27, Neil Armstrong wrote: Hi Xingyu, On 03/12/2018 04:05, Xingyu Chen wrote: Since Meson G12A SoC, Introduce new ao registers AO_RTI_PULL_UP_EN_REG and AO_GPIO_O. These bits of controlling output level are remapped to the new register AO_GPIO_O, and the AO_GPIO_O_EN_N support only controlling output enable. These bits of controlling pull enable are remapped to the new register AO_RTI_PULL_UP_EN_REG, and the AO_RTI_PULL_UP_REG support only controlling pull type(up/down). The new layout of ao gpio/pull registers is as follows: - AO_GPIO_O_EN_N[offset: 0x9 << 2] - AO_GPIO_I [offset: 0xa << 2] - AO_RTI_PULL_UP_REG[offset: 0xb << 2] - AO_RTI_PULL_UP_EN_REG [offset: 0xc << 2] - AO_GPIO_O [offset: 0xd << 2] From above, we can see ao GPIO registers region has been separated by the ao pull registers. In order to ensure the continuity of the region on software, the ao GPIO and ao pull registers use the same base address, but can be identified by the offset. Fixes: 29ae0952e85f ("pinctrl: meson-g12a: add pinctrl driver support") Signed-off-by: Xingyu Chen Signed-off-by: Jianxin Pan --- drivers/pinctrl/meson/pinctrl-meson.c | 22 -- 1 file changed, 12 insertions(+), 10 deletions(-) diff --git a/drivers/pinctrl/meson/pinctrl-meson.c b/drivers/pinctrl/meson/pinctrl-meson.c index 53d449076dee..7ff40cd7a0cb 100644 --- a/drivers/pinctrl/meson/pinctrl-meson.c +++ b/drivers/pinctrl/meson/pinctrl-meson.c @@ -31,6 +31,9 @@ * In some cases the register ranges for pull enable and pull * direction are the same and thus there are only 3 register ranges. * + * Since Meson G12A SoC, the ao register ranges for gpio, pull enable + * and pull direction are the same, so there are only 2 register ranges. + * * For the pull and GPIO configuration every bank uses a contiguous * set of bits in the register sets described above; the same register * can be shared by more banks with different offsets. 
@@ -487,23 +490,22 @@ static int meson_pinctrl_parse_dt(struct meson_pinctrl *pc, return PTR_ERR(pc->reg_mux); } - pc->reg_pull = meson_map_resource(pc, gpio_np, "pull"); - if (IS_ERR(pc->reg_pull)) { - dev_err(pc->dev, "pull registers not found\n"); - return PTR_ERR(pc->reg_pull); + pc->reg_gpio = meson_map_resource(pc, gpio_np, "gpio"); + if (IS_ERR(pc->reg_gpio)) { + dev_err(pc->dev, "gpio registers not found\n"); + return PTR_ERR(pc->reg_gpio); } + pc->reg_pull = meson_map_resource(pc, gpio_np, "pull"); + /* Use gpio region if pull one is not present */ + if (IS_ERR(pc->reg_pull)) + pc->reg_pull = pc->reg_gpio; + pc->reg_pullen = meson_map_resource(pc, gpio_np, "pull-enable"); /* Use pull region if pull-enable one is not present */ if (IS_ERR(pc->reg_pullen)) pc->reg_pullen = pc->reg_pull; - pc->reg_gpio = meson_map_resource(pc, gpio_np, "gpio"); - if (IS_ERR(pc->reg_gpio)) { - dev_err(pc->dev, "gpio registers not found\n"); - return PTR_ERR(pc->reg_gpio); - } - return 0; } Doesn't it need an update of the bindings ? Neil Thanks, I will update it in next patch .
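Editorial note: the lookup order introduced by the patch above (gpio is mandatory, pull falls back to gpio, pull-enable falls back to pull) can be sketched in userspace C. This is only an illustration under stated assumptions: struct region, struct pinctrl_regs, find_region() and resolve_regs() are hypothetical stand-ins for the device-tree lookup done by meson_map_resource(), and a missing region is modelled as NULL instead of an ERR_PTR value.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for a named register range from the device tree. */
struct region {
	const char *name;
	void *base;
};

struct pinctrl_regs {
	void *gpio;
	void *pull;
	void *pullen;
};

static void *find_region(const struct region *r, size_t n, const char *name)
{
	size_t i;

	for (i = 0; i < n; i++)
		if (strcmp(r[i].name, name) == 0)
			return r[i].base;
	return NULL; /* not present */
}

/* Returns 0 on success, -1 when the mandatory "gpio" range is missing. */
static int resolve_regs(const struct region *r, size_t n,
			struct pinctrl_regs *out)
{
	assert(out != NULL);

	out->gpio = find_region(r, n, "gpio");
	if (!out->gpio)
		return -1;

	/* Use the gpio range if the pull one is not present. */
	out->pull = find_region(r, n, "pull");
	if (!out->pull)
		out->pull = out->gpio;

	/* Use the pull range if the pull-enable one is not present. */
	out->pullen = find_region(r, n, "pull-enable");
	if (!out->pullen)
		out->pullen = out->pull;

	return 0;
}
```

Because gpio is resolved first, a node that only describes "gpio" ends up with all three pointers sharing one base, which is the G12A ao case the commit message describes.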
Re: [PATCH] printk: don't unconditionally shortcut print_time()
On (12/02/18 14:02), Tetsuo Handa wrote:
>
> @@ -1541,11 +1545,13 @@ int do_syslog(int type, char __user *buf, int len, int source)
>  } else {
>          u64 seq = syslog_seq;
>          u32 idx = syslog_idx;
> +        bool f = syslog_partial ? syslog_time : printk_time;
              ^^
>          while (seq < log_next_seq) {
>                  struct printk_log *msg = log_from_idx(idx);
>
> -                error += msg_print_text(msg, true, NULL, 0);
> +                error += msg_print_text(msg, true, f, NULL, 0);
> +                f = printk_time;
                  ^^^
>                  idx = log_next(idx);
>                  seq++;

Can we please have something better than 'f'?

-ss
Re: [PATCH 2/2] Input: omap-keypad: Fix idle configuration to not block SoC idle states
Hi Tony, On Mon, Dec 03, 2018 at 03:12:51PM -0800, Tony Lindgren wrote: > > With PM enabled, I noticed that pressing a key on the droid4 keyboard will > block deeper idle states for the SoC. Looks like we can fix this by > managing the idle register to gether with the interrupt similar to what > we already do for the GPIO controller. Can you show me where exactly we are doing this? I can't seem to find the matching code. Thanks! > > And there's no need to keep enabling and disabling interrupts and > wake-up events for normal use if we use IRQF_ONESHOT as suggested by > Dmitry Torokhov so let's do that too. > > Cc: Axel Haslam > Cc: Illia Smyrnov > Cc: Marcel Partap > Cc: Merlijn Wajer > Cc: Michael Scott > Cc: NeKit > Cc: Pavel Machek > Cc: Sebastian Reichel > Reported-by: Pavel Machek > Signed-off-by: Tony Lindgren > --- > drivers/input/keyboard/omap4-keypad.c | 30 ++- > 1 file changed, 11 insertions(+), 19 deletions(-) > > diff --git a/drivers/input/keyboard/omap4-keypad.c > b/drivers/input/keyboard/omap4-keypad.c > --- a/drivers/input/keyboard/omap4-keypad.c > +++ b/drivers/input/keyboard/omap4-keypad.c > @@ -53,11 +53,12 @@ > /* OMAP4 bit definitions */ > #define OMAP4_DEF_IRQENABLE_EVENTEN BIT(0) > #define OMAP4_DEF_IRQENABLE_LONGKEY BIT(1) > -#define OMAP4_DEF_WUP_EVENT_ENA BIT(0) > -#define OMAP4_DEF_WUP_LONG_KEY_ENA BIT(1) > #define OMAP4_DEF_CTRL_NOSOFTMODEBIT(1) > #define OMAP4_DEF_CTRL_PTV_SHIFT 2 > > +#define OMAP4_KBD_IRQ_MASK (OMAP4_DEF_IRQENABLE_LONGKEY | \ > + OMAP4_DEF_IRQENABLE_EVENTEN) > + > /* OMAP4 values */ > #define OMAP4_VAL_IRQDISABLE 0x0 > > @@ -126,12 +127,8 @@ static irqreturn_t omap4_keypad_irq_handler(int irq, > void *dev_id) > { > struct omap4_keypad *keypad_data = dev_id; > > - if (kbd_read_irqreg(keypad_data, OMAP4_KBD_IRQSTATUS)) { > - /* Disable interrupts */ > - kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQENABLE, > - OMAP4_VAL_IRQDISABLE); > + if (kbd_read_irqreg(keypad_data, OMAP4_KBD_IRQSTATUS)) > return IRQ_WAKE_THREAD; > - } > 
> return IRQ_NONE; > } > @@ -173,11 +170,6 @@ static irqreturn_t omap4_keypad_irq_thread_fn(int irq, > void *dev_id) > kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQSTATUS, >kbd_read_irqreg(keypad_data, OMAP4_KBD_IRQSTATUS)); > > - /* enable interrupts */ > - kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQENABLE, > - OMAP4_DEF_IRQENABLE_EVENTEN | > - OMAP4_DEF_IRQENABLE_LONGKEY); > - > return IRQ_HANDLED; > } > > @@ -197,11 +189,10 @@ static int omap4_keypad_open(struct input_dev *input) > /* clear pending interrupts */ > kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQSTATUS, >kbd_read_irqreg(keypad_data, OMAP4_KBD_IRQSTATUS)); > - kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQENABLE, > - OMAP4_DEF_IRQENABLE_EVENTEN | > - OMAP4_DEF_IRQENABLE_LONGKEY); > - kbd_writel(keypad_data, OMAP4_KBD_WAKEUPENABLE, > - OMAP4_DEF_WUP_EVENT_ENA | OMAP4_DEF_WUP_LONG_KEY_ENA); > + > + /* enable interrupts and wake-up events */ > + kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQENABLE, OMAP4_KBD_IRQ_MASK); > + kbd_writel(keypad_data, OMAP4_KBD_WAKEUPENABLE, OMAP4_KBD_IRQ_MASK); > > enable_irq(keypad_data->irq); > > @@ -214,9 +205,10 @@ static void omap4_keypad_close(struct input_dev *input) > > disable_irq(keypad_data->irq); > > - /* Disable interrupts */ > + /* Disable interrupts and wake-up events */ > kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQENABLE, >OMAP4_VAL_IRQDISABLE); > + kbd_writel(keypad_data, OMAP4_KBD_WAKEUPENABLE, 0); > > /* clear pending interrupts */ > kbd_write_irqreg(keypad_data, OMAP4_KBD_IRQSTATUS, > @@ -365,7 +357,7 @@ static int omap4_keypad_probe(struct platform_device > *pdev) > } > > error = request_threaded_irq(keypad_data->irq, omap4_keypad_irq_handler, > - omap4_keypad_irq_thread_fn, 0, > + omap4_keypad_irq_thread_fn, IRQF_ONESHOT, >"omap4-keypad", keypad_data); > if (error) { > dev_err(>dev, "failed to register interrupt\n"); > -- > 2.19.2 -- Dmitry
[PATCH v5 7/8] arm64: dts: sdm845: Add rpmh powercontroller node
Add the DT node for the rpmhpd powercontroller. Signed-off-by: Rajendra Nayak --- arch/arm64/boot/dts/qcom/sdm845.dtsi | 51 1 file changed, 51 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/sdm845.dtsi b/arch/arm64/boot/dts/qcom/sdm845.dtsi index b72bdb0a31a5..a6d0cd8d17b0 100644 --- a/arch/arm64/boot/dts/qcom/sdm845.dtsi +++ b/arch/arm64/boot/dts/qcom/sdm845.dtsi @@ -10,6 +10,7 @@ #include #include #include +#include #include #include @@ -1324,6 +1325,56 @@ compatible = "qcom,sdm845-rpmh-clk"; #clock-cells = <1>; }; + + rpmhpd: power-controller { + compatible = "qcom,sdm845-rpmhpd"; + #power-domain-cells = <1>; + operating-points-v2 = <_opp_table>; + }; + + rpmhpd_opp_table: opp-table { + compatible = "operating-points-v2-qcom-level"; + + rpmhpd_opp_ret: opp1 { + qcom,level = ; + }; + + rpmhpd_opp_min_svs: opp2 { + qcom,level = ; + }; + + rpmhpd_opp_low_svs: opp3 { + qcom,level = ; + }; + + rpmhpd_opp_svs: opp4 { + qcom,level = ; + }; + + rpmhpd_opp_svs_l1: opp5 { + qcom,level = ; + }; + + rpmhpd_opp_nom: opp6 { + qcom,level = ; + }; + + rpmhpd_opp_nom_l1: opp7 { + qcom,level = ; + }; + + rpmhpd_opp_nom_l2: opp8 { + qcom,level = ; + }; + + rpmhpd_opp_turbo: opp9 { + qcom,level = ; + }; + + rpmhpd_opp_turbo_l1: opp10 { + qcom,level = ; + }; + }; }; intc: interrupt-controller@17a0 { -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v5 5/8] arm64: dts: msm8996: Add rpmpd device node
Add rpmpd device node and its OPP table Signed-off-by: Rajendra Nayak Signed-off-by: Viresh Kumar Reviewed-by: Ulf Hansson --- arch/arm64/boot/dts/qcom/msm8996.dtsi | 34 +++ 1 file changed, 34 insertions(+) diff --git a/arch/arm64/boot/dts/qcom/msm8996.dtsi b/arch/arm64/boot/dts/qcom/msm8996.dtsi index b29fe80d7288..ed35f0ced699 100644 --- a/arch/arm64/boot/dts/qcom/msm8996.dtsi +++ b/arch/arm64/boot/dts/qcom/msm8996.dtsi @@ -306,6 +306,40 @@ #clock-cells = <1>; }; + rpmpd: power-controller { + compatible = "qcom,msm8996-rpmpd"; + #power-domain-cells = <1>; + operating-points-v2 = <_opp_table>; + }; + + rpmpd_opp_table: opp-table { + compatible = "operating-points-v2-qcom-level"; + + rpmpd_opp1: opp1 { + qcom,level = <1>; + }; + + rpmpd_opp2: opp2 { + qcom,level = <2>; + }; + + rpmpd_opp3: opp3 { + qcom,level = <3>; + }; + + rpmpd_opp4: opp4 { + qcom,level = <4>; + }; + + rpmpd_opp5: opp5 { + qcom,level = <5>; + }; + + rpmpd_opp6: opp6 { + qcom,level = <6>; + }; + }; + pm8994-regulators { compatible = "qcom,rpm-pm8994-regulators"; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v5 6/8] soc: qcom: rpmhpd: Add RPMh Power domain driver
The RPMh Power domain driver aggregates the corner votes from various consumers for the ARC resources and communicates them to RPMh.

With RPMh we use two different numbering spaces for corners: one used by the clients to express their performance needs, and another used to communicate with the RPMh hardware. The clients express their performance requirements using a sparse numbering space whose values map to meaningful levels like RET, SVS, NOMINAL, TURBO etc. These are then mapped to another number space between 0 and 15, which is communicated to RPMh. The sparse number space, also referred to as vlvl, is mapped to the continuous number space of 0 to 15, also referred to as hlvl, using command DB.

Some power domain clients could request a performance state only while the CPU is active, while others could request a certain performance state all the time, regardless of the state of the CPU. We handle this by internally aggregating the votes from both types of clients and then sending the aggregated votes to RPMh.

There are also three different types of votes that are communicated to RPMh for every resource:
1. ACTIVE_ONLY: This specifies the requirement for the resource when the CPU is active
2. SLEEP: This specifies the requirement for the resource when the CPU is going to sleep
3. WAKE_ONLY: This specifies the requirement for the resource when the CPU is coming out of sleep to active state

We add data for all power domains on the sdm845 SoC as part of the patch.
The driver can be extended to support other SoCs which support RPMh Signed-off-by: Rajendra Nayak Reviewed-by: Ulf Hansson --- drivers/soc/qcom/Kconfig | 9 + drivers/soc/qcom/Makefile | 1 + drivers/soc/qcom/rpmhpd.c | 431 ++ 3 files changed, 441 insertions(+) create mode 100644 drivers/soc/qcom/rpmhpd.c diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig index e9b60695f6e7..a51458022d21 100644 --- a/drivers/soc/qcom/Kconfig +++ b/drivers/soc/qcom/Kconfig @@ -103,6 +103,15 @@ config QCOM_RPMH of hardware components aggregate requests for these resources and help apply the aggregated state on the resource. +config QCOM_RPMHPD + bool "Qualcomm RPMh Power domain driver" + depends on QCOM_RPMH && QCOM_COMMAND_DB + help + QCOM RPMh Power domain driver to support power-domains with + performance states. The driver communicates a performance state + value to RPMh which then translates it into corresponding voltage + for the voltage rail. + config QCOM_RPMPD bool "Qualcomm RPM Power domain driver" depends on MFD_QCOM_RPM && QCOM_SMD_RPM diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile index f1b25fdcf2ad..dd6ca92985ee 100644 --- a/drivers/soc/qcom/Makefile +++ b/drivers/soc/qcom/Makefile @@ -22,3 +22,4 @@ obj-$(CONFIG_QCOM_APR) += apr.o obj-$(CONFIG_QCOM_LLCC) += llcc-slice.o obj-$(CONFIG_QCOM_SDM845_LLCC) += llcc-sdm845.o obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o +obj-$(CONFIG_QCOM_RPMHPD) += rpmhpd.o diff --git a/drivers/soc/qcom/rpmhpd.c b/drivers/soc/qcom/rpmhpd.c new file mode 100644 index ..10b45b4f4588 --- /dev/null +++ b/drivers/soc/qcom/rpmhpd.c @@ -0,0 +1,431 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2018, The Linux Foundation. 
All rights reserved.*/ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#define domain_to_rpmhpd(domain) container_of(domain, struct rpmhpd, pd) + +/* + * This is the number of bytes used for each command DB aux data entry of an + * ARC resource. + */ +#define RPMH_ARC_LEVEL_SIZE2 +#define RPMH_ARC_MAX_LEVELS16 + +/** + * struct rpmhpd - top level RPMh power domain resource data structure + * @dev: rpmh power domain controller device + * @pd:generic_pm_domain corrresponding to the power domain + * @peer: A peer power domain in case Active only Voting is supported + * @active_only: True if it represents an Active only peer + * @level: An array of level (vlvl) to corner (hlvl) mappings derived from cmd-db + * @level_count: Number of levels supported by the power domain. max being 16 (0 - 15) + * @enabled: true if the power domain is enabled + * @res_name: Resource name used for cmd-db lookup + * @addr: Resource address as looped up using resource name from cmd-db + */ +struct rpmhpd { + struct device *dev; + struct generic_pm_domain pd; + struct generic_pm_domain *parent; + struct rpmhpd *peer; + const bool active_only; + unsigned intcorner; + unsigned intactive_corner; + u32 level[RPMH_ARC_MAX_LEVELS]; + int level_count; + boolenabled; + const char *res_name; + u32 addr; +}; +
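Editorial note: the vlvl to hlvl mapping described in the commit message above can be sketched as a lookup over the per-domain level[] table. This is a minimal sketch under assumptions, not the driver code (which is truncated here): vlvl_to_hlvl() is a hypothetical name, the table is assumed to be sorted in ascending order with the array index acting as hlvl, and a request above the highest entry is clamped to the highest corner, as the v4 changelog of this series describes for INT_MAX requests.

```c
#include <assert.h>
#include <stddef.h>

#define ARC_MAX_LEVELS 16 /* mirrors RPMH_ARC_MAX_LEVELS in the patch */

/*
 * Map a sparse level (vlvl) to the index (hlvl) of the first table entry
 * that is at least as high as the request. level[] is assumed to be sorted
 * in ascending order; its contents would come from command DB.
 */
static unsigned int vlvl_to_hlvl(const unsigned int *level, size_t level_count,
				 unsigned int vlvl)
{
	size_t i;

	assert(level_count > 0 && level_count <= ARC_MAX_LEVELS);

	for (i = 0; i < level_count; i++)
		if (vlvl <= level[i])
			break;

	/* A request above every entry is clamped to the highest corner. */
	if (i == level_count)
		i--;

	return (unsigned int)i;
}
```

For an illustrative table { 0, 16, 64, 128, 256, 384 } (made-up values), a request of 100 lands on index 3, the first entry that satisfies it.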
[PATCH v5 3/8] soc: qcom: rpmpd: Add a Power domain driver to model corners
The Power domains for corners just pass the performance state set by the consumers to the RPM (Remote Power manager) which then takes care of setting the appropriate voltage on the corresponding rails to meet the performance needs. We add all Power domain data needed on msm8996 here. This driver can easily be extended by adding data for other qualcomm SoCs as well. Signed-off-by: Rajendra Nayak Signed-off-by: Viresh Kumar Reviewed-by: Ulf Hansson Acked-by: Rob Herring --- drivers/soc/qcom/Kconfig | 9 ++ drivers/soc/qcom/Makefile | 1 + drivers/soc/qcom/rpmpd.c | 294 ++ 3 files changed, 304 insertions(+) create mode 100644 drivers/soc/qcom/rpmpd.c diff --git a/drivers/soc/qcom/Kconfig b/drivers/soc/qcom/Kconfig index 684cb51694d1..e9b60695f6e7 100644 --- a/drivers/soc/qcom/Kconfig +++ b/drivers/soc/qcom/Kconfig @@ -103,6 +103,15 @@ config QCOM_RPMH of hardware components aggregate requests for these resources and help apply the aggregated state on the resource. +config QCOM_RPMPD + bool "Qualcomm RPM Power domain driver" + depends on MFD_QCOM_RPM && QCOM_SMD_RPM + help + QCOM RPM Power domain driver to support power-domains with + performance states. The driver communicates a performance state + value to RPM which then translates it into corresponding voltage + for the voltage rail. 
+ config QCOM_SMEM tristate "Qualcomm Shared Memory Manager (SMEM)" depends on ARCH_QCOM || COMPILE_TEST diff --git a/drivers/soc/qcom/Makefile b/drivers/soc/qcom/Makefile index f25b54cd6cf8..f1b25fdcf2ad 100644 --- a/drivers/soc/qcom/Makefile +++ b/drivers/soc/qcom/Makefile @@ -21,3 +21,4 @@ obj-$(CONFIG_QCOM_WCNSS_CTRL) += wcnss_ctrl.o obj-$(CONFIG_QCOM_APR) += apr.o obj-$(CONFIG_QCOM_LLCC) += llcc-slice.o obj-$(CONFIG_QCOM_SDM845_LLCC) += llcc-sdm845.o +obj-$(CONFIG_QCOM_RPMPD) += rpmpd.o diff --git a/drivers/soc/qcom/rpmpd.c b/drivers/soc/qcom/rpmpd.c new file mode 100644 index ..a0b9f122d793 --- /dev/null +++ b/drivers/soc/qcom/rpmpd.c @@ -0,0 +1,294 @@ +// SPDX-License-Identifier: GPL-2.0 +/* Copyright (c) 2017-2018, The Linux Foundation. All rights reserved. */ + +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include +#include + +#include +#include + +#define domain_to_rpmpd(domain) container_of(domain, struct rpmpd, pd) + +/* Resource types */ +#define RPMPD_SMPA 0x61706d73 +#define RPMPD_LDOA 0x616f646c + +/* Operation Keys */ +#define KEY_CORNER 0x6e726f63 /* corn */ +#define KEY_ENABLE 0x6e657773 /* swen */ +#define KEY_FLOOR_CORNER 0x636676 /* vfc */ + +#define DEFINE_RPMPD_CORN_SMPA(_platform, _name, _active, r_id) \ + static struct rpmpd _platform##_##_active; \ + static struct rpmpd _platform##_##_name = { \ + .pd = { .name = #_name, }, \ + .peer = &_platform##_##_active, \ + .res_type = RPMPD_SMPA, \ + .res_id = r_id, \ + .key = KEY_CORNER, \ + }; \ + static struct rpmpd _platform##_##_active = { \ + .pd = { .name = #_active, },\ + .peer = &_platform##_##_name, \ + .active_only = true,\ + .res_type = RPMPD_SMPA, \ + .res_id = r_id, \ + .key = KEY_CORNER, \ + } + +#define DEFINE_RPMPD_CORN_LDOA(_platform, _name, r_id) \ + static struct rpmpd _platform##_##_name = { \ + .pd = { .name = #_name, }, \ + .res_type = RPMPD_LDOA, \ + .res_id = r_id, \ + .key = KEY_CORNER, \ + } + +#define 
DEFINE_RPMPD_VFC(_platform, _name, r_id, r_type) \ + static struct rpmpd _platform##_##_name = { \ + .pd = { .name = #_name, }, \ + .res_type = r_type, \ + .res_id = r_id, \ + .key = KEY_FLOOR_CORNER,\ + } + +#define DEFINE_RPMPD_VFC_SMPA(_platform, _name, r_id) \ + DEFINE_RPMPD_VFC(_platform, _name, r_id, RPMPD_SMPA) +
[PATCH v5 4/8] soc: qcom: rpmpd: Add support for get/set performance state
Add support for the .set_performace_state() and .opp_to_performance_state() callbacks in the rpmpd driver. Signed-off-by: Rajendra Nayak Signed-off-by: Viresh Kumar Reviewed-by: Ulf Hansson --- drivers/soc/qcom/rpmpd.c | 46 1 file changed, 46 insertions(+) diff --git a/drivers/soc/qcom/rpmpd.c b/drivers/soc/qcom/rpmpd.c index a0b9f122d793..eb1cfa6a03d6 100644 --- a/drivers/soc/qcom/rpmpd.c +++ b/drivers/soc/qcom/rpmpd.c @@ -12,6 +12,7 @@ #include #include #include +#include #include #include @@ -28,6 +29,8 @@ #define KEY_ENABLE 0x6e657773 /* swen */ #define KEY_FLOOR_CORNER 0x636676 /* vfc */ +#define MAX_RPMPD_STATE6 + #define DEFINE_RPMPD_CORN_SMPA(_platform, _name, _active, r_id) \ static struct rpmpd _platform##_##_active; \ static struct rpmpd _platform##_##_name = { \ @@ -221,6 +224,47 @@ static int rpmpd_power_off(struct generic_pm_domain *domain) return ret; } +static int rpmpd_set_performance(struct generic_pm_domain *domain, +unsigned int state) +{ + int ret = 0; + struct rpmpd *pd = domain_to_rpmpd(domain); + + mutex_lock(_lock); + + if (state > MAX_RPMPD_STATE) + goto out; + + pd->corner = state; + + if (!pd->enabled && (pd->key != KEY_FLOOR_CORNER)) + goto out; + + ret = rpmpd_aggregate_corner(pd); + +out: + mutex_unlock(_lock); + + return ret; +} + +static unsigned int rpmpd_get_performance(struct generic_pm_domain *genpd, + struct dev_pm_opp *opp) +{ + struct device_node *np; + unsigned int corner = 0; + + np = dev_pm_opp_get_of_node(opp); + if (of_property_read_u32(np, "qcom,level", )) { + pr_err("%s: missing 'qcom,level' property\n", __func__); + return 0; + } + + of_node_put(np); + + return corner; +} + static int rpmpd_probe(struct platform_device *pdev) { int i; @@ -261,6 +305,8 @@ static int rpmpd_probe(struct platform_device *pdev) rpmpds[i]->rpm = rpm; rpmpds[i]->pd.power_off = rpmpd_power_off; rpmpds[i]->pd.power_on = rpmpd_power_on; + rpmpds[i]->pd.set_performance_state = rpmpd_set_performance; + rpmpds[i]->pd.opp_to_performance_state = 
rpmpd_get_performance; pm_genpd_init([i]->pd, NULL, true); data->domains[i] = [i]->pd; -- QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
[PATCH v5 0/8] Add power domain driver for corners on msm8996/sdm845
Hi Rob,

This series is mainly pending your review/ack for the DT parts. The rest of the genpd parts are reviewed and acked by both Viresh and Ulf.

Changes in v5:
* First 6 patches are unchanged
* Patch 7/8 adds the DT node for the rpmh power-controller on sdm845 and the corresponding OPP tables for it to describe the performance states
* Patch 8/8 adds a parent/child relationship across mx/cx and mx_ao/cx_ao as needed on the sdm845 platform. This patch depends on the series from Viresh [1], which adds support to propagate performance states across the power domain hierarchy and is still being reviewed

Changes in v4:
* Included the patch to add qcom-opp bindings (dropped accidentally in v3)
* Merged the patches to add bindings for rpm and rpmh, added consumer binding example
* Made the drivers built-in, removed .remove
* Added better description in changelog for PATCH 6/6
* Updated rpmhpd_aggregate_corner() based on David's feedback
* rpmhpd_set_performance_state() returns max corner in cases where it's called with an INT_MAX
* Dropped the patch to max vote on all corners at init; the patch did not work anyway, and it shouldn't be needed now

Changes in v3:
* Bindings split into separate patches
* Bindings updated to remove duplicate OPP table phandles
* DT headers defining macros for Power domain indexes and OPP levels
* Optimisations to use rpmh_write_async() wherever applicable
* Fixed up handling of ACTIVE_ONLY/WAKE_ONLY/SLEEP voting for RPMh
* Fixed the vlvl to hlvl conversions in set_performance
* Other minor fixes based on review of v2
* TODO: This series does not handle the case where all VDD_MX votes should be higher than VDD_CX from APPs, as pointed out by David Collins in v2. This needs support at genpd to propagate performance state up the parents, if we model these as Parent/Child to handle the interdependency.

Changes in v2:
* Added a power domain driver for sdm845 which supports communicating to RPMh
* Dropped the changes to the sdhc driver to move over to using OPP, as there is active discussion on using OPP as the interface vs handling all of it in clock drivers
* Other minor binding updates based on review of v1

With performance state support for genpd/OPP merged, this is an effort to model a power domain driver to communicate corner/level values for Qualcomm platforms to RPM (Remote Power Manager) and RPMh.

[1] https://lkml.org/lkml/2018/11/26/333

Rajendra Nayak (8):
  dt-bindings: opp: Introduce qcom-opp bindings
  dt-bindings: power: Add qcom rpm power domain driver bindings
  soc: qcom: rpmpd: Add a Power domain driver to model corners
  soc: qcom: rpmpd: Add support for get/set performance state
  arm64: dts: msm8996: Add rpmpd device node
  soc: qcom: rpmhpd: Add RPMh Power domain driver
  arm64: dts: sdm845: Add rpmh powercontroller node
  soc: qcom: rpmhpd: Mark mx as a parent for cx

 .../devicetree/bindings/opp/qcom-opp.txt | 25 +
 .../devicetree/bindings/power/qcom,rpmpd.txt | 146 ++
 arch/arm64/boot/dts/qcom/msm8996.dtsi | 34 ++
 arch/arm64/boot/dts/qcom/sdm845.dtsi | 51 ++
 drivers/soc/qcom/Kconfig | 18 +
 drivers/soc/qcom/Makefile | 2 +
 drivers/soc/qcom/rpmhpd.c | 442 ++
 drivers/soc/qcom/rpmpd.c | 340 ++
 include/dt-bindings/power/qcom-rpmpd.h | 39 ++
 9 files changed, 1097 insertions(+)
 create mode 100644 Documentation/devicetree/bindings/opp/qcom-opp.txt
 create mode 100644 Documentation/devicetree/bindings/power/qcom,rpmpd.txt
 create mode 100644 drivers/soc/qcom/rpmhpd.c
 create mode 100644 drivers/soc/qcom/rpmpd.c
 create mode 100644 include/dt-bindings/power/qcom-rpmpd.h

-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum, hosted by The Linux Foundation
Re: [PATCH] jbd2: clean up indentation issue, replace spaces with tab
On Mon, Nov 26, 2018 at 02:56:32PM +0100, Jan Kara wrote:
> On Fri 23-11-18 16:40:53, Colin King wrote:
> > From: Colin Ian King
> >
> > There is a statement that is indented with spaces, replace it with
> > a tab.
> >
> > Signed-off-by: Colin Ian King
>
> Looks good. You can add:
>
> Reviewed-by: Jan Kara

Thanks, applied.

- Ted