Re: [PATCH 1/2] mm: introduce put_user_page*(), placeholder versions

2018-12-03 Thread Mike Rapoport
Hi John,

Thanks for having documentation as a part of the patch. Some kernel-doc
nits below.

On Mon, Dec 03, 2018 at 04:17:19PM -0800, john.hubb...@gmail.com wrote:
> From: John Hubbard 
> 
> Introduces put_user_page(), which simply calls put_page().
> This provides a way to update all get_user_pages*() callers,
> so that they call put_user_page(), instead of put_page().
> 
> Also introduces put_user_pages(), and a few dirty/locked variations,
> as a replacement for release_pages(), and also as a replacement
> for open-coded loops that release multiple pages.
> These may be used for subsequent performance improvements,
> via batching of pages to be released.
> 
> This is the first step of fixing the problem described in [1]. The steps
> are:
> 
> 1) (This patch): provide put_user_page*() routines, intended to be used
>for releasing pages that were pinned via get_user_pages*().
> 
> 2) Convert all of the call sites for get_user_pages*(), to
>invoke put_user_page*(), instead of put_page(). This involves dozens of
>call sites, and will take some time.
> 
> 3) After (2) is complete, use get_user_pages*() and put_user_page*() to
>implement tracking of these pages. This tracking will be separate from
>the existing struct page refcounting.
> 
> 4) Use the tracking and identification of these pages, to implement
>special handling (especially in writeback paths) when the pages are
>backed by a filesystem. Again, [1] provides details as to why that is
>desirable.
> 
> [1] https://lwn.net/Articles/753027/ : "The Trouble with get_user_pages()"
> 
> Reviewed-by: Jan Kara 
> 
> Cc: Matthew Wilcox 
> Cc: Michal Hocko 
> Cc: Christopher Lameter 
> Cc: Jason Gunthorpe 
> Cc: Dan Williams 
> Cc: Jan Kara 
> Cc: Al Viro 
> Cc: Jerome Glisse 
> Cc: Christoph Hellwig 
> Cc: Ralph Campbell 
> Signed-off-by: John Hubbard 
> ---
>  include/linux/mm.h | 20 
>  mm/swap.c  | 80 ++
>  2 files changed, 100 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 5411de93a363..09fbb2c81aba 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -963,6 +963,26 @@ static inline void put_page(struct page *page)
>   __put_page(page);
>  }
> 
> +/*
> + * put_user_page() - release a page that had previously been acquired via
> + * a call to one of the get_user_pages*() functions.

Please add @page parameter description, otherwise kernel-doc is unhappy

> + *
> + * Pages that were pinned via get_user_pages*() must be released via
> + * either put_user_page(), or one of the put_user_pages*() routines
> + * below. This is so that eventually, pages that are pinned via
> + * get_user_pages*() can be separately tracked and uniquely handled. In
> + * particular, interactions with RDMA and filesystems need special
> + * handling.
> + */
> +static inline void put_user_page(struct page *page)
> +{
> + put_page(page);
> +}
> +
> +void put_user_pages_dirty(struct page **pages, unsigned long npages);
> +void put_user_pages_dirty_lock(struct page **pages, unsigned long npages);
> +void put_user_pages(struct page **pages, unsigned long npages);
> +
>  #if defined(CONFIG_SPARSEMEM) && !defined(CONFIG_SPARSEMEM_VMEMMAP)
>  #define SECTION_IN_PAGE_FLAGS
>  #endif
> diff --git a/mm/swap.c b/mm/swap.c
> index aa483719922e..bb8c32595e5f 100644
> --- a/mm/swap.c
> +++ b/mm/swap.c
> @@ -133,6 +133,86 @@ void put_pages_list(struct list_head *pages)
>  }
>  EXPORT_SYMBOL(put_pages_list);
> 
> +typedef int (*set_dirty_func)(struct page *page);
> +
> +static void __put_user_pages_dirty(struct page **pages,
> +unsigned long npages,
> +set_dirty_func sdf)
> +{
> + unsigned long index;
> +
> + for (index = 0; index < npages; index++) {
> + struct page *page = compound_head(pages[index]);
> +
> + if (!PageDirty(page))
> + sdf(page);
> +
> + put_user_page(page);
> + }
> +}
> +
> +/*
> + * put_user_pages_dirty() - for each page in the @pages array, make
> + * that page (or its head page, if a compound page) dirty, if it was
> + * previously listed as clean. Then, release the page using
> + * put_user_page().
> + *
> + * Please see the put_user_page() documentation for details.
> + *
> + * set_page_dirty(), which does not lock the page, is used here.
> + * Therefore, it is the caller's responsibility to ensure that this is
> + * safe. If not, then put_user_pages_dirty_lock() should be called instead.
> + *
> + * @pages:  array of pages to be marked dirty and released.
> + * @npages: number of pages in the @pages array.

Please put the parameters description next to the brief function
description, as described in [1]

[1] 
https://www.kernel.org/doc/html/latest/doc-guide/kernel-doc.html#function-documentation


> + *
> + */
> +void put_user_pages_dirty(struct page **pages, unsigned long npages)
> +{
> 

RE: rcu_preempt caused oom

2018-12-03 Thread He, Bo
Hi, Paul:
the enclosed log triggers the 120s hung_task_panic without other debug 
patches; the hung task is blocked at __wait_rcu_gp, which means the rcu_cpu_stall 
detector can't catch the scenario even with:
echo 1 > /proc/sys/kernel/panic_on_rcu_stall
echo 7 > /sys/module/rcupdate/parameters/rcu_cpu_stall_timeout


-Original Message-
From: Paul E. McKenney  
Sent: Monday, December 3, 2018 9:57 PM
To: He, Bo 
Cc: Steven Rostedt ; linux-kernel@vger.kernel.org; 
j...@joshtriplett.org; mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; 
Zhang, Jun ; Xiao, Jin ; Zhang, Yanmin 

Subject: Re: rcu_preempt caused oom

On Mon, Dec 03, 2018 at 07:44:03AM +, He, Bo wrote:
> Thanks, we have run the test for the whole weekend and did not reproduce the 
> issue, so we confirm that CONFIG_RCU_BOOST fixes the issue.

Very good, that is encouraging.  Perhaps I should think about making 
CONFIG_RCU_BOOST=y the default for CONFIG_PREEMPT in mainline, at least for 
architectures for which rt_mutexes are implemented.

> We have enabled rcupdate.rcu_cpu_stall_timeout=7 and also set panic on 
> rcu stall, and will see if we can see the panic; we will keep you posted with 
> the test results.
> echo 1 > /proc/sys/kernel/panic_on_rcu_stall

Looking forward to seeing what is going on!  Of course, to reproduce, you will 
need to again build with CONFIG_RCU_BOOST=n.

Thanx, Paul

> -Original Message-
> From: Paul E. McKenney 
> Sent: Saturday, December 1, 2018 12:49 AM
> To: He, Bo 
> Cc: Steven Rostedt ; 
> linux-kernel@vger.kernel.org; j...@joshtriplett.org; 
> mathieu.desnoy...@efficios.com; jiangshan...@gmail.com; Zhang, Jun 
> ; Xiao, Jin ; Zhang, Yanmin 
> 
> Subject: Re: rcu_preempt caused oom
> 
> On Fri, Nov 30, 2018 at 03:18:58PM +, He, Bo wrote:
> > Here is the kernel cmdline:
> 
> Thank you!
> 
> > Kernel command line: androidboot.acpio_idx=0
> > androidboot.bootloader=efiwrapper-02_03-userdebug_kernelflinger-06_0
> > 3- userdebug androidboot.diskbus=00.0 
> > androidboot.verifiedbootstate=green
> > androidboot.bootreason=power-on androidboot.serialno=R1J56L6006a7bb
> > g_ffs.iSerialNumber=R1J56L6006a7bb no_timer_check noxsaves 
> > reboot_panic=p,w i915.hpd_sense_invert=0x7 mem=2G nokaslr nopti 
> > ftrace_dump_on_oops trace_buf_size=1024K intel_iommu=off gpt
> > loglevel=4 androidboot.hardware=gordon_peak 
> > firmware_class.path=/vendor/firmware relative_sleep_states=1
> > enforcing=0 androidboot.selinux=permissive cpu_init_udelay=10 
> > androidboot.android_dt_dir=/sys/bus/platform/devices/ANDR0001:00/pro
> > pe rties/android/ pstore.backend=ramoops memmap=0x140$0x5000
> > ramoops.mem_address=0x5000 ramoops.mem_size=0x140
> > ramoops.record_size=0x4000 ramoops.console_size=0x100
> > ramoops.ftrace_size=0x1 ramoops.dump_oops=1 vga=current
> > i915.modeset=1 drm.atomic=1 i915.nuclear_pageflip=1 
> > drm.vblankoffdelay=
> 
> And no sign of any suppression of RCU CPU stall warnings.  Hmmm...
> Does it take more than 21 seconds to OOM?  Or do things happen faster than 
> that?  If they do happen faster than that, then one approach would be to add 
> something like this to the kernel command line:
> 
>   rcupdate.rcu_cpu_stall_timeout=7
> 
> This would set the stall timeout to seven seconds.  Note that timeouts less 
> than three seconds are silently interpreted as three seconds.
> 
>   Thanx, Paul
> 
> > -Original Message-
> > From: Steven Rostedt 
> > Sent: Friday, November 30, 2018 11:17 PM
> > To: Paul E. McKenney 
> > Cc: He, Bo ; linux-kernel@vger.kernel.org; 
> > j...@joshtriplett.org; mathieu.desnoy...@efficios.com; 
> > jiangshan...@gmail.com; Zhang, Jun ; Xiao, Jin 
> > ; Zhang, Yanmin 
> > Subject: Re: rcu_preempt caused oom
> > 
> > On Fri, 30 Nov 2018 06:43:17 -0800
> > "Paul E. McKenney"  wrote:
> > 
> > > Could you please send me your list of kernel boot parameters?  
> > > They usually appear near the start of your console output.
> > 
> > Or just: cat /proc/cmdline
> > 
> > -- Steve
> > 
> 



apanic_console
Description: apanic_console


Re: [PATCH 3/9] tools/lib/traceevent: Install trace-seq.h API header file

2018-12-03 Thread Namhyung Kim
On Fri, Nov 30, 2018 at 10:44:06AM -0500, Steven Rostedt wrote:
> From: Tzvetomir Stoyanov 
> 
> This patch installs trace-seq.h header file on "make install".
> 
> Signed-off-by: Tzvetomir Stoyanov 
> Signed-off-by: Steven Rostedt (VMware) 
> ---
>  tools/lib/traceevent/Makefile | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/tools/lib/traceevent/Makefile b/tools/lib/traceevent/Makefile
> index adb16f845ab3..67fe5d7ef190 100644
> --- a/tools/lib/traceevent/Makefile
> +++ b/tools/lib/traceevent/Makefile
> @@ -285,7 +285,7 @@ define do_install_pkgconfig_file
>   fi
>  endef
>  
> -install_lib: all_cmd install_plugins install_pkgconfig
> +install_lib: all_cmd install_plugins install_headers install_pkgconfig
>   $(call QUIET_INSTALL, $(LIB_TARGET)) \
>   $(call do_install_mkdir,$(libdir_SQ)); \
>   cp -fpR $(LIB_INSTALL) $(DESTDIR)$(libdir_SQ)
> @@ -302,6 +302,7 @@ install_headers:
>   $(call QUIET_INSTALL, headers) \
>   $(call 
> do_install,event-parse.h,$(prefix)/include/traceevent,644); \
>   $(call 
> do_install,event-utils.h,$(prefix)/include/traceevent,644); \
> + $(call 
> do_install,trace-seq.h,$(prefix)/include/traceevent,644); \
>   $(call do_install,kbuffer.h,$(prefix)/include/traceevent,644)

Do you still wanna have 'traceevent' directory prefix?  I just
sometimes feel a bit annoying to type it. ;-)

Or you can rename it something like 'tep' or 'libtep' - and hopefully
having only single header file to include..

Thanks,
Namhyung


>  
>  install: install_lib
> -- 
> 2.19.1
> 
> 


Re: [RFC PATCH 00/14] Heterogeneous Memory System (HMS) and hbind()

2018-12-03 Thread Aneesh Kumar K.V

On 12/4/18 5:04 AM, jgli...@redhat.com wrote:

From: Jérôme Glisse 

Heterogeneous memory systems are becoming more and more the norm. In
those systems there is not only the main system memory for each node,
but also device memory and/or a memory hierarchy to consider. Device
memory can come from a device like a GPU, FPGA, ... or from a memory
only device (persistent memory, or a high density memory device).

A memory hierarchy is when you not only have the main memory but also
other types of memory like HBM (High Bandwidth Memory, often stacked
on the CPU die or GPU die), persistent memory or high density memory
(ie something slower than regular DDR DIMMs but much bigger).

On top of this diversity of memories you also have to account for the
system bus topology, ie how all CPUs and devices are connected to each
other. Userspace does not care about the exact physical topology but
cares about topology from a behavior point of view, ie what are all the
paths between an initiator (anything that can initiate memory access,
like a CPU, GPU, FPGA, network controller ...) and a target memory, and
what are all the properties of each of those paths (bandwidth, latency,
granularity, ...).

This means that it is no longer sufficient to consider a flat view
for each node in a system; for maximum performance we need to
account for all of this new memory but also for system topology.
This is why this proposal is unlike the HMAT proposal [1], which
tries to extend the existing NUMA model for new types of memory. Here
we are tackling a much more profound change that departs from NUMA.


One of the reasons for radical change is that the advance of accelerators
like GPUs or FPGAs means that the CPU is no longer the only place where
computation happens. It is becoming more and more common for an
application to use a mix and match of different accelerators to
perform its computation. So we can no longer satisfy ourselves with
a CPU centric and flat view of a system like NUMA and NUMA distance.


This patchset is a proposal to tackle these problems through three
aspects:
 1 - Expose complex system topology and various kinds of memory
 to user space so that applications have a standard way and a
 single place to get all the information they care about.
 2 - A new API for user space to bind/provide hints to the kernel on
 which memory to use for a range of virtual addresses (a new
 mbind() syscall).
 3 - Kernel side changes for vm policy to handle these changes

This patchset is not an end to end solution but it provides enough
pieces to be useful against nouveau (the upstream open source driver
for NVidia GPUs). It is intended as a starting point for discussion so
that we can figure out what to do. To avoid having too many topics
to discuss I am not considering memory cgroups for now, but that is
definitely something we will want to integrate with.

The rest of this email is split into 3 sections. The first section
talks about complex system topology: what it is, how it is used today
and how to describe it tomorrow. The second section talks about a new
API to bind/provide hints to the kernel for ranges of virtual
addresses. The third section talks about a new mechanism to track
binds/hints provided by user space or device drivers inside the kernel.


1) Complex system topology and representing them


Inside a node you can have a complex topology of memory. For instance
you can have multiple HBM memories in a node, each HBM memory tied to a
set of CPUs (all of which are in the same node). This means that you
have a hierarchy of memory for CPUs: the local fast HBM, which is
expected to be relatively small compared to main memory, and then the
main memory itself. New memory technology might also deepen this
hierarchy with another level of yet slower memory but gigantic in size
(some persistent memory technology might fall into that category).
Another example is device memory, and devices themselves can have a
hierarchy like HBM on top of the device core and main device memory.

On top of that you can have multiple paths to access each memory, and
each path can have different properties (latency, bandwidth, ...).
Also there is not always symmetry, ie some memory might only be
accessible by some devices or CPUs, ie not accessible by everyone.

So a flat hierarchy for each node is not capable of representing this
kind of complexity. To simplify discussion, and because we do not want
to single out the CPU from devices, from here on out we will use
initiator to refer to either a CPU or a device. An initiator is any
kind of CPU or device that can access memory (ie initiate memory access).

At this point an example of such a system might help:
 - 2 nodes and for each node:
 - 1 CPU per node with 2 complex of CPUs cores per CPU
 - one HBM memory for each complex of CPUs cores (200GB/s)
 - CPUs cores complex are linked to each other (100GB/s)
 - main memory is (90GB/s)
 - 4 GPUs each with:
 - HBM memory for 

Re: linux-next: manual merge of the char-misc tree with the char-misc.current tree

2018-12-03 Thread Greg KH
On Tue, Dec 04, 2018 at 03:35:13PM +1100, Stephen Rothwell wrote:
> Hi all,
> 
> Today's linux-next merge of the char-misc tree got a conflict in:
> 
>   drivers/hv/channel_mgmt.c
> 
> between commit:
> 
>   37c2578c0c40 ("Drivers: hv: vmbus: Offload the handling of channels to two 
> workqueues")
> 
> from the char-misc.current tree and commit:
> 
>   4d3c5c69191f ("Drivers: hv: vmbus: Remove the useless API 
> vmbus_get_outgoing_channel()")
> 
> from the char-misc tree.
> 
> I fixed it up (I used the former version where they conflicted) and can
> carry the fix as necessary. This is now fixed as far as linux-next is
> concerned, but any non trivial conflicts should be mentioned to your
> upstream maintainer when your tree is submitted for merging.  You may
> also want to consider cooperating with the maintainer of the conflicting
> tree to minimise any particularly complex conflicts.

Yeah, this is a mess, I'll wait for the hyper-v developers to send me a
fixup patch for handling this merge issue, as they know it is happening
:(

thanks,

greg k-h


Re: [PATCH] ubi: fastmap: Check each mapping only once

2018-12-03 Thread Greg Kroah-Hartman
On Tue, Dec 04, 2018 at 08:39:16AM +0100, Martin Kepplinger wrote:
> On 02.12.18 16:02, Richard Weinberger wrote:
> > Sasha,
> > 
> > Am Sonntag, 2. Dezember 2018, 15:35:43 CET schrieb Sasha Levin:
> > > On Sun, Dec 02, 2018 at 11:50:33AM +, Sudip Mukherjee wrote:
> > > > > > Now queued up for 4.14.y, thanks.
> > > > > 
> > > > > can you *please* slow a little down?
> > > > 
> > > > True. It will really help if you can have some sort of fixed schedule
> > > > for stable release, like maybe stablerc is ready on Thursday or Friday
> > > > and release the stable on Monday. Having a weekend in stablerc will be
> > > > helpful for people like me who only get the time in weekends for
> > > > upstream or stable kernel.
> > > 
> > > Any sort of schedule will never work for everyone (for example, if it's
> > > part of your paid job - you don't necessarily want to review stuff over
> > > the weekend).
> > 
> > a schedule is not needed, but please give maintainers at least a chance
> > to react on stable inclusion request.
> > In this case Martin asked for inclusion on Monday and the patch was applied
> > two days later.
> 
> True, especially when the maintainer is asked a question as part of the
> patch.
> 
> I've already had the feeling that we'd need the other patch too, but in this
> case at least I should have searched for Fixes tags.
> 
> Greg, how about reminding people of Fixes tags in
> Documentation/process/stable-kernel-rules.rst ?

Reminding people how?  Patches to that file are always gladly accepted :)

thanks,

greg k-h


Re: Strange hang with gcc 8 of kprobe multiple_kprobes test

2018-12-03 Thread Masami Hiramatsu
Hi Steve,

On Mon, 3 Dec 2018 21:18:07 -0500
Steven Rostedt  wrote:

> Hi Masami,
> 
> I started testing some of my new code and the system got into a
> strange state. Debugging further, I found the cause came from the
> kprobe tests. It became stranger to me that I could reproduce it with
> older kernels. I went back as far as 4.16 and it triggered. I thought
> this very strange because I ran this test on all those kernels in the
> past.
> 
> After a bit of hair pulling, I figured out what changed. I upgraded to
> gcc 8.1 (and I reproduce it with 8.2 as well). I convert back to gcc 7
> and the tests pass without issue.

OK, let me see.

> The issue that I notice when the system gets into this strange state is
> that I can't log into the box. Nor can I reboot. Basically it's
> anything to do with systemd just doesn't work (insert your jokes here
> now, and then let's move on).
> 
> I was able to narrow down what the exact function was that caused the
> issues and it is: update_vsyscall()
> 
> gcc 7 looks like this:
> 
81004bf0 <update_vsyscall>:
> 81004bf0:   e8 0b cc 9f 00  callq  81a01800 
> <__fentry__>
> 81004bf1: R_X86_64_PC32 __fentry__-0x4
> 81004bf5:   48 8b 07mov(%rdi),%rax
> 81004bf8:   8b 15 96 5f 34 01   mov0x1345f96(%rip),%edx   
>  # 8234ab94 
> 81004bfa: R_X86_64_PC32 vclocks_used-0x4
> 81004bfe:   83 05 7b 84 6f 01 01addl   $0x1,0x16f847b(%rip)   
>  # 826fd080 
> 81004c00: R_X86_64_PC32 vsyscall_gtod_data-0x5
> 81004c05:   8b 48 24mov0x24(%rax),%ecx
> 81004c08:   b8 01 00 00 00  mov$0x1,%eax
> 81004c0d:   d3 e0   shl%cl,%eax
> 
> And gcc 8 looks like this:
> 
81004c90 <update_vsyscall>:
> 81004c90:   e8 6b cb 9f 00  callq  81a01800 
> <__fentry__>
> 81004c91: R_X86_64_PC32 __fentry__-0x4
> 81004c95:   48 8b 07mov(%rdi),%rax
> 81004c98:   83 05 e1 93 6f 01 01addl   $0x1,0x16f93e1(%rip)   
>  # 826fe080 

Hm, this is a RIP-relative instruction; it must be fixed up by kprobes.

> 81004c9a: R_X86_64_PC32 vsyscall_gtod_data-0x5
> 81004c9f:   8b 50 24mov0x24(%rax),%edx
> 81004ca2:   8b 05 ec 5e 34 01   mov0x1345eec(%rip),%eax   
>  # 8234ab94 
> 81004ca4: R_X86_64_PC32 vclocks_used-0x4
> 
> The test adds a kprobe (optimized) at update_vsyscall+5. And will
> insert a jump on the two instructions after fentry. The difference
> between v7 and v8 is that v7 is touching vclocks_used and v8 is
> touching vsyscall_gtod_data.
> 
> Is there some black magic going on with the vsyscall area with
> vsyscall_gtod_data that is causing havoc when a kprobe is added there?

I think it might be missing something when preprocessing the RIP-relative
instruction. Could you disable jump optimization as below and test what
happens at update_vsyscall+5 AND update_vsyscall+8? (RIP-relative
preprocessing must happen even if the jump optimization is disabled.)

# echo 0 > /proc/sys/debug/kprobes-optimization


> I can dig a little more into this, but I'm currently at my HQ office
> with a lot of other objectives that I must get done, and I can't work
> on this much more this week.

OK, let me try to reproduce it in my environment.

> 
> I included my config (for my virt machine, which I was also able to
> trigger it with).

Thanks, but I think it should not depend on the kconfig.

> 
> The test that triggers this bug is:
> 
>  tools/testing/selftests/ftrace/test.d/kprobe/multiple_kprobes.tc
> 
> It runs the test fine, but other things just start to act up after I
> run it.

Yeah, thank you for digging into it. It is now much easier for me.

> 
> I notice that when I get into the state, journald and the dbus_daemon
> are constantly running. Perhaps the userspace time keeping went bad?

Yeah, I think so. Maybe the addl instruction becomes broken.

Thank you,

-- 
Masami Hiramatsu 


Re: [PATCH] ubi: fastmap: Check each mapping only once

2018-12-03 Thread Martin Kepplinger

On 02.12.18 16:02, Richard Weinberger wrote:

Sasha,

Am Sonntag, 2. Dezember 2018, 15:35:43 CET schrieb Sasha Levin:

On Sun, Dec 02, 2018 at 11:50:33AM +, Sudip Mukherjee wrote:

Now queued up for 4.14.y, thanks.


can you *please* slow a little down?


True. It will really help if you can have some sort of fixed schedule
for stable release, like maybe stablerc is ready on Thursday or Friday
and release the stable on Monday. Having a weekend in stablerc will be
helpful for people like me who only get the time in weekends for
upstream or stable kernel.


Any sort of schedule will never work for everyone (for example, if it's
part of your paid job - you don't necessarily want to review stuff over
the weekend).


a schedule is not needed, but please give maintainers at least a chance
to react on stable inclusion request.
In this case Martin asked for inclusion on Monday and the patch was applied
two days later.


True, especially when the maintainer is asked a question as part of the 
patch.


I've already had the feeling that we'd need the other patch too, but in 
this case at least I should have searched for Fixes tags.


Greg, how about reminding people of Fixes tags in 
Documentation/process/stable-kernel-rules.rst ?


  martin




Re: [patch 0/2 for-4.20] mm, thp: fix remote access and allocation regressions

2018-12-03 Thread Michal Hocko
On Mon 03-12-18 15:50:18, David Rientjes wrote:
> This fixes a 13.9% of remote memory access regression and 40% remote
> memory allocation regression on Haswell when the local node is fragmented
> for hugepage sized pages and memory is being faulted with either the thp
> defrag setting of "always" or has been madvised with MADV_HUGEPAGE.
> 
> The usecase that initially identified this issue were binaries that mremap
> their .text segment to be backed by transparent hugepages on startup.
> They do mmap(), madvise(MADV_HUGEPAGE), memcpy(), and mremap().

Do you have something you can share with so that other people can play
and try to reproduce?

> This requires a full revert and partial revert of commits merged during
> the 4.20 rc cycle.  The full revert, of ac5b2c18911f ("mm: thp: relax
> __GFP_THISNODE for MADV_HUGEPAGE mappings"), was anticipated to fix large
> amounts of swap activity on the local zone when faulting hugepages by
> falling back to remote memory.  This remote allocation causes the access
> regression and, if fragmented, the allocation regression.

Have you tried to measure any of the workloads Mel and Andrea have
pointed out during the previous review discussion? In other words what
is the impact on the THP success rate and allocation latencies for other
usecases?
-- 
Michal Hocko
SUSE Labs


Re: [patch 1/2 for-4.20] mm, thp: restore node-local hugepage allocations

2018-12-03 Thread Michal Hocko
On Mon 03-12-18 15:50:24, David Rientjes wrote:
> This is a full revert of ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for
> MADV_HUGEPAGE mappings") and a partial revert of 89c83fb539f9 ("mm, thp:
> consolidate THP gfp handling into alloc_hugepage_direct_gfpmask").
> 
> By not setting __GFP_THISNODE, applications can allocate remote hugepages
> when the local node is fragmented or low on memory when either the thp
> defrag setting is "always" or the vma has been madvised with
> MADV_HUGEPAGE.
> 
> Remote access to hugepages often has much higher latency than local pages
> of the native page size.  On Haswell, ac5b2c18911f was shown to have a
> 13.9% access regression after this commit for binaries that remap their
> text segment to be backed by transparent hugepages.
> 
> The intent of ac5b2c18911f is to address an issue where a local node is
> low on memory or fragmented such that a hugepage cannot be allocated.  In
> every scenario where this was described as a fix, there is abundant and
> unfragmented remote memory available to allocate from, even with a greater
> access latency.
> 
> If remote memory is also low or fragmented, not setting __GFP_THISNODE was
> also measured on Haswell to have a 40% regression in allocation latency.
> 
> Restore __GFP_THISNODE for thp allocations.
> 
> Fixes: ac5b2c18911f ("mm: thp: relax __GFP_THISNODE for MADV_HUGEPAGE 
> mappings")
> Fixes: 89c83fb539f9 ("mm, thp: consolidate THP gfp handling into 
> alloc_hugepage_direct_gfpmask")

At minimum, do not remove the cleanup part which consolidates the gfp
handling in a single place. There is no real reason to have the
__GFP_THISNODE ugliness outside of alloc_hugepage_direct_gfpmask.

I still hate the __GFP_THISNODE part as mentioned before. It is an ugly
hack but I can learn to live with it if this is indeed the only option
for the short term workaround until we find a proper solution.

> Signed-off-by: David Rientjes 
> ---
>  include/linux/mempolicy.h |  2 --
>  mm/huge_memory.c  | 42 +++
>  mm/mempolicy.c|  7 ---
>  3 files changed, 20 insertions(+), 31 deletions(-)
> 
> diff --git a/include/linux/mempolicy.h b/include/linux/mempolicy.h
> --- a/include/linux/mempolicy.h
> +++ b/include/linux/mempolicy.h
> @@ -139,8 +139,6 @@ struct mempolicy *mpol_shared_policy_lookup(struct 
> shared_policy *sp,
>  struct mempolicy *get_task_policy(struct task_struct *p);
>  struct mempolicy *__get_vma_policy(struct vm_area_struct *vma,
>   unsigned long addr);
> -struct mempolicy *get_vma_policy(struct vm_area_struct *vma,
> - unsigned long addr);
>  bool vma_policy_mof(struct vm_area_struct *vma);
>  
>  extern void numa_default_policy(void);
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -632,37 +632,27 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct 
> vm_fault *vmf,
>  static inline gfp_t alloc_hugepage_direct_gfpmask(struct vm_area_struct 
> *vma, unsigned long addr)
>  {
>   const bool vma_madvised = !!(vma->vm_flags & VM_HUGEPAGE);
> - gfp_t this_node = 0;
> -
> -#ifdef CONFIG_NUMA
> - struct mempolicy *pol;
> - /*
> -  * __GFP_THISNODE is used only when __GFP_DIRECT_RECLAIM is not
> -  * specified, to express a general desire to stay on the current
> -  * node for optimistic allocation attempts. If the defrag mode
> -  * and/or madvise hint requires the direct reclaim then we prefer
> -  * to fallback to other node rather than node reclaim because that
> -  * can lead to excessive reclaim even though there is free memory
> -  * on other nodes. We expect that NUMA preferences are specified
> -  * by memory policies.
> -  */
> - pol = get_vma_policy(vma, addr);
> - if (pol->mode != MPOL_BIND)
> - this_node = __GFP_THISNODE;
> - mpol_cond_put(pol);
> -#endif
> + const gfp_t gfp_mask = GFP_TRANSHUGE_LIGHT | __GFP_THISNODE;
>  
> + /* Always do synchronous compaction */
>   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_DIRECT_FLAG, 
> _hugepage_flags))
> - return GFP_TRANSHUGE | (vma_madvised ? 0 : __GFP_NORETRY);
> + return GFP_TRANSHUGE | __GFP_THISNODE |
> +(vma_madvised ? 0 : __GFP_NORETRY);
> +
> + /* Kick kcompactd and fail quickly */
>   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_FLAG, 
> _hugepage_flags))
> - return GFP_TRANSHUGE_LIGHT | __GFP_KSWAPD_RECLAIM | this_node;
> + return gfp_mask | __GFP_KSWAPD_RECLAIM;
> +
> + /* Synchronous compaction if madvised, otherwise kick kcompactd */
>   if (test_bit(TRANSPARENT_HUGEPAGE_DEFRAG_KSWAPD_OR_MADV_FLAG, 
> _hugepage_flags))
> - return GFP_TRANSHUGE_LIGHT | (vma_madvised ? 
> __GFP_DIRECT_RECLAIM :
> -  
> __GFP_KSWAPD_RECLAIM | this_node);
> + 


Re: ext4 file system corruption with v4.19.3 / v4.19.4

2018-12-03 Thread Gunter Königsmann
After upgrading my kernel to 4.19 I got a corruption on nearly every
reboot or resume from suspend on my Acer s7-391 [UEFI boot].

Going to my UEFI setup and changing IDE mode from IDE to ATA seems to
have resolved the issue for me.

Don't know, though, if that is a valid data point, or if it was a mere
accident (tested only on one computer), or if it just avoids the bad
timing by a few nanoseconds.




Re: [PATCH 2/9] tools/lib/traceevent: Added support for pkg-config

2018-12-03 Thread Namhyung Kim
Hi Steve,

On Fri, Nov 30, 2018 at 10:44:05AM -0500, Steven Rostedt wrote:
> From: Tzvetomir Stoyanov 
> 
> This patch implements integration with pkg-config framework.
> pkg-config can be used by the library users to determine
> required CFLAGS and LDFLAGS in order to use the library
> 
> Signed-off-by: Tzvetomir Stoyanov 
> Signed-off-by: Steven Rostedt (VMware) 
> ---

[SNIP]
> diff --git a/tools/lib/traceevent/libtraceevent.pc.template 
> b/tools/lib/traceevent/libtraceevent.pc.template
> new file mode 100644
> index ..42e4d6cb6b9e
> --- /dev/null
> +++ b/tools/lib/traceevent/libtraceevent.pc.template
> @@ -0,0 +1,10 @@
> +prefix=INSTALL_PREFIX
> +libdir=${prefix}/lib64

Don't we care 32-bit systems anymore? :)

Thanks,
Namhyung


> +includedir=${prefix}/include/traceevent
> +
> +Name: libtraceevent
> +URL: https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git
> +Description: Linux kernel trace event library
> +Version: LIB_VERSION
> +Cflags: -I${includedir}
> +Libs: -L${libdir} -ltraceevent
> -- 
> 2.19.1
> 
> 


Re: [PATCH AUTOSEL 4.14 25/35] iomap: sub-block dio needs to zeroout beyond EOF

2018-12-03 Thread Greg KH
On Mon, Dec 03, 2018 at 11:22:46PM +0159, Thomas Backlund wrote:
> Den 2018-12-03 kl. 11:22, skrev Sasha Levin:
> 
> > 
> > This is a case where theory collides with the real world. Yes, our QA is
> > lacking, but we don't have the option of not doing the current process.
> > If we stop backporting until a future data where our QA problem is
> > solved we'll end up with what we had before: users stuck on ancient
> > kernels without a way to upgrade.
> > 
> 
> Sorry, but you seem to be living in a different "real world"...
> 
> People stay on "ancient kernels" that "just works" instead of updating
> to a newer one that "hopefully/maybe/... works"

That's not good as those "ancient kernels" really just are "kernels with
lots of known security bugs".

It's your systems, I can't tell you what to do, but I will tell you that
running older, unfixed kernels, is a known liability.

Good luck!

greg k-h


Re: [PATCH v7 05/16] tracing: Generalize hist trigger onmax and save action

2018-12-03 Thread Namhyung Kim
On Mon, Dec 03, 2018 at 04:22:02PM -0600, Tom Zanussi wrote:
> Hi Namhyung,
> 
> On Fri, 2018-11-23 at 11:50 +0900, Namhyung Kim wrote:
> > Hi Tom,
> > 
> > On Wed, Nov 14, 2018 at 02:18:02PM -0600, Tom Zanussi wrote:
> > > From: Tom Zanussi 
> > > 
> 
> [snip]
> 
> > > 
>  enum handler_id {
> > >   HANDLER_ONMATCH = 1,
> > >   HANDLER_ONMAX,
> > > @@ -349,14 +358,18 @@ struct action_data {
> > >  
> > >   struct {
> > >   char*var_str;
> > > - unsigned intmax_var_ref_idx;
> > > - struct hist_field   *max_var;
> > > - struct hist_field   *var;
> > > - } onmax;
> > > + struct hist_field   *var_ref;
> > > + unsigned intvar_ref_idx;
> > 
> > I have a question.  It's confusing for me there are many indexes for
> > a
> > variable (ref).  The hist_field already has var.idx, var_idx and
> > var_ref_idx in it.  But you also added an external var_ref_idx along
> > with the var_ref.  Also I see another var_ref_idx in the action data.
> > Is all that really needed?  Could you please add some comment then?
> > 
> 
> Below is a patch with some comments I'll merge into the next version
> that I hope will help make things more clear.  Basically, the
> hist_field.var_idx isn't used so I've removed it and therefore that

Thanks!


> source of confusion, while var.idx is the variable's unique 'handle' in
> the tracing_map, used when getting and setting the variable.  And then
> there are the several versions of var_ref_idx used for different
> purposes depending on the context, but all of them are indices into the
> array of variable values collected when a trigger is hit.  For example,

So IIUC field->var_ref_idx is an index into the var_ref_vals array,
right?  Then if we keep all the hist_fields we don't need to have a
separate var_ref_idx IMHO.


> the var_ref_idx defined inside track_data is the index that points to
> the tracked var value, which the action can use directly, and the

I guess the track_data.var_ref_idx is always same as the
track_data.track_var.var_ref_idx, no?  If so we can get rid of it.


> var_ref_idx alongside the synth fields in action_data is the index of
> the first param used when generating a synthetic event, and so on.

For synth event, we have hist_data->synth_var_refs[] but it's not
passed to trace_synth() so no way to know original var_ref_idx and I'm
ok with having action_data.var_ref_idx.

But I don't see where hist_data->synth_var_refs is used other than
find_var_ref().  And for that purpose, I guess it's more efficient to
use hist_data->var_refs[] so that we can remove synth_var_refs.


> 
> Tom 
> 
> diff --git a/kernel/trace/trace_events_hist.c 
> b/kernel/trace/trace_events_hist.c
> index 818944391d97..5310ef73f023 100644
> --- a/kernel/trace/trace_events_hist.c
> +++ b/kernel/trace/trace_events_hist.c
> @@ -39,6 +39,16 @@ enum field_op_id {
>   FIELD_OP_UNARY_MINUS,
>  };
>  
> +/*
> + * A hist_var (histogram variable) contains variable information for
> + * hist_fields having the HIST_FIELD_FL_VAR or HIST_FIELD_FL_VAR_REF
> + * flag set.  A hist_var has a variable name e.g. ts0, and is
> + * associated with a given histogram trigger, as specified by
> + * hist_data.  The hist_var idx is the unique index assigned to the
> + * variable by the hist trigger's tracing_map.  The idx is what is
> + * used to set a variable's value and, by a variable reference, to
> + * retrieve it.
> + */
>  struct hist_var {
>   char*name;
>   struct hist_trigger_data*hist_data;
> @@ -60,7 +70,15 @@ struct hist_field {
>   char*system;
>   char*event_name;
>   char*name;
> - unsigned intvar_idx;
> +
> + /*
> +  * When a histogram trigger is hit, if it has any references
> +  * to variables, the values of those variables are collected
> +  * into a var_ref_vals array by resolve_var_refs().  The
> +  * current value of each variable is read from the tracing_map
> +  * using the hist field's hist_var.idx and entered into the
> +  * var_ref_idx entry i.e. var_ref_vals[var_ref_idx].
> +  */
>   unsigned intvar_ref_idx;
>   boolread_once;
>  };
> @@ -350,6 +368,14 @@ struct action_data {
>   unsigned intn_params;
>   char*params[SYNTH_FIELDS_MAX];
>  
> + /*
> +  * When a histogram trigger is hit, the values of any
> +  * references to variables, including variables being passed
> +  * as parameters to synthetic events, are collected into a
> +  * var_ref_vals array.  This var_ref_idx is the index of the
> +  * first param in the array to be passed to the synthetic
> +  * event invocation.
> +  */
>   unsigned int

Re: [PATCH 2/3] mm/vmscan: Enable kswapd to reclaim low-protected memory

2018-12-03 Thread Michal Hocko
On Tue 04-12-18 10:40:29, Xunlei Pang wrote:
> On 2018/12/4 AM 1:22, Michal Hocko wrote:
> > On Mon 03-12-18 23:20:31, Xunlei Pang wrote:
> >>> On 2018/12/3 7:56 PM, Michal Hocko wrote:
> >>> On Mon 03-12-18 16:01:18, Xunlei Pang wrote:
>  There may be cgroup memory overcommitment, and it will become
>  even more common in the future.
> 
>  Let's enable kswapd to reclaim low-protected memory in case
>  of memory pressure, to mitigate the global direct reclaim
>  pressures which could cause jitters to the response time of
>  latency-sensitive groups.
> >>>
> >>> Please be more descriptive about the problem you are trying to handle
> >>> here. I haven't actually read the patch but let me emphasise that the
> >>> low limit protection is important isolation tool. And allowing kswapd to
> >>> reclaim protected memcgs is going to break the semantic as it has been
> >>> introduced and designed.
> >>
> >> We have two types of memcgs: online groups(important business)
> >> and offline groups(unimportant business). Online groups are
> >> all configured with MAX low protection, while offline groups
> >> are not at all protected(with default 0 low).
> >>
> >> When offline groups are overcommitted, the global memory pressure
> >> suffers. This will cause the memory allocations from online groups
> >> constantly go to the slow global direct reclaim in order to reclaim
> >> online's page caches, as kswapd is not able to reclaim low-protected
> >> memory. low is not a hard limit, so it's reasonable for it to be reclaimed by
> >> kswapd if there's no other reclaimable memory.
> > 
> > I am sorry I still do not follow. What role do offline cgroups play.
> > Those are certainly not low mem protected because mem_cgroup_css_offline
> > will reset them to 0.
> > 
> 
> Oh, I meant "offline groups" to be "offline-business groups", memcgs
> refered to here are all "online state" from kernel's perspective.

What is offline-business group? Please try to explain the actual problem
in much more details and do not let us guess.

-- 
Michal Hocko
SUSE Labs
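The setup Xunlei describes can be sketched as cgroup v2 configuration; the commands below use a scratch directory (an assumption, so the sketch runs unprivileged) where a real system would use /sys/fs/cgroup with the cgroup2 filesystem mounted:

```shell
# Scratch root stands in for /sys/fs/cgroup (assumption for illustration).
CGROUP_ROOT="${CGROUP_ROOT:-demo_cgroup}"
mkdir -p "$CGROUP_ROOT/online" "$CGROUP_ROOT/offline"

# "Online" (latency-sensitive) groups get maximal low protection ...
echo max > "$CGROUP_ROOT/online/memory.low"

# ... while "offline" (batch) groups are left unprotected, so reclaim
# takes their memory first under global pressure.
echo 0 > "$CGROUP_ROOT/offline/memory.low"
```

The thread's point of contention is what happens once the unprotected groups are exhausted: whether kswapd may then dip into memory.low-protected memory, or whether only direct reclaim should.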


[PATCH V3 1/3] mmc: sdhci: add support for using external DMA devices

2018-12-03 Thread Chunyan Zhang
Some standard SD host controllers can support external DMA
controllers as well as ADMA/SDMA, in which the SD host controller
acts as DMA master. TI's OMAP controller is one such example.

Currently the generic SDHCI code supports ADMA/SDMA integrated in
the host controller but does not have any support for external DMA
controllers implemented using dmaengine, meaning that custom code is
needed for any systems that use an external DMA controller with SDHCI.

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/Kconfig |   3 +
 drivers/mmc/host/sdhci.c | 185 ++-
 drivers/mmc/host/sdhci.h |   8 ++
 3 files changed, 195 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 1b58739..3101da6 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -977,3 +977,6 @@ config MMC_SDHCI_OMAP
  If you have a controller with this interface, say Y or M here.
 
  If unsure, say N.
+
+config MMC_SDHCI_EXTERNAL_DMA
+bool
diff --git a/drivers/mmc/host/sdhci.c b/drivers/mmc/host/sdhci.c
index 99bdae5..04b029c 100644
--- a/drivers/mmc/host/sdhci.c
+++ b/drivers/mmc/host/sdhci.c
@@ -14,6 +14,7 @@
  */
 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -1309,6 +1310,162 @@ static void sdhci_del_timer(struct sdhci_host *host, 
struct mmc_request *mrq)
del_timer(&host->timer);
 }
 
+#if IS_ENABLED(CONFIG_MMC_SDHCI_EXTERNAL_DMA)
+static int sdhci_external_dma_init(struct sdhci_host *host)
+{
+   int ret = 0;
+   struct mmc_host *mmc = host->mmc;
+
+   host->tx_chan = dma_request_chan(mmc->parent, "tx");
+   if (IS_ERR(host->tx_chan)) {
+   ret = PTR_ERR(host->tx_chan);
+   if (ret != -EPROBE_DEFER)
+   pr_warn("Failed to request TX DMA channel.\n");
+   host->tx_chan = NULL;
+   return ret;
+   }
+
+   host->rx_chan = dma_request_chan(mmc->parent, "rx");
+   if (IS_ERR(host->rx_chan)) {
+   if (host->tx_chan) {
+   dma_release_channel(host->tx_chan);
+   host->tx_chan = NULL;
+   }
+
+   ret = PTR_ERR(host->rx_chan);
+   if (ret != -EPROBE_DEFER)
+   pr_warn("Failed to request RX DMA channel.\n");
+   host->rx_chan = NULL;
+   }
+
+   return ret;
+}
+
+static inline struct dma_chan *
+sdhci_external_dma_channel(struct sdhci_host *host, struct mmc_data *data)
+{
+   return data->flags & MMC_DATA_WRITE ? host->tx_chan : host->rx_chan;
+}
+
+static int sdhci_external_dma_setup(struct sdhci_host *host,
+   struct mmc_command *cmd)
+{
+   int ret, i;
+   struct dma_async_tx_descriptor *desc;
+   struct mmc_data *data = cmd->data;
+   struct dma_chan *chan;
+   struct dma_slave_config cfg;
+   dma_cookie_t cookie;
+
+   if (!data)
+   return 0;
+
+   if (!host->mapbase)
+   return -EINVAL;
+
+   cfg.src_addr = host->mapbase + SDHCI_BUFFER;
+   cfg.dst_addr = host->mapbase + SDHCI_BUFFER;
+   cfg.src_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+   cfg.dst_addr_width = DMA_SLAVE_BUSWIDTH_4_BYTES;
+   cfg.src_maxburst = data->blksz / 4;
+   cfg.dst_maxburst = data->blksz / 4;
+
+   /* Sanity check: all the SG entries must be aligned by block size. */
+   for (i = 0; i < data->sg_len; i++) {
+   if ((data->sg + i)->length % data->blksz)
+   return -EINVAL;
+   }
+
+   chan = sdhci_external_dma_channel(host, data);
+
+   ret = dmaengine_slave_config(chan, &cfg);
+   if (ret)
+   return ret;
+
+   desc = dmaengine_prep_slave_sg(chan, data->sg, data->sg_len,
+  mmc_get_dma_dir(data),
+  DMA_PREP_INTERRUPT | DMA_CTRL_ACK);
+   if (!desc)
+   return -EINVAL;
+
+   desc->callback = NULL;
+   desc->callback_param = NULL;
+
+   cookie = dmaengine_submit(desc);
+   if (cookie < 0)
+   ret = cookie;
+
+   return ret;
+}
+
+static void sdhci_external_dma_release(struct sdhci_host *host)
+{
+   if (host->tx_chan) {
+   dma_release_channel(host->tx_chan);
+   host->tx_chan = NULL;
+   }
+
+   if (host->rx_chan) {
+   dma_release_channel(host->rx_chan);
+   host->rx_chan = NULL;
+   }
+
+   sdhci_switch_external_dma(host, false);
+}
+
+static void sdhci_external_dma_prepare_data(struct sdhci_host *host,
+   struct mmc_command *cmd)
+{
+   if (sdhci_external_dma_setup(host, cmd)) {
+   sdhci_external_dma_release(host);
+   pr_err("%s: Failed to setup external DMA, switch to the DMA/PIO 
which standard SDHCI provides.\n",
+  

Re: [RFC PATCH] hwpoison, memory_hotplug: allow hwpoisoned pages to be offlined

2018-12-03 Thread Naoya Horiguchi
On Mon, Dec 03, 2018 at 11:03:09AM +0100, Michal Hocko wrote:
> From: Michal Hocko 
> 
> We have received a bug report that an injected MCE about faulty memory
> prevents memory offline to succeed. The underlying reason is that the
> HWPoison page has an elevated reference count and the migration keeps
> failing. There are two problems with that. First of all it is dubious
> to migrate the poisoned page because we know that accessing that memory
> can fail. Secondly, it doesn't make any sense to migrate
> potentially broken content and preserve the memory corruption over to a
> new location.
> 
> Oscar has found out that it is the elevated reference count from
> memory_failure that is confusing the offlining path. HWPoisoned pages
> are isolated from the LRU list but __offline_pages might still try to
> migrate them if there are any preceding migratable pages in the pfn
> range. Such a migration would fail due to the reference count but
> the migration code would put it back on the LRU list. This is quite
> wrong in itself but it would also make scan_movable_pages stumble over
> it again without any way out.
> 
> This means that the hotremove with hwpoisoned pages has never really
> worked (without luck). HWPoisoning really needs a larger surgery
> but an immediate and backportable fix is to skip over these pages during
> offlining. Even if they are still mapped for some reason then
> try_to_unmap should turn those mappings into hwpoison ptes and cause
> SIGBUS on access. Nobody should be really touching the content of the
> page so it should be safe to ignore them even when there is a pending
> reference count.
> 
> Debugged-by: Oscar Salvador 
> Cc: stable
> Signed-off-by: Michal Hocko 
> ---
> Hi,
> I am sending this as an RFC now because I am not fully sure I see all
> the consequences myself yet. This has passed a testing by Oscar but I
> would highly appreciate a review from Naoya about my assumptions about
> hwpoisoning. E.g. it is not entirely clear to me whether there is a
> potential case where the page might be still mapped.

One potential case is ksm page, for which we give up unmapping and leave
it unmapped. Rather than that I don't have any idea, but any new type of
page would be potentially categorized to this class.

> I have put
> try_to_unmap just to be sure. It would be really great if I could drop
> that part because then it is not really clear which of the TTU flags to
> use to cover all potential cases.
> 
> I have marked the patch for stable but I have no idea how far back it
> should go. Probably everything that already has hotremove and hwpoison
> code.

Yes, maybe this could be ported to all active stable trees.

> 
> Thanks in advance!
> 
>  mm/memory_hotplug.c | 12 
>  1 file changed, 12 insertions(+)
> 
> diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
> index c6c42a7425e5..08c576d5a633 100644
> --- a/mm/memory_hotplug.c
> +++ b/mm/memory_hotplug.c
> @@ -34,6 +34,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  
> @@ -1366,6 +1367,17 @@ do_migrate_range(unsigned long start_pfn, unsigned 
> long end_pfn)
>   pfn = page_to_pfn(compound_head(page))
>   + hpage_nr_pages(page) - 1;
>  
> + /*
> +  * HWPoison pages have elevated reference counts so the 
> migration would
> +  * fail on them. It also doesn't make any sense to migrate them 
> in the
> +  * first place. Still try to unmap such a page in case it is 
> still mapped.
> +  */
> + if (PageHWPoison(page)) {
> + if (page_mapped(page))
> + try_to_unmap(page, TTU_IGNORE_MLOCK | 
> TTU_IGNORE_ACCESS);
> + continue;
> + }
> +

I think this looks OK (no better idea.)

Reviewed-by: Naoya Horiguchi 

I wondered why I didn't find this for long, and found that my testing only
covered the case where PageHWPoison is the first page of memory block.
scan_movable_pages() considers PageHWPoison as non-movable, so 
do_migrate_range()
started with pfn after the PageHWPoison and never tried to migrate it
(so effectively ignored every PageHWPoison as the above code does.)

Thanks,
Naoya Horiguchi

>   if (!get_page_unless_zero(page))
>   continue;
>   /*
> -- 
> 2.19.1
> 
> 


[PATCH V3 3/3] dt-bindings: sdhci-omap: Add example for using external dma

2018-12-03 Thread Chunyan Zhang
sdhci-omap can support both an external DMA controller via the
dmaengine framework and the ADMA engine which the standard SD host
controller provides.

Signed-off-by: Chunyan Zhang 
---
 Documentation/devicetree/bindings/mmc/sdhci-omap.txt | 7 +++
 1 file changed, 7 insertions(+)

diff --git a/Documentation/devicetree/bindings/mmc/sdhci-omap.txt 
b/Documentation/devicetree/bindings/mmc/sdhci-omap.txt
index 393848c..c73fd47 100644
--- a/Documentation/devicetree/bindings/mmc/sdhci-omap.txt
+++ b/Documentation/devicetree/bindings/mmc/sdhci-omap.txt
@@ -12,6 +12,11 @@ Required properties:
 "ddr_1_8v-rev11", "ddr_1_8v" or "ddr_3_3v", "hs200_1_8v-rev11",
 "hs200_1_8v",
 - pinctrl- : Pinctrl states as described in bindings/pinctrl/pinctrl-bindings.txt
+- dmas:	List of DMA specifiers with the controller specific format as described
+	in the generic DMA client binding. A tx and rx specifier is required.
+- dma-names:	List of DMA request names. These strings correspond 1:1 with the
+	DMA specifiers listed in dmas. The string naming is to be "rx"
+	and "tx" for RX and TX DMA requests, respectively.
 
 Example:
mmc1: mmc@4809c000 {
@@ -20,4 +25,6 @@ Example:
ti,hwmods = "mmc1";
bus-width = <4>;
vmmc-supply = <>; /* phandle to regulator node */
+   dmas = < 61  62>;
+   dma-names = "tx", "rx";
};
-- 
2.7.4



[PATCH V3 2/3] mmc: sdhci-omap: Add using external dma

2018-12-03 Thread Chunyan Zhang
sdhci-omap can support both an external DMA controller via the dmaengine
framework and the ADMA that the standard SD host controller provides.

Signed-off-by: Chunyan Zhang 
---
 drivers/mmc/host/Kconfig  |  1 +
 drivers/mmc/host/sdhci-omap.c | 10 ++
 2 files changed, 11 insertions(+)

diff --git a/drivers/mmc/host/Kconfig b/drivers/mmc/host/Kconfig
index 3101da6..7846754 100644
--- a/drivers/mmc/host/Kconfig
+++ b/drivers/mmc/host/Kconfig
@@ -969,6 +969,7 @@ config MMC_SDHCI_XENON
 config MMC_SDHCI_OMAP
tristate "TI SDHCI Controller Support"
depends on MMC_SDHCI_PLTFM && OF
+   select MMC_SDHCI_EXTERNAL_DMA if DMA_ENGINE
help
  This selects the Secure Digital Host Controller Interface (SDHCI)
  support present in TI's DRA7 SOCs. The controller supports
diff --git a/drivers/mmc/host/sdhci-omap.c b/drivers/mmc/host/sdhci-omap.c
index 88347ce..b164fcc 100644
--- a/drivers/mmc/host/sdhci-omap.c
+++ b/drivers/mmc/host/sdhci-omap.c
@@ -896,6 +896,7 @@ static int sdhci_omap_probe(struct platform_device *pdev)
const struct of_device_id *match;
struct sdhci_omap_data *data;
const struct soc_device_attribute *soc;
+   struct resource *regs;
 
match = of_match_device(omap_sdhci_match, dev);
if (!match)
@@ -908,6 +909,10 @@ static int sdhci_omap_probe(struct platform_device *pdev)
}
offset = data->offset;
 
+   regs = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!regs)
+   return -ENXIO;
+
host = sdhci_pltfm_init(pdev, _omap_pdata,
sizeof(*omap_host));
if (IS_ERR(host)) {
@@ -924,6 +929,7 @@ static int sdhci_omap_probe(struct platform_device *pdev)
omap_host->timing = MMC_TIMING_LEGACY;
omap_host->flags = data->flags;
host->ioaddr += offset;
+   host->mapbase = regs->start;
 
mmc = host->mmc;
sdhci_get_of_property(pdev);
@@ -991,6 +997,10 @@ static int sdhci_omap_probe(struct platform_device *pdev)
host->mmc_host_ops.execute_tuning = sdhci_omap_execute_tuning;
host->mmc_host_ops.enable_sdio_irq = sdhci_omap_enable_sdio_irq;
 
+   /* Switch to external DMA only if there is the "dmas" property */
+   if (of_find_property(dev->of_node, "dmas", NULL))
+   sdhci_switch_external_dma(host, true);
+
ret = sdhci_setup_host(host);
if (ret)
goto err_put_sync;
-- 
2.7.4



Re: [PATCH] drm/sched: Fix a use-after-free when tracing the scheduler.

2018-12-03 Thread Koenig, Christian
Am 03.12.18 um 21:14 schrieb Eric Anholt:
> With DEBUG_SLAB (poisoning on free) enabled, I could quickly produce
> an oops when tracing V3D.

Good catch, but the solution is a clear NAK.

drm_sched_entity_add_dependency_cb() can result in setting 
entity->dependency to NULL. That in turn can lead to a memory leak 
because we call the _put with a NULL fence.

Instead we should rather call trace_drm_sched_job_wait_dep() before even 
calling drm_sched_entity_add_dependency_cb(). This is also cleaner 
because we want to trace which dependencies the driver gave to the 
scheduler and not which we actually needed to add a callback to.

Regards,
Christian.

>
> Signed-off-by: Eric Anholt 
> ---
>
> I think this patch is correct (though maybe a bigger refactor could
> avoid the extra get/put?), but I've still got this with "vblank_mode=0
> perf record -a -e v3d:.\* -e gpu_scheduler:.\* glxgears".  Any ideas?
>
> [  139.842191] Unable to handle kernel NULL pointer dereference at virtual 
> address 0020
> [  139.850413] pgd = eab7bb57
> [  139.854424] [0020] *pgd=8040004003, *pmd=
> [  139.860523] Internal error: Oops: 206 [#1] SMP ARM
> [  139.865340] Modules linked in:
> [  139.868404] CPU: 0 PID: 1161 Comm: v3d_render Not tainted 4.20.0-rc4+ #552
> [  139.875287] Hardware name: Broadcom STB (Flattened Device Tree)
> [  139.881228] PC is at perf_trace_drm_sched_job_wait_dep+0xa8/0xf4
> [  139.887243] LR is at 0xe9790274
> [  139.890388] pc : []lr : []psr: a0050013
> [  139.896662] sp : ed21dec0  ip : ed21dec0  fp : ed21df04
> [  139.901893] r10: ed267478  r9 :   r8 : ff7bde04
> [  139.907123] r7 :   r6 : 0063  r5 :   r4 : c1208448
> [  139.913659] r3 : c1265690  r2 : ff7bf660  r1 : 0034  r0 : ff7bf660
> [  139.920196] Flags: NzCv  IRQs on  FIQs on  Mode SVC_32  ISA ARM  Segment 
> user
> [  139.927339] Control: 30c5383d  Table: 68fa3b40  DAC: fffd
> [  139.933095] Process v3d_render (pid: 1161, stack limit = 0xb3c84b1b)
> [  139.939457] Stack: (0xed21dec0 to 0xed21e000)
> [  139.943821] dec0: 20050013  eb9700cc  ec0e8e80  
> eb9700cc e9790274
> [  139.952009] dee0:  e2f59345 eb970078 eba8f680 c12ae00c c1208478 
>  e8c2b048
> [  139.960197] df00: eb9700cc c06e92e4 c06e8f04  80050013 ed267478 
> eb970078 
> [  139.968385] df20: ed267578 c0e45ae0 e9093080 c06e831c ed267630 c06e8120 
> c06e77d4 c1208448
> [  139.976573] df40: ee2e8acc 0001  ee2e8640 c0272ab4 ed21df54 
> ed21df54 e2f59345
> [  139.984762] df60: ed21c000 ed1b4800 ed2d7840  ed21c000 ed267478 
> c06e8084 ee935cb0
> [  139.992950] df80: ed1b4838 c0249b44 ed21c000 ed2d7840 c02499e4  
>  
> [  140.001138] dfa0:    c02010ac   
>  
> [  140.009326] dfc0:       
>  
> [  140.017514] dfe0:     0013  
>  
> [  140.025707] [] (perf_trace_drm_sched_job_wait_dep) from 
> [] (drm_sched_entity_pop_job+0x394/0x438)
> [  140.036332] [] (drm_sched_entity_pop_job) from [] 
> (drm_sched_main+0x9c/0x298)
> [  140.045221] [] (drm_sched_main) from [] 
> (kthread+0x160/0x168)
> [  140.052716] [] (kthread) from [] 
> (ret_from_fork+0x14/0x28)
>
>   drivers/gpu/drm/scheduler/sched_entity.c | 4 +++-
>   1 file changed, 3 insertions(+), 1 deletion(-)
>
> diff --git a/drivers/gpu/drm/scheduler/sched_entity.c 
> b/drivers/gpu/drm/scheduler/sched_entity.c
> index 4463d3826ecb..0d4fc86089cb 100644
> --- a/drivers/gpu/drm/scheduler/sched_entity.c
> +++ b/drivers/gpu/drm/scheduler/sched_entity.c
> @@ -440,13 +440,15 @@ struct drm_sched_job *drm_sched_entity_pop_job(struct 
> drm_sched_entity *entity)
>   
>   while ((entity->dependency =
>   sched->ops->dependency(sched_job, entity))) {
> -
> + dma_fence_get(entity->dependency);
>   if (drm_sched_entity_add_dependency_cb(entity)) {
>   
>   trace_drm_sched_job_wait_dep(sched_job,
>entity->dependency);
> + dma_fence_put(entity->dependency);
>   return NULL;
>   }
> + dma_fence_put(entity->dependency);
>   }
>   
>   /* skip jobs from entity that marked guilty */




[PATCH V3 0/3] Add support for using external dma in SDHCI

2018-12-03 Thread Chunyan Zhang
Currently the generic SDHCI code in the Linux kernel supports the SD standard
DMA integrated into the host controller, but it has no support for external
DMA controllers implemented using dmaengine. This means custom code is needed
for any system that uses a generic DMA controller with SDHCI (in practice,
any SDHCI controller that doesn't have an integrated DMA controller), so we
should have this as a generic feature.

There are already a number of controller-specific drivers that have dmaengine
code. Some of them could actually use sdhci.c, but needed to implement
mmc_ops->request() in their specific driver to send commands with external
DMA through the dmaengine framework. With this patchset, they will take
advantage of the generic support. TI's omap controller is one such example.

Any comments are much appreciated.

Thanks,
Chunyan

Changes from v2 (https://lkml.org/lkml/2018/11/12/1936):
* Remove the CONFIG_EXTERNAL_DMA prompt and help text;
* Add a check for cmd->data;
* Select MMC_SDHCI_EXTERNAL_DMA for MMC_SDHCI_OMAP;
* Check whether there is a 'dmas' property in the device tree before deciding
  to use external DMA.

Changes from v1 (https://lkml.org/lkml/2018/11/5/110):
(Only the code in patch 1/3 was revised)
* Address comments from Arnd:
- Unconditionally release the channel when requesting it fails;
- Skip the warning message on EPROBE_DEFER;
* Address Adrian's comments:
- Release DMA resources in sdhci_cleanup_host() and sdhci_remove_host();
- Replace extdma with external_dma;
- Release DMA resources once dmaengine_submit() fails;
- Put rx/tx_chan in struct sdhci_host, and remove the unused structure.
* Fall back to the DMA/PIO that standard SDHCI supports if
  sdhci_external_dma_setup() or sdhci_external_dma_init() fails;

Chunyan Zhang (3):
  mmc: sdhci: add support for using external DMA devices
  mmc: sdhci-omap: Add using external dma
  dt-bindings: sdhci-omap: Add example for using external dma

 .../devicetree/bindings/mmc/sdhci-omap.txt |   7 +
 drivers/mmc/host/Kconfig   |   4 +
 drivers/mmc/host/sdhci-omap.c  |  10 ++
 drivers/mmc/host/sdhci.c   | 185 -
 drivers/mmc/host/sdhci.h   |   8 +
 5 files changed, 213 insertions(+), 1 deletion(-)

-- 
2.7.4




Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline

2018-12-03 Thread Michal Hocko
On Tue 04-12-18 11:05:57, Pingfan Liu wrote:
> During my test on some AMD machine, with kexec -l nr_cpus=x option, the
> kernel failed to bootup, because some node's data struct can not be allocated,
> e.g, on x86, initialized by init_cpu_to_node()->init_memory_less_node(). But
> device->numa_node info is used as preferred_nid param for
> __alloc_pages_nodemask(), which causes NULL reference
>   ac->zonelist = node_zonelist(preferred_nid, gfp_mask);
> This patch tries to fix the issue by falling back to the first online node,
> when encountering such corner case.

We have seen similar issues already and the bug was usually that the
zonelists were not initialized yet or the node is completely bogus.
Zonelists should be initialized by build_all_zonelists quite early so I
am wondering whether the latter is the case. What is the actual node
number the device is associated with?

Your patch is not correct btw, because we want to fall back to a node in
distance order rather than to the first online node.
-- 
Michal Hocko
SUSE Labs


Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline

2018-12-03 Thread Pingfan Liu
On Tue, Dec 4, 2018 at 2:54 PM Wei Yang  wrote:
>
> On Tue, Dec 04, 2018 at 11:05:57AM +0800, Pingfan Liu wrote:
> >During my test on some AMD machine, with kexec -l nr_cpus=x option, the
> >kernel failed to bootup, because some node's data struct can not be 
> >allocated,
> >e.g, on x86, initialized by init_cpu_to_node()->init_memory_less_node(). But
> >device->numa_node info is used as preferred_nid param for
>
> could we fix the preferred_nid before passed to
> __alloc_pages_nodemask()?
>
Yes, we could do that too, but what is the gain?

> BTW, I don't catch the function call flow to this point. Would you mind
> giving me some hint?
>
You can track the code along slab_alloc() ->...->__alloc_pages_nodemask()

Thanks,
Pingfan


Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline

2018-12-03 Thread Pingfan Liu
On Tue, Dec 4, 2018 at 11:53 AM David Rientjes  wrote:
>
> On Tue, 4 Dec 2018, Pingfan Liu wrote:
>
> > diff --git a/include/linux/gfp.h b/include/linux/gfp.h
> > index 76f8db0..8324953 100644
> > --- a/include/linux/gfp.h
> > +++ b/include/linux/gfp.h
> > @@ -453,6 +453,8 @@ static inline int gfp_zonelist(gfp_t flags)
> >   */
> >  static inline struct zonelist *node_zonelist(int nid, gfp_t flags)
> >  {
> > + if (unlikely(!node_online(nid)))
> > + nid = first_online_node;
> >   return NODE_DATA(nid)->node_zonelists + gfp_zonelist(flags);
> >  }
> >
>
> So we're passing the node id from dev_to_node() to kmalloc which
> interprets that as the preferred node and then does node_zonelist() to
> find the zonelist at allocation time.
>
> What happens if we fix this in alloc_dr()?  Does anything else cause
> problems?
>
I think it is better to fix it in mm, since that protects against any similar
bug in the future, while fixing it in alloc_dr() only works for the present
case.

> And rather than using first_online_node, would next_online_node() work?
>
What is the gain? Is it for memory pressure on node0?

Thanks,
Pingfan

> I'm thinking about this:
>
> diff --git a/drivers/base/devres.c b/drivers/base/devres.c
> --- a/drivers/base/devres.c
> +++ b/drivers/base/devres.c
> @@ -100,6 +100,8 @@ static __always_inline struct devres * 
> alloc_dr(dr_release_t release,
> _size)))
> return NULL;
>
> +   if (unlikely(!node_online(nid)))
> +   nid = next_online_node(nid);
> dr = kmalloc_node_track_caller(tot_size, gfp, nid);
> if (unlikely(!dr))
> return NULL;


general protection fault in kvm_arch_vcpu_ioctl_run

2018-12-03 Thread syzbot

Hello,

syzbot found the following crash on:

HEAD commit:4b78317679c4 Merge branch 'x86-pti-for-linus' of git://git..
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=15e979f540
kernel config:  https://syzkaller.appspot.com/x/.config?x=4602730af4f872ef
dashboard link: https://syzkaller.appspot.com/bug?extid=39810e6c400efadfef71
compiler:   gcc (GCC) 8.0.1 20180413 (experimental)

Unfortunately, I don't have any reproducer for this crash yet.

IMPORTANT: if you fix the bug, please add the following tag to the commit:
Reported-by: syzbot+39810e6c400efadfe...@syzkaller.appspotmail.com

kasan: GPF could be caused by NULL-ptr deref or user memory access
general protection fault:  [#1] PREEMPT SMP KASAN
CPU: 0 PID: 14932 Comm: syz-executor0 Not tainted 4.20.0-rc4+ #138
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS  
Google 01/01/2011

RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline]
RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline]
RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline]
RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline]
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074
Code: 03 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 b4 1e 00 00 49 8b  
9f e0 03 00 00 48 8d bb 88 00 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00  
0f 85 8a 1e 00 00 48 8b 9b 88 00 00 00 48 8d bb d8

RSP: 0018:88818b0bf530 EFLAGS: 00010206
RAX: 0011 RBX:  RCX: c9001302b000
RDX: 00cf RSI: 81103a68 RDI: 0088
RBP: 88818b0bf8d0 R08: 8881bfe8e0c0 R09: 0008
R10: 0028 R11: 810feb0f R12: dc00
R13:  R14: c90007ddfdb8 R15: 888188a18400
FS:  7ff919977700() GS:8881dae0() knlGS:
CS:  0010 DS:  ES:  CR0: 80050033
CR2: 7fc94713c518 CR3: 0001cdea CR4: 001426f0
DR0:  DR1:  DR2: 
DR3:  DR6: fffe0ff0 DR7: 0400
Call Trace:
 kvm_vcpu_ioctl+0x5c8/0x1150 arch/x86/kvm/../../../virt/kvm/kvm_main.c:2596
 vfs_ioctl fs/ioctl.c:46 [inline]
 file_ioctl fs/ioctl.c:509 [inline]
 do_vfs_ioctl+0x1de/0x1790 fs/ioctl.c:696
 ksys_ioctl+0xa9/0xd0 fs/ioctl.c:713
 __do_sys_ioctl fs/ioctl.c:720 [inline]
 __se_sys_ioctl fs/ioctl.c:718 [inline]
 __x64_sys_ioctl+0x73/0xb0 fs/ioctl.c:718
 do_syscall_64+0x1b9/0x820 arch/x86/entry/common.c:290
 entry_SYSCALL_64_after_hwframe+0x49/0xbe
RIP: 0033:0x457569
Code: fd b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00 00 66 90 48 89 f8 48 89 f7  
48 89 d6 48 89 ca 4d 89 c2 4d 89 c8 4c 8b 4c 24 08 0f 05 <48> 3d 01 f0 ff  
ff 0f 83 cb b3 fb ff c3 66 2e 0f 1f 84 00 00 00 00

RSP: 002b:7ff919976c78 EFLAGS: 0246 ORIG_RAX: 0010
RAX: ffda RBX: 0003 RCX: 00457569
RDX:  RSI: ae80 RDI: 0005
RBP: 0072bf00 R08:  R09: 
R10:  R11: 0246 R12: 7ff9199776d4
R13: 004c034e R14: 004d0d60 R15: 
Modules linked in:
kobject: 'loop5' (4f26f0d5): kobject_uevent_env
kobject: 'loop5' (4f26f0d5): fill_kobj_path: path  
= '/devices/virtual/block/loop5'

kobject: 'loop1' (10db8550): kobject_uevent_env
kobject: 'loop1' (10db8550): fill_kobj_path: path  
= '/devices/virtual/block/loop1'

kobject: 'loop3' (941a4e7a): kobject_uevent_env
kobject: 'loop3' (941a4e7a): fill_kobj_path: path  
= '/devices/virtual/block/loop3'

---[ end trace d7fab4e7c1a70214 ]---
RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline]
RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline]
RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline]
RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline]
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074
Code: 03 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 b4 1e 00 00 49 8b  
9f e0 03 00 00 48 8d bb 88 00 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00  
0f 85 8a 1e 00 00 48 8b 9b 88 00 00 00 48 8d bb d8

kobject: 'loop4' (4a89aba1): kobject_uevent_env
kobject: 'loop4' (4a89aba1): fill_kobj_path: path  
= '/devices/virtual/block/loop4'

kobject: 'loop2' (704d7e59): kobject_uevent_env
kobject: 'loop2' (704d7e59): fill_kobj_path: path  
= '/devices/virtual/block/loop2'

kobject: 'loop4' (4a89aba1): kobject_uevent_env
kobject: 'loop4' (4a89aba1): fill_kobj_path: path  
= '/devices/virtual/block/loop4'

kobject: 'loop2' (704d7e59): kobject_uevent_env
RSP: 0018:88818b0bf530 EFLAGS: 00010206
RAX: 0011 RBX:  RCX: c9001302b000
kobject: 'loop2' (704d7e59): fill_kobj_path: path  
= '/devices/virtual/block/loop2'

RDX: 00cf RSI: 81103a68 RDI: 


kobject: 'loop3' (941a4e7a): kobject_uevent_env
kobject: 'loop3' (941a4e7a): fill_kobj_path: path  
= '/devices/virtual/block/loop3'

---[ end trace d7fab4e7c1a70214 ]---
RIP: 0010:kvm_apic_hw_enabled arch/x86/kvm/lapic.h:169 [inline]
RIP: 0010:vcpu_scan_ioapic arch/x86/kvm/x86.c:7449 [inline]
RIP: 0010:vcpu_enter_guest arch/x86/kvm/x86.c:7602 [inline]
RIP: 0010:vcpu_run arch/x86/kvm/x86.c:7874 [inline]
RIP: 0010:kvm_arch_vcpu_ioctl_run+0x5296/0x7320 arch/x86/kvm/x86.c:8074
Code: 03 00 00 48 89 f8 48 c1 e8 03 42 80 3c 20 00 0f 85 b4 1e 00 00 49 8b  
9f e0 03 00 00 48 8d bb 88 00 00 00 48 89 f8 48 c1 e8 03 <42> 80 3c 20 00  
0f 85 8a 1e 00 00 48 8b 9b 88 00 00 00 48 8d bb d8

kobject: 'loop4' (4a89aba1): kobject_uevent_env
kobject: 'loop4' (4a89aba1): fill_kobj_path: path  
= '/devices/virtual/block/loop4'

kobject: 'loop2' (704d7e59): kobject_uevent_env
kobject: 'loop2' (704d7e59): fill_kobj_path: path  
= '/devices/virtual/block/loop2'

kobject: 'loop4' (4a89aba1): kobject_uevent_env
kobject: 'loop4' (4a89aba1): fill_kobj_path: path  
= '/devices/virtual/block/loop4'

kobject: 'loop2' (704d7e59): kobject_uevent_env
RSP: 0018:88818b0bf530 EFLAGS: 00010206
RAX: 0011 RBX:  RCX: c9001302b000
kobject: 'loop2' (704d7e59): fill_kobj_path: path  
= '/devices/virtual/block/loop2'

RDX: 00cf RSI: 81103a68 RDI: 

Re: [PATCH] mm/alloc: fallback to first node if the wanted node offline

2018-12-03 Thread Wei Yang
On Tue, Dec 04, 2018 at 11:05:57AM +0800, Pingfan Liu wrote:
>During my test on some AMD machine, with kexec -l nr_cpus=x option, the
>kernel failed to bootup, because some node's data struct can not be allocated,
>e.g, on x86, initialized by init_cpu_to_node()->init_memory_less_node(). But
>device->numa_node info is used as preferred_nid param for

Could we fix the preferred_nid before it is passed to
__alloc_pages_nodemask()?

BTW, I don't follow the function call flow to this point. Would you mind
giving me a hint?

-- 
Wei Yang
Help you, Help me


linux-next: Tree for Dec 4

2018-12-03 Thread Stephen Rothwell
Hi all,

Changes since 20181203:

The rdma tree gained a build failure so I used the version from
next-20181203.

The bpf-next tree gained conflicts against the bpf tree.

The char-misc tree gained a conflict against the char-misc.current tree.

The akpm tree lost its build failure but gained a conflict against the
pm tree.

Non-merge commits (relative to Linus' tree): 5958
 6050 files changed, 291311 insertions(+), 163581 deletions(-)



I have created today's linux-next tree at
git://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git
(patches at http://www.kernel.org/pub/linux/kernel/next/ ).  If you
are tracking the linux-next tree using git, you should not use "git pull"
to do so as that will try to merge the new linux-next release with the
old one.  You should use "git fetch" and checkout or reset to the new
master.

You can see which trees have been included by looking in the Next/Trees
file in the source.  There are also quilt-import.log and merge.log
files in the Next directory.  Between each merge, the tree was built
with a ppc64_defconfig for powerpc, an allmodconfig for x86_64, a
multi_v7_defconfig for arm and a native build of tools/perf. After
the final fixups (if any), I do an x86_64 modules_install followed by
builds for x86_64 allnoconfig, powerpc allnoconfig (32 and 64 bit),
ppc44x_defconfig, allyesconfig and pseries_le_defconfig and i386, sparc
and sparc64 defconfig. And finally, a simple boot test of the powerpc
pseries_le_defconfig kernel in qemu (with and without kvm enabled).

Below is a summary of the state of the merge.

I am currently merging 286 trees (counting Linus' and 67 trees of bug
fix patches pending for the current merge release).

Stats about the size of the tree over time can be seen at
http://neuling.org/linux-next-size.html .

Status of my local build tests will be at
http://kisskb.ellerman.id.au/linux-next .  If maintainers want to give
advice about cross compilers/configs that work, we are always open to add
more builds.

Thanks to Randy Dunlap for doing many randconfig builds.  And to Paul
Gortmaker for triage and bug fixes.

-- 
Cheers,
Stephen Rothwell

$ git checkout master
$ git reset --hard stable
Merging origin/master (0072a0c14d5b Merge tag 'media/v4.20-4' of 
git://git.kernel.org/pub/scm/linux/kernel/git/mchehab/linux-media)
Merging fixes/master (d8c137546ef8 powerpc: tag implicit fall throughs)
Merging kbuild-current/fixes (ccda4af0f4b9 Linux 4.20-rc2)
Merging arc-current/for-curr (10d443431dc2 ARC: io.h: Implement 
reads{x}()/writes{x}())
Merging arm-current/fixes (e46daee53bb5 ARM: 8806/1: kprobes: Fix false 
positive with FORTIFY_SOURCE)
Merging arm64-fixes/for-next/fixes (ea2412dc21cc ACPI/IORT: Fix 
iort_get_platform_device_domain() uninitialized pointer value)
Merging m68k-current/for-linus (58c116fb7dc6 m68k/sun3: Remove is_medusa and 
m68k_pgtable_cachemode)
Merging powerpc-fixes/fixes (bf3d6afbb234 powerpc: Look for "stdout-path" when 
setting up legacy consoles)
Merging sparc/master (f3f950dba37b Merge 
git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide)
Merging fscrypt-current/for-stable (ae64f9bd1d36 Linux 4.15-rc2)
Merging net/master (35b827b6d061 tun: forbid iface creation with rtnl ops)
Merging bpf/master (dcb40590e69e bpf: refactor bpf_test_run() to separate own 
failures and test program result)
Merging ipsec/master (4a135e538962 xfrm_user: fix freeing of xfrm states on 
acquire)
Merging netfilter/master (d78a5ebd8b18 Merge branch '1GbE' of 
git://git.kernel.org/pub/scm/linux/kernel/git/jkirsher/net-queue)
Merging ipvs/master (feb9f55c33e5 netfilter: nft_dynset: allow dynamic updates 
of non-anonymous set)
Merging wireless-drivers/master (2e6e902d1850 Linux 4.20-rc4)
Merging mac80211/master (113f3aaa81bd cfg80211: Prevent regulatory restore 
during STA disconnect in concurrent interfaces)
Merging rdma-fixes/for-rc (7bca603a69c0 RDMA/mlx5: Initialize return variable 
in case pagefault was skipped)
Merging sound-current/for-linus (5f8cf7125826 ALSA: usb-audio: Fix UAF 
decrement if card has no live interfaces in card.c)
Merging sound-asoc-fixes/for-linus (280ea4299e05 Merge branch 'asoc-4.20' into 
asoc-linus)
Merging regmap-fixes/for-linus (9ff01193a20d Linux 4.20-rc3)
Merging regulator-fixes/for-linus (fea4962497d8 Merge branch 'regulator-4.20' 
into regulator-linus)
Merging spi-fixes/for-linus (9ea83d4c2b9a Merge branch 'spi-4.20' into 
spi-linus)
Merging pci-current/for-linus (c74eadf881ad Merge remote-tracking branch 
'lorenzo/pci/controller-fixes' into for-linus)
Merging driver-core.current/driver-core-linus (2595646791c3 Linux 4.20-rc5)
Merging tty.current/tty-linus (2a48602615e0 tty: do not set TTY_IO_ERROR flag 
if console port)
Merging usb.current/usb-linus (2595646791c3 Linux 4.20-rc5)
Merging usb-gadget-fixes/fixes (069caf5950df USB: omap_udc: fix rejection of 
out transfers when DMA is used)
Merging usb-serial


[tip:locking/core] tools/memory-model: Make scripts take "-j" abbreviation for "--jobs"

2018-12-03 Thread tip-bot for Paul E. McKenney
Commit-ID:  a6f1de04276d036b61c4d1dbd0367e6b430d8783
Gitweb: https://git.kernel.org/tip/a6f1de04276d036b61c4d1dbd0367e6b430d8783
Author: Paul E. McKenney 
AuthorDate: Mon, 3 Dec 2018 15:04:51 -0800
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Dec 2018 07:29:52 +0100

tools/memory-model: Make scripts take "-j" abbreviation for "--jobs"

The "--jobs" argument to the litmus-test scripts is similar to the "-jN"
argument to "make", so this commit allows the "-jN" form as well.  While
in the area, it also prohibits the various forms of "-j0".

Suggested-by: Alan Stern 
Signed-off-by: Paul E. McKenney 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: aki...@gmail.com
Cc: boqun.f...@gmail.com
Cc: dhowe...@redhat.com
Cc: j.algl...@ucl.ac.uk
Cc: linux-a...@vger.kernel.org
Cc: luc.maran...@inria.fr
Cc: npig...@gmail.com
Cc: parri.and...@gmail.com
Cc: will.dea...@arm.com
Link: http://lkml.kernel.org/r/20181203230451.28921-3-paul...@linux.ibm.com
Signed-off-by: Ingo Molnar 
---
 tools/memory-model/scripts/parseargs.sh | 14 --
 1 file changed, 12 insertions(+), 2 deletions(-)

diff --git a/tools/memory-model/scripts/parseargs.sh 
b/tools/memory-model/scripts/parseargs.sh
index 96b307c8d64a..859e1d581e05 100644
--- a/tools/memory-model/scripts/parseargs.sh
+++ b/tools/memory-model/scripts/parseargs.sh
@@ -95,8 +95,18 @@ do
LKMM_HERD_OPTIONS="$2"
shift
;;
-   --jobs|--job)
-   checkarg --jobs "(number)" "$#" "$2" '^[0-9]\+$' '^--'
+   -j[1-9]*)
+   njobs="`echo $1 | sed -e 's/^-j//'`"
+   trailchars="`echo $njobs | sed -e 's/[0-9]\+\(.*\)$/\1/'`"
+   if test -n "$trailchars"
+   then
+   echo $1 trailing characters "'$trailchars'"
+   usagehelp
+   fi
+   LKMM_JOBS="`echo $njobs | sed -e 's/^\([0-9]\+\).*$/\1/'`"
+   ;;
+   --jobs|--job|-j)
+   checkarg --jobs "(number)" "$#" "$2" '^[1-9][0-9]\+$' '^--'
LKMM_JOBS="$2"
shift
;;


Re: [RFC PATCH V2 00/11] Intel EPT-Based Sub-page Protection Support

2018-12-03 Thread Yi Zhang
On 2018-12-03 at 05:56:13 +0200, Mihai Donțu wrote:
> Hi Paolo,
> 
> On Fri, 2018-11-30 at 11:07 +0100, Paolo Bonzini wrote:
> > On 30/11/18 08:52, Zhang Yi wrote:
> > > Here is a patch-series which adding EPT-Based Sub-page Write Protection 
> > > Support.
> > > 
> > > Introduction:
> > > 
> > > EPT-Based Sub-page Write Protection referred to as SPP, it is a 
> > > capability which
> > > allow Virtual Machine Monitors(VMM) to specify write-permission for guest
> > > physical memory at a sub-page(128 byte) granularity.  When this 
> > > capability is
> > > utilized, the CPU enforces write-access permissions for sub-page regions 
> > > of 4K
> > > pages as specified by the VMM. EPT-based sub-page permissions is intended 
> > > to
> > > enable fine-grained memory write enforcement by a VMM for security(guest 
> > > OS
> > > monitoring) and usages such as device virtualization and memory 
> > > check-point.
> > > 
> > > SPPT is active when the "sub-page write protection" VM-execution control 
> > > is 1.
> > > SPPT looks up the guest physical addresses to derive a 64 bit "sub-page
> > > permission" value containing sub-page write permissions. The lookup from
> > > guest-physical addresses to the sub-page region permissions is determined 
> > > by a
> > > set of SPPT paging structures.
> > > 
> > > When the "sub-page write protection" VM-execution control is 1, the SPPT 
> > > is used
> > > to lookup write permission bits for the 128 byte sub-page regions 
> > > containing in
> > > the 4KB guest physical page. EPT specifies the 4KB page level privileges 
> > > that
> > > software is allowed when accessing the guest physical address, whereas 
> > > SPPT
> > > defines the write permissions for software at the 128 byte granularity 
> > > regions
> > > within a 4KB page. Write accesses prevented due to sub-page permissions 
> > > looked
> > > up via SPPT are reported as EPT violation VM exits. Similar to EPT, a 
> > > logical
> > > processor uses SPPT to lookup sub-page region write permissions for
> > > guest-physical addresses only when those addresses are used to access 
> > > memory.
> > 
> > Hi,
> > 
> > I think the right thing to do here would be to first get VM
> > introspection in KVM, as SPP is mostly an introspection feature and it
> > should be controller by the introspector rather than the KVM userspace.
> > 
> > Mihai, if you resubmit, I promise that I will look at it promptly.
Thanks for the review, Paolo. What do you think about us cooking up some
use cases for QEMU or kvmtool, perhaps together with some other kernel
hypercalls?

SPP is not only an introspection-dependent feature.
> 
> I'm currently traveling until Wednesday, but when I'll get into the
> office I will see about preparing a new patch set and send it to the
> list before Christmas.
Thanks, Mihai. Please Cc me on the new VMI patch set.
> 
> Regards,
> 
> -- 
> Mihai Donțu
> 


[tip:locking/core] tools/memory-model: Model smp_mb__after_unlock_lock()

2018-12-03 Thread tip-bot for Andrea Parri
Commit-ID:  4607abbcf464ea2be14da444215d05c73025cf6e
Gitweb: https://git.kernel.org/tip/4607abbcf464ea2be14da444215d05c73025cf6e
Author: Andrea Parri 
AuthorDate: Mon, 3 Dec 2018 15:04:49 -0800
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Dec 2018 07:29:51 +0100

tools/memory-model: Model smp_mb__after_unlock_lock()

The kernel documents smp_mb__after_unlock_lock() the following way:

  "Place this after a lock-acquisition primitive to guarantee that
   an UNLOCK+LOCK pair acts as a full barrier.  This guarantee applies
   if the UNLOCK and LOCK are executed by the same CPU or if the
   UNLOCK and LOCK operate on the same lock variable."

Formalize in LKMM the above guarantee by defining (new) mb-links according
to the law:

  ([M] ; po ; [UL] ; (co | po) ; [LKW] ;
fencerel(After-unlock-lock) ; [M])

where the component ([UL] ; co ; [LKW]) identifies "UNLOCK+LOCK pairs on
the same lock variable" and the component ([UL] ; po ; [LKW]) identifies
"UNLOCK+LOCK pairs executed by the same CPU".

In particular, the LKMM forbids the following two behaviors (the second
litmus test below is based on:

  Documentation/RCU/Design/Memory-Ordering/Tree-RCU-Memory-Ordering.html

c.f., Section "Tree RCU Grace Period Memory Ordering Building Blocks"):

C after-unlock-lock-same-cpu

(*
 * Result: Never
 *)

{}

P0(spinlock_t *s, spinlock_t *t, int *x, int *y)
{
int r0;

spin_lock(s);
WRITE_ONCE(*x, 1);
spin_unlock(s);
spin_lock(t);
smp_mb__after_unlock_lock();
r0 = READ_ONCE(*y);
spin_unlock(t);
}

P1(int *x, int *y)
{
int r0;

WRITE_ONCE(*y, 1);
smp_mb();
r0 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r0=0)

C after-unlock-lock-same-lock-variable

(*
 * Result: Never
 *)

{}

P0(spinlock_t *s, int *x, int *y)
{
int r0;

spin_lock(s);
WRITE_ONCE(*x, 1);
r0 = READ_ONCE(*y);
spin_unlock(s);
}

P1(spinlock_t *s, int *y, int *z)
{
int r0;

spin_lock(s);
smp_mb__after_unlock_lock();
WRITE_ONCE(*y, 1);
r0 = READ_ONCE(*z);
spin_unlock(s);
}

P2(int *z, int *x)
{
int r0;

WRITE_ONCE(*z, 1);
smp_mb();
r0 = READ_ONCE(*x);
}

exists (0:r0=0 /\ 1:r0=0 /\ 2:r0=0)

Signed-off-by: Andrea Parri 
Signed-off-by: Paul E. McKenney 
Cc: Akira Yokosawa 
Cc: Alan Stern 
Cc: Boqun Feng 
Cc: Daniel Lustig 
Cc: David Howells 
Cc: Jade Alglave 
Cc: Linus Torvalds 
Cc: Luc Maranget 
Cc: Nicholas Piggin 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: Will Deacon 
Cc: linux-a...@vger.kernel.org
Cc: parri.and...@gmail.com
Link: http://lkml.kernel.org/r/20181203230451.28921-1-paul...@linux.ibm.com
Signed-off-by: Ingo Molnar 
---
 tools/memory-model/linux-kernel.bell | 3 ++-
 tools/memory-model/linux-kernel.cat  | 4 +++-
 tools/memory-model/linux-kernel.def  | 1 +
 3 files changed, 6 insertions(+), 2 deletions(-)

diff --git a/tools/memory-model/linux-kernel.bell 
b/tools/memory-model/linux-kernel.bell
index b84fb2f67109..796513362c05 100644
--- a/tools/memory-model/linux-kernel.bell
+++ b/tools/memory-model/linux-kernel.bell
@@ -29,7 +29,8 @@ enum Barriers = 'wmb (*smp_wmb*) ||
'sync-rcu (*synchronize_rcu*) ||
'before-atomic (*smp_mb__before_atomic*) ||
'after-atomic (*smp_mb__after_atomic*) ||
-   'after-spinlock (*smp_mb__after_spinlock*)
+   'after-spinlock (*smp_mb__after_spinlock*) ||
+   'after-unlock-lock (*smp_mb__after_unlock_lock*)
 instructions F[Barriers]
 
 (* Compute matching pairs of nested Rcu-lock and Rcu-unlock *)
diff --git a/tools/memory-model/linux-kernel.cat 
b/tools/memory-model/linux-kernel.cat
index 882fc33274ac..8f23c74a96fd 100644
--- a/tools/memory-model/linux-kernel.cat
+++ b/tools/memory-model/linux-kernel.cat
@@ -30,7 +30,9 @@ let wmb = [W] ; fencerel(Wmb) ; [W]
 let mb = ([M] ; fencerel(Mb) ; [M]) |
([M] ; fencerel(Before-atomic) ; [RMW] ; po? ; [M]) |
([M] ; po? ; [RMW] ; fencerel(After-atomic) ; [M]) |
-   ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M])
+   ([M] ; po? ; [LKW] ; fencerel(After-spinlock) ; [M]) |
+   ([M] ; po ; [UL] ; (co | po) ; [LKW] ;
+   fencerel(After-unlock-lock) ; [M])
 let gp = po ; [Sync-rcu] ; po?
 
 let strong-fence = mb | gp
diff --git a/tools/memory-model/linux-kernel.def 
b/tools/memory-model/linux-kernel.def
index 6fa3eb28d40b..b27911cc087d 100644
--- a/tools/memory-model/linux-kernel.def
+++ b/tools/memory-model/linux-kernel.def
@@ -23,6 +23,7 @@ smp_wmb() { __fence{wmb}; }
 smp_mb__before_atomic() { __fence{before-atomic}; }
 smp_mb__after_atomic() { __fence{after-atomic}; }
 smp_mb__after_spinlock() { __fence{after-spinlock}; }
+smp_mb__after_unlock_lock() { __fence{after-unlock-lock}; }
 
 // Exchange
 xchg(X,V)  __xchg{mb}(X,V)


[tip:locking/core] tools/memory-model: Add scripts to check github litmus tests

2018-12-03 Thread tip-bot for Paul E. McKenney
Commit-ID:  e188d24a382d609ec7ca6c1a00396202565b7831
Gitweb: https://git.kernel.org/tip/e188d24a382d609ec7ca6c1a00396202565b7831
Author: Paul E. McKenney 
AuthorDate: Mon, 3 Dec 2018 15:04:50 -0800
Committer:  Ingo Molnar 
CommitDate: Tue, 4 Dec 2018 07:29:52 +0100

tools/memory-model: Add scripts to check github litmus tests

The https://github.com/paulmckrcu/litmus repository contains a large
number of C-language litmus tests that include "Result:" comments
predicting the verification result.  This commit adds a number of scripts
that run tests on these litmus tests:

checkghlitmus.sh:
Runs all litmus tests in the https://github.com/paulmckrcu/litmus
archive that are C-language and that have "Result:" comment lines
documenting expected results, comparing the actual results to
those expected.  Clones the repository if it has not already
been cloned into the "tools/memory-model/litmus" directory.

initlitmushist.sh
Run all litmus tests having no more than the specified number
of processes given a specified timeout, recording the results in
.litmus.out files.  Clones the repository if it has not already
been cloned into the "tools/memory-model/litmus" directory.

newlitmushist.sh
For all new or updated litmus tests having no more than the
specified number of processes given a specified timeout, run
and record the results in .litmus.out files.

checklitmushist.sh
Run all litmus tests having .litmus.out files from previous
initlitmushist.sh or newlitmushist.sh runs, comparing the
herd output to that of the original runs.

The above scripts will run litmus tests concurrently, by default with
one job per available CPU.  Giving any of these scripts the --help
argument will cause them to print usage information.

This commit also adds a number of helper scripts that are not intended
to be invoked from the command line:

cmplitmushist.sh: Compare the output of two different runs of the same
litmus test.

judgelitmus.sh: Compare the output of a litmus test to its "Result:"
comment line.

parseargs.sh: Parse command-line arguments.

runlitmushist.sh: Run the litmus tests whose pathnames are provided one
per line on standard input.

While in the area, this commit also makes the existing checklitmus.sh
and checkalllitmus.sh scripts use parseargs.sh in order to provide a
bit of uniformity.  In addition, per-litmus-test status output is directed
to stdout, while end-of-test summary information is directed to stderr.
Finally, the error flag standardizes on "!!!" to assist those familiar
with rcutorture output.

The defaults for the parseargs.sh arguments may be overridden by using
environment variables: LKMM_DESTDIR for --destdir, LKMM_HERD_OPTIONS
for --herdoptions, LKMM_JOBS for --jobs, LKMM_PROCS for --procs, and
LKMM_TIMEOUT for --timeout.

[ paulmck: History-check summary-line changes per Alan Stern feedback. ]
Signed-off-by: Paul E. McKenney 
Cc: Linus Torvalds 
Cc: Peter Zijlstra 
Cc: Thomas Gleixner 
Cc: aki...@gmail.com
Cc: boqun.f...@gmail.com
Cc: dhowe...@redhat.com
Cc: j.algl...@ucl.ac.uk
Cc: linux-a...@vger.kernel.org
Cc: luc.maran...@inria.fr
Cc: npig...@gmail.com
Cc: parri.and...@gmail.com
Cc: st...@rowland.harvard.edu
Cc: will.dea...@arm.com
Link: http://lkml.kernel.org/r/20181203230451.28921-2-paul...@linux.ibm.com
Signed-off-by: Ingo Molnar 
---
 tools/memory-model/.gitignore |   1 +
 tools/memory-model/README |   2 +
 tools/memory-model/scripts/README |  70 ++
 tools/memory-model/scripts/checkalllitmus.sh  |  53 +--
 tools/memory-model/scripts/checkghlitmus.sh   |  65 +
 tools/memory-model/scripts/checklitmus.sh |  74 +++
 tools/memory-model/scripts/checklitmushist.sh |  60 
 tools/memory-model/scripts/cmplitmushist.sh   |  87 ++
 tools/memory-model/scripts/initlitmushist.sh  |  68 ++
 tools/memory-model/scripts/judgelitmus.sh |  78 
 tools/memory-model/scripts/newlitmushist.sh   |  61 +
 tools/memory-model/scripts/parseargs.sh   | 126 ++
 tools/memory-model/scripts/runlitmushist.sh   |  87 ++
 13 files changed, 739 insertions(+), 93 deletions(-)

diff --git a/tools/memory-model/.gitignore b/tools/memory-model/.gitignore
new file mode 100644
index ..b1d34c52f3c3
--- /dev/null
+++ b/tools/memory-model/.gitignore
@@ -0,0 +1 @@
+litmus
diff --git a/tools/memory-model/README b/tools/memory-model/README
index acf9077cffaa..0f2c366518c6 100644
--- a/tools/memory-model/README
+++ b/tools/memory-model/README
@@ -156,6 +156,8 @@ lock.cat
 README
This file.
 
+scripts
+	Various scripts, see scripts/README.
+
 
 ===
 LIMITATIONS
diff --git a/tools/memory-model/scripts/README 
b/tools/memory-model/scripts/README
new file 

[PATCH] spi: lpspi: Add cs-gpio support

2018-12-03 Thread Clark Wang
Add cs-gpio support for LPSPI. Use fsl_lpspi_prepare_message() and
fsl_lpspi_unprepare_message() to enable and control the CS line;
these two functions are called only at the beginning and the end
of a message transfer.

The mode without cs-gpio is still supported. Which mode is used depends
on whether the cs-gpios property is present in the device tree.

Signed-off-by: Clark Wang 
---
 drivers/spi/spi-fsl-lpspi.c | 79 -
 1 file changed, 78 insertions(+), 1 deletion(-)

diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c
index a7d01b79827b..c6fe3f94de19 100644
--- a/drivers/spi/spi-fsl-lpspi.c
+++ b/drivers/spi/spi-fsl-lpspi.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -16,7 +17,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -28,6 +31,10 @@
 
 #define FSL_LPSPI_RPM_TIMEOUT 50 /* 50ms */
 
+#define LPSPI_CS_ACTIVE1
+#define LPSPI_CS_INACTIVE  0
+#define LPSPI_CS_DELAY 100
+
 /* i.MX7ULP LPSPI registers */
 #define IMX7ULP_VERID  0x0
 #define IMX7ULP_PARAM  0x4
@@ -104,6 +111,8 @@ struct fsl_lpspi_data {
struct completion xfer_done;
 
bool slave_aborted;
+
+   int chipselect[0];
 };
 
 static const struct of_device_id fsl_lpspi_dt_ids[] = {
@@ -176,6 +185,48 @@ static int lpspi_unprepare_xfer_hardware(struct 
spi_controller *controller)
return 0;
 }
 
+static void fsl_lpspi_chipselect(struct spi_device *spi, bool enable)
+{
+   struct fsl_lpspi_data *fsl_lpspi =
+   spi_controller_get_devdata(spi->controller);
+   int gpio = fsl_lpspi->chipselect[spi->chip_select];
+
+   enable = (!!(spi->mode & SPI_CS_HIGH) == enable);
+
+   if (!gpio_is_valid(gpio))
+   return;
+
+   gpio_set_value_cansleep(gpio, enable);
+}
+
+static int fsl_lpspi_prepare_message(struct spi_controller *controller,
+   struct spi_message *msg)
+{
+   struct fsl_lpspi_data *fsl_lpspi =
+   spi_controller_get_devdata(controller);
+   struct spi_device *spi = msg->spi;
+   int gpio = fsl_lpspi->chipselect[spi->chip_select];
+
+   if (gpio_is_valid(gpio)) {
+   gpio_direction_output(gpio,
+   fsl_lpspi->config.mode & SPI_CS_HIGH ? 0 : 1);
+   }
+
+   fsl_lpspi_chipselect(spi, LPSPI_CS_ACTIVE);
+
+   return 0;
+}
+
+static int fsl_lpspi_unprepare_message(struct spi_controller *controller,
+   struct spi_message *msg)
+{
+   struct spi_device *spi = msg->spi;
+
+   fsl_lpspi_chipselect(spi, LPSPI_CS_INACTIVE);
+
+   return 0;
+}
+
 static void fsl_lpspi_write_tx_fifo(struct fsl_lpspi_data *fsl_lpspi)
 {
u8 txfifo_cnt;
@@ -512,10 +563,13 @@ static int fsl_lpspi_init_rpm(struct fsl_lpspi_data 
*fsl_lpspi)
 
 static int fsl_lpspi_probe(struct platform_device *pdev)
 {
+   struct device_node *np = pdev->dev.of_node;
struct fsl_lpspi_data *fsl_lpspi;
struct spi_controller *controller;
+   struct spi_imx_master *lpspi_platform_info =
+   dev_get_platdata(&pdev->dev);
struct resource *res;
-   int ret, irq;
+   int i, ret, irq;
u32 temp;
 
	if (of_property_read_bool((&pdev->dev)->of_node, "spi-slave"))
@@ -539,6 +593,29 @@ static int fsl_lpspi_probe(struct platform_device *pdev)
	fsl_lpspi->is_slave = of_property_read_bool((&pdev->dev)->of_node,
"spi-slave");
 
+   if (!fsl_lpspi->is_slave) {
+   for (i = 0; i < controller->num_chipselect; i++) {
+   int cs_gpio = of_get_named_gpio(np, "cs-gpios", i);
+
+   if (!gpio_is_valid(cs_gpio) && lpspi_platform_info)
+   cs_gpio = lpspi_platform_info->chipselect[i];
+
+   fsl_lpspi->chipselect[i] = cs_gpio;
+   if (!gpio_is_valid(cs_gpio))
+   continue;
+
+   ret = devm_gpio_request(&pdev->dev,
+   fsl_lpspi->chipselect[i], DRIVER_NAME);
+   if (ret) {
+   dev_err(&pdev->dev, "can't get cs gpios\n");
+   goto out_controller_put;
+   }
+   }
+
+   controller->prepare_message = fsl_lpspi_prepare_message;
+   controller->unprepare_message = fsl_lpspi_unprepare_message;
+   }
+
controller->transfer_one_message = fsl_lpspi_transfer_one_msg;
controller->prepare_transfer_hardware = lpspi_prepare_xfer_hardware;
controller->unprepare_transfer_hardware = lpspi_unprepare_xfer_hardware;
-- 
2.17.1



[PATCH v19 5/5] iommu/arm-smmu: Add support for qcom,smmu-v2 variant

2018-12-03 Thread Vivek Gautam
qcom,smmu-v2 is an arm,smmu-v2 implementation with specific
clock and power requirements.
On msm8996, multiple cores, viz. mdss, video, etc. use this
smmu. On sdm845, this smmu is used with gpu.
Add bindings for the same.

Signed-off-by: Vivek Gautam 
Reviewed-by: Rob Herring 
Reviewed-by: Tomasz Figa 
Tested-by: Srinivas Kandagatla 
Reviewed-by: Robin Murphy 
---

Changes since v18:
 None.

 drivers/iommu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index b6b11642b3a9..ba18d89d4732 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -120,6 +120,7 @@ enum arm_smmu_implementation {
GENERIC_SMMU,
ARM_MMU500,
CAVIUM_SMMUV2,
+   QCOM_SMMUV2,
 };
 
 struct arm_smmu_s2cr {
@@ -2030,6 +2031,7 @@ ARM_SMMU_MATCH_DATA(smmu_generic_v2, ARM_SMMU_V2, 
GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(arm_mmu401, ARM_SMMU_V1_64K, GENERIC_SMMU);
 ARM_SMMU_MATCH_DATA(arm_mmu500, ARM_SMMU_V2, ARM_MMU500);
 ARM_SMMU_MATCH_DATA(cavium_smmuv2, ARM_SMMU_V2, CAVIUM_SMMUV2);
+ARM_SMMU_MATCH_DATA(qcom_smmuv2, ARM_SMMU_V2, QCOM_SMMUV2);
 
 static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,smmu-v1", .data = _generic_v1 },
@@ -2038,6 +2040,7 @@ static const struct of_device_id arm_smmu_of_match[] = {
{ .compatible = "arm,mmu-401", .data = _mmu401 },
{ .compatible = "arm,mmu-500", .data = _mmu500 },
{ .compatible = "cavium,smmu-v2", .data = _smmuv2 },
+   { .compatible = "qcom,smmu-v2", .data = _smmuv2 },
{ },
 };
 MODULE_DEVICE_TABLE(of, arm_smmu_of_match);
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH v19 3/5] iommu/arm-smmu: Add the device_link between masters and smmu

2018-12-03 Thread Vivek Gautam
From: Sricharan R 

Finally add the device link between the master device and
smmu, so that the smmu gets runtime enabled/disabled only when the
master needs it. This is done from add_device callback which gets
called once when the master is added to the smmu.

Signed-off-by: Sricharan R 
Signed-off-by: Vivek Gautam 
Reviewed-by: Tomasz Figa 
Tested-by: Srinivas Kandagatla 
Reviewed-by: Robin Murphy 
---

Changes since v18:
 None.

 drivers/iommu/arm-smmu.c | 3 +++
 1 file changed, 3 insertions(+)

diff --git a/drivers/iommu/arm-smmu.c b/drivers/iommu/arm-smmu.c
index 1917d214c4d9..b6b11642b3a9 100644
--- a/drivers/iommu/arm-smmu.c
+++ b/drivers/iommu/arm-smmu.c
@@ -1500,6 +1500,9 @@ static int arm_smmu_add_device(struct device *dev)
 
	iommu_device_link(&smmu->iommu, dev);
 
+   device_link_add(dev, smmu->dev,
+   DL_FLAG_PM_RUNTIME | DL_FLAG_AUTOREMOVE_SUPPLIER);
+
return 0;
 
 out_cfg_free:
-- 
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



[PATCH] spi: lpspi: Fix CLK pin becomes low before one transfer

2018-12-03 Thread Clark Wang
Remove the reset operation from fsl_lpspi_config(). This RST may cause
both the CLK and CS pins to go from high to low in cs-gpio mode.
Add a fsl_lpspi_reset() function, called after each message transfer,
to clear all flags in use.

Signed-off-by: Clark Wang 
Reviewed-by: Fugang Duan 
---
 drivers/spi/spi-fsl-lpspi.c | 24 
 1 file changed, 20 insertions(+), 4 deletions(-)

diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c
index f32a2e0d7ae1..a7d01b79827b 100644
--- a/drivers/spi/spi-fsl-lpspi.c
+++ b/drivers/spi/spi-fsl-lpspi.c
@@ -279,10 +279,6 @@ static int fsl_lpspi_config(struct fsl_lpspi_data 
*fsl_lpspi)
u32 temp;
int ret;
 
-   temp = CR_RST;
-   writel(temp, fsl_lpspi->base + IMX7ULP_CR);
-   writel(0, fsl_lpspi->base + IMX7ULP_CR);
-
if (!fsl_lpspi->is_slave) {
ret = fsl_lpspi_set_bitrate(fsl_lpspi);
if (ret)
@@ -373,6 +369,24 @@ static int fsl_lpspi_wait_for_completion(struct 
spi_controller *controller)
return 0;
 }
 
+static int fsl_lpspi_reset(struct fsl_lpspi_data *fsl_lpspi)
+{
+   u32 temp;
+
+   /* Disable all interrupt */
+   fsl_lpspi_intctrl(fsl_lpspi, 0);
+
+   /* W1C for all flags in SR */
+   temp = 0x3F << 8;
+   writel(temp, fsl_lpspi->base + IMX7ULP_SR);
+
+   /* Clear FIFO and disable module */
+   temp = CR_RRF | CR_RTF;
+   writel(temp, fsl_lpspi->base + IMX7ULP_CR);
+
+   return 0;
+}
+
 static int fsl_lpspi_transfer_one(struct spi_controller *controller,
  struct spi_device *spi,
  struct spi_transfer *t)
@@ -394,6 +408,8 @@ static int fsl_lpspi_transfer_one(struct spi_controller 
*controller,
if (ret)
return ret;
 
+   fsl_lpspi_reset(fsl_lpspi);
+
return 0;
 }
 
-- 
2.17.1




[PATCH] spi: lpspi: Improve the stability of lpspi data transmission

2018-12-03 Thread Clark Wang
Use SR_TDF to determine whether more data needs to be sent, and SR_FCF
to detect the end of a transfer, replacing the wait loop that ran after
each transfer. That wait served no purpose, since the module sets the
FCF flag at the actual end of the frame.

The changes to the interrupt flags and the ISR reduce the number of ISR
invocations, and using the FCF flag improves the stability of data
transmission. Together these improve the LPSPI transfer speed, especially
in slave mode, where the controller can now keep up with a higher host
transfer rate.

After these changes fsl_lpspi_txfifo_empty() is no longer needed,
so remove it.

Signed-off-by: Clark Wang 
---
 drivers/spi/spi-fsl-lpspi.c | 61 -
 1 file changed, 20 insertions(+), 41 deletions(-)

diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c
index 3e935db5ff02..f32a2e0d7ae1 100644
--- a/drivers/spi/spi-fsl-lpspi.c
+++ b/drivers/spi/spi-fsl-lpspi.c
@@ -53,9 +53,11 @@
 #define CR_RST BIT(1)
 #define CR_MEN BIT(0)
 #define SR_TCF BIT(10)
+#define SR_FCF BIT(9)
 #define SR_RDF BIT(1)
 #define SR_TDF BIT(0)
 #define IER_TCIE   BIT(10)
+#define IER_FCIE   BIT(9)
 #define IER_RDIE   BIT(1)
 #define IER_TDIE   BIT(0)
 #define CFGR1_PCSCFG   BIT(27)
@@ -174,28 +176,10 @@ static int lpspi_unprepare_xfer_hardware(struct 
spi_controller *controller)
return 0;
 }
 
-static int fsl_lpspi_txfifo_empty(struct fsl_lpspi_data *fsl_lpspi)
-{
-   u32 txcnt;
-   unsigned long orig_jiffies = jiffies;
-
-   do {
-   txcnt = readl(fsl_lpspi->base + IMX7ULP_FSR) & 0xff;
-
-   if (time_after(jiffies, orig_jiffies + msecs_to_jiffies(500))) {
-   dev_dbg(fsl_lpspi->dev, "txfifo empty timeout\n");
-   return -ETIMEDOUT;
-   }
-   cond_resched();
-
-   } while (txcnt);
-
-   return 0;
-}
-
 static void fsl_lpspi_write_tx_fifo(struct fsl_lpspi_data *fsl_lpspi)
 {
u8 txfifo_cnt;
+   u32 temp;
 
txfifo_cnt = readl(fsl_lpspi->base + IMX7ULP_FSR) & 0xff;
 
@@ -206,9 +190,15 @@ static void fsl_lpspi_write_tx_fifo(struct fsl_lpspi_data 
*fsl_lpspi)
txfifo_cnt++;
}
 
-   if (!fsl_lpspi->remain && (txfifo_cnt < fsl_lpspi->txfifosize))
-   writel(0, fsl_lpspi->base + IMX7ULP_TDR);
-   else
+   if (txfifo_cnt < fsl_lpspi->txfifosize) {
+   if (!fsl_lpspi->is_slave) {
+   temp = readl(fsl_lpspi->base + IMX7ULP_TCR);
+   temp &= ~TCR_CONTC;
+   writel(temp, fsl_lpspi->base + IMX7ULP_TCR);
+   }
+
+   fsl_lpspi_intctrl(fsl_lpspi, IER_FCIE);
+   } else
fsl_lpspi_intctrl(fsl_lpspi, IER_TDIE);
 }
 
@@ -404,12 +394,6 @@ static int fsl_lpspi_transfer_one(struct spi_controller 
*controller,
if (ret)
return ret;
 
-   ret = fsl_lpspi_txfifo_empty(fsl_lpspi);
-   if (ret)
-   return ret;
-
-   fsl_lpspi_read_rx_fifo(fsl_lpspi);
-
return 0;
 }
 
@@ -421,7 +405,6 @@ static int fsl_lpspi_transfer_one_msg(struct spi_controller 
*controller,
struct spi_device *spi = msg->spi;
struct spi_transfer *xfer;
bool is_first_xfer = true;
-   u32 temp;
int ret = 0;
 
msg->status = 0;
@@ -441,13 +424,6 @@ static int fsl_lpspi_transfer_one_msg(struct 
spi_controller *controller,
}
 
 complete:
-   if (!fsl_lpspi->is_slave) {
-   /* de-assert SS, then finalize current message */
-   temp = readl(fsl_lpspi->base + IMX7ULP_TCR);
-   temp &= ~TCR_CONTC;
-   writel(temp, fsl_lpspi->base + IMX7ULP_TCR);
-   }
-
msg->status = ret;
spi_finalize_current_message(controller);
 
@@ -456,20 +432,23 @@ static int fsl_lpspi_transfer_one_msg(struct spi_controller *controller,
 
 static irqreturn_t fsl_lpspi_isr(int irq, void *dev_id)
 {
+   u32 temp_SR, temp_IER;
struct fsl_lpspi_data *fsl_lpspi = dev_id;
-   u32 temp;
 
+   temp_IER = readl(fsl_lpspi->base + IMX7ULP_IER);
fsl_lpspi_intctrl(fsl_lpspi, 0);
-   temp = readl(fsl_lpspi->base + IMX7ULP_SR);
+   temp_SR = readl(fsl_lpspi->base + IMX7ULP_SR);
 
fsl_lpspi_read_rx_fifo(fsl_lpspi);
 
-   if (temp & SR_TDF) {
+   if ((temp_SR & SR_TDF) && (temp_IER & IER_TDIE)) {
fsl_lpspi_write_tx_fifo(fsl_lpspi);
+   return IRQ_HANDLED;
+   }
 
-   if (!fsl_lpspi->remain)
+   if (temp_SR & SR_FCF && (temp_IER & IER_FCIE)) {
+   writel(SR_FCF, fsl_lpspi->base + IMX7ULP_SR);
	complete(&fsl_lpspi->xfer_done);
-
return IRQ_HANDLED;
}
 
-- 
2.17.1






[PATCH] spi: lpspi: enable runtime pm for lpspi

2018-12-03 Thread Clark Wang
From: Han Xu 

Enable the runtime power management for lpspi module.

Signed-off-by: Han Xu 
Reviewed-by: Frank Li 
---
 drivers/spi/spi-fsl-lpspi.c | 117 
 1 file changed, 92 insertions(+), 25 deletions(-)

diff --git a/drivers/spi/spi-fsl-lpspi.c b/drivers/spi/spi-fsl-lpspi.c
index 5802f188051b..3e935db5ff02 100644
--- a/drivers/spi/spi-fsl-lpspi.c
+++ b/drivers/spi/spi-fsl-lpspi.c
@@ -16,7 +16,9 @@
 #include 
 #include 
 #include 
+#include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -24,6 +26,8 @@
 
 #define DRIVER_NAME "fsl_lpspi"
 
+#define FSL_LPSPI_RPM_TIMEOUT 50 /* 50ms */
+
 /* i.MX7ULP LPSPI registers */
 #define IMX7ULP_VERID  0x0
 #define IMX7ULP_PARAM  0x4
@@ -150,13 +154,9 @@ static int lpspi_prepare_xfer_hardware(struct spi_controller *controller)
spi_controller_get_devdata(controller);
int ret;
 
-   ret = clk_prepare_enable(fsl_lpspi->clk_ipg);
-   if (ret)
-   return ret;
-
-   ret = clk_prepare_enable(fsl_lpspi->clk_per);
-   if (ret) {
-   clk_disable_unprepare(fsl_lpspi->clk_ipg);
+   ret = pm_runtime_get_sync(fsl_lpspi->dev);
+   if (ret < 0) {
+   dev_err(fsl_lpspi->dev, "failed to enable clock\n");
return ret;
}
 
@@ -168,8 +168,8 @@ static int lpspi_unprepare_xfer_hardware(struct spi_controller *controller)
struct fsl_lpspi_data *fsl_lpspi =
spi_controller_get_devdata(controller);
 
-   clk_disable_unprepare(fsl_lpspi->clk_ipg);
-   clk_disable_unprepare(fsl_lpspi->clk_per);
+   pm_runtime_mark_last_busy(fsl_lpspi->dev);
+   pm_runtime_put_autosuspend(fsl_lpspi->dev);
 
return 0;
 }
@@ -476,6 +476,45 @@ static irqreturn_t fsl_lpspi_isr(int irq, void *dev_id)
return IRQ_NONE;
 }
 
+int fsl_lpspi_runtime_resume(struct device *dev)
+{
+   struct fsl_lpspi_data *fsl_lpspi = dev_get_drvdata(dev);
+   int ret;
+
+   ret = clk_prepare_enable(fsl_lpspi->clk_per);
+   if (ret)
+   return ret;
+
+   ret = clk_prepare_enable(fsl_lpspi->clk_ipg);
+   if (ret) {
+   clk_disable_unprepare(fsl_lpspi->clk_per);
+   return ret;
+   }
+
+   return 0;
+}
+
+int fsl_lpspi_runtime_suspend(struct device *dev)
+{
+   struct fsl_lpspi_data *fsl_lpspi = dev_get_drvdata(dev);
+
+   clk_disable_unprepare(fsl_lpspi->clk_per);
+   clk_disable_unprepare(fsl_lpspi->clk_ipg);
+
+   return 0;
+}
+
+static int fsl_lpspi_init_rpm(struct fsl_lpspi_data *fsl_lpspi)
+{
+   struct device *dev = fsl_lpspi->dev;
+
+   pm_runtime_enable(dev);
+   pm_runtime_set_autosuspend_delay(dev, FSL_LPSPI_RPM_TIMEOUT);
+   pm_runtime_use_autosuspend(dev);
+
+   return 0;
+}
+
 static int fsl_lpspi_probe(struct platform_device *pdev)
 {
struct fsl_lpspi_data *fsl_lpspi;
@@ -501,6 +540,7 @@ static int fsl_lpspi_probe(struct platform_device *pdev)
 
fsl_lpspi = spi_controller_get_devdata(controller);
	fsl_lpspi->dev = &pdev->dev;
+	dev_set_drvdata(&pdev->dev, fsl_lpspi);
	fsl_lpspi->is_slave = of_property_read_bool((&pdev->dev)->of_node,
"spi-slave");
 
@@ -547,28 +587,21 @@ static int fsl_lpspi_probe(struct platform_device *pdev)
goto out_controller_put;
}
 
-   ret = clk_prepare_enable(fsl_lpspi->clk_ipg);
-   if (ret) {
-	dev_err(&pdev->dev,
-   "can't enable lpspi ipg clock, ret=%d\n", ret);
+   /* enable the clock */
+   ret = fsl_lpspi_init_rpm(fsl_lpspi);
+   if (ret)
goto out_controller_put;
-   }
 
-   ret = clk_prepare_enable(fsl_lpspi->clk_per);
-   if (ret) {
-	dev_err(&pdev->dev,
-   "can't enable lpspi per clock, ret=%d\n", ret);
-   clk_disable_unprepare(fsl_lpspi->clk_ipg);
-   goto out_controller_put;
+   ret = pm_runtime_get_sync(fsl_lpspi->dev);
+   if (ret < 0) {
+   dev_err(fsl_lpspi->dev, "failed to enable clock\n");
+   return ret;
}
 
temp = readl(fsl_lpspi->base + IMX7ULP_PARAM);
fsl_lpspi->txfifosize = 1 << (temp & 0x0f);
fsl_lpspi->rxfifosize = 1 << ((temp >> 8) & 0x0f);
 
-   clk_disable_unprepare(fsl_lpspi->clk_per);
-   clk_disable_unprepare(fsl_lpspi->clk_ipg);
-
	ret = devm_spi_register_controller(&pdev->dev, controller);
	if (ret < 0) {
		dev_err(&pdev->dev, "spi_register_controller error.\n");
@@ -589,16 +622,50 @@ static int fsl_lpspi_remove(struct platform_device *pdev)
struct fsl_lpspi_data *fsl_lpspi =
spi_controller_get_devdata(controller);
 
-   clk_disable_unprepare(fsl_lpspi->clk_per);
-   clk_disable_unprepare(fsl_lpspi->clk_ipg);
+   pm_runtime_disable(fsl_lpspi->dev);
+
+


Re: [PATCH v2 3/5] devfreq: add devfreq_suspend/resume() functions

2018-12-03 Thread Chanwoo Choi
Hi Lukasz,

On 03 Dec 2018 23:31, Lukasz Luba wrote:
> This patch adds implementation for global suspend/resume for
> devfreq framework. System suspend will next use these functions.
> 
> The patch is based on earlier work by Tobias Jakobi.

Please remove it from each patch description.

> 
> Suggested-by: Tobias Jakobi 
> Suggested-by: Chanwoo Choi 
> Signed-off-by: Lukasz Luba 
> ---
>  drivers/devfreq/devfreq.c | 42 ++
>  include/linux/devfreq.h   |  6 ++
>  2 files changed, 48 insertions(+)
> 
> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
> index 36bed24..7d60423 100644
> --- a/drivers/devfreq/devfreq.c
> +++ b/drivers/devfreq/devfreq.c
> @@ -935,6 +935,48 @@ int devfreq_resume_device(struct devfreq *devfreq)
>  EXPORT_SYMBOL(devfreq_resume_device);
>  
>  /**
> + * devfreq_suspend() - Suspend devfreq governors and devices
> + *
> + * Called during system wide Suspend/Hibernate cycles for suspending governors
> + * and devices preserving the state for resume. On some platforms the devfreq
> + * device must have precise state (frequency) after resume in order to provide
> + * fully operating setup.
> + */
> +void devfreq_suspend(void)
> +{
> + struct devfreq *devfreq;
> + int ret;
> +
> + mutex_lock(&devfreq_list_lock);
> + list_for_each_entry(devfreq, &devfreq_list, node) {
> + ret = devfreq_suspend_device(devfreq);
> + if (ret)
> + dev_warn(&devfreq->dev, "device suspend failed\n");

When I checked cpufreq_suspend(), it prints its message at the 'err'
level, so I think dev_err is more appropriate than dev_warn.

I'm not sure which log text is more correct, but the 'devfreq->dev'
device has its own separate suspend/resume functions, so I think
devfreq_suspend() should print an error log that makes clear the error
comes from the devfreq framework.

"device suspend failed"
-> "failed to suspend devfreq device"

> + }
> + mutex_unlock(&devfreq_list_lock);
> +}
> +
> +/**
> + * devfreq_resume() - Resume devfreq governors and devices
> + *
> + * Called during system wide Suspend/Hibernate cycle for resuming governors and
> + * devices that are suspended with devfreq_suspend().
> + */
> +void devfreq_resume(void)
> +{
> + struct devfreq *devfreq;
> + int ret;
> +
> + mutex_lock(&devfreq_list_lock);
> + list_for_each_entry(devfreq, &devfreq_list, node) {
> + ret = devfreq_resume_device(devfreq);
> + if (ret)
> + dev_warn(&devfreq->dev, "device resume failed\n");

ditto.

"device resume failed"
-> "failed to resume devfreq device"


> + }
> + mutex_unlock(&devfreq_list_lock);
> +}
> +
> +/**
>   * devfreq_add_governor() - Add devfreq governor
>   * @governor: the devfreq governor to be added
>   */
> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
> index d985199..fbffa74 100644
> --- a/include/linux/devfreq.h
> +++ b/include/linux/devfreq.h
> @@ -205,6 +205,9 @@ extern void devm_devfreq_remove_device(struct device *dev,
>  extern int devfreq_suspend_device(struct devfreq *devfreq);
>  extern int devfreq_resume_device(struct devfreq *devfreq);
>  
> +extern void devfreq_suspend(void);
> +extern void devfreq_resume(void);
> +
>  /**
>   * update_devfreq() - Reevaluate the device and configure frequency
>   * @devfreq: the devfreq device
> @@ -331,6 +334,9 @@ static inline int devfreq_resume_device(struct devfreq *devfreq)
>   return 0;
>  }
>  
> +static inline void devfreq_suspend(void) {}
> +static inline void devfreq_resume(void) {}
> +
>  static inline struct dev_pm_opp *devfreq_recommended_opp(struct device *dev,
>  unsigned long *freq, u32 flags)
>  {
> 

-- 
Best Regards,
Chanwoo Choi
Samsung Electronics




Re: [PATCH v2 2/3] clk: ti: check clock type before doing autoidle ops

2018-12-03 Thread Andreas Kemnade
On Mon, 3 Dec 2018 07:39:10 -0800
Tony Lindgren  wrote:

> * Stephen Boyd  [181130 23:52]:
> > Quoting Tony Lindgren (2018-11-30 07:37:29)  
> > > Hi,
> > > 
> > > * Tero Kristo  [181130 09:21]:  
> > > > On 30/11/2018 09:57, Stephen Boyd wrote:  
> > > > > No that is not preferred. Can the omap2_clk_deny_idle() function be
> > > > > integrated closer into the clk framework in some way that allows it to
> > > > > be part of the clk_ops structure? And then have that take a clk_hw
> > > > > structure instead of a struct clk? I haven't looked at this in any
> > > > > detail whatsoever so I may be way off right now.  
> > > > 
> > > > It could be added under the main clk_ops struct, however this would
> > > > introduce two new func pointers to it which are not used by anything 
> > > > else
> > > > but OMAP. Are you aware of any other platforms requiring similar 
> > > > feature?  
> > > 
> > > From consumer usage point of view, I'm still wondering about
> > > the relationship of clk_deny_idle() and clkdm_deny_idle().
> > > 
> > > It seems that we need to allow reset control drivers call
> > > clk_deny_idle() for the duration of reset. And it seems the
> > > clk_deny_idle() should propagate to also up to the related
> > > clock domain driver to do clkdm_deny_idle().
> > > 
> > > So maybe clk_deny_idle() could just be something like:
> > > 
> > > dev = clk_get_device(clk);
> > > ...
> > > error = pm_runtime_get(dev);
> > > ...
> > > pm_runtime_put(dev);
> > > ...
> > > 
> > > And that way it would just propagate to the parent clock
> > > domain driver and the clock framework does not need to know
> > > about clockdomains. A clockdomain could be just a genpd
> > > domain.
> > > 
> > > Or do you guys have better ideas?
> > >   
> > 
> > Wouldn't the device link in clk framework patches do this for you if we
> > had the RUNTIME_PM flag passed in. If this is about keeping the clock
> > controller active when a consumer device is using it then I think it may
> > work.  
> 
> The consumer device stays active just fine with PM runtime
> calls. So yes, the problem is keeping a clock controller forced
> active for the period of consumer device reset. Other than
> that typically autoidle can be just kept enabled.
> 
Are we still talking about the same problem? Maybe I am losing track
here. Just to make sure. 
The patch series was about disabling autoidle for devices which cannot
work with it during normal operation. Not during reset or something
like that. 
Or is the keep-clock-active-during-reset just a requirement for bigger
restructuring ideas?

Regards,
Andreas






Re: [PATCH v2 2/5] devfreq: add support for suspend/resume of a devfreq device

2018-12-03 Thread Chanwoo Choi
Hi Lukasz,

I add the comment about 'suspend_count'.

On 04 Dec 2018 14:43, Chanwoo Choi wrote:
> Hi,
> 
> On 04 Dec 2018 14:36, Chanwoo Choi wrote:
>> Hi Lukasz,
>>
>> Looks good to me. But, I add the some comments.
>> If you will fix it, feel free to add my tag:
>> Reviewed-by: Chanwoo choi 
> 
> Sorry. Fix typo 'choi' to 'Choi' as following.
> Reviewed-by: Chanwoo Choi 
> 
>>
>> On 03 Dec 2018 23:31, Lukasz Luba wrote:
>>> The patch prepares devfreq device for handling suspend/resume
>>> functionality.  The new fields will store needed information during this
>>
>> nitpick. Remove unneeded space. There are two spaces between '.' and 'The 
>> new'. 
>>
>>> process.  Devfreq framework handles opp-suspend DT entry and there is no
>>
>> ditto.
>>
>>> need of modyfications in the drivers code.  It uses atomic variables to
>>
>> ditto.
>>
>>> make sure no race condition affects the process.
>>>
>>> The patch is based on earlier work by Tobias Jakobi.
>>
>> Please remove it from each patch description.
>>
>>>
>>> Suggested-by: Tobias Jakobi 
>>> Suggested-by: Chanwoo Choi 
>>> Signed-off-by: Lukasz Luba 
>>> ---
>>>  drivers/devfreq/devfreq.c | 51 
>>> +++
>>>  include/linux/devfreq.h   |  7 +++
>>>  2 files changed, 50 insertions(+), 8 deletions(-)
>>>
>>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>>> index a9fd61b..36bed24 100644
>>> --- a/drivers/devfreq/devfreq.c
>>> +++ b/drivers/devfreq/devfreq.c
>>> @@ -316,6 +316,10 @@ static int devfreq_set_target(struct devfreq *devfreq, 
>>> unsigned long new_freq,
>>> "Couldn't update frequency transition information.\n");
>>>  
>>> devfreq->previous_freq = new_freq;
>>> +
>>> +   if (devfreq->suspend_freq)
>>> +   devfreq->resume_freq = cur_freq;
>>> +
>>> return err;
>>>  }
>>>  
>>> @@ -667,6 +671,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
>>> }
>>> devfreq->max_freq = devfreq->scaling_max_freq;
>>>  
>>> +   devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev);
>>> +   atomic_set(&devfreq->suspend_count, 0);
>>> +
>>> dev_set_name(&devfreq->dev, "devfreq%d",
>>> atomic_inc_return(&devfreq_no));
>>> err = device_register(&devfreq->dev);
>>> @@ -867,14 +874,28 @@ EXPORT_SYMBOL(devm_devfreq_remove_device);
>>>   */
>>>  int devfreq_suspend_device(struct devfreq *devfreq)
>>>  {
>>> +   int ret;
>>> +
>>> if (!devfreq)
>>> return -EINVAL;
>>>  
>>> -   if (!devfreq->governor)
>>> -   return 0;
>>> +   if (devfreq->governor) {
>>> +   ret = devfreq->governor->event_handler(devfreq,
>>> +   DEVFREQ_GOV_SUSPEND, NULL);
>>> +   if (ret)
>>> +   return ret;
>>> +   }
>>> +
>>> +   if (devfreq->suspend_freq) {
>>> +   if (atomic_inc_return(&devfreq->suspend_count) > 1)
>>> +   return 0;
>>> +
>>> +   ret = devfreq_set_target(devfreq, devfreq->suspend_freq, 0);
>>> +   if (ret)
>>> +   return ret;
>>> +   }

In this patch, if a user calls devfreq_suspend_device() twice,
'devfreq->governor->event_handler(devfreq, DEVFREQ_GOV_SUSPEND, NULL)'
is called twice but devfreq_set_target() is called only once.
I know this is not a problem for operation.

But I think it would be better to use 'suspend_count' as the reference
count of devfreq_suspend/resume_device(). If you use 'suspend_count'
to check whether the devfreq device is suspended or not, we can avoid
the redundant governor calls when the functions are called twice.

clock and regulator use the 'reference count' method to remove such
redundant calls.


>>>  
>>> -   return devfreq->governor->event_handler(devfreq,
>>> -   DEVFREQ_GOV_SUSPEND, NULL);
>>> +   return 0;
>>>  }
>>>  EXPORT_SYMBOL(devfreq_suspend_device);
>>>  
>>> @@ -888,14 +909,28 @@ EXPORT_SYMBOL(devfreq_suspend_device);
>>>   */
>>>  int devfreq_resume_device(struct devfreq *devfreq)
>>>  {
>>> +   int ret;
>>> +
>>> if (!devfreq)
>>> return -EINVAL;
>>>  
>>> -   if (!devfreq->governor)
>>> -   return 0;
>>> +   if (devfreq->resume_freq) {
>>> +   if (atomic_dec_return(&devfreq->suspend_count) >= 1)
>>> +   return 0;

ditto.

>>>  
>>> -   return devfreq->governor->event_handler(devfreq,
>>> -   DEVFREQ_GOV_RESUME, NULL);
>>> +   ret = devfreq_set_target(devfreq, devfreq->resume_freq, 0);
>>> +   if (ret)
>>> +   return ret;
>>> +   }
>>> +
>>> +   if (devfreq->governor) {
>>> +   ret = devfreq->governor->event_handler(devfreq,
>>> +   DEVFREQ_GOV_RESUME, NULL);
>>> +   if (ret)
>>> +   return ret;
>>> +   }
>>> +
>>> +   return 0;
>>>  }
>>>  EXPORT_SYMBOL(devfreq_resume_device);
>>>  
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index e4963b0..d985199 100644
>>> --- 

>>> +   if (devfreq->suspend_freq) {
>>> +   if (atomic_inc_return(>suspend_count) > 1)
>>> +   return 0;
>>> +
>>> +   ret = devfreq_set_target(devfreq, devfreq->suspend_freq, 0);
>>> +   if (ret)
>>> +   return ret;
>>> +   }

In this patch, if some user calls devfreq_suspend_device() twice,
'devfreq->governor->event_handler(devfreq, DEVFREQ_GOV_SUSPEND, NULL)'
is called twice, but devfreq_set_target() is called only once.
I know this is not a problem for correct operation.

However, I think it would be better to use 'suspend_count' as a
reference count for devfreq_suspend/resume_device() as a whole. If
'suspend_count' is used to check whether this devfreq device is already
suspended, we can avoid the redundant governor call when the function
is called twice.

The clock and regulator frameworks use the same 'reference count'
method to avoid such redundant calls.


>>>  
>>> -   return devfreq->governor->event_handler(devfreq,
>>> -   DEVFREQ_GOV_SUSPEND, NULL);
>>> +   return 0;
>>>  }
>>>  EXPORT_SYMBOL(devfreq_suspend_device);
>>>  
>>> @@ -888,14 +909,28 @@ EXPORT_SYMBOL(devfreq_suspend_device);
>>>   */
>>>  int devfreq_resume_device(struct devfreq *devfreq)
>>>  {
>>> +   int ret;
>>> +
>>> if (!devfreq)
>>> return -EINVAL;
>>>  
>>> -   if (!devfreq->governor)
>>> -   return 0;
>>> +   if (devfreq->resume_freq) {
>>> +   if (atomic_dec_return(>suspend_count) >= 1)
>>> +   return 0;

ditto.

>>>  
>>> -   return devfreq->governor->event_handler(devfreq,
>>> -   DEVFREQ_GOV_RESUME, NULL);
>>> +   ret = devfreq_set_target(devfreq, devfreq->resume_freq, 0);
>>> +   if (ret)
>>> +   return ret;
>>> +   }
>>> +
>>> +   if (devfreq->governor) {
>>> +   ret = devfreq->governor->event_handler(devfreq,
>>> +   DEVFREQ_GOV_RESUME, NULL);
>>> +   if (ret)
>>> +   return ret;
>>> +   }
>>> +
>>> +   return 0;
>>>  }
>>>  EXPORT_SYMBOL(devfreq_resume_device);
>>>  
>>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>>> index e4963b0..d985199 100644
>>> --- 

Re: [PATCH] x86/boot: clear rsdp address in boot_params for broken loaders

2018-12-03 Thread Juergen Gross
On 04/12/2018 06:49, H. Peter Anvin wrote:
> On 12/3/18 9:32 PM, Juergen Gross wrote:
>>
>> I'd like to send a followup patch doing that. And I'd like to not only
>> test sentinel for being non-zero, but all padding fields as well. This
>> should be 4.21 material, though.
>>
> 
> No, you can't do that.  That breaks backwards compatibility.

So you are speaking about padding fields at places where there used
to be some information? Shouldn't those be named "_res*"?
Recycling such padding fields for new information seems rather
dangerous, then.

I'd like to have at least some idea which boot loader is not passing a
clean struct boot_params. So I think we should at least have some debug
or info messages telling us which paddings are not zero initially to be
able to either fix the boot loader or switch from _pad* to _res* naming.


Juergen


Re: [PATCH v2] signal: add procfd_signal() syscall

2018-12-03 Thread Aleksa Sarai
On 2018-12-03, Christian Brauner  wrote:
> > > As I pointed out in another mail my idea is to make this work by using
> > > file descriptors for /proc//task/.  I don't want this in the
> > > initial patchset though.  I prefer to slowly add those features once
> > > we have gotten the basic functionality in.
> > 
> > Do you want to land all this in one kernel release?  I wonder how
> > applications are supposed to discover kernel support if functionality is
> > split across several kernel releases.  If you get EINVAL or EBADF, it
> > may not be obvious what is going on.
> 
> Sigh, I get that but I really don't want to have to land this in one big
> chunk. I want this syscall to go in as soon as we can to fulfill
> the most basic need: having a way that guarantees us that we signal the
> process that we intended to signal.
> 
> The thread case is easy to implement on top of it. But I suspect we will
> quibble about the exact semantics for a long time. Even now we have been
> on multiple - justified - detours. That's all perfectly fine and
> expected. But if we have the basic functionality in we have time to do
> all of that. We might even land it in the same kernel release still. I
> really don't want to come off as tea-party-kernel-conservative here but I
> have time and time again seen that making something fancy and covering every
> interesting feature in one patchset takes a very very long time.
> 
> If you care about userspace being able to detect that case I can return
> EOPNOTSUPP when a tid descriptor is passed.

Personally, I'm +1 on -EOPNOTSUPP so we can get an MVP merged, and add
new features in later patches.

> > What happens if you use the new interface with an O_PATH descriptor?
> 
> You get EINVAL. When an O_PATH file descriptor is created the kernel
> will set file->f_op = _fops at which point the check I added 
> if (!proc_is_tgid_procfd(f.file))
> goto err;
> will fail. Imho this is correct behavior since technically signaling a
> struct pid is the equivalent of writing to a file and hence doesn't
> purely operate on the file descriptor level.

Not to mention that O_PATH file descriptors are a whole kettle of fish
when it comes to permission checking semantics.


-- 
Aleksa Sarai
Senior Software Engineer (Containers)
SUSE Linux GmbH



signature.asc
Description: PGP signature



Re: [PATCH v5 2/2] phy: qualcomm: Add Synopsys High-Speed USB PHY driver

2018-12-03 Thread Shawn Guo
Hi Kishon,

On Tue, Dec 04, 2018 at 10:38:19AM +0530, Kishon Vijay Abraham I wrote:
> Hi,
> 
> On 27/11/18 3:37 PM, Shawn Guo wrote:
> > It adds Synopsys 28nm Femto High-Speed USB PHY driver support, which
> > is usually paired with Synopsys DWC3 USB controllers on Qualcomm SoCs.
> 
> Is this Synopsys PHY specific to Qualcomm or could it be used by other vendors
> (with just changing tuning parameters)? If it could be used by other vendors
> then it would make sense to add this PHY driver in synopsys directory.

My knowledge is that this Synopsys PHY is specific to Qualcomm SoCs.
@Sriharsha, correct me if I'm wrong.

Shawn



Re: [PATCH] Uprobes: Fix kernel oops with delayed_uprobe_remove()

2018-12-03 Thread Steven Rostedt
On Mon, 3 Dec 2018 11:52:41 +0530
Ravi Bangoria  wrote:

> Hi Steve,
> 
> Please pull this patch.
> 

Please send a v2 version of the patch with the updated change log. And
should it have a Fixes and be tagged for stable?

-- Steve

> Thanks.
> 
> On 11/15/18 6:13 PM, Oleg Nesterov wrote:
> > On 11/15, Ravi Bangoria wrote:  
> >>
> >> There could be a race between task exit and probe unregister:
> >>
> >>   exit_mm()
> >>   mmput()
> >>   __mmput() uprobe_unregister()
> >>   uprobe_clear_state()  put_uprobe()
> >>   delayed_uprobe_remove()   delayed_uprobe_remove()
> >>
> >> put_uprobe() is calling delayed_uprobe_remove() without taking
> >> delayed_uprobe_lock and thus the race sometimes results in a
> >> kernel crash. Fix this by taking delayed_uprobe_lock before
> >> calling delayed_uprobe_remove() from put_uprobe().
> >>
> >> Detailed crash log can be found at:
> >>   https://lkml.org/lkml/2018/11/1/1244  
> > 
> > Thanks, looks good,
> > 
> > Oleg.
> >   




linux-next: manual merge of the akpm tree with the pm tree

2018-12-03 Thread Stephen Rothwell
Hi Andrew,

Today's linux-next merge of the akpm tree got a conflict in:

  fs/exec.c

between commit:

  67fe1224adc5 ("Revert "exec: make de_thread() freezable"")

from the pm tree and patch:

  "fs/: remove caller signal_pending branch predictions"

from the akpm tree.

I fixed it up (see below) and can carry the fix as necessary. This
is now fixed as far as linux-next is concerned, but any non trivial
conflicts should be mentioned to your upstream maintainer when your tree
is submitted for merging.  You may also want to consider cooperating
with the maintainer of the conflicting tree to minimise any particularly
complex conflicts.

-- 
Cheers,
Stephen Rothwell

diff --cc fs/exec.c
index ea7d439cf79e,044e296f2381..
--- a/fs/exec.c
+++ b/fs/exec.c
@@@ -1086,8 -1087,8 +1086,8 @@@ static int de_thread(struct task_struc
while (sig->notify_count) {
__set_current_state(TASK_KILLABLE);
spin_unlock_irq(lock);
 -  freezable_schedule();
 +  schedule();
-   if (unlikely(__fatal_signal_pending(tsk)))
+   if (__fatal_signal_pending(tsk))
goto killed;
spin_lock_irq(lock);
}
@@@ -1114,8 -1115,8 +1114,8 @@@
__set_current_state(TASK_KILLABLE);
write_unlock_irq(_lock);
cgroup_threadgroup_change_end(tsk);
 -  freezable_schedule();
 +  schedule();
-   if (unlikely(__fatal_signal_pending(tsk)))
+   if (__fatal_signal_pending(tsk))
goto killed;
}
  


pgpyJkxHzU7jW.pgp
Description: OpenPGP digital signature



Re: [PATCH] x86/boot: clear rsdp address in boot_params for broken loaders

2018-12-03 Thread H. Peter Anvin
On 12/3/18 9:32 PM, Juergen Gross wrote:
> 
> I'd like to send a followup patch doing that. And I'd like to not only
> test sentinel for being non-zero, but all padding fields as well. This
> should be 4.21 material, though.
> 

No, you can't do that.  That breaks backwards compatibility.

-hpa




Re: [PATCH 5/5] i2c: mediatek: Add i2c compatible for MediaTek MT8183

2018-12-03 Thread Sean Wang
 wrote on Monday, December 3, 2018 at 5:34 AM:
>
> From: qii wang 
>
> Add an i2c compatible for MT8183. Compared to the 2712 i2c controller, MT8183 has
> different registers, offsets, clock, and multi-user function.
>
> Signed-off-by: qii wang 
> ---
>  drivers/i2c/busses/i2c-mt65xx.c |  136 
> +--
>  1 file changed, 130 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
> index 428ac99..6b979ab 100644
> --- a/drivers/i2c/busses/i2c-mt65xx.c
> +++ b/drivers/i2c/busses/i2c-mt65xx.c
> @@ -35,17 +35,23 @@
>  #include 
>
>  #define I2C_RS_TRANSFER(1 << 4)
> +#define I2C_ARB_LOST   (1 << 3)

it seems nothing in the patch refers to this macro, so it would be
better to remove it

>  #define I2C_HS_NACKERR (1 << 2)
>  #define I2C_ACKERR (1 << 1)
>  #define I2C_TRANSAC_COMP   (1 << 0)
>  #define I2C_TRANSAC_START  (1 << 0)
> +#define I2C_RESUME_ARBIT   (1 << 1)
>  #define I2C_RS_MUL_CNFG(1 << 15)
>  #define I2C_RS_MUL_TRIG(1 << 14)
> +#define I2C_HS_TIME_EN (1 << 7)
>  #define I2C_DCM_DISABLE0x
>  #define I2C_IO_CONFIG_OPEN_DRAIN   0x0003
>  #define I2C_IO_CONFIG_PUSH_PULL0x
>  #define I2C_SOFT_RST   0x0001
>  #define I2C_FIFO_ADDR_CLR  0x0001
> +#define I2C_FIFO_ADDR_CLRH 0x0002
> +#define I2C_FIFO_ADDR_CLR_MCH  0x0004
> +#define I2C_HFIFO_DATA 0x8208
>  #define I2C_DELAY_LEN  0x0002
>  #define I2C_ST_START_CON   0x8001
>  #define I2C_FS_START_CON   0x1800
> @@ -76,6 +82,8 @@
>  #define I2C_CONTROL_DIR_CHANGE  (0x1 << 4)
>  #define I2C_CONTROL_ACKERR_DET_EN   (0x1 << 5)
>  #define I2C_CONTROL_TRANSFER_LEN_CHANGE (0x1 << 6)
> +#define I2C_CONTROL_DMAACK_EN   (0x1 << 8)
> +#define I2C_CONTROL_ASYNC_MODE  (0x1 << 9)
>  #define I2C_CONTROL_WRAPPER (0x1 << 0)
>
>  #define I2C_DRV_NAME   "i2c-mt65xx"
> @@ -130,6 +138,15 @@ enum I2C_REGS_OFFSET {
> OFFSET_DEBUGCTRL,
> OFFSET_TRANSFER_LEN_AUX,
> OFFSET_CLOCK_DIV,
> +   /* MT8183 only regs */
> +   OFFSET_LTIMING,
> +   OFFSET_DATA_TIMING,
> +   OFFSET_MCU_INTR,
> +   OFFSET_HW_TIMEOUT,
> +   OFFSET_HFIFO_DATA,
> +   OFFSET_HFIFO_STAT,
> +   OFFSET_MULTI_DMA,
> +   OFFSET_ROLLBACK,
>  };
>
>  static const u16 mt_i2c_regs_v1[] = {
> @@ -159,6 +176,39 @@ enum I2C_REGS_OFFSET {
> [OFFSET_CLOCK_DIV] = 0x70,
>  };
>
> +static const u16 mt_i2c_regs_v2[] = {
> +   [OFFSET_DATA_PORT] = 0x0,
> +   [OFFSET_SLAVE_ADDR] = 0x4,
> +   [OFFSET_INTR_MASK] = 0x8,
> +   [OFFSET_INTR_STAT] = 0xc,
> +   [OFFSET_CONTROL] = 0x10,
> +   [OFFSET_TRANSFER_LEN] = 0x14,
> +   [OFFSET_TRANSAC_LEN] = 0x18,
> +   [OFFSET_DELAY_LEN] = 0x1c,
> +   [OFFSET_TIMING] = 0x20,
> +   [OFFSET_START] = 0x24,
> +   [OFFSET_EXT_CONF] = 0x28,
> +   [OFFSET_LTIMING] = 0x2c,
> +   [OFFSET_HS] = 0x30,
> +   [OFFSET_IO_CONFIG] = 0x34,
> +   [OFFSET_FIFO_ADDR_CLR] = 0x38,
> +   [OFFSET_DATA_TIMING] = 0x3c,
> +   [OFFSET_MCU_INTR] = 0x40,
> +   [OFFSET_TRANSFER_LEN_AUX] = 0x44,
> +   [OFFSET_CLOCK_DIV] = 0x48,
> +   [OFFSET_HW_TIMEOUT] = 0x4c,
> +   [OFFSET_SOFTRESET] = 0x50,
> +   [OFFSET_HFIFO_DATA] = 0x70,
> +   [OFFSET_DEBUGSTAT] = 0xe0,
> +   [OFFSET_DEBUGCTRL] = 0xe8,
> +   [OFFSET_FIFO_STAT] = 0xf4,
> +   [OFFSET_FIFO_THRESH] = 0xf8,
> +   [OFFSET_HFIFO_STAT] = 0xfc,
> +   [OFFSET_DCM_EN] = 0xf88,
> +   [OFFSET_MULTI_DMA] = 0xf8c,
> +   [OFFSET_ROLLBACK] = 0xf98,
> +};
> +
>  struct mtk_i2c_compatible {
> const struct i2c_adapter_quirks *quirks;
> const u16 *regs;
> @@ -168,6 +218,7 @@ struct mtk_i2c_compatible {
> unsigned char aux_len_reg: 1;
> unsigned char support_33bits: 1;
> unsigned char timing_adjust: 1;
> +   unsigned char dma_sync: 1;
>  };
>
>  struct mtk_i2c {
> @@ -181,8 +232,11 @@ struct mtk_i2c {
> struct clk *clk_main;   /* main clock for i2c bus */
> struct clk *clk_dma;/* DMA clock for i2c via DMA */
> struct clk *clk_pmic;   /* PMIC clock for i2c from PMIC */
> +   struct clk *clk_arb;/* Arbitrator clock for i2c */
> bool have_pmic; /* can use i2c pins from PMIC */
> bool use_push_pull; /* IO config push-pull mode */
> +   bool share_i3c; /* share i3c IP*/
> +   u32 ch_offset;  /* i2c multi-user channel offset */
>
> u16 irq_stat;   /* interrupt status */
> unsigned int clk_src_div;
> @@ -190,6 +244,7 @@ struct mtk_i2c {
> enum 

Re: [PATCH 3/3] arm64: ftrace: add cond_resched() to func ftrace_make_(call|nop)

2018-12-03 Thread Steven Rostedt
On Mon, 3 Dec 2018 22:51:52 +0100
Arnd Bergmann  wrote:

> On Mon, Dec 3, 2018 at 8:22 PM Will Deacon  wrote:
> >
> > Hi Anders,
> >
> > On Fri, Nov 30, 2018 at 04:09:56PM +0100, Anders Roxell wrote:  
> > > Both of those functions end up calling ftrace_modify_code(), which is
> > > expensive because it changes the page tables and flush caches.
> > > Microseconds add up because this is called in a loop for each dyn_ftrace
> > > record, and this triggers the softlockup watchdog unless we let it sleep
> > > occasionally.
> > > Rework so that we call cond_resched() before going into the
> > > ftrace_modify_code() function.
> > >
> > > Co-developed-by: Arnd Bergmann 
> > > Signed-off-by: Arnd Bergmann 
> > > Signed-off-by: Anders Roxell 
> > > ---
> > >  arch/arm64/kernel/ftrace.c | 10 ++
> > >  1 file changed, 10 insertions(+)  
> >
> > It sounds like you're running into issues with the existing code, but I'd
> > like to understand a bit more about exactly what you're seeing. Which part
> > of the ftrace patching is proving to be expensive?
> >
> > The page table manipulation only happens once per module when using PLTs,
> > and the cache maintenance is just a single line per patch site without an
> > IPI.
> >
> > Is it the loop in ftrace_replace_code() that is causing the hassle?  
> 
> Yes: with an allmodconfig kernel, the ftrace selftest calls 
> ftrace_replace_code
> to loop >4 times through ftrace_make_call/ftrace_make_nop, and these
> end up calling
> 
> static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
> {
> void *waddr = addr;
> unsigned long flags = 0;
> int ret;
> 
> raw_spin_lock_irqsave(_lock, flags);
> waddr = patch_map(addr, FIX_TEXT_POKE0);
> 
> ret = probe_kernel_write(waddr, , AARCH64_INSN_SIZE);
> 
> patch_unmap(FIX_TEXT_POKE0);
> raw_spin_unlock_irqrestore(_lock, flags);
> 
> return ret;
> }
> int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
> {
> u32 *tp = addr;
> int ret;
> 
> /* A64 instructions must be word aligned */
> if ((uintptr_t)tp & 0x3)
> return -EINVAL;
> 
> ret = aarch64_insn_write(tp, insn);
> if (ret == 0)
> __flush_icache_range((uintptr_t)tp,
>  (uintptr_t)tp + AARCH64_INSN_SIZE);
> 
> return ret;
> }
> 
> which seems to be where the main cost is. This is with inside of
> qemu, and with lots of debugging options (in particular
> kcov and ubsan) enabled, that make each function call
> more expensive.

I was thinking more about this. Would something like this work?

-- Steve

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8ef9fc226037..42e89397778b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2393,11 +2393,14 @@ void __weak ftrace_replace_code(int enable)
 {
struct dyn_ftrace *rec;
struct ftrace_page *pg;
+   bool schedulable;
int failed;
 
if (unlikely(ftrace_disabled))
return;
 
+   schedulable = !irqs_disabled() && !preempt_count();
+
do_for_each_ftrace_rec(pg, rec) {
 
if (rec->flags & FTRACE_FL_DISABLED)
@@ -2409,6 +2412,8 @@ void __weak ftrace_replace_code(int enable)
/* Stop processing */
return;
}
+   if (schedulable)
+   cond_resched();
} while_for_each_ftrace_rec();
 }
 



Re: [PATCH 5/5] i2c: mediatek: Add i2c compatible for MediaTek MT8183

2018-12-03 Thread Sean Wang
 於 2018年12月3日 週一 上午5:34寫道:
>
> From: qii wang 
>
> Add i2c compatible for MT8183. Compare to 2712 i2c controller, MT8183 has
> different registers, offsets, clock, and multi-user function.
>
> Signed-off-by: qii wang 
> ---
>  drivers/i2c/busses/i2c-mt65xx.c |  136 
> +--
>  1 file changed, 130 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/i2c/busses/i2c-mt65xx.c b/drivers/i2c/busses/i2c-mt65xx.c
> index 428ac99..6b979ab 100644
> --- a/drivers/i2c/busses/i2c-mt65xx.c
> +++ b/drivers/i2c/busses/i2c-mt65xx.c
> @@ -35,17 +35,23 @@
>  #include 
>
>  #define I2C_RS_TRANSFER(1 << 4)
> +#define I2C_ARB_LOST   (1 << 3)

it seems no one refers to the macro in the patch so it should be
better to be removed

>  #define I2C_HS_NACKERR (1 << 2)
>  #define I2C_ACKERR (1 << 1)
>  #define I2C_TRANSAC_COMP   (1 << 0)
>  #define I2C_TRANSAC_START  (1 << 0)
> +#define I2C_RESUME_ARBIT   (1 << 1)
>  #define I2C_RS_MUL_CNFG(1 << 15)
>  #define I2C_RS_MUL_TRIG(1 << 14)
> +#define I2C_HS_TIME_EN (1 << 7)
>  #define I2C_DCM_DISABLE0x
>  #define I2C_IO_CONFIG_OPEN_DRAIN   0x0003
>  #define I2C_IO_CONFIG_PUSH_PULL0x
>  #define I2C_SOFT_RST   0x0001
>  #define I2C_FIFO_ADDR_CLR  0x0001
> +#define I2C_FIFO_ADDR_CLRH 0x0002
> +#define I2C_FIFO_ADDR_CLR_MCH  0x0004
> +#define I2C_HFIFO_DATA 0x8208
>  #define I2C_DELAY_LEN  0x0002
>  #define I2C_ST_START_CON   0x8001
>  #define I2C_FS_START_CON   0x1800
> @@ -76,6 +82,8 @@
>  #define I2C_CONTROL_DIR_CHANGE  (0x1 << 4)
>  #define I2C_CONTROL_ACKERR_DET_EN   (0x1 << 5)
>  #define I2C_CONTROL_TRANSFER_LEN_CHANGE (0x1 << 6)
> +#define I2C_CONTROL_DMAACK_EN   (0x1 << 8)
> +#define I2C_CONTROL_ASYNC_MODE  (0x1 << 9)
>  #define I2C_CONTROL_WRAPPER (0x1 << 0)
>
>  #define I2C_DRV_NAME   "i2c-mt65xx"
> @@ -130,6 +138,15 @@ enum I2C_REGS_OFFSET {
> OFFSET_DEBUGCTRL,
> OFFSET_TRANSFER_LEN_AUX,
> OFFSET_CLOCK_DIV,
> +   /* MT8183 only regs */
> +   OFFSET_LTIMING,
> +   OFFSET_DATA_TIMING,
> +   OFFSET_MCU_INTR,
> +   OFFSET_HW_TIMEOUT,
> +   OFFSET_HFIFO_DATA,
> +   OFFSET_HFIFO_STAT,
> +   OFFSET_MULTI_DMA,
> +   OFFSET_ROLLBACK,
>  };
>
>  static const u16 mt_i2c_regs_v1[] = {
> @@ -159,6 +176,39 @@ enum I2C_REGS_OFFSET {
> [OFFSET_CLOCK_DIV] = 0x70,
>  };
>
> +static const u16 mt_i2c_regs_v2[] = {
> +   [OFFSET_DATA_PORT] = 0x0,
> +   [OFFSET_SLAVE_ADDR] = 0x4,
> +   [OFFSET_INTR_MASK] = 0x8,
> +   [OFFSET_INTR_STAT] = 0xc,
> +   [OFFSET_CONTROL] = 0x10,
> +   [OFFSET_TRANSFER_LEN] = 0x14,
> +   [OFFSET_TRANSAC_LEN] = 0x18,
> +   [OFFSET_DELAY_LEN] = 0x1c,
> +   [OFFSET_TIMING] = 0x20,
> +   [OFFSET_START] = 0x24,
> +   [OFFSET_EXT_CONF] = 0x28,
> +   [OFFSET_LTIMING] = 0x2c,
> +   [OFFSET_HS] = 0x30,
> +   [OFFSET_IO_CONFIG] = 0x34,
> +   [OFFSET_FIFO_ADDR_CLR] = 0x38,
> +   [OFFSET_DATA_TIMING] = 0x3c,
> +   [OFFSET_MCU_INTR] = 0x40,
> +   [OFFSET_TRANSFER_LEN_AUX] = 0x44,
> +   [OFFSET_CLOCK_DIV] = 0x48,
> +   [OFFSET_HW_TIMEOUT] = 0x4c,
> +   [OFFSET_SOFTRESET] = 0x50,
> +   [OFFSET_HFIFO_DATA] = 0x70,
> +   [OFFSET_DEBUGSTAT] = 0xe0,
> +   [OFFSET_DEBUGCTRL] = 0xe8,
> +   [OFFSET_FIFO_STAT] = 0xf4,
> +   [OFFSET_FIFO_THRESH] = 0xf8,
> +   [OFFSET_HFIFO_STAT] = 0xfc,
> +   [OFFSET_DCM_EN] = 0xf88,
> +   [OFFSET_MULTI_DMA] = 0xf8c,
> +   [OFFSET_ROLLBACK] = 0xf98,
> +};
> +
>  struct mtk_i2c_compatible {
> const struct i2c_adapter_quirks *quirks;
> const u16 *regs;
> @@ -168,6 +218,7 @@ struct mtk_i2c_compatible {
> unsigned char aux_len_reg: 1;
> unsigned char support_33bits: 1;
> unsigned char timing_adjust: 1;
> +   unsigned char dma_sync: 1;
>  };
>
>  struct mtk_i2c {
> @@ -181,8 +232,11 @@ struct mtk_i2c {
> struct clk *clk_main;   /* main clock for i2c bus */
> struct clk *clk_dma;/* DMA clock for i2c via DMA */
> struct clk *clk_pmic;   /* PMIC clock for i2c from PMIC */
> +   struct clk *clk_arb;/* Arbitrator clock for i2c */
> bool have_pmic; /* can use i2c pins from PMIC */
> bool use_push_pull; /* IO config push-pull mode */
> +   bool share_i3c; /* share i3c IP*/
> +   u32 ch_offset;  /* i2c multi-user channel offset */
>
> u16 irq_stat;   /* interrupt status */
> unsigned int clk_src_div;
> @@ -190,6 +244,7 @@ struct mtk_i2c {
> enum 

Re: [PATCH 3/3] arm64: ftrace: add cond_resched() to func ftrace_make_(call|nop)

2018-12-03 Thread Steven Rostedt
On Mon, 3 Dec 2018 22:51:52 +0100
Arnd Bergmann  wrote:

> On Mon, Dec 3, 2018 at 8:22 PM Will Deacon  wrote:
> >
> > Hi Anders,
> >
> > On Fri, Nov 30, 2018 at 04:09:56PM +0100, Anders Roxell wrote:  
> > > Both of those functions end up calling ftrace_modify_code(), which is
> > > expensive because it changes the page tables and flush caches.
> > > Microseconds add up because this is called in a loop for each dyn_ftrace
> > > record, and this triggers the softlockup watchdog unless we let it sleep
> > > occasionally.
> > > Rework so that we call cond_resched() before going into the
> > > ftrace_modify_code() function.
> > >
> > > Co-developed-by: Arnd Bergmann 
> > > Signed-off-by: Arnd Bergmann 
> > > Signed-off-by: Anders Roxell 
> > > ---
> > >  arch/arm64/kernel/ftrace.c | 10 ++
> > >  1 file changed, 10 insertions(+)  
> >
> > It sounds like you're running into issues with the existing code, but I'd
> > like to understand a bit more about exactly what you're seeing. Which part
> > of the ftrace patching is proving to be expensive?
> >
> > The page table manipulation only happens once per module when using PLTs,
> > and the cache maintenance is just a single line per patch site without an
> > IPI.
> >
> > Is it the loop in ftrace_replace_code() that is causing the hassle?  
> 
> Yes: with an allmodconfig kernel, the ftrace selftest calls 
> ftrace_replace_code
> to look >4 through ftrace_make_call/ftrace_make_nop, and these
> end up calling
> 
> static int __kprobes __aarch64_insn_write(void *addr, __le32 insn)
> {
> void *waddr = addr;
> unsigned long flags = 0;
> int ret;
> 
> raw_spin_lock_irqsave(_lock, flags);
> waddr = patch_map(addr, FIX_TEXT_POKE0);
> 
> ret = probe_kernel_write(waddr, , AARCH64_INSN_SIZE);
> 
> patch_unmap(FIX_TEXT_POKE0);
> raw_spin_unlock_irqrestore(_lock, flags);
> 
> return ret;
> }
> int __kprobes aarch64_insn_patch_text_nosync(void *addr, u32 insn)
> {
> u32 *tp = addr;
> int ret;
> 
> /* A64 instructions must be word aligned */
> if ((uintptr_t)tp & 0x3)
> return -EINVAL;
> 
> ret = aarch64_insn_write(tp, insn);
> if (ret == 0)
> __flush_icache_range((uintptr_t)tp,
>  (uintptr_t)tp + AARCH64_INSN_SIZE);
> 
> return ret;
> }
> 
> which seems to be where the main cost is. This is with inside of
> qemu, and with lots of debugging options (in particular
> kcov and ubsan) enabled, that make each function call
> more expensive.

I was thinking more about this. Would something like this work?

-- Steve

diff --git a/kernel/trace/ftrace.c b/kernel/trace/ftrace.c
index 8ef9fc226037..42e89397778b 100644
--- a/kernel/trace/ftrace.c
+++ b/kernel/trace/ftrace.c
@@ -2393,11 +2393,14 @@ void __weak ftrace_replace_code(int enable)
 {
struct dyn_ftrace *rec;
struct ftrace_page *pg;
+   bool schedulable;
int failed;
 
if (unlikely(ftrace_disabled))
return;
 
+   schedulable = !irqs_disabled() && !preempt_count();
+
do_for_each_ftrace_rec(pg, rec) {
 
if (rec->flags & FTRACE_FL_DISABLED)
@@ -2409,6 +2412,8 @@ void __weak ftrace_replace_code(int enable)
/* Stop processing */
return;
}
+   if (schedulable)
+   cond_resched();
} while_for_each_ftrace_rec();
 }
 



Re: [PATCH v2 2/5] devfreq: add support for suspend/resume of a devfreq device

2018-12-03 Thread Chanwoo Choi
Hi,

On December 4, 2018 14:36, Chanwoo Choi wrote:
> Hi Lukasz,
> 
> Looks good to me, but I have some comments.
> If you fix it, feel free to add my tag:
> Reviewed-by: Chanwoo choi 

Sorry, fixing the typo 'choi' to 'Choi' as follows:
Reviewed-by: Chanwoo Choi 

> 
> On December 3, 2018 23:31, Lukasz Luba wrote:
>> The patch prepares devfreq device for handling suspend/resume
>> functionality.  The new fields will store needed information during this
> 
> nitpick. Remove unneeded space. There are two spaces between '.' and 'The 
> new'. 
> 
>> process.  Devfreq framework handles opp-suspend DT entry and there is no
> 
> ditto.
> 
>> need of modyfications in the drivers code.  It uses atomic variables to
> 
> ditto.
> 
>> make sure no race condition affects the process.
>>
>> The patch is based on earlier work by Tobias Jakobi.
> 
> Please remove it from each patch description.
> 
>>
>> Suggested-by: Tobias Jakobi 
>> Suggested-by: Chanwoo Choi 
>> Signed-off-by: Lukasz Luba 
>> ---
>>  drivers/devfreq/devfreq.c | 51 
>> +++
>>  include/linux/devfreq.h   |  7 +++
>>  2 files changed, 50 insertions(+), 8 deletions(-)
>>
>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>> index a9fd61b..36bed24 100644
>> --- a/drivers/devfreq/devfreq.c
>> +++ b/drivers/devfreq/devfreq.c
>> @@ -316,6 +316,10 @@ static int devfreq_set_target(struct devfreq *devfreq, 
>> unsigned long new_freq,
>>  "Couldn't update frequency transition information.\n");
>>  
>>  devfreq->previous_freq = new_freq;
>> +
>> +if (devfreq->suspend_freq)
>> +devfreq->resume_freq = cur_freq;
>> +
>>  return err;
>>  }
>>  
>> @@ -667,6 +671,9 @@ struct devfreq *devfreq_add_device(struct device *dev,
>>  }
>>  devfreq->max_freq = devfreq->scaling_max_freq;
>>  
>> +devfreq->suspend_freq = dev_pm_opp_get_suspend_opp_freq(dev);
>> +atomic_set(&devfreq->suspend_count, 0);
>> +
>>  dev_set_name(&devfreq->dev, "devfreq%d",
>>  atomic_inc_return(&devfreq_no));
>>  err = device_register(&devfreq->dev);
>> @@ -867,14 +874,28 @@ EXPORT_SYMBOL(devm_devfreq_remove_device);
>>   */
>>  int devfreq_suspend_device(struct devfreq *devfreq)
>>  {
>> +int ret;
>> +
>>  if (!devfreq)
>>  return -EINVAL;
>>  
>> -if (!devfreq->governor)
>> -return 0;
>> +if (devfreq->governor) {
>> +ret = devfreq->governor->event_handler(devfreq,
>> +DEVFREQ_GOV_SUSPEND, NULL);
>> +if (ret)
>> +return ret;
>> +}
>> +
>> +if (devfreq->suspend_freq) {
>> +if (atomic_inc_return(&devfreq->suspend_count) > 1)
>> +return 0;
>> +
>> +ret = devfreq_set_target(devfreq, devfreq->suspend_freq, 0);
>> +if (ret)
>> +return ret;
>> +}
>>  
>> -return devfreq->governor->event_handler(devfreq,
>> -DEVFREQ_GOV_SUSPEND, NULL);
>> +return 0;
>>  }
>>  EXPORT_SYMBOL(devfreq_suspend_device);
>>  
>> @@ -888,14 +909,28 @@ EXPORT_SYMBOL(devfreq_suspend_device);
>>   */
>>  int devfreq_resume_device(struct devfreq *devfreq)
>>  {
>> +int ret;
>> +
>>  if (!devfreq)
>>  return -EINVAL;
>>  
>> -if (!devfreq->governor)
>> -return 0;
>> +if (devfreq->resume_freq) {
>> +if (atomic_dec_return(&devfreq->suspend_count) >= 1)
>> +return 0;
>>  
>> -return devfreq->governor->event_handler(devfreq,
>> -DEVFREQ_GOV_RESUME, NULL);
>> +ret = devfreq_set_target(devfreq, devfreq->resume_freq, 0);
>> +if (ret)
>> +return ret;
>> +}
>> +
>> +if (devfreq->governor) {
>> +ret = devfreq->governor->event_handler(devfreq,
>> +DEVFREQ_GOV_RESUME, NULL);
>> +if (ret)
>> +return ret;
>> +}
>> +
>> +return 0;
>>  }
>>  EXPORT_SYMBOL(devfreq_resume_device);
>>  
>> diff --git a/include/linux/devfreq.h b/include/linux/devfreq.h
>> index e4963b0..d985199 100644
>> --- a/include/linux/devfreq.h
>> +++ b/include/linux/devfreq.h
>> @@ -131,6 +131,9 @@ struct devfreq_dev_profile {
>>   * @scaling_min_freq:   Limit minimum frequency requested by OPP 
>> interface
>>   * @scaling_max_freq:   Limit maximum frequency requested by OPP 
>> interface
>>   * @stop_polling:devfreq polling status of a device.
>> + * @suspend_freq:frequency of a device set during suspend phase.
>> + * @resume_freq: frequency of a device set in resume phase.
>> + * @suspend_count:   suspend requests counter for a device.
>>   * @total_trans:Number of devfreq transitions
>>   * @trans_table:Statistics of devfreq transitions
>>   * @time_in_state:  Statistics of devfreq states
>> @@ -167,6 +170,10 @@ struct devfreq {
>>  unsigned long scaling_max_freq;

Re: [PATCH v2 1/5] devfreq: refactor set_target frequency function

2018-12-03 Thread Chanwoo Choi
Hi,

On December 4, 2018 13:39, Chanwoo Choi wrote:
> Hi Lukasz,
> 
> On December 3, 2018 23:31, Lukasz Luba wrote:
>> The refactoring is needed for the new client in devfreq: suspend.
>> To avoid code duplication, move it to the new local function
>> devfreq_set_target.
>>
>> The patch is based on earlier work by Tobias Jakobi.
> 
> As I already commented, please remove it; you already mention it in the
> cover letter.
> If you want to preserve the contribution history of Tobias, it would be
> better to add a 'Signed-off-by' or similar tag.

If you fix it, feel free to add my tag:
Reviewed-by: Chanwoo Choi 

> 
>>
>> Suggested-by: Tobias Jakobi 
>> Suggested-by: Chanwoo Choi 
>> Signed-off-by: Lukasz Luba 
>> ---
>>  drivers/devfreq/devfreq.c | 62 
>> +++
>>  1 file changed, 36 insertions(+), 26 deletions(-)
>>
>> diff --git a/drivers/devfreq/devfreq.c b/drivers/devfreq/devfreq.c
>> index 1414130..a9fd61b 100644
>> --- a/drivers/devfreq/devfreq.c
>> +++ b/drivers/devfreq/devfreq.c
>> @@ -285,6 +285,40 @@ static int devfreq_notify_transition(struct devfreq 
>> *devfreq,
>>  return 0;
>>  }
>>  
>> +static int devfreq_set_target(struct devfreq *devfreq, unsigned long 
>> new_freq,
>> +  u32 flags)
>> +{
>> +struct devfreq_freqs freqs;
>> +unsigned long cur_freq;
>> +int err = 0;
>> +
>> +if (devfreq->profile->get_cur_freq)
>> +devfreq->profile->get_cur_freq(devfreq->dev.parent, &cur_freq);
>> +else
>> +cur_freq = devfreq->previous_freq;
>> +
>> +freqs.old = cur_freq;
>> +freqs.new = new_freq;
>> +devfreq_notify_transition(devfreq, &freqs, DEVFREQ_PRECHANGE);
>> +
>> +err = devfreq->profile->target(devfreq->dev.parent, &new_freq, flags);
>> +if (err) {
>> +freqs.new = cur_freq;
>> +devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
>> +return err;
>> +}
>> +
>> +freqs.new = new_freq;
>> +devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
>> +
>> +if (devfreq_update_status(devfreq, new_freq))
>> +dev_err(&devfreq->dev,
>> +"Couldn't update frequency transition information.\n");
>> +
>> +devfreq->previous_freq = new_freq;
>> +return err;
>> +}
>> +
>>  /* Load monitoring helper functions for governors use */
>>  
>>  /**
>> @@ -296,8 +330,7 @@ static int devfreq_notify_transition(struct devfreq 
>> *devfreq,
>>   */
>>  int update_devfreq(struct devfreq *devfreq)
>>  {
>> -struct devfreq_freqs freqs;
>> -unsigned long freq, cur_freq, min_freq, max_freq;
>> +unsigned long freq, min_freq, max_freq;
>>  int err = 0;
>>  u32 flags = 0;
>>  
>> @@ -333,31 +366,8 @@ int update_devfreq(struct devfreq *devfreq)
>>  flags |= DEVFREQ_FLAG_LEAST_UPPER_BOUND; /* Use LUB */
>>  }
>>  
>> -if (devfreq->profile->get_cur_freq)
>> -devfreq->profile->get_cur_freq(devfreq->dev.parent, &cur_freq);
>> -else
>> -cur_freq = devfreq->previous_freq;
>> -
>> -freqs.old = cur_freq;
>> -freqs.new = freq;
>> -devfreq_notify_transition(devfreq, &freqs, DEVFREQ_PRECHANGE);
>> +return devfreq_set_target(devfreq, freq, flags);
>>  
>> -err = devfreq->profile->target(devfreq->dev.parent, &freq, flags);
>> -if (err) {
>> -freqs.new = cur_freq;
>> -devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
>> -return err;
>> -}
>> -
>> -freqs.new = freq;
>> -devfreq_notify_transition(devfreq, &freqs, DEVFREQ_POSTCHANGE);
>> -
>> -if (devfreq_update_status(devfreq, freq))
>> -dev_err(&devfreq->dev,
>> -"Couldn't update frequency transition information.\n");
>> -
>> -devfreq->previous_freq = freq;
>> -return err;
>>  }
>>  EXPORT_SYMBOL(update_devfreq);
>>  
>>
> 
> 


-- 
Best Regards,
Chanwoo Choi
Samsung Electronics

