Re: [PATCH v3 1/2] soc: qcom: smem: Rename "uncached" accessors

2017-10-05 Thread Stephen Boyd
On 10/04, Bjorn Andersson wrote:
> In preparation for adding accessors for "cached" entries rename the
> "uncached" accessors. Also rename "first" cached entry to "last", as
> the cached list grows backwards.
> 
> Signed-off-by: Bjorn Andersson 
> ---

Reviewed-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH v3 2/2] soc: qcom: smem: Support getting cached entries

2017-10-05 Thread Stephen Boyd
On 10/04, Bjorn Andersson wrote:
> On msm8996 cached SMEM items are used for storing the GLINK FIFOs, so
> for items not found in the uncached list we also need to search the
> cached list.
> 
> Signed-off-by: Bjorn Andersson 
> ---
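
As an illustration of the lookup order described above -- a minimal sketch
with hypothetical helper names (search_uncached()/search_cached() are not the
driver's real functions), assuming uncached entries grow upward from the start
of the partition and cached entries grow downward from the end:

    static void *qcom_smem_find(struct smem_partition_header *phdr,
                                unsigned item)
    {
            void *entry;

            /* Uncached entries grow upward from the start of the partition. */
            entry = search_uncached(phdr, item);
            if (entry)
                    return entry;
            /* Cached entries grow downward from the end, so search those too. */
            return search_cached(phdr, item);
    }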

Reviewed-by: Stephen Boyd 

-- 
Qualcomm Innovation Center, Inc. is a member of Code Aurora Forum,
a Linux Foundation Collaborative Project


Re: [PATCH tip/core/rcu 1/3] membarrier: Provide register expedited private command

2017-10-05 Thread Mathieu Desnoyers
- On Oct 5, 2017, at 12:21 PM, Peter Zijlstra pet...@infradead.org wrote:

> On Thu, Oct 05, 2017 at 04:02:06PM +, Mathieu Desnoyers wrote:
>> - On Oct 5, 2017, at 8:12 AM, Peter Zijlstra pet...@infradead.org wrote:
>> 
>> > On Wed, Oct 04, 2017 at 02:37:53PM -0700, Paul E. McKenney wrote:
>> >> diff --git a/arch/powerpc/kernel/membarrier.c 
>> >> b/arch/powerpc/kernel/membarrier.c
>> >> new file mode 100644
>> >> index ..b0d79a5f5981
>> >> --- /dev/null
>> >> +++ b/arch/powerpc/kernel/membarrier.c
>> >> @@ -0,0 +1,45 @@
>> > 
>> >> +void membarrier_arch_register_private_expedited(struct task_struct *p)
>> >> +{
>> >> + struct task_struct *t;
>> >> +
>> >> + if (get_nr_threads(p) == 1) {
>> >> + set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> >> + return;
>> >> + }
>> >> + /*
>> >> +  * Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
>> >> +  * fork is protected by siglock.
>> >> +  */
>> >> + spin_lock(&p->sighand->siglock);
>> >> + for_each_thread(p, t)
>> >> + set_ti_thread_flag(task_thread_info(t),
>> >> + TIF_MEMBARRIER_PRIVATE_EXPEDITED);
>> > 
>> > I'm not sure this works correctly vs CLONE_VM without CLONE_THREAD.
>> 
>> The intent here is to hold the sighand siglock to provide mutual
>> exclusion against invocation of membarrier_fork(p, clone_flags)
>> by copy_process().
>> 
>> copy_process() grabs spin_lock(&current->sighand->siglock) for both
>> CLONE_THREAD and not CLONE_THREAD flags.
>> 
>> What am I missing here ?
> 
> If you do CLONE_VM without CLONE_THREAD you'll end up sharing the mm but
> you'll not be part of thread_head, so the for_each_thread() iteration
> will not find the task.

Excellent point. Please see the follow-up RFC patch I posted taking care of
this matter.
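
For illustration, a minimal userspace sketch (hypothetical, not part of this
thread) of the case Peter describes: a task created with CLONE_VM but without
CLONE_THREAD shares the mm yet sits in its own thread group, so a
for_each_thread() walk over the parent never visits it:

    #define _GNU_SOURCE
    #include <sched.h>
    #include <signal.h>
    #include <stdio.h>
    #include <sys/wait.h>

    static char child_stack[64 * 1024];

    static int child_fn(void *arg)
    {
            /* Shares the parent's mm, but is not on the parent's thread list. */
            return 0;
    }

    int main(void)
    {
            /* CLONE_VM without CLONE_THREAD: same address space, own thread group. */
            int pid = clone(child_fn, child_stack + sizeof(child_stack),
                            CLONE_VM | SIGCHLD, NULL);

            if (pid < 0) {
                    perror("clone");
                    return 1;
            }
            waitpid(pid, NULL, 0);
            return 0;
    }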

Thanks,

Mathieu


-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


[RFC PATCH for 4.14 1/2] membarrier: Remove unused code for architectures without membarrier hooks

2017-10-05 Thread Mathieu Desnoyers
Architectures without membarrier hooks don't need to emit the
empty membarrier_arch_switch_mm() static inline when
CONFIG_MEMBARRIER=y.

Adapt the CONFIG_MEMBARRIER=n counterpart to only emit the empty
membarrier_arch_switch_mm() for architectures with membarrier hooks.

Reported-by: Nicholas Piggin 
Signed-off-by: Mathieu Desnoyers 
CC: Peter Zijlstra 
CC: Paul E. McKenney 
CC: Boqun Feng 
CC: Andrew Hunter 
CC: Maged Michael 
CC: gro...@google.com
CC: Avi Kivity 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Dave Watson 
CC: Alan Stern 
CC: Will Deacon 
CC: Andy Lutomirski 
CC: Ingo Molnar 
CC: Alexander Viro 
CC: linuxppc-...@lists.ozlabs.org
CC: linux-a...@vger.kernel.org
---
 include/linux/sched/mm.h | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/include/linux/sched/mm.h b/include/linux/sched/mm.h
index d5a9ab8f3836..b2767ecb21a8 100644
--- a/include/linux/sched/mm.h
+++ b/include/linux/sched/mm.h
@@ -215,10 +215,6 @@ static inline void memalloc_noreclaim_restore(unsigned int 
flags)
 #ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
 #include 
 #else
-static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
-   struct mm_struct *next, struct task_struct *tsk)
-{
-}
 static inline void membarrier_arch_fork(struct task_struct *t,
unsigned long clone_flags)
 {
@@ -247,10 +243,12 @@ static inline void membarrier_execve(struct task_struct 
*t)
membarrier_arch_execve(t);
 }
 #else
+#ifdef CONFIG_ARCH_HAS_MEMBARRIER_HOOKS
 static inline void membarrier_arch_switch_mm(struct mm_struct *prev,
struct mm_struct *next, struct task_struct *tsk)
 {
 }
+#endif
 static inline void membarrier_fork(struct task_struct *t,
unsigned long clone_flags)
 {
-- 
2.11.0



[RFC PATCH for 4.14 2/2] Fix: membarrier: Handle CLONE_VM + !CLONE_THREAD correctly on powerpc

2017-10-05 Thread Mathieu Desnoyers
Threads targeting the same VM but belonging to different thread
groups are a tricky case. This has a few consequences:

It turns out that we cannot rely on get_nr_threads(p) to count the
number of threads using a VM. We can use
(atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
instead to skip the synchronize_sched() for cases where the VM only has
a single user, and that user only has a single thread.

It also turns out that we cannot use for_each_thread() to set
thread flags in all threads using a VM, as it only iterates on the
thread group.

Therefore, test the membarrier state variable directly rather than
relying on thread flags. This means
membarrier_register_private_expedited() needs to set the
MEMBARRIER_STATE_SWITCH_MM flag, issue synchronize_sched(), and only
then set MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY which allows private
expedited membarrier commands to succeed. membarrier_arch_switch_mm()
now tests for the MEMBARRIER_STATE_SWITCH_MM flag.
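
A condensed sketch of that ordering (the flag names come from the patch; the
surrounding function body is illustrative only, not the exact code):

    static void membarrier_register_private_expedited(void)
    {
            struct mm_struct *mm = current->mm;

            /* Make every subsequent switch_mm() issue the barrier... */
            atomic_or(MEMBARRIER_STATE_SWITCH_MM, &mm->membarrier_state);
            /* ...wait until no CPU can still run a thread that missed it... */
            synchronize_sched();
            /* ...and only then allow the expedited command to succeed. */
            atomic_or(MEMBARRIER_STATE_PRIVATE_EXPEDITED_READY,
                      &mm->membarrier_state);
    }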

Reported-by: Peter Zijlstra 
Signed-off-by: Mathieu Desnoyers 
CC: Paul E. McKenney 
CC: Boqun Feng 
CC: Andrew Hunter 
CC: Maged Michael 
CC: gro...@google.com
CC: Avi Kivity 
CC: Benjamin Herrenschmidt 
CC: Paul Mackerras 
CC: Michael Ellerman 
CC: Dave Watson 
CC: Alan Stern 
CC: Will Deacon 
CC: Andy Lutomirski 
CC: Ingo Molnar 
CC: Alexander Viro 
CC: Nicholas Piggin 
CC: linuxppc-...@lists.ozlabs.org
CC: linux-a...@vger.kernel.org
---
 arch/powerpc/include/asm/membarrier.h | 21 ++---
 arch/powerpc/kernel/membarrier.c  | 17 -
 include/linux/mm_types.h  |  2 +-
 include/linux/sched/mm.h  | 28 ++--
 kernel/fork.c |  2 --
 kernel/sched/membarrier.c | 16 +---
 6 files changed, 26 insertions(+), 60 deletions(-)

diff --git a/arch/powerpc/include/asm/membarrier.h 
b/arch/powerpc/include/asm/membarrier.h
index 61152a7a3cf9..0951646253d9 100644
--- a/arch/powerpc/include/asm/membarrier.h
+++ b/arch/powerpc/include/asm/membarrier.h
@@ -11,8 +11,8 @@ static inline void membarrier_arch_switch_mm(struct mm_struct 
*prev,
 * when switching from userspace to kernel is not needed after
 * store to rq->curr.
 */
-   if (likely(!test_ti_thread_flag(task_thread_info(tsk),
-   TIF_MEMBARRIER_PRIVATE_EXPEDITED) || !prev))
+   if (likely(!(atomic_read(&next->membarrier_state)
+   & MEMBARRIER_STATE_SWITCH_MM) || !prev))
return;
 
/*
@@ -21,23 +21,6 @@ static inline void membarrier_arch_switch_mm(struct 
mm_struct *prev,
 */
smp_mb();
 }
-static inline void membarrier_arch_fork(struct task_struct *t,
-   unsigned long clone_flags)
-{
-   /*
-* Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
-* fork is protected by siglock. membarrier_arch_fork is called
-* with siglock held.
-*/
-   if (test_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED))
-   set_ti_thread_flag(task_thread_info(t),
-   TIF_MEMBARRIER_PRIVATE_EXPEDITED);
-}
-static inline void membarrier_arch_execve(struct task_struct *t)
-{
-   clear_ti_thread_flag(task_thread_info(t),
-   TIF_MEMBARRIER_PRIVATE_EXPEDITED);
-}
 void membarrier_arch_register_private_expedited(struct task_struct *t);
 
 #endif /* _ASM_POWERPC_MEMBARRIER_H */
diff --git a/arch/powerpc/kernel/membarrier.c b/arch/powerpc/kernel/membarrier.c
index b0d79a5f5981..4795ad59b833 100644
--- a/arch/powerpc/kernel/membarrier.c
+++ b/arch/powerpc/kernel/membarrier.c
@@ -19,24 +19,15 @@
 #include 
 #include 
 #include 
+#include 
 
 void membarrier_arch_register_private_expedited(struct task_struct *p)
 {
-   struct task_struct *t;
+   struct mm_struct *mm = p->mm;
 
-   if (get_nr_threads(p) == 1) {
-   set_thread_flag(TIF_MEMBARRIER_PRIVATE_EXPEDITED);
+   atomic_or(MEMBARRIER_STATE_SWITCH_MM, &mm->membarrier_state);
+   if (atomic_read(&mm->mm_users) == 1 && get_nr_threads(p) == 1)
return;
-   }
-   /*
-* Coherence of TIF_MEMBARRIER_PRIVATE_EXPEDITED against thread
-* fork is protected by siglock.
-*/
-   spin_lock(&p->sighand->siglock);
-   for_each_thread(p, t)
-   set_ti_thread_flag(task_thread_info(t),
-   TIF_MEMBARRIER_PRIVATE_EXPEDITED);
-   spin_unlock(&p->sighand->siglock);
/*
 * Ensure all future scheduler executions will observe the new
 * thread flag state for this process.
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index 5e0fe8ce053b..1861ea8dba77 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -446,7 +446,7 @@ struct mm_struct {
 
struct core_state *core_state; /* coredumping support */
 #ifdef CONFIG_MEMBARRIER
-   int membarrier_private_expedited;
+   atomic_t membarrier_state;

Re: Extreme time jitter with suspend/resume cycles

2017-10-05 Thread Thomas Gleixner
On Thu, 5 Oct 2017, Thomas Gleixner wrote:
> On Thu, 5 Oct 2017, Gabriel Beddingfield wrote:
> 
> > Hi Thomas,
> > 
> > On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner  wrote:
> > >> > Which SoC/clocksource driver are you talking about?
> > >>
> > >> NXP i.MX 6SoloX
> > >> drivers/clocksource/timer-imx-gpt.c
> > >
> > > So that clocksource driver looks correct. Do you have an idea in which
> > > context this time jump happens? Does it happen when you exercise your high
> > > frequency suspend/resume dance or is that happening just when you let the
> > > machine run forever as well?
> > 
> > We couldn't devise any reproduction steps. We observed it happening at
> > unexpected times in a fleet of devices -- and we couldn't find any
> > patterns to clue us in.
> 
> Ok. Did you talk to NXP about that? Or did you try to exercise reads in a
> loop to detect the wreckage and maybe a pattern in there?

The reason I'm asking is to exclude any weird issue in the timekeeping
code, which is still a possibility, despite the fact that I went through it
with a fine-tooth comb after stumbling over that check in the resume path.

Thanks,

tglx


Re: [PATCH 1/3] arm/xen: don't include rwlock.h directly

2017-10-05 Thread Sebastian Andrzej Siewior
On 2017-10-05 12:58:32 [-0700], Stefano Stabellini wrote:
> Let me know if you want this patch to go via the xen tree.
Yes, please.

Sebastian


Re: Extreme time jitter with suspend/resume cycles

2017-10-05 Thread Thomas Gleixner
Gabriel,

On Thu, 5 Oct 2017, Gabriel Beddingfield wrote:

> Hi John,
> 
> On Wed, Oct 4, 2017 at 5:16 PM, John Stultz  wrote:
> >> Please let me know what you think -- and what the right approach for
> >> solving this would be.
> >
> > So I suspect the best solution for you here is: provide some
> > infrastructure so clocksources that set CLOCK_SOURCE_SUSPEND_NONSTOP
> > which are not the current clocksource can be used for suspend timing.
> 
> Let me see if I understand how this might work in my situation...
> 
> 1. I register a clocksource and set the NONSTOP flag.
> 2. Give it a "low" rating so that it's not selected as the timekeeping
> clocksource.
> 3. Create functions clocksource_select_persistent() and
> timekeeping_notify_persistent()
> 4. Add `struct tk_read_base tk_persistent' to `struct timekeeper'
> 5. Possibly add a change_persistent_clocksource() function to timekeeping.c

That might work, but that looks a tad too complex. Let me give it a try:

1) Create a new flag CLOCK_SOURCE_SUSPEND_BACKUP

   This makes sense because such a clocksource is likely to be something
   which you don't ever want to use for timekeeping even if it's the only
   thing aside from jiffies.

2) Set this flag when registering a clocksource, which excludes it from the
   normal selection process.

3) Make the registration code select such a clocksource as the backup for
   suspend/resume to bridge the gap when the timekeeper clocksource does not
   have the NONSTOP flag set.

   You don't need the extra timekeeping_notify_persistent() because in that
   case the current backup clocksource, if any, is definitely not in use and
   can be replaced without the stop_machine() muck which we need for
   replacing the current timekeeper clocksource. The system obviously cannot
   be in suspend at this point, so all it needs to do is switch a pointer.

   Nor do you need this extra stuff in struct timekeeper; it's big enough
   anyway. A simple static struct tk_read_base should be sufficient.

On suspend you do:

   if (tk_backup->clock)
           sync_data(timekeeper, tk_backup);

   You still want to record the RTC based time if possible, in case the
   backup timekeeper can wrap, so you have a sanity check for that
   similar to the one we need for NONSTOP clocksources. If there is no RTC
   then we need a sensible cutoff for that wraparound time which makes sure
   that we don't trip over our own feet.

On resume you check tk_backup->clock again and do the RTC sanity check if
available. If the backup reading is valid (either the wrap time is above the
cutoff or it checks out against the RTC), you update the timekeeper and
proceed. If not, use the fallback or do nothing in the worst case.
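
A rough sketch of the registration-time selection from points 2) and 3),
assuming the proposed CLOCK_SOURCE_SUSPEND_BACKUP flag (names are
hypothetical; this is not existing kernel code):

    static struct clocksource *suspend_backup_cs;

    static void clocksource_select_backup(struct clocksource *cs)
    {
            if (!(cs->flags & CLOCK_SOURCE_SUSPEND_BACKUP))
                    return;
            /*
             * The system cannot be in suspend while registering, so
             * replacing the backup is just a pointer switch -- no
             * stop_machine() needed.
             */
            if (!suspend_backup_cs || cs->rating > suspend_backup_cs->rating)
                    suspend_backup_cs = cs;
    }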

Thoughts?

Thanks,

tglx

   



Re: [PATCH v1 3/5] dt-bindings: Add DT bindings for NVIDIA Tegra AHB DMA controller

2017-10-05 Thread Dmitry Osipenko
On 05.10.2017 23:33, Rob Herring wrote:
> On Tue, Sep 26, 2017 at 02:22:04AM +0300, Dmitry Osipenko wrote:
>> Document DT bindings for the NVIDIA Tegra AHB DMA controller present
>> on Tegra20/30 SoCs.
>>
>> Signed-off-by: Dmitry Osipenko 
>> ---
>>  .../bindings/dma/nvidia,tegra20-ahbdma.txt | 23 
>> ++
>>  1 file changed, 23 insertions(+)
>>  create mode 100644 
>> Documentation/devicetree/bindings/dma/nvidia,tegra20-ahbdma.txt
>>
>> diff --git a/Documentation/devicetree/bindings/dma/nvidia,tegra20-ahbdma.txt 
>> b/Documentation/devicetree/bindings/dma/nvidia,tegra20-ahbdma.txt
>> new file mode 100644
>> index ..2af9aa76ae11
>> --- /dev/null
>> +++ b/Documentation/devicetree/bindings/dma/nvidia,tegra20-ahbdma.txt
>> @@ -0,0 +1,23 @@
>> +* NVIDIA Tegra AHB DMA controller
>> +
>> +Required properties:
>> +- compatible:   Must be "nvidia,tegra20-ahbdma"
>> +- reg:  Should contain registers base address and length.
>> +- interrupts:   Should contain one entry, DMA controller interrupt.
>> +- clocks:   Should contain one entry, DMA controller clock.
>> +- resets :  Should contain one entry, DMA controller reset.
>> +- #dma-cells:   Should be <1>. The cell represents DMA request select 
>> value
>> +for the peripheral. For more details consult the Tegra TRM's
>> +documentation, in particular AHB DMA channel control register
>> +REQ_SEL field.
>> +
>> +Example:
>> +
>> +ahbdma: ahbdma@60008000  {
> 
> Use standard node names. dma-controller in this case.
> 

Okay, I'll change it in v3. Thank you for the comment.

>> +compatible = "nvidia,tegra20-ahbdma";
>> +reg = <0x60008000 0x2000>;
>> +interrupts = ;
>> +clocks = <&tegra_car TEGRA20_CLK_AHBDMA>;
>> +resets = <&tegra_car 33>;
>> +#dma-cells = <1>;
>> +};
>> -- 
>> 2.14.1
>>



Re: [Intel-wired-lan] [PATCH] PCI: Check/Set ARI capability before setting numVFs

2017-10-05 Thread Alexander Duyck
On Thu, Oct 5, 2017 at 2:07 PM, Bjorn Helgaas  wrote:
> On Wed, Oct 04, 2017 at 04:29:14PM -0700, Alexander Duyck wrote:
>> On Wed, Oct 4, 2017 at 4:01 PM, Bjorn Helgaas  wrote:
>> > On Wed, Oct 04, 2017 at 08:52:58AM -0700, Tony Nguyen wrote:
>> >> This fixes a bug that can occur if an AER error is encountered while SRIOV
>> >> devices are present.
>> >>
>> >> This issue was seen by doing the following. Inject an AER error to a 
>> >> device
>> >> that has SRIOV devices.  After the device has recovered, remove the 
>> >> driver.
>> >> Reload the driver and enable SRIOV which causes the following crash to
>> >> occur:
>> >>
>> >> kernel BUG at drivers/pci/iov.c:157!
>> >> invalid opcode:  [#1] SMP
>> >> CPU: 36 PID: 2295 Comm: bash Not tainted 4.14.0-rc1+ #74
>> >> Hardware name: Supermicro X9DAi/X9DAi, BIOS 3.0a 04/29/2014
>> >> task: 9fa41cd45a00 task.stack: b4b2036e8000
>> >> RIP: 0010:pci_iov_add_virtfn+0x2eb/0x350
>> >> RSP: 0018:b4b2036ebcb8 EFLAGS: 00010286
>> >> RAX: fff0 RBX: 9fa42c1c8800 RCX: 9fa421ce2388
>> >> RDX: df90 RSI: 9fa8214fb388 RDI: df903fff
>> >> RBP: b4b2036ebd18 R08: 9fa421ce23b8 R09: b4b2036ebc2c
>> >> R10: 9fa42c1a5548 R11: 058e R12: 9fa8214fb000
>> >> R13: 9fa42c1a5000 R14: 9fa8214fb388 R15: 
>> >> FS:  7f60724b6700() GS:9fa82f30()
>> >> knlGS:
>> >> CS:  0010 DS:  ES:  CR0: 80050033
>> >> CR2: 559eca8b0f40 CR3: 000864146000 CR4: 001606e0
>> >> Call Trace:
>> >>  pci_enable_sriov+0x353/0x440
>> >>  ixgbe_pci_sriov_configure+0xd5/0x1f0 [ixgbe]
>> >>  sriov_numvfs_store+0xf7/0x170
>> >>  dev_attr_store+0x18/0x30
>> >>  sysfs_kf_write+0x37/0x40
>> >>  kernfs_fop_write+0x120/0x1b0
>> >>  __vfs_write+0x37/0x170
>> >>  ? __alloc_fd+0x3f/0x170
>> >>  ? set_close_on_exec+0x30/0x70
>> >>  vfs_write+0xb5/0x1a0
>> >>  SyS_write+0x55/0xc0
>> >>  entry_SYSCALL_64_fastpath+0x1a/0xa5
>> >> RIP: 0033:0x7f6071bafc20
>> >> RSP: 002b:7ffe7d42ba48 EFLAGS: 0246 ORIG_RAX: 0001
>> >> RAX: ffda RBX: 559eca8b0f30 RCX: 7f6071bafc20
>> >> RDX: 0002 RSI: 559eca961f60 RDI: 0001
>> >> RBP: 7f6071e78ae0 R08: 7f6071e7a740 R09: 7f60724b6700
>> >> R10: 0073 R11: 0246 R12: 
>> >> R13:  R14:  R15: 559eca892170
>> >> RIP: pci_iov_add_virtfn+0x2eb/0x350 RSP: b4b2036ebcb8
>> >>
>> >> This occurs since during AER recovery the ARI Capable Hierarchy bit,
>> >> which can affect the values for First VF Offset and VF Stride, is not set
>> >> until after pci_iov_set_numvfs() is called.
>> >
>> > Can you elaborate on where exactly this happens?  The only place we
>> > explicitly set PCI_SRIOV_CTRL_ARI is in sriov_init(), which is only
>> > called at enumeration-time.  So I'm guessing you're talking about this
>> > path:
>> >
>> >   ixgbe_io_slot_reset
>> > pci_restore_state
>> >   pci_restore_iov_state
>> > sriov_restore_state
>> >   pci_iov_set_numvfs
>> >
>> > where we don't set PCI_SRIOV_CTRL_ARI at all.  The fact that you say
>> > PCI_SRIOV_CTRL_ARI isn't set until *after* pci_iov_set_numvfs() is
>> > called suggests that it is being set *somewhere*, but I don't know
>> > where.
>>
>> The ARI bit is initialized in sriov_init, stored in iov->ctrl, and
>> restored in sriov_restore_state, but it occurs in the line after the
>> call to pci_iov_set_numvfs.
>>
>> The problem is you don't want to write the full iov->ctrl value until
>> after you have reset the number of VFs, since it will set VFE, so
>> pulling out and configuring the ARI value separately is needed.
>
> Doh, that should have been obvious to me ;)
>
>> >> This can cause the iov
>> >> structure to be populated with values that are incorrect if the bit is
>> >> later set.   Check and set this bit, if needed, before calling
>> >> pci_iov_set_numvfs() so that the values being populated properly take
>> >> the ARI bit into account.
>> >>
>> >> CC: Alexander Duyck 
>> >> CC: Emil Tantilov 
>> >> Signed-off-by: Tony Nguyen 
>> >> ---
>> >>  drivers/pci/iov.c | 4 
>> >>  1 file changed, 4 insertions(+)
>> >>
>> >> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
>> >> index 7492a65..a8896c7 100644
>> >> --- a/drivers/pci/iov.c
>> >> +++ b/drivers/pci/iov.c
>> >> @@ -497,6 +497,10 @@ static void sriov_restore_state(struct pci_dev *dev)
>> >>   if (ctrl & PCI_SRIOV_CTRL_VFE)
>> >>   return;
>> >>
>> >> + if ((iov->ctrl & PCI_SRIOV_CTRL_ARI) && !(ctrl & 
>> >> PCI_SRIOV_CTRL_ARI))
>> >> + pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL,
>> >> +   ctrl | PCI_SRIOV_CTRL_ARI);
>
> This looks a little fiddly and also assumes that we only ever need to
> *set* PCI_SRIOV_CTRL_ARI.  That's likely the case because it's
> probably cleared after 

Re: [PATCH 2/2] PCI: mvebu: Check DRAM window size

2017-10-05 Thread Bjorn Helgaas
On Mon, Sep 25, 2017 at 06:56:58PM -0500, Bjorn Helgaas wrote:
> On Mon, Aug 28, 2017 at 05:25:17PM +0200, Jan Luebbe wrote:
> > The sum of the DRAM windows may exceed 4GB (at least on Armada XP).
> > Return an error in that case.
> > 
> > Signed-off-by: Jan Luebbe 
> 
> Looking for an ack from Thomas or Jason before applying this...

Ping, I think I'm still waiting for an ack for this.  Or did I miss it?

> > ---
> >  drivers/pci/host/pci-mvebu.c | 27 ++-
> >  1 file changed, 22 insertions(+), 5 deletions(-)
> > 
> > diff --git a/drivers/pci/host/pci-mvebu.c b/drivers/pci/host/pci-mvebu.c
> > index f353a6eb2f01..5d74af81d104 100644
> > --- a/drivers/pci/host/pci-mvebu.c
> > +++ b/drivers/pci/host/pci-mvebu.c
> > @@ -206,10 +206,10 @@ static void mvebu_pcie_set_local_dev_nr(struct 
> > mvebu_pcie_port *port, int nr)
> >   * BAR[0,2] -> disabled, BAR[1] -> covers all DRAM banks
> >   * WIN[0-3] -> DRAM bank[0-3]
> >   */
> > -static void mvebu_pcie_setup_wins(struct mvebu_pcie_port *port)
> > +static int mvebu_pcie_setup_wins(struct mvebu_pcie_port *port)
> >  {
> > const struct mbus_dram_target_info *dram;
> > -   u32 size;
> > +   u64 size;
> > int i;
> >  
> > dram = mv_mbus_dram_info();
> > @@ -252,19 +252,32 @@ static void mvebu_pcie_setup_wins(struct 
> > mvebu_pcie_port *port)
> > if ((size & (size - 1)) != 0)
> > size = 1 << fls(size);
> >  
> > +   if (size > 0x100000000) {
> > +   dev_err(>pcie->pdev->dev,
> > +   "Could not configure DRAM window (too large): 0x%llx\n",
> > +   size);
> > +
> > +   return -EINVAL;
> > +   }
> > +
> > /* Setup BAR[1] to all DRAM banks. */
> > mvebu_writel(port, dram->cs[0].base, PCIE_BAR_LO_OFF(1));
> > mvebu_writel(port, 0, PCIE_BAR_HI_OFF(1));
> > mvebu_writel(port, ((size - 1) & 0xffff0000) | 1,
> >  PCIE_BAR_CTRL_OFF(1));
> > +
> > +   return 0;
> >  }
> >  
> > -static void mvebu_pcie_setup_hw(struct mvebu_pcie_port *port)
> > +static int mvebu_pcie_setup_hw(struct mvebu_pcie_port *port)
> >  {
> > u32 cmd, mask;
> > +   int ret;
> >  
> > /* Point PCIe unit MBUS decode windows to DRAM space. */
> > -   mvebu_pcie_setup_wins(port);
> > +   ret = mvebu_pcie_setup_wins(port);
> > +   if (ret)
> > +   return ret;
> >  
> > /* Master + slave enable. */
> > cmd = mvebu_readl(port, PCIE_CMD_OFF);
> > @@ -277,6 +290,8 @@ static void mvebu_pcie_setup_hw(struct mvebu_pcie_port 
> > *port)
> > mask = mvebu_readl(port, PCIE_MASK_OFF);
> > mask |= PCIE_MASK_ENABLE_INTS;
> > mvebu_writel(port, mask, PCIE_MASK_OFF);
> > +
> > +   return 0;
> >  }
> >  
> >  static int mvebu_pcie_hw_rd_conf(struct mvebu_pcie_port *port,
> > @@ -882,7 +897,9 @@ static int mvebu_pcie_setup(int nr, struct pci_sys_data 
> > *sys)
> >  
> > if (!port->base)
> > continue;
> > -   mvebu_pcie_setup_hw(port);
> > +   err = mvebu_pcie_setup_hw(port);
> > +   if (err)
> > +   return 0;
> > }
> >  
> > return 1;
> > -- 
> > 2.11.0
> > 
> 
> ___
> linux-arm-kernel mailing list
> linux-arm-ker...@lists.infradead.org
> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel


[PATCH v10 00/10] complete deferred page initialization

2017-10-05 Thread Pavel Tatashin
Changelog:
v10 - v9
- Addressed new comments from Michal Hocko.
- Sent "mm: deferred_init_memmap improvements" as a separate patch as
  it is also fixing existing problem.
- Merged "mm: stop zeroing memory during allocation in vmemmap" with
  "mm: zero struct pages during initialization".
- Added more comments "mm: zero reserved and unavailable struct pages"

v9 - v8
- Addressed comments raised by Mark Rutland and Ard Biesheuvel: changed
  kasan implementation. Added a new function: kasan_map_populate() that
  zeroes the allocated and mapped memory

v8 - v7
- Added Acked-by's from Dave Miller for SPARC changes
- Fixed a minor compiling issue on tile architecture reported by kbuild

v7 - v6
- Addressed comments from Michal Hocko
- memblock_discard() patch was removed from this series and integrated
  separately
- Fixed a bug reported by the kbuild test robot in the new patch:
  mm: zero reserved and unavailable struct pages
- Removed patch
  x86/mm: reserve only exiting low pages
  as it is not needed anymore because of the previous fix
- Re-wrote deferred_init_memmap(), found and fixed an existing bug, where
  page variable is not reset when zone holes present.
- Merged several patches together per Michal request
- Added performance data including raw logs

v6 - v5
- Fixed ARM64 + kasan code, as reported by Ard Biesheuvel
- Tested ARM64 code in qemu and found few more issues, that I fixed in this
  iteration
- Added page roundup/rounddown to x86 and arm zeroing routines to zero the
  whole allocated range, instead of only provided address range.
- Addressed SPARC related comment from Sam Ravnborg
- Fixed section mismatch warnings related to memblock_discard().

v5 - v4
- Fixed build issues reported by kbuild on various configurations
v4 - v3
- Rewrote code to zero struct pages in __init_single_page() as
  suggested by Michal Hocko
- Added code to handle issues related to accessing struct page
  memory before they are initialized.

v3 - v2
- Addressed David Miller comments about one change per patch:
* Split changes to platforms into 4 patches
* Made "do not zero vmemmap_buf" as a separate patch

v2 - v1
- Per request, added s390 to deferred "struct page" zeroing
- Collected performance data on x86 which proves the importance of
  keeping memset() as a prefetch (see below).

SMP machines can benefit from the DEFERRED_STRUCT_PAGE_INIT config option,
which defers initializing struct pages until all cpus have been started so
it can be done in parallel.

However, this feature is sub-optimal, because the deferred page
initialization code expects that the struct pages have already been zeroed,
and the zeroing is done early in boot with a single thread only.  Also, we
access that memory and set flags before struct pages are initialized. All
of this is fixed in this patchset.

In this work we do the following:
- Never read a struct page until it has been initialized
- Never set any fields in struct pages before they are initialized
- Zero each struct page at the beginning of its initialization
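
A sketch of the third point, based on the v4 changelog note about moving the
zeroing into __init_single_page() (illustrative, not the exact patch):

    static void __meminit __init_single_page(struct page *page, unsigned long pfn,
                                             unsigned long zone, int nid)
    {
            /*
             * Zero the struct page here, instead of relying on the whole
             * memmap having been zeroed by a single thread early in boot.
             */
            memset(page, 0, sizeof(*page));
            set_page_links(page, zone, nid, pfn);
            init_page_count(page);
            page_mapcount_reset(page);
            /* ... remaining per-page initialization ... */
    }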

==
Performance improvements on x86 machine with 8 nodes:
Intel(R) Xeon(R) CPU E7-8895 v3 @ 2.60GHz and 1T of memory:
                        TIME         SPEED UP
base no deferred:       95.796233s
fix no deferred:        79.978956s   19.77%

base deferred:          77.254713s
fix deferred:           55.050509s   40.34%
==
SPARC M6 3600 MHz with 15T of memory
                        TIME          SPEED UP
base no deferred:       358.335727s
fix no deferred:        302.320936s   18.52%

base deferred:          237.534603s
fix deferred:           182.103003s   30.44%
==
Raw dmesg output with timestamps:
x86 base no deferred:https://hastebin.com/ofunepurit.scala
x86 base deferred:   https://hastebin.com/ifazegeyas.scala
x86 fix no deferred: https://hastebin.com/pegocohevo.scala
x86 fix deferred:https://hastebin.com/ofupevikuk.scala
sparc base no deferred:  https://hastebin.com/ibobeteken.go
sparc base deferred: https://hastebin.com/fariqimiyu.go
sparc fix no deferred:   https://hastebin.com/muhegoheyi.go
sparc fix deferred:  https://hastebin.com/xadinobutu.go

Pavel Tatashin (10):
  x86/mm: setting fields in deferred pages
  sparc64/mm: setting fields in deferred pages
  sparc64: simplify vmemmap_populate
  mm: defining memblock_virt_alloc_try_nid_raw
  mm: zero reserved and unavailable struct pages
  mm/kasan: kasan specific map populate function
  x86/kasan: use kasan_map_populate()
  arm64/kasan: use kasan_map_populate()
  mm: stop zeroing memory during allocation in vmemmap
  sparc64: optimized struct page zeroing

 arch/arm64/include/asm/pgtable.h|  3 ++
 arch/arm64/mm/kasan_init.c  | 12 +++
 arch/sparc/include/asm/pgtable_64.h | 30 +
 

Re: [PATCH RT] time/hrtimer: Use softirq based wakeups for non-RT threads

2017-10-05 Thread Gratian Crisan

Sebastian Andrzej Siewior writes:

> Normal wake ups (like clock_nanosleep()) which are performed by normal
> users can easily lead to 2ms latency spikes if (enough) hrtimer wakeups
> are synchronized.
> This patch moves all hrtimers wakeups to the softirq queue unless the
> caller has a RT priority.
>
> Reported-by: Gratian Crisan 

I can confirm this patch fixes the original problem reported.

I ran an overnight test (about 30 hours total) on two platforms using
cyclictest + hrtimer stress load: configurable number of SCHED_OTHER
threads doing random clock_nanosleep() in up to 1 ms intervals.
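
A minimal sketch of one such stress thread (an illustration of the described
load, not the actual test code):

    #include <stdlib.h>
    #include <time.h>

    /* SCHED_OTHER stress loop: random relative sleeps of up to 1 ms. */
    static void *hrtimer_stress(void *arg)
    {
            for (;;) {
                    struct timespec ts = {
                            .tv_sec = 0,
                            .tv_nsec = rand() % 1000000,    /* 0..1 ms */
                    };
                    clock_nanosleep(CLOCK_MONOTONIC, 0, &ts, NULL);
            }
            return NULL;
    }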

Without this patch the max latency recorded was:
  * on dual core, Intel Atom E3825 @ 1.33GHz: 107 µs
  * on dual core, Zynq 7020 @ 667MHz: 261 µs

With this patch the max numbers drop to:
  * on dual core, Intel Atom E3825 @ 1.33GHz: 66 µs
  * on dual core, Zynq 7020 @ 667MHz: 90 µs

The max latency numbers with this patch are in-line with expectations
for these platforms.

Thank you so much,
-Gratian


Re: [PREVIEW] Add x86 zstd kernel compression

2017-10-05 Thread René Rebe

Hi,

On 10/05/2017 09:22 PM, Nick Terrell wrote:

On 10/5/17, 1:18 AM, "René Rebe"  wrote:


Hi Nick,

On Oct 5, 2017, at 1:29, Nick Terrell  wrote:


On 10/4/17, 3:01 AM, "linux-kernel-ow...@vger.kernel.org on behalf of René Rebe" 
 wrote:

Hi,

I noticed zstd compression was recently added for btrfs and squashfs.
Are there actually already patches floating around for zstd kernel and initrd
compression?
Looks like that would be a quite nice fit regarding speed and compression 
ratio, …

Regards,
René


I started working on some patches yesterday, and just got zstd kernel,
initrd, and initramfs compression working today. I think I'll be ready to
send the patches out within a week.


awesome, if you have something to test I could give it a try on my test bench, 
too.


Here is a preview of the patch, though it's not quite ready.
I believe it works, but I still need to clean it up a bit, and split it
into the decompression function, the initrd piece, and the x86 specific
piece. I'll try to get the patch set out this week, and I'll CC you.


thanks for the preview, which I applied cleanly, modulo this white-space 
chunk:


diff --git a/lib/Kconfig b/lib/Kconfig
index b1445b2..02fdcac 100644
@@ -296,7 +300,7 @@ config GENERIC_ALLOCATOR
 #
 config REED_SOLOMON
 	tristate
-
+
 config REED_SOLOMON_ENC8
 	bool
 to the 4.14-rc2 tarball and it compiled and booted on x86-64,
including a zstd initrd!

Great work, will probably start testing it on my personal
laptop, to use the same compression everywhere.

Regards,
René

--
  René Rebe, ExactCODE GmbH, Lietzenburger Str. 42, DE-10117 Berlin
  http://exactcode.com | http://t2-project.org | http://rene.rebe.de

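For context, the core job of the new unzstd decompressor is a single
one-shot decompress of the whole image; in userspace terms it is roughly
the following (regular libzstd API here, not the kernel port):

	/* Decompress a complete zstd frame in one shot; the kernel-side
	 * decompressor does the same with a preallocated workspace. */
	#include <zstd.h>

	static int decompress_image(const void *src, size_t src_len,
				    void *dst, size_t dst_cap)
	{
		size_t n = ZSTD_decompress(dst, dst_cap, src, src_len);

		return ZSTD_isError(n) ? -1 : 0;
	}
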

RE: [PATCH v4 4/5] cramfs: add mmap support

2017-10-05 Thread Nicolas Pitre
On Thu, 5 Oct 2017, Chris Brandt wrote:

> On Wednesday, October 04, 2017, Nicolas Pitre wrote:
> > Anyway, here's a replacement for patch 4/5 below:
> > 
> > - >8
> > Subject: cramfs: add mmap support
> > 
> > When cramfs_physmem is used then we have the opportunity to map files
> > directly from ROM, directly into user space, saving on RAM usage.
> > This gives us Execute-In-Place (XIP) support.
> 
> 
> Tested on my setup:
>  * Cortex A9 (with MMU)
>  * CONFIG_XIP_KERNEL=y
>  * booted with XIP CRAMFS as my rootfs 
>  * all apps and libraries marked as XIP in my cramfs image
> 
> 
> 
> So far, functionally it seems to work the same as [PATCH v4 4/5].
> 
> As Nicolas said, before you could easily see that all my apps and 
> libraries were XIP from Flash:
> 
> $ cat /proc/self/maps
> 8000-000a1000 r-xp 1b005000 00:0c 18192  /bin/busybox
> 000a9000-000aa000 rw-p 00099000 00:0c 18192  /bin/busybox
> 000aa000-000ac000 rw-p  00:00 0  [heap]
> b6e69000-b6f42000 r-xp 1b0bc000 00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f42000-b6f4a000 ---p 1b195000 00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f4a000-b6f4c000 r--p 000d9000 00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f4c000-b6f4d000 rw-p 000db000 00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f4d000-b6f5 rw-p  00:00 0
> b6f5-b6f67000 r-xp 1b0a4000 00:0c 670372 /lib/ld-2.18-2013.10.so
> b6f6a000-b6f6b000 rw-p  00:00 0
> b6f6c000-b6f6e000 rw-p  00:00 0
> b6f6e000-b6f6f000 r--p 00016000 00:0c 670372 /lib/ld-2.18-2013.10.so
> b6f6f000-b6f7 rw-p 00017000 00:0c 670372 /lib/ld-2.18-2013.10.so
> beac-beae1000 rw-p  00:00 0  [stack]
> bebc9000-bebca000 r-xp  00:00 0  [sigpage]
> -1000 r-xp  00:00 0  [vectors]
> 
> 
> 
> But now just busybox looks like it's XIP:
> 
> $ cat /proc/self/maps
> 8000-000a1000 r-xp 1b005000 00:0c 18192  /bin/busybox
> 000a9000-000aa000 rw-p 00099000 00:0c 18192  /bin/busybox
> 000aa000-000ac000 rw-p  00:00 0  [heap]
> b6e4d000-b6f26000 r-xp  00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f26000-b6f2e000 ---p 000d9000 00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f2e000-b6f3 r--p 000d9000 00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f3-b6f31000 rw-p 000db000 00:0c 766540 /lib/libc-2.18-2013.10.so
> b6f31000-b6f34000 rw-p  00:00 0
> b6f34000-b6f4b000 r-xp  00:0c 670372 /lib/ld-2.18-2013.10.so
> b6f4e000-b6f4f000 rw-p  00:00 0
> b6f5-b6f52000 rw-p  00:00 0
> b6f52000-b6f53000 r--p 00016000 00:0c 670372 /lib/ld-2.18-2013.10.so
> b6f53000-b6f54000 rw-p 00017000 00:0c 670372 /lib/ld-2.18-2013.10.so
> bec93000-becb4000 rw-p  00:00 0  [stack]
> befad000-befae000 r-xp  00:00 0  [sigpage]
> -1000 r-xp  00:00 0  [vectors]

Do you have the same amount of free memory once booted in both cases?

> Regardless, from a functional standpoint:
> 
> Tested-by: Chris Brandt 

Thanks.

> Just FYI, the previous [PATCH v4 4/5] also included this (which was the 
> only real difference between v3 and v4):
> 
> 
> diff --git a/fs/cramfs/Kconfig b/fs/cramfs/Kconfig
> index 5b4e0b7e13..306549be25 100644
> --- a/fs/cramfs/Kconfig
> +++ b/fs/cramfs/Kconfig
> @@ -30,7 +30,7 @@ config CRAMFS_BLOCKDEV
>  
>  config CRAMFS_PHYSMEM
>   bool "Support CramFs image directly mapped in physical memory"
> - depends on CRAMFS
> + depends on CRAMFS = y

Yeah, that was necessary because split_vma() wasn't exported to modules. 
Now split_vma() is no longer used so the no-module restriction has also 
been removed.


Nicolas

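The tell in the two dumps above is the offset column (third field): with
XIP the mapping's offset is its location in flash (e.g. 1b0bc000 for libc),
while a RAM-backed mapping shows a plain file offset such as 00000000. A
quick way to eyeball this on a target, as a sketch:

	/* Print the executable, file-backed lines of /proc/self/maps so
	 * the offset column can be inspected for flash vs. RAM-backed
	 * mappings. */
	#include <stdio.h>
	#include <string.h>

	int main(void)
	{
		char line[512];
		FILE *f = fopen("/proc/self/maps", "r");

		if (!f)
			return 1;
		while (fgets(line, sizeof(line), f))
			if (strstr(line, "r-xp") && strchr(line, '/'))
				fputs(line, stdout);
		fclose(f);
		return 0;
	}
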

[RELEASE] LTTng-modules 2.10.2 and 2.9.5 (Linux kernel tracer)

2017-10-05 Thread Mathieu Desnoyers
Hi,

Those are two important bugfix releases coming shortly after 2.10.1 and
2.9.4, which had introduced a NULL pointer dereference in the block layer
instrumentation for kernels before 4.14. This issue was caught by
our CI at .

All LTTng-modules users of versions 2.10.1 and 2.9.4 should upgrade.

Project website: http://lttng.org
Documentation: http://lttng.org/docs
Download link: http://lttng.org/download

Changelog:

2017-10-05 LTTng modules 2.10.2
* Fix: update block instrumentation for 4.14 kernel
* Revert "Fix: update block instrumentation for kernel 4.14"

2017-10-05 LTTng modules 2.9.5
* Fix: update block instrumentation for 4.14 kernel
* Revert "Fix: update block instrumentation for kernel 4.14"

Thanks,

Mathieu

-- 
Mathieu Desnoyers
EfficiOS Inc.
http://www.efficios.com


Re: Extreme time jitter with suspend/resume cycles

2017-10-05 Thread Gabriel Beddingfield
On Thu, Oct 5, 2017 at 2:04 PM, Thomas Gleixner  wrote:
>> > So that clocksource driver looks correct. Do you have an idea in which
>> > context this time jump happens? Does it happen when you exercise your high
>> > frequency suspend/resume dance or is that happening just when you let the
>> > machine run forever as well?
>>
>> We couldn't devise any reproduction steps. We observed it happening at
>> unexpected times in a fleet of devices -- and we couldn't find any
>> patterns to clue us in.
>
> Ok. Did you talk to NXP about that? Or did you try to exercise reads in a
> loop to detect the wreckage and maybe a pattern in there?

Yes, we talked to NXP about it. They don't have a conclusion on what happened.
While they've been very helpful... we were off the path from their reference
implementation and so it's not a high priority for them.

No, we didn't try that because we prioritized the "persistent clock" approach.
I have a little more time now and can try the loop-reading strategy.

-gabe

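In essence, the loop-reading check amounts to something like this userspace
sketch (reading CLOCK_MONOTONIC_RAW here; an in-kernel version would likely
read the clocksource directly):

	/* Read the clock back-to-back and flag any backward jump, or an
	 * implausibly large forward step, between consecutive reads. */
	#include <stdio.h>
	#include <stdint.h>
	#include <time.h>

	static uint64_t ns(const struct timespec *ts)
	{
		return (uint64_t)ts->tv_sec * 1000000000ull + ts->tv_nsec;
	}

	int main(void)
	{
		struct timespec prev, cur;

		clock_gettime(CLOCK_MONOTONIC_RAW, &prev);
		for (;;) {
			clock_gettime(CLOCK_MONOTONIC_RAW, &cur);
			if (ns(&cur) < ns(&prev) ||
			    ns(&cur) - ns(&prev) > 1000000000ull)
				printf("jump: %llu -> %llu\n",
				       (unsigned long long)ns(&prev),
				       (unsigned long long)ns(&cur));
			prev = cur;
		}
	}
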

[PATCH v10 07/10] x86/kasan: use kasan_map_populate()

2017-10-05 Thread Pavel Tatashin
To optimize the performance of struct page initialization,
vmemmap_populate() will no longer zero memory.

Therefore, we must use a new interface to allocate and map kasan shadow
memory that also zeroes the memory for us.

Signed-off-by: Pavel Tatashin 
---
 arch/x86/mm/kasan_init_64.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/mm/kasan_init_64.c b/arch/x86/mm/kasan_init_64.c
index bc84b73684b7..2db95efd208e 100644
--- a/arch/x86/mm/kasan_init_64.c
+++ b/arch/x86/mm/kasan_init_64.c
@@ -23,7 +23,7 @@ static int __init map_range(struct range *range)
start = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->start));
end = (unsigned long)kasan_mem_to_shadow(pfn_to_kaddr(range->end));
 
-   return vmemmap_populate(start, end, NUMA_NO_NODE);
+   return kasan_map_populate(start, end, NUMA_NO_NODE);
 }
 
 static void __init clear_pgds(unsigned long start,
@@ -136,9 +136,9 @@ void __init kasan_init(void)
kasan_mem_to_shadow((void *)PAGE_OFFSET + MAXMEM),
kasan_mem_to_shadow((void *)__START_KERNEL_map));
 
-   vmemmap_populate((unsigned long)kasan_mem_to_shadow(_stext),
-   (unsigned long)kasan_mem_to_shadow(_end),
-   NUMA_NO_NODE);
+   kasan_map_populate((unsigned long)kasan_mem_to_shadow(_stext),
+  (unsigned long)kasan_mem_to_shadow(_end),
+  NUMA_NO_NODE);
 
kasan_populate_zero_shadow(kasan_mem_to_shadow((void *)MODULES_END),
(void *)KASAN_SHADOW_END);
-- 
2.14.2

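The reason the shadow must come up zeroed: kasan encodes "this 8-byte
granule is fully addressable" as a shadow byte of 0, so stale contents in
freshly mapped shadow would make every covered access look poisoned. A
simplified sketch of the idea (granule_fully_addressable() is an
illustrative name, not a kernel function):

	/* Simplified: the real check also handles partially addressable
	 * granules, where the shadow byte holds the valid-prefix length. */
	static inline bool granule_fully_addressable(const void *addr)
	{
		s8 shadow = *(s8 *)kasan_mem_to_shadow(addr);

		return shadow == 0;
	}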


[PATCH v10 02/10] sparc64/mm: setting fields in deferred pages

2017-10-05 Thread Pavel Tatashin
Without the deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in struct pages are never changed prior to first
initializing the struct pages by going through __init_single_page().

With the deferred struct page feature enabled, there is a case where we set
some fields prior to initializing:

mem_init() {
 register_page_bootmem_info();
 free_all_bootmem();
 ...
}

When register_page_bootmem_info() is called, only non-deferred struct pages
are initialized. But this function goes through some reserved pages which
might be part of the deferred range, and thus are not yet initialized.

mem_init
register_page_bootmem_info
register_page_bootmem_info_node
 get_page_bootmem
  .. setting fields here ..
  such as: page->freelist = (void *)type;

free_all_bootmem()
free_low_memory_core_early()
 for_each_reserved_mem_region()
  reserve_bootmem_region()
   init_reserved_page() <- Only if this is deferred reserved page
__init_single_pfn()
 __init_single_page()
  memset(0) <-- Lose the set fields here

We end up with a similar issue as in the previous patch: currently we do
not observe a problem because the memory is zeroed. But if the flag asserts
are changed, we can start hitting issues.

Also, because in this patch series we will stop zeroing struct page memory
during allocation, we must make sure that struct pages are properly
initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
Acked-by: David S. Miller 
Acked-by: Michal Hocko 
---
 arch/sparc/mm/init_64.c | 9 -
 1 file changed, 8 insertions(+), 1 deletion(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index 6034569e2c0d..caed495544e9 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2548,9 +2548,16 @@ void __init mem_init(void)
 {
high_memory = __va(last_valid_pfn << PAGE_SHIFT);
 
-   register_page_bootmem_info();
free_all_bootmem();
 
+   /*
+* Must be done after boot memory is put on freelist, because here we
+* might set fields in deferred struct pages that have not yet been
+* initialized, and free_all_bootmem() initializes all the reserved
+* deferred pages for us.
+*/
+   register_page_bootmem_info();
+
/*
 * Set up the zero page, mark it reserved, so that page count
 * is not manipulated when freeing the page from user ptes.
-- 
2.14.2



[PATCH v10 05/10] mm: zero reserved and unavailable struct pages

2017-10-05 Thread Pavel Tatashin
Some memory is reserved but unavailable: not present in memblock.memory
(because not backed by physical pages), but present in memblock.reserved.
Such memory has backing struct pages, but they are not initialized by going
through __init_single_page().

In some cases these struct pages are accessed even if they do not contain
any data. One example is that page_to_pfn() might access page->flags if this
is where section information is stored (CONFIG_SPARSEMEM,
SECTION_IN_PAGE_FLAGS).

One example of such memory: trim_low_memory_range() unconditionally
reserves from pfn 0, but e820__memblock_setup() might provide the existing
memory from pfn 1 (i.e. KVM).

Since struct pages are zeroed in __init_single_page(), and not at
allocation time, we must zero such struct pages explicitly.

The patch involves adding a new memblock iterator:
for_each_resv_unavail_range(i, p_start, p_end)

which iterates through the reserved && !memory ranges, and we zero those
struct pages explicitly by calling mm_zero_struct_page().

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
---
 include/linux/memblock.h | 16 
 include/linux/mm.h   | 15 +++
 mm/page_alloc.c  | 38 ++
 3 files changed, 69 insertions(+)

diff --git a/include/linux/memblock.h b/include/linux/memblock.h
index bae11c7e7bf3..ce8bfa5f3e9b 100644
--- a/include/linux/memblock.h
+++ b/include/linux/memblock.h
@@ -237,6 +237,22 @@ unsigned long memblock_next_valid_pfn(unsigned long pfn, 
unsigned long max_pfn);
	for_each_mem_range_rev(i, &memblock.memory, &memblock.reserved,	\
			       nid, flags, p_start, p_end, p_nid)
 
+/**
+ * for_each_resv_unavail_range - iterate through reserved and unavailable memory
+ * @i: u64 used as loop variable
+ * @flags: pick from blocks based on memory attributes
+ * @p_start: ptr to phys_addr_t for start address of the range, can be %NULL
+ * @p_end: ptr to phys_addr_t for end address of the range, can be %NULL
+ *
+ * Walks over unavailable but reserved (reserved && !memory) areas of memblock.
+ * Available as soon as memblock is initialized.
+ * Note: because this memory does not belong to any physical node, flags and
+ * nid arguments do not make sense and thus not exported as arguments.
+ */
+#define for_each_resv_unavail_range(i, p_start, p_end) \
+	for_each_mem_range(i, &memblock.reserved, &memblock.memory,	\
+			   NUMA_NO_NODE, MEMBLOCK_NONE, p_start, p_end, NULL)
+
 static inline void memblock_set_region_flags(struct memblock_region *r,
 unsigned long flags)
 {
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 065d99deb847..04c8b2e5aff4 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -94,6 +94,15 @@ extern int mmap_rnd_compat_bits __read_mostly;
 #define mm_forbids_zeropage(X) (0)
 #endif
 
+/*
+ * On some architectures it is expensive to call memset() for small sizes.
+ * Those architectures should provide their own implementation of "struct page"
+ * zeroing by defining this macro in .
+ */
+#ifndef mm_zero_struct_page
+#define mm_zero_struct_page(pp)  ((void)memset((pp), 0, sizeof(struct page)))
+#endif
+
 /*
  * Default maximum number of active map areas, this limits the number of vmas
  * per mm struct. Users can overwrite this number by sysctl but there is a
@@ -2001,6 +2010,12 @@ extern int __meminit __early_pfn_to_nid(unsigned long 
pfn,
struct mminit_pfnnid_cache *state);
 #endif
 
+#ifdef CONFIG_HAVE_MEMBLOCK
+void zero_resv_unavail(void);
+#else
+static inline void zero_resv_unavail(void) {}
+#endif
+
 extern void set_dma_reserve(unsigned long new_dma_reserve);
 extern void memmap_init_zone(unsigned long, int, unsigned long,
unsigned long, enum memmap_context);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 20b0bace2235..5f0013bbbe9d 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6209,6 +6209,42 @@ void __paginginit free_area_init_node(int nid, unsigned 
long *zones_size,
free_area_init_core(pgdat);
 }
 
+#ifdef CONFIG_HAVE_MEMBLOCK
+/*
+ * Only struct pages that are backed by physical memory are zeroed and
+ * initialized by going through __init_single_page(). But, there are some
+ * struct pages which are reserved in memblock allocator and their fields
+ * may be accessed (for example page_to_pfn() on some configuration accesses
+ * flags). We must explicitly zero those struct pages.
+ */
+void __paginginit zero_resv_unavail(void)
+{
+   phys_addr_t start, end;
+   unsigned long pfn;
+   u64 i, pgcnt;
+
+   /* Loop through ranges that are reserved, but do not have reported
+* physical memory backing.
+*/
+   pgcnt = 0;
+	for_each_resv_unavail_range(i, &start, &end) {
+		for (pfn = PFN_DOWN(start); pfn < PFN_UP(end); pfn++) {

[PATCH v10 06/10] mm/kasan: kasan specific map populate function

2017-10-05 Thread Pavel Tatashin
During early boot, kasan uses vmemmap_populate() to establish its shadow
memory. But, that interface is intended for struct page use.

As part of this series, vmemmap memory won't be zeroed during allocation,
but kasan expects that memory to be zeroed. We are adding a new
kasan_map_populate() function to resolve this difference.

Signed-off-by: Pavel Tatashin 
---
 arch/arm64/include/asm/pgtable.h |  3 ++
 include/linux/kasan.h            |  2 ++
 mm/kasan/kasan_init.c            | 67 
 3 files changed, 72 insertions(+)

diff --git a/arch/arm64/include/asm/pgtable.h b/arch/arm64/include/asm/pgtable.h
index b46e54c2399b..11ff58901519 100644
--- a/arch/arm64/include/asm/pgtable.h
+++ b/arch/arm64/include/asm/pgtable.h
@@ -381,6 +381,9 @@ extern pgprot_t phys_mem_access_prot(struct file *file, 
unsigned long pfn,
 PUD_TYPE_TABLE)
 #endif
 
+#define pmd_large(pmd) pmd_sect(pmd)
+#define pud_large(pud) pud_sect(pud)
+
 static inline void set_pmd(pmd_t *pmdp, pmd_t pmd)
 {
*pmdp = pmd;
diff --git a/include/linux/kasan.h b/include/linux/kasan.h
index a5c7046f26b4..7e13df1722c2 100644
--- a/include/linux/kasan.h
+++ b/include/linux/kasan.h
@@ -78,6 +78,8 @@ size_t kasan_metadata_size(struct kmem_cache *cache);
 
 bool kasan_save_enable_multi_shot(void);
 void kasan_restore_multi_shot(bool enabled);
+int __meminit kasan_map_populate(unsigned long start, unsigned long end,
+int node);
 
 #else /* CONFIG_KASAN */
 
diff --git a/mm/kasan/kasan_init.c b/mm/kasan/kasan_init.c
index 554e4c0f23a2..57a973f05f63 100644
--- a/mm/kasan/kasan_init.c
+++ b/mm/kasan/kasan_init.c
@@ -197,3 +197,70 @@ void __init kasan_populate_zero_shadow(const void 
*shadow_start,
zero_p4d_populate(pgd, addr, next);
} while (pgd++, addr = next, addr != end);
 }
+
+/* Creates mappings for kasan during early boot. The mapped memory is zeroed */
+int __meminit kasan_map_populate(unsigned long start, unsigned long end,
+int node)
+{
+   unsigned long addr, pfn, next;
+   unsigned long long size;
+   pgd_t *pgd;
+   p4d_t *p4d;
+   pud_t *pud;
+   pmd_t *pmd;
+   pte_t *pte;
+   int ret;
+
+   ret = vmemmap_populate(start, end, node);
+   /*
+* We might have partially populated memory, so check for no entries,
+* and zero only those that actually exist.
+*/
+   for (addr = start; addr < end; addr = next) {
+   pgd = pgd_offset_k(addr);
+   if (pgd_none(*pgd)) {
+   next = pgd_addr_end(addr, end);
+   continue;
+   }
+
+   p4d = p4d_offset(pgd, addr);
+   if (p4d_none(*p4d)) {
+   next = p4d_addr_end(addr, end);
+   continue;
+   }
+
+   pud = pud_offset(p4d, addr);
+   if (pud_none(*pud)) {
+   next = pud_addr_end(addr, end);
+   continue;
+   }
+   if (pud_large(*pud)) {
+   /* This is PUD size page */
+   next = pud_addr_end(addr, end);
+   size = PUD_SIZE;
+   pfn = pud_pfn(*pud);
+   } else {
+   pmd = pmd_offset(pud, addr);
+   if (pmd_none(*pmd)) {
+   next = pmd_addr_end(addr, end);
+   continue;
+   }
+   if (pmd_large(*pmd)) {
+   /* This is PMD size page */
+   next = pmd_addr_end(addr, end);
+   size = PMD_SIZE;
+   pfn = pmd_pfn(*pmd);
+   } else {
+   pte = pte_offset_kernel(pmd, addr);
+   next = addr + PAGE_SIZE;
+   if (pte_none(*pte))
+   continue;
+   /* This is base size page */
+   size = PAGE_SIZE;
+   pfn = pte_pfn(*pte);
+   }
+   }
+   memset(phys_to_virt(PFN_PHYS(pfn)), 0, size);
+   }
+   return ret;
+}
-- 
2.14.2



[PATCH v10 09/10] mm: stop zeroing memory during allocation in vmemmap

2017-10-05 Thread Pavel Tatashin
vmemmap_alloc_block() will no longer zero the block, so zero memory
at its call sites for everything except struct pages.  Struct page memory
is zeroed by struct page initialization.

Replace allocators in sparse-vmemmap to use the non-zeroing version. This
way, we get the performance improvement by zeroing the memory in parallel
when struct pages are zeroed.

Add struct page zeroing as a part of initialization of other fields in
__init_single_page().

This single thread performance collected on: Intel(R) Xeon(R) CPU E7-8895
v3 @ 2.60GHz with 1T of memory (268400646 pages in 8 nodes):

 BASEFIX
sparse_init 11.244671836s   0.007199623s
zone_sizes_init  4.879775891s   8.355182299s
  --
Total   16.124447727s   8.362381922s

sparse_init is where memory for struct pages is zeroed, and the zeroing
part is moved later in this patch into __init_single_page(), which is
called from zone_sizes_init().

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
Acked-by: Michal Hocko 
---
 include/linux/mm.h  | 11 +++
 mm/page_alloc.c     |  1 +
 mm/sparse-vmemmap.c | 15 +++
 mm/sparse.c         |  6 +++---
 4 files changed, 22 insertions(+), 11 deletions(-)

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 04c8b2e5aff4..fd045a3b243a 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2501,6 +2501,17 @@ static inline void *vmemmap_alloc_block_buf(unsigned 
long size, int node)
return __vmemmap_alloc_block_buf(size, node, NULL);
 }
 
+static inline void *vmemmap_alloc_block_zero(unsigned long size, int node)
+{
+   void *p = vmemmap_alloc_block(size, node);
+
+   if (!p)
+   return NULL;
+   memset(p, 0, size);
+
+   return p;
+}
+
 void vmemmap_verify(pte_t *, int, unsigned long, unsigned long);
 int vmemmap_populate_basepages(unsigned long start, unsigned long end,
   int node);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 5f0013bbbe9d..85e038e1e941 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1170,6 +1170,7 @@ static void free_one_page(struct zone *zone,
 static void __meminit __init_single_page(struct page *page, unsigned long pfn,
unsigned long zone, int nid)
 {
+   mm_zero_struct_page(page);
set_page_links(page, zone, nid, pfn);
init_page_count(page);
page_mapcount_reset(page);
diff --git a/mm/sparse-vmemmap.c b/mm/sparse-vmemmap.c
index d1a39b8051e0..c2f5654e7c9d 100644
--- a/mm/sparse-vmemmap.c
+++ b/mm/sparse-vmemmap.c
@@ -41,7 +41,7 @@ static void * __ref __earlyonly_bootmem_alloc(int node,
unsigned long align,
unsigned long goal)
 {
-   return memblock_virt_alloc_try_nid(size, align, goal,
+   return memblock_virt_alloc_try_nid_raw(size, align, goal,
BOOTMEM_ALLOC_ACCESSIBLE, node);
 }
 
@@ -54,9 +54,8 @@ void * __meminit vmemmap_alloc_block(unsigned long size, int 
node)
if (slab_is_available()) {
struct page *page;
 
-   page = alloc_pages_node(node,
-   GFP_KERNEL | __GFP_ZERO | __GFP_RETRY_MAYFAIL,
-   get_order(size));
+   page = alloc_pages_node(node, GFP_KERNEL | __GFP_RETRY_MAYFAIL,
+   get_order(size));
if (page)
return page_address(page);
return NULL;
@@ -183,7 +182,7 @@ pmd_t * __meminit vmemmap_pmd_populate(pud_t *pud, unsigned 
long addr, int node)
 {
pmd_t *pmd = pmd_offset(pud, addr);
if (pmd_none(*pmd)) {
-   void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+   void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
	pmd_populate_kernel(&init_mm, pmd, p);
@@ -195,7 +194,7 @@ pud_t * __meminit vmemmap_pud_populate(p4d_t *p4d, unsigned 
long addr, int node)
 {
pud_t *pud = pud_offset(p4d, addr);
if (pud_none(*pud)) {
-   void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+   void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
	pud_populate(&init_mm, pud, p);
@@ -207,7 +206,7 @@ p4d_t * __meminit vmemmap_p4d_populate(pgd_t *pgd, unsigned 
long addr, int node)
 {
p4d_t *p4d = p4d_offset(pgd, addr);
if (p4d_none(*p4d)) {
-   void *p = vmemmap_alloc_block(PAGE_SIZE, node);
+   void *p = vmemmap_alloc_block_zero(PAGE_SIZE, node);
if (!p)
return NULL;
	p4d_populate(&init_mm, p4d, p);
@@ -219,7 +218,7 @@ pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
 {
	pgd_t *pgd 


[PATCH v10 04/10] mm: defining memblock_virt_alloc_try_nid_raw

2017-10-05 Thread Pavel Tatashin
* A new variant of memblock_virt_alloc_* allocations:
memblock_virt_alloc_try_nid_raw()
- Does not zero the allocated memory
- Does not panic if request cannot be satisfied

* optimize early system hash allocations

Clients can call alloc_large_system_hash() with the HASH_ZERO flag to
specify that the memory allocated for the system hash needs to be zeroed;
otherwise the memory does not need to be zeroed, and the client will
initialize it.

If memory does not need to be zeroed, call the new
memblock_virt_alloc_raw() interface, and thus improve the boot performance.

* debug for raw allocator

When CONFIG_DEBUG_VM is enabled, this patch sets all the memory that is
returned by memblock_virt_alloc_try_nid_raw() to ones to ensure that no
callers expect zeroed memory.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
Acked-by: Michal Hocko 
---
 include/linux/bootmem.h | 27 ++
 mm/memblock.c           | 60 +++--
 mm/page_alloc.c         | 15 ++---
 3 files changed, 87 insertions(+), 15 deletions(-)

diff --git a/include/linux/bootmem.h b/include/linux/bootmem.h
index e223d91b6439..ea30b3987282 100644
--- a/include/linux/bootmem.h
+++ b/include/linux/bootmem.h
@@ -160,6 +160,9 @@ extern void *__alloc_bootmem_low_node(pg_data_t *pgdat,
 #define BOOTMEM_ALLOC_ANYWHERE (~(phys_addr_t)0)
 
 /* FIXME: Move to memblock.h at a point where we remove nobootmem.c */
+void *memblock_virt_alloc_try_nid_raw(phys_addr_t size, phys_addr_t align,
+ phys_addr_t min_addr,
+ phys_addr_t max_addr, int nid);
 void *memblock_virt_alloc_try_nid_nopanic(phys_addr_t size,
phys_addr_t align, phys_addr_t min_addr,
phys_addr_t max_addr, int nid);
@@ -176,6 +179,14 @@ static inline void * __init memblock_virt_alloc(
NUMA_NO_NODE);
 }
 
+static inline void * __init memblock_virt_alloc_raw(
+   phys_addr_t size,  phys_addr_t align)
+{
+   return memblock_virt_alloc_try_nid_raw(size, align, BOOTMEM_LOW_LIMIT,
+   BOOTMEM_ALLOC_ACCESSIBLE,
+   NUMA_NO_NODE);
+}
+
 static inline void * __init memblock_virt_alloc_nopanic(
phys_addr_t size, phys_addr_t align)
 {
@@ -257,6 +268,14 @@ static inline void * __init memblock_virt_alloc(
return __alloc_bootmem(size, align, BOOTMEM_LOW_LIMIT);
 }
 
+static inline void * __init memblock_virt_alloc_raw(
+   phys_addr_t size,  phys_addr_t align)
+{
+   if (!align)
+   align = SMP_CACHE_BYTES;
+   return __alloc_bootmem_nopanic(size, align, BOOTMEM_LOW_LIMIT);
+}
+
 static inline void * __init memblock_virt_alloc_nopanic(
phys_addr_t size, phys_addr_t align)
 {
@@ -309,6 +328,14 @@ static inline void * __init 
memblock_virt_alloc_try_nid(phys_addr_t size,
  min_addr);
 }
 
+static inline void * __init memblock_virt_alloc_try_nid_raw(
+   phys_addr_t size, phys_addr_t align,
+   phys_addr_t min_addr, phys_addr_t max_addr, int nid)
+{
+   return ___alloc_bootmem_node_nopanic(NODE_DATA(nid), size, align,
+   min_addr, max_addr);
+}
+
 static inline void * __init memblock_virt_alloc_try_nid_nopanic(
phys_addr_t size, phys_addr_t align,
phys_addr_t min_addr, phys_addr_t max_addr, int nid)
diff --git a/mm/memblock.c b/mm/memblock.c
index 91205780e6b1..1f299fb1eb08 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1327,7 +1327,6 @@ static void * __init memblock_virt_alloc_internal(
return NULL;
 done:
ptr = phys_to_virt(alloc);
-   memset(ptr, 0, size);
 
/*
 * The min_count is set to 0 so that bootmem allocated blocks
@@ -1340,6 +1339,45 @@ static void * __init memblock_virt_alloc_internal(
return ptr;
 }
 
+/**
+ * memblock_virt_alloc_try_nid_raw - allocate boot memory block without zeroing
+ * memory and without panicking
+ * @size: size of memory block to be allocated in bytes
+ * @align: alignment of the region and block's size
+ * @min_addr: the lower bound of the memory region from where the allocation
+ *   is preferred (phys address)
+ * @max_addr: the upper bound of the memory region from where the allocation
+ *   is preferred (phys address), or %BOOTMEM_ALLOC_ACCESSIBLE to
+ *   allocate only from memory limited by memblock.current_limit value
+ * @nid: nid of the free area to find, %NUMA_NO_NODE for any node
+ *
+ *
+ * Public function, provides additional debug information (including caller
+ * info), if enabled. Does not zero allocated 


[PATCH v10 10/10] sparc64: optimized struct page zeroing

2017-10-05 Thread Pavel Tatashin
Add an optimized mm_zero_struct_page(), so struct pages are zeroed without
calling memset(). We do eight to ten regular stores, based on the size of
struct page. The compiler optimizes out the conditions of the switch()
statement.

SPARC-M6 with 15T of memory, single thread performance:

   BASEFIX  OPTIMIZED_FIX
bootmem_init   28.440467985s   2.305674818s   2.305161615s
free_area_init_nodes  202.845901673s 225.343084508s 172.556506560s
  
Total 231.286369658s 227.648759326s 174.861668175s

BASE:  current linux
FIX:   This patch series without "optimized struct page zeroing"
OPTIMIZED_FIX: This patch series including the current patch.

bootmem_init() is where memory for struct pages is zeroed during
allocation. Note, about two seconds in this function is a fixed time: it
does not increase as memory is increased.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
Acked-by: David S. Miller 
---
 arch/sparc/include/asm/pgtable_64.h | 30 ++
 1 file changed, 30 insertions(+)

diff --git a/arch/sparc/include/asm/pgtable_64.h 
b/arch/sparc/include/asm/pgtable_64.h
index 4fefe3762083..8ed478abc630 100644
--- a/arch/sparc/include/asm/pgtable_64.h
+++ b/arch/sparc/include/asm/pgtable_64.h
@@ -230,6 +230,36 @@ extern unsigned long _PAGE_ALL_SZ_BITS;
 extern struct page *mem_map_zero;
 #define ZERO_PAGE(vaddr)   (mem_map_zero)
 
+/* This macro must be updated when the size of struct page grows above 80
+ * or reduces below 64.
+ * The idea is that the compiler optimizes out the switch() statement and
+ * only leaves clrx instructions
+ */
+#define mm_zero_struct_page(pp) do {                                    \
+   unsigned long *_pp = (void *)(pp);  \
+   \
+/* Check that struct page is either 64, 72, or 80 bytes */ \
+   BUILD_BUG_ON(sizeof(struct page) & 7);  \
+   BUILD_BUG_ON(sizeof(struct page) < 64); \
+   BUILD_BUG_ON(sizeof(struct page) > 80); \
+   \
+   switch (sizeof(struct page)) {  \
+   case 80:\
+   _pp[9] = 0; /* fallthrough */   \
+   case 72:\
+   _pp[8] = 0; /* fallthrough */   \
+   default:\
+   _pp[7] = 0; \
+   _pp[6] = 0; \
+   _pp[5] = 0; \
+   _pp[4] = 0; \
+   _pp[3] = 0; \
+   _pp[2] = 0; \
+   _pp[1] = 0; \
+   _pp[0] = 0; \
+   }   \
+} while (0)
+
 /* PFNs are real physical page numbers.  However, mem_map only begins to record
  * per-page information starting at pfn_base.  This is to handle systems where
  * the first physical page in the machine is at some huge physical address,
-- 
2.14.2
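
The technique, reduced to a standalone sketch with a hypothetical struct
(not kernel code): because sizeof() is a compile-time constant, the switch
collapses and only the stores for the actual size survive.

        struct obj { long a, b, c, d, e, f, g, h; };    /* 64 bytes on LP64 */

        #define zero_obj(pp) do {                               \
                unsigned long *_pp = (void *)(pp);              \
                switch (sizeof(struct obj)) {                   \
                case 80: _pp[9] = 0;    /* fallthrough */       \
                case 72: _pp[8] = 0;    /* fallthrough */       \
                default:                                        \
                        _pp[7] = 0; _pp[6] = 0; _pp[5] = 0;     \
                        _pp[4] = 0; _pp[3] = 0; _pp[2] = 0;     \
                        _pp[1] = 0; _pp[0] = 0;                 \
                }                                               \
        } while (0)

        void clear(struct obj *o)
        {
                zero_obj(o);    /* compiles to eight stores, no branches */
        }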



[PATCH v10 08/10] arm64/kasan: use kasan_map_populate()

2017-10-05 Thread Pavel Tatashin
To optimize the performance of struct page initialization,
vmemmap_populate() will no longer zero memory.

Therefore, we must use a new interface to allocate and map kasan shadow
memory that also zeroes the memory for us.

Signed-off-by: Pavel Tatashin 
---
 arch/arm64/mm/kasan_init.c | 12 ++--
 1 file changed, 6 insertions(+), 6 deletions(-)

diff --git a/arch/arm64/mm/kasan_init.c b/arch/arm64/mm/kasan_init.c
index 81f03959a4ab..b6e92cfa3ea3 100644
--- a/arch/arm64/mm/kasan_init.c
+++ b/arch/arm64/mm/kasan_init.c
@@ -161,11 +161,11 @@ void __init kasan_init(void)
 
clear_pgds(KASAN_SHADOW_START, KASAN_SHADOW_END);
 
-   vmemmap_populate(kimg_shadow_start, kimg_shadow_end,
-                    pfn_to_nid(virt_to_pfn(lm_alias(_text))));
+   kasan_map_populate(kimg_shadow_start, kimg_shadow_end,
+                      pfn_to_nid(virt_to_pfn(lm_alias(_text))));
 
/*
-* vmemmap_populate() has populated the shadow region that covers the
+* kasan_map_populate() has populated the shadow region that covers the
 * kernel image with SWAPPER_BLOCK_SIZE mappings, so we have to round
 * the start and end addresses to SWAPPER_BLOCK_SIZE as well, to prevent
 * kasan_populate_zero_shadow() from replacing the page table entries
@@ -191,9 +191,9 @@ void __init kasan_init(void)
if (start >= end)
break;
 
-   vmemmap_populate((unsigned long)kasan_mem_to_shadow(start),
-   (unsigned long)kasan_mem_to_shadow(end),
-   pfn_to_nid(virt_to_pfn(start)));
+   kasan_map_populate((unsigned long)kasan_mem_to_shadow(start),
+  (unsigned long)kasan_mem_to_shadow(end),
+  pfn_to_nid(virt_to_pfn(start)));
}
 
/*
-- 
2.14.2
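
The kasan_map_populate() helper comes from earlier in this series; it walks
the page tables itself, but its net effect can be sketched as follows
(conceptual illustration only, not the actual implementation):

        static int __init kasan_map_populate(unsigned long start,
                                             unsigned long end, int node)
        {
                int ret = vmemmap_populate(start, end, node);

                if (!ret)   /* populate no longer zeroes, so do it here */
                        memset((void *)start, 0, end - start);
                return ret;
        }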



[PATCH v10 03/10] sparc64: simplify vmemmap_populate

2017-10-05 Thread Pavel Tatashin
Remove duplicating code by using common functions
vmemmap_pud_populate and vmemmap_pgd_populate.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
Acked-by: David S. Miller 
Acked-by: Michal Hocko 
---
 arch/sparc/mm/init_64.c | 23 ++-
 1 file changed, 6 insertions(+), 17 deletions(-)

diff --git a/arch/sparc/mm/init_64.c b/arch/sparc/mm/init_64.c
index caed495544e9..6839db3ffe1d 100644
--- a/arch/sparc/mm/init_64.c
+++ b/arch/sparc/mm/init_64.c
@@ -2652,30 +2652,19 @@ int __meminit vmemmap_populate(unsigned long vstart, 
unsigned long vend,
vstart = vstart & PMD_MASK;
vend = ALIGN(vend, PMD_SIZE);
for (; vstart < vend; vstart += PMD_SIZE) {
-   pgd_t *pgd = pgd_offset_k(vstart);
+   pgd_t *pgd = vmemmap_pgd_populate(vstart, node);
unsigned long pte;
pud_t *pud;
pmd_t *pmd;
 
-   if (pgd_none(*pgd)) {
-   pud_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
+   if (!pgd)
+   return -ENOMEM;
 
-   if (!new)
-   return -ENOMEM;
-   pgd_populate(&init_mm, pgd, new);
-   }
-
-   pud = pud_offset(pgd, vstart);
-   if (pud_none(*pud)) {
-   pmd_t *new = vmemmap_alloc_block(PAGE_SIZE, node);
-
-   if (!new)
-   return -ENOMEM;
-   pud_populate(&init_mm, pud, new);
-   }
+   pud = vmemmap_pud_populate(pgd, vstart, node);
+   if (!pud)
+   return -ENOMEM;
 
pmd = pmd_offset(pud, vstart);
-
pte = pmd_val(*pmd);
if (!(pte & _PAGE_VALID)) {
void *block = vmemmap_alloc_block(PMD_SIZE, node);
-- 
2.14.2
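
For reference, the common helper being switched to looks roughly like this
(paraphrased from mm/sparse-vmemmap.c of that era; details vary by version):

        pgd_t * __meminit vmemmap_pgd_populate(unsigned long addr, int node)
        {
                pgd_t *pgd = pgd_offset_k(addr);

                if (pgd_none(*pgd)) {
                        void *p = vmemmap_alloc_block(PAGE_SIZE, node);

                        if (!p)
                                return NULL;
                        pgd_populate(&init_mm, pgd, p);
                }
                return pgd;
        }

vmemmap_pud_populate() follows the same shape one level down, which is why
the open-coded pgd/pud blocks above could be dropped.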



[PATCH v10 01/10] x86/mm: setting fields in deferred pages

2017-10-05 Thread Pavel Tatashin
Without deferred struct page feature (CONFIG_DEFERRED_STRUCT_PAGE_INIT),
flags and other fields in "struct page"s are never changed prior to first
initializing struct pages by going through __init_single_page().

With deferred struct page feature enabled, however, we set fields in
register_page_bootmem_info that are subsequently clobbered right after in
free_all_bootmem:

mem_init() {
register_page_bootmem_info();
free_all_bootmem();
...
}

When register_page_bootmem_info() is called, only non-deferred struct pages
are initialized. But this function goes through some reserved pages which
might be part of the deferred range, and thus are not yet initialized.

  mem_init
   register_page_bootmem_info
register_page_bootmem_info_node
 get_page_bootmem
  .. setting fields here ..
  such as: page->freelist = (void *)type;

  free_all_bootmem()
   free_low_memory_core_early()
for_each_reserved_mem_region()
 reserve_bootmem_region()
  init_reserved_page() <- Only if this is deferred reserved page
   __init_single_pfn()
__init_single_page()
memset(0) <-- Lose the set fields here

We end up with an issue where, currently, we do not observe a problem
because memory is explicitly zeroed. But if flag asserts are changed, we
can start hitting issues.

Also, because in this patch series we will stop zeroing struct page memory
during allocation, we must make sure that struct pages are properly
initialized prior to using them.

The deferred-reserved pages are initialized in free_all_bootmem().
Therefore, the fix is to switch the above calls.

Signed-off-by: Pavel Tatashin 
Reviewed-by: Steven Sistare 
Reviewed-by: Daniel Jordan 
Reviewed-by: Bob Picco 
Acked-by: Michal Hocko 
---
 arch/x86/mm/init_64.c | 10 --
 1 file changed, 8 insertions(+), 2 deletions(-)

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index 5ea1c3c2636e..8822523fdcd7 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1182,12 +1182,18 @@ void __init mem_init(void)
 
/* clear_bss() already clear the empty_zero_page */
 
-   register_page_bootmem_info();
-
/* this will put all memory onto the freelists */
free_all_bootmem();
after_bootmem = 1;
 
+   /*
+* Must be done after boot memory is put on freelist, because here we
+* might set fields in deferred struct pages that have not yet been
+* initialized, and free_all_bootmem() initializes all the reserved
+* deferred pages for us.
+*/
+   register_page_bootmem_info();
+
/* Register memory areas for /proc/kcore */
kclist_add(&kcore_vsyscall, (void *)VSYSCALL_ADDR,
 PAGE_SIZE, KCORE_OTHER);
-- 
2.14.2
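
The hazard the reordering avoids, reduced to a toy userspace model
(hypothetical types, for illustration only):

        #include <string.h>

        struct page_model { void *freelist; unsigned long flags; };

        void broken_order(struct page_model *p)
        {
                p->freelist = (void *)0x1;  /* register_page_bootmem_info() */
                memset(p, 0, sizeof(*p));   /* later __init_single_page()   */
                                            /* wipes the stored freelist    */
        }

Calling free_all_bootmem() first means the memset()-style initialization of
deferred-reserved pages happens before any fields are set, not after.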



Re: [PATCH] pci/ehp_hpc: Convert timers to use timer_setup()

2017-10-05 Thread Bjorn Helgaas
On Wed, Oct 04, 2017 at 05:53:48PM -0700, Kees Cook wrote:
> In preparation for unconditionally passing the struct timer_list pointer to
> all timer callbacks, switch to using the new timer_setup() and from_timer()
> to pass the timer pointer explicitly. This fixes what appears to be a bug
> in passing the wrong pointer to the timer handler (address of ctrl pointer
> instead of ctrl pointer).
> 
> Cc: Bjorn Helgaas 
> Cc: Mika Westerberg 
> Cc: Mayurkumar Patel 
> Cc: Keith Busch 
> Cc: linux-...@vger.kernel.org
> Cc: Thomas Gleixner 
> Signed-off-by: Kees Cook 

Applied to pci/hotplug for v4.15, thanks!

> ---
> This requires commit 686fef928bba ("timer: Prepare to change timer
> callback argument type") in v4.14-rc3, but should be otherwise
> stand-alone.
> ---
>  drivers/pci/hotplug/pciehp_hpc.c | 12 
>  1 file changed, 4 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/pci/hotplug/pciehp_hpc.c 
> b/drivers/pci/hotplug/pciehp_hpc.c
> index e5d5ce9e3010..ba5055c5115c 100644
> --- a/drivers/pci/hotplug/pciehp_hpc.c
> +++ b/drivers/pci/hotplug/pciehp_hpc.c
> @@ -50,14 +50,13 @@ static irqreturn_t pcie_isr(int irq, void *dev_id);
>  static void start_int_poll_timer(struct controller *ctrl, int sec);
>  
>  /* This is the interrupt polling timeout function. */
> -static void int_poll_timeout(unsigned long data)
> +static void int_poll_timeout(struct timer_list *t)
>  {
> - struct controller *ctrl = (struct controller *)data;
> + struct controller *ctrl = from_timer(ctrl, t, poll_timer);
>  
>   /* Poll for interrupt events.  regs == NULL => polling */
>   pcie_isr(0, ctrl);
>  
> - init_timer(&ctrl->poll_timer);
>   if (!pciehp_poll_time)
>   pciehp_poll_time = 2; /* default polling interval is 2 sec */
>  
> @@ -71,8 +70,6 @@ static void start_int_poll_timer(struct controller *ctrl, 
> int sec)
>   if ((sec <= 0) || (sec > 60))
>   sec = 2;
>  
> - ctrl->poll_timer.function = &int_poll_timeout;
> - ctrl->poll_timer.data = (unsigned long)ctrl;
>   ctrl->poll_timer.expires = jiffies + sec * HZ;
>   add_timer(&ctrl->poll_timer);
>  }
> @@ -83,7 +80,7 @@ static inline int pciehp_request_irq(struct controller 
> *ctrl)
>  
>   /* Install interrupt polling timer. Start with 10 sec delay */
>   if (pciehp_poll_mode) {
> - init_timer(&ctrl->poll_timer);
> + timer_setup(&ctrl->poll_timer, int_poll_timeout, 0);
>   start_int_poll_timer(ctrl, 10);
>   return 0;
>   }
> @@ -764,8 +761,7 @@ int pciehp_reset_slot(struct slot *slot, int probe)
>   ctrl_dbg(ctrl, "%s: SLOTCTRL %x write cmd %x\n", __func__,
>pci_pcie_cap(ctrl->pcie->port) + PCI_EXP_SLTCTL, ctrl_mask);
>   if (pciehp_poll_mode)
> - int_poll_timeout(ctrl->poll_timer.data);
> -
> + int_poll_timeout(&ctrl->poll_timer);
>   return 0;
>  }
>  
> -- 
> 2.7.4
> 
> 
> -- 
> Kees Cook
> Pixel Security
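
The conversion pattern in this patch, as a self-contained sketch
(illustrative names; kernel context assumed):

        struct foo {
                struct timer_list poll_timer;
        };

        static void foo_timeout(struct timer_list *t)
        {
                /* recover the enclosing object from the timer pointer */
                struct foo *f = from_timer(f, t, poll_timer);

                /* ... handle the timeout, then re-arm if desired ... */
                mod_timer(&f->poll_timer, jiffies + 2 * HZ);
        }

        static void foo_start(struct foo *f)
        {
                timer_setup(&f->poll_timer, foo_timeout, 0);
                mod_timer(&f->poll_timer, jiffies + 10 * HZ);
        }

The old .function/.data pair is gone, which is also what removes the class
of bug fixed here: there is no hand-written cast of a stashed pointer.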


Re: [PATCH] switchtec: make struct event_regs static

2017-10-05 Thread Bjorn Helgaas
On Thu, Oct 05, 2017 at 11:01:45AM +0100, Colin King wrote:
> From: Colin Ian King 
> 
> The structure event_regs is local to the source and does not need to
> be in global scope, so make it static.
> 
> Cleans up sparse warning:
> symbol 'event_regs' was not declared. Should it be static
> 
> Signed-off-by: Colin Ian King 

Applied with Logan's ack to pci/switchtec for v4.15, thanks!

> ---
>  drivers/pci/switch/switchtec.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/pci/switch/switchtec.c b/drivers/pci/switch/switchtec.c
> index af81b2dec42e..da45dbea20ce 100644
> --- a/drivers/pci/switch/switchtec.c
> +++ b/drivers/pci/switch/switchtec.c
> @@ -943,7 +943,7 @@ static u32 __iomem *pff_ev_reg(struct switchtec_dev 
> *stdev,
>  #define EV_PAR(i, r) [i] = {offsetof(struct part_cfg_regs, r), part_ev_reg}
>  #define EV_PFF(i, r) [i] = {offsetof(struct pff_csr_regs, r), pff_ev_reg}
>  
> -const struct event_reg {
> +static const struct event_reg {
>   size_t offset;
>   u32 __iomem *(*map_reg)(struct switchtec_dev *stdev,
>   size_t offset, int index);
> -- 
> 2.14.1
> 


[PATCH V6 RESEND] sched/fair: Remove group imbalance from calculate_imbalance()

2017-10-05 Thread Jeffrey Hugo
The group_imbalance path in calculate_imbalance() made sense when it was
added back in 2007 with commit 908a7c1b9b80 ("sched: fix improper load
balance across sched domain") because busiest->load_per_task factored into
the amount of imbalance that was calculated. Beginning with commit
dd5feea14a7d ("sched: Fix SCHED_MC regression caused by change in sched
cpu_power"), busiest->load_per_task is not a factor in the imbalance
calculation, thus the group_imbalance path no longer makes sense.

The group_imbalance path can only affect the outcome of
calculate_imbalance() when the average load of the domain is less than the
original busiest->load_per_task. In this case, busiest->load_per_task is
overwritten with the scheduling domain load average. Thus
busiest->load_per_task no longer represents actual load that can be moved.

At the final comparison between env->imbalance and busiest->load_per_task,
imbalance may be larger than the new busiest->load_per_task causing the
check to fail under the assumption that there is a task that could be
migrated to satisfy the imbalance. However env->imbalance may still be
smaller than the original busiest->load_per_task, thus it is unlikely that
there is a task that can be migrated to satisfy the imbalance.
Calculate_imbalance() would not choose to run fix_small_imbalance() when we
expect it should. In the worst case, this can result in idle cpus.

Since the group imbalance path in calculate_imbalance() is at best a NOP
but otherwise harmful, remove it.

Co-authored-by: Austin Christ 
Signed-off-by: Jeffrey Hugo 
Tested-by: Tyler Baicar 
Reviewed-by: Dietmar Eggemann 
---

Peter, we were hoping you'd take this fix.  The discussion last time around
didn't seem to have a specific conclusion.  Please lay out how we can move
forward on this.  Thanks.

[v6]
-Added additional history clarification to commit text

 kernel/sched/fair.c | 9 -
 1 file changed, 9 deletions(-)

diff --git a/kernel/sched/fair.c b/kernel/sched/fair.c
index 0107280..e92a0bf 100644
--- a/kernel/sched/fair.c
+++ b/kernel/sched/fair.c
@@ -8067,15 +8067,6 @@ static inline void calculate_imbalance(struct lb_env 
*env, struct sd_lb_stats *s
local = &sds->local_stat;
busiest = &sds->busiest_stat;
 
-   if (busiest->group_type == group_imbalanced) {
-   /*
-* In the group_imb case we cannot rely on group-wide averages
-* to ensure cpu-load equilibrium, look at wider averages. XXX
-*/
-   busiest->load_per_task =
-   min(busiest->load_per_task, sds->avg_load);
-   }
-
/*
 * Avg load of busiest sg can be less and avg load of local sg can
 * be greater than avg load across all sgs of sd because avg load
-- 
Qualcomm Datacenter Technologies as an affiliate of Qualcomm Technologies, Inc.
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.
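
A worked example with hypothetical numbers: suppose the real
busiest->load_per_task is 1024, sds->avg_load is 600, and env->imbalance
comes out as 700. The removed branch clamps load_per_task to 600, so the
final "env->imbalance >= busiest->load_per_task" comparison (700 >= 600)
concludes one task can be migrated to satisfy the imbalance and skips
fix_small_imbalance(), even though 700 is still below the real per-task
load of 1024.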



Re: [Intel-wired-lan] [PATCH] PCI: Check/Set ARI capability before setting numVFs

2017-10-05 Thread Bjorn Helgaas
On Wed, Oct 04, 2017 at 04:29:14PM -0700, Alexander Duyck wrote:
> On Wed, Oct 4, 2017 at 4:01 PM, Bjorn Helgaas  wrote:
> > On Wed, Oct 04, 2017 at 08:52:58AM -0700, Tony Nguyen wrote:
> >> This fixes a bug that can occur if an AER error is encountered while SRIOV
> >> devices are present.
> >>
> >> This issue was seen by doing the following. Inject an AER error to a device
> >> that has SRIOV devices.  After the device has recovered, remove the driver.
> >> Reload the driver and enable SRIOV which causes the following crash to
> >> occur:
> >>
> >> kernel BUG at drivers/pci/iov.c:157!
> >> invalid opcode:  [#1] SMP
> >> CPU: 36 PID: 2295 Comm: bash Not tainted 4.14.0-rc1+ #74
> >> Hardware name: Supermicro X9DAi/X9DAi, BIOS 3.0a 04/29/2014
> >> task: 9fa41cd45a00 task.stack: b4b2036e8000
> >> RIP: 0010:pci_iov_add_virtfn+0x2eb/0x350
> >> RSP: 0018:b4b2036ebcb8 EFLAGS: 00010286
> >> RAX: fff0 RBX: 9fa42c1c8800 RCX: 9fa421ce2388
> >> RDX: df90 RSI: 9fa8214fb388 RDI: df903fff
> >> RBP: b4b2036ebd18 R08: 9fa421ce23b8 R09: b4b2036ebc2c
> >> R10: 9fa42c1a5548 R11: 058e R12: 9fa8214fb000
> >> R13: 9fa42c1a5000 R14: 9fa8214fb388 R15: 
> >> FS:  7f60724b6700() GS:9fa82f30()
> >> knlGS:
> >> CS:  0010 DS:  ES:  CR0: 80050033
> >> CR2: 559eca8b0f40 CR3: 000864146000 CR4: 001606e0
> >> Call Trace:
> >>  pci_enable_sriov+0x353/0x440
> >>  ixgbe_pci_sriov_configure+0xd5/0x1f0 [ixgbe]
> >>  sriov_numvfs_store+0xf7/0x170
> >>  dev_attr_store+0x18/0x30
> >>  sysfs_kf_write+0x37/0x40
> >>  kernfs_fop_write+0x120/0x1b0
> >>  __vfs_write+0x37/0x170
> >>  ? __alloc_fd+0x3f/0x170
> >>  ? set_close_on_exec+0x30/0x70
> >>  vfs_write+0xb5/0x1a0
> >>  SyS_write+0x55/0xc0
> >>  entry_SYSCALL_64_fastpath+0x1a/0xa5
> >> RIP: 0033:0x7f6071bafc20
> >> RSP: 002b:7ffe7d42ba48 EFLAGS: 0246 ORIG_RAX: 0001
> >> RAX: ffda RBX: 559eca8b0f30 RCX: 7f6071bafc20
> >> RDX: 0002 RSI: 559eca961f60 RDI: 0001
> >> RBP: 7f6071e78ae0 R08: 7f6071e7a740 R09: 7f60724b6700
> >> R10: 0073 R11: 0246 R12: 
> >> R13:  R14:  R15: 559eca892170
> >> RIP: pci_iov_add_virtfn+0x2eb/0x350 RSP: b4b2036ebcb8
> >>
> >> This occurs because, during AER recovery, the ARI Capable Hierarchy bit,
> >> which can affect the values for First VF Offset and VF Stride, is not set
> >> until after pci_iov_set_numvfs() is called.
> >
> > Can you elaborate on where exactly this happens?  The only place we
> > explicitly set PCI_SRIOV_CTRL_ARI is in sriov_init(), which is only
> > called at enumeration-time.  So I'm guessing you're talking about this
> > path:
> >
> >   ixgbe_io_slot_reset
> > pci_restore_state
> >   pci_restore_iov_state
> > sriov_restore_state
> >   pci_iov_set_numvfs
> >
> > where we don't set PCI_SRIOV_CTRL_ARI at all.  The fact that you say
> > PCI_SRIOV_CTRL_ARI isn't set until *after* pci_iov_set_numvfs() is
> > called suggests that it is being set *somewhere*, but I don't know
> > where.
> 
> The ARI bit is initialized in sriov_init, stored in iov->ctrl, and
> restored in sriov_restore_state, but it occurs in the line after the
> call to pci_iov_set_numvfs.
> 
> The problem is you don't want to write the full iov->ctrl value until
> after you have reset the the number of VFs since it will set VFE so
> pulling out and configuring the ARI value separately is needed.

Doh, that should have been obvious to me ;)

> >> This can cause the iov
> >> structure to be populated with values that are incorrect if the bit is
> >> later set.   Check and set this bit, if needed, before calling
> >> pci_iov_set_numvfs() so that the values being populated properly take
> >> the ARI bit into account.
> >>
> >> CC: Alexander Duyck 
> >> CC: Emil Tantilov 
> >> Signed-off-by: Tony Nguyen 
> >> ---
> >>  drivers/pci/iov.c | 4 
> >>  1 file changed, 4 insertions(+)
> >>
> >> diff --git a/drivers/pci/iov.c b/drivers/pci/iov.c
> >> index 7492a65..a8896c7 100644
> >> --- a/drivers/pci/iov.c
> >> +++ b/drivers/pci/iov.c
> >> @@ -497,6 +497,10 @@ static void sriov_restore_state(struct pci_dev *dev)
> >>   if (ctrl & PCI_SRIOV_CTRL_VFE)
> >>   return;
> >>
> >> + if ((iov->ctrl & PCI_SRIOV_CTRL_ARI) && !(ctrl & PCI_SRIOV_CTRL_ARI))
> >> + pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL,
> >> +   ctrl | PCI_SRIOV_CTRL_ARI);

This looks a little fiddly and also assumes that we only ever need to
*set* PCI_SRIOV_CTRL_ARI.  That's likely the case because it's
probably cleared after reset and during resume.  But I'm not *sure*
that's always the case, so what do you think about the proposal below?

> >>   for (i = PCI_IOV_RESOURCES; i <= PCI_IOV_RESOURCE_END; i++)
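
One shape the truncated counter-proposal could take, sketched for reference
(not necessarily the final upstream form): carry the saved ARI bit over in
either direction before pci_iov_set_numvfs().

        pci_read_config_word(dev, iov->pos + PCI_SRIOV_CTRL, &ctrl);
        if (ctrl & PCI_SRIOV_CTRL_VFE)
                return;

        /* restore ARI first: it affects First VF Offset and VF Stride */
        ctrl &= ~PCI_SRIOV_CTRL_ARI;
        ctrl |= iov->ctrl & PCI_SRIOV_CTRL_ARI;
        pci_write_config_word(dev, iov->pos + PCI_SRIOV_CTRL, ctrl);

        pci_iov_set_numvfs(dev, iov->num_VFs);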

Re: [PATCH v2] ACPI / LPIT: Add Low Power Idle Table (LPIT) support

2017-10-05 Thread Rafael J. Wysocki
On Thursday, October 5, 2017 10:43:33 PM CEST Srinivas Pandruvada wrote:
> On Thu, 2017-10-05 at 21:39 +0300, Andy Shevchenko wrote:
> > On Thu, Oct 5, 2017 at 9:16 PM, Srinivas Pandruvada
> >  wrote:
> > > 
> > > Added functionality to read LPIT table, which provides:
> > > 
> > > - Sysfs interface to read residency counters via
> > > /sys/devices/system/cpu/cpuidle/low_power_idle_cpu_residency_us
> > > /sys/devices/system/cpu/cpuidle/low_power_idle_system_residency_us
> > > 
> > > Here the count "low_power_idle_cpu_residency_us" shows the time
> > > spent
> > > by CPU package in low power state. This is read via MSR interface,
> > > which
> > > points to MSR for PKG C10.
> > > 
> > > Here the count "low_power_idle_system_residency_us" shows the time
> > > the
> > > system was in low power state. This is read via MMIO interface.
> > > This
> > > is mapped to SLP_S0 residency on modern Intel systems. This
> > > residency
> > > is achieved only when CPU is in PKG C10 and all functional blocks
> > > are
> > > in low power state.
> > > 
> > > It is possible that none of the above counters is present, that only
> > > one of the counters is present, or that all counters are present.
> > > 
> > > For example: On my Kabylake system both of the above counters are
> > > present.
> > > After suspend to idle these counts are updated and print:
> > > 6916179
> > > 6998564
> > > 
> > > This counter can be read by tools like turbostat to display. Or it
> > > can
> > > be used to debug, if modern systems are reaching desired low power
> > > state.
> > > 
> > > - Provides an interface to read residency counter memory address
> > > This address can be used to get the base address of PMC memory
> > > mapped IO.
> > > This is utilized by intel_pmc_core driver to print more debug
> > > information.
> > 
> > > 
> > > +   switch (residency_info_mem.gaddr.bit_width) {
> > > +   case 8:
> > > +   count =
> > > readb(residency_info_mem.iomem_addr);
> > > +   break;
> > > +   case 16:
> > > +   count =
> > > readw(residency_info_mem.iomem_addr);
> > > +   break;
> > > +   case 32:
> > > +   count =
> > > readl(residency_info_mem.iomem_addr);
> > > +   break;
> > > +   case 64:
> > > +   count =
> > > readq(residency_info_mem.iomem_addr);
> > > +   break;
> > > +   default:
> > > +   return -EINVAL;
> > > +   }
> > 
> > I saw something very similar already under drivers/acpi. Can we
> > utilize it (split a helper out of it and re-use)?
> This functionality is probably not only for ACPI, but may be other
> parts of the kernel too. So if there is a common function then it can
> be more generic outside of ACPI.

If the value of the field is a GAS, we can use the ACPICA's library
routine for reading from there I suppose.

Thanks,
Rafael
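
A generic helper of the kind being discussed might look like this (sketch;
readq assumes a 64-bit build, and the ready-made alternative is ACPICA's
GAS read routine):

        static int lpit_read_width(void __iomem *addr, u8 bit_width, u64 *val)
        {
                switch (bit_width) {
                case 8:
                        *val = readb(addr);
                        break;
                case 16:
                        *val = readw(addr);
                        break;
                case 32:
                        *val = readl(addr);
                        break;
                case 64:
                        *val = readq(addr);
                        break;
                default:
                        return -EINVAL;
                }
                return 0;
        }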



Re: Extreme time jitter with suspend/resume cycles

2017-10-05 Thread Thomas Gleixner
On Thu, 5 Oct 2017, Gabriel Beddingfield wrote:

> Hi Thomas,
> 
> On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner  wrote:
> >> > Which SoC/clocksource driver are you talking about?
> >>
> >> NXP i.MX 6SoloX
> >> drivers/clocksource/timer-imx-gpt.c
> >
> > So that clocksource driver looks correct. Do you have an idea in which
> > context this time jump happens? Does it happen when you exercise your high
> > frequency suspend/resume dance or is that happening just when you let the
> > machine run forever as well?
> 
> We couldn't devise any reproduction steps. We observed it happening at
> unexpected times in a fleet of devices -- and we couldn't find any
> patterns to clue us in.

Ok. Did you talk to NXP about that? Or did you try to exercise reads in a
loop to detect the wreckage and maybe a pattern in there?

> > The timekeeping_resume() path definitely has an issue:
> >
> > cycle_now = tk_clock_read(&tk->tkr_mono);
> > if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
> > cycle_now > tk->tkr_mono.cycle_last) {
> >
> > This works nicely for clocksources which won't wrap across suspend/resume
> > but not for those which can. That cycle_now -> cycle_last check should
> > take cs->mask into account ...
> >
> > Of course for clocksources which can wrap within realistic suspend times,
> > which 36 hours might be accounted for, this would need an extra sanity
> > check against a RTC whether wrap time has been exceeded.
> >
> > I haven't thought it through whether that buggered check fully explains
> > what you are observing, but it's wrong nevertheless. John?
> 
> Nah. It looks like the consequence is that you'll either fail to inject
> the sleep time or you'll fall back to having the RTC inject the sleep
> time. In our case, we never sleep for more than a couple of minutes so
> the error would be seconds rather than hours.

Fair enough. It's still wrong though and wants to be fixed.

Thanks,

tglx
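
An untested sketch of the masked check being suggested (a suspend longer
than the counter period would still need the RTC sanity check mentioned
above):

        /* compare modulo the clocksource width instead of raw values */
        u64 delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last,
                                      tk->tkr_mono.mask);

        if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) && delta) {
                /* safe to inject sleep time from the nonstop clocksource */
        }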


Re: [PATCH v6 01/11] dt-bindings: net: Restore sun8i dwmac binding

2017-10-05 Thread Rob Herring
On Wed, Sep 27, 2017 at 09:34:04AM +0200, Corentin Labbe wrote:
> This patch restores the dt-bindings documentation about dwmac-sun8i
> This reverts commit 8aa33ec2f481 ("dt-bindings: net: Revert sun8i dwmac 
> binding")

What's missing here is why you are reverting?

> 
> Signed-off-by: Corentin Labbe 
> ---
>  .../devicetree/bindings/net/dwmac-sun8i.txt| 84 
> ++
>  1 file changed, 84 insertions(+)
>  create mode 100644 Documentation/devicetree/bindings/net/dwmac-sun8i.txt

Otherwise,

Acked-by: Rob Herring 


Applied "spi: spreadtrum adi: add hwspinlock dependency" to the spi tree

2017-10-05 Thread Mark Brown
The patch

   spi: spreadtrum adi: add hwspinlock dependency

has been applied to the spi tree at

   git://git.kernel.org/pub/scm/linux/kernel/git/broonie/spi.git 

All being well this means that it will be integrated into the linux-next
tree (usually sometime in the next 24 hours) and sent to Linus during
the next merge window (or sooner if it is a bug fix), however if
problems are discovered then the patch may be dropped or reverted.  

You may get further e-mails resulting from automated or manual testing
and review of the tree, please engage with people reporting problems and
send followup patches addressing any issues that are reported if needed.

If any updates are required or you are submitting further changes they
should be sent as incremental updates against current git, existing
patches will not be replaced.

Please add any relevant lists and maintainers to the CCs when replying
to this mail.

Thanks,
Mark

>From e83f374247c310bc558c8626fbfcc03f22f9bf02 Mon Sep 17 00:00:00 2001
From: Arnd Bergmann 
Date: Thu, 5 Oct 2017 22:39:37 +0200
Subject: [PATCH] spi: spreadtrum adi: add hwspinlock dependency

With CONFIG_HWSPINLOCK=m, the new driver fails to link as a built-in driver:

drivers/spi/spi-sprd-adi.o: In function `sprd_adi_remove':
spi-sprd-adi.c:(.text+0x18): undefined reference to `hwspin_lock_free'
drivers/spi/spi-sprd-adi.o: In function `sprd_adi_probe':
spi-sprd-adi.c:(.text+0xfc): undefined reference to `of_hwspin_lock_get_id'
spi-sprd-adi.c:(.text+0x108): undefined reference to 
`hwspin_lock_request_specific'
spi-sprd-adi.c:(.text+0x268): undefined reference to `hwspin_lock_free'

This adds a hard Kconfig dependency on HWSPINLOCK for the !COMPILE_TEST
case, and allows compile-testing with HWSPINLOCK completely disabled,
which will then rely on the existing stub API.

Fixes: 7e2903cb91df ("spi: Add ADI driver for Spreadtrum platform")
Signed-off-by: Arnd Bergmann 
Signed-off-by: Mark Brown 
---
 drivers/spi/Kconfig | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/spi/Kconfig b/drivers/spi/Kconfig
index 0c38a5bfcd74..2c96f744352b 100644
--- a/drivers/spi/Kconfig
+++ b/drivers/spi/Kconfig
@@ -629,6 +629,7 @@ config SPI_SIRF
 config SPI_SPRD_ADI
tristate "Spreadtrum ADI controller"
depends on ARCH_SPRD || COMPILE_TEST
+   depends on HWSPINLOCK || (COMPILE_TEST && !HWSPINLOCK)
help
  ADI driver based on SPI for Spreadtrum SoCs.
 
-- 
2.14.1



Re: [PATCH v4 12/14] platform/x86: wmi: create character devices when requested by drivers

2017-10-05 Thread Darren Hart
On Thu, Oct 05, 2017 at 09:09:48PM +0200, Greg KH wrote:
> On Thu, Oct 05, 2017 at 07:03:24PM +, mario.limoncie...@dell.com wrote:
...
> > It's up to firmware to block the crazy stuff that you can put in a buffer.
> 
> So userspace can pass any blob it wants to the firmware through this
> interface and the kernel does not parse anything?  How is that
> "protected"?
> 
> > > Again, I like my TPM to work, and I don't want a random rootkit exploit
> > > to be able to destroy it :)
> > 
> > I'd like to however point out you can't kill your TPM from this interface.
> 
> On _your_ platform, can you guarantee it on any other platform?  :)

The dell-smbios-wmi driver won't load on any other platform. No
character device is created for any other platform. When drivers are
written for those other platforms for different WMI GUIDs, we need to
review them.

This driver not having MOF data should be the exception. We'll have more
ability to inspect others. If drivers are submitted that don't look at
the MOF data even though it is present, we should reject them.

-- 
Darren Hart
VMware Open Source Technology Center


Re: [RFC PATCH 1/2] kbuild: Add a cache for generated variables

2017-10-05 Thread Doug Anderson
Hi,

On Thu, Oct 5, 2017 at 12:26 AM, Masahiro Yamada
 wrote:
> As far as I tested, I always see only one space after ":=" in v2.
>
> I did not consider this deeply,
> but something is working nicely behind the scene.

Try adding this to the end of the main Makefile:

+$(info LDFLAGS_BUILD_ID = $(LDFLAGS_BUILD_ID))
+$(info KBUILD_ARFLAGS = $(KBUILD_ARFLAGS))
+$(info KBUILD_CFLAGS = $(KBUILD_CFLAGS))
+$(info KBUILD_AFLAGS = $(KBUILD_AFLAGS))
+$(info KBUILD_CPPFLAGS = $(KBUILD_CPPFLAGS))
+$(info REALMODE_CFLAGS = $(REALMODE_CFLAGS))
+
 endif  # skip-makefile

Record what you see.  Then apply my patches and run your build again
(actually, run it twice and look at the 2nd time, just to be sure).  I
think you'll see slightly different spacing in the flags for the two
runs.  I don't think this is terribly important, though.

-Doug


Re: [Part2 PATCH v5 11/31] crypto: ccp: Define SEV key management command id

2017-10-05 Thread Borislav Petkov
On Wed, Oct 04, 2017 at 08:13:52AM -0500, Brijesh Singh wrote:
> Define Secure Encrypted Virtualization (SEV) key management command id
> and structure. The command definition is available in SEV KM [1] spec
> 0.14.
> 
> [1] http://support.amd.com/TechDocs/55766_SEV-KM API_Specification.pdf
> 
> Cc: Paolo Bonzini 
> Cc: "Radim Krčmář" 
> Cc: Borislav Petkov 
> Cc: Herbert Xu 
> Cc: Gary Hook 
> Cc: Tom Lendacky 
> Cc: linux-cry...@vger.kernel.org
> Cc: k...@vger.kernel.org
> Cc: linux-kernel@vger.kernel.org
> Signed-off-by: Brijesh Singh 
> ---
>  include/linux/psp-sev.h | 515 
> 
>  1 file changed, 515 insertions(+)
>  create mode 100644 include/linux/psp-sev.h

Some fixes ontop below.

With that:

Reviewed-by: Borislav Petkov 

---
diff --git a/include/linux/psp-sev.h b/include/linux/psp-sev.h
index f01d0e3b09b8..496375d7f6a9 100644
--- a/include/linux/psp-sev.h
+++ b/include/linux/psp-sev.h
@@ -122,12 +122,12 @@ enum psp_ret_code {
  * @tmr_address: system physical address used for SEV-ES
  * @tmr_len: len of tmr_address
  */
-struct __attribute__((__packed__)) sev_data_init {
+struct sev_data_init {
u32 flags;  /* In */
u32 reserved;   /* In */
u64 tmr_address;/* In */
u32 tmr_len;/* In */
-};
+} __packed;
 
 /**
  * struct sev_data_status - PLATFORM_STATUS command parameters
@@ -140,7 +140,7 @@ struct __attribute__((__packed__)) sev_data_init {
  * @build: firmware build id for API version
  * @guest_count: number of active guests
  */
-struct __attribute__((__packed__)) sev_data_status {
+struct sev_data_status {
u8 api_major;   /* Out */
u8 api_minor;   /* Out */
u8 state;   /* Out */
@@ -150,7 +150,7 @@ struct __attribute__((__packed__)) sev_data_status {
u32 reserved2 : 23;
u32 build : 8;  /* Out */
u32 guest_count;/* Out */
-};
+} __packed;
 
 /**
  * struct sev_data_pek_csr - PEK_CSR command parameters
@@ -158,10 +158,10 @@ struct __attribute__((__packed__)) sev_data_status {
  * @address: PEK certificate chain
  * @len: len of certificate
  */
-struct __attribute__((__packed__)) sev_data_pek_csr {
+struct sev_data_pek_csr {
u64 address;/* In */
u32 len;/* In/Out */
-};
+} __packed;
 
 /**
  * struct sev_data_cert_import - PEK_CERT_IMPORT command parameters
@@ -171,13 +171,13 @@ struct __attribute__((__packed__)) sev_data_pek_csr {
  * @oca_address: OCA certificate chain
  * @oca_len: len of OCA certificate
  */
-struct __attribute__((__packed__)) sev_data_pek_cert_import {
+struct sev_data_pek_cert_import {
u64 pek_cert_address;   /* In */
u32 pek_cert_len;   /* In */
u32 reserved;   /* In */
u64 oca_cert_address;   /* In */
u32 oca_cert_len;   /* In */
-};
+} __packed;
 
 /**
  * struct sev_data_pdh_cert_export - PDH_CERT_EXPORT command parameters
@@ -187,22 +187,22 @@ struct __attribute__((__packed__)) 
sev_data_pek_cert_import {
  * @cert_chain_address: PDH certificate chain
  * @cert_chain_len: len of PDH certificate chain
  */
-struct __attribute__((__packed__)) sev_data_pdh_cert_export {
+struct sev_data_pdh_cert_export {
u64 pdh_cert_address;   /* In */
u32 pdh_cert_len;   /* In/Out */
u32 reserved;   /* In */
u64 cert_chain_address; /* In */
u32 cert_chain_len; /* In/Out */
-};
+} __packed;
 
 /**
  * struct sev_data_decommission - DECOMMISSION command parameters
  *
  * @handle: handle of the VM to decommission
  */
-struct __attribute__((__packed__)) sev_data_decommission {
+struct sev_data_decommission {
u32 handle; /* In */
-};
+} __packed;
 
 /**
  * struct sev_data_activate - ACTIVATE command parameters
@@ -210,19 +210,19 @@ struct __attribute__((__packed__)) sev_data_decommission {
  * @handle: handle of the VM to activate
  * @asid: asid assigned to the VM
  */
-struct __attribute__((__packed__)) sev_data_activate {
+struct sev_data_activate {
u32 handle; /* In */
u32 asid;   /* In */
-};
+} __packed;
 
 /**
  * struct sev_data_deactivate - DEACTIVATE command parameters
  *
  * @handle: handle of the VM to deactivate
  */
-struct __attribute__((__packed__)) sev_data_deactivate {
+struct sev_data_deactivate {
u32 handle; /* In */
-};
+} __packed;
 
 /**
  * struct sev_data_guest_status - SEV GUEST_STATUS command parameters
@@ -232,12 +232,12 @@ struct __attribute__((__packed__)) 
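
(The quoted diff is truncated in the archive; it continues converting the
remaining structs the same way.) The pattern being applied, shown on a
made-up struct for clarity, is just moving from the attribute spelled out
between the 'struct' keyword and the tag to the kernel's trailing __packed
shorthand:

/* before: attribute between the 'struct' keyword and the tag */
struct __attribute__((__packed__)) sev_data_example {
	u32 handle;	/* In */
};

/* after: same layout, trailing __packed shorthand */
struct sev_data_example {
	u32 handle;	/* In */
} __packed;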

Re: [PATCH] vfs: fix statfs64() returning impossible EOVERFLOW for 64-bit f_files

2017-10-05 Thread Al Viro
On Thu, Oct 05, 2017 at 09:36:36PM +0300, Sergey Klyaus wrote:
> compat_statfs64 structure has some 32-bit and some 64-bit fields, so
> 64d2ab32e "vfs: fix put_compat_statfs64() does not handle errors" fixed
> 32-bit overflow checks not being performed, but accidentally enabled
> checks for f_files and f_ffree that are 64-bit and cannot have overflow.
> Now checks for both groups of fields are enabled by different
> conditions.

TBH, the logics in there looks very dubious.  First of all, could somebody
show an architecture where compat_statfs64 would *not* have 32bit f_bsize?

AFAICS, there are only 3 variants of struct compat_statfs64 declaration in
the entire tree:
arch/mips/include/uapi/asm/statfs.h:82:struct compat_statfs64 {
arch/s390/include/asm/compat.h:167:struct compat_statfs64 {
include/uapi/asm-generic/statfs.h:68:struct compat_statfs64 {

mips one has
__u32   f_bsize;
s390 -
u32 f_bsize;
and generic -
__u32 f_bsize;

So what is that if (sizeof... == 4) about?  Before or after the commit in
question - f_blocks is consistently 64bit, f_bsize - 32bit.  IOW, that
commit has turned an obfuscated if (0) into equally obfuscated if (1).

In any case, that thing is supposed to behave like statfs64(2) on matching
32bit host, so what the hell is that EOVERFLOW about, anyway?  ->f_type value
not fitting into 32 bits?  That'd be an fs bug; I could see WARN_ON() on that,
but -EOVERFLOW is bloody odd.  And ->f_namelen exceeding 4Gb is even funnier...

Seriously, the old logics had been complete BS and the only saving grace had
been the fact that it never triggered.  What the hell is f_files and f_ffree
logics about?  Those are 64bit in *all* struct statfs64 variants.  Always
had been.

AFAICS, the real bug here is in hugetlbfs; that's where obscene values in
->f_bsize come from.  IMO all that code in put_compat_statfs64() should be
replaced with
if (kbuf->f_bsize != (u32)kbuf->f_bsize)
return -EOVERFLOW;
That, or hugetlbfs could be taught to fake saner ->f_bsize (recalculating
->f_bavail/->f_bfree/->f_blocks to go with that).

Comments?
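
For concreteness, a minimal sketch of put_compat_statfs64() reduced to the
suggested check (field names as in fs/statfs.c; f_fsid/f_flags/f_spare
handling elided):

static int put_compat_statfs64(struct compat_statfs64 __user *ubuf,
			       struct kstatfs *kbuf)
{
	struct compat_statfs64 buf;

	/* the only field that can genuinely overflow its 32bit slot */
	if (kbuf->f_bsize != (u32)kbuf->f_bsize)
		return -EOVERFLOW;

	memset(&buf, 0, sizeof(buf));
	buf.f_type = kbuf->f_type;
	buf.f_bsize = kbuf->f_bsize;
	buf.f_blocks = kbuf->f_blocks;	/* 64bit in all variants */
	buf.f_bfree = kbuf->f_bfree;
	buf.f_bavail = kbuf->f_bavail;
	buf.f_files = kbuf->f_files;	/* 64bit, no check needed */
	buf.f_ffree = kbuf->f_ffree;
	buf.f_namelen = kbuf->f_namelen;
	buf.f_frsize = kbuf->f_frsize;

	if (copy_to_user(ubuf, &buf, sizeof(buf)))
		return -EFAULT;
	return 0;
}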


[PATCH 1/3] pinctrl: sunxi: Introduce the strict flag

2017-10-05 Thread Maxime Ripard
Our pinctrl device should have had strict set all along. However, it wasn't
the case, and most of our old device trees also have a pinctrl group in
addition to the GPIO properties, which means that we can't really turn it
on now.

None of our new SoCs have that group though, so we can still enable that
mode on the newer ones.

In order to enable it by default, add a flag that allows disabling that
mode; it should be set by the pinctrl drivers that cannot be migrated.

Signed-off-by: Maxime Ripard 
---
 drivers/pinctrl/sunxi/pinctrl-sunxi.c | 5 -
 drivers/pinctrl/sunxi/pinctrl-sunxi.h | 1 +
 2 files changed, 5 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c 
b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
index 52edf3b5988d..1753a5b1573f 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
@@ -690,7 +690,7 @@ sunxi_pmx_gpio_set_direction(struct pinctrl_dev *pctldev,
return 0;
 }
 
-static const struct pinmux_ops sunxi_pmx_ops = {
+static struct pinmux_ops sunxi_pmx_ops = {
.get_functions_count= sunxi_pmx_get_funcs_cnt,
.get_function_name  = sunxi_pmx_get_func_name,
.get_function_groups= sunxi_pmx_get_func_groups,
@@ -1307,6 +1307,9 @@ int sunxi_pinctrl_init_with_variant(struct 
platform_device *pdev,
pctrl_desc->pctlops = &sunxi_pctrl_ops;
pctrl_desc->pmxops = &sunxi_pmx_ops;
 
+   if (desc->disable_strict_mode)
+   sunxi_pmx_ops.strict = false;
+
pctl->pctl_dev = devm_pinctrl_register(&pdev->dev, pctrl_desc, pctl);
if (IS_ERR(pctl->pctl_dev)) {
dev_err(&pdev->dev, "couldn't register pinctrl driver\n");
diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.h 
b/drivers/pinctrl/sunxi/pinctrl-sunxi.h
index 1bfc0d8a55df..11b128f54ed2 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.h
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.h
@@ -112,6 +112,7 @@ struct sunxi_pinctrl_desc {
unsignedirq_banks;
unsignedirq_bank_base;
boolirq_read_needs_mux;
+   booldisable_strict_mode;
 };
 
 struct sunxi_pinctrl_function {
-- 
git-series 0.9.1


[PATCH 2/3] pinctrl: sunxi: Disable strict mode for old pinctrl drivers

2017-10-05 Thread Maxime Ripard
Old pinctrl drivers will need to disable strict mode for various reasons,
among which:
  - Some DTs will still have a pinctrl group for each GPIO used, which will
be rejected by pin_request. While we could remove those nodes, we still
have to deal with old DTs.
  - Some GPIOs on these boards need to have their pin configuration changed
(for bias or current), and there's no clear migration path

Let's disable the strict mode on those SoCs so that there's no breakage.

Signed-off-by: Maxime Ripard 
---
 drivers/pinctrl/sunxi/pinctrl-sun4i-a10.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun5i.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun6i-a31.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun8i-a23.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun8i-a33.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun8i-h3.c  | 3 ++-
 drivers/pinctrl/sunxi/pinctrl-sun9i-a80.c | 1 +
 7 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/pinctrl/sunxi/pinctrl-sun4i-a10.c 
b/drivers/pinctrl/sunxi/pinctrl-sun4i-a10.c
index f763d8d62d6e..295e48fc94bc 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sun4i-a10.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sun4i-a10.c
@@ -1289,6 +1289,7 @@ static const struct sunxi_pinctrl_desc 
sun4i_a10_pinctrl_data = {
.npins = ARRAY_SIZE(sun4i_a10_pins),
.irq_banks = 1,
.irq_read_needs_mux = true,
+   .disable_strict_mode = true,
 };
 
 static int sun4i_a10_pinctrl_probe(struct platform_device *pdev)
diff --git a/drivers/pinctrl/sunxi/pinctrl-sun5i.c 
b/drivers/pinctrl/sunxi/pinctrl-sun5i.c
index 47afd558b114..27ec99e81c4c 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sun5i.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sun5i.c
@@ -713,6 +713,7 @@ static const struct sunxi_pinctrl_desc sun5i_pinctrl_data = 
{
.pins = sun5i_pins,
.npins = ARRAY_SIZE(sun5i_pins),
.irq_banks = 1,
+   .disable_strict_mode = true,
 };
 
 static int sun5i_pinctrl_probe(struct platform_device *pdev)
diff --git a/drivers/pinctrl/sunxi/pinctrl-sun6i-a31.c 
b/drivers/pinctrl/sunxi/pinctrl-sun6i-a31.c
index 951a25c18815..82ffaf466892 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sun6i-a31.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sun6i-a31.c
@@ -965,6 +965,7 @@ static const struct sunxi_pinctrl_desc 
sun6i_a31_pinctrl_data = {
.pins = sun6i_a31_pins,
.npins = ARRAY_SIZE(sun6i_a31_pins),
.irq_banks = 4,
+   .disable_strict_mode = true,
 };
 
 static int sun6i_a31_pinctrl_probe(struct platform_device *pdev)
diff --git a/drivers/pinctrl/sunxi/pinctrl-sun8i-a23.c 
b/drivers/pinctrl/sunxi/pinctrl-sun8i-a23.c
index 721b6935baf3..402fd7d21e7b 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sun8i-a23.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sun8i-a23.c
@@ -563,6 +563,7 @@ static const struct sunxi_pinctrl_desc 
sun8i_a23_pinctrl_data = {
.pins = sun8i_a23_pins,
.npins = ARRAY_SIZE(sun8i_a23_pins),
.irq_banks = 3,
+   .disable_strict_mode = true,
 };
 
 static int sun8i_a23_pinctrl_probe(struct platform_device *pdev)
diff --git a/drivers/pinctrl/sunxi/pinctrl-sun8i-a33.c 
b/drivers/pinctrl/sunxi/pinctrl-sun8i-a33.c
index ef1e0bef4099..da387211a75e 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sun8i-a33.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sun8i-a33.c
@@ -486,6 +486,7 @@ static const struct sunxi_pinctrl_desc 
sun8i_a33_pinctrl_data = {
.npins = ARRAY_SIZE(sun8i_a33_pins),
.irq_banks = 2,
.irq_bank_base = 1,
+   .disable_strict_mode = true,
 };
 
 static int sun8i_a33_pinctrl_probe(struct platform_device *pdev)
diff --git a/drivers/pinctrl/sunxi/pinctrl-sun8i-h3.c 
b/drivers/pinctrl/sunxi/pinctrl-sun8i-h3.c
index 518a92df4418..d1719a738c20 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sun8i-h3.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sun8i-h3.c
@@ -491,7 +491,8 @@ static const struct sunxi_pinctrl_desc 
sun8i_h3_pinctrl_data = {
.pins = sun8i_h3_pins,
.npins = ARRAY_SIZE(sun8i_h3_pins),
.irq_banks = 2,
-   .irq_read_needs_mux = true
+   .irq_read_needs_mux = true,
+   .disable_strict_mode = true,
 };
 
 static int sun8i_h3_pinctrl_probe(struct platform_device *pdev)
diff --git a/drivers/pinctrl/sunxi/pinctrl-sun9i-a80.c 
b/drivers/pinctrl/sunxi/pinctrl-sun9i-a80.c
index bc14e954d7a2..472ef0d91b99 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sun9i-a80.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sun9i-a80.c
@@ -721,6 +721,7 @@ static const struct sunxi_pinctrl_desc 
sun9i_a80_pinctrl_data = {
.pins = sun9i_a80_pins,
.npins = ARRAY_SIZE(sun9i_a80_pins),
.irq_banks = 5,
+   .disable_strict_mode = true,
 };
 
 static int sun9i_a80_pinctrl_probe(struct platform_device *pdev)
-- 
git-series 0.9.1


[PATCH 3/3] pinctrl: sunxi: Enforce the strict mode by default

2017-10-05 Thread Maxime Ripard
The strict mode should always have been enabled on our driver, and leaving
it unchecked just makes it harder to find a migration path as time passes.

Let's enable it by default now so that hopefully the new SoCs will be
safe.

Signed-off-by: Maxime Ripard 
---
 drivers/pinctrl/sunxi/pinctrl-sunxi.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/pinctrl/sunxi/pinctrl-sunxi.c 
b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
index 1753a5b1573f..2313c3b8931a 100644
--- a/drivers/pinctrl/sunxi/pinctrl-sunxi.c
+++ b/drivers/pinctrl/sunxi/pinctrl-sunxi.c
@@ -696,6 +696,7 @@ static struct pinmux_ops sunxi_pmx_ops = {
.get_function_groups= sunxi_pmx_get_func_groups,
.set_mux= sunxi_pmx_set_mux,
.gpio_set_direction = sunxi_pmx_gpio_set_direction,
+   .strict = true,
 };
 
 static int sunxi_pinctrl_gpio_direction_input(struct gpio_chip *chip,
-- 
git-series 0.9.1


[PATCH 0/3] pinctrl: sunxi: Start to enforce the strict mode

2017-10-05 Thread Maxime Ripard
Hi,

Here is an attempt to enable pinctrl's strict mode on our pinctrl drivers.
Indeed, our controllers should have had that mode enabled since its
introduction.

However, there are a number of issues with old device trees that prevent
us from just enabling it for all the devices. There are basically two of
them:

  - Most of our old DTs have a pinctrl node for GPIOs, which will result in
an error when the driver is going to request the gpio because it would
already be requested automatically by pinctrl. We cannot break those.

  - Some of these GPIOs also need to change their pin configuration to add
a bias or change the current output, and there isn't a migration path.

Let's just keep the old behaviour for the old SoCs, enforce it on the new
ones, and enable it by default so that the situation at least doesn't get
worse.

This has been tested on an A83t (strict on) and an H3 (strict off) board.

Let me know what you think,
Maxime

Maxime Ripard (3):
  pinctrl: sunxi: Introduce the strict flag
  pinctrl: sunxi: Disable strict mode for old pinctrl drivers
  pinctrl: sunxi: Enforce the strict mode by default

 drivers/pinctrl/sunxi/pinctrl-sun4i-a10.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun5i.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun6i-a31.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun8i-a23.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun8i-a33.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sun8i-h3.c  | 3 ++-
 drivers/pinctrl/sunxi/pinctrl-sun9i-a80.c | 1 +
 drivers/pinctrl/sunxi/pinctrl-sunxi.c | 6 +-
 drivers/pinctrl/sunxi/pinctrl-sunxi.h | 1 +
 9 files changed, 14 insertions(+), 2 deletions(-)

base-commit: c27ad2958485126c3828e59d53013d3b45ec14f2
-- 
git-series 0.9.1
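
To make the end state concrete, this is roughly how a per-SoC descriptor
reads once the series is applied (SoC names and pin tables here are
illustrative, not taken from the patches):

/* new SoC: leaves disable_strict_mode unset, so strict checking is on */
static const struct sunxi_pinctrl_desc new_soc_pinctrl_data = {
	.pins = new_soc_pins,
	.npins = ARRAY_SIZE(new_soc_pins),
	.irq_banks = 1,
};

/* legacy SoC: old DTs still carry pinctrl groups for plain GPIOs */
static const struct sunxi_pinctrl_desc legacy_soc_pinctrl_data = {
	.pins = legacy_soc_pins,
	.npins = ARRAY_SIZE(legacy_soc_pins),
	.irq_banks = 1,
	.disable_strict_mode = true,
};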


[PATCH 2/2] powermac: Improve a size determination in five functions

2017-10-05 Thread SF Markus Elfring
From: Markus Elfring 
Date: Thu, 5 Oct 2017 22:40:39 +0200

Replace the specification of data structures by pointer dereferences
as the parameter for the operator "sizeof" to make the corresponding size
determination a bit safer according to the Linux coding style convention.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 arch/powerpc/platforms/powermac/low_i2c.c| 6 +++---
 arch/powerpc/platforms/powermac/pfunc_core.c | 4 ++--
 2 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c 
b/arch/powerpc/platforms/powermac/low_i2c.c
index ecd166b5549b..46b7eb1a0f5f 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -492,7 +492,7 @@ static struct pmac_i2c_host_kw *__init 
kw_i2c_host_init(struct device_node *np)
const u32   *psteps, *prate, *addrp;
u32 steps;
 
-   host = kzalloc(sizeof(struct pmac_i2c_host_kw), GFP_KERNEL);
+   host = kzalloc(sizeof(*host), GFP_KERNEL);
if (!host)
return NULL;
 
@@ -571,7 +571,7 @@ static void __init kw_i2c_add(struct pmac_i2c_host_kw *host,
 {
struct pmac_i2c_bus *bus;
 
-   bus = kzalloc(sizeof(struct pmac_i2c_bus), GFP_KERNEL);
+   bus = kzalloc(sizeof(*bus), GFP_KERNEL);
if (bus == NULL)
return;
 
@@ -1253,7 +1253,7 @@ static void* pmac_i2c_do_begin(struct pmf_function *func, 
struct pmf_args *args)
 * near OOM that need to be resolved, the allocator itself should
 * probably make GFP_NOIO implicit during suspend
 */
-   inst = kzalloc(sizeof(struct pmac_i2c_pf_inst), GFP_KERNEL);
+   inst = kzalloc(sizeof(*inst), GFP_KERNEL);
if (inst == NULL) {
pmac_i2c_close(bus);
return NULL;
diff --git a/arch/powerpc/platforms/powermac/pfunc_core.c 
b/arch/powerpc/platforms/powermac/pfunc_core.c
index df3c93bef228..e0462fedcdb8 100644
--- a/arch/powerpc/platforms/powermac/pfunc_core.c
+++ b/arch/powerpc/platforms/powermac/pfunc_core.c
@@ -643,7 +643,7 @@ static int pmf_add_function_prop(struct pmf_device *dev, 
void *driverdata,
 
while (length >= 12) {
/* Allocate a structure */
-   func = kzalloc(sizeof(struct pmf_function), GFP_KERNEL);
+   func = kzalloc(sizeof(*func), GFP_KERNEL);
if (func == NULL)
goto bail;
kref_init(&func->ref);
@@ -719,7 +719,7 @@ int pmf_register_driver(struct device_node *np,
return -EBUSY;
}
 
-   dev = kzalloc(sizeof(struct pmf_device), GFP_KERNEL);
+   dev = kzalloc(sizeof(*dev), GFP_KERNEL);
if (dev == NULL) {
DBG("pmf: no memory !\n");
return -ENOMEM;
-- 
2.14.2
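
The convention being applied is easier to see out of diff context:
sizeof(*ptr) ties the allocation size to the variable rather than to a
spelled-out type, so the size stays right if the pointer is ever retyped.
In isolation, using one of the structs from the patch:

struct pmf_device *dev;

/* repeats the type name; silently wrong if 'dev' changes type later */
dev = kzalloc(sizeof(struct pmf_device), GFP_KERNEL);

/* preferred: always the size of whatever 'dev' points to */
dev = kzalloc(sizeof(*dev), GFP_KERNEL);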



Re: [PATCH v4 12/14] platform/x86: wmi: create character devices when requested by drivers

2017-10-05 Thread Darren Hart
On Thu, Oct 05, 2017 at 07:03:24PM +, mario.limoncie...@dell.com wrote:
> > 
> > And how _exactly_ is this interface exposed in Windows?  Is it ad-hoc
> > with custom kernel drivers written by each vendor?  Or does the OS
> > provide a "sane" interface for it?
> 
> On Windows it's a driver-less solution.  Vendors don't do anything other
> than provide the MOF (which describes how the data passed to ASL looks).
> 
> When Windows boots up, _WDG is parsed, the binary MOF is loaded into
> the WMI repository.  The MOF describes how named objects map to GUIDs
> which map to ASL.
> 
> From Powershell or from any application that uses WMI as admin you can
> look up the root namespace and see all objects.  You can pass calls back
> and forth.  There's all sorts of examples of it here:
> https://msdn.microsoft.com/en-us/library/windows/hardware/dn614028(v=vs.85).aspx
> 

A couple snippets from this link which I believe I've shared previously
that clarify the design intent of the WMI mechanism:

"
ACPI-to-WMI Mapper Goals for Windows Instrumentation:

...

* Allow OEMs to include OEM-specific data blocks, user-mode-callable
  ACPI control methods, and events without requiring any changes to the
  ACPI-to-WMI mapper.

* Allow general-purpose data consumer applications--those written
  without any special knowledge of the data blocks exposed by a
  particular machine--to be able to access and understand the data
  blocks, user-mode-callable ACPI control methods, and events being
  mapped--including those that are OEM specific.

...

The following are not goals for the ACPI-to-WMI mapper:

* To have specific knowledge about any data block that passes through
  the mapper.

* To provide interfaces specifically for SMBIOS data and functions. The
  mapper is an open architecture that is not restricted to SMBIOS data
  and functionality.
"

This model is not consistent with Linux design principles, and Mario's
changes attempt to allow for more kernel oversight by:

* Requiring a driver to be written to bind to any GUID which will expose
  WMI methods to userspace. The goal here is to provide an effective
  whitelist, and to promote vendor participation (they need to send the
  patch, have it reviewed, respond to challenges on the security
  implications, etc.)

* In the future, provide for the MOF parsing within the kernel so Linux
  will have more ability to audit messaging.

-- 
Darren Hart
VMware Open Source Technology Center
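
For the record, the whitelist takes the shape of an ordinary driver match
table: nothing is exposed to userspace unless an in-tree driver explicitly
claims the GUID, roughly like this (the GUID string is a placeholder, and
the wmi_device_id plumbing is what the series under review introduces):

static const struct wmi_device_id dell_smbios_wmi_id_table[] = {
	{ .guid_string = "00000000-0000-0000-0000-000000000000" },
	{ },
};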


[PATCH 1/2] powermac: Delete an error message for a failed memory allocation in kw_i2c_host_init()

2017-10-05 Thread SF Markus Elfring
From: Markus Elfring 
Date: Thu, 5 Oct 2017 22:30:29 +0200

Omit an extra message for a memory allocation failure in this function.

This issue was detected by using the Coccinelle software.

Signed-off-by: Markus Elfring 
---
 arch/powerpc/platforms/powermac/low_i2c.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/arch/powerpc/platforms/powermac/low_i2c.c 
b/arch/powerpc/platforms/powermac/low_i2c.c
index 70183eb3d5c8..ecd166b5549b 100644
--- a/arch/powerpc/platforms/powermac/low_i2c.c
+++ b/arch/powerpc/platforms/powermac/low_i2c.c
@@ -493,11 +493,8 @@ static struct pmac_i2c_host_kw *__init 
kw_i2c_host_init(struct device_node *np)
u32 steps;
 
host = kzalloc(sizeof(struct pmac_i2c_host_kw), GFP_KERNEL);
-   if (host == NULL) {
-   printk(KERN_ERR "low_i2c: Can't allocate host for %pOF\n",
-  np);
+   if (!host)
return NULL;
-   }
 
/* Apple is kind enough to provide a valid AAPL,address property
 * on all i2c keywest nodes so far ... we would have to fallback
-- 
2.14.2



Re: Extreme time jitter with suspend/resume cycles

2017-10-05 Thread Gabriel Beddingfield
Hi Thomas,

On Thu, Oct 5, 2017 at 11:01 AM, Thomas Gleixner  wrote:
>> > Which SoC/clocksource driver are you talking about?
>>
>> NXP i.MX 6SoloX
>> drivers/clocksource/timer-imx-gpt.c
>
> So that clocksource driver looks correct. Do you have an idea in which
> context this time jump happens? Does it happen when you exercise your high
> frequency suspend/resume dance or is that happening just when you let the
> machine run forever as well?

We couldn't devise any reproduction steps. We observed it happening at
unexpected
times in a fleet of devices -- and we couldn't find any patterns to clue us in.

>
> The timekeeping_resume() path definitely has an issue:
>
> cycle_now = tk_clock_read(&tk->tkr_mono);
> if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) &&
> cycle_now > tk->tkr_mono.cycle_last) {
>
> This works nice for clocksources which wont wrap across suspend/resume but
> not for those which can. That cycle_now -> cycle_last check should take
> cs->mask into account ...
>
> Of course for clocksources which can wrap within realistic suspend times,
> which 36 hours might be accounted for, this would need an extra sanity
> check against a RTC whether wrap time has been exceeded.
>
> I haven't thought it through whether that buggered check fully explains
> what you are observing, but it's wrong nevertheless. John?

Nah. It looks like the consequence is that you'll either fail to
inject the sleep time
or you'll fall back to having the RTC inject the sleep time. In our
case, we never
sleep for more than a couple of minutes so the error would be seconds rather
than hours.

-gabe
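
For reference, one shape a wrap-safe version of that check could take,
sketched against the fragment quoted above (clocksource_delta() is the
masked-subtraction helper from kernel/time/timekeeping_internal.h; the RTC
cross-check is only hinted at in the comment):

	u64 cycle_now, delta;

	cycle_now = tk_clock_read(&tk->tkr_mono);
	/* masked subtraction: stays correct even if the counter wrapped */
	delta = clocksource_delta(cycle_now, tk->tkr_mono.cycle_last,
				  tk->tkr_mono.mask);

	if ((clock->flags & CLOCK_SOURCE_SUSPEND_NONSTOP) && delta) {
		/*
		 * 'delta' is only meaningful if the RTC says the suspend
		 * lasted less than one wrap period of this clocksource.
		 */
		/* ... inject 'delta' as sleep time ... */
	}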


Re: [PATCH] Add Tecon Microprocessor Technologies, LLC vendor prefix

2017-10-05 Thread Rob Herring
On Tue, Sep 26, 2017 at 05:49:37PM +0300, dun...@tecon.ru wrote:
> From: Dmitry Dunaev 
> 
> Signed-off-by: Dmitry Dunaev 
> ---
>  Documentation/devicetree/bindings/vendor-prefixes.txt | 1 +
>  1 file changed, 1 insertion(+)

Applied.


[PATCH 0/2] Power Mac: Adjustments for five function implementations

2017-10-05 Thread SF Markus Elfring
From: Markus Elfring 
Date: Thu, 5 Oct 2017 22:48:22 +0200

Two update suggestions were taken into account
from static source code analysis.

Markus Elfring (2):
  Delete an error message for a failed memory allocation in kw_i2c_host_init()
  Improve a size determination in five functions

 arch/powerpc/platforms/powermac/low_i2c.c| 11 ---
 arch/powerpc/platforms/powermac/pfunc_core.c |  4 ++--
 2 files changed, 6 insertions(+), 9 deletions(-)

-- 
2.14.2



Re: [PATCH v2 01/10] pinctrl: move gpio-axp209 to pinctrl

2017-10-05 Thread Rob Herring
On Tue, Sep 26, 2017 at 02:17:11PM +0200, Quentin Schulz wrote:
> To prepare the driver for the upcoming pinctrl features, move the GPIO
> driver AXP209 from GPIO to pinctrl subsystem.
> 
> Signed-off-by: Quentin Schulz 
> ---
>  Documentation/devicetree/bindings/gpio/gpio-axp209.txt   |  30 +-
>  Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt |  30 +-
>  drivers/gpio/Kconfig |   6 +-
>  drivers/gpio/Makefile|   1 +-
>  drivers/gpio/gpio-axp209.c   | 188 +---
>  drivers/pinctrl/Kconfig  |   6 +-
>  drivers/pinctrl/Makefile |   1 +-
>  drivers/pinctrl/pinctrl-axp209.c | 188 +++-
>  8 files changed, 225 insertions(+), 225 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/gpio/gpio-axp209.txt
>  create mode 100644 
> Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt
>  delete mode 100644 drivers/gpio/gpio-axp209.c
>  create mode 100644 drivers/pinctrl/pinctrl-axp209.c
> 
> diff --git a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt 
> b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> deleted file mode 100644
> index a661130..000
> --- a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> +++ /dev/null
> @@ -1,30 +0,0 @@
> -AXP209 GPIO controller
> -
> -This driver follows the usual GPIO bindings found in
> -Documentation/devicetree/bindings/gpio/gpio.txt
> -
> -Required properties:
> -- compatible: Should be "x-powers,axp209-gpio"
> -- #gpio-cells: Should be two. The first cell is the pin number and the
> -  second is the GPIO flags.
> -- gpio-controller: Marks the device node as a GPIO controller.
> -
> -This node must be a subnode of the axp20x PMIC, documented in
> -Documentation/devicetree/bindings/mfd/axp20x.txt
> -
> -Example:
> -
> -axp209: pmic@34 {
> - compatible = "x-powers,axp209";
> - reg = <0x34>;
> - interrupt-parent = <&nmi_intc>;
> - interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
> - interrupt-controller;
> - #interrupt-cells = <1>;
> -
> - axp_gpio: gpio {
> - compatible = "x-powers,axp209-gpio";
> - gpio-controller;
> - #gpio-cells = <2>;
> - };
> -};
> diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt 
> b/Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt
> new file mode 100644
> index 000..a661130
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt
> @@ -0,0 +1,30 @@
> +AXP209 GPIO controller
> +
> +This driver follows the usual GPIO bindings found in
> +Documentation/devicetree/bindings/gpio/gpio.txt
> +
> +Required properties:
> +- compatible: Should be "x-powers,axp209-gpio"
> +- #gpio-cells: Should be two. The first cell is the pin number and the
> +  second is the GPIO flags.
> +- gpio-controller: Marks the device node as a GPIO controller.

Maybe there's some reason to move the driver, but the binding describes 
a gpio-controller so it should remain in bindings/gpio/. Binding and 
driver directories aren't one to one necessarily.

Rob


[PATCH 0/2] Power Mac: Adjustments for five function implementations

2017-10-05 Thread SF Markus Elfring
From: Markus Elfring 
Date: Thu, 5 Oct 2017 22:48:22 +0200

Two update suggestions were taken into account
from static source code analysis.

Markus Elfring (2):
  Delete an error message for a failed memory allocation in kw_i2c_host_init()
  Improve a size determination in five functions

 arch/powerpc/platforms/powermac/low_i2c.c| 11 ---
 arch/powerpc/platforms/powermac/pfunc_core.c |  4 ++--
 2 files changed, 6 insertions(+), 9 deletions(-)

-- 
2.14.2



Re: [PATCH v2 01/10] pinctrl: move gpio-axp209 to pinctrl

2017-10-05 Thread Rob Herring
On Tue, Sep 26, 2017 at 02:17:11PM +0200, Quentin Schulz wrote:
> To prepare the driver for the upcoming pinctrl features, move the GPIO
> driver AXP209 from GPIO to pinctrl subsystem.
> 
> Signed-off-by: Quentin Schulz 
> ---
>  Documentation/devicetree/bindings/gpio/gpio-axp209.txt   |  30 +-
>  Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt |  30 +-
>  drivers/gpio/Kconfig |   6 +-
>  drivers/gpio/Makefile|   1 +-
>  drivers/gpio/gpio-axp209.c   | 188 +---
>  drivers/pinctrl/Kconfig  |   6 +-
>  drivers/pinctrl/Makefile |   1 +-
>  drivers/pinctrl/pinctrl-axp209.c | 188 +++-
>  8 files changed, 225 insertions(+), 225 deletions(-)
>  delete mode 100644 Documentation/devicetree/bindings/gpio/gpio-axp209.txt
>  create mode 100644 
> Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt
>  delete mode 100644 drivers/gpio/gpio-axp209.c
>  create mode 100644 drivers/pinctrl/pinctrl-axp209.c
> 
> diff --git a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt 
> b/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> deleted file mode 100644
> index a661130..000
> --- a/Documentation/devicetree/bindings/gpio/gpio-axp209.txt
> +++ /dev/null
> @@ -1,30 +0,0 @@
> -AXP209 GPIO controller
> -
> -This driver follows the usual GPIO bindings found in
> -Documentation/devicetree/bindings/gpio/gpio.txt
> -
> -Required properties:
> -- compatible: Should be "x-powers,axp209-gpio"
> -- #gpio-cells: Should be two. The first cell is the pin number and the
> -  second is the GPIO flags.
> -- gpio-controller: Marks the device node as a GPIO controller.
> -
> -This node must be a subnode of the axp20x PMIC, documented in
> -Documentation/devicetree/bindings/mfd/axp20x.txt
> -
> -Example:
> -
> -axp209: pmic@34 {
> - compatible = "x-powers,axp209";
> - reg = <0x34>;
> - interrupt-parent = <_intc>;
> - interrupts = <0 IRQ_TYPE_LEVEL_LOW>;
> - interrupt-controller;
> - #interrupt-cells = <1>;
> -
> - axp_gpio: gpio {
> - compatible = "x-powers,axp209-gpio";
> - gpio-controller;
> - #gpio-cells = <2>;
> - };
> -};
> diff --git a/Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt 
> b/Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt
> new file mode 100644
> index 000..a661130
> --- /dev/null
> +++ b/Documentation/devicetree/bindings/pinctrl/pinctrl-axp209.txt
> @@ -0,0 +1,30 @@
> +AXP209 GPIO controller
> +
> +This driver follows the usual GPIO bindings found in
> +Documentation/devicetree/bindings/gpio/gpio.txt
> +
> +Required properties:
> +- compatible: Should be "x-powers,axp209-gpio"
> +- #gpio-cells: Should be two. The first cell is the pin number and the
> +  second is the GPIO flags.
> +- gpio-controller: Marks the device node as a GPIO controller.

Maybe there's some reason to move the driver, but the binding describes 
a gpio-controller, so it should remain in bindings/gpio/. Binding and 
driver directories aren't necessarily one to one.

Rob
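
For context on the two-cell specifier the binding describes, a hedged
consumer-side sketch; the "example" con_id and the device-tree property
it implies are invented for illustration:

#include <linux/err.h>
#include <linux/gpio/consumer.h>

static int sketch_get_pmic_gpio(struct device *dev)
{
	struct gpio_desc *gpiod;

	/* The "example" con_id implies a DT property such as
	 * example-gpios = <&axp_gpio 1 GPIO_ACTIVE_LOW>; -- the two
	 * cells being the pin number and the flags, matching the
	 * #gpio-cells = <2> declaration in the binding. */
	gpiod = devm_gpiod_get(dev, "example", GPIOD_OUT_LOW);
	if (IS_ERR(gpiod))
		return PTR_ERR(gpiod);

	/* PMIC GPIOs usually sit behind a sleeping bus (here I2C),
	 * hence the _cansleep accessor. */
	gpiod_set_value_cansleep(gpiod, 1);
	return 0;
}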


Re: [PATCH v4] Input: add support for the Samsung S6SY761 touchscreen

2017-10-05 Thread Rob Herring
On Tue, Sep 26, 2017 at 03:31:35PM +0900, Andi Shyti wrote:
> The S6SY761 touchscreen is a capacitive multi-touch controller
> for mobile use. It is connected via I2C at address 0x48.
> 
> This commit provides a basic version of the driver which can
> handle only initialization, touch events and power states.
> 
> The controller is driven by firmware which, in the version I
> currently have, doesn't provide all of the functionality
> mentioned in the datasheet.
> 
> Signed-off-by: Andi Shyti 
> ---
> Hi,
> 
> sorry for the mix-up with the previous patch. This one should be
> fine. Here's the changelog:
> 
> v3 - v4
>  - fixed a mismatch on the module name
> 
> v2 - v3
>  - added a sanity check on an unsigned value which could (however
>    unlikely) wrap around to a "negative" value
> 
>  - in the probe function the interrupt is requested after the
>    input device registration in order to avoid checking in the
>    interrupt handler whether the input device has been registered
>    (both this point and the previous one are sketched below)
> 
>  - removed the 'prev_pm_state' variable. Its original purpose
>    was to restore the state of the device when coming back from
>    sleep mode, but because patch v2 removed the low power mode,
>    the device now works in only two modes and 'prev_pm_state'
>    is no longer required.
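
A minimal sketch of the first two v2 -> v3 points above. Every
identifier here is invented for illustration (s6sy761_sketch_*, the
0x00 status register and its low-nibble event count are assumptions,
not the driver's actual layout):

#include <linux/i2c.h>
#include <linux/input.h>
#include <linux/interrupt.h>
#include <linux/slab.h>

struct s6sy761_sketch {
	struct i2c_client *client;
	struct input_dev *input;
};

static irqreturn_t s6sy761_sketch_irq(int irq, void *dev_id)
{
	struct s6sy761_sketch *sdata = dev_id;
	int ret;
	u8 left;

	/* Hypothetical status register: event count in the low nibble. */
	ret = i2c_smbus_read_byte_data(sdata->client, 0x00);
	if (ret < 0)
		return IRQ_HANDLED;

	left = ret & 0x0f;

	/* 'left' is unsigned: were it 0, 'left - 1' would wrap to 255
	 * instead of going negative, so bail out explicitly first. */
	if (!left)
		return IRQ_HANDLED;

	/* ... read and report 'left' touch events here ... */
	return IRQ_HANDLED;
}

static int s6sy761_sketch_probe(struct i2c_client *client,
				const struct i2c_device_id *id)
{
	struct s6sy761_sketch *sdata;
	int error;

	sdata = devm_kzalloc(&client->dev, sizeof(*sdata), GFP_KERNEL);
	if (!sdata)
		return -ENOMEM;
	sdata->client = client;

	sdata->input = devm_input_allocate_device(&client->dev);
	if (!sdata->input)
		return -ENOMEM;
	sdata->input->name = "s6sy761-sketch";
	sdata->input->id.bustype = BUS_I2C;

	/* Register the input device first ... */
	error = input_register_device(sdata->input);
	if (error)
		return error;

	/* ... and only then request the interrupt, so the handler can
	 * assume the input device already exists. */
	return devm_request_threaded_irq(&client->dev, client->irq,
					 NULL, s6sy761_sketch_irq,
					 IRQF_ONESHOT, "s6sy761-sketch",
					 sdata);
}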
> 
> v1 - v2
>  - remove the low power functionality as it doesn't bring any
>benefit
>  - use get_unaligned_be16 instead of the form 'a << 8 | b'
>  - use max_t instead of '? :'
>  - use managed 'devm_device_add_group()' (all three items are
>    sketched below)
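
A short sketch of the three v1 -> v2 items, again with invented
sketch_* names; the helpers themselves (get_unaligned_be16(), max_t(),
devm_device_add_group()) are the real kernel APIs the changelog refers
to:

#include <linux/device.h>
#include <linux/kernel.h>
#include <linux/sysfs.h>
#include <asm/unaligned.h>

/* get_unaligned_be16() replaces the open-coded big-endian read: */
static u16 sketch_be16(const u8 *buf)
{
	/* before: return (buf[0] << 8) | buf[1]; */
	return get_unaligned_be16(buf);
}

/* max_t() replaces the open-coded ternary with an explicit type: */
static u8 sketch_max(u8 a, u8 b)
{
	/* before: return a > b ? a : b; */
	return max_t(u8, a, b);
}

/* devm_device_add_group() registers a sysfs group whose removal is
 * handled automatically when the device is unbound, so no explicit
 * sysfs_remove_group() is needed in remove/error paths. The (empty)
 * group here is a placeholder. */
static struct attribute *sketch_attrs[] = {
	NULL,
};

static const struct attribute_group sketch_attr_group = {
	.attrs = sketch_attrs,
};

static int sketch_add_sysfs(struct device *dev)
{
	return devm_device_add_group(dev, &sketch_attr_group);
}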
> 
> Thanks,
> Andi
> 
>  .../bindings/input/touchscreen/samsung,s6sy761.txt |  34 ++

In the future, please send the bindings as a separate patch.

Otherwise,

Acked-by: Rob Herring 

>  drivers/input/touchscreen/Kconfig  |  11 +
>  drivers/input/touchscreen/Makefile |   1 +
>  drivers/input/touchscreen/s6sy761.c| 556 
> +
>  4 files changed, 602 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/input/touchscreen/samsung,s6sy761.txt
>  create mode 100644 drivers/input/touchscreen/s6sy761.c

