date:20200629

Peter Zijlstra wrote:

...

> -#define lockdep_assert_irqs_disabled()   do {\
> - WARN_ONCE(debug_locks && !current->lockdep_recursion && \
> -   current->hardirqs_enabled,\
> -   "IRQs not disabled as expected\n");   \
> - } while (0)

...

> +#define lockdep_assert_irqs_disabled()   \
> +do { \
> + WARN_ON_ONCE(debug_locks && this_cpu_read(hardirqs_enabled));   \
> +} while (0)

I think it would be nice to keep the "IRQs not disabled as expected"
message. It makes the lockdep splat much more readable.

This is similarly the case for the v3 lockdep preemption macros:

  https://lkml.kernel.org/r/20200630054452.3675847-5-a.darw...@linutronix.de

I did not add a message there, though, to stay in sync with the IRQ macros above.

Thanks,

--
Ahmed S. Darwish
Linutronix GmbH

RE: [PATCH v8 3/3] iommu/arm-smmu: Add global/context fault implementation hooks

2020-06-29 Thread Pritesh Raithatha

> Add global/context fault hooks to allow NVIDIA SMMU implementation handle
> faults across multiple SMMUs.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Pritesh Raithatha

Re: [PATCH] kdb: prevent possible null deref in kdb_msg_write

2020-06-29 Thread Sumit Garg

On Mon, 29 Jun 2020 at 21:07, Daniel Thompson wrote:
>
> On Mon, Jun 29, 2020 at 04:50:20PM +0200, Petr Mladek wrote:
> > On Mon 2020-06-29 16:59:24, Cengiz Can wrote:
> > > `kdb_msg_write` operates on a global `struct kgdb_io *` called
> > > `dbg_io_ops`.
> > >
> > > Although it is initialized in `debug_core.c`, there's a null check in
> > > `kdb_msg_write` which implies that it can be null whenever we dereference
> > > it in this function call.
> > >
> > > Coverity scanner caught this as CID 1465042.
> > >
> > > I have modified the function to bail out if `dbg_io_ops` is not properly
> > > initialized.
> > >
> > > Signed-off-by: Cengiz Can 
> > > ---
> > >  kernel/debug/kdb/kdb_io.c | 15 ---
> > >  1 file changed, 8 insertions(+), 7 deletions(-)
> > >
> > > diff --git a/kernel/debug/kdb/kdb_io.c b/kernel/debug/kdb/kdb_io.c
> > > index 683a799618ad..85e579812458 100644
> > > --- a/kernel/debug/kdb/kdb_io.c
> > > +++ b/kernel/debug/kdb/kdb_io.c
> > > @@ -549,14 +549,15 @@ static void kdb_msg_write(const char *msg, int 
> > > msg_len)
> > > if (msg_len == 0)
> > > return;
> > >
> > > -   if (dbg_io_ops) {
> > > -   const char *cp = msg;
> > > -   int len = msg_len;
> > > +   if (!dbg_io_ops)
> > > +   return;
> >
> > This looks wrong. The message should be printed to the consoles
> > even when dbg_io_ops is NULL. I mean that the for_each_console(c)
> > cycle should always get called.
> >
> > Well, the code really looks racy. dbg_io_ops is set under
> > kgdb_registration_lock. IMHO, it should also get accessed under this lock.
> >
> > It seems that the race is possible. kdb_msg_write() is called from
> > vkdb_printf(). This function is serialized on more CPUs using
> > kdb_printf_cpu lock. But it is not serialized with
> > kgdb_register_io_module() and kgdb_unregister_io_module() calls.
>
> We can't take the lock from the trap handler itself since we cannot
> have spinlocks contended between regular threads and the debug trap
> (which could be an NMI).
>
> Instead, the call to kgdb_unregister_callbacks() at the beginning
> of kgdb_unregister_io_module() should render kdb_msg_write()
> unreachable prior to dbg_io_ops becoming NULL.
>
> As it happens I am starting to believe there is a race in this area but
> the race is between register/unregister calls rather than against the
> trap handler (if there were register/unregister races then the trap
> handler would be a potential victim of the race though).
>
>
> > But I might miss something. dbg_io_ops is dereferenced on many other
> > locations without any check.
>
> There is already a paranoid "just in case there are bugs" check in
> kgdb_io_ready() so in any case I think the check in kdb_msg_write() can
> simply be removed.
>
> As I said in my other post, if dbg_io_ops were ever NULL then the
> system is completely hosed anyway: we can never receive the keystroke
> needed to leave the debugger... and may not be able to tell anybody
> why.
>
>
> > >
> > > -   while (len--) {
> > > -   dbg_io_ops->write_char(*cp);
> > > -   cp++;
> > > -   }
> > > +   const char *cp = msg;
> > > +   int len = msg_len;
> > > +
> > > +   while (len--) {
> > > +   dbg_io_ops->write_char(*cp);
> > > +   cp++;
> > > }
> > >
> > > for_each_console(c) {
> >
> > You probably got confused by this new code:
> >
> >   if (c == dbg_io_ops->cons)
> >   continue;
> >
> > It dereferences dbg_io_ops without NULL check. It should probably
> > get replaced by:
> >
> >   if (dbg_io_ops && c == dbg_io_ops->cons)
> >   continue;
> >
> > Daniel, Sumit, could you please shed some light on this?
>
> As above, I think the NULL check that confuses coverity can simply be
> removed.
>

+1

-Sumit

>
> Daniel.
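
For readers following the thread, here is a minimal userspace sketch of the guard Petr proposed for the console loop. It is not the kernel code: the `toy_` types and `toy_write_to_consoles()` are hypothetical stand-ins for `struct console`, `struct kgdb_io` and the `for_each_console()` loop, and the thread itself leans towards simply removing the NULL check instead.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel's struct console / struct kgdb_io. */
struct toy_console {
	const char *name;
	int writes;			/* how many messages this console received */
	struct toy_console *next;
};

struct toy_kgdb_io {
	struct toy_console *cons;	/* console already served by the debug I/O driver */
};

/* Models the global dbg_io_ops pointer; NULL means "no I/O module registered". */
static struct toy_kgdb_io *toy_dbg_io_ops;

/*
 * Write one message to every console except the one owned by the debug
 * I/O driver. The NULL test comes first, so the pointer is never
 * dereferenced when no I/O module is registered.
 */
static int toy_write_to_consoles(struct toy_console *head)
{
	int written = 0;

	for (struct toy_console *c = head; c; c = c->next) {
		if (toy_dbg_io_ops && c == toy_dbg_io_ops->cons)
			continue;
		c->writes++;
		written++;
	}
	return written;
}
```

With `toy_dbg_io_ops` set to NULL every console is written to; once it points at a driver, only that driver's console is skipped.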

RE: [PATCH v8 1/3] iommu/arm-smmu: add NVIDIA implementation for dual ARM MMU-500 usage

2020-06-29 Thread Pritesh Raithatha

> NVIDIA's Tegra194 SoC uses two ARM MMU-500s together to interleave IOVA
> accesses across them.
> Add NVIDIA implementation for dual ARM MMU-500s and add new compatible
> string for Tegra194 SoC SMMU topology.
> 
> Signed-off-by: Krishna Reddy 

Reviewed-by: Pritesh Raithatha

Re: [PATCH 2/2] can: flexcan: add support for ISO CAN-FD

2020-06-29 Thread Michael Walle


[+ Oliver]

Hi Joakim,

On 2020-06-30 04:42, Joakim Zhang wrote:

-----Original Message-----
From: Michael Walle 
Sent: June 30, 2020 2:18
To: linux-...@vger.kernel.org; net...@vger.kernel.org;
linux-kernel@vger.kernel.org
Cc: Wolfgang Grandegger ; Marc Kleine-Budde
; David S . Miller ; Jakub
Kicinski ; Joakim Zhang ;
dl-linux-imx ; Michael Walle 
Subject: [PATCH 2/2] can: flexcan: add support for ISO CAN-FD

Up until now, the controller used non-ISO CAN-FD mode, although it
supports ISO mode. Add support for ISO mode, too. By default the hardware
is in non-ISO mode and an enable bit has to be explicitly set.

Signed-off-by: Michael Walle 
---
 drivers/net/can/flexcan.c | 19 ---
 1 file changed, 16 insertions(+), 3 deletions(-)

diff --git a/drivers/net/can/flexcan.c b/drivers/net/can/flexcan.c
index 183e094f8d66..a92d3cdf4195 100644
--- a/drivers/net/can/flexcan.c
+++ b/drivers/net/can/flexcan.c
@@ -94,6 +94,7 @@
 #define FLEXCAN_CTRL2_MRP  BIT(18)
 #define FLEXCAN_CTRL2_RRS  BIT(17)
 #define FLEXCAN_CTRL2_EACEN  BIT(16)
+#define FLEXCAN_CTRL2_ISOCANFDEN   BIT(12)

 /* FLEXCAN memory error control register (MECR) bits */
 #define FLEXCAN_MECR_ECRWRDIS  BIT(31)
@@ -1344,14 +1345,25 @@ static int flexcan_chip_start(struct net_device *dev)
else
reg_mcr |= FLEXCAN_MCR_SRX_DIS;

-   /* MCR - CAN-FD */
-   if (priv->can.ctrlmode & CAN_CTRLMODE_FD)
+   /* MCR, CTRL2
+*
+* CAN-FD mode
+* ISO CAN-FD mode
+*/
+   reg_ctrl2 = priv->read(&regs->ctrl2);
+   if (priv->can.ctrlmode & CAN_CTRLMODE_FD) {
reg_mcr |= FLEXCAN_MCR_FDEN;
-   else
+   reg_ctrl2 |= FLEXCAN_CTRL2_ISOCANFDEN;
+   } else {
reg_mcr &= ~FLEXCAN_MCR_FDEN;
+   }
+
+   if (priv->can.ctrlmode & CAN_CTRLMODE_FD_NON_ISO)
+   reg_ctrl2 &= ~FLEXCAN_CTRL2_ISOCANFDEN;





[..]

ip link set can0 up type can bitrate 100 dbitrate 500 fd on
ip link set can0 up type can bitrate 100 dbitrate 500 fd on \
   fd-non-iso on


vs.

ip link set can0 up type can bitrate 100 dbitrate 500 fd-non-iso on


I haven't found anything that says whether CAN_CTRLMODE_FD_NON_ISO
depends on CAN_CTRLMODE_FD, i.e. whether CAN_CTRLMODE_FD_NON_ISO can
only be set if CAN_CTRLMODE_FD is also set.

Only the following piece of code, which might be a hint that you
have to set CAN_CTRLMODE_FD if you want to use CAN_CTRLMODE_FD_NON_ISO:

drivers/net/can/dev.c:
  /* do not check for static fd-non-iso if 'fd' is disabled */
  if (!(maskedflags & CAN_CTRLMODE_FD))
  ctrlstatic &= ~CAN_CTRLMODE_FD_NON_ISO;

If CAN_CTRLMODE_FD_NON_ISO can be set without CAN_CTRLMODE_FD, what
should be the mode if both are set at the same time?

Marc? Oliver?

-michael
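
To make the question concrete, the bit logic of the hunk quoted above can be modelled in plain userspace C. This is only a sketch: `toy_flexcan_regs` and `toy_flexcan_apply_fd_mode()` are hypothetical names, the two `CAN_CTRLMODE_*` values and the `FLEXCAN_MCR_FDEN` bit position are assumed to match the kernel headers and are redefined locally, and only `FLEXCAN_CTRL2_ISOCANFDEN` is taken directly from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed to mirror include/uapi/linux/can/netlink.h; redefined for userspace. */
#define CAN_CTRLMODE_FD          0x20u
#define CAN_CTRLMODE_FD_NON_ISO  0x80u

/* FDEN position is an assumption; ISOCANFDEN is BIT(12) as in the patch. */
#define FLEXCAN_MCR_FDEN          (1u << 11)
#define FLEXCAN_CTRL2_ISOCANFDEN  (1u << 12)

/* Hypothetical container for the two register values the hunk touches. */
struct toy_flexcan_regs {
	uint32_t mcr;
	uint32_t ctrl2;
};

/* Same branch structure as the quoted hunk, applied to plain integers. */
static struct toy_flexcan_regs
toy_flexcan_apply_fd_mode(struct toy_flexcan_regs regs, uint32_t ctrlmode)
{
	if (ctrlmode & CAN_CTRLMODE_FD) {
		regs.mcr |= FLEXCAN_MCR_FDEN;
		regs.ctrl2 |= FLEXCAN_CTRL2_ISOCANFDEN;
	} else {
		regs.mcr &= ~FLEXCAN_MCR_FDEN;
	}

	/* Evaluated last, so non-ISO wins when both flags are set. */
	if (ctrlmode & CAN_CTRLMODE_FD_NON_ISO)
		regs.ctrl2 &= ~FLEXCAN_CTRL2_ISOCANFDEN;

	return regs;
}
```

Under these assumptions, "fd on fd-non-iso on" ends up with FD enabled and the ISO bit cleared, while "fd-non-iso on" alone leaves FD disabled. Note also that with neither flag set, a previously set ISO bit is left untouched by this logic.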

[PATCH v3 15/20] raid5: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 drivers/md/raid5.c | 2 +-
 drivers/md/raid5.h | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/md/raid5.c b/drivers/md/raid5.c
index ab8067f9ce8c..892aefe88fa7 100644
--- a/drivers/md/raid5.c
+++ b/drivers/md/raid5.c
@@ -6935,7 +6935,7 @@ static struct r5conf *setup_conf(struct mddev *mddev)
} else
goto abort;
spin_lock_init(&conf->device_lock);
-   seqcount_init(&conf->gen_lock);
+   seqcount_spinlock_init(&conf->gen_lock, &conf->device_lock);
mutex_init(&conf->cache_size_mutex);
init_waitqueue_head(&conf->wait_for_quiescent);
init_waitqueue_head(&conf->wait_for_stripe);
diff --git a/drivers/md/raid5.h b/drivers/md/raid5.h
index f90e0704bed9..a2c9e9e9f5ac 100644
--- a/drivers/md/raid5.h
+++ b/drivers/md/raid5.h
@@ -589,7 +589,7 @@ struct r5conf {
int prev_chunk_sectors;
int prev_algo;
short   generation; /* increments with every reshape */
-   seqcount_t  gen_lock;   /* lock against generation changes */
+   seqcount_spinlock_t gen_lock;   /* lock against generation changes */
unsigned long   reshape_checkpoint; /* Time we last updated
 * metadata */
long long   min_offset_diff; /* minimum difference between
-- 
2.20.1
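
The idea behind a sequence counter with an associated lock can be sketched in userspace C as follows. This is a toy model, not the kernel's `seqcount_spinlock_t`: every `toy_` name is hypothetical, the lock is a bare atomic flag, and the "is the lock held" test only checks that somebody holds it, whereas lockdep checks that the current context does.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Toy spinlock: a single flag, enough to model "held" vs "not held". */
struct toy_lock {
	atomic_bool held;
};

/* Sequence counter that remembers which lock serializes its writers. */
struct toy_seqcount_locked {
	atomic_uint sequence;
	struct toy_lock *lock;
};

static void toy_lock_init(struct toy_lock *l)
{
	atomic_init(&l->held, false);
}

static void toy_lock_acquire(struct toy_lock *l)
{
	bool expected = false;

	/* Spin until the flag flips from false to true. */
	while (!atomic_compare_exchange_weak(&l->held, &expected, true))
		expected = false;
}

static void toy_lock_release(struct toy_lock *l)
{
	atomic_store(&l->held, false);
}

static void toy_seqcount_init(struct toy_seqcount_locked *s, struct toy_lock *l)
{
	atomic_init(&s->sequence, 0);
	s->lock = l;
}

/*
 * Enter the write side. Returns false, without touching the counter,
 * when the associated lock is not held; this is the point where the
 * kernel's lockdep check would warn.
 */
static bool toy_write_begin(struct toy_seqcount_locked *s)
{
	if (!atomic_load(&s->lock->held))
		return false;
	atomic_fetch_add(&s->sequence, 1);	/* odd: write in progress */
	return true;
}

static void toy_write_end(struct toy_seqcount_locked *s)
{
	atomic_fetch_add(&s->sequence, 1);	/* even: write finished */
}
```

A plain counter cannot perform the check in `toy_write_begin()` because it does not know which lock to look at; storing the pointer at init time is what makes the verification possible.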

[PATCH v3 20/20] hrtimer: Use sequence counter with associated raw spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows to associate
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 include/linux/hrtimer.h |  2 +-
 kernel/time/hrtimer.c   | 13 ++---
 2 files changed, 11 insertions(+), 4 deletions(-)

diff --git a/include/linux/hrtimer.h b/include/linux/hrtimer.h
index 15c8ac313678..25993b86ac5c 100644
--- a/include/linux/hrtimer.h
+++ b/include/linux/hrtimer.h
@@ -159,7 +159,7 @@ struct hrtimer_clock_base {
struct hrtimer_cpu_base *cpu_base;
unsigned int index;
clockid_t   clockid;
-   seqcount_t  seq;
+   seqcount_raw_spinlock_t seq;
struct hrtimer  *running;
struct timerqueue_head  active;
ktime_t (*get_time)(void);
diff --git a/kernel/time/hrtimer.c b/kernel/time/hrtimer.c
index d89da1c7e005..c4038511d5c9 100644
--- a/kernel/time/hrtimer.c
+++ b/kernel/time/hrtimer.c
@@ -135,7 +135,11 @@ static const int hrtimer_clock_to_base_table[MAX_CLOCKS] = {
  * timer->base->cpu_base
  */
 static struct hrtimer_cpu_base migration_cpu_base = {
-   .clock_base = { { .cpu_base = &migration_cpu_base, }, },
+   .clock_base = { {
+   .cpu_base = &migration_cpu_base,
+   .seq  = SEQCNT_RAW_SPINLOCK_ZERO(migration_cpu_base.seq,
+&migration_cpu_base.lock),
+   }, },
 };
 
 #define migration_base migration_cpu_base.clock_base[0]
@@ -1998,8 +2002,11 @@ int hrtimers_prepare_cpu(unsigned int cpu)
int i;
 
for (i = 0; i < HRTIMER_MAX_CLOCK_BASES; i++) {
-   cpu_base->clock_base[i].cpu_base = cpu_base;
-   timerqueue_init_head(&cpu_base->clock_base[i].active);
+   struct hrtimer_clock_base *clock_b = &cpu_base->clock_base[i];
+
+   clock_b->cpu_base = cpu_base;
+   seqcount_raw_spinlock_init(&clock_b->seq, &cpu_base->lock);
+   timerqueue_init_head(&clock_b->active);
}
 
cpu_base->cpu = cpu;
-- 
2.20.1

[PATCH v3 17/20] NFSv4: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 fs/nfs/nfs4_fs.h   | 2 +-
 fs/nfs/nfs4state.c | 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/nfs/nfs4_fs.h b/fs/nfs/nfs4_fs.h
index 2b7f6dcd2eb8..210e590e1f71 100644
--- a/fs/nfs/nfs4_fs.h
+++ b/fs/nfs/nfs4_fs.h
@@ -117,7 +117,7 @@ struct nfs4_state_owner {
unsigned long so_flags;
struct list_head so_states;
struct nfs_seqid_counter so_seqid;
-   seqcount_t   so_reclaim_seqcount;
+   seqcount_spinlock_t  so_reclaim_seqcount;
struct mutex so_delegreturn_mutex;
 };
 
diff --git a/fs/nfs/nfs4state.c b/fs/nfs/nfs4state.c
index a8dc25ce48bb..b1dba24918f8 100644
--- a/fs/nfs/nfs4state.c
+++ b/fs/nfs/nfs4state.c
@@ -509,7 +509,7 @@ nfs4_alloc_state_owner(struct nfs_server *server,
nfs4_init_seqid_counter(&sp->so_seqid);
atomic_set(&sp->so_count, 1);
INIT_LIST_HEAD(&sp->so_lru);
-   seqcount_init(&sp->so_reclaim_seqcount);
+   seqcount_spinlock_init(&sp->so_reclaim_seqcount, &sp->so_lock);
mutex_init(&sp->so_delegreturn_mutex);
return sp;
 }
-- 
2.20.1

[PATCH v3 12/20] xfrm: policy: Use sequence counters with associated lock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

A plain seqcount_t does not contain the information of which lock must
be held when entering a write side critical section.

Use the new seqcount_spinlock_t and seqcount_mutex_t data types instead,
which allow to associate a lock with the sequence counter. This enables
lockdep to verify that the lock used for writer serialization is held
when the write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 net/xfrm/xfrm_policy.c | 10 +-
 1 file changed, 5 insertions(+), 5 deletions(-)

diff --git a/net/xfrm/xfrm_policy.c b/net/xfrm/xfrm_policy.c
index 564aa6492e7c..732a940468b0 100644
--- a/net/xfrm/xfrm_policy.c
+++ b/net/xfrm/xfrm_policy.c
@@ -122,7 +122,7 @@ struct xfrm_pol_inexact_bin {
/* list containing '*:*' policies */
struct hlist_head hhead;
 
-   seqcount_t count;
+   seqcount_spinlock_t count;
/* tree sorted by daddr/prefix */
struct rb_root root_d;
 
@@ -155,7 +155,7 @@ static struct xfrm_policy_afinfo const __rcu *xfrm_policy_afinfo[AF_INET6 + 1]
__read_mostly;
 
 static struct kmem_cache *xfrm_dst_cache __ro_after_init;
-static __read_mostly seqcount_t xfrm_policy_hash_generation;
+static __read_mostly seqcount_mutex_t xfrm_policy_hash_generation;
 
 static struct rhashtable xfrm_policy_inexact_table;
 static const struct rhashtable_params xfrm_pol_inexact_params;
@@ -719,7 +719,7 @@ xfrm_policy_inexact_alloc_bin(const struct xfrm_policy *pol, u8 dir)
INIT_HLIST_HEAD(&bin->hhead);
bin->root_d = RB_ROOT;
bin->root_s = RB_ROOT;
-   seqcount_init(&bin->count);
+   seqcount_spinlock_init(&bin->count, &net->xfrm.xfrm_policy_lock);
 
prev = rhashtable_lookup_get_insert_key(&xfrm_policy_inexact_table,
&bin->k, &bin->head,
@@ -1906,7 +1906,7 @@ static int xfrm_policy_match(const struct xfrm_policy *pol,
 
 static struct xfrm_pol_inexact_node *
 xfrm_policy_lookup_inexact_addr(const struct rb_root *r,
-   seqcount_t *count,
+   seqcount_spinlock_t *count,
const xfrm_address_t *addr, u16 family)
 {
const struct rb_node *parent;
@@ -4153,7 +4153,7 @@ void __init xfrm_init(void)
 {
register_pernet_subsys(&xfrm_net_ops);
xfrm_dev_init();
-   seqcount_init(&xfrm_policy_hash_generation);
+   seqcount_mutex_init(&xfrm_policy_hash_generation, &hash_resize_mutex);
xfrm_input_init();
 
 #ifdef CONFIG_INET_ESPINTCP
-- 
2.20.1

[PATCH v3 13/20] timekeeping: Use sequence counter with associated raw spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_raw_spinlock_t data type, which allows to associate
a raw spinlock with the sequence counter. This enables lockdep to verify
that the raw spinlock used for writer serialization is held when the
write side critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 kernel/time/timekeeping.c | 19 +++
 1 file changed, 11 insertions(+), 8 deletions(-)

diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c
index d20d489841c8..05ecfd8a3314 100644
--- a/kernel/time/timekeeping.c
+++ b/kernel/time/timekeeping.c
@@ -39,18 +39,19 @@ enum timekeeping_adv_mode {
TK_ADV_FREQ
 };
 
+static DEFINE_RAW_SPINLOCK(timekeeper_lock);
+
 /*
  * The most important data for readout fits into a single 64 byte
  * cache line.
  */
 static struct {
-   seqcount_t  seq;
+   seqcount_raw_spinlock_t seq;
struct timekeeper   timekeeper;
 } tk_core cacheline_aligned = {
-   .seq = SEQCNT_ZERO(tk_core.seq),
+   .seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_core.seq, &timekeeper_lock),
 };
 
-static DEFINE_RAW_SPINLOCK(timekeeper_lock);
 static struct timekeeper shadow_timekeeper;
 
 /**
@@ -63,7 +64,7 @@ static struct timekeeper shadow_timekeeper;
  * See @update_fast_timekeeper() below.
  */
 struct tk_fast {
-   seqcount_t  seq;
+   seqcount_raw_spinlock_t seq;
struct tk_read_base base[2];
 };
 
@@ -80,11 +81,13 @@ static struct clocksource dummy_clock = {
 };
 
 static struct tk_fast tk_fast_mono cacheline_aligned = {
+   .seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_mono.seq, &timekeeper_lock),
.base[0] = { .clock = &dummy_clock, },
.base[1] = { .clock = &dummy_clock, },
 };
 
 static struct tk_fast tk_fast_raw  cacheline_aligned = {
+   .seq = SEQCNT_RAW_SPINLOCK_ZERO(tk_fast_raw.seq, &timekeeper_lock),
.base[0] = { .clock = &dummy_clock, },
.base[1] = { .clock = &dummy_clock, },
 };
@@ -157,7 +160,7 @@ static inline void tk_update_sleep_time(struct timekeeper *tk, ktime_t delta)
  * tk_clock_read - atomic clocksource read() helper
  *
  * This helper is necessary to use in the read paths because, while the
- * seqlock ensures we don't return a bad value while structures are updated,
+ * seqcount ensures we don't return a bad value while structures are updated,
  * it doesn't protect from potential crashes. There is the possibility that
  * the tkr's clocksource may change between the read reference, and the
  * clock reference passed to the read function.  This can cause crashes if
@@ -222,10 +225,10 @@ static inline u64 timekeeping_get_delta(const struct tk_read_base *tkr)
unsigned int seq;
 
/*
-* Since we're called holding a seqlock, the data may shift
+* Since we're called holding a seqcount, the data may shift
 * under us while we're doing the calculation. This can cause
 * false positives, since we'd note a problem but throw the
-* results away. So nest another seqlock here to atomically
+* results away. So nest another seqcount here to atomically
 * grab the points we are checking with.
 */
do {
@@ -486,7 +489,7 @@ EXPORT_SYMBOL_GPL(ktime_get_raw_fast_ns);
  *
  * To keep it NMI safe since we're accessing from tracing, we're not using a
  * separate timekeeper with updates to monotonic clock and boot offset
- * protected with seqlocks. This has the following minor side effects:
+ * protected with seqcounts. This has the following minor side effects:
  *
  * (1) Its possible that a timestamp be taken after the boot offset is updated
  * but before the timekeeper is updated. If this happens, the new boot offset
-- 
2.20.1
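
The comments touched by this patch rely on the seqcount read side: a reader samples the counter, copies the data, and retries if a writer was active in between. A minimal userspace sketch of that protocol follows. It is not the timekeeping code; `toy_clock` and its helpers are hypothetical, and C11 sequentially consistent atomics stand in for the kernel's explicit barriers.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Two values that must be read as a consistent pair. */
struct toy_clock {
	atomic_uint seq;
	_Atomic uint64_t base_ns;
	_Atomic uint64_t offset_ns;
};

static void toy_clock_init(struct toy_clock *c, uint64_t base, uint64_t off)
{
	atomic_init(&c->seq, 0);
	atomic_init(&c->base_ns, base);
	atomic_init(&c->offset_ns, off);
}

/* Writer: callers are assumed to serialize updates with their own lock. */
static void toy_clock_update(struct toy_clock *c, uint64_t base, uint64_t off)
{
	atomic_fetch_add(&c->seq, 1);		/* odd: update in progress */
	atomic_store(&c->base_ns, base);
	atomic_store(&c->offset_ns, off);
	atomic_fetch_add(&c->seq, 1);		/* even: update complete */
}

/*
 * One read attempt. Returns false when the snapshot may be torn, either
 * because an update was in progress or because the counter moved while
 * the two fields were being read; the caller is expected to retry.
 */
static bool toy_clock_try_read(struct toy_clock *c, uint64_t *now_ns)
{
	unsigned int start = atomic_load(&c->seq);
	uint64_t sum;

	if (start & 1)
		return false;
	sum = atomic_load(&c->base_ns) + atomic_load(&c->offset_ns);
	if (atomic_load(&c->seq) != start)
		return false;
	*now_ns = sum;
	return true;
}
```

A reader would typically loop with `while (!toy_clock_try_read(&clk, &now))`, which is the userspace analogue of the kernel's begin/retry loop.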

[PATCH v3 11/20] netfilter: nft_set_rbtree: Use sequence counter with associated rwlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_rwlock_t data type, which allows to associate a
rwlock with the sequence counter. This enables lockdep to verify that
the rwlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 net/netfilter/nft_set_rbtree.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/net/netfilter/nft_set_rbtree.c b/net/netfilter/nft_set_rbtree.c
index 62f416bc0579..9f58261ee4c7 100644
--- a/net/netfilter/nft_set_rbtree.c
+++ b/net/netfilter/nft_set_rbtree.c
@@ -18,7 +18,7 @@
 struct nft_rbtree {
struct rb_root  root;
rwlock_t lock;
-   seqcount_t  count;
+   seqcount_rwlock_t   count;
struct delayed_work gc_work;
 };
 
@@ -516,7 +516,7 @@ static int nft_rbtree_init(const struct nft_set *set,
struct nft_rbtree *priv = nft_set_priv(set);
 
rwlock_init(&priv->lock);
-   seqcount_init(&priv->count);
+   seqcount_rwlock_init(&priv->count, &priv->lock);
priv->root = RB_ROOT;

INIT_DEFERRABLE_WORK(&priv->gc_work, nft_rbtree_gc);
-- 
2.20.1

[PATCH v3 16/20] iocost: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 block/blk-iocost.c | 5 ++---
 1 file changed, 2 insertions(+), 3 deletions(-)

diff --git a/block/blk-iocost.c b/block/blk-iocost.c
index 8ac4aad66ebc..8e940c27c27c 100644
--- a/block/blk-iocost.c
+++ b/block/blk-iocost.c
@@ -406,7 +406,7 @@ struct ioc {
enum ioc_running running;
atomic64_t  vtime_rate;
 
-   seqcount_t  period_seqcount;
+   seqcount_spinlock_t period_seqcount;
u32 period_at;  /* wallclock starttime */
u64 period_at_vtime; /* vtime starttime */
 
@@ -873,7 +873,6 @@ static void ioc_now(struct ioc *ioc, struct ioc_now *now)
 
 static void ioc_start_period(struct ioc *ioc, struct ioc_now *now)
 {
-   lockdep_assert_held(&ioc->lock);
WARN_ON_ONCE(ioc->running != IOC_RUNNING);
 
write_seqcount_begin(&ioc->period_seqcount);
@@ -2001,7 +2000,7 @@ static int blk_iocost_init(struct request_queue *q)
 
ioc->running = IOC_IDLE;
atomic64_set(&ioc->vtime_rate, VTIME_PER_USEC);
-   seqcount_init(&ioc->period_seqcount);
+   seqcount_spinlock_init(&ioc->period_seqcount, &ioc->lock);
ioc->period_at = ktime_to_us(ktime_get());
atomic64_set(&ioc->cur_period, 0);
atomic_set(&ioc->hweight_gen, 0);
-- 
2.20.1

[PATCH v3 19/20] kvm/eventfd: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
Acked-by: Paolo Bonzini 
---
 include/linux/kvm_irqfd.h | 2 +-
 virt/kvm/eventfd.c| 2 +-
 2 files changed, 2 insertions(+), 2 deletions(-)

diff --git a/include/linux/kvm_irqfd.h b/include/linux/kvm_irqfd.h
index dc1da020305b..dac047abdba7 100644
--- a/include/linux/kvm_irqfd.h
+++ b/include/linux/kvm_irqfd.h
@@ -42,7 +42,7 @@ struct kvm_kernel_irqfd {
wait_queue_entry_t wait;
/* Update side is protected by irqfds.lock */
struct kvm_kernel_irq_routing_entry irq_entry;
-   seqcount_t irq_entry_sc;
+   seqcount_spinlock_t irq_entry_sc;
/* Used for level IRQ fast-path */
int gsi;
struct work_struct inject;
diff --git a/virt/kvm/eventfd.c b/virt/kvm/eventfd.c
index ef7ed916ad4a..d6408bb497dc 100644
--- a/virt/kvm/eventfd.c
+++ b/virt/kvm/eventfd.c
@@ -303,7 +303,7 @@ kvm_irqfd_assign(struct kvm *kvm, struct kvm_irqfd *args)
INIT_LIST_HEAD(&irqfd->list);
INIT_WORK(&irqfd->inject, irqfd_inject);
INIT_WORK(&irqfd->shutdown, irqfd_shutdown);
-   seqcount_init(&irqfd->irq_entry_sc);
+   seqcount_spinlock_init(&irqfd->irq_entry_sc, &kvm->irqfds.lock);
 
f = fdget(args->fd);
if (!f.file) {
-- 
2.20.1

[PATCH v3 18/20] userfaultfd: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 fs/userfaultfd.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 52de29000c7e..26e8b23594fb 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -61,7 +61,7 @@ struct userfaultfd_ctx {
/* waitqueue head for events */
wait_queue_head_t event_wqh;
/* a refile sequence protected by fault_pending_wqh lock */
-   struct seqcount refile_seq;
+   seqcount_spinlock_t refile_seq;
/* pseudo fd refcounting */
refcount_t refcount;
/* userfaultfd syscall flags */
@@ -1998,7 +1998,7 @@ static void init_once_userfaultfd_ctx(void *mem)
init_waitqueue_head(&ctx->fault_wqh);
init_waitqueue_head(&ctx->event_wqh);
init_waitqueue_head(&ctx->fd_wqh);
-   seqcount_init(&ctx->refile_seq);
+   seqcount_spinlock_init(&ctx->refile_seq, &ctx->fault_pending_wqh.lock);
 }
 
 SYSCALL_DEFINE1(userfaultfd, int, flags)
-- 
2.20.1

[PATCH v3 14/20] vfs: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows to associate a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 fs/dcache.c   | 2 +-
 fs/fs_struct.c| 4 ++--
 include/linux/dcache.h| 2 +-
 include/linux/fs_struct.h | 2 +-
 4 files changed, 5 insertions(+), 5 deletions(-)

diff --git a/fs/dcache.c b/fs/dcache.c
index 361ea7ab30ea..ea0485861d93 100644
--- a/fs/dcache.c
+++ b/fs/dcache.c
@@ -1746,7 +1746,7 @@ static struct dentry *__d_alloc(struct super_block *sb, const struct qstr *name)
dentry->d_lockref.count = 1;
dentry->d_flags = 0;
spin_lock_init(&dentry->d_lock);
-   seqcount_init(&dentry->d_seq);
+   seqcount_spinlock_init(&dentry->d_seq, &dentry->d_lock);
dentry->d_inode = NULL;
dentry->d_parent = dentry;
dentry->d_sb = sb;
diff --git a/fs/fs_struct.c b/fs/fs_struct.c
index ca639ed967b7..04b3f5b9c629 100644
--- a/fs/fs_struct.c
+++ b/fs/fs_struct.c
@@ -117,7 +117,7 @@ struct fs_struct *copy_fs_struct(struct fs_struct *old)
fs->users = 1;
fs->in_exec = 0;
spin_lock_init(&fs->lock);
-   seqcount_init(&fs->seq);
+   seqcount_spinlock_init(&fs->seq, &fs->lock);
fs->umask = old->umask;
 
spin_lock(&old->lock);
@@ -163,6 +163,6 @@ EXPORT_SYMBOL(current_umask);
 struct fs_struct init_fs = {
.users  = 1,
.lock   = __SPIN_LOCK_UNLOCKED(init_fs.lock),
-   .seq= SEQCNT_ZERO(init_fs.seq),
+   .seq= SEQCNT_SPINLOCK_ZERO(init_fs.seq, &init_fs.lock),
.umask  = 0022,
 };
diff --git a/include/linux/dcache.h b/include/linux/dcache.h
index a81f0c3cf352..65d975bf9390 100644
--- a/include/linux/dcache.h
+++ b/include/linux/dcache.h
@@ -89,7 +89,7 @@ extern struct dentry_stat_t dentry_stat;
 struct dentry {
/* RCU lookup touched fields */
unsigned int d_flags;   /* protected by d_lock */
-   seqcount_t d_seq;   /* per dentry seqlock */
+   seqcount_spinlock_t d_seq;  /* per dentry seqlock */
struct hlist_bl_node d_hash;/* lookup hash list */
struct dentry *d_parent;/* parent directory */
struct qstr d_name;
diff --git a/include/linux/fs_struct.h b/include/linux/fs_struct.h
index cf1015abfbf2..783b48dedb72 100644
--- a/include/linux/fs_struct.h
+++ b/include/linux/fs_struct.h
@@ -9,7 +9,7 @@
 struct fs_struct {
int users;
spinlock_t lock;
-   seqcount_t seq;
+   seqcount_spinlock_t seq;
int umask;
int in_exec;
struct path root, pwd;
-- 
2.20.1

[PATCH v3 07/20] dma-buf: Remove custom seqcount lockdep class key

Commit 3c3b177a9369 ("reservation: add support for read-only access
using rcu") introduced a sequence counter to manage updates to
reservations. Back then, the reservation object initializer
reservation_object_init() was always inlined.

Having the sequence counter initialization inlined meant that each of
the call sites would have a different lockdep class key, which would've
broken lockdep's deadlock detection. The aforementioned commit thus
introduced, and exported, a custom seqcount lockdep class key and name.

The commit 8735f16803f00 ("dma-buf: cleanup reservation_object_init...")
transformed the reservation object initializer to a normal non-inlined C
function. seqcount_init(), which automatically defines the seqcount
lockdep class key and must be called non-inlined, can now be safely used.

Remove the seqcount custom lockdep class key, name, and export. Use
seqcount_init() inside the dma reservation object initializer.

Signed-off-by: Ahmed S. Darwish 
Reviewed-by: Sebastian Andrzej Siewior 
Acked-by: Daniel Vetter 
---
 drivers/dma-buf/dma-resv.c | 9 +
 include/linux/dma-resv.h   | 2 --
 2 files changed, 1 insertion(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index b45f8514dc82..15efa0c2dacb 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -51,12 +51,6 @@
 DEFINE_WD_CLASS(reservation_ww_class);
 EXPORT_SYMBOL(reservation_ww_class);
 
-struct lock_class_key reservation_seqcount_class;
-EXPORT_SYMBOL(reservation_seqcount_class);
-
-const char reservation_seqcount_string[] = "reservation_seqcount";
-EXPORT_SYMBOL(reservation_seqcount_string);
-
 /**
  * dma_resv_list_alloc - allocate fence list
  * @shared_max: number of fences we need space for
@@ -135,9 +129,8 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
ww_mutex_init(&obj->lock, &reservation_ww_class);
+   seqcount_init(&obj->seq);
 
-   __seqcount_init(&obj->seq, reservation_seqcount_string,
-   &reservation_seqcount_class);
RCU_INIT_POINTER(obj->fence, NULL);
RCU_INIT_POINTER(obj->fence_excl, NULL);
 }
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index ee50d10f052b..a6538ae7d93f 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -46,8 +46,6 @@
 #include 
 
 extern struct ww_class reservation_ww_class;
-extern struct lock_class_key reservation_seqcount_class;
-extern const char reservation_seqcount_string[];
 
 /**
  * struct dma_resv_list - a list of shared fences
-- 
2.20.1

[PATCH v3 05/20] seqlock: lockdep assert non-preemptibility on seqcount_t write

Preemption must be disabled before entering a sequence count write side
critical section.  Otherwise, the seqcount read side can preempt the
write side section and spin for the entire scheduler tick.  If that
reader belongs to a real-time scheduling class, it can spin forever and
the kernel will livelock.

Assert through lockdep that preemption is disabled for seqcount writers.
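
For illustration only, the checked/unchecked split this patch introduces can
be modelled in userspace as below. This is a minimal sketch, not the kernel
code: every model_* name is hypothetical, the thread-local counter merely
stands in for the kernel's preemption count, and assert() stands in for
lockdep_assert_preemption_disabled().

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical userspace stand-in for the kernel's preemption counter. */
static _Thread_local int model_preempt_count;

static void model_preempt_disable(void) { model_preempt_count++; }
static void model_preempt_enable(void)  { model_preempt_count--; }

struct model_seqcount {
	atomic_uint sequence;	/* odd while a write section is open */
};

/*
 * Unchecked variant: for callers that already guarantee
 * non-preemptibility (in the patch: the seqlock_t write side,
 * which holds a spinlock).
 */
static void model_write_begin_unchecked(struct model_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}

/* Checked variant: complain if the caller forgot to disable preemption. */
static void model_write_begin(struct model_seqcount *s)
{
	assert(model_preempt_count > 0);
	model_write_begin_unchecked(s);
}

static void model_write_end(struct model_seqcount *s)
{
	atomic_fetch_add(&s->sequence, 1);
}
```

A caller of the checked variant brackets the write section with
model_preempt_disable()/model_preempt_enable(); calling model_write_begin()
without doing so trips the assertion, which is the class of bug the lockdep
check is meant to catch.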

Signed-off-by: Ahmed S. Darwish 
---
 include/linux/seqlock.h | 27 ++-
 1 file changed, 22 insertions(+), 5 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index 057f7326a877..679c440b17fe 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -419,12 +419,29 @@ static inline void raw_write_seqcount_latch(seqcount_t *s)
smp_wmb();  /* increment "sequence" before following stores */
 }
 
-static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+static inline void __write_seqcount_begin_nested(seqcount_t *s, int subclass)
 {
 	raw_write_seqcount_begin(s);
 	seqcount_acquire(&s->dep_map, subclass, 0, _RET_IP_);
 }
 
+static inline void write_seqcount_begin_nested(seqcount_t *s, int subclass)
+{
+   lockdep_assert_preemption_disabled();
+   __write_seqcount_begin_nested(s, subclass);
+}
+
+/*
+ * write_seqcount_begin() without lockdep non-preemptibility checks.
+ *
+ * Use for internal seqlock.h code where it's known that preemption is
+ * already disabled. For example, seqlock_t write side functions.
+ */
+static inline void __write_seqcount_begin(seqcount_t *s)
+{
+   __write_seqcount_begin_nested(s, 0);
+}
+
 /**
  * write_seqcount_begin() - start a seqcount_t write-side critical section
  * @s: Pointer to  seqcount_t
@@ -563,7 +580,7 @@ static inline unsigned read_seqretry(const seqlock_t *sl, unsigned start)
 static inline void write_seqlock(seqlock_t *sl)
 {
 	spin_lock(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -591,7 +608,7 @@ static inline void write_sequnlock(seqlock_t *sl)
 static inline void write_seqlock_bh(seqlock_t *sl)
 {
 	spin_lock_bh(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -618,7 +635,7 @@ static inline void write_sequnlock_bh(seqlock_t *sl)
 static inline void write_seqlock_irq(seqlock_t *sl)
 {
 	spin_lock_irq(&sl->lock);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 }
 
 /**
@@ -639,7 +656,7 @@ static inline unsigned long __write_seqlock_irqsave(seqlock_t *sl)
 	unsigned long flags;
 
 	spin_lock_irqsave(&sl->lock, flags);
-	write_seqcount_begin(&sl->seqcount);
+	__write_seqcount_begin(&sl->seqcount);
 	return flags;
 }
 
-- 
2.20.1

[PATCH v3 09/20] sched: tasks: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.
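
A rough userspace sketch of that idea follows. It is an assumption-laden
model, not the seqcount_spinlock_t implementation: MODEL_LOCKDEP, the
model_* types and the "held" flag are all hypothetical, and the model is
single-threaded. It only shows how a lock pointer can exist, and be
checked, solely when the debugging option is compiled in.

```c
#include <assert.h>
#include <stdbool.h>

#define MODEL_LOCKDEP 1		/* set to 0 to compile the association out */

/* Single-threaded model of a lock: only records whether it is held. */
struct model_spinlock {
	bool held;
};

struct model_seqcount_spinlock {
	unsigned int sequence;
#if MODEL_LOCKDEP
	struct model_spinlock *lock;	/* stored only when checking is on */
#endif
};

static void model_seqcount_spinlock_init(struct model_seqcount_spinlock *s,
					 struct model_spinlock *lock)
{
	s->sequence = 0;
#if MODEL_LOCKDEP
	s->lock = lock;
#else
	(void)lock;			/* association compiled out */
#endif
}

static void model_write_seqcount_begin(struct model_seqcount_spinlock *s)
{
#if MODEL_LOCKDEP
	assert(s->lock->held);		/* models a "lock is held" assertion */
#endif
	s->sequence++;
}

static void model_write_seqcount_end(struct model_seqcount_spinlock *s)
{
	s->sequence++;
}
```

With MODEL_LOCKDEP set to 0 the structure shrinks to the bare counter and
the write path contains no check, which mirrors the "neither storage size
nor runtime overhead" property claimed above.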

Signed-off-by: Ahmed S. Darwish 
---
 include/linux/sched.h | 2 +-
 init/init_task.c  | 3 ++-
 kernel/fork.c | 2 +-
 3 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/linux/sched.h b/include/linux/sched.h
index 03403ca6e44e..1b4e6b8dc523 100644
--- a/include/linux/sched.h
+++ b/include/linux/sched.h
@@ -1055,7 +1055,7 @@ struct task_struct {
/* Protected by ->alloc_lock: */
nodemask_t  mems_allowed;
/* Seqence number to catch updates: */
-   seqcount_t  mems_allowed_seq;
+   seqcount_spinlock_t mems_allowed_seq;
int cpuset_mem_spread_rotor;
int cpuset_slab_spread_rotor;
 #endif
diff --git a/init/init_task.c b/init/init_task.c
index 15089d15010a..94fe3ba1bb60 100644
--- a/init/init_task.c
+++ b/init/init_task.c
@@ -154,7 +154,8 @@ struct task_struct init_task
.trc_holdout_list = LIST_HEAD_INIT(init_task.trc_holdout_list),
 #endif
 #ifdef CONFIG_CPUSETS
-   .mems_allowed_seq = SEQCNT_ZERO(init_task.mems_allowed_seq),
+   .mems_allowed_seq = SEQCNT_SPINLOCK_ZERO(init_task.mems_allowed_seq,
+						 &init_task.alloc_lock),
 #endif
 #ifdef CONFIG_RT_MUTEXES
.pi_waiters = RB_ROOT_CACHED,
diff --git a/kernel/fork.c b/kernel/fork.c
index f44d70307210..eb260c6bdb8b 100644
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -2032,7 +2032,7 @@ static __latent_entropy struct task_struct *copy_process(
 #ifdef CONFIG_CPUSETS
p->cpuset_mem_spread_rotor = NUMA_NO_NODE;
p->cpuset_slab_spread_rotor = NUMA_NO_NODE;
-	seqcount_init(&p->mems_allowed_seq);
+	seqcount_spinlock_init(&p->mems_allowed_seq, &p->alloc_lock);
 #endif
 #ifdef CONFIG_TRACE_IRQFLAGS
p->irq_events = 0;
-- 
2.20.1

[PATCH v3 10/20] netfilter: conntrack: Use sequence counter with associated spinlock

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. A plain seqcount_t does not
contain the information of which lock must be held when entering a write
side critical section.

Use the new seqcount_spinlock_t data type, which allows associating a
spinlock with the sequence counter. This enables lockdep to verify that
the spinlock used for writer serialization is held when the write side
critical section is entered.

If lockdep is disabled this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 include/net/netfilter/nf_conntrack.h | 2 +-
 net/netfilter/nf_conntrack_core.c| 5 +++--
 2 files changed, 4 insertions(+), 3 deletions(-)

diff --git a/include/net/netfilter/nf_conntrack.h b/include/net/netfilter/nf_conntrack.h
index 90690e37a56f..ea4e2010b246 100644
--- a/include/net/netfilter/nf_conntrack.h
+++ b/include/net/netfilter/nf_conntrack.h
@@ -286,7 +286,7 @@ int nf_conntrack_hash_resize(unsigned int hashsize);
 
 extern struct hlist_nulls_head *nf_conntrack_hash;
 extern unsigned int nf_conntrack_htable_size;
-extern seqcount_t nf_conntrack_generation;
+extern seqcount_spinlock_t nf_conntrack_generation;
 extern unsigned int nf_conntrack_max;
 
 /* must be called with rcu read lock held */
diff --git a/net/netfilter/nf_conntrack_core.c b/net/netfilter/nf_conntrack_core.c
index 79cd9dde457b..b8c54d390f93 100644
--- a/net/netfilter/nf_conntrack_core.c
+++ b/net/netfilter/nf_conntrack_core.c
@@ -180,7 +180,7 @@ EXPORT_SYMBOL_GPL(nf_conntrack_htable_size);
 
 unsigned int nf_conntrack_max __read_mostly;
 EXPORT_SYMBOL_GPL(nf_conntrack_max);
-seqcount_t nf_conntrack_generation __read_mostly;
+seqcount_spinlock_t nf_conntrack_generation __read_mostly;
 static unsigned int nf_conntrack_hash_rnd __read_mostly;
 
 static u32 hash_conntrack_raw(const struct nf_conntrack_tuple *tuple,
@@ -2598,7 +2598,8 @@ int nf_conntrack_init_start(void)
/* struct nf_ct_ext uses u8 to store offsets/size */
BUILD_BUG_ON(total_extension_size() > 255u);
 
-	seqcount_init(&nf_conntrack_generation);
+	seqcount_spinlock_init(&nf_conntrack_generation,
+			       &nf_conntrack_locks_all_lock);
 
 	for (i = 0; i < CONNTRACK_LOCKS; i++)
 		spin_lock_init(&nf_conntrack_locks[i]);
-- 
2.20.1

[PATCH v3 01/20] Documentation: locking: Describe seqlock design and usage

Proper documentation for the design and usage of sequence counters and
sequential locks does not exist. Complete the seqlock.h documentation as
follows:

  - Divide all documentation on a seqcount_t vs. seqlock_t basis. The
description for both mechanisms was intermingled, which is incorrect
since the usage constraints for each type are vastly different.

  - Add an introductory paragraph describing the internal design of, and
rationale for, sequence counters.

  - Document seqcount_t writer non-preemptibility requirement, which was
not previously documented anywhere, and provide a clear rationale.

  - Provide template code for seqcount_t and seqlock_t initialization
and reader/writer critical sections.

  - Recommend using seqlock_t by default. It implicitly handles the
serialization and non-preemptibility requirements of writers.
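
As a rough illustration of the reader/writer protocol that the new
documentation describes (even count: stable, odd count: update in
progress, reader retries on change), here is a portable C11 sketch. It is
not the kernel implementation: struct model_pair and the model_pair_*
helpers are hypothetical, and C11 atomics and fences are used in place of
the kernel's barriers. Writers are assumed to be serialized by the caller.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

struct model_pair {
	atomic_uint seq;	/* even: stable, odd: update in progress */
	_Atomic long a, b;	/* protected data; invariant: b == 2 * a */
};

/* Writer side. Callers must serialize writers externally. */
static void model_pair_write(struct model_pair *p, long a)
{
	unsigned int s = atomic_load_explicit(&p->seq, memory_order_relaxed);

	assert(!(s & 1));	/* no other write section may be open */
	atomic_store_explicit(&p->seq, s + 1, memory_order_relaxed);
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&p->a, a, memory_order_relaxed);
	atomic_store_explicit(&p->b, 2 * a, memory_order_relaxed);
	atomic_store_explicit(&p->seq, s + 2, memory_order_release);
}

/* One read attempt; returns false if the snapshot must be discarded. */
static bool model_pair_try_read(struct model_pair *p, long *a, long *b)
{
	unsigned int start = atomic_load_explicit(&p->seq, memory_order_acquire);

	/* Copy the data out inside the read section. */
	*a = atomic_load_explicit(&p->a, memory_order_relaxed);
	*b = atomic_load_explicit(&p->b, memory_order_relaxed);
	atomic_thread_fence(memory_order_acquire);

	return !(start & 1) &&
	       start == atomic_load_explicit(&p->seq, memory_order_relaxed);
}

/* Retry loop: returns only once a consistent snapshot was copied out. */
static void model_pair_read(struct model_pair *p, long *a, long *b)
{
	while (!model_pair_try_read(p, a, b))
		;
}
```

The reader never blocks the writer; it only pays with a retry when its
read section overlaps a write section.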

At seqlock.h:

  - Remove references to brlocks as they've long been removed from the
kernel.

  - Remove references to gcc-3.x since the kernel's minimum supported
gcc version is 4.6.

References: 0f6ed63b1707 ("no need to keep brlock macros anymore...")
References: cafa0010cd51 ("Raise the minimum required gcc version to 4.6")
Signed-off-by: Ahmed S. Darwish 
---
 Documentation/locking/index.rst   |   1 +
 Documentation/locking/seqlock.rst | 184 ++
 include/linux/seqlock.h   |  77 ++---
 3 files changed, 221 insertions(+), 41 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst

diff --git a/Documentation/locking/index.rst b/Documentation/locking/index.rst
index d785878cad65..7003bd5aeff4 100644
--- a/Documentation/locking/index.rst
+++ b/Documentation/locking/index.rst
@@ -14,6 +14,7 @@ locking
 mutex-design
 rt-mutex-design
 rt-mutex
+seqlock
 spinlocks
 ww-mutex-design
 preempt-locking
diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
new file mode 100644
index ..c9916efe038e
--- /dev/null
+++ b/Documentation/locking/seqlock.rst
@@ -0,0 +1,184 @@
+======================================
+Sequence counters and sequential locks
+======================================
+
+Introduction
+============
+
+Sequence counters are a reader-writer consistency mechanism with
+lockless readers (read-only retry loops), and no writer starvation. They
+are used for data that's rarely written to (e.g. system time), where the
+reader wants a consistent set of information and is willing to retry if
+that information changes.
+
+A data set is consistent when the sequence count at the beginning of the
+read side critical section is even and the same sequence count value is
+read again at the end of the critical section. The data in the set must
+be copied out inside the read side critical section. If the sequence
+count has changed between the start and the end of the critical section,
+the reader must retry.
+
+Writers increment the sequence count at the start and the end of their
+critical section. After starting the critical section the sequence count
+is odd and indicates to the readers that an update is in progress. At
+the end of the write side critical section the sequence count becomes
+even again which lets readers make progress.
+
+A sequence counter write side critical section must never be preempted
+or interrupted by read side sections. Otherwise the reader will spin for
+the entire scheduler tick due to the odd sequence count value and the
+interrupted writer. If that reader belongs to a real-time scheduling
+class, it can spin forever and the kernel will livelock.
+
+This mechanism cannot be used if the protected data contains pointers,
+as the writer can invalidate a pointer that the reader is following.
+
+.. _seqcount_t:
+
+Sequence counters (:c:type:`seqcount_t`)
+========================================
+
+This is the raw counting mechanism, which does not protect against
+multiple writers.  Write side critical sections must thus be serialized
+by an external lock.
+
+If the write serialization primitive is not implicitly disabling
+preemption, preemption must be explicitly disabled before entering the
+write side section. If the read section can be invoked from hardirq or
+softirq contexts, interrupts or bottom halves must also be respectively
+disabled before entering the write section.
+
+If it's desired to automatically handle the sequence counter
+requirements of writer serialization and non-preemptibility, use a
+:ref:`sequential lock <seqlock_t>` instead.
+
+Initialization:
+
+.. code-block:: c
+
+   /* dynamic */
+   seqcount_t foo_seqcount;
+   seqcount_init(&foo_seqcount);
+
+   /* static */
+   static seqcount_t foo_seqcount = SEQCNT_ZERO(foo_seqcount);
+
+   /* C99 struct init */
+   struct {
+   .seq   = SEQCNT_ZERO(foo.seq),
+   } foo;
+
+Write path:
+
+.. code-block:: c
+
+   /* Serialized context with disabled preemption */
+
+

[PATCH v3 04/20] lockdep: Add preemption enabled/disabled assertion APIs

Asserting that preemption is enabled or disabled is a critical sanity
check.  Developers are usually reluctant to add such a check in a
fastpath as reading the preemption count can be costly.

Extend the lockdep API with macros asserting that preemption is disabled
or enabled. If lockdep is disabled, or if the underlying architecture
does not support kernel preemption, this assert has no runtime overhead.
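
The "no runtime overhead when disabled" property can be sketched in plain
C as follows. This is a simplified model under stated assumptions:
MODEL_PROVE_LOCKING, MODEL_WARN_ON_ONCE() and the model_* variables are
hypothetical stand-ins for the kernel configuration option, WARN_ON_ONCE()
and the per-CPU state the real macros read.

```c
#include <stdbool.h>

#define MODEL_PROVE_LOCKING 1	/* set to 0 to compile the checks out */

/* Hypothetical stand-ins for kernel state. */
static int  model_preempt_count;
static bool model_hardirqs_enabled = true;
static bool model_debug_locks = true;
static int  model_warnings;	/* number of warnings emitted so far */

/* Warn at most once per call site, like WARN_ON_ONCE(). */
#define MODEL_WARN_ON_ONCE(cond) do {				\
	static bool warned;					\
	if ((cond) && !warned) {				\
		warned = true;					\
		model_warnings++;				\
	}							\
} while (0)

#if MODEL_PROVE_LOCKING
# define model_assert_preemption_enabled()			\
	MODEL_WARN_ON_ONCE(model_debug_locks &&			\
			   (model_preempt_count != 0 ||		\
			    !model_hardirqs_enabled))
# define model_assert_preemption_disabled()			\
	MODEL_WARN_ON_ONCE(model_debug_locks &&			\
			   (model_preempt_count == 0 &&		\
			    model_hardirqs_enabled))
#else
# define model_assert_preemption_enabled()  do { } while (0)
# define model_assert_preemption_disabled() do { } while (0)
#endif

/* Example call site that requires a non-preemptible context. */
static void model_writer_entry(void)
{
	model_assert_preemption_disabled();
}
```

When MODEL_PROVE_LOCKING is 0 both macros expand to an empty statement, so
a fast path can keep the assertion without paying for it in production
builds.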

References: f54bb2ec02c8 ("locking/lockdep: Add IRQs disabled/enabled assertion APIs: ...")
Signed-off-by: Ahmed S. Darwish 
---
 include/linux/lockdep.h | 18 ++
 lib/Kconfig.debug   |  1 +
 2 files changed, 19 insertions(+)

diff --git a/include/linux/lockdep.h b/include/linux/lockdep.h
index fd04b9e96091..53eff6b26fac 100644
--- a/include/linux/lockdep.h
+++ b/include/linux/lockdep.h
@@ -548,6 +548,22 @@ do {									\
 	WARN_ON_ONCE(debug_locks && !this_cpu_read(hardirq_context));	\
 } while (0)
 
+#define lockdep_assert_preemption_enabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() != 0		||		\
+		      !this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
+#define lockdep_assert_preemption_disabled()				\
+do {									\
+	WARN_ON_ONCE(IS_ENABLED(CONFIG_PREEMPT_COUNT)	&&		\
+		     debug_locks			&&		\
+		     (preempt_count() == 0		&&		\
+		      this_cpu_read(hardirqs_enabled)));		\
+} while (0)
+
 #else
 # define might_lock(lock) do { } while (0)
 # define might_lock_read(lock) do { } while (0)
@@ -556,6 +572,8 @@ do {									\
 # define lockdep_assert_irqs_enabled() do { } while (0)
 # define lockdep_assert_irqs_disabled() do { } while (0)
 # define lockdep_assert_in_irq() do { } while (0)
+# define lockdep_assert_preemption_enabled() do { } while (0)
+# define lockdep_assert_preemption_disabled() do { } while (0)
 #endif
 
 #ifdef CONFIG_PROVE_RAW_LOCK_NESTING
diff --git a/lib/Kconfig.debug b/lib/Kconfig.debug
index d74ac0fd6b2d..e5e2e632b749 100644
--- a/lib/Kconfig.debug
+++ b/lib/Kconfig.debug
@@ -1118,6 +1118,7 @@ config PROVE_LOCKING
select DEBUG_RWSEMS
select DEBUG_WW_MUTEX_SLOWPATH
select DEBUG_LOCK_ALLOC
+   select PREEMPT_COUNT if !ARCH_NO_PREEMPT
select TRACE_IRQFLAGS
default n
help
-- 
2.20.1

[PATCH v3 08/20] dma-buf: Use sequence counter with associated wound/wait mutex

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the sequence counter write side critical
section.

The dma-buf reservation subsystem uses plain sequence counters to manage
updates to reservations. Writer serialization is accomplished through a
wound/wait mutex.

Acquiring a wound/wait mutex does not disable preemption, so this needs
to be done manually before and after the write side critical section.

Use the newly-added seqcount_ww_mutex_t instead:

  - It associates the ww_mutex with the sequence count, which enables
lockdep to validate that the write side critical section is properly
serialized.

  - It removes the need to explicitly add preempt_disable/enable()
around the write side critical section because the write_begin/end()
functions for this new data type automatically do this.

If lockdep is disabled this ww_mutex lock association is compiled out
and has neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
Acked-by: Daniel Vetter 
---
 drivers/dma-buf/dma-resv.c   | 8 +---
 drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c | 2 --
 include/linux/dma-resv.h | 2 +-
 3 files changed, 2 insertions(+), 10 deletions(-)

diff --git a/drivers/dma-buf/dma-resv.c b/drivers/dma-buf/dma-resv.c
index 15efa0c2dacb..a7631352a486 100644
--- a/drivers/dma-buf/dma-resv.c
+++ b/drivers/dma-buf/dma-resv.c
@@ -129,7 +129,7 @@ subsys_initcall(dma_resv_lockdep);
 void dma_resv_init(struct dma_resv *obj)
 {
 	ww_mutex_init(&obj->lock, &reservation_ww_class);
-	seqcount_init(&obj->seq);
+	seqcount_ww_mutex_init(&obj->seq, &obj->lock);
 
 	RCU_INIT_POINTER(obj->fence, NULL);
 	RCU_INIT_POINTER(obj->fence_excl, NULL);
@@ -260,7 +260,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	fobj = dma_resv_get_list(obj);
 	count = fobj->shared_count;
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 
 	for (i = 0; i < count; ++i) {
@@ -282,7 +281,6 @@ void dma_resv_add_shared_fence(struct dma_resv *obj, struct dma_fence *fence)
 	smp_store_mb(fobj->shared_count, count);
 
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 	dma_fence_put(old);
 }
 EXPORT_SYMBOL(dma_resv_add_shared_fence);
@@ -309,14 +307,12 @@ void dma_resv_add_excl_fence(struct dma_resv *obj, struct dma_fence *fence)
 	if (fence)
 		dma_fence_get(fence);
 
-	preempt_disable();
 	write_seqcount_begin(&obj->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(obj->fence_excl, fence);
 	if (old)
 		old->shared_count = 0;
 	write_seqcount_end(&obj->seq);
-	preempt_enable();
 
 	/* inplace update, no shared fences */
 	while (i--)
@@ -394,13 +390,11 @@ int dma_resv_copy_fences(struct dma_resv *dst, struct dma_resv *src)
 	src_list = dma_resv_get_list(dst);
 	old = dma_resv_get_excl(dst);
 
-	preempt_disable();
 	write_seqcount_begin(&dst->seq);
 	/* write_seqcount_begin provides the necessary memory barrier */
 	RCU_INIT_POINTER(dst->fence_excl, new);
 	RCU_INIT_POINTER(dst->fence, dst_list);
 	write_seqcount_end(&dst->seq);
-	preempt_enable();
 
 	dma_resv_list_free(src_list);
 	dma_fence_put(old);
diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
index b91b5171270f..ff4b583cb96a 100644
--- a/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
+++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c
@@ -258,11 +258,9 @@ static int amdgpu_amdkfd_remove_eviction_fence(struct amdgpu_bo *bo,
 	new->shared_count = k;
 
 	/* Install the new fence list, seqcount provides the barriers */
-	preempt_disable();
 	write_seqcount_begin(&resv->seq);
 	RCU_INIT_POINTER(resv->fence, new);
 	write_seqcount_end(&resv->seq);
-	preempt_enable();
 
 	/* Drop the references to the removed fences or move them to ef_list */
 	for (i = j, k = 0; i < old->shared_count; ++i) {
diff --git a/include/linux/dma-resv.h b/include/linux/dma-resv.h
index a6538ae7d93f..d44a77e8a7e3 100644
--- a/include/linux/dma-resv.h
+++ b/include/linux/dma-resv.h
@@ -69,7 +69,7 @@ struct dma_resv_list {
  */
 struct dma_resv {
struct ww_mutex lock;
-   seqcount_t seq;
+   seqcount_ww_mutex_t seq;
 
struct dma_fence __rcu *fence_excl;
struct dma_resv_list __rcu *fence;
-- 
2.20.1

[PATCH v3 06/20] seqlock: Extend seqcount API with associated locks

A sequence counter write side critical section must be protected by some
form of locking to serialize writers. If the serialization primitive is
not disabling preemption implicitly, preemption has to be explicitly
disabled before entering the write side critical section.

There is no built-in debugging mechanism to verify that the lock used
for writer serialization is held and preemption is disabled. Some usage
sites like dma-buf have explicit lockdep checks for the writer-side
lock, but this covers only a small portion of the sequence counter usage
in the kernel.

Add new sequence counter types which allow associating a lock with the
sequence counter at initialization time. The seqcount API functions are
extended to provide appropriate lockdep assertions depending on the
seqcount/lock type.

For sequence counters with associated locks that do not implicitly
disable preemption, preemption protection is enforced in the sequence
counter write side functions. This removes the need to explicitly add
preempt_disable/enable() around the write side critical sections: the
write_begin/end() functions for these new sequence counter types
automatically do this.

Introduce the following seqcount types with associated locks:

 seqcount_spinlock_t
 seqcount_raw_spinlock_t
 seqcount_rwlock_t
 seqcount_mutex_t
 seqcount_ww_mutex_t

Extend the seqcount read and write functions to branch out to the
specific seqcount_LOCKTYPE_t implementation at compile-time. This avoids a
kernel API explosion for each new seqcount_LOCKTYPE_t added. Add such
compile-time type detection logic into a new, internal, seqlock header.
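
To make the compile-time branching idea concrete, here is one possible way
to express it in standard C11 using _Generic. This is only a sketch under
assumptions: the series' actual detection logic lives in
seqlock_types_internal.h, which is not shown in this mail and may work
differently. All model_* names and begin_* helpers are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded model of a lock: only records whether it is held. */
struct model_lock { bool held; };

typedef struct { unsigned int sequence; } model_seqcount_t;

/* Two distinct wrapper types, one per associated lock type. */
typedef struct {
	model_seqcount_t seqcount;
	struct model_lock *lock;
} model_seqcount_spinlock_t;

typedef struct {
	model_seqcount_t seqcount;
	struct model_lock *lock;
} model_seqcount_mutex_t;

static int model_preempt_count;	/* stand-in for the preemption count */

static void begin_plain(model_seqcount_t *s)
{
	s->sequence++;
}

static void begin_spinlock(model_seqcount_spinlock_t *s)
{
	assert(s->lock->held);	/* holding a spinlock already implies
				 * non-preemptibility */
	begin_plain(&s->seqcount);
}

static void begin_mutex(model_seqcount_mutex_t *s)
{
	assert(s->lock->held);
	model_preempt_count++;	/* a mutex does not disable preemption,
				 * so the write side does it */
	begin_plain(&s->seqcount);
}

/* One API name; the implementation is selected from the argument type. */
#define model_write_seqcount_begin(s) _Generic((s),		\
	model_seqcount_t *:          begin_plain,		\
	model_seqcount_spinlock_t *: begin_spinlock,		\
	model_seqcount_mutex_t *:    begin_mutex)(s)
```

Call sites keep using a single function name regardless of the counter
type, and passing an unsupported type fails at compile time rather than at
run time.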

Document the proper seqcount_LOCKTYPE_t usage, and rationale, at
Documentation/locking/seqlock.rst.

If lockdep is disabled, this lock association is compiled out and has
neither storage size nor runtime overhead.

Signed-off-by: Ahmed S. Darwish 
---
 Documentation/locking/seqlock.rst  |  64 -
 MAINTAINERS|   2 +-
 include/linux/seqlock.h| 372 -
 include/linux/seqlock_types_internal.h | 186 +
 4 files changed, 558 insertions(+), 66 deletions(-)
 create mode 100644 include/linux/seqlock_types_internal.h

diff --git a/Documentation/locking/seqlock.rst b/Documentation/locking/seqlock.rst
index c9916efe038e..2d526dc95408 100644
--- a/Documentation/locking/seqlock.rst
+++ b/Documentation/locking/seqlock.rst
@@ -48,9 +48,11 @@ write side section. If the read section can be invoked from hardirq or
 softirq contexts, interrupts or bottom halves must also be respectively
 disabled before entering the write section.
 
-If it's desired to automatically handle the sequence counter
-requirements of writer serialization and non-preemptibility, use a
-:ref:`sequential lock <seqlock_t>` instead.
+If the write serialization mechanism is one of the common kernel locking
+primitives, use :ref:`sequence counters with associated locks
+<seqcount_locktype_t>` instead. If it's desired to automatically handle
+the sequence counter writer serialization and non-preemptibility
+requirements, use a :ref:`sequential lock <seqlock_t>`.
 
 Initialization:
 
@@ -70,6 +72,7 @@ Initialization:
 
 Write path:
 
+.. _seqcount_write_ops:
 .. code-block:: c
 
/* Serialized context with disabled preemption */
@@ -82,6 +85,7 @@ Write path:
 
 Read path:
 
+.. _seqcount_read_ops:
 .. code-block:: c
 
do {
@@ -91,6 +95,60 @@ Read path:
 
 	} while (read_seqcount_retry(&foo_seqcount, seq));
 
+.. _seqcount_locktype_t:
+
+Sequence counters with associated locks (:c:type:`seqcount_LOCKTYPE_t`)
+-----------------------------------------------------------------------
+
+As :ref:`earlier discussed <seqcount_t>`, seqcount write side critical
+sections must be serialized and non-preemptible. This variant of
+sequence counters associates the lock used for writer serialization at
+the seqcount initialization time. This enables lockdep to validate that
+the write side critical section is properly serialized.
+
+This lock association is a NOOP if lockdep is disabled and has neither
+storage nor runtime overhead. If lockdep is enabled, the lock pointer is
+stored in struct seqcount and lockdep's "lock is held" assertions are
+injected at the beginning of the write side critical section to validate
+that it is properly protected.
+
+For lock types which do not implicitly disable preemption, preemption
+protection is enforced in the write side function.
+
+The following seqcounts with associated locks are defined:
+
+  - :c:type:`seqcount_spinlock_t`
+  - :c:type:`seqcount_raw_spinlock_t`
+  - :c:type:`seqcount_rwlock_t`
+  - :c:type:`seqcount_mutex_t`
+  - :c:type:`seqcount_ww_mutex_t`
+
+The plain seqcount read and write APIs branch out to the specific
+seqcount_LOCKTYPE_t implementation at compile-time. This avoids kernel
+API explosion per each new seqcount LOCKTYPE.
+
+Initialization (replace "LOCKTYPE" with one of the supported locks):
+
+.. code-block:: c
+
+   /* dynamic */
+

[PATCH v3 03/20] seqlock: Add missing kernel-doc annotations

A small number of the exported seqlock.h functions are kernel-doc
annotated.

Since seqlock.h is now included by the kernel's RST documentation, add
kernel-doc annotations for all of the remaining functions.

Signed-off-by: Ahmed S. Darwish 
---
 include/linux/seqlock.h | 398 ++--
 1 file changed, 339 insertions(+), 59 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index d3bba59eb4df..057f7326a877 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -75,6 +75,10 @@ static inline void __seqcount_init(seqcount_t *s, const char *name,
 # define SEQCOUNT_DEP_MAP_INIT(lockname) \
.dep_map = { .name = #lockname } \
 
+/**
+ * seqcount_init() - runtime initializer for seqcount_t
+ * @s: Pointer to the  seqcount_t instance
+ */
 # define seqcount_init(s)  \
do {\
static struct lock_class_key __key; \
@@ -98,13 +102,17 @@ static inline void seqcount_lockdep_reader_access(const seqcount_t *s)
 # define seqcount_lockdep_reader_access(x)
 #endif
 
-#define SEQCNT_ZERO(lockname) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(lockname)}
+/**
+ * SEQCNT_ZERO() - static initializer for seqcount_t
+ * @name: Name of the  seqcount_t instance
+ */
+#define SEQCNT_ZERO(name) { .sequence = 0, SEQCOUNT_DEP_MAP_INIT(name) }
 
 
 /**
- * __read_seqcount_begin - begin a seq-read critical section (without barrier)
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * __read_seqcount_begin() - begin a seqcount_t read section (without barrier)
+ * @s: Pointer to  seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * __read_seqcount_begin is like read_seqcount_begin, but has no smp_rmb()
  * barrier. Callers should ensure that smp_rmb() or equivalent ordering is
@@ -129,13 +137,14 @@ static inline unsigned __read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount - Read the raw seqcount
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount() - Read the seqcount_t raw counter value
+ * @s: Pointer to  seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount opens a read critical section of the given
- * seqcount without any lockdep checking and without checking or
- * masking the LSB. Calling code is responsible for handling that.
+ * seqcount_t, without any lockdep checks and without checking or
+ * masking the sequence counter LSB. Calling code is responsible for
+ * handling that.
  */
 static inline unsigned raw_read_seqcount(const seqcount_t *s)
 {
@@ -146,13 +155,13 @@ static inline unsigned raw_read_seqcount(const seqcount_t *s)
 }
 
 /**
- * raw_read_seqcount_begin - start seq-read critical section w/o lockdep
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * raw_read_seqcount_begin() - start a seqcount_t read section w/o lockdep
+ * @s: Pointer to  seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
  * raw_read_seqcount_begin opens a read critical section of the given
- * seqcount, but without any lockdep checking. Validity of the critical
- * section is tested by checking read_seqcount_retry function.
+ * seqcount_t, but without any lockdep checking. Validity of the read
+ * section must be checked with read_seqcount_retry().
  */
 static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 {
@@ -162,13 +171,13 @@ static inline unsigned raw_read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * read_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
- * Returns: count to be passed to read_seqcount_retry
+ * read_seqcount_begin() - start a seqcount_t read critical section
+ * @s: Pointer to  seqcount_t
+ * Returns: count to be passed to read_seqcount_retry()
  *
- * read_seqcount_begin opens a read critical section of the given seqcount.
- * Validity of the critical section is tested by checking read_seqcount_retry
- * function.
+ * read_seqcount_begin opens a read critical section of the given
+ * seqcount_t. Validity of the read section must be checked with
+ * read_seqcount_retry().
  */
 static inline unsigned read_seqcount_begin(const seqcount_t *s)
 {
@@ -177,11 +186,11 @@ static inline unsigned read_seqcount_begin(const seqcount_t *s)
 }
 
 /**
- * raw_seqcount_begin - begin a seq-read critical section
- * @s: pointer to seqcount_t
+ * raw_seqcount_begin() - begin a seq-read critical section
+ * @s: Pointer to  seqcount_t
  * Returns: count to be passed to read_seqcount_retry
  *
- * raw_seqcount_begin opens a read critical section of the given seqcount.
+ * raw_seqcount_begin opens a read critical section of the given seqcount_t.
  * Validity of the critical section is tested by checking read_seqcount_retry
  * function.
  *
@@ -199,8 +208,8 @@ static inline

[PATCH v3 00/20] seqlock: Extend seqcount API with associated locks

Hi,

This is v3 of the seqlock patch series:

   [PATCH v1 00/25] seqlock: Extend seqcount API with associated locks
   https://lore.kernel.org/lkml/20200519214547.352050-1-a.darw...@linutronix.de

   [PATCH v2 00/18]
   https://lore.kernel.org/lkml/20200608005729.1874024-1-a.darw...@linutronix.de

It's based over:

   git://git.kernel.org/pub/scm/linux/kernel/git/peterz/queue.git locking/core

to get Peter's lockdep irqstate tracking series below, which untangles
mainline seqlock.h<=>sched.h 'current->' task_struct circular dependency:

   https://lkml.kernel.org/r/linuxppc-dev/20200623083645.277342...@infradead.org

Changelog-v3:

 - Re-add lockdep non-preemptibility checks on seqcount_t write paths.
   They were removed from v2 due to the circular dependencies mentioned.

 - Slight rebase over the new v5.8-rc1 KCSAN seqlock.h changes

 - Collect seqcount_t call-sites acked-by tags

Thanks,

8<--

Ahmed S. Darwish (20):
  Documentation: locking: Describe seqlock design and usage
  seqlock: Properly format kernel-doc code samples
  seqlock: Add missing kernel-doc annotations
  lockdep: Add preemption enabled/disabled assertion APIs
  seqlock: lockdep assert non-preemptibility on seqcount_t write
  seqlock: Extend seqcount API with associated locks
  dma-buf: Remove custom seqcount lockdep class key
  dma-buf: Use sequence counter with associated wound/wait mutex
  sched: tasks: Use sequence counter with associated spinlock
  netfilter: conntrack: Use sequence counter with associated spinlock
  netfilter: nft_set_rbtree: Use sequence counter with associated rwlock
  xfrm: policy: Use sequence counters with associated lock
  timekeeping: Use sequence counter with associated raw spinlock
  vfs: Use sequence counter with associated spinlock
  raid5: Use sequence counter with associated spinlock
  iocost: Use sequence counter with associated spinlock
  NFSv4: Use sequence counter with associated spinlock
  userfaultfd: Use sequence counter with associated spinlock
  kvm/eventfd: Use sequence counter with associated spinlock
  hrtimer: Use sequence counter with associated raw spinlock

 Documentation/locking/index.rst   |   1 +
 Documentation/locking/seqlock.rst | 242 +
 MAINTAINERS   |   2 +-
 block/blk-iocost.c|   5 +-
 drivers/dma-buf/dma-resv.c|  15 +-
 .../gpu/drm/amd/amdgpu/amdgpu_amdkfd_gpuvm.c  |   2 -
 drivers/md/raid5.c|   2 +-
 drivers/md/raid5.h|   2 +-
 fs/dcache.c   |   2 +-
 fs/fs_struct.c|   4 +-
 fs/nfs/nfs4_fs.h  |   2 +-
 fs/nfs/nfs4state.c|   2 +-
 fs/userfaultfd.c  |   4 +-
 include/linux/dcache.h|   2 +-
 include/linux/dma-resv.h  |   4 +-
 include/linux/fs_struct.h |   2 +-
 include/linux/hrtimer.h   |   2 +-
 include/linux/kvm_irqfd.h |   2 +-
 include/linux/lockdep.h   |  18 +
 include/linux/sched.h |   2 +-
 include/linux/seqlock.h   | 872 ++
 include/linux/seqlock_types_internal.h| 186 
 include/net/netfilter/nf_conntrack.h  |   2 +-
 init/init_task.c  |   3 +-
 kernel/fork.c |   2 +-
 kernel/time/hrtimer.c |  13 +-
 kernel/time/timekeeping.c |  19 +-
 lib/Kconfig.debug |   1 +
 net/netfilter/nf_conntrack_core.c |   5 +-
 net/netfilter/nft_set_rbtree.c|   4 +-
 net/xfrm/xfrm_policy.c|  10 +-
 virt/kvm/eventfd.c|   2 +-
 32 files changed, 1211 insertions(+), 225 deletions(-)
 create mode 100644 Documentation/locking/seqlock.rst
 create mode 100644 include/linux/seqlock_types_internal.h

base-commit: 997e89fa345e9006f311cf9f9c8fd9f7d96c240f
--
2.20.1

[PATCH v3 02/20] seqlock: Properly format kernel-doc code samples

Align the code samples and note sections inside kernel-doc comments with
tabs. This way they can be properly parsed and rendered by Sphinx. It
also makes the code samples easier to read from text editors.

Signed-off-by: Ahmed S. Darwish 
---
 include/linux/seqlock.h | 82 +
 1 file changed, 43 insertions(+), 39 deletions(-)

diff --git a/include/linux/seqlock.h b/include/linux/seqlock.h
index e54ff48e87f8..d3bba59eb4df 100644
--- a/include/linux/seqlock.h
+++ b/include/linux/seqlock.h
@@ -256,7 +256,7 @@ static inline void raw_write_seqcount_end(seqcount_t *s)
  *
  * This can be used to provide an ordering guarantee instead of the
  * usual consistency guarantee. It is one wmb cheaper, because we can
- * collapse the two back-to-back wmb()s.
+ * collapse the two back-to-back wmb()s::
  *
  * Note that writes surrounding the barrier should be declared atomic (e.g.
  * via WRITE_ONCE): a) to ensure the writes become visible to other threads
@@ -325,64 +325,68 @@ static inline int raw_read_seqcount_latch(seqcount_t *s)
  * Very simply put: we first modify one copy and then the other. This ensures
  * there is always one copy in a stable state, ready to give us an answer.
  *
- * The basic form is a data structure like:
+ * The basic form is a data structure like::
  *
- * struct latch_struct {
- * seqcount_t  seq;
- * struct data_struct  data[2];
- * };
+ * struct latch_struct {
+ * seqcount_t  seq;
+ * struct data_struct  data[2];
+ * };
  *
  * Where a modification, which is assumed to be externally serialized, does the
- * following:
+ * following::
  *
- * void latch_modify(struct latch_struct *latch, ...)
- * {
- * smp_wmb();  <- Ensure that the last data[1] update is visible
- * latch->seq++;
- * smp_wmb();  <- Ensure that the seqcount update is visible
+ * void latch_modify(struct latch_struct *latch, ...)
+ * {
+ * smp_wmb();  // Ensure that the last data[1] update is visible
+ * latch->seq++;
+ * smp_wmb();  // Ensure that the seqcount update is visible
  *
- * modify(latch->data[0], ...);
+ * modify(latch->data[0], ...);
  *
- * smp_wmb();  <- Ensure that the data[0] update is visible
- * latch->seq++;
- * smp_wmb();  <- Ensure that the seqcount update is visible
+ * smp_wmb();  // Ensure that the data[0] update is visible
+ * latch->seq++;
+ * smp_wmb();  // Ensure that the seqcount update is visible
  *
- * modify(latch->data[1], ...);
- * }
+ * modify(latch->data[1], ...);
+ * }
  *
- * The query will have a form like:
+ * The query will have a form like::
  *
- * struct entry *latch_query(struct latch_struct *latch, ...)
- * {
- * struct entry *entry;
- * unsigned seq, idx;
+ * struct entry *latch_query(struct latch_struct *latch, ...)
+ * {
+ * struct entry *entry;
+ * unsigned seq, idx;
  *
- * do {
- * seq = raw_read_seqcount_latch(&latch->seq);
+ * do {
+ * seq = raw_read_seqcount_latch(&latch->seq);
  *
- * idx = seq & 0x01;
- * entry = data_query(latch->data[idx], ...);
+ * idx = seq & 0x01;
+ * entry = data_query(latch->data[idx], ...);
  *
- * smp_rmb();
- * } while (seq != latch->seq);
+ * smp_rmb();
+ * } while (seq != latch->seq);
  *
- * return entry;
- * }
+ * return entry;
+ * }
  *
  * So during the modification, queries are first redirected to data[1]. Then we
  * modify data[0]. When that is complete, we redirect queries back to data[0]
  * and we can modify data[1].
  *
- * NOTE: The non-requirement for atomic modifications does _NOT_ include
- *   the publishing of new entries in the case where data is a dynamic
- *   data structure.
+ * NOTE:
  *
- *   An iteration might start in data[0] and get suspended long enough
- *   to miss an entire modification sequence, once it resumes it might
- *   observe the new entry.
+ * The non-requirement for atomic modifications does _NOT_ include
+ * the publishing of new entries in the case where data is a dynamic
+ * data structure.
  *
- * NOTE: When data is a dynamic data structure; one should use regular RCU
- *   patterns to manage the lifetimes of the objects within.
+ * An iteration might start in data[0] and get suspended long enough
+ * to miss an entire modification sequence, once it resumes it might
+ * observe the new entry.
+ *
+ * NOTE:
+ *
+ * When data is a dynamic data structure; one should use regular RCU
+ * patterns to manage the lifetimes of the objects within.
  */
 static inline void raw_write_seqcount_latch(seqcount_t *s)
 {
-- 
2.20.1
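
For readers following the latch kernel-doc above, here is a minimal userspace
sketch of the same modify/query pattern. It is not the kernel implementation:
it assumes C11 atomics with the default sequentially consistent ordering in
place of smp_wmb()/smp_rmb(), keeps a single 64-bit value per copy, and the
names latch_value, latch_value_modify() and latch_value_query() are made up
for illustration. As in the kernel-doc, writers are assumed to be externally
serialized.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical userspace analogue of "struct latch_struct". */
struct latch_value {
	atomic_uint seq;		/* even: readers use data[0], odd: data[1] */
	_Atomic uint64_t data[2];	/* two copies of the protected value */
};

/* Writer side; callers must serialize writers themselves. */
static void latch_value_modify(struct latch_value *latch, uint64_t val)
{
	/* seq becomes odd: new readers are redirected to data[1] */
	atomic_fetch_add(&latch->seq, 1);
	atomic_store(&latch->data[0], val);

	/* seq becomes even: new readers are redirected back to data[0] */
	atomic_fetch_add(&latch->seq, 1);
	atomic_store(&latch->data[1], val);
}

/* Reader side; retries if a writer bumped seq during the read. */
static uint64_t latch_value_query(struct latch_value *latch)
{
	unsigned int seq, idx;
	uint64_t val;

	do {
		seq = atomic_load(&latch->seq);
		idx = seq & 0x01;
		val = atomic_load(&latch->data[idx]);
	} while (seq != atomic_load(&latch->seq));

	return val;
}
```

Sequentially consistent atomics are stronger (and slower) than the barriers
the kernel uses; they are chosen here only to keep the sketch short and free
of data races under the C11 memory model.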

Re: [PATCH v2 10/15] exec: Remove do_execve_file

2020-06-29 Thread Christoph Hellwig

FYI, this clashes badly with my exec rework.  I'd suggest you
drop everything touching exec here for now, and I can then
add the final file based exec removal to the end of my series.

Re: [PATCH] doc: cgroup: add f2fs and xfs to supported list for writeback

2020-06-29 Thread Christoph Hellwig

On Mon, Jun 29, 2020 at 02:08:09PM -0500, Eric Sandeen wrote:
> f2fs and xfs have both added support for cgroup writeback:
> 
> 578c647 f2fs: implement cgroup writeback support
> adfb5fb xfs: implement cgroup aware writeback
> 
> so add them to the supported list in the docs.
> 
> Signed-off-by: Eric Sandeen 
> ---
> 
> TBH I wonder about the wisdom of having this detail in
> the doc, as it apparently gets missed quite often ...

I'd rather remove the list of file systems.  It has no chance of
staying uptodate.

Re: [PATCH v3 1/2] remoteproc: Add remoteproc character device interface

2020-06-29 Thread Siddharth Gupta




On 6/17/2020 1:44 AM, Arnaud POULIQUEN wrote:


On 6/16/20 9:56 PM, risha...@codeaurora.org wrote:

On 2020-04-30 01:30, Arnaud POULIQUEN wrote:

Hi Rishabh,


On 4/21/20 8:10 PM, Rishabh Bhatnagar wrote:

Add the character device interface into remoteproc framework.
This interface can be used in order to boot/shutdown remote
subsystems and provides a basic ioctl based interface to implement
supplementary functionality. An ioctl call is implemented to enable
the shutdown on release feature which will allow remote processors to
be shutdown when the controlling userspace application crashes or
hangs.


Thanks for introducing the ioctl; this will help with future evolutions.


Signed-off-by: Rishabh Bhatnagar 
---
  Documentation/userspace-api/ioctl/ioctl-number.rst |   1 +
  drivers/remoteproc/Kconfig |   9 ++
  drivers/remoteproc/Makefile|   1 +
  drivers/remoteproc/remoteproc_cdev.c   | 143
+
  drivers/remoteproc/remoteproc_internal.h   |  21 +++
  include/linux/remoteproc.h |   3 +
  include/uapi/linux/remoteproc_cdev.h   |  20 +++
  7 files changed, 198 insertions(+)
  create mode 100644 drivers/remoteproc/remoteproc_cdev.c
  create mode 100644 include/uapi/linux/remoteproc_cdev.h

diff --git a/Documentation/userspace-api/ioctl/ioctl-number.rst
b/Documentation/userspace-api/ioctl/ioctl-number.rst
index 2e91370..412b2a0 100644
--- a/Documentation/userspace-api/ioctl/ioctl-number.rst
+++ b/Documentation/userspace-api/ioctl/ioctl-number.rst
@@ -337,6 +337,7 @@ Code  Seq#Include File
   Comments
  0xB4  00-0F  linux/gpio.h

  0xB5  00-0F  uapi/linux/rpmsg.h

  0xB6  alllinux/fpga-dfl.h
+0xB7  alluapi/linux/remoteproc_cdev.h  

  0xC0  00-0F  linux/usb/iowarrior.h
  0xCA  00-0F  uapi/misc/cxl.h
  0xCA  10-2F  uapi/misc/ocxl.h
diff --git a/drivers/remoteproc/Kconfig b/drivers/remoteproc/Kconfig
index de3862c..6374b79 100644
--- a/drivers/remoteproc/Kconfig
+++ b/drivers/remoteproc/Kconfig
@@ -14,6 +14,15 @@ config REMOTEPROC

  if REMOTEPROC

+config REMOTEPROC_CDEV
+   bool "Remoteproc character device interface"
+   help
+ Say y here to have a character device interface for Remoteproc
+ framework. Userspace can boot/shutdown remote processors through
+ this interface.
+
+ It's safe to say N if you don't want to use this interface.
+
  config IMX_REMOTEPROC
tristate "IMX6/7 remoteproc support"
depends on ARCH_MXC
diff --git a/drivers/remoteproc/Makefile b/drivers/remoteproc/Makefile
index e30a1b1..b7d4f77 100644
--- a/drivers/remoteproc/Makefile
+++ b/drivers/remoteproc/Makefile
@@ -9,6 +9,7 @@ remoteproc-y+= remoteproc_debugfs.o
  remoteproc-y  += remoteproc_sysfs.o
  remoteproc-y  += remoteproc_virtio.o
  remoteproc-y  += remoteproc_elf_loader.o
+obj-$(CONFIG_REMOTEPROC_CDEV)  += remoteproc_cdev.o
  obj-$(CONFIG_IMX_REMOTEPROC)  += imx_rproc.o
  obj-$(CONFIG_MTK_SCP) += mtk_scp.o mtk_scp_ipi.o
  obj-$(CONFIG_OMAP_REMOTEPROC) += omap_remoteproc.o
diff --git a/drivers/remoteproc/remoteproc_cdev.c
b/drivers/remoteproc/remoteproc_cdev.c
new file mode 100644
index 000..65142ec
--- /dev/null
+++ b/drivers/remoteproc/remoteproc_cdev.c
@@ -0,0 +1,143 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * Character device interface driver for Remoteproc framework.
+ *
+ * Copyright (c) 2020, The Linux Foundation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include "remoteproc_internal.h"
+
+#define NUM_RPROC_DEVICES  64
+static dev_t rproc_major;
+
+static ssize_t rproc_cdev_write(struct file *filp, const char __user *buf,
+                                size_t len, loff_t *pos)
+{
+   struct rproc *rproc = container_of(filp->f_inode->i_cdev,
+  struct rproc, char_dev);
+   int ret = 0;
+   char cmd[10];
+
+   if (!len || len > sizeof(cmd))
+   return -EINVAL;
+
+   ret = copy_from_user(cmd, buf, sizeof(cmd));
+   if (ret)
+   return -EFAULT;
+
+   if (sysfs_streq(cmd, "start")) {
+   if (rproc->state == RPROC_RUNNING)
+   return -EBUSY;
+
+   ret = rproc_boot(rproc);
+   if (ret)
+   dev_err(&rproc->dev, "Boot failed:%d\n", ret);
+   } else if (sysfs_streq(cmd, "stop")) {
+   if (rproc->state == RPROC_OFFLINE)
+   return -ENXIO;

Returning ENXIO in this case does not seem appropriate to me. What about
EPERM or EINVAL (as in rproc_sysfs)?


I think EPERM would indicate the operation is

Re: [EXT] Re: [PATCH 1/2] arm64: dts: ls1088a: add more thermal zone support

2020-06-29 Thread Amit Kucheria

On Tue, Jun 30, 2020 at 10:58 AM Andy Tang  wrote:
>
>
>
> > -Original Message-
> > From: Amit Kucheria 
> > Sent: 2020年6月30日 13:12
> > To: Andy Tang 
> > Cc: Shawn Guo ; Leo Li ; Rob
> > Herring ; lakml ;
> > open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS
> > ; LKML 
> > Subject: [EXT] Re: [PATCH 1/2] arm64: dts: ls1088a: add more thermal zone
> > support
> >
> > Caution: EXT Email
> >
> > On Tue, Jun 30, 2020 at 8:56 AM  wrote:
> > >
> > > From: Yuantian Tang 
> > >
> > > There are 2 thermal zones in ls1088a soc. Add the other thermal zone
> > > node to enable it.
> > > Also update the values in calibration table to make the temperatures
> > > monitored more precise.
> > >
> > > Signed-off-by: Yuantian Tang 
> > > ---
> > >  .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 100
> > > +++---
> > >  1 file changed, 62 insertions(+), 38 deletions(-)
> > >
> > > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > > b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > > index 36a799554620..ccbbc23e6c85 100644
> > > --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > > @@ -129,19 +129,19 @@
> > > };
> > >
> > > thermal-zones {
> > > -   cpu_thermal: cpu-thermal {
> > > +   core-cluster {
> > > polling-delay-passive = <1000>;
> > > polling-delay = <5000>;
> > > thermal-sensors = < 0>;
> > >
> > > trips {
> > > -   cpu_alert: cpu-alert {
> > > +   core_cluster_alert: core-cluster-alert {
> > > temperature = <85000>;
> > > hysteresis = <2000>;
> > > type = "passive";
> > > };
> > >
> > > -   cpu_crit: cpu-crit {
> > > +   core_cluster_crit: core-cluster-crit {
> > > temperature = <95000>;
> > > hysteresis = <2000>;
> > > type = "critical";
> > > @@ -150,7 +150,7 @@
> > >
> > > cooling-maps {
> > > map0 {
> > > -   trip = <&cpu_alert>;
> > > +   trip = <&core_cluster_alert>;
> > > cooling-device =
> > > <
> > THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > > <
> > > THERMAL_NO_LIMIT THERMAL_NO_LIMIT>, @@ -163,6 +163,26 @@
> > > };
> > > };
> > > };
> > > +
> > > +   soc {
> > > +   polling-delay-passive = <1000>;
> > > +   polling-delay = <5000>;
> > > +   thermal-sensors = < 1>;
> > > +
> > > +   trips {
> > > +   soc-alert {
> > > +   temperature = <85000>;
> > > +   hysteresis = <2000>;
> > > +   type = "passive";
> > > +   };
> > > +
> > > +   soc-crit {
> > > +   temperature = <95000>;
> > > +   hysteresis = <2000>;
> > > +   type = "critical";
> > > +   };
> > > +   };
> > > +   };
> >
> > You should also add a cooling-maps section for this thermal zone given that it
> > has a passive trip type. Otherwise there is no use for a passive trip type.
> It is better to have a cooling device. But there is only one cooling device on
> this platform which is used by core-cluster. So there is no extra cooling device for it.
> This zone can take action when critical temp is reached. So it is still useful.
> What do you suggest?

If the action taken by the core-cluster cooling-maps is the only one
that can be taken, I suggest getting rid of the soc-alert passive
trip completely. It is not of any use.

If there is a chance that your soc thermal-zone can heat up before
your cpu-cluster zone (unlikely), you could use the same cooling
device (cpu0, cpu1) for soc thermal zone too.

Re: [PATCH] vhost: Fix documentation

2020-06-29 Thread Jason Wang




On 2020/6/30 下午1:29, Eli Cohen wrote:

Fix documentation to match actual function prototypes

"end" used instead of "last". Fix that.

Signed-off-by: Eli Cohen 
---



Acked-by: Jason Wang 

Thanks



  drivers/vhost/iotlb.c | 4 ++--
  1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 1f0ca6e44410..0d4213a54a88 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
   * vhost_iotlb_itree_first - return the first overlapped range
   * @iotlb: the IOTLB
   * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte in IOVA range
   */
  struct vhost_iotlb_map *
  vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
@@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
   * vhost_iotlb_itree_first - return the next overlapped range
   * @iotlb: the IOTLB
   * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte IOVA range
   */
  struct vhost_iotlb_map *
  vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)

Re: [PATCH] ASoC: fsl_asrc: Add an option to select internal ratio mode

2020-06-29 Thread Shengjiu Wang

On Tue, Jun 30, 2020 at 4:09 AM Nicolin Chen  wrote:
>
> On Mon, Jun 29, 2020 at 09:58:35PM +0800, Shengjiu Wang wrote:
> > The ASRC not only supports ideal ratio mode, but also supports
> > internal ratio mode.
> >
> > For internal ratio mode, the rate of clock source should be divided
> > with no remainder by sample rate, otherwise there is sound
> > distortion.
> >
> > Add function fsl_asrc_select_clk() to find proper clock source for
> > internal ratio mode, if the clock source is available then internal
> > ratio mode will be selected.
> >
> > With change, the ideal ratio mode is not the only option for user.
> >
> > Signed-off-by: Shengjiu Wang 
> > ---
>
> > +static int fsl_asrc_select_clk(struct fsl_asrc_priv *asrc_priv,
> > +struct fsl_asrc_pair *pair,
> > +int in_rate,
> > +int out_rate)
> > +{
> > + struct fsl_asrc_pair_priv *pair_priv = pair->private;
> > + struct asrc_config *config = pair_priv->config;
> > + int rate[2], select_clk[2]; /* Array size 2 means IN and OUT */
> > + int clk_rate, clk_index;
> > + int i = 0, j = 0;
> > + bool clk_sel[2];
> > +
> > + rate[0] = in_rate;
> > + rate[1] = out_rate;
> > +
> > + /* Select proper clock source for internal ratio mode */
> > + for (j = 0; j < 2; j++) {
> > + for (i = 0; i < ASRC_CLK_MAP_LEN; i++) {
> > + clk_index = asrc_priv->clk_map[j][i];
> > + clk_rate = clk_get_rate(asrc_priv->asrck_clk[clk_index]);
>
> +   /* Only match a perfect clock source with no remainder */
>
> > + if (clk_rate != 0 && (clk_rate / rate[j]) <= 1024 &&
> > + (clk_rate % rate[j]) == 0)
> > + break;
> > + }
> > +
> > + if (i == ASRC_CLK_MAP_LEN) {
> > + select_clk[j] = OUTCLK_ASRCK1_CLK;
> > + clk_sel[j] = false;
> > + } else {
> > + select_clk[j] = i;
> > + clk_sel[j] = true;
> > + }
> > + }
> > +
> > + /* Switch to ideal ratio mode if there is no proper clock source */
> > + if (!clk_sel[IN] || !clk_sel[OUT])
> > + select_clk[IN] = INCLK_NONE;
>
> Could get rid of clk_sel:
>
> for (j) {
> for (i) {
> if (match)
> break;
> }
>
> clk[j] = i;
> }
>
> if (clk[IN] == ASRC_CLK_MAP_LEN || clk[OUT] == ASRC_CLK_MAP_LEN)
>
> And it only overrides the clk[IN] setting but leaves clk[OUT] set to
> the search result. This means that clk[OUT] may be using
> a clock source other than OUTCLK_ASRCK1_CLK if sel[IN] happens
> to be false while sel[OUT] happens to be true. Not sure if it
> is intended...but I feel it would probably be safer to use the
> previous settings: INCLK_NONE + OUTCLK_ASRCK1_CLK?

ok, will update the patch.

best regards
wang shengjiu
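
The matching rule discussed above (the clock rate must be an exact multiple
of the sample rate, with a ratio of at most 1024) and the suggested loop
without clk_sel[] can be sketched in plain C as follows. This is only an
illustration under stated assumptions: the candidate rates are passed in as a
flat array instead of being read through clk_map[] and clk_get_rate(), and
the names ASRC_MAX_DIV, find_exact_clk() and can_use_internal_ratio() are
hypothetical, not part of the driver.

```c
#include <stddef.h>

/* Upper bound on clk_rate / sample_rate, taken from the quoted patch. */
#define ASRC_MAX_DIV	1024

/*
 * Return the index of the first candidate clock whose rate is an exact
 * multiple of sample_rate with a ratio no larger than ASRC_MAX_DIV,
 * or n if no candidate matches.
 */
static size_t find_exact_clk(const unsigned long *clk_rates, size_t n,
			     unsigned long sample_rate)
{
	size_t i;

	for (i = 0; i < n; i++) {
		unsigned long clk_rate = clk_rates[i];

		if (clk_rate != 0 && sample_rate != 0 &&
		    (clk_rate / sample_rate) <= ASRC_MAX_DIV &&
		    (clk_rate % sample_rate) == 0)
			break;
	}

	return i;
}

/*
 * Internal ratio mode is only usable when both the input and the output
 * side found a clock; otherwise the caller falls back to ideal ratio mode.
 */
static int can_use_internal_ratio(const unsigned long *clk_rates, size_t n,
				  unsigned long in_rate,
				  unsigned long out_rate)
{
	return find_exact_clk(clk_rates, n, in_rate) != n &&
	       find_exact_clk(clk_rates, n, out_rate) != n;
}
```

Using the "not found" index (n) as the sentinel is what makes the separate
boolean array unnecessary: the search result itself says whether a match
exists.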

Re: [PATCH 4.19 000/131] 4.19.131-rc1 review

2020-06-29 Thread Naresh Kamboju

On Mon, 29 Jun 2020 at 21:05, Sasha Levin  wrote:
>
>
> This is the start of the stable review cycle for the 4.19.131 release.
> There are 131 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 01 Jul 2020 03:34:57 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
>         https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-4.19.y&id2=v4.19.130
>
> or in the git tree and branch at:
> 
>         git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git linux-4.19.y
> and the diffstat can be found below.
>
> --
> Thanks,
> Sasha

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 4.19.131-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-4.19.y
git commit: d77d34fc48184da0390d7f79bdc17f44c512c458
git describe: v4.19.130-131-gd77d34fc4818
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-4.19-oe/build/v4.19.130-131-gd77d34fc4818


No regressions (compared to build v4.19.130)

No fixes (compared to build v4.19.130)

Ran 34302 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c - arm64
- hi6220-hikey - arm64
- i386
- juno-r2 - arm64
- juno-r2-compat
- juno-r2-kasan
- nxp-ls2088
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15 - arm
- x86_64
- x86-kasan

Test Suites
---
* build
* install-android-platform-tools-r2600
* install-android-platform-tools-r2800
* kselftest
* kselftest/drivers
* kselftest/filesystems
* kselftest/net
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-cve-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-mm-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-sched-tests
* ltp-securebits-tests
* ltp-syscalls-tests
* v4l2-compliance
* kvm-unit-tests
* ltp-controllers-tests
* ltp-dio-tests
* ltp-fs-tests
* ltp-io-tests
* network-basic-tests
* perf
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-native/drivers
* kselftest-vsyscall-mode-native/filesystems
* kselftest-vsyscall-mode-native/net
* kselftest-vsyscall-mode-none
* kselftest-vsyscall-mode-none/drivers
* kselftest-vsyscall-mode-none/filesystems
* kselftest-vsyscall-mode-none/net

-- 
Linaro LKFT
https://lkft.linaro.org

include/linux/compiler.h:350:38: error: call to '__compiletime_assert_453' declared with attribute error: BUILD_BUG_ON failed: IS_ENABLED(CONFIG_32BIT) && (_PFN_SHIFT > PAGE_SHIFT)

2020-06-29 Thread kernel test robot

Hi Paul,

FYI, the error/warning still remains.

tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git master
head:   7c30b859a947535f2213277e827d7ac7dcff9c84
commit: 05d013a0366d50f4f0dbebf8c1b22b42020bf49a MIPS: Detect bad _PFN_SHIFT values
date:   9 months ago
config: mips-randconfig-r005-20200630 (attached as .config)
compiler: mipsel-linux-gcc (GCC) 9.3.0
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        git checkout 05d013a0366d50f4f0dbebf8c1b22b42020bf49a
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross ARCH=mips

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

 |  ^~~~
   arch/mips/kernel/signal.c:439:5: warning: no previous prototype for 'setup_sigcontext' [-Wmissing-prototypes]
     439 | int setup_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
         | ^~~~
   arch/mips/kernel/signal.c:516:5: warning: no previous prototype for 'restore_sigcontext' [-Wmissing-prototypes]
     516 | int restore_sigcontext(struct pt_regs *regs, struct sigcontext __user *sc)
         | ^~
   arch/mips/kernel/signal.c:624:17: warning: no previous prototype for 'sys_sigreturn' [-Wmissing-prototypes]
     624 | asmlinkage void sys_sigreturn(void)
         | ^
   arch/mips/kernel/signal.c:661:17: warning: no previous prototype for 'sys_rt_sigreturn' [-Wmissing-prototypes]
     661 | asmlinkage void sys_rt_sigreturn(void)
         | ^~~~
   arch/mips/kernel/signal.c:889:17: warning: no previous prototype for 'do_notify_resume' [-Wmissing-prototypes]
     889 | asmlinkage void do_notify_resume(struct pt_regs *regs, void *unused,
         | ^~~~
   arch/mips/mm/init.c:62:6: warning: no previous prototype for 'setup_zero_pages' [-Wmissing-prototypes]
      62 | void setup_zero_pages(void)
         |  ^~~~
   arch/mips/kernel/traps.c:358:6: warning: no previous prototype for 'show_registers' [-Wmissing-prototypes]
     358 | void show_registers(struct pt_regs *regs)
         |  ^~
   arch/mips/kernel/traps.c:440:17: warning: no previous prototype for 'do_be' [-Wmissing-prototypes]
     440 | asmlinkage void do_be(struct pt_regs *regs)
         | ^
   arch/mips/kernel/traps.c:701:17: warning: no previous prototype for 'do_ov' [-Wmissing-prototypes]
     701 | asmlinkage void do_ov(struct pt_regs *regs)
         | ^
   arch/mips/kernel/traps.c:825:17: warning: no previous prototype for 'do_fpe' [-Wmissing-prototypes]
     825 | asmlinkage void do_fpe(struct pt_regs *regs, unsigned long fcr31)
         | ^~
   arch/mips/kernel/traps.c:978:17: warning: no previous prototype for 'do_bp' [-Wmissing-prototypes]
     978 | asmlinkage void do_bp(struct pt_regs *regs)
         | ^
   arch/mips/kernel/traps.c:1070:17: warning: no previous prototype for 'do_tr' [-Wmissing-prototypes]
    1070 | asmlinkage void do_tr(struct pt_regs *regs)
         | ^
   arch/mips/mm/c-r4k.c:1703:6: warning: no previous prototype for 'au1x00_fixup_config_od' [-Wmissing-prototypes]
    1703 | void au1x00_fixup_config_od(void)
         |  ^~
   arch/mips/mm/sc-mips.c:253:5: warning: no previous prototype for 'mips_sc_init' [-Wmissing-prototypes]
     253 | int mips_sc_init(void)
         | ^~~~
   arch/mips/mm/c-r4k.c:1818:6: warning: no previous prototype for 'r4k_cache_init' [-Wmissing-prototypes]
    1818 | void r4k_cache_init(void)
         |  ^~
   arch/mips/mm/c-r4k.c:1962:12: warning: no previous prototype for 'r4k_cache_init_pm' [-Wmissing-prototypes]
    1962 | int __init r4k_cache_init_pm(void)
         |^
   arch/mips/kernel/traps.c:1112:17: warning: no previous prototype for 'do_ri' [-Wmissing-prototypes]
    1112 | asmlinkage void do_ri(struct pt_regs *regs)
         | ^
   arch/mips/kernel/traps.c:1346:17: warning: no previous prototype for 'do_cpu' [-Wmissing-prototypes]
    1346 | asmlinkage void do_cpu(struct pt_regs *regs)
         | ^~
   arch/mips/kernel/traps.c:1452:17: warning: no previous prototype for 'do_msa_fpe' [-Wmissing-prototypes]
    1452 | asmlinkage void do_msa_fpe(struct pt_regs *regs, unsigned int msacsr)
         | ^~
   arch/mips/kernel/traps.c:1472:17: warning: no previous prototype for 'do_msa' [-Wmissing-prototypes]
    1472 | asmlinkage void do_msa(struct pt_regs *regs)
         | ^~
   arch/mips/kernel/traps.c:1493:17: warning: no

[PATCH] vhost: Fix documentation

2020-06-29 Thread Eli Cohen

Fix documentation to match actual function prototypes

"end" used instead of "last". Fix that.

Signed-off-by: Eli Cohen 
---
 drivers/vhost/iotlb.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/drivers/vhost/iotlb.c b/drivers/vhost/iotlb.c
index 1f0ca6e44410..0d4213a54a88 100644
--- a/drivers/vhost/iotlb.c
+++ b/drivers/vhost/iotlb.c
@@ -149,7 +149,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_free);
  * vhost_iotlb_itree_first - return the first overlapped range
  * @iotlb: the IOTLB
  * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte in IOVA range
  */
 struct vhost_iotlb_map *
 vhost_iotlb_itree_first(struct vhost_iotlb *iotlb, u64 start, u64 last)
@@ -162,7 +162,7 @@ EXPORT_SYMBOL_GPL(vhost_iotlb_itree_first);
  * vhost_iotlb_itree_first - return the next overlapped range
  * @iotlb: the IOTLB
  * @start: start of IOVA range
- * @end: end of IOVA range
+ * @last: last byte IOVA range
  */
 struct vhost_iotlb_map *
 vhost_iotlb_itree_next(struct vhost_iotlb_map *map, u64 start, u64 last)
-- 
2.26.0
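
As background for the @end vs. @last wording: an inclusive "last byte"
differs from an exclusive "end" by one, and only the inclusive form can
describe a range that reaches the very top of a 64-bit address space. A small
standalone sketch of the arithmetic follows; range_last() and
ranges_overlap() are illustrative helpers, not functions from
drivers/vhost/iotlb.c.

```c
#include <stdbool.h>
#include <stdint.h>

/* Inclusive last byte of the range [start, start + size); size must be > 0. */
static inline uint64_t range_last(uint64_t start, uint64_t size)
{
	return start + size - 1;
}

/* True if the inclusive ranges [a_start, a_last] and [b_start, b_last] overlap. */
static inline bool ranges_overlap(uint64_t a_start, uint64_t a_last,
				  uint64_t b_start, uint64_t b_last)
{
	return a_start <= b_last && b_start <= a_last;
}
```

For example, a 4 KiB mapping at 0x1000 has last == 0x1fff, so it does not
overlap a mapping that starts at 0x2000. Passing 0x2000 as "last" by mistake
(that is, treating it as an exclusive end) would report a false overlap.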

RE: [EXT] Re: [PATCH 1/2] arm64: dts: ls1088a: add more thermal zone support

2020-06-29 Thread Andy Tang



> -Original Message-
> From: Amit Kucheria 
> Sent: 2020年6月30日 13:12
> To: Andy Tang 
> Cc: Shawn Guo ; Leo Li ; Rob
> Herring ; lakml ;
> open list:OPEN FIRMWARE AND FLATTENED DEVICE TREE BINDINGS
> ; LKML 
> Subject: [EXT] Re: [PATCH 1/2] arm64: dts: ls1088a: add more thermal zone
> support
> 
> Caution: EXT Email
> 
> On Tue, Jun 30, 2020 at 8:56 AM  wrote:
> >
> > From: Yuantian Tang 
> >
> > There are 2 thermal zones in ls1088a soc. Add the other thermal zone
> > node to enable it.
> > Also update the values in calibration table to make the temperatures
> > monitored more precise.
> >
> > Signed-off-by: Yuantian Tang 
> > ---
> >  .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 100
> > +++---
> >  1 file changed, 62 insertions(+), 38 deletions(-)
> >
> > diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > index 36a799554620..ccbbc23e6c85 100644
> > --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> > @@ -129,19 +129,19 @@
> > };
> >
> > thermal-zones {
> > -   cpu_thermal: cpu-thermal {
> > +   core-cluster {
> > polling-delay-passive = <1000>;
> > polling-delay = <5000>;
> > thermal-sensors = < 0>;
> >
> > trips {
> > -   cpu_alert: cpu-alert {
> > +   core_cluster_alert: core-cluster-alert {
> > temperature = <85000>;
> > hysteresis = <2000>;
> > type = "passive";
> > };
> >
> > -   cpu_crit: cpu-crit {
> > +   core_cluster_crit: core-cluster-crit {
> > temperature = <95000>;
> > hysteresis = <2000>;
> > type = "critical";
> > @@ -150,7 +150,7 @@
> >
> > cooling-maps {
> > map0 {
> > -   trip = <&cpu_alert>;
> > +   trip = <&core_cluster_alert>;
> > cooling-device =
> > <
> THERMAL_NO_LIMIT THERMAL_NO_LIMIT>,
> > <
> > THERMAL_NO_LIMIT THERMAL_NO_LIMIT>, @@ -163,6 +163,26 @@
> > };
> > };
> > };
> > +
> > +   soc {
> > +   polling-delay-passive = <1000>;
> > +   polling-delay = <5000>;
> > +   thermal-sensors = < 1>;
> > +
> > +   trips {
> > +   soc-alert {
> > +   temperature = <85000>;
> > +   hysteresis = <2000>;
> > +   type = "passive";
> > +   };
> > +
> > +   soc-crit {
> > +   temperature = <95000>;
> > +   hysteresis = <2000>;
> > +   type = "critical";
> > +   };
> > +   };
> > +   };
> 
> You should also add a cooling-maps section for this thermal zone given that it
> has a passive trip type. Otherwise there is no use for a passive trip type.
It is better to have a cooling device. But there is only one cooling device on 
this platform
which is used by core-cluster. So there is no extra cooling device for it.
This zone can take action when critical temp is reached. So it is still useful.
What do you suggest? 

BR,
Andy
> 
> > };
> >
> > timer {
> > @@ -209,45 +229,49 @@
> > compatible = "fsl,qoriq-tmu";
> > reg = <0x0 0x1f8 0x0 0x1>;
> > interrupts = <0 23 0x4>;
> > -   fsl,tmu-range = <0xb 0x9002a 0x6004c
> 0x30062>;
> > +   fsl,tmu-range = <0xb 0x9002a 0x6004c
> > + 0x70062>;
> > fsl,tmu-calibration =
> > /* Calibration data group 1 */
> > -   <0x 0x0026
> > -   0x0001 0x002d
> > -   0x0002 0x0032
> > -   0x0003 0x0039
> > -   0x0004 0x003f
> > -   0x0005 0x0046
> > -   0x0006 0x004d
> > -

Re: [PATCH] x86/split_lock: Don't write MSR_TEST_CTRL on CPUs that aren't whitelisted

2020-06-29 Thread Sean Christopherson

Ping.  This would ideally get into 5.8, the bad behavior is quite nasty.

On Fri, Jun 05, 2020 at 12:26:05PM -0700, Sean Christopherson wrote:
> Choo! Choo!  All aboard the Split Lock Express, with direct service to
> Wreckage!
> 
> Skip split_lock_verify_msr() if the CPU isn't whitelisted as a possible
> SLD-enabled CPU model to avoid writing MSR_TEST_CTRL.  MSR_TEST_CTRL
> exists, and is writable, on many generations of CPUs.  Writing the MSR,
> even with '0', can result in bizarre, undocumented behavior.
> 
> This fixes a crash on Haswell when resuming from suspend with a live KVM
> guest.  Because APs use the standard SMP boot flow for resume, they will
> go through split_lock_init() and the subsequent RDMSR/WRMSR sequence,
> which runs even when sld_state==sld_off to ensure SLD is disabled.  On
> Haswell (at least, my Haswell), writing MSR_TEST_CTRL with '0' will
> succeed and _may_ take the SMT _sibling_ out of VMX root mode.
> 
> When KVM has an active guest, KVM performs VMXON as part of CPU onlining
> (see kvm_starting_cpu()).  Because SMP boot is serialized, the resulting
> flow is effectively:
> 
>   on_each_ap_cpu() {
>  WRMSR(MSR_TEST_CTRL, 0)
>  VMXON
>   }
> 
> As a result, the WRMSR can disable VMX on a different CPU that has
> already done VMXON.  This ultimately results in a #UD on VMPTRLD when
> KVM regains control and attempts to run its vCPUs.
> 
> The above voodoo was confirmed by reworking KVM's VMXON flow to write
> MSR_TEST_CTRL prior to VMXON, and to serialize the sequence as above.
> Further verification of the insanity was done by redoing VMXON on all
> APs after the initial WRMSR->VMXON sequence.  The additional VMXON,
> which should VM-Fail, occasionally succeeded, and also eliminated the
> unexpected #UD on VMPTRLD.
> 
> The damage done by writing MSR_TEST_CTRL doesn't appear to be limited
> to VMX, e.g. after suspend with an active KVM guest, subsequent reboots
> almost always hang (even when fudging VMXON), a #UD on a random Jcc was
> observed, suspend/resume stability is qualitatively poor, and so on and
> so forth.
> 
>   kernel BUG at arch/x86/kvm/x86.c:386!
>   invalid opcode:  [#7] SMP
>   CPU: 1 PID: 2592 Comm: CPU 6/KVM Tainted: G  D
>   Hardware name: ASUS Q87M-E/Q87M-E, BIOS 1102 03/03/2014
>   RIP: 0010:kvm_spurious_fault+0xf/0x20
>   Code: <0f> 0b 0f 1f 44 00 00 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00
>   RSP: 0018:c0bcc1677b78 EFLAGS: 00010246
>   RAX: 61764000 RBX: 9e8d01d8 RCX: 9e8d4fa4
>   RDX: 9e8d0336 RSI: 0003c336 RDI: 9e8d0336
>   RBP: 0001 R08: 9e8d046d9d40 R09: 0018
>   R10: c0bcc1677b80 R11: 0008 R12: 0006
>   R13:  R14:  R15: 
>   FS:  7fe16c9f9700() GS:9e8d4fa4() knlGS:
>   CS:  0010 DS:  ES:  CR0: 80050033
>   CR2: 00d7a418 CR3: 0003c47b1006 CR4: 001626e0
>   Call Trace:
>vmx_vcpu_load_vmcs+0x1fb/0x2b0
>vmx_vcpu_load+0x3e/0x160
>kvm_arch_vcpu_load+0x48/0x260
>finish_task_switch+0x140/0x260
>__schedule+0x460/0x720
>_cond_resched+0x2d/0x40
>kvm_arch_vcpu_ioctl_run+0x82e/0x1ca0
>kvm_vcpu_ioctl+0x363/0x5c0
>ksys_ioctl+0x88/0xa0
>__x64_sys_ioctl+0x16/0x20
>do_syscall_64+0x4c/0x170
>entry_SYSCALL_64_after_hwframe+0x44/0xa9
> 
> Cc: Thomas Gleixner 
> Cc: Xiaoyao Li 
> Cc: Paolo Bonzini 
> Cc: k...@vger.kernel.org
> Fixes: dbaba47085b0c ("x86/split_lock: Rework the initialization flow of 
> split lock detection")
> Signed-off-by: Sean Christopherson 
> ---
>  arch/x86/kernel/cpu/intel.c | 11 ++-
>  1 file changed, 10 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/cpu/intel.c b/arch/x86/kernel/cpu/intel.c
> index a19a680542ce..19b6c42739fc 100644
> --- a/arch/x86/kernel/cpu/intel.c
> +++ b/arch/x86/kernel/cpu/intel.c
> @@ -48,6 +48,13 @@ enum split_lock_detect_state {
>  static enum split_lock_detect_state sld_state __ro_after_init = sld_off;
>  static u64 msr_test_ctrl_cache __ro_after_init;
>  
> +/*
> + * With a name like MSR_TEST_CTL it should go without saying, but don't touch
> + * MSR_TEST_CTL unless the CPU is one of the whitelisted models.  Writing it
> + * on CPUs that do not support SLD can cause fireworks, even when writing 
> '0'.
> + */
> +static bool cpu_model_supports_sld __ro_after_init;
> +
>  /*
>   * Processors which have self-snooping capability can handle conflicting
>   * memory type across CPUs by snooping its own cache. However, there exists
> @@ -1064,7 +1071,8 @@ static void sld_update_msr(bool on)
>  
>  static void split_lock_init(void)
>  {
> - split_lock_verify_msr(sld_state != sld_off);
> + if (cpu_model_supports_sld)
> + split_lock_verify_msr(sld_state != sld_off);
>  }
>  
>  static void split_lock_warn(unsigned long ip)
> @@ -1167,5 +1175,6 @@ void __init cpu_set_core_cap_bits(struct

[PATCH v10 09/12] spi: imx: add new i.mx6ul compatible name in binding doc

ERR009165 is fixed from i.mx6ul onwards, so add its compatible name to the
binding doc.

Signed-off-by: Robin Gong 
Acked-by: Mark Brown 
Reviewed-by: Rob Herring 
---
 Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
index 33bc58f..0a529ba 100644
--- a/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
+++ b/Documentation/devicetree/bindings/spi/fsl-imx-cspi.txt
@@ -10,6 +10,7 @@ Required properties:
   - "fsl,imx35-cspi" for SPI compatible with the one integrated on i.MX35
   - "fsl,imx51-ecspi" for SPI compatible with the one integrated on i.MX51
   - "fsl,imx53-ecspi" for SPI compatible with the one integrated on i.MX53 and later Soc
+  - "fsl,imx6ul-ecspi" for SPI compatible with the one integrated on i.MX6UL and later Soc
   - "fsl,imx8mq-ecspi" for SPI compatible with the one integrated on i.MX8MQ
   - "fsl,imx8mm-ecspi" for SPI compatible with the one integrated on i.MX8MM
   - "fsl,imx8mn-ecspi" for SPI compatible with the one integrated on i.MX8MN
-- 
2.7.4

[PATCH v10 07/12] spi: imx: fix ERR009165

Switch to XCH mode even in DMA mode. Please refer to the errata
below:
https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf

Signed-off-by: Robin Gong 
Acked-by: Mark Brown 
---
 drivers/spi/spi-imx.c | 10 +++---
 1 file changed, 3 insertions(+), 7 deletions(-)
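
A minimal sketch of the two register values this patch ends up writing,
assuming the bit layout below. The SKETCH_* macros and helper names are
illustrative stand-ins, not copied from spi-imx.c: SMC is always cleared so
the transfer is kicked off through XCH, and the TX watermark is programmed
as 0 while the RX watermarks stay untouched.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative bit layout only; the authoritative values live in spi-imx.c. */
#define SKETCH_CTRL_SMC        (1u << 3)
#define SKETCH_DMA_TX_WML(w)   ((uint32_t)(w) & 0x3f)
#define SKETCH_DMA_RX_WML(w)   (((uint32_t)(w) & 0x3f) << 16)
#define SKETCH_DMA_RXT_WML(w)  (((uint32_t)(w) & 0x3f) << 24)
#define SKETCH_DMA_TEDEN       (1u << 7)
#define SKETCH_DMA_RXDEN       (1u << 23)
#define SKETCH_DMA_RXTDEN      (1u << 31)

/* Control word with the workaround applied: SMC is cleared unconditionally. */
static uint32_t sketch_ctrl_workaround(uint32_t ctrl)
{
	return ctrl & ~SKETCH_CTRL_SMC;
}

/* DMA register with the workaround applied: TX watermark forced to 0. */
static uint32_t sketch_dma_workaround(unsigned int wml)
{
	return SKETCH_DMA_RX_WML(wml - 1) |
	       SKETCH_DMA_TX_WML(0) |
	       SKETCH_DMA_RXT_WML(wml) |
	       SKETCH_DMA_TEDEN | SKETCH_DMA_RXDEN | SKETCH_DMA_RXTDEN;
}
```

With wml = 32 (half of a 64-entry fifo) the low six bits of the DMA word are
zero, which is the TX_THRESHOLD = 0 condition the errata asks for.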

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 2b8d339..873be5e 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -591,8 +591,8 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx,
ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, );
spi_imx->spi_bus_clk = clk;
 
-   if (spi_imx->usedma)
-   ctrl |= MX51_ECSPI_CTRL_SMC;
+   /* ERR009165: work in XHC mode as PIO */
+   ctrl &= ~MX51_ECSPI_CTRL_SMC;
 
writel(ctrl, spi_imx->base + MX51_ECSPI_CTRL);
 
@@ -623,7 +623,7 @@ static void mx51_setup_wml(struct spi_imx_data *spi_imx)
 * and enable DMA request.
 */
writel(MX51_ECSPI_DMA_RX_WML(spi_imx->wml - 1) |
-   MX51_ECSPI_DMA_TX_WML(spi_imx->wml) |
+   MX51_ECSPI_DMA_TX_WML(0) |
MX51_ECSPI_DMA_RXT_WML(spi_imx->wml) |
MX51_ECSPI_DMA_TEDEN | MX51_ECSPI_DMA_RXDEN |
MX51_ECSPI_DMA_RXTDEN, spi_imx->base + MX51_ECSPI_DMA);
@@ -1273,10 +1273,6 @@ static int spi_imx_sdma_init(struct device *dev, struct spi_imx_data *spi_imx,
 {
int ret;
 
-   /* use pio mode for i.mx6dl chip TKT238285 */
-   if (of_machine_is_compatible("fsl,imx6dl"))
-   return 0;
-
spi_imx->wml = spi_imx->devtype_data->fifo_size / 2;
 
/* Prepare for TX DMA: */
-- 
2.7.4

[PATCH v10 08/12] spi: imx: remove ERR009165 workaround on i.mx6ul

ERR009165 is fixed on i.mx6ul/6ull/6sll. All other i.mx6/7 and
i.mx8m/8mm chips still need the errata workaround. Please refer to the
official nxp errata documents at https://www.nxp.com/ .

Add a new i.mx6ul type so that the workaround can be removed on those chips.

Signed-off-by: Robin Gong 
Acked-by: Mark Brown 
---
 drivers/spi/spi-imx.c | 50 ++
 1 file changed, 46 insertions(+), 4 deletions(-)
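
A rough sketch of the per-device decision this patch introduces, assuming a
cut-down device description. The struct, helper names and the SMC bit
position are hypothetical; only the 'tx_glitch_fixed' flag and the
"wml = fifo_size / 2" rule come from the patches in this series.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_CTRL_SMC (1u << 3)   /* illustrative bit position */

/* Hypothetical, reduced stand-in for struct spi_imx_devtype_data. */
struct sketch_devtype {
	bool tx_glitch_fixed;       /* true when the SoC has ERR009165 fixed */
	unsigned int fifo_size;
};

/* SMC may only be used for DMA on chips without the TX glitch. */
static uint32_t sketch_pick_ctrl(uint32_t ctrl, bool usedma,
				 const struct sketch_devtype *d)
{
	if (usedma && d->tx_glitch_fixed)
		return ctrl | SKETCH_CTRL_SMC;
	return ctrl & ~SKETCH_CTRL_SMC;
}

/* Affected chips keep TX watermark 0; fixed chips use the normal wml. */
static unsigned int sketch_pick_tx_wml(const struct sketch_devtype *d)
{
	unsigned int wml = d->fifo_size / 2;

	return d->tx_glitch_fixed ? wml : 0;
}
```

Keeping the choice in a flag on the device data, rather than in
of_machine_is_compatible() checks, means a new fixed SoC only needs a new
table entry.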

diff --git a/drivers/spi/spi-imx.c b/drivers/spi/spi-imx.c
index 873be5e..f3049c7 100644
--- a/drivers/spi/spi-imx.c
+++ b/drivers/spi/spi-imx.c
@@ -57,6 +57,7 @@ enum spi_imx_devtype {
IMX35_CSPI, /* CSPI on all i.mx except above */
IMX51_ECSPI,    /* ECSPI on i.mx51 */
IMX53_ECSPI,    /* ECSPI on i.mx53 and later */
+   IMX6UL_ECSPI,   /* ERR009165 fix from i.mx6ul */
 };
 
 struct spi_imx_data;
@@ -76,6 +77,11 @@ struct spi_imx_devtype_data {
bool has_slavemode;
unsigned int fifo_size;
bool dynamic_burst;
+   /*
+* ERR009165 fixed or not:
+* https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf
+*/
+   bool tx_glitch_fixed;
enum spi_imx_devtype devtype;
 };
 
@@ -132,6 +138,11 @@ static inline int is_imx51_ecspi(struct spi_imx_data *d)
return d->devtype_data->devtype == IMX51_ECSPI;
 }
 
+static inline int is_imx6ul_ecspi(struct spi_imx_data *d)
+{
+   return d->devtype_data->devtype == IMX6UL_ECSPI;
+}
+
 static inline int is_imx53_ecspi(struct spi_imx_data *d)
 {
return d->devtype_data->devtype == IMX53_ECSPI;
@@ -591,8 +602,14 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx,
ctrl |= mx51_ecspi_clkdiv(spi_imx, t->speed_hz, );
spi_imx->spi_bus_clk = clk;
 
-   /* ERR009165: work in XHC mode as PIO */
-   ctrl &= ~MX51_ECSPI_CTRL_SMC;
+   /*
+* ERR009165: work in XHC mode instead of SMC as PIO on the chips
+* before i.mx6ul.
+*/
+   if (spi_imx->usedma && spi_imx->devtype_data->tx_glitch_fixed)
+   ctrl |= MX51_ECSPI_CTRL_SMC;
+   else
+   ctrl &= ~MX51_ECSPI_CTRL_SMC;
 
writel(ctrl, spi_imx->base + MX51_ECSPI_CTRL);
 
@@ -618,12 +635,16 @@ static int mx51_ecspi_prepare_transfer(struct spi_imx_data *spi_imx,
 
 static void mx51_setup_wml(struct spi_imx_data *spi_imx)
 {
+   u32 tx_wml = 0;
+
+   if (spi_imx->devtype_data->tx_glitch_fixed)
+   tx_wml = spi_imx->wml;
/*
 * Configure the DMA register: setup the watermark
 * and enable DMA request.
 */
writel(MX51_ECSPI_DMA_RX_WML(spi_imx->wml - 1) |
-   MX51_ECSPI_DMA_TX_WML(0) |
+   MX51_ECSPI_DMA_TX_WML(tx_wml) |
MX51_ECSPI_DMA_RXT_WML(spi_imx->wml) |
MX51_ECSPI_DMA_TEDEN | MX51_ECSPI_DMA_RXDEN |
MX51_ECSPI_DMA_RXTDEN, spi_imx->base + MX51_ECSPI_DMA);
@@ -1017,6 +1038,23 @@ static struct spi_imx_devtype_data imx53_ecspi_devtype_data = {
.devtype = IMX53_ECSPI,
 };
 
+static struct spi_imx_devtype_data imx6ul_ecspi_devtype_data = {
+   .intctrl = mx51_ecspi_intctrl,
+   .prepare_message = mx51_ecspi_prepare_message,
+   .prepare_transfer = mx51_ecspi_prepare_transfer,
+   .trigger = mx51_ecspi_trigger,
+   .rx_available = mx51_ecspi_rx_available,
+   .reset = mx51_ecspi_reset,
+   .setup_wml = mx51_setup_wml,
+   .fifo_size = 64,
+   .has_dmamode = true,
+   .dynamic_burst = true,
+   .has_slavemode = true,
+   .tx_glitch_fixed = true,
+   .disable = mx51_ecspi_disable,
+   .devtype = IMX6UL_ECSPI,
+};
+
 static const struct platform_device_id spi_imx_devtype[] = {
{
.name = "imx1-cspi",
@@ -1040,6 +1078,9 @@ static const struct platform_device_id spi_imx_devtype[] = {
.name = "imx53-ecspi",
.driver_data = (kernel_ulong_t) _ecspi_devtype_data,
}, {
+   .name = "imx6ul-ecspi",
+   .driver_data = (kernel_ulong_t) _ecspi_devtype_data,
+   }, {
/* sentinel */
}
 };
@@ -1052,6 +1093,7 @@ static const struct of_device_id spi_imx_dt_ids[] = {
{ .compatible = "fsl,imx35-cspi", .data = _cspi_devtype_data, },
{ .compatible = "fsl,imx51-ecspi", .data = _ecspi_devtype_data, },
{ .compatible = "fsl,imx53-ecspi", .data = _ecspi_devtype_data, },
+   { .compatible = "fsl,imx6ul-ecspi", .data = _ecspi_devtype_data, },
{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, spi_imx_dt_ids);
@@ -1671,7 +1713,7 @@ static int spi_imx_probe(struct platform_device *pdev)
spi_imx->bitbang.master->mode_bits = SPI_CPOL | SPI_CPHA | SPI_CS_HIGH \
 | SPI_NO_CS;
if (is_imx35_cspi(spi_imx) || is_imx51_ecspi(spi_imx) ||
-   is_imx53_ecspi(spi_imx))
+   is_imx53_ecspi(spi_imx) || is_imx6ul_ecspi(spi_imx))

[PATCH v10 12/12] dmaengine: imx-sdma: add uart rom script

For compatibility between the NXP internal legacy kernel before 4.19, which
is based on the uart ram script, and the upstream kernel, which is based on
the uart rom script, add both uart ram/rom scripts in the latest sdma
firmware. By default the uart rom script is used.
Besides, add two multi-fifo scripts for SAI/PDM on i.mx8m/8mm and add
back the qspi script that was missing for v4 (i.mx7d/8m/8mm family; v3 is
for i.mx6).

rom script:
uart_2_mcu_addr
uartsh_2_mcu_addr /* through spba bus */
ram script:
uart_2_mcu_ram_addr
uartsh_2_mcu_ram_addr /* through spba bus */

Please get the latest sdma firmware from the link below and put it into
/lib/firmware/imx/sdma/:
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/imx/sdma

Signed-off-by: Robin Gong 
Acked-by: Vinod Koul 
---
 drivers/dma/imx-sdma.c | 4 ++--
 include/linux/platform_data/dma-imx-sdma.h | 8 ++--
 2 files changed, 8 insertions(+), 4 deletions(-)
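
A small sketch of why the header change renames the existing uart slots in
place and appends the new ones at the end, assuming the driver copies the
firmware script table into the struct by index and skips unset entries. The
shortened struct, the slot counts and all addresses here are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, shortened stand-in for struct sdma_script_start_addrs. */
struct sketch_script_addrs {
	int32_t uart_2_per_addr;
	int32_t uart_2_mcu_ram_addr;   /* existing slot, renamed in place */
	int32_t mcu_2_ecspi_addr;
	int32_t uart_2_mcu_addr;       /* rom script name, appended at the end */
};

#define SKETCH_OLD_SLOTS 3   /* table length an older firmware provides */
#define SKETCH_NEW_SLOTS 4   /* table length including the appended slot */

/*
 * Copy the first n firmware entries by index, skipping unset (<= 0) ones.
 * The struct is treated as a flat array of int32_t, which relies on it
 * containing nothing but int32_t members.
 */
static void sketch_add_scripts(struct sketch_script_addrs *dst,
			       const int32_t *fw, int n)
{
	int32_t *slots = (int32_t *)dst;
	int i;

	for (i = 0; i < n; i++)
		if (fw[i] > 0)
			slots[i] = fw[i];
}
```

Under these assumptions an older firmware table fills only the renamed ram
slot and leaves the default uart_2_mcu_addr alone, which is why the array
size defines grow instead of existing indexes being reshuffled.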

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 3058c78..e946271 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -1729,8 +1729,8 @@ static void sdma_issue_pending(struct dma_chan *chan)
 
 #define SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V1	34
 #define SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V2	38
-#define SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3	41
-#define SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V4	42
+#define SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V3	45
+#define SDMA_SCRIPT_ADDRS_ARRAY_SIZE_V4	46
 
 static void sdma_add_scripts(struct sdma_engine *sdma,
const struct sdma_script_start_addrs *addr)
diff --git a/include/linux/platform_data/dma-imx-sdma.h b/include/linux/platform_data/dma-imx-sdma.h
index 30e676b..e12d2e8 100644
--- a/include/linux/platform_data/dma-imx-sdma.h
+++ b/include/linux/platform_data/dma-imx-sdma.h
@@ -20,12 +20,12 @@ struct sdma_script_start_addrs {
s32 per_2_firi_addr;
s32 mcu_2_firi_addr;
s32 uart_2_per_addr;
-   s32 uart_2_mcu_addr;
+   s32 uart_2_mcu_ram_addr;
s32 per_2_app_addr;
s32 mcu_2_app_addr;
s32 per_2_per_addr;
s32 uartsh_2_per_addr;
-   s32 uartsh_2_mcu_addr;
+   s32 uartsh_2_mcu_ram_addr;
s32 per_2_shp_addr;
s32 mcu_2_shp_addr;
s32 ata_2_mcu_addr;
@@ -52,6 +52,10 @@ struct sdma_script_start_addrs {
s32 zcanfd_2_mcu_addr;
s32 zqspi_2_mcu_addr;
s32 mcu_2_ecspi_addr;
+   s32 mcu_2_sai_addr;
+   s32 sai_2_mcu_addr;
+   s32 uart_2_mcu_addr;
+   s32 uartsh_2_mcu_addr;
/* End of v3 array */
s32 mcu_2_zqspi_addr;
/* End of v4 array */
-- 
2.7.4

[PATCH v10 04/12] dmaengine: imx-sdma: remove duplicated sdma_load_context

sdma_transfer_init() already calls sdma_load_context() before every sdma
transfer, so there is no need to call it again in sdma_config_channel().

Signed-off-by: Robin Gong 
Acked-by: Vinod Koul 
---
 drivers/dma/imx-sdma.c | 5 +
 1 file changed, 1 insertion(+), 4 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index d305b80..5411e01e 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -1137,7 +1137,6 @@ static void sdma_set_watermarklevel_for_p2p(struct sdma_channel *sdmac)
 static int sdma_config_channel(struct dma_chan *chan)
 {
struct sdma_channel *sdmac = to_sdma_chan(chan);
-   int ret;
 
sdma_disable_channel(chan);
 
@@ -1177,9 +1176,7 @@ static int sdma_config_channel(struct dma_chan *chan)
sdmac->watermark_level = 0; /* FIXME: M3_BASE_ADDRESS */
}
 
-   ret = sdma_load_context(sdmac);
-
-   return ret;
+   return 0;
 }
 
 static int sdma_set_channel_priority(struct sdma_channel *sdmac,
-- 
2.7.4

[PATCH v10 11/12] dma: imx-sdma: add i.mx6ul compatible name

Add i.mx6ul compatible name in binding doc.

Signed-off-by: Robin Gong 
Reviewed-by: Rob Herring 
---
 Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt | 1 +
 1 file changed, 1 insertion(+)

diff --git a/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt b/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt
index c9e9740..12c316f 100644
--- a/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt
+++ b/Documentation/devicetree/bindings/dma/fsl-imx-sdma.txt
@@ -9,6 +9,7 @@ Required properties:
   "fsl,imx53-sdma"
   "fsl,imx6q-sdma"
   "fsl,imx7d-sdma"
+  "fsl,imx6ul-sdma"
   "fsl,imx8mq-sdma"
   "fsl,imx8mm-sdma"
   "fsl,imx8mn-sdma"
-- 
2.7.4

[PATCH v10 05/12] dmaengine: dma: imx-sdma: add fw_loaded and is_ram_script

Add 'fw_loaded' and 'is_ram_script' to check whether the script used by a
channel is a ram script and whether the firmware providing it has been
loaded. This lets sdma_transfer_init() bail out before allocating the dma
descriptor and bd. Otherwise memory may be consumed without being freed
when spi falls back to pio after a dma transfer fails because the sdma
firmware is not ready (the next ERR009165 patch depends on the sdma RAM
scripts/firmware).

Signed-off-by: Robin Gong 
---
 drivers/dma/imx-sdma.c | 13 +
 1 file changed, 13 insertions(+)
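
The early-exit check described above can be sketched as follows, assuming
reduced stand-ins for the engine, channel and descriptor types. All sketch_*
names are hypothetical; only the 'fw_loaded' / 'is_ram_script' condition and
the error message mirror the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical, reduced stand-ins for the driver structures. */
struct sketch_engine {
	bool fw_loaded;             /* set once the firmware callback has run */
};

struct sketch_channel {
	struct sketch_engine *sdma;
	bool is_ram_script;         /* channel needs a script from firmware */
};

struct sketch_desc {
	int placeholder;
};

/*
 * Refuse the transfer before allocating anything when the channel needs a
 * RAM script but the firmware providing it has not been loaded yet.
 */
static struct sketch_desc *sketch_transfer_init(struct sketch_channel *c)
{
	if (!c->sdma->fw_loaded && c->is_ram_script) {
		fprintf(stderr, "sdma firmware not ready!\n");
		return NULL;
	}

	return calloc(1, sizeof(struct sketch_desc));
}
```

Channels that only use ROM scripts are unaffected by the check, so they keep
working before the firmware arrives.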

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 5411e01e..ce1c83e 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -379,6 +379,7 @@ struct sdma_channel {
enum dma_status status;
struct imx_dma_data data;
struct work_struct  terminate_worker;
+   bool                    is_ram_script;
 };
 
 #define IMX_DMA_SG_LOOP		BIT(0)
@@ -443,6 +444,7 @@ struct sdma_engine {
struct sdma_buffer_descriptor   *bd0;
/* clock ratio for AHB:SDMA core. 1:1 is 1, 2:1 is 0*/
bool                            clk_ratio;
+   bool                            fw_loaded;
 };
 
 static int sdma_config_write(struct dma_chan *chan,
@@ -929,6 +931,7 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
case IMX_DMATYPE_SSI_DUAL:
per_2_emi = sdma->script_addrs->ssish_2_mcu_addr;
emi_2_per = sdma->script_addrs->mcu_2_ssish_addr;
+   sdmac->is_ram_script = true;
break;
case IMX_DMATYPE_SSI_SP:
case IMX_DMATYPE_MMC:
@@ -943,6 +946,7 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
per_2_emi = sdma->script_addrs->asrc_2_mcu_addr;
emi_2_per = sdma->script_addrs->asrc_2_mcu_addr;
per_2_per = sdma->script_addrs->per_2_per_addr;
+   sdmac->is_ram_script = true;
break;
case IMX_DMATYPE_ASRC_SP:
per_2_emi = sdma->script_addrs->shp_2_mcu_addr;
@@ -1339,6 +1343,11 @@ static struct sdma_desc *sdma_transfer_init(struct sdma_channel *sdmac,
 {
struct sdma_desc *desc;
 
+   if (!sdmac->sdma->fw_loaded && sdmac->is_ram_script) {
+   dev_err(sdmac->sdma->dev, "sdma firmware not ready!\n");
+   goto err_out;
+   }
+
desc = kzalloc((sizeof(*desc)), GFP_NOWAIT);
if (!desc)
goto err_out;
@@ -1589,6 +1598,8 @@ static int sdma_config_write(struct dma_chan *chan,
 {
struct sdma_channel *sdmac = to_sdma_chan(chan);
 
+   sdmac->is_ram_script = false;
+
if (direction == DMA_DEV_TO_MEM) {
sdmac->per_address = dmaengine_cfg->src_addr;
sdmac->watermark_level = dmaengine_cfg->src_maxburst *
@@ -1768,6 +1779,8 @@ static void sdma_load_firmware(const struct firmware *fw, void *context)
 
sdma_add_scripts(sdma, addr);
 
+   sdma->fw_loaded = true;
+
dev_info(sdma->dev, "loaded firmware %d.%d\n",
header->version_major,
header->version_minor);
-- 
2.7.4

[PATCH v10 06/12] dmaengine: imx-sdma: add mcu_2_ecspi script

Add mcu_2_ecspi script to fix ecspi errata ERR009165.

Signed-off-by: Robin Gong 
Acked-by: Vinod Koul 
---
 drivers/dma/imx-sdma.c | 4 
 1 file changed, 4 insertions(+)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index ce1c83e..337143f 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -922,6 +922,10 @@ static void sdma_get_pc(struct sdma_channel *sdmac,
emi_2_per = sdma->script_addrs->mcu_2_ata_addr;
break;
case IMX_DMATYPE_CSPI:
+   per_2_emi = sdma->script_addrs->app_2_mcu_addr;
+   emi_2_per = sdma->script_addrs->mcu_2_ecspi_addr;
+   sdmac->is_ram_script = true;
+   break;
case IMX_DMATYPE_EXT:
case IMX_DMATYPE_SSI:
case IMX_DMATYPE_SAI:
-- 
2.7.4

[PATCH v10 10/12] dmaengine: imx-sdma: remove ERR009165 on i.mx6ul

The ECSPI issue is fixed at hardware level from i.mx6ul, so the ERR009165
workaround is no longer needed on those chips such as i.mx8mq.

Signed-off-by: Robin Gong 
Acked-by: Vinod Koul 
---
 drivers/dma/imx-sdma.c | 29 -
 1 file changed, 28 insertions(+), 1 deletion(-)
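
A minimal sketch of the bit-31 signalling added below, assuming simplified
enums in place of the driver's peripheral and direction types. The sketch_*
names and the SKETCH_SKIP_ERRATA macro are hypothetical; the three-way
condition is the one in the hunk for sdma_config_channel().

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, simplified stand-ins for the driver's enums. */
enum sketch_dir { SKETCH_MEM_TO_DEV, SKETCH_DEV_TO_MEM };
enum sketch_type { SKETCH_TYPE_CSPI, SKETCH_TYPE_UART };

/* Bit 31 of the watermark word tells the sdma script to skip the errata. */
#define SKETCH_SKIP_ERRATA (1UL << 31)

static unsigned long sketch_watermark(unsigned long wml,
				      enum sketch_type type,
				      enum sketch_dir dir,
				      bool ecspi_fixed)
{
	/* Only ecspi TX on a chip with the hardware fix sets the flag. */
	if (type == SKETCH_TYPE_CSPI && dir == SKETCH_MEM_TO_DEV && ecspi_fixed)
		wml |= SKETCH_SKIP_ERRATA;

	return wml;
}
```

Passing the flag inside the watermark word lets one firmware script serve
both fixed and affected chips.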

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 337143f..3058c78 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -420,6 +420,13 @@ struct sdma_driver_data {
int num_events;
struct sdma_script_start_addrs  *script_addrs;
bool check_ratio;
+   /*
+* ecspi ERR009165 fixed should be done in sdma script
+* and it has been fixed in soc from i.mx6ul.
+* please get more information from the below link:
+* https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf
+*/
+   bool ecspi_fixed;
 };
 
 struct sdma_engine {
@@ -541,6 +548,13 @@ static struct sdma_driver_data sdma_imx6q = {
.script_addrs = _script_imx6q,
 };
 
+static struct sdma_driver_data sdma_imx6ul = {
+   .chnenbl0 = SDMA_CHNENBL0_IMX35,
+   .num_events = 48,
+   .script_addrs = _script_imx6q,
+   .ecspi_fixed = true,
+};
+
 static struct sdma_script_start_addrs sdma_script_imx7d = {
.ap_2_ap_addr = 644,
.uart_2_mcu_addr = 819,
@@ -589,6 +603,9 @@ static const struct platform_device_id sdma_devtypes[] = {
.name = "imx7d-sdma",
.driver_data = (unsigned long)_imx7d,
}, {
+   .name = "imx6ul-sdma",
+   .driver_data = (unsigned long)_imx6ul,
+   }, {
.name = "imx8mq-sdma",
.driver_data = (unsigned long)_imx8mq,
}, {
@@ -605,6 +622,7 @@ static const struct of_device_id sdma_dt_ids[] = {
{ .compatible = "fsl,imx31-sdma", .data = _imx31, },
{ .compatible = "fsl,imx25-sdma", .data = _imx25, },
{ .compatible = "fsl,imx7d-sdma", .data = _imx7d, },
+   { .compatible = "fsl,imx6ul-sdma", .data = _imx6ul, },
{ .compatible = "fsl,imx8mq-sdma", .data = _imx8mq, },
{ /* sentinel */ }
 };
@@ -1174,8 +1192,17 @@ static int sdma_config_channel(struct dma_chan *chan)
if (sdmac->peripheral_type == IMX_DMATYPE_ASRC_SP ||
sdmac->peripheral_type == IMX_DMATYPE_ASRC)
sdma_set_watermarklevel_for_p2p(sdmac);
-   } else
+   } else {
+   /*
+* ERR009165 fixed from i.mx6ul, no errata need,
+* set bit31 to let sdma script skip the errata.
+*/
+   if (sdmac->peripheral_type == IMX_DMATYPE_CSPI &&
+   sdmac->direction == DMA_MEM_TO_DEV &&
+   sdmac->sdma->drvdata->ecspi_fixed)
+   __set_bit(31, >watermark_level);
__set_bit(sdmac->event_id0, sdmac->event_mask);
+   }
 
/* Address */
sdmac->shp_addr = sdmac->per_address;
-- 
2.7.4

[PATCH v10 00/12] add ecspi ERR009165 for i.mx6/7 soc family

The ecspi ERR009165 errata on the i.mx6/7 soc family causes a FIFO
transfer to be sent twice in DMA mode. Please get more information from:
https://www.nxp.com/docs/en/errata/IMX6DQCE.pdf. The workaround is to add a
new sdma ram script which works in XCH mode, as PIO does, inside sdma instead
of SMC mode; meanwhile, 'TX_THRESHOLD' should be 0. The issue should
exist on all legacy i.mx6/7 soc families before i.mx6ul.
NXP fixed this design issue from i.mx6ul, so newer chips including i.mx6ul/
6ull/6sll do not need this workaround anymore. All other i.mx6/7/8 chips
still need this workaround. This patch set adds a new 'fsl,imx6ul-ecspi'
compatible for the ecspi driver and 'ecspi_fixed' in the sdma driver to
choose whether the errata workaround is needed or not.
The first two reverted patches should be the same issue, though it
seemed to be 'fixed' by changing to the other shp script. Hope Sean or Sascha
can have the chance to test whether this patch set fixes their issues.
Besides, enable sdma support for i.mx8mm/8mq and fix ecspi1 not working
on i.mx8mm because the event id is zero.

PS:
Please get sdma firmware from below linux-firmware and copy it to your
local rootfs /lib/firmware/imx/sdma.
https://git.kernel.org/pub/scm/linux/kernel/git/firmware/linux-firmware.git/tree/imx/sdma

v2:
1.Add commit log for reverted patches.
2.Add comment for 'ecspi_fixed' in sdma driver.
3.Add 'fsl,imx6sll-ecspi' compatible instead of 'fsl,imx6ul-ecspi'
rather than remove.
v3:
1.Confirm with design team to make sure ERR009165 is fixed on i.mx6ul/i.mx6ull
/i.mx6sll, not fixed on i.mx8m/8mm and other i.mx6/7 legacy chips.
Correct the related dts patch in v2.
2.Clean errata information in binding doc and add new 'tx_glitch_fixed' flag
in spi-imx driver to state whether ERR009165 is fixed or not.
3.Enlarge burst size to fifo size for tx since tx_wml is set to 0 in the
errata workaround, thus improving performance as much as possible.
v4:
1.Add Ack tag from Mark and Vinod
2.Remove checking 'event_id1' zero as 'event_id0'.
v5:
1.Add the last patch for compatibility with the current uart driver, which
uses the rom script, so both uart ram script and rom script are supported
in the latest firmware; by default the uart rom script is used. The UART
driver will be broken without this patch.
v6:
1.Resend after rebasing on the latest next branch.
2.Remove below No.13~No.15 patches of v5 because they were merged.
ARM: dts: imx6ul: add dma support on ecspi
ARM: dts: imx6sll: correct sdma compatible
arm64: defconfig: Enable SDMA on i.mx8mq/8mm
3.Revert "dmaengine: imx-sdma: fix context cache" since
'context_loaded' removed.
v7:
1.Put the last patch 13/13 'Revert "dmaengine: imx-sdma: fix context
cache"' ahead of 03/13 'Revert "dmaengine: imx-sdma: refine
to load context only once"' so that no build warning comes out
during bisect.
2.Address Sascha's comments, including eliminating any i.mx6sx in this
series, adding new 'is_imx6ul_ecspi()' instead imx in imx51 and taking
care SMC bit for PIO.
3.Add back missing 'Reviewed-by' tag on 08/15(v5):09/13(v7)
'spi: imx: add new i.mx6ul compatible name in binding doc'
v8:
1.remove 0003-Revert-dmaengine-imx-sdma-fix-context-cache.patch and merge
it into 04/13 of v7
2.add 0005-spi-imx-fallback-to-PIO-if-dma-setup-failure.patch so that no
ecspi function is broken even if the sdma firmware is not updated.
3.merge 'tx.dst_maxburst' changes in the two continuous patches into one
patch to avoid confusion.
4.fix typo 'duplicated'.
v9:
1. add "spi: imx: add dma_sync_sg_for_device after fallback from dma"
to fix the potential issue brought by commit bcd8e7761ec9("spi: imx:
fallback to PIO if dma setup failure") which is the only one patch
of v8 merged. Thanks Matthias for reporting:

https://lore.kernel.org/linux-arm-kernel/5d246dd81607bb6e5cb9af86ad4e53f7a7a99c50.ca...@ew.tq-group.com/
2. remove 05/13 of v8 "spi: imx:fallback to PIO if dma setup failure"
since it's been merged.
v10:
1. remove 01/13 "spi: imx: add dma_sync_sg_for_device after fallback from dma"
since there is another independent patch merged:
-- commit 809b1b04df898 ("spi: introduce fallback to pio")
2. add "dmaengine: dma: imx-sdma: add fw_loaded and is_ram_script" which
is used to fix the potential dma_alloc_coherent() failure while this
patchset applied but sdma firmware may not be ready for long time.
3. burst size changed back from fifo size to the normal wml to align with the
nxp internal tree, which has been tested for years. An overnight loopback
test with spidev failed with fifo size, but passed with wml (half of fifo
size). It seems feeding the whole fifo size may cause rxfifo overflow while
tx shifts out and rx shifts in.
"spi: imx: remove ERR009165 workaround on i.mx6ul"
4. remove 12/13 'dmaengine: imx-sdma: fix ecspi1 rx dma not work on i.mx8mm'
since below two similar patches merged:
-- commit 25962e1a7f1d ("dmaengine: imx-sdma: Fix the event id check

[PATCH v10 02/12] Revert "ARM: dts: imx6: Use correct SDMA script for SPI cores"

There are two ways for SDMA to access SPBA devices: one is SDMA->AIPS
->SPBA(masterA port), the other is SDMA->SPBA(masterC port). Please refer
to 'Figure 58-1. i.MX 6Dual/6Quad SPBA connectivity' in the i.mx6DQ
Reference Manual. SDMA provides the corresponding app_2_mcu/mcu_2_app and
shp_2_mcu/mcu_2_shp scripts for these two options. So both AIPS and SPBA
scripts should behave the same, and an issue caught only with the AIPS
script does not sound solid.
The issue is more likely the ecspi errata
ERR009165(http://www.nxp.com/docs/en/errata/IMX6DQCE.pdf):
eCSPI: TXFIFO empty flag glitch can cause the current FIFO transfer to
   be sent twice
So revert commit 'dd4b487b32a3' first.

Signed-off-by: Robin Gong 
Acked-by: Sascha Hauer 
---
 arch/arm/boot/dts/imx6qdl.dtsi | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/arm/boot/dts/imx6qdl.dtsi b/arch/arm/boot/dts/imx6qdl.dtsi
index 346a52f..a8dedeb 100644
--- a/arch/arm/boot/dts/imx6qdl.dtsi
+++ b/arch/arm/boot/dts/imx6qdl.dtsi
@@ -327,7 +327,7 @@
clocks = < IMX6QDL_CLK_ECSPI1>,
 < IMX6QDL_CLK_ECSPI1>;
clock-names = "ipg", "per";
-   dmas = < 3 8 1>, < 4 8 2>;
+   dmas = < 3 7 1>, < 4 7 2>;
dma-names = "rx", "tx";
status = "disabled";
};
@@ -341,7 +341,7 @@
clocks = < IMX6QDL_CLK_ECSPI2>,
 < IMX6QDL_CLK_ECSPI2>;
clock-names = "ipg", "per";
-   dmas = < 5 8 1>, < 6 8 2>;
+   dmas = < 5 7 1>, < 6 7 2>;
dma-names = "rx", "tx";
status = "disabled";
};
@@ -355,7 +355,7 @@
clocks = < IMX6QDL_CLK_ECSPI3>,
 < IMX6QDL_CLK_ECSPI3>;
clock-names = "ipg", "per";
-   dmas = < 7 8 1>, < 8 8 2>;
+   dmas = < 7 7 1>, < 8 7 2>;
dma-names = "rx", "tx";
status = "disabled";
};
@@ -369,7 +369,7 @@
clocks = < IMX6QDL_CLK_ECSPI4>,
 < IMX6QDL_CLK_ECSPI4>;
clock-names = "ipg", "per";
-   dmas = < 9 8 1>, < 10 8 2>;
+   dmas = < 9 7 1>, < 10 7 2>;
dma-names = "rx", "tx";
status = "disabled";
};
-- 
2.7.4

[PATCH v10 01/12] Revert "ARM: dts: imx6q: Use correct SDMA script for SPI5 core"

  There are two ways for SDMA to access SPBA devices: one is SDMA->AIPS
->SPBA(masterA port), the other is SDMA->SPBA(masterC port). Please refer
to 'Figure 58-1. i.MX 6Dual/6Quad SPBA connectivity' in the i.mx6DQ
Reference Manual. SDMA provides the corresponding app_2_mcu/mcu_2_app and
shp_2_mcu/mcu_2_shp scripts for these two options. So both AIPS and SPBA
scripts should behave the same, and an issue caught only with the AIPS
script does not sound solid.
  The issue is more likely the ecspi errata
ERR009165(http://www.nxp.com/docs/en/errata/IMX6DQCE.pdf):
eCSPI: TXFIFO empty flag glitch can cause the current FIFO transfer to
   be sent twice
So revert commit 'df07101e1c4a' first.

Signed-off-by: Robin Gong 
Acked-by: Sascha Hauer 
---
 arch/arm/boot/dts/imx6q.dtsi | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/arm/boot/dts/imx6q.dtsi b/arch/arm/boot/dts/imx6q.dtsi
index 78a4d64..afdd9eb 100644
--- a/arch/arm/boot/dts/imx6q.dtsi
+++ b/arch/arm/boot/dts/imx6q.dtsi
@@ -177,7 +177,7 @@
clocks = < IMX6Q_CLK_ECSPI5>,
 < IMX6Q_CLK_ECSPI5>;
clock-names = "ipg", "per";
-   dmas = < 11 8 1>, < 12 8 2>;
+   dmas = < 11 7 1>, < 12 7 2>;
dma-names = "rx", "tx";
status = "disabled";
};
-- 
2.7.4

[PATCH v10 03/12] Revert "dmaengine: imx-sdma: refine to load context only once"

This reverts commit ad0d92d7ba6aecbe2705907c38ff8d8be4da1e9c, because
in spi-imx case, burst length may be changed dynamically.

Signed-off-by: Robin Gong 
Acked-by: Sascha Hauer 
---
 drivers/dma/imx-sdma.c | 8 
 1 file changed, 8 deletions(-)

diff --git a/drivers/dma/imx-sdma.c b/drivers/dma/imx-sdma.c
index 270992c..d305b80 100644
--- a/drivers/dma/imx-sdma.c
+++ b/drivers/dma/imx-sdma.c
@@ -377,7 +377,6 @@ struct sdma_channel {
unsigned long   watermark_level;
u32 shp_addr, per_addr;
enum dma_status status;
-   bool                    context_loaded;
struct imx_dma_data data;
struct work_struct  terminate_worker;
 };
@@ -984,9 +983,6 @@ static int sdma_load_context(struct sdma_channel *sdmac)
int ret;
unsigned long flags;
 
-   if (sdmac->context_loaded)
-   return 0;
-
if (sdmac->direction == DMA_DEV_TO_MEM)
load_address = sdmac->pc_from_device;
else if (sdmac->direction == DMA_DEV_TO_DEV)
@@ -1029,8 +1025,6 @@ static int sdma_load_context(struct sdma_channel *sdmac)
 
spin_unlock_irqrestore(>channel_0_lock, flags);
 
-   sdmac->context_loaded = true;
-
return ret;
 }
 
@@ -1069,7 +1063,6 @@ static void sdma_channel_terminate_work(struct work_struct *work)
vchan_get_all_descriptors(>vc, );
spin_unlock_irqrestore(>vc.lock, flags);
vchan_dma_desc_free_list(>vc, );
-   sdmac->context_loaded = false;
 }
 
 static int sdma_terminate_all(struct dma_chan *chan)
@@ -1337,7 +1330,6 @@ static void sdma_free_chan_resources(struct dma_chan *chan)
 
sdmac->event_id0 = 0;
sdmac->event_id1 = 0;
-   sdmac->context_loaded = false;
 
sdma_set_channel_priority(sdmac, 0);
 
-- 
2.7.4

Re: [PATCH 5.4 000/178] 5.4.50-rc1 review

2020-06-29 Thread Naresh Kamboju

On Mon, 29 Jun 2020 at 20:55, Sasha Levin  wrote:
>
>
> This is the start of the stable review cycle for the 5.4.50 release.
> There are 178 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 01 Jul 2020 03:25:02 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.4.y=v5.4.49
>
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.4.y
> and the diffstat can be found below.
>
> --
> Thanks,
> Sasha


Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 5.4.50-rc1
git repo: https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.4.y
git commit: 7d61c4b6865ab9c9f22e4ddbc65645c0c4b0427e
git describe: v5.4.49-178-g7d61c4b6865a
Test details: https://qa-reports.linaro.org/lkft/linux-stable-rc-5.4-oe/build/v5.4.49-178-g7d61c4b6865a

No regressions (compared to build v5.4.49)

No fixes (compared to build v5.4.49)

Ran 36125 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- juno-r2-compat
- juno-r2-kasan
- nxp-ls2088
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86
- x86-kasan

Test Suites
---
* build
* install-android-platform-tools-r2600
* install-android-platform-tools-r2800
* kselftest
* kselftest/drivers
* kselftest/filesystems
* kselftest/net
* kvm-unit-tests
* libhugetlbfs
* linux-log-parser
* ltp-cap_bounds-tests
* ltp-containers-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-mm-tests
* ltp-sched-tests
* ltp-syscalls-tests
* perf
* v4l2-compliance
* ltp-commands-tests
* ltp-controllers-tests
* ltp-cve-tests
* ltp-ipc-tests
* ltp-math-tests
* ltp-nptl-tests
* ltp-pty-tests
* ltp-securebits-tests
* network-basic-tests
* ltp-open-posix-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-native/drivers
* kselftest-vsyscall-mode-native/filesystems
* kselftest-vsyscall-mode-native/net
* kselftest-vsyscall-mode-none
* kselftest-vsyscall-mode-none/drivers
* kselftest-vsyscall-mode-none/filesystems
* kselftest-vsyscall-mode-none/net

-- 
Linaro LKFT
https://lkft.linaro.org

Re: input maintainer -- are you there? was Re: [PATCH 1/2] Input: add `SW_MACHINE_COVER`

2020-06-29 Thread Dmitry Torokhov

On Mon, Jun 29, 2020 at 03:36:44PM +0200, Pavel Machek wrote:
> Hi!
> 
> > Looks like we're blocking on this input patch.
> > 
> > On 16/06/2020 12:50, Pavel Machek wrote:
> > > On Fri 2020-06-12 14:53:58, Merlijn Wajer wrote:
> > >> This event code represents the state of a removable cover of a device.
> > >> Value 0 means that the cover is open or removed, value 1 means that the
> > >> cover is closed.
> > >>
> > >> Reviewed-by: Sebastian Reichel  
> > >> Acked-by: Tony Lindgren 
> > >> Signed-off-by: Merlijn Wajer
> > >> ---
> > > 
> > > Dmitry, can we get some kind of comment here, or better yet can we get 
> > > you to apply this?
> > 
> > This is part of a patch series to resolve problems with the Nokia N900
> > not booting when the cover is removed (making the cover be the card
> > detect was also just weird IMHO). Just removing the card-detect from the
> > DTS is fine, but it was suggested that we expose the data instead as
> > input event. And that's gotten no response for about four months.
> > 
> > Should we just drop the feature and only remove the cd-gpios line from
> > the DTS, assuming upstream doesn't want this SW_MACHINE_COVER code?
> 
> I believe series is good, lets keep it. Changing now will only delay
> it a bit more. Let me try to get Dmitry's attention...
> 
> If that does not work, we can get Linus' attention :-).
> 
> If that does not work, umm, there are some other options.

Sorry, I have been really swamped for the last couple of months. I can pick up
the input code; do you want me to pick up the DTS changes as well?

Thanks.


-- 
Dmitry

Re: [PATCH] kernel/trace: Add TRACING_ALLOW_PRINTK config option

2020-06-29 Thread Alexei Starovoitov

On Sun, Jun 28, 2020 at 07:43:34PM -0400, Steven Rostedt wrote:
> On Sun, 28 Jun 2020 18:28:42 -0400
> Steven Rostedt  wrote:
> 
> > You create a bpf event just like you create any other event. When a bpf
> > program that uses a bpf_trace_printk() is loaded, you can enable that
> > event from within the kernel. Yes, there's internal interfaces to
> > enabled and disable events just like echoing 1 into
> > tracefs/events/system/event/enable. See trace_set_clr_event().
> 
> I just started playing with what the code would look like and have
> this. It can be optimized with per-cpu sets of buffers to remove the
> spin lock. I also didn't put in the enabling of the event, but I'm sure
> you can figure that out.
> 
> Warning, not even compile tested.

Thanks! I see what you mean now.

> 
> -- Steve
> 
> diff --git a/kernel/trace/Makefile b/kernel/trace/Makefile
> index 6575bb0a0434..aeba5ee7325a 100644
> --- a/kernel/trace/Makefile
> +++ b/kernel/trace/Makefile
> @@ -31,6 +31,8 @@ ifdef CONFIG_GCOV_PROFILE_FTRACE
>  GCOV_PROFILE := y
>  endif
>  
> +CFLAGS_bpf_trace.o := -I$(src)

Not following. Why is this needed?

> +
>  CFLAGS_trace_benchmark.o := -I$(src)
>  CFLAGS_trace_events_filter.o := -I$(src)
>  
> diff --git a/kernel/trace/bpf_trace.c b/kernel/trace/bpf_trace.c
> index dc05626979b8..01bedf335b2e 100644
> --- a/kernel/trace/bpf_trace.c
> +++ b/kernel/trace/bpf_trace.c
> @@ -19,6 +19,9 @@
>  #include "trace_probe.h"
>  #include "trace.h"
>  
> +#define CREATE_TRACE_EVENTS

CREATE_TRACE_POINTS ?

> +#include "bpf_trace.h"
> +
>  #define bpf_event_rcu_dereference(p) \
>   rcu_dereference_protected(p, lockdep_is_held(&bpf_event_mutex))
>  
> @@ -473,13 +476,29 @@ BPF_CALL_5(bpf_trace_printk, char *, fmt, u32, fmt_size, u64, arg1,
>   fmt_cnt++;
>   }
>  
> +static DEFINE_SPINLOCK(trace_printk_lock);
> +#define BPF_TRACE_PRINTK_SIZE  1024
> +
> +static inline void do_trace_printk(const char *fmt, ...)
> +{
> + static char buf[BPF_TRACE_PRINT_SIZE];
> + unsigned long flags;
> +
> + spin_lock_irqsave(&trace_printk_lock, flags);
> + va_start(ap, fmt);
> + vsnprintf(buf, BPF_TRACE_PRINT_SIZE, fmt, ap);
> + va_end(ap);
> +
> + trace_bpf_trace_printk(buf);
> + spin_unlock_irqrestore(&trace_printk_lock, flags);

interesting. I don't think anyone would care about spin_lock overhead.
It's better because 'trace_bpf_trace_printk' would be a separate event
that can be individually enabled/disabled?
I guess it can work.
Thanks!
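
The pattern discussed above (format into one shared buffer, then hand the
result to a single event, all under one lock) can be sketched in userspace C.
This is only an illustration under stated assumptions, not the kernel code:
do_printk(), emit_event() and last_event are hypothetical names, a C11
atomic_flag stands in for the kernel spinlock, and emit_event() stands in for
the trace event.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdatomic.h>
#include <stdio.h>
#include <string.h>

#define PRINTK_BUF_SIZE 1024

/* One shared buffer guarded by one lock, as in the quoted draft. */
static atomic_flag printk_lock = ATOMIC_FLAG_INIT;
static char printk_buf[PRINTK_BUF_SIZE];

/* Hypothetical sink standing in for the trace event; it keeps a copy. */
static char last_event[PRINTK_BUF_SIZE];

static void emit_event(const char *msg)
{
	strncpy(last_event, msg, sizeof(last_event) - 1);
	last_event[sizeof(last_event) - 1] = '\0';
}

/* Format into the shared buffer and emit it while holding the lock. */
static int do_printk(const char *fmt, ...)
{
	va_list ap;
	int len;

	assert(fmt != NULL);

	/* Spin until the flag is acquired (userspace stand-in for a spinlock). */
	while (atomic_flag_test_and_set_explicit(&printk_lock,
						 memory_order_acquire))
		;

	va_start(ap, fmt);
	len = vsnprintf(printk_buf, sizeof(printk_buf), fmt, ap);
	va_end(ap);

	emit_event(printk_buf);
	atomic_flag_clear_explicit(&printk_lock, memory_order_release);

	/* Length the full message would have had, as vsnprintf() reports it. */
	return len;
}
```

Because the buffer is static and shared, the lock has to cover both the
formatting and the emit step; per-CPU buffers, as suggested in the quoted
mail, would be one way to drop it.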

Re: [PATCH] Input: elan_i2c - only increment wakeup count on touch

2020-06-29 Thread Dmitry Torokhov

On Mon, Jun 29, 2020 at 05:57:07PM -0700, Derek Basehore wrote:
> This moves the wakeup increment for elan devices to the touch report.
> This prevents the drivers from incorrectly reporting a wakeup when the
> resume callback resets the device, which causes an interrupt to
> occur. This also avoids error messages when these interrupts occur,
> since this behavior is expected.
> 
> Signed-off-by: Derek Basehore 
> ---
>  drivers/input/mouse/elan_i2c_core.c | 16 +---
>  1 file changed, 13 insertions(+), 3 deletions(-)
> 
> diff --git a/drivers/input/mouse/elan_i2c_core.c b/drivers/input/mouse/elan_i2c_core.c
> index cdbe6b38c73c1..6ad53a75f9807 100644
> --- a/drivers/input/mouse/elan_i2c_core.c
> +++ b/drivers/input/mouse/elan_i2c_core.c
> @@ -49,6 +49,7 @@
>  
>  #define ETP_MAX_FINGERS  5
>  #define ETP_FINGER_DATA_LEN  5
> +#define ETP_REPORT_LEN_OFFSET  0
>  #define ETP_REPORT_ID  0x5D
>  #define ETP_TP_REPORT_ID  0x5E
>  #define ETP_REPORT_ID_OFFSET  2
> @@ -1018,6 +1019,8 @@ static void elan_report_absolute(struct elan_tp_data *data, u8 *packet)
>   u8 hover_info = packet[ETP_HOVER_INFO_OFFSET];
>   bool contact_valid, hover_event;
>  
> + pm_wakeup_event(&data->client->dev, 0);
> +
>   hover_event = hover_info & 0x40;
>   for (i = 0; i < ETP_MAX_FINGERS; i++) {
>   contact_valid = tp_info & (1U << (3 + i));
> @@ -1041,6 +1044,8 @@ static void elan_report_trackpoint(struct elan_tp_data *data, u8 *report)
>   u8 *packet = &report[ETP_REPORT_ID_OFFSET + 1];
>   int x, y;
>  
> + pm_wakeup_event(&data->client->dev, 0);
> +
>   if (!data->tp_input) {
>   dev_warn_once(&data->client->dev,
> "received a trackpoint report while no trackpoint device has been created. Please report upstream.\n");
> @@ -1065,7 +1070,6 @@ static void elan_report_trackpoint(struct elan_tp_data *data, u8 *report)
>  static irqreturn_t elan_isr(int irq, void *dev_id)
>  {
>   struct elan_tp_data *data = dev_id;
> - struct device *dev = &data->client->dev;
>   int error;
>   u8 report[ETP_MAX_REPORT_LEN];
>  
> @@ -1083,7 +1087,13 @@ static irqreturn_t elan_isr(int irq, void *dev_id)
>   if (error)
>   goto out;
>  
> - pm_wakeup_event(dev, 0);
> + /*
> +  * Controllers may send a full length report on power on and reset
> +  * cases. There are only meaningless bytes in these reports except for
> +  * report[ETP_REPORT_LEN_OFFSET], which is 0.
> +  */

Is this true for all versions of firmware? Also, should we pay attention
to the value of this field for various types of reports?

> + if (!report[ETP_REPORT_LEN_OFFSET])
> + goto out;
>  
>   switch (report[ETP_REPORT_ID_OFFSET]) {
>   case ETP_REPORT_ID:
> @@ -1093,7 +1103,7 @@ static irqreturn_t elan_isr(int irq, void *dev_id)
>   elan_report_trackpoint(data, report);
>   break;
>   default:
> - dev_err(dev, "invalid report id data (%x)\n",
> + dev_err(&data->client->dev, "invalid report id data (%x)\n",
>   report[ETP_REPORT_ID_OFFSET]);
>   }
>  
> -- 
> 2.27.0.212.ge8ba1cc988-goog
> 

Thanks.

-- 
Dmitry
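
The filtering rule under review (skip reports whose length byte is zero, and
count a wakeup only for touch or trackpoint reports) can be sketched as a
standalone predicate. This is a minimal sketch under assumptions, not the
driver code: report_is_wakeup() is a hypothetical helper, and the offsets and
report IDs are copied from the defines in the quoted patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values taken from the quoted patch (ETP_* defines). */
#define REPORT_LEN_OFFSET     0
#define REPORT_ID_OFFSET      2
#define REPORT_ID_TOUCH       0x5D
#define REPORT_ID_TRACKPOINT  0x5E

/* Return true only for reports that should be counted as a wakeup. */
static bool report_is_wakeup(const uint8_t *report, size_t len)
{
	assert(report != NULL);

	/* Too short to carry a report ID at all. */
	if (len <= REPORT_ID_OFFSET)
		return false;

	/* A zero length byte marks a power-on or reset report: ignore it. */
	if (!report[REPORT_LEN_OFFSET])
		return false;

	switch (report[REPORT_ID_OFFSET]) {
	case REPORT_ID_TOUCH:
	case REPORT_ID_TRACKPOINT:
		return true;
	default:
		/* Unknown report ID: not a user-initiated wakeup. */
		return false;
	}
}
```

Whether a zero length byte reliably identifies reset reports on every firmware
version is exactly the open question raised in the review above; the sketch
simply assumes it does.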

Re: [PATCH 5.7 000/265] 5.7.7-rc1 review

2020-06-29 Thread Naresh Kamboju

On Mon, 29 Jun 2020 at 20:48, Sasha Levin  wrote:
>
>
> This is the start of the stable review cycle for the 5.7.7 release.
> There are 265 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>
> Responses should be made by Wed 01 Jul 2020 03:14:48 PM UTC.
> Anything received after that time might be too late.
>
> The whole patch series can be found in one patch at:
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git/patch/?id=linux-5.7.y&id2=v5.7.6
>
> or in the git tree and branch at:
> 
> git://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git 
> linux-5.7.y
> and the diffstat can be found below.
>
> --
> Thanks,
> Sasha

Results from Linaro’s test farm.
No regressions on arm64, arm, x86_64, and i386.

Summary


kernel: 5.7.7-rc1
git repo: 
https://git.kernel.org/pub/scm/linux/kernel/git/stable/linux-stable-rc.git
git branch: linux-5.7.y
git commit: 97943c6d41ef2b05f4e064eb49a538ff4b405809
git describe: v5.7.6-265-g97943c6d41ef
Test details: 
https://qa-reports.linaro.org/lkft/linux-stable-rc-5.7-oe/build/v5.7.6-265-g97943c6d41ef

No regressions (compared to build v5.7.6)

No fixes (compared to build v5.7.6)


Ran 36511 total tests in the following environments and test suites.

Environments
--
- dragonboard-410c
- hi6220-hikey
- i386
- juno-r2
- juno-r2-compat
- juno-r2-kasan
- nxp-ls2088
- qemu_arm
- qemu_arm64
- qemu_i386
- qemu_x86_64
- x15
- x86
- x86-kasan

Test Suites
---
* build
* install-android-platform-tools-r2600
* install-android-platform-tools-r2800
* kselftest
* kselftest/drivers
* kselftest/filesystems
* kselftest/net
* libhugetlbfs
* linux-log-parser
* ltp-containers-tests
* ltp-cve-tests
* ltp-dio-tests
* ltp-fcntl-locktests-tests
* ltp-filecaps-tests
* ltp-fs-tests
* ltp-fs_bind-tests
* ltp-fs_perms_simple-tests
* ltp-fsx-tests
* ltp-hugetlb-tests
* ltp-io-tests
* ltp-ipc-tests
* ltp-mm-tests
* ltp-sched-tests
* ltp-syscalls-tests
* perf
* v4l2-compliance
* kvm-unit-tests
* ltp-cap_bounds-tests
* ltp-commands-tests
* ltp-cpuhotplug-tests
* ltp-crypto-tests
* ltp-math-tests
* network-basic-tests
* ltp-controllers-tests
* ltp-nptl-tests
* ltp-open-posix-tests
* ltp-pty-tests
* ltp-securebits-tests
* kselftest-vsyscall-mode-native
* kselftest-vsyscall-mode-native/drivers
* kselftest-vsyscall-mode-native/filesystems
* kselftest-vsyscall-mode-native/net
* kselftest-vsyscall-mode-none
* kselftest-vsyscall-mode-none/drivers
* kselftest-vsyscall-mode-none/filesystems
* kselftest-vsyscall-mode-none/net

-- 
Linaro LKFT
https://lkft.linaro.org

Re: [PATCH 1/2] arm64: dts: ls1088a: add more thermal zone support

2020-06-29 Thread Amit Kucheria

On Tue, Jun 30, 2020 at 8:56 AM  wrote:
>
> From: Yuantian Tang 
>
> There are 2 thermal zones in ls1088a soc. Add the other thermal zone
> node to enable it.
> Also update the values in calibration table to make the temperatures
> monitored more precise.
>
> Signed-off-by: Yuantian Tang 
> ---
>  .../arm64/boot/dts/freescale/fsl-ls1088a.dtsi | 100 +++---
>  1 file changed, 62 insertions(+), 38 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> index 36a799554620..ccbbc23e6c85 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1088a.dtsi
> @@ -129,19 +129,19 @@
> };
>
> thermal-zones {
> -   cpu_thermal: cpu-thermal {
> +   core-cluster {
> polling-delay-passive = <1000>;
> polling-delay = <5000>;
> thermal-sensors = < 0>;
>
> trips {
> -   cpu_alert: cpu-alert {
> +   core_cluster_alert: core-cluster-alert {
> temperature = <85000>;
> hysteresis = <2000>;
> type = "passive";
> };
>
> -   cpu_crit: cpu-crit {
> +   core_cluster_crit: core-cluster-crit {
> temperature = <95000>;
> hysteresis = <2000>;
> type = "critical";
> @@ -150,7 +150,7 @@
>
> cooling-maps {
> map0 {
> -   trip = <&cpu_alert>;
> +   trip = <&core_cluster_alert>;
> cooling-device =
> < THERMAL_NO_LIMIT 
> THERMAL_NO_LIMIT>,
> < THERMAL_NO_LIMIT 
> THERMAL_NO_LIMIT>,
> @@ -163,6 +163,26 @@
> };
> };
> };
> +
> +   soc {
> +   polling-delay-passive = <1000>;
> +   polling-delay = <5000>;
> +   thermal-sensors = < 1>;
> +
> +   trips {
> +   soc-alert {
> +   temperature = <85000>;
> +   hysteresis = <2000>;
> +   type = "passive";
> +   };
> +
> +   soc-crit {
> +   temperature = <95000>;
> +   hysteresis = <2000>;
> +   type = "critical";
> +   };
> +   };
> +   };

You should also add a cooling-maps section for this thermal zone given
that it has a passive trip type. Otherwise there is no use for a
passive trip type.

> };
>
> timer {
> @@ -209,45 +229,49 @@
> compatible = "fsl,qoriq-tmu";
> reg = <0x0 0x1f8 0x0 0x1>;
> interrupts = <0 23 0x4>;
> -   fsl,tmu-range = <0xb 0x9002a 0x6004c 0x30062>;
> +   fsl,tmu-range = <0xb 0x9002a 0x6004c 0x70062>;
> fsl,tmu-calibration =
> /* Calibration data group 1 */
> -   <0x 0x0026
> -   0x0001 0x002d
> -   0x0002 0x0032
> -   0x0003 0x0039
> -   0x0004 0x003f
> -   0x0005 0x0046
> -   0x0006 0x004d
> -   0x0007 0x0054
> -   0x0008 0x005a
> -   0x0009 0x0061
> -   0x000a 0x006a
> -   0x000b 0x0071
> +   <0x 0x0023
> +   0x0001 0x002a
> +   0x0002 0x0030
> +   0x0003 0x0037
> +   0x0004 0x003d
> +   0x0005 0x0044
> +   0x0006 0x004a
> +   0x0007 0x0051
> +   0x0008 0x0057
> +   0x0009 0x005e
> +

Re: [PATCH v4 10/23] ASoC: simple-card: Wrong daifmt for CPU end of DPCM DAI link

2020-06-29 Thread Kuninori Morimoto



Hi Sameer

> For DPCM links I don't want to flip based on one Codec reference. My
> goal was to make the binding work for multiple CPU/Codec link. Hence I
> thought it would be better to explicitly describe the 'Master' DAI. We
> can eventually get rid of 'codec' argument from
> simple_dai_link_of_dpcm().

Yes. 'codec' argument on current simple_dai_link_of_dpcm()
is not good match for multi Codec support.

Thank you for your help !!

Best regards
---
Kuninori Morimoto

Re: [PATCH v3 00/15] HWPOISON: soft offline rework

2020-06-29 Thread Qian Cai

On Wed, Jun 24, 2020 at 03:01:22PM +, nao.horigu...@gmail.com wrote:
> I rebased soft-offline rework patchset [1][2] onto the latest mmotm.  The
> rebasing required some non-trivial changes to adjust, but mainly that was
> straightforward.  I confirmed that the reported problem doesn't reproduce on
> compaction after soft offline.  For more precise description of the problem
> and the motivation of this patchset, please see [2].
> 
> I think that the following two patches in v2 are better to be done with
> separate work of hard-offline rework, so it's not included in this series.
> 
>   - mm,hwpoison: Take pages off the buddy when hard-offlining
>   - mm/hwpoison-inject: Rip off duplicated checks
> 
> These two are not directly related to the reported problem, so they seem
> not urgent.  And the first one breaks num_poisoned_pages counting in some
> testcases, and the second patch needs more consideration about the commented
> point.
> 
> Any comment/suggestion/help would be appreciated.

Even after applying the compile fix,

https://lore.kernel.org/linux-mm/20200628065409.GA546944@u2004/

madvise(MADV_SOFT_OFFLINE) will fail with EIO with hugetlb where it
would succeed without this series. Steps:

# git clone https://github.com/cailca/linux-mm
# cd linux-mm; make
# ./random 1 (Need at least two NUMA memory nodes)
 start: migrate_huge_offline
- use NUMA nodes 0,4.
- mmap and free 8388608 bytes hugepages on node 0
- mmap and free 8388608 bytes hugepages on node 4
madvise: Input/output error

(x86.config is also included there.)

[10718.158548][T13039] __get_any_page: 0x1d1600 free huge page
[10718.165684][T13039] Soft offlining pfn 0x1d1600 at process virtual address 0x7f1dd200
[10718.175061][T13039] Soft offlining pfn 0x154c00 at process virtual address 0x7f1dd220
[10718.185492][T13039] Soft offlining pfn 0x137c00 at process virtual address 0x7f1dd200
[10718.195209][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd220
[10718.203483][T13039] soft offline: 0x4c7a00: hugepage isolation failed: 0, page count 2, type bfffc1000f (locked|referenced|uptodate|dirty|head)
[10718.218228][T13039] Soft offlining pfn 0x4c7a00 at process virtual address 0x7f1dd200
[10718.227522][T13039] Soft offlining pfn 0x2da800 at process virtual address 0x7f1dd220
[10718.238503][T13039] Soft offlining pfn 0x1de200 at process virtual address 0x7f1dd200
[10718.247822][T13039] Soft offlining pfn 0x155c00 at process virtual address 0x7f1dd220
[10718.259031][T13039] Soft offlining pfn 0x203600 at process virtual address 0x7f1dd200
[10718.268504][T13039] Soft offlining pfn 0x417600 at process virtual address 0x7f1dd220
[10718.278830][T13039] Soft offlining pfn 0x20a600 at process virtual address 0x7f1dd200
[10718.288133][T13039] Soft offlining pfn 0x1d0800 at process virtual address 0x7f1dd220
[10718.299198][T13039] Soft offlining pfn 0x3e5800 at process virtual address 0x7f1dd200
[10718.308593][T13039] Soft offlining pfn 0x21f200 at process virtual address 0x7f1dd220
[10718.319725][T13039] Soft offlining pfn 0x18c600 at process virtual address 0x7f1dd200
[10718.329301][T13039] Soft offlining pfn 0x396a00 at process virtual address 0x7f1dd220
[10718.378882][T13039] Soft offlining pfn 0x4d5000 at process virtual address 0x7f1dd200
[10718.388415][T13039] Soft offlining pfn 0x4e5000 at process virtual address 0x7f1dd220
[10718.398807][T13039] Soft offlining pfn 0x2f5000 at process virtual address 0x7f1dd200
[10718.408236][T13039] Soft offlining pfn 0x50b400 at process virtual address 0x7f1dd220
[10718.419781][T13039] Soft offlining pfn 0x396800 at process virtual address 0x7f1dd200
[10718.429677][T13039] Soft offlining pfn 0xd69c00 at process virtual address 0x7f1dd220
[10718.440435][T13039] Soft offlining pfn 0x21f000 at process virtual address 0x7f1dd200
[10718.450341][T13039] Soft offlining pfn 0x399400 at process virtual address 0x7f1dd220
[10718.458768][T13039] __get_any_page: 0x399400: unknown zero refcount page type bfffc0

The main part is,
https://github.com/cailca/linux-mm/blob/master/random.c#L372

> 
> [1] v1: 
> https://lore.kernel.org/linux-mm/1541746035-13408-1-git-send-email-n-horigu...@ah.jp.nec.com/
> [2] v2: 
> https://lore.kernel.org/linux-mm/20191017142123.24245-1-osalva...@suse.de/
> 
> Thanks,
> Naoya Horiguchi
> ---
> Summary:
> 
> Naoya Horiguchi (7):
>   mm,hwpoison: cleanup unused PageHuge() check
>   mm, hwpoison: remove recalculating hpage
>   mm,madvise: call soft_offline_page() without MF_COUNT_INCREASED
>   mm,hwpoison-inject: don't pin for hwpoison_filter
>   mm,hwpoison: remove MF_COUNT_INCREASED
>   mm,hwpoison: remove flag argument from soft offline functions
>   mm,hwpoison: introduce MF_MSG_UNSPLIT_THP
> 
> Oscar Salvador (8):
>   mm,madvise: Refactor madvise_inject_error
>   mm,hwpoison: Un-export

Re: [PATCH net] xsk: remove cheap_dma optimization

2020-06-29 Thread Christoph Hellwig

On Mon, Jun 29, 2020 at 05:18:38PM +0200, Daniel Borkmann wrote:
> On 6/29/20 5:10 PM, Björn Töpel wrote:
>> On 2020-06-29 15:52, Daniel Borkmann wrote:
>>>
>>> Ok, fair enough, please work with DMA folks to get this properly integrated 
>>> and
>>> restored then. Applied, thanks!
>>
>> Daniel, you were too quick! Please revert this one; Christoph just submitted 
>> a 4-patch-series that addresses both the DMA API, and the perf regression!
>
> Nice, tossed from bpf tree then! (Looks like it didn't land on the bpf list 
> yet,
> but seems other mails are currently stuck as well on vger. I presume it will 
> be
> routed to Linus via Christoph?)

I sent the patches to the bpf list, did you get them now that vger
is unclogged?  Thinking about it the best route might be through
bpf/net, so if that works for you please pick it up.

Re: [PATCH v4 4/7] iio: core: move debugfs data on the private iio dev info

2020-06-29 Thread Ardelean, Alexandru

On Tue, 2020-06-30 at 07:57 +0300, Alexandru Ardelean wrote:
> This change moves all iio_dev debugfs fields to the iio_dev_priv object.
> It's not the biggest advantage yet (to the whole thing of abstraction)
> but it's a start.
> 
> The iio_get_debugfs_dentry() function (which is moved in
> industrialio-core.c) needs to also be guarded against the CONFIG_DEBUG_FS
> symbol, when it isn't defined. We do want to keep the inline definition in
> the iio.h header, so that the compiler can better infer when to compile out
> debugfs code that is related to the IIO debugfs directory.
> 

Well, pretty much only this patch changed since V3.
I thought about maybe re-doing just this patch, then I thought maybe I'd
get a minor complaint that I should re-send the series.

Either way, I prefer a complaint on this V4 series-re-send than if I were
to have re-sent just this patch.


> Signed-off-by: Alexandru Ardelean 
> ---
>  drivers/iio/industrialio-core.c | 46 +++--
>  include/linux/iio/iio-opaque.h  | 10 +++
>  include/linux/iio/iio.h | 13 +-
>  3 files changed, 44 insertions(+), 25 deletions(-)
> 
> diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
> index 27005ba4d09c..64174052641a 100644
> --- a/drivers/iio/industrialio-core.c
> +++ b/drivers/iio/industrialio-core.c
> @@ -165,6 +165,19 @@ static const char * const iio_chan_info_postfix[] = {
>   [IIO_CHAN_INFO_THERMOCOUPLE_TYPE] = "thermocouple_type",
>  };
>  
> +#if !defined(CONFIG_DEBUG_FS)
> +/**
> + * There's also a CONFIG_DEBUG_FS guard in include/linux/iio/iio.h for
> + * iio_get_debugfs_dentry() to make it inline if CONFIG_DEBUG_FS is
> undefined
> + */
> +struct dentry *iio_get_debugfs_dentry(struct iio_dev *indio_dev)
> +{
> + struct iio_dev_opaque *iio_dev_opaque =
> to_iio_dev_opaque(indio_dev);
> + return iio_dev_opaque->debugfs_dentry;
> +}
> +EXPORT_SYMBOL_GPL(iio_get_debugfs_dentry);
> +#endif
> +
>  /**
>   * iio_find_channel_from_si() - get channel from its scan index
>   * @indio_dev:   device
> @@ -308,35 +321,37 @@ static ssize_t iio_debugfs_read_reg(struct file
> *file, char __user *userbuf,
> size_t count, loff_t *ppos)
>  {
>   struct iio_dev *indio_dev = file->private_data;
> + struct iio_dev_opaque *iio_dev_opaque =
> to_iio_dev_opaque(indio_dev);
>   unsigned val = 0;
>   int ret;
>  
>   if (*ppos > 0)
>   return simple_read_from_buffer(userbuf, count, ppos,
> -indio_dev->read_buf,
> -indio_dev->read_buf_len);
> +iio_dev_opaque->read_buf,
> +iio_dev_opaque->read_buf_len);
>  
>   ret = indio_dev->info->debugfs_reg_access(indio_dev,
> -   indio_dev->cached_reg_addr,
> +   iio_dev_opaque->cached_reg_addr,
> 0, &val);
>   if (ret) {
>   dev_err(indio_dev->dev.parent, "%s: read failed\n",
> __func__);
>   return ret;
>   }
>  
> - indio_dev->read_buf_len = snprintf(indio_dev->read_buf,
> -sizeof(indio_dev->read_buf),
> -"0x%X\n", val);
> + iio_dev_opaque->read_buf_len = snprintf(iio_dev_opaque->read_buf,
> +   sizeof(iio_dev_opaque-
> >read_buf),
> +   "0x%X\n", val);
>  
>   return simple_read_from_buffer(userbuf, count, ppos,
> -indio_dev->read_buf,
> -indio_dev->read_buf_len);
> +iio_dev_opaque->read_buf,
> +iio_dev_opaque->read_buf_len);
>  }
>  
>  static ssize_t iio_debugfs_write_reg(struct file *file,
>const char __user *userbuf, size_t count, loff_t
> *ppos)
>  {
>   struct iio_dev *indio_dev = file->private_data;
> + struct iio_dev_opaque *iio_dev_opaque =
> to_iio_dev_opaque(indio_dev);
>   unsigned reg, val;
>   char buf[80];
>   int ret;
> @@ -351,10 +366,10 @@ static ssize_t iio_debugfs_write_reg(struct file
> *file,
>  
>   switch (ret) {
>   case 1:
> - indio_dev->cached_reg_addr = reg;
> + iio_dev_opaque->cached_reg_addr = reg;
>   break;
>   case 2:
> - indio_dev->cached_reg_addr = reg;
> + iio_dev_opaque->cached_reg_addr = reg;
>   ret = indio_dev->info->debugfs_reg_access(indio_dev, reg,
> val, NULL);
>   if (ret) {
> @@ -378,23 +393,28 @@ static const struct file_operations
> iio_debugfs_reg_fops = {
>
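
The idea behind this series (drivers see only the public device struct, while
core-only fields live in a wrapper that embeds it) can be sketched in plain C.
This is a minimal sketch under assumptions: struct pub_dev, struct dev_opaque
and the dev_*() helpers are hypothetical names, and to_dev_opaque() is assumed
to behave like a container_of()-style helper, which is how to_iio_dev_opaque()
appears to be used in the quoted patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Public part: the only thing drivers are meant to touch. */
struct pub_dev {
	int id;
};

/* Private wrapper: core-only state sits next to the embedded public part. */
struct dev_opaque {
	struct pub_dev pub;
	unsigned int cached_reg_addr;
};

/* Recover the wrapper from a pointer to its embedded public member. */
#define to_dev_opaque(p) \
	((struct dev_opaque *)((char *)(p) - offsetof(struct dev_opaque, pub)))

/* Allocate the wrapper but hand out only the public part. */
static struct pub_dev *dev_alloc(int id)
{
	struct dev_opaque *opaque = calloc(1, sizeof(*opaque));

	if (!opaque)
		return NULL;
	opaque->pub.id = id;
	return &opaque->pub;
}

/* Core-side accessors reach the private field through the wrapper. */
static void dev_set_cached_reg(struct pub_dev *dev, unsigned int reg)
{
	assert(dev != NULL);
	to_dev_opaque(dev)->cached_reg_addr = reg;
}

static unsigned int dev_get_cached_reg(struct pub_dev *dev)
{
	assert(dev != NULL);
	return to_dev_opaque(dev)->cached_reg_addr;
}

static void dev_free(struct pub_dev *dev)
{
	if (dev)
		free(to_dev_opaque(dev));
}
```

Keeping the public struct as the first member is not required for the macro to
work, since the offset is computed explicitly.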

[PATCH v4 6/7] iio: core: move iio_dev's buffer_list to the private iio device object

This change moves the 'buffer_list' away from the public IIO device object
into the private part.

Signed-off-by: Alexandru Ardelean 
Signed-off-by: Jonathan Cameron 
---
 drivers/iio/industrialio-buffer.c | 38 +++
 drivers/iio/industrialio-core.c   |  2 +-
 include/linux/iio/iio-opaque.h|  2 ++
 include/linux/iio/iio.h   |  2 --
 4 files changed, 27 insertions(+), 17 deletions(-)

diff --git a/drivers/iio/industrialio-buffer.c b/drivers/iio/industrialio-buffer.c
index 329dd4d6757a..2aec8b85f40d 100644
--- a/drivers/iio/industrialio-buffer.c
+++ b/drivers/iio/industrialio-buffer.c
@@ -19,6 +19,7 @@
 #include 
 
 #include 
+#include 
 #include "iio_core.h"
 #include "iio_core_trigger.h"
 #include 
@@ -599,8 +600,10 @@ static int iio_compute_scan_bytes(struct iio_dev 
*indio_dev,
 static void iio_buffer_activate(struct iio_dev *indio_dev,
struct iio_buffer *buffer)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+
iio_buffer_get(buffer);
-   list_add(&buffer->buffer_list, &indio_dev->buffer_list);
+   list_add(&buffer->buffer_list, &iio_dev_opaque->buffer_list);
 }
 
 static void iio_buffer_deactivate(struct iio_buffer *buffer)
@@ -612,10 +615,11 @@ static void iio_buffer_deactivate(struct iio_buffer 
*buffer)
 
 static void iio_buffer_deactivate_all(struct iio_dev *indio_dev)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
struct iio_buffer *buffer, *_buffer;
 
list_for_each_entry_safe(buffer, _buffer,
-   &indio_dev->buffer_list, buffer_list)
+   &iio_dev_opaque->buffer_list, buffer_list)
iio_buffer_deactivate(buffer);
 }
 
@@ -688,6 +692,7 @@ static int iio_verify_update(struct iio_dev *indio_dev,
struct iio_buffer *insert_buffer, struct iio_buffer *remove_buffer,
struct iio_device_config *config)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
unsigned long *compound_mask;
const unsigned long *scan_mask;
bool strict_scanmask = false;
@@ -710,12 +715,12 @@ static int iio_verify_update(struct iio_dev *indio_dev,
 * to verify.
 */
if (remove_buffer && !insert_buffer &&
-   list_is_singular(&indio_dev->buffer_list))
+   list_is_singular(&iio_dev_opaque->buffer_list))
return 0;
 
modes = indio_dev->modes;
 
-   list_for_each_entry(buffer, &indio_dev->buffer_list, buffer_list) {
+   list_for_each_entry(buffer, &iio_dev_opaque->buffer_list, buffer_list) {
if (buffer == remove_buffer)
continue;
modes &= buffer->access->modes;
@@ -736,7 +741,7 @@ static int iio_verify_update(struct iio_dev *indio_dev,
 * Keep things simple for now and only allow a single buffer to
 * be connected in hardware mode.
 */
-   if (insert_buffer && !list_empty(&indio_dev->buffer_list))
+   if (insert_buffer && !list_empty(&iio_dev_opaque->buffer_list))
return -EINVAL;
config->mode = INDIO_BUFFER_HARDWARE;
strict_scanmask = true;
@@ -756,7 +761,7 @@ static int iio_verify_update(struct iio_dev *indio_dev,
 
scan_timestamp = false;
 
-   list_for_each_entry(buffer, &indio_dev->buffer_list, buffer_list) {
+   list_for_each_entry(buffer, &iio_dev_opaque->buffer_list, buffer_list) {
if (buffer == remove_buffer)
continue;
bitmap_or(compound_mask, compound_mask, buffer->scan_mask,
@@ -902,10 +907,11 @@ static int iio_buffer_update_demux(struct iio_dev 
*indio_dev,
 
 static int iio_update_demux(struct iio_dev *indio_dev)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
struct iio_buffer *buffer;
int ret;
 
-   list_for_each_entry(buffer, &indio_dev->buffer_list, buffer_list) {
+   list_for_each_entry(buffer, &iio_dev_opaque->buffer_list, buffer_list) {
ret = iio_buffer_update_demux(indio_dev, buffer);
if (ret < 0)
goto error_clear_mux_table;
@@ -913,7 +919,7 @@ static int iio_update_demux(struct iio_dev *indio_dev)
return 0;
 
 error_clear_mux_table:
-   list_for_each_entry(buffer, &indio_dev->buffer_list, buffer_list)
+   list_for_each_entry(buffer, &iio_dev_opaque->buffer_list, buffer_list)
iio_buffer_demux_free(buffer);
 
return ret;
@@ -922,6 +928,7 @@ static int iio_update_demux(struct iio_dev *indio_dev)
 static int iio_enable_buffers(struct iio_dev *indio_dev,
struct iio_device_config *config)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
struct iio_buffer *buffer;
int ret;
 
@@ -958,7 +965,7 @@ static int iio_enable_buffers(struct iio_dev *indio_dev,
indio_dev->info->hwfifo_set_watermark(indio_dev,

[PATCH v4 7/7] iio: core: move event interface on the opaque struct

Same as with other private fields, this moves the event interface reference
to the opaque IIO device object, to be invisible to drivers.

Signed-off-by: Alexandru Ardelean 
Signed-off-by: Jonathan Cameron 
---
 drivers/iio/industrialio-core.c  |  5 ++-
 drivers/iio/industrialio-event.c | 68 +++-
 include/linux/iio/iio-opaque.h   |  2 +
 include/linux/iio/iio.h  |  3 --
 4 files changed, 45 insertions(+), 33 deletions(-)

diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 461a4e7f48d7..a0ad152c82a7 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -211,7 +211,8 @@ EXPORT_SYMBOL(iio_read_const_attr);
 int iio_device_set_clock(struct iio_dev *indio_dev, clockid_t clock_id)
 {
int ret;
-   const struct iio_event_interface *ev_int = indio_dev->event_interface;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   const struct iio_event_interface *ev_int = iio_dev_opaque->event_interface;
 
ret = mutex_lock_interruptible(&indio_dev->mlock);
if (ret)
@@ -1442,7 +1443,7 @@ static int iio_device_register_sysfs(struct iio_dev 
*indio_dev)
attrcount += ret;
}
 
-   if (indio_dev->event_interface)
+   if (iio_dev_opaque->event_interface)
clk = &dev_attr_current_timestamp_clock.attr;
 
if (indio_dev->name)
diff --git a/drivers/iio/industrialio-event.c b/drivers/iio/industrialio-event.c
index 5b17c92d3b50..2ab4d4c44427 100644
--- a/drivers/iio/industrialio-event.c
+++ b/drivers/iio/industrialio-event.c
@@ -18,6 +18,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "iio_core.h"
 #include 
 #include 
@@ -62,7 +63,8 @@ bool iio_event_enabled(const struct iio_event_interface 
*ev_int)
  **/
 int iio_push_event(struct iio_dev *indio_dev, u64 ev_code, s64 timestamp)
 {
-   struct iio_event_interface *ev_int = indio_dev->event_interface;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   struct iio_event_interface *ev_int = iio_dev_opaque->event_interface;
struct iio_event_data ev;
int copied;
 
@@ -96,7 +98,8 @@ static __poll_t iio_event_poll(struct file *filep,
 struct poll_table_struct *wait)
 {
struct iio_dev *indio_dev = filep->private_data;
-   struct iio_event_interface *ev_int = indio_dev->event_interface;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   struct iio_event_interface *ev_int = iio_dev_opaque->event_interface;
__poll_t events = 0;
 
if (!indio_dev->info)
@@ -116,7 +119,8 @@ static ssize_t iio_event_chrdev_read(struct file *filep,
 loff_t *f_ps)
 {
struct iio_dev *indio_dev = filep->private_data;
-   struct iio_event_interface *ev_int = indio_dev->event_interface;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   struct iio_event_interface *ev_int = iio_dev_opaque->event_interface;
unsigned int copied;
int ret;
 
@@ -165,7 +169,8 @@ static ssize_t iio_event_chrdev_read(struct file *filep,
 static int iio_event_chrdev_release(struct inode *inode, struct file *filep)
 {
struct iio_dev *indio_dev = filep->private_data;
-   struct iio_event_interface *ev_int = indio_dev->event_interface;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   struct iio_event_interface *ev_int = iio_dev_opaque->event_interface;
 
clear_bit(IIO_BUSY_BIT_POS, &ev_int->flags);
 
@@ -184,7 +189,8 @@ static const struct file_operations 
iio_event_chrdev_fileops = {
 
 int iio_event_getfd(struct iio_dev *indio_dev)
 {
-   struct iio_event_interface *ev_int = indio_dev->event_interface;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   struct iio_event_interface *ev_int = iio_dev_opaque->event_interface;
int fd;
 
if (ev_int == NULL)
@@ -343,6 +349,7 @@ static int iio_device_add_event(struct iio_dev *indio_dev,
enum iio_event_type type, enum iio_event_direction dir,
enum iio_shared_by shared_by, const unsigned long *mask)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
ssize_t (*show)(struct device *, struct device_attribute *, char *);
ssize_t (*store)(struct device *, struct device_attribute *,
const char *, size_t);
@@ -376,7 +383,7 @@ static int iio_device_add_event(struct iio_dev *indio_dev,
 
ret = __iio_add_chan_devattr(postfix, chan, show, store,
 (i << 16) | spec_index, shared_by, &indio_dev->dev,
-   &indio_dev->event_interface->dev_attr_list);
+   &iio_dev_opaque->event_interface->dev_attr_list);
kfree(postfix);
 
if ((ret == -EBUSY) && (shared_by != IIO_SEPARATE))
@@

[PATCH v4 3/7] iio: core: remove padding from private information

There was a recent discussion about this code:
  https://lore.kernel.org/linux-iio/20200322165317.0b1f0674@archlinux/

This looks like a good time to remove this, since any issues about it
should pop-up under testing, because the iio_dev is having a bit of an
overhaul and stuff being moved to iio_dev_opaque.

Signed-off-by: Alexandru Ardelean 
Signed-off-by: Jonathan Cameron 
---
 drivers/iio/industrialio-core.c | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 33e2953cf021..27005ba4d09c 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1507,8 +1507,6 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
alloc_size = ALIGN(alloc_size, IIO_ALIGN);
alloc_size += sizeof_priv;
}
-   /* ensure 32-byte alignment of whole construct ? */
-   alloc_size += IIO_ALIGN - 1;
 
iio_dev_opaque = kzalloc(alloc_size, GFP_KERNEL);
if (!iio_dev_opaque)
-- 
2.17.1

[PATCH v4 5/7] iio: core: move channel list & group to private iio device object

This change is fairly straightforward and simple, since the
'channel_attr_list' & 'chan_attr_group' fields are only used in
'industrialio-core.c'.

This change moves them to the private IIO device object.

Signed-off-by: Alexandru Ardelean 
Signed-off-by: Jonathan Cameron 
---
 drivers/iio/industrialio-core.c | 46 +++--
 include/linux/iio/iio-opaque.h  |  5 
 include/linux/iio/iio.h |  5 
 3 files changed, 31 insertions(+), 25 deletions(-)

diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 64174052641a..21fbb187ee79 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -1137,6 +1137,7 @@ static int iio_device_add_info_mask_type(struct iio_dev *indio_dev,
 enum iio_shared_by shared_by,
 const long *infomask)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
int i, ret, attrcount = 0;
 
for_each_set_bit(i, infomask, sizeof(*infomask)*8) {
@@ -1149,7 +1150,7 @@ static int iio_device_add_info_mask_type(struct iio_dev *indio_dev,
 i,
 shared_by,
 &indio_dev->dev,
-&indio_dev->channel_attr_list);
+&iio_dev_opaque->channel_attr_list);
if ((ret == -EBUSY) && (shared_by != IIO_SEPARATE))
continue;
else if (ret < 0)
@@ -1165,6 +1166,7 @@ static int iio_device_add_info_mask_type_avail(struct iio_dev *indio_dev,
   enum iio_shared_by shared_by,
   const long *infomask)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
int i, ret, attrcount = 0;
char *avail_postfix;
 
@@ -1184,7 +1186,7 @@ static int iio_device_add_info_mask_type_avail(struct iio_dev *indio_dev,
 i,
 shared_by,
 &indio_dev->dev,
-&indio_dev->channel_attr_list);
+&iio_dev_opaque->channel_attr_list);
kfree(avail_postfix);
if ((ret == -EBUSY) && (shared_by != IIO_SEPARATE))
continue;
@@ -1199,6 +1201,7 @@ static int iio_device_add_info_mask_type_avail(struct iio_dev *indio_dev,
 static int iio_device_add_channel_sysfs(struct iio_dev *indio_dev,
struct iio_chan_spec const *chan)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
int ret, attrcount = 0;
const struct iio_chan_spec_ext_info *ext_info;
 
@@ -1274,7 +1277,7 @@ static int iio_device_add_channel_sysfs(struct iio_dev *indio_dev,
i,
ext_info->shared,
&indio_dev->dev,
-   &indio_dev->channel_attr_list);
+   &iio_dev_opaque->channel_attr_list);
i++;
if (ret == -EBUSY && ext_info->shared)
continue;
@@ -1409,6 +1412,7 @@ static DEVICE_ATTR(current_timestamp_clock, S_IRUGO | S_IWUSR,
 
 static int iio_device_register_sysfs(struct iio_dev *indio_dev)
 {
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
int i, ret = 0, attrcount, attrn, attrcount_orig = 0;
struct iio_dev_attr *p;
struct attribute **attr, *clk = NULL;
@@ -1448,47 +1452,49 @@ static int iio_device_register_sysfs(struct iio_dev *indio_dev)
if (clk)
attrcount++;
 
-   indio_dev->chan_attr_group.attrs = kcalloc(attrcount + 1,
-  sizeof(indio_dev->chan_attr_group.attrs[0]),
-  GFP_KERNEL);
-   if (indio_dev->chan_attr_group.attrs == NULL) {
+   iio_dev_opaque->chan_attr_group.attrs =
+   kcalloc(attrcount + 1,
+   sizeof(iio_dev_opaque->chan_attr_group.attrs[0]),
+   GFP_KERNEL);
+   if (iio_dev_opaque->chan_attr_group.attrs == NULL) {
ret = -ENOMEM;
goto error_clear_attrs;
}
/* Copy across original attributes */
if (indio_dev->info->attrs)
-   memcpy(indio_dev->chan_attr_group.attrs,
+   memcpy(iio_dev_opaque->chan_attr_group.attrs,
   indio_dev->info->attrs->attrs,
-  sizeof(indio_dev->chan_attr_group.attrs[0])
+

[PATCH v4 4/7] iio: core: move debugfs data on the private iio dev info

This change moves all iio_dev debugfs fields to the iio_dev_opaque object.
It's not the biggest advantage yet (for the overall abstraction effort),
but it's a start.

The iio_get_debugfs_dentry() function (which is moved in
industrialio-core.c) needs to also be guarded against the CONFIG_DEBUG_FS
symbol, when it isn't defined. We do want to keep the inline definition in
the iio.h header, so that the compiler can better infer when to compile out
debugfs code that is related to the IIO debugfs directory.

Signed-off-by: Alexandru Ardelean 
---
 drivers/iio/industrialio-core.c | 46 +++--
 include/linux/iio/iio-opaque.h  | 10 +++
 include/linux/iio/iio.h | 13 +-
 3 files changed, 44 insertions(+), 25 deletions(-)

diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 27005ba4d09c..64174052641a 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -165,6 +165,19 @@ static const char * const iio_chan_info_postfix[] = {
[IIO_CHAN_INFO_THERMOCOUPLE_TYPE] = "thermocouple_type",
 };
 
+#if defined(CONFIG_DEBUG_FS)
+/**
+ * There's also a CONFIG_DEBUG_FS guard in include/linux/iio/iio.h for
+ * iio_get_debugfs_dentry() to make it inline if CONFIG_DEBUG_FS is undefined
+ */
+struct dentry *iio_get_debugfs_dentry(struct iio_dev *indio_dev)
+{
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   return iio_dev_opaque->debugfs_dentry;
+}
+EXPORT_SYMBOL_GPL(iio_get_debugfs_dentry);
+#endif
+
 /**
  * iio_find_channel_from_si() - get channel from its scan index
  * @indio_dev: device
@@ -308,35 +321,37 @@ static ssize_t iio_debugfs_read_reg(struct file *file, char __user *userbuf,
  size_t count, loff_t *ppos)
 {
struct iio_dev *indio_dev = file->private_data;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
unsigned val = 0;
int ret;
 
if (*ppos > 0)
return simple_read_from_buffer(userbuf, count, ppos,
-  indio_dev->read_buf,
-  indio_dev->read_buf_len);
+  iio_dev_opaque->read_buf,
+  iio_dev_opaque->read_buf_len);
 
ret = indio_dev->info->debugfs_reg_access(indio_dev,
- indio_dev->cached_reg_addr,
+ iio_dev_opaque->cached_reg_addr,
  0, &val);
if (ret) {
dev_err(indio_dev->dev.parent, "%s: read failed\n", __func__);
return ret;
}
 
-   indio_dev->read_buf_len = snprintf(indio_dev->read_buf,
-  sizeof(indio_dev->read_buf),
-  "0x%X\n", val);
+   iio_dev_opaque->read_buf_len = snprintf(iio_dev_opaque->read_buf,
+ sizeof(iio_dev_opaque->read_buf),
+ "0x%X\n", val);
 
return simple_read_from_buffer(userbuf, count, ppos,
-  indio_dev->read_buf,
-  indio_dev->read_buf_len);
+  iio_dev_opaque->read_buf,
+  iio_dev_opaque->read_buf_len);
 }
 
 static ssize_t iio_debugfs_write_reg(struct file *file,
 const char __user *userbuf, size_t count, loff_t *ppos)
 {
struct iio_dev *indio_dev = file->private_data;
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
unsigned reg, val;
char buf[80];
int ret;
@@ -351,10 +366,10 @@ static ssize_t iio_debugfs_write_reg(struct file *file,
 
switch (ret) {
case 1:
-   indio_dev->cached_reg_addr = reg;
+   iio_dev_opaque->cached_reg_addr = reg;
break;
case 2:
-   indio_dev->cached_reg_addr = reg;
+   iio_dev_opaque->cached_reg_addr = reg;
ret = indio_dev->info->debugfs_reg_access(indio_dev, reg,
  val, NULL);
if (ret) {
@@ -378,23 +393,28 @@ static const struct file_operations iio_debugfs_reg_fops = {
 
 static void iio_device_unregister_debugfs(struct iio_dev *indio_dev)
 {
-   debugfs_remove_recursive(indio_dev->debugfs_dentry);
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+   debugfs_remove_recursive(iio_dev_opaque->debugfs_dentry);
 }
 
 static void iio_device_register_debugfs(struct iio_dev *indio_dev)
 {
+   struct iio_dev_opaque *iio_dev_opaque;
+
if (indio_dev->info->debugfs_reg_access == NULL)
return;
 
if

Re: [PATCH] phy: qcom: remove ufs qmp phy driver

2020-06-29 Thread Vinod Koul

Hi Bjorn,

On 29-06-20, 12:24, Bjorn Andersson wrote:
> On Mon 29 Jun 07:54 PDT 2020, Vinod Koul wrote:
> 
> > UFS QMP phy drivers are duplicate as we are supposed to use common QMP
> > phy driver which is working fine on various platforms. So remove the
> > unused driver
> > 
> 
> This describes the current state, but the UFS QMP driver had a purpose
> not that long ago and I would like the commit message to describe what
> changed and why it's now fine to remove the driver.

Would the text below look better? Feel free to suggest changes, as you
have more history on this :)

"The UFS QMP driver is a dedicated driver for the UFS variant of the QMP
phy. We also have a common QMP phy driver which works not only for UFS
but for USB and PCIe as well, so retire this driver in favour of the
common driver"

> 
> I'm happy with the patch itself (i.e. the removal of the driver) though.

Thanks
-- 
~Vinod

[PATCH v4 2/7] iio: core: wrap IIO device into an iio_dev_opaque object

There are plenty of bad designs we want to discourage, or not have to
review manually, usually around accessing private (marked as [INTERN])
fields of 'struct iio_dev'.

Sometimes users copy drivers that are not always the best examples.

A better idea is to hide those fields into the framework.
For 'struct iio_dev' this is a 'struct iio_dev_opaque' which wraps a public
'struct iio_dev' object.

In the next series, some fields will be moved to this new struct, each with
its own rework.

It will not be possible to complete this rework for a while, as many fields
need some drivers to be reworked in order to finalize them (e.g.
'indio_dev->mlock').

But some fields can already be moved, and in time, all of them may get
there (in the 'struct iio_dev_opaque' object).

Since a lot of drivers also call 'iio_priv()', in order to preserve
fast-paths (where this matters), the public iio_dev object will have a
'priv' field that will have the pointer to the private information already
computed. The reference returned by this field should be guaranteed to be
cacheline aligned.

The opaque parts will be moved into the 'include/linux/iio/iio-opaque.h'
header. Should the hidden information be required for some debugging or
some special needs, it can be made available via this header.
Otherwise, only the IIO core files should include this file.
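
For illustration only, the wrapping described above can be sketched in
plain userspace C. This is a minimal sketch under stated assumptions, not
the kernel code: the names pub_dev, opaque_dev, SKETCH_ALIGN, to_opaque(),
sketch_alloc() and sketch_free() are all hypothetical, calloc() stands in
for kzalloc(), and only the offset of the private area within the
allocation is aligned here.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical alignment for this sketch; the patch uses IIO_ALIGN. */
#define SKETCH_ALIGN 64
#define SKETCH_ALIGN_UP(x) \
	(((x) + (SKETCH_ALIGN - 1)) & ~((size_t)SKETCH_ALIGN - 1))

/* Public part, the only thing a driver is meant to touch. */
struct pub_dev {
	int id;
	void *priv;	/* precomputed pointer to the driver's private area */
};

/* Framework-only part; it wraps the public object. */
struct opaque_dev {
	struct pub_dev pub;
	int hidden_state;	/* stand-in for fields moved out of pub_dev */
};

/* Userspace equivalent of container_of() for this one pairing. */
static struct opaque_dev *to_opaque(struct pub_dev *pub)
{
	return (struct opaque_dev *)((char *)pub -
				     offsetof(struct opaque_dev, pub));
}

/* Allocate wrapper + private area in one block, return the public part. */
static struct pub_dev *sketch_alloc(size_t sizeof_priv)
{
	size_t alloc_size = sizeof(struct opaque_dev);
	struct opaque_dev *o;

	if (sizeof_priv) {
		alloc_size = SKETCH_ALIGN_UP(alloc_size);
		alloc_size += sizeof_priv;
	}

	o = calloc(1, alloc_size);
	if (!o)
		return NULL;

	/*
	 * Sketch-only choice: leave priv NULL when no private area was
	 * requested, so the pointer never refers past the allocation.
	 */
	if (sizeof_priv)
		o->pub.priv = (char *)o +
			      SKETCH_ALIGN_UP(sizeof(struct opaque_dev));

	return &o->pub;
}

/* The wrapper, not the public part, is what gets freed. */
static void sketch_free(struct pub_dev *pub)
{
	free(to_opaque(pub));
}
```

The point of the layout is that callers keep holding a pointer to the
public object, while allocation and release always go through the wrapper.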

Signed-off-by: Alexandru Ardelean 
Signed-off-by: Jonathan Cameron 
---
 drivers/iio/industrialio-core.c | 19 +--
 include/linux/iio/iio-opaque.h  | 17 +
 include/linux/iio/iio.h |  6 +-
 3 files changed, 35 insertions(+), 7 deletions(-)
 create mode 100644 include/linux/iio/iio-opaque.h

diff --git a/drivers/iio/industrialio-core.c b/drivers/iio/industrialio-core.c
index 75661661aaba..33e2953cf021 100644
--- a/drivers/iio/industrialio-core.c
+++ b/drivers/iio/industrialio-core.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include <linux/iio/iio-opaque.h>
 #include "iio_core.h"
 #include "iio_core_trigger.h"
 #include 
@@ -1473,6 +1474,8 @@ static void iio_device_unregister_sysfs(struct iio_dev *indio_dev)
 static void iio_dev_release(struct device *device)
 {
struct iio_dev *indio_dev = dev_to_iio_dev(device);
+   struct iio_dev_opaque *iio_dev_opaque = to_iio_dev_opaque(indio_dev);
+
if (indio_dev->modes & INDIO_ALL_TRIGGERED_MODES)
iio_device_unregister_trigger_consumer(indio_dev);
iio_device_unregister_eventset(indio_dev);
@@ -1481,7 +1484,7 @@ static void iio_dev_release(struct device *device)
iio_buffer_put(indio_dev->buffer);
 
ida_simple_remove(&iio_ida, indio_dev->id);
-   kfree(indio_dev);
+   kfree(iio_dev_opaque);
 }
 
 struct device_type iio_device_type = {
@@ -1495,10 +1498,11 @@ struct device_type iio_device_type = {
  **/
 struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
 {
+   struct iio_dev_opaque *iio_dev_opaque;
struct iio_dev *dev;
size_t alloc_size;
 
-   alloc_size = sizeof(struct iio_dev);
+   alloc_size = sizeof(struct iio_dev_opaque);
if (sizeof_priv) {
alloc_size = ALIGN(alloc_size, IIO_ALIGN);
alloc_size += sizeof_priv;
@@ -1506,11 +1510,14 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
/* ensure 32-byte alignment of whole construct ? */
alloc_size += IIO_ALIGN - 1;
 
-   dev = kzalloc(alloc_size, GFP_KERNEL);
-   if (!dev)
+   iio_dev_opaque = kzalloc(alloc_size, GFP_KERNEL);
+   if (!iio_dev_opaque)
return NULL;
 
-   dev->dev.parent = parent;
+   dev = &iio_dev_opaque->indio_dev;
+   dev->priv = (char *)iio_dev_opaque +
+   ALIGN(sizeof(struct iio_dev_opaque), IIO_ALIGN);
+
dev->dev.groups = dev->groups;
dev->dev.type = &iio_device_type;
dev->dev.bus = &iio_bus_type;
@@ -1524,7 +1531,7 @@ struct iio_dev *iio_device_alloc(struct device *parent, int sizeof_priv)
if (dev->id < 0) {
/* cannot use a dev_err as the name isn't available */
pr_err("failed to get device id\n");
-   kfree(dev);
+   kfree(iio_dev_opaque);
return NULL;
}
dev_set_name(&dev->dev, "iio:device%d", dev->id);
diff --git a/include/linux/iio/iio-opaque.h b/include/linux/iio/iio-opaque.h
new file mode 100644
index ..1375674f14cd
--- /dev/null
+++ b/include/linux/iio/iio-opaque.h
@@ -0,0 +1,17 @@
+/* SPDX-License-Identifier: GPL-2.0 */
+
+#ifndef _INDUSTRIAL_IO_OPAQUE_H_
+#define _INDUSTRIAL_IO_OPAQUE_H_
+
+/**
+ * struct iio_dev_opaque - industrial I/O device opaque information
+ * @indio_dev: public industrial I/O device information
+ */
+struct iio_dev_opaque {
+   struct iio_dev  indio_dev;
+};
+
+#define to_iio_dev_opaque(indio_dev)   \
+   container_of(indio_dev, struct iio_dev_opaque, indio_dev)
+
+#endif
diff --git a/include/linux/iio/iio.h

[PATCH v4 1/7] iio: core: remove iio_priv_to_dev() helper

All users of this helper have been updated to not use it.
Remove it now, so that we don't need to move it when creating the
iio_dev_opaque structure.

Signed-off-by: Alexandru Ardelean 
Signed-off-by: Jonathan Cameron 
---
 include/linux/iio/iio.h | 6 --
 1 file changed, 6 deletions(-)

diff --git a/include/linux/iio/iio.h b/include/linux/iio/iio.h
index 1c1d02107722..10a6d97a8e3e 100644
--- a/include/linux/iio/iio.h
+++ b/include/linux/iio/iio.h
@@ -703,12 +703,6 @@ static inline void *iio_priv(const struct iio_dev *indio_dev)
return (char *)indio_dev + ALIGN(sizeof(struct iio_dev), IIO_ALIGN);
 }
 
-static inline struct iio_dev *iio_priv_to_dev(void *priv)
-{
-   return (struct iio_dev *)((char *)priv -
- ALIGN(sizeof(struct iio_dev), IIO_ALIGN));
-}
-
 void iio_device_free(struct iio_dev *indio_dev);
 struct iio_dev *devm_iio_device_alloc(struct device *parent, int sizeof_priv);
 struct iio_trigger *devm_iio_trigger_alloc(struct device *dev,
-- 
2.17.1

[PATCH v4 0/7] iio: core: wrap IIO device into an iio_dev_opaque object

This change starts to hide some internal fields of the IIO device into
the framework.

This patchset assumes that all drivers have been addressed with respect
to the use of iio_priv_to_dev(), so the first patch in the series (now)
is to remove it, so that we don't need to move it.

Changelog v3 -> v4:
* added ifdef guard for iio_get_debugfs_dentry();
  Reported-by: kernel test robot 

V2 series:
   https://patchwork.kernel.org/patch/11548709/
   https://lore.kernel.org/linux-iio/20200514131710.84201-1-alexandru.ardel...@analog.com/
Referencing them, since it took a bit longer to get to V3

Changelog v2 -> v3:
- no driver should use iio_priv_to_dev() now; all drivers that use it
  should have been taken care by now; and as a result, all patches from
  v2 are no longer here
- added patch that just removes iio_priv_to_dev()
- for patch 'iio: core: wrap IIO device into an iio_dev_opaque object'
  added comment about 'priv' field that it must be accessed via
  iio_priv() helper
- patch 'iio: core: simplify alloc alignment code' removed
  added 'iio: core: remove padding from private information' instead
- v2 did not address too many movements from iio_dev -> iio_dev_opaque
  this patchset introduces a few more, the ones that seemed easier;
  v2 only had 'iio: core: move debugfs data on the private iio dev info'
  anything after that was added in v3

Changelog v1 -> v2:
- add pre-work patches that remove some calls to iio_priv_to_dev() from
  interrupt handlers
- renamed iio_dev_priv -> iio_dev_opaque
- moved the iio_dev_opaque to 'include/linux/iio/iio-opaque.h' this way
  it should be usable for debugging
- the iio_priv() call, is still an inline function that returns an
  'indio_dev->priv' reference; this field is added to 'struct iio_dev';
  the reference is computed in iio_device_alloc() and should be
  cacheline aligned
- the to_iio_dev_opaque() container is in the
  'include/linux/iio/iio-opaque.h' header; it's still implemented with
  some pointer arithmetic; one idea was to do it via an
  'indio_dev->opaque' field; that may still be an option if there are
  some opinions in that direction

Alexandru Ardelean (7):
  iio: core: remove iio_priv_to_dev() helper
  iio: core: wrap IIO device into an iio_dev_opaque object
  iio: core: remove padding from private information
  iio: core: move debugfs data on the private iio dev info
  iio: core: move channel list & group to private iio device object
  iio: core: move iio_dev's buffer_list to the private iio device object
  iio: core: move event interface on the opaque struct

 drivers/iio/industrialio-buffer.c |  38 ++
 drivers/iio/industrialio-core.c   | 120 +++---
 drivers/iio/industrialio-event.c  |  68 ++---
 include/linux/iio/iio-opaque.h|  36 +
 include/linux/iio/iio.h   |  35 ++---
 5 files changed, 182 insertions(+), 115 deletions(-)
 create mode 100644 include/linux/iio/iio-opaque.h

-- 
2.17.1

Re: [PATCH 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-06-29 Thread Jacob Pan

On Tue, 30 Jun 2020 03:01:29 +
"Tian, Kevin"  wrote:

> > From: Lu Baolu 
> > Sent: Thursday, June 25, 2020 3:26 PM
> > 
> > On 2020/6/23 23:43, Jacob Pan wrote:  
> > > DevTLB flush can be used for DMA requests both with and without
> > > PASIDs. The latter uses PASID#0 (RID2PASID), the former uses a
> > > non-zero PASID for SVA usage.
> > >
> > > This patch adds a check for PASID value such that devTLB flush
> > > with PASID is used for SVA case. This is more efficient in that
> > > multiple PASIDs can be used by a single device, when tearing down
> > > a PASID entry we shall flush only the devTLB specific to a PASID.
> > >
> > > Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")  
> 
> btw is it really a fix? From the description it's more like an
> optimization...
> 
I guess it depends on how the issue is perceived. There is no
functional problem but the flush is too coarse w/o this patch.

> > > Signed-off-by: Jacob Pan 
> > > ---
> > >   drivers/iommu/intel/pasid.c | 11 ++-
> > >   1 file changed, 10 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/drivers/iommu/intel/pasid.c
> > > b/drivers/iommu/intel/pasid.c index c81f0f17c6ba..3991a24539a1
> > > 100644 --- a/drivers/iommu/intel/pasid.c
> > > +++ b/drivers/iommu/intel/pasid.c
> > > @@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct  
> > intel_iommu *iommu,  
> > >   qdep = info->ats_qdep;
> > >   pfsid = info->pfsid;
> > >
> > > - qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> > > VTD_PAGE_SHIFT);
> > > + /*
> > > +  * When PASID 0 is used, it indicates RID2PASID(DMA
> > > request w/o  
> > PASID),  
> > > +  * devTLB flush w/o PASID should be used. For non-zero
> > > PASID under
> > > +  * SVA usage, device could do DMA with multiple PASIDs.
> > > It is more
> > > +  * efficient to flush devTLB specific to the PASID.
> > > +  */
> > > + if (pasid)  
> > 
> > How about
> > 
> > if (pasid == PASID_RID2PASID)
> > qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> > VTD_PAGE_SHIFT);
> > else
> > qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid,
> > qdep, 0, 64 -
> > VTD_PAGE_SHIFT);
> > 
> > ?
> > 
> > It makes the code more readable and still works even we reassign
> > another pasid for RID2PASID.
> > 
> > Best regards,
> > baolu
> >   
> > > + qi_flush_dev_iotlb_pasid(iommu, sid, pfsid,
> > > pasid, qdep, 0,  
> > 64 - VTD_PAGE_SHIFT);  
> > > + else
> > > + qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0,
> > > 64 -  
> > VTD_PAGE_SHIFT);  
> > >   }
> > >
> > >   void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > > struct  
> > device *dev,  
> > >  

[Jacob Pan]

Re: [PATCH] igb: reinit_locked() should be called with rtnl_lock

2020-06-29 Thread Francesco Ruggeri

> Would you mind adding a fixes tag here? Probably:
>
> Fixes: 9d5c824399de ("igb: PCI-Express 82575 Gigabit Ethernet driver")

That seems to be the commit that introduced the driver in 2.6.25.
I am not familiar with the history of the driver to tell if this was a day 1
problem or if it became an issue later.

>
> And as a matter of fact it looks like e1000e and e1000 have the same
> bug :/ Would you mind checking all Intel driver producing matches for
> all the affected ones?

Do you mean identify all Intel drivers that may have the same issue?

Francesco

[PATCH v2 4/7] PCI: Add device even if driver attach failed

device_attach() returning failure indicates a driver error while trying to
probe the device. In such a scenario, the PCI device should still be added
in the system and be visible to the user.

This patch partially reverts:
commit ab1a187bba5c ("PCI: Check device_attach() return value always")

Signed-off-by: Rajat Jain 
Reviewed-by: Greg Kroah-Hartman 
---
v2: Cosmetic change in commit log.
Add Greg's "reviewed-by"

 drivers/pci/bus.c | 6 +-
 1 file changed, 1 insertion(+), 5 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 8e40b3e6da77d..3cef835b375fd 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -322,12 +322,8 @@ void pci_bus_add_device(struct pci_dev *dev)
 
dev->match_driver = true;
retval = device_attach(&dev->dev);
-   if (retval < 0 && retval != -EPROBE_DEFER) {
+   if (retval < 0 && retval != -EPROBE_DEFER)
pci_warn(dev, "device attach failed (%d)\n", retval);
-   pci_proc_detach_device(dev);
-   pci_remove_sysfs_dev_files(dev);
-   return;
-   }
 
pci_dev_assign_added(dev, true);
 }
-- 
2.27.0.212.ge8ba1cc988-goog

[PATCH v2 3/7] PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices

When enabling ACS, enable translation blocking for external facing ports
and untrusted devices.
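
As a rough illustration of the rule above, here is a small userspace
sketch of how the control word could be derived from the capability word.
It is not the kernel function: sk_dev, sk_acs_ctrl() and the SK_ACS_*
constants are hypothetical names, and the bit values are assumptions made
for this sketch rather than values taken from this patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Assumed bit assignments, for this sketch only. */
#define SK_ACS_SV 0x0001	/* Source Validation */
#define SK_ACS_TB 0x0002	/* Translation Blocking */
#define SK_ACS_RR 0x0004	/* P2P Request Redirect */
#define SK_ACS_CR 0x0008	/* P2P Completion Redirect */
#define SK_ACS_UF 0x0010	/* Upstream Forwarding */

/* Hypothetical stand-in for the two flags the patch looks at. */
struct sk_dev {
	bool external_facing;
	bool untrusted;
};

/*
 * Enable each feature only if the capability word advertises it.
 * Translation Blocking is added only for external-facing ports and
 * untrusted devices.
 */
static uint16_t sk_acs_ctrl(const struct sk_dev *dev, uint16_t cap,
			    uint16_t ctrl)
{
	ctrl |= (cap & SK_ACS_SV);
	ctrl |= (cap & SK_ACS_RR);
	ctrl |= (cap & SK_ACS_CR);
	ctrl |= (cap & SK_ACS_UF);

	if (dev->external_facing || dev->untrusted)
		ctrl |= (cap & SK_ACS_TB);

	return ctrl;
}
```

Masking with the capability word means a device that does not advertise
Translation Blocking never gets the bit set, whatever its trust state.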

Signed-off-by: Rajat Jain 
---
v2: Commit log change 

 drivers/pci/pci.c|  4 
 drivers/pci/quirks.c | 11 +++
 2 files changed, 15 insertions(+)

diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index d2ff987585855..79853b52658a2 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -3330,6 +3330,10 @@ static void pci_std_enable_acs(struct pci_dev *dev)
/* Upstream Forwarding */
ctrl |= (cap & PCI_ACS_UF);
 
+   if (dev->external_facing || dev->untrusted)
+   /* Translation Blocking */
+   ctrl |= (cap & PCI_ACS_TB);
+
pci_write_config_word(dev, pos + PCI_ACS_CTRL, ctrl);
 }
 
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index b341628e47527..6294adeac4049 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4934,6 +4934,13 @@ static void pci_quirk_enable_intel_rp_mpc_acs(struct pci_dev *dev)
}
 }
 
+/*
+ * Currently this quirk does the equivalent of
+ * PCI_ACS_RR | PCI_ACS_CR | PCI_ACS_UF | PCI_ACS_SV
+ *
+ * Currently missing, it also needs to do equivalent of PCI_ACS_TB,
+ * if dev->external_facing || dev->untrusted
+ */
 static int pci_quirk_enable_intel_pch_acs(struct pci_dev *dev)
 {
if (!pci_quirk_intel_pch_acs_match(dev))
@@ -4973,6 +4980,10 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
ctrl |= (cap & PCI_ACS_CR);
ctrl |= (cap & PCI_ACS_UF);
 
+   if (dev->external_facing || dev->untrusted)
+   /* Translation Blocking */
+   ctrl |= (cap & PCI_ACS_TB);
+
pci_write_config_dword(dev, pos + INTEL_SPT_ACS_CTRL, ctrl);
 
pci_info(dev, "Intel SPT PCH root port ACS workaround enabled\n");
-- 
2.27.0.212.ge8ba1cc988-goog

[PATCH v2 1/7] PCI: Keep the ACS capability offset in device

Currently the ACS capability offset is looked up in a number of places.
Read and store it once at bootup so that it can be used by all of them later.

Signed-off-by: Rajat Jain 
---
v2: Commit log cosmetic changes

 drivers/pci/p2pdma.c |  2 +-
 drivers/pci/pci.c| 21 +
 drivers/pci/pci.h|  2 +-
 drivers/pci/probe.c  |  2 +-
 drivers/pci/quirks.c |  8 
 include/linux/pci.h  |  1 +
 6 files changed, 25 insertions(+), 11 deletions(-)

diff --git a/drivers/pci/p2pdma.c b/drivers/pci/p2pdma.c
index e8e444eeb1cd2..f29a48f8fa594 100644
--- a/drivers/pci/p2pdma.c
+++ b/drivers/pci/p2pdma.c
@@ -253,7 +253,7 @@ static int pci_bridge_has_acs_redir(struct pci_dev *pdev)
int pos;
u16 ctrl;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+   pos = pdev->acs_cap;
if (!pos)
return 0;
 
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index ce096272f52b1..d2ff987585855 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -51,6 +51,7 @@ EXPORT_SYMBOL(pci_pci_problems);
 
 unsigned int pci_pm_d3_delay;
 
+static void pci_enable_acs(struct pci_dev *dev);
 static void pci_pme_list_scan(struct work_struct *work);
 
 static LIST_HEAD(pci_pme_list);
@@ -3284,7 +3285,7 @@ static void pci_disable_acs_redir(struct pci_dev *dev)
if (!pci_dev_specific_disable_acs_redir(dev))
return;
 
-   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   pos = dev->acs_cap;
if (!pos) {
pci_warn(dev, "cannot disable ACS redirect for this hardware as it does not have ACS capabilities\n");
return;
@@ -3310,7 +3311,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
u16 cap;
u16 ctrl;
 
-   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   pos = dev->acs_cap;
if (!pos)
return;
 
@@ -3336,7 +3337,7 @@ static void pci_std_enable_acs(struct pci_dev *dev)
  * pci_enable_acs - enable ACS if hardware support it
  * @dev: the PCI device
  */
-void pci_enable_acs(struct pci_dev *dev)
+static void pci_enable_acs(struct pci_dev *dev)
 {
if (!pci_acs_enable)
goto disable_acs_redir;
@@ -3362,7 +3363,7 @@ static bool pci_acs_flags_enabled(struct pci_dev *pdev, u16 acs_flags)
int pos;
u16 cap, ctrl;
 
-   pos = pci_find_ext_capability(pdev, PCI_EXT_CAP_ID_ACS);
+   pos = pdev->acs_cap;
if (!pos)
return false;
 
@@ -3487,6 +3488,18 @@ bool pci_acs_path_enabled(struct pci_dev *start,
return true;
 }
 
+/**
+ * pci_acs_init - Initialize ACS if hardware supports it
+ * @dev: the PCI device
+ */
+void pci_acs_init(struct pci_dev *dev)
+{
+   dev->acs_cap = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+
+   if (dev->acs_cap)
+   pci_enable_acs(dev);
+}
+
 /**
  * pci_rebar_find_pos - find position of resize ctrl reg for BAR
  * @pdev: PCI device
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 6d3f758671064..12fb79fbe29d3 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -532,7 +532,7 @@ static inline resource_size_t pci_resource_alignment(struct pci_dev *dev,
return resource_alignment(res);
 }
 
-void pci_enable_acs(struct pci_dev *dev);
+void pci_acs_init(struct pci_dev *dev);
 #ifdef CONFIG_PCI_QUIRKS
 int pci_dev_specific_acs_enabled(struct pci_dev *dev, u16 acs_flags);
 int pci_dev_specific_enable_acs(struct pci_dev *dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 2f66988cea257..6d87066a5ecc5 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -2390,7 +2390,7 @@ static void pci_init_capabilities(struct pci_dev *dev)
pci_ats_init(dev);  /* Address Translation Services */
pci_pri_init(dev);  /* Page Request Interface */
pci_pasid_init(dev);/* Process Address Space ID */
-   pci_enable_acs(dev);/* Enable ACS P2P upstream forwarding */
+   pci_acs_init(dev);  /* Access Control Services */
pci_ptm_init(dev);  /* Precision Time Measurement */
pci_aer_init(dev);  /* Advanced Error Reporting */
pci_dpc_init(dev);  /* Downstream Port Containment */
diff --git a/drivers/pci/quirks.c b/drivers/pci/quirks.c
index 812bfc32ecb82..b341628e47527 100644
--- a/drivers/pci/quirks.c
+++ b/drivers/pci/quirks.c
@@ -4653,7 +4653,7 @@ static int pci_quirk_intel_spt_pch_acs(struct pci_dev *dev, u16 acs_flags)
if (!pci_quirk_intel_spt_pch_acs_match(dev))
return -ENOTTY;
 
-   pos = pci_find_ext_capability(dev, PCI_EXT_CAP_ID_ACS);
+   pos = dev->acs_cap;
if (!pos)
return -ENOTTY;
 
@@ -4961,7 +4961,7 @@ static int pci_quirk_enable_intel_spt_pch_acs(struct pci_dev *dev)
if (!pci_quirk_intel_spt_pch_acs_match(dev))
return -ENOTTY;
 
-   pos = pci_find_ext_capability(dev,

[PATCH v2 2/7] PCI: Set "untrusted" flag for truly external devices only

The "ExternalFacing" devices (root ports) are still internal devices that
sit on the internal system fabric and are thus trusted. Currently they are
being marked untrusted.

This patch uses the platform flag to identify the external-facing devices
and then uses it to mark any downstream devices as "untrusted". The
external-facing devices themselves are left as "trusted". This was
discussed here: https://lkml.org/lkml/2020/6/10/1049
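
The propagation rule can be sketched outside the kernel as follows. This
is an illustrative sketch only: sk_pci_dev and sk_set_untrusted() are
hypothetical names, and the single parent pointer stands in for the
upstream bridge lookup done in the real code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the fields of interest. */
struct sk_pci_dev {
	struct sk_pci_dev *parent;	/* upstream bridge, NULL at the top */
	bool external_facing;		/* set from platform data on a port */
	bool untrusted;			/* derived during enumeration */
};

/*
 * A device becomes untrusted if its upstream bridge is either
 * external-facing or already untrusted. The external-facing port
 * itself is never marked by this rule.
 */
static void sk_set_untrusted(struct sk_pci_dev *dev)
{
	struct sk_pci_dev *parent = dev->parent;

	if (parent && (parent->untrusted || parent->external_facing))
		dev->untrusted = true;
}
```

Applied top-down in enumeration order, the flag spreads to every device
below an external-facing port, while the port stays trusted.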

Signed-off-by: Rajat Jain 
---
v2: cosmetic changes in commit log

 drivers/iommu/intel/iommu.c |  2 +-
 drivers/pci/of.c|  2 +-
 drivers/pci/pci-acpi.c  | 13 +++--
 drivers/pci/probe.c |  2 +-
 include/linux/pci.h |  8 
 5 files changed, 18 insertions(+), 9 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index d759e7234e982..1ccb224f82496 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -4743,7 +4743,7 @@ static inline bool has_untrusted_dev(void)
struct pci_dev *pdev = NULL;
 
for_each_pci_dev(pdev)
-   if (pdev->untrusted)
+   if (pdev->untrusted || pdev->external_facing)
return true;
 
return false;
diff --git a/drivers/pci/of.c b/drivers/pci/of.c
index 27839cd2459f6..22727fc9558df 100644
--- a/drivers/pci/of.c
+++ b/drivers/pci/of.c
@@ -42,7 +42,7 @@ void pci_set_bus_of_node(struct pci_bus *bus)
} else {
node = of_node_get(bus->self->dev.of_node);
if (node && of_property_read_bool(node, "external-facing"))
-   bus->self->untrusted = true;
+   bus->self->external_facing = true;
}
 
bus->dev.of_node = node;
diff --git a/drivers/pci/pci-acpi.c b/drivers/pci/pci-acpi.c
index 7224b1e5f2a83..492c07805caf8 100644
--- a/drivers/pci/pci-acpi.c
+++ b/drivers/pci/pci-acpi.c
@@ -1213,22 +1213,23 @@ static void pci_acpi_optimize_delay(struct pci_dev *pdev,
ACPI_FREE(obj);
 }
 
-static void pci_acpi_set_untrusted(struct pci_dev *dev)
+static void pci_acpi_set_external_facing(struct pci_dev *dev)
 {
u8 val;
 
-   if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT)
+   if (pci_pcie_type(dev) != PCI_EXP_TYPE_ROOT_PORT &&
+   pci_pcie_type(dev) != PCI_EXP_TYPE_DOWNSTREAM)
return;
if (device_property_read_u8(&dev->dev, "ExternalFacingPort", &val))
return;
 
/*
-* These root ports expose PCIe (including DMA) outside of the
-* system so make sure we treat them and everything behind as
+* These root/down ports expose PCIe (including DMA) outside of the
+* system so make sure we treat everything behind them as
 * untrusted.
 */
if (val)
-   dev->untrusted = 1;
+   dev->external_facing = 1;
 }
 
 static void pci_acpi_setup(struct device *dev)
@@ -1240,7 +1241,7 @@ static void pci_acpi_setup(struct device *dev)
return;
 
pci_acpi_optimize_delay(pci_dev, adev->handle);
-   pci_acpi_set_untrusted(pci_dev);
+   pci_acpi_set_external_facing(pci_dev);
pci_acpi_add_edr_notifier(pci_dev);
 
pci_acpi_add_pm_notifier(adev, pci_dev);
diff --git a/drivers/pci/probe.c b/drivers/pci/probe.c
index 6d87066a5ecc5..8c40c00413e74 100644
--- a/drivers/pci/probe.c
+++ b/drivers/pci/probe.c
@@ -1552,7 +1552,7 @@ static void set_pcie_untrusted(struct pci_dev *dev)
 * untrusted as well.
 */
parent = pci_upstream_bridge(dev);
-   if (parent && parent->untrusted)
+   if (parent && (parent->untrusted || parent->external_facing))
dev->untrusted = true;
 }
 
diff --git a/include/linux/pci.h b/include/linux/pci.h
index a26be5332bba6..fe1bc603fda40 100644
--- a/include/linux/pci.h
+++ b/include/linux/pci.h
@@ -432,6 +432,14 @@ struct pci_dev {
 * mappings to make sure they cannot access arbitrary memory.
 */
unsigned intuntrusted:1;
+   /*
+* Devices are marked as external-facing using info from platform
+* (ACPI / devicetree). An external-facing device is still an internal
+* trusted device, but it faces external untrusted devices. Thus any
+* device enumerated downstream of an external-facing device is marked
+* as untrusted.
+*/
+   unsigned intexternal_facing:1;
unsigned intbroken_intx_masking:1;  /* INTx masking can't be used */
unsigned intio_window_1k:1; /* Intel bridge 1K I/O windows 
*/
unsigned intirq_managed:1;
-- 
2.27.0.212.ge8ba1cc988-goog
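
For readers following the set_pcie_untrusted() hunk above, here is a minimal userspace sketch of how the "untrusted" mark propagates downstream of an external-facing port. The struct fake_pci_dev type and the set_untrusted() helper are hypothetical stand-ins for illustration, not the kernel's definitions; only the parent check mirrors the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for struct pci_dev. */
struct fake_pci_dev {
	struct fake_pci_dev *parent;	/* upstream bridge, NULL at the root */
	bool external_facing;		/* set from firmware info */
	bool untrusted;			/* derived during enumeration */
};

/*
 * Same condition as the patched set_pcie_untrusted(): a device is
 * untrusted if its upstream bridge is untrusted or external-facing.
 * The external-facing port itself stays trusted.
 */
static void set_untrusted(struct fake_pci_dev *dev)
{
	assert(dev != NULL);

	const struct fake_pci_dev *parent = dev->parent;

	if (parent && (parent->untrusted || parent->external_facing))
		dev->untrusted = true;
}
```

Devices have to be visited top-down (parent before child), as enumeration does, for the mark to reach every level.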

[PATCH v2 7/7] PCI: Add parameter to disable attaching external devices

Introduce a PCI parameter that disables the automatic attachment of
external devices to their drivers.

This is needed to allow an admin to control which drivers he wants to
allow on external ports. For more context, see threads at:
https://lore.kernel.org/linux-pci/20200609210400.GA1461839@bjorn-Precision-5520/
https://lore.kernel.org/linux-pci/cack8z6h-dzqybmqtu5_h5ttwwn35q7yysm9a7wj0twfqp8q...@mail.gmail.com/

drivers_autoprobe can only be disabled after userspace comes up. So
any external devices that were plugged in before boot may still bind
to drivers before userspace gets a chance to clear drivers_autoprobe.
Another problem is that even with drivers_autoprobe=0, the hot-added
PCI devices are bound to drivers because PCI explicitly calls
device_attach() asking driver core to find and attach a driver. This
patch helps with both of these problems.

Signed-off-by: Rajat Jain 
---
v2: Use the newly introduced dev_is_external() from device core
commit log elaborated

 drivers/pci/bus.c | 11 ---
 drivers/pci/pci.c |  9 +
 drivers/pci/pci.h |  1 +
 3 files changed, 18 insertions(+), 3 deletions(-)

diff --git a/drivers/pci/bus.c b/drivers/pci/bus.c
index 3cef835b375fd..c11725bccffb0 100644
--- a/drivers/pci/bus.c
+++ b/drivers/pci/bus.c
@@ -321,9 +321,14 @@ void pci_bus_add_device(struct pci_dev *dev)
pci_bridge_d3_update(dev);
 
dev->match_driver = true;
-   retval = device_attach(&dev->dev);
-   if (retval < 0 && retval != -EPROBE_DEFER)
-   pci_warn(dev, "device attach failed (%d)\n", retval);
+
+   if (pci_dont_attach_external_devs && dev_is_external(&dev->dev)) {
+   pci_info(dev, "not attaching external device\n");
+   } else {
+   retval = device_attach(&dev->dev);
+   if (retval < 0 && retval != -EPROBE_DEFER)
+   pci_warn(dev, "device attach failed (%d)\n", retval);
+   }
 
pci_dev_assign_added(dev, true);
 }
diff --git a/drivers/pci/pci.c b/drivers/pci/pci.c
index 35f25ac39167b..3ebcfa8b33178 100644
--- a/drivers/pci/pci.c
+++ b/drivers/pci/pci.c
@@ -128,6 +128,13 @@ static bool pcie_ats_disabled;
 /* If set, the PCI config space of each device is printed during boot. */
 bool pci_early_dump;
 
+/*
+ * If set, the devices behind external-facing bridges (as marked by firmware)
+ * shall not be attached automatically. Userspace will need to attach them
+ * manually: echo   > /sys/bus/pci/drivers//bind
+ */
+bool pci_dont_attach_external_devs;
+
 bool pci_ats_disabled(void)
 {
return pcie_ats_disabled;
@@ -6539,6 +6546,8 @@ static int __init pci_setup(char *str)
pci_add_flags(PCI_SCAN_ALL_PCIE_DEVS);
} else if (!strncmp(str, "disable_acs_redir=", 18)) {
disable_acs_redir_param = str + 18;
+   } else if (!strcmp(str, "dont_attach_external_devs")) {
+   pci_dont_attach_external_devs = true;
} else {
pr_err("PCI: Unknown option `%s'\n", str);
}
diff --git a/drivers/pci/pci.h b/drivers/pci/pci.h
index 12fb79fbe29d3..875fecb9b2612 100644
--- a/drivers/pci/pci.h
+++ b/drivers/pci/pci.h
@@ -13,6 +13,7 @@
 
 extern const unsigned char pcie_link_speed[];
 extern bool pci_early_dump;
+extern bool pci_dont_attach_external_devs;
 
 bool pcie_cap_has_lnkctl(const struct pci_dev *dev);
 bool pcie_cap_has_rtctl(const struct pci_dev *dev);
-- 
2.27.0.212.ge8ba1cc988-goog
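
The gating this patch adds can be modelled in a few lines of userspace C. This is only a sketch under assumptions: parse_pci_option() and should_attach() are hypothetical helper names, and the real code lives in pci_setup() and pci_bus_add_device() as shown in the diff above.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Userspace stand-in for the flag added to drivers/pci/pci.c. */
static bool pci_dont_attach_external_devs;

/* Models one "pci=" option; returns false for an option it does not know. */
static bool parse_pci_option(const char *str)
{
	assert(str != NULL);

	if (!strcmp(str, "dont_attach_external_devs")) {
		pci_dont_attach_external_devs = true;
		return true;
	}
	return false;
}

/*
 * Mirrors the new branch in pci_bus_add_device(): attaching is skipped
 * only when the flag is set AND the device is external.
 */
static bool should_attach(bool dev_is_external)
{
	return !(pci_dont_attach_external_devs && dev_is_external);
}
```

Internal devices are unaffected by the flag, so a system booted with the option still binds its onboard devices automatically.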

Re: [PATCH 3/7] iommu/vt-d: Fix PASID devTLB invalidation

2020-06-29 Thread Jacob Pan

On Thu, 25 Jun 2020 15:25:57 +0800
Lu Baolu  wrote:

> On 2020/6/23 23:43, Jacob Pan wrote:
> > DevTLB flush can be used for both DMA request with and without
> > PASIDs. The former uses PASID#0 (RID2PASID), latter uses non-zero
> > PASID for SVA usage.
> > 
> > This patch adds a check for PASID value such that devTLB flush with
> > PASID is used for SVA case. This is more efficient in that multiple
> > PASIDs can be used by a single device, when tearing down a PASID
> > entry we shall flush only the devTLB specific to a PASID.
> > 
> > Fixes: 6f7db75e1c46 ("iommu/vt-d: Add second level page table")
> > Signed-off-by: Jacob Pan 
> > ---
> >   drivers/iommu/intel/pasid.c | 11 ++-
> >   1 file changed, 10 insertions(+), 1 deletion(-)
> > 
> > diff --git a/drivers/iommu/intel/pasid.c
> > b/drivers/iommu/intel/pasid.c index c81f0f17c6ba..3991a24539a1
> > 100644 --- a/drivers/iommu/intel/pasid.c
> > +++ b/drivers/iommu/intel/pasid.c
> > @@ -486,7 +486,16 @@ devtlb_invalidation_with_pasid(struct
> > intel_iommu *iommu, qdep = info->ats_qdep;
> > pfsid = info->pfsid;
> >   
> > -   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> > VTD_PAGE_SHIFT);
> > +   /*
> > +* When PASID 0 is used, it indicates RID2PASID(DMA
> > request w/o PASID),
> > +* devTLB flush w/o PASID should be used. For non-zero
> > PASID under
> > +* SVA usage, device could do DMA with multiple PASIDs. It
> > is more
> > +* efficient to flush devTLB specific to the PASID.
> > +*/
> > +   if (pasid)  
> 
> How about
> 
>   if (pasid == PASID_RID2PASID)
>   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64 -
> VTD_PAGE_SHIFT); else
>   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid,
> qdep, 0, 64 - VTD_PAGE_SHIFT);
> 
> ?
> 
> It makes the code more readable and still works even we reassign
> another pasid for RID2PASID.
> 
agreed, thanks.

> Best regards,
> baolu
> 
> > +   qi_flush_dev_iotlb_pasid(iommu, sid, pfsid, pasid,
> > qdep, 0, 64 - VTD_PAGE_SHIFT);
> > +   else
> > +   qi_flush_dev_iotlb(iommu, sid, pfsid, qdep, 0, 64
> > - VTD_PAGE_SHIFT); }
> >   
> >   void intel_pasid_tear_down_entry(struct intel_iommu *iommu,
> > struct device *dev, 

[Jacob Pan]
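
A small userspace sketch of the selection agreed on above, comparing against PASID_RID2PASID rather than testing for non-zero. The enum and pick_devtlb_flush() are hypothetical names for illustration, and the value 0 for PASID_RID2PASID is an assumption taken from the thread ("PASID#0 (RID2PASID)").

```c
#include <assert.h>
#include <stdint.h>

/* Assumed value, per the thread's "PASID#0 (RID2PASID)". */
#define PASID_RID2PASID	0x0

/* Hypothetical labels for the two flush helpers named in the patch. */
enum devtlb_flush {
	FLUSH_DEV_IOTLB,	/* qi_flush_dev_iotlb() */
	FLUSH_DEV_IOTLB_PASID,	/* qi_flush_dev_iotlb_pasid() */
};

static enum devtlb_flush pick_devtlb_flush(uint32_t pasid)
{
	/* PCIe PASIDs are at most 20 bits wide. */
	assert(pasid < (1u << 20));

	if (pasid == PASID_RID2PASID)
		return FLUSH_DEV_IOTLB;
	return FLUSH_DEV_IOTLB_PASID;
}
```

Written this way, the choice keeps working if RID2PASID is ever assigned a value other than 0, which is the point Baolu raised.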

[PATCH v2 6/7] PCI: Move pci_dev->untrusted logic to use device location instead

The firmware provides the "ExternalFacing" attribute on PCI root ports,
to allow the kernel to mark devices behind them as external. Note that the
firmware provides an immutable, read-only property, i.e. the location of
the device.

The use of (external) device location as a hint for (dis)trust is a
decision that IOMMU drivers have taken, so we should call it out
explicitly.

This patch removes pci_dev->untrusted and changes its users to use the
device location provided by the device core instead. This location is
populated by PCI using the same "ExternalFacing" firmware info. Any
device not behind an "ExternalFacing" bridge is marked internal, and
the ones behind such bridges are marked external.

Signed-off-by: Rajat Jain 
---
v2: (Initial version)

 drivers/iommu/intel/iommu.c | 31 +--
 drivers/pci/ats.c   |  2 +-
 drivers/pci/pci-driver.c|  1 +
 drivers/pci/pci.c   |  2 +-
 drivers/pci/probe.c | 18 --
 drivers/pci/quirks.c|  2 +-
 include/linux/pci.h | 10 +-
 7 files changed, 38 insertions(+), 28 deletions(-)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index 1ccb224f82496..ca66a196f5e97 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -168,6 +168,22 @@ static inline unsigned long virt_to_dma_pfn(void *p)
return page_to_dma_pfn(virt_to_page(p));
 }
 
+static inline bool untrusted_dev(struct device *dev)
+{
+   /*
+* Treat all external PCI devices as untrusted devices. These are the
+* devices behind external-facing bridges as marked by
+* the firmware. The untrusted devices are the ones that can potentially
+* execute DMA attacks and similar. They are typically connected through
+* external thunderbolt ports. When an IOMMU is enabled they should be
+* getting full mappings to ensure they cannot access arbitrary memory.
+*/
+   if (dev_is_pci(dev) && dev_is_external(dev))
+   return true;
+
+   return false;
+}
+
 /* global iommu list, set NULL for ignored DMAR units */
 static struct intel_iommu **g_iommus;
 
@@ -383,8 +399,7 @@ struct device_domain_info *get_domain_info(struct device 
*dev)
 DEFINE_SPINLOCK(device_domain_lock);
 static LIST_HEAD(device_domain_list);
 
-#define device_needs_bounce(d) (!intel_no_bounce && dev_is_pci(d) &&   \
-   to_pci_dev(d)->untrusted)
+#define device_needs_bounce(d) (!intel_no_bounce && untrusted_dev(d))
 
 /*
  * Iterate over elements in device_domain_list and call the specified
@@ -2830,7 +2845,7 @@ static int device_def_domain_type(struct device *dev)
 * Prevent any device marked as untrusted from getting
 * placed into the statically identity mapping domain.
 */
-   if (pdev->untrusted)
+   if (untrusted_dev(dev))
return IOMMU_DOMAIN_DMA;
 
if ((iommu_identity_mapping & IDENTMAP_AZALIA) && 
IS_AZALIA(pdev))
@@ -3464,7 +3479,6 @@ static void intel_unmap(struct device *dev, dma_addr_t 
dev_addr, size_t size)
unsigned long iova_pfn;
struct intel_iommu *iommu;
struct page *freelist;
-   struct pci_dev *pdev = NULL;
 
domain = find_domain(dev);
BUG_ON(!domain);
@@ -3477,11 +3491,8 @@ static void intel_unmap(struct device *dev, dma_addr_t 
dev_addr, size_t size)
start_pfn = mm_to_dma_pfn(iova_pfn);
last_pfn = start_pfn + nrpages - 1;
 
-   if (dev_is_pci(dev))
-   pdev = to_pci_dev(dev);
-
freelist = domain_unmap(domain, start_pfn, last_pfn);
-   if (intel_iommu_strict || (pdev && pdev->untrusted) ||
+   if (intel_iommu_strict || untrusted_dev(dev) ||
!has_iova_flush_queue(&domain->iovad)) {
iommu_flush_iotlb_psi(iommu, domain, start_pfn,
  nrpages, !freelist, 0);
@@ -4743,7 +4754,7 @@ static inline bool has_untrusted_dev(void)
struct pci_dev *pdev = NULL;
 
for_each_pci_dev(pdev)
-   if (pdev->untrusted || pdev->external_facing)
+   if (pdev->external_facing || untrusted_dev(&pdev->dev))
return true;
 
return false;
@@ -6036,7 +6047,7 @@ intel_iommu_domain_set_attr(struct iommu_domain *domain,
  */
 static bool risky_device(struct pci_dev *pdev)
 {
-   if (pdev->untrusted) {
+   if (untrusted_dev(&pdev->dev)) {
pci_info(pdev,
 "Skipping IOMMU quirk for dev [%04X:%04X] on untrusted 
PCI link\n",
 pdev->vendor, pdev->device);
diff --git a/drivers/pci/ats.c b/drivers/pci/ats.c
index b761c1f72f672..ebd370f4d5b06 100644
--- a/drivers/pci/ats.c
+++ b/drivers/pci/ats.c
@@ -42,7 +42,7 @@ bool pci_ats_supported(struct pci_dev *dev)
if (!dev->ats_cap)
return false;
 
-

[PATCH v2 5/7] driver core: Add device location to "struct device" and expose it in sysfs

Add a new (optional) field to denote the physical location of a device
in the system, and expose it in sysfs. This was discussed here:
https://lore.kernel.org/linux-acpi/20200618184621.ga446...@kroah.com/

(The primary choice for attribute name i.e. "location" is already
exposed as an ABI elsewhere, so settled for "site"). Individual buses
that want to support this new attribute can opt-in by setting a flag in
bus_type, and then populating the location of device while enumerating
it.

Signed-off-by: Rajat Jain 
---
v2: (Initial version)

 drivers/base/core.c| 35 +++
 include/linux/device.h | 42 ++
 include/linux/device/bus.h |  8 
 3 files changed, 85 insertions(+)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index 67d39a90b45c7..14c815526b7fa 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1778,6 +1778,32 @@ static ssize_t online_store(struct device *dev, struct 
device_attribute *attr,
 }
 static DEVICE_ATTR_RW(online);
 
+static ssize_t site_show(struct device *dev, struct device_attribute *attr,
+char *buf)
+{
+   const char *site;
+
+   device_lock(dev);
+   switch (dev->site) {
+   case SITE_INTERNAL:
+   site = "INTERNAL";
+   break;
+   case SITE_EXTENDED:
+   site = "EXTENDED";
+   break;
+   case SITE_EXTERNAL:
+   site = "EXTERNAL";
+   break;
+   case SITE_UNKNOWN:
+   default:
+   site = "UNKNOWN";
+   break;
+   }
+   device_unlock(dev);
+   return sprintf(buf, "%s\n", site);
+}
+static DEVICE_ATTR_RO(site);
+
 int device_add_groups(struct device *dev, const struct attribute_group 
**groups)
 {
return sysfs_create_groups(&dev->kobj, groups);
@@ -1949,8 +1975,16 @@ static int device_add_attrs(struct device *dev)
goto err_remove_dev_groups;
}
 
+   if (bus_supports_site(dev->bus)) {
+   error = device_create_file(dev, &dev_attr_site);
+   if (error)
+   goto err_remove_dev_attr_online;
+   }
+
return 0;
 
+ err_remove_dev_attr_online:
+   device_remove_file(dev, &dev_attr_online);
  err_remove_dev_groups:
device_remove_groups(dev, dev->groups);
  err_remove_type_groups:
@@ -1968,6 +2002,7 @@ static void device_remove_attrs(struct device *dev)
struct class *class = dev->class;
const struct device_type *type = dev->type;
 
+   device_remove_file(dev, &dev_attr_site);
device_remove_file(dev, &dev_attr_online);
device_remove_groups(dev, dev->groups);
 
diff --git a/include/linux/device.h b/include/linux/device.h
index 15460a5ac024a..a4143735ae712 100644
--- a/include/linux/device.h
+++ b/include/linux/device.h
@@ -428,6 +428,31 @@ enum dl_dev_state {
DL_DEV_UNBINDING,
 };
 
+/**
+ * enum device_site - Physical location of the device in the system.
+ * The semantics of values depend on subsystem / bus:
+ *
+ * @SITE_UNKNOWN:  Location is Unknown (default)
+ *
+ * @SITE_INTERNAL: Device is internal to the system, and cannot be (easily)
+ * removed. E.g. SoC internal devices, onboard soldered
+ * devices, internal M.2 cards (that cannot be removed
+ * without opening the chassis).
+ * @SITE_EXTENDED: Device sits on an extension of the system. E.g. devices
+ * on external PCIe trays, docking stations etc. These
+ * devices may be removable, but are generally housed
+ * internally on an extension board, so they are removed
+ * only when that whole extension board is removed.
+ * @SITE_EXTERNAL: Devices truly external to the system (i.e. plugged on
+ * an external port) that may be removed or added frequently.
+ */
+enum device_site {
+   SITE_UNKNOWN = 0,
+   SITE_INTERNAL,
+   SITE_EXTENDED,
+   SITE_EXTERNAL,
+};
+
 /**
  * struct dev_links_info - Device data related to device links.
  * @suppliers: List of links to supplier devices.
@@ -513,6 +538,7 @@ struct dev_links_info {
  * device (i.e. the bus driver that discovered the device).
  * @iommu_group: IOMMU group the device belongs to.
  * @iommu: Per device generic IOMMU runtime data
+ * @site:  Physical location of the device w.r.t. the system
  *
  * @offline_disabled: If set, the device is permanently online.
  * @offline:   Set after successful invocation of bus type's .offline().
@@ -613,6 +639,8 @@ struct device {
struct iommu_group  *iommu_group;
struct dev_iommu*iommu;
 
+   enum device_sitesite;   /* Device physical location */
+
booloffline_disabled:1;
booloffline:1;
boolof_node_reused:1;
@@ -806,6 +834,20 @@ static inline bool dev_has_sync_state(struct
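
As an illustration of the "site" attribute introduced above, here is a userspace sketch of the enum-to-string mapping used by site_show(), plus the reverse lookup a tool reading the attribute might want. site_name() and site_from_name() are hypothetical helpers, not part of the patch; only the enum values and strings are taken from it.

```c
#include <assert.h>
#include <string.h>

/* Same values as enum device_site in the patch. */
enum device_site {
	SITE_UNKNOWN = 0,
	SITE_INTERNAL,
	SITE_EXTENDED,
	SITE_EXTERNAL,
};

/* Same strings as site_show(); anything unrecognised reads as UNKNOWN. */
static const char *site_name(enum device_site site)
{
	switch (site) {
	case SITE_INTERNAL:
		return "INTERNAL";
	case SITE_EXTENDED:
		return "EXTENDED";
	case SITE_EXTERNAL:
		return "EXTERNAL";
	case SITE_UNKNOWN:
	default:
		return "UNKNOWN";
	}
}

/*
 * Reverse lookup for a value read from sysfs. The caller is expected
 * to have stripped the trailing newline that sprintf("%s\n") adds.
 */
static enum device_site site_from_name(const char *name)
{
	assert(name != NULL);

	for (int s = SITE_INTERNAL; s <= SITE_EXTERNAL; s++)
		if (!strcmp(name, site_name((enum device_site)s)))
			return (enum device_site)s;
	return SITE_UNKNOWN;
}
```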

[PATCH v2 0/7] Tighten PCI security, expose dev location in sysfs

This is a set of loosely related patches, most of which emerged out of
discussion in the following threads. In a nutshell, the goal was to allow
an administrator to specify which drivers he wants to allow on external
ports, and a strategy was chalked out:
https://lore.kernel.org/linux-pci/20200609210400.GA1461839@bjorn-Precision-5520/
https://lore.kernel.org/linux-pci/20200618184621.ga446...@kroah.com/
https://lore.kernel.org/linux-pci/20200627050225.ga226...@kroah.com/

* The first 3 patches tighten the PCI security using ACS, and take care
  of a border case.
* The 4th patch takes care of a PCI bug.
* The 5th and 6th patches expose a device's location in sysfs to allow
  the admin to make decisions based on that.
* The 7th patch ensures that external devices don't bind to drivers
  during boot.

Rajat Jain (7):
  PCI: Keep the ACS capability offset in device
  PCI: Set "untrusted" flag for truly external devices only
  PCI/ACS: Enable PCI_ACS_TB for untrusted/external-facing devices
  PCI: Add device even if driver attach failed
  driver core: Add device location to "struct device" and expose it in
sysfs
  PCI: Move pci_dev->untrusted logic to use device location instead
  PCI: Add parameter to disable attaching external devices

 drivers/base/core.c | 35 +++
 drivers/iommu/intel/iommu.c | 31 ++-
 drivers/pci/ats.c   |  2 +-
 drivers/pci/bus.c   | 13 ++--
 drivers/pci/of.c|  2 +-
 drivers/pci/p2pdma.c|  2 +-
 drivers/pci/pci-acpi.c  | 13 ++--
 drivers/pci/pci-driver.c|  1 +
 drivers/pci/pci.c   | 34 ++
 drivers/pci/pci.h   |  3 ++-
 drivers/pci/probe.c | 20 +++---
 drivers/pci/quirks.c| 19 +
 include/linux/device.h  | 42 +
 include/linux/device/bus.h  |  8 +++
 include/linux/pci.h | 13 ++--
 15 files changed, 191 insertions(+), 47 deletions(-)

-- 
2.27.0.212.ge8ba1cc988-goog

[PATCH] clk: staging: Specify IOMEM dependency for Xilinx Clocking Wizard driver

2020-06-29 Thread David Gow

The Xilinx Clocking Wizard driver uses the devm_ioremap_resource
function, but does not specify a dependency on IOMEM in Kconfig. This
causes a build failure on architectures without IOMEM, for example, UML
(notably with make allyesconfig).

Fix this by making CONFIG_COMMON_CLK_XLNX_CLKWZRD depend on CONFIG_IOMEM.

Signed-off-by: David Gow 
---
 drivers/staging/clocking-wizard/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/clocking-wizard/Kconfig 
b/drivers/staging/clocking-wizard/Kconfig
index 04be22dca9b6..69cf51445f08 100644
--- a/drivers/staging/clocking-wizard/Kconfig
+++ b/drivers/staging/clocking-wizard/Kconfig
@@ -5,6 +5,6 @@
 
 config COMMON_CLK_XLNX_CLKWZRD
tristate "Xilinx Clocking Wizard"
-   depends on COMMON_CLK && OF
+   depends on COMMON_CLK && OF && IOMEM
help
  Support for the Xilinx Clocking Wizard IP core clock generator.
-- 
2.27.0.212.ge8ba1cc988-goog

[PATCH net-next] net: dsa: Improve subordinate PHY error message

2020-06-29 Thread Florian Fainelli

It is not very informative to know the DSA master device when a
subordinate network device fails to get its PHY setup. Provide the
device name and capitalize PHY while we are at it.

Signed-off-by: Florian Fainelli 
---
 net/dsa/slave.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/net/dsa/slave.c b/net/dsa/slave.c
index 4c7f086a047b..e147e10b411c 100644
--- a/net/dsa/slave.c
+++ b/net/dsa/slave.c
@@ -1795,7 +1795,8 @@ int dsa_slave_create(struct dsa_port *port)
 
ret = dsa_slave_phy_setup(slave_dev);
if (ret) {
-   netdev_err(master, "error %d setting up slave phy\n", ret);
+   netdev_err(master, "error %d setting up slave PHY for %s\n",
+  ret, slave_dev->name);
goto out_gcells;
}
 
-- 
2.17.1

RE: [PATCH V2 3/9] clk: imx: Support building SCU clock driver as module

2020-06-29 Thread Anson Huang



> Subject: RE: [PATCH V2 3/9] clk: imx: Support building SCU clock driver as
> module
> 
> > From: Arnd Bergmann 
> > Sent: Monday, June 29, 2020 4:20 PM
> >
> > On Mon, Jun 29, 2020 at 9:18 AM Dong Aisheng 
> > wrote:
> > > On Thu, Jun 25, 2020 at 6:43 AM Stephen Boyd 
> wrote:
> > > > Quoting Aisheng Dong (2020-06-23 19:59:09) Why aren't there
> > > > options to enable clk-imx6q and clk-imx6sl in the clk/imx/Kconfig file?
> > > > Those can be bool or tristate depending on if the SoC drivers use
> > > > CLK_OF_DECLARE or not and depend on the mxc-clk library and SoC
> > > > config we have in the makefile today.
> > >
> > > Yes, we can do that in clk/imx/Kconfig as follows theoretically.
> > > config CLK_IMX6Q
> > > bool
> > > def_bool ARCH_MXC && ARM
> > > select MXC_CLK
> > >
> > > But we have totally 15 platforms that need to change.
> >
> > I would make that
> >
> > config CLK_IMX6Q
> >  bool "Clock driver for NXP i.MX6Q"
> >  depends on SOC_IMX6Q || COMPILE_TEST
> >  default SOC_IMX6Q
> >  select MXC_CLK
> >
> 
> Yes, this seems better.
> Anson, pls follow this way.

In V4 patch series, I will add a patch to introduce CLK_IMX6Q and other i.MX6/7
clock config following this way.

Anson

Re: [PATCH v2 4/4] staging: qlge: replace pr_err with netdev_err

2020-06-29 Thread Benjamin Poirier

On 2020-06-30 01:43 +0800, Coiby Xu wrote:
> On Mon, Jun 29, 2020 at 02:30:04PM +0900, Benjamin Poirier wrote:
> > On 2020-06-27 22:58 +0800, Coiby Xu wrote:
> > [...]
> > >  void ql_dump_qdev(struct ql_adapter *qdev)
> > >  {
> > > @@ -1611,99 +1618,100 @@ void ql_dump_qdev(struct ql_adapter *qdev)
> > >  #ifdef QL_CB_DUMP
> > >  void ql_dump_wqicb(struct wqicb *wqicb)
> > >  {
> > > - pr_err("Dumping wqicb stuff...\n");
> > > - pr_err("wqicb->len = 0x%x\n", le16_to_cpu(wqicb->len));
> > > - pr_err("wqicb->flags = %x\n", le16_to_cpu(wqicb->flags));
> > > - pr_err("wqicb->cq_id_rss = %d\n",
> > > -le16_to_cpu(wqicb->cq_id_rss));
> > > - pr_err("wqicb->rid = 0x%x\n", le16_to_cpu(wqicb->rid));
> > > - pr_err("wqicb->wq_addr = 0x%llx\n",
> > > -(unsigned long long)le64_to_cpu(wqicb->addr));
> > > - pr_err("wqicb->wq_cnsmr_idx_addr = 0x%llx\n",
> > > -(unsigned long long)le64_to_cpu(wqicb->cnsmr_idx_addr));
> > > + netdev_err(qdev->ndev, "Dumping wqicb stuff...\n");
> > 
> > drivers/staging/qlge/qlge_dbg.c:1621:13: error: ‘qdev’ undeclared (first 
> > use in this function); did you mean ‘cdev’?
> > 1621 |  netdev_err(qdev->ndev, "Dumping wqicb stuff...\n");
> >  | ^~~~
> >  | cdev
> > 
> > [...]
> > and many more like that
> > 
> > Anyways, qlge_dbg.h is a dumpster. It has hundreds of lines of code
> > bitrotting away in ifdef land. See this comment from David Miller on the
> > topic of ifdef'ed debugging code:
> > https://lore.kernel.org/netdev/20200303.145916.1506066510928020193.da...@davemloft.net/
> 
> Thank you for spotting this problem! This issue could be fixed by
> passing qdev to ql_dump_wqicb. Or are you suggesting we completely
> remove qlge_dbg since it's only for the purpose of debugging the driver
> by the developer?

At 2000 lines, qlge_dbg.c alone is larger than some entire ethernet
drivers. Most of what it does is dump kernel data structures or pci
memory mapped registers to dmesg. There are better facilities for that.
My thinking is not simply to delete qlge_dbg.c but to replace it, making
sure that most of the same information is still available. For data
structures, crash or drgn can be used; possibly with a script for the
latter which formats the data. For pci registers, they should be
included in the ethtool register dump and a patch added to ethtool to
pretty print them. That's what other drivers like e1000e do. For the
"coredump", devlink health can be used.

The qlge_force_coredump module option should also be removed. At the
moment, calling the ethtool register dump function with that option
enabled does a hexdump of a 176k struct to dmesg. That's shameful.

> 
> Btw, I'm curious how you make this compiling error occur. Do you manually 
> trigger
> it via "QL_CB_DUMP=1 make M=drivers/staging/qlge" or use some automate
> tools?

I just uncommented the defines in qlge.h

[PATCH 4/7] [elf-fdpic] coredump: don't bother with cyclic list for per-thread objects

From: Al Viro 

plain single-linked list is just fine here...

Signed-off-by: Al Viro 
---
 fs/binfmt_elf_fdpic.c | 29 +++--
 1 file changed, 11 insertions(+), 18 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index a6ee92137529..bcbf756fba39 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1453,7 +1453,7 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, 
struct task_struct *p,
 /* Here is the structure in which status of each thread is captured. */
 struct elf_thread_status
 {
-   struct list_head list;
+   struct elf_thread_status *next;
struct elf_prstatus_fdpic prstatus; /* NT_PRSTATUS */
elf_fpregset_t fpu; /* NT_PRFPREG */
struct task_struct *thread;
@@ -1578,8 +1578,7 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
struct memelfnote *notes = NULL;
struct elf_prstatus_fdpic *prstatus = NULL; /* NT_PRSTATUS */
struct elf_prpsinfo *psinfo = NULL; /* NT_PRPSINFO */
-   LIST_HEAD(thread_list);
-   struct list_head *t;
+   struct elf_thread_status *thread_list = NULL;
elf_fpregset_t *fpu = NULL;
int thread_status_size = 0;
elf_addr_t *auxv;
@@ -1627,15 +1626,12 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
goto end_coredump;
 
tmp->thread = ct->task;
-   list_add(&tmp->list, &thread_list);
+   tmp->next = thread_list;
+   thread_list = tmp;
}
 
-   list_for_each(t, &thread_list) {
-   struct elf_thread_status *tmp;
-   int sz;
-
-   tmp = list_entry(t, struct elf_thread_status, list);
-   sz = elf_dump_thread_status(cprm->siginfo->si_signo, tmp);
+   for (tmp = thread_list; tmp; tmp = tmp->next) {
+   int sz = elf_dump_thread_status(cprm->siginfo->si_signo, tmp);
thread_status_size += sz;
}
 
@@ -1760,10 +1756,7 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
goto end_coredump;
 
/* write out the thread status notes section */
-   list_for_each(t, &thread_list) {
-   struct elf_thread_status *tmp =
-   list_entry(t, struct elf_thread_status, list);
-
+   for (tmp = thread_list; tmp; tmp = tmp->next) {
for (i = 0; i < tmp->num_notes; i++)
if (!writenote(&tmp->notes[i], cprm))
goto end_coredump;
@@ -1791,10 +1784,10 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
}
 
 end_coredump:
-   while (!list_empty(&thread_list)) {
-   struct list_head *tmp = thread_list.next;
-   list_del(tmp);
-   kfree(list_entry(tmp, struct elf_thread_status, list));
+   while (thread_list) {
+   tmp = thread_list;
+   thread_list = thread_list->next;
+   kfree(tmp);
}
kfree(phdr4note);
kfree(elf);
-- 
2.11.0
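
The conversion in this patch follows the usual singly-linked pattern: push at the head, walk with a plain for loop, pop and free on exit. A userspace sketch, assuming a hypothetical struct thread_status with an int payload in place of the real elf_thread_status, and calloc()/free() in place of kzalloc()/kfree():

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for struct elf_thread_status. */
struct thread_status {
	struct thread_status *next;
	int note_size;		/* illustrative payload */
};

/* Push a new entry at the head; returns 0 on success, -1 on allocation failure. */
static int push_front(struct thread_status **head, int note_size)
{
	struct thread_status *t;

	assert(head != NULL);

	t = calloc(1, sizeof(*t));
	if (!t)
		return -1;

	t->note_size = note_size;
	t->next = *head;
	*head = t;
	return 0;
}

/* Walk the list the same way the patched for loops do. */
static int total_size(const struct thread_status *head)
{
	int sum = 0;

	for (const struct thread_status *t = head; t; t = t->next)
		sum += t->note_size;
	return sum;
}

/* Same shape as the end_coredump cleanup loop. */
static void free_all(struct thread_status **head)
{
	while (*head) {
		struct thread_status *t = *head;

		*head = t->next;
		free(t);
	}
}
```

Since entries are only ever added at the head and removed during teardown, nothing here needs the back pointers that list_head carries.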

[PATCH 5/7] [elf-fdpic] move allocation of elf_thread_status into elf_dump_thread_status()

From: Al Viro 

Signed-off-by: Al Viro 
---
 fs/binfmt_elf_fdpic.c | 24 ++--
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index bcbf756fba39..ba4f264dff3a 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1466,12 +1466,13 @@ struct elf_thread_status
  * we need to keep a linked list of every thread's pr_status and then create
  * a single section for them in the final core file.
  */
-static int elf_dump_thread_status(long signr, struct elf_thread_status *t)
+static struct elf_thread_status *elf_dump_thread_status(long signr, struct 
task_struct *p, int *sz)
 {
-   struct task_struct *p = t->thread;
-   int sz = 0;
+   struct elf_thread_status *t;
 
-   t->num_notes = 0;
+   t = kzalloc(sizeof(struct elf_thread_status), GFP_KERNEL);
+   if (!t)
+   return t;
 
fill_prstatus(&t->prstatus, p, signr);
elf_core_copy_task_regs(p, &t->prstatus.pr_reg);
@@ -1479,16 +1480,16 @@ static int elf_dump_thread_status(long signr, struct 
elf_thread_status *t)
fill_note(&t->notes[0], "CORE", NT_PRSTATUS, sizeof(t->prstatus),
  &t->prstatus);
t->num_notes++;
-   sz += notesize(&t->notes[0]);
+   *sz += notesize(&t->notes[0]);
 
t->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(p, NULL, &t->fpu);
if (t->prstatus.pr_fpvalid) {
fill_note(&t->notes[1], "CORE", NT_PRFPREG, sizeof(t->fpu),
  &t->fpu);
t->num_notes++;
-   sz += notesize(&t->notes[1]);
+   *sz += notesize(&t->notes[1]);
}
-   return sz;
+   return t;
 }
 
 static void fill_extnum_info(struct elfhdr *elf, struct elf_shdr *shdr4extnum,
@@ -1621,20 +1622,15 @@ static int elf_fdpic_core_dump(struct coredump_params 
*cprm)
 
for (ct = current->mm->core_state->dumper.next;
ct; ct = ct->next) {
-   tmp = kzalloc(sizeof(*tmp), GFP_KERNEL);
+   tmp = elf_dump_thread_status(cprm->siginfo->si_signo,
+ct->task, &thread_status_size);
if (!tmp)
goto end_coredump;
 
-   tmp->thread = ct->task;
tmp->next = thread_list;
thread_list = tmp;
}
 
-   for (tmp = thread_list; tmp; tmp = tmp->next) {
-   int sz = elf_dump_thread_status(cprm->siginfo->si_signo, tmp);
-   thread_status_size += sz;
-   }
-
/* now collect the dump for the current */
fill_prstatus(prstatus, current, cprm->siginfo->si_signo);
elf_core_copy_regs(&prstatus->pr_reg, cprm->regs);
-- 
2.11.0

[PATCH 3/7] kill elf_fpxregs_t

From: Al Viro 

all uses are conditional upon ELF_CORE_COPY_XFPREGS, which has not
been defined on any architecture since 2010

Signed-off-by: Al Viro 
---
 arch/ia64/include/asm/elf.h|  2 --
 arch/powerpc/include/asm/elf.h |  2 --
 arch/x86/include/asm/elf.h |  2 --
 fs/binfmt_elf.c| 30 --
 fs/binfmt_elf_fdpic.c  | 28 
 include/linux/elfcore.h|  7 ---
 6 files changed, 71 deletions(-)

diff --git a/arch/ia64/include/asm/elf.h b/arch/ia64/include/asm/elf.h
index c70bb9c11f52..6629301a2620 100644
--- a/arch/ia64/include/asm/elf.h
+++ b/arch/ia64/include/asm/elf.h
@@ -179,8 +179,6 @@ extern void ia64_init_addr_space (void);
 #define ELF_AR_SSD_OFFSET  (56 * sizeof(elf_greg_t))
 #define ELF_AR_END_OFFSET  (57 * sizeof(elf_greg_t))
 
-typedef unsigned long elf_fpxregset_t;
-
 typedef unsigned long elf_greg_t;
 typedef elf_greg_t elf_gregset_t[ELF_NGREG];
 
diff --git a/arch/powerpc/include/asm/elf.h b/arch/powerpc/include/asm/elf.h
index 57c229a86f08..53ed2ca40151 100644
--- a/arch/powerpc/include/asm/elf.h
+++ b/arch/powerpc/include/asm/elf.h
@@ -53,8 +53,6 @@ static inline void ppc_elf_core_copy_regs(elf_gregset_t 
elf_regs,
 }
 #define ELF_CORE_COPY_REGS(gregs, regs) ppc_elf_core_copy_regs(gregs, regs);
 
-typedef elf_vrregset_t elf_fpxregset_t;
-
 /* ELF_HWCAP yields a mask that user programs can use to figure out what
instruction set this cpu supports.  This could be done in userspace,
but it's not easy, and we've already done it here.  */
diff --git a/arch/x86/include/asm/elf.h b/arch/x86/include/asm/elf.h
index 452beed7892b..b9a5d488f1a5 100644
--- a/arch/x86/include/asm/elf.h
+++ b/arch/x86/include/asm/elf.h
@@ -21,8 +21,6 @@ typedef struct user_i387_struct elf_fpregset_t;
 
 #ifdef __i386__
 
-typedef struct user_fxsr_struct elf_fpxregset_t;
-
 #define R_386_NONE 0
 #define R_386_32   1
 #define R_386_PC32 2
diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index 80743b8957c9..6a171a28bdf7 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -2038,9 +2038,6 @@ struct elf_thread_status
struct elf_prstatus prstatus;   /* NT_PRSTATUS */
elf_fpregset_t fpu; /* NT_PRFPREG */
struct task_struct *thread;
-#ifdef ELF_CORE_COPY_XFPREGS
-   elf_fpxregset_t xfpu;   /* ELF_CORE_XFPREG_TYPE */
-#endif
struct memelfnote notes[3];
int num_notes;
 };
@@ -2071,15 +2068,6 @@ static int elf_dump_thread_status(long signr, struct 
elf_thread_status *t)
t->num_notes++;
sz += notesize(&t->notes[1]);
}
-
-#ifdef ELF_CORE_COPY_XFPREGS
-   if (elf_core_copy_task_xfpregs(p, &t->xfpu)) {
-   fill_note(&t->notes[2], "LINUX", ELF_CORE_XFPREG_TYPE,
- sizeof(t->xfpu), &t->xfpu);
-   t->num_notes++;
-   sz += notesize(&t->notes[2]);
-   }
-#endif 
return sz;
 }
 
@@ -2090,9 +2078,6 @@ struct elf_note_info {
struct elf_prpsinfo *psinfo;/* NT_PRPSINFO */
struct list_head thread_list;
elf_fpregset_t *fpu;
-#ifdef ELF_CORE_COPY_XFPREGS
-   elf_fpxregset_t *xfpu;
-#endif
user_siginfo_t csigdata;
int thread_status_size;
int numnote;
@@ -2116,11 +2101,6 @@ static int elf_note_info_init(struct elf_note_info *info)
info->fpu = kmalloc(sizeof(*info->fpu), GFP_KERNEL);
if (!info->fpu)
return 0;
-#ifdef ELF_CORE_COPY_XFPREGS
-   info->xfpu = kmalloc(sizeof(*info->xfpu), GFP_KERNEL);
-   if (!info->xfpu)
-   return 0;
-#endif
return 1;
 }
 
@@ -2184,13 +2164,6 @@ static int fill_note_info(struct elfhdr *elf, int phdrs,
if (info->prstatus->pr_fpvalid)
fill_note(info->notes + info->numnote++,
  "CORE", NT_PRFPREG, sizeof(*info->fpu), info->fpu);
-#ifdef ELF_CORE_COPY_XFPREGS
-   if (elf_core_copy_task_xfpregs(current, info->xfpu))
-   fill_note(info->notes + info->numnote++,
- "LINUX", ELF_CORE_XFPREG_TYPE,
- sizeof(*info->xfpu), info->xfpu);
-#endif
-
return 1;
 }
 
@@ -2243,9 +2216,6 @@ static void free_note_info(struct elf_note_info *info)
kfree(info->psinfo);
kfree(info->notes);
kfree(info->fpu);
-#ifdef ELF_CORE_COPY_XFPREGS
-   kfree(info->xfpu);
-#endif
 }
 
 #endif
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 6e13d8bea32d..a6ee92137529 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1457,9 +1457,6 @@ struct elf_thread_status
struct elf_prstatus_fdpic prstatus; /* NT_PRSTATUS */
elf_fpregset_t fpu; /* NT_PRFPREG */
struct task_struct *thread;
-#ifdef ELF_CORE_COPY_XFPREGS
-   elf_fpxregset_t xfpu;   /* ELF_CORE_XFPREG_TYPE */
-#endif
struct memelfnote notes[3];
int num_notes;
 };
@@ -1491,15 +1488,6

[PATCH 7/7] [elf-fdpic] switch coredump to regsets

From: Al Viro 

Similar to how the ELF coredump works on architectures that
have regsets; all architectures with elf-fdpic support *do*
have them.

Signed-off-by: Al Viro 
---
 fs/binfmt_elf_fdpic.c | 23 ++-
 1 file changed, 18 insertions(+), 5 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 34c45410d587..1af03c8d3c09 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -35,6 +35,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include 
 #include 
@@ -1456,8 +1457,7 @@ struct elf_thread_status
struct elf_thread_status *next;
struct elf_prstatus_fdpic prstatus; /* NT_PRSTATUS */
elf_fpregset_t fpu; /* NT_PRFPREG */
-   struct task_struct *thread;
-   struct memelfnote notes[3];
+   struct memelfnote notes[2];
int num_notes;
 };
 
@@ -1468,22 +1468,35 @@ struct elf_thread_status
  */
 static struct elf_thread_status *elf_dump_thread_status(long signr, struct task_struct *p, int *sz)
 {
+   const struct user_regset_view *view = task_user_regset_view(p);
struct elf_thread_status *t;
+   int i, ret;
 
t = kzalloc(sizeof(struct elf_thread_status), GFP_KERNEL);
if (!t)
return t;
 
fill_prstatus(&t->prstatus, p, signr);
-   elf_core_copy_task_regs(p, &t->prstatus.pr_reg);
+   regset_get(p, &view->regsets[0],
+  sizeof(t->prstatus.pr_reg), &t->prstatus.pr_reg);
 
fill_note(&t->notes[0], "CORE", NT_PRSTATUS, sizeof(t->prstatus),
  &t->prstatus);
t->num_notes++;
*sz += notesize(&t->notes[0]);
 
-   t->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(p, task_pt_regs(p),
-  &t->fpu);
+   for (i = 1; i < view->n; ++i) {
+   const struct user_regset *regset = &view->regsets[i];
+   if (regset->core_note_type != NT_PRFPREG)
+   continue;
+   if (regset->active && regset->active(p, regset) <= 0)
+   continue;
+   ret = regset_get(p, regset, sizeof(t->fpu), &t->fpu);
+   if (ret >= 0)
+   t->prstatus.pr_fpvalid = 1;
+   break;
+   }
+
if (t->prstatus.pr_fpvalid) {
fill_note(&t->notes[1], "CORE", NT_PRFPREG, sizeof(t->fpu),
  &t->fpu);
-- 
2.11.0
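
The FPU lookup in the hunk above can be tried outside the kernel.  What
follows is a minimal userspace sketch, not the kernel code: struct
mock_regset, struct mock_view, dump_fpregs() and the sample_* helpers are
hypothetical stand-ins for struct user_regset, struct user_regset_view and
regset_get().  It walks the view from index 1, skips sets that are not
NT_PRFPREG or that report themselves inactive, and stops at the first
active match, as the patch does.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define NT_PRFPREG 2	/* ELF note type of the FP register set */

struct task;		/* opaque stand-in for struct task_struct */

/* Hypothetical, cut-down model of struct user_regset */
struct mock_regset {
	unsigned int core_note_type;
	/* optional; a result <= 0 means "not live for this task" */
	int (*active)(const struct task *t, const struct mock_regset *rs);
	/* copies size bytes into buf; negative on error */
	int (*get)(const struct task *t, const struct mock_regset *rs,
		   size_t size, void *buf);
};

/* Hypothetical model of struct user_regset_view */
struct mock_view {
	const struct mock_regset *regsets;
	unsigned int n;
};

/* Sample callbacks, for illustration only */
static int sample_get(const struct task *t, const struct mock_regset *rs,
		      size_t size, void *buf)
{
	(void)t;
	(void)rs;
	memset(buf, 0x5a, size);
	return 0;
}

static int sample_inactive(const struct task *t, const struct mock_regset *rs)
{
	(void)t;
	(void)rs;
	return 0;
}

/*
 * Returns 1 (the would-be pr_fpvalid) if an active NT_PRFPREG set was
 * found and read into buf, 0 otherwise.  Slot 0 is skipped because it
 * holds the general-purpose registers.
 */
static int dump_fpregs(const struct task *t, const struct mock_view *view,
		       void *buf, size_t size)
{
	unsigned int i;

	for (i = 1; i < view->n; ++i) {
		const struct mock_regset *rs = &view->regsets[i];

		if (rs->core_note_type != NT_PRFPREG)
			continue;
		if (rs->active && rs->active(t, rs) <= 0)
			continue;
		/* first active match decides the result */
		return rs->get(t, rs, size, buf) >= 0;
	}
	return 0;
}
```

As in the patch, a failed read of the first active set does not fall
through to a later NT_PRFPREG entry.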

[PATCH 6/7] [elf-fdpic] use elf_dump_thread_status() for the dumper thread as well

From: Al Viro 

The only reason to have it open-coded for the first (dumper) thread is
that the coredump has a couple of process-wide notes stuck right after the
first (NT_PRSTATUS) note of the first thread.  But we don't need to
make the data collection side irregular for the first thread to handle
that - only the logic ordering the writenote() calls needs to take
care of it.

Signed-off-by: Al Viro 
---
 fs/binfmt_elf_fdpic.c | 81 ++-
 1 file changed, 28 insertions(+), 53 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index ba4f264dff3a..34c45410d587 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1482,7 +1482,8 @@ static struct elf_thread_status *elf_dump_thread_status(long signr, struct task_
t->num_notes++;
*sz += notesize(&t->notes[0]);
 
-   t->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(p, NULL, &t->fpu);
+   t->prstatus.pr_fpvalid = elf_core_copy_task_fpregs(p, task_pt_regs(p),
+  &t->fpu);
if (t->prstatus.pr_fpvalid) {
fill_note(&t->notes[1], "CORE", NT_PRFPREG, sizeof(t->fpu),
  &t->fpu);
@@ -1568,19 +1569,15 @@ static size_t elf_core_vma_data_size(unsigned long mm_flags)
  */
 static int elf_fdpic_core_dump(struct coredump_params *cprm)
 {
-#define NUM_NOTES   6
int has_dumped = 0;
int segs;
int i;
struct vm_area_struct *vma;
struct elfhdr *elf = NULL;
loff_t offset = 0, dataoff;
-   int numnote;
-   struct memelfnote *notes = NULL;
-   struct elf_prstatus_fdpic *prstatus = NULL; /* NT_PRSTATUS */
+   struct memelfnote psinfo_note, auxv_note;
struct elf_prpsinfo *psinfo = NULL; /* NT_PRPSINFO */
struct elf_thread_status *thread_list = NULL;
-   elf_fpregset_t *fpu = NULL;
int thread_status_size = 0;
elf_addr_t *auxv;
struct elf_phdr *phdr4note = NULL;
@@ -1606,19 +1603,9 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
elf = kmalloc(sizeof(*elf), GFP_KERNEL);
if (!elf)
goto end_coredump;
-   prstatus = kzalloc(sizeof(*prstatus), GFP_KERNEL);
-   if (!prstatus)
-   goto end_coredump;
psinfo = kmalloc(sizeof(*psinfo), GFP_KERNEL);
if (!psinfo)
goto end_coredump;
-   notes = kmalloc_array(NUM_NOTES, sizeof(struct memelfnote),
- GFP_KERNEL);
-   if (!notes)
-   goto end_coredump;
-   fpu = kmalloc(sizeof(*fpu), GFP_KERNEL);
-   if (!fpu)
-   goto end_coredump;
 
for (ct = current->mm->core_state->dumper.next;
ct; ct = ct->next) {
@@ -1632,8 +1619,12 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
}
 
/* now collect the dump for the current */
-   fill_prstatus(prstatus, current, cprm->siginfo->si_signo);
-   elf_core_copy_regs(&prstatus->pr_reg, cprm->regs);
+   tmp = elf_dump_thread_status(cprm->siginfo->si_signo,
+current, &thread_status_size);
+   if (!tmp)
+   goto end_coredump;
+   tmp->next = thread_list;
+   thread_list = tmp;
 
segs = current->mm->map_count;
segs += elf_core_extra_phdrs();
@@ -1655,46 +1646,28 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
 * with info from their /proc.
 */
 
-   fill_note(notes + 0, "CORE", NT_PRSTATUS, sizeof(*prstatus), prstatus);
fill_psinfo(psinfo, current->group_leader, current->mm);
-   fill_note(notes + 1, "CORE", NT_PRPSINFO, sizeof(*psinfo), psinfo);
-
-   numnote = 2;
+   fill_note(&psinfo_note, "CORE", NT_PRPSINFO, sizeof(*psinfo), psinfo);
+   thread_status_size += notesize(&psinfo_note);
 
auxv = (elf_addr_t *) current->mm->saved_auxv;
-
i = 0;
do
i += 2;
while (auxv[i - 2] != AT_NULL);
-   fill_note(&notes[numnote++], "CORE", NT_AUXV,
- i * sizeof(elf_addr_t), auxv);
+   fill_note(&auxv_note, "CORE", NT_AUXV, i * sizeof(elf_addr_t), auxv);
+   thread_status_size += notesize(&auxv_note);
 
-   /* Try to dump the FPU. */
-   if ((prstatus->pr_fpvalid =
-elf_core_copy_task_fpregs(current, cprm->regs, fpu)))
-   fill_note(notes + numnote++,
- "CORE", NT_PRFPREG, sizeof(*fpu), fpu);
-
-   offset += sizeof(*elf); /* Elf header */
+   offset = sizeof(*elf);  /* Elf header */
offset += segs * sizeof(struct elf_phdr);   /* Program headers */
 
/* Write notes phdr entry */
-   {
-   int sz = 0;
-
-   for (i = 0; i < numnote; i++)
-   sz += notesize(notes + i);
-
-   sz += thread_status_size;
-
-
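
The ordering described in the commit message can be modelled in
userspace.  This is a rough sketch only: the writenote() side of the
patch is not visible in the quoted diff, so struct thread_notes and
order_notes() are hypothetical, and the head of the list is assumed to
be the dumper thread.  Data collection is uniform across threads; only
the emission loop treats the first NT_PRSTATUS note specially by placing
the process-wide notes right after it.

```c
#include <assert.h>
#include <stddef.h>

/* ELF core note types, with the values used in elf.h */
enum { NT_PRSTATUS = 1, NT_PRFPREG = 2, NT_PRPSINFO = 3, NT_AUXV = 6 };

/* Hypothetical per-thread record: just the collected note types */
struct thread_notes {
	struct thread_notes *next;
	int types[2];
	int num_notes;
};

/*
 * Write the note types to out[] in dump order and return how many
 * were written.  The process-wide notes (NT_PRPSINFO, NT_AUXV) are
 * placed right after the first note of the first thread.
 */
static size_t order_notes(const struct thread_notes *list, int *out,
			  size_t cap)
{
	const struct thread_notes *t;
	size_t n = 0;
	int first = 1;
	int i;

	for (t = list; t; t = t->next) {
		for (i = 0; i < t->num_notes; i++) {
			if (n < cap)
				out[n++] = t->types[i];
			if (first && i == 0) {
				if (n < cap)
					out[n++] = NT_PRPSINFO;
				if (n < cap)
					out[n++] = NT_AUXV;
			}
		}
		first = 0;
	}
	return n;
}
```

With two threads that each carry NT_PRSTATUS and NT_PRFPREG, the
resulting order is PRSTATUS, PRPSINFO, AUXV, PRFPREG for the dumper,
followed by PRSTATUS, PRFPREG for the other thread.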

[PATCH 1/7] unexport linux/elfcore.h

From: Al Viro 

It's unusable from userland - it uses elf_gregset_t, which is not
provided by exported headers.  glibc has it in sys/procfs.h, but
the same file defines struct elf_prstatus, so linux/elfcore.h can't
be included once sys/procfs.h has been pulled in.  Same goes for uclibc,
and dietlibc simply doesn't have elf_gregset_t defined anywhere.

IOW, no userland source is including that thing.

Signed-off-by: Al Viro 
---
 include/linux/elfcore.h  |  69 +++--
 include/uapi/linux/elfcore.h | 101 ---
 scripts/headers_install.sh   |   1 -
 usr/include/Makefile |   1 -
 4 files changed, 66 insertions(+), 106 deletions(-)
 delete mode 100644 include/uapi/linux/elfcore.h

diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
index 4cad0e784b28..96ab215dad2d 100644
--- a/include/linux/elfcore.h
+++ b/include/linux/elfcore.h
@@ -5,12 +5,75 @@
 #include 
 #include 
 #include 
-
-#include 
-#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
 
 struct coredump_params;
 
+struct elf_siginfo
+{
+   int si_signo;   /* signal number */
+   int si_code;/* extra code */
+   int si_errno;   /* errno */
+};
+
+/*
+ * Definitions to generate Intel SVR4-like core files.
+ * These mostly have the same names as the SVR4 types with "elf_"
+ * tacked on the front to prevent clashes with linux definitions,
+ * and the typedef forms have been avoided.  This is mostly like
+ * the SVR4 structure, but more Linuxy, with things that Linux does
+ * not support and which gdb doesn't really use excluded.
+ */
+struct elf_prstatus
+{
+   struct elf_siginfo pr_info; /* Info associated with signal */
+   short   pr_cursig;  /* Current signal */
+   unsigned long pr_sigpend;   /* Set of pending signals */
+   unsigned long pr_sighold;   /* Set of held signals */
+   pid_t   pr_pid;
+   pid_t   pr_ppid;
+   pid_t   pr_pgrp;
+   pid_t   pr_sid;
+   struct __kernel_old_timeval pr_utime;   /* User time */
+   struct __kernel_old_timeval pr_stime;   /* System time */
+   struct __kernel_old_timeval pr_cutime;  /* Cumulative user time */
+   struct __kernel_old_timeval pr_cstime;  /* Cumulative system time */
+   elf_gregset_t pr_reg;   /* GP registers */
+#ifdef CONFIG_BINFMT_ELF_FDPIC
+   /* When using FDPIC, the loadmap addresses need to be communicated
+* to GDB in order for GDB to do the necessary relocations.  The
+* fields (below) used to communicate this information are placed
+* immediately after ``pr_reg'', so that the loadmap addresses may
+* be viewed as part of the register set if so desired.
+*/
+   unsigned long pr_exec_fdpic_loadmap;
+   unsigned long pr_interp_fdpic_loadmap;
+#endif
+   int pr_fpvalid; /* True if math co-processor being used.  */
+};
+
+#define ELF_PRARGSZ (80)    /* Number of chars for args */
+
+struct elf_prpsinfo
+{
+   char    pr_state;   /* numeric process state */
+   char    pr_sname;   /* char for pr_state */
+   char    pr_zomb;    /* zombie */
+   char    pr_nice;    /* nice val */
+   unsigned long pr_flag;  /* flags */
+   __kernel_uid_t  pr_uid;
+   __kernel_gid_t  pr_gid;
+   pid_t   pr_pid, pr_ppid, pr_pgrp, pr_sid;
+   /* Lots missing */
+   char    pr_fname[16];   /* filename of executable */
+   char    pr_psargs[ELF_PRARGSZ]; /* initial part of arg list */
+};
+
 static inline void elf_core_copy_regs(elf_gregset_t *elfregs, struct pt_regs *regs)
 {
 #ifdef ELF_CORE_COPY_REGS
diff --git a/include/uapi/linux/elfcore.h b/include/uapi/linux/elfcore.h
deleted file mode 100644
index baf03562306d..
--- a/include/uapi/linux/elfcore.h
+++ /dev/null
@@ -1,101 +0,0 @@
-/* SPDX-License-Identifier: GPL-2.0 WITH Linux-syscall-note */
-#ifndef _UAPI_LINUX_ELFCORE_H
-#define _UAPI_LINUX_ELFCORE_H
-
-#include 
-#include 
-#include 
-#include 
-#include 
-#include 
-
-struct elf_siginfo
-{
-   int si_signo;   /* signal number */
-   int si_code;/* extra code */
-   int si_errno;   /* errno */
-};
-
-
-#ifndef __KERNEL__
-typedef elf_greg_t greg_t;
-typedef elf_gregset_t gregset_t;
-typedef elf_fpregset_t fpregset_t;
-typedef elf_fpxregset_t fpxregset_t;
-#define NGREG ELF_NGREG
-#endif
-
-/*
- * Definitions to generate Intel SVR4-like core files.
- * These mostly have the same names as the SVR4 types with "elf_"
- * tacked on the front to prevent clashes with linux definitions,
- * and the typedef forms have been avoided.  This is mostly like
- * the SVR4 structure, but more Linuxy, with things that Linux does
- * not support and which gdb doesn't really use excluded.
- * Fields present but not used are marked with

[PATCH 2/7] take fdpic-related parts of elf_prstatus out

From: Al Viro 

The only architecture where we might end up using both is arm,
and there we definitely don't want fdpic-related fields in
elf_prstatus - the coredump layout of ELF binaries should not
depend upon having the kernel built with support for ELF_FDPIC
ones.  Just move the fdpic-modified variant into binfmt_elf_fdpic.c
(and call it elf_prstatus_fdpic there).

[name stolen from nico]

Signed-off-by: Al Viro 
---
 fs/binfmt_elf_fdpic.c  | 32 +---
 include/linux/elfcore-compat.h |  4 
 include/linux/elfcore.h| 10 --
 3 files changed, 29 insertions(+), 17 deletions(-)

diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 0f45521b237c..6e13d8bea32d 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -1189,6 +1189,32 @@ static int elf_fdpic_map_file_by_direct_mmap(struct elf_fdpic_params *params,
  */
 #ifdef CONFIG_ELF_CORE
 
+struct elf_prstatus_fdpic
+{
+   struct elf_siginfo pr_info; /* Info associated with signal */
+   short   pr_cursig;  /* Current signal */
+   unsigned long pr_sigpend;   /* Set of pending signals */
+   unsigned long pr_sighold;   /* Set of held signals */
+   pid_t   pr_pid;
+   pid_t   pr_ppid;
+   pid_t   pr_pgrp;
+   pid_t   pr_sid;
+   struct __kernel_old_timeval pr_utime;   /* User time */
+   struct __kernel_old_timeval pr_stime;   /* System time */
+   struct __kernel_old_timeval pr_cutime;  /* Cumulative user time */
+   struct __kernel_old_timeval pr_cstime;  /* Cumulative system time */
+   elf_gregset_t pr_reg;   /* GP registers */
+   /* When using FDPIC, the loadmap addresses need to be communicated
+* to GDB in order for GDB to do the necessary relocations.  The
+* fields (below) used to communicate this information are placed
+* immediately after ``pr_reg'', so that the loadmap addresses may
+* be viewed as part of the register set if so desired.
+*/
+   unsigned long pr_exec_fdpic_loadmap;
+   unsigned long pr_interp_fdpic_loadmap;
+   int pr_fpvalid; /* True if math co-processor being used.  */
+};
+
 /*
  * Decide whether a segment is worth dumping; default is yes to be
  * sure (missing info is worse than too much; etc).
@@ -1345,7 +1371,7 @@ static inline void fill_note(struct memelfnote *note, const char *name, int type
  * fill up all the fields in prstatus from the given task struct, except
  * registers which need to be filled up separately.
  */
-static void fill_prstatus(struct elf_prstatus *prstatus,
+static void fill_prstatus(struct elf_prstatus_fdpic *prstatus,
  struct task_struct *p, long signr)
 {
prstatus->pr_info.si_signo = prstatus->pr_cursig = signr;
@@ -1428,7 +1454,7 @@ static int fill_psinfo(struct elf_prpsinfo *psinfo, struct task_struct *p,
 struct elf_thread_status
 {
struct list_head list;
-   struct elf_prstatus prstatus;   /* NT_PRSTATUS */
+   struct elf_prstatus_fdpic prstatus; /* NT_PRSTATUS */
elf_fpregset_t fpu; /* NT_PRFPREG */
struct task_struct *thread;
 #ifdef ELF_CORE_COPY_XFPREGS
@@ -1562,7 +1588,7 @@ static int elf_fdpic_core_dump(struct coredump_params *cprm)
loff_t offset = 0, dataoff;
int numnote;
struct memelfnote *notes = NULL;
-   struct elf_prstatus *prstatus = NULL;   /* NT_PRSTATUS */
+   struct elf_prstatus_fdpic *prstatus = NULL; /* NT_PRSTATUS */
struct elf_prpsinfo *psinfo = NULL; /* NT_PRPSINFO */
LIST_HEAD(thread_list);
struct list_head *t;
diff --git a/include/linux/elfcore-compat.h b/include/linux/elfcore-compat.h
index 7a37f4ce9fd2..10485f0c9740 100644
--- a/include/linux/elfcore-compat.h
+++ b/include/linux/elfcore-compat.h
@@ -32,10 +32,6 @@ struct compat_elf_prstatus
struct old_timeval32 pr_cutime;
struct old_timeval32 pr_cstime;
compat_elf_gregset_t pr_reg;
-#ifdef CONFIG_BINFMT_ELF_FDPIC
-   compat_ulong_t  pr_exec_fdpic_loadmap;
-   compat_ulong_t  pr_interp_fdpic_loadmap;
-#endif
compat_int_t pr_fpvalid;
 };
 
diff --git a/include/linux/elfcore.h b/include/linux/elfcore.h
index 96ab215dad2d..adb8ee89f3fd 100644
--- a/include/linux/elfcore.h
+++ b/include/linux/elfcore.h
@@ -44,16 +44,6 @@ struct elf_prstatus
struct __kernel_old_timeval pr_cutime;  /* Cumulative user time */
struct __kernel_old_timeval pr_cstime;  /* Cumulative system time */
elf_gregset_t pr_reg;   /* GP registers */
-#ifdef CONFIG_BINFMT_ELF_FDPIC
-   /* When using FDPIC, the loadmap addresses need to be communicated
-* to GDB in order for GDB to do the necessary relocations.  The
-* fields (below) used to communicate this information are placed
-* immediately after ``pr_reg'', so that the
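
To see why the commit message objects to the #ifdef, here is a small
userspace sketch.  The structs are hypothetical and cut down to the
fields that matter; mock_gregset_t and its size of four registers are
assumptions made for illustration.  With the two loadmap words present,
pr_fpvalid moves, so the NT_PRSTATUS layout of a plain ELF dump would
depend on whether the kernel was built with FDPIC support.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register set; the real size is per-architecture */
typedef unsigned long mock_gregset_t[4];

/* Tail of elf_prstatus as plain ELF coredumps should see it */
struct mock_prstatus {
	mock_gregset_t pr_reg;
	int pr_fpvalid;
};

/* Tail of the FDPIC variant, with the loadmap addresses after pr_reg */
struct mock_prstatus_fdpic {
	mock_gregset_t pr_reg;
	unsigned long pr_exec_fdpic_loadmap;
	unsigned long pr_interp_fdpic_loadmap;
	int pr_fpvalid;
};

/* How far pr_fpvalid moves when the FDPIC fields are present */
static size_t fpvalid_shift(void)
{
	return offsetof(struct mock_prstatus_fdpic, pr_fpvalid) -
	       offsetof(struct mock_prstatus, pr_fpvalid);
}
```

Keeping the two layouts as separate types, as the patch does with
elf_prstatus_fdpic, means each binfmt gets a fixed layout regardless of
the kernel configuration.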

[RFC][PATCHES] converting FDPIC coredumps to regsets