[PATCH v5 7/7] prandom: remove unused functions

2022-10-07 Thread Jason A. Donenfeld
With no callers left of prandom_u32() and prandom_bytes(), as well as
get_random_int(), remove these deprecated wrappers, in favor of
get_random_u32() and get_random_bytes().

Reviewed-by: Kees Cook 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c   | 11 +--
 include/linux/prandom.h | 12 
 include/linux/random.h  |  5 -
 3 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 01acf235f263..2fe28eeb2f38 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -97,7 +97,7 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit 
suppression");
  * Returns whether or not the input pool has been seeded and thus guaranteed
  * to supply cryptographically secure random numbers. This applies to: the
  * /dev/urandom device, the get_random_bytes function, and the get_random_{u8,
- * u16,u32,u64,int,long} family of functions.
+ * u16,u32,u64,long} family of functions.
  *
  * Returns: true if the input pool has been seeded.
  *  false if the input pool has not been seeded.
@@ -161,15 +161,14 @@ EXPORT_SYMBOL(wait_for_random_bytes);
  * u16 get_random_u16()
  * u32 get_random_u32()
  * u64 get_random_u64()
- * unsigned int get_random_int()
  * unsigned long get_random_long()
  *
  * These interfaces will return the requested number of random bytes
  * into the given buffer or as a return value. This is equivalent to
- * a read from /dev/urandom. The u8, u16, u32, u64, int, and long
- * family of functions may be higher performance for one-off random
- * integers, because they do a bit of buffering and do not invoke
- * reseeding until the buffer is emptied.
+ * a read from /dev/urandom. The u8, u16, u32, u64, long family of
+ * functions may be higher performance for one-off random integers,
+ * because they do a bit of buffering and do not invoke reseeding
+ * until the buffer is emptied.
  *
  */
 
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index 78db003bc290..e0a0759dd09c 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -12,18 +12,6 @@
 #include 
 #include 
 
-/* Deprecated: use get_random_u32 instead. */
-static inline u32 prandom_u32(void)
-{
-   return get_random_u32();
-}
-
-/* Deprecated: use get_random_bytes instead. */
-static inline void prandom_bytes(void *buf, size_t nbytes)
-{
-   return get_random_bytes(buf, nbytes);
-}
-
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
diff --git a/include/linux/random.h b/include/linux/random.h
index 08322f700cdc..147a5e0d0b8e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -42,10 +42,6 @@ u8 get_random_u8(void);
 u16 get_random_u16(void);
 u32 get_random_u32(void);
 u64 get_random_u64(void);
-static inline unsigned int get_random_int(void)
-{
-   return get_random_u32();
-}
 static inline unsigned long get_random_long(void)
 {
 #if BITS_PER_LONG == 64
@@ -100,7 +96,6 @@ declare_get_random_var_wait(u8, u8)
 declare_get_random_var_wait(u16, u16)
 declare_get_random_var_wait(u32, u32)
 declare_get_random_var_wait(u64, u32)
-declare_get_random_var_wait(int, unsigned int)
 declare_get_random_var_wait(long, unsigned long)
 #undef declare_get_random_var
 
-- 
2.37.3



[PATCH v5 6/7] treewide: use get_random_bytes when possible

2022-10-07 Thread Jason A. Donenfeld
The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. This was done as a basic find and replace.

Reviewed-by: Kees Cook 
Reviewed-by: Christophe Leroy  # powerpc
Signed-off-by: Jason A. Donenfeld 
---
 arch/powerpc/crypto/crc-vpmsum_test.c   |  2 +-
 block/blk-crypto-fallback.c |  2 +-
 crypto/async_tx/raid6test.c |  2 +-
 drivers/dma/dmatest.c   |  2 +-
 drivers/mtd/nand/raw/nandsim.c  |  2 +-
 drivers/mtd/tests/mtd_nandecctest.c |  2 +-
 drivers/mtd/tests/speedtest.c   |  2 +-
 drivers/mtd/tests/stresstest.c  |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   |  2 +-
 drivers/net/ethernet/rocker/rocker_main.c   |  2 +-
 drivers/net/wireguard/selftest/allowedips.c | 12 ++--
 fs/ubifs/debug.c|  2 +-
 kernel/kcsan/selftest.c |  2 +-
 lib/random32.c  |  2 +-
 lib/test_objagg.c   |  2 +-
 lib/uuid.c  |  2 +-
 net/ipv4/route.c|  2 +-
 net/mac80211/rc80211_minstrel_ht.c  |  2 +-
 net/sched/sch_pie.c |  2 +-
 19 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c 
b/arch/powerpc/crypto/crc-vpmsum_test.c
index c1c1ef9457fb..273c527868db 100644
--- a/arch/powerpc/crypto/crc-vpmsum_test.c
+++ b/arch/powerpc/crypto/crc-vpmsum_test.c
@@ -82,7 +82,7 @@ static int __init crc_test_init(void)
 
if (len <= offset)
continue;
-   prandom_bytes(data, len);
+   get_random_bytes(data, len);
len -= offset;
 
crypto_shash_update(crct10dif_shash, data+offset, len);
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 621abd1b0e4d..ad9844c5b40c 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -539,7 +539,7 @@ static int blk_crypto_fallback_init(void)
if (blk_crypto_fallback_inited)
return 0;
 
-   prandom_bytes(blank_key, BLK_CRYPTO_MAX_KEY_SIZE);
+   get_random_bytes(blank_key, BLK_CRYPTO_MAX_KEY_SIZE);
 
err = bioset_init(_bio_split, 64, 0, 0);
if (err)
diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
index c9d218e53bcb..f74505f2baf0 100644
--- a/crypto/async_tx/raid6test.c
+++ b/crypto/async_tx/raid6test.c
@@ -37,7 +37,7 @@ static void makedata(int disks)
int i;
 
for (i = 0; i < disks; i++) {
-   prandom_bytes(page_address(data[i]), PAGE_SIZE);
+   get_random_bytes(page_address(data[i]), PAGE_SIZE);
dataptrs[i] = data[i];
dataoffs[i] = 0;
}
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 9fe2ae794316..ffe621695e47 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -312,7 +312,7 @@ static unsigned long dmatest_random(void)
 {
unsigned long buf;
 
-   prandom_bytes(, sizeof(buf));
+   get_random_bytes(, sizeof(buf));
return buf;
 }
 
diff --git a/drivers/mtd/nand/raw/nandsim.c b/drivers/mtd/nand/raw/nandsim.c
index 4bdaf4aa7007..c941a5a41ea6 100644
--- a/drivers/mtd/nand/raw/nandsim.c
+++ b/drivers/mtd/nand/raw/nandsim.c
@@ -1393,7 +1393,7 @@ static int ns_do_read_error(struct nandsim *ns, int num)
unsigned int page_no = ns->regs.row;
 
if (ns_read_error(page_no)) {
-   prandom_bytes(ns->buf.byte, num);
+   get_random_bytes(ns->buf.byte, num);
NS_WARN("simulating read error in page %u\n", page_no);
return 1;
}
diff --git a/drivers/mtd/tests/mtd_nandecctest.c 
b/drivers/mtd/tests/mtd_nandecctest.c
index 1c7201b0f372..440988562cfd 100644
--- a/drivers/mtd/tests/mtd_nandecctest.c
+++ b/drivers/mtd/tests/mtd_nandecctest.c
@@ -266,7 +266,7 @@ static int nand_ecc_test_run(const size_t size)
goto error;
}
 
-   prandom_bytes(correct_data, size);
+   get_random_bytes(correct_data, size);
ecc_sw_hamming_calculate(correct_data, size, correct_ecc, sm_order);
for (i = 0; i < ARRAY_SIZE(nand_ecc_test); i++) {
nand_ecc_test[i].prepare(error_data, error_ecc,
diff --git a/drivers/mtd/tests/speedtest.c b/drivers/mtd/tests/speedtest.c
index c9ec7086bfa1..075bce32caa5 100644
--- a/drivers/mtd/tests/speedtest.c
+++ b/drivers/mtd/tests/speedtest.c
@@ -223,7 +223,7 @@ static int __init mtd_speedtest_init(void)
if (!iobuf)
goto out;
 
-   prandom_bytes(iobuf, mtd->erasesize);
+   get_random_bytes(iobuf, mtd->erasesize);
 
bbt = kzalloc(ebcnt, GFP_KERNEL);

[PATCH v5 5/7] treewide: use get_random_u32() when possible

2022-10-07 Thread Jason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32(). This was done as a basic find
and replace.

Reviewed-by: Kees Cook 
Acked-by: Toke Høiland-Jørgensen  # for sch_cake
Acked-by: Chuck Lever  # for nfsd
Reviewed-by: Jan Kara  # for ext4
Acked-by: Mika Westerberg  # for thunderbolt
Acked-by: Darrick J. Wong  # for xfs
Signed-off-by: Jason A. Donenfeld 
---
 Documentation/networking/filter.rst|  2 +-
 arch/parisc/kernel/process.c   |  2 +-
 arch/parisc/kernel/sys_parisc.c|  4 ++--
 arch/s390/mm/mmap.c|  2 +-
 arch/x86/kernel/cpu/amd.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c|  6 +++---
 drivers/gpu/drm/i915/selftests/i915_selftest.c |  2 +-
 drivers/gpu/drm/selftests/test-drm_buddy.c |  2 +-
 drivers/gpu/drm/selftests/test-drm_mm.c|  2 +-
 drivers/infiniband/hw/cxgb4/cm.c   |  4 ++--
 drivers/infiniband/hw/hfi1/tid_rdma.c  |  2 +-
 drivers/infiniband/hw/mlx4/mad.c   |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  2 +-
 drivers/md/raid5-cache.c   |  2 +-
 .../media/test-drivers/vivid/vivid-touch-cap.c |  4 ++--
 drivers/misc/habanalabs/gaudi2/gaudi2.c|  2 +-
 drivers/mtd/nand/raw/nandsim.c |  2 +-
 drivers/net/bonding/bond_main.c|  2 +-
 drivers/net/ethernet/broadcom/cnic.c   |  2 +-
 .../chelsio/inline_crypto/chtls/chtls_cm.c |  2 +-
 drivers/net/ethernet/rocker/rocker_main.c  |  6 +++---
 .../wireless/broadcom/brcm80211/brcmfmac/pno.c |  2 +-
 .../net/wireless/marvell/mwifiex/cfg80211.c|  4 ++--
 .../net/wireless/microchip/wilc1000/cfg80211.c |  2 +-
 .../net/wireless/quantenna/qtnfmac/cfg80211.c  |  2 +-
 drivers/net/wireless/ti/wlcore/main.c  |  2 +-
 drivers/nvme/common/auth.c |  2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |  4 ++--
 drivers/target/iscsi/cxgbit/cxgbit_cm.c|  2 +-
 drivers/thunderbolt/xdomain.c  |  2 +-
 drivers/video/fbdev/uvesafb.c  |  2 +-
 fs/exfat/inode.c   |  2 +-
 fs/ext4/ialloc.c   |  2 +-
 fs/ext4/ioctl.c|  4 ++--
 fs/ext4/mmp.c  |  2 +-
 fs/f2fs/namei.c|  2 +-
 fs/fat/inode.c |  2 +-
 fs/nfsd/nfs4state.c|  4 ++--
 fs/ntfs3/fslog.c   |  6 +++---
 fs/ubifs/journal.c |  2 +-
 fs/xfs/libxfs/xfs_ialloc.c |  2 +-
 fs/xfs/xfs_icache.c|  2 +-
 fs/xfs/xfs_log.c   |  2 +-
 include/net/netfilter/nf_queue.h   |  2 +-
 include/net/red.h  |  2 +-
 include/net/sock.h |  2 +-
 kernel/bpf/bloom_filter.c  |  2 +-
 kernel/bpf/core.c  |  2 +-
 kernel/bpf/hashtab.c   |  2 +-
 kernel/bpf/verifier.c  |  2 +-
 kernel/kcsan/selftest.c|  2 +-
 lib/random32.c |  2 +-
 lib/reed_solomon/test_rslib.c  |  6 +++---
 lib/test_fprobe.c  |  2 +-
 lib/test_kprobes.c |  2 +-
 lib/test_min_heap.c|  6 +++---
 lib/test_rhashtable.c  |  6 +++---
 mm/shmem.c |  2 +-
 mm/slab.c  |  2 +-
 net/802/garp.c |  2 +-
 net/802/mrp.c  |  2 +-
 net/core/pktgen.c  |  4 ++--
 net/ipv4/route.c   |  2 +-
 net/ipv4/tcp_cdg.c |  2 +-
 net/ipv4/udp.c |  2 +-
 net/ipv6/ip6_flowlabel.c   |  2 +-
 net/ipv6/output_core.c |  2 +-
 net/netfilter/ipvs/ip_vs_conn.c|  2 +-
 net/netfilter/xt_statistic.c   |  2 +-
 net/openvswitch/actions.c  |  2 +-
 net/sched/sch_cake.c   |  2 +-
 net/sched/sch_netem.c  | 18 +-
 net/sunrpc/auth_gss/gss_krb5_wrap.c|  4 ++--
 net/sunrpc/xprt.c  |  2 +-
 net/unix/af_unix.c |  2 +-
 75 files changed, 104 insertions(+), 104 deletions(-)

diff --git a/Documentation/networking/filter.rst 

[PATCH v5 4/7] treewide: use get_random_{u8, u16}() when possible, part 2

2022-10-07 Thread Jason A. Donenfeld
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value. This was done by hand,
identifying all of the places where one of the random integer functions
was used in a non-32-bit context.

Reviewed-by: Kees Cook 
Signed-off-by: Jason A. Donenfeld 
---
 arch/s390/kernel/process.c  | 2 +-
 lib/test_vmalloc.c  | 2 +-
 net/ipv4/ip_output.c| 2 +-
 net/netfilter/nf_nat_core.c | 4 ++--
 net/rds/bind.c  | 2 +-
 net/sched/sch_sfb.c | 2 +-
 6 files changed, 7 insertions(+), 7 deletions(-)

diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 5ec78555dd2e..42af4b3aa02b 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -230,7 +230,7 @@ unsigned long arch_align_stack(unsigned long sp)
 
 static inline unsigned long brk_rnd(void)
 {
-   return (get_random_int() & BRK_RND_MASK) << PAGE_SHIFT;
+   return (get_random_u16() & BRK_RND_MASK) << PAGE_SHIFT;
 }
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index a26bbbf20e62..cf7780572f5b 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -80,7 +80,7 @@ static int random_size_align_alloc_test(void)
int i;
 
for (i = 0; i < test_loop_count; i++) {
-   rnd = prandom_u32();
+   rnd = get_random_u8();
 
/*
 * Maximum 1024 pages, if PAGE_SIZE is 4096.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 04e2034f2f8e..a4fbdbff14b3 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -172,7 +172,7 @@ int ip_build_and_send_pkt(struct sk_buff *skb, const struct 
sock *sk,
 * Avoid using the hashed IP ident generator.
 */
if (sk->sk_protocol == IPPROTO_TCP)
-   iph->id = (__force __be16)prandom_u32();
+   iph->id = (__force __be16)get_random_u16();
else
__ip_select_ident(net, iph, 1);
}
diff --git a/net/netfilter/nf_nat_core.c b/net/netfilter/nf_nat_core.c
index 7981be526f26..57c7686ac485 100644
--- a/net/netfilter/nf_nat_core.c
+++ b/net/netfilter/nf_nat_core.c
@@ -468,7 +468,7 @@ static void nf_nat_l4proto_unique_tuple(struct 
nf_conntrack_tuple *tuple,
if (range->flags & NF_NAT_RANGE_PROTO_OFFSET)
off = (ntohs(*keyptr) - ntohs(range->base_proto.all));
else
-   off = prandom_u32();
+   off = get_random_u16();
 
attempts = range_size;
if (attempts > max_attempts)
@@ -490,7 +490,7 @@ static void nf_nat_l4proto_unique_tuple(struct 
nf_conntrack_tuple *tuple,
if (attempts >= range_size || attempts < 16)
return;
attempts /= 2;
-   off = prandom_u32();
+   off = get_random_u16();
goto another_round;
 }
 
diff --git a/net/rds/bind.c b/net/rds/bind.c
index 5b5fb4ca8d3e..97a29172a8ee 100644
--- a/net/rds/bind.c
+++ b/net/rds/bind.c
@@ -104,7 +104,7 @@ static int rds_add_bound(struct rds_sock *rs, const struct 
in6_addr *addr,
return -EINVAL;
last = rover;
} else {
-   rover = max_t(u16, prandom_u32(), 2);
+   rover = max_t(u16, get_random_u16(), 2);
last = rover - 1;
}
 
diff --git a/net/sched/sch_sfb.c b/net/sched/sch_sfb.c
index 2829455211f8..7eb70acb4d58 100644
--- a/net/sched/sch_sfb.c
+++ b/net/sched/sch_sfb.c
@@ -379,7 +379,7 @@ static int sfb_enqueue(struct sk_buff *skb, struct Qdisc 
*sch,
goto enqueue;
}
 
-   r = prandom_u32() & SFB_MAX_PROB;
+   r = get_random_u16() & SFB_MAX_PROB;
 
if (unlikely(r < p_min)) {
if (unlikely(p_min > SFB_MAX_PROB / 2)) {
-- 
2.37.3



[PATCH v5 3/7] treewide: use get_random_{u8, u16}() when possible, part 1

2022-10-07 Thread Jason A. Donenfeld
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value. This was done
mechanically with this coccinelle script:

@@
expression E;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
typedef u8;
@@
(
- (get_random_u32() & 0x)
+ get_random_u16()
|
- (get_random_u32() & 0xff)
+ get_random_u8()
|
- (get_random_u32() % 65536)
+ get_random_u16()
|
- (get_random_u32() % 256)
+ get_random_u8()
|
- (get_random_u32() >> 16)
+ get_random_u16()
|
- (get_random_u32() >> 24)
+ get_random_u8()
|
- (u16)get_random_u32()
+ get_random_u16()
|
- (u8)get_random_u32()
+ get_random_u8()
|
- prandom_u32_max(65536)
+ get_random_u16()
|
- prandom_u32_max(256)
+ get_random_u8()
|
- E->inet_id = get_random_u32()
+ E->inet_id = get_random_u16()
)

@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u16;
identifier v;
@@
- u16 v = get_random_u32();
+ u16 v = get_random_u16();

@@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
typedef u8;
identifier v;
@@
- u8 v = get_random_u32();
+ u8 v = get_random_u8();

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Examine limits
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value < 256:
coccinelle.RESULT = cocci.make_ident("get_random_u8")
elif value < 65536:
coccinelle.RESULT = cocci.make_ident("get_random_u16")
else:
print("Skipping large mask of %s" % (literal))
cocci.include_match(False)

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
identifier add_one.RESULT;
identifier FUNC;
@@

-   (FUNC()@p & (LITERAL))
+   (RESULT() & LITERAL)

Reviewed-by: Kees Cook 
Acked-by: Toke Høiland-Jørgensen  # for sch_cake
Signed-off-by: Jason A. Donenfeld 
---
 arch/arm/kernel/signal.c  | 2 +-
 arch/arm64/kernel/syscall.c   | 2 +-
 crypto/testmgr.c  | 8 
 drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 2 +-
 drivers/media/test-drivers/vivid/vivid-radio-rx.c | 4 ++--
 .../net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c   | 2 +-
 drivers/net/hamradio/baycom_epp.c | 2 +-
 drivers/net/hamradio/hdlcdrv.c| 2 +-
 drivers/net/hamradio/yam.c| 2 +-
 drivers/net/wireguard/selftest/allowedips.c   | 4 ++--
 drivers/net/wireless/st/cw1200/wsm.c  | 2 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c  | 6 +++---
 lib/cmdline_kunit.c   | 4 ++--
 net/dccp/ipv4.c   | 4 ++--
 net/ipv4/datagram.c   | 2 +-
 net/ipv4/tcp_ipv4.c   | 4 ++--
 net/mac80211/scan.c   | 2 +-
 net/sched/sch_cake.c  | 6 +++---
 net/sctp/socket.c | 2 +-
 19 files changed, 31 insertions(+), 31 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index ea128e32e8ca..e07f359254c3 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -655,7 +655,7 @@ struct page *get_signal_page(void)
 PAGE_SIZE / sizeof(u32));
 
/* Give the signal return code some randomness */
-   offset = 0x200 + (get_random_int() & 0x7fc);
+   offset = 0x200 + (get_random_u16() & 0x7fc);
signal_return_offset = offset;
 
/* Copy signal return handlers into the page */
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 733451fe7e41..d72e8f23422d 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -67,7 +67,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int 
scno,
 *
 * The resulting 5 bits of entropy is seen in SP[8:4].
 */
-   choose_random_kstack_offset(get_random_int() & 0x1FF);
+   choose_random_kstack_offset(get_random_u16() & 0x1FF);
 }
 
 static inline bool has_syscall_work(unsigned long flags)
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index be45217acde4..981c637fa2ed 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -927,7 +927,7 @@ static void generate_random_bytes(u8 *buf, size_t count)
 

[PATCH v5 2/7] treewide: use prandom_u32_max() when possible, part 2

2022-10-07 Thread Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done by hand, covering things that coccinelle could not do on its own.

Reviewed-by: Kees Cook 
Reviewed-by: Jan Kara  # for ext2, ext4, and sbitmap
Signed-off-by: Jason A. Donenfeld 
---
 fs/ext2/ialloc.c   |  3 +--
 fs/ext4/ialloc.c   |  5 ++---
 lib/sbitmap.c  |  2 +-
 lib/test_vmalloc.c | 17 -
 4 files changed, 8 insertions(+), 19 deletions(-)

diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
index 998dd2ac8008..f4944c4dee60 100644
--- a/fs/ext2/ialloc.c
+++ b/fs/ext2/ialloc.c
@@ -277,8 +277,7 @@ static int find_group_orlov(struct super_block *sb, struct 
inode *parent)
int best_ndir = inodes_per_group;
int best_group = -1;
 
-   group = prandom_u32();
-   parent_group = (unsigned)group % ngroups;
+   parent_group = prandom_u32_max(ngroups);
for (i = 0; i < ngroups; i++) {
group = (parent_group + i) % ngroups;
desc = ext2_get_group_desc (sb, group, NULL);
diff --git a/fs/ext4/ialloc.c b/fs/ext4/ialloc.c
index f73e5eb43eae..36d5bc595cc2 100644
--- a/fs/ext4/ialloc.c
+++ b/fs/ext4/ialloc.c
@@ -463,10 +463,9 @@ static int find_group_orlov(struct super_block *sb, struct 
inode *parent,
hinfo.hash_version = DX_HASH_HALF_MD4;
hinfo.seed = sbi->s_hash_seed;
ext4fs_dirhash(parent, qstr->name, qstr->len, );
-   grp = hinfo.hash;
+   parent_group = hinfo.hash % ngroups;
} else
-   grp = prandom_u32();
-   parent_group = (unsigned)grp % ngroups;
+   parent_group = prandom_u32_max(ngroups);
for (i = 0; i < ngroups; i++) {
g = (parent_group + i) % ngroups;
get_orlov_stats(sb, g, flex_size, );
diff --git a/lib/sbitmap.c b/lib/sbitmap.c
index c4f04edf3ee9..ef0661504561 100644
--- a/lib/sbitmap.c
+++ b/lib/sbitmap.c
@@ -21,7 +21,7 @@ static int init_alloc_hint(struct sbitmap *sb, gfp_t flags)
int i;
 
for_each_possible_cpu(i)
-   *per_cpu_ptr(sb->alloc_hint, i) = prandom_u32() % depth;
+   *per_cpu_ptr(sb->alloc_hint, i) = 
prandom_u32_max(depth);
}
return 0;
 }
diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
index 4f2f2d1bac56..a26bbbf20e62 100644
--- a/lib/test_vmalloc.c
+++ b/lib/test_vmalloc.c
@@ -151,9 +151,7 @@ static int random_size_alloc_test(void)
int i;
 
for (i = 0; i < test_loop_count; i++) {
-   n = prandom_u32();
-   n = (n % 100) + 1;
-
+   n = prandom_u32_max(100) + 1;
p = vmalloc(n * PAGE_SIZE);
 
if (!p)
@@ -293,16 +291,12 @@ pcpu_alloc_test(void)
return -1;
 
for (i = 0; i < 35000; i++) {
-   unsigned int r;
-
-   r = prandom_u32();
-   size = (r % (PAGE_SIZE / 4)) + 1;
+   size = prandom_u32_max(PAGE_SIZE / 4) + 1;
 
/*
 * Maximum PAGE_SIZE
 */
-   r = prandom_u32();
-   align = 1 << ((r % 11) + 1);
+   align = 1 << (prandom_u32_max(11) + 1);
 
pcpu[i] = __alloc_percpu(size, align);
if (!pcpu[i])
@@ -393,14 +387,11 @@ static struct test_driver {
 
 static void shuffle_array(int *arr, int n)
 {
-   unsigned int rnd;
int i, j;
 
for (i = n - 1; i > 0; i--)  {
-   rnd = prandom_u32();
-
/* Cut the range. */
-   j = rnd % i;
+   j = prandom_u32_max(i);
 
/* Swap indexes. */
swap(arr[i], arr[j]);
-- 
2.37.3



[PATCH v5 1/7] treewide: use prandom_u32_max() when possible, part 1

2022-10-07 Thread Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with this coccinelle script:

@basic@
expression E;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
@@
(
- ((T)get_random_u32() % (E))
+ prandom_u32_max(E)
|
- ((T)get_random_u32() & ((E) - 1))
+ prandom_u32_max(E * XXX_MAKE_SURE_E_IS_POW2)
|
- ((T)get_random_u32() & ~PAGE_MASK)
+ prandom_u32_max(PAGE_SIZE)
)

@multi_line@
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-   RAND = get_random_u32();
... when != RAND
-   RAND %= (E);
+   RAND = prandom_u32_max(E);

// Find a potential literal
@literal_mask@
expression LITERAL;
type T;
identifier get_random_u32 =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

((T)get_random_u32()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

value = None
if literal.startswith('0x'):
value = int(literal, 16)
elif literal[0] in '123456789':
value = int(literal, 10)
if value is None:
print("I don't know how to handle %s" % (literal))
cocci.include_match(False)
elif value == 2**32 - 1 or value == 2**31 - 1 or value == 2**24 - 1 or value == 
2**16 - 1 or value == 2**8 - 1:
print("Skipping 0x%x for cleanup elsewhere" % (value))
cocci.include_match(False)
elif value & (value + 1) != 0:
print("Skipping 0x%x because it's not a power of two minus one" % 
(value))
cocci.include_match(False)
elif literal.startswith('0x'):
coccinelle.RESULT = cocci.make_expr("0x%x" % (value + 1))
else:
coccinelle.RESULT = cocci.make_expr("%d" % (value + 1))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-   (FUNC()@p & (LITERAL))
+   prandom_u32_max(RESULT)

@collapse_ret@
type T;
identifier VAR;
expression E;
@@

 {
-   T VAR;
-   VAR = (E);
-   return VAR;
+   return E;
 }

@drop_var@
type T;
identifier VAR;
@@

 {
-   T VAR;
... when != VAR
 }

Reviewed-by: Kees Cook 
Reviewed-by: KP Singh 
Reviewed-by: Jan Kara  # for ext4 and sbitmap
Reviewed-by: Christoph Böhmwalder  # for drbd
Acked-by: Ulf Hansson  # for mmc
Acked-by: Darrick J. Wong  # for xfs
Signed-off-by: Jason A. Donenfeld 
---
 arch/arm/kernel/process.c |  2 +-
 arch/arm64/kernel/process.c   |  2 +-
 arch/loongarch/kernel/process.c   |  2 +-
 arch/loongarch/kernel/vdso.c  |  2 +-
 arch/mips/kernel/process.c|  2 +-
 arch/mips/kernel/vdso.c   |  2 +-
 arch/parisc/kernel/vdso.c |  2 +-
 arch/powerpc/kernel/process.c |  2 +-
 arch/s390/kernel/process.c|  2 +-
 arch/s390/kernel/vdso.c   |  2 +-
 arch/sparc/vdso/vma.c |  2 +-
 arch/um/kernel/process.c  |  2 +-
 arch/x86/entry/vdso/vma.c |  2 +-
 arch/x86/kernel/module.c  |  2 +-
 arch/x86/kernel/process.c |  2 +-
 arch/x86/mm/pat/cpa-test.c|  4 +-
 crypto/testmgr.c  | 86 +--
 drivers/block/drbd/drbd_receiver.c|  4 +-
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  2 +-
 drivers/infiniband/core/cma.c |  2 +-
 drivers/infiniband/hw/cxgb4/id_table.c|  4 +-
 drivers/infiniband/hw/hns/hns_roce_ah.c   |  5 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c|  3 +-
 drivers/md/bcache/request.c   |  2 +-
 .../test-drivers/vivid/vivid-touch-cap.c  |  2 +-
 drivers/mmc/core/core.c   |  4 +-
 drivers/mmc/host/dw_mmc.c |  2 +-
 drivers/mtd/nand/raw/nandsim.c|  4 +-
 drivers/mtd/tests/mtd_nandecctest.c   | 10 +--
 drivers/mtd/tests/stresstest.c| 17 +---
 drivers/mtd/ubi/debug.c   |  2 +-
 drivers/mtd/ubi/debug.h   |  6 +-
 drivers/net/ethernet/broadcom/cnic.c  |  3 +-
 .../chelsio/inline_crypto/chtls/chtls_io.c|  4 +-
 drivers/net/hamradio/baycom_epp.c |  2 +-
 drivers/net/hamradio/hdlcdrv.c|  2 +-
 drivers/net/hamradio/yam.c|  2 +-
 drivers/net/phy/at803x.c  |  2 +-
 .../broadcom/brcm80211/brcmfmac/p2p.c |  2 +-
 .../net/wireless/intel/iwlwifi/mvm/mac-ctxt.c |  2 +-
 drivers/scsi/fcoe/fcoe_ctlr.c |  4 +-
 drivers/scsi/qedi/qedi_main.c |  2 +-
 fs/ceph/inode.c   |  2 +-
 

[PATCH v5 0/7] treewide cleanup of random integer usage

2022-10-07 Thread Jason A. Donenfeld
Changes v4->v5:
- Coccinelle is now used for as much mechanical aspects as possible,
  with mechanical parts split off from non-mechanical parts. This should
  drastically reduce the amount of code that needs to be reviewed
  carefully. Each commit mentions now if it was done by hand or is
  mechanical.

Hi folks,

This is a five part treewide cleanup of random integer handling. The
rules for random integers are:

- If you want a secure or an insecure random u64, use get_random_u64().
- If you want a secure or an insecure random u32, use get_random_u32().
  * The old function prandom_u32() has been deprecated for a while now
and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16().
- If you want a secure or an insecure random u8, use get_random_u8().
- If you want secure or insecure random bytes, use get_random_bytes().
  * The old function prandom_bytes() has been deprecated for a while now
and has long been a wrapper around get_random_bytes().
- If you want a non-uniform random u32, u16, or u8 bounded by a certain
  open interval maximum, use prandom_u32_max().
  * I say "non-uniform", because it doesn't do any rejection sampling or
divisions. Hence, it stays within the prandom_* namespace.

These rules ought to be applied uniformly, so that we can clean up the
deprecated functions, and earn the benefits of using the modern
functions. In particular, in addition to the boring substitutions, this
patchset accomplishes a few nice effects:

- By using prandom_u32_max() with an upper-bound that the compiler can
  prove at compile-time is ≤65536 or ≤256, internally get_random_u16()
  or get_random_u8() is used, which wastes fewer batched random bytes,
  and hence has higher throughput.

- By using prandom_u32_max() instead of %, when the upper-bound is not a
  constant, division is still avoided, because prandom_u32_max() uses
  a faster multiplication-based trick instead.

- By using get_random_u16() or get_random_u8() in cases where the return
  value is intended to indeed be a u16 or a u8, we waste fewer batched
  random bytes, and hence have higher throughput.

So, based on those rules and benefits from following them, this patchset
breaks down into the following five steps:

1) Replace `prandom_u32() % max` and variants thereof with
   prandom_u32_max(max).

   * Part 1 is done with Coccinelle. Part 2 is done by hand.

2) Replace `(type)get_random_u32()` and variants thereof with
   get_random_u16() or get_random_u8(). I took the pains to actually
   look and see what every lvalue type was across the entire tree.

   * Part 1 is done with Coccinelle. Part 2 is done by hand.

3) Replace remaining deprecated uses of prandom_u32() and
   get_random_int() with get_random_u32(). 

   * A boring search and replace operation.

4) Replace remaining deprecated uses of prandom_bytes() with
   get_random_bytes().

   * A boring search and replace operation.

5) Remove the deprecated and now-unused prandom_u32() and
   prandom_bytes() inline wrapper functions.

   * Just deleting code and updating comments.

I was thinking of taking this through my random.git tree (on which this
series is currently based) and submitting it near the end of the merge
window, or waiting for the very end of the 6.1 cycle when there will be
the fewest new patches brewing. If somebody with some treewide-cleanup
experience might share some wisdom about what the best timing usually
winds up being, I'm all ears.

Please take a look! The number of lines touched is quite small, so this
should be reviewable, and as much as is possible has been pushed into
Coccinelle scripts.

Thanks,
Jason

Cc: Andreas Noever 
Cc: Andrew Morton 
Cc: Andy Shevchenko 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Christoph Böhmwalder 
Cc: Christoph Hellwig 
Cc: Christophe Leroy 
Cc: Daniel Borkmann 
Cc: Dave Airlie 
Cc: Dave Hansen 
Cc: David S. Miller 
Cc: Eric Dumazet 
Cc: Florian Westphal 
Cc: Greg Kroah-Hartman ,
Cc: H. Peter Anvin 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Herbert Xu 
Cc: Huacai Chen 
Cc: Hugh Dickins 
Cc: Jakub Kicinski 
Cc: James E.J. Bottomley 
Cc: Jan Kara 
Cc: Jason Gunthorpe 
Cc: Jens Axboe 
Cc: Johannes Berg 
Cc: Jonathan Corbet 
Cc: Jozsef Kadlecsik 
Cc: KP Singh 
Cc: Kees Cook 
Cc: Marco Elver 
Cc: Mauro Carvalho Chehab 
Cc: Michael Ellerman 
Cc: Pablo Neira Ayuso 
Cc: Paolo Abeni 
Cc: Peter Zijlstra 
Cc: Richard Weinberger 
Cc: Russell King 
Cc: Theodore Ts'o 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Thomas Graf 
Cc: Ulf Hansson 
Cc: Vignesh Raghavendra 
Cc: WANG Xuerui 
Cc: Will Deacon 
Cc: Yury Norov 
Cc: dri-devel@lists.freedesktop.org
Cc: kasan-...@googlegroups.com
Cc: kernel-janit...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-bl...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: 

Re: [PATCH v4 2/6] treewide: use prandom_u32_max() when possible

2022-10-07 Thread Kees Cook
[resending because I failed to CC]

On October 7, 2022 7:21:28 PM PDT, "Jason A. Donenfeld"  wrote:
>On Fri, Oct 07, 2022 at 03:47:44PM -0700, Kees Cook wrote:
>> On Fri, Oct 07, 2022 at 12:01:03PM -0600, Jason A. Donenfeld wrote:
>> > Rather than incurring a division or requesting too many random bytes for
>> > the given range, use the prandom_u32_max() function, which only takes
>> > the minimum required bytes from the RNG and avoids divisions.
>> 
>> I actually meant splitting the by-hand stuff by subsystem, but nearly
>> all of these can be done mechanically too, so it shouldn't be bad. Notes
>> below...
>
>Oh, cool, more coccinelle. You're basically giving me a class on these
>recipes. Much appreciated.

You're welcome! This was a fun exercise. :)

>
>> > [...]
>> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
>> > index 92bcc1768f0b..87203429f802 100644
>> > --- a/arch/arm64/kernel/process.c
>> > +++ b/arch/arm64/kernel/process.c
>> > @@ -595,7 +595,7 @@ unsigned long __get_wchan(struct task_struct *p)
>> >  unsigned long arch_align_stack(unsigned long sp)
>> >  {
>> >if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
>> > -  sp -= get_random_int() & ~PAGE_MASK;
>> > +  sp -= prandom_u32_max(PAGE_SIZE);
>> >return sp & ~0xf;
>> >  }
>> >  
>> 
>> @mask@
>> expression MASK;
>> @@
>> 
>> - (get_random_int() & ~(MASK))
>> + prandom_u32_max(MASK)
>
>Not quite! PAGE_MASK != PAGE_SIZE. In this case, things get a litle
>more complicated where you can do:
>
>get_random_int() & MASK == prandom_u32_max(MASK + 1)
>*only if all the top bits of MASK are set* That is, if MASK one less

Oh whoops! Yes, right, I totally misread SIZE as MASK.

>than a power of two. Or if MASK & (MASK + 1) == 0.
>
>(If those top bits aren't set, you can technically do
>prandom_u32_max(MASK >> n + 1) << n. That'd be a nice thing to work out.
>But yeesh, maybe a bit much for the time being and probably a bit beyond
>coccinelle.)
>
>This case here, though, is a bit more special, where we can just rely on
>an obvious given kernel identity. Namely, PAGE_MASK == ~(PAGE_SIZE - 1).
>So ~PAGE_MASK == PAGE_SIZE - 1.
>So get_random_int() & ~PAGE_MASK == prandom_u32_max(PAGE_SIZE - 1 + 1).
>So get_random_int() & ~PAGE_MASK == prandom_u32_max(PAGE_SIZE).
>
>And most importantly, this makes the code more readable, since everybody
>knows what bounding by PAGE_SIZE means, where as what on earth is
>happening with the &~PAGE_MASK thing. So it's a good change. I'll try to
>teach coccinelle about that special case.

Yeah, it should be possible to just check for the literal.

>
>
>
>> > diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
>> > index f32c38abd791..8c9826062652 100644
>> > --- a/arch/loongarch/kernel/vdso.c
>> > +++ b/arch/loongarch/kernel/vdso.c
>> > @@ -78,7 +78,7 @@ static unsigned long vdso_base(void)
>> >unsigned long base = STACK_TOP;
>> >  
>> >if (current->flags & PF_RANDOMIZE) {
>> > -  base += get_random_int() & (VDSO_RANDOMIZE_SIZE - 1);
>> > +  base += prandom_u32_max(VDSO_RANDOMIZE_SIZE);
>> >base = PAGE_ALIGN(base);
>> >}
>> >  
>> 
>> @minus_one@
>> expression FULL;
>> @@
>> 
>> - (get_random_int() & ((FULL) - 1)
>> + prandom_u32_max(FULL)
>
>Ahh, well, okay, this is the example I mentioned above. Only works if
>FULL is saturated. Any clever way to get coccinelle to prove that? Can
>it look at the value of constants?

I'm not sure if Cocci will do that without a lot of work. The literals trick I 
used below would need a lot of fanciness. :)

>
>> 
>> > diff --git a/arch/parisc/kernel/vdso.c b/arch/parisc/kernel/vdso.c
>> > index 63dc44c4c246..47e5960a2f96 100644
>> > --- a/arch/parisc/kernel/vdso.c
>> > +++ b/arch/parisc/kernel/vdso.c
>> > @@ -75,7 +75,7 @@ int arch_setup_additional_pages(struct linux_binprm 
>> > *bprm,
>> >  
>> >map_base = mm->mmap_base;
>> >if (current->flags & PF_RANDOMIZE)
>> > -  map_base -= (get_random_int() & 0x1f) * PAGE_SIZE;
>> > +  map_base -= prandom_u32_max(0x20) * PAGE_SIZE;
>> >  
>> >vdso_text_start = get_unmapped_area(NULL, map_base, vdso_text_len, 0, 
>> > 0);
>> >  
>> 
>> These are more fun, but Coccinelle can still do them with a little
>> Pythonic help:
>> 
>> // Find a potential literal
>> @literal_mask@
>> expression LITERAL;
>> identifier randfunc =~ "get_random_int|prandom_u32|get_random_u32";
>> position p;
>> @@
>> 
>> (randfunc()@p & (LITERAL))
>> 
>> // Add one to the literal.
>> @script:python add_one@
>> literal << literal_mask.LITERAL;
>> RESULT;
>> @@
>> 
>> if literal.startswith('0x'):
>> value = int(literal, 16) + 1
>> coccinelle.RESULT = cocci.make_expr("0x%x" % (value))
>> elif literal[0] in '123456789':
>> value = int(literal, 10) + 1
>> coccinelle.RESULT = cocci.make_expr("%d" % (value))
>> else:
>> print("I don't know how to handle: %s" % (literal))
>> 
>> // 

Re: [PATCH v4 2/6] treewide: use prandom_u32_max() when possible

2022-10-07 Thread Jason A. Donenfeld
On Fri, Oct 07, 2022 at 03:47:44PM -0700, Kees Cook wrote:
> > diff --git a/lib/test_vmalloc.c b/lib/test_vmalloc.c
> > index 4f2f2d1bac56..56ffaa8dd3f6 100644
> > --- a/lib/test_vmalloc.c
> > +++ b/lib/test_vmalloc.c
> > @@ -151,9 +151,7 @@ static int random_size_alloc_test(void)
> > int i;
> >  
> > for (i = 0; i < test_loop_count; i++) {
> > -   n = prandom_u32();
> > -   n = (n % 100) + 1;
> > -
> > +   n = prandom_u32_max(n % 100) + 1;
> > p = vmalloc(n * PAGE_SIZE);
> >  
> > if (!p)
> 
> This looks wrong. Cocci says:
> 
> -   n = prandom_u32();
> -   n = (n % 100) + 1;
> +   n = prandom_u32_max(100) + 1;

I agree that's wrong, but what rule did you use to make Cocci generate
that?

Jason


Re: [PATCH v4 2/6] treewide: use prandom_u32_max() when possible

2022-10-07 Thread Jason A. Donenfeld
On Fri, Oct 07, 2022 at 03:47:44PM -0700, Kees Cook wrote:
> On Fri, Oct 07, 2022 at 12:01:03PM -0600, Jason A. Donenfeld wrote:
> > Rather than incurring a division or requesting too many random bytes for
> > the given range, use the prandom_u32_max() function, which only takes
> > the minimum required bytes from the RNG and avoids divisions.
> 
> I actually meant splitting the by-hand stuff by subsystem, but nearly
> all of these can be done mechanically too, so it shouldn't be bad. Notes
> below...

Oh, cool, more coccinelle. You're basically giving me a class on these
recipes. Much appreciated.

> > [...]
> > diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> > index 92bcc1768f0b..87203429f802 100644
> > --- a/arch/arm64/kernel/process.c
> > +++ b/arch/arm64/kernel/process.c
> > @@ -595,7 +595,7 @@ unsigned long __get_wchan(struct task_struct *p)
> >  unsigned long arch_align_stack(unsigned long sp)
> >  {
> > if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
> > -   sp -= get_random_int() & ~PAGE_MASK;
> > +   sp -= prandom_u32_max(PAGE_SIZE);
> > return sp & ~0xf;
> >  }
> >  
> 
> @mask@
> expression MASK;
> @@
> 
> - (get_random_int() & ~(MASK))
> + prandom_u32_max(MASK)

Not quite! PAGE_MASK != PAGE_SIZE. In this case, things get a litle
more complicated where you can do:

get_random_int() & MASK == prandom_u32_max(MASK + 1)
*only if all the top bits of MASK are set* That is, if MASK one less
than a power of two. Or if MASK & (MASK + 1) == 0.

(If those top bits aren't set, you can technically do
prandom_u32_max(MASK >> n + 1) << n. That'd be a nice thing to work out.
But yeesh, maybe a bit much for the time being and probably a bit beyond
coccinelle.)

This case here, though, is a bit more special, where we can just rely on
an obvious given kernel identity. Namely, PAGE_MASK == ~(PAGE_SIZE - 1).
So ~PAGE_MASK == PAGE_SIZE - 1.
So get_random_int() & ~PAGE_MASK == prandom_u32_max(PAGE_SIZE - 1 + 1).
So get_random_int() & ~PAGE_MASK == prandom_u32_max(PAGE_SIZE).

And most importantly, this makes the code more readable, since everybody
knows what bounding by PAGE_SIZE means, where as what on earth is
happening with the &~PAGE_MASK thing. So it's a good change. I'll try to
teach coccinelle about that special case.



> > diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
> > index f32c38abd791..8c9826062652 100644
> > --- a/arch/loongarch/kernel/vdso.c
> > +++ b/arch/loongarch/kernel/vdso.c
> > @@ -78,7 +78,7 @@ static unsigned long vdso_base(void)
> > unsigned long base = STACK_TOP;
> >  
> > if (current->flags & PF_RANDOMIZE) {
> > -   base += get_random_int() & (VDSO_RANDOMIZE_SIZE - 1);
> > +   base += prandom_u32_max(VDSO_RANDOMIZE_SIZE);
> > base = PAGE_ALIGN(base);
> > }
> >  
> 
> @minus_one@
> expression FULL;
> @@
> 
> - (get_random_int() & ((FULL) - 1)
> + prandom_u32_max(FULL)

Ahh, well, okay, this is the example I mentioned above. Only works if
FULL is saturated. Any clever way to get coccinelle to prove that? Can
it look at the value of constants?

> 
> > diff --git a/arch/parisc/kernel/vdso.c b/arch/parisc/kernel/vdso.c
> > index 63dc44c4c246..47e5960a2f96 100644
> > --- a/arch/parisc/kernel/vdso.c
> > +++ b/arch/parisc/kernel/vdso.c
> > @@ -75,7 +75,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
> >  
> > map_base = mm->mmap_base;
> > if (current->flags & PF_RANDOMIZE)
> > -   map_base -= (get_random_int() & 0x1f) * PAGE_SIZE;
> > +   map_base -= prandom_u32_max(0x20) * PAGE_SIZE;
> >  
> > vdso_text_start = get_unmapped_area(NULL, map_base, vdso_text_len, 0, 
> > 0);
> >  
> 
> These are more fun, but Coccinelle can still do them with a little
> Pythonic help:
> 
> // Find a potential literal
> @literal_mask@
> expression LITERAL;
> identifier randfunc =~ "get_random_int|prandom_u32|get_random_u32";
> position p;
> @@
> 
> (randfunc()@p & (LITERAL))
> 
> // Add one to the literal.
> @script:python add_one@
> literal << literal_mask.LITERAL;
> RESULT;
> @@
> 
> if literal.startswith('0x'):
> value = int(literal, 16) + 1
> coccinelle.RESULT = cocci.make_expr("0x%x" % (value))
> elif literal[0] in '123456789':
> value = int(literal, 10) + 1
> coccinelle.RESULT = cocci.make_expr("%d" % (value))
> else:
> print("I don't know how to handle: %s" % (literal))
> 
> // Replace the literal mask with the calculated result.
> @plus_one@
> expression literal_mask.LITERAL;
> position literal_mask.p;
> expression add_one.RESULT;
> identifier FUNC;
> @@
> 
> -   (FUNC()@p & (LITERAL))
> +   prandom_u32_max(RESULT)

Oh that's pretty cool. I can do the saturation check in python, since
`value` holds the parsed result. Neat.

> > diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
> > index 998dd2ac8008..f4944c4dee60 100644
> > --- a/fs/ext2/ialloc.c
> > +++ 

Re: [PATCH v4 4/6] treewide: use get_random_u32() when possible

2022-10-07 Thread Jason A. Donenfeld
On Fri, Oct 07, 2022 at 10:34:47PM +0200, Rolf Eike Beer wrote:
> > diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
> > index 7c37e09c92da..18c4f0e3e906 100644
> > --- a/arch/parisc/kernel/process.c
> > +++ b/arch/parisc/kernel/process.c
> > @@ -288,7 +288,7 @@ __get_wchan(struct task_struct *p)
> > 
> >  static inline unsigned long brk_rnd(void)
> >  {
> > -   return (get_random_int() & BRK_RND_MASK) << PAGE_SHIFT;
> > +   return (get_random_u32() & BRK_RND_MASK) << PAGE_SHIFT;
> >  }
> 
> Can't this be
> 
>   prandom_u32_max(BRK_RND_MASK + 1) << PAGE_SHIFT
> 
> ? More similar code with other masks follows below.

I guess it can, because BRK_RND_MASK happens to have all its lower bits
set. But as a "_MASK" maybe this isn't a given, and I don't want to
change intended semantics in this patchset. It's also not more
efficient, because BRK_RND_MASK is actually an expression:

#define BRK_RND_MASK(is_32bit_task() ? 0x07ffUL : 0x3UL)

So at compile-time, the compiler can't prove that it's <= U16_MAX, since
it isn't always the case, so it'll use get_random_u32() anyway.

[Side note: maybe that compile-time check should become a runtime check,
 but I'll need to do some benchmarking before changing that and
 introducing two added branches to every non-constant invocation, so for
 now it's a compile-time check. Fortunately the vast majority of uses
 are done on inputs the compiler can prove something about.]

> 
> > diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > b/drivers/gpu/drm/i915/i915_gem_gtt.c index 329ff75b80b9..7bd1861ddbdf
> > 100644
> > --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> > +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> > @@ -137,12 +137,12 @@ static u64 random_offset(u64 start, u64 end, u64 len,
> > u64 align) range = round_down(end - len, align) - round_up(start, align);
> > if (range) {
> > if (sizeof(unsigned long) == sizeof(u64)) {
> > -   addr = get_random_long();
> > +   addr = get_random_u64();
> > } else {
> > -   addr = get_random_int();
> > +   addr = get_random_u32();
> > if (range > U32_MAX) {
> > addr <<= 32;
> > -   addr |= get_random_int();
> > +   addr |= get_random_u32();
> > }
> > }
> > div64_u64_rem(addr, range, );
> 
> How about 
> 
>   if (sizeof(unsigned long) == sizeof(u64) || range > 
> U32_MAX)
>   addr = get_random_u64();
>   else
>   addr = get_random_u32();
> 

Yes, maybe, probably, indeed... But I don't want to go wild and start
fixing all the weird algorithms everywhere. My goal is to only make
changes that are "obviously right". But maybe after this lands this is
something that you or I can submit to the i915 people as an
optimization.

> > diff --git a/drivers/infiniband/hw/cxgb4/cm.c
> > b/drivers/infiniband/hw/cxgb4/cm.c index 14392c942f49..499a425a3379 100644
> > --- a/drivers/infiniband/hw/cxgb4/cm.c
> > +++ b/drivers/infiniband/hw/cxgb4/cm.c
> > @@ -734,7 +734,7 @@ static int send_connect(struct c4iw_ep *ep)
> >>com.remote_addr;
> > int ret;
> > enum chip_type adapter_type = ep->com.dev->rdev.lldi.adapter_type;
> > -   u32 isn = (prandom_u32() & ~7UL) - 1;
> > +   u32 isn = (get_random_u32() & ~7UL) - 1;
> > struct net_device *netdev;
> > u64 params;
> > 
> > @@ -2469,7 +2469,7 @@ static int accept_cr(struct c4iw_ep *ep, struct
> > sk_buff *skb, }
> > 
> > if (!is_t4(adapter_type)) {
> > -   u32 isn = (prandom_u32() & ~7UL) - 1;
> > +   u32 isn = (get_random_u32() & ~7UL) - 1;
> 
> u32 isn = get_random_u32() | 0x7;

Again, maybe so, but same rationale as above.

> >  static void ns_do_bit_flips(struct nandsim *ns, int num)
> >  {
> > -   if (bitflips && prandom_u32() < (1 << 22)) {
> > +   if (bitflips && get_random_u32() < (1 << 22)) {
> 
> Doing "get_random_u16() < (1 << 6)" should have the same probability with 
> only 
> 2 bytes of random, no?

That's very clever. (1<<22)/(1<<32) == (1<<6)/(1<<16). But also, same
rationale as above for not doing that.

Anyway, I realize this is probably disappointing to read. But also, we
can come back to those optimization cases later pretty easily.

Jason


Re: [PATCH v4 2/6] treewide: use prandom_u32_max() when possible

2022-10-07 Thread Jason A. Donenfeld
On Fri, Oct 07, 2022 at 02:17:22PM -0700, Darrick J. Wong wrote:
> On Fri, Oct 07, 2022 at 12:01:03PM -0600, Jason A. Donenfeld wrote:
> > Rather than incurring a division or requesting too many random bytes for
> > the given range, use the prandom_u32_max() function, which only takes
> > the minimum required bytes from the RNG and avoids divisions.
> > 
> > Reviewed-by: Kees Cook 
> > Reviewed-by: KP Singh 
> > Reviewed-by: Christoph Böhmwalder  # for 
> > drbd
> > Reviewed-by: Jan Kara  # for ext2, ext4, and sbitmap
> > Signed-off-by: Jason A. Donenfeld 
> > ---
> 
> 
> 
> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> > index e2bdf089c0a3..6261599bb389 100644
> > --- a/fs/xfs/libxfs/xfs_alloc.c
> > +++ b/fs/xfs/libxfs/xfs_alloc.c
> > @@ -1520,7 +1520,7 @@ xfs_alloc_ag_vextent_lastblock(
> >  
> >  #ifdef DEBUG
> > /* Randomly don't execute the first algorithm. */
> > -   if (prandom_u32() & 1)
> > +   if (prandom_u32_max(2))
> 
> I wonder if these usecases (picking 0 or 1 randomly) ought to have a
> trivial wrapper to make it more obvious that we want boolean semantics:
> 
> static inline bool prandom_bool(void)
> {
>   return prandom_u32_max(2);
> }
> 
>   if (prandom_bool())
>   use_crazy_algorithm(...);
> 

Yea, I've had a lot of similar ideas there. Part of doing this (initial)
patchset is to get an intuitive sense of what's actually used and how
often. On my list for investigation are a get_random_u32_max() to return
uniform numbers by rejection sampling (prandom_u32_max() doesn't do
that uniformly) and adding a function for booleans or bits < 8. Possible
ideas for the latter include:

   bool get_random_bool(void);
   bool get_random_bool(unsigned int probability);
   bool get_random_bits(u8 bits_less_than_eight);

With the core of all of those involving the same batching as the current
get_random_u{8,16,32,64}() functions, but also buffering the latest byte
and managing how many bits are left in it that haven't been shifted out
yet.

So API-wise, there are a few ways to go, so hopefully this series will
start to give a good picture of what's needed.

One thing I've noticed is that most of the prandom_u32_max(2)
invocations are in debug or test code, so that doesn't need to be
optimized. But kfence does that too in its hot path, so a
get_random_bool() function there would in theory lead to an 8x speed-up.
But I guess I just have to try some things and see.

Anyway, that is a long way to say, I share you curiosity on the matter
and I'm looking into it.

Jason


Re: [PATCH v2 1/2] drivers: gpu: drm: add driver for samsung s6e3fc2x01 cmd mode panel

2022-10-07 Thread Alexey Minnekhanov

Hi,

On 07.10.2022 14:14, Nia Espera wrote:


+
+#define dsi_dcs_write_seq(dsi, seq...) do {\
+   static const u8 d[] = { seq };  \
+   int ret;\
+   ret = mipi_dsi_dcs_write_buffer(dsi, d, ARRAY_SIZE(d)); \
+   if (ret < 0) \
+   return ret; \
+   } while (0)
+


There is now a standard macro for this - mipi_dsi_dcs_write_seq() [1] , 
so you don't need to reinvent it.



+static void samsung_s6e3fc2x01_reset(struct samsung_s6e3fc2x01 *ctx)
+{
+   gpiod_set_value_cansleep(ctx->reset_gpio, 0);
+   usleep_range(5000, 6000);
+   gpiod_set_value_cansleep(ctx->reset_gpio, 1);
+   usleep_range(2000, 3000);
+   gpiod_set_value_cansleep(ctx->reset_gpio, 0);
+   usleep_range(1, 11000);
+}


There is a high chance that first gpiod_set_value() is not needed, only 
the last 2.


[1] https://elixir.bootlin.com/linux/latest/C/ident/mipi_dsi_dcs_write_seq

--
Regards
Alexey Minnekhanov


Re: [PATCH v4 2/6] treewide: use prandom_u32_max() when possible

2022-10-07 Thread Kees Cook
On Fri, Oct 07, 2022 at 12:01:03PM -0600, Jason A. Donenfeld wrote:
> Rather than incurring a division or requesting too many random bytes for
> the given range, use the prandom_u32_max() function, which only takes
> the minimum required bytes from the RNG and avoids divisions.

I actually meant splitting the by-hand stuff by subsystem, but nearly
all of these can be done mechanically too, so it shouldn't be bad. Notes
below...

> 
> [...]
> diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
> index 92bcc1768f0b..87203429f802 100644
> --- a/arch/arm64/kernel/process.c
> +++ b/arch/arm64/kernel/process.c
> @@ -595,7 +595,7 @@ unsigned long __get_wchan(struct task_struct *p)
>  unsigned long arch_align_stack(unsigned long sp)
>  {
>   if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
> - sp -= get_random_int() & ~PAGE_MASK;
> + sp -= prandom_u32_max(PAGE_SIZE);
>   return sp & ~0xf;
>  }
>  

@mask@
expression MASK;
@@

- (get_random_int() & ~(MASK))
+ prandom_u32_max(MASK)

> diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
> index f32c38abd791..8c9826062652 100644
> --- a/arch/loongarch/kernel/vdso.c
> +++ b/arch/loongarch/kernel/vdso.c
> @@ -78,7 +78,7 @@ static unsigned long vdso_base(void)
>   unsigned long base = STACK_TOP;
>  
>   if (current->flags & PF_RANDOMIZE) {
> - base += get_random_int() & (VDSO_RANDOMIZE_SIZE - 1);
> + base += prandom_u32_max(VDSO_RANDOMIZE_SIZE);
>   base = PAGE_ALIGN(base);
>   }
>  

@minus_one@
expression FULL;
@@

- (get_random_int() & ((FULL) - 1)
+ prandom_u32_max(FULL)

> diff --git a/arch/parisc/kernel/vdso.c b/arch/parisc/kernel/vdso.c
> index 63dc44c4c246..47e5960a2f96 100644
> --- a/arch/parisc/kernel/vdso.c
> +++ b/arch/parisc/kernel/vdso.c
> @@ -75,7 +75,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
>  
>   map_base = mm->mmap_base;
>   if (current->flags & PF_RANDOMIZE)
> - map_base -= (get_random_int() & 0x1f) * PAGE_SIZE;
> + map_base -= prandom_u32_max(0x20) * PAGE_SIZE;
>  
>   vdso_text_start = get_unmapped_area(NULL, map_base, vdso_text_len, 0, 
> 0);
>  

These are more fun, but Coccinelle can still do them with a little
Pythonic help:

// Find a potential literal
@literal_mask@
expression LITERAL;
identifier randfunc =~ "get_random_int|prandom_u32|get_random_u32";
position p;
@@

(randfunc()@p & (LITERAL))

// Add one to the literal.
@script:python add_one@
literal << literal_mask.LITERAL;
RESULT;
@@

if literal.startswith('0x'):
value = int(literal, 16) + 1
coccinelle.RESULT = cocci.make_expr("0x%x" % (value))
elif literal[0] in '123456789':
value = int(literal, 10) + 1
coccinelle.RESULT = cocci.make_expr("%d" % (value))
else:
print("I don't know how to handle: %s" % (literal))

// Replace the literal mask with the calculated result.
@plus_one@
expression literal_mask.LITERAL;
position literal_mask.p;
expression add_one.RESULT;
identifier FUNC;
@@

-   (FUNC()@p & (LITERAL))
+   prandom_u32_max(RESULT)

> diff --git a/drivers/mtd/tests/stresstest.c b/drivers/mtd/tests/stresstest.c
> index cb29c8c1b370..d2faaca7f19d 100644
> --- a/drivers/mtd/tests/stresstest.c
> +++ b/drivers/mtd/tests/stresstest.c
> @@ -45,9 +45,8 @@ static int rand_eb(void)
>   unsigned int eb;
>  
>  again:
> - eb = prandom_u32();
>   /* Read or write up 2 eraseblocks at a time - hence 'ebcnt - 1' */
> - eb %= (ebcnt - 1);
> + eb = prandom_u32_max(ebcnt - 1);
>   if (bbt[eb])
>   goto again;
>   return eb;

This can also be done mechanically:

@multi_line@
identifier randfunc =~ "get_random_int|prandom_u32|get_random_u32";
identifier RAND;
expression E;
@@

-   RAND = randfunc();
... when != RAND
-   RAND %= (E);
+   RAND = prandom_u32_max(E);

@collapse_ret@
type TYPE;
identifier VAR;
expression E;
@@

 {
-   TYPE VAR;
-   VAR = (E);
-   return VAR;
+   return E;
 }

@drop_var@
type TYPE;
identifier VAR;
@@

 {
-   TYPE VAR;
... when != VAR
 }

> diff --git a/fs/ext2/ialloc.c b/fs/ext2/ialloc.c
> index 998dd2ac8008..f4944c4dee60 100644
> --- a/fs/ext2/ialloc.c
> +++ b/fs/ext2/ialloc.c
> @@ -277,8 +277,7 @@ static int find_group_orlov(struct super_block *sb, 
> struct inode *parent)
>   int best_ndir = inodes_per_group;
>   int best_group = -1;
>  
> - group = prandom_u32();
> - parent_group = (unsigned)group % ngroups;
> + parent_group = prandom_u32_max(ngroups);
>   for (i = 0; i < ngroups; i++) {
>   group = (parent_group + i) % ngroups;
>   desc = ext2_get_group_desc (sb, group, NULL);

Okay, that one is too much for me -- checking that group is never used
after the assignment removal is likely possible, but beyond my cocci

[Bug 205089] amdgpu : drm:amdgpu_cs_ioctl : Failed to initialize parser -125

2022-10-07 Thread bugzilla-daemon
https://bugzilla.kernel.org/show_bug.cgi?id=205089

laurentc92...@gmail.com changed:

   What|Removed |Added

 CC||laurentc92...@gmail.com

--- Comment #50 from laurentc92...@gmail.com ---
I had the same problem with my Sapphire RX 6700 while playing War Thunder or
Grid games. I got black screen and then return on the desktop but nothing was
possible to do but launching a shell and see the same error message :

[drm:amdgpu_cs_ioctl [amdgpu]] *ERROR* Failed to initialize parser -125!

I found the solution by forcing the VSYNC everywhere

> I enabled VSYNC in all games with the native refresh rate of my monitor 
> I also use mangohud to limit the FPS to 120 with VSYNC "ON" for Vulkan and
> OpenGL in the "performance" menu

My configuration (from neofetch) :

OS: Debian GNU/Linux bookworm/sid x86_64 
Host: MS-7B79 1.0 
Kernel: 5.18.0-0.deb11.4-amd64 (from bullseye-backport) 
MESA : 22.2.0-1  
Resolution: 2560x1440 / 165hz 
DE: MATE 1.26.0 
WM: Metacity (Marco) 
Theme: Menta [GTK2/3] 
Icons: menta [GTK2/3]  
CPU: AMD Ryzen 5 2600X (12) @ 3.600GHz 
GPU: AMD ATI Radeon RX 6700 
Memory: 1801MiB / 15925MiB

-- 
You may reply to this email to add a comment.

You are receiving this mail because:
You are watching the assignee of the bug.

Re: [PATCH v4 4/6] treewide: use get_random_u32() when possible

2022-10-07 Thread Darrick J. Wong
On Fri, Oct 07, 2022 at 12:01:05PM -0600, Jason A. Donenfeld wrote:
> The prandom_u32() function has been a deprecated inline wrapper around
> get_random_u32() for several releases now, and compiles down to the
> exact same code. Replace the deprecated wrapper with a direct call to
> the real function. The same also applies to get_random_int(), which is
> just a wrapper around get_random_u32().
> 
> Reviewed-by: Kees Cook 
> Acked-by: Toke Høiland-Jørgensen  # for sch_cake
> Acked-by: Chuck Lever  # for nfsd
> Reviewed-by: Jan Kara  # for ext4
> Acked-by: Mika Westerberg  # for thunderbolt
> Signed-off-by: Jason A. Donenfeld 

For the XFS parts,
Acked-by: Darrick J. Wong 

--D

> ---
>  Documentation/networking/filter.rst|  2 +-
>  arch/parisc/kernel/process.c   |  2 +-
>  arch/parisc/kernel/sys_parisc.c|  4 ++--
>  arch/s390/mm/mmap.c|  2 +-
>  arch/x86/kernel/cpu/amd.c  |  2 +-
>  drivers/gpu/drm/i915/i915_gem_gtt.c|  6 +++---
>  drivers/gpu/drm/i915/selftests/i915_selftest.c |  2 +-
>  drivers/gpu/drm/selftests/test-drm_buddy.c |  2 +-
>  drivers/gpu/drm/selftests/test-drm_mm.c|  2 +-
>  drivers/infiniband/hw/cxgb4/cm.c   |  4 ++--
>  drivers/infiniband/hw/hfi1/tid_rdma.c  |  2 +-
>  drivers/infiniband/hw/mlx4/mad.c   |  2 +-
>  drivers/infiniband/ulp/ipoib/ipoib_cm.c|  2 +-
>  drivers/md/raid5-cache.c   |  2 +-
>  .../media/test-drivers/vivid/vivid-touch-cap.c |  4 ++--
>  drivers/misc/habanalabs/gaudi2/gaudi2.c|  2 +-
>  drivers/mtd/nand/raw/nandsim.c |  2 +-
>  drivers/net/bonding/bond_main.c|  2 +-
>  drivers/net/ethernet/broadcom/cnic.c   |  2 +-
>  .../chelsio/inline_crypto/chtls/chtls_cm.c |  2 +-
>  drivers/net/ethernet/rocker/rocker_main.c  |  6 +++---
>  .../wireless/broadcom/brcm80211/brcmfmac/pno.c |  2 +-
>  .../net/wireless/marvell/mwifiex/cfg80211.c|  4 ++--
>  .../net/wireless/microchip/wilc1000/cfg80211.c |  2 +-
>  .../net/wireless/quantenna/qtnfmac/cfg80211.c  |  2 +-
>  drivers/net/wireless/ti/wlcore/main.c  |  2 +-
>  drivers/nvme/common/auth.c |  2 +-
>  drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |  4 ++--
>  drivers/target/iscsi/cxgbit/cxgbit_cm.c|  2 +-
>  drivers/thunderbolt/xdomain.c  |  2 +-
>  drivers/video/fbdev/uvesafb.c  |  2 +-
>  fs/exfat/inode.c   |  2 +-
>  fs/ext4/ialloc.c   |  2 +-
>  fs/ext4/ioctl.c|  4 ++--
>  fs/ext4/mmp.c  |  2 +-
>  fs/f2fs/namei.c|  2 +-
>  fs/fat/inode.c |  2 +-
>  fs/nfsd/nfs4state.c|  4 ++--
>  fs/ntfs3/fslog.c   |  6 +++---
>  fs/ubifs/journal.c |  2 +-
>  fs/xfs/libxfs/xfs_ialloc.c |  2 +-
>  fs/xfs/xfs_icache.c|  2 +-
>  fs/xfs/xfs_log.c   |  2 +-
>  include/net/netfilter/nf_queue.h   |  2 +-
>  include/net/red.h  |  2 +-
>  include/net/sock.h |  2 +-
>  kernel/bpf/bloom_filter.c  |  2 +-
>  kernel/bpf/core.c  |  2 +-
>  kernel/bpf/hashtab.c   |  2 +-
>  kernel/bpf/verifier.c  |  2 +-
>  kernel/kcsan/selftest.c|  2 +-
>  lib/random32.c |  2 +-
>  lib/reed_solomon/test_rslib.c  |  6 +++---
>  lib/test_fprobe.c  |  2 +-
>  lib/test_kprobes.c |  2 +-
>  lib/test_min_heap.c|  6 +++---
>  lib/test_rhashtable.c  |  6 +++---
>  mm/shmem.c |  2 +-
>  mm/slab.c  |  2 +-
>  net/802/garp.c |  2 +-
>  net/802/mrp.c  |  2 +-
>  net/core/pktgen.c  |  4 ++--
>  net/ipv4/route.c   |  2 +-
>  net/ipv4/tcp_cdg.c |  2 +-
>  net/ipv4/udp.c |  2 +-
>  net/ipv6/ip6_flowlabel.c   |  2 +-
>  net/ipv6/output_core.c |  2 +-
>  net/netfilter/ipvs/ip_vs_conn.c|  2 +-
>  net/netfilter/xt_statistic.c   |  2 +-
>  net/openvswitch/actions.c  |  2 +-
>  net/rds/bind.c |  2 +-
>  net/sched/sch_cake.c   |  2 +-
>  net/sched/sch_netem.c  | 18 +-
>  

Re: [PATCH v4 2/6] treewide: use prandom_u32_max() when possible

2022-10-07 Thread Darrick J. Wong
On Fri, Oct 07, 2022 at 12:01:03PM -0600, Jason A. Donenfeld wrote:
> Rather than incurring a division or requesting too many random bytes for
> the given range, use the prandom_u32_max() function, which only takes
> the minimum required bytes from the RNG and avoids divisions.
> 
> Reviewed-by: Kees Cook 
> Reviewed-by: KP Singh 
> Reviewed-by: Christoph Böhmwalder  # for 
> drbd
> Reviewed-by: Jan Kara  # for ext2, ext4, and sbitmap
> Signed-off-by: Jason A. Donenfeld 
> ---



> diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
> index e2bdf089c0a3..6261599bb389 100644
> --- a/fs/xfs/libxfs/xfs_alloc.c
> +++ b/fs/xfs/libxfs/xfs_alloc.c
> @@ -1520,7 +1520,7 @@ xfs_alloc_ag_vextent_lastblock(
>  
>  #ifdef DEBUG
>   /* Randomly don't execute the first algorithm. */
> - if (prandom_u32() & 1)
> + if (prandom_u32_max(2))

I wonder if these usecases (picking 0 or 1 randomly) ought to have a
trivial wrapper to make it more obvious that we want boolean semantics:

static inline bool prandom_bool(void)
{
return prandom_u32_max(2);
}

if (prandom_bool())
use_crazy_algorithm(...);

But this translation change looks correct to me, so for the XFS parts:
Acked-by: Darrick J. Wong 

--D


>   return 0;
>  #endif
>  
> diff --git a/fs/xfs/libxfs/xfs_ialloc.c b/fs/xfs/libxfs/xfs_ialloc.c
> index 6cdfd64bc56b..7838b31126e2 100644
> --- a/fs/xfs/libxfs/xfs_ialloc.c
> +++ b/fs/xfs/libxfs/xfs_ialloc.c
> @@ -636,7 +636,7 @@ xfs_ialloc_ag_alloc(
>   /* randomly do sparse inode allocations */
>   if (xfs_has_sparseinodes(tp->t_mountp) &&
>   igeo->ialloc_min_blks < igeo->ialloc_blks)
> - do_sparse = prandom_u32() & 1;
> + do_sparse = prandom_u32_max(2);
>  #endif
>  
>   /*
> diff --git a/include/linux/nodemask.h b/include/linux/nodemask.h
> index 4b71a96190a8..66ee9b4b7925 100644
> --- a/include/linux/nodemask.h
> +++ b/include/linux/nodemask.h
> @@ -509,7 +509,7 @@ static inline int node_random(const nodemask_t *maskp)
>   w = nodes_weight(*maskp);
>   if (w)
>   bit = bitmap_ord_to_pos(maskp->bits,
> - get_random_int() % w, MAX_NUMNODES);
> + prandom_u32_max(w), MAX_NUMNODES);
>   return bit;
>  #else
>   return 0;
> diff --git a/lib/cmdline_kunit.c b/lib/cmdline_kunit.c
> index e6a31c927b06..a72a2c16066e 100644
> --- a/lib/cmdline_kunit.c
> +++ b/lib/cmdline_kunit.c
> @@ -76,7 +76,7 @@ static void cmdline_test_lead_int(struct kunit *test)
>   int rc = cmdline_test_values[i];
>   int offset;
>  
> - sprintf(in, "%u%s", prandom_u32_max(256), str);
> + sprintf(in, "%u%s", get_random_int() % 256, str);
>   /* Only first '-' after the number will advance the pointer */
>   offset = strlen(in) - strlen(str) + !!(rc == 2);
>   cmdline_do_one_test(test, in, rc, offset);
> @@ -94,7 +94,7 @@ static void cmdline_test_tail_int(struct kunit *test)
>   int rc = strcmp(str, "") ? (strcmp(str, "-") ? 0 : 1) : 1;
>   int offset;
>  
> - sprintf(in, "%s%u", str, prandom_u32_max(256));
> + sprintf(in, "%s%u", str, get_random_int() % 256);
>   /*
>* Only first and leading '-' not followed by integer
>* will advance the pointer.
> diff --git a/lib/kobject.c b/lib/kobject.c
> index 5f0e71ab292c..a0b2dbfcfa23 100644
> --- a/lib/kobject.c
> +++ b/lib/kobject.c
> @@ -694,7 +694,7 @@ static void kobject_release(struct kref *kref)
>  {
>   struct kobject *kobj = container_of(kref, struct kobject, kref);
>  #ifdef CONFIG_DEBUG_KOBJECT_RELEASE
> - unsigned long delay = HZ + HZ * (get_random_int() & 0x3);
> + unsigned long delay = HZ + HZ * prandom_u32_max(4);
>   pr_info("kobject: '%s' (%p): %s, parent %p (delayed %ld)\n",
>kobject_name(kobj), kobj, __func__, kobj->parent, delay);
>   INIT_DELAYED_WORK(>release, kobject_delayed_cleanup);
> diff --git a/lib/reed_solomon/test_rslib.c b/lib/reed_solomon/test_rslib.c
> index 6faf9c9a6215..4d241bdc88aa 100644
> --- a/lib/reed_solomon/test_rslib.c
> +++ b/lib/reed_solomon/test_rslib.c
> @@ -199,7 +199,7 @@ static int get_rcw_we(struct rs_control *rs, struct 
> wspace *ws,
>  
>   derrlocs[i] = errloc;
>  
> - if (ewsc && (prandom_u32() & 1)) {
> + if (ewsc && prandom_u32_max(2)) {
>   /* Erasure with the symbol intact */
>   errlocs[errloc] = 2;
>   } else {
> diff --git a/lib/sbitmap.c b/lib/sbitmap.c
> index c4f04edf3ee9..ef0661504561 100644
> --- a/lib/sbitmap.c
> +++ b/lib/sbitmap.c
> @@ -21,7 +21,7 @@ static int init_alloc_hint(struct sbitmap *sb, gfp_t flags)
>   int i;
>  
>   for_each_possible_cpu(i)
> - *per_cpu_ptr(sb->alloc_hint, i) = prandom_u32() % depth;
> +

Re: [PATCH v4 1/6] treewide: use prandom_u32_max() when possible, mechanically

2022-10-07 Thread Darrick J. Wong
On Fri, Oct 07, 2022 at 12:01:02PM -0600, Jason A. Donenfeld wrote:
> Rather than incurring a division or requesting too many random bytes for
> the given range, use the prandom_u32_max() function, which only takes
> the minimum required bytes from the RNG and avoids divisions. This was
> done mechanically with these coccinelle scripts:
> 
> @no_modulo@
> expression E;
> @@
> -   (prandom_u32() % (E))
> +   prandom_u32_max(E)
> @no_modulo@
> expression E;
> @@
> -   (get_random_u32() % (E))
> +   prandom_u32_max(E)
> @no_modulo@
> expression E;
> @@
> -   (get_random_int() % (E))
> +   prandom_u32_max(E)
> 
> Reviewed-by: Kees Cook 
> Reviewed-by: KP Singh 
> Reviewed-by: Jan Kara  # for ext4 and sbitmap
> Acked-by: Ulf Hansson  # for mmc
> Signed-off-by: Jason A. Donenfeld 

For the XFS part,
Acked-by: Darrick J. Wong 

--D

> ---
>  arch/arm/kernel/process.c |  2 +-
>  arch/s390/kernel/vdso.c   |  2 +-
>  arch/um/kernel/process.c  |  2 +-
>  arch/x86/entry/vdso/vma.c |  2 +-
>  arch/x86/kernel/module.c  |  2 +-
>  arch/x86/kernel/process.c |  2 +-
>  arch/x86/mm/pat/cpa-test.c|  4 +-
>  crypto/testmgr.c  | 86 +--
>  .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  2 +-
>  drivers/infiniband/core/cma.c |  2 +-
>  drivers/infiniband/hw/cxgb4/id_table.c|  4 +-
>  drivers/infiniband/hw/hns/hns_roce_ah.c   |  5 +-
>  drivers/infiniband/ulp/rtrs/rtrs-clt.c|  3 +-
>  .../test-drivers/vivid/vivid-touch-cap.c  |  2 +-
>  drivers/mmc/core/core.c   |  4 +-
>  drivers/mmc/host/dw_mmc.c |  2 +-
>  drivers/mtd/nand/raw/nandsim.c|  4 +-
>  drivers/mtd/tests/mtd_nandecctest.c   | 10 +--
>  drivers/mtd/ubi/debug.c   |  2 +-
>  .../chelsio/inline_crypto/chtls/chtls_io.c|  4 +-
>  drivers/net/hamradio/baycom_epp.c |  2 +-
>  drivers/net/hamradio/hdlcdrv.c|  2 +-
>  drivers/net/hamradio/yam.c|  2 +-
>  drivers/net/phy/at803x.c  |  2 +-
>  .../broadcom/brcm80211/brcmfmac/p2p.c |  2 +-
>  .../net/wireless/intel/iwlwifi/mvm/mac-ctxt.c |  2 +-
>  drivers/scsi/fcoe/fcoe_ctlr.c |  4 +-
>  drivers/scsi/qedi/qedi_main.c |  2 +-
>  fs/ceph/inode.c   |  2 +-
>  fs/ceph/mdsmap.c  |  2 +-
>  fs/ext4/super.c   |  7 +-
>  fs/f2fs/gc.c  |  2 +-
>  fs/f2fs/segment.c |  8 +-
>  fs/ubifs/debug.c  |  8 +-
>  fs/ubifs/lpt_commit.c | 12 +--
>  fs/xfs/xfs_error.c|  2 +-
>  kernel/bpf/core.c |  4 +-
>  kernel/locking/test-ww_mutex.c|  4 +-
>  kernel/time/clocksource.c |  2 +-
>  lib/cmdline_kunit.c   |  4 +-
>  lib/fault-inject.c|  2 +-
>  lib/find_bit_benchmark.c  |  4 +-
>  lib/reed_solomon/test_rslib.c |  4 +-
>  lib/sbitmap.c |  2 +-
>  lib/test-string_helpers.c |  2 +-
>  lib/test_hexdump.c| 10 +--
>  lib/test_kasan.c  |  6 +-
>  lib/test_list_sort.c  |  2 +-
>  mm/migrate.c  |  2 +-
>  mm/slub.c |  2 +-
>  net/ceph/mon_client.c |  2 +-
>  net/ceph/osd_client.c |  2 +-
>  net/core/neighbour.c  |  2 +-
>  net/core/pktgen.c | 39 -
>  net/core/stream.c |  2 +-
>  net/ipv4/igmp.c   |  6 +-
>  net/ipv4/inet_connection_sock.c   |  2 +-
>  net/ipv6/addrconf.c   |  8 +-
>  net/ipv6/mcast.c  | 10 +--
>  net/netfilter/ipvs/ip_vs_twos.c   |  4 +-
>  net/packet/af_packet.c|  2 +-
>  net/sched/act_gact.c  |  2 +-
>  net/sched/act_sample.c|  2 +-
>  net/sched/sch_netem.c |  4 +-
>  net/sctp/socket.c |  2 +-
>  net/tipc/socket.c |  2 +-
>  net/xfrm/xfrm_state.c |  2 +-
>  67 files changed, 173 insertions(+), 177 deletions(-)
> 
> diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
> index 3d9cace63884..35129ae36067 100644
> --- a/arch/arm/kernel/process.c
> +++ 

Difference GART/GTT and related questions

2022-10-07 Thread Peter Maucher

Hi dri-devel,

what is the difference between GTT and GART for AMD GPUs?
From what I gathered when looking through the mailing list archives and 
the freedesktop docs [1] as well as wikipedia [2],

these terms seem to be synonymous, but that can not be the whole truth
(different sizes in dmesg log, different kernel parameters in 
amdgpu/radeon, ...).


As far as I understand it currently,
the size of the GART is depending on some HW/ASIC functionality [3].
On the other hand, I was successfully able to increase the size of the 
GART mapping(?) from 512MB to 1024MB by using amdgpu.gartsize=1024 on my 
RX 6600, and booting the system.


GTT, on the other hand, is the maximum amount of system memory visible 
to the GPU, shared between all processes connected to the GPU.
As I understand it, using GPUVM, each process can have one or more GARTs 
for mapping?

Apparently, there is also something called a GART table window,
what's up with that?

Also, according to what I found in the mailing list archives,
the GPUVM functionality "replaces" old GART with new GART features,
so what is the difference and what exactly is GPUVM?
If I understood correctly, GPUVM is a MMU using page tables on the GPU?

And, additionally, the addresses translated by the GART(s) are 
optionally translated once more by the PCIe IOMMU,
as the former is located on the GPU and the latter is in the CPU's PCIe 
root complex?

Wikipedia mentions something about (another?) GART in an AMD IOMMU...

Lastly, do any of these numbers influence what the longest contiguous 
mapping is for one buffer to the GPU?
As in: can I map 95% or so of the available (GART/GTT?) space into one 
buffer and have the GPU work on it?


Thanks, Peter

[1] https://dri.freedesktop.org/wiki/GART/
[2] https://en.wikipedia.org/wiki/Graphics_address_remapping_table
[3] https://www.kernel.org/doc/html/v6.0/gpu/amdgpu/module-parameters.html



Re: [PATCH v4 4/6] treewide: use get_random_u32() when possible

2022-10-07 Thread Rolf Eike Beer
> diff --git a/arch/parisc/kernel/process.c b/arch/parisc/kernel/process.c
> index 7c37e09c92da..18c4f0e3e906 100644
> --- a/arch/parisc/kernel/process.c
> +++ b/arch/parisc/kernel/process.c
> @@ -288,7 +288,7 @@ __get_wchan(struct task_struct *p)
> 
>  static inline unsigned long brk_rnd(void)
>  {
> - return (get_random_int() & BRK_RND_MASK) << PAGE_SHIFT;
> + return (get_random_u32() & BRK_RND_MASK) << PAGE_SHIFT;
>  }

Can't this be

  prandom_u32_max(BRK_RND_MASK + 1) << PAGE_SHIFT

? More similar code with other masks follows below.

> diff --git a/drivers/gpu/drm/i915/i915_gem_gtt.c
> b/drivers/gpu/drm/i915/i915_gem_gtt.c index 329ff75b80b9..7bd1861ddbdf
> 100644
> --- a/drivers/gpu/drm/i915/i915_gem_gtt.c
> +++ b/drivers/gpu/drm/i915/i915_gem_gtt.c
> @@ -137,12 +137,12 @@ static u64 random_offset(u64 start, u64 end, u64 len,
> u64 align) range = round_down(end - len, align) - round_up(start, align);
>   if (range) {
>   if (sizeof(unsigned long) == sizeof(u64)) {
> - addr = get_random_long();
> + addr = get_random_u64();
>   } else {
> - addr = get_random_int();
> + addr = get_random_u32();
>   if (range > U32_MAX) {
>   addr <<= 32;
> - addr |= get_random_int();
> + addr |= get_random_u32();
>   }
>   }
>   div64_u64_rem(addr, range, );

How about 

if (sizeof(unsigned long) == sizeof(u64) || range > 
U32_MAX)
addr = get_random_u64();
else
addr = get_random_u32();

> diff --git a/drivers/infiniband/hw/cxgb4/cm.c
> b/drivers/infiniband/hw/cxgb4/cm.c index 14392c942f49..499a425a3379 100644
> --- a/drivers/infiniband/hw/cxgb4/cm.c
> +++ b/drivers/infiniband/hw/cxgb4/cm.c
> @@ -734,7 +734,7 @@ static int send_connect(struct c4iw_ep *ep)
>  >com.remote_addr;
>   int ret;
>   enum chip_type adapter_type = ep->com.dev->rdev.lldi.adapter_type;
> - u32 isn = (prandom_u32() & ~7UL) - 1;
> + u32 isn = (get_random_u32() & ~7UL) - 1;
>   struct net_device *netdev;
>   u64 params;
> 
> @@ -2469,7 +2469,7 @@ static int accept_cr(struct c4iw_ep *ep, struct
> sk_buff *skb, }
> 
>   if (!is_t4(adapter_type)) {
> - u32 isn = (prandom_u32() & ~7UL) - 1;
> + u32 isn = (get_random_u32() & ~7UL) - 1;

u32 isn = get_random_u32() | 0x7;

Same code comes later again.

> diff --git a/drivers/mtd/nand/raw/nandsim.c b/drivers/mtd/nand/raw/nandsim.c
> index 50bcf745e816..4bdaf4aa7007 100644
> --- a/drivers/mtd/nand/raw/nandsim.c
> +++ b/drivers/mtd/nand/raw/nandsim.c
> @@ -1402,7 +1402,7 @@ static int ns_do_read_error(struct nandsim *ns, int
> num)
> 
>  static void ns_do_bit_flips(struct nandsim *ns, int num)
>  {
> - if (bitflips && prandom_u32() < (1 << 22)) {
> + if (bitflips && get_random_u32() < (1 << 22)) {

Doing "get_random_u16() < (1 << 6)" should have the same probability with only 
2 bytes of random, no?

> diff --git a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c index
> ac452a0111a9..b71ce6c5b512 100644
> --- a/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> +++ b/drivers/net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c
> @@ -1063,7 +1063,7 @@ static void chtls_pass_accept_rpl(struct sk_buff *skb,
> opt2 |= WND_SCALE_EN_V(WSCALE_OK(tp));
>   rpl5->opt0 = cpu_to_be64(opt0);
>   rpl5->opt2 = cpu_to_be32(opt2);
> - rpl5->iss = cpu_to_be32((prandom_u32() & ~7UL) - 1);
> + rpl5->iss = cpu_to_be32((get_random_u32() & ~7UL) - 1);
>   set_wr_txq(skb, CPL_PRIORITY_SETUP, csk->port_id);
>   t4_set_arp_err_handler(skb, sk, chtls_accept_rpl_arp_failure);
>   cxgb4_l2t_send(csk->egress_dev, skb, csk->l2t_entry);
> diff --git a/drivers/net/ethernet/rocker/rocker_main.c
> b/drivers/net/ethernet/rocker/rocker_main.c index
> fc83ec23bd1d..8c3bbafabb07 100644
> --- a/drivers/net/ethernet/rocker/rocker_main.c
> +++ b/drivers/net/ethernet/rocker/rocker_main.c
> @@ -139,9 +139,9 @@ static int rocker_reg_test(const struct rocker *rocker)
>   return -EIO;
>   }
> 
> - rnd = prandom_u32();
> + rnd = get_random_u32();
>   rnd <<= 31;
> - rnd |= prandom_u32();
> + rnd |= get_random_u32();

>   rocker_write64(rocker, TEST_REG64, rnd);
>   test_reg = rocker_read64(rocker, TEST_REG64);
>   if (test_reg != rnd * 2) {
> diff --git a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pno.c
> b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pno.c index
> fabfbb0b40b0..374e1cc07a63 100644
> --- a/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pno.c
> +++ b/drivers/net/wireless/broadcom/brcm80211/brcmfmac/pno.c
> @@ -177,7 +177,7 @@ static int 

[PATCH] drm/msm: Kconfig: Fix spelling mistake "throught" -> "through"

2022-10-07 Thread Colin Ian King
There is a spelling mistake in a Kconfig description. Fix it.

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/msm/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/gpu/drm/msm/Kconfig b/drivers/gpu/drm/msm/Kconfig
index 4e0cbd682725..3c9dfdb0b328 100644
--- a/drivers/gpu/drm/msm/Kconfig
+++ b/drivers/gpu/drm/msm/Kconfig
@@ -155,7 +155,7 @@ config DRM_MSM_HDMI
  Compile in support for the HDMI output MSM DRM driver. It can
  be a primary or a secondary display on device. Note that this is used
  only for the direct HDMI output. If the device outputs HDMI data
- throught some kind of DSI-to-HDMI bridge, this option can be disabled.
+ through some kind of DSI-to-HDMI bridge, this option can be disabled.
 
 config DRM_MSM_HDMI_HDCP
bool "Enable HDMI HDCP support in MSM DRM driver"
-- 
2.37.3



[PATCH] drm/i915/perf: remove redundant variable 'taken'

2022-10-07 Thread Colin Ian King
The assignment to variable taken is redundant and so it can be
removed as well as the variable too.

Cleans up clang-scan build warnings:
warning: Although the value stored to 'taken' is used in the enclosing
expression, the value is never actually read from 'taken'
[deadcode.DeadStores]

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/i915/i915_perf.c | 6 ++
 1 file changed, 2 insertions(+), 4 deletions(-)

diff --git a/drivers/gpu/drm/i915/i915_perf.c b/drivers/gpu/drm/i915/i915_perf.c
index 0defbb43ceea..15816df916c7 100644
--- a/drivers/gpu/drm/i915/i915_perf.c
+++ b/drivers/gpu/drm/i915/i915_perf.c
@@ -656,7 +656,6 @@ static int gen8_append_oa_reports(struct i915_perf_stream 
*stream,
size_t start_offset = *offset;
unsigned long flags;
u32 head, tail;
-   u32 taken;
int ret = 0;
 
if (drm_WARN_ON(>i915->drm, !stream->enabled))
@@ -692,7 +691,7 @@ static int gen8_append_oa_reports(struct i915_perf_stream 
*stream,
 
 
for (/* none */;
-(taken = OA_TAKEN(tail, head));
+OA_TAKEN(tail, head);
 head = (head + report_size) & mask) {
u8 *report = oa_buf_base + head;
u32 *report32 = (void *)report;
@@ -950,7 +949,6 @@ static int gen7_append_oa_reports(struct i915_perf_stream 
*stream,
size_t start_offset = *offset;
unsigned long flags;
u32 head, tail;
-   u32 taken;
int ret = 0;
 
if (drm_WARN_ON(>i915->drm, !stream->enabled))
@@ -984,7 +982,7 @@ static int gen7_append_oa_reports(struct i915_perf_stream 
*stream,
 
 
for (/* none */;
-(taken = OA_TAKEN(tail, head));
+OA_TAKEN(tail, head);
 head = (head + report_size) & mask) {
u8 *report = oa_buf_base + head;
u32 *report32 = (void *)report;
-- 
2.37.3



[linux-next:master] BUILD REGRESSION 082fce125e57cff60687181c97f3a8ee620c38f5

2022-10-07 Thread kernel test robot
tree/branch: 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git master
branch HEAD: 082fce125e57cff60687181c97f3a8ee620c38f5  Add linux-next specific 
files for 20221007

Error/Warning reports:

https://lore.kernel.org/linux-doc/202209201326.sy9kholm-...@intel.com
https://lore.kernel.org/linux-doc/202209231933.vcyettul-...@intel.com
https://lore.kernel.org/linux-doc/202210070057.npbamyxb-...@intel.com
https://lore.kernel.org/llvm/202209220019.yr2vuxhg-...@intel.com

Error/Warning: (recently discovered and may have been fixed)

ERROR: modpost: "devm_ioremap_resource" [drivers/dma/fsl-edma.ko] undefined!
ERROR: modpost: "devm_ioremap_resource" [drivers/dma/idma64.ko] undefined!
ERROR: modpost: "devm_ioremap_resource" [drivers/dma/qcom/hdma.ko] undefined!
ERROR: modpost: "devm_memremap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_memunmap" [drivers/misc/open-dice.ko] undefined!
ERROR: modpost: "devm_platform_ioremap_resource" 
[drivers/char/xillybus/xillybus_of.ko] undefined!
ERROR: modpost: "ioremap" [drivers/net/ethernet/8390/pcnet_cs.ko] undefined!
ERROR: modpost: "ioremap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
ERROR: modpost: "iounmap" [drivers/net/ethernet/8390/pcnet_cs.ko] undefined!
ERROR: modpost: "iounmap" [drivers/tty/ipwireless/ipwireless.ko] undefined!
Warning: Documentation/translations/zh_CN/devicetree/kernel-api.rst references 
a file that doesn't exist: Documentation/Devicetree/kernel-api.rst
Warning: MAINTAINERS references a file that doesn't exist: 
Documentation/devicetree/bindings/clock/microchip,mpfs.yaml
Warning: MAINTAINERS references a file that doesn't exist: 
Documentation/devicetree/bindings/mtd/amlogic,meson-nand.txt
arch/arm64/kernel/alternative.c:199:6: warning: no previous prototype for 
'apply_alternatives_vdso' [-Wmissing-prototypes]
arch/arm64/kernel/alternative.c:295:14: warning: no previous prototype for 
'alt_cb_patch_nops' [-Wmissing-prototypes]
arch/loongarch/mm/init.c:166:24: warning: variable 'new' set but not used 
[-Wunused-but-set-variable]
drivers/gpu/drm/amd/amdgpu/../display/dc/virtual/virtual_link_hwss.c:40:6: 
warning: no previous prototype for 'virtual_disable_link_output' 
[-Wmissing-prototypes]
drivers/iio/adc/mcp3911.c:252 mcp3911_write_raw() error: buffer overflow 
'mcp3911_osr_table' 8 <= 31
drivers/iio/adc/mcp3911.c:441 mcp3911_probe() warn: passing zero to 'PTR_ERR'
drivers/iio/adc/mcp3911.c:499 mcp3911_probe() warn: passing zero to 'PTR_ERR'
drivers/nvme/target/loop.c:578 nvme_loop_create_ctrl() warn: 'opts->queue_size 
- 1' 4294967295 can't fit into 65535 'ctrl->ctrl.sqsize'
fs/ext4/super.c:1744:19: warning: 'deprecated_msg' defined but not used 
[-Wunused-const-variable=]
pahole: .tmp_vmlinux.btf: No such file or directory

Error/Warning ids grouped by kconfigs:

gcc_recent_errors
|-- alpha-allyesconfig
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- alpha-buildonly-randconfig-r003-20221003
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arc-allyesconfig
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arm-allyesconfig
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arm64-allyesconfig
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arm64-randconfig-r004-20221002
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|   `-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning:no-previous-prototype-for-virtual_disable_link_output
|-- arm64-randconfig-r014-20221003
|   |-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-alt_cb_patch_nops
|   `-- 
arch-arm64-kernel-alternative.c:warning:no-previous-prototype-for-apply_alternatives_vdso
|-- csky-randconfig-m041-20221002
|   |-- drivers-iio-adc-mcp3911.c-mcp3911_probe()-warn:passing-zero-to-PTR_ERR
|   |-- 
drivers-iio-adc-mcp3911.c-mcp3911_write_raw()-error:buffer-overflow-mcp3911_osr_table
|   `-- 
drivers-nvme-target-loop.c-nvme_loop_create_ctrl()-warn:opts-queue_size-can-t-fit-into-ctrl-ctrl.sqsize
|-- i386-allyesconfig
|   |-- 
drivers-gpu-drm-amd-amdgpu-..-display-dc-virtual-virtual_link_hwss.c:warning

[PATCH] drm/i915/gem: remove redundant assignments to variable ret

2022-10-07 Thread Colin Ian King
The variable ret is being assigned with a value that is never read
both before and after a while-loop. The variable is being re-assigned
inside the while-loop and afterwards on the call to the function
i915_gem_object_lock_interruptible. Remove the redundants assignments.

Cleans up clang scan-build warnings:

warning: Although the value stored to 'ret' is used in the
enclosing expression, the value is never actually read
from 'ret' [deadcode.DeadStores]

warning: Value stored to 'ret' is never read [deadcode.DeadStores]

Signed-off-by: Colin Ian King 
---
 drivers/gpu/drm/i915/gem/i915_gem_userptr.c | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)

diff --git a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c 
b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
index d4398948f016..b7e24476a0fd 100644
--- a/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
+++ b/drivers/gpu/drm/i915/gem/i915_gem_userptr.c
@@ -292,7 +292,7 @@ int i915_gem_object_userptr_submit_init(struct 
drm_i915_gem_object *obj)
if (!i915_gem_object_is_readonly(obj))
gup_flags |= FOLL_WRITE;
 
-   pinned = ret = 0;
+   pinned = 0;
while (pinned < num_pages) {
ret = pin_user_pages_fast(obj->userptr.ptr + pinned * PAGE_SIZE,
  num_pages - pinned, gup_flags,
@@ -302,7 +302,6 @@ int i915_gem_object_userptr_submit_init(struct 
drm_i915_gem_object *obj)
 
pinned += ret;
}
-   ret = 0;
 
ret = i915_gem_object_lock_interruptible(obj, NULL);
if (ret)
-- 
2.37.3



[PATCH v4 6/6] prandom: remove unused functions

2022-10-07 Thread Jason A. Donenfeld
With no callers left of prandom_u32() and prandom_bytes(), as well as
get_random_int(), remove these deprecated wrappers, in favor of
get_random_u32() and get_random_bytes().

Reviewed-by: Kees Cook 
Signed-off-by: Jason A. Donenfeld 
---
 drivers/char/random.c   | 11 +--
 include/linux/prandom.h | 12 
 include/linux/random.h  |  5 -
 3 files changed, 5 insertions(+), 23 deletions(-)

diff --git a/drivers/char/random.c b/drivers/char/random.c
index 01acf235f263..2fe28eeb2f38 100644
--- a/drivers/char/random.c
+++ b/drivers/char/random.c
@@ -97,7 +97,7 @@ MODULE_PARM_DESC(ratelimit_disable, "Disable random ratelimit 
suppression");
  * Returns whether or not the input pool has been seeded and thus guaranteed
  * to supply cryptographically secure random numbers. This applies to: the
  * /dev/urandom device, the get_random_bytes function, and the get_random_{u8,
- * u16,u32,u64,int,long} family of functions.
+ * u16,u32,u64,long} family of functions.
  *
  * Returns: true if the input pool has been seeded.
  *  false if the input pool has not been seeded.
@@ -161,15 +161,14 @@ EXPORT_SYMBOL(wait_for_random_bytes);
  * u16 get_random_u16()
  * u32 get_random_u32()
  * u64 get_random_u64()
- * unsigned int get_random_int()
  * unsigned long get_random_long()
  *
  * These interfaces will return the requested number of random bytes
  * into the given buffer or as a return value. This is equivalent to
- * a read from /dev/urandom. The u8, u16, u32, u64, int, and long
- * family of functions may be higher performance for one-off random
- * integers, because they do a bit of buffering and do not invoke
- * reseeding until the buffer is emptied.
+ * a read from /dev/urandom. The u8, u16, u32, u64, long family of
+ * functions may be higher performance for one-off random integers,
+ * because they do a bit of buffering and do not invoke reseeding
+ * until the buffer is emptied.
  *
  */
 
diff --git a/include/linux/prandom.h b/include/linux/prandom.h
index 78db003bc290..e0a0759dd09c 100644
--- a/include/linux/prandom.h
+++ b/include/linux/prandom.h
@@ -12,18 +12,6 @@
 #include 
 #include 
 
-/* Deprecated: use get_random_u32 instead. */
-static inline u32 prandom_u32(void)
-{
-   return get_random_u32();
-}
-
-/* Deprecated: use get_random_bytes instead. */
-static inline void prandom_bytes(void *buf, size_t nbytes)
-{
-   return get_random_bytes(buf, nbytes);
-}
-
 struct rnd_state {
__u32 s1, s2, s3, s4;
 };
diff --git a/include/linux/random.h b/include/linux/random.h
index 08322f700cdc..147a5e0d0b8e 100644
--- a/include/linux/random.h
+++ b/include/linux/random.h
@@ -42,10 +42,6 @@ u8 get_random_u8(void);
 u16 get_random_u16(void);
 u32 get_random_u32(void);
 u64 get_random_u64(void);
-static inline unsigned int get_random_int(void)
-{
-   return get_random_u32();
-}
 static inline unsigned long get_random_long(void)
 {
 #if BITS_PER_LONG == 64
@@ -100,7 +96,6 @@ declare_get_random_var_wait(u8, u8)
 declare_get_random_var_wait(u16, u16)
 declare_get_random_var_wait(u32, u32)
 declare_get_random_var_wait(u64, u32)
-declare_get_random_var_wait(int, unsigned int)
 declare_get_random_var_wait(long, unsigned long)
 #undef declare_get_random_var
 
-- 
2.37.3



[PATCH v4 5/6] treewide: use get_random_bytes when possible

2022-10-07 Thread Jason A. Donenfeld
The prandom_bytes() function has been a deprecated inline wrapper around
get_random_bytes() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function.

Reviewed-by: Kees Cook 
Reviewed-by: Christophe Leroy  # powerpc
Signed-off-by: Jason A. Donenfeld 
---
 arch/powerpc/crypto/crc-vpmsum_test.c   |  2 +-
 block/blk-crypto-fallback.c |  2 +-
 crypto/async_tx/raid6test.c |  2 +-
 drivers/dma/dmatest.c   |  2 +-
 drivers/mtd/nand/raw/nandsim.c  |  2 +-
 drivers/mtd/tests/mtd_nandecctest.c |  2 +-
 drivers/mtd/tests/speedtest.c   |  2 +-
 drivers/mtd/tests/stresstest.c  |  2 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c   |  2 +-
 drivers/net/ethernet/rocker/rocker_main.c   |  2 +-
 drivers/net/wireguard/selftest/allowedips.c | 12 ++--
 fs/ubifs/debug.c|  2 +-
 kernel/kcsan/selftest.c |  2 +-
 lib/random32.c  |  2 +-
 lib/test_objagg.c   |  2 +-
 lib/uuid.c  |  2 +-
 net/ipv4/route.c|  2 +-
 net/mac80211/rc80211_minstrel_ht.c  |  2 +-
 net/sched/sch_pie.c |  2 +-
 19 files changed, 24 insertions(+), 24 deletions(-)

diff --git a/arch/powerpc/crypto/crc-vpmsum_test.c 
b/arch/powerpc/crypto/crc-vpmsum_test.c
index c1c1ef9457fb..273c527868db 100644
--- a/arch/powerpc/crypto/crc-vpmsum_test.c
+++ b/arch/powerpc/crypto/crc-vpmsum_test.c
@@ -82,7 +82,7 @@ static int __init crc_test_init(void)
 
if (len <= offset)
continue;
-   prandom_bytes(data, len);
+   get_random_bytes(data, len);
len -= offset;
 
crypto_shash_update(crct10dif_shash, data+offset, len);
diff --git a/block/blk-crypto-fallback.c b/block/blk-crypto-fallback.c
index 621abd1b0e4d..ad9844c5b40c 100644
--- a/block/blk-crypto-fallback.c
+++ b/block/blk-crypto-fallback.c
@@ -539,7 +539,7 @@ static int blk_crypto_fallback_init(void)
if (blk_crypto_fallback_inited)
return 0;
 
-   prandom_bytes(blank_key, BLK_CRYPTO_MAX_KEY_SIZE);
+   get_random_bytes(blank_key, BLK_CRYPTO_MAX_KEY_SIZE);
 
err = bioset_init(_bio_split, 64, 0, 0);
if (err)
diff --git a/crypto/async_tx/raid6test.c b/crypto/async_tx/raid6test.c
index c9d218e53bcb..f74505f2baf0 100644
--- a/crypto/async_tx/raid6test.c
+++ b/crypto/async_tx/raid6test.c
@@ -37,7 +37,7 @@ static void makedata(int disks)
int i;
 
for (i = 0; i < disks; i++) {
-   prandom_bytes(page_address(data[i]), PAGE_SIZE);
+   get_random_bytes(page_address(data[i]), PAGE_SIZE);
dataptrs[i] = data[i];
dataoffs[i] = 0;
}
diff --git a/drivers/dma/dmatest.c b/drivers/dma/dmatest.c
index 9fe2ae794316..ffe621695e47 100644
--- a/drivers/dma/dmatest.c
+++ b/drivers/dma/dmatest.c
@@ -312,7 +312,7 @@ static unsigned long dmatest_random(void)
 {
unsigned long buf;
 
-   prandom_bytes(, sizeof(buf));
+   get_random_bytes(, sizeof(buf));
return buf;
 }
 
diff --git a/drivers/mtd/nand/raw/nandsim.c b/drivers/mtd/nand/raw/nandsim.c
index 4bdaf4aa7007..c941a5a41ea6 100644
--- a/drivers/mtd/nand/raw/nandsim.c
+++ b/drivers/mtd/nand/raw/nandsim.c
@@ -1393,7 +1393,7 @@ static int ns_do_read_error(struct nandsim *ns, int num)
unsigned int page_no = ns->regs.row;
 
if (ns_read_error(page_no)) {
-   prandom_bytes(ns->buf.byte, num);
+   get_random_bytes(ns->buf.byte, num);
NS_WARN("simulating read error in page %u\n", page_no);
return 1;
}
diff --git a/drivers/mtd/tests/mtd_nandecctest.c 
b/drivers/mtd/tests/mtd_nandecctest.c
index 1c7201b0f372..440988562cfd 100644
--- a/drivers/mtd/tests/mtd_nandecctest.c
+++ b/drivers/mtd/tests/mtd_nandecctest.c
@@ -266,7 +266,7 @@ static int nand_ecc_test_run(const size_t size)
goto error;
}
 
-   prandom_bytes(correct_data, size);
+   get_random_bytes(correct_data, size);
ecc_sw_hamming_calculate(correct_data, size, correct_ecc, sm_order);
for (i = 0; i < ARRAY_SIZE(nand_ecc_test); i++) {
nand_ecc_test[i].prepare(error_data, error_ecc,
diff --git a/drivers/mtd/tests/speedtest.c b/drivers/mtd/tests/speedtest.c
index c9ec7086bfa1..075bce32caa5 100644
--- a/drivers/mtd/tests/speedtest.c
+++ b/drivers/mtd/tests/speedtest.c
@@ -223,7 +223,7 @@ static int __init mtd_speedtest_init(void)
if (!iobuf)
goto out;
 
-   prandom_bytes(iobuf, mtd->erasesize);
+   get_random_bytes(iobuf, mtd->erasesize);
 
bbt = kzalloc(ebcnt, GFP_KERNEL);
if (!bbt)
diff --git 

[PATCH v4 4/6] treewide: use get_random_u32() when possible

2022-10-07 Thread Jason A. Donenfeld
The prandom_u32() function has been a deprecated inline wrapper around
get_random_u32() for several releases now, and compiles down to the
exact same code. Replace the deprecated wrapper with a direct call to
the real function. The same also applies to get_random_int(), which is
just a wrapper around get_random_u32().

Reviewed-by: Kees Cook 
Acked-by: Toke Høiland-Jørgensen  # for sch_cake
Acked-by: Chuck Lever  # for nfsd
Reviewed-by: Jan Kara  # for ext4
Acked-by: Mika Westerberg  # for thunderbolt
Signed-off-by: Jason A. Donenfeld 
---
 Documentation/networking/filter.rst|  2 +-
 arch/parisc/kernel/process.c   |  2 +-
 arch/parisc/kernel/sys_parisc.c|  4 ++--
 arch/s390/mm/mmap.c|  2 +-
 arch/x86/kernel/cpu/amd.c  |  2 +-
 drivers/gpu/drm/i915/i915_gem_gtt.c|  6 +++---
 drivers/gpu/drm/i915/selftests/i915_selftest.c |  2 +-
 drivers/gpu/drm/selftests/test-drm_buddy.c |  2 +-
 drivers/gpu/drm/selftests/test-drm_mm.c|  2 +-
 drivers/infiniband/hw/cxgb4/cm.c   |  4 ++--
 drivers/infiniband/hw/hfi1/tid_rdma.c  |  2 +-
 drivers/infiniband/hw/mlx4/mad.c   |  2 +-
 drivers/infiniband/ulp/ipoib/ipoib_cm.c|  2 +-
 drivers/md/raid5-cache.c   |  2 +-
 .../media/test-drivers/vivid/vivid-touch-cap.c |  4 ++--
 drivers/misc/habanalabs/gaudi2/gaudi2.c|  2 +-
 drivers/mtd/nand/raw/nandsim.c |  2 +-
 drivers/net/bonding/bond_main.c|  2 +-
 drivers/net/ethernet/broadcom/cnic.c   |  2 +-
 .../chelsio/inline_crypto/chtls/chtls_cm.c |  2 +-
 drivers/net/ethernet/rocker/rocker_main.c  |  6 +++---
 .../wireless/broadcom/brcm80211/brcmfmac/pno.c |  2 +-
 .../net/wireless/marvell/mwifiex/cfg80211.c|  4 ++--
 .../net/wireless/microchip/wilc1000/cfg80211.c |  2 +-
 .../net/wireless/quantenna/qtnfmac/cfg80211.c  |  2 +-
 drivers/net/wireless/ti/wlcore/main.c  |  2 +-
 drivers/nvme/common/auth.c |  2 +-
 drivers/scsi/cxgbi/cxgb4i/cxgb4i.c |  4 ++--
 drivers/target/iscsi/cxgbit/cxgbit_cm.c|  2 +-
 drivers/thunderbolt/xdomain.c  |  2 +-
 drivers/video/fbdev/uvesafb.c  |  2 +-
 fs/exfat/inode.c   |  2 +-
 fs/ext4/ialloc.c   |  2 +-
 fs/ext4/ioctl.c|  4 ++--
 fs/ext4/mmp.c  |  2 +-
 fs/f2fs/namei.c|  2 +-
 fs/fat/inode.c |  2 +-
 fs/nfsd/nfs4state.c|  4 ++--
 fs/ntfs3/fslog.c   |  6 +++---
 fs/ubifs/journal.c |  2 +-
 fs/xfs/libxfs/xfs_ialloc.c |  2 +-
 fs/xfs/xfs_icache.c|  2 +-
 fs/xfs/xfs_log.c   |  2 +-
 include/net/netfilter/nf_queue.h   |  2 +-
 include/net/red.h  |  2 +-
 include/net/sock.h |  2 +-
 kernel/bpf/bloom_filter.c  |  2 +-
 kernel/bpf/core.c  |  2 +-
 kernel/bpf/hashtab.c   |  2 +-
 kernel/bpf/verifier.c  |  2 +-
 kernel/kcsan/selftest.c|  2 +-
 lib/random32.c |  2 +-
 lib/reed_solomon/test_rslib.c  |  6 +++---
 lib/test_fprobe.c  |  2 +-
 lib/test_kprobes.c |  2 +-
 lib/test_min_heap.c|  6 +++---
 lib/test_rhashtable.c  |  6 +++---
 mm/shmem.c |  2 +-
 mm/slab.c  |  2 +-
 net/802/garp.c |  2 +-
 net/802/mrp.c  |  2 +-
 net/core/pktgen.c  |  4 ++--
 net/ipv4/route.c   |  2 +-
 net/ipv4/tcp_cdg.c |  2 +-
 net/ipv4/udp.c |  2 +-
 net/ipv6/ip6_flowlabel.c   |  2 +-
 net/ipv6/output_core.c |  2 +-
 net/netfilter/ipvs/ip_vs_conn.c|  2 +-
 net/netfilter/xt_statistic.c   |  2 +-
 net/openvswitch/actions.c  |  2 +-
 net/rds/bind.c |  2 +-
 net/sched/sch_cake.c   |  2 +-
 net/sched/sch_netem.c  | 18 +-
 net/sunrpc/auth_gss/gss_krb5_wrap.c|  4 ++--
 net/sunrpc/xprt.c  |  2 +-
 net/unix/af_unix.c |  2 +-
 76 files changed, 105 insertions(+), 105 deletions(-)

diff --git a/Documentation/networking/filter.rst 

[PATCH v4 3/6] treewide: use get_random_{u8,u16}() when possible

2022-10-07 Thread Jason A. Donenfeld
Rather than truncate a 32-bit value to a 16-bit value or an 8-bit value,
simply use the get_random_{u8,u16}() functions, which are faster than
wasting the additional bytes from a 32-bit value.

Reviewed-by: Kees Cook 
Acked-by: Toke Høiland-Jørgensen  # for sch_cake
Signed-off-by: Jason A. Donenfeld 
---
 arch/arm/kernel/signal.c  | 2 +-
 arch/arm64/kernel/syscall.c   | 2 +-
 arch/s390/kernel/process.c| 2 +-
 arch/sparc/vdso/vma.c | 2 +-
 crypto/testmgr.c  | 8 
 drivers/media/common/v4l2-tpg/v4l2-tpg-core.c | 2 +-
 drivers/media/test-drivers/vivid/vivid-radio-rx.c | 4 ++--
 .../net/ethernet/chelsio/inline_crypto/chtls/chtls_cm.c   | 2 +-
 drivers/net/hamradio/baycom_epp.c | 2 +-
 drivers/net/hamradio/hdlcdrv.c| 2 +-
 drivers/net/hamradio/yam.c| 2 +-
 drivers/net/wireguard/selftest/allowedips.c   | 4 ++--
 drivers/net/wireless/st/cw1200/wsm.c  | 2 +-
 drivers/scsi/lpfc/lpfc_hbadisc.c  | 6 +++---
 lib/cmdline_kunit.c   | 4 ++--
 lib/test_vmalloc.c| 2 +-
 net/dccp/ipv4.c   | 4 ++--
 net/ipv4/datagram.c   | 2 +-
 net/ipv4/ip_output.c  | 2 +-
 net/ipv4/tcp_ipv4.c   | 4 ++--
 net/mac80211/scan.c   | 2 +-
 net/netfilter/nf_nat_core.c   | 4 ++--
 net/sched/sch_cake.c  | 6 +++---
 net/sched/sch_sfb.c   | 2 +-
 net/sctp/socket.c | 2 +-
 25 files changed, 38 insertions(+), 38 deletions(-)

diff --git a/arch/arm/kernel/signal.c b/arch/arm/kernel/signal.c
index ea128e32e8ca..e07f359254c3 100644
--- a/arch/arm/kernel/signal.c
+++ b/arch/arm/kernel/signal.c
@@ -655,7 +655,7 @@ struct page *get_signal_page(void)
 PAGE_SIZE / sizeof(u32));
 
/* Give the signal return code some randomness */
-   offset = 0x200 + (get_random_int() & 0x7fc);
+   offset = 0x200 + (get_random_u16() & 0x7fc);
signal_return_offset = offset;
 
/* Copy signal return handlers into the page */
diff --git a/arch/arm64/kernel/syscall.c b/arch/arm64/kernel/syscall.c
index 733451fe7e41..d72e8f23422d 100644
--- a/arch/arm64/kernel/syscall.c
+++ b/arch/arm64/kernel/syscall.c
@@ -67,7 +67,7 @@ static void invoke_syscall(struct pt_regs *regs, unsigned int 
scno,
 *
 * The resulting 5 bits of entropy is seen in SP[8:4].
 */
-   choose_random_kstack_offset(get_random_int() & 0x1FF);
+   choose_random_kstack_offset(get_random_u16() & 0x1FF);
 }
 
 static inline bool has_syscall_work(unsigned long flags)
diff --git a/arch/s390/kernel/process.c b/arch/s390/kernel/process.c
index 5ec78555dd2e..42af4b3aa02b 100644
--- a/arch/s390/kernel/process.c
+++ b/arch/s390/kernel/process.c
@@ -230,7 +230,7 @@ unsigned long arch_align_stack(unsigned long sp)
 
 static inline unsigned long brk_rnd(void)
 {
-   return (get_random_int() & BRK_RND_MASK) << PAGE_SHIFT;
+   return (get_random_u16() & BRK_RND_MASK) << PAGE_SHIFT;
 }
 
 unsigned long arch_randomize_brk(struct mm_struct *mm)
diff --git a/arch/sparc/vdso/vma.c b/arch/sparc/vdso/vma.c
index cc19e09b0fa1..04ee726859ca 100644
--- a/arch/sparc/vdso/vma.c
+++ b/arch/sparc/vdso/vma.c
@@ -354,7 +354,7 @@ static unsigned long vdso_addr(unsigned long start, 
unsigned int len)
unsigned int offset;
 
/* This loses some more bits than a modulo, but is cheaper */
-   offset = get_random_int() & (PTRS_PER_PTE - 1);
+   offset = get_random_u16() & (PTRS_PER_PTE - 1);
return start + (offset << PAGE_SHIFT);
 }
 
diff --git a/crypto/testmgr.c b/crypto/testmgr.c
index be45217acde4..981c637fa2ed 100644
--- a/crypto/testmgr.c
+++ b/crypto/testmgr.c
@@ -927,7 +927,7 @@ static void generate_random_bytes(u8 *buf, size_t count)
b = 0xff;
break;
default:
-   b = (u8)prandom_u32();
+   b = get_random_u8();
break;
}
memset(buf, b, count);
@@ -935,8 +935,8 @@ static void generate_random_bytes(u8 *buf, size_t count)
break;
case 2:
/* Ascending or descending bytes, plus optional mutations */
-   increment = (u8)prandom_u32();
-   b = (u8)prandom_u32();
+   increment = get_random_u8();
+   b = get_random_u8();
for (i = 0; i < count; i++, b += 

[PATCH v4 2/6] treewide: use prandom_u32_max() when possible

2022-10-07 Thread Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions.

Reviewed-by: Kees Cook 
Reviewed-by: KP Singh 
Reviewed-by: Christoph Böhmwalder  # for drbd
Reviewed-by: Jan Kara  # for ext2, ext4, and sbitmap
Signed-off-by: Jason A. Donenfeld 
---
 arch/arm64/kernel/process.c  |  2 +-
 arch/loongarch/kernel/process.c  |  2 +-
 arch/loongarch/kernel/vdso.c |  2 +-
 arch/mips/kernel/process.c   |  2 +-
 arch/mips/kernel/vdso.c  |  2 +-
 arch/parisc/kernel/vdso.c|  2 +-
 arch/powerpc/kernel/process.c|  2 +-
 arch/s390/kernel/process.c   |  2 +-
 drivers/block/drbd/drbd_receiver.c   |  4 ++--
 drivers/md/bcache/request.c  |  2 +-
 drivers/mtd/tests/stresstest.c   | 17 -
 drivers/mtd/ubi/debug.h  |  6 +++---
 drivers/net/ethernet/broadcom/cnic.c |  3 +--
 fs/ext2/ialloc.c |  3 +--
 fs/ext4/ialloc.c |  5 ++---
 fs/ubifs/lpt_commit.c|  2 +-
 fs/ubifs/tnc_commit.c|  2 +-
 fs/xfs/libxfs/xfs_alloc.c|  2 +-
 fs/xfs/libxfs/xfs_ialloc.c   |  2 +-
 include/linux/nodemask.h |  2 +-
 lib/cmdline_kunit.c  |  4 ++--
 lib/kobject.c|  2 +-
 lib/reed_solomon/test_rslib.c|  2 +-
 lib/sbitmap.c|  2 +-
 lib/test_hexdump.c   |  2 +-
 lib/test_vmalloc.c   | 17 -
 net/core/pktgen.c|  4 ++--
 net/ipv4/inet_hashtables.c   |  2 +-
 net/sunrpc/cache.c   |  2 +-
 net/sunrpc/xprtsock.c|  2 +-
 30 files changed, 42 insertions(+), 63 deletions(-)

diff --git a/arch/arm64/kernel/process.c b/arch/arm64/kernel/process.c
index 92bcc1768f0b..87203429f802 100644
--- a/arch/arm64/kernel/process.c
+++ b/arch/arm64/kernel/process.c
@@ -595,7 +595,7 @@ unsigned long __get_wchan(struct task_struct *p)
 unsigned long arch_align_stack(unsigned long sp)
 {
if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
-   sp -= get_random_int() & ~PAGE_MASK;
+   sp -= prandom_u32_max(PAGE_SIZE);
return sp & ~0xf;
 }
 
diff --git a/arch/loongarch/kernel/process.c b/arch/loongarch/kernel/process.c
index 660492f064e7..1256e3582475 100644
--- a/arch/loongarch/kernel/process.c
+++ b/arch/loongarch/kernel/process.c
@@ -293,7 +293,7 @@ unsigned long stack_top(void)
 unsigned long arch_align_stack(unsigned long sp)
 {
if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
-   sp -= get_random_int() & ~PAGE_MASK;
+   sp -= prandom_u32_max(PAGE_SIZE);
 
return sp & STACK_ALIGN;
 }
diff --git a/arch/loongarch/kernel/vdso.c b/arch/loongarch/kernel/vdso.c
index f32c38abd791..8c9826062652 100644
--- a/arch/loongarch/kernel/vdso.c
+++ b/arch/loongarch/kernel/vdso.c
@@ -78,7 +78,7 @@ static unsigned long vdso_base(void)
unsigned long base = STACK_TOP;
 
if (current->flags & PF_RANDOMIZE) {
-   base += get_random_int() & (VDSO_RANDOMIZE_SIZE - 1);
+   base += prandom_u32_max(VDSO_RANDOMIZE_SIZE);
base = PAGE_ALIGN(base);
}
 
diff --git a/arch/mips/kernel/process.c b/arch/mips/kernel/process.c
index 35b912bce429..bbe9ce471791 100644
--- a/arch/mips/kernel/process.c
+++ b/arch/mips/kernel/process.c
@@ -711,7 +711,7 @@ unsigned long mips_stack_top(void)
 unsigned long arch_align_stack(unsigned long sp)
 {
if (!(current->personality & ADDR_NO_RANDOMIZE) && randomize_va_space)
-   sp -= get_random_int() & ~PAGE_MASK;
+   sp -= prandom_u32_max(PAGE_SIZE);
 
return sp & ALMASK;
 }
diff --git a/arch/mips/kernel/vdso.c b/arch/mips/kernel/vdso.c
index b2cc2c2dd4bf..5fd9bf1d596c 100644
--- a/arch/mips/kernel/vdso.c
+++ b/arch/mips/kernel/vdso.c
@@ -79,7 +79,7 @@ static unsigned long vdso_base(void)
}
 
if (current->flags & PF_RANDOMIZE) {
-   base += get_random_int() & (VDSO_RANDOMIZE_SIZE - 1);
+   base += prandom_u32_max(VDSO_RANDOMIZE_SIZE);
base = PAGE_ALIGN(base);
}
 
diff --git a/arch/parisc/kernel/vdso.c b/arch/parisc/kernel/vdso.c
index 63dc44c4c246..47e5960a2f96 100644
--- a/arch/parisc/kernel/vdso.c
+++ b/arch/parisc/kernel/vdso.c
@@ -75,7 +75,7 @@ int arch_setup_additional_pages(struct linux_binprm *bprm,
 
map_base = mm->mmap_base;
if (current->flags & PF_RANDOMIZE)
-   map_base -= (get_random_int() & 0x1f) * PAGE_SIZE;
+   map_base -= prandom_u32_max(0x20) * PAGE_SIZE;
 
vdso_text_start = get_unmapped_area(NULL, map_base, vdso_text_len, 0, 
0);
 
diff --git a/arch/powerpc/kernel/process.c b/arch/powerpc/kernel/process.c
index 

[PATCH v4 1/6] treewide: use prandom_u32_max() when possible, mechanically

2022-10-07 Thread Jason A. Donenfeld
Rather than incurring a division or requesting too many random bytes for
the given range, use the prandom_u32_max() function, which only takes
the minimum required bytes from the RNG and avoids divisions. This was
done mechanically with these coccinelle scripts:

@no_modulo@
expression E;
@@
-   (prandom_u32() % (E))
+   prandom_u32_max(E)
@no_modulo@
expression E;
@@
-   (get_random_u32() % (E))
+   prandom_u32_max(E)
@no_modulo@
expression E;
@@
-   (get_random_int() % (E))
+   prandom_u32_max(E)

Reviewed-by: Kees Cook 
Reviewed-by: KP Singh 
Reviewed-by: Jan Kara  # for ext4 and sbitmap
Acked-by: Ulf Hansson  # for mmc
Signed-off-by: Jason A. Donenfeld 
---
 arch/arm/kernel/process.c |  2 +-
 arch/s390/kernel/vdso.c   |  2 +-
 arch/um/kernel/process.c  |  2 +-
 arch/x86/entry/vdso/vma.c |  2 +-
 arch/x86/kernel/module.c  |  2 +-
 arch/x86/kernel/process.c |  2 +-
 arch/x86/mm/pat/cpa-test.c|  4 +-
 crypto/testmgr.c  | 86 +--
 .../gpu/drm/i915/gem/i915_gem_execbuffer.c|  2 +-
 drivers/infiniband/core/cma.c |  2 +-
 drivers/infiniband/hw/cxgb4/id_table.c|  4 +-
 drivers/infiniband/hw/hns/hns_roce_ah.c   |  5 +-
 drivers/infiniband/ulp/rtrs/rtrs-clt.c|  3 +-
 .../test-drivers/vivid/vivid-touch-cap.c  |  2 +-
 drivers/mmc/core/core.c   |  4 +-
 drivers/mmc/host/dw_mmc.c |  2 +-
 drivers/mtd/nand/raw/nandsim.c|  4 +-
 drivers/mtd/tests/mtd_nandecctest.c   | 10 +--
 drivers/mtd/ubi/debug.c   |  2 +-
 .../chelsio/inline_crypto/chtls/chtls_io.c|  4 +-
 drivers/net/hamradio/baycom_epp.c |  2 +-
 drivers/net/hamradio/hdlcdrv.c|  2 +-
 drivers/net/hamradio/yam.c|  2 +-
 drivers/net/phy/at803x.c  |  2 +-
 .../broadcom/brcm80211/brcmfmac/p2p.c |  2 +-
 .../net/wireless/intel/iwlwifi/mvm/mac-ctxt.c |  2 +-
 drivers/scsi/fcoe/fcoe_ctlr.c |  4 +-
 drivers/scsi/qedi/qedi_main.c |  2 +-
 fs/ceph/inode.c   |  2 +-
 fs/ceph/mdsmap.c  |  2 +-
 fs/ext4/super.c   |  7 +-
 fs/f2fs/gc.c  |  2 +-
 fs/f2fs/segment.c |  8 +-
 fs/ubifs/debug.c  |  8 +-
 fs/ubifs/lpt_commit.c | 12 +--
 fs/xfs/xfs_error.c|  2 +-
 kernel/bpf/core.c |  4 +-
 kernel/locking/test-ww_mutex.c|  4 +-
 kernel/time/clocksource.c |  2 +-
 lib/cmdline_kunit.c   |  4 +-
 lib/fault-inject.c|  2 +-
 lib/find_bit_benchmark.c  |  4 +-
 lib/reed_solomon/test_rslib.c |  4 +-
 lib/sbitmap.c |  2 +-
 lib/test-string_helpers.c |  2 +-
 lib/test_hexdump.c| 10 +--
 lib/test_kasan.c  |  6 +-
 lib/test_list_sort.c  |  2 +-
 mm/migrate.c  |  2 +-
 mm/slub.c |  2 +-
 net/ceph/mon_client.c |  2 +-
 net/ceph/osd_client.c |  2 +-
 net/core/neighbour.c  |  2 +-
 net/core/pktgen.c | 39 -
 net/core/stream.c |  2 +-
 net/ipv4/igmp.c   |  6 +-
 net/ipv4/inet_connection_sock.c   |  2 +-
 net/ipv6/addrconf.c   |  8 +-
 net/ipv6/mcast.c  | 10 +--
 net/netfilter/ipvs/ip_vs_twos.c   |  4 +-
 net/packet/af_packet.c|  2 +-
 net/sched/act_gact.c  |  2 +-
 net/sched/act_sample.c|  2 +-
 net/sched/sch_netem.c |  4 +-
 net/sctp/socket.c |  2 +-
 net/tipc/socket.c |  2 +-
 net/xfrm/xfrm_state.c |  2 +-
 67 files changed, 173 insertions(+), 177 deletions(-)

diff --git a/arch/arm/kernel/process.c b/arch/arm/kernel/process.c
index 3d9cace63884..35129ae36067 100644
--- a/arch/arm/kernel/process.c
+++ b/arch/arm/kernel/process.c
@@ -375,7 +375,7 @@ static unsigned long sigpage_addr(const struct mm_struct 
*mm,
 
slots = ((last - first) >> PAGE_SHIFT) + 1;
 
-   offset = get_random_int() % slots;
+   offset = prandom_u32_max(slots);
 
addr = first + (offset << PAGE_SHIFT);
 
diff --git 

[PATCH v4 0/6] treewide cleanup of random integer usage

2022-10-07 Thread Jason A. Donenfeld
Changes v3->v4:
- Split coccinelle mechanical step from non-mechanical step.
- Handle `get_random_int() & ~PAGE_MASK` -> `prandom_u32_max(PAGE_SIZE)`.

Hi folks,

This is a five part treewide cleanup of random integer handling. The
rules for random integers are:

- If you want a secure or an insecure random u64, use get_random_u64().
- If you want a secure or an insecure random u32, use get_random_u32().
  * The old function prandom_u32() has been deprecated for a while now
and is just a wrapper around get_random_u32(). Same for
get_random_int().
- If you want a secure or an insecure random u16, use get_random_u16().
- If you want a secure or an insecure random u8, use get_random_u8().
- If you want secure or insecure random bytes, use get_random_bytes().
  * The old function prandom_bytes() has been deprecated for a while now
and has long been a wrapper around get_random_bytes().
- If you want a non-uniform random u32, u16, or u8 bounded by a certain
  open interval maximum, use prandom_u32_max().
  * I say "non-uniform", because it doesn't do any rejection sampling or
divisions. Hence, it stays within the prandom_* namespace.

These rules ought to be applied uniformly, so that we can clean up the
deprecated functions, and earn the benefits of using the modern
functions. In particular, in addition to the boring substitutions, this
patchset accomplishes a few nice effects:

- By using prandom_u32_max() with an upper-bound that the compiler can
  prove at compile-time is ≤65536 or ≤256, internally get_random_u16()
  or get_random_u8() is used, which wastes fewer batched random bytes,
  and hence has higher throughput.

- By using prandom_u32_max() instead of %, when the upper-bound is not a
  constant, division is still avoided, because prandom_u32_max() uses
  a faster multiplication-based trick instead.

- By using get_random_u16() or get_random_u8() in cases where the return
  value is intended to indeed be a u16 or a u8, we waste fewer batched
  random bytes, and hence have higher throughput.

So, based on those rules and benefits from following them, this patchset
breaks down into the following five steps, which were done mostly
manually, but the first step was split into two patches, the first of
which is cocinelle-able:

1) Replace `prandom_u32() % max` and variants thereof with
   prandom_u32_max(max).

2) Replace `(type)get_random_u32()` and variants thereof with
   get_random_u16() or get_random_u8(). I took the pains to actually
   look and see what every lvalue type was across the entire tree.

3) Replace remaining deprecated uses of prandom_u32() and
   get_random_int() with get_random_u32(). 

4) Replace remaining deprecated uses of prandom_bytes() with
   get_random_bytes().

5) Remove the deprecated and now-unused prandom_u32() and
   prandom_bytes() inline wrapper functions.

I was thinking of taking this through my random.git tree (on which this
series is currently based) and submitting it near the end of the merge
window, or waiting for the very end of the 6.1 cycle when there will be
the fewest new patches brewing. If somebody with some treewide-cleanup
experience might share some wisdom about what the best timing usually
winds up being, I'm all ears.

Please take a look! At "379 insertions(+), 422 deletions(-)", this
should be a somewhat small patchset to review, despite it having the
scary "treewide" moniker on it.

Thanks,
Jason

Cc: Andreas Noever 
Cc: Andrew Morton 
Cc: Andy Shevchenko 
Cc: Borislav Petkov 
Cc: Catalin Marinas 
Cc: Christoph Böhmwalder 
Cc: Christoph Hellwig 
Cc: Christophe Leroy 
Cc: Daniel Borkmann 
Cc: Dave Airlie 
Cc: Dave Hansen 
Cc: David S. Miller 
Cc: Eric Dumazet 
Cc: Florian Westphal 
Cc: Greg Kroah-Hartman ,
Cc: H. Peter Anvin 
Cc: Heiko Carstens 
Cc: Helge Deller 
Cc: Herbert Xu 
Cc: Huacai Chen 
Cc: Hugh Dickins 
Cc: Jakub Kicinski 
Cc: James E.J. Bottomley 
Cc: Jan Kara 
Cc: Jason Gunthorpe 
Cc: Jens Axboe 
Cc: Johannes Berg 
Cc: Jonathan Corbet 
Cc: Jozsef Kadlecsik 
Cc: KP Singh 
Cc: Kees Cook 
Cc: Marco Elver 
Cc: Mauro Carvalho Chehab 
Cc: Michael Ellerman 
Cc: Pablo Neira Ayuso 
Cc: Paolo Abeni 
Cc: Peter Zijlstra 
Cc: Richard Weinberger 
Cc: Russell King 
Cc: Theodore Ts'o 
Cc: Thomas Bogendoerfer 
Cc: Thomas Gleixner 
Cc: Thomas Graf 
Cc: Ulf Hansson 
Cc: Vignesh Raghavendra 
Cc: WANG Xuerui 
Cc: Will Deacon 
Cc: Yury Norov 
Cc: dri-devel@lists.freedesktop.org
Cc: kasan-...@googlegroups.com
Cc: kernel-janit...@vger.kernel.org
Cc: linux-arm-ker...@lists.infradead.org
Cc: linux-bl...@vger.kernel.org
Cc: linux-cry...@vger.kernel.org
Cc: linux-...@vger.kernel.org
Cc: linux-fsde...@vger.kernel.org
Cc: linux-me...@vger.kernel.org
Cc: linux-m...@vger.kernel.org
Cc: linux...@kvack.org
Cc: linux-...@vger.kernel.org
Cc: linux-...@lists.infradead.org
Cc: linux-n...@lists.infradead.org
Cc: linux-par...@vger.kernel.org
Cc: linux-r...@vger.kernel.org
Cc: linux-s...@vger.kernel.org
Cc: linux...@lists.infradead.org
Cc: linux-...@vger.kernel.org

Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-07 Thread Jason A. Donenfeld
On Fri, Oct 07, 2022 at 10:12:42AM -0700, Kees Cook wrote:
> On Fri, Oct 07, 2022 at 08:07:58AM -0600, Jason A. Donenfeld wrote:
> > On Fri, Oct 07, 2022 at 04:57:24AM +, Christophe Leroy wrote:
> > > 
> > > 
> > > Le 07/10/2022 à 01:36, Jason A. Donenfeld a écrit :
> > > > On 10/6/22, Christophe Leroy  wrote:
> > > >>
> > > >>
> > > >> Le 06/10/2022 à 19:31, Christophe Leroy a écrit :
> > > >>>
> > > >>>
> > > >>> Le 06/10/2022 à 19:24, Jason A. Donenfeld a écrit :
> > >  Hi Christophe,
> > > 
> > >  On Thu, Oct 6, 2022 at 11:21 AM Christophe Leroy
> > >   wrote:
> > > > Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
> > > >> The prandom_u32() function has been a deprecated inline wrapper 
> > > >> around
> > > >> get_random_u32() for several releases now, and compiles down to the
> > > >> exact same code. Replace the deprecated wrapper with a direct call 
> > > >> to
> > > >> the real function. The same also applies to get_random_int(), 
> > > >> which is
> > > >> just a wrapper around get_random_u32().
> > > >>
> > > >> Reviewed-by: Kees Cook 
> > > >> Acked-by: Toke Høiland-Jørgensen  # for sch_cake
> > > >> Acked-by: Chuck Lever  # for nfsd
> > > >> Reviewed-by: Jan Kara  # for ext4
> > > >> Signed-off-by: Jason A. Donenfeld 
> > > >> ---
> > > >
> > > >> diff --git a/arch/powerpc/kernel/process.c
> > > >> b/arch/powerpc/kernel/process.c
> > > >> index 0fbda89cd1bb..9c4c15afbbe8 100644
> > > >> --- a/arch/powerpc/kernel/process.c
> > > >> +++ b/arch/powerpc/kernel/process.c
> > > >> @@ -2308,6 +2308,6 @@ void notrace __ppc64_runlatch_off(void)
> > > >> unsigned long arch_align_stack(unsigned long sp)
> > > >> {
> > > >> if (!(current->personality & ADDR_NO_RANDOMIZE) &&
> > > >> randomize_va_space)
> > > >> - sp -= get_random_int() & ~PAGE_MASK;
> > > >> + sp -= get_random_u32() & ~PAGE_MASK;
> > > >> return sp & ~0xf;
> > > >
> > > > Isn't that a candidate for prandom_u32_max() ?
> > > >
> > > > Note that sp is deemed to be 16 bytes aligned at all time.
> > > 
> > >  Yes, probably. It seemed non-trivial to think about, so I didn't. But
> > >  let's see here... maybe it's not too bad:
> > > 
> > >  If PAGE_MASK is always ~(PAGE_SIZE-1), then ~PAGE_MASK is
> > >  (PAGE_SIZE-1), so prandom_u32_max(PAGE_SIZE) should yield the same
> > >  thing? Is that accurate? And holds across platforms (this comes up a
> > >  few places)? If so, I'll do that for a v4.
> > > 
> > > >>>
> > > >>> On powerpc it is always (from arch/powerpc/include/asm/page.h) :
> > > >>>
> > > >>> /*
> > > >>>* Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if 
> > > >>> we
> > > >>>* assign PAGE_MASK to a larger type it gets extended the way we 
> > > >>> want
> > > >>>* (i.e. with 1s in the high bits)
> > > >>>*/
> > > >>> #define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))
> > > >>>
> > > >>> #define PAGE_SIZE(1UL << PAGE_SHIFT)
> > > >>>
> > > >>>
> > > >>> So it would work I guess.
> > > >>
> > > >> But taking into account that sp must remain 16 bytes aligned, would it
> > > >> be better to do something like ?
> > > >>
> > > >>sp -= prandom_u32_max(PAGE_SIZE >> 4) << 4;
> > > >>
> > > >>return sp;
> > > > 
> > > > Does this assume that sp is already aligned at the beginning of the
> > > > function? I'd assume from the function's name that this isn't the
> > > > case?
> > > 
> > > Ah you are right, I overlooked it.
> > 
> > So I think to stay on the safe side, I'm going to go with
> > `prandom_u32_max(PAGE_SIZE)`. Sound good?
> 
> Given these kinds of less mechanical changes, it may make sense to split
> these from the "trivial" conversions in a treewide patch. The chance of
> needing a revert from the simple 1:1 conversions is much lower than the
> need to revert by-hand changes.
> 
> The Cocci script I suggested in my v1 review gets 80% of the first
> patch, for example.

I'll split things up into a mechanical step and a non-mechanical step.
Good idea.

Jason


Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-07 Thread Kees Cook
On Fri, Oct 07, 2022 at 08:07:58AM -0600, Jason A. Donenfeld wrote:
> On Fri, Oct 07, 2022 at 04:57:24AM +, Christophe Leroy wrote:
> > 
> > 
> > Le 07/10/2022 à 01:36, Jason A. Donenfeld a écrit :
> > > On 10/6/22, Christophe Leroy  wrote:
> > >>
> > >>
> > >> Le 06/10/2022 à 19:31, Christophe Leroy a écrit :
> > >>>
> > >>>
> > >>> Le 06/10/2022 à 19:24, Jason A. Donenfeld a écrit :
> >  Hi Christophe,
> > 
> >  On Thu, Oct 6, 2022 at 11:21 AM Christophe Leroy
> >   wrote:
> > > Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
> > >> The prandom_u32() function has been a deprecated inline wrapper 
> > >> around
> > >> get_random_u32() for several releases now, and compiles down to the
> > >> exact same code. Replace the deprecated wrapper with a direct call to
> > >> the real function. The same also applies to get_random_int(), which 
> > >> is
> > >> just a wrapper around get_random_u32().
> > >>
> > >> Reviewed-by: Kees Cook 
> > >> Acked-by: Toke Høiland-Jørgensen  # for sch_cake
> > >> Acked-by: Chuck Lever  # for nfsd
> > >> Reviewed-by: Jan Kara  # for ext4
> > >> Signed-off-by: Jason A. Donenfeld 
> > >> ---
> > >
> > >> diff --git a/arch/powerpc/kernel/process.c
> > >> b/arch/powerpc/kernel/process.c
> > >> index 0fbda89cd1bb..9c4c15afbbe8 100644
> > >> --- a/arch/powerpc/kernel/process.c
> > >> +++ b/arch/powerpc/kernel/process.c
> > >> @@ -2308,6 +2308,6 @@ void notrace __ppc64_runlatch_off(void)
> > >> unsigned long arch_align_stack(unsigned long sp)
> > >> {
> > >> if (!(current->personality & ADDR_NO_RANDOMIZE) &&
> > >> randomize_va_space)
> > >> - sp -= get_random_int() & ~PAGE_MASK;
> > >> + sp -= get_random_u32() & ~PAGE_MASK;
> > >> return sp & ~0xf;
> > >
> > > Isn't that a candidate for prandom_u32_max() ?
> > >
> > > Note that sp is deemed to be 16 bytes aligned at all time.
> > 
> >  Yes, probably. It seemed non-trivial to think about, so I didn't. But
> >  let's see here... maybe it's not too bad:
> > 
> >  If PAGE_MASK is always ~(PAGE_SIZE-1), then ~PAGE_MASK is
> >  (PAGE_SIZE-1), so prandom_u32_max(PAGE_SIZE) should yield the same
> >  thing? Is that accurate? And holds across platforms (this comes up a
> >  few places)? If so, I'll do that for a v4.
> > 
> > >>>
> > >>> On powerpc it is always (from arch/powerpc/include/asm/page.h) :
> > >>>
> > >>> /*
> > >>>* Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if we
> > >>>* assign PAGE_MASK to a larger type it gets extended the way we want
> > >>>* (i.e. with 1s in the high bits)
> > >>>*/
> > >>> #define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))
> > >>>
> > >>> #define PAGE_SIZE(1UL << PAGE_SHIFT)
> > >>>
> > >>>
> > >>> So it would work I guess.
> > >>
> > >> But taking into account that sp must remain 16 bytes aligned, would it
> > >> be better to do something like ?
> > >>
> > >>  sp -= prandom_u32_max(PAGE_SIZE >> 4) << 4;
> > >>
> > >>  return sp;
> > > 
> > > Does this assume that sp is already aligned at the beginning of the
> > > function? I'd assume from the function's name that this isn't the
> > > case?
> > 
> > Ah you are right, I overlooked it.
> 
> So I think to stay on the safe side, I'm going to go with
> `prandom_u32_max(PAGE_SIZE)`. Sound good?

Given these kinds of less mechanical changes, it may make sense to split
these from the "trivial" conversions in a treewide patch. The chance of
needing a revert from the simple 1:1 conversions is much lower than the
need to revert by-hand changes.

The Cocci script I suggested in my v1 review gets 80% of the first
patch, for example.

-- 
Kees Cook


Re: [PATCH v2] drm/display: Don't assume dual mode adaptors support i2c sub-addressing

2022-10-07 Thread Ville Syrjälä
On Thu, Oct 06, 2022 at 06:11:37PM +0300, Ville Syrjälä wrote:
> On Thu, Oct 06, 2022 at 11:33:14AM +0200, Simon Rettberg wrote:
> > Current dual mode adaptor ("DP++") detection code assumes that all
> > adaptors support i2c sub-addressing for read operations from the
> > DP-HDMI adaptor ID buffer.  It has been observed that multiple
> > adaptors do not in fact support this, and always return data starting
> > at register 0.  On affected adaptors, the code fails to read the proper
> > registers that would identify the device as a type 2 adaptor, and
> > handles those as type 1, limiting the TMDS clock to 165MHz, even if
> > the according register would announce a higher TMDS clock.
> > Fix this by always reading the ID buffer starting from offset 0, and
> > discarding any bytes before the actual offset of interest.
> > 
> > We tried finding authoritative documentation on whether or not this is
> > allowed behaviour, but since all the official VESA docs are paywalled,
> > the best we could come up with was the spec sheet for Texas Instruments'
> > SNx5DP149 chip family.[1]  It explicitly mentions that sub-addressing is
> > supported for register writes, but *not* for reads (See NOTE in
> > section 8.5.3).  Unless TI openly decided to violate the VESA spec, one
> > could take that as a hint that sub-addressing is in fact not mandated
> > by VESA.
> > The other two adaptors affected used the PS8409(A) and the LT8611,
> > according to the data returned from their ID buffers.
> > 
> > [1] https://www.ti.com/lit/ds/symlink/sn75dp149.pdf
> > 
> > Signed-off-by: Simon Rettberg 
> > Reviewed-by: Rafael Gieschke 
> > ---
> > 
> > v2 changes form last submission's feedback (thanks for taking the time):
> > - Added a shortened version of our "background story" to the commit message
> > - Only use tmpbuf if the read offset is != 0
> 
> Bounced to intel-gfx to get the i915 CI to check it...

CI didn't blow up, and I also gave this a quick smoking on my end
with both type 1 HDMI and type 2 HDMI adaptors. 

I'm thinking we want a cc:stable on this? I can slap that on
when pushing if there are no objections?

> 
> > 
> >  .../gpu/drm/display/drm_dp_dual_mode_helper.c | 51 +++
> >  1 file changed, 29 insertions(+), 22 deletions(-)
> > 
> > diff --git a/drivers/gpu/drm/display/drm_dp_dual_mode_helper.c 
> > b/drivers/gpu/drm/display/drm_dp_dual_mode_helper.c
> > index 3ea53bb67d3b..bd61e20770a5 100644
> > --- a/drivers/gpu/drm/display/drm_dp_dual_mode_helper.c
> > +++ b/drivers/gpu/drm/display/drm_dp_dual_mode_helper.c
> > @@ -63,23 +63,45 @@
> >  ssize_t drm_dp_dual_mode_read(struct i2c_adapter *adapter,
> >   u8 offset, void *buffer, size_t size)
> >  {
> > +   u8 zero = 0;
> > +   char *tmpbuf = NULL;
> > +   /*
> > +* As sub-addressing is not supported by all adaptors,
> > +* always explicitly read from the start and discard
> > +* any bytes that come before the requested offset.
> > +* This way, no matter whether the adaptor supports it
> > +* or not, we'll end up reading the proper data.
> > +*/
> > struct i2c_msg msgs[] = {
> > {
> > .addr = DP_DUAL_MODE_SLAVE_ADDRESS,
> > .flags = 0,
> > .len = 1,
> > -   .buf = ,
> > +   .buf = ,
> > },
> > {
> > .addr = DP_DUAL_MODE_SLAVE_ADDRESS,
> > .flags = I2C_M_RD,
> > -   .len = size,
> > +   .len = size + offset,
> > .buf = buffer,
> > },
> > };
> > int ret;
> >  
> > +   if (offset) {
> > +   tmpbuf = kmalloc(size + offset, GFP_KERNEL);
> > +   if (!tmpbuf)
> > +   return -ENOMEM;
> > +
> > +   msgs[1].buf = tmpbuf;
> > +   }
> > +
> > ret = i2c_transfer(adapter, msgs, ARRAY_SIZE(msgs));
> > +   if (tmpbuf)
> > +   memcpy(buffer, tmpbuf + offset, size);
> > +
> > +   kfree(tmpbuf);
> > +
> > if (ret < 0)
> > return ret;
> > if (ret != ARRAY_SIZE(msgs))
> > @@ -208,18 +230,6 @@ enum drm_dp_dual_mode_type 
> > drm_dp_dual_mode_detect(const struct drm_device *dev,
> > if (ret)
> > return DRM_DP_DUAL_MODE_UNKNOWN;
> >  
> > -   /*
> > -* Sigh. Some (maybe all?) type 1 adaptors are broken and ack
> > -* the offset but ignore it, and instead they just always return
> > -* data from the start of the HDMI ID buffer. So for a broken
> > -* type 1 HDMI adaptor a single byte read will always give us
> > -* 0x44, and for a type 1 DVI adaptor it should give 0x00
> > -* (assuming it implements any registers). Fortunately neither
> > -* of those values will match the type 2 signature of the
> > -* DP_DUAL_MODE_ADAPTOR_ID register so we can proceed with
> > -* the type 2 adaptor detection safely even in the presence
> > -* of broken type 1 adaptors.
> > -*/

Re: [git pull] drm regression fix

2022-10-07 Thread pr-tracker-bot
The pull request you sent on Fri, 7 Oct 2022 13:03:25 +1000:

> git://anongit.freedesktop.org/drm/drm tags/drm-next-2022-10-07-1

has been merged into torvalds/linux.git:
https://git.kernel.org/torvalds/c/9d84bb40bcb30a7fa16f33baa967aeb9953dda78

Thank you!

-- 
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/prtracker.html


Re: [PATCH v13 5/9] drm/i915: Check for integer truncation on scatterlist creation

2022-10-07 Thread Gwan-gyeong Mun




On 9/28/22 8:09 PM, Linus Torvalds wrote:

On Wed, Sep 28, 2022 at 1:15 AM Gwan-gyeong Mun
 wrote:


+   if (check_assign(obj->base.size >> PAGE_SHIFT, ))
+   return -E2BIG;


I have to say, I find that new "check_assign()" macro use to be disgusting.

It's one thing to check for overflows.

It's another thing entirely to just assign something to a local variable.

This disgusting "let's check and assign" needs to die. It makes the
code a completely unreadable mess. The "user" wersion is even worse.

If you worry about overflow, then use a mix of

  (a) use a sufficiently large type to begin with

  (b) check for value range separately

and in this particular case, I also suspect that the whole range check
should have been somewhere else entirely - at the original creation of
that "obj" structure, not at one random end-point where it is used.

In other words, THIS WHOLE PATCH is just end-points checking the size
requirements of that "base.size" thing much too late, when it should
have been checked originally for some "maximum acceptable base size"
instead.

And that "maximum acceptable base size" should *not* be about "this is
the size of the variables we use". It should be a sanity check of
"this value is sane and fits in sane use cases".

Because "let's plug security checks" is most definitely not about
picking random assignments and saying "let's check this one". It's
about trying to catch things earlier than that.
Linus, but the size check of the object in the i915 is already done at 
the time of creation.
And this patch series is designed to prevent problems that may arise 
from the difference between the data structure used internally by 
drm/i915/ttm (unsigned long) and the data structure provided and used by 
the scatter/getter api (unsigned int).


The current implementation of the i915 uses sg_table / scatterlist to 
manage and use memory resources at a low level.
When creating an object of i915, it is based on drm_gem_object, which is 
the data structure of drm. The size of object is size_t [1][2].
And i915 uses ttm. the ttm_resource_manager manages resources with 
ttm_resource structure [3] for resource management.
When creating sgt with sg_alloc_table()[4] in i915, size of struct 
drm_gem_object[1] and num_pages of struct ttm_resource[3] are used as 
nents arguments.

(Of course, there are places that explicitly use unsigned int variables.)
Even where sg_alloc_table_from_pages_segment() [5] is used, there are 
places where the size of struct drm_gem_object [1] is used as the 
n_pages argument after bit shift operation with PAGE_SHIFT.


As above, when using drm, ttm, sgt infrastructure in i915, there is a 
type mismatch in size in the driver implementation.


Because the types are different, when assigning a value from a large 
type variable to a small type variable, if the overflow check is used as 
a safety guard in i915 before sgt api call, implicit value truncation 
will not occur when a problem occurs. The log output makes it easy to 
detect that a problem has already occurred before sgt apis are called.
When a bug related to this issue occurs, it will not delay the reporting 
of the problem of this issue.


Because the above one is one of a workaround solution, if the types used 
in the scatter/getter api would be changed to such as size_t or another 
configurable type, it would be a more proper solution. But it might 
affect lots of drivers and frameworks. therefore I suggest a current 
solution before the changing of sgt area.



Br,

G.G.


[1]
struct drm_gem_object {
...
size_t size;
...

[2]
#ifndef __kernel_size_t
#if __BITS_PER_LONG != 64
typedef unsigned int__kernel_size_t;
#else
typedef __kernel_ulong_t __kernel_size_t;

typedef __kernel_size_tsize_t;

[3]
struct ttm_resource {
unsigned long start;
unsigned long num_pages;
uint32_t mem_type;
uint32_t placement;
struct ttm_bus_placement bus;
struct ttm_buffer_object *bo;

/**
 * @lru: Least recently used list, see _resource_manager.lru
 */
struct list_head lru;
};


[4] int sg_alloc_table(struct sg_table *table, unsigned int nents, gfp_t 
gfp_mask)


[5] int sg_alloc_table_from_pages_segment(struct sg_table *sgt, struct 
page **pages,

unsigned int n_pages, unsigned int offset,
unsigned long size, unsigned int max_segment,
gfp_t gfp_mask)





Kees, you need to reign in the craziness in overflow.h.

  Linus


Re: [PATCH v13 5/9] drm/i915: Check for integer truncation on scatterlist creation

2022-10-07 Thread Gwan-gyeong Mun
Linus and Kees, I also understood that I should not make and use the 
macro that performs assignment and checking at the same time, and I will 
drop it and update it.


Kees, the overflows_type() macro had several updates as input from you 
and the community, and there is an advantage when moving to Linux common.
What are your thoughts on continuing to add the overflows_type() macro 
to overflows.h?


Br,

G.G.

On 9/28/22 9:06 PM, Kees Cook wrote:

On Wed, Sep 28, 2022 at 10:09:04AM -0700, Linus Torvalds wrote:

Kees, you need to reign in the craziness in overflow.h.


Understood. I've been trying to help the drm folks walk a line between
having a bunch of custom macros hidden away in the drm includes and
building up generalized versions that are actually helpful beyond drm.
But I can see that it doesn't help to have a "do two things at the same
time" macro for the assignment checking.



Re: [PATCH v13 5/9] drm/i915: Check for integer truncation on scatterlist creation

2022-10-07 Thread Gwan-gyeong Mun




On 9/28/22 11:51 AM, Jani Nikula wrote:

On Wed, 28 Sep 2022, Gwan-gyeong Mun  wrote:

diff --git a/drivers/gpu/drm/i915/i915_scatterlist.h 
b/drivers/gpu/drm/i915/i915_scatterlist.h
index 9ddb3e743a3e..1d1802beb42b 100644
--- a/drivers/gpu/drm/i915/i915_scatterlist.h
+++ b/drivers/gpu/drm/i915/i915_scatterlist.h
@@ -220,4 +220,15 @@ struct i915_refct_sgt 
*i915_rsgt_from_buddy_resource(struct ttm_resource *res,
 u64 region_start,
 u32 page_alignment);
  
+/* Wrap scatterlist.h to sanity check for integer truncation */

+typedef unsigned int __sg_size_t; /* see linux/scatterlist.h */
+#define sg_alloc_table(sgt, nents, gfp) \
+   overflows_type(nents, __sg_size_t) ? -E2BIG \
+   : ((sg_alloc_table)(sgt, (__sg_size_t)(nents), gfp))
+
+#define sg_alloc_table_from_pages_segment(sgt, pages, npages, offset, size, 
max_segment, gfp) \
+   overflows_type(npages, __sg_size_t) ? -E2BIG \
+   : ((sg_alloc_table_from_pages_segment)(sgt, pages, 
(__sg_size_t)(npages), offset, \
+  size, max_segment, gfp))
+
  #endif


No. I don't think we should shadow sg_alloc_table() and
sg_alloc_table_from_pages_segment().

Either get this in scatterlist.h (preferred) or prefix with i915_ or
whatever to indicate it's our local thing.

i915_scatterlist.h already has too much scatterlist "namespace" abuse
that I'd rather see gone than violated more.



Hi Jani,
Thanks for you comments.

I will update this patch by removing the shadowing of 
sg_alloc_table_from_pages_segment() / sg_alloc_table(), so that the 
caller checks when overflow checking is required.


Br,
G.G.


BR,
Jani.





RE: [v6] drm/msm/disp/dpu1: add support for dspp sub block flush in sc7280

2022-10-07 Thread Kalyan Thota


>-Original Message-
>From: Dmitry Baryshkov 
>Sent: Tuesday, October 4, 2022 8:03 PM
>To: Kalyan Thota (QUIC) 
>Cc: dri-devel@lists.freedesktop.org; linux-arm-...@vger.kernel.org;
>freedr...@lists.freedesktop.org; devicet...@vger.kernel.org; linux-
>ker...@vger.kernel.org; robdcl...@gmail.com; diand...@chromium.org;
>swb...@chromium.org; Vinod Polimera (QUIC) ;
>Abhinav Kumar (QUIC) 
>Subject: Re: [v6] drm/msm/disp/dpu1: add support for dspp sub block flush in
>sc7280
>
>WARNING: This email originated from outside of Qualcomm. Please be wary of
>any links or attachments, and do not enable macros.
>
>On Sun, 2 Oct 2022 at 06:15, Kalyan Thota  wrote:
>>
>> Flush mechanism for DSPP blocks has changed in sc7280 family, it
>> allows individual sub blocks to be flushed in coordination with master
>> flush control.
>>
>> Representation: master_flush && (PCC_flush | IGC_flush .. etc )
>>
>> This change adds necessary support for the above design.
>>
>> Changes in v1:
>> - Few nits (Doug, Dmitry)
>> - Restrict sub-block flush programming to dpu_hw_ctl file (Dmitry)
>>
>> Changes in v2:
>> - Move the address offset to flush macro (Dmitry)
>> - Seperate ops for the sub block flush (Dmitry)
>>
>> Changes in v3:
>> - Reuse the DPU_DSPP_xx enum instead of a new one (Dmitry)
>>
>> Changes in v4:
>> - Use shorter version for unsigned int (Stephen)
>>
>> Changes in v5:
>> - Spurious patch please ignore.
>>
>> Changes in v6:
>> - Add SOB tag (Doug, Dmitry)
>>
>> Signed-off-by: Kalyan Thota 
>> Reviewed-by: Dmitry Baryshkov 
>> ---
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c   |  2 +-
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c |  5 +++-
>> drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h |  4 +++
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c | 35
>--
>>  drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.h | 10 ++--
>>  5 files changed, 50 insertions(+), 6 deletions(-)
>>
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
>> b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
>> index 601d687..4170fbe 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_crtc.c
>> @@ -766,7 +766,7 @@ static void _dpu_crtc_setup_cp_blocks(struct
>> drm_crtc *crtc)
>>
>> /* stage config flush mask */
>> ctl->ops.update_pending_flush_dspp(ctl,
>> -   mixer[i].hw_dspp->idx);
>> +   mixer[i].hw_dspp->idx, DPU_DSPP_PCC);
>> }
>>  }
>>
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
>> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
>> index 27f029f..0eecb2f 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.c
>> @@ -65,7 +65,10 @@
>> (PINGPONG_SDM845_MASK | BIT(DPU_PINGPONG_TE2))
>>
>>  #define CTL_SC7280_MASK \
>> -   (BIT(DPU_CTL_ACTIVE_CFG) | BIT(DPU_CTL_FETCH_ACTIVE) |
>BIT(DPU_CTL_VM_CFG))
>> +   (BIT(DPU_CTL_ACTIVE_CFG) | \
>> +BIT(DPU_CTL_FETCH_ACTIVE) | \
>> +BIT(DPU_CTL_VM_CFG) | \
>> +BIT(DPU_CTL_DSPP_SUB_BLOCK_FLUSH))
>>
>>  #define MERGE_3D_SM8150_MASK (0)
>>
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
>> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
>> index 38aa38a..8148e91 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_catalog.h
>> @@ -161,10 +161,12 @@ enum {
>>   * DSPP sub-blocks
>>   * @DPU_DSPP_PCC Panel color correction block
>>   * @DPU_DSPP_GC  Gamma correction block
>> + * @DPU_DSPP_IGC Inverse Gamma correction block
>>   */
>>  enum {
>> DPU_DSPP_PCC = 0x1,
>> DPU_DSPP_GC,
>> +   DPU_DSPP_IGC,
>> DPU_DSPP_MAX
>>  };
>>
>> @@ -191,6 +193,7 @@ enum {
>>   * @DPU_CTL_SPLIT_DISPLAY: CTL supports video mode split display
>>   * @DPU_CTL_FETCH_ACTIVE:  Active CTL for fetch HW (SSPPs)
>>   * @DPU_CTL_VM_CFG:CTL config to support multiple VMs
>> + * @DPU_CTL_DSPP_BLOCK_FLUSH: CTL config to support dspp sub-block
>> + flush
>>   * @DPU_CTL_MAX
>>   */
>>  enum {
>> @@ -198,6 +201,7 @@ enum {
>> DPU_CTL_ACTIVE_CFG,
>> DPU_CTL_FETCH_ACTIVE,
>> DPU_CTL_VM_CFG,
>> +   DPU_CTL_DSPP_SUB_BLOCK_FLUSH,
>> DPU_CTL_MAX
>>  };
>>
>> diff --git a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
>> b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
>> index a35ecb6..f26f484 100644
>> --- a/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
>> +++ b/drivers/gpu/drm/msm/disp/dpu1/dpu_hw_ctl.c
>> @@ -33,6 +33,7 @@
>>  #define   CTL_INTF_FLUSH0x110
>>  #define   CTL_INTF_MASTER   0x134
>>  #define   CTL_FETCH_PIPE_ACTIVE 0x0FC
>> +#define   CTL_DSPP_n_FLUSH(n)  ((0x13C) + ((n - 1) * 4))
>>
>>  #define CTL_MIXER_BORDER_OUTBIT(24)
>>  #define CTL_FLUSH_MASK_CTL  BIT(17)
>> @@ -287,8 +288,9 @@ static void
>> 

Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-07 Thread Jason A. Donenfeld
On Fri, Oct 07, 2022 at 04:57:24AM +, Christophe Leroy wrote:
> 
> 
> Le 07/10/2022 à 01:36, Jason A. Donenfeld a écrit :
> > On 10/6/22, Christophe Leroy  wrote:
> >>
> >>
> >> Le 06/10/2022 à 19:31, Christophe Leroy a écrit :
> >>>
> >>>
> >>> Le 06/10/2022 à 19:24, Jason A. Donenfeld a écrit :
>  Hi Christophe,
> 
>  On Thu, Oct 6, 2022 at 11:21 AM Christophe Leroy
>   wrote:
> > Le 06/10/2022 à 18:53, Jason A. Donenfeld a écrit :
> >> The prandom_u32() function has been a deprecated inline wrapper around
> >> get_random_u32() for several releases now, and compiles down to the
> >> exact same code. Replace the deprecated wrapper with a direct call to
> >> the real function. The same also applies to get_random_int(), which is
> >> just a wrapper around get_random_u32().
> >>
> >> Reviewed-by: Kees Cook 
> >> Acked-by: Toke Høiland-Jørgensen  # for sch_cake
> >> Acked-by: Chuck Lever  # for nfsd
> >> Reviewed-by: Jan Kara  # for ext4
> >> Signed-off-by: Jason A. Donenfeld 
> >> ---
> >
> >> diff --git a/arch/powerpc/kernel/process.c
> >> b/arch/powerpc/kernel/process.c
> >> index 0fbda89cd1bb..9c4c15afbbe8 100644
> >> --- a/arch/powerpc/kernel/process.c
> >> +++ b/arch/powerpc/kernel/process.c
> >> @@ -2308,6 +2308,6 @@ void notrace __ppc64_runlatch_off(void)
> >> unsigned long arch_align_stack(unsigned long sp)
> >> {
> >> if (!(current->personality & ADDR_NO_RANDOMIZE) &&
> >> randomize_va_space)
> >> - sp -= get_random_int() & ~PAGE_MASK;
> >> + sp -= get_random_u32() & ~PAGE_MASK;
> >> return sp & ~0xf;
> >
> > Isn't that a candidate for prandom_u32_max() ?
> >
> > Note that sp is deemed to be 16 bytes aligned at all time.
> 
>  Yes, probably. It seemed non-trivial to think about, so I didn't. But
>  let's see here... maybe it's not too bad:
> 
>  If PAGE_MASK is always ~(PAGE_SIZE-1), then ~PAGE_MASK is
>  (PAGE_SIZE-1), so prandom_u32_max(PAGE_SIZE) should yield the same
>  thing? Is that accurate? And holds across platforms (this comes up a
>  few places)? If so, I'll do that for a v4.
> 
> >>>
> >>> On powerpc it is always (from arch/powerpc/include/asm/page.h) :
> >>>
> >>> /*
> >>>* Subtle: (1 << PAGE_SHIFT) is an int, not an unsigned long. So if we
> >>>* assign PAGE_MASK to a larger type it gets extended the way we want
> >>>* (i.e. with 1s in the high bits)
> >>>*/
> >>> #define PAGE_MASK  (~((1 << PAGE_SHIFT) - 1))
> >>>
> >>> #define PAGE_SIZE(1UL << PAGE_SHIFT)
> >>>
> >>>
> >>> So it would work I guess.
> >>
> >> But taking into account that sp must remain 16 bytes aligned, would it
> >> be better to do something like ?
> >>
> >>sp -= prandom_u32_max(PAGE_SIZE >> 4) << 4;
> >>
> >>return sp;
> > 
> > Does this assume that sp is already aligned at the beginning of the
> > function? I'd assume from the function's name that this isn't the
> > case?
> 
> Ah you are right, I overlooked it.

So I think to stay on the safe side, I'm going to go with
`prandom_u32_max(PAGE_SIZE)`. Sound good?

Jason


Re: [PATCH v2 3/7] dt-bindings: reserved-memory: Support framebuffer reserved memory

2022-10-07 Thread Rob Herring
On Fri, 07 Oct 2022 14:49:42 +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> Document the "framebuffer" compatible string for reserved memory nodes
> to annotate reserved memory regions used for framebuffer carveouts.
> 
> Signed-off-by: Thierry Reding 
> ---
> Changes in v2:
> - use four spaces for indentation in example (as recommended elsewhere)
> - add explicit root node
> - drop unneeded quotes
> 
>  .../bindings/reserved-memory/framebuffer.yaml | 52 +++
>  1 file changed, 52 insertions(+)
>  create mode 100644 
> Documentation/devicetree/bindings/reserved-memory/framebuffer.yaml
> 

Reviewed-by: Rob Herring 


Re: [PATCH v2 2/7] dt-bindings: display: simple-framebuffer: Document 32-bit BGR format

2022-10-07 Thread Rob Herring
On Fri, 07 Oct 2022 14:49:41 +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> This is a variant of the 32-bit RGB format where the red and blue
> components are swapped.
> 
> Signed-off-by: Thierry Reding 
> ---
>  .../devicetree/bindings/display/simple-framebuffer.yaml | 2 ++
>  1 file changed, 2 insertions(+)
> 

Reviewed-by: Rob Herring 


Re: [PATCH v2 1/7] dt-bindings: display: simple-framebuffer: Support system memory framebuffers

2022-10-07 Thread Rob Herring
On Fri, 07 Oct 2022 14:49:40 +0200, Thierry Reding wrote:
> From: Thierry Reding 
> 
> In order to support framebuffers residing in system memory, allow the
> memory-region property to override the framebuffer memory specification
> in the "reg" property.
> 
> Signed-off-by: Thierry Reding 
> ---
>  .../devicetree/bindings/display/simple-framebuffer.yaml  | 5 +
>  1 file changed, 5 insertions(+)
> 

Reviewed-by: Rob Herring 


Re: [Freedreno] [PATCH] drm/bridge: adv7533: remove dynamic lane switching from adv7533 bridge

2022-10-07 Thread Robert Foss
On Thu, 6 Oct 2022 at 17:07, Abhinav Kumar  wrote:
>
> Hi Robert
>
> Thanks for the review.
>
> On 10/4/2022 8:55 AM, Robert Foss wrote:
> > On Mon, 29 Aug 2022 at 20:23, Abhinav Kumar  
> > wrote:
> >>
> >> adv7533 bridge tries to dynamically switch lanes based on the
> >> mode by detaching and attaching the mipi dsi device.
> >>
> >> This approach is incorrect because this method of dynamic switch of
> >> detaching and attaching the mipi dsi device also results in removing
> >> and adding the component which is not necessary.
> >>
> >> This approach is also prone to deadlocks. So for example, on the
> >> db410c whenever this path is executed with lockdep enabled,
> >> this results in a deadlock due to below ordering of locks.
> >>
> >> -> #1 (crtc_ww_class_acquire){+.+.}-{0:0}:
> >>  lock_acquire+0x6c/0x90
> >>  drm_modeset_acquire_init+0xf4/0x150
> >>  drmm_mode_config_init+0x220/0x770
> >>  msm_drm_bind+0x13c/0x654
> >>  try_to_bring_up_aggregate_device+0x164/0x1d0
> >>  __component_add+0xa8/0x174
> >>  component_add+0x18/0x2c
> >>  dsi_dev_attach+0x24/0x30
> >>  dsi_host_attach+0x98/0x14c
> >>  devm_mipi_dsi_attach+0x38/0xb0
> >>  adv7533_attach_dsi+0x8c/0x110
> >>  adv7511_probe+0x5a0/0x930
> >>  i2c_device_probe+0x30c/0x350
> >>  really_probe.part.0+0x9c/0x2b0
> >>  __driver_probe_device+0x98/0x144
> >>  driver_probe_device+0xac/0x14c
> >>  __device_attach_driver+0xbc/0x124
> >>  bus_for_each_drv+0x78/0xd0
> >>  __device_attach+0xa8/0x1c0
> >>  device_initial_probe+0x18/0x24
> >>  bus_probe_device+0xa0/0xac
> >>  deferred_probe_work_func+0x90/0xd0
> >>  process_one_work+0x28c/0x6b0
> >>  worker_thread+0x240/0x444
> >>  kthread+0x110/0x114
> >>  ret_from_fork+0x10/0x20
> >>
> >> -> #0 (component_mutex){+.+.}-{3:3}:
> >>  __lock_acquire+0x1280/0x20ac
> >>  lock_acquire.part.0+0xe0/0x230
> >>  lock_acquire+0x6c/0x90
> >>  __mutex_lock+0x84/0x400
> >>  mutex_lock_nested+0x3c/0x70
> >>  component_del+0x34/0x170
> >>  dsi_dev_detach+0x24/0x30
> >>  dsi_host_detach+0x20/0x64
> >>  mipi_dsi_detach+0x2c/0x40
> >>  adv7533_mode_set+0x64/0x90
> >>  adv7511_bridge_mode_set+0x210/0x214
> >>  drm_bridge_chain_mode_set+0x5c/0x84
> >>  crtc_set_mode+0x18c/0x1dc
> >>  drm_atomic_helper_commit_modeset_disables+0x40/0x50
> >>  msm_atomic_commit_tail+0x1d0/0x6e0
> >>  commit_tail+0xa4/0x180
> >>  drm_atomic_helper_commit+0x178/0x3b0
> >>  drm_atomic_commit+0xa4/0xe0
> >>  drm_client_modeset_commit_atomic+0x228/0x284
> >>  drm_client_modeset_commit_locked+0x64/0x1d0
> >>  drm_client_modeset_commit+0x34/0x60
> >>  drm_fb_helper_lastclose+0x74/0xcc
> >>  drm_lastclose+0x3c/0x80
> >>  drm_release+0xfc/0x114
> >>  __fput+0x70/0x224
> >>  fput+0x14/0x20
> >>  task_work_run+0x88/0x1a0
> >>  do_exit+0x350/0xa50
> >>  do_group_exit+0x38/0xa4
> >>  __wake_up_parent+0x0/0x34
> >>  invoke_syscall+0x48/0x114
> >>  el0_svc_common.constprop.0+0x60/0x11c
> >>  do_el0_svc+0x30/0xc0
> >>  el0_svc+0x58/0x100
> >>  el0t_64_sync_handler+0x1b0/0x1bc
> >>  el0t_64_sync+0x18c/0x190
> >>
> >> Due to above reasons, remove the dynamic lane switching
> >> code from adv7533 bridge chip and filter out the modes
> >> which would need different number of lanes as compared
> >> to the initialization time using the mode_valid callback.
> >>
> >> This can be potentially re-introduced by using the pre_enable()
> >> callback but this needs to be evaluated first whether such an
> >> approach will work so this will be done with a separate change.
> >>
> >> changes since RFC:
> >>  - Fix commit text and add TODO comment
> >>
> >> Fixes: 62b2f026cd8e ("drm/bridge: adv7533: Change number of DSI lanes 
> >> dynamically")
> >> Closes: https://gitlab.freedesktop.org/drm/msm/-/issues/16
> >> Suggested-by: Dmitry Baryshkov 
> >> Signed-off-by: Abhinav Kumar 
> >> ---
> >>   drivers/gpu/drm/bridge/adv7511/adv7511.h |  3 ++-
> >>   drivers/gpu/drm/bridge/adv7511/adv7511_drv.c | 18 ++
> >>   drivers/gpu/drm/bridge/adv7511/adv7533.c | 25 
> >> +
> >>   3 files changed, 29 insertions(+), 17 deletions(-)
> >>
> >> diff --git a/drivers/gpu/drm/bridge/adv7511/adv7511.h 
> >> b/drivers/gpu/drm/bridge/adv7511/adv7511.h
> >> index 9e3bb8a8ee40..0a7cec80b75d 100644
> >> --- a/drivers/gpu/drm/bridge/adv7511/adv7511.h
> >> +++ b/drivers/gpu/drm/bridge/adv7511/adv7511.h
> >> @@ -417,7 +417,8 @@ static inline int adv7511_cec_init(struct device *dev, 
> >> struct adv7511 *adv7511)
> >>
> >>   void adv7533_dsi_power_on(struct adv7511 

Re: [PATCH V2 3/4] drm/panel: Add prepare_upstream_first flag to drm_panel

2022-10-07 Thread Dave Stevenson
Hi Jagan

On Thu, 6 Oct 2022 at 15:25, Jagan Teki  wrote:
>
> On Fri, Mar 4, 2022 at 8:48 PM Dave Stevenson
>  wrote:
> >
> > Mapping to the drm_bridge flag pre_enable_upstream_first,
> > add a new flag prepare_upstream_first to drm_panel to allow
> > the panel driver to request that the upstream bridge should
> > be pre_enabled before the panel prepare.
> >
> > Signed-off-by: Dave Stevenson 
> > ---
> >  drivers/gpu/drm/bridge/panel.c |  3 +++
> >  include/drm/drm_panel.h| 10 ++
> >  2 files changed, 13 insertions(+)
> >
> > diff --git a/drivers/gpu/drm/bridge/panel.c b/drivers/gpu/drm/bridge/panel.c
> > index 5be057575183..2ea08b3ba326 100644
> > --- a/drivers/gpu/drm/bridge/panel.c
> > +++ b/drivers/gpu/drm/bridge/panel.c
> > @@ -234,6 +234,9 @@ struct drm_bridge *drm_panel_bridge_add_typed(struct 
> > drm_panel *panel,
> > panel_bridge->bridge.ops = DRM_BRIDGE_OP_MODES;
> > panel_bridge->bridge.type = connector_type;
> >
> > +   panel_bridge->bridge.pre_enable_upstream_first =
> > +   
> > panel->prepare_upstream_first;
> > +
>
> panel_bridge is common for bridge users who used panel and those who
> might not need upstream first, so better to handle per bridge user
> whoever needs this.

Sorry, I don't follow you.

prepare_upstream_first is coming from a struct drm_panel, generally
for DSI panels are still panel drivers. An example would be Ilitek
ILI9881C.
It's a feature of the panel that would dictate that they want their
source initialised first.

The source bridge for the panel will call devm_drm_of_get_bridge,
which will call devm_drm_panel_bridge_add. Nothing outside of those
two functions have both the panel and bridge handles, so are you
proposing that the assignment should be done in devm_drm_of_get_bridge
[1]? That would leave the behaviour of drivers calling
drm_panel_bridge_add(_typed) directly as they were.

Thanks.
  Dave

[1] 
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/bridge/panel.c#L418


[PATCH v2 7/7] arm64: tegra: Add simple framebuffer on Jetson Xavier NX

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

Add the framebuffer carveout reserved memory node as well as a simple-
framebuffer node that is used to bind to the framebuffer that the
bootloader has set up.

Signed-off-by: Thierry Reding 
---
Changes in v2:
- clear out dynamic fields and leave it up to firmware to fill them in
- mark simple-framebuffer node as disabled by default

 .../nvidia/tegra194-p3509-+p3668-0001.dts | 33 +++
 arch/arm64/boot/dts/nvidia/tegra194.dtsi  |  2 +-
 2 files changed, 34 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/boot/dts/nvidia/tegra194-p3509-+p3668-0001.dts 
b/arch/arm64/boot/dts/nvidia/tegra194-p3509-+p3668-0001.dts
index 238fd98e8e45..44600479ea5f 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194-p3509-+p3668-0001.dts
+++ b/arch/arm64/boot/dts/nvidia/tegra194-p3509-+p3668-0001.dts
@@ -7,4 +7,37 @@
 / {
model = "NVIDIA Jetson Xavier NX Developer Kit (eMMC)";
compatible = "nvidia,p3509-+p3668-0001", "nvidia,tegra194";
+
+   chosen {
+   framebuffer {
+   compatible = "simple-framebuffer";
+   status = "disabled";
+   memory-region = <>;
+   power-domains = < TEGRA194_POWER_DOMAIN_DISP>;
+   clocks = < TEGRA194_CLK_SOR1_REF>,
+< TEGRA194_CLK_SOR1_OUT>,
+< TEGRA194_CLK_SOR1_PAD_CLKOUT>,
+< TEGRA194_CLK_PLLD2>,
+< TEGRA194_CLK_PLLDP>,
+< TEGRA194_CLK_NVDISPLAY_DISP>,
+< TEGRA194_CLK_NVDISPLAYHUB>,
+< TEGRA194_CLK_NVDISPLAY_P0>;
+   width = <0>;
+   height = <0>;
+   stride = <0>;
+   format = "";
+   };
+   };
+
+   reserved-memory {
+   #address-cells = <2>;
+   #size-cells = <2>;
+   ranges;
+
+   fb: framebuffer@0,0 {
+   compatible = "framebuffer";
+   reg = <0x0 0x0 0x0 0x0>;
+   iommu-addresses = < 0x0 0x0 0x0 0x0>;
+   };
+   };
 };
diff --git a/arch/arm64/boot/dts/nvidia/tegra194.dtsi 
b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
index 4d10b6b5324d..a961d42c81d6 100644
--- a/arch/arm64/boot/dts/nvidia/tegra194.dtsi
+++ b/arch/arm64/boot/dts/nvidia/tegra194.dtsi
@@ -1960,7 +1960,7 @@ display-hub@1520 {
 
ranges = <0x1520 0x1520 0x4>;
 
-   display@1520 {
+   dc0: display@1520 {
compatible = "nvidia,tegra194-dc";
reg = <0x1520 0x1>;
interrupts = ;
-- 
2.37.3



[PATCH v2 2/7] dt-bindings: display: simple-framebuffer: Document 32-bit BGR format

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

This is a variant of the 32-bit RGB format where the red and blue
components are swapped.

Signed-off-by: Thierry Reding 
---
 .../devicetree/bindings/display/simple-framebuffer.yaml | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/simple-framebuffer.yaml 
b/Documentation/devicetree/bindings/display/simple-framebuffer.yaml
index 3e9857eb002e..3c9f29e428a4 100644
--- a/Documentation/devicetree/bindings/display/simple-framebuffer.yaml
+++ b/Documentation/devicetree/bindings/display/simple-framebuffer.yaml
@@ -99,6 +99,7 @@ properties:
 * `x1r5g5b5` - 16-bit pixels, d[14:10]=r, d[9:5]=g, d[4:0]=b
 * `x2r10g10b10` - 32-bit pixels, d[29:20]=r, d[19:10]=g, d[9:0]=b
 * `x8r8g8b8` - 32-bit pixels, d[23:16]=r, d[15:8]=g, d[7:0]=b
+* `x8b8g8r8` - 32-bit pixels, d[23:16]=b, d[15:8]=g, d[7:0]=r
 enum:
   - a1r5g5b5
   - a2r10g10b10
@@ -110,6 +111,7 @@ properties:
   - x1r5g5b5
   - x2r10g10b10
   - x8r8g8b8
+  - x8b8g8r8
 
   display:
 $ref: /schemas/types.yaml#/definitions/phandle
-- 
2.37.3



[PATCH v2 6/7] drm/simpledrm: Support the XB24/AB24 format

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

Add XB24 and AB24 to the list of supported formats. The format helpers
support conversion to these formats and they are documented in the
simple-framebuffer device tree bindings.

v2: treat AB24 as XB24 and support both at the same time

Signed-off-by: Thierry Reding 
---
 drivers/gpu/drm/tiny/simpledrm.c   | 2 ++
 include/linux/platform_data/simplefb.h | 1 +
 2 files changed, 3 insertions(+)

diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index cf36f67d22e4..ecb303c89320 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -559,6 +559,8 @@ static int simpledrm_device_init_mm(struct simpledrm_device 
*sdev)
 static const uint32_t simpledrm_primary_plane_formats[] = {
DRM_FORMAT_XRGB,
DRM_FORMAT_ARGB,
+   DRM_FORMAT_XBGR,
+   DRM_FORMAT_ABGR,
DRM_FORMAT_RGB565,
//DRM_FORMAT_XRGB1555,
//DRM_FORMAT_ARGB1555,
diff --git a/include/linux/platform_data/simplefb.h 
b/include/linux/platform_data/simplefb.h
index 27ea99af6e1d..4f94d52ac99f 100644
--- a/include/linux/platform_data/simplefb.h
+++ b/include/linux/platform_data/simplefb.h
@@ -22,6 +22,7 @@
{ "r8g8b8", 24, {16, 8}, {8, 8}, {0, 8}, {0, 0}, DRM_FORMAT_RGB888 }, \
{ "x8r8g8b8", 32, {16, 8}, {8, 8}, {0, 8}, {0, 0}, DRM_FORMAT_XRGB 
}, \
{ "a8r8g8b8", 32, {16, 8}, {8, 8}, {0, 8}, {24, 8}, DRM_FORMAT_ARGB 
}, \
+   { "x8b8g8r8", 32, {0, 8}, {8, 8}, {16, 8}, {0, 0}, DRM_FORMAT_XBGR 
}, \
{ "a8b8g8r8", 32, {0, 8}, {8, 8}, {16, 8}, {24, 8}, DRM_FORMAT_ABGR 
}, \
{ "x2r10g10b10", 32, {20, 10}, {10, 10}, {0, 10}, {0, 0}, 
DRM_FORMAT_XRGB2101010 }, \
{ "a2r10g10b10", 32, {20, 10}, {10, 10}, {0, 10}, {30, 2}, 
DRM_FORMAT_ARGB2101010 }, \
-- 
2.37.3



[PATCH v2 5/7] drm/format-helper: Support the XB24 format

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

Add a conversion helper for the XB24 format to use in drm_fb_blit().

v2: support XB24 format instead of AB24

Signed-off-by: Thierry Reding 
---
 drivers/gpu/drm/drm_format_helper.c | 37 +
 1 file changed, 37 insertions(+)

diff --git a/drivers/gpu/drm/drm_format_helper.c 
b/drivers/gpu/drm/drm_format_helper.c
index e2f76621453c..8a7c200ecba9 100644
--- a/drivers/gpu/drm/drm_format_helper.c
+++ b/drivers/gpu/drm/drm_format_helper.c
@@ -503,6 +503,36 @@ static void drm_fb_rgb888_to_xrgb(struct iosys_map 
*dst, const unsigned int
drm_fb_rgb888_to_xrgb_line);
 }
 
+static void drm_fb_xrgb_to_xbgr_line(void *dbuf, const void *sbuf, 
unsigned int pixels)
+{
+   __le32 *dbuf32 = dbuf;
+   const __le32 *sbuf32 = sbuf;
+   unsigned int x;
+   u32 pix;
+
+   for (x = 0; x < pixels; x++) {
+   pix = le32_to_cpu(sbuf32[x]);
+   pix = ((pix & 0x00ff) >> 16) <<  0 |
+ ((pix & 0xff00) >>  8) <<  8 |
+ ((pix & 0x00ff) >>  0) << 16 |
+ 0xff << 24;
+   *dbuf32++ = cpu_to_le32(pix);
+   }
+}
+
+static void drm_fb_xrgb_to_xbgr(struct iosys_map *dst, const unsigned 
int *dst_pitch,
+   const struct iosys_map *src,
+   const struct drm_framebuffer *fb,
+   const struct drm_rect *clip)
+{
+   static const u8 dst_pixsize[DRM_FORMAT_MAX_PLANES] = {
+   4,
+   };
+
+   drm_fb_xfrm(dst, dst_pitch, dst_pixsize, src, fb, clip, false,
+   drm_fb_xrgb_to_xbgr_line);
+}
+
 static void drm_fb_xrgb_to_xrgb2101010_line(void *dbuf, const void *sbuf, 
unsigned int pixels)
 {
__le32 *dbuf32 = dbuf;
@@ -646,6 +676,8 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned int 
*dst_pitch, uint32_t d
fb_format = DRM_FORMAT_XRGB;
if (dst_format == DRM_FORMAT_ARGB)
dst_format = DRM_FORMAT_XRGB;
+   if (dst_format == DRM_FORMAT_ABGR)
+   dst_format = DRM_FORMAT_XBGR;
if (fb_format == DRM_FORMAT_ARGB2101010)
fb_format = DRM_FORMAT_XRGB2101010;
if (dst_format == DRM_FORMAT_ARGB2101010)
@@ -673,6 +705,11 @@ int drm_fb_blit(struct iosys_map *dst, const unsigned int 
*dst_pitch, uint32_t d
drm_fb_rgb565_to_xrgb(dst, dst_pitch, src, fb, 
clip);
return 0;
}
+   } else if (dst_format == DRM_FORMAT_XBGR) {
+   if (fb_format == DRM_FORMAT_XRGB) {
+   drm_fb_xrgb_to_xbgr(dst, dst_pitch, src, fb, 
clip);
+   return 0;
+   }
} else if (dst_format == DRM_FORMAT_XRGB2101010) {
if (fb_format == DRM_FORMAT_XRGB) {
drm_fb_xrgb_to_xrgb2101010(dst, dst_pitch, src, fb, 
clip);
-- 
2.37.3



[PATCH v2 4/7] drm/simpledrm: Add support for system memory framebuffers

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

Simple framebuffers can be set up in system memory, which cannot be
requested and/or I/O remapped using the I/O resource helpers. Add a
separate code path that obtains system memory framebuffers from the
reserved memory region referenced in the memory-region property.

v2: make screen base a struct iosys_map to avoid sparse warnings

Signed-off-by: Thierry Reding 
---
 drivers/gpu/drm/tiny/simpledrm.c | 177 ---
 1 file changed, 141 insertions(+), 36 deletions(-)

diff --git a/drivers/gpu/drm/tiny/simpledrm.c b/drivers/gpu/drm/tiny/simpledrm.c
index 18489779fb8a..cf36f67d22e4 100644
--- a/drivers/gpu/drm/tiny/simpledrm.c
+++ b/drivers/gpu/drm/tiny/simpledrm.c
@@ -2,6 +2,7 @@
 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -207,7 +208,9 @@ struct simpledrm_device {
unsigned int pitch;
 
/* memory management */
-   void __iomem *screen_base;
+   struct iosys_map screen_base;
+   phys_addr_t sysmem_start;
+   size_t sysmem_size;
 
/* modesetting */
uint32_t formats[8];
@@ -441,6 +444,106 @@ static int simpledrm_device_init_regulators(struct 
simpledrm_device *sdev)
 }
 #endif
 
+/*
+ * Memory management
+ */
+
+static int simpledrm_device_init_mm_sysmem(struct simpledrm_device *sdev, 
phys_addr_t start,
+  size_t size)
+{
+   struct drm_device *dev = >dev;
+   phys_addr_t end = start + size - 1;
+   void *screen_base;
+
+   drm_info(dev, "using system memory framebuffer at [%pa-%pa]\n", , 
);
+
+   screen_base = devm_memremap(dev->dev, start, size, MEMREMAP_WC);
+   if (!screen_base)
+   return -ENOMEM;
+
+   iosys_map_set_vaddr(>screen_base, screen_base);
+
+   return 0;
+}
+
+static int simpledrm_device_init_mm_io(struct simpledrm_device *sdev, 
phys_addr_t start,
+  size_t size)
+{
+   struct drm_device *dev = >dev;
+   phys_addr_t end = start + size - 1;
+   void __iomem *screen_base;
+   struct resource *mem;
+
+   drm_info(dev, "using I/O memory framebuffer at [%pa-%pa]\n", , 
);
+
+   mem = devm_request_mem_region(dev->dev, start, size, 
sdev->dev.driver->name);
+   if (!mem) {
+   /*
+* We cannot make this fatal. Sometimes this comes from magic
+* spaces our resource handlers simply don't know about. Use
+* the I/O-memory resource as-is and try to map that instead.
+*/
+   drm_warn(dev, "could not acquire memory region [%pa-%pa]\n", 
, );
+   } else {
+   size = resource_size(mem);
+   start = mem->start;
+   }
+
+   screen_base = devm_ioremap_wc(dev->dev, start, size);
+   if (!screen_base)
+   return -ENOMEM;
+
+   iosys_map_set_vaddr_iomem(>screen_base, screen_base);
+
+   return 0;
+}
+
+static void simpledrm_device_exit_mm(void *data)
+{
+   struct simpledrm_device *sdev = data;
+   struct drm_device *dev = >dev;
+
+   of_reserved_mem_device_release(dev->dev);
+}
+
+static int simpledrm_device_init_mm(struct simpledrm_device *sdev)
+{
+   int (*init)(struct simpledrm_device *sdev, phys_addr_t start, size_t 
size);
+   struct drm_device *dev = >dev;
+   struct platform_device *pdev = to_platform_device(dev->dev);
+   phys_addr_t start, end;
+   size_t size;
+   int ret;
+
+   ret = of_reserved_mem_device_init_by_idx(dev->dev, dev->dev->of_node, 
0);
+   if (ret) {
+   struct resource *res;
+
+   res = platform_get_resource(pdev, IORESOURCE_MEM, 0);
+   if (!res)
+   return -EINVAL;
+
+   init = simpledrm_device_init_mm_io;
+   size = resource_size(res);
+   start = res->start;
+   } else {
+   devm_add_action_or_reset(dev->dev, simpledrm_device_exit_mm, 
sdev);
+   init = simpledrm_device_init_mm_sysmem;
+   start = sdev->sysmem_start;
+   size = sdev->sysmem_size;
+   }
+
+   end = start + size - 1;
+
+   ret = devm_aperture_acquire_from_firmware(dev, start, size);
+   if (ret) {
+   drm_err(dev, "could not acquire memory range [%pa-%pa]: %d\n", 
, , ret);
+   return ret;
+   }
+
+   return init(sdev, start, size);
+}
+
 /*
  * Modesetting
  */
@@ -491,15 +594,15 @@ static void 
simpledrm_primary_plane_helper_atomic_update(struct drm_plane *plane
 
drm_atomic_helper_damage_iter_init(, old_plane_state, plane_state);
drm_atomic_for_each_plane_damage(, ) {
-   struct iosys_map dst = IOSYS_MAP_INIT_VADDR(sdev->screen_base);
struct drm_rect dst_clip = plane_state->dst;
 
if (!drm_rect_intersect(_clip, ))
continue;
 
-   iosys_map_incr(, 

[PATCH v2 3/7] dt-bindings: reserved-memory: Support framebuffer reserved memory

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

Document the "framebuffer" compatible string for reserved memory nodes
to annotate reserved memory regions used for framebuffer carveouts.

Signed-off-by: Thierry Reding 
---
Changes in v2:
- use four spaces for indentation in example (as recommended elsewhere)
- add explicit root node
- drop unneeded quotes

 .../bindings/reserved-memory/framebuffer.yaml | 52 +++
 1 file changed, 52 insertions(+)
 create mode 100644 
Documentation/devicetree/bindings/reserved-memory/framebuffer.yaml

diff --git a/Documentation/devicetree/bindings/reserved-memory/framebuffer.yaml 
b/Documentation/devicetree/bindings/reserved-memory/framebuffer.yaml
new file mode 100644
index ..05b6648b3458
--- /dev/null
+++ b/Documentation/devicetree/bindings/reserved-memory/framebuffer.yaml
@@ -0,0 +1,52 @@
+# SPDX-License-Identifier: (GPL-2.0 OR BSD-2-Clause)
+%YAML 1.2
+---
+$id: http://devicetree.org/schemas/reserved-memory/framebuffer.yaml#
+$schema: http://devicetree.org/meta-schemas/core.yaml#
+
+title: /reserved-memory framebuffer node bindings
+
+maintainers:
+  - devicetree-s...@vger.kernel.org
+
+allOf:
+  - $ref: reserved-memory.yaml
+
+properties:
+  compatible:
+const: framebuffer
+description: >
+  This indicates a region of memory meant to be used as a framebuffer for
+  a set of display devices. It can be used by an operating system to keep
+  the framebuffer from being overwritten and use it as the backing memory
+  for a display device (such as simple-framebuffer).
+
+unevaluatedProperties: false
+
+examples:
+  - |
+/ {
+compatible = "foo";
+model = "foo";
+#address-cells = <1>;
+#size-cells = <1>;
+
+chosen {
+framebuffer {
+compatible = "simple-framebuffer";
+memory-region = <>;
+};
+};
+
+reserved-memory {
+#address-cells = <1>;
+#size-cells = <1>;
+ranges;
+
+fb: framebuffer@8000 {
+compatible = "framebuffer";
+reg = <0x8000 0x007e9000>;
+};
+};
+};
+...
-- 
2.37.3



[PATCH v2 1/7] dt-bindings: display: simple-framebuffer: Support system memory framebuffers

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

In order to support framebuffers residing in system memory, allow the
memory-region property to override the framebuffer memory specification
in the "reg" property.

Signed-off-by: Thierry Reding 
---
 .../devicetree/bindings/display/simple-framebuffer.yaml  | 5 +
 1 file changed, 5 insertions(+)

diff --git a/Documentation/devicetree/bindings/display/simple-framebuffer.yaml 
b/Documentation/devicetree/bindings/display/simple-framebuffer.yaml
index dd64f70b5014..3e9857eb002e 100644
--- a/Documentation/devicetree/bindings/display/simple-framebuffer.yaml
+++ b/Documentation/devicetree/bindings/display/simple-framebuffer.yaml
@@ -63,6 +63,11 @@ properties:
   reg:
 description: Location and size of the framebuffer memory
 
+  memory-region:
+maxItems: 1
+description: Phandle to a node describing the memory to be used for the
+  framebuffer. If present, overrides the "reg" property (if one exists).
+
   clocks:
 description: List of clocks used by the framebuffer.
 
-- 
2.37.3



[PATCH v2 0/7] drm/simpledrm: Support system memory framebuffers

2022-10-07 Thread Thierry Reding
From: Thierry Reding 

Hi,

this series of patches adds support for framebuffers residing in system
memory to the simple-framebuffer DRM driver. To do this, the DT bindings
are extended do accept the memory-region property in addition to the reg
property for specifying the framebuffer memory. This is done because the
framebuffer memory will typically also need to be marked as reserved so
that the operating system will not reuse it and the memory-region
property is the standard property to reference reserved memory regions.

A new compatible string is documented to annotate the framebuffer memory
regions and the simpledrm driver has code added to bind such annotated
regions to the simple-framebuffer device.

The second half of the series then adds support for the XB24 and AB24
formats and ties it all together to provide a simple-framebuffer on
Jetson Xavier NX. It should be noted, though, that the Jetson Xavier NX
device tree nodes are placeholders only and it is expected that firmware
or a bootloader will fill these in at runtime, due to the variable
nature of the values that they contain.

This example also uses (but doesn't depend on) the iommu-addresses
property that has been proposed and which will hopefully be merged soon.

Version 1 of these patches can be found here:


https://lore.kernel.org/all/20220905163300.391692-1-thierry.red...@gmail.com/

Changes in v2:
- DT fields are now cleared so that they can be filled in at runtime
- add XB24 support and treat AB24 the same (alpha bits are unused)
- consistently use struct iosys_map
- fix issues with DT bindings

I've tested these with a simple UEFI implementation that will fill in
the placeholder values and set the simple-framebuffer's status property
to "okay".

Thierry

Thierry Reding (7):
  dt-bindings: display: simple-framebuffer: Support system memory
framebuffers
  dt-bindings: display: simple-framebuffer: Document 32-bit BGR format
  dt-bindings: reserved-memory: Support framebuffer reserved memory
  drm/simpledrm: Add support for system memory framebuffers
  drm/format-helper: Support the XB24 format
  drm/simpledrm: Support the XB24/AB24 format
  arm64: tegra: Add simple framebuffer on Jetson Xavier NX

 .../bindings/display/simple-framebuffer.yaml  |   7 +
 .../bindings/reserved-memory/framebuffer.yaml |  52 +
 .../nvidia/tegra194-p3509-+p3668-0001.dts |  33 
 arch/arm64/boot/dts/nvidia/tegra194.dtsi  |   2 +-
 drivers/gpu/drm/drm_format_helper.c   |  37 
 drivers/gpu/drm/tiny/simpledrm.c  | 179 ++
 include/linux/platform_data/simplefb.h|   1 +
 7 files changed, 274 insertions(+), 37 deletions(-)
 create mode 100644 
Documentation/devicetree/bindings/reserved-memory/framebuffer.yaml

-- 
2.37.3



[PATCH v5 1/2] drm/atomic-helper: Don't allocate new plane state in CRTC check

2022-10-07 Thread Thomas Zimmermann
In drm_atomic_helper_check_crtc_state(), do not add a new plane state
to the global state if it does not exist already. Adding a new plane
state will result in overhead for the plane during the atomic-commit
step.

For the test in drm_atomic_helper_check_crtc_state() to succeed, it
is important that the CRTC has an enabled primary plane after the
commit. Simply testing the CRTC state's plane_mask for a primary plane
is sufficient.

Note that the helper still only tests for an attached primary plane.
Drivers have to ensure that the plane contains valid pixel information.

v5:
* fix commit description (Javier)
v3:
* test for a primary plane in plane_mask (Ville)
v2:
* remove unnecessary test for plane->crtc (Ville)
* inline drm_atomic_get_next_plane_state() (Ville)
* acquire plane lock before accessing plane->state (Ville)

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
Fixes: d6b9af1097fe ("drm/atomic-helper: Add helper 
drm_atomic_helper_check_crtc_state()")
Cc: Thomas Zimmermann 
Cc: Jocelyn Falempe 
Cc: Maarten Lankhorst 
Cc: Maxime Ripard 
Cc: David Airlie 
Cc: Daniel Vetter 
Cc: dri-devel@lists.freedesktop.org
---
 drivers/gpu/drm/drm_atomic_helper.c | 10 +-
 1 file changed, 1 insertion(+), 9 deletions(-)

diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 98cc3137c062..02b4a7dc92f5 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -945,7 +945,6 @@ int drm_atomic_helper_check_crtc_state(struct 
drm_crtc_state *crtc_state,
   bool can_disable_primary_planes)
 {
struct drm_device *dev = crtc_state->crtc->dev;
-   struct drm_atomic_state *state = crtc_state->state;
 
if (!crtc_state->enable)
return 0;
@@ -956,14 +955,7 @@ int drm_atomic_helper_check_crtc_state(struct 
drm_crtc_state *crtc_state,
struct drm_plane *plane;
 
drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
-   struct drm_plane_state *plane_state;
-
-   if (plane->type != DRM_PLANE_TYPE_PRIMARY)
-   continue;
-   plane_state = drm_atomic_get_plane_state(state, plane);
-   if (IS_ERR(plane_state))
-   return PTR_ERR(plane_state);
-   if (plane_state->fb && plane_state->crtc) {
+   if (plane->type == DRM_PLANE_TYPE_PRIMARY) {
has_primary_plane = true;
break;
}
-- 
2.37.3



[PATCH v5 2/2] drm/atomic-helper: Replace drm_atomic_helper_check_crtc_state()

2022-10-07 Thread Thomas Zimmermann
Rename the atomic helper function drm_atomic_helper_check_crtc_state()
to drm_atomic_helper_check_crtc_primary_plane() and only check for an
attached primary plane. Adapt callers.

Instead of having one big function to check for various CRTC state
conditions, we rather want smaller functions that drivers can pick
individually.

v5:
* rebase on top of udl changes

Signed-off-by: Thomas Zimmermann 
Reviewed-by: Javier Martinez Canillas 
---
 drivers/gpu/drm/ast/ast_mode.c  |  8 ++--
 drivers/gpu/drm/drm_atomic_helper.c | 52 +
 drivers/gpu/drm/drm_simple_kms_helper.c |  6 ++-
 drivers/gpu/drm/mgag200/mgag200_mode.c  |  8 ++--
 drivers/gpu/drm/solomon/ssd130x.c   |  6 ++-
 drivers/gpu/drm/tiny/simpledrm.c|  6 ++-
 drivers/gpu/drm/udl/udl_modeset.c   |  5 ++-
 include/drm/drm_atomic_helper.h |  3 +-
 8 files changed, 46 insertions(+), 48 deletions(-)

diff --git a/drivers/gpu/drm/ast/ast_mode.c b/drivers/gpu/drm/ast/ast_mode.c
index 8ff998da71ef..d5ee3ad538a8 100644
--- a/drivers/gpu/drm/ast/ast_mode.c
+++ b/drivers/gpu/drm/ast/ast_mode.c
@@ -1161,13 +1161,13 @@ static int ast_crtc_helper_atomic_check(struct drm_crtc 
*crtc,
bool succ;
int ret;
 
-   ret = drm_atomic_helper_check_crtc_state(crtc_state, false);
-   if (ret)
-   return ret;
-
if (!crtc_state->enable)
goto out;
 
+   ret = drm_atomic_helper_check_crtc_primary_plane(crtc_state);
+   if (ret)
+   return ret;
+
ast_state = to_ast_crtc_state(crtc_state);
 
format = ast_state->format;
diff --git a/drivers/gpu/drm/drm_atomic_helper.c 
b/drivers/gpu/drm/drm_atomic_helper.c
index 02b4a7dc92f5..1a586b3c454b 100644
--- a/drivers/gpu/drm/drm_atomic_helper.c
+++ b/drivers/gpu/drm/drm_atomic_helper.c
@@ -924,51 +924,35 @@ int drm_atomic_helper_check_plane_state(struct 
drm_plane_state *plane_state,
 EXPORT_SYMBOL(drm_atomic_helper_check_plane_state);
 
 /**
- * drm_atomic_helper_check_crtc_state() - Check CRTC state for validity
+ * drm_atomic_helper_check_crtc_primary_plane() - Check CRTC state for primary 
plane
  * @crtc_state: CRTC state to check
- * @can_disable_primary_planes: can the CRTC be enabled without a primary 
plane?
  *
- * Checks that a desired CRTC update is valid. Drivers that provide
- * their own CRTC handling rather than helper-provided implementations may
- * still wish to call this function to avoid duplication of error checking
- * code.
- *
- * Note that @can_disable_primary_planes only tests if the CRTC can be
- * enabled without a primary plane. To test if a primary plane can be updated
- * without a CRTC, use drm_atomic_helper_check_plane_state() in the plane's
- * atomic check.
+ * Checks that a CRTC has at least one primary plane attached to it, which is
+ * a requirement on some hardware. Note that this only involves the CRTC side
+ * of the test. To test if the primary plane is visible or if it can be updated
+ * without the CRTC being enabled, use drm_atomic_helper_check_plane_state() in
+ * the plane's atomic check.
  *
  * RETURNS:
- * Zero if update appears valid, error code on failure
+ * 0 if a primary plane is attached to the CRTC, or an error code otherwise
  */
-int drm_atomic_helper_check_crtc_state(struct drm_crtc_state *crtc_state,
-  bool can_disable_primary_planes)
+int drm_atomic_helper_check_crtc_primary_plane(struct drm_crtc_state 
*crtc_state)
 {
-   struct drm_device *dev = crtc_state->crtc->dev;
-
-   if (!crtc_state->enable)
-   return 0;
+   struct drm_crtc *crtc = crtc_state->crtc;
+   struct drm_device *dev = crtc->dev;
+   struct drm_plane *plane;
 
/* needs at least one primary plane to be enabled */
-   if (!can_disable_primary_planes) {
-   bool has_primary_plane = false;
-   struct drm_plane *plane;
-
-   drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
-   if (plane->type == DRM_PLANE_TYPE_PRIMARY) {
-   has_primary_plane = true;
-   break;
-   }
-   }
-   if (!has_primary_plane) {
-   drm_dbg_kms(dev, "Cannot enable CRTC without a primary 
plane.\n");
-   return -EINVAL;
-   }
+   drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
+   if (plane->type == DRM_PLANE_TYPE_PRIMARY)
+   return 0;
}
 
-   return 0;
+   drm_dbg_atomic(dev, "[CRTC:%d:%s] primary plane missing\n", 
crtc->base.id, crtc->name);
+
+   return -EINVAL;
 }
-EXPORT_SYMBOL(drm_atomic_helper_check_crtc_state);
+EXPORT_SYMBOL(drm_atomic_helper_check_crtc_primary_plane);
 
 /**
  * drm_atomic_helper_check_planes - validate state object for planes changes
diff --git a/drivers/gpu/drm/drm_simple_kms_helper.c 

[PATCH v5 0/2] drm/atomic-helpers: Fix CRTC primary-plane test

2022-10-07 Thread Thomas Zimmermann
Fix the test for the CRTC's primary plane and clean up the test function
to only do the test.

v5:
* fix commit message of patch 1 (Javier)
* rebase onto latest drm-tip
v4:
* clean up the helper function (Ville)

Thomas Zimmermann (2):
  drm/atomic-helper: Don't allocate new plane state in CRTC check
  drm/atomic-helper: Replace drm_atomic_helper_check_crtc_state()

 drivers/gpu/drm/ast/ast_mode.c  |  8 ++--
 drivers/gpu/drm/drm_atomic_helper.c | 60 -
 drivers/gpu/drm/drm_simple_kms_helper.c |  6 ++-
 drivers/gpu/drm/mgag200/mgag200_mode.c  |  8 ++--
 drivers/gpu/drm/solomon/ssd130x.c   |  6 ++-
 drivers/gpu/drm/tiny/simpledrm.c|  6 ++-
 drivers/gpu/drm/udl/udl_modeset.c   |  5 ++-
 include/drm/drm_atomic_helper.h |  3 +-
 8 files changed, 46 insertions(+), 56 deletions(-)

-- 
2.37.3



Re: [PATCH v2 1/2] drivers: gpu: drm: add driver for samsung s6e3fc2x01 cmd mode panel

2022-10-07 Thread Fabio Estevam
Hi Nia,

On Fri, Oct 7, 2022 at 8:16 AM Nia Espera  wrote:

> +static int samsung_s6e3fc2x01_prepare(struct drm_panel *panel)
> +{
> +   struct samsung_s6e3fc2x01 *ctx = to_samsung_s6e3fc2x01(panel);
> +   struct device *dev = >dsi->dev;
> +   int ret;
> +
> +   if (ctx->prepared)
> +   return 0;
> +
> +   ret = regulator_enable(ctx->supply);
> +   if (ret < 0) {
> +   dev_err(dev, "Failed to enable regulator: %d\n", ret);
> +   return ret;
> +   }
> +
> +   samsung_s6e3fc2x01_reset(ctx);
> +
> +   ret = samsung_s6e3fc2x01_on(ctx);
> +   if (ret < 0) {
> +   dev_err(dev, "Failed to initialize panel: %d\n", ret);
> +   gpiod_set_value_cansleep(ctx->reset_gpio, 1);

You should also call regulator_disable() here in the case of failure.

> +static int samsung_s6e3fc2x01_unprepare(struct drm_panel *panel)
> +{
> +   struct samsung_s6e3fc2x01 *ctx = to_samsung_s6e3fc2x01(panel);
> +   struct device *dev = >dsi->dev;
> +   int ret;
> +
> +   if (!ctx->prepared)
> +   return 0;
> +
> +   ret = samsung_s6e3fc2x01_off(ctx);
> +   if (ret < 0)
> +   dev_err(dev, "Failed to un-initialize panel: %d\n", ret);
> +
> +   gpiod_set_value_cansleep(ctx->reset_gpio, 1);

regulator_disable() should be called here as well.


[PATCH v2 2/2] drivers: gpu: drm: remove support for sofef00 driver on s6e3fc2x01 panel

2022-10-07 Thread Nia Espera
Removes functionality from sofef00 panel driver which allowed it to
drive the s6e3fc2x01 panel

Signed-off-by: Nia Espera 
Reviewed-by: Caleb Connolly 
---
 drivers/gpu/drm/panel/Kconfig |  6 +++---
 drivers/gpu/drm/panel/panel-samsung-sofef00.c | 18 --
 2 files changed, 3 insertions(+), 21 deletions(-)

diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
index ee62d5d8828a..62b9cb6acd05 100644
--- a/drivers/gpu/drm/panel/Kconfig
+++ b/drivers/gpu/drm/panel/Kconfig
@@ -547,16 +547,16 @@ config DRM_PANEL_SAMSUNG_S6E8AA0
select VIDEOMODE_HELPERS
 
 config DRM_PANEL_SAMSUNG_SOFEF00
-   tristate "Samsung sofef00/s6e3fc2x01 OnePlus 6/6T DSI cmd mode panels"
+   tristate "Samsung sofef00 OnePlus 6 DSI cmd mode panel"
depends on OF
depends on DRM_MIPI_DSI
depends on BACKLIGHT_CLASS_DEVICE
select VIDEOMODE_HELPERS
help
  Say Y or M here if you want to enable support for the Samsung AMOLED
- command mode panels found in the OnePlus 6/6T smartphones.
+ command mode panel found in the OnePlus 6 smartphone.
 
- The panels are 2280x1080@60Hz and 2340x1080@60Hz respectively
+ The panel is 2280x1080@60Hz
 
 config DRM_PANEL_SAMSUNG_S6E3FC2X01
tristate "Samsung s6e3fc2x01 OnePlus 6T DSI cmd mode panel"
diff --git a/drivers/gpu/drm/panel/panel-samsung-sofef00.c 
b/drivers/gpu/drm/panel/panel-samsung-sofef00.c
index bd02af81a4fe..68e58b9b8c5c 100644
--- a/drivers/gpu/drm/panel/panel-samsung-sofef00.c
+++ b/drivers/gpu/drm/panel/panel-samsung-sofef00.c
@@ -181,20 +181,6 @@ static const struct drm_display_mode enchilada_panel_mode 
= {
.height_mm = 145,
 };
 
-static const struct drm_display_mode fajita_panel_mode = {
-   .clock = (1080 + 72 + 16 + 36) * (2340 + 32 + 4 + 18) * 60 / 1000,
-   .hdisplay = 1080,
-   .hsync_start = 1080 + 72,
-   .hsync_end = 1080 + 72 + 16,
-   .htotal = 1080 + 72 + 16 + 36,
-   .vdisplay = 2340,
-   .vsync_start = 2340 + 32,
-   .vsync_end = 2340 + 32 + 4,
-   .vtotal = 2340 + 32 + 4 + 18,
-   .width_mm = 68,
-   .height_mm = 145,
-};
-
 static int sofef00_panel_get_modes(struct drm_panel *panel, struct 
drm_connector *connector)
 {
struct drm_display_mode *mode;
@@ -327,10 +313,6 @@ static const struct of_device_id sofef00_panel_of_match[] 
= {
.compatible = "samsung,sofef00",
.data = _panel_mode,
},
-   { // OnePlus 6T / fajita
-   .compatible = "samsung,s6e3fc2x01",
-   .data = _panel_mode,
-   },
{ /* sentinel */ }
 };
 MODULE_DEVICE_TABLE(of, sofef00_panel_of_match);
-- 
2.38.0



[PATCH v2 1/2] drivers: gpu: drm: add driver for samsung s6e3fc2x01 cmd mode panel

2022-10-07 Thread Nia Espera
Adds a dedicated driver for the Samsung s6e3fc2x01 panel used in OnePlus
6T smartphones which was previously driven by the sofef00 panel driver

Signed-off-by: Nia Espera 
Reviewed-by: Caleb Connolly 
---
 MAINTAINERS   |   5 +
 drivers/gpu/drm/panel/Kconfig |  11 +
 drivers/gpu/drm/panel/Makefile|   1 +
 .../gpu/drm/panel/panel-samsung-s6e3fc2x01.c  | 396 ++
 4 files changed, 413 insertions(+)
 create mode 100644 drivers/gpu/drm/panel/panel-samsung-s6e3fc2x01.c

diff --git a/MAINTAINERS b/MAINTAINERS
index 936490dcc97b..7e9455ac5a13 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -6630,6 +6630,11 @@ S:   Maintained
 F: Documentation/devicetree/bindings/display/panel/samsung,s6d27a1.yaml
 F: drivers/gpu/drm/panel/panel-samsung-s6d27a1.c
 
+DRM DRIVER FOR SAMSUNG S6E3FC2X01 PANELS
+M: Nia Espera 
+S: Maintained
+F: drivers/gpu/drm/panel/panel-samsung-s6e3fc2x01.c
+
 DRM DRIVER FOR SITRONIX ST7703 PANELS
 M: Guido Günther 
 R: Purism Kernel Team 
diff --git a/drivers/gpu/drm/panel/Kconfig b/drivers/gpu/drm/panel/Kconfig
index 9a281120363c..ee62d5d8828a 100644
--- a/drivers/gpu/drm/panel/Kconfig
+++ b/drivers/gpu/drm/panel/Kconfig
@@ -558,6 +558,17 @@ config DRM_PANEL_SAMSUNG_SOFEF00
 
  The panels are 2280x1080@60Hz and 2340x1080@60Hz respectively
 
+config DRM_PANEL_SAMSUNG_S6E3FC2X01
+   tristate "Samsung s6e3fc2x01 OnePlus 6T DSI cmd mode panel"
+   depends on OF
+   depends on DRM_MIPI_DSI
+   depends on BACKLIGHT_CLASS_DEVICE
+   select VIDEOMODE_HELPERS
+ Say Y or M here if you want to enable support for the Samsung AMOLED
+ command mode panel found in the OnePlus 6T smartphone.
+
+ The panel is 2340x1080@60Hz
+
 config DRM_PANEL_SEIKO_43WVF1G
tristate "Seiko 43WVF1G panel"
depends on OF
diff --git a/drivers/gpu/drm/panel/Makefile b/drivers/gpu/drm/panel/Makefile
index 6d493b9d64fe..b54de8812e91 100644
--- a/drivers/gpu/drm/panel/Makefile
+++ b/drivers/gpu/drm/panel/Makefile
@@ -56,6 +56,7 @@ obj-$(CONFIG_DRM_PANEL_SAMSUNG_S6E63M0_DSI) += 
panel-samsung-s6e63m0-dsi.o
 obj-$(CONFIG_DRM_PANEL_SAMSUNG_S6E88A0_AMS452EF01) += 
panel-samsung-s6e88a0-ams452ef01.o
 obj-$(CONFIG_DRM_PANEL_SAMSUNG_S6E8AA0) += panel-samsung-s6e8aa0.o
 obj-$(CONFIG_DRM_PANEL_SAMSUNG_SOFEF00) += panel-samsung-sofef00.o
+obj-$(CONFIG_DRM_PANEL_SAMSUNG_S6E3FC2X01) += panel-samsung-s6e3fc2x01.o
 obj-$(CONFIG_DRM_PANEL_SEIKO_43WVF1G) += panel-seiko-43wvf1g.o
 obj-$(CONFIG_DRM_PANEL_SHARP_LQ101R1SX01) += panel-sharp-lq101r1sx01.o
 obj-$(CONFIG_DRM_PANEL_SHARP_LS037V7DW01) += panel-sharp-ls037v7dw01.o
diff --git a/drivers/gpu/drm/panel/panel-samsung-s6e3fc2x01.c 
b/drivers/gpu/drm/panel/panel-samsung-s6e3fc2x01.c
new file mode 100644
index ..54b05703ebe4
--- /dev/null
+++ b/drivers/gpu/drm/panel/panel-samsung-s6e3fc2x01.c
@@ -0,0 +1,396 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/* Copyright (c) 2022 Nia Espera 
+ * Generated with linux-mdss-dsi-panel-driver-generator from vendor device 
tree:
+ * Copyright (c) 2022, The Linux Foundation. All rights reserved.
+ */
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include 
+#include 
+#include 
+
+struct samsung_s6e3fc2x01 {
+   struct drm_panel panel;
+   struct mipi_dsi_device *dsi;
+   struct regulator *supply;
+   struct gpio_desc *reset_gpio;
+   const struct drm_display_mode *mode;
+   bool prepared;
+};
+
+static inline
+struct samsung_s6e3fc2x01 *to_samsung_s6e3fc2x01(struct drm_panel *panel)
+{
+   return container_of(panel, struct samsung_s6e3fc2x01, panel);
+}
+
+#define dsi_dcs_write_seq(dsi, seq...) do {\
+   static const u8 d[] = { seq };  \
+   int ret;\
+   ret = mipi_dsi_dcs_write_buffer(dsi, d, ARRAY_SIZE(d)); \
+   if (ret < 0)\
+   return ret; \
+   } while (0)
+
+static void samsung_s6e3fc2x01_reset(struct samsung_s6e3fc2x01 *ctx)
+{
+   gpiod_set_value_cansleep(ctx->reset_gpio, 0);
+   usleep_range(5000, 6000);
+   gpiod_set_value_cansleep(ctx->reset_gpio, 1);
+   usleep_range(2000, 3000);
+   gpiod_set_value_cansleep(ctx->reset_gpio, 0);
+   usleep_range(1, 11000);
+}
+
+static int samsung_s6e3fc2x01_on(struct samsung_s6e3fc2x01 *ctx)
+{
+   struct mipi_dsi_device *dsi = ctx->dsi;
+   struct device *dev = >dev;
+   int ret;
+
+   dsi->mode_flags |= MIPI_DSI_MODE_LPM;
+
+   dsi_dcs_write_seq(dsi, 0x9f, 0xa5, 0xa5);
+
+   ret = mipi_dsi_dcs_exit_sleep_mode(dsi);
+   if (ret < 0) {
+   dev_err(dev, "Failed to exit sleep mode: %d\n", ret);
+   return ret;
+   }
+ 

[PATCH v2 0/2] Samsung s6e3fc2x01 panel driver for OnePlus 6T

2022-10-07 Thread Nia Espera
This patch series adds proper support for the panel used in OnePlus 6T
smartphones (s6e3fc2x01). Previously, the panel relied on the driver
used by the sofef00 panel which failed to properly initialise it after
a reset.

Nia Espera (2):
  drivers: gpu: drm: add driver for samsung s6e3fc2x01 cmd mode panel
  drivers: gpu: drm: remove support for sofef00 driver on s6e3fc2x01
panel

 MAINTAINERS   |   5 +
 drivers/gpu/drm/panel/Kconfig |  17 +-
 drivers/gpu/drm/panel/Makefile|   1 +
 .../gpu/drm/panel/panel-samsung-s6e3fc2x01.c  | 396 ++
 drivers/gpu/drm/panel/panel-samsung-sofef00.c |  18 -
 5 files changed, 416 insertions(+), 21 deletions(-)
 create mode 100644 drivers/gpu/drm/panel/panel-samsung-s6e3fc2x01.c

-- 
2.38.0



Re: [git pull] drm for 6.1-rc1

2022-10-07 Thread Daniel Vetter
Forgot to add Andrey as scheduler maintainer.
-Daniel

On Fri, 7 Oct 2022 at 10:16, Daniel Vetter  wrote:
>
> On Fri, 7 Oct 2022 at 01:45, Linus Torvalds
>  wrote:
> >
> > On Thu, Oct 6, 2022 at 1:25 PM Dave Airlie  wrote:
> > >
> > >
> > > [ 1234.778760] BUG: kernel NULL pointer dereference, address: 
> > > 0088
> > > [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
> >
> > As far as I can tell, that's the line
> >
> > struct drm_gpu_scheduler *sched = s_fence->sched;
> >
> > where 's_fence' is NULL. The code is
> >
> >0: 0f 1f 44 00 00nopl   0x0(%rax,%rax,1)
> >5: 41 54push   %r12
> >7: 55push   %rbp
> >8: 53push   %rbx
> >9: 48 89 fb  mov%rdi,%rbx
> >c:* 48 8b af 88 00 00 00 mov0x88(%rdi),%rbp <-- trapping instruction
> >   13: f0 ff 8d f0 00 00 00 lock decl 0xf0(%rbp)
> >   1a: 48 8b 85 80 01 00 00 mov0x180(%rbp),%rax
> >
> > and that next 'lock decl' instruction would have been the
> >
> > atomic_dec(>hw_rq_count);
> >
> > at the top of drm_sched_job_done().
> >
> > Now, as to *why* you'd have a NULL s_fence, it would seem that
> > drm_sched_job_cleanup() was called with an active job. Looking at that
> > code, it does
> >
> > if (kref_read(>s_fence->finished.refcount)) {
> > /* drm_sched_job_arm() has been called */
> > dma_fence_put(>s_fence->finished);
> > ...
> >
> > but then it does
> >
> > job->s_fence = NULL;
> >
> > anyway, despite the job still being active. The logic of that kind of
> > "fake refcount" escapes me. The above looks fundamentally racy, not to
> > say pointless and wrong (a refcount is a _count_, not a flag, so there
> > could be multiple references to it, what says that you can just
> > decrement one of them and say "I'm done").
>
> Just figured I'll clarify this, because it's indeed a bit wtf and the
> comment doesn't explain much. drm_sched_job_cleanup can be called both
> when a real job is being cleaned up (which holds a full reference on
> job->s_fence and needs to drop it) and to simplify error path in job
> constructions (and the "is this refcount initialized already" signals
> what exactly needs to be cleaned up or not). So no race, because the
> only times this check goes different is when job construction has
> failed before the job struct is visible by any other thread.
>
> But yeah the comment could actually explain what's going on here :-)
>
> And yeah the patch Dave reverted screws up the cascade of references
> that ensures this all stays alive until drm_sched_job_cleanup is
> called on active jobs, so looks all reasonable to me. Some Kunit tests
> maybe to exercise these corners? Not the first time pure scheduler
> code blew up, so proably worth the effort.
> -Daniel
>
> >
> > Now, _why_ any of that happens, I have no idea. I'm just looking at
> > the immediate "that pointer is NULL" thing, and reacting to what looks
> > like a completely bogus refcount pattern.
> >
> > But that odd refcount pattern isn't new, so it's presumably some user
> > on the amd gpu side that changed.
> >
> > The problem hasn't happened again for me, but that's not saying a lot,
> > since it was very random to begin with.
> >
> >  Linus
>
>
>
> --
> Daniel Vetter
> Software Engineer, Intel Corporation
> http://blog.ffwll.ch



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-07 Thread Mika Westerberg
On Thu, Oct 06, 2022 at 10:53:44AM -0600, Jason A. Donenfeld wrote:
>  drivers/thunderbolt/xdomain.c  |  2 +-

Acked-by: Mika Westerberg 


RE: [PATCH v3 3/5] treewide: use get_random_u32() when possible

2022-10-07 Thread David Laight
From: Christophe Leroy
> Sent: 06 October 2022 18:43
...
> But taking into account that sp must remain 16 bytes aligned, would it
> be better to do something like ?
> 
>   sp -= prandom_u32_max(PAGE_SIZE >> 4) << 4;

That makes me think...
If prandom_u32_max() is passed a (constant) power of 2 it doesn't
need to do the multiply, it can just do a shift right.

Doesn't it also always get a 32bit random value?
So actually get_random_u32() & PAGE_MASK & ~0xf is faster!

When PAGE_SIZE is 4k, PAGE_SIZE >> 4 is 256 so it could use:
get_ramdom_u8() << 4

You also seem to have removed prandom_u32() in favour of
get_random_u32() but have added more prandom_() functions.

David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, 
UK
Registration No: 1397386 (Wales)


Re: Internal DRI subsystem locking and contention between connector commits

2022-10-07 Thread Pekka Paalanen
On Thu, 6 Oct 2022 12:03:56 +
"Hoosier, Matt"  wrote:

> I have a DRM master implementing a purpose-built compositor for a
> dedicated use-case. It drives several different connectors, each on
> its own vsync cadence (there's no clone mode happening here).
> 
> The goal is to have commits to each connector occur completely
> without respect to whatever is happening on the other connectors.
> There's a different thread issuing the DRI ioctl's for each connector.
> 
> In the compositor, each connector is treated like its own little
> universe; a disjoint set of CRTCs and planes is earmarked for use by
> each of the connectors. One intention for this is to avoid sharing
> resources in a way that would introduce implicit synchronization
> points between the two connector's event loops. So, atomic commits
> made to one connector never attempt to use a resource that's ever
> been used in a commit to a different connector. This may be relevant
> to a question I'll ask a bit later below about resource locking
> contention.
> 
> For some time, I've been noticing that even test-only atomic commits
> done on connector A will sometimes block for many frame-times.
> Analysis with the DRI driver implementor has shown that the atomic
> commits to A--whether DRM_MODE_ATOMIC_TEST_ONLY or
> DRM_MODE_ATOMIC_NONBLOCK--are getting stuck in the ioctl entry code
> waiting for a DRI mutex.
> 
> It turns out that during these unexpected delays, the DRI driver's
> commit thread holds that mutex while servicing a commit to connector
> B. It does this while it waits for the fences to fire for all
> framebuffer IDs referred to by the pending connector B scene. So the
> commit to connector A can't be tested or enqueued until the commit to
> B is completely finished. The driver author reckons that this is
> unavoidable because every DRM_IOCTL_MODE_ATOMIC ioctl  needs to
> acquire the same global singleton DRM connection_mutex in order to
> query or manipulate the connector.
> 
> The result is that it's quite difficult to guarantee a framerate on
> connector A, because unrelated activity performed on connector B can
> hold global locks for an unpredictable amount of time.
> 
> The first question would be: does this story sound consistent? If so,
> then a couple more questions follow.
> 
> Is this kind of implicit interlocking expected? Is there any way to
> avoid the pending commits getting serialized like that on the kernel
> side?

Hi Matt,

Ville actually mentioned something very much like that recently, see
the thread at:
https://lore.kernel.org/dri-devel/20220916163331.6849-1-ville.syrj...@linux.intel.com/

If even non-blocking commits can stall test-only commits, that could be
a problem for Weston too. Weston being single-threaded wouldn't help.


Thanks,
pq


pgprupEpLIiXA.pgp
Description: OpenPGP digital signature


Re: [PATCH v3 0/5] treewide cleanup of random integer usage

2022-10-07 Thread Ulf Hansson
On Thu, 6 Oct 2022 at 18:54, Jason A. Donenfeld  wrote:
>
> Changes v2->v3:
> - Handle get_random_int() conversions too, which were overlooked before,
>   in the same way as the rest.
>
> Hi folks,
>
> This is a five part treewide cleanup of random integer handling. The
> rules for random integers are:
>
> - If you want a secure or an insecure random u64, use get_random_u64().
> - If you want a secure or an insecure random u32, use get_random_u32().
>   * The old function prandom_u32() has been deprecated for a while now
> and is just a wrapper around get_random_u32(). Same for
> get_random_int().
> - If you want a secure or an insecure random u16, use get_random_u16().
> - If you want a secure or an insecure random u8, use get_random_u8().
> - If you want secure or insecure random bytes, use get_random_bytes().
>   * The old function prandom_bytes() has been deprecated for a while now
> and has long been a wrapper around get_random_bytes().
> - If you want a non-uniform random u32, u16, or u8 bounded by a certain
>   open interval maximum, use prandom_u32_max().
>   * I say "non-uniform", because it doesn't do any rejection sampling or
> divisions. Hence, it stays within the prandom_* namespace.
>
> These rules ought to be applied uniformly, so that we can clean up the
> deprecated functions, and earn the benefits of using the modern
> functions. In particular, in addition to the boring substitutions, this
> patchset accomplishes a few nice effects:
>
> - By using prandom_u32_max() with an upper-bound that the compiler can
>   prove at compile-time is ≤65536 or ≤256, internally get_random_u16()
>   or get_random_u8() is used, which wastes fewer batched random bytes,
>   and hence has higher throughput.
>
> - By using prandom_u32_max() instead of %, when the upper-bound is not a
>   constant, division is still avoided, because prandom_u32_max() uses
>   a faster multiplication-based trick instead.
>
> - By using get_random_u16() or get_random_u8() in cases where the return
>   value is intended to indeed be a u16 or a u8, we waste fewer batched
>   random bytes, and hence have higher throughput.
>
> So, based on those rules and benefits from following them, this patchset
> breaks down into the following five steps, which were done mostly
> manually:
>
> 1) Replace `prandom_u32() % max` and variants thereof with
>prandom_u32_max(max).
>
> 2) Replace `(type)get_random_u32()` and variants thereof with
>get_random_u16() or get_random_u8(). I took the pains to actually
>look and see what every lvalue type was across the entire tree.
>
> 3) Replace remaining deprecated uses of prandom_u32() and
>get_random_int() with get_random_u32().
>
> 4) Replace remaining deprecated uses of prandom_bytes() with
>get_random_bytes().
>
> 5) Remove the deprecated and now-unused prandom_u32() and
>prandom_bytes() inline wrapper functions.
>
> I was thinking of taking this through my random.git tree (on which this
> series is currently based) and submitting it near the end of the merge
> window, or waiting for the very end of the 6.1 cycle when there will be
> the fewest new patches brewing. If somebody with some treewide-cleanup
> experience might share some wisdom about what the best timing usually
> winds up being, I'm all ears.
>
> Please take a look! At "379 insertions(+), 422 deletions(-)", this
> should be a somewhat small patchset to review, despite it having the
> scary "treewide" moniker on it.
>
> Thanks,
> Jason
>
> Cc: Andreas Noever 
> Cc: Andrew Morton 
> Cc: Andy Shevchenko 
> Cc: Borislav Petkov 
> Cc: Catalin Marinas 
> Cc: Christoph Böhmwalder 
> Cc: Christoph Hellwig 
> Cc: Christophe Leroy 
> Cc: Daniel Borkmann 
> Cc: Dave Airlie 
> Cc: Dave Hansen 
> Cc: David S. Miller 
> Cc: Eric Dumazet 
> Cc: Florian Westphal 
> Cc: Greg Kroah-Hartman ,
> Cc: H. Peter Anvin 
> Cc: Heiko Carstens 
> Cc: Helge Deller 
> Cc: Herbert Xu 
> Cc: Huacai Chen 
> Cc: Hugh Dickins 
> Cc: Jakub Kicinski 
> Cc: James E.J. Bottomley 
> Cc: Jan Kara 
> Cc: Jason Gunthorpe 
> Cc: Jens Axboe 
> Cc: Johannes Berg 
> Cc: Jonathan Corbet 
> Cc: Jozsef Kadlecsik 
> Cc: KP Singh 
> Cc: Kees Cook 
> Cc: Marco Elver 
> Cc: Mauro Carvalho Chehab 
> Cc: Michael Ellerman 
> Cc: Pablo Neira Ayuso 
> Cc: Paolo Abeni 
> Cc: Peter Zijlstra 
> Cc: Richard Weinberger 
> Cc: Russell King 
> Cc: Theodore Ts'o 
> Cc: Thomas Bogendoerfer 
> Cc: Thomas Gleixner 
> Cc: Thomas Graf 
> Cc: Ulf Hansson 
> Cc: Vignesh Raghavendra 
> Cc: WANG Xuerui 
> Cc: Will Deacon 
> Cc: Yury Norov 
> Cc: dri-devel@lists.freedesktop.org
> Cc: kasan-...@googlegroups.com
> Cc: kernel-janit...@vger.kernel.org
> Cc: linux-arm-ker...@lists.infradead.org
> Cc: linux-bl...@vger.kernel.org
> Cc: linux-cry...@vger.kernel.org
> Cc: linux-...@vger.kernel.org
> Cc: linux-fsde...@vger.kernel.org
> Cc: linux-me...@vger.kernel.org
> Cc: linux-m...@vger.kernel.org
> Cc: linux...@kvack.org
> Cc: linux-...@vger.kernel.org
> Cc: 

Re: [Intel-gfx] [PATCH v5 3/4] drm/i915: Make the heartbeat play nice with long pre-emption timeouts

2022-10-07 Thread Tvrtko Ursulin



On 06/10/2022 22:38, john.c.harri...@intel.com wrote:

From: John Harrison 

Compute workloads are inherently not pre-emptible for long periods on
current hardware. As a workaround for this, the pre-emption timeout
for compute capable engines was disabled. This is undesirable with GuC
submission as it prevents per engine reset of hung contexts. Hence the
next patch will re-enable the timeout but bumped up by an order of
magnitude.

However, the heartbeat might not respect that. Depending upon current
activity, a pre-emption to the heartbeat pulse might not even be
attempted until the last heartbeat period. Which means that only one
period is granted for the pre-emption to occur. With the aforesaid
bump, the pre-emption timeout could be significantly larger than this
heartbeat period.

So adjust the heartbeat code to take the pre-emption timeout into
account. When it reaches the final (high priority) period, it now
ensures the delay before hitting reset is bigger than the pre-emption
timeout.

v2: Fix for selftests which adjust the heartbeat period manually.
v3: Add FIXME comment about selftests. Add extra FIXME comment and
drm_notices when setting heartbeat to a non-default value (review
feedback from Tvrtko)

Signed-off-by: John Harrison 
---
  .../gpu/drm/i915/gt/intel_engine_heartbeat.c  | 39 +++
  1 file changed, 39 insertions(+)

diff --git a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c 
b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
index a3698f611f457..9a527e1f5be65 100644
--- a/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
+++ b/drivers/gpu/drm/i915/gt/intel_engine_heartbeat.c
@@ -22,9 +22,37 @@
  
  static bool next_heartbeat(struct intel_engine_cs *engine)

  {
+   struct i915_request *rq;
long delay;
  
  	delay = READ_ONCE(engine->props.heartbeat_interval_ms);

+
+   rq = engine->heartbeat.systole;
+
+   /*
+* FIXME: The final period extension is disabled if the period has been
+* modified from the default. This is to prevent issues with certain
+* selftests which override the value and expect specific behaviour.
+* Once the selftests have been updated to either cope with variable
+* heartbeat periods (or to override the pre-emption timeout as well,
+* or just to add a selftest specific override of the extension), the
+* generic override can be removed.
+*/
+   if (rq && rq->sched.attr.priority >= I915_PRIORITY_BARRIER &&
+   delay == engine->defaults.heartbeat_interval_ms) {
+   long longer;
+
+   /*
+* The final try is at the highest priority possible. Up until 
now
+* a pre-emption might not even have been attempted. So make 
sure
+* this last attempt allows enough time for a pre-emption to 
occur.
+*/
+   longer = READ_ONCE(engine->props.preempt_timeout_ms) * 2;
+   longer = intel_clamp_heartbeat_interval_ms(engine, longer);
+   if (longer > delay)
+   delay = longer;
+   }
+
if (!delay)
return false;
  
@@ -288,6 +316,17 @@ int intel_engine_set_heartbeat(struct intel_engine_cs *engine,

if (!delay && !intel_engine_has_preempt_reset(engine))
return -ENODEV;
  
+	/* FIXME: Remove together with equally marked hack in next_heartbeat. */

+   if (delay != engine->defaults.heartbeat_interval_ms &&
+   delay < 2 * engine->props.preempt_timeout_ms) {
+   if (intel_engine_uses_guc(engine))
+   drm_notice(>i915->drm, "%s heartbeat interval 
adjusted to a non-default value which may downgrade individual engine resets to full GPU 
resets!\n",
+  engine->name);
+   else
+   drm_notice(>i915->drm, "%s heartbeat interval 
adjusted to a non-default value which may cause engine resets to target innocent contexts!\n",
+  engine->name);
+   }
+
intel_engine_pm_get(engine);
  
  	err = mutex_lock_interruptible(>timeline->mutex);


LGTM - hope it is agreeable to you too.

Reviewed-by: Tvrtko Ursulin 

Regards,

Tvrtko



Re: drm fb helpers hotplug/resize

2022-10-07 Thread Ville Syrjälä
On Fri, Oct 07, 2022 at 09:58:31AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 07.10.22 um 09:15 schrieb Ville Syrjälä:
> 
> >>
> >> For the absolute size of fbdev memory, I think we should introduce a
> >> module parameter in drm_fb_helper, which an option to set a default
> >> value in the kernel config. It would benefit all drivers that use fbdev
> >> emulation and work how overalloc works.
> >>
> >> If no size is given, the current approach would be used.
> >>
> >> I don't think resizing would work immediately. There isn't anything in
> >> the check_var and set_par functions that implements the necessary atomic
> >> check and commit.
> > 
> > set_par() is the thing tht does the commit.
> > 
> 
> I know. There actually even is a commit statement in set_par, which can 
> restore the initial mode. My point is that it has no means of changing 
> the display mode. A full modeset would require us to convert the fb 
> screeninfo into an atomic state and commit that instead. For the fbconv 
> helper, I once made code to convert between the two. Leaving this here 
> for reference. [1][2]

Uff. Right, we'd probably just want to properly implemnt set_par()
then. 

BTW I had a slightly different take on the bitfiled stuff.
Maybe a bit easier to extend for new formats, but full of macros:
https://patchwork.freedesktop.org/patch/203189/?series=37820=1

> 
> Similarly, in check_var we sort out and reject all mode changes. We'd 
> have to change that as well.

I guess we could do a TEST_ONLY commit there. Though I think
not doing that and just failing from set_par() should be fine
too.

> 
> I guess we can continue to ignore non-atomic modesetting.
> 
> Best regards
> Thomas
> 
> [1] 
> https://gitlab.freedesktop.org/tzimmermann/linux/-/commit/385161cd2d048b5cf80544bff8ced3da7a82dfa9
> [2] 
> https://gitlab.freedesktop.org/tzimmermann/linux/-/commit/a541c405a638f47ee80389b222fbde6e311e8220
> 
> -- 
> Thomas Zimmermann
> Graphics Driver Developer
> SUSE Software Solutions Germany GmbH
> Maxfeldstr. 5, 90409 Nürnberg, Germany
> (HRB 36809, AG Nürnberg)
> Geschäftsführer: Ivo Totev




-- 
Ville Syrjälä
Intel


Re: [git pull] drm for 6.1-rc1

2022-10-07 Thread Daniel Vetter
On Fri, 7 Oct 2022 at 01:45, Linus Torvalds
 wrote:
>
> On Thu, Oct 6, 2022 at 1:25 PM Dave Airlie  wrote:
> >
> >
> > [ 1234.778760] BUG: kernel NULL pointer dereference, address: 
> > 0088
> > [ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]
>
> As far as I can tell, that's the line
>
> struct drm_gpu_scheduler *sched = s_fence->sched;
>
> where 's_fence' is NULL. The code is
>
>0: 0f 1f 44 00 00nopl   0x0(%rax,%rax,1)
>5: 41 54push   %r12
>7: 55push   %rbp
>8: 53push   %rbx
>9: 48 89 fb  mov%rdi,%rbx
>c:* 48 8b af 88 00 00 00 mov0x88(%rdi),%rbp <-- trapping instruction
>   13: f0 ff 8d f0 00 00 00 lock decl 0xf0(%rbp)
>   1a: 48 8b 85 80 01 00 00 mov0x180(%rbp),%rax
>
> and that next 'lock decl' instruction would have been the
>
> atomic_dec(>hw_rq_count);
>
> at the top of drm_sched_job_done().
>
> Now, as to *why* you'd have a NULL s_fence, it would seem that
> drm_sched_job_cleanup() was called with an active job. Looking at that
> code, it does
>
> if (kref_read(>s_fence->finished.refcount)) {
> /* drm_sched_job_arm() has been called */
> dma_fence_put(>s_fence->finished);
> ...
>
> but then it does
>
> job->s_fence = NULL;
>
> anyway, despite the job still being active. The logic of that kind of
> "fake refcount" escapes me. The above looks fundamentally racy, not to
> say pointless and wrong (a refcount is a _count_, not a flag, so there
> could be multiple references to it, what says that you can just
> decrement one of them and say "I'm done").

Just figured I'll clarify this, because it's indeed a bit wtf and the
comment doesn't explain much. drm_sched_job_cleanup can be called both
when a real job is being cleaned up (which holds a full reference on
job->s_fence and needs to drop it) and to simplify error path in job
constructions (and the "is this refcount initialized already" signals
what exactly needs to be cleaned up or not). So no race, because the
only times this check goes different is when job construction has
failed before the job struct is visible by any other thread.

But yeah the comment could actually explain what's going on here :-)

And yeah the patch Dave reverted screws up the cascade of references
that ensures this all stays alive until drm_sched_job_cleanup is
called on active jobs, so looks all reasonable to me. Some Kunit tests
maybe to exercise these corners? Not the first time pure scheduler
code blew up, so proably worth the effort.
-Daniel

>
> Now, _why_ any of that happens, I have no idea. I'm just looking at
> the immediate "that pointer is NULL" thing, and reacting to what looks
> like a completely bogus refcount pattern.
>
> But that odd refcount pattern isn't new, so it's presumably some user
> on the amd gpu side that changed.
>
> The problem hasn't happened again for me, but that's not saying a lot,
> since it was very random to begin with.
>
>  Linus



-- 
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch


Re: [PATCH v4 2/2] drm/atomic-helper: Replace drm_atomic_helper_check_crtc_state()

2022-10-07 Thread Thomas Zimmermann

Hi

Am 07.10.22 um 09:29 schrieb Ville Syrjälä:

On Fri, Oct 07, 2022 at 09:17:50AM +0200, Javier Martinez Canillas wrote:

On 10/7/22 09:07, Ville Syrjälä wrote:

On Thu, Oct 06, 2022 at 10:28:12PM +0200, Javier Martinez Canillas wrote:

On 10/5/22 13:40, Thomas Zimmermann wrote:

Rename the atomic helper function drm_atomic_helper_check_crtc_state()
to drm_atomic_helper_check_crtc_primary_plane() and only check for an
attached primary plane. Adapt callers.

Instead of having one big function to check for various CRTC state
conditions, we rather want smaller functions that drivers can pick
individually.

Signed-off-by: Thomas Zimmermann 
---


Reviewed-by: Javier Martinez Canillas 

[...]


+   drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
+   if (plane->type == DRM_PLANE_TYPE_PRIMARY)
+   return 0;
}


I believe the code convention is to drop the curly braces when you
have a single statement inside the a loop ?


This has two.



No, it has only one that is the if statement. So according to the Linux
kernel coding style AFAIU it should be written as:

drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask)
if (plane->type == DRM_PLANE_TYPE_PRIMARY)
return 0;


That is exactly what it says not to do.


Hey, no need to be so upfront about it. Without the outer braces, I'd 
find it hard to parse anyway.


Best regard
Thomas





--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


Re: drm fb helpers hotplug/resize

2022-10-07 Thread Thomas Zimmermann

Hi

Am 07.10.22 um 09:15 schrieb Ville Syrjälä:



For the absolute size of fbdev memory, I think we should introduce a
module parameter in drm_fb_helper, which an option to set a default
value in the kernel config. It would benefit all drivers that use fbdev
emulation and work how overalloc works.

If no size is given, the current approach would be used.

I don't think resizing would work immediately. There isn't anything in
the check_var and set_par functions that implements the necessary atomic
check and commit.


set_par() is the thing tht does the commit.



I know. There actually even is a commit statement in set_par, which can 
restore the initial mode. My point is that it has no means of changing 
the display mode. A full modeset would require us to convert the fb 
screeninfo into an atomic state and commit that instead. For the fbconv 
helper, I once made code to convert between the two. Leaving this here 
for reference. [1][2]


Similarly, in check_var we sort out and reject all mode changes. We'd 
have to change that as well.


I guess we can continue to ignore non-atomic modesetting.

Best regards
Thomas

[1] 
https://gitlab.freedesktop.org/tzimmermann/linux/-/commit/385161cd2d048b5cf80544bff8ced3da7a82dfa9
[2] 
https://gitlab.freedesktop.org/tzimmermann/linux/-/commit/a541c405a638f47ee80389b222fbde6e311e8220


--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


Re: [PATCH v4 2/2] drm/atomic-helper: Replace drm_atomic_helper_check_crtc_state()

2022-10-07 Thread Javier Martinez Canillas
On 10/7/22 09:29, Ville Syrjälä wrote:

[...]

>>  drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask)
>>  if (plane->type == DRM_PLANE_TYPE_PRIMARY)
>>  return 0;
> 
> That is exactly what it says not to do.
> 

Ah, good to know. I re-read the kernel coding style and see now
that it only applies to single simple statements.

Better then, since not using braces even for single statements
is something that's error prone IMO, as mentioned before.

-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: [PATCH v4 2/2] drm/atomic-helper: Replace drm_atomic_helper_check_crtc_state()

2022-10-07 Thread Ville Syrjälä
On Fri, Oct 07, 2022 at 09:17:50AM +0200, Javier Martinez Canillas wrote:
> On 10/7/22 09:07, Ville Syrjälä wrote:
> > On Thu, Oct 06, 2022 at 10:28:12PM +0200, Javier Martinez Canillas wrote:
> >> On 10/5/22 13:40, Thomas Zimmermann wrote:
> >>> Rename the atomic helper function drm_atomic_helper_check_crtc_state()
> >>> to drm_atomic_helper_check_crtc_primary_plane() and only check for an
> >>> attached primary plane. Adapt callers.
> >>>
> >>> Instead of having one big function to check for various CRTC state
> >>> conditions, we rather want smaller functions that drivers can pick
> >>> individually.
> >>>
> >>> Signed-off-by: Thomas Zimmermann 
> >>> ---
> >>
> >> Reviewed-by: Javier Martinez Canillas 
> >>
> >> [...]
> >>
> >>> + drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
> >>> + if (plane->type == DRM_PLANE_TYPE_PRIMARY)
> >>> + return 0;
> >>>   }
> >>
> >> I believe the code convention is to drop the curly braces when you
> >> have a single statement inside the a loop ?
> > 
> > This has two.
> > 
> 
> No, it has only one that is the if statement. So according to the Linux
> kernel coding style AFAIU it should be written as:
> 
>   drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask)
>   if (plane->type == DRM_PLANE_TYPE_PRIMARY)
>   return 0;

That is exactly what it says not to do.

-- 
Ville Syrjälä
Intel


Re: [PATCH v4 2/2] drm/atomic-helper: Replace drm_atomic_helper_check_crtc_state()

2022-10-07 Thread Javier Martinez Canillas
On 10/7/22 09:07, Ville Syrjälä wrote:
> On Thu, Oct 06, 2022 at 10:28:12PM +0200, Javier Martinez Canillas wrote:
>> On 10/5/22 13:40, Thomas Zimmermann wrote:
>>> Rename the atomic helper function drm_atomic_helper_check_crtc_state()
>>> to drm_atomic_helper_check_crtc_primary_plane() and only check for an
>>> attached primary plane. Adapt callers.
>>>
>>> Instead of having one big function to check for various CRTC state
>>> conditions, we rather want smaller functions that drivers can pick
>>> individually.
>>>
>>> Signed-off-by: Thomas Zimmermann 
>>> ---
>>
>> Reviewed-by: Javier Martinez Canillas 
>>
>> [...]
>>
>>> +   drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
>>> +   if (plane->type == DRM_PLANE_TYPE_PRIMARY)
>>> +   return 0;
>>> }
>>
>> I believe the code convention is to drop the curly braces when you
>> have a single statement inside the a loop ?
> 
> This has two.
> 

No, it has only one that is the if statement. So according to the Linux
kernel coding style AFAIU it should be written as:

drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask)
if (plane->type == DRM_PLANE_TYPE_PRIMARY)
return 0;


-- 
Best regards,

Javier Martinez Canillas
Core Platforms
Red Hat



Re: drm fb helpers hotplug/resize

2022-10-07 Thread Ville Syrjälä
On Fri, Oct 07, 2022 at 09:10:27AM +0200, Thomas Zimmermann wrote:
> Hi
> 
> Am 07.10.22 um 04:16 schrieb Zack Rusin:
> > On Thu, 2022-10-06 at 10:01 +0200, Thomas Zimmermann wrote:
> >> Hi Zack
> >>
> >> Am 05.10.22 um 21:49 schrieb Zack Rusin:
> >>> Hi, Thomas.
> >>>
> >>> Because you've been the one who's been working on drm_fb_helper.c the 
> >>> most the last
> >>> few years I wanted to pick your brain a bit.
> >>>
> >>> I was porting vmwgfx to drm_fb_helper code which is largely trivial, just 
> >>> removing
> >>> all of vmwgfx_fb.c and replacing it with a call to 
> >>> drm_fbdev_generic_setup. But
> >>
> >> Thanks a lot for this work. I have been looking into doing this
> >> conversion myself at some point, but never found the time to actually do
> >> it.
> >>
> >>> drm_fb_helper.c code never deals with resizes which is a bit of a problem.
> >>>
> >>> e.g. replacing the drm_sysfs_hotplug_event() call from
> >>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c#L2255
> >>> with drm_kms_helper_hotplug_event will call drm_fbdev_client_hotplug and 
> >>> end up in
> >>> drm_fb_helper_hotplug_event:
> >>> https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2003
> >>>
> >>> Now drm_fb_helper_hotplug_event does drm_client_modeset_probe but it 
> >>> never resizes
> >>> drm_fb_helper::buffer and drm_fb_helper::fb so they're both incorrectly 
> >>> sized.
> >>>
> >>> In general I don't see drm_fb_helper code ever being able to deal with 
> >>> resizes. In
> >>> particular because the fbdev's xres_virtual/yres_virtual are sized 
> >>> exactly to the
> >>> initial xres/yres.
> >>>
> >>> It's definitely a lot bigger issue on virtualized environments where at 
> >>> boot we'll
> >>> have some very conservative size (800x600) on vmwgfx which is then 
> >>> usually resized
> >>> to the size of the window. drm_fb_helper breaks pretty bad in that case 
> >>> because it
> >>> can't deal with those resizes at all.
> 
> The initial resolution of 800x600 is imposed by the driver, right? 
> (VMW_MIN_INITIAL_{WIDTH,HEIGHT}) You can use video= on the kernel 
> command line to select a resolution. That gives at least a workaround 
> with fbdev emulation.
> 
> >>>
> >>> Is this scenario something that drm_fb_helper should be able to handle or 
> >>> is it not
> >>> worth pursuing it? I don't think there's a trivial way of handling it so 
> >>> my guess is
> >>> that it would make drm_fb_helper quite a bit more complicated.
> >>
> >> I'm aware that resizing is missing. It's one of the few things I'd like
> >> to see being added to generic fbdev emulation. But as you say, it's not
> >> easy. The generic fbdev emulation has all kinds of code paths for the
> >> various drivers' memory managers. That makes it complicated.
> >>
> >> The problem is that fbdev's mmap'ed memory cannot be reallocated. It is
> >> expected to behave like 'real video memory.' So either reserve a chunk
> >> of the video ram for fbdev's GEM objects or use deferred I/O, which
> >> provides mmaped pages from a shadow buffer in system memory. vmwgfx uses
> >> the latter IIRC.
> >>
> >> But ideally, we'd get rid of most of the shadow buffering and try to
> >> mmap pages directly from GEM objects. For modesetting, this means that
> >> the new mode's framebuffer has to inherit the old framebuffer's buffer
> >> objects. Probably the easiest solution is to allocate a framebuffer once
> >> and reconfigure its parameters (width, height, pitch) on each modeset
> >> operation.
> >>
> >> Switching to a higher resolution would require more video memory.
> >> Although we cannot reallocate, this problem can be solved with the
> >> drm_fbdev_overalloc parameter. It gives the percentage of allocated
> >> video memory. If you start with 800x600 with overalloc at 400, you'd get
> >> enough video memory for 2400 scanlines. This allows for fbdev panning
> >> (i.e., pageflipping). With that extra memory fbdev could switch to
> >> another display mode with a higher resolution. For example, changing to
> >> 1024x786 would result in 1875 scanlines at the given overalloc of 400.
> >>
> >> To implement this, I guess that some of fbdev's memory allocation needs
> >> to be changed. The check_var and set_par code needs an update to handle
> >> the modeset. And I suspect that there are other dark corners that need
> >> to be reworked as well.
> > 
> > That sounds good. In a similar fashion to drm_fbdev_overalloc another, 
> > rather hacky
> > but vastly simpler approach, would be to basically allow the drivers to 
> > specify the
> > maximum size of fb to support in drm_fbdev_generic_setup. This would just 
> > directly
> > set the drm_fb_helper_surface_size::surface_width and surface_height with 
> > the end
> > result being that drm_client_framebuffer_create would be called with those 
> > values
> > and xres_virtual/yres_virtual would be set to them. Resizing would 
> > basically just
> > work then, right? Of course at the 

Re: drm fb helpers hotplug/resize

2022-10-07 Thread Thomas Zimmermann

Hi

Am 07.10.22 um 04:16 schrieb Zack Rusin:

On Thu, 2022-10-06 at 10:01 +0200, Thomas Zimmermann wrote:

Hi Zack

Am 05.10.22 um 21:49 schrieb Zack Rusin:

Hi, Thomas.

Because you've been the one who's been working on drm_fb_helper.c the most the 
last
few years I wanted to pick your brain a bit.

I was porting vmwgfx to drm_fb_helper code which is largely trivial, just 
removing
all of vmwgfx_fb.c and replacing it with a call to drm_fbdev_generic_setup. But


Thanks a lot for this work. I have been looking into doing this
conversion myself at some point, but never found the time to actually do
it.


drm_fb_helper.c code never deals with resizes which is a bit of a problem.

e.g. replacing the drm_sysfs_hotplug_event() call from
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/vmwgfx/vmwgfx_kms.c#L2255
with drm_kms_helper_hotplug_event will call drm_fbdev_client_hotplug and end up 
in
drm_fb_helper_hotplug_event:
https://elixir.bootlin.com/linux/latest/source/drivers/gpu/drm/drm_fb_helper.c#L2003

Now drm_fb_helper_hotplug_event does drm_client_modeset_probe but it never 
resizes
drm_fb_helper::buffer and drm_fb_helper::fb so they're both incorrectly sized.

In general I don't see drm_fb_helper code ever being able to deal with resizes. 
In
particular because the fbdev's xres_virtual/yres_virtual are sized exactly to 
the
initial xres/yres.

It's definitely a lot bigger issue on virtualized environments where at boot 
we'll
have some very conservative size (800x600) on vmwgfx which is then usually 
resized
to the size of the window. drm_fb_helper breaks pretty bad in that case because 
it
can't deal with those resizes at all.


The initial resolution of 800x600 is imposed by the driver, right? 
(VMW_MIN_INITIAL_{WIDTH,HEIGHT}) You can use video= on the kernel 
command line to select a resolution. That gives at least a workaround 
with fbdev emulation.




Is this scenario something that drm_fb_helper should be able to handle or is it 
not
worth pursuing it? I don't think there's a trivial way of handling it so my 
guess is
that it would make drm_fb_helper quite a bit more complicated.


I'm aware that resizing is missing. It's one of the few things I'd like
to see being added to generic fbdev emulation. But as you say, it's not
easy. The generic fbdev emulation has all kinds of code paths for the
various drivers' memory managers. That makes it complicated.

The problem is that fbdev's mmap'ed memory cannot be reallocated. It is
expected to behave like 'real video memory.' So either reserve a chunk
of the video ram for fbdev's GEM objects or use deferred I/O, which
provides mmaped pages from a shadow buffer in system memory. vmwgfx uses
the latter IIRC.

But ideally, we'd get rid of most of the shadow buffering and try to
mmap pages directly from GEM objects. For modesetting, this means that
the new mode's framebuffer has to inherit the old framebuffer's buffer
objects. Probably the easiest solution is to allocate a framebuffer once
and reconfigure its parameters (width, height, pitch) on each modeset
operation.

Switching to a higher resolution would require more video memory.
Although we cannot reallocate, this problem can be solved with the
drm_fbdev_overalloc parameter. It gives the percentage of allocated
video memory. If you start with 800x600 with overalloc at 400, you'd get
enough video memory for 2400 scanlines. This allows for fbdev panning
(i.e., pageflipping). With that extra memory fbdev could switch to
another display mode with a higher resolution. For example, changing to
1024x786 would result in 1875 scanlines at the given overalloc of 400.

To implement this, I guess that some of fbdev's memory allocation needs
to be changed. The check_var and set_par code needs an update to handle
the modeset. And I suspect that there are other dark corners that need
to be reworked as well.


That sounds good. In a similar fashion to drm_fbdev_overalloc another, rather 
hacky
but vastly simpler approach, would be to basically allow the drivers to specify 
the
maximum size of fb to support in drm_fbdev_generic_setup. This would just 
directly
set the drm_fb_helper_surface_size::surface_width and surface_height with the 
end
result being that drm_client_framebuffer_create would be called with those 
values
and xres_virtual/yres_virtual would be set to them. Resizing would basically 
just
work then, right? Of course at the cost of possibly large allocation, e.g. 4k fb
even when only 800x600 is actually used.


For the absolute size of fbdev memory, I think we should introduce a 
module parameter in drm_fb_helper, which an option to set a default 
value in the kernel config. It would benefit all drivers that use fbdev 
emulation and work how overalloc works.


If no size is given, the current approach would be used.

I don't think resizing would work immediately. There isn't anything in 
the check_var and set_par functions that implements the necessary atomic 
check and commit.




Either 

Re: [PATCH v4 2/2] drm/atomic-helper: Replace drm_atomic_helper_check_crtc_state()

2022-10-07 Thread Ville Syrjälä
On Thu, Oct 06, 2022 at 10:28:12PM +0200, Javier Martinez Canillas wrote:
> On 10/5/22 13:40, Thomas Zimmermann wrote:
> > Rename the atomic helper function drm_atomic_helper_check_crtc_state()
> > to drm_atomic_helper_check_crtc_primary_plane() and only check for an
> > attached primary plane. Adapt callers.
> > 
> > Instead of having one big function to check for various CRTC state
> > conditions, we rather want smaller functions that drivers can pick
> > individually.
> > 
> > Signed-off-by: Thomas Zimmermann 
> > ---
> 
> Reviewed-by: Javier Martinez Canillas 
> 
> [...]
> 
> > +   drm_for_each_plane_mask(plane, dev, crtc_state->plane_mask) {
> > +   if (plane->type == DRM_PLANE_TYPE_PRIMARY)
> > +   return 0;
> > }
> 
> I believe the code convention is to drop the curly braces when you
> have a single statement inside the a loop ?

This has two.

> 
> Feel free to ignore it though. I particularly don't agree with that
> convention anyways, because I think that makes the code more error
> prone. But still thought that was worth to point that out.
> 
> -- 
> Best regards,
> 
> Javier Martinez Canillas
> Core Platforms
> Red Hat

-- 
Ville Syrjälä
Intel


Re: [PATCH v3 1/2] dt-bindings: it6505: add properties to restrict output bandwidth

2022-10-07 Thread Krzysztof Kozlowski
On 07/10/2022 05:18, allen.c...@ite.com.tw wrote:
> hi
> 
> -Original Message-
> From: Krzysztof Kozlowski  
> Sent: Thursday, October 6, 2022 4:17 PM
> To: Allen Chen (陳柏宇) 
> Cc: Jau-Chih Tseng (曾昭智) ; Kenneth Hung (洪家倫) 
> ; Hermes Wu (吳佳宏) ; Pin-yen 
> Lin ; Andrzej Hajda ; Neil 
> Armstrong ; Robert Foss ; 
> Laurent Pinchart ; Jonas Karlman 
> ; Jernej Skrabec ; David Airlie 
> ; Daniel Vetter ; Rob Herring 
> ; Krzysztof Kozlowski 
> ; open list:DRM DRIVERS 
> ; open list:OPEN FIRMWARE AND FLATTENED 
> DEVICE TREE BINDINGS ; open list 
> 
> Subject: Re: [PATCH v3 1/2] dt-bindings: it6505: add properties to restrict 
> output bandwidth
> 
> On 06/10/2022 04:04, allen wrote:
>> From: allen chen 
>>
>> Add properties to restrict dp output data-lanes and clock.
>>
>> Signed-off-by: Pin-Yen Lin 
>> Signed-off-by: Allen Chen 
>> ---
>>  .../bindings/display/bridge/ite,it6505.yaml  | 12 
>>  1 file changed, 12 insertions(+)
>>
>> diff --git 
>> a/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml 
>> b/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
>> index 833d11b2303a..f5482a614d05 100644
>> --- a/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
>> +++ b/Documentation/devicetree/bindings/display/bridge/ite,it6505.yaml
>> @@ -52,6 +52,16 @@ properties:
>>  maxItems: 1
>>  description: extcon specifier for the Power Delivery
>>  
>> +  ite,dp-output-data-lane-count:
>> +description: restrict the dp output data-lanes with value of 1-4
> 
> Drop "with value of 1-4" because it is redundant, but instead explain what 
> this property is about. "Restrict output" is not yet enough.
> Restrict the number? Or choose specific lanes? Why it cannot be data-lanes 
> from video-interfaces?
> 
> ==> DP output bandwidth depends on data-lane-count, so the number of output 
> data-lane-count will restrict output bandwidth.
> In this dt-binding we don't have output endpoint, so use another property 
> name to configure.
> If need to use data-lanes, where can we put in this dt-binding?

I see you got review in v2 to rename it, but it still should be the same
property, if it has similar/same meaning. It can be put here. You just
need to define its type or reference other schema (e.g. video-interfaces
if applicable) which defines it.

Best regards,
Krzysztof



Re: [PATCH v4 1/2] drm/atomic-helper: Don't allocated plane state in CRTC check

2022-10-07 Thread Thomas Zimmermann

Hi

Am 06.10.22 um 22:18 schrieb Javier Martinez Canillas:

Hello Thomas,

On 10/5/22 13:40, Thomas Zimmermann wrote:

In drm_atomic_helper_check_crtc_state(), do not add a new plane state
to the global state if it does not exist already. Adding a new plane
state will result in overhead for the plane during the atomic-commit
step.

For the test in drm_atomic_helper_check_crtc_state() to succeed, it
is important that the CRTC has an enabled primary plane after the
commit. Simply testing the CRTC state's plane_mask for a primary plane
is sufficient.

Note that the helper still only tests for an attached primary plane.
Drivers have to ensure that the plane contains valid pixel information.

v3:
* test for a primary plane in plane_mask (Ville)
v2:
* remove unnecessary test for plane->crtc (Ville)
* inline drm_atomic_get_next_plane_state() (Ville)
* acquire plane lock before accessing plane->state (Ville)

Signed-off-by: Thomas Zimmermann 
Fixes: d6b9af1097fe ("drm/atomic-helper: Add helper 
drm_atomic_helper_check_crtc_state()")


This patch makes sense to me.

Reviewed-by: Javier Martinez Canillas 

but I've a hard time parsing the subject line. Did you mean instead:

"drm/atomic-helper: Don't allocate new plane state in CRTC check" ?


Ok, I'll do that.

Best regard
Thomas



or "drm/atomic-helper: Don't add a new plane state in CRTC check" ?

In any case you can fix that while applying so no need to resend IMO.



--
Thomas Zimmermann
Graphics Driver Developer
SUSE Software Solutions Germany GmbH
Maxfeldstr. 5, 90409 Nürnberg, Germany
(HRB 36809, AG Nürnberg)
Geschäftsführer: Ivo Totev


OpenPGP_signature
Description: OpenPGP digital signature


Re: [git pull] drm for 6.1-rc1

2022-10-07 Thread Christian König

Am 07.10.22 um 04:45 schrieb Dave Airlie:

On Fri, 7 Oct 2022 at 09:45, Linus Torvalds
 wrote:

On Thu, Oct 6, 2022 at 1:25 PM Dave Airlie  wrote:


[ 1234.778760] BUG: kernel NULL pointer dereference, address: 0088
[ 1234.778813] RIP: 0010:drm_sched_job_done.isra.0+0xc/0x140 [gpu_sched]

As far as I can tell, that's the line

 struct drm_gpu_scheduler *sched = s_fence->sched;

where 's_fence' is NULL. The code is

0: 0f 1f 44 00 00nopl   0x0(%rax,%rax,1)
5: 41 54push   %r12
7: 55push   %rbp
8: 53push   %rbx
9: 48 89 fb  mov%rdi,%rbx
c:* 48 8b af 88 00 00 00 mov0x88(%rdi),%rbp <-- trapping instruction
   13: f0 ff 8d f0 00 00 00 lock decl 0xf0(%rbp)
   1a: 48 8b 85 80 01 00 00 mov0x180(%rbp),%rax

and that next 'lock decl' instruction would have been the

 atomic_dec(>hw_rq_count);

at the top of drm_sched_job_done().

Now, as to *why* you'd have a NULL s_fence, it would seem that
drm_sched_job_cleanup() was called with an active job. Looking at that
code, it does

 if (kref_read(>s_fence->finished.refcount)) {
 /* drm_sched_job_arm() has been called */
 dma_fence_put(>s_fence->finished);
 ...

but then it does

 job->s_fence = NULL;

anyway, despite the job still being active. The logic of that kind of
"fake refcount" escapes me. The above looks fundamentally racy, not to
say pointless and wrong (a refcount is a _count_, not a flag, so there
could be multiple references to it, what says that you can just
decrement one of them and say "I'm done").

Now, _why_ any of that happens, I have no idea. I'm just looking at
the immediate "that pointer is NULL" thing, and reacting to what looks
like a completely bogus refcount pattern.

But that odd refcount pattern isn't new, so it's presumably some user
on the amd gpu side that changed.

The problem hasn't happened again for me, but that's not saying a lot,
since it was very random to begin with.

I chased down the culprit to a drm sched patch, I'll send you a pull
with a revert in it.

commit e4dc45b1848bc6bcac31eb1b4ccdd7f6718b3c86
Author: Arvind Yadav 
Date:   Wed Sep 14 22:13:20 2022 +0530

 drm/sched: Use parent fence instead of finished

 Using the parent fence instead of the finished fence
 to get the job status. This change is to avoid GPU
 scheduler timeout error which can cause GPU reset.

 Signed-off-by: Arvind Yadav 
 Reviewed-by: Andrey Grodzovsky 
 Link: 
https://nam11.safelinks.protection.outlook.com/?url=https%3A%2F%2Fpatchwork.freedesktop.org%2Fpatch%2Fmsgid%2F20220914164321.2156-6-Arvind.Yadav%40amd.comdata=05%7C01%7Cchristian.koenig%40amd.com%7C516db37183e84489e1aa08daa80e087e%7C3dd8961fe4884e608e11a82d994e183d%7C0%7C0%7C638007075495101336%7CUnknown%7CTWFpbGZsb3d8eyJWIjoiMC4wLjAwMDAiLCJQIjoiV2luMzIiLCJBTiI6Ik1haWwiLCJXVCI6Mn0%3D%7C3000%7C%7C%7Csdata=JWT8R205jIPQu87K7a1T0UJ0iKNO8smHhosijAA0%2BNk%3Dreserved=0
 Signed-off-by: Christian König 

I'll let Arvind and Christian maybe work out what is going wrong there.


That's a known issue Arvind is already investigating for a while.

Any idea how you triggered it on boot? We have only be able to trigger 
it very sporadic.


Reverting the patch for now sounds like a good idea to me, it's only a 
cleanup anyway.


Thanks,
Christian.



Dave.


  Linus