Re: [PATCH v3] binfmt_misc: pass binfmt_misc flags to the interpreter

2021-02-12 Thread Helge Deller

On 6/5/20 6:20 PM, Laurent Vivier wrote:

On 28/01/2020 at 14:25, Laurent Vivier wrote:

It can be useful to the interpreter to know which flags are in use.

For instance, knowing whether preserve-argv[0] is in use would
allow it to skip the pathname argument.

This patch uses an otherwise unused auxiliary vector, AT_FLAGS, to add
a flag that informs the interpreter whether preserve-argv[0] is enabled.

Signed-off-by: Laurent Vivier 


Acked-by: Helge Deller 

If nobody objects, I'd like to take this patch through the
parisc arch git tree.

It fixes a real-world problem with qemu-user, which fails to
preserve the argv[0] argument when the binary being exec'd is itself
a qemu-user target.
This problem leads to build errors on multiple Debian buildd servers
which use qemu-user to emulate the target machines.

For details see Debian bug:
http://bugs.debian.org/970460


Helge



---

Notes:
 This can be tested with QEMU from my branch:

   https://github.com/vivier/qemu/commits/binfmt-argv0

 With something like:

   # cp /qemu-ppc /chroot/powerpc/jessie

   # qemu-binfmt-conf.sh --qemu-path / --systemd ppc --credential yes \
 --persistent no --preserve-argv0 yes
   # systemctl restart systemd-binfmt.service
   # cat /proc/sys/fs/binfmt_misc/qemu-ppc
   enabled
   interpreter //qemu-ppc
   flags: POC
   offset 0
   magic 7f454c46010201020014
   mask ff00fffe
   # chroot /chroot/powerpc/jessie  sh -c 'echo $0'
   sh

   # qemu-binfmt-conf.sh --qemu-path / --systemd ppc --credential yes \
 --persistent no --preserve-argv0 no
   # systemctl restart systemd-binfmt.service
   # cat /proc/sys/fs/binfmt_misc/qemu-ppc
   enabled
   interpreter //qemu-ppc
   flags: OC
   offset 0
   magic 7f454c46010201020014
   mask ff00fffe
   # chroot /chroot/powerpc/jessie  sh -c 'echo $0'
   /bin/sh
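
  For reference, an interpreter could consume the new auxv bit roughly
  like this (a minimal user-space sketch; it assumes the
  AT_FLAGS_PRESERVE_ARGV0 definition added to the uapi header below,
  with the bit-0 fallback here being purely an illustrative assumption):

    #include <stdio.h>
    #include <elf.h>
    #include <sys/auxv.h>

    #ifndef AT_FLAGS_PRESERVE_ARGV0
    #define AT_FLAGS_PRESERVE_ARGV0 (1 << 0)  /* assumed value; see uapi hunk */
    #endif

    int main(void)
    {
            unsigned long flags = getauxval(AT_FLAGS);

            /* When the bit is set, binfmt_misc was registered with the
             * preserve-argv[0] ('P') flag, so the interpreter knows the
             * full pathname was passed as an extra argument and can skip
             * it instead of guessing. */
            if (flags & AT_FLAGS_PRESERVE_ARGV0)
                    printf("preserve-argv[0] is enabled\n");
            return 0;
    }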

 v3: mix my patch with one from YunQiang Su and my comments on it
 introduce a new flag in the uabi for the AT_FLAGS
 v2: only pass special flags (remove Magic and Enabled flags)

  fs/binfmt_elf.c  | 5 -
  fs/binfmt_elf_fdpic.c| 5 -
  fs/binfmt_misc.c | 4 +++-
  include/linux/binfmts.h  | 4 
  include/uapi/linux/binfmts.h | 4 
  5 files changed, 19 insertions(+), 3 deletions(-)

diff --git a/fs/binfmt_elf.c b/fs/binfmt_elf.c
index ecd8d2698515..ff918042ceed 100644
--- a/fs/binfmt_elf.c
+++ b/fs/binfmt_elf.c
@@ -176,6 +176,7 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
unsigned char k_rand_bytes[16];
int items;
elf_addr_t *elf_info;
+   elf_addr_t flags = 0;
int ei_index = 0;
const struct cred *cred = current_cred();
struct vm_area_struct *vma;
@@ -250,7 +251,9 @@ create_elf_tables(struct linux_binprm *bprm, struct elfhdr *exec,
NEW_AUX_ENT(AT_PHENT, sizeof(struct elf_phdr));
NEW_AUX_ENT(AT_PHNUM, exec->e_phnum);
NEW_AUX_ENT(AT_BASE, interp_load_addr);
-   NEW_AUX_ENT(AT_FLAGS, 0);
+   if (bprm->interp_flags & BINPRM_FLAGS_PRESERVE_ARGV0)
+   flags |= AT_FLAGS_PRESERVE_ARGV0;
+   NEW_AUX_ENT(AT_FLAGS, flags);
NEW_AUX_ENT(AT_ENTRY, exec->e_entry);
NEW_AUX_ENT(AT_UID, from_kuid_munged(cred->user_ns, cred->uid));
NEW_AUX_ENT(AT_EUID, from_kuid_munged(cred->user_ns, cred->euid));
diff --git a/fs/binfmt_elf_fdpic.c b/fs/binfmt_elf_fdpic.c
index 240f3543..abb90d82aa58 100644
--- a/fs/binfmt_elf_fdpic.c
+++ b/fs/binfmt_elf_fdpic.c
@@ -507,6 +507,7 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
char __user *u_platform, *u_base_platform, *p;
int loop;
int nr; /* reset for each csp adjustment */
+   unsigned long flags = 0;

  #ifdef CONFIG_MMU
/* In some cases (e.g. Hyper-Threading), we want to avoid L1 evictions
@@ -647,7 +648,9 @@ static int create_elf_fdpic_tables(struct linux_binprm *bprm,
NEW_AUX_ENT(AT_PHENT,   sizeof(struct elf_phdr));
NEW_AUX_ENT(AT_PHNUM,   exec_params->hdr.e_phnum);
NEW_AUX_ENT(AT_BASE,interp_params->elfhdr_addr);
-   NEW_AUX_ENT(AT_FLAGS,   0);
+   if (bprm->interp_flags & BINPRM_FLAGS_PRESERVE_ARGV0)
+   flags |= AT_FLAGS_PRESERVE_ARGV0;
+   NEW_AUX_ENT(AT_FLAGS,   flags);
NEW_AUX_ENT(AT_ENTRY,   exec_params->entry_addr);
	NEW_AUX_ENT(AT_UID, (elf_addr_t) from_kuid_munged(cred->user_ns, cred->uid));
	NEW_AUX_ENT(AT_EUID,(elf_addr_t) from_kuid_munged(cred->user_ns, cred->euid));
diff --git a/fs/binfmt_misc.c b/fs/binfmt_misc.c
index cdb45829354d..b9acdd26a654 100644
--- a/fs/binfmt_misc.c
+++ b/fs/binfmt_misc.c
@@ -154,7 +154,9 @@ static int load_misc_binary(struct linux_binprm *bprm)
if (bprm->interp_flags

Re: [PATCH v2 1/6] perf arm-spe: Enable sample type PERF_SAMPLE_DATA_SRC

2021-02-12 Thread Leo Yan
On Fri, Feb 12, 2021 at 05:43:40PM -0300, Arnaldo Carvalho de Melo wrote:
> Em Thu, Feb 11, 2021 at 03:38:51PM +0200, James Clark escreveu:
> > From: Leo Yan 
> > 
> > This patch is to enable sample type PERF_SAMPLE_DATA_SRC for Arm SPE in
> > the perf data, when output the tracing data, it tells tools that it
> > contains data source in the memory event.
> 
> Thanks, series applied.

Thanks a lot, James and Arnaldo.


Re: [PATCH] ARM: dts: sun8i: h3: orangepi-plus: Fix Ethernet PHY mode

2021-02-12 Thread B.R. Oake
On Wed Feb 10 at 16:01:18 CET 2021, Maxime Ripard wrote:
> Unfortunately we can't take this patch as is, this needs to be your real 
> name, see:
> https://www.kernel.org/doc/html/latest/process/submitting-patches.html#developer-s-certificate-of-origin-1-1

Dear Maxime,

Thank you very much for considering my contribution and for all your 
work on supporting sunxi-based hardware; I appreciate it.

Thank you for referring me to the Developer's Certificate of Origin, but 
I had already read it before submitting (I had to do so in order to know 
what I was saying by "Signed-off-by:") and I do certify what it says.

Looking through recent entries in the commit log of the mainline kernel, 
I see several patches from authors such as:

  H.J. Lu 
  B K Karthik 
  JC Kuo 
  EJ Hsu 
  LH Lin 
  KP Singh 
  Karthik B S 
  Shreyas NC 
  Vandana BN 

so I believe names of this form are in fact acceptable, even if the 
style might seem a little old-fashioned to some.

I would like to add that I have met many people with names such as C.J., 
A A, TC, MG, etc. That is what everybody calls them and it would be 
natural for them to sign themselves that way. Some of them might want to 
contribute to Linux some day, and I think it would be a great shame and 
a loss to all of us if they were discouraged from doing so by reading 
our conversation in the archives and concluding that any contribution 
from them, however small, would be summarily refused simply because of 
their name. Please could you ensure that does not happen?

Thank you again for your consideration.

Yours sincerely,
B.R. Oake.


Re: [External] Re: [PATCH 4/4] mm: memcontrol: fix swap uncharge on cgroup v2

2021-02-12 Thread Muchun Song
On Sat, Feb 13, 2021 at 2:57 AM Shakeel Butt  wrote:
>
> CCing more folks.
>
> On Fri, Feb 12, 2021 at 9:14 AM Muchun Song  wrote:
> >
> > The swap charges the actual number of swap entries on cgroup v2.
> > If a swap cache page is charged successful, and then we uncharge
> > the swap counter. It is wrong on cgroup v2. Because the swap
> > entry is not freed.
> >
> > Fixes: 2d1c498072de ("mm: memcontrol: make swap tracking an integral part 
> > of memory control")
> > Signed-off-by: Muchun Song 
>
> What's the user visible impact of this change?

IIUC, I think that we cannot limit the swap to memory.swap.max
on cgroup v2.

  cd /sys/fs/cgroup/
  mkdir test
  cd test
  echo 8192 > memory.max
  echo 4096 > memory.swap.max

OK. Now we limit swap to 1 page and memory to 2 pages.
Firstly, we allocate 1 page from this memory cgroup and
swap this page to swap disk. We can see:

  memory.current: 0
  memory.swap.current: 1

Then we touch this page: we swap it in, charge the swap cache
page to the memory counter, and uncharge the swap counter.

  memory.current: 1
  memory.swap.current: 0 (but actually we use a swap entry)

Then we allocate another 1 page from this memory cgroup.

  memory.current: 2
  memory.swap.current: 0 (but actually we use a swap entry)

If we then swap those 2 pages out to the swap disk, we can charge and
swap both of them successfully, even though memory.swap.max is only
1 page. Right? Maybe I am wrong.

>
> One impact I can see is that without this patch meminfo's (SwapTotal -
> SwapFree) is larger than the sum of top level memory.swap.current.
> This change will reduce that gap.
>
> BTW what about per-cpu slots_ret cache? Should we call
> mem_cgroup_uncharge_swap() before putting in the cache after this
> change?
>
> > ---
> >  mm/memcontrol.c | 2 +-
> >  1 file changed, 1 insertion(+), 1 deletion(-)
> >
> > diff --git a/mm/memcontrol.c b/mm/memcontrol.c
> > index c737c8f05992..be6bc5044150 100644
> > --- a/mm/memcontrol.c
> > +++ b/mm/memcontrol.c
> > @@ -6753,7 +6753,7 @@ int mem_cgroup_charge(struct page *page, struct 
> > mm_struct *mm, gfp_t gfp_mask)
> > memcg_check_events(memcg, page);
> > local_irq_enable();
> >
> > -   if (PageSwapCache(page)) {
> > +   if (!cgroup_subsys_on_dfl(memory_cgrp_subsys) && 
> > PageSwapCache(page)) {
> > swp_entry_t entry = { .val = page_private(page) };
> > /*
> >  * The swap entry might not get freed for a long time,
> > --
> > 2.11.0
> >


[PATCH v2 net-next 3/3] ptp: ptp_clockmatrix: Remove unused header declarations.

2021-02-12 Thread vincent.cheng.xh
From: Vincent Cheng 

Removed unused header declarations.

Signed-off-by: Vincent Cheng 
---
 drivers/ptp/ptp_clockmatrix.h | 2 --
 1 file changed, 2 deletions(-)

diff --git a/drivers/ptp/ptp_clockmatrix.h b/drivers/ptp/ptp_clockmatrix.h
index 0233236..fb32327 100644
--- a/drivers/ptp/ptp_clockmatrix.h
+++ b/drivers/ptp/ptp_clockmatrix.h
@@ -15,7 +15,6 @@
 #define FW_FILENAME"idtcm.bin"
 #define MAX_TOD(4)
 #define MAX_PLL(8)
-#define MAX_OUTPUT (12)
 
 #define MAX_ABS_WRITE_PHASE_PICOSECONDS (107374182350LL)
 
@@ -138,7 +137,6 @@ struct idtcm_channel {
enum pll_mode   pll_mode;
u8  pll;
u16 output_mask;
-   u8  output_phase_adj[MAX_OUTPUT][4];
 };
 
 struct idtcm {
-- 
2.7.4



[PATCH v2 net-next 1/3] ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.

2021-02-12 Thread vincent.cheng.xh
From: Vincent Cheng 

Part of the device initialization aligns the rising edge of the output
clock to the internal 1 PPS clock. If the system APLL and DPLL are not
locked, the alignment will fail and there will be a fixed offset
between the internal 1 PPS clock and the output clock.

After loading the device firmware, poll the system APLL and DPLL for
locked state prior to initialization, timing out after 2 seconds.

Signed-off-by: Vincent Cheng 
Acked-by: Richard Cochran 
---
 drivers/ptp/idt8a340_reg.h| 10 ++
 drivers/ptp/ptp_clockmatrix.c | 76 +--
 drivers/ptp/ptp_clockmatrix.h | 15 +
 3 files changed, 99 insertions(+), 2 deletions(-)

diff --git a/drivers/ptp/idt8a340_reg.h b/drivers/ptp/idt8a340_reg.h
index a664dfe..ac524cf 100644
--- a/drivers/ptp/idt8a340_reg.h
+++ b/drivers/ptp/idt8a340_reg.h
@@ -122,6 +122,8 @@
 #define OTP_SCSR_CONFIG_SELECT0x0022
 
 #define STATUS0xc03c
+#define DPLL_SYS_STATUS   0x0020
+#define DPLL_SYS_APLL_STATUS  0x0021
 #define USER_GPIO0_TO_7_STATUS0x008a
 #define USER_GPIO8_TO_15_STATUS   0x008b
 
@@ -707,4 +709,12 @@
 /* Bit definitions for the DPLL_CTRL_COMBO_MASTER_CFG register */
 #define COMBO_MASTER_HOLD BIT(0)
 
+/* Bit definitions for DPLL_SYS_STATUS register */
+#define DPLL_SYS_STATE_MASK   (0xf)
+
+/* Bit definitions for SYS_APLL_STATUS register */
+#define SYS_APLL_LOSS_LOCK_LIVE_MASK   BIT(0)
+#define SYS_APLL_LOSS_LOCK_LIVE_LOCKED 0
+#define SYS_APLL_LOSS_LOCK_LIVE_UNLOCKED   1
+
 #endif
diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c
index 051511f..3de8411 100644
--- a/drivers/ptp/ptp_clockmatrix.c
+++ b/drivers/ptp/ptp_clockmatrix.c
@@ -335,6 +335,79 @@ static int wait_for_boot_status_ready(struct idtcm *idtcm)
return -EBUSY;
 }
 
+static int read_sys_apll_status(struct idtcm *idtcm, u8 *status)
+{
+   int err;
+
+   err = idtcm_read(idtcm, STATUS, DPLL_SYS_APLL_STATUS, status,
+sizeof(u8));
+
+   return err;
+}
+
+static int read_sys_dpll_status(struct idtcm *idtcm, u8 *status)
+{
+   int err;
+
+   err = idtcm_read(idtcm, STATUS, DPLL_SYS_STATUS, status, sizeof(u8));
+
+   return err;
+}
+
+static int wait_for_sys_apll_dpll_lock(struct idtcm *idtcm)
+{
+   const char *fmt = "%d ms SYS lock timeout: APLL Loss Lock %d  DPLL state %d";
+   u8 i = LOCK_TIMEOUT_MS / LOCK_POLL_INTERVAL_MS;
+   u8 apll = 0;
+   u8 dpll = 0;
+
+   int err;
+
+   do {
+   err = read_sys_apll_status(idtcm, &apll);
+
+   if (err)
+   return err;
+
+   err = read_sys_dpll_status(idtcm, &dpll);
+
+   if (err)
+   return err;
+
+   apll &= SYS_APLL_LOSS_LOCK_LIVE_MASK;
+   dpll &= DPLL_SYS_STATE_MASK;
+
+   if ((apll == SYS_APLL_LOSS_LOCK_LIVE_LOCKED)
+   && (dpll == DPLL_STATE_LOCKED)) {
+   return 0;
+   } else if ((dpll == DPLL_STATE_FREERUN) ||
+  (dpll == DPLL_STATE_HOLDOVER) ||
+  (dpll == DPLL_STATE_OPEN_LOOP)) {
+   dev_warn(&idtcm->client->dev,
+   "No wait state: DPLL_SYS_STATE %d", dpll);
+   return -EPERM;
+   }
+
+   msleep(LOCK_POLL_INTERVAL_MS);
+   i--;
+
+   } while (i);
+
+   dev_warn(&idtcm->client->dev, fmt, LOCK_TIMEOUT_MS, apll, dpll);
+
+   return -ETIME;
+}
+
+static void wait_for_chip_ready(struct idtcm *idtcm)
+{
+   if (wait_for_boot_status_ready(idtcm))
+   dev_warn(&idtcm->client->dev, "BOOT_STATUS != 0xA0");
+
+   if (wait_for_sys_apll_dpll_lock(idtcm))
+   dev_warn(&idtcm->client->dev,
+"Continuing while SYS APLL/DPLL is not locked");
+}
+
 static int _idtcm_gettime(struct idtcm_channel *channel,
  struct timespec64 *ts)
 {
@@ -2235,8 +2308,7 @@ static int idtcm_probe(struct i2c_client *client,
dev_warn(&idtcm->client->dev,
 "loading firmware failed with %d\n", err);
 
-   if (wait_for_boot_status_ready(idtcm))
-   dev_warn(&idtcm->client->dev, "BOOT_STATUS != 0xA0\n");
+   wait_for_chip_ready(idtcm);
 
if (idtcm->tod_mask) {
for (i = 0; i < MAX_TOD; i++) {
diff --git a/drivers/ptp/ptp_clockmatrix.h b/drivers/ptp/ptp_clockmatrix.h
index 645de2c..0233236 100644
--- a/drivers/ptp/ptp_clockmatrix.h
+++ b/drivers/ptp/ptp_clockmatrix.h
@@ -51,6 +51,9 @@
 #define TOD_WRITE_OVERHEAD_COUNT_MAX   (2)
 #define TOD_BYTE_COUNT (11)
 
+#define LOCK_TIMEOUT_MS(2000)
+#define LOCK_POLL_INTERVAL_MS  (10)
+
 #define PEROUT_ENABLE_O

[PATCH v2 net-next 0/3] ptp: ptp_clockmatrix: Fix output 1 PPS alignment.

2021-02-12 Thread vincent.cheng.xh
From: Vincent Cheng 

This series fixes a race condition that may result in the output clock
not aligned to internal 1 PPS clock.

Part of device initialization is to align the rising edge of output
clocks to the internal rising edge of the 1 PPS clock.  If the system
APLL and DPLL are not locked when this alignment occurs, the alignment
fails and a fixed offset between the internal 1 PPS clock and the
output clock occurs.

If a clock is dynamically enabled after power-up, the output clock
also needs to be aligned to the internal 1 PPS clock.

v2:
Suggested by: Richard Cochran 
- Added const to "char * fmt"
- Break unrelated header change into separate patch

Vincent Cheng (3):
  ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.
  ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable.
  ptp: ptp_clockmatrix: Remove unused header declarations.

 drivers/ptp/idt8a340_reg.h| 10 +
 drivers/ptp/ptp_clockmatrix.c | 92 ---
 drivers/ptp/ptp_clockmatrix.h | 17 +++-
 3 files changed, 112 insertions(+), 7 deletions(-)

-- 
2.7.4



[PATCH v2 net-next 2/3] ptp: ptp_clockmatrix: Add alignment of 1 PPS to idtcm_perout_enable.

2021-02-12 Thread vincent.cheng.xh
From: Vincent Cheng 

When enabling output using PTP_CLK_REQ_PEROUT, we need to align the output
clock to the internal 1 PPS clock.

Signed-off-by: Vincent Cheng 
Acked-by: Richard Cochran 
---
 drivers/ptp/ptp_clockmatrix.c | 16 +---
 1 file changed, 13 insertions(+), 3 deletions(-)

diff --git a/drivers/ptp/ptp_clockmatrix.c b/drivers/ptp/ptp_clockmatrix.c
index 3de8411..a83ba4b 100644
--- a/drivers/ptp/ptp_clockmatrix.c
+++ b/drivers/ptp/ptp_clockmatrix.c
@@ -1401,13 +1401,23 @@ static int idtcm_perout_enable(struct idtcm_channel *channel,
   bool enable,
   struct ptp_perout_request *perout)
 {
+   struct idtcm *idtcm = channel->idtcm;
unsigned int flags = perout->flags;
+   struct timespec64 ts = {0, 0};
+   int err;
 
if (flags == PEROUT_ENABLE_OUTPUT_MASK)
-   return idtcm_output_mask_enable(channel, enable);
+   err = idtcm_output_mask_enable(channel, enable);
+   else
+   err = idtcm_output_enable(channel, enable, perout->index);
+
+   if (err) {
+   dev_err(&idtcm->client->dev, "Unable to set output enable");
+   return err;
+   }
 
-   /* Enable/disable individual output instead */
-   return idtcm_output_enable(channel, enable, perout->index);
+   /* Align output to internal 1 PPS */
+   return _idtcm_settime(channel, &ts, SCSR_TOD_WR_TYPE_SEL_DELTA_PLUS);
 }
 
 static int idtcm_get_pll_mode(struct idtcm_channel *channel,
-- 
2.7.4



[PATCH v3] perf tools: Fix arm64 build error with gcc-11

2021-02-12 Thread Jianlin Lv
gcc version: 11.0.0 20210208 (experimental) (GCC)

The following build error occurs on arm64:

...
In function ‘printf’,
inlined from ‘regs_dump__printf’ at util/session.c:1141:3,
inlined from ‘regs__printf’ at util/session.c:1169:2:
/usr/include/aarch64-linux-gnu/bits/stdio2.h:107:10: \
  error: ‘%-5s’ directive argument is null [-Werror=format-overflow=]

107 |   return __printf_chk (__USE_FORTIFY_LEVEL - 1, __fmt, \
__va_arg_pack ());

..
In function ‘fprintf’,
  inlined from ‘perf_sample__fprintf_regs.isra’ at \
builtin-script.c:622:14:
/usr/include/aarch64-linux-gnu/bits/stdio2.h:100:10: \
error: ‘%5s’ directive argument is null [-Werror=format-overflow=]
  100 |   return __fprintf_chk (__stream, __USE_FORTIFY_LEVEL - 1, __fmt,
  101 | __va_arg_pack ());

cc1: all warnings being treated as errors
...

This patch fixes the -Wformat-overflow warnings by adding a ternary
operator: the printed string evaluates to "unknown" when reg_name is NULL.
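
For reference, "x ?: y" is the GNU C conditional with an omitted middle
operand, equivalent to "x ? x : y" with x evaluated only once; a minimal
standalone sketch of the construct:

  #include <stdio.h>

  int main(void)
  {
          const char *name = NULL;               /* e.g. an unknown register */
          printf("%-5s\n", name ?: "unknown");   /* prints "unknown" */
          return 0;
  }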

Signed-off-by: Jianlin Lv 
---
v2: Add ternary operator to avoid similar errors in other arch.
v3: Declared reg_name in inner block.
---
 tools/perf/builtin-script.c| 4 +++-
 tools/perf/util/scripting-engines/trace-event-python.c | 3 ++-
 tools/perf/util/session.c  | 3 ++-
 3 files changed, 7 insertions(+), 3 deletions(-)

diff --git a/tools/perf/builtin-script.c b/tools/perf/builtin-script.c
index 42dad4a0f8cf..0d52dc45b1c7 100644
--- a/tools/perf/builtin-script.c
+++ b/tools/perf/builtin-script.c
@@ -643,7 +643,9 @@ static int perf_sample__fprintf_regs(struct regs_dump *regs, uint64_t mask,
 
for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
u64 val = regs->regs[i++];
-   printed += fprintf(fp, "%5s:0x%"PRIx64" ", perf_reg_name(r), val);
+   const char *reg_name = perf_reg_name(r);
+
+   printed += fprintf(fp, "%5s:0x%"PRIx64" ", reg_name ?: "unknown", val);
}
 
return printed;
diff --git a/tools/perf/util/scripting-engines/trace-event-python.c b/tools/perf/util/scripting-engines/trace-event-python.c
index c83c2c6564e0..768bdd4240f4 100644
--- a/tools/perf/util/scripting-engines/trace-event-python.c
+++ b/tools/perf/util/scripting-engines/trace-event-python.c
@@ -699,10 +699,11 @@ static int regs_map(struct regs_dump *regs, uint64_t mask, char *bf, int size)
 
for_each_set_bit(r, (unsigned long *) &mask, sizeof(mask) * 8) {
u64 val = regs->regs[i++];
+   const char *reg_name = perf_reg_name(r);
 
printed += scnprintf(bf + printed, size - printed,
 "%5s:0x%" PRIx64 " ",
-perf_reg_name(r), val);
+reg_name ?: "unknown", val);
}
 
return printed;
diff --git a/tools/perf/util/session.c b/tools/perf/util/session.c
index 25adbcce0281..2b40f1c431a3 100644
--- a/tools/perf/util/session.c
+++ b/tools/perf/util/session.c
@@ -1138,9 +1138,10 @@ static void regs_dump__printf(u64 mask, u64 *regs)
 
for_each_set_bit(rid, (unsigned long *) &mask, sizeof(mask) * 8) {
u64 val = regs[i++];
+   const char *reg_name = perf_reg_name(rid);
 
printf(" %-5s 0x%016" PRIx64 "\n",
-  perf_reg_name(rid), val);
+  reg_name ?: "unknown", val);
}
 }
 
-- 
2.25.1



[PATCH 2/2] x86/entry/x32: rename __x32_compat_sys_* to __x64_compat_sys_*

2021-02-12 Thread Masahiro Yamada
In arch/x86/entry/syscall_x32.c, the macros are mapped to symbols
as follows:

  __SYSCALL_COMMON(nr, sym)  -->  __x64_
  __SYSCALL_X32(nr, sym) -->  __x32_

Originally, the syscalls in the x32 special range (512-547) were all
compat.

This assumption is now broken after the following commits:

  55db9c0e8534 ("net: remove compat_sys_{get,set}sockopt")
  5f764d624a89 ("fs: remove the compat readv/writev syscalls")
  598b3cec831f ("fs: remove compat_sys_vmsplice")
  c3973b401ef2 ("mm: remove compat_process_vm_{readv,writev}")

Those commits redefined __x32_sys_* to __x64_sys_* because there is
no stub like __x32_sys_*.

I think defining as follows is more sensible and cleaner.

  __SYSCALL_COMMON(nr, sym)  -->  __x64_
  __SYSCALL_X32(nr, sym) -->  __x64_

This works because both x86_64 and x32 use the same register ABI for
syscall arguments (RDI, RSI, RDX, R10, R8, R9).

The ugly #define __x32_sys_* will go away.

Signed-off-by: Masahiro Yamada 
---

 arch/x86/entry/syscall_x32.c   | 16 ++--
 arch/x86/include/asm/syscall_wrapper.h | 10 +-
 2 files changed, 7 insertions(+), 19 deletions(-)

diff --git a/arch/x86/entry/syscall_x32.c b/arch/x86/entry/syscall_x32.c
index f2fe0a33bcfd..3fea8fb9cd6a 100644
--- a/arch/x86/entry/syscall_x32.c
+++ b/arch/x86/entry/syscall_x32.c
@@ -8,27 +8,15 @@
 #include 
 #include 
 
-/*
- * Reuse the 64-bit entry points for the x32 versions that occupy different
- * slots in the syscall table.
- */
-#define __x32_sys_readv__x64_sys_readv
-#define __x32_sys_writev   __x64_sys_writev
-#define __x32_sys_getsockopt   __x64_sys_getsockopt
-#define __x32_sys_setsockopt   __x64_sys_setsockopt
-#define __x32_sys_vmsplice __x64_sys_vmsplice
-#define __x32_sys_process_vm_readv __x64_sys_process_vm_readv
-#define __x32_sys_process_vm_writev__x64_sys_process_vm_writev
-
 #define __SYSCALL_64(nr, sym)
 
-#define __SYSCALL_X32(nr, sym) extern long __x32_##sym(const struct pt_regs *);
+#define __SYSCALL_X32(nr, sym) extern long __x64_##sym(const struct pt_regs *);
 #define __SYSCALL_COMMON(nr, sym) extern long __x64_##sym(const struct pt_regs 
*);
 #include 
 #undef __SYSCALL_X32
 #undef __SYSCALL_COMMON
 
-#define __SYSCALL_X32(nr, sym) [nr] = __x32_##sym,
+#define __SYSCALL_X32(nr, sym) [nr] = __x64_##sym,
 #define __SYSCALL_COMMON(nr, sym) [nr] = __x64_##sym,
 
 asmlinkage const sys_call_ptr_t x32_sys_call_table[__NR_x32_syscall_max+1] = {
diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
index 80c08c7d5e72..6a2827d0681f 100644
--- a/arch/x86/include/asm/syscall_wrapper.h
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -17,7 +17,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
  * __x64_sys_*() - 64-bit native syscall
  * __ia32_sys_*()- 32-bit native syscall or common compat syscall
  * __ia32_compat_sys_*() - 32-bit compat syscall
- * __x32_compat_sys_*()  - 64-bit X32 compat syscall
+ * __x64_compat_sys_*()  - 64-bit X32 compat syscall
  *
  * The registers are decoded according to the ABI:
  * 64-bit: RDI, RSI, RDX, R10, R8, R9
@@ -166,17 +166,17 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
  * with x86_64 obviously do not need such care.
  */
 #define __X32_COMPAT_SYS_STUB0(name)   \
-   __SYS_STUB0(x32, compat_sys_##name)
+   __SYS_STUB0(x64, compat_sys_##name)
 
 #define __X32_COMPAT_SYS_STUBx(x, name, ...)   \
-   __SYS_STUBx(x32, compat_sys##name,  \
+   __SYS_STUBx(x64, compat_sys##name,  \
SC_X86_64_REGS_TO_ARGS(x, __VA_ARGS__))
 
 #define __X32_COMPAT_COND_SYSCALL(name)
\
-   __COND_SYSCALL(x32, compat_sys_##name)
+   __COND_SYSCALL(x64, compat_sys_##name)
 
 #define __X32_COMPAT_SYS_NI(name)  \
-   __SYS_NI(x32, compat_sys_##name)
+   __SYS_NI(x64, compat_sys_##name)
 #else /* CONFIG_X86_X32 */
 #define __X32_COMPAT_SYS_STUB0(name)
 #define __X32_COMPAT_SYS_STUBx(x, name, ...)
-- 
2.27.0



[PATCH 1/2] x86/syscalls: fix -Wmissing-prototypes warnings from COND_SYSCALL()

2021-02-12 Thread Masahiro Yamada
Building kernel/sys_ni.c with W=1 emits tons of -Wmissing-prototypes
warnings.

$ make W=1 kernel/sys_ni.o
  [ snip ]
  CC  kernel/sys_ni.o
In file included from kernel/sys_ni.c:10:
./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous prototype 
for '__x64_sys_io_setup' [-Wmissing-prototypes]
   83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
  |  ^~
./arch/x86/include/asm/syscall_wrapper.h:100:2: note: in expansion of macro 
'__COND_SYSCALL'
  100 |  __COND_SYSCALL(x64, sys_##name)
  |  ^~
./arch/x86/include/asm/syscall_wrapper.h:256:2: note: in expansion of macro 
'__X64_COND_SYSCALL'
  256 |  __X64_COND_SYSCALL(name) \
  |  ^~
kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
   39 | COND_SYSCALL(io_setup);
  | ^~~~
./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous prototype 
for '__ia32_sys_io_setup' [-Wmissing-prototypes]
   83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
  |  ^~
./arch/x86/include/asm/syscall_wrapper.h:120:2: note: in expansion of macro 
'__COND_SYSCALL'
  120 |  __COND_SYSCALL(ia32, sys_##name)
  |  ^~
./arch/x86/include/asm/syscall_wrapper.h:257:2: note: in expansion of macro 
'__IA32_COND_SYSCALL'
  257 |  __IA32_COND_SYSCALL(name)
  |  ^~~
kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
   39 | COND_SYSCALL(io_setup);
  | ^~~~
  ...

__SYS_STUB0() and __SYS_STUBx() defined a few lines above have forward
declarations. Let's do likewise for __COND_SYSCALL() to fix the
warnings.
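
For illustration, after this change COND_SYSCALL(io_setup) roughly
expands for the x64 ABI (a sketch, not the literal preprocessor output)
to a declaration followed by the weak definition, which is what
silences -Wmissing-prototypes:

  __weak long __x64_sys_io_setup(const struct pt_regs *__unused);
  __weak long __x64_sys_io_setup(const struct pt_regs *__unused)
  {
          return sys_ni_syscall();
  }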

Signed-off-by: Masahiro Yamada 
Tested-by: Mickaël Salaün 
---

 arch/x86/include/asm/syscall_wrapper.h | 1 +
 1 file changed, 1 insertion(+)

diff --git a/arch/x86/include/asm/syscall_wrapper.h b/arch/x86/include/asm/syscall_wrapper.h
index a84333adeef2..80c08c7d5e72 100644
--- a/arch/x86/include/asm/syscall_wrapper.h
+++ b/arch/x86/include/asm/syscall_wrapper.h
@@ -80,6 +80,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs *regs);
}
 
 #define __COND_SYSCALL(abi, name)  \
+   __weak long __##abi##_##name(const struct pt_regs *__unused);   \
__weak long __##abi##_##name(const struct pt_regs *__unused)\
{   \
return sys_ni_syscall();\
-- 
2.27.0



Re: [PATCH 02/27] x86/syscalls: fix -Wmissing-prototypes warnings from COND_SYSCALL()

2021-02-12 Thread Masahiro Yamada
On Sat, Feb 13, 2021 at 12:12 AM Mickaël Salaün  wrote:
>
> Could you please push this patch to Linus? Thanks.
>
> On 04/02/2021 15:16, Mickaël Salaün wrote:
> >
> > On 28/01/2021 01:50, Masahiro Yamada wrote:
> >> Building kernel/sys_ni.c with W=1 omits tons of -Wmissing-prototypes
> >> warnings.
> >>
> >> $ make W=1 kernel/sys_ni.o
> >>   [ snip ]
> >>   CC  kernel/sys_ni.o
> >> In file included from kernel/sys_ni.c:10:
> >> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous 
> >> prototype for '__x64_sys_io_setup' [-Wmissing-prototypes]
> >>83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
> >>   |  ^~
> >> ./arch/x86/include/asm/syscall_wrapper.h:100:2: note: in expansion of 
> >> macro '__COND_SYSCALL'
> >>   100 |  __COND_SYSCALL(x64, sys_##name)
> >>   |  ^~
> >> ./arch/x86/include/asm/syscall_wrapper.h:256:2: note: in expansion of 
> >> macro '__X64_COND_SYSCALL'
> >>   256 |  __X64_COND_SYSCALL(name) \
> >>   |  ^~
> >> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
> >>39 | COND_SYSCALL(io_setup);
> >>   | ^~~~
> >> ./arch/x86/include/asm/syscall_wrapper.h:83:14: warning: no previous 
> >> prototype for '__ia32_sys_io_setup' [-Wmissing-prototypes]
> >>83 |  __weak long __##abi##_##name(const struct pt_regs *__unused) \
> >>   |  ^~
> >> ./arch/x86/include/asm/syscall_wrapper.h:120:2: note: in expansion of 
> >> macro '__COND_SYSCALL'
> >>   120 |  __COND_SYSCALL(ia32, sys_##name)
> >>   |  ^~
> >> ./arch/x86/include/asm/syscall_wrapper.h:257:2: note: in expansion of 
> >> macro '__IA32_COND_SYSCALL'
> >>   257 |  __IA32_COND_SYSCALL(name)
> >>   |  ^~~
> >> kernel/sys_ni.c:39:1: note: in expansion of macro 'COND_SYSCALL'
> >>39 | COND_SYSCALL(io_setup);
> >>   | ^~~~
> >>   ...
> >>
> >> __SYS_STUB0() and __SYS_STUBx() defined a few lines above have forward
> >> declarations. Let's do likewise for __COND_SYSCALL() to fix the
> >> warnings.
> >>
> >> Signed-off-by: Masahiro Yamada 
> >
> > Tested-by: Mickaël Salaün 
> >
> > Thanks to this patch we avoid multiple emails from Intel's bot when
> > adding new syscalls. :)


Thanks for the reminder.
I will fix the typo "omits" -> "emits"
and send v2 just in case.



> >
> >
> >> ---
> >>
> >>  arch/x86/include/asm/syscall_wrapper.h | 1 +
> >>  1 file changed, 1 insertion(+)
> >>
> >> diff --git a/arch/x86/include/asm/syscall_wrapper.h 
> >> b/arch/x86/include/asm/syscall_wrapper.h
> >> index a84333adeef2..80c08c7d5e72 100644
> >> --- a/arch/x86/include/asm/syscall_wrapper.h
> >> +++ b/arch/x86/include/asm/syscall_wrapper.h
> >> @@ -80,6 +80,7 @@ extern long __ia32_sys_ni_syscall(const struct pt_regs 
> >> *regs);
> >>  }
> >>
> >>  #define __COND_SYSCALL(abi, name)   \
> >> +__weak long __##abi##_##name(const struct pt_regs *__unused);   \
> >>  __weak long __##abi##_##name(const struct pt_regs *__unused)\
> >>  {   \
> >>  return sys_ni_syscall();\
> >>



-- 
Best Regards
Masahiro Yamada


Re: [PATCH v2] kbuild: simplify access to the kernel's version

2021-02-12 Thread Masahiro Yamada
On Sat, Feb 13, 2021 at 1:29 AM Sasha Levin  wrote:
>
> Instead of storing the version in a single integer and having various
> kernel (and userspace) code know how it's constructed, export individual
> (major, patchlevel, sublevel) components and simplify kernel code that
> uses it.
>
> This should also make it easier on userspace.
>
> Signed-off-by: Sasha Levin 
> ---
>  Makefile   | 5 -
>  drivers/net/ethernet/mellanox/mlx5/core/main.c | 4 ++--
>  drivers/usb/core/hcd.c | 4 ++--
>  drivers/usb/gadget/udc/aspeed-vhub/hub.c   | 4 ++--
>  include/linux/usb/composite.h  | 4 ++--
>  kernel/sys.c   | 2 +-
>  6 files changed, 13 insertions(+), 10 deletions(-)




Applied to linux-kbuild. Thanks.
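
For reference, with the three components now exported in the generated
linux/version.h, a version check (in kernel code, or in userspace that
reads the installed header) can be written directly against them instead
of decoding LINUX_VERSION_CODE by hand. A minimal sketch (the 5.12
cutoff is just a made-up example):

  #include <linux/version.h>

  #if LINUX_VERSION_MAJOR > 5 || \
      (LINUX_VERSION_MAJOR == 5 && LINUX_VERSION_PATCHLEVEL >= 12)
  /* code that relies on v5.12+ behaviour */
  #endif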





> diff --git a/Makefile b/Makefile
> index 12607d3891487..1fdd44fe16590 100644
> --- a/Makefile
> +++ b/Makefile
> @@ -1255,7 +1255,10 @@ define filechk_version.h
> expr $(VERSION) \* 65536 + 0$(PATCHLEVEL) \* 256 + 
> $(SUBLEVEL)); \
> fi;  \
> echo '#define KERNEL_VERSION(a,b,c) (((a) << 16) + ((b) << 8) +  \
> -   ((c) > 255 ? 255 : (c)))'
> +   ((c) > 255 ? 255 : (c)))';   \
> +   echo \#define LINUX_VERSION_MAJOR $(VERSION);\
> +   echo \#define LINUX_VERSION_PATCHLEVEL $(PATCHLEVEL);\
> +   echo \#define LINUX_VERSION_SUBLEVEL $(SUBLEVEL)
>  endef
>
>  $(version_h): FORCE
> diff --git a/drivers/net/ethernet/mellanox/mlx5/core/main.c 
> b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> index ca6f2fc39ea0a..29f886263dc52 100644
> --- a/drivers/net/ethernet/mellanox/mlx5/core/main.c
> +++ b/drivers/net/ethernet/mellanox/mlx5/core/main.c
> @@ -235,8 +235,8 @@ static void mlx5_set_driver_version(struct mlx5_core_dev 
> *dev)
> remaining_size = max_t(int, 0, driver_ver_sz - strlen(string));
>
> snprintf(string + strlen(string), remaining_size, "%u.%u.%u",
> -(u8)((LINUX_VERSION_CODE >> 16) & 0xff), 
> (u8)((LINUX_VERSION_CODE >> 8) & 0xff),
> -(u16)(LINUX_VERSION_CODE & 0x));
> +   LINUX_VERSION_MAJOR, LINUX_VERSION_PATCHLEVEL,
> +   LINUX_VERSION_SUBLEVEL);
>
> /*Send the command*/
> MLX5_SET(set_driver_version_in, in, opcode,
> diff --git a/drivers/usb/core/hcd.c b/drivers/usb/core/hcd.c
> index ad5a0f405a75c..3f0381344221e 100644
> --- a/drivers/usb/core/hcd.c
> +++ b/drivers/usb/core/hcd.c
> @@ -111,8 +111,8 @@ DECLARE_WAIT_QUEUE_HEAD(usb_kill_urb_queue);
>   */
>
>  /*-*/
> -#define KERNEL_REL bin2bcd(((LINUX_VERSION_CODE >> 16) & 0x0ff))
> -#define KERNEL_VER bin2bcd(((LINUX_VERSION_CODE >> 8) & 0x0ff))
> +#define KERNEL_REL bin2bcd(LINUX_VERSION_MAJOR)
> +#define KERNEL_VER bin2bcd(LINUX_VERSION_PATCHLEVEL)
>
>  /* usb 3.1 root hub device descriptor */
>  static const u8 usb31_rh_dev_descriptor[18] = {
> diff --git a/drivers/usb/gadget/udc/aspeed-vhub/hub.c 
> b/drivers/usb/gadget/udc/aspeed-vhub/hub.c
> index bfd8e77788e29..5c7dea5e0ff16 100644
> --- a/drivers/usb/gadget/udc/aspeed-vhub/hub.c
> +++ b/drivers/usb/gadget/udc/aspeed-vhub/hub.c
> @@ -46,8 +46,8 @@
>   *- Make vid/did overridable
>   *- make it look like usb1 if usb1 mode forced
>   */
> -#define KERNEL_REL bin2bcd(((LINUX_VERSION_CODE >> 16) & 0x0ff))
> -#define KERNEL_VER bin2bcd(((LINUX_VERSION_CODE >> 8) & 0x0ff))
> +#define KERNEL_REL bin2bcd(LINUX_VERSION_MAJOR)
> +#define KERNEL_VER bin2bcd(LINUX_VERSION_PATCHLEVEL)
>
>  enum {
> AST_VHUB_STR_INDEX_MAX = 4,
> diff --git a/include/linux/usb/composite.h b/include/linux/usb/composite.h
> index a2d229ab63ba5..7531ce7233747 100644
> --- a/include/linux/usb/composite.h
> +++ b/include/linux/usb/composite.h
> @@ -573,8 +573,8 @@ static inline u16 get_default_bcdDevice(void)
>  {
> u16 bcdDevice;
>
> -   bcdDevice = bin2bcd((LINUX_VERSION_CODE >> 16 & 0xff)) << 8;
> -   bcdDevice |= bin2bcd((LINUX_VERSION_CODE >> 8 & 0xff));
> +   bcdDevice = bin2bcd(LINUX_VERSION_MAJOR) << 8;
> +   bcdDevice |= bin2bcd(LINUX_VERSION_PATCHLEVEL);
> return bcdDevice;
>  }
>
> diff --git a/kernel/sys.c b/kernel/sys.c
> index 51f00fe20e4d1..c2225bd405d58 100644
> --- a/kernel/sys.c
> +++ b/kernel/sys.c
> @@ -1243,7 +1243,7 @@ static int override_release(char __user *release, 
> size_t len)
> break;
> rest++;
> }
> -   v = ((LINUX_VERSION_CODE >> 8) & 0xff) + 60;
> +   v = LINUX_VERSION_PATCHLEVEL + 60;
> copy = clamp_t(size_t, len, 1, sizeof(buf));
> copy = scnprintf(buf, copy, "2.6.%u%s", v, rest);
> ret = copy_to_user(release, buf, copy + 1);
> -

Re: [PATCH net-next 1/2] ptp: ptp_clockmatrix: Add wait_for_sys_apll_dpll_lock.

2021-02-12 Thread Vincent Cheng
On Fri, Feb 12, 2021 at 10:31:40AM EST, Richard Cochran wrote:

>On Thu, Feb 11, 2021 at 11:38:44PM -0500, vincent.cheng...@renesas.com wrote:
>
>> +static int wait_for_sys_apll_dpll_lock(struct idtcm *idtcm)
>> +{
>> +char *fmt = "%d ms SYS lock timeout: APLL Loss Lock %d  DPLL state %d";
>
>Probably you want: const char *fmt

Good point, will change in V2 patch.

>
>> diff --git a/drivers/ptp/ptp_clockmatrix.h b/drivers/ptp/ptp_clockmatrix.h
>> index 645de2c..fb32327 100644
>> --- a/drivers/ptp/ptp_clockmatrix.h
>...
>
>> @@ -123,7 +137,6 @@ struct idtcm_channel {
>>  enum pll_mode   pll_mode;
>>  u8  pll;
>>  u16 output_mask;
>> -u8  output_phase_adj[MAX_OUTPUT][4];
>>  };
>
>Looks like this removal is unrelated to the patch subject, and so it
>deserves its own small patch.

Ok, will separate into separate patch for V2.

Vincent



RE: [PATCH v2] perf probe: fix kretprobe issue caused by GCC bug

2021-02-12 Thread Jianlin Lv



> -Original Message-
> From: Arnaldo Carvalho de Melo 
> Sent: Saturday, February 13, 2021 5:34 AM
> To: Jianlin Lv 
> Cc: pet...@infradead.org; mi...@redhat.com; Mark Rutland
> ; alexander.shish...@linux.intel.com;
> jo...@redhat.com; namhy...@kernel.org; nat...@kernel.org;
> ndesaulni...@google.com; mhira...@kernel.org; f...@redhat.com;
> irog...@google.com; suman...@linux.ibm.com; linux-
> ker...@vger.kernel.org; clang-built-li...@googlegroups.com
> Subject: Re: [PATCH v2] perf probe: fix kretprobe issue caused by GCC bug
>
> Em Wed, Feb 10, 2021 at 02:26:46PM +0800, Jianlin Lv escreveu:
> > Perf failed to add kretprobe event with debuginfo of vmlinux which is
> > compiled by gcc with -fpatchable-function-entry option enabled.
> > The same issue with kernel module.
> >
> > Issue:
> >
> >   # perf probe  -v 'kernel_clone%return $retval'
> >   ..
> >   Writing event: r:probe/kernel_clone__return _text+599624 $retval
> >   Failed to write event: Invalid argument
> > Error: Failed to add events. Reason: Invalid argument (Code: -22)
> >
> >   # cat /sys/kernel/debug/tracing/error_log
> >   [156.75] trace_kprobe: error: Retprobe address must be an function entry
> >   Command: r:probe/kernel_clone__return _text+599624 $retval
> > ^
> >
> >   # llvm-dwarfdump  vmlinux |grep  -A 10  -w 0x00df2c2b
> >   0x00df2c2b:   DW_TAG_subprogram
> > DW_AT_external  (true)
> > DW_AT_name  ("kernel_clone")
> > DW_AT_decl_file ("/home/code/linux-next/kernel/fork.c")
> > DW_AT_decl_line (2423)
> > DW_AT_decl_column   (0x07)
> > DW_AT_prototyped(true)
> > DW_AT_type  (0x00dcd492 "pid_t")
> > DW_AT_low_pc(0x800010092648)
> > DW_AT_high_pc   (0x800010092b9c)
> > DW_AT_frame_base(DW_OP_call_frame_cfa)
> >
> >   # cat /proc/kallsyms |grep kernel_clone
> >   800010092640 T kernel_clone
> >   # readelf -s vmlinux |grep -i kernel_clone
> >   183173: 800010092640  1372 FUNCGLOBAL DEFAULT2 kernel_clone
> >
> >   # objdump -d vmlinux |grep -A 10  -w \:
> >   800010092640 :
> >   800010092640:   d503201fnop
> >   800010092644:   d503201fnop
> >   800010092648:   d503233fpaciasp
> >   80001009264c:   a9b87bfdstp x29, x30, [sp, #-128]!
> >   800010092650:   910003fdmov x29, sp
> >   800010092654:   a90153f3stp x19, x20, [sp, #16]
> >
> > The entry address of kernel_clone converted by debuginfo is
> > _text+599624 (0x92648), which is consistent with the value of
> DW_AT_low_pc attribute.
> > But the symbolic address of kernel_clone from /proc/kallsyms is
> > 800010092640.
> >
> > This issue is found on arm64, -fpatchable-function-entry=2 is enabled
> > when CONFIG_DYNAMIC_FTRACE_WITH_REGS=y;
> > Just as objdump displayed the assembler contents of kernel_clone, GCC
> > generate 2 NOPs  at the beginning of each function.
> >
> > kprobe_on_func_entry detects that (_text+599624) is not the entry
> > address of the function, which leads to the failure of adding kretprobe
> event.
> >
> > ---
> > kprobe_on_func_entry
> > ->_kprobe_addr
> > ->kallsyms_lookup_size_offset
> > ->arch_kprobe_on_func_entry// FALSE
> > ---
>
> Please don't use --- at the start of a line, it is used to separate from the 
> patch
> itself, later down your message.
>
> It causes this:
>
> [acme@five perf]$ am /wb/1.patch
> Traceback (most recent call last):
>   File "/home/acme/bin/ksoff.py", line 180, in 
> sign_msg(sys.stdin, sys.stdout)
>   File "/home/acme/bin/ksoff.py", line 142, in sign_msg
> sob.remove(last_sob[0])
> TypeError: 'NoneType' object is not subscriptable [acme@five perf]$
>
> I'm fixing this by removing that --- markers
>

Sorry for the inconvenience.
Should I commit another version to fix this issue?

Jianlin

> > The cause of the issue is that the first instruction in the compile
> > unit indicated by DW_AT_low_pc does not include NOPs.
> > This issue exists in all gcc versions that support
> > -fpatchable-function-entry option.
> >
> > I have reported it to the GCC community:
> > https://gcc.gnu.org/bugzilla/show_bug.cgi?id=98776
> >
> > Currently arm64 and PA-RISC may enable fpatchable-function-entry option.
> > The kernel compiled with clang does not have this issue.
> >
> > FIX:
> >
> > This GCC issue only cause the registration failure of the kretprobe
> > event which doesn't need debuginfo. So, stop using debuginfo for retprobe.
> > map will be used to query the probe function address.
> >
> > Signed-off-by: Jianlin Lv 
> > ---
> > v2: stop using debuginfo for retprobe, and update changelog.
> > ---
> >  tools/perf/util/probe-event.c | 10 ++
> >  1 file changed, 10 insertions(+)
> >
> > diff --git a/tools/perf/util/probe-event.c
> > b/tools/perf/ut

Re: [PATCH] HID: sony: Support for DS4 clones that do not implement feature report 0x81

2021-02-12 Thread Ivan Mironov
Ignore this patch, I am working on a better one.

On Wed, 2021-01-13 at 22:34 +0500, Ivan Mironov wrote:
> There are clones of DualShock 4 that are very similar to the originals,
> except that 1) they do not support HID feature report 0x81 and 2) they do
> not have any USB Audio interfaces even though they physically have an
> audio jack.
> 
> Such controllers are working fine with Linux when connected via
> Bluetooth, but not when connected via USB. Here is how failed USB
> connection attempt looks in log:
> 
>   usb 1-5: New USB device found, idVendor=054c, idProduct=05c4, 
> bcdDevice= 1.00
>   usb 1-5: New USB device strings: Mfr=1, Product=2, SerialNumber=0
>   usb 1-5: Product: Wireless Controller
>   usb 1-5: Manufacturer: Sony Computer Entertainment
>   sony 0003:054C:05C4.0007: failed to retrieve feature report 0x81 with 
> the DualShock 4 MAC address
>   sony 0003:054C:05C4.0007: hidraw6: USB HID v81.11 Gamepad [Sony 
> Computer Entertainment Wireless Controller] on usb-:00:14.0-5/input0
>   sony 0003:054C:05C4.0007: failed to claim input
> 
> This patch adds support of using feature report 0x12 as a fallback for
> Bluetooth MAC address retrieval. Feature report 0x12 also seems to be
> used by DS4Windows[1] for all DS4 controllers.
> 
> [1] 
> https://github.com/Ryochan7/DS4Windows/blob/1b74a4440089f38a24ee2c2483c1d733a0692b8f/DS4Windows/HidLibrary/HidDevice.cs#L479
> 
> Signed-off-by: Ivan Mironov 
> ---
>  drivers/hid/hid-sony.c | 72 ++
>  1 file changed, 52 insertions(+), 20 deletions(-)
> 
> diff --git a/drivers/hid/hid-sony.c b/drivers/hid/hid-sony.c
> index e3a557dc9ffd..97df12180e45 100644
> --- a/drivers/hid/hid-sony.c
> +++ b/drivers/hid/hid-sony.c
> @@ -491,6 +491,7 @@ struct motion_output_report_02 {
>  
> 
>  #define DS4_FEATURE_REPORT_0x02_SIZE 37
>  #define DS4_FEATURE_REPORT_0x05_SIZE 41
> +#define DS4_FEATURE_REPORT_0x12_SIZE 16
>  #define DS4_FEATURE_REPORT_0x81_SIZE 7
>  #define DS4_FEATURE_REPORT_0xA3_SIZE 49
>  #define DS4_INPUT_REPORT_0x11_SIZE 78
> @@ -2593,6 +2594,53 @@ static int sony_get_bt_devaddr(struct sony_sc *sc)
>   return 0;
>  }
>  
> 
> +static int sony_get_usb_ds4_devaddr(struct sony_sc *sc)
> +{
> + u8 *buf = NULL;
> + int ret;
> +
> + buf = kmalloc(max(DS4_FEATURE_REPORT_0x12_SIZE, 
> DS4_FEATURE_REPORT_0x81_SIZE), GFP_KERNEL);
> + if (!buf)
> + return -ENOMEM;
> +
> + /*
> +  * The MAC address of a DS4 controller connected via USB can be
> +  * retrieved with feature report 0x81. The address begins at
> +  * offset 1.
> +  */
> + ret = hid_hw_raw_request(sc->hdev, 0x81, buf,
> + DS4_FEATURE_REPORT_0x81_SIZE, HID_FEATURE_REPORT,
> + HID_REQ_GET_REPORT);
> + if (ret == DS4_FEATURE_REPORT_0x81_SIZE) {
> + memcpy(sc->mac_address, &buf[1], sizeof(sc->mac_address));
> + goto out_free;
> + }
> + dbg_hid("%s: hid_hw_raw_request(..., 0x81, ...) returned %d\n", 
> __func__, ret);
> +
> + /*
> +  * Some variants do not implement feature report 0x81 at all.
> +  * Fortunately, feature report 0x12 also contains the MAC address of
> +  * a controller.
> +  */
> + ret = hid_hw_raw_request(sc->hdev, 0x12, buf,
> + DS4_FEATURE_REPORT_0x12_SIZE, HID_FEATURE_REPORT,
> + HID_REQ_GET_REPORT);
> + if (ret == DS4_FEATURE_REPORT_0x12_SIZE) {
> + memcpy(sc->mac_address, &buf[1], sizeof(sc->mac_address));
> + goto out_free;
> + }
> + dbg_hid("%s: hid_hw_raw_request(..., 0x12, ...) returned %d\n", 
> __func__, ret);
> +
> + hid_err(sc->hdev, "failed to retrieve feature reports 0x81 and 0x12 
> with the DualShock 4 MAC address\n");
> + ret = ret < 0 ? ret : -EINVAL;
> +
> +out_free:
> +
> + kfree(buf);
> +
> + return ret;
> +}
> +
>  static int sony_check_add(struct sony_sc *sc)
>  {
>   u8 *buf = NULL;
> @@ -2613,26 +2661,9 @@ static int sony_check_add(struct sony_sc *sc)
>   return 0;
>   }
>   } else if (sc->quirks & (DUALSHOCK4_CONTROLLER_USB | 
> DUALSHOCK4_DONGLE)) {
> - buf = kmalloc(DS4_FEATURE_REPORT_0x81_SIZE, GFP_KERNEL);
> - if (!buf)
> - return -ENOMEM;
> -
> - /*
> -  * The MAC address of a DS4 controller connected via USB can be
> -  * retrieved with feature report 0x81. The address begins at
> -  * offset 1.
> -  */
> - ret = hid_hw_raw_request(sc->hdev, 0x81, buf,
> - DS4_FEATURE_REPORT_0x81_SIZE, 
> HID_FEATURE_REPORT,
> - HID_REQ_GET_REPORT);
> -
> - if (ret != DS4_FEATURE_REPORT_0x81_SIZE) {
> - hid_err(sc->hdev, "failed to retrieve feature report 
> 0x81 with the DualShock 4 MAC address\n");
> - ret = 

[PATCH net-next v2 0/3] net: phy: broadcom: Cleanups and APD

2021-02-12 Thread Florian Fainelli
This patch series cleans up the brcmphy.h header and its numerous unused
phydev->dev_flags, fixes the RXC/TXC clock disabling bit and allows the
BCM54210E PHY to utilize APD.

Changes in v2:

- dropped the patch that attempted to fix a possible discrepancy between
  the datasheet and the actual hardware
- added a patch to remove a forward declaration
- do additional flags cleanup

Florian Fainelli (3):
  net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()
  net: phy: broadcom: Remove unused flags
  net: phy: broadcom: Allow BCM54210E to configure APD

 drivers/net/phy/broadcom.c | 101 +
 include/linux/brcmphy.h|  23 -
 2 files changed, 66 insertions(+), 58 deletions(-)

-- 
2.25.1



[PATCH net-next v2 2/3] net: phy: broadcom: Remove unused flags

2021-02-12 Thread Florian Fainelli
We have a number of unused flags defined today. Since we are short on
space and may need to introduce new flags in the future, remove the
unused ones and shift every remaining flag down into a contiguous
assignment.
PHY_BCM_FLAGS_MODE_1000BX was only used internally for the BCM54616S
PHY, so allocate a driver private structure to store that flag instead
of cannibalizing one from phydev->dev_flags for that purpose.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/broadcom.c | 19 ---
 include/linux/brcmphy.h| 21 -
 2 files changed, 24 insertions(+), 16 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 4142f69c1530..3ce266ab521b 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -381,10 +381,21 @@ static int bcm5481_config_aneg(struct phy_device *phydev)
return ret;
 }
 
+struct bcm54616s_phy_priv {
+   bool mode_1000bx_en;
+};
+
 static int bcm54616s_probe(struct phy_device *phydev)
 {
+   struct bcm54616s_phy_priv *priv;
int val, intf_sel;
 
+   priv = devm_kzalloc(&phydev->mdio.dev, sizeof(*priv), GFP_KERNEL);
+   if (!priv)
+   return -ENOMEM;
+
+   phydev->priv = priv;
+
val = bcm_phy_read_shadow(phydev, BCM54XX_SHD_MODE);
if (val < 0)
return val;
@@ -407,7 +418,7 @@ static int bcm54616s_probe(struct phy_device *phydev)
 * 1000BASE-X configuration.
 */
if (!(val & BCM54616S_100FX_MODE))
-   phydev->dev_flags |= PHY_BCM_FLAGS_MODE_1000BX;
+   priv->mode_1000bx_en = true;
 
phydev->port = PORT_FIBRE;
}
@@ -417,10 +428,11 @@ static int bcm54616s_probe(struct phy_device *phydev)
 
 static int bcm54616s_config_aneg(struct phy_device *phydev)
 {
+   struct bcm54616s_phy_priv *priv = phydev->priv;
int ret;
 
/* Aneg firstly. */
-   if (phydev->dev_flags & PHY_BCM_FLAGS_MODE_1000BX)
+   if (priv->mode_1000bx_en)
ret = genphy_c37_config_aneg(phydev);
else
ret = genphy_config_aneg(phydev);
@@ -433,9 +445,10 @@ static int bcm54616s_config_aneg(struct phy_device *phydev)
 
 static int bcm54616s_read_status(struct phy_device *phydev)
 {
+   struct bcm54616s_phy_priv *priv = phydev->priv;
int err;
 
-   if (phydev->dev_flags & PHY_BCM_FLAGS_MODE_1000BX)
+   if (priv->mode_1000bx_en)
err = genphy_c37_read_status(phydev);
else
err = genphy_read_status(phydev);
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index de9430d55c90..844dcfe789a2 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -61,19 +61,14 @@
 #define PHY_BCM_OUI_5  0x03625e00
 #define PHY_BCM_OUI_6  0xae025000
 
-#define PHY_BCM_FLAGS_MODE_COPPER  0x0001
-#define PHY_BCM_FLAGS_MODE_1000BX  0x0002
-#define PHY_BCM_FLAGS_INTF_SGMII   0x0010
-#define PHY_BCM_FLAGS_INTF_XAUI0x0020
-#define PHY_BRCM_WIRESPEED_ENABLE  0x0100
-#define PHY_BRCM_AUTO_PWRDWN_ENABLE0x0200
-#define PHY_BRCM_RX_REFCLK_UNUSED  0x0400
-#define PHY_BRCM_STD_IBND_DISABLE  0x0800
-#define PHY_BRCM_EXT_IBND_RX_ENABLE0x1000
-#define PHY_BRCM_EXT_IBND_TX_ENABLE0x2000
-#define PHY_BRCM_CLEAR_RGMII_MODE  0x4000
-#define PHY_BRCM_DIS_TXCRXC_NOENRGY0x8000
-#define PHY_BRCM_EN_MASTER_MODE0x0001
+#define PHY_BRCM_AUTO_PWRDWN_ENABLE0x0001
+#define PHY_BRCM_RX_REFCLK_UNUSED  0x0002
+#define PHY_BRCM_STD_IBND_DISABLE  0x0004
+#define PHY_BRCM_EXT_IBND_RX_ENABLE0x0008
+#define PHY_BRCM_EXT_IBND_TX_ENABLE0x0010
+#define PHY_BRCM_CLEAR_RGMII_MODE  0x0020
+#define PHY_BRCM_DIS_TXCRXC_NOENRGY0x0040
+#define PHY_BRCM_EN_MASTER_MODE0x0080
 
 /* Broadcom BCM7xxx specific workarounds */
 #define PHY_BRCM_7XXX_REV(x)   (((x) >> 8) & 0xff)
-- 
2.25.1



[PATCH net-next v2 3/3] net: phy: broadcom: Allow BCM54210E to configure APD

2021-02-12 Thread Florian Fainelli
BCM54210E/BCM50212E has been verified to work correctly with the
auto-power down configuration done by bcm54xx_adjust_rxrefclk(), so add
it to the list of working PHYs.

While we are at it, provide an appropriate name for the bit we are
changing which disables the RXC and TXC during auto-power down when
there is no energy on the cable.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/broadcom.c | 8 +---
 include/linux/brcmphy.h| 2 +-
 2 files changed, 6 insertions(+), 4 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 3ce266ab521b..91fbd26c809e 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -193,6 +193,7 @@ static void bcm54xx_adjust_rxrefclk(struct phy_device *phydev)
if (BRCM_PHY_MODEL(phydev) != PHY_ID_BCM57780 &&
BRCM_PHY_MODEL(phydev) != PHY_ID_BCM50610 &&
BRCM_PHY_MODEL(phydev) != PHY_ID_BCM50610M &&
+   BRCM_PHY_MODEL(phydev) != PHY_ID_BCM54210E &&
BRCM_PHY_MODEL(phydev) != PHY_ID_BCM54810 &&
BRCM_PHY_MODEL(phydev) != PHY_ID_BCM54811)
return;
@@ -227,9 +228,10 @@ static void bcm54xx_adjust_rxrefclk(struct phy_device *phydev)
val |= BCM54XX_SHD_SCR3_DLLAPD_DIS;
 
if (phydev->dev_flags & PHY_BRCM_DIS_TXCRXC_NOENRGY) {
-   if (BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54810 ||
-   BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54811)
-   val |= BCM54810_SHD_SCR3_TRDDAPD;
+   if (BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54210E ||
+   BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54810 ||
+   BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54811)
+   val |= BCM54XX_SHD_SCR3_RXCTXC_DIS;
else
val |= BCM54XX_SHD_SCR3_TRDDAPD;
}
diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
index 844dcfe789a2..16597d3fa011 100644
--- a/include/linux/brcmphy.h
+++ b/include/linux/brcmphy.h
@@ -193,6 +193,7 @@
 #define  BCM54XX_SHD_SCR3_DEF_CLK125   0x0001
 #define  BCM54XX_SHD_SCR3_DLLAPD_DIS   0x0002
 #define  BCM54XX_SHD_SCR3_TRDDAPD  0x0004
+#define  BCM54XX_SHD_SCR3_RXCTXC_DIS   0x0100
 
 /* 01010: Auto Power-Down */
 #define BCM54XX_SHD_APD0x0a
@@ -253,7 +254,6 @@
 #define BCM54810_EXP_BROADREACH_LRE_MISC_CTL_EN(1 << 0)
 #define BCM54810_SHD_CLK_CTL   0x3
 #define BCM54810_SHD_CLK_CTL_GTXCLK_EN (1 << 9)
-#define BCM54810_SHD_SCR3_TRDDAPD  0x0100
 
 /* BCM54612E Registers */
 #define BCM54612E_EXP_SPARE0   (MII_BCM54XX_EXP_SEL_ETC + 0x34)
-- 
2.25.1



[PATCH net-next v2 1/3] net: phy: broadcom: Avoid forward for bcm54xx_config_clock_delay()

2021-02-12 Thread Florian Fainelli
Avoid a forward declaration by moving the callers of
bcm54xx_config_clock_delay() below its body.

Signed-off-by: Florian Fainelli 
---
 drivers/net/phy/broadcom.c | 74 +++---
 1 file changed, 36 insertions(+), 38 deletions(-)

diff --git a/drivers/net/phy/broadcom.c b/drivers/net/phy/broadcom.c
index 0472b3470c59..4142f69c1530 100644
--- a/drivers/net/phy/broadcom.c
+++ b/drivers/net/phy/broadcom.c
@@ -26,44 +26,6 @@ MODULE_DESCRIPTION("Broadcom PHY driver");
 MODULE_AUTHOR("Maciej W. Rozycki");
 MODULE_LICENSE("GPL");
 
-static int bcm54xx_config_clock_delay(struct phy_device *phydev);
-
-static int bcm54210e_config_init(struct phy_device *phydev)
-{
-   int val;
-
-   bcm54xx_config_clock_delay(phydev);
-
-   if (phydev->dev_flags & PHY_BRCM_EN_MASTER_MODE) {
-   val = phy_read(phydev, MII_CTRL1000);
-   val |= CTL1000_AS_MASTER | CTL1000_ENABLE_MASTER;
-   phy_write(phydev, MII_CTRL1000, val);
-   }
-
-   return 0;
-}
-
-static int bcm54612e_config_init(struct phy_device *phydev)
-{
-   int reg;
-
-   bcm54xx_config_clock_delay(phydev);
-
-   /* Enable CLK125 MUX on LED4 if ref clock is enabled. */
-   if (!(phydev->dev_flags & PHY_BRCM_RX_REFCLK_UNUSED)) {
-   int err;
-
-   reg = bcm_phy_read_exp(phydev, BCM54612E_EXP_SPARE0);
-   err = bcm_phy_write_exp(phydev, BCM54612E_EXP_SPARE0,
-   BCM54612E_LED4_CLK125OUT_EN | reg);
-
-   if (err < 0)
-   return err;
-   }
-
-   return 0;
-}
-
 static int bcm54xx_config_clock_delay(struct phy_device *phydev)
 {
int rc, val;
@@ -105,6 +67,42 @@ static int bcm54xx_config_clock_delay(struct phy_device *phydev)
return 0;
 }
 
+static int bcm54210e_config_init(struct phy_device *phydev)
+{
+   int val;
+
+   bcm54xx_config_clock_delay(phydev);
+
+   if (phydev->dev_flags & PHY_BRCM_EN_MASTER_MODE) {
+   val = phy_read(phydev, MII_CTRL1000);
+   val |= CTL1000_AS_MASTER | CTL1000_ENABLE_MASTER;
+   phy_write(phydev, MII_CTRL1000, val);
+   }
+
+   return 0;
+}
+
+static int bcm54612e_config_init(struct phy_device *phydev)
+{
+   int reg;
+
+   bcm54xx_config_clock_delay(phydev);
+
+   /* Enable CLK125 MUX on LED4 if ref clock is enabled. */
+   if (!(phydev->dev_flags & PHY_BRCM_RX_REFCLK_UNUSED)) {
+   int err;
+
+   reg = bcm_phy_read_exp(phydev, BCM54612E_EXP_SPARE0);
+   err = bcm_phy_write_exp(phydev, BCM54612E_EXP_SPARE0,
+   BCM54612E_LED4_CLK125OUT_EN | reg);
+
+   if (err < 0)
+   return err;
+   }
+
+   return 0;
+}
+
 /* Needs SMDSP clock enabled via bcm54xx_phydsp_config() */
 static int bcm50610_a0_workaround(struct phy_device *phydev)
 {
-- 
2.25.1



Re: [PATCH net-next 2/3] net: phy: broadcom: Fix RXC/TXC auto disabling

2021-02-12 Thread Florian Fainelli



On 2/12/2021 5:14 PM, Florian Fainelli wrote:
> 
> 
> On 2/12/2021 5:11 PM, Vladimir Oltean wrote:
>> On Fri, Feb 12, 2021 at 12:57:20PM -0800, Florian Fainelli wrote:
>>> When support for optionally disabling the TXC was introduced, bit 2 was
>>> used to do that operation but the datasheet for 50610M from 2009 does
>>> not show bit 2 as being defined. Bit 8 is the one that allows automatic
>>> disabling of the RXC/TXC auto disabling during auto power down.
>>>
>>> Fixes: 52fae0837153 ("tg3 / broadcom: Optionally disable TXC if no link")
>>> Signed-off-by: Florian Fainelli 
>>> ---
>>>  include/linux/brcmphy.h | 2 +-
>>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
>>> index da7bf9dfef5b..3dd8203cf780 100644
>>> --- a/include/linux/brcmphy.h
>>> +++ b/include/linux/brcmphy.h
>>> @@ -193,7 +193,7 @@
>>>  #define BCM54XX_SHD_SCR3   0x05
>>>  #define  BCM54XX_SHD_SCR3_DEF_CLK125   0x0001
>>>  #define  BCM54XX_SHD_SCR3_DLLAPD_DIS   0x0002
>>> -#define  BCM54XX_SHD_SCR3_TRDDAPD  0x0004
>>> +#define  BCM54XX_SHD_SCR3_TRDDAPD  0x0100
>>>  
>>>  /* 01010: Auto Power-Down */
>>>  #define BCM54XX_SHD_APD0x0a
>>> -- 
>>> 2.25.1
>>>
>>
>> We may have a problem here, with the layout of the Spare Control 3
>> register not being as universal as we think.
>>
>> Your finding may have been the same as Kevin Lo's from commit
>> b0ed0bbfb304 ("net: phy: broadcom: add support for BCM54811 PHY"),
>> therefore your change is making BCM54XX_SHD_SCR3_TRDDAPD ==
>> BCM54810_SHD_SCR3_TRDDAPD, so currently this if condition is redundant
>> and probably something else is wrong too:
>>
>>  if (phydev->dev_flags & PHY_BRCM_DIS_TXCRXC_NOENRGY) {
>>  if (BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54810 ||
>>  BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54811)
>>  val |= BCM54810_SHD_SCR3_TRDDAPD;
>>  else
>>  val |= BCM54XX_SHD_SCR3_TRDDAPD;
>>  }
>>
>> I'm not sure what "TRDD" stands for, but my copy of the BCM5464R
>> datasheet shows both bits 2 as well as 8 as being reserved. I have
>> "CLK125 Output" in bit 0, "DLL Auto Power-Down" in bit 1, "SD/Energy
>> Detect Change" in bit 5, "TXC Disable" in bit 6, and that's about it.
> 
> Let me go back to the datasheet of all of the PHYs supported by
> bcm54xx_adjust_rxrefclk() and make sure we set the right bit.

I really don't know what the situation is with the 50610 and 50610M and
the datasheet appears to reflect that the latest PHYs should be revision
3 or newer (which we explicitly check for earlier), and that bit 8 is
supposed to control the disabling of RXC/TXC during auto-power down. I
can only trust that Matt checked with the design team back then and that
the datasheet must be wrong (would not be an isolated incident), let's
try not to fix something that we do not know for sure is broken.

I will respin without this patch and with another clean up added.
-- 
Florian


Re: [RFC PATCH v3 1/1] scsi: ufs: Enable power management for wlun

2021-02-12 Thread Bart Van Assche
On 2/11/21 11:18 AM, Asutosh Das wrote:
> +static inline bool is_rpmb_wlun(struct scsi_device *sdev)
> +{
> + return (sdev->lun == ufshcd_upiu_wlun_to_scsi_wlun(UFS_UPIU_RPMB_WLUN));
> +}
> +
> +static inline bool is_device_wlun(struct scsi_device *sdev)
> +{
> + return (sdev->lun ==
> + ufshcd_upiu_wlun_to_scsi_wlun(UFS_UPIU_UFS_DEVICE_WLUN));
> +}

A minor comment: checkpatch should have reported that "return is not a
function" for the above code.

>  /**
> + * ufshcd_setup_links - associate link b/w device wlun and other luns
> + * @sdev: pointer to SCSI device
> + * @hba: pointer to ufs hba
> + *
> + * Returns void
> + */

Please leave out "Returns void".

> +static int ufshcd_wl_suspend(struct device *dev)
> +{
> + struct scsi_device *sdev = to_scsi_device(dev);
> + struct ufs_hba *hba;
> + int ret;
> + ktime_t start = ktime_get();
> +
> + if (is_rpmb_wlun(sdev))
> + return 0;
> + hba = shost_priv(sdev->host);
> + ret = __ufshcd_wl_suspend(hba, UFS_SYSTEM_PM);
> + if (ret)
> + dev_err(&sdev->sdev_gendev, "%s failed: %d\n", __func__,  ret);
> +
> + trace_ufshcd_wl_suspend(dev_name(dev), ret,
> + ktime_to_us(ktime_sub(ktime_get(), start)),
> + hba->curr_dev_pwr_mode, hba->uic_link_state);
> +
> + return ret;
> +
> +}

Please remove the blank line after the return statement.

Otherwise this patch looks good to me. Hence:

Reviewed-by: Bart Van Assche 



Re: [PATCH 5.10 00/54] 5.10.16-rc1 review

2021-02-12 Thread Ross Schmidt
On Thu, Feb 11, 2021 at 04:01:44PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.10.16 release.
> There are 54 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>

Compiled and booted with no regressions on x86_64.

Tested-by: Ross Schmidt 


thanks,

Ross


Re: [PATCH 5.4 00/24] 5.4.98-rc1 review

2021-02-12 Thread Ross Schmidt
On Thu, Feb 11, 2021 at 04:02:23PM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 5.4.98 release.
> There are 24 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>

Compiled and booted with no regressions on x86_64.

Tested-by: Ross Schmidt 


thanks,

Ross


Re: [PATCH 4.19 00/27] 4.19.176-rc2 review

2021-02-12 Thread Ross Schmidt
On Fri, Feb 12, 2021 at 08:55:04AM +0100, Greg Kroah-Hartman wrote:
> This is the start of the stable review cycle for the 4.19.176 release.
> There are 27 patches in this series, all will be posted as a response
> to this one.  If anyone has any issues with these being applied, please
> let me know.
>

Compiled and booted with no regressions on x86_64.

Tested-by: Ross Schmidt 


thanks,

Ross


Re: [PATCH v2 2/4] usb: typec: tps6598x: Add trace event for status register

2021-02-12 Thread kernel test robot
Hi "Guido,

I love your patch! Perhaps something to improve:

[auto build test WARNING on usb/usb-testing]
[also build test WARNING on v5.11-rc7 next-20210211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Guido-G-nther/usb-typec-tps6598x-Add-IRQ-flag-and-register-tracing/20210212-200855
base:   https://git.kernel.org/pub/scm/linux/kernel/git/gregkh/usb.git 
usb-testing
config: openrisc-randconfig-s032-20210209 (attached as .config)
compiler: or1k-linux-gcc (GCC) 9.3.0
reproduce:
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# apt-get install sparse
# sparse version: v0.6.3-215-g0fb77bb6-dirty
# 
https://github.com/0day-ci/linux/commit/ba45e1d5e1fd25b6aed8724106e6c7d5adef7a20
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Guido-G-nther/usb-typec-tps6598x-Add-IRQ-flag-and-register-tracing/20210212-200855
git checkout ba45e1d5e1fd25b6aed8724106e6c7d5adef7a20
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-9.3.0 make.cross C=1 
CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=openrisc 

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot 


"sparse warnings: (new ones prefixed by >>)"
   drivers/usb/typec/tps6598x.c: note: in included file (through 
include/trace/trace_events.h, include/trace/define_trace.h, 
drivers/usb/typec/tps6598x_trace.h):
>> drivers/usb/typec/./tps6598x_trace.h:157:1: sparse: sparse: too long token 
>> expansion

vim +157 drivers/usb/typec/./tps6598x_trace.h

c90c0282e4ce33 Guido Günther 2021-02-12  156  
ba45e1d5e1fd25 Guido Günther 2021-02-12 @157  TRACE_EVENT(tps6598x_status,
ba45e1d5e1fd25 Guido Günther 2021-02-12  158TP_PROTO(u32 status),
ba45e1d5e1fd25 Guido Günther 2021-02-12  159TP_ARGS(status),
ba45e1d5e1fd25 Guido Günther 2021-02-12  160  
ba45e1d5e1fd25 Guido Günther 2021-02-12  161TP_STRUCT__entry(
ba45e1d5e1fd25 Guido Günther 2021-02-12  162 
__field(u32, status)
ba45e1d5e1fd25 Guido Günther 2021-02-12  163 ),
ba45e1d5e1fd25 Guido Günther 2021-02-12  164  
ba45e1d5e1fd25 Guido Günther 2021-02-12  165TP_fast_assign(
ba45e1d5e1fd25 Guido Günther 2021-02-12  166   
__entry->status = status;
ba45e1d5e1fd25 Guido Günther 2021-02-12  167   ),
ba45e1d5e1fd25 Guido Günther 2021-02-12  168  
ba45e1d5e1fd25 Guido Günther 2021-02-12  169TP_printk("conn: %s, 
pp_5v0: %s, pp_hv: %s, pp_ext: %s, pp_cable: %s, "
ba45e1d5e1fd25 Guido Günther 2021-02-12  170  "pwr-src: %s, 
vbus: %s, usb-host: %s, legacy: %s, flags: %s",
ba45e1d5e1fd25 Guido Günther 2021-02-12  171  
show_status_conn_state(__entry->status),
ba45e1d5e1fd25 Guido Günther 2021-02-12  172  
show_status_pp_switch_state(TPS_STATUS_PP_5V0_SWITCH(__entry->status)),
ba45e1d5e1fd25 Guido Günther 2021-02-12  173  
show_status_pp_switch_state(TPS_STATUS_PP_HV_SWITCH(__entry->status)),
ba45e1d5e1fd25 Guido Günther 2021-02-12  174  
show_status_pp_switch_state(TPS_STATUS_PP_EXT_SWITCH(__entry->status)),
ba45e1d5e1fd25 Guido Günther 2021-02-12  175  
show_status_pp_switch_state(TPS_STATUS_PP_CABLE_SWITCH(__entry->status)),
ba45e1d5e1fd25 Guido Günther 2021-02-12  176  
show_status_power_sources(__entry->status),
ba45e1d5e1fd25 Guido Günther 2021-02-12  177  
show_status_vbus_status(__entry->status),
ba45e1d5e1fd25 Guido Günther 2021-02-12  178  
show_status_usb_host_present(__entry->status),
ba45e1d5e1fd25 Guido Günther 2021-02-12  179  
show_status_legacy(__entry->status),
ba45e1d5e1fd25 Guido Günther 2021-02-12  180  
show_status_flags(__entry->status)
ba45e1d5e1fd25 Guido Günther 2021-02-12  181)
ba45e1d5e1fd25 Guido Günther 2021-02-12  182  );
ba45e1d5e1fd25 Guido Günther 2021-02-12  183  

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: linux-next: manual merge of the rcu tree with the block tree

2021-02-12 Thread Stephen Rothwell
Hi all,

On Fri, 12 Feb 2021 08:30:27 -0700 Jens Axboe  wrote:
>
> On 2/12/21 8:18 AM, Frederic Weisbecker wrote:
> > On Thu, Feb 11, 2021 at 04:48:52PM +1100, Stephen Rothwell wrote:  
> >> Hi all,
> >>
> >> Today's linux-next merge of the rcu tree got conflicts in:
> >>
> >>   include/linux/rcupdate.h
> >>   kernel/rcu/tree.c
> >>   kernel/rcu/tree_plugin.h
> >>
> >> between commits:
> >>
> >>   3a7b5c87a0b2 ("rcu/nocb: Perform deferred wake up before last idle's 
> >> need_resched() check")
> >>   e4234f21d2ea ("rcu: Pull deferred rcuog wake up to rcu_eqs_enter() 
> >> callers")
> >>   14bbd41d5109 ("entry/kvm: Explicitly flush pending rcuog wakeup before 
> >> last
> >>   rescheduling point")
> >> from the block tree and commits:  
> > 
> > Isn't it tip:/sched/core instead of block?  
> 
> It must be, maybe block just got merged first? It's just sched/core in a
> topic branch, to satisfy a dependency.

Well, yes, it is a topic branch merge into the block tree.  However,
that topic branch has not been merged into the tip/auto-latest branch
which is what linux-next pulls in as the tip tree.  (And the tip tree
and block trees were both merged before the rcu tree.)

-- 
Cheers,
Stephen Rothwell




Re: [PATCH v3 2/3] misc/pvpanic: probe multiple instances

2021-02-12 Thread kernel test robot
Hi Mihai,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on linux/master]
[also build test ERROR on soc/for-next linus/master v5.11-rc7]
[cannot apply to char-misc/char-misc-testing next-20210211]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Mihai-Carabas/misc-pvpanic-split-up-generic-and-platform-dependent-code/20210213-043307
base:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
2ab38c17aac10bf55ab3efde4c4db3893d8691d2
config: x86_64-randconfig-r012-20210209 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
c9439ca36342fb6013187d0a69aef92736951476)
reproduce (this is a W=1 build):
wget 
https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O 
~/bin/make.cross
chmod +x ~/bin/make.cross
# install x86_64 cross compiling tool for clang build
# apt-get install binutils-x86-64-linux-gnu
# 
https://github.com/0day-ci/linux/commit/70eed71fbb1f23b28a401213c2dac3c27fcae323
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Mihai-Carabas/misc-pvpanic-split-up-generic-and-platform-dependent-code/20210213-043307
git checkout 70eed71fbb1f23b28a401213c2dac3c27fcae323
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate:
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/misc/pvpanic/pvpanic.c:62:5: warning: no previous prototype for 
function 'pvpanic_probe' [-Wmissing-prototypes]
   int pvpanic_probe(void __iomem *pbase)
   ^
   drivers/misc/pvpanic/pvpanic.c:62:1: note: declare 'static' if the function 
is not intended to be used outside of this translation unit
   int pvpanic_probe(void __iomem *pbase)
   ^
   static 
>> drivers/misc/pvpanic/pvpanic.c:87:3: error: void function 'pvpanic_remove' 
>> should not return a value [-Wreturn-type]
   return -EINVAL;
   ^  ~~~
   drivers/misc/pvpanic/pvpanic.c:82:6: warning: no previous prototype for 
function 'pvpanic_remove' [-Wmissing-prototypes]
   void pvpanic_remove(void __iomem *pbase)
^
   drivers/misc/pvpanic/pvpanic.c:82:1: note: declare 'static' if the function 
is not intended to be used outside of this translation unit
   void pvpanic_remove(void __iomem *pbase)
   ^
   static 
   2 warnings and 1 error generated.


vim +/pvpanic_remove +87 drivers/misc/pvpanic/pvpanic.c

61  
  > 62  int pvpanic_probe(void __iomem *pbase)
63  {
64  struct pvpanic_instance *pi;
65  
66  if(!pbase)
67  return -EINVAL;
68  
69  pi = kmalloc(sizeof(*pi), GFP_ATOMIC);
70  if (!pi)
71  return -ENOMEM;
72  
73  pi->base = pbase;
74  spin_lock(&pvpanic_lock);
75  list_add(&pi->list, &pvpanic_list);
76  spin_unlock(&pvpanic_lock);
77  
78  return 0;
79  }
80  EXPORT_SYMBOL_GPL(pvpanic_probe);
81  
82  void pvpanic_remove(void __iomem *pbase)
83  {
84  struct pvpanic_instance *pi_cur, *pi_next;
85  
86  if(!pbase)
  > 87  return -EINVAL;
88  
89  spin_lock(&pvpanic_lock);
90  list_for_each_entry_safe(pi_cur, pi_next, &pvpanic_list, list) {
91  if (pi_cur->base == pbase) {
92  list_del(&pi_cur->list);
93  kfree(pi_cur);
94  break;
95  }
96  }
97  spin_unlock(&pvpanic_lock);
98  }
99  EXPORT_SYMBOL_GPL(pvpanic_remove);
   100  
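
A minimal sketch of the kind of fix this error points at (an assumption based
on the diagnostics above, not a patch from the submitter): drop the value from
the early return in the void function, and declare the exported helpers in a
header that pvpanic.c includes to silence -Wmissing-prototypes (header name
assumed):

	/* pvpanic.h (assumed shared header) */
	int pvpanic_probe(void __iomem *pbase);
	void pvpanic_remove(void __iomem *pbase);

	/* pvpanic.c, around line 87 */
	if (!pbase)
		return;	/* was "return -EINVAL;", invalid in a void function */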

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: linux-next: manual merge of the spi tree with the powerpc tree

2021-02-12 Thread Stephen Rothwell
Hi Mark,

On Fri, 12 Feb 2021 12:27:59 + Mark Brown  wrote:
>
> On Fri, Feb 12, 2021 at 03:31:42PM +1100, Stephen Rothwell wrote:
> 
> > BTW Mark: the author's address in 258ea99fe25a uses a non existent domain 
> > :-(  
> 
> Ugh, I think that's something gone wrong with b4 :(  A bit late now to
> try to fix it up.

Not sure about that: the email (following the link to lore from the
commit) has the same address (...@public.gmane.com), and that domain
does not exist. In fact, the email headers (in lore) look like this:

From: Sergiu Cuciurean 

To: ,
,

Cc: Sergiu Cuciurean


So I am surprised that it was received by anyone.  Maybe gmane has an
internal reply system that is screwed.
-- 
Cheers,
Stephen Rothwell




Re: [PATCH] KVM: nVMX: Sync L2 guest CET states between L1/L2

2021-02-12 Thread Yang Weijiang
On Thu, Feb 11, 2021 at 09:18:03AM -0800, Sean Christopherson wrote:
> On Tue, Feb 09, 2021, Yang Weijiang wrote:
> > When L2 guest status has been changed by L1 QEMU/KVM, sync the change back
> > to L2 guest before the later's next vm-entry. On the other hand, if it's
> > changed due to L2 guest, sync it back so as to let L1 guest see the change.
> > 
> > Signed-off-by: Yang Weijiang 
> > ---
> >  arch/x86/kvm/vmx/nested.c | 12 
> >  1 file changed, 12 insertions(+)
> > 
> > diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
> > index 9728efd529a1..b9d8db8facea 100644
> > --- a/arch/x86/kvm/vmx/nested.c
> > +++ b/arch/x86/kvm/vmx/nested.c
> > @@ -2602,6 +2602,12 @@ static int prepare_vmcs02(struct kvm_vcpu *vcpu, 
> > struct vmcs12 *vmcs12,
> > /* Note: may modify VM_ENTRY/EXIT_CONTROLS and GUEST/HOST_IA32_EFER */
> > vmx_set_efer(vcpu, vcpu->arch.efer);
> >  
> > +   if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
> > +   vmcs_writel(GUEST_SSP, vmcs12->guest_ssp);
> > +   vmcs_writel(GUEST_INTR_SSP_TABLE, vmcs12->guest_ssp_tbl);
> > +   vmcs_writel(GUEST_S_CET, vmcs12->guest_s_cet);
> > +   }
> > +
> 
> This is incomplete.  If VM_ENTRY_LOAD_CET_STATE is not set, then CET state 
> needs
> to be propagated from vmcs01 to vmcs02.  See nested.vmcs01_debugctl and
> nested.vmcs01_guest_bndcfgs.
> 
> It's tempting to say that we should add machinery to simplify implementing new
> fields that are conditionally loading, e.g. define an array that specifies the
> field, its control, and its offset in vmcs12, then process the array at the
> appropriate time.  That might be overkill though...
>
Thanks Sean! I'll check the implementation of the two features.
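
Just to make sure I understand the suggested shape, a rough sketch modeled on
the vmcs01_debugctl/vmcs01_guest_bndcfgs handling (the nested.vmcs01_guest_*
fields below are assumed here and don't exist yet):

	if (vmx->nested.nested_run_pending &&
	    (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE)) {
		vmcs_writel(GUEST_SSP, vmcs12->guest_ssp);
		vmcs_writel(GUEST_INTR_SSP_TABLE, vmcs12->guest_ssp_tbl);
		vmcs_writel(GUEST_S_CET, vmcs12->guest_s_cet);
	} else {
		vmcs_writel(GUEST_SSP, vmx->nested.vmcs01_guest_ssp);
		vmcs_writel(GUEST_INTR_SSP_TABLE, vmx->nested.vmcs01_guest_ssp_tbl);
		vmcs_writel(GUEST_S_CET, vmx->nested.vmcs01_guest_s_cet);
	}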

> > /*
> >  * Guest state is invalid and unrestricted guest is disabled,
> >  * which means L1 attempted VMEntry to L2 with invalid state.
> > @@ -4152,6 +4158,12 @@ static void sync_vmcs02_to_vmcs12(struct kvm_vcpu 
> > *vcpu, struct vmcs12 *vmcs12)
> >  
> > if (vmcs12->vm_exit_controls & VM_EXIT_SAVE_IA32_EFER)
> > vmcs12->guest_ia32_efer = vcpu->arch.efer;
> > +
> > +   if (vmcs12->vm_entry_controls & VM_ENTRY_LOAD_CET_STATE) {
> 
> This is wrong, guest state is saved on VM-Exit if the control is _supported_,
> it doesn't have to be enabled.
> 
>   If the processor supports the 1-setting of the “load CET” VM-entry control,
>   the contents of the IA32_S_CET and IA32_INTERRUPT_SSP_TABLE_ADDR MSRs are
>   saved into the corresponding fields. On processors that do not support Intel
>   64 architecture, bits 63:32 of these MSRs are not saved.
> 
> And I'm pretty sure we should define these fields as so-called "rare" fields,
> i.e. add 'em to the case statement in is_vmcs12_ext_field() and process them
> in sync_vmcs02_to_vmcs12_rare().  CET isn't easily emulated, so they should
> almost never be read/written by a VMM, and thus aren't worth synchronizing to
> vmcs12 on every exit.
Sure, will modify the patch accordingly.
> 
> > +   vmcs12->guest_ssp = vmcs_readl(GUEST_SSP);
> > +   vmcs12->guest_ssp_tbl = vmcs_readl(GUEST_INTR_SSP_TABLE);
> > +   vmcs12->guest_s_cet = vmcs_readl(GUEST_S_CET);
> > +   }
> >  }
> >  
> >  /*
> > -- 
> > 2.26.2
> > 


Re: [PATCH v2] printk: avoid prb_first_valid_seq() where possible

2021-02-12 Thread Sergey Senozhatsky
On (21/02/12 10:47), Petr Mladek wrote:
> > Fixes: 896fbe20b4e2333fb55 ("printk: use the lockless ringbuffer")
> > Reported-by: kernel test robot 
> > Reported-by: J. Avila 
> > Signed-off-by: John Ogness 
> 
> Reviewed-by: Petr Mladek 
> 
> I am going to push the patch later today. I would prefer to do it
> later and give others a chance to react. But the merge window
> will likely start the following week and Sergey was fine
> with this approach.

Yup, ACK.

-ss


Re: [PATCH v3 net-next 4/5] net: ipa: introduce ipa_table_hash_support()

2021-02-12 Thread Alex Elder

On 2/12/21 7:05 PM, Alexander Duyck wrote:

On Fri, Feb 12, 2021 at 6:40 AM Alex Elder  wrote:


Introduce a new function to abstract the knowledge of whether hashed
routing and filter tables are supported for a given IPA instance.

IPA v4.2 is the only one that doesn't support hashed tables (now
and for the foreseeable future), but the name of the helper function
is better for explaining what's going on.

Signed-off-by: Alex Elder 
---
v2: - Update copyrights.

  drivers/net/ipa/ipa_cmd.c   |  2 +-
  drivers/net/ipa/ipa_table.c | 16 +---
  drivers/net/ipa/ipa_table.h |  8 +++-
  3 files changed, 17 insertions(+), 9 deletions(-)

diff --git a/drivers/net/ipa/ipa_cmd.c b/drivers/net/ipa/ipa_cmd.c
index fd8bf6468d313..35e35852c25c5 100644
--- a/drivers/net/ipa/ipa_cmd.c
+++ b/drivers/net/ipa/ipa_cmd.c
@@ -268,7 +268,7 @@ static bool ipa_cmd_register_write_valid(struct ipa *ipa)
 /* If hashed tables are supported, ensure the hash flush register
  * offset will fit in a register write IPA immediate command.
  */
-   if (ipa->version != IPA_VERSION_4_2) {
+   if (ipa_table_hash_support(ipa)) {
 offset = ipa_reg_filt_rout_hash_flush_offset(ipa->version);
 name = "filter/route hash flush";
 if (!ipa_cmd_register_write_offset_valid(ipa, name, offset))
diff --git a/drivers/net/ipa/ipa_table.c b/drivers/net/ipa/ipa_table.c
index 32e2d3e052d55..baaab3dd0e63c 100644
--- a/drivers/net/ipa/ipa_table.c
+++ b/drivers/net/ipa/ipa_table.c
@@ -1,7 +1,7 @@
  // SPDX-License-Identifier: GPL-2.0

  /* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
- * Copyright (C) 2018-2020 Linaro Ltd.
+ * Copyright (C) 2018-2021 Linaro Ltd.
   */

  #include 
@@ -239,6 +239,11 @@ static void ipa_table_validate_build(void)

  #endif /* !IPA_VALIDATE */

+bool ipa_table_hash_support(struct ipa *ipa)
+{
+   return ipa->version != IPA_VERSION_4_2;
+}
+


Since this is only a single comparison it might make more sense to
make this a static inline and place it in ipa.h. Otherwise you are
just bloating the code up to jump to such a small function.


Static inline will duplicate the function everywhere also,
"bloating" it in another way (but it's a fairly trivial
amount of code either way).

Nevertheless I agree with your sentiment.  It's moot now
because the series was just accepted, but I'll keep it
in mind.

Thanks for your input.

-Alex


  /* Zero entry count means no table, so just return a 0 address */
  static dma_addr_t ipa_table_addr(struct ipa *ipa, bool filter_mask, u16 count)
  {
@@ -412,8 +417,7 @@ int ipa_table_hash_flush(struct ipa *ipa)
 struct gsi_trans *trans;
 u32 val;

-   /* IPA version 4.2 does not support hashed tables */
-   if (ipa->version == IPA_VERSION_4_2)
+   if (!ipa_table_hash_support(ipa))
 return 0;

 trans = ipa_cmd_trans_alloc(ipa, 1);
@@ -531,8 +535,7 @@ static void ipa_filter_config(struct ipa *ipa, bool modem)
 enum gsi_ee_id ee_id = modem ? GSI_EE_MODEM : GSI_EE_AP;
 u32 ep_mask = ipa->filter_map;

-   /* IPA version 4.2 has no hashed route tables */
-   if (ipa->version == IPA_VERSION_4_2)
+   if (!ipa_table_hash_support(ipa))
 return;

 while (ep_mask) {
@@ -582,8 +585,7 @@ static void ipa_route_config(struct ipa *ipa, bool modem)
  {
 u32 route_id;

-   /* IPA version 4.2 has no hashed route tables */
-   if (ipa->version == IPA_VERSION_4_2)
+   if (!ipa_table_hash_support(ipa))
 return;

 for (route_id = 0; route_id < IPA_ROUTE_COUNT_MAX; route_id++)
diff --git a/drivers/net/ipa/ipa_table.h b/drivers/net/ipa/ipa_table.h
index 78038d14fcea9..1a68d20f19d6a 100644
--- a/drivers/net/ipa/ipa_table.h
+++ b/drivers/net/ipa/ipa_table.h
@@ -1,7 +1,7 @@
  /* SPDX-License-Identifier: GPL-2.0 */

  /* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
- * Copyright (C) 2019-2020 Linaro Ltd.
+ * Copyright (C) 2019-2021 Linaro Ltd.
   */
  #ifndef _IPA_TABLE_H_
  #define _IPA_TABLE_H_
@@ -51,6 +51,12 @@ static inline bool ipa_filter_map_valid(struct ipa *ipa, u32 
filter_mask)

  #endif /* !IPA_VALIDATE */

+/**
+ * ipa_table_hash_support() - Return true if hashed tables are supported
+ * @ipa:   IPA pointer
+ */
+bool ipa_table_hash_support(struct ipa *ipa);
+
  /**
   * ipa_table_reset() - Reset filter and route tables entries to "none"
   * @ipa:   IPA pointer


Just define the function here and make it a static inline.





Re: [GIT PULL] cifs fixes

2021-02-12 Thread Linus Torvalds
On Fri, Feb 12, 2021 at 5:08 PM Stefan Metzmacher  wrote:
>
> > Zen 2 seems to have fixed things (knock wood - it's certainly working
> > for me), But many people obviously never saw any issues with Zen 1
> > either.
>
> Do you know about the Zen3 status? I was thinking of replacing the system
> with this one, which has an AMD Ryzen 9 5950X:

I have heard nothing but good things about Zen3 so far (apart from
apparently people complaining about availability), but it's only been
out a few months, so obviously coverage is somewhat limited.

I wish AMD hadn't decimated their Linux team (several years ago), and
they definitely had some embarrassing issues early on with Zen (apart
from the Zen 1 stability issues, they've screwed up rdrand at least
three times, iirc). But I've yet to hear of any Zen 3 issues, and I
suspect I'll upgrade when Threadripper comes out (I've become quite
spoiled by the build speeds of my Threadripper 3970X - the only thing
I miss is the better 'perf' support from Intel PEBS).

Note that I'm not necessarily the person who would hear about any
issues first, though, so take the above with a pinch of salt.

   Linus


[PATCH] arm64: mm: correct the start of physical address in linear map

2021-02-12 Thread Pavel Tatashin
Memory hotplug may fail on systems with CONFIG_RANDOMIZE_BASE because the
linear map range is not checked correctly.

The start physical address that the linear map covers can actually be at the
end of the range because of randomization. Check for that and, if so, reduce it
to 0.

This can be verified on QEMU with setting kaslr-seed to ~0ul:

memstart_offset_seed = 0x
START: __pa(_PAGE_OFFSET(vabits_actual)) = 9000c000
END:   __pa(PAGE_END - 1) =  1000bfff

Signed-off-by: Pavel Tatashin 
Fixes: 58284a901b42 ("arm64/mm: Validate hotplug range before creating linear 
mapping")
---
 arch/arm64/mm/mmu.c | 15 +--
 1 file changed, 13 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/mm/mmu.c b/arch/arm64/mm/mmu.c
index ae0c3d023824..6057ecaea897 100644
--- a/arch/arm64/mm/mmu.c
+++ b/arch/arm64/mm/mmu.c
@@ -1444,14 +1444,25 @@ static void __remove_pgd_mapping(pgd_t *pgdir, unsigned 
long start, u64 size)
 
 static bool inside_linear_region(u64 start, u64 size)
 {
+   u64 start_linear_pa = __pa(_PAGE_OFFSET(vabits_actual));
+   u64 end_linear_pa = __pa(PAGE_END - 1);
+
+   /*
+* Check for a wrap, it is possible because of randomized linear mapping
+* the start physical address is actually bigger than the end physical
+* address. In this case set start to zero because [0, end_linear_pa]
+* range must still be able to cover all addressable physical addresses.
+*/
+   if (start_linear_pa > end_linear_pa)
+   start_linear_pa = 0;
+
/*
 * Linear mapping region is the range [PAGE_OFFSET..(PAGE_END - 1)]
 * accommodating both its ends but excluding PAGE_END. Max physical
 * range which can be mapped inside this linear mapping range, must
 * also be derived from its end points.
 */
-   return start >= __pa(_PAGE_OFFSET(vabits_actual)) &&
-  (start + size - 1) <= __pa(PAGE_END - 1);
+   return start >= start_linear_pa && (start + size - 1) <= end_linear_pa;
 }
 
 int arch_add_memory(int nid, u64 start, u64 size,
-- 
2.25.1



Re: [LKP] Re: [mm] 10befea91b: hackbench.throughput -62.4% regression

2021-02-12 Thread Roman Gushchin
On Thu, Feb 04, 2021 at 01:19:47PM +0800, Xing Zhengjun wrote:
> 
> 
> On 2/3/2021 10:49 AM, Roman Gushchin wrote:
> > On Tue, Feb 02, 2021 at 04:18:27PM +0800, Xing, Zhengjun wrote:
> > > On 1/14/2021 11:18 AM, Roman Gushchin wrote:
> > > > On Thu, Jan 14, 2021 at 10:51:51AM +0800, kernel test robot wrote:
> > > > > Greeting,
> > > > > 
> > > > > FYI, we noticed a -62.4% regression of hackbench.throughput due to 
> > > > > commit:
> > > > Hi!
> > > > 
> > > > Commit "mm: memcg/slab: optimize objcg stock draining" (currently only 
> > > > in the mm tree,
> > > > so no stable hash) should improve the hackbench regression.
> > > The commit has been merged into Linux mainline :
> > >   3de7d4f25a7438f09fef4e71ef111f1805cd8e7c ("mm: memcg/slab: optimize 
> > > objcg
> > > stock draining")
> > > I tested it and the regression still exists.
> > Hm, so in your setup it's about the same with and without this commit?
> > 
> > It's strange because I've received a letter stating a 45.2% improvement 
> > recently:
> > https://lkml.org/lkml/2021/1/27/83
> 
> They are different test cases, 45.2% improvement test case run in "thread" 
> mode, -62.4% regression test case run in "process" mode.
> From 286e04b8ed7a0427 to 3de7d4f25a7438f09fef4e71ef1 there are two 
> regressions for process mode :
> 1) 286e04b8ed7a0427 to 10befea91b61c4e2c2d1df06a2e  (-62.4% regression)
> 2) 10befea91b61c4e2c2d1df06a2e to d3921cb8be29ce5668c64e23ffd (-22.3% 
> regression)
> 
> 3de7d4f25a7438f09fef4e71ef111f1805cd8e7c only fix the regression 2) , so the 
> value of "hackbench.throughput" for 3de7d4f25a7438f09fef4e71ef1(71824) and 
> 10befea91b61c4e2c2d1df06a2e (72220) is very closed.
> 
> Regression 1) still existed.

Hi!

I've looked into the regression, made a bisection and tried a couple of obvious 
ideas.

The majority of the regression comes from the kfree() path, and the most
expensive operation is reading an objcg from the objcg vector on the release
path. It seems that it happens from another CPU, or just long after the
allocation, so it's almost always an expensive cache miss.

I initially thought that the zeroing is expensive and tried an option that
doesn't zero the pointer until the next allocation which uses the same place
(this approach brings a penalty for non-accounted allocations). But it didn't
help much.

Good news: some percentage of the regression can be mitigated by lowering the
accounting accuracy. For example:
--
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index ed5cc78a8dbf..cca04571dadb 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3261,7 +3261,7 @@ static void refill_obj_stock(struct obj_cgroup *objcg, 
unsigned int nr_bytes)
}
stock->nr_bytes += nr_bytes;
 
-   if (stock->nr_bytes > PAGE_SIZE)
+   if (stock->nr_bytes > 32 * PAGE_SIZE)
drain_obj_stock(stock);
 
local_irq_restore(flags);

--

We can also try to play with putting the memcg pointer at the end of the object
(we discussed such an option earlier), however it doesn't guarantee that it
will be any hotter. In some cases we can try to put the vector into the tail
struct pages (Shakeel brought this idea earlier), but it's far from trivial.

As I said previously, because we've switched to more precise per-object
accounting, some regression may be unavoidable. But real applications should
not be hit that hard, because the cost of an allocation is still low. And there
is always the option to disable kernel memory accounting.

I'll think more about what we can do here. Any ideas are welcome!

Thanks!




Re: [PATCH v5 net-next 00/10] Cleanup in brport flags switchdev offload for DSA

2021-02-12 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Fri, 12 Feb 2021 17:15:50 +0200 you wrote:
> From: Vladimir Oltean 
> 
> The initial goal of this series was to have better support for
> standalone ports mode on the DSA drivers like ocelot/felix and sja1105.
> This turned out to require some API adjustments in both directions:
> to the information presented to and by the switchdev notifier, and to
> the API presented to the switch drivers by the DSA layer.
> 
> [...]

Here is the summary with links:
  - [v5,net-next,01/10] net: switchdev: propagate extack to port attributes
https://git.kernel.org/netdev/net-next/c/4c08c586ff29
  - [v5,net-next,02/10] net: bridge: offload all port flags at once in 
br_setport
https://git.kernel.org/netdev/net-next/c/304ae3bf1c1a
  - [v5,net-next,03/10] net: bridge: don't print in br_switchdev_set_port_flag
https://git.kernel.org/netdev/net-next/c/078bbb851ea6
  - [v5,net-next,04/10] net: dsa: configure better brport flags when ports 
leave the bridge
https://git.kernel.org/netdev/net-next/c/5e38c15856e9
  - [v5,net-next,05/10] net: switchdev: pass flags and mask to both 
{PRE_,}BRIDGE_FLAGS attributes
https://git.kernel.org/netdev/net-next/c/e18f4c18ab5b
  - [v5,net-next,06/10] net: dsa: act as passthrough for bridge port flags
https://git.kernel.org/netdev/net-next/c/a8b659e7ff75
  - [v5,net-next,07/10] net: dsa: felix: restore multicast flood to CPU when 
NPI tagger reinitializes
https://git.kernel.org/netdev/net-next/c/6edb9e8d451e
  - [v5,net-next,08/10] net: mscc: ocelot: use separate flooding PGID for 
broadcast
https://git.kernel.org/netdev/net-next/c/b360d94f1b86
  - [v5,net-next,09/10] net: mscc: ocelot: offload bridge port flags to device
https://git.kernel.org/netdev/net-next/c/421741ea5672
  - [v5,net-next,10/10] net: dsa: sja1105: offload bridge port flags to device
https://git.kernel.org/netdev/net-next/c/4d9423549501

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH] mm/hugetlb: remove redundant reservation check condition in alloc_huge_page()

2021-02-12 Thread Mike Kravetz
On 2/9/21 11:54 PM, Miaohe Lin wrote:
> If there is no reservation corresponding to a vma, map_chg is always != 0,
> i.e. we can not meet the condition where a vma does not have reservation
> while map_chg = 0.

This commit message might be easier to understand?

vma_resv_map(vma) checks if a reserve map is associated with the vma.  The
routine vma_needs_reservation() will check vma_resv_map(vma) and return 1
if no reserve map is present.  map_chg is set to the return value of
vma_needs_reservation().  Therefore, !vma_resv_map(vma) is redundant in the
expression:
map_chg || avoid_reserve || !vma_resv_map(vma);
Remove the redundant check.
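
In code terms, the reasoning condenses to something like this (a sketch, not
the actual hugetlb source):

	/*
	 * map_chg = vma_needs_reservation(h, vma, addr);
	 * vma_needs_reservation() returns 1 when vma_resv_map(vma) is NULL,
	 * so map_chg == 0 already implies vma_resv_map(vma) != NULL and
	 * "|| !vma_resv_map(vma)" can never change the result.
	 */
	deferred_reserve = map_chg || avoid_reserve;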

-- 
Mike Kravetz

> 
> Signed-off-by: Miaohe Lin 
> ---
>  mm/hugetlb.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4f2c92ddbca4..36c3646fa55f 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2311,7 +2311,7 @@ struct page *alloc_huge_page(struct vm_area_struct *vma,
>  
>   /* If this allocation is not consuming a reservation, charge it now.
>*/
> - deferred_reserve = map_chg || avoid_reserve || !vma_resv_map(vma);
> + deferred_reserve = map_chg || avoid_reserve;
>   if (deferred_reserve) {
>   ret = hugetlb_cgroup_charge_cgroup_rsvd(
>   idx, pages_per_huge_page(h), &h_cg);
> 


Re: [PATCH net-next 1/3] net: phy: broadcom: Remove unused flags

2021-02-12 Thread Florian Fainelli



On 2/12/2021 5:14 PM, Vladimir Oltean wrote:
> On Fri, Feb 12, 2021 at 05:08:58PM -0800, Florian Fainelli wrote:
>> That's right, tg3 drove a lot of the Broadcom PHY driver changes back
>> then, I also would like to rework the way we pass flags towards PHY
>> drivers because tg3 is basically the only driver doing it right, where
>> it checks the PHY ID first, then sets appropriate flags during connect.
> 
> Why does the tg3 controller need to enable the auto power down PHY
> feature in the first place and the PHY driver can't just enable it by
> itself?
> 

That would be a question for Michael if he remembers those details from
12 years ago.
-- 
Florian


[PATCH] microblaze: Fix built-in DTB alignment to be 8-byte aligned

2021-02-12 Thread Rob Herring
Commit 79edff12060f ("scripts/dtc: Update to upstream version
v1.6.0-51-g183df9e9c2b9") broke booting on Microblaze systems depending on
the build. The problem is libfdt gained an 8-byte starting alignment check,
but the Microblaze built-in DTB area is only 4-byte aligned. This affects
not just built-in DTBs, as bootloader-passed DTBs are copied into the
built-in DTB region.

Other arches using built-in DTBs use a common linker macro which has
sufficient alignment.

Fixes: 79edff12060f ("scripts/dtc: Update to upstream version 
v1.6.0-51-g183df9e9c2b9")
Reported-by: Guenter Roeck 
Tested-by: Guenter Roeck 
Cc: Michal Simek 
Signed-off-by: Rob Herring 
---
As the commit is in my tree, I'll take this via the DT tree.

 arch/microblaze/kernel/vmlinux.lds.S | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/microblaze/kernel/vmlinux.lds.S 
b/arch/microblaze/kernel/vmlinux.lds.S
index df07b3d06cd6..fb31747ec092 100644
--- a/arch/microblaze/kernel/vmlinux.lds.S
+++ b/arch/microblaze/kernel/vmlinux.lds.S
@@ -45,7 +45,7 @@ SECTIONS {
_etext = . ;
}
 
-   . = ALIGN (4) ;
+   . = ALIGN (8) ;
__fdt_blob : AT(ADDR(__fdt_blob) - LOAD_OFFSET) {
_fdt_start = . ;/* place for fdt blob */
*(__fdt_blob) ; /* Any link-placed DTB */
-- 
2.27.0



Re: [PATCH net-next 1/3] net: phy: broadcom: Remove unused flags

2021-02-12 Thread Vladimir Oltean
On Fri, Feb 12, 2021 at 05:08:58PM -0800, Florian Fainelli wrote:
> That's right, tg3 drove a lot of the Broadcom PHY driver changes back
> then, I also would like to rework the way we pass flags towards PHY
> drivers because tg3 is basically the only driver doing it right, where
> it checks the PHY ID first, then sets appropriate flags during connect.

Why does the tg3 controller need to enable the auto power down PHY
feature in the first place and the PHY driver can't just enable it by
itself?


Re: [PATCH net-next 2/3] net: phy: broadcom: Fix RXC/TXC auto disabling

2021-02-12 Thread Florian Fainelli



On 2/12/2021 5:11 PM, Vladimir Oltean wrote:
> On Fri, Feb 12, 2021 at 12:57:20PM -0800, Florian Fainelli wrote:
>> When support for optionally disabling the TXC was introduced, bit 2 was
>> used to do that operation but the datasheet for 50610M from 2009 does
>> not show bit 2 as being defined. Bit 8 is the one that allows automatic
>> disabling of the RXC/TXC auto disabling during auto power down.
>>
>> Fixes: 52fae0837153 ("tg3 / broadcom: Optionally disable TXC if no link")
>> Signed-off-by: Florian Fainelli 
>> ---
>>  include/linux/brcmphy.h | 2 +-
>>  1 file changed, 1 insertion(+), 1 deletion(-)
>>
>> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
>> index da7bf9dfef5b..3dd8203cf780 100644
>> --- a/include/linux/brcmphy.h
>> +++ b/include/linux/brcmphy.h
>> @@ -193,7 +193,7 @@
>>  #define BCM54XX_SHD_SCR30x05
>>  #define  BCM54XX_SHD_SCR3_DEF_CLK1250x0001
>>  #define  BCM54XX_SHD_SCR3_DLLAPD_DIS0x0002
>> -#define  BCM54XX_SHD_SCR3_TRDDAPD   0x0004
>> +#define  BCM54XX_SHD_SCR3_TRDDAPD   0x0100
>>  
>>  /* 01010: Auto Power-Down */
>>  #define BCM54XX_SHD_APD 0x0a
>> -- 
>> 2.25.1
>>
> 
> We may have a problem here, with the layout of the Spare Control 3
> register not being as universal as we think.
> 
> Your finding may have been the same as Kevin Lo's from commit
> b0ed0bbfb304 ("net: phy: broadcom: add support for BCM54811 PHY"),
> therefore your change is making BCM54XX_SHD_SCR3_TRDDAPD ==
> BCM54810_SHD_SCR3_TRDDAPD, so currently this if condition is redundant
> and probably something else is wrong too:
> 
>   if (phydev->dev_flags & PHY_BRCM_DIS_TXCRXC_NOENRGY) {
>   if (BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54810 ||
>   BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54811)
>   val |= BCM54810_SHD_SCR3_TRDDAPD;
>   else
>   val |= BCM54XX_SHD_SCR3_TRDDAPD;
>   }
> 
> I'm not sure what "TRDD" stands for, but my copy of the BCM5464R
> datasheet shows both bits 2 as well as 8 as being reserved. I have
> "CLK125 Output" in bit 0, "DLL Auto Power-Down" in bit 1, "SD/Energy
> Detect Change" in bit 5, "TXC Disable" in bit 6, and that's about it.

Let me go back to the datasheet of all of the PHYs supported by
bcm54xx_adjust_rxrefclk() and make sure we set the right bit.

I also have no idea what TRDD stands for.

> 
> But I think it doesn't matter what BCM5464R has, since this feature is
> gated by PHY_BRCM_DIS_TXCRXC_NOENRGY.

Yes, but it should be working nonetheless.
-- 
Florian


Re: [PATCH net-next 2/3] net: phy: broadcom: Fix RXC/TXC auto disabling

2021-02-12 Thread Vladimir Oltean
On Fri, Feb 12, 2021 at 12:57:20PM -0800, Florian Fainelli wrote:
> When support for optionally disabling the TXC was introduced, bit 2 was
> used to do that operation but the datasheet for 50610M from 2009 does
> not show bit 2 as being defined. Bit 8 is the one that allows automatic
> disabling of the RXC/TXC auto disabling during auto power down.
> 
> Fixes: 52fae0837153 ("tg3 / broadcom: Optionally disable TXC if no link")
> Signed-off-by: Florian Fainelli 
> ---
>  include/linux/brcmphy.h | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/include/linux/brcmphy.h b/include/linux/brcmphy.h
> index da7bf9dfef5b..3dd8203cf780 100644
> --- a/include/linux/brcmphy.h
> +++ b/include/linux/brcmphy.h
> @@ -193,7 +193,7 @@
>  #define BCM54XX_SHD_SCR3 0x05
>  #define  BCM54XX_SHD_SCR3_DEF_CLK125 0x0001
>  #define  BCM54XX_SHD_SCR3_DLLAPD_DIS 0x0002
> -#define  BCM54XX_SHD_SCR3_TRDDAPD0x0004
> +#define  BCM54XX_SHD_SCR3_TRDDAPD0x0100
>  
>  /* 01010: Auto Power-Down */
>  #define BCM54XX_SHD_APD  0x0a
> -- 
> 2.25.1
> 

We may have a problem here, with the layout of the Spare Control 3
register not being as universal as we think.

Your finding may have been the same as Kevin Lo's from commit
b0ed0bbfb304 ("net: phy: broadcom: add support for BCM54811 PHY"),
therefore your change is making BCM54XX_SHD_SCR3_TRDDAPD ==
BCM54810_SHD_SCR3_TRDDAPD, so currently this if condition is redundant
and probably something else is wrong too:

if (phydev->dev_flags & PHY_BRCM_DIS_TXCRXC_NOENRGY) {
if (BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54810 ||
BRCM_PHY_MODEL(phydev) == PHY_ID_BCM54811)
val |= BCM54810_SHD_SCR3_TRDDAPD;
else
val |= BCM54XX_SHD_SCR3_TRDDAPD;
}

I'm not sure what "TRDD" stands for, but my copy of the BCM5464R
datasheet shows both bits 2 as well as 8 as being reserved. I have
"CLK125 Output" in bit 0, "DLL Auto Power-Down" in bit 1, "SD/Energy
Detect Change" in bit 5, "TXC Disable" in bit 6, and that's about it.

But I think it doesn't matter what BCM5464R has, since this feature is
gated by PHY_BRCM_DIS_TXCRXC_NOENRGY.
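
Spelled out as masks, in case it helps the comparison (just my reading of that
datasheet copy, not proposed defines):

	/* BCM5464R Spare Control 3 bit layout as listed above (sketch only) */
	#define SCR3_CLK125_OUTPUT	0x0001	/* bit 0: CLK125 Output */
	#define SCR3_DLL_APD		0x0002	/* bit 1: DLL Auto Power-Down */
	#define SCR3_SD_ENERGY_DET_CHG	0x0020	/* bit 5: SD/Energy Detect Change */
	#define SCR3_TXC_DISABLE	0x0040	/* bit 6: TXC Disable */
	/* bits 2 and 8: reserved in this copy of the datasheet */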


Re: [PATCH net-next 1/3] net: phy: broadcom: Remove unused flags

2021-02-12 Thread Florian Fainelli



On 2/12/2021 4:56 PM, Vladimir Oltean wrote:
> On Fri, Feb 12, 2021 at 12:57:19PM -0800, Florian Fainelli wrote:
>> We have a number of unused flags defined today and since we are scarce
>> on space and may need to introduce new flags in the future remove and
>> shift every existing flag down into a contiguous assignment. No
>> functional change.
>>
>> Signed-off-by: Florian Fainelli 
>> ---
> 
> Good to see some of the dev_flags go away!
> 
> PHY_BCM_FLAGS_MODE_1000BX is used just from broadcom.c, therefore it can
> probably be moved to a structure in phydev->priv.

The next step would be to move it to a private flag, indeed.

> 
> PHY_BRCM_STD_IBND_DISABLE, PHY_BRCM_EXT_IBND_RX_ENABLE and
> PHY_BRCM_EXT_IBND_TX_ENABLE are set by
> drivers/net/ethernet/broadcom/tg3.c but not used anywhere.

That's right, tg3 drove a lot of the Broadcom PHY driver changes back
then, I also would like to rework the way we pass flags towards PHY
drivers because tg3 is basically the only driver doing it right, where
it checks the PHY ID first, then sets appropriate flags during connect.
-- 
Florian


Re: [GIT PULL] cifs fixes

2021-02-12 Thread Stefan Metzmacher
Hi Linus,

>> The machine is running a 'AMD Ryzen Threadripper 2950X 16-Core Processor'
>> and is freezing without any trace every few days.
> 
> I don't think the first-gen Zen issues ever really got solved. There
> were multiple ones, with random segfaults for the early ones (but
> afaik those were fixed by an RMA process with AMD), but the "it
> randomly locks up" ones never had a satisfactory resolution afaik.
> 
> There were lots of random workarounds, but judging by your email:
> 
>> We played with various boot parameters (currently we're using
>> 'mem_encrypt=off rcu_nocbs=0-31 processor.max_cstate=1 idle=nomwait 
>> nomodeset consoleblank=0',
> 
> I suspect you've seen all the bugzilla threads on this issue (kernel
> bugzilla 196683 is probably the main one, but it was discussed
> elsewhere too).

I just found that one; I'll have a closer look at the details in the next few days.

> I assume you've updated to latest BIOS and looked at various BIOS
> power management settings too?

No, but I'll have a look at that.

> Zen 2 seems to have fixed things (knock wood - it's certainly working
> for me), But many people obviously never saw any issues with Zen 1
> either.

Do you know about the Zen3 status? I was thinking of replacing the system
with this one, which has an AMD Ryzen 9 5950X:
https://www.hetzner.com/dedicated-rootserver/ax101

Thanks!
metze






Re: [GIT PULL] fscache: I/O API modernisation and netfs helper library

2021-02-12 Thread Linus Torvalds
On Thu, Feb 11, 2021 at 3:21 PM David Howells  wrote:
>
> Most of the development discussion took place on IRC and waving snippets of
> code about in pastebin rather than email - the latency of email is just too
> high.  There's not a great deal I can do about that now as I haven't kept IRC
> logs.  I can do that in future if you want.

No, I really don't.

IRC is fine for discussing ideas about how to solve things.

But no, it's not a replacement for actual code review after the fact.

If you think email has too long latency for review, and can't use
public mailing lists and cc the people who are maintainers, then I
simply don't want your patches.

You need to fix your development model. This whole "I need to get
feedback from whoever still uses irc and is active RIGHT NOW" is not a
valid model. It's fine for brainstorming for possible approaches, and
getting ideas, sure.

   Linus


Re: [PATCH v3 net-next 4/5] net: ipa: introduce ipa_table_hash_support()

2021-02-12 Thread Alexander Duyck
On Fri, Feb 12, 2021 at 6:40 AM Alex Elder  wrote:
>
> Introduce a new function to abstract the knowledge of whether hashed
> routing and filter tables are supported for a given IPA instance.
>
> IPA v4.2 is the only one that doesn't support hashed tables (now
> and for the foreseeable future), but the name of the helper function
> is better for explaining what's going on.
>
> Signed-off-by: Alex Elder 
> ---
> v2: - Update copyrights.
>
>  drivers/net/ipa/ipa_cmd.c   |  2 +-
>  drivers/net/ipa/ipa_table.c | 16 +---
>  drivers/net/ipa/ipa_table.h |  8 +++-
>  3 files changed, 17 insertions(+), 9 deletions(-)
>
> diff --git a/drivers/net/ipa/ipa_cmd.c b/drivers/net/ipa/ipa_cmd.c
> index fd8bf6468d313..35e35852c25c5 100644
> --- a/drivers/net/ipa/ipa_cmd.c
> +++ b/drivers/net/ipa/ipa_cmd.c
> @@ -268,7 +268,7 @@ static bool ipa_cmd_register_write_valid(struct ipa *ipa)
> /* If hashed tables are supported, ensure the hash flush register
>  * offset will fit in a register write IPA immediate command.
>  */
> -   if (ipa->version != IPA_VERSION_4_2) {
> +   if (ipa_table_hash_support(ipa)) {
> offset = ipa_reg_filt_rout_hash_flush_offset(ipa->version);
> name = "filter/route hash flush";
> if (!ipa_cmd_register_write_offset_valid(ipa, name, offset))
> diff --git a/drivers/net/ipa/ipa_table.c b/drivers/net/ipa/ipa_table.c
> index 32e2d3e052d55..baaab3dd0e63c 100644
> --- a/drivers/net/ipa/ipa_table.c
> +++ b/drivers/net/ipa/ipa_table.c
> @@ -1,7 +1,7 @@
>  // SPDX-License-Identifier: GPL-2.0
>
>  /* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
> - * Copyright (C) 2018-2020 Linaro Ltd.
> + * Copyright (C) 2018-2021 Linaro Ltd.
>   */
>
>  #include 
> @@ -239,6 +239,11 @@ static void ipa_table_validate_build(void)
>
>  #endif /* !IPA_VALIDATE */
>
> +bool ipa_table_hash_support(struct ipa *ipa)
> +{
> +   return ipa->version != IPA_VERSION_4_2;
> +}
> +

Since this is only a single comparison it might make more sense to
make this a static inline and place it in ipa.h. Otherwise you are
just bloating the code up to jump to such a small function.

>  /* Zero entry count means no table, so just return a 0 address */
>  static dma_addr_t ipa_table_addr(struct ipa *ipa, bool filter_mask, u16 
> count)
>  {
> @@ -412,8 +417,7 @@ int ipa_table_hash_flush(struct ipa *ipa)
> struct gsi_trans *trans;
> u32 val;
>
> -   /* IPA version 4.2 does not support hashed tables */
> -   if (ipa->version == IPA_VERSION_4_2)
> +   if (!ipa_table_hash_support(ipa))
> return 0;
>
> trans = ipa_cmd_trans_alloc(ipa, 1);
> @@ -531,8 +535,7 @@ static void ipa_filter_config(struct ipa *ipa, bool modem)
> enum gsi_ee_id ee_id = modem ? GSI_EE_MODEM : GSI_EE_AP;
> u32 ep_mask = ipa->filter_map;
>
> -   /* IPA version 4.2 has no hashed route tables */
> -   if (ipa->version == IPA_VERSION_4_2)
> +   if (!ipa_table_hash_support(ipa))
> return;
>
> while (ep_mask) {
> @@ -582,8 +585,7 @@ static void ipa_route_config(struct ipa *ipa, bool modem)
>  {
> u32 route_id;
>
> -   /* IPA version 4.2 has no hashed route tables */
> -   if (ipa->version == IPA_VERSION_4_2)
> +   if (!ipa_table_hash_support(ipa))
> return;
>
> for (route_id = 0; route_id < IPA_ROUTE_COUNT_MAX; route_id++)
> diff --git a/drivers/net/ipa/ipa_table.h b/drivers/net/ipa/ipa_table.h
> index 78038d14fcea9..1a68d20f19d6a 100644
> --- a/drivers/net/ipa/ipa_table.h
> +++ b/drivers/net/ipa/ipa_table.h
> @@ -1,7 +1,7 @@
>  /* SPDX-License-Identifier: GPL-2.0 */
>
>  /* Copyright (c) 2012-2018, The Linux Foundation. All rights reserved.
> - * Copyright (C) 2019-2020 Linaro Ltd.
> + * Copyright (C) 2019-2021 Linaro Ltd.
>   */
>  #ifndef _IPA_TABLE_H_
>  #define _IPA_TABLE_H_
> @@ -51,6 +51,12 @@ static inline bool ipa_filter_map_valid(struct ipa *ipa, 
> u32 filter_mask)
>
>  #endif /* !IPA_VALIDATE */
>
> +/**
> + * ipa_table_hash_support() - Return true if hashed tables are supported
> + * @ipa:   IPA pointer
> + */
> +bool ipa_table_hash_support(struct ipa *ipa);
> +
>  /**
>   * ipa_table_reset() - Reset filter and route tables entries to "none"
>   * @ipa:   IPA pointer

Just define the function here and make it a static inline.
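
Something along these lines (a sketch only, assuming ipa_table.h can see the
full struct ipa definition, e.g. via ipa.h, so the version field is
accessible):

	static inline bool ipa_table_hash_support(struct ipa *ipa)
	{
		/* IPA v4.2 is the only version without hashed table support */
		return ipa->version != IPA_VERSION_4_2;
	}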


[PATCH 9/9] KVM: x86: Rename GPR accessors to make mode-aware variants the defaults

2021-02-12 Thread Sean Christopherson
Append raw to the direct variants of kvm_register_read/write(), and
drop the "l" from the mode-aware variants.  I.e. make the mode-aware
variants the default, and make the direct variants scary sounding so as
to discourage use.  Accessing the full 64-bit values irrespective of
mode is rarely the desired behavior.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/kvm_cache_regs.h | 19 ---
 arch/x86/kvm/svm/svm.c|  8 
 arch/x86/kvm/vmx/nested.c | 20 ++--
 arch/x86/kvm/vmx/vmx.c| 12 ++--
 arch/x86/kvm/x86.c|  8 
 arch/x86/kvm/x86.h|  8 
 arch/x86/kvm/xen.c|  2 +-
 7 files changed, 41 insertions(+), 36 deletions(-)

diff --git a/arch/x86/kvm/kvm_cache_regs.h b/arch/x86/kvm/kvm_cache_regs.h
index 2e11da2f5621..3db5c42c9ecd 100644
--- a/arch/x86/kvm/kvm_cache_regs.h
+++ b/arch/x86/kvm/kvm_cache_regs.h
@@ -62,7 +62,12 @@ static inline void kvm_register_mark_dirty(struct kvm_vcpu 
*vcpu,
__set_bit(reg, (unsigned long *)&vcpu->arch.regs_dirty);
 }
 
-static inline unsigned long kvm_register_read(struct kvm_vcpu *vcpu, int reg)
+/*
+ * The "raw" register helpers are only for cases where the full 64 bits of a
+ * register are read/written irrespective of current vCPU mode.  In other 
words,
+ * odds are good you shouldn't be using the raw variants.
+ */
+static inline unsigned long kvm_register_read_raw(struct kvm_vcpu *vcpu, int 
reg)
 {
if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
return 0;
@@ -73,8 +78,8 @@ static inline unsigned long kvm_register_read(struct kvm_vcpu 
*vcpu, int reg)
return vcpu->arch.regs[reg];
 }
 
-static inline void kvm_register_write(struct kvm_vcpu *vcpu, int reg,
- unsigned long val)
+static inline void kvm_register_write_raw(struct kvm_vcpu *vcpu, int reg,
+ unsigned long val)
 {
if (WARN_ON_ONCE((unsigned int)reg >= NR_VCPU_REGS))
return;
@@ -85,22 +90,22 @@ static inline void kvm_register_write(struct kvm_vcpu 
*vcpu, int reg,
 
 static inline unsigned long kvm_rip_read(struct kvm_vcpu *vcpu)
 {
-   return kvm_register_read(vcpu, VCPU_REGS_RIP);
+   return kvm_register_read_raw(vcpu, VCPU_REGS_RIP);
 }
 
 static inline void kvm_rip_write(struct kvm_vcpu *vcpu, unsigned long val)
 {
-   kvm_register_write(vcpu, VCPU_REGS_RIP, val);
+   kvm_register_write_raw(vcpu, VCPU_REGS_RIP, val);
 }
 
 static inline unsigned long kvm_rsp_read(struct kvm_vcpu *vcpu)
 {
-   return kvm_register_read(vcpu, VCPU_REGS_RSP);
+   return kvm_register_read_raw(vcpu, VCPU_REGS_RSP);
 }
 
 static inline void kvm_rsp_write(struct kvm_vcpu *vcpu, unsigned long val)
 {
-   kvm_register_write(vcpu, VCPU_REGS_RSP, val);
+   kvm_register_write_raw(vcpu, VCPU_REGS_RSP, val);
 }
 
 static inline u64 kvm_pdptr_read(struct kvm_vcpu *vcpu, int index)
diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 4dc64ebaa756..55afe41b4102 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2531,7 +2531,7 @@ static int cr_interception(struct vcpu_svm *svm)
err = 0;
if (cr >= 16) { /* mov to cr */
cr -= 16;
-   val = kvm_register_readl(&svm->vcpu, reg);
+   val = kvm_register_read(&svm->vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -2577,7 +2577,7 @@ static int cr_interception(struct vcpu_svm *svm)
kvm_queue_exception(&svm->vcpu, UD_VECTOR);
return 1;
}
-   kvm_register_writel(&svm->vcpu, reg, val);
+   kvm_register_write(&svm->vcpu, reg, val);
trace_kvm_cr_read(cr, val);
}
return kvm_complete_insn_gp(&svm->vcpu, err);
@@ -2642,11 +2642,11 @@ static int dr_interception(struct vcpu_svm *svm)
dr = svm->vmcb->control.exit_code - SVM_EXIT_READ_DR0;
if (dr >= 16) { /* mov to DRn  */
dr -= 16;
-   val = kvm_register_readl(&svm->vcpu, reg);
+   val = kvm_register_read(&svm->vcpu, reg);
err = kvm_set_dr(&svm->vcpu, dr, val);
} else {
kvm_get_dr(&svm->vcpu, dr, &val);
-   kvm_register_writel(&svm->vcpu, reg, val);
+   kvm_register_write(&svm->vcpu, reg, val);
}
 
return kvm_complete_insn_gp(&svm->vcpu, err);
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index a02d8744ca66..358747586037 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4601,9 +4601,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned 
long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
-   off += kvm_register_readl(vcpu, base

[PATCH 7/9] KVM: x86/xen: Drop RAX[63:32] when processing hypercall

2021-02-12 Thread Sean Christopherson
Truncate RAX to 32 bits, i.e. consume EAX, when retrieving the hypercall
index for a Xen hypercall.  Per Xen documentation[*], the index is EAX
when the vCPU is not in 64-bit mode.

[*] 
http://xenbits.xenproject.org/docs/sphinx-unstable/guest-guide/x86/hypercall-abi.html

Fixes: 23200b7a30de ("KVM: x86/xen: intercept xen hypercalls if enabled")
Cc: Joao Martins 
Cc: David Woodhouse 
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/xen.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/xen.c b/arch/x86/kvm/xen.c
index af8f6562fce4..5bfed72edd07 100644
--- a/arch/x86/kvm/xen.c
+++ b/arch/x86/kvm/xen.c
@@ -383,7 +383,7 @@ int kvm_xen_hypercall(struct kvm_vcpu *vcpu)
bool longmode;
u64 input, params[6];
 
-   input = (u64)kvm_register_read(vcpu, VCPU_REGS_RAX);
+   input = (u64)kvm_register_readl(vcpu, VCPU_REGS_RAX);
 
/* Hyper-V hypercalls get bit 31 set in EAX */
if ((input & 0x8000) &&
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 8/9] KVM: SVM: Use default rAX size for INVLPGA emulation

2021-02-12 Thread Sean Christopherson
Drop bits 63:32 of RAX when grabbing the address for INVLPGA emulation
outside of 64-bit mode to make KVM's emulation slightly less wrong.  The
address for INVLPGA is determined by the effective address size, i.e.
it's not hardcoded to 64/32 bits for a given mode.  Add a FIXME to call
out that the emulation is wrong.

Opportunistically tweak the ASID handling to make it clear that it's
defined by ECX, not rCX.

Per the APM:
   The portion of rAX used to form the address is determined by the
   effective address size (current execution mode and optional address
   size prefix). The ASID is taken from ECX.

Fixes: ff092385e828 ("KVM: SVM: Implement INVLPGA")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/svm/svm.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index d077584d45ec..4dc64ebaa756 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2327,12 +2327,17 @@ static int clgi_interception(struct vcpu_svm *svm)
 static int invlpga_interception(struct vcpu_svm *svm)
 {
struct kvm_vcpu *vcpu = &svm->vcpu;
+   gva_t gva = kvm_rax_read(vcpu);
+   u32 asid = kvm_rcx_read(vcpu);
 
-   trace_kvm_invlpga(svm->vmcb->save.rip, kvm_rcx_read(&svm->vcpu),
- kvm_rax_read(&svm->vcpu));
+   /* FIXME: Handle an address size prefix. */
+   if (!is_long_mode(vcpu))
+   gva = (u32)gva;
+
+   trace_kvm_invlpga(svm->vmcb->save.rip, asid, gva);
 
/* Let's treat INVLPGA the same as INVLPG (can be optimized!) */
-   kvm_mmu_invlpg(vcpu, kvm_rax_read(&svm->vcpu));
+   kvm_mmu_invlpg(vcpu, gva);
 
return kvm_skip_emulated_instruction(&svm->vcpu);
 }
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 3/9] KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode

2021-02-12 Thread Sean Christopherson
Drop bits 63:32 on loads/stores to/from DRs and CRs when the vCPU is not
in 64-bit mode.  The APM states bits 63:32 are dropped for both DRs and
CRs:

  In 64-bit mode, the operand size is fixed at 64 bits without the need
  for a REX prefix. In non-64-bit mode, the operand size is fixed at 32
  bits and the upper 32 bits of the destination are forced to 0.

Fixes: 7ff76d58a9dc ("KVM: SVM: enhance MOV CR intercept handler")
Fixes: cae3797a4639 ("KVM: SVM: enhance mov DR intercept handler")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/svm/svm.c | 8 
 1 file changed, 4 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/svm/svm.c b/arch/x86/kvm/svm/svm.c
index 42d4710074a6..d077584d45ec 100644
--- a/arch/x86/kvm/svm/svm.c
+++ b/arch/x86/kvm/svm/svm.c
@@ -2526,7 +2526,7 @@ static int cr_interception(struct vcpu_svm *svm)
err = 0;
if (cr >= 16) { /* mov to cr */
cr -= 16;
-   val = kvm_register_read(&svm->vcpu, reg);
+   val = kvm_register_readl(&svm->vcpu, reg);
trace_kvm_cr_write(cr, val);
switch (cr) {
case 0:
@@ -2572,7 +2572,7 @@ static int cr_interception(struct vcpu_svm *svm)
kvm_queue_exception(&svm->vcpu, UD_VECTOR);
return 1;
}
-   kvm_register_write(&svm->vcpu, reg, val);
+   kvm_register_writel(&svm->vcpu, reg, val);
trace_kvm_cr_read(cr, val);
}
return kvm_complete_insn_gp(&svm->vcpu, err);
@@ -2637,11 +2637,11 @@ static int dr_interception(struct vcpu_svm *svm)
dr = svm->vmcb->control.exit_code - SVM_EXIT_READ_DR0;
if (dr >= 16) { /* mov to DRn  */
dr -= 16;
-   val = kvm_register_read(&svm->vcpu, reg);
+   val = kvm_register_readl(&svm->vcpu, reg);
err = kvm_set_dr(&svm->vcpu, dr, val);
} else {
kvm_get_dr(&svm->vcpu, dr, &val);
-   kvm_register_write(&svm->vcpu, reg, val);
+   kvm_register_writel(&svm->vcpu, reg, val);
}
 
return kvm_complete_insn_gp(&svm->vcpu, err);
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 6/9] KVM: nVMX: Truncate base/index GPR value on address calc in !64-bit

2021-02-12 Thread Sean Christopherson
Drop bits 63:32 of the base and/or index GPRs when calculating the
effective address of a VMX instruction memory operand.  Outside of 64-bit
mode, memory encodings are strictly limited to E*X and below.

Fixes: 064aea774768 ("KVM: nVMX: Decoding memory operands of VMX instructions")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/vmx/nested.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index d6c892ea551c..a02d8744ca66 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4601,9 +4601,9 @@ int get_vmx_mem_address(struct kvm_vcpu *vcpu, unsigned long exit_qualification,
else if (addr_size == 0)
off = (gva_t)sign_extend64(off, 15);
if (base_is_valid)
-   off += kvm_register_read(vcpu, base_reg);
+   off += kvm_register_readl(vcpu, base_reg);
if (index_is_valid)
-   off += kvm_register_read(vcpu, index_reg) << scaling;
+   off += kvm_register_readl(vcpu, index_reg) << scaling;
vmx_get_segment(vcpu, &s, seg_reg);
 
/*
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 5/9] KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in !64-bit

2021-02-12 Thread Sean Christopherson
Drop bits 63:32 of the VMCS field encoding when checking for a nested
VM-Exit on VMREAD/VMWRITE in !64-bit mode.  VMREAD and VMWRITE always
use 32-bit operands outside of 64-bit mode.

The actual emulation of VMREAD/VMWRITE does the right thing; this bug is
purely limited to incorrectly causing a nested VM-Exit if a GPR happens
to have bits 63:32 set outside of 64-bit mode.
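
Reduced to a sketch (assumed names, not the KVM code): the field encoding
comes from the GPR named by bits 31:28 of the instruction info, and only its
low 32 bits are meaningful outside 64-bit mode:

  #include <stdint.h>
  #include <stdbool.h>

  static uint64_t vmcs_field_from_gpr(uint32_t vmx_instruction_info,
                                      const uint64_t *regs, bool long_mode)
  {
          uint64_t field = regs[(vmx_instruction_info >> 28) & 0xf];

          return long_mode ? field : (uint32_t)field;     /* drop bits 63:32 */
  }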

Fixes: a7cde481b6e8 ("KVM: nVMX: Do not forward VMREAD/VMWRITE VMExits to L1 if required so by vmcs12 vmread/vmwrite bitmaps")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/vmx/nested.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b2f0b5e9cd63..d6c892ea551c 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -5717,7 +5717,7 @@ static bool nested_vmx_exit_handled_vmcs_access(struct kvm_vcpu *vcpu,
 
/* Decode instruction info and find the field to access */
vmx_instruction_info = vmcs_read32(VMX_INSTRUCTION_INFO);
-   field = kvm_register_read(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
+   field = kvm_register_readl(vcpu, (((vmx_instruction_info) >> 28) & 0xf));
 
/* Out-of-range fields always cause a VM exit from L2 to L1 */
if (field >> 15)
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 4/9] KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit mode

2021-02-12 Thread Sean Christopherson
Drop bits 63:32 when storing a DR/CR to a GPR when the vCPU is not in
64-bit mode.  Per the SDM:

  The operand size for these instructions is always 32 bits in non-64-bit
  modes, regardless of the operand-size attribute.

CR8 technically isn't affected as CR8 isn't accessible outside of 64-bit
mode, but fix it up for consistency and to allow for future cleanup.

Fixes: 6aa8b732ca01 ("[PATCH] kvm: userspace interface")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/vmx/vmx.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e0a3a9be654b..115826a020ff 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5067,12 +5067,12 @@ static int handle_cr(struct kvm_vcpu *vcpu)
case 3:
WARN_ON_ONCE(enable_unrestricted_guest);
val = kvm_read_cr3(vcpu);
-   kvm_register_write(vcpu, reg, val);
+   kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
case 8:
val = kvm_get_cr8(vcpu);
-   kvm_register_write(vcpu, reg, val);
+   kvm_register_writel(vcpu, reg, val);
trace_kvm_cr_read(cr, val);
return kvm_skip_emulated_instruction(vcpu);
}
@@ -5145,7 +5145,7 @@ static int handle_dr(struct kvm_vcpu *vcpu)
unsigned long val;
 
kvm_get_dr(vcpu, dr, &val);
-   kvm_register_write(vcpu, reg, val);
+   kvm_register_writel(vcpu, reg, val);
err = 0;
} else {
err = kvm_set_dr(vcpu, dr, kvm_register_readl(vcpu, reg));
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 2/9] KVM: x86: Check CR3 GPA for validity regardless of vCPU mode

2021-02-12 Thread Sean Christopherson
Check CR3 for an invalid GPA even if the vCPU isn't in long mode.  For
bigger emulation flows, notably RSM, the vCPU mode may not be accurate
if CR0/CR4 are loaded after CR3.  For MOV CR3 and similar flows, the
caller is responsible for truncating the value.

Note, SMRAM.CR3 is read-only, so this is mostly a theoretical bug since
KVM will not have stored an illegal CR3 into SMRAM during SMI emulation.
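
Conceptually (a stand-alone sketch of the assumed kvm_vcpu_is_illegal_gpa()
semantics, not the KVM code), a GPA is illegal if any bit at or above the
guest's MAXPHYADDR is set, and that check now runs regardless of vCPU mode:

  #include <stdint.h>
  #include <stdbool.h>

  static bool gpa_is_illegal(uint64_t gpa, unsigned int maxphyaddr)
  {
          uint64_t rsvd = maxphyaddr < 64 ? ~0ULL << maxphyaddr : 0;

          return (gpa & rsvd) != 0;
  }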

Fixes: 660a5d517aaa ("KVM: x86: save/load state on SMM switch")
Cc: sta...@vger.kernel.org
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/x86.c | 11 ---
 1 file changed, 8 insertions(+), 3 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index 3fa140383f5d..72fd8d384df7 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -1073,10 +1073,15 @@ int kvm_set_cr3(struct kvm_vcpu *vcpu, unsigned long cr3)
return 0;
}
 
-   if (is_long_mode(vcpu) && kvm_vcpu_is_illegal_gpa(vcpu, cr3))
+   /*
+* Do not condition the GPA check on long mode, this helper is used to
+* stuff CR3, e.g. for RSM emulation, and there is no guarantee that
+* the current vCPU mode is accurate.
+*/
+   if (kvm_vcpu_is_illegal_gpa(vcpu, cr3))
return 1;
-   else if (is_pae_paging(vcpu) &&
-!load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
+
+   if (is_pae_paging(vcpu) && !load_pdptrs(vcpu, vcpu->arch.walk_mmu, cr3))
return 1;
 
kvm_mmu_new_pgd(vcpu, cr3, skip_tlb_flush, skip_tlb_flush);
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 1/9] KVM: x86: Remove emulator's broken checks on CR0/CR3/CR4 loads

2021-02-12 Thread Sean Christopherson
Remove the emulator's checks for illegal CR0, CR3, and CR4 values, as
the checks are redundant, outdated, and in the case of SEV's C-bit,
broken.  The emulator manually calculates MAXPHYADDR from CPUID and
neglects to mask off the C-bit.  For all other checks, kvm_set_cr*() are
a superset of the emulator checks, e.g. see CR4.LA57.
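
To illustrate the C-bit problem (a sketch under assumed inputs, not KVM
code): a CR3 reserved-bit mask derived from the raw MAXPHYADDR must have the
SEV C-bit cleared, otherwise an encrypted guest's CR3 is falsely flagged as
setting reserved bits:

  #include <stdint.h>

  static uint64_t cr3_rsvd_mask(unsigned int maxphyaddr, int sev_c_bit)
  {
          uint64_t rsvd = maxphyaddr < 64 ? ~0ULL << maxphyaddr : 0;

          if (sev_c_bit >= 0)                     /* -1: no SEV C-bit */
                  rsvd &= ~(1ULL << sev_c_bit);
          return rsvd;
  }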

Fixes: a780a3ea6282 ("KVM: X86: Fix reserved bits check for MOV to CR3")
Cc: Babu Moger 
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/emulate.c | 68 +-
 1 file changed, 1 insertion(+), 67 deletions(-)

diff --git a/arch/x86/kvm/emulate.c b/arch/x86/kvm/emulate.c
index f7970ba6219f..f4273b8e31fa 100644
--- a/arch/x86/kvm/emulate.c
+++ b/arch/x86/kvm/emulate.c
@@ -4230,75 +4230,9 @@ static int check_cr_read(struct x86_emulate_ctxt *ctxt)
 
 static int check_cr_write(struct x86_emulate_ctxt *ctxt)
 {
-   u64 new_val = ctxt->src.val64;
-   int cr = ctxt->modrm_reg;
-   u64 efer = 0;
-
-   static u64 cr_reserved_bits[] = {
-   0xffffffff00000000ULL,
-   0, 0, 0, /* CR3 checked later */
-   CR4_RESERVED_BITS,
-   0, 0, 0,
-   CR8_RESERVED_BITS,
-   };
-
-   if (!valid_cr(cr))
+   if (!valid_cr(ctxt->modrm_reg))
return emulate_ud(ctxt);
 
-   if (new_val & cr_reserved_bits[cr])
-   return emulate_gp(ctxt, 0);
-
-   switch (cr) {
-   case 0: {
-   u64 cr4;
-   if (((new_val & X86_CR0_PG) && !(new_val & X86_CR0_PE)) ||
-   ((new_val & X86_CR0_NW) && !(new_val & X86_CR0_CD)))
-   return emulate_gp(ctxt, 0);
-
-   cr4 = ctxt->ops->get_cr(ctxt, 4);
-   ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
-
-   if ((new_val & X86_CR0_PG) && (efer & EFER_LME) &&
-   !(cr4 & X86_CR4_PAE))
-   return emulate_gp(ctxt, 0);
-
-   break;
-   }
-   case 3: {
-   u64 rsvd = 0;
-
-   ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
-   if (efer & EFER_LMA) {
-   u64 maxphyaddr;
-   u32 eax, ebx, ecx, edx;
-
-   eax = 0x80000008;
-   ecx = 0;
-   if (ctxt->ops->get_cpuid(ctxt, &eax, &ebx, &ecx,
-&edx, true))
-   maxphyaddr = eax & 0xff;
-   else
-   maxphyaddr = 36;
-   rsvd = rsvd_bits(maxphyaddr, 63);
-   if (ctxt->ops->get_cr(ctxt, 4) & X86_CR4_PCIDE)
-   rsvd &= ~X86_CR3_PCID_NOFLUSH;
-   }
-
-   if (new_val & rsvd)
-   return emulate_gp(ctxt, 0);
-
-   break;
-   }
-   case 4: {
-   ctxt->ops->get_msr(ctxt, MSR_EFER, &efer);
-
-   if ((efer & EFER_LMA) && !(new_val & X86_CR4_PAE))
-   return emulate_gp(ctxt, 0);
-
-   break;
-   }
-   }
-
return X86EMUL_CONTINUE;
 }
 
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 0/9] KVM: x86: Fixes for (benign?) truncation bugs

2021-02-12 Thread Sean Christopherson
Patches 01 and 02 fix theoretical bugs related to loading CRs through
the emulator.  The rest of the patches are a bunch of small fixes for
cases where KVM reads/writes a 64-bit register outside of 64-bit mode.

I stumbled on this when puzzling over commit 0107973a80ad ("KVM: x86:
Introduce cr3_lm_rsvd_bits in kvm_vcpu_arch"), which stated that SEV
guests failed to boot on PCID-enabled hosts.  Why only PCID hosts?

After much staring, I realized that the initial CR3 load in
rsm_enter_protected_mode() would skip the MAXPHYADDR check due to the
vCPU not being in long mode.  But due to the ordering problems with
PCID, when PCID is enabled in the guest, the second load of CR3 would
be done with long mode enabled and thus hit the SEV C-bit bug.

Changing kvm_set_cr3() made me look at the callers, and seeing that
SVM didn't properly truncate the value made me look at everything else,
and here we are.

Note, I strongly suspect the emulator still has bugs.  But, unless the
guest is deliberately trying to hit these types of bugs, even the ones
fixed here, they're likely benign.  I figured I was more likely to break
something than I was to fix something by diving into the emulator, so I
left it alone.  For now. :-)

P.S. A few of the segmentation tests in kvm-unit-tests fail with
 unrestricted guest disabled, but those failures go back to at least
 v5.9.  I'll bisect 'em next week.

Sean Christopherson (9):
  KVM: x86: Remove emulator's broken checks on CR0/CR3/CR4 loads
  KVM: x86: Check CR3 GPA for validity regardless of vCPU mode
  KVM: SVM: Truncate GPR value for DR and CR accesses in !64-bit mode
  KVM: VMX: Truncate GPR value for DR and CR reads in !64-bit mode
  KVM: nVMX: Truncate bits 63:32 of VMCS field on nested check in
!64-bit
  KVM: nVMX: Truncate base/index GPR value on address calc in !64-bit
  KVM: x86/xen: Drop RAX[63:32] when processing hypercall
  KVM: SVM: Use default rAX size for INVLPGA emulation
  KVM: x86: Rename GPR accessors to make mode-aware variants the
defaults

 arch/x86/kvm/emulate.c| 68 +--
 arch/x86/kvm/kvm_cache_regs.h | 19 ++
 arch/x86/kvm/svm/svm.c| 11 --
 arch/x86/kvm/vmx/nested.c | 14 
 arch/x86/kvm/vmx/vmx.c|  6 ++--
 arch/x86/kvm/x86.c| 19 ++
 arch/x86/kvm/x86.h|  8 ++---
 7 files changed, 47 insertions(+), 98 deletions(-)

-- 
2.30.0.478.g8a0d178c01-goog



Re: [GIT PULL] cifs fixes

2021-02-12 Thread Linus Torvalds
On Fri, Feb 12, 2021 at 4:26 PM Stefan Metzmacher  wrote:
>
> The machine is running a 'AMD Ryzen Threadripper 2950X 16-Core Processor'
> and is freezing without any trace every few days.

I don't think the first-gen Zen issues ever really got solved. There
were multiple ones, with random segfaults for the early ones (but
afaik those were fixed by an RMA process with AMD), but the "it
randomly locks up" ones never had a satisfactory resolution afaik.

There were lots of random workarounds, but judging by your email:

> We played with various boot parameters (currently we're using
> 'mem_encrypt=off rcu_nocbs=0-31 processor.max_cstate=1 idle=nomwait nomodeset 
> consoleblank=0',

I suspect you've seen all the bugzilla threads on this issue (kernel
bugzilla 196683 is probably the main one, but it was discussed
elsewhere too).

I assume you've updated to latest BIOS and looked at various BIOS
power management settings too?

Zen 2 seems to have fixed things (knock wood - it's certainly working
for me), But many people obviously never saw any issues with Zen 1
either.

   Linus


Re: [PATCH v3 net-next 0/5] net: ipa: some more cleanup

2021-02-12 Thread patchwork-bot+netdevbpf
Hello:

This series was applied to netdev/net-next.git (refs/heads/master):

On Fri, 12 Feb 2021 08:33:57 -0600 you wrote:
> Version 3 of this series uses dev_err_probe() in the second patch,
> as suggested by Heiner Kallweit.
> 
> Version 2 was sent to ensure the series was based on current
> net-next/master, and added copyright updates to files touched.
> 
> The original introduction is below.
> 
> [...]

Here is the summary with links:
  - [v3,net-next,1/5] net: ipa: use a separate pointer for adjusted GSI memory
https://git.kernel.org/netdev/net-next/c/571b1e7e58ad
  - [v3,net-next,2/5] net: ipa: use dev_err_probe() in ipa_clock.c
https://git.kernel.org/netdev/net-next/c/4c7ccfcd09fd
  - [v3,net-next,3/5] net: ipa: fix register write command validation
https://git.kernel.org/netdev/net-next/c/2d65ed76924b
  - [v3,net-next,4/5] net: ipa: introduce ipa_table_hash_support()
https://git.kernel.org/netdev/net-next/c/a266ad6b5deb
  - [v3,net-next,5/5] net: ipa: introduce gsi_channel_initialized()
https://git.kernel.org/netdev/net-next/c/6170b6dab2d4

You are awesome, thank you!
--
Deet-doot-dot, I am a bot.
https://korg.docs.kernel.org/patchwork/pwbot.html




Re: [PATCH net-next 1/3] net: phy: broadcom: Remove unused flags

2021-02-12 Thread Vladimir Oltean
On Fri, Feb 12, 2021 at 12:57:19PM -0800, Florian Fainelli wrote:
> We have a number of unused flags defined today and since we are scarce
> on space and may need to introduce new flags in the future remove and
> shift every existing flag down into a contiguous assignment. No
> functional change.
> 
> Signed-off-by: Florian Fainelli 
> ---

Good to see some of the dev_flags go away!

PHY_BCM_FLAGS_MODE_1000BX is used just from broadcom.c, therefore it can
probably be moved to a structure in phydev->priv.

PHY_BRCM_STD_IBND_DISABLE, PHY_BRCM_EXT_IBND_RX_ENABLE and
PHY_BRCM_EXT_IBND_TX_ENABLE are set by
drivers/net/ethernet/broadcom/tg3.c but not used anywhere.


Re: [GIT PULL] cifs fixes

2021-02-12 Thread Stefan Metzmacher
Am 12.02.21 um 21:39 schrieb Steve French:
> Metze/Bjorn,
> Linus is right - samba.org is down for me (I also verified with JRA).
> Any ETA on when it gets back up?
> 
> On Fri, Feb 12, 2021 at 2:05 PM Linus Torvalds
>  wrote:
>>
>> On Fri, Feb 12, 2021 at 10:16 AM Steve French  wrote:
>>>
>>>   git://git.samba.org/sfrench/cifs-2.6.git tags/5.11-rc7-smb3
>>
>> It looks like git.samba.org is feeling very sick and is not answering.
>> Not git, not ping (but maybe icmp ping is blocked).
>>
>> Please give it a kick, or provide some other hosting mirror?


It's online again.

The machine is running a 'AMD Ryzen Threadripper 2950X 16-Core Processor'
and is freezing without any trace every few days.

We played with various boot parameters (currently we're using
'mem_encrypt=off rcu_nocbs=0-31 processor.max_cstate=1 idle=nomwait nomodeset 
consoleblank=0',
with the ubuntu 20.04 5.8 kernel, we also tried 5.4 before), but nothing seems 
to help.

metze






Re: [PATCH v2 1/2] dt-bindings: clock: Add SC7280 GCC clock binding

2021-02-12 Thread Stephen Boyd
Quoting Taniya Das (2021-02-10 10:26:18)
> Add device tree bindings for global clock subsystem clock
> controller for Qualcomm Technology Inc's SC7280 SoCs.
> 
> Signed-off-by: Taniya Das 
> ---

Applied to clk-next


Re: [PATCH v2 2/2] clk: qcom: Add Global Clock controller (GCC) driver for SC7280

2021-02-12 Thread Stephen Boyd
Quoting Taniya Das (2021-02-10 10:26:19)
> Add support for the global clock controller found on SC7280
> based devices. This should allow most non-multimedia device
> drivers to probe and control their clocks.
> 
> Signed-off-by: Taniya Das 
> ---

Applied to clk-next


Re: [PATCH] mm/hugetlb: optimize the surplus state transfer code in move_hugetlb_state()

2021-02-12 Thread Mike Kravetz
On 2/9/21 11:12 PM, Miaohe Lin wrote:
> We should not transfer the per-node surplus state when we do not cross the
> node in order to save some cpu cycles
> 
> Signed-off-by: Miaohe Lin 
> ---
>  mm/hugetlb.c | 6 ++
>  1 file changed, 6 insertions(+)

Thanks,

I was going to comment that the usual case is migrating to another node
and old_nid != new_nid.  However, this really is workload and system
configuration dependent.  In any case, the quick check is worth potentially
saving a lock/unlock cycle.

Reviewed-by: Mike Kravetz 
-- 
Mike Kravetz

> 
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index da347047ea10..4f2c92ddbca4 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -5632,6 +5632,12 @@ void move_hugetlb_state(struct page *oldpage, struct page *newpage, int reason)
>   SetHPageTemporary(oldpage);
>   ClearHPageTemporary(newpage);
>  
> + /*
> +  * There is no need to transfer the per-node surplus state
> +  * when we do not cross the node.
> +  */
> + if (new_nid == old_nid)
> + return;
>   spin_lock(&hugetlb_lock);
>   if (h->surplus_huge_pages_node[old_nid]) {
>   h->surplus_huge_pages_node[old_nid]--;
> 


[PATCH 11/14] KVM: VMX: Dynamically enable/disable PML based on memslot dirty logging

2021-02-12 Thread Sean Christopherson
From: Makarand Sonare 

Currently, if enable_pml=1 PML remains enabled for the entire lifetime
of the VM irrespective of whether dirty logging is enable or disabled.
When dirty logging is disabled, all the pages of the VM are manually
marked dirty, so that PML is effectively non-operational.  Setting
the dirty bits is an expensive operation which can cause severe MMU
lock contention in a performance sensitive path when dirty logging is
disabled after a failed or canceled live migration.

Manually setting dirty bits also fails to prevent PML activity if some
code path clears dirty bits, which can incur unnecessary VM-Exits.

In order to avoid this extra overhead, dynamically enable/disable PML
when dirty logging gets turned on/off for the first/last memslot.
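
The accounting pattern this describes, sketched in isolation (illustrative
types and names, not the kernel code): only the 0 <-> 1 transitions of the
count of dirty-logged memslots need to reconfigure the vCPUs:

  #include <stdbool.h>
  #include <stdio.h>

  struct dirty_log_state {
          int cpu_dirty_logging_count;    /* memslots with dirty logging on */
  };

  static void update_cpu_dirty_logging(struct dirty_log_state *s, bool enable)
  {
          if ((enable && ++s->cpu_dirty_logging_count == 1) ||
              (!enable && --s->cpu_dirty_logging_count == 0))
                  printf("toggle PML on all vCPUs: %s\n",
                         s->cpu_dirty_logging_count ? "on" : "off");
  }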

Signed-off-by: Makarand Sonare 
Co-developed-by: Sean Christopherson 
Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm-x86-ops.h |  1 +
 arch/x86/include/asm/kvm_host.h|  4 
 arch/x86/kvm/vmx/nested.c  |  5 +
 arch/x86/kvm/vmx/vmx.c | 28 +++-
 arch/x86/kvm/vmx/vmx.h |  2 ++
 arch/x86/kvm/x86.c | 35 ++
 6 files changed, 70 insertions(+), 5 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 90affdb2cbbc..323641097f63 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -93,6 +93,7 @@ KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
 KVM_X86_OP_NULL(request_immediate_exit)
 KVM_X86_OP(sched_in)
+KVM_X86_OP_NULL(update_cpu_dirty_logging)
 KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 5cf382ec48b0..ffcfa84c969d 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -89,6 +89,8 @@
KVM_ARCH_REQ_FLAGS(27, KVM_REQUEST_NO_WAKEUP)
 #define KVM_REQ_APF_READY  KVM_ARCH_REQ(28)
 #define KVM_REQ_MSR_FILTER_CHANGED KVM_ARCH_REQ(29)
+#define KVM_REQ_UPDATE_CPU_DIRTY_LOGGING \
+   KVM_ARCH_REQ_FLAGS(30, KVM_REQUEST_WAIT | KVM_REQUEST_NO_WAKEUP)
 
 #define CR0_RESERVED_BITS   \
(~(unsigned long)(X86_CR0_PE | X86_CR0_MP | X86_CR0_EM | X86_CR0_TS \
@@ -1007,6 +1009,7 @@ struct kvm_arch {
u32 bsp_vcpu_id;
 
u64 disabled_quirks;
+   int cpu_dirty_logging_count;
 
enum kvm_irqchip_mode irqchip_mode;
u8 nr_reserved_ioapic_pins;
@@ -1275,6 +1278,7 @@ struct kvm_x86_ops {
 * value indicates CPU dirty logging is unsupported or disabled.
 */
int cpu_dirty_log_size;
+   void (*update_cpu_dirty_logging)(struct kvm_vcpu *vcpu);
 
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index 0c6dda9980a6..a63da447ede9 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -4493,6 +4493,11 @@ void nested_vmx_vmexit(struct kvm_vcpu *vcpu, u32 vm_exit_reason,
vmx_set_virtual_apic_mode(vcpu);
}
 
+   if (vmx->nested.update_vmcs01_cpu_dirty_logging) {
+   vmx->nested.update_vmcs01_cpu_dirty_logging = false;
+   vmx_update_cpu_dirty_logging(vcpu);
+   }
+
/* Unpin physical memory we referred to in vmcs02 */
if (vmx->nested.apic_access_page) {
kvm_release_page_clean(vmx->nested.apic_access_page);
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index 862d1f5627e7..1204e5f0fe67 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -4277,7 +4277,12 @@ static void vmx_compute_secondary_exec_control(struct vcpu_vmx *vmx)
*/
exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
 
-   if (!enable_pml)
+   /*
+* PML is enabled/disabled when dirty logging of memslots changes, but
+* it needs to be set here when dirty logging is already active, e.g.
+* if this vCPU was created after dirty logging was enabled.
+*/
+   if (!vcpu->kvm->arch.cpu_dirty_logging_count)
exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
if (cpu_has_vmx_xsaves()) {
@@ -7499,6 +7504,26 @@ static void vmx_sched_in(struct kvm_vcpu *vcpu, int cpu)
shrink_ple_window(vcpu);
 }
 
+void vmx_update_cpu_dirty_logging(struct kvm_vcpu *vcpu)
+{
+   struct vcpu_vmx *vmx = to_vmx(vcpu);
+
+   if (is_guest_mode(vcpu)) {
+   vmx->nested.update_vmcs01_cpu_dirty_logging = true;
+   return;
+   }
+
+   /*
+* Note, cpu_dirty_logging_count can be changed concurrent with this
+* code, but in that case another update request will be made and so
+* the guest will never run with a stale PML value.
+*/
+   if (vcpu->kvm->arch.cpu_dirty_logging_

[PATCH 08/14] KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function

2021-02-12 Thread Sean Christopherson
Store the vendor-specific dirty log size in a variable, there's no need
to wrap it in a function since the value is constant after
hardware_setup() runs.
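
The refactor pattern, sketched with made-up structure names: an optional
callback that must be NULL-checked on every query becomes a plain field that
is set once during setup and read directly:

  struct ops_before {
          int (*cpu_dirty_log_size)(void);        /* may be NULL */
  };

  struct ops_after {
          int cpu_dirty_log_size;                 /* 0 => unsupported/disabled */
  };

  static int dirty_log_size_before(const struct ops_before *ops)
  {
          return ops->cpu_dirty_log_size ? ops->cpu_dirty_log_size() : 0;
  }

  static int dirty_log_size_after(const struct ops_after *ops)
  {
          return ops->cpu_dirty_log_size;
  }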

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm-x86-ops.h | 1 -
 arch/x86/include/asm/kvm_host.h| 2 +-
 arch/x86/kvm/mmu/mmu.c | 5 +
 arch/x86/kvm/vmx/vmx.c | 9 ++---
 4 files changed, 4 insertions(+), 13 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 355a2ab8fc09..28c07cc01474 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -97,7 +97,6 @@ KVM_X86_OP_NULL(slot_enable_log_dirty)
 KVM_X86_OP_NULL(slot_disable_log_dirty)
 KVM_X86_OP_NULL(flush_log_dirty)
 KVM_X86_OP_NULL(enable_log_dirty_pt_masked)
-KVM_X86_OP_NULL(cpu_dirty_log_size)
 KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index 84499aad01a4..fb59933610d9 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1294,7 +1294,7 @@ struct kvm_x86_ops {
void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
   struct kvm_memory_slot *slot,
   gfn_t offset, unsigned long mask);
-   int (*cpu_dirty_log_size)(void);
+   int cpu_dirty_log_size;
 
/* pmu operations of sub-arch */
const struct kvm_pmu_ops *pmu_ops;
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index d5849a0e3de1..6c32e8e0f720 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1294,10 +1294,7 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,
 
 int kvm_cpu_dirty_log_size(void)
 {
-   if (kvm_x86_ops.cpu_dirty_log_size)
-   return static_call(kvm_x86_cpu_dirty_log_size)();
-
-   return 0;
+   return kvm_x86_ops.cpu_dirty_log_size;
 }
 
 bool kvm_mmu_slot_gfn_write_protect(struct kvm *kvm,
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index b47ed3f412ef..f843707dd7df 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -7650,11 +7650,6 @@ static bool vmx_check_apicv_inhibit_reasons(ulong bit)
return supported & BIT(bit);
 }
 
-static int vmx_cpu_dirty_log_size(void)
-{
-   return enable_pml ? PML_ENTITY_NUM : 0;
-}
-
 static struct kvm_x86_ops vmx_x86_ops __initdata = {
.hardware_unsetup = hardware_unsetup,
 
@@ -7758,6 +7753,7 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
.slot_disable_log_dirty = vmx_slot_disable_log_dirty,
.flush_log_dirty = vmx_flush_log_dirty,
.enable_log_dirty_pt_masked = vmx_enable_log_dirty_pt_masked,
+   .cpu_dirty_log_size = PML_ENTITY_NUM,
 
.pre_block = vmx_pre_block,
.post_block = vmx_post_block,
@@ -7785,7 +7781,6 @@ static struct kvm_x86_ops vmx_x86_ops __initdata = {
 
.msr_filter_changed = vmx_msr_filter_changed,
.complete_emulated_msr = kvm_complete_insn_gp,
-   .cpu_dirty_log_size = vmx_cpu_dirty_log_size,
 
.vcpu_deliver_sipi_vector = kvm_vcpu_deliver_sipi_vector,
 };
@@ -7907,7 +7902,7 @@ static __init int hardware_setup(void)
vmx_x86_ops.slot_disable_log_dirty = NULL;
vmx_x86_ops.flush_log_dirty = NULL;
vmx_x86_ops.enable_log_dirty_pt_masked = NULL;
-   vmx_x86_ops.cpu_dirty_log_size = NULL;
+   vmx_x86_ops.cpu_dirty_log_size = 0;
}
 
if (!cpu_has_vmx_preemption_timer())
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 10/14] KVM: x86: Further clarify the logic and comments for toggling log dirty

2021-02-12 Thread Sean Christopherson
Add a sanity check in kvm_mmu_slot_apply_flags to assert that the
LOG_DIRTY_PAGES flag is indeed being toggled, and explicitly rely on
that holding true when zapping collapsible SPTEs.  Manipulating the
CPU dirty log (PML) and write-protection also relies on this assertion,
but that's not obvious in the current code.
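
The assertion reduces to a simple XOR test (sketch with a placeholder flag
value, not the KVM definition): once READONLY and non-flags updates are
filtered out, old ^ new must have the dirty-logging bit set:

  #include <stdbool.h>

  #define LOG_DIRTY_PAGES (1UL << 0)    /* placeholder for KVM_MEM_LOG_DIRTY_PAGES */

  static bool dirty_logging_toggled(unsigned long old_flags,
                                    unsigned long new_flags)
  {
          return ((old_flags ^ new_flags) & LOG_DIRTY_PAGES) != 0;
  }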

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/x86.c | 15 +++
 1 file changed, 11 insertions(+), 4 deletions(-)

diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index e89fe98a0099..c0d22f19aed0 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10761,12 +10761,20 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 enum kvm_mr_change change)
 {
/*
-* Nothing to do for RO slots or CREATE/MOVE/DELETE of a slot.
-* See comments below.
+* Nothing to do for RO slots (which can't be dirtied and can't be made
+* writable) or CREATE/MOVE/DELETE of a slot.  See comments below.
 */
if ((change != KVM_MR_FLAGS_ONLY) || (new->flags & KVM_MEM_READONLY))
return;
 
+   /*
+* READONLY and non-flags changes were filtered out above, and the only
+* other flag is LOG_DIRTY_PAGES, i.e. something is wrong if dirty
+* logging isn't being toggled on or off.
+*/
+   if (WARN_ON_ONCE(!((old->flags ^ new->flags) & KVM_MEM_LOG_DIRTY_PAGES)))
+   return;
+
/*
 * Dirty logging tracks sptes in 4k granularity, meaning that large
 * sptes have to be split.  If live migration is successful, the guest
@@ -10784,8 +10792,7 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 * MOVE/DELETE: The old mappings will already have been cleaned up by
 *  kvm_arch_flush_shadow_memslot()
 */
-   if ((old->flags & KVM_MEM_LOG_DIRTY_PAGES) &&
-   !(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
+   if (!(new->flags & KVM_MEM_LOG_DIRTY_PAGES))
kvm_mmu_zap_collapsible_sptes(kvm, new);
 
/*
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 14/14] KVM: x86/mmu: Remove a variety of unnecessary exports

2021-02-12 Thread Sean Christopherson
Remove several exports from the MMU that are no longer necessary.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h |  1 -
 arch/x86/kvm/mmu/mmu.c  | 35 ++---
 2 files changed, 15 insertions(+), 21 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index c15d6de8c457..0cf71ff2b2e5 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1592,7 +1592,6 @@ void kvm_inject_nmi(struct kvm_vcpu *vcpu);
 void kvm_update_dr7(struct kvm_vcpu *vcpu);
 
 int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn);
-int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva);
 void __kvm_mmu_free_some_pages(struct kvm_vcpu *vcpu);
 int kvm_mmu_load(struct kvm_vcpu *vcpu);
 void kvm_mmu_unload(struct kvm_vcpu *vcpu);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6ad0fb1913c6..30e9b0cb9abd 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2466,7 +2466,21 @@ int kvm_mmu_unprotect_page(struct kvm *kvm, gfn_t gfn)
 
return r;
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page);
+
+static int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
+{
+   gpa_t gpa;
+   int r;
+
+   if (vcpu->arch.mmu->direct_map)
+   return 0;
+
+   gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
+
+   r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
+
+   return r;
+}
 
 static void kvm_unsync_page(struct kvm_vcpu *vcpu, struct kvm_mmu_page *sp)
 {
@@ -3411,7 +3425,6 @@ void kvm_mmu_sync_roots(struct kvm_vcpu *vcpu)
kvm_mmu_audit(vcpu, AUDIT_POST_SYNC);
write_unlock(&vcpu->kvm->mmu_lock);
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_sync_roots);
 
 static gpa_t nonpaging_gva_to_gpa(struct kvm_vcpu *vcpu, gpa_t vaddr,
  u32 access, struct x86_exception *exception)
@@ -4977,22 +4990,6 @@ static void kvm_mmu_pte_write(struct kvm_vcpu *vcpu, gpa_t gpa,
write_unlock(&vcpu->kvm->mmu_lock);
 }
 
-int kvm_mmu_unprotect_page_virt(struct kvm_vcpu *vcpu, gva_t gva)
-{
-   gpa_t gpa;
-   int r;
-
-   if (vcpu->arch.mmu->direct_map)
-   return 0;
-
-   gpa = kvm_mmu_gva_to_gpa_read(vcpu, gva, NULL);
-
-   r = kvm_mmu_unprotect_page(vcpu->kvm, gpa >> PAGE_SHIFT);
-
-   return r;
-}
-EXPORT_SYMBOL_GPL(kvm_mmu_unprotect_page_virt);
-
 int kvm_mmu_page_fault(struct kvm_vcpu *vcpu, gpa_t cr2_or_gpa, u64 error_code,
   void *insn, int insn_len)
 {
@@ -5091,7 +5088,6 @@ void kvm_mmu_invalidate_gva(struct kvm_vcpu *vcpu, struct kvm_mmu *mmu,
mmu->invlpg(vcpu, gva, root_hpa);
}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invalidate_gva);
 
 void kvm_mmu_invlpg(struct kvm_vcpu *vcpu, gva_t gva)
 {
@@ -5131,7 +5127,6 @@ void kvm_mmu_invpcid_gva(struct kvm_vcpu *vcpu, gva_t gva, unsigned long pcid)
 * for them.
 */
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_invpcid_gva);
 
 void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
   int tdp_huge_page_level)
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 13/14] KVM: x86: Fold "write-protect large" use case into generic write-protect

2021-02-12 Thread Sean Christopherson
Drop kvm_mmu_slot_largepage_remove_write_access() and refactor its sole
caller to use kvm_mmu_slot_remove_write_access().  Remove the now-unused
slot_handle_large_level() and slot_handle_all_level() helpers.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 32 
 arch/x86/kvm/x86.c | 32 +---
 2 files changed, 17 insertions(+), 47 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 44ee55b26c3d..6ad0fb1913c6 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5204,22 +5204,6 @@ slot_handle_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
lock_flush_tlb);
 }
 
-static __always_inline bool
-slot_handle_all_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
- slot_level_handler fn, bool lock_flush_tlb)
-{
-   return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K,
-KVM_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
-}
-
-static __always_inline bool
-slot_handle_large_level(struct kvm *kvm, struct kvm_memory_slot *memslot,
-   slot_level_handler fn, bool lock_flush_tlb)
-{
-   return slot_handle_level(kvm, memslot, fn, PG_LEVEL_4K + 1,
-KVM_MAX_HUGEPAGE_LEVEL, lock_flush_tlb);
-}
-
 static __always_inline bool
 slot_handle_leaf(struct kvm *kvm, struct kvm_memory_slot *memslot,
 slot_level_handler fn, bool lock_flush_tlb)
@@ -5584,22 +5568,6 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
 }
 
-void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
-   struct kvm_memory_slot *memslot)
-{
-   bool flush;
-
-   write_lock(&kvm->mmu_lock);
-   flush = slot_handle_large_level(kvm, memslot, slot_rmap_write_protect,
-   false);
-   if (is_tdp_mmu_enabled(kvm))
-   flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_2M);
-   write_unlock(&kvm->mmu_lock);
-
-   if (flush)
-   kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
-}
-
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/x86.c b/arch/x86/kvm/x86.c
index dca2cef2..1d2bc89431a2 100644
--- a/arch/x86/kvm/x86.c
+++ b/arch/x86/kvm/x86.c
@@ -10829,24 +10829,25 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 */
kvm_mmu_zap_collapsible_sptes(kvm, new);
} else {
-   /*
-* Large sptes are write-protected so they can be split on first
-* write. New large sptes cannot be created for this slot until
-* the end of the logging. See the comments in fast_page_fault().
-*
-* For small sptes, nothing is done if the dirty log is in the
-* initial-all-set state.  Otherwise, depending on whether pml
-* is enabled the D-bit or the W-bit will be cleared.
-*/
+   /* By default, write-protect everything to log writes. */
+   int level = PG_LEVEL_4K;
+
if (kvm_x86_ops.cpu_dirty_log_size) {
+   /*
+* Clear all dirty bits, unless pages are treated as
+* dirty from the get-go.
+*/
if (!kvm_dirty_log_manual_protect_and_init_set(kvm))
kvm_mmu_slot_leaf_clear_dirty(kvm, new);
-   kvm_mmu_slot_largepage_remove_write_access(kvm, new);
-   } else {
-   int level =
-   kvm_dirty_log_manual_protect_and_init_set(kvm) ?
-   PG_LEVEL_2M : PG_LEVEL_4K;
 
+   /*
+* Write-protect large pages on write so that dirty
+* logging happens at 4k granularity.  No need to
+* write-protect small SPTEs since write accesses are
+* logged by the CPU via dirty bits.
+*/
+   level = PG_LEVEL_2M;
+   } else if (kvm_dirty_log_manual_protect_and_init_set(kvm)) {
/*
 * If we're with initial-all-set, we don't need
 * to write protect any small page because
@@ -10855,8 +10856,9 @@ static void kvm_mmu_slot_apply_flags(struct kvm *kvm,
 * so that the page split can happen lazily on
 * the first write to the huge page.
 */
-   kvm_mmu_slot_remove_write_access(kvm, new, level);
+   level = PG_LEVEL_2M;
}
+   

[PATCH 09/14] KVM: x86: Move MMU's PML logic to common code

2021-02-12 Thread Sean Christopherson
Drop the facade of KVM's PML logic being vendor specific and move the
bits that aren't truly VMX specific into common x86 code.  The MMU logic
for dealing with PML is tightly coupled to the feature and to VMX's
implementation, bouncing through kvm_x86_ops obfuscates the code without
providing any meaningful separation of concerns or encapsulation.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm-x86-ops.h |  4 ---
 arch/x86/include/asm/kvm_host.h| 27 ++-
 arch/x86/kvm/mmu/mmu.c | 16 +++--
 arch/x86/kvm/vmx/vmx.c | 55 +-
 arch/x86/kvm/x86.c | 22 
 5 files changed, 24 insertions(+), 100 deletions(-)

diff --git a/arch/x86/include/asm/kvm-x86-ops.h b/arch/x86/include/asm/kvm-x86-ops.h
index 28c07cc01474..90affdb2cbbc 100644
--- a/arch/x86/include/asm/kvm-x86-ops.h
+++ b/arch/x86/include/asm/kvm-x86-ops.h
@@ -93,10 +93,6 @@ KVM_X86_OP(check_intercept)
 KVM_X86_OP(handle_exit_irqoff)
 KVM_X86_OP_NULL(request_immediate_exit)
 KVM_X86_OP(sched_in)
-KVM_X86_OP_NULL(slot_enable_log_dirty)
-KVM_X86_OP_NULL(slot_disable_log_dirty)
-KVM_X86_OP_NULL(flush_log_dirty)
-KVM_X86_OP_NULL(enable_log_dirty_pt_masked)
 KVM_X86_OP_NULL(pre_block)
 KVM_X86_OP_NULL(post_block)
 KVM_X86_OP_NULL(vcpu_blocking)
diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index fb59933610d9..5cf382ec48b0 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1271,29 +1271,9 @@ struct kvm_x86_ops {
void (*sched_in)(struct kvm_vcpu *kvm, int cpu);
 
/*
-* Arch-specific dirty logging hooks. These hooks are only supposed to
-* be valid if the specific arch has hardware-accelerated dirty logging
-* mechanism. Currently only for PML on VMX.
-*
-*  - slot_enable_log_dirty:
-*  called when enabling log dirty mode for the slot.
-*  - slot_disable_log_dirty:
-*  called when disabling log dirty mode for the slot.
-*  also called when slot is created with log dirty disabled.
-*  - flush_log_dirty:
-*  called before reporting dirty_bitmap to userspace.
-*  - enable_log_dirty_pt_masked:
-*  called when reenabling log dirty for the GFNs in the mask after
-*  corresponding bits are cleared in slot->dirty_bitmap.
+* Size of the CPU's dirty log buffer, i.e. VMX's PML buffer.  A zero
+* value indicates CPU dirty logging is unsupported or disabled.
 */
-   void (*slot_enable_log_dirty)(struct kvm *kvm,
- struct kvm_memory_slot *slot);
-   void (*slot_disable_log_dirty)(struct kvm *kvm,
-  struct kvm_memory_slot *slot);
-   void (*flush_log_dirty)(struct kvm *kvm);
-   void (*enable_log_dirty_pt_masked)(struct kvm *kvm,
-  struct kvm_memory_slot *slot,
-  gfn_t offset, unsigned long mask);
int cpu_dirty_log_size;
 
/* pmu operations of sub-arch */
@@ -1439,9 +1419,6 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_set_dirty(struct kvm *kvm,
struct kvm_memory_slot *memslot);
-void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
-  struct kvm_memory_slot *slot,
-  gfn_t gfn_offset, unsigned long mask);
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 6c32e8e0f720..86182e79beaf 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1250,9 +1250,9 @@ static void kvm_mmu_write_protect_pt_masked(struct kvm *kvm,
  *
  * Used for PML to re-log the dirty GPAs after userspace querying dirty_bitmap.
  */
-void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
-struct kvm_memory_slot *slot,
-gfn_t gfn_offset, unsigned long mask)
+static void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
+struct kvm_memory_slot *slot,
+gfn_t gfn_offset, unsigned long mask)
 {
struct kvm_rmap_head *rmap_head;
 
@@ -1268,7 +1268,6 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
mask &= mask - 1;
}
 }
-EXPORT_SYMBOL_GPL(kvm_mmu_clear_dirty_pt_masked);
 
 /**
  * kvm_arch_mmu_enable_log_dirty_pt_masked - enable dirty logging for selected
@@ -1284,10 +1283,8 @@ void kvm_arch_mmu_enable_log_dirty_pt_masked(struct kvm *kvm,

[PATCH 12/14] KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML

2021-02-12 Thread Sean Christopherson
Stop setting dirty bits for MMU pages when dirty logging is disabled for
a memslot, as PML is now completely disabled when there are no memslots
with dirty logging enabled.

This means that spurious PML entries will be created for memslots with
dirty logging disabled if at least one other memslot has dirty logging
enabled, but for all known use cases, dirty logging is a global VMM
control.  Furthermore, spurious PML entries are already possible since
dirty bits are set only when dirty logging is turned off, i.e. memslots
that are never dirty logged will have dirty bits cleared.

In the end, it's faster overall to eat a few spurious PML entries in the
window where dirty logging is being disabled across all memslots.

Signed-off-by: Sean Christopherson 
---
 arch/x86/include/asm/kvm_host.h |  2 -
 arch/x86/kvm/mmu/mmu.c  | 45 -
 arch/x86/kvm/mmu/tdp_mmu.c  | 54 
 arch/x86/kvm/mmu/tdp_mmu.h  |  1 -
 arch/x86/kvm/x86.c  | 87 ++---
 5 files changed, 36 insertions(+), 153 deletions(-)

diff --git a/arch/x86/include/asm/kvm_host.h b/arch/x86/include/asm/kvm_host.h
index ffcfa84c969d..c15d6de8c457 100644
--- a/arch/x86/include/asm/kvm_host.h
+++ b/arch/x86/include/asm/kvm_host.h
@@ -1421,8 +1421,6 @@ void kvm_mmu_slot_leaf_clear_dirty(struct kvm *kvm,
   struct kvm_memory_slot *memslot);
 void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
struct kvm_memory_slot *memslot);
-void kvm_mmu_slot_set_dirty(struct kvm *kvm,
-   struct kvm_memory_slot *memslot);
 void kvm_mmu_zap_all(struct kvm *kvm);
 void kvm_mmu_invalidate_mmio_sptes(struct kvm *kvm, u64 gen);
 unsigned long kvm_mmu_calculate_default_mmu_pages(struct kvm *kvm);
diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 86182e79beaf..44ee55b26c3d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1181,36 +1181,6 @@ static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
return flush;
 }
 
-static bool spte_set_dirty(u64 *sptep)
-{
-   u64 spte = *sptep;
-
-   rmap_printk("spte %p %llx\n", sptep, *sptep);
-
-   /*
-* Similar to the !kvm_x86_ops.slot_disable_log_dirty case,
-* do not bother adding back write access to pages marked
-* SPTE_AD_WRPROT_ONLY_MASK.
-*/
-   spte |= shadow_dirty_mask;
-
-   return mmu_spte_update(sptep, spte);
-}
-
-static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
-struct kvm_memory_slot *slot)
-{
-   u64 *sptep;
-   struct rmap_iterator iter;
-   bool flush = false;
-
-   for_each_rmap_spte(rmap_head, &iter, sptep)
-   if (spte_ad_enabled(*sptep))
-   flush |= spte_set_dirty(sptep);
-
-   return flush;
-}
-
 /**
  * kvm_mmu_write_protect_pt_masked - write protect selected PT level pages
  * @kvm: kvm instance
@@ -5630,21 +5600,6 @@ void kvm_mmu_slot_largepage_remove_write_access(struct kvm *kvm,
kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
 }
 
-void kvm_mmu_slot_set_dirty(struct kvm *kvm,
-   struct kvm_memory_slot *memslot)
-{
-   bool flush;
-
-   write_lock(&kvm->mmu_lock);
-   flush = slot_handle_all_level(kvm, memslot, __rmap_set_dirty, false);
-   if (is_tdp_mmu_enabled(kvm))
-   flush |= kvm_tdp_mmu_slot_set_dirty(kvm, memslot);
-   write_unlock(&kvm->mmu_lock);
-
-   if (flush)
-   kvm_arch_flush_remote_tlbs_memslot(kvm, memslot);
-}
-
 void kvm_mmu_zap_all(struct kvm *kvm)
 {
struct kvm_mmu_page *sp, *node;
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index f8fa1f64e10d..c926c6b899a1 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1268,60 +1268,6 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
}
 }
 
-/*
- * Set the dirty status of all the SPTEs mapping GFNs in the memslot. This is
- * only used for PML, and so will involve setting the dirty bit on each SPTE.
- * Returns true if an SPTE has been changed and the TLBs need to be flushed.
- */
-static bool set_dirty_gfn_range(struct kvm *kvm, struct kvm_mmu_page *root,
-   gfn_t start, gfn_t end)
-{
-   struct tdp_iter iter;
-   u64 new_spte;
-   bool spte_set = false;
-
-   rcu_read_lock();
-
-   tdp_root_for_each_pte(iter, root, start, end) {
-   if (tdp_mmu_iter_cond_resched(kvm, &iter, false))
-   continue;
-
-   if (!is_shadow_present_pte(iter.old_spte) ||
-   iter.old_spte & shadow_dirty_mask)
-   continue;
-
-   new_spte = iter.old_spte | shadow_dirty_mask;
-
-   tdp_mmu_set_spte(kvm, &iter, new_spte);
-   spte_set 

[PATCH 07/14] KVM: x86/mmu: Expand on the comment in kvm_vcpu_ad_need_write_protect()

2021-02-12 Thread Sean Christopherson
Expand the comment about need to use write-protection for nested EPT
when PML is enabled to clarify that the tagging is a nop when PML is
_not_ enabled.  Without the clarification, omitting the PML check looks
wrong at first^Wfifth glance.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu_internal.h | 5 -
 1 file changed, 4 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 0b55aa561ec8..72b0928f2b2d 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -84,7 +84,10 @@ static inline bool kvm_vcpu_ad_need_write_protect(struct kvm_vcpu *vcpu)
 * When using the EPT page-modification log, the GPAs in the log
 * would come from L2 rather than L1.  Therefore, we need to rely
 * on write protection to record dirty pages.  This also bypasses
-* PML, since writes now result in a vmexit.
+* PML, since writes now result in a vmexit.  Note, this helper will
+* tag SPTEs as needing write-protection even if PML is disabled or
+* unsupported, but that's ok because the tag is consumed if and only
+* if PML is enabled.  Omit the PML check to save a few uops.
 */
return vcpu->arch.mmu == &vcpu->arch.guest_mmu;
 }
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 06/14] KVM: nVMX: Disable PML in hardware when running L2

2021-02-12 Thread Sean Christopherson
Unconditionally disable PML in vmcs02, KVM emulates PML purely in the
MMU, e.g. vmx_flush_pml_buffer() doesn't even try to copy the L2 GPAs
from vmcs02's buffer to vmcs12.  At best, enabling PML is a nop.  At
worst, it will cause vmx_flush_pml_buffer() to record bogus GFNs in the
dirty logs.

Initialize vmcs02.GUEST_PML_INDEX such that PML writes would trigger
VM-Exit if PML was somehow enabled, skip flushing the buffer for guest
mode since the index is bogus, and freak out if a PML full exit occurs
when L2 is active.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/vmx/nested.c | 29 +++--
 arch/x86/kvm/vmx/vmx.c| 12 ++--
 2 files changed, 25 insertions(+), 16 deletions(-)

diff --git a/arch/x86/kvm/vmx/nested.c b/arch/x86/kvm/vmx/nested.c
index b2f0b5e9cd63..0c6dda9980a6 100644
--- a/arch/x86/kvm/vmx/nested.c
+++ b/arch/x86/kvm/vmx/nested.c
@@ -2167,15 +2167,13 @@ static void prepare_vmcs02_constant_state(struct vcpu_vmx *vmx)
vmcs_write64(MSR_BITMAP, __pa(vmx->nested.vmcs02.msr_bitmap));
 
/*
-* The PML address never changes, so it is constant in vmcs02.
-* Conceptually we want to copy the PML index from vmcs01 here,
-* and then back to vmcs01 on nested vmexit.  But since we flush
-* the log and reset GUEST_PML_INDEX on each vmexit, the PML
-* index is also effectively constant in vmcs02.
+* PML is emulated for L2, but never enabled in hardware as the MMU
+* handles A/D emulation.  Disabling PML for L2 also avoids having to
+* deal with filtering out L2 GPAs from the buffer.
 */
if (enable_pml) {
-   vmcs_write64(PML_ADDRESS, page_to_phys(vmx->pml_pg));
-   vmcs_write16(GUEST_PML_INDEX, PML_ENTITY_NUM - 1);
+   vmcs_write64(PML_ADDRESS, 0);
+   vmcs_write16(GUEST_PML_INDEX, -1);
}
 
if (cpu_has_vmx_encls_vmexit())
@@ -2210,7 +2208,7 @@ static void prepare_vmcs02_early_rare(struct vcpu_vmx *vmx,
 
 static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
 {
-   u32 exec_control, vmcs12_exec_ctrl;
+   u32 exec_control;
u64 guest_efer = nested_vmx_calc_efer(vmx, vmcs12);
 
if (vmx->nested.dirty_vmcs12 || vmx->nested.hv_evmcs)
@@ -2284,11 +2282,11 @@ static void prepare_vmcs02_early(struct vcpu_vmx *vmx, struct vmcs12 *vmcs12)
  SECONDARY_EXEC_APIC_REGISTER_VIRT |
  SECONDARY_EXEC_ENABLE_VMFUNC);
if (nested_cpu_has(vmcs12,
-  CPU_BASED_ACTIVATE_SECONDARY_CONTROLS)) {
-   vmcs12_exec_ctrl = vmcs12->secondary_vm_exec_control &
-   ~SECONDARY_EXEC_ENABLE_PML;
-   exec_control |= vmcs12_exec_ctrl;
-   }
+  CPU_BASED_ACTIVATE_SECONDARY_CONTROLS))
+   exec_control |= vmcs12->secondary_vm_exec_control;
+
+   /* PML is emulated and never enabled in hardware for L2. */
+   exec_control &= ~SECONDARY_EXEC_ENABLE_PML;
 
/* VMCS shadowing for L2 is emulated for now */
exec_control &= ~SECONDARY_EXEC_SHADOW_VMCS;
@@ -5793,7 +5791,10 @@ static bool nested_vmx_l0_wants_exit(struct kvm_vcpu *vcpu,
case EXIT_REASON_PREEMPTION_TIMER:
return true;
case EXIT_REASON_PML_FULL:
-   /* We emulate PML support to L1. */
+   /*
+* PML is emulated for an L1 VMM and should never be enabled in
+* vmcs02, always "handle" PML_FULL by exiting to userspace.
+*/
return true;
case EXIT_REASON_VMFUNC:
/* VM functions are emulated through L2->L0 vmexits. */
diff --git a/arch/x86/kvm/vmx/vmx.c b/arch/x86/kvm/vmx/vmx.c
index e0a3a9be654b..b47ed3f412ef 100644
--- a/arch/x86/kvm/vmx/vmx.c
+++ b/arch/x86/kvm/vmx/vmx.c
@@ -5976,9 +5976,10 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
 * updated. Another good is, in kvm_vm_ioctl_get_dirty_log, before
 * querying dirty_bitmap, we only need to kick all vcpus out of guest
 * mode as if vcpus is in root mode, the PML buffer must has been
-* flushed already.
+* flushed already.  Note, PML is never enabled in hardware while
+* running L2.
 */
-   if (enable_pml)
+   if (enable_pml && !is_guest_mode(vcpu))
vmx_flush_pml_buffer(vcpu);
 
/*
@@ -5994,6 +5995,13 @@ static int __vmx_handle_exit(struct kvm_vcpu *vcpu, fastpath_t exit_fastpath)
return handle_invalid_guest_state(vcpu);
 
if (is_guest_mode(vcpu)) {
+   /*
+* PML is never enabled when running L2, bail immediately if a
+* PML full exit occurs as something is horribly wrong.

[PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs

2021-02-12 Thread Sean Christopherson
When zapping SPTEs in order to rebuild them as huge pages, use the new
helper that computes the max mapping level to detect whether or not a
SPTE should be zapped.  Doing so avoids zapping SPTEs that can't
possibly be rebuilt as huge pages, e.g. due to hardware constraints,
memslot alignment, etc...

This also avoids zapping SPTEs that are still large, e.g. if migration
was canceled before write-protected huge pages were shattered to enable
dirty logging.  Note, such pages are still write-protected at this time,
i.e. a page fault VM-Exit will still occur.  This will hopefully be
addressed in a future patch.

Sadly, TDP MMU loses its const on the memslot, but that's a pervasive
problem that's been around for quite some time.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 11 ++-
 arch/x86/kvm/mmu/tdp_mmu.c | 13 +++--
 arch/x86/kvm/mmu/tdp_mmu.h |  2 +-
 3 files changed, 14 insertions(+), 12 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index fb719e7a0cbb..d5849a0e3de1 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5553,8 +5553,8 @@ static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
 * mapping if the indirect sp has level = 1.
 */
if (sp->role.direct && !kvm_is_reserved_pfn(pfn) &&
-   (kvm_is_zone_device_pfn(pfn) ||
-PageCompound(pfn_to_page(pfn)))) {
+   sp->role.level < kvm_mmu_max_mapping_level(kvm, slot, sp->gfn,
+  pfn, PG_LEVEL_NUM)) {
pte_list_remove(rmap_head, sptep);
 
if (kvm_available_flush_tlb_with_range())
@@ -5574,12 +5574,13 @@ void kvm_mmu_zap_collapsible_sptes(struct kvm *kvm,
   const struct kvm_memory_slot *memslot)
 {
/* FIXME: const-ify all uses of struct kvm_memory_slot.  */
+   struct kvm_memory_slot *slot = (struct kvm_memory_slot *)memslot;
+
write_lock(&kvm->mmu_lock);
-   slot_handle_leaf(kvm, (struct kvm_memory_slot *)memslot,
-kvm_mmu_zap_collapsible_spte, true);
+   slot_handle_leaf(kvm, slot, kvm_mmu_zap_collapsible_spte, true);
 
if (is_tdp_mmu_enabled(kvm))
-   kvm_tdp_mmu_zap_collapsible_sptes(kvm, memslot);
+   kvm_tdp_mmu_zap_collapsible_sptes(kvm, slot);
write_unlock(&kvm->mmu_lock);
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 3cc332ed099d..f8fa1f64e10d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1328,8 +1328,10 @@ bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot)
  */
 static void zap_collapsible_spte_range(struct kvm *kvm,
   struct kvm_mmu_page *root,
-  gfn_t start, gfn_t end)
+  struct kvm_memory_slot *slot)
 {
+   gfn_t start = slot->base_gfn;
+   gfn_t end = start + slot->npages;
struct tdp_iter iter;
kvm_pfn_t pfn;
bool spte_set = false;
@@ -1348,8 +1350,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
pfn = spte_to_pfn(iter.old_spte);
if (kvm_is_reserved_pfn(pfn) ||
-   (!PageTransCompoundMap(pfn_to_page(pfn)) &&
-!kvm_is_zone_device_pfn(pfn)))
+   iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
+   pfn, PG_LEVEL_NUM))
continue;
 
tdp_mmu_set_spte(kvm, &iter, 0);
@@ -1367,7 +1369,7 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
  * be replaced by large mappings, for GFNs within the slot.
  */
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
-  const struct kvm_memory_slot *slot)
+  struct kvm_memory_slot *slot)
 {
struct kvm_mmu_page *root;
int root_as_id;
@@ -1377,8 +1379,7 @@ void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
if (root_as_id != slot->as_id)
continue;
 
-   zap_collapsible_spte_range(kvm, root, slot->base_gfn,
-  slot->base_gfn + slot->npages);
+   zap_collapsible_spte_range(kvm, root, slot);
}
 }
 
diff --git a/arch/x86/kvm/mmu/tdp_mmu.h b/arch/x86/kvm/mmu/tdp_mmu.h
index b4b65e3699b3..d31c5ed81a18 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.h
+++ b/arch/x86/kvm/mmu/tdp_mmu.h
@@ -35,7 +35,7 @@ void kvm_tdp_mmu_clear_dirty_pt_masked(struct kvm *kvm,
   bool wrprot);
 bool kvm_tdp_mmu_slot_set_dirty(struct kvm *kvm, struct kvm_memory_slot *slot);
 void kvm_tdp_mmu_zap_collapsible_sptes(struct kvm *kvm,
- 

[PATCH 04/14] KVM: x86/mmu: Pass the memslot to the rmap callbacks

2021-02-12 Thread Sean Christopherson
Pass the memslot to the rmap callbacks, it will be used when zapping
collapsible SPTEs to verify the memslot is compatible with hugepages
before zapping its SPTEs.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 24 +++-
 1 file changed, 15 insertions(+), 9 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 9be7fd474b2d..fb719e7a0cbb 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -1165,7 +1165,8 @@ static bool spte_wrprot_for_clear_dirty(u64 *sptep)
  * - W bit on ad-disabled SPTEs.
  * Returns true iff any D or W bits were cleared.
  */
-static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool __rmap_clear_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+  struct kvm_memory_slot *slot)
 {
u64 *sptep;
struct rmap_iterator iter;
@@ -1196,7 +1197,8 @@ static bool spte_set_dirty(u64 *sptep)
return mmu_spte_update(sptep, spte);
 }
 
-static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool __rmap_set_dirty(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+struct kvm_memory_slot *slot)
 {
u64 *sptep;
struct rmap_iterator iter;
@@ -1260,7 +1262,7 @@ void kvm_mmu_clear_dirty_pt_masked(struct kvm *kvm,
while (mask) {
rmap_head = __gfn_to_rmap(slot->base_gfn + gfn_offset + __ffs(mask),
  PG_LEVEL_4K, slot);
-   __rmap_clear_dirty(kvm, rmap_head);
+   __rmap_clear_dirty(kvm, rmap_head, slot);
 
/* clear the first set bit */
mask &= mask - 1;
@@ -1325,7 +1327,8 @@ static bool rmap_write_protect(struct kvm_vcpu *vcpu, u64 gfn)
return kvm_mmu_slot_gfn_write_protect(vcpu->kvm, slot, gfn);
 }
 
-static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head)
+static bool kvm_zap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+ struct kvm_memory_slot *slot)
 {
u64 *sptep;
struct rmap_iterator iter;
@@ -1345,7 +1348,7 @@ static int kvm_unmap_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
   struct kvm_memory_slot *slot, gfn_t gfn, int level,
   unsigned long data)
 {
-   return kvm_zap_rmapp(kvm, rmap_head);
+   return kvm_zap_rmapp(kvm, rmap_head, slot);
 }
 
 static int kvm_set_pte_rmapp(struct kvm *kvm, struct kvm_rmap_head *rmap_head,
@@ -5189,7 +5192,8 @@ void kvm_configure_mmu(bool enable_tdp, int tdp_max_root_level,
 EXPORT_SYMBOL_GPL(kvm_configure_mmu);
 
 /* The return value indicates if tlb flush on all vcpus is needed. */
-typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head);
+typedef bool (*slot_level_handler) (struct kvm *kvm, struct kvm_rmap_head *rmap_head,
+   struct kvm_memory_slot *slot);
 
 /* The caller should hold mmu-lock before calling this function. */
 static __always_inline bool
@@ -5203,7 +5207,7 @@ slot_handle_level_range(struct kvm *kvm, struct kvm_memory_slot *memslot,
for_each_slot_rmap_range(memslot, start_level, end_level, start_gfn,
end_gfn, &iterator) {
if (iterator.rmap)
-   flush |= fn(kvm, iterator.rmap);
+   flush |= fn(kvm, iterator.rmap, memslot);
 
if (need_resched() || rwlock_needbreak(&kvm->mmu_lock)) {
if (flush && lock_flush_tlb) {
@@ -5492,7 +5496,8 @@ void kvm_zap_gfn_range(struct kvm *kvm, gfn_t gfn_start, gfn_t gfn_end)
 }
 
 static bool slot_rmap_write_protect(struct kvm *kvm,
-   struct kvm_rmap_head *rmap_head)
+   struct kvm_rmap_head *rmap_head,
+   struct kvm_memory_slot *slot)
 {
return __rmap_write_protect(kvm, rmap_head, false);
 }
@@ -5526,7 +5531,8 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
 }
 
 static bool kvm_mmu_zap_collapsible_spte(struct kvm *kvm,
-struct kvm_rmap_head *rmap_head)
+struct kvm_rmap_head *rmap_head,
+struct kvm_memory_slot *slot)
 {
u64 *sptep;
struct rmap_iterator iter;
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 03/14] KVM: x86/mmu: Split out max mapping level calculation to helper

2021-02-12 Thread Sean Christopherson
Factor out the logic for determining the maximum mapping level given a
memslot and a gpa.  The helper will be used when zapping collapsible
SPTEs when disabling dirty logging, e.g. to avoid zapping SPTEs that
can't possibly be rebuilt as hugepages.

No functional change intended.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c  | 37 -
 arch/x86/kvm/mmu/mmu_internal.h |  2 ++
 2 files changed, 24 insertions(+), 15 deletions(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index 24325bdcd387..9be7fd474b2d 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -2756,8 +2756,8 @@ static void direct_pte_prefetch(struct kvm_vcpu *vcpu, u64 *sptep)
__direct_pte_prefetch(vcpu, sp, sptep);
 }
 
-static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
- kvm_pfn_t pfn, struct kvm_memory_slot *slot)
+static int host_pfn_mapping_level(struct kvm *kvm, gfn_t gfn, kvm_pfn_t pfn,
+ struct kvm_memory_slot *slot)
 {
unsigned long hva;
pte_t *pte;
@@ -2776,19 +2776,36 @@ static int host_pfn_mapping_level(struct kvm_vcpu *vcpu, gfn_t gfn,
 */
hva = __gfn_to_hva_memslot(slot, gfn);
 
-   pte = lookup_address_in_mm(vcpu->kvm->mm, hva, &level);
+   pte = lookup_address_in_mm(kvm->mm, hva, &level);
if (unlikely(!pte))
return PG_LEVEL_4K;
 
return level;
 }
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, kvm_pfn_t pfn, int max_level)
+{
+   struct kvm_lpage_info *linfo;
+
+   max_level = min(max_level, max_huge_page_level);
+   for ( ; max_level > PG_LEVEL_4K; max_level--) {
+   linfo = lpage_info_slot(gfn, slot, max_level);
+   if (!linfo->disallow_lpage)
+   break;
+   }
+
+   if (max_level == PG_LEVEL_4K)
+   return PG_LEVEL_4K;
+
+   return host_pfn_mapping_level(kvm, gfn, pfn, slot);
+}
+
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
int max_level, kvm_pfn_t *pfnp,
bool huge_page_disallowed, int *req_level)
 {
struct kvm_memory_slot *slot;
-   struct kvm_lpage_info *linfo;
kvm_pfn_t pfn = *pfnp;
kvm_pfn_t mask;
int level;
@@ -2805,17 +2822,7 @@ int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
if (!slot)
return PG_LEVEL_4K;
 
-   max_level = min(max_level, max_huge_page_level);
-   for ( ; max_level > PG_LEVEL_4K; max_level--) {
-   linfo = lpage_info_slot(gfn, slot, max_level);
-   if (!linfo->disallow_lpage)
-   break;
-   }
-
-   if (max_level == PG_LEVEL_4K)
-   return PG_LEVEL_4K;
-
-   level = host_pfn_mapping_level(vcpu, gfn, pfn, slot);
+   level = kvm_mmu_max_mapping_level(vcpu->kvm, slot, gfn, pfn, max_level);
if (level == PG_LEVEL_4K)
return level;
 
diff --git a/arch/x86/kvm/mmu/mmu_internal.h b/arch/x86/kvm/mmu/mmu_internal.h
index 9e38d3c5daad..0b55aa561ec8 100644
--- a/arch/x86/kvm/mmu/mmu_internal.h
+++ b/arch/x86/kvm/mmu/mmu_internal.h
@@ -138,6 +138,8 @@ enum {
 #define SET_SPTE_NEED_REMOTE_TLB_FLUSH BIT(1)
 #define SET_SPTE_SPURIOUS  BIT(2)
 
+int kvm_mmu_max_mapping_level(struct kvm *kvm, struct kvm_memory_slot *slot,
+ gfn_t gfn, kvm_pfn_t pfn, int max_level);
 int kvm_mmu_hugepage_adjust(struct kvm_vcpu *vcpu, gfn_t gfn,
int max_level, kvm_pfn_t *pfnp,
bool huge_page_disallowed, int *req_level);
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 02/14] KVM: x86/mmu: Don't unnecessarily write-protect small pages in TDP MMU

2021-02-12 Thread Sean Christopherson
Respect start_level when write-protecting pages in the TDP MMU for dirty
logging.  When the dirty bitmaps are initialized with all bits set, small
pages don't need to be write-protected as they've already been marked
dirty.

Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/mmu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/mmu.c b/arch/x86/kvm/mmu/mmu.c
index e507568cd55d..24325bdcd387 100644
--- a/arch/x86/kvm/mmu/mmu.c
+++ b/arch/x86/kvm/mmu/mmu.c
@@ -5500,7 +5500,7 @@ void kvm_mmu_slot_remove_write_access(struct kvm *kvm,
flush = slot_handle_level(kvm, memslot, slot_rmap_write_protect,
start_level, KVM_MAX_HUGEPAGE_LEVEL, false);
if (is_tdp_mmu_enabled(kvm))
-   flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, PG_LEVEL_4K);
+   flush |= kvm_tdp_mmu_wrprot_slot(kvm, memslot, start_level);
write_unlock(&kvm->mmu_lock);
 
/*
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 01/14] KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE pages

2021-02-12 Thread Sean Christopherson
Zap SPTEs that are backed by ZONE_DEVICE pages when zapping SPTEs to
rebuild them as huge pages in the TDP MMU.  ZONE_DEVICE huge pages are
managed differently than "regular" pages and are not compound pages.

Cc: Ben Gardon 
Fixes: 14881998566d ("kvm: x86/mmu: Support disabling dirty logging for the tdp MMU")
Signed-off-by: Sean Christopherson 
---
 arch/x86/kvm/mmu/tdp_mmu.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/arch/x86/kvm/mmu/tdp_mmu.c b/arch/x86/kvm/mmu/tdp_mmu.c
index 71e100a5670f..3cc332ed099d 100644
--- a/arch/x86/kvm/mmu/tdp_mmu.c
+++ b/arch/x86/kvm/mmu/tdp_mmu.c
@@ -1348,7 +1348,8 @@ static void zap_collapsible_spte_range(struct kvm *kvm,
 
pfn = spte_to_pfn(iter.old_spte);
if (kvm_is_reserved_pfn(pfn) ||
-   !PageTransCompoundMap(pfn_to_page(pfn)))
+   (!PageTransCompoundMap(pfn_to_page(pfn)) &&
+!kvm_is_zone_device_pfn(pfn)))
continue;
 
tdp_mmu_set_spte(kvm, &iter, 0);
-- 
2.30.0.478.g8a0d178c01-goog



[PATCH 00/14] KVM: x86/mmu: Dirty logging fixes and improvements

2021-02-12 Thread Sean Christopherson
Paolo, this is more or less ready, but on final read-through before
sending I realized it would be a good idea to WARN during VM destruction
if cpu_dirty_logging_count is non-zero.  I wanted to get you this before
the 5.12 window opens in case you want the TDP MMU fixes for 5.12.  I'll
do the above change and retest next week (note, Monday is a US holiday).

On to the code...

This started out as a small tweak to collapsible SPTE zapping in the TDP
MMU, and ended up as a rather large overhaul of CPU dirty logging, a.k.a.
PML.

Four main highlights:

  - Do a more precise check on whether or not a SPTE should be zapped to
rebuild it as a large page.
  - Disable PML when running L2.  PML is fully emulated for L1 VMMs, thus
enabling PML in L2 can only hurt and never help.
  - Drop the existing PML kvm_x86_ops.  They're basically trampolines into
the MMU, and IMO do far more harm than good.
  - Turn on PML only when it's needed instead of setting all dirty bits to
soft disable PML.

What led me down the rabbit's hole of ripping out the existing PML
kvm_x86_ops isn't really shown here.  Prior to incorporating Makarand's
patch, which allowed for the wholesale removal of setting dirty bits,
I spent a bunch of time poking around the "set dirty bits" code.  My
original changes optimized that path to skip setting dirty bits in the
nested MMU, since the nested MMU relies on write-protection and not PML.
That in turn allowed the TDP MMU zapping to completely skip walking the
rmaps, but doing so based on a bunch of callbacks was a twisted mess.

Happily, those patches got dropped in favor of nuking the code entirely.

Ran selftests and unit tests, and migrated actual VMs on AMD and Intel,
with and without TDP MMU, and with and without EPT.  The AMD system I'm
testing on infinite loops on the reset vector due to a #PF when NPT is
disabled, so that didn't get tested.  That reproduces with kvm/next,
I'll dig into it next week (no idea if it's a KVM or hardware issue).

For actual migration, I ran kvm-unit-tests in L1 along with stress to
hammer memory, and verified migration was effectively blocked until the
stress threads were killed (I didn't feel like figuring out how to
throttle the VM).

Makarand Sonare (1):
  KVM: VMX: Dynamically enable/disable PML based on memslot dirty
logging

Sean Christopherson (13):
  KVM: x86/mmu: Expand collapsible SPTE zap for TDP MMU to ZONE_DEVICE
pages
  KVM: x86/mmu: Don't unnecessarily write-protect small pages in TDP MMU
  KVM: x86/mmu: Split out max mapping level calculation to helper
  KVM: x86/mmu: Pass the memslot to the rmap callbacks
  KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs
  KVM: nVMX: Disable PML in hardware when running L2
  KVM: x86/mmu: Expand on the comment in
kvm_vcpu_ad_need_write_protect()
  KVM: x86/mmu: Make dirty log size hook (PML) a value, not a function
  KVM: x86: Move MMU's PML logic to common code
  KVM: x86: Further clarify the logic and comments for toggling log
dirty
  KVM: x86/mmu: Don't set dirty bits when disabling dirty logging w/ PML
  KVM: x86: Fold "write-protect large" use case into generic
write-protect
  KVM: x86/mmu: Remove a variety of unnecessary exports

 arch/x86/include/asm/kvm-x86-ops.h |   6 +-
 arch/x86/include/asm/kvm_host.h|  36 +
 arch/x86/kvm/mmu/mmu.c | 203 +
 arch/x86/kvm/mmu/mmu_internal.h|   7 +-
 arch/x86/kvm/mmu/tdp_mmu.c |  66 +-
 arch/x86/kvm/mmu/tdp_mmu.h |   3 +-
 arch/x86/kvm/vmx/nested.c  |  34 +++--
 arch/x86/kvm/vmx/vmx.c |  94 +
 arch/x86/kvm/vmx/vmx.h |   2 +
 arch/x86/kvm/x86.c | 145 +
 10 files changed, 230 insertions(+), 366 deletions(-)

-- 
2.30.0.478.g8a0d178c01-goog



Re: [PATCH v1 1/2] dt-bindings: clock: Add RPMHCC bindings for SC7280

2021-02-12 Thread Stephen Boyd
Quoting Taniya Das (2021-02-10 09:13:49)
> Add bindings and update documentation for clock rpmh driver on SC7280.
> 
> Signed-off-by: Taniya Das 
> ---

Applied to clk-next


Re: [PATCH v1 2/2] clk: qcom: rpmh: Add support for RPMH clocks on SC7280

2021-02-12 Thread Stephen Boyd
Quoting Taniya Das (2021-02-10 09:13:50)
> Add support for RPMH clocks on SC7280 SoCs.
> 
> Signed-off-by: Taniya Das 
> ---

Applied to clk-next


Re: [PATCH] vfio/type1: Use follow_pte()

2021-02-12 Thread kernel test robot
Hi Alex,

I love your patch! Yet something to improve:

[auto build test ERROR on vfio/next]
[also build test ERROR on v5.11-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting a patch, we suggest using '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Alex-Williamson/vfio-type1-Use-follow_pte/20210213-030541
base:   https://github.com/awilliam/linux-vfio.git next
config: x86_64-rhel (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# 
https://github.com/0day-ci/linux/commit/d1aea3bcf226e5225e706acb7df2f4c68ea8858a
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Alex-Williamson/vfio-type1-Use-follow_pte/20210213-030541
git checkout d1aea3bcf226e5225e706acb7df2f4c68ea8858a
# save the attached .config to linux build tree
make W=1 ARCH=x86_64 

If you fix the issue, kindly add the following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>, old ones prefixed by <<):

>> ERROR: modpost: "follow_pte" [drivers/vfio/vfio_iommu_type1.ko] undefined!

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




Re: [PATCH] Revert "dts: phy: add GPIO number and active state used for phy reset"

2021-02-12 Thread Palmer Dabbelt

On Wed, 10 Feb 2021 04:47:34 PST (-0800), sch...@linux-m68k.org wrote:

On Feb 04 2021, Palmer Dabbelt wrote:


From: Palmer Dabbelt 

VSC8541 phys need a special reset sequence, which the driver doesn't
currently support.  As a result, enabling the reset via GPIO essentially
guarantees that the device won't work correctly.

This reverts commit a0fa9d727043da2238432471e85de0bdb8a8df65.

Fixes: a0fa9d727043 ("dts: phy: add GPIO number and active state used for phy 
reset")
Cc: sta...@vger.kernel.org
Signed-off-by: Palmer Dabbelt 


This fixes ethernet on the HiFive Unleashed with 5.10.12.


Thanks for testing.  Looks like I forgot to reply, but it's in Linus' tree and 
should end up in stable.


Re: [PATCH 2/2] rcu-tasks: add RCU-tasks self tests

2021-02-12 Thread Paul E. McKenney
On Fri, Feb 12, 2021 at 04:37:09PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 12, 2021 at 03:48:51PM -0800, Paul E. McKenney wrote:
> > On Fri, Feb 12, 2021 at 10:12:07PM +0100, Uladzislau Rezki wrote:
> > > On Fri, Feb 12, 2021 at 08:20:59PM +0100, Sebastian Andrzej Siewior wrote:
> > > > On 2020-12-09 21:27:32 [+0100], Uladzislau Rezki (Sony) wrote:
> > > > > Add self tests for checking of RCU-tasks API functionality.
> > > > > It covers:
> > > > > - wait API functions;
> > > > > - invoking/completion call_rcu_tasks*().
> > > > > 
> > > > > Self-tests are run when CONFIG_PROVE_RCU kernel parameter is set.
> > > > 
> > > > I just bisected to this commit. By booting with `threadirqs' I end up
> > > > with:
> > > > [0.176533] Running RCU-tasks wait API self tests
> > > > 
> > > > No stall warning or so.
> > > > It boots again with:
> > > > 
> > > > diff --git a/init/main.c b/init/main.c
> > > > --- a/init/main.c
> > > > +++ b/init/main.c
> > > > @@ -1489,6 +1489,7 @@ void __init console_on_rootfs(void)
> > > > fput(file);
> > > >  }
> > > >  
> > > > +void rcu_tasks_initiate_self_tests(void);
> > > >  static noinline void __init kernel_init_freeable(void)
> > > >  {
> > > > /*
> > > > @@ -1514,6 +1515,7 @@ static noinline void __init 
> > > > kernel_init_freeable(void)
> > > >  
> > > > rcu_init_tasks_generic();
> > > > do_pre_smp_initcalls();
> > > > +   rcu_tasks_initiate_self_tests();
> > > > lockup_detector_init();
> > > >  
> > > > smp_init();
> > > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > > --- a/kernel/rcu/tasks.h
> > > > +++ b/kernel/rcu/tasks.h
> > > > @@ -1266,7 +1266,7 @@ static void test_rcu_tasks_callback(struct 
> > > > rcu_head *rhp)
> > > > rttd->notrun = true;
> > > >  }
> > > >  
> > > > -static void rcu_tasks_initiate_self_tests(void)
> > > > +void rcu_tasks_initiate_self_tests(void)
> > > >  {
> > > > pr_info("Running RCU-tasks wait API self tests\n");
> > > >  #ifdef CONFIG_TASKS_RCU
> > > > @@ -1322,7 +1322,6 @@ void __init rcu_init_tasks_generic(void)
> > > >  #endif
> > > >  
> > > > // Run the self-tests.
> > > > -   rcu_tasks_initiate_self_tests();
> > > >  }
> > > >  
> > > >  #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
> > > > 
> > > > > Signed-off-by: Uladzislau Rezki (Sony) 
> > 
> > Apologies for the hassle!  My testing clearly missed this combination
> > of CONFIG_PROVE_RCU=y and threadirqs=1.  :-(
> > 
> > But at least I can easily reproduce this hang as follows:
> > 
> > tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 
> > --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y 
> > CONFIG_PROVE_LOCKING=y" --bootargs "threadirqs=1" --trust-make
> > 
> > Sadly, I cannot take your patch because that simply papers over the
> > fact that early boot use of synchronize_rcu_tasks() is broken in this
> > particular configuration, which will likely eventually bite others now
> > that init_kprobes() has been moved earlier in boot:
> > 
> > 1b04fa990026 ("rcu-tasks: Move RCU-tasks initialization to before 
> > early_initcall()")
> > Link: https://lore.kernel.org/rcu/87eekfh80a@dja-thinkpad.axtens.net/
> > Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
> > 
> > > > Sebastian
> > > >
> > > We should be able to use call_rcu_tasks() in the *initcall() callbacks.
> > > The problem is that, ksoftirqd threads are not spawned by the time when
> > > an rcu_init_tasks_generic() is invoked:
> > > 
> > > diff --git a/init/main.c b/init/main.c
> > > index c68d784376ca..e6106bb12b2d 100644
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > > @@ -954,7 +954,6 @@ asmlinkage __visible void __init 
> > > __no_sanitize_address start_kernel(void)
> > >   rcu_init_nohz();
> > >   init_timers();
> > >   hrtimers_init();
> > > - softirq_init();
> > >   timekeeping_init();
> > >  
> > >   /*
> > > @@ -1512,6 +1511,7 @@ static noinline void __init 
> > > kernel_init_freeable(void)
> > >  
> > >   init_mm_internals();
> > >  
> > > + softirq_init();
> > >   rcu_init_tasks_generic();
> > >   do_pre_smp_initcalls();
> > >   lockup_detector_init();
> > > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > > index 9d71046ea247..cafa55c496d0 100644
> > > --- a/kernel/softirq.c
> > > +++ b/kernel/softirq.c
> > > @@ -630,6 +630,7 @@ void __init softirq_init(void)
> > >   &per_cpu(tasklet_hi_vec, cpu).head;
> > >   }
> > >  
> > > + spawn_ksoftirqd();
> > 
> > We need a forward reference to allow this to build, but with that added,
> > my test case passes.  Good show!
> > 
> > >   open_softirq(TASKLET_SOFTIRQ, tasklet_action);
> > >   open_softirq(HI_SOFTIRQ, tasklet_hi_action);
> > >  }
> > > @@ -732,7 +733,6 @@ static __init int spawn_ksoftirqd(void)
> > >  
> > >   return 0;
> > >  }
> > > -early_initcall(spawn_ksoftirqd);
> > >  
> > >  /*
> > >   * [ These __weak aliases are kept in a separate compilation unit, so 
> > > that
> > > 
> > > Any

Re: [PATCH] proc: Convert S_ permission uses to octal

2021-02-12 Thread Joe Perches
On Fri, 2021-02-12 at 17:48 -0600, Eric W. Biederman wrote:
> Matthew Wilcox  writes:
> > On Fri, Feb 12, 2021 at 04:01:48PM -0600, Eric W. Biederman wrote:
> > > Perhaps we can do something like:
> > > 
> > > #define S_IRWX 7
> > > #define S_IRW_ 6
> > > #define S_IR_X 5
> > > #define S_IR__ 4
> > > #define S_I_WX 3
> > > #define S_I_W_ 2
> > > #define S_I__X 1
> > > #define S_I___ 0
> > > 
> > > #define MODE(TYPE, USER, GROUP, OTHER) \
> > >   (((S_IF##TYPE) << 9) | \
> > >  ((S_I##USER)  << 6) | \
> > >  ((S_I##GROUP) << 3) | \
> > >  (S_I##OTHER))
> > > 
> > > Which would be used something like:
> > > MODE(DIR, RWX, R_X, R_X)
> > > MODE(REG, RWX, R__, R__)
> > > 
> > > Something like that should be able to address the readability while
> > > still using symbolic constants.
> > 
> > I think that's been proposed before.
> 
> I don't think it has ever been shot down.  Just no one cared enough to
> implement it.

From: Linus Torvalds 
Date: Tue, 2 Aug 2016 16:58:29 -0400
Message-ID: 
 (raw)

[ So I answered similarly to another patch, but I'll just re-iterate
and change the subject line so that it stands out a bit from the
millions of actual patches ]

On Tue, Aug 2, 2016 at 1:42 PM, Pavel Machek  wrote:
>
> Everyone knows what 0644 is, but noone can read S_IRUSR | S_IWUSR |
> S_IRGRP | S_IROTH (*). Please don't do this.

Absolutely. It's *much* easier to parse and understand the octal
numbers, while the symbolic macro names are just random line noise and
hard as hell to understand. You really have to think about it.

So we should rather go the other way: convert existing bad symbolic
permission bit macro use to just use the octal numbers.

The symbolic names are good for the *other* bits (ie sticky bit, and
the inode mode _type_ numbers etc), but for the permission bits, the
symbolic names are just insane crap. Nobody sane should ever use them.
Not in the kernel, not in user space.




[PATCH] bus: mhi: core: Use current ee in intvec handler for syserr

2021-02-12 Thread Jeffrey Hugo
The intvec handler stores the cached ee in a local variable for use in
processing the intvec.  When determining if a syserr is a fatal error or
not, the intvec handler is using the cached version, when it should be
using the current ee read from the device.  Currently, the device could
be in the PBL ee as the result of a fatal error, but the cached ee might
be AMSS, which would cause the intvec handler to incorrectly signal a
non-fatal syserr.

Fixes: 3000f85b8f47 ("bus: mhi: core: Add support for basic PM operations")
Signed-off-by: Jeffrey Hugo 
---
 drivers/bus/mhi/core/main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 4e0131b..f182736 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -448,7 +448,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
wake_up_all(&mhi_cntrl->state_event);
 
/* For fatal errors, we let controller decide next step */
-   if (MHI_IN_PBL(ee))
+   if (MHI_IN_PBL(mhi_cntrl->ee))
mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
else
mhi_pm_sys_err_handler(mhi_cntrl);
-- 
Qualcomm Technologies, Inc. is a member of the
Code Aurora Forum, a Linux Foundation Collaborative Project.



Re: [PATCH] proc: Convert S_ permission uses to octal

2021-02-12 Thread Joe Perches
On Fri, 2021-02-12 at 17:44 -0600, Eric W. Biederman wrote:

> I certainly do not see sufficient consensus to go around changing code
> other people maintain.

Every patch by a non-maintainer that doesn't have commit rights to
whatever tree is just a proposal.

> My suggest has the nice property that it handles all 512 different
> combinations.  I think that was the only real downside of Ingo's
> suggestion.  There are just too many different combinations to define
> a set of macros to cover all of the cases.

The treewide kernel use of octal vs symbolic permissions is ~2:1

There are about 11k uses of 4 digit octal values used for permissions
already in the kernel sources that are not in comments or strings.

$ git ls-files -- '*.[ch]' | xargs scc | sed 's/".*"//g' | grep -P -w '0[0-7]{3,3}' | wc -l
10818

(scc is a utility tool that strips comments from c source
 see: https://github.com/jleffler/scc-snapshots#readme)

vs:

$ git grep -w -P 'S_I[RWX][A-Z]{3,5}' | wc -l
5247

To my knowledge there just aren't many 4 digit octal uses in the
kernel sources that are _not_ permissions.

I believe the only non-permission 4 digit octal int uses not in
comments are:

include/uapi/linux/a.out.h
#define OMAGIC 0407
#define NMAGIC 0410
#define ZMAGIC 0413
#define QMAGIC 0314
#define CMAGIC 0421
#define N_STAB 0340

include/uapi/linux/coff.h
#define COFF_STMAGIC0401
#define COFF_OMAGIC 0404
#define COFF_JMAGIC 0407 
#define COFF_DMAGIC 0410 
#define COFF_ZMAGIC 0413 
#define COFF_SHMAGIC0443

fs/binfmt_flat.c:
if ((buf[0] != 037) || ((buf[1] != 0213) && (buf[1] != 0236))) {

lib/inflate.c:
((magic[1] != 0213) && (magic[1] != 0236))) {

And maybe those last 2 tests for gzip identification should be combined
into some static inline and use something other than magic constants.




Re: [PATCH 2/2] rcu-tasks: add RCU-tasks self tests

2021-02-12 Thread Paul E. McKenney
On Fri, Feb 12, 2021 at 03:48:51PM -0800, Paul E. McKenney wrote:
> On Fri, Feb 12, 2021 at 10:12:07PM +0100, Uladzislau Rezki wrote:
> > On Fri, Feb 12, 2021 at 08:20:59PM +0100, Sebastian Andrzej Siewior wrote:
> > > On 2020-12-09 21:27:32 [+0100], Uladzislau Rezki (Sony) wrote:
> > > > Add self tests for checking of RCU-tasks API functionality.
> > > > It covers:
> > > > - wait API functions;
> > > > - invoking/completion call_rcu_tasks*().
> > > > 
> > > > Self-tests are run when CONFIG_PROVE_RCU kernel parameter is set.
> > > 
> > > I just bisected to this commit. By booting with `threadirqs' I end up
> > > with:
> > > [0.176533] Running RCU-tasks wait API self tests
> > > 
> > > No stall warning or so.
> > > It boots again with:
> > > 
> > > diff --git a/init/main.c b/init/main.c
> > > --- a/init/main.c
> > > +++ b/init/main.c
> > > @@ -1489,6 +1489,7 @@ void __init console_on_rootfs(void)
> > >   fput(file);
> > >  }
> > >  
> > > +void rcu_tasks_initiate_self_tests(void);
> > >  static noinline void __init kernel_init_freeable(void)
> > >  {
> > >   /*
> > > @@ -1514,6 +1515,7 @@ static noinline void __init 
> > > kernel_init_freeable(void)
> > >  
> > >   rcu_init_tasks_generic();
> > >   do_pre_smp_initcalls();
> > > + rcu_tasks_initiate_self_tests();
> > >   lockup_detector_init();
> > >  
> > >   smp_init();
> > > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > > --- a/kernel/rcu/tasks.h
> > > +++ b/kernel/rcu/tasks.h
> > > @@ -1266,7 +1266,7 @@ static void test_rcu_tasks_callback(struct rcu_head 
> > > *rhp)
> > >   rttd->notrun = true;
> > >  }
> > >  
> > > -static void rcu_tasks_initiate_self_tests(void)
> > > +void rcu_tasks_initiate_self_tests(void)
> > >  {
> > >   pr_info("Running RCU-tasks wait API self tests\n");
> > >  #ifdef CONFIG_TASKS_RCU
> > > @@ -1322,7 +1322,6 @@ void __init rcu_init_tasks_generic(void)
> > >  #endif
> > >  
> > >   // Run the self-tests.
> > > - rcu_tasks_initiate_self_tests();
> > >  }
> > >  
> > >  #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
> > > 
> > > > Signed-off-by: Uladzislau Rezki (Sony) 
> 
> Apologies for the hassle!  My testing clearly missed this combination
> of CONFIG_PROVE_RCU=y and threadirqs=1.  :-(
> 
> But at least I can easily reproduce this hang as follows:
> 
> tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 
> --configs "TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y 
> CONFIG_PROVE_LOCKING=y" --bootargs "threadirqs=1" --trust-make
> 
> Sadly, I cannot take your patch because that simply papers over the
> fact that early boot use of synchronize_rcu_tasks() is broken in this
> particular configuration, which will likely eventually bite others now
> that init_kprobes() has been moved earlier in boot:
> 
> 1b04fa990026 ("rcu-tasks: Move RCU-tasks initialization to before 
> early_initcall()")
> Link: https://lore.kernel.org/rcu/87eekfh80a@dja-thinkpad.axtens.net/
> Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall")
> 
> > > Sebastian
> > >
> > We should be able to use call_rcu_tasks() in the *initcall() callbacks.
> > The problem is that, ksoftirqd threads are not spawned by the time when
> > an rcu_init_tasks_generic() is invoked:
> > 
> > diff --git a/init/main.c b/init/main.c
> > index c68d784376ca..e6106bb12b2d 100644
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -954,7 +954,6 @@ asmlinkage __visible void __init __no_sanitize_address 
> > start_kernel(void)
> > rcu_init_nohz();
> > init_timers();
> > hrtimers_init();
> > -   softirq_init();
> > timekeeping_init();
> >  
> > /*
> > @@ -1512,6 +1511,7 @@ static noinline void __init kernel_init_freeable(void)
> >  
> > init_mm_internals();
> >  
> > +   softirq_init();
> > rcu_init_tasks_generic();
> > do_pre_smp_initcalls();
> > lockup_detector_init();
> > diff --git a/kernel/softirq.c b/kernel/softirq.c
> > index 9d71046ea247..cafa55c496d0 100644
> > --- a/kernel/softirq.c
> > +++ b/kernel/softirq.c
> > @@ -630,6 +630,7 @@ void __init softirq_init(void)
> > &per_cpu(tasklet_hi_vec, cpu).head;
> > }
> >  
> > +   spawn_ksoftirqd();
> 
> We need a forward reference to allow this to build, but with that added,
> my test case passes.  Good show!
> 
> > open_softirq(TASKLET_SOFTIRQ, tasklet_action);
> > open_softirq(HI_SOFTIRQ, tasklet_hi_action);
> >  }
> > @@ -732,7 +733,6 @@ static __init int spawn_ksoftirqd(void)
> >  
> > return 0;
> >  }
> > -early_initcall(spawn_ksoftirqd);
> >  
> >  /*
> >   * [ These __weak aliases are kept in a separate compilation unit, so that
> > 
> > Any thoughts?
> 
> One likely problem is that there are almost certainly parts of the kernel
> that need softirq_init() to stay roughly where it is.  So, is it possible
> to leave softirq_init() where it is, and to arrange for spawn_ksoftirqd()
> to be invoked just before rcu_init_tasks_generic() is called?

This still seems worth trying (and doing so is next on my list), but just

Re: [PATCH] clk: Mark fwnodes when their clock provider is added

2021-02-12 Thread Stephen Boyd
Quoting Greg KH (2021-02-11 05:00:51)
> On Wed, Feb 10, 2021 at 01:44:35PM +0200, Tudor Ambarus wrote:
> > This is a follow-up for:
> > commit 3c9ea42802a1 ("clk: Mark fwnodes when their clock provider is 
> > added/removed")
> > 
> > The above commit updated the deprecated of_clk_add_provider(),
> > but missed to update the preferred of_clk_add_hw_provider().
> > Update it now.
> > 
> > Signed-off-by: Tudor Ambarus 
> > ---

Acked-by: Stephen Boyd 


[PATCH] selftests: kvm: add hardware_disable test

2021-02-12 Thread Marc Orr
From: Ignacio Alvarado 

This test launches 512 VMs in serial and kills them after a random
amount of time.

The test was originally written to exercise KVM user notifiers in
the context of 1650b4ebc99d:
- KVM: Disable irq while unregistering user notifier
- 
https://lore.kernel.org/kvm/CACXrx53vkO=hkfwwwk+fvpvxcnjprymtdz10qwxfvvx_ptg...@mail.gmail.com/

Recently, this test piqued my interest because it proved useful
for AMD SNP in exercising the "in-use" pages, described in APM section
15.36.12, "Running SNP-Active Virtual Machines".

To run the test, first compile:
$ make "CPPFLAGS=-static -Wl,--whole-archive -lpthread -Wl,--no-whole-archive" \
-C tools/testing/selftests/kvm/

Then, copy the test over to a machine with the kernel and run:
$ ./hardware_disable_test

Signed-off-by: Ignacio Alvarado 
Signed-off-by: Marc Orr 
---
 tools/testing/selftests/kvm/.gitignore|   1 +
 tools/testing/selftests/kvm/Makefile  |   1 +
 .../selftests/kvm/hardware_disable_test.c | 165 ++
 3 files changed, 167 insertions(+)
 create mode 100644 tools/testing/selftests/kvm/hardware_disable_test.c

diff --git a/tools/testing/selftests/kvm/.gitignore b/tools/testing/selftests/kvm/.gitignore
index ce8f4ad39684..d631e111441a 100644
--- a/tools/testing/selftests/kvm/.gitignore
+++ b/tools/testing/selftests/kvm/.gitignore
@@ -28,6 +28,7 @@
 /demand_paging_test
 /dirty_log_test
 /dirty_log_perf_test
+/hardware_disable_test
 /kvm_create_max_vcpus
 /set_memory_region_test
 /steal_time
diff --git a/tools/testing/selftests/kvm/Makefile b/tools/testing/selftests/kvm/Makefile
index fe41c6a0fa67..c1c403d878f6 100644
--- a/tools/testing/selftests/kvm/Makefile
+++ b/tools/testing/selftests/kvm/Makefile
@@ -62,6 +62,7 @@ TEST_GEN_PROGS_x86_64 += x86_64/tsc_msrs_test
 TEST_GEN_PROGS_x86_64 += demand_paging_test
 TEST_GEN_PROGS_x86_64 += dirty_log_test
 TEST_GEN_PROGS_x86_64 += dirty_log_perf_test
+TEST_GEN_PROGS_x86_64 += hardware_disable_test
 TEST_GEN_PROGS_x86_64 += kvm_create_max_vcpus
 TEST_GEN_PROGS_x86_64 += set_memory_region_test
 TEST_GEN_PROGS_x86_64 += steal_time
diff --git a/tools/testing/selftests/kvm/hardware_disable_test.c b/tools/testing/selftests/kvm/hardware_disable_test.c
new file mode 100644
index ..2f2eeb8a1d86
--- /dev/null
+++ b/tools/testing/selftests/kvm/hardware_disable_test.c
@@ -0,0 +1,165 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * This test is intended to reproduce a crash that happens when
+ * kvm_arch_hardware_disable is called and it attempts to unregister the user
+ * return notifiers.
+ */
+
+#define _GNU_SOURCE
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#include "kvm_util.h"
+
+#define VCPU_NUM 4
+#define SLEEPING_THREAD_NUM (1 << 4)
+#define FORK_NUM (1ULL << 9)
+#define DELAY_US_MAX 2000
+#define GUEST_CODE_PIO_PORT 4
+
+sem_t *sem;
+
+/* Arguments for the pthreads */
+struct payload {
+   struct kvm_vm *vm;
+   uint32_t index;
+};
+
+static void guest_code(void)
+{
+   for (;;)
+   ;  /* Some busy work */
+   printf("Should not be reached.\n");
+}
+
+static void *run_vcpu(void *arg)
+{
+   struct payload *payload = (struct payload *)arg;
+   struct kvm_run *state = vcpu_state(payload->vm, payload->index);
+
+   vcpu_run(payload->vm, payload->index);
+
+   TEST_ASSERT(false, "%s: exited with reason %d: %s\n",
+   __func__, state->exit_reason,
+   exit_reason_str(state->exit_reason));
+   pthread_exit(NULL);
+}
+
+static void *sleeping_thread(void *arg)
+{
+   int fd;
+
+   while (true) {
+   fd = open("/dev/null", O_RDWR);
+   close(fd);
+   }
+   TEST_ASSERT(false, "%s: exited\n", __func__);
+   pthread_exit(NULL);
+}
+
+static inline void check_create_thread(pthread_t *thread, pthread_attr_t *attr,
+  void *(*f)(void *), void *arg)
+{
+   int r;
+
+   r = pthread_create(thread, attr, f, arg);
+   TEST_ASSERT(r == 0, "%s: failed to create thread", __func__);
+}
+
+static inline void check_set_affinity(pthread_t thread, cpu_set_t *cpu_set)
+{
+   int r;
+
+   r = pthread_setaffinity_np(thread, sizeof(cpu_set_t), cpu_set);
+   TEST_ASSERT(r == 0, "%s: failed set affinity", __func__);
+}
+
+static inline void check_join(pthread_t thread, void **retval)
+{
+   int r;
+
+   r = pthread_join(thread, retval);
+   TEST_ASSERT(r == 0, "%s: failed to join thread", __func__);
+}
+
+static void run_test(uint32_t run)
+{
+   struct kvm_vm *vm;
+   cpu_set_t cpu_set;
+   pthread_t threads[VCPU_NUM];
+   pthread_t throw_away;
+   struct payload payloads[VCPU_NUM];
+   void *b;
+   uint32_t i, j;
+
+   CPU_ZERO(&cpu_set);
+   for (i = 0; i < VCPU_NUM; i++)
+   CPU_SET(i, &cpu_set);
+
+   vm = vm_create(VM_MODE_DEFAULT, DEFAULT_GUEST_PHY_PAGES, O_RDWR);
+   kvm

Re: [PATCH 00/21] [Set 2] Rid W=1 warnings from Clock

2021-02-12 Thread Stephen Boyd
Quoting Lee Jones (2021-02-12 14:37:39)
> On Fri, 12 Feb 2021, Stephen Boyd wrote:
> 
> > 
> > I'd like to enable it for only files under drivers/clk/ but it doesn't
> > seem to work. I'm not asking to enable it at the toplevel Makefile. I'm
> > asking to enable it for drivers/clk/ so nobody has to think about it now
> > that you've done the hard work of getting the numbers in this directory
> > down to zero or close to zero.
> 
> I'm not sure which one of us is confused.  Probably me, but ...
> 
> Even if you could enable it per-subsystem, how would that help you?
> 
> How can you ensure that contributors see any new W=1 warnings, but
> Linus doesn't?  When Linus conducts his build-tests during the merge
> window, he is also going to build W=1 for drivers/clk.

The assumption is contributors would have compiled the code they're
sending, but that's obviously not always the case, so this assumption
relies on developers running make. If they do run make then the hope is
they would see the warnings now, without having to rely on them to know
about passing W=1 to make, and fix them before sending code. If
developers are ignoring build errors or warnings then we can't do
anything anyway.

> 
> All that's going to achieve is put you in the firing line.

Ok. Is this prior experience?

> 
> From my PoV W=1 builds should be enabled during the development phase
> (i.e. contributor, auto-builder, maintainer).  By the time patches
> make it into mainline, the review/testing stage is over and only the
> default W=0 warnings are meaningful.
> 

Alright maybe I don't understand and W=1 builds are noisy for the
drivers/clk subdirectory even after applying these patches. Or it has
some false positives that won't be fixed? Or a new compiler can cause
new warnings to happen? I could see these things being a problem.

I'm trying to see if we can make lives better for everyone by exposing
the warnings by default in the drivers/clk/ directory now that there are
supposedly none left. Shouldn't we tighten the screws now that we've
cleaned them?


[PATCH 0/1] HID: ft260: add usb hid to i2c host bridge driver

2021-02-12 Thread Michael Zaidman
The FT260 is a USB device that implements USB to I2C/UART bridges
through two USB HID class interfaces. The first - for I2C, and the
second for UART. Each interface is independent, and the kernel
detects it as a separate hidraw device.

This commit adds I2C host adapter support, enabling a wide range of
standard userspace tools and applications that do not implement the HID
protocol to access I2C client devices via the FT260 I2C controller.
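
For context (not part of this series): once the I2C host adapter is
registered, a client device behind the FT260 can be reached through the
plain i2c-dev interface with no HID knowledge at all. A minimal sketch,
where the /dev/i2c-1 bus number and the 0x50 slave address are only
assumptions for illustration:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/i2c-dev.h>

int main(void)
{
	int fd = open("/dev/i2c-1", O_RDWR);	/* assumed bus number */
	unsigned char reg = 0x00, val;

	/* bind this fd to an assumed client address (e.g. an EEPROM) */
	if (fd < 0 || ioctl(fd, I2C_SLAVE, 0x50) < 0)
		return 1;
	/* write the register offset, then read one byte back */
	if (write(fd, &reg, 1) != 1 || read(fd, &val, 1) != 1)
		return 1;
	printf("reg 0x%02x = 0x%02x\n", reg, val);
	close(fd);
	return 0;
}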

The driver was tested with different I2C client devices, Linux
kernels, and Linux userspace tools.

For data transfer, the FT260 implements one Interrupt IN and one
Interrupt OUT pipes per interface. For configuration and control,
the FT260 exposes the HID class commands through the Control pipe.

Commands and responses are FT260 specific and documented in the
AN_394_User_Guide_for_FT260.pdf on the https://www.ftdichip.com.

Michael Zaidman (1):
  HID: ft260: add usb hid to i2c host bridge driver

 MAINTAINERS |7 +
 drivers/hid/Kconfig |   11 +
 drivers/hid/Makefile|2 +
 drivers/hid/hid-ft260.c | 1097 +++
 drivers/hid/hid-ids.h   |1 +
 5 files changed, 1118 insertions(+)
 create mode 100644 drivers/hid/hid-ft260.c


base-commit: 07f7e57c63aaa2afb4ea31edef05e08699a63a00
-- 
2.25.1



[PATCH 1/1] HID: ft260: add usb hid to i2c host bridge driver

2021-02-12 Thread Michael Zaidman
The FT260 is a USB device that implements USB to I2C/UART bridges
through two USB HID class interfaces. The first - for I2C, and the
second for UART. Each interface is independently controlled, and
the kernel detects each interface as a separate hidraw device.

This commit adds I2C host adapter support.

Signed-off-by: Michael Zaidman 
---
 MAINTAINERS |7 +
 drivers/hid/Kconfig |   11 +
 drivers/hid/Makefile|2 +
 drivers/hid/hid-ft260.c | 1097 +++
 drivers/hid/hid-ids.h   |1 +
 5 files changed, 1118 insertions(+)
 create mode 100644 drivers/hid/hid-ft260.c

diff --git a/MAINTAINERS b/MAINTAINERS
index c38651ca59a5..69f7405b5cdb 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -7310,6 +7310,13 @@ F:   fs/verity/
 F: include/linux/fsverity.h
 F: include/uapi/linux/fsverity.h
 
+FT260 FTDI USB-HID TO I2C BRIDGE DRIVER
+M: Michael Zaidman 
+L: linux-...@vger.kernel.org
+L: linux-in...@vger.kernel.org
+S: Maintained
+F: drivers/hid/hid-ft260.c
+
 FUJITSU LAPTOP EXTRAS
 M: Jonathan Woithe 
 L: platform-driver-...@vger.kernel.org
diff --git a/drivers/hid/Kconfig b/drivers/hid/Kconfig
index a7dcbda74221..837d99e8f1f5 100644
--- a/drivers/hid/Kconfig
+++ b/drivers/hid/Kconfig
@@ -351,6 +351,17 @@ config HID_EZKEY
help
Support for Ezkey BTC 8193 keyboard.
 
+config HID_FT260
+   tristate "FTDI FT260 USB HID to I2C host support"
+   depends on USB_HID && HIDRAW && I2C
+   help
+ Provides I2C host adapter functionality over USB-HID through FT260
+ device. The customizable USB descriptor fields are exposed as sysfs
+ attributes.
+
+ To compile this driver as a module, choose M here: the module
+ will be called hid-ft260.
+
 config HID_GEMBIRD
tristate "Gembird Joypad"
depends on HID
diff --git a/drivers/hid/Makefile b/drivers/hid/Makefile
index c4f6d5c613dc..b6fbc19e7c7c 100644
--- a/drivers/hid/Makefile
+++ b/drivers/hid/Makefile
@@ -145,3 +145,5 @@ obj-$(CONFIG_INTEL_ISH_HID) += intel-ish-hid/
 obj-$(INTEL_ISH_FIRMWARE_DOWNLOADER)   += intel-ish-hid/
 
 obj-$(CONFIG_AMD_SFH_HID)   += amd-sfh-hid/
+
+obj-$(CONFIG_HID_FT260)+= hid-ft260.o
diff --git a/drivers/hid/hid-ft260.c b/drivers/hid/hid-ft260.c
new file mode 100644
index ..47481aeaadd3
--- /dev/null
+++ b/drivers/hid/hid-ft260.c
@@ -0,0 +1,1097 @@
+// SPDX-License-Identifier: GPL-2.0-only
+/*
+ * hid-ft260.c - FTDI USB HID to I2C host bridge
+ *
+ * Copyright (c) 2021, Michael Zaidman 
+ *
+ * Data Sheet:
+ *   https://www.ftdichip.com/Support/Documents/DataSheets/ICs/DS_FT260.pdf
+ */
+
+#include "hid-ids.h"
+#include 
+#include 
+#include 
+#include 
+
+#ifdef DEBUG
+static int ft260_debug = 1;
+#else
+static int ft260_debug;
+#endif
+module_param_named(debug, ft260_debug, int, 0600);
+MODULE_PARM_DESC(debug, "Toggle FT260 debugging messages");
+
+#define ft260_dbg(format, arg...)\
+   do {  \
+   if (ft260_debug)  \
+   pr_info("%s: " format, __func__, ##arg);  \
+   } while (0)
+
+#define FT260_REPORT_MAX_LENGTH (64)
+#define FT260_I2C_DATA_REPORT_ID(len) (FT260_I2C_REPORT_MIN + (len - 1) / 4)
+/*
+ * The input report format assigns 62 bytes for the data payload, but ft260
+ * returns 60 and 2 in two separate transactions. To minimize transfer time
+ * in reading chunks mode, set the maximum read payload length to 60 bytes.
+ */
+#define FT260_RD_DATA_MAX (60)
+#define FT260_WR_DATA_MAX (60)
+
+/*
+ * Device interface configuration.
+ * The FT260 has 2 interfaces that are controlled by DCNF0 and DCNF1 pins.
+ * The first implements the USB HID to I2C bridge function and the
+ * second the USB HID to UART bridge function.
+ */
+enum {
+   FT260_MODE_ALL  = 0x00,
+   FT260_MODE_I2C  = 0x01,
+   FT260_MODE_UART = 0x02,
+   FT260_MODE_BOTH = 0x03,
+};
+
+/* Control pipe */
+enum {
+   FT260_GET_RQST_TYPE = 0xA1,
+   FT260_GET_REPORT= 0x01,
+   FT260_SET_RQST_TYPE = 0x21,
+   FT260_SET_REPORT= 0x09,
+   FT260_FEATURE   = 0x03,
+};
+
+/* Report IDs / Feature In */
+enum {
+   FT260_CHIP_VERSION  = 0xA0,
+   FT260_SYSTEM_SETTINGS   = 0xA1,
+   FT260_I2C_STATUS= 0xC0,
+   FT260_I2C_READ_REQ  = 0xC2,
+   FT260_I2C_REPORT_MIN= 0xD0,
+   FT260_I2C_REPORT_MAX= 0xDE,
+   FT260_GPIO  = 0xB0,
+   FT260_UART_INTERRUPT_STATUS = 0xB1,
+   FT260_UART_STATUS   = 0xE0,
+   FT260_UART_RI_DCD_STATUS= 0xE1,
+   FT260_UART_REPORT   = 0xF0,
+};
+
+/* Feature Out */
+enum {
+   FT260_SET_CLOCK = 0x01,
+   F

Re: [PATCH v5] tpm_tis: Add missing tpm_request/relinquish_locality() calls

2021-02-12 Thread Jarkko Sakkinen
On Fri, Feb 12, 2021 at 12:06:00PM +0100, Lukasz Majczak wrote:
> There are missing calls to tpm_request_locality() before the calls to
> the tpm_get_timeouts() and tpm_tis_probe_irq_single() - both functions
> internally send commands to the tpm using tpm_tis_send_data()
> which in turn, at the very beginning, calls the tpm_tis_status().
> This one tries to read TPM_STS register, what fails and propagates
> this error upward. The read fails due to lack of acquired locality,
> as it is described in
> TCG PC Client Platform TPM Profile (PTP) Specification,
> paragraph 6.1 FIFO Interface Locality Usage per Register,
> Table 39 Register Behavior Based on Locality Setting for FIFO
> - a read attempt to TPM_STS_x Registers returns 0xFF in case of lack
> of locality. The described situation manifests itself with
> the following warning trace:
> 
> [4.324298] TPM returned invalid status
> [4.324806] WARNING: CPU: 2 PID: 1 at drivers/char/tpm/tpm_tis_core.c:275 
> tpm_tis_status+0x86/0x8f

The commit message has a great description of the background, but
it does not describe what the commit does. Please describe
this in imperative form, e.g. "Export tpm_request_locality() and ..."
and "Call tpm_request_locality() before ...". You get the idea.

It's also lacking an explanation of the implementation path, i.e.
why you are not using tpm_chip_start() and tpm_chip_stop().

> 
> Tested on Samsung Chromebook Pro (Caroline), TPM 1.2 (SLB 9670)

Empty line here.

Also, add:

Cc: sta...@vger.kernel.org

> Fixes: a3fbfae82b4c ("tpm: take TPM chip power gating out of tpm_transmit()")

Remove empty line.

> Signed-off-by: Lukasz Majczak 
> Reviewed-by: Guenter Roeck 


> ---
> 
> Hi
> 
> I have tried to address all the issues pointed out, but decided to stay
> with the tpm_request/relinquish_locality() calls instead of using
> tpm_chip_start/stop(). The rationale behind this is that in this case
> only the locality is requested; there is no need to enable/disable the
> clock. A similar case is present in the probe_itpm() function.

I would prefer to use the same approach, if it does not cause any extra
harm, instead of adding new exports. That will also make the fix more
compact. So I don't agree with this reasoning. Also, the commit message
lacks *any* reasoning.

> One more clarification: the TPM present on my test machine is the SLB 9670
> (not Cr50).
> 
> Best regards,
> Lukasz
> 
> Changes:
> v4->v5:
> * Fixed style, typos, clarified commit message
> 
>  drivers/char/tpm/tpm-chip.c  |  6 --
>  drivers/char/tpm/tpm-interface.c | 13 ++---
>  drivers/char/tpm/tpm.h   |  2 ++
>  drivers/char/tpm/tpm_tis_core.c  | 14 +++---
>  4 files changed, 27 insertions(+), 8 deletions(-)
> 
> diff --git a/drivers/char/tpm/tpm-chip.c b/drivers/char/tpm/tpm-chip.c
> index ddaeceb7e109..ce9c2650fbe5 100644
> --- a/drivers/char/tpm/tpm-chip.c
> +++ b/drivers/char/tpm/tpm-chip.c
> @@ -32,7 +32,7 @@ struct class *tpm_class;
>  struct class *tpmrm_class;
>  dev_t tpm_devt;
>  
> -static int tpm_request_locality(struct tpm_chip *chip)
> +int tpm_request_locality(struct tpm_chip *chip)
>  {
>   int rc;
>  
> @@ -46,8 +46,9 @@ static int tpm_request_locality(struct tpm_chip *chip)
>   chip->locality = rc;
>   return 0;
>  }
> +EXPORT_SYMBOL_GPL(tpm_request_locality);
>  
> -static void tpm_relinquish_locality(struct tpm_chip *chip)
> +void tpm_relinquish_locality(struct tpm_chip *chip)
>  {
>   int rc;
>  
> @@ -60,6 +61,7 @@ static void tpm_relinquish_locality(struct tpm_chip *chip)
>  
>   chip->locality = -1;
>  }
> +EXPORT_SYMBOL_GPL(tpm_relinquish_locality);
>  
>  static int tpm_cmd_ready(struct tpm_chip *chip)
>  {
> diff --git a/drivers/char/tpm/tpm-interface.c b/drivers/char/tpm/tpm-interface.c
> index 1621ce818705..2a9001d329f2 100644
> --- a/drivers/char/tpm/tpm-interface.c
> +++ b/drivers/char/tpm/tpm-interface.c
> @@ -241,10 +241,17 @@ int tpm_get_timeouts(struct tpm_chip *chip)
>   if (chip->flags & TPM_CHIP_FLAG_HAVE_TIMEOUTS)
>   return 0;
>  
> - if (chip->flags & TPM_CHIP_FLAG_TPM2)
> + if (chip->flags & TPM_CHIP_FLAG_TPM2) {
>   return tpm2_get_timeouts(chip);
> - else
> - return tpm1_get_timeouts(chip);
> + } else {
> + ssize_t ret = tpm_request_locality(chip);
> +
> + if (ret)
> + return ret;
> + ret = tpm1_get_timeouts(chip);
> + tpm_relinquish_locality(chip);
> + return ret;
> + }
>  }
>  EXPORT_SYMBOL_GPL(tpm_get_timeouts);
>  
> diff --git a/drivers/char/tpm/tpm.h b/drivers/char/tpm/tpm.h
> index 947d1db0a5cc..8c13008437dd 100644
> --- a/drivers/char/tpm/tpm.h
> +++ b/drivers/char/tpm/tpm.h
> @@ -193,6 +193,8 @@ static inline void tpm_msleep(unsigned int delay_msec)
>  
>  int tpm_chip_start(struct tpm_chip *chip);
>  void tpm_chip_stop(struct tpm_chip *chip);
> +int tpm_request_locality(struct tpm_chip *chip);
> +void tpm

Re: [PATCH 1/6] fs: Add flag to file_system_type to indicate content is generated

2021-02-12 Thread Darrick J. Wong
On Sat, Feb 13, 2021 at 10:27:26AM +1100, Dave Chinner wrote:
> On Fri, Feb 12, 2021 at 03:07:39PM -0800, Ian Lance Taylor wrote:
> > On Fri, Feb 12, 2021 at 3:03 PM Dave Chinner  wrote:
> > >
> > > On Fri, Feb 12, 2021 at 04:45:41PM +0100, Greg KH wrote:
> > > > On Fri, Feb 12, 2021 at 07:33:57AM -0800, Ian Lance Taylor wrote:
> > > > > On Fri, Feb 12, 2021 at 12:38 AM Greg KH  
> > > > > wrote:
> > > > > >
> > > > > > Why are people trying to use copy_file_range on simple /proc and 
> > > > > > /sys
> > > > > > files in the first place?  They can not seek (well most can not), so
> > > > > > that feels like a "oh look, a new syscall, let's use it everywhere!"
> > > > > > problem that userspace should not do.
> > > > >
> > > > > This may have been covered elsewhere, but it's not that people are
> > > > > saying "let's use copy_file_range on files in /proc."  It's that the
> > > > > Go language standard library provides an interface to operating system
> > > > > files.  When Go code uses the standard library function io.Copy to
> > > > > copy the contents of one open file to another open file, then on Linux
> > > > > kernels 5.3 and greater the Go standard library will use the
> > > > > copy_file_range system call.  That seems to be exactly what
> > > > > copy_file_range is intended for.  Unfortunately it appears that when
> > > > > people writing Go code open a file in /proc and use io.Copy the
> > > > > contents to another open file, copy_file_range does nothing and
> > > > > reports success.  There isn't anything on the copy_file_range man page
> > > > > explaining this limitation, and there isn't any documented way to know
> > > > > that the Go standard library should not use copy_file_range on certain
> > > > > files.
> > > >
> > > > But, is this a bug in the kernel in that the syscall being made is not
> > > > working properly, or a bug in that Go decided to do this for all types
> > > > of files not knowing that some types of files can not handle this?
> > > >
> > > > If the kernel has always worked this way, I would say that Go is doing
> > > > the wrong thing here.  If the kernel used to work properly, and then
> > > > changed, then it's a regression on the kernel side.
> > > >
> > > > So which is it?
> > >
> > > Both Al Viro and myself have said "copy file range is not a generic
> > > method for copying data between two file descriptors". It is a
> > > targetted solution for *regular files only* on filesystems that store
> > > persistent data and can accelerate the data copy in some way (e.g.
> > > clone, server side offload, hardware offlead, etc). It is not
> > > intended as a copy mechanism for copying data from one random file
> > > descriptor to another.
> > >
> > > The use of it as a general file copy mechanism in the Go system
> > > library is incorrect and wrong. It is a userspace bug.  Userspace
> > > has done the wrong thing, userspace needs to be fixed.
> > 
> > OK, we'll take it out.
> > 
> > I'll just make one last plea that I think that copy_file_range could
> > be much more useful if there were some way that a program could know
> > whether it would work or not.

Well... we could always implement a CFR_DRYRUN flag that would run
through all the parameter validation and return 0 just before actually
starting any real copying logic.  But that wouldn't itself solve the
problem that there are very old virtual filesystems in Linux that have
zero-length regular files that behave like a pipe.

> If you can't tell from userspace that a file has data in it other
> than by calling read() on it, then you can't use cfr on it.

I don't know how to do that, Dave. :)

Frankly I'm with the Go developers on this -- one should detect c_f_r by
calling it and if it errors out then fall back to the usual userspace
buffer copy strategy.
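
For what it's worth, a minimal user-space sketch of that detect-and-fall-back
pattern (not from this thread's patches): try copy_file_range() first and drop
back to a plain read/write copy when the error suggests the syscall cannot be
used for this pair of descriptors. The error list is illustrative, not
exhaustive, and as noted above it cannot help when copy_file_range() silently
returns 0 on a nonempty virtual file.

#define _GNU_SOURCE
#include <errno.h>
#include <unistd.h>

static ssize_t copy_chunk(int in_fd, int out_fd, size_t len)
{
	char buf[65536];
	ssize_t n;

	n = copy_file_range(in_fd, NULL, out_fd, NULL, len, 0);
	if (n >= 0)
		return n;
	if (errno != EINVAL && errno != ENOSYS &&
	    errno != EXDEV && errno != EOPNOTSUPP)
		return -1;

	/* fall back to an ordinary buffered copy of up to one buffer */
	if (len > sizeof(buf))
		len = sizeof(buf);
	n = read(in_fd, buf, len);
	if (n <= 0)
		return n;
	return write(out_fd, buf, n);
}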

That still means we need to fix the kernel WRT these weird old
filesystems.  One of...

1. Get rid of the generic fallback completely, since splice only copies
64k at a time and ... yay?  I guess it at least passes generic/521 and
generic/522 these days.

2. Keep it, but change c_f_r to require that both files have a
->copy_file_range implementation.  If they're the same then we'll call
the function pointer, if not, we call the generic fallback.  This at
least gets us back to the usual behavior which is that filesystems have
to opt in to new functionality (== we assume they QA'd all the wunnerful
combinations).

3. #2, but fix the generic fallback to not suck so badly.  That sounds
like someone (else's) 2yr project. :P

--D

> Cheers,
> 
> Dave.
> -- 
> Dave Chinner
> da...@fromorbit.com


Re: [BUG REPORT] media: coda: mpeg4 decode corruption on i.MX6qp only

2021-02-12 Thread Sven Van Asbroeck
Philipp, Fabio,

I was able to verify that the PREs do indeed overrun their allocated ocram area.

Section 38.5.1 of the iMX6QuadPlus manual indicates the ocram size
required: width(pixels) x 8 lines x 4 bytes. For 2048 pixels max, this
comes to 64K. This is what the PRE driver allocates. So far, so good.

The trouble starts when we're displaying a section of a much wider
bitmap. This happens in X when using two displays. e.g.:
HDMI 1920x1088
LVDS 1280x800
X bitmap 3200x1088, left side displayed on HDMI, right side on LVDS.

In such a case, the stride will be much larger than the width of a
display scanline.

This is where things start to go very wrong.

I found that the ocram area used by the PREs increases with the
stride. I experimentally found a formula:
ocram_used = display_width x 8 x 4 + (bitmap_width - display_width) x 7 x 4
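
Plugging in the dual-display numbers above (HDMI 1920 wide on a 3200-wide
X bitmap) shows the overrun; a quick stand-alone check, not driver code:

#include <stdio.h>

int main(void)
{
	unsigned int display_width = 1920;	/* HDMI scanline width  */
	unsigned int bitmap_width = 3200;	/* X framebuffer stride */
	unsigned int ocram_alloc = 2048 * 8 * 4;	/* 64K per the manual */
	unsigned int ocram_used = display_width * 8 * 4 +
				  (bitmap_width - display_width) * 7 * 4;

	/* prints: used 97280 of 65536 bytes (overrun) */
	printf("used %u of %u bytes (%s)\n", ocram_used, ocram_alloc,
	       ocram_used > ocram_alloc ? "overrun" : "ok");
	return 0;
}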

As the stride increases, the PRE eventually overruns the ocram and...
ends up in the "ocram aliased" area, where it overwrites the ocram in
use by the vpu/coda !

I could not find any PRE register setting that changes the used ocram area.

Sven


Re: objtool segfault in 5.10 kernels with binutils-2.36.1

2021-02-12 Thread Josh Poimboeuf
On Thu, Feb 11, 2021 at 05:16:56PM +, Ken Moffat wrote:
> Hi,
> 
> In 5.10 kernels up to and including 5.10.15, when trying to build the
> kernel for an x86_64 Skylake using binutils-2.36.1, gcc-10.2 and
> glibc-2.33, I get a segfault in objtool if the ORC unwinder is
> enabled.
> 
> This has already been fixed in 5.11 by 'objtool: Fix seg fault with
> Clang non-section symbols'
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/patch/?id=44f6a7c0755d8dd453c70557e11687bb080a6f21
> 
> So can this be added to 5.10 stable, please ?
> 
> Please CC me as I am no longer subscribed.

Hi Ken,

I agree that needs to be backported (and my bad for not marking it as
stable to begin with).

Greg, this also came up in another thread. Are you pulling that one in,
or do you want me to send it to the stable list?

-- 
Josh



Re: [PATCH 2/2] rcu-tasks: add RCU-tasks self tests

2021-02-12 Thread Paul E. McKenney
On Fri, Feb 12, 2021 at 10:12:07PM +0100, Uladzislau Rezki wrote:
> On Fri, Feb 12, 2021 at 08:20:59PM +0100, Sebastian Andrzej Siewior wrote:
> > On 2020-12-09 21:27:32 [+0100], Uladzislau Rezki (Sony) wrote:
> > > Add self tests for checking of RCU-tasks API functionality.
> > > It covers:
> > > - wait API functions;
> > > - invoking/completion call_rcu_tasks*().
> > > 
> > > Self-tests are run when CONFIG_PROVE_RCU kernel parameter is set.
> > 
> > I just bisected to this commit. By booting with `threadirqs' I end up
> > with:
> > [0.176533] Running RCU-tasks wait API self tests
> > 
> > No stall warning or so.
> > It boots again with:
> > 
> > diff --git a/init/main.c b/init/main.c
> > --- a/init/main.c
> > +++ b/init/main.c
> > @@ -1489,6 +1489,7 @@ void __init console_on_rootfs(void)
> > fput(file);
> >  }
> >  
> > +void rcu_tasks_initiate_self_tests(void);
> >  static noinline void __init kernel_init_freeable(void)
> >  {
> > /*
> > @@ -1514,6 +1515,7 @@ static noinline void __init kernel_init_freeable(void)
> >  
> > rcu_init_tasks_generic();
> > do_pre_smp_initcalls();
> > +   rcu_tasks_initiate_self_tests();
> > lockup_detector_init();
> >  
> > smp_init();
> > diff --git a/kernel/rcu/tasks.h b/kernel/rcu/tasks.h
> > --- a/kernel/rcu/tasks.h
> > +++ b/kernel/rcu/tasks.h
> > @@ -1266,7 +1266,7 @@ static void test_rcu_tasks_callback(struct rcu_head 
> > *rhp)
> > rttd->notrun = true;
> >  }
> >  
> > -static void rcu_tasks_initiate_self_tests(void)
> > +void rcu_tasks_initiate_self_tests(void)
> >  {
> > pr_info("Running RCU-tasks wait API self tests\n");
> >  #ifdef CONFIG_TASKS_RCU
> > @@ -1322,7 +1322,6 @@ void __init rcu_init_tasks_generic(void)
> >  #endif
> >  
> > // Run the self-tests.
> > -   rcu_tasks_initiate_self_tests();
> >  }
> >  
> >  #else /* #ifdef CONFIG_TASKS_RCU_GENERIC */
> > 
> > > Signed-off-by: Uladzislau Rezki (Sony) 

Apologies for the hassle!  My testing clearly missed this combination
of CONFIG_PROVE_RCU=y and threadirqs=1.  :-(

But at least I can easily reproduce this hang as follows:

tools/testing/selftests/rcutorture/bin/kvm.sh --allcpus --duration 2 --configs 
"TREE03" --kconfig "CONFIG_DEBUG_LOCK_ALLOC=y CONFIG_PROVE_LOCKING=y" 
--bootargs "threadirqs=1" --trust-make

Sadly, I cannot take your patch because that simply papers over the
fact that early boot use of synchronize_rcu_tasks() is broken in this
particular configuration, which will likely eventually bite others now
that init_kprobes() has been moved earlier in boot:

1b04fa990026 ("rcu-tasks: Move RCU-tasks initialization to before 
early_initcall()")
Link: https://lore.kernel.org/rcu/87eekfh80a@dja-thinkpad.axtens.net/
Fixes: 36dadef23fcc ("kprobes: Init kprobes in early_initcall")

> > Sebastian
> >
> We should be able to use call_rcu_tasks() in the *initcall() callbacks.
> The problem is that, ksoftirqd threads are not spawned by the time when
> an rcu_init_tasks_generic() is invoked:
> 
> diff --git a/init/main.c b/init/main.c
> index c68d784376ca..e6106bb12b2d 100644
> --- a/init/main.c
> +++ b/init/main.c
> @@ -954,7 +954,6 @@ asmlinkage __visible void __init __no_sanitize_address 
> start_kernel(void)
>   rcu_init_nohz();
>   init_timers();
>   hrtimers_init();
> - softirq_init();
>   timekeeping_init();
>  
>   /*
> @@ -1512,6 +1511,7 @@ static noinline void __init kernel_init_freeable(void)
>  
>   init_mm_internals();
>  
> + softirq_init();
>   rcu_init_tasks_generic();
>   do_pre_smp_initcalls();
>   lockup_detector_init();
> diff --git a/kernel/softirq.c b/kernel/softirq.c
> index 9d71046ea247..cafa55c496d0 100644
> --- a/kernel/softirq.c
> +++ b/kernel/softirq.c
> @@ -630,6 +630,7 @@ void __init softirq_init(void)
>   &per_cpu(tasklet_hi_vec, cpu).head;
>   }
>  
> + spawn_ksoftirqd();

We need a forward reference to allow this to build, but with that added,
my test case passes.  Good show!

>   open_softirq(TASKLET_SOFTIRQ, tasklet_action);
>   open_softirq(HI_SOFTIRQ, tasklet_hi_action);
>  }
> @@ -732,7 +733,6 @@ static __init int spawn_ksoftirqd(void)
>  
>   return 0;
>  }
> -early_initcall(spawn_ksoftirqd);
>  
>  /*
>   * [ These __weak aliases are kept in a separate compilation unit, so that
> 
> Any thoughts?

One likely problem is that there are almost certainly parts of the kernel
that need softirq_init() to stay roughly where it is.  So, is it possible
to leave softirq_init() where it is, and to arrange for spawn_ksoftirqd()
to be invoked just before rcu_init_tasks_generic() is called?

For my part, I will look into what is required to make Tasks RCU do
without softirq during boot, for example, by looking carefully at where
in boot RCU grace periods are unconditionally expedited.  Just in case
adjusting softirq has unforeseen side effects.

Thanx, Paul


Re: [PATCH] proc: Convert S_ permission uses to octal

2021-02-12 Thread Eric W. Biederman
Matthew Wilcox  writes:

> On Fri, Feb 12, 2021 at 04:01:48PM -0600, Eric W. Biederman wrote:
>> Joe Perches  writes:
>> 
>> > Convert S_ permissions to the more readable octal.
>> >
>> > Done using:
>> > $ ./scripts/checkpatch.pl -f --fix-inplace --types=SYMBOLIC_PERMS 
>> > fs/proc/*.[ch]
>> >
>> > No difference in generated .o files allyesconfig x86-64
>> >
>> > Link:
>> > https://lore.kernel.org/lkml/ca+55afw5v23t-zvdzp-mmd_eyxf8wbafwwb59934fv7g21u...@mail.gmail.com/
>> 
>> 
>> I will be frank.  I don't know what 0644 means.  I can never remember
>> which bit is read, write or execute.  So I like symbolic constants.
>
> Heh, I'm the other way, I can't remember what S_IRUGO means.
>
> but I think there's another way which improves the information
> density:
>
> #define DIR_RO_ALL(NAME, iops, fops)  DIR(NAME, 0555, iops, fops)
> ...
> (or S_IRUGO or whatever expands to 0555)
>
> There's really only a few combinations --
>   root read-only,
>   everybody read-only
>   root-write, others-read
>   everybody-write
>
> and execute is only used by proc for directories, not files, so I think
> there's only 8 combinations we'd need (and everybody-write is almost
> unused ...)

I guess it depends on which part of proc.  For fs/proc/base.c and its
per-process relatives, something like that seems reasonable.

I don't know about fs/proc/generic.c where everyone from all over the
kernel registers new proc entries.
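For a generic entry, the two styles being compared look like this
(the "driver/foo" path and foo_proc_ops are just placeholders):

	proc_create("driver/foo", S_IRUSR | S_IRGRP | S_IROTH, NULL, &foo_proc_ops);
	proc_create("driver/foo", 0444, NULL, &foo_proc_ops);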

>> Perhaps we can do something like:
>> 
>> #define S_IRWX 7
>> #define S_IRW_ 6
>> #define S_IR_X 5
>> #define S_IR__ 4
>> #define S_I_WX 3
>> #define S_I_W_ 2
>> #define S_I__X 1
>> #define S_I___ 0
>> 
>> #define MODE(TYPE, USER, GROUP, OTHER) \
>>  (((S_IF##TYPE) << 9) | \
>>  ((S_I##USER)  << 6) | \
>>  ((S_I##GROUP) << 3) | \
>>  (S_I##OTHER))
>> 
>> Which would be used something like:
>> MODE(DIR, RWX, R_X, R_X)
>> MODE(REG, RWX, R__, R__)
>> 
>> Something like that should be able to address the readability while
>> still using symbolic constants.
>
> I think that's been proposed before.

I don't think it has ever been shot down.  Just no one cared enough to
implement it.

Come to think of it, that has the nice property that if we cared we
could make it type-safe as well, something we can't do with the octal
for obvious reasons.
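For illustration, here is a user-space sketch of the idea that compiles
as-is.  The P_### names are placeholders, and the file-type constant is
ORed in directly rather than shifted, since the real S_IF* values
already occupy the high mode bits:

#include <stdio.h>
#include <sys/stat.h>

#define P_RWX 7
#define P_RW_ 6
#define P_R_X 5
#define P_R__ 4
#define P__WX 3
#define P__W_ 2
#define P___X 1
#define P____ 0

#define MODE(TYPE, USER, GROUP, OTHER) \
	((S_IF##TYPE) | ((P_##USER) << 6) | ((P_##GROUP) << 3) | (P_##OTHER))

int main(void)
{
	printf("%o\n", MODE(DIR, R_X, R_X, R_X));	/* prints 40555 */
	printf("%o\n", MODE(REG, RW_, R__, R__));	/* prints 100644 */
	return 0;
}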

Eric



Re: [PATCH 5/5] ima: enable loading of build time generated key to .ima keyring

2021-02-12 Thread Jarkko Sakkinen
On Thu, Feb 11, 2021 at 02:54:35PM -0500, Nayna Jain wrote:
> The kernel currently only loads the kernel module signing key onto
> the builtin trusted keyring. To support IMA, load the module signing
> key selectively either onto builtin or ima keyring based on MODULE_SIG
 ~~~
 IMA


> or MODULE_APPRAISE_MODSIG config respectively; and loads the CA kernel
> key onto builtin trusted keyring.
> 
> Signed-off-by: Nayna Jain 

/Jarkko

> ---
>  certs/system_keyring.c| 56 +++
>  include/keys/system_keyring.h |  9 +-
>  security/integrity/digsig.c   |  4 +++
>  3 files changed, 55 insertions(+), 14 deletions(-)
> 
> diff --git a/certs/system_keyring.c b/certs/system_keyring.c
> index 798291177186..0bbbe501f8a7 100644
> --- a/certs/system_keyring.c
> +++ b/certs/system_keyring.c
> @@ -26,6 +26,7 @@ static struct key *platform_trusted_keys;
>  
>  extern __initconst const u8 system_certificate_list[];
>  extern __initconst const unsigned long system_certificate_list_size;
> +extern __initconst const unsigned long module_cert_size;
>  
>  /**
>   * restrict_link_to_builtin_trusted - Restrict keyring addition by built in CA
> @@ -131,19 +132,12 @@ static __init int system_trusted_keyring_init(void)
>   */
>  device_initcall(system_trusted_keyring_init);
>  
> -/*
> - * Load the compiled-in list of X.509 certificates.
> - */
> -static __init int load_system_certificate_list(void)
> +static __init int load_cert(const u8 *p, const u8 *end, struct key *keyring,
> + unsigned long flags)
>  {
>   key_ref_t key;
> - const u8 *p, *end;
>   size_t plen;
>  
> - pr_notice("Loading compiled-in X.509 certificates\n");
> -
> - p = system_certificate_list;
> - end = p + system_certificate_list_size;
>   while (p < end) {
>   /* Each cert begins with an ASN.1 SEQUENCE tag and must be more
>* than 256 bytes in size.
> @@ -158,16 +152,15 @@ static __init int load_system_certificate_list(void)
>   if (plen > end - p)
>   goto dodgy_cert;
>  
> - key = key_create_or_update(make_key_ref(builtin_trusted_keys, 1),
> + key = key_create_or_update(make_key_ref(keyring, 1),
>  "asymmetric",
>  NULL,
>  p,
>  plen,
>  ((KEY_POS_ALL & ~KEY_POS_SETATTR) |
>  KEY_USR_VIEW | KEY_USR_READ),
> -KEY_ALLOC_NOT_IN_QUOTA |
> -KEY_ALLOC_BUILT_IN |
> -KEY_ALLOC_BYPASS_RESTRICTION);
> +flags);
> +
>   if (IS_ERR(key)) {
>   pr_err("Problem loading in-kernel X.509 certificate (%ld)\n",
>  PTR_ERR(key));
> @@ -185,6 +178,43 @@ static __init int load_system_certificate_list(void)
>   pr_err("Problem parsing in-kernel X.509 certificate list\n");
>   return 0;
>  }
> +
> +__init int load_module_cert(struct key *keyring, unsigned long flags)
> +{
> + const u8 *p, *end;
> +
> + if (!IS_ENABLED(CONFIG_IMA_APPRAISE_MODSIG))
> + return 0;
> +
> + pr_notice("Loading compiled-in module X.509 certificates\n");
> +
> + p = system_certificate_list;
> + end = p + module_cert_size;
> + load_cert(p, end, keyring, flags);
> +
> + return 0;
> +}
> +
> +/*
> + * Load the compiled-in list of X.509 certificates.
> + */
> +static __init int load_system_certificate_list(void)
> +{
> + const u8 *p, *end;
> +
> + pr_notice("Loading compiled-in X.509 certificates\n");
> +
> +#ifdef CONFIG_MODULE_SIG
> + p = system_certificate_list;
> +#else
> + p = system_certificate_list + module_cert_size;
> +#endif
> + end = p + system_certificate_list_size;
> + load_cert(p, end, builtin_trusted_keys, KEY_ALLOC_NOT_IN_QUOTA |
> + KEY_ALLOC_BUILT_IN |
> + KEY_ALLOC_BYPASS_RESTRICTION);
> + return 0;
> +}
>  late_initcall(load_system_certificate_list);
>  
>  #ifdef CONFIG_SYSTEM_DATA_VERIFICATION
> diff --git a/include/keys/system_keyring.h b/include/keys/system_keyring.h
> index fb8b07daa9d1..e91c03376599 100644
> --- a/include/keys/system_keyring.h
> +++ b/include/keys/system_keyring.h
> @@ -16,9 +16,16 @@ extern int restrict_link_by_builtin_trusted(struct key *keyring,
>   const struct key_type *type,
>   const union key_payload *payload,
>   struct key *restriction_key);
> -
> +extern __init int l

RE: [RFC] IRQ handlers run with some high-priority interrupts(not NMI) enabled on some platform

2021-02-12 Thread Song Bao Hua (Barry Song)


> -Original Message-
> From: Arnd Bergmann [mailto:a...@kernel.org]
> Sent: Saturday, February 13, 2021 12:06 PM
> To: Song Bao Hua (Barry Song) 
> Cc: t...@linutronix.de; gre...@linuxfoundation.org; a...@arndb.de;
> ge...@linux-m68k.org; fun...@jurai.org; ph...@gnu.org; cor...@lwn.net;
> mi...@redhat.com; linux-m...@lists.linux-m68k.org;
> fth...@telegraphics.com.au; linux-kernel@vger.kernel.org
> Subject: Re: [RFC] IRQ handlers run with some high-priority interrupts(not NMI) enabled on some platform
> 
> On Sat, Feb 13, 2021 at 12:00 AM Song Bao Hua (Barry Song)
>  wrote:
> > > -Original Message-
> > > From: Arnd Bergmann [mailto:a...@kernel.org]
> > > Sent: Saturday, February 13, 2021 11:34 AM
> > > To: Song Bao Hua (Barry Song) 
> > > Cc: t...@linutronix.de; gre...@linuxfoundation.org; a...@arndb.de;
> > > ge...@linux-m68k.org; fun...@jurai.org; ph...@gnu.org; cor...@lwn.net;
> > > mi...@redhat.com; linux-m...@lists.linux-m68k.org;
> > > fth...@telegraphics.com.au; linux-kernel@vger.kernel.org
> > > Subject: Re: [RFC] IRQ handlers run with some high-priority interrupts(not NMI) enabled on some platform
> > >
> > > On Fri, Feb 12, 2021 at 2:18 AM Song Bao Hua (Barry Song)
> > >  wrote:
> > >
> > > > So I am requesting comments on:
> > > > 1. are we expecting all interrupts except NMI to be disabled in irq handler,
> > > > or do we actually allow some high-priority interrupts between low and NMI
> > > > to come in some platforms?
> > >
> > > I tried to come to an answer but this does not seem particularly 
> > > well-defined.
> > > There are a few things I noticed:
> > >
> > > - going through the local_irq_save()/restore() implementations on all
> > >   architectures, I did not find any other ones besides m68k that leave
> > >   high-priority interrupts enabled. I did see that at least alpha and 
> > > openrisc
> > >   are designed to support that in hardware, but the code just leaves the
> > >   interrupts disabled.
> >
> > The case is a little different. Explicit local_irq_save() does disable all
> > high-priority interrupts on m68k. The only difference is that m68k's
> > arch_irqs_disabled() will return true while low-priority interrupts are
> > masked and high-priority ones are still open. M68k's hardirq handlers also
> > run in this context, with high-priority interrupts enabled.
> 
> My point was that on most other architectures, local_irq_save()/restore()
> always disables/enables all interrupts, while on m68k it restores the
> specific level they were on before. On alpha, it does the same as on m68k,
> but then the top-level interrupt handler just disables them all before calling
> into any other code.

That's what I think m68k should do as well.

It looks weird that nested interrupts can come in while arch_irqs_disabled()
is true on m68k: masking only the low-priority interrupts, with the
high-priority ones still enabled, is enough to make m68k's
arch_irqs_disabled() return true, and that is exactly the environment
m68k's irq handlers run in.

So I was actually trying to warn about this unusual case: interrupts
get nested while both in_hardirq() and irqs_disabled() are true.

diff --git a/include/linux/hardirq.h b/include/linux/hardirq.h
index 7c9d6a2d7e90..b8ca27555c76 100644
--- a/include/linux/hardirq.h
+++ b/include/linux/hardirq.h
@@ -32,6 +32,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter()						\
 	do {							\
+		WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interrupts\n"); \
 		preempt_count_add(HARDIRQ_OFFSET);		\
 		lockdep_hardirq_enter();			\
 		account_hardirq_enter(current);			\
@@ -44,6 +45,7 @@ static __always_inline void rcu_irq_enter_check_tick(void)
  */
 #define __irq_enter_raw()					\
 	do {							\
+		WARN_ONCE(in_hardirq() && irqs_disabled(), "nested interrupts\n"); \
 		preempt_count_add(HARDIRQ_OFFSET);		\
 		lockdep_hardirq_enter();			\
 	} while (0)

And I also think it would be better for m68k's arch_irqs_disabled() to
return true only when both low- and high-priority interrupts are
disabled, rather than trying to mute this warning in genirq with a
weaker condition (a sketch of that stricter check follows at the end):

irqreturn_t __handle_irq_event_percpu(struct irq_desc *desc, unsigned int *flags)
{
	...

	trace_irq_handler_entry(irq, action);
	res = action->handler(irq, action->dev_id);
	trace_irq_handler_exit(irq, action, res);

	if (WARN_ONCE(!irqs_disabled(), "irq %u handler %pS enabled interrupts\n",
		      irq, action->handler))
		local_irq_disable();
}

This warning is not triggered on m68k because its arch_irqs_disabled()
returns true even though high-priority interrupts are still enabled.
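A rough, untested sketch of that stricter check, only to illustrate the
idea (the SR interrupt priority mask sits in bits 8-10, and level 7
blocks all maskable interrupts):

static inline bool arch_irqs_disabled_flags(unsigned long flags)
{
	/* report "disabled" only when everything below NMI is masked */
	return (flags & 0x0700) == 0x0700;
}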
