Re: [PATCH] mtd: spi-nor: sfdp: Fix out of bound array access

2021-02-18 Thread Mathieu Dubois-Briand
Hi,

I just came across this commit (9166f4af32db) in spi-nor/for-5.12:
https://git.kernel.org/pub/scm/linux/kernel/git/mtd/linux.git/commit/?h=spi-nor/for-5.12&id=9166f4af32db74e1544a2149aef231ff24515ea3.

So I believe this patch isn't needed anymore.

Thanks,
Mathieu


Re: [PATCH v12 13/14] mm/vmalloc: Hugepage vmalloc mappings

2021-02-18 Thread Nicholas Piggin
Excerpts from Ding Tianhong's message of February 19, 2021 1:45 pm:
> Hi Nicholas:
> 
> I met some problem for this patch, like this:
> 
> kva = vmalloc(3*1024k);
> 
> remap_vmalloc_range(xxx, kva, xxx)
> 
> It failed because the check for page_count(page) returns early, which
> breaks the existing logic in some modules,
> because the new huge page is not a valid compound page.

Hey Ding, that's a good catch. How are you testing this stuff, do you 
have a particular driver that does this?

> I think some people are really not used to the change in vmalloc where
> small pages are transparently replaced by a hugepage
> when the size is bigger than PMD_SIZE.

I think in this case vmalloc could allocate the large page as a compound
page, which would solve this problem (without having actually
tested it).

> can we think about adding a new static huge page option to fix it? e.g. use a
> new vmalloc_huge_xxx function to distinguish it from the current function, so
> the user could choose either transparent hugepages or static hugepages for
> vmalloc.

Yeah that's a good question, there are a few things in the huge vmalloc
code that account things as small pages, and callers can't assume large or
small. If there is a benefit from forcing large pages, that could certainly
be added.

Interestingly, remap_vmalloc_range in theory could map the pages as 
large in userspace as well. That takes more work but if something
really needs that for performance, it could be done.

Thanks,
Nick


[PATCH] Revert "driver core: Set fw_devlink=on by default"

2021-02-18 Thread Greg Kroah-Hartman
This reverts commit e590474768f1cc04852190b61dec692411b22e2a.

While things are _almost_ there and working for almost all systems,
there are still reported regressions happening, so let's revert this
default for 5.12.  We can bring it back in linux-next after 5.12-rc1 is
out to get more testing and hopefully solve the remaining different
subsystem and driver issues that people are running into.

Cc: Saravana Kannan 
Signed-off-by: Greg Kroah-Hartman 
---
 drivers/base/core.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/base/core.c b/drivers/base/core.c
index ea710b33bda6..afc6f9ce6235 100644
--- a/drivers/base/core.c
+++ b/drivers/base/core.c
@@ -1502,7 +1502,7 @@ static void device_links_purge(struct device *dev)
 #define FW_DEVLINK_FLAGS_RPM   (FW_DEVLINK_FLAGS_ON | \
 DL_FLAG_PM_RUNTIME)
 
-static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_ON;
+static u32 fw_devlink_flags = FW_DEVLINK_FLAGS_PERMISSIVE;
 static int __init fw_devlink_setup(char *arg)
 {
if (!arg)
-- 
2.30.1



Re: [PATCH] cpufreq: schedutil: Don't consider freq reduction to busy CPU if need_freq_update is set

2021-02-18 Thread Viresh Kumar
On 19-02-21, 14:41, Yue Hu wrote:
> On Fri, 19 Feb 2021 09:39:33 +0530
> Viresh Kumar  wrote:
> 
> > On 19-02-21, 11:38, Yue Hu wrote:
> > > There's a possibility: we will use the previous freq to update if
> > > next_f is reduced for busy CPU if need_freq_update is set in
> > > sugov_update_next_freq().  
> > 
> > Right.
> > 
> > > This possibility would happen now? And this
> > > update is what we want if it happens?  
> > 
> > This is exactly what we want here, don't reduce speed for busy CPU,
> 
> I understand it should not skip this update but set the same freq as
> the previous one again for the special case if need_freq_update is set. Am
> I right?

The special check, about not reducing freq if CPU had been busy
recently, doesn't have anything to do with need_freq_update.

Though previously we added the need_freq_update check there to make
sure we accounted for any recent policy min/max change and didn't skip
the freq update. That can't happen anymore, so we don't need
any check here related to need_freq_update.

If you still have doubt, please explain your concern in detail with an
example as I am failing to understand it.

-- 
viresh


Re: [PATCH v2 1/2] staging: comedi: cast function output to assigned variable type

2021-02-18 Thread Atul Gopinathan
On Fri, Feb 19, 2021 at 09:55:14AM +0300, Dan Carpenter wrote:
> No problem.  These days I have fibre to my house, but I still remember
> trying to clone the kernel when I could only buy 20MB of data at a
> time.  :P

Whoa, that's tough! Respect to you for still trying to contribute
to the kernel. Hope no one else ends up in such a situation. :D

Regards,
Atul


Re: [PATCH] staging: wlan-ng: Fixed incorrect type warning in p80211netdev.c

2021-02-18 Thread Ivan Safonov

On 2/17/21 6:42 PM, pritthijit.nath at icloud.com (Pritthijit Nath) wrote:

This change fixes a sparse warning "incorrect type in argument 1
(different address spaces)".

Signed-off-by: Pritthijit Nath 
---
  drivers/staging/wlan-ng/p80211netdev.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/staging/wlan-ng/p80211netdev.c b/drivers/staging/wlan-ng/p80211netdev.c
index 6f9666dc0277..70570e8a5ad2 100644
--- a/drivers/staging/wlan-ng/p80211netdev.c
+++ b/drivers/staging/wlan-ng/p80211netdev.c
@@ -569,7 +569,7 @@ static int p80211knetdev_do_ioctl(struct net_device *dev,
 goto bail;
 }
  
-   msgbuf = memdup_user(req->data, req->len);
+   msgbuf = memdup_user((void __user *)req->data, req->len);
 if (IS_ERR(msgbuf)) {
 result = PTR_ERR(msgbuf);
 goto bail;



Reviewed-by: Ivan Safonov 


[PATCH] drm/radeon/dpm: fix non-restricted types with le16_to_cpu()

2021-02-18 Thread Yang Li
Fix the following sparse warnings:
drivers/gpu/drm/radeon/rv6xx_dpm.c:1798:21: warning: cast to restricted __le32
drivers/gpu/drm/radeon/rv6xx_dpm.c:1799:22: warning: cast to restricted __le16
drivers/gpu/drm/radeon/rv6xx_dpm.c:1800:23: warning: cast to restricted __le16

Reported-by: Abaci Robot 
Signed-off-by: Yang Li 
---
 drivers/gpu/drm/radeon/rv6xx_dpm.c | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/drivers/gpu/drm/radeon/rv6xx_dpm.c b/drivers/gpu/drm/radeon/rv6xx_dpm.c
index 69d380f..e6ab99e 100644
--- a/drivers/gpu/drm/radeon/rv6xx_dpm.c
+++ b/drivers/gpu/drm/radeon/rv6xx_dpm.c
@@ -1795,9 +1795,9 @@ static void rv6xx_parse_pplib_non_clock_info(struct radeon_device *rdev,
 					     struct radeon_ps *rps,
 					     struct _ATOM_PPLIB_NONCLOCK_INFO *non_clock_info)
 {
-	rps->caps = le32_to_cpu(non_clock_info->ulCapsAndSettings);
-	rps->class = le16_to_cpu(non_clock_info->usClassification);
-	rps->class2 = le16_to_cpu(non_clock_info->usClassification2);
+	rps->caps = le32_to_cpu((__le32 __force)non_clock_info->ulCapsAndSettings);
+	rps->class = le16_to_cpu((__le16 __force)non_clock_info->usClassification);
+	rps->class2 = le16_to_cpu((__le16 __force)non_clock_info->usClassification2);
 
if (r600_is_uvd_state(rps->class, rps->class2)) {
rps->vclk = RV6XX_DEFAULT_VCLK_FREQ;
-- 
1.8.3.1



Re: [PATCH] RISC-V: Enable CPU Hotplug in defconfigs

2021-02-18 Thread Palmer Dabbelt

On Mon, 08 Feb 2021 21:46:20 PST (-0800), Anup Patel wrote:

The CPU hotplug support has been tested on QEMU, Spike, and SiFive
Unleashed so let's enable it by default in RV32 and RV64 defconfigs.

Signed-off-by: Anup Patel 
---
 arch/riscv/configs/defconfig  | 1 +
 arch/riscv/configs/rv32_defconfig | 1 +
 2 files changed, 2 insertions(+)

diff --git a/arch/riscv/configs/defconfig b/arch/riscv/configs/defconfig
index 8c3d1e451703..6c0625aa96c7 100644
--- a/arch/riscv/configs/defconfig
+++ b/arch/riscv/configs/defconfig
@@ -17,6 +17,7 @@ CONFIG_BPF_SYSCALL=y
 CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_SMP=y
+CONFIG_HOTPLUG_CPU=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y
diff --git a/arch/riscv/configs/rv32_defconfig b/arch/riscv/configs/rv32_defconfig
index 2c2cda6cc1c5..8dd02b842fef 100644
--- a/arch/riscv/configs/rv32_defconfig
+++ b/arch/riscv/configs/rv32_defconfig
@@ -18,6 +18,7 @@ CONFIG_SOC_SIFIVE=y
 CONFIG_SOC_VIRT=y
 CONFIG_ARCH_RV32I=y
 CONFIG_SMP=y
+CONFIG_HOTPLUG_CPU=y
 CONFIG_JUMP_LABEL=y
 CONFIG_MODULES=y
 CONFIG_MODULE_UNLOAD=y


Thanks, this is on for-next.


Re: [RFC][PATCH 2/2] x86/retpoline: Compress retpolines

2021-02-18 Thread Borislav Petkov
On Thu, Feb 18, 2021 at 05:59:40PM +0100, Peter Zijlstra wrote:
> By using int3 as a speculation fence instead of lfence, we can shrink
> the longest alternative to just 15 bytes:
> 
>   0:   e8 05 00 00 00  callq  a <.altinstr_replacement+0xa>
>   5:   f3 90   pause  
>   7:   cc  int3   
>   8:   eb fb   jmp5 <.altinstr_replacement+0x5>
>   a:   48 89 04 24 mov%rax,(%rsp)
>   e:   c3  retq   
> 
> This means we can change the alignment from 32 to 16 bytes and get 4
> retpolines per cacheline, $I win.

You mean I$ :)

In any case, for both:

Reviewed-by: Borislav Petkov 

and it looks real nice here, the size:

 readelf -s vmlinux | grep __x86_indirect
  78966: 81c023e0    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  81653: 81c02390    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  82338: 81c02430    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  82955: 81c02380    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  85057: 81c023f0    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  89996: 81c023a0    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  91094: 81c02400    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  91278: 81c023b0    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  92015: 81c02360    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  92722: 81c023c0    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  97062: 81c02410    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  98687: 81c023d0    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  99076: 81c02350    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
  99500: 81c02370    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]
 100579: 81c02420    15 FUNC    GLOBAL DEFAULT    1 __x86_indirect_t[...]

-- 
Regards/Gruss,
Boris.

https://people.kernel.org/tglx/notes-about-netiquette


Re: 5.10 LTS Kernel: 2 or 6 years?

2021-02-18 Thread Jari Ruusu
On Thursday, February 18, 2021 7:44 PM, Greg Kroah-Hartman 
 wrote:
> > It was the other way around. Fine working in-tree driver got
> > broken by backported "fixes". I did mention bit-rot.
>
> It did? Please let us stable maintainers know about, we will always
> gladly revert problems patches. What commits caused the problem?

I don't have a list of commits for you. It took me a long time to
figure out that it was iwlwifi that was causing those problems.

In-tree iwlwifi on 4.19.y kernels needs professional quality
locking audit and backporting of necessary fixes from upstream
Intel out-of-tree version.

> So something in the 4.9.y and 4.14.y stable kernels caused a regression,
> can you please do 'git bisect' to let us know what broke?

My ability to do WiFi tests on that laptop computer is limited
for now. Earlier, that laptop's connectivity to the world was through a
mobile WiFi router. That mobile WiFi router no longer has a
SIM-card, so there is no connectivity through that anymore. That laptop's
connectivity to the world is now through wired ethernet to fiber.

It was actually this switch to ethernet/fiber that made me realize
the brokenness on in-tree iwlwifi on 4.19.y kernels. When in-tree
WiFi was not used, those problems never triggered. Switched back
to in-tree WiFi, and problems came back. Switched to out-of-tree
Intel version of iwlwifi, problems went away again. Then I looked
at the fixes in out-of-tree Intel version of iwlwifi that were
missing in in-tree version, and I understood why that was so.

Those stability issues on in-tree iwlwifi on 4.19.y kernels are
difficult to trigger. Sometimes it may take days to trigger it
once. Sometimes I was unlucky enough to trigger them more than
once a day. I say that two weeks of operation without issues are
needed to pronounce those issues gone.

Currently, special arrangements are needed for me to test WiFi at
all on that laptop computer, and those arrangements are something
I am not willing to commit for multi-week testing run. Sorry.

> And if 4.19.0 was always broken, why didn't you report that as well?

I didn't test early 4.19.y kernels.

--
Jari Ruusu  4096R/8132F189 12D6 4C3A DCDA 0AA4 27BD  ACDF F073 3C80 8132 F189



Re: [PATCH v4] tpm: fix reference counting for struct tpm_chip

2021-02-18 Thread Jarkko Sakkinen
On Wed, Feb 17, 2021 at 09:27:02PM -0400, Jason Gunthorpe wrote:
> On Thu, Feb 18, 2021 at 12:14:11AM +0200, Jarkko Sakkinen wrote:
> > On Tue, Feb 16, 2021 at 04:31:26PM +, David Laight wrote:
> > > ...
> > > > > > +   get_device(&chip->dev);
> > > > > > +   chip->devs.release = tpm_devs_release;
> > > > > > +   chip->devs.devt =
> > > > > > +   MKDEV(MAJOR(tpm_devt), chip->dev_num + TPM_NUM_DEVICES);
> > > > 
> > > > Isn't this less than 100 chars?
> > > 
> > > Still best kept under 80 if 'reasonable'?
> > > 
> > > Really it is just split in the wrong place:
> > >   chip->devs.devt = MKDEV(MAJOR(tpm_devt),
> > >   chip->dev_num + TPM_NUM_DEVICES);
> > 
> > 
> > Well it looks like crap IMHO. It would be more reasonable to have it on a single
> > line. And it is legit too, since it is accepted by checkpatch.
> > 
> > You might break the lines within 80 chars if it is somehow "logically"
> > consistent.
> 
> FWIW, I've become kind of tired of the style wishywashyness I've
> mostly been happy to accept anything that clang-format spits out for
> ordinary C constructs.

A. I would not mind if it was already merged. Since it isn't, I don't
   see the point in not fixing it.

> It is good enough and universally usable. If devs don't have it linked
> to their editor to format single expression or format selected blocks,
> they are missing out :)
> 
> The community consensus on style is quite unclear. Is 1 or 2 above the
> majority preference? Does this case fall under the new "use more than
> 80 cols if it improves readability?" I have no idea.

B. I need to maintain this, once it's merged.
C. A smaller diff for a critical bug fix. I actually allow style
   compromises for fixes to be backported *when* it makes the overall
   diff smaller.
D. Makes it more likely that future diffs stay small, as the whole thing is
   on a single code line.

> Frankly, for most people writing driver code, if they consistently use
> clang-format their work will be alot better than if they try to do it
> by hand. It takes a lot of experiance to reliably eyeball something
> close to the kernel style..

For me it gives a framework to review patches in multiple subsystems.
If I have to constantly think whether to allow this and that shift
from the kernel coding style, it makes the whole process for me more
fuzzy and chaotic.

As I said (A), it would not be end of the world if this had been
merged already. I also want to state that I do sometimes make mistakes
when reviewing code, and am happy to take critique from that :-)

> Jason

/Jarkko


Re: [PATCH RESEND V12 3/8] fuse: Definitions and ioctl for passthrough

2021-02-18 Thread Peng Tao
On Wed, Feb 17, 2021 at 9:41 PM Miklos Szeredi  wrote:
>
> On Mon, Jan 25, 2021 at 4:31 PM Alessio Balsini  wrote:
> >
> > Expose the FUSE_PASSTHROUGH interface to user space and declare all the
> > basic data structures and functions as the skeleton on top of which the
> > FUSE passthrough functionality will be built.
> >
> > As part of this, introduce the new FUSE passthrough ioctl, which allows
> > the FUSE daemon to specify a direct connection between a FUSE file and a
> > lower file system file. Such ioctl requires user space to pass the file
> > descriptor of one of its opened files through the fuse_passthrough_out
> > data structure introduced in this patch. This structure includes extra
> > fields for possible future extensions.
> > Also, add the passthrough functions for the set-up and tear-down of the
> > data structures and locks that will be used both when fuse_conns and
> > fuse_files are created/deleted.
> >
> > Signed-off-by: Alessio Balsini 
>
> [...]
>
> > diff --git a/include/uapi/linux/fuse.h b/include/uapi/linux/fuse.h
> > index 54442612c48b..9d7685ce0acd 100644
> > --- a/include/uapi/linux/fuse.h
> > +++ b/include/uapi/linux/fuse.h
> > @@ -360,6 +360,7 @@ struct fuse_file_lock {
> >  #define FUSE_MAP_ALIGNMENT (1 << 26)
> >  #define FUSE_SUBMOUNTS (1 << 27)
> >  #define FUSE_HANDLE_KILLPRIV_V2(1 << 28)
> > +#define FUSE_PASSTHROUGH   (1 << 29)
>
> This header has a version and a changelog.  Please update those as well.
>
> >
> >  /**
> >   * CUSE INIT request/reply flags
> > @@ -625,7 +626,7 @@ struct fuse_create_in {
> >  struct fuse_open_out {
> > uint64_tfh;
> > uint32_topen_flags;
> > -   uint32_tpadding;
> > +   uint32_tpassthrough_fh;
>
> I think it would be cleaner to add a FOPEN_PASSTHROUGH flag to
> explicitly request passthrough instead of just passing a non-null
> value to passthrough_fh.
>
> >  };
> >
> >  struct fuse_release_in {
> > @@ -828,6 +829,13 @@ struct fuse_in_header {
> > uint32_tpadding;
> >  };
> >
> > +struct fuse_passthrough_out {
> > +   uint32_tfd;
> > +   /* For future implementation */
> > +   uint32_tlen;
> > +   void*vec;
> > +};
>
> I don't see why we'd need these extensions. The ioctl just needs to
> establish an ID to open-file mapping that can be referenced in the
> regular protocol, i.e. it just needs to be passed an open file
> descriptor and return a unique ID.
>
> Mapping the fuse file's data to the underlying file's data is a
> different matter.  That can be an identity mapping established at open
> time (this is what this series does) or it can be an arbitrary extent
> mapping to one or more underlying open files, established at open time
> or on demand.  All of these can be done in band using the fuse
> protocol, no need to involve the ioctl mechanism.
>
> So I think we can just get rid of "struct fuse_passthrough_out"
> completely and use "uint32_t *" as the ioctl argument.
>
> What I think would be useful is to have an explicit
> FUSE_DEV_IOC_PASSTHROUGH_CLOSE ioctl, that would need to be called
> once the fuse server no longer needs this ID.   If this turns out to
> be a performance problem, we could still add the auto-close behavior
> with an explicit FOPEN_PASSTHROUGH_AUTOCLOSE flag later.
Hi Miklos,

W/o auto closing, what happens if user space daemon forgets to call
FUSE_DEV_IOC_PASSTHROUGH_CLOSE? Do we keep the ID alive somewhere?

Thanks,
Tao
-- 
Into Sth. Rich & Strange


Re: [PATCH] staging: rtl8723bs: make if-statement checkpatch.pl conform

2021-02-18 Thread Dan Carpenter
On Thu, Feb 18, 2021 at 07:50:27PM +, Kurt Manucredo wrote:
> Signed-off-by: Kurt Manucredo 
> ---
> 
> The preferred coding style is:
>   if (!StaAddr)
>   return;

This explanation should go above the Signed-off-by line.  Also the indenting is wrong.  And it's hard
to understand what you're saying.

> 
> thank you mr. dan carpenter

You're welcome.  (These sorts of comments do go below the --- cut off
line so that's fine.)

regards,
dan carpenter



[PATCH] perf report: Create option to disable raw event ordering

2021-02-18 Thread Jin Yao
Warning "dso not found" is reported when using "perf report -D".

 66702781413407 0x32c0 [0x30]: PERF_RECORD_SAMPLE(IP, 0x2): 28177/28177: 0x55e493e00563 period: 106578 addr: 0
  ... thread: perf:28177
  .. dso: 

 66702727832429 0x9dd8 [0x38]: PERF_RECORD_COMM exec: triad_loop:28177/28177

The PERF_RECORD_SAMPLE event (timestamp: 66702781413407) should be processed after
the PERF_RECORD_COMM event (timestamp: 66702727832429), but it is processed early.

So for most cases, it makes sense to keep the events ordered even in dump
mode. But it would also be useful to be able to disable ordered_events when
reporting a raw dump, to see events as they are stored in the perf.data file.

So now, set ordered_events by default to true and add a new option
'disable-order' to disable it. For example,

perf report -D --disable-order

Fixes: 977f739b7126b ("perf report: Disable ordered_events for raw dump")
Signed-off-by: Jin Yao 
---
 tools/perf/Documentation/perf-report.txt | 3 +++
 tools/perf/builtin-report.c  | 5 -
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/tools/perf/Documentation/perf-report.txt b/tools/perf/Documentation/perf-report.txt
index f546b5e9db05..87112e8d904e 100644
--- a/tools/perf/Documentation/perf-report.txt
+++ b/tools/perf/Documentation/perf-report.txt
@@ -224,6 +224,9 @@ OPTIONS
 --dump-raw-trace::
 Dump raw trace in ASCII.
 
+--disable-order::
+   Disable raw trace ordering.
+
 -g::
 
--call-graph=::
 Display call chains using type, min percent threshold, print limit,
diff --git a/tools/perf/builtin-report.c b/tools/perf/builtin-report.c
index 2a845d6cac09..0d65c98794a8 100644
--- a/tools/perf/builtin-report.c
+++ b/tools/perf/builtin-report.c
@@ -84,6 +84,7 @@ struct report {
boolnonany_branch_mode;
boolgroup_set;
boolstitch_lbr;
+   booldisable_order;
int max_stack;
struct perf_read_values show_threads_values;
struct annotation_options annotation_opts;
@@ -1296,6 +1297,8 @@ int cmd_report(int argc, const char **argv)
OPTS_EVSWITCH(),
	OPT_BOOLEAN(0, "total-cycles", &report.total_cycles_mode,
"Sort all blocks by 'Sampled Cycles%'"),
+	OPT_BOOLEAN(0, "disable-order", &report.disable_order,
+		    "Disable raw trace ordering"),
OPT_END()
};
struct perf_data data = {
@@ -1329,7 +1332,7 @@ int cmd_report(int argc, const char **argv)
if (report.mmaps_mode)
report.tasks_mode = true;
 
-   if (dump_trace)
+   if (dump_trace && report.disable_order)
report.tool.ordered_events = false;
 
if (quiet)
-- 
2.17.1



Re: [PATCH] Staging: rtl8723bs: code-style fix

2021-02-18 Thread Dan Carpenter
The subject is too vague.

On Thu, Feb 18, 2021 at 04:33:10PM +, Kurt Manucredo wrote:
> Signed-off-by: Kurt Manucredo 
> ---
> 
> Checkpatch complains the constant needs to be on the right side of the
> comparison. The preferred way is: 
> 

The commit message isn't complete and it has to go above the Signed-off-by
line.

regards,
dan carpenter



Re: [PATCH v2 1/2] staging: comedi: cast function output to assigned variable type

2021-02-18 Thread Dan Carpenter
No problem.  These days I have fibre to my house, but I still remember
trying to clone the kernel when I could only buy 20MB of data at a
time.  :P

regards,
dan carpenter



[PATCH] btrfs: ref-verify: use 'inline void' keyword ordering

2021-02-18 Thread Randy Dunlap
Fix build warnings of function signature when CONFIG_STACKTRACE is not
enabled by reordering the 'inline' and 'void' keywords.

../fs/btrfs/ref-verify.c:221:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
 static void inline __save_stack_trace(struct ref_action *ra)
../fs/btrfs/ref-verify.c:225:1: warning: ‘inline’ is not at beginning of declaration [-Wold-style-declaration]
 static void inline __print_stack_trace(struct btrfs_fs_info *fs_info,

Fixes: fd708b81d972 ("Btrfs: add a extent ref verify tool")
Signed-off-by: Randy Dunlap 
Cc: Josef Bacik 
Cc: David Sterba 
Cc: Chris Mason 
Cc: linux-bt...@vger.kernel.org
Cc: Andrew Morton 
---
Found in mmotm; applies to mainline.

Apparently we are doing more '-W' checking than when this change was
made in 2017.

 fs/btrfs/ref-verify.c |4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

--- mmotm-2021-0218-1829.orig/fs/btrfs/ref-verify.c
+++ mmotm-2021-0218-1829/fs/btrfs/ref-verify.c
@@ -218,11 +218,11 @@ static void __print_stack_trace(struct b
stack_trace_print(ra->trace, ra->trace_len, 2);
 }
 #else
-static void inline __save_stack_trace(struct ref_action *ra)
+static inline void __save_stack_trace(struct ref_action *ra)
 {
 }
 
-static void inline __print_stack_trace(struct btrfs_fs_info *fs_info,
+static inline void __print_stack_trace(struct btrfs_fs_info *fs_info,
   struct ref_action *ra)
 {
btrfs_err(fs_info, "  ref-verify: no stacktrace support");


Re: [PATCH] cpufreq: schedutil: Don't consider freq reduction to busy CPU if need_freq_update is set

2021-02-18 Thread Yue Hu
On Fri, 19 Feb 2021 09:39:33 +0530
Viresh Kumar  wrote:

> On 19-02-21, 11:38, Yue Hu wrote:
> > There's a possibility: we will use the previous freq to update if
> > next_f is reduced for busy CPU if need_freq_update is set in
> > sugov_update_next_freq().  
> 
> Right.
> 
> > This possibility would happen now? And this
> > update is what we want if it happens?  
> 
> This is exactly what we want here, don't reduce speed for busy CPU,

I understand it should not skip this update but set the same freq as
the previous one again for the special case if need_freq_update is set. Am
I right?

> but we also need to make sure we are in the policy's valid range
> which cpufreq core will take care of.
> 
> > This is related to another possible patch ready to send.  
> 
> I am not sure what's there to send now.

I will send it later, after figuring out the doubt above.

> 



Re: [PATCH] kthread: add kthread_mod_pending_delayed_work api

2021-02-18 Thread Yiwei Zhang‎
> 2. User triggered clean up races with the clean up triggered by
>   timeout. You try to handle this scenario by this patch.
Yes, exactly.

> 3. User does clean up after the clean up has already been done
>   by the timeout.
This case is well handled. So only (2) has a potential race.

Let me clarify a bit more here. The "clean up" is not the clean up
done when a process tears down, but actually a "post-work" to cancel
out an earlier "pre-work". The "pre-work" enqueues the delayed
"post-work" for the timeout purpose. That pair of operations can
happen repeatedly.

The racing is currently worked around by refcounting the delayed_work
container, and the later "post-work" will take care of the work
deallocation.

I mainly want to reach out to see if we agree that this is a valid API
to be supported by kthread. Alternatively, we could extend the existing
kthread_mod_delayed_work api to take another option to not re-queue if
the cancel failed.

Best,
Yiwei


Re: [PATCH ghak124 v3] audit: log nftables configuration change events

2021-02-18 Thread Richard Guy Briggs
On 2021-02-18 23:42, Florian Westphal wrote:
> Richard Guy Briggs  wrote:
> > > If they appear in a batch they will be ignored; if the batch consists of
> > > such non-modifying ops only, then nf_tables_commit() returns early
> > > because the transaction list is empty (nothing to do/change).
> > 
> > Ok, one little inconvenient question: what about GETOBJ_RESET?  That
> > looks like a hybrid that modifies kernel table counters and reports
> > synchronously.  That could be a special case call in
> > nf_tables_dump_obj() and nf_tables_getobj().  Will that cause a storm
> > per commit?
> 
> No, since they can't be part of a commit (they don't implement the
> 'call_batch' function).

Ok, good, so they should be safe (but still needs the gfp param to
audit_log_nfcfg() for atomic alloc in that obj reset callback).

> I'm not sure GETOBJ_RESET should be reported in the first place:
> RESET only affects expr internal state, and that state changes all the time
> anyway in response to network traffic.

We report a reset of the audit lost-message counter as a config change
since it affects the view that an admin has of the system. An unaccounted-for
reset could mislead an administrator into thinking things are alright when
some messages were lost and there was nothing to show for it. I could
see similar situations with network entity counters.

- RGB

--
Richard Guy Briggs 
Sr. S/W Engineer, Kernel Security, Base Operating Systems
Remote, Ottawa, Red Hat Canada
IRC: rgb, SunRaycer
Voice: +1.647.777.2635, Internal: (81) 32635



Re: [PATCH] staging: i2400m: use explicit host byte-order types in comparison

2021-02-18 Thread Greg Kroah-Hartman
On Fri, Feb 19, 2021 at 06:00:47AM +0530, karthik alapati wrote:
> convert le32 types to host byte-order types before
> comparison

That says what you did, but not _why_ you did it.  Please fix up and
resend.

thanks,

greg k-h


[PATCH] btrfs: Remove unused variable ret

2021-02-18 Thread Jiapeng Chong
Fix the following coccicheck warnings:

./fs/btrfs/disk-io.c:4403:5-8: Unneeded variable: "ret". Return "0" on
line 4411.

Reported-by: Abaci Robot 
Signed-off-by: Jiapeng Chong 
---
 fs/btrfs/disk-io.c | 9 -
 1 file changed, 4 insertions(+), 5 deletions(-)

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 07a2b4f..78ac205 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -52,7 +52,7 @@
 
 static void end_workqueue_fn(struct btrfs_work *work);
 static void btrfs_destroy_ordered_extents(struct btrfs_root *root);
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
+static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info);
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root);
 static int btrfs_destroy_marked_extents(struct btrfs_fs_info *fs_info,
@@ -4394,13 +4394,12 @@ static void btrfs_destroy_all_ordered_extents(struct btrfs_fs_info *fs_info)
btrfs_wait_ordered_roots(fs_info, U64_MAX, 0, (u64)-1);
 }
 
-static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
+static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
  struct btrfs_fs_info *fs_info)
 {
struct rb_node *node;
struct btrfs_delayed_ref_root *delayed_refs;
struct btrfs_delayed_ref_node *ref;
-   int ret = 0;
 
	delayed_refs = &trans->delayed_refs;
 
@@ -4408,7 +4407,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
	if (atomic_read(&delayed_refs->num_entries) == 0) {
		spin_unlock(&delayed_refs->lock);
btrfs_debug(fs_info, "delayed_refs has NO entry");
-   return ret;
+   return;
}
 
	while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
@@ -4474,7 +4473,7 @@ static int btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 
	spin_unlock(&delayed_refs->lock);
 
-   return ret;
+   return;
 }
 
 static void btrfs_destroy_delalloc_inodes(struct btrfs_root *root)
-- 
1.8.3.1



RE: [PATCH 2/4] iommu/vt-d: Enable write protect propagation from guest

2021-02-18 Thread Tian, Kevin
> From: Jacob Pan 
> Sent: Friday, February 19, 2021 5:31 AM
> 
> Write protect bit, when set, inhibits supervisor writes to the read-only
> pages. In guest supervisor shared virtual addressing (SVA), write-protect
> should be honored upon guest bind supervisor PASID request.
> 
> This patch extends the VT-d portion of the IOMMU UAPI to include WP bit.
> WPE bit of the  supervisor PASID entry will be set to match CPU CR0.WP bit.
> 
> Signed-off-by: Sanjay Kumar 
> Signed-off-by: Jacob Pan 
> ---
>  drivers/iommu/intel/pasid.c | 5 +
>  include/uapi/linux/iommu.h  | 3 ++-
>  2 files changed, 7 insertions(+), 1 deletion(-)
> 
> diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
> index 0b7e0e726ade..c7a2ec930af4 100644
> --- a/drivers/iommu/intel/pasid.c
> +++ b/drivers/iommu/intel/pasid.c
> @@ -763,6 +763,11 @@ intel_pasid_setup_bind_data(struct intel_iommu
> *iommu, struct pasid_entry *pte,
>   return -EINVAL;
>   }
>   pasid_set_sre(pte);
> + /* Enable write protect WP if guest requested */
> + if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE) {
> + if (pasid_enable_wpe(pte))
> + return -EINVAL;

We should call pasid_set_wpe directly, as this binding is about the guest
page table, and we can assume the guest has done whatever checks are required
(e.g. gcr0.wp) before setting this bit. pasid_enable_wpe has an additional
check on host cr0.wp and thus is logically incorrect here.

Thanks
Kevin

> + }
>   }
> 
>   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
> diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
> index 68cb558fe8db..33f3dc7a91de 100644
> --- a/include/uapi/linux/iommu.h
> +++ b/include/uapi/linux/iommu.h
> @@ -288,7 +288,8 @@ struct iommu_gpasid_bind_data_vtd {
>  #define IOMMU_SVA_VTD_GPASID_PWT	(1 << 3) /* page-level write through */
>  #define IOMMU_SVA_VTD_GPASID_EMTE	(1 << 4) /* extended mem type enable */
>  #define IOMMU_SVA_VTD_GPASID_CD	(1 << 5) /* PASID-level cache disable */
> -#define IOMMU_SVA_VTD_GPASID_LAST	(1 << 6)
> +#define IOMMU_SVA_VTD_GPASID_WPE	(1 << 6) /* Write protect enable */
> +#define IOMMU_SVA_VTD_GPASID_LAST	(1 << 7)
>   __u64 flags;
>   __u32 pat;
>   __u32 emt;
> --
> 2.25.1



Re: [PATCH] perf machine: Use true and false for bool variable

2021-02-18 Thread kajoljain



On 2/18/21 2:54 PM, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./tools/perf/util/machine.c:2000:9-10: WARNING: return of 0/1 in
> function 'symbol__match_regex' with return type bool.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  tools/perf/util/machine.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
> 
> diff --git a/tools/perf/util/machine.c b/tools/perf/util/machine.c
> index 1e9d3f9..f7ee29b 100644
> --- a/tools/perf/util/machine.c
> +++ b/tools/perf/util/machine.c
> @@ -1997,8 +1997,8 @@ int machine__process_event(struct machine *machine, union perf_event *event,
>  static bool symbol__match_regex(struct symbol *sym, regex_t *regex)
>  {
>   if (!regexec(regex, sym->name, 0, NULL, 0))
> - return 1;
> - return 0;
> + return true;
> + return false;
>  }
>  

Hi Jiapeng,
   Just a suggestion: can we make this check a single line, like this:

static bool symbol__match_regex(struct symbol *sym, regex_t *regex)
{
return regexec(regex, sym->name, 0, NULL, 0) == 0;
}

Thanks,
Kajol Jain

>  static void ip__resolve_ams(struct thread *thread,
> 


Re: [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-18 Thread Oscar Salvador

On 2021-02-19 03:10, Mike Kravetz wrote:

Those counts will be wrong as there are no huge pages on the node.

I'll think about this more tomorrow.
Pretty sure this is an issue, but I could be wrong.  Just wanted to give
a heads up.


Yes, this is a problem, although the fixup would be to check whether we 
have any hugepages.


Nevertheless, I think we should not be touching surplus at all but 
rather make the page temporary.
I am exploring making migrate_pages() handle free hugepages as Michal 
suggested, so the approach is cleaner and we do not need extra 
functions. I have yet to see if that is feasible, as some issues come to 
my mind, e.g. the page needs to be on a list to go to migrate_pages(), but 
if it is on that list, it is not on the hugepages freelist, and that could 
disrupt userspace as it could not dequeue hugepages if it demands them.
I have to check. Should that not be possible, we can always make the page 
temporary here.



--
Mike Kravetz


+   }
+   }
+
+   return ret;
+}
+
+bool isolate_or_dissolve_huge_page(struct page *page)
+{
+   struct hstate *h = NULL;
+   struct page *head;
+   bool ret = false;
+
+   spin_lock(&hugetlb_lock);
+   if (PageHuge(page)) {
+   head = compound_head(page);
+   h = page_hstate(head);
+   }
+   spin_unlock(&hugetlb_lock);
+
+   /*
+    * The page might have been dissolved from under our feet.
+    * If that is the case, return success as if we dissolved it
+    * ourselves.
+    */
+   if (!h)
+   return true;
+
+   /*
+* Fence off gigantic pages as there is a cyclic dependency
+* between alloc_contig_range and them.
+*/
+   if (hstate_is_gigantic(h))
+   return ret;
+
+   if (!page_count(head) && alloc_and_dissolve_huge_page(h, head))
+   ret = true;
+
+   return ret;
+}
+
 struct page *alloc_huge_page(struct vm_area_struct *vma,
unsigned long addr, int avoid_reserve)
 {



--
Oscar Salvador
SUSE L3


[PATCH 4/4] iommu/vt-d: Calculate and set flags for handle_mm_fault

2021-02-18 Thread Jacob Pan
Page requests originate from user page faults. Therefore, we
shall set FAULT_FLAG_USER.

FAULT_FLAG_REMOTE indicates that we are walking an mm which is not
guaranteed to be the same as the current->mm and should not be subject
to protection key enforcement. Therefore, we should set FAULT_FLAG_REMOTE
to avoid faults when both SVM and PKEY are used.

References: commit 1b2ee1266ea6 ("mm/core: Do not enforce PKEY permissions on 
remote mm access")
Reviewed-by: Raj Ashok 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 9 ++---
 1 file changed, 6 insertions(+), 3 deletions(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index ff7ae7cc17d5..7bfd20a24a60 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1086,6 +1086,7 @@ static irqreturn_t prq_event_thread(int irq, void *d)
struct intel_iommu *iommu = d;
struct intel_svm *svm = NULL;
int head, tail, handled = 0;
+   unsigned int flags = 0;
 
/* Clear PPR bit before reading head/tail registers, to
 * ensure that we get a new interrupt if needed. */
@@ -1186,9 +1187,11 @@ static irqreturn_t prq_event_thread(int irq, void *d)
if (access_error(vma, req))
goto invalid;
 
-   ret = handle_mm_fault(vma, address,
- req->wr_req ? FAULT_FLAG_WRITE : 0,
- NULL);
+   flags = FAULT_FLAG_USER | FAULT_FLAG_REMOTE;
+   if (req->wr_req)
+   flags |= FAULT_FLAG_WRITE;
+
+   ret = handle_mm_fault(vma, address, flags, NULL);
if (ret & VM_FAULT_ERROR)
goto invalid;
 
-- 
2.25.1



[PATCH 3/4] iommu/vt-d: Reject unsupported page request modes

2021-02-18 Thread Jacob Pan
When supervisor/privilege mode SVM is used, we bind init_mm.pgd with
a supervisor PASID. There should not be any page fault for init_mm.
Execution request with DMA read is also not supported.

This patch checks the PRQ descriptor for both unsupported configurations
and rejects both with invalid responses.

Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/svm.c | 12 +++-
 1 file changed, 11 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/svm.c b/drivers/iommu/intel/svm.c
index 23a1e4f58c54..ff7ae7cc17d5 100644
--- a/drivers/iommu/intel/svm.c
+++ b/drivers/iommu/intel/svm.c
@@ -1113,7 +1113,17 @@ static irqreturn_t prq_event_thread(int irq, void *d)
   ((unsigned long long *)req)[1]);
goto no_pasid;
}
-
+   /* We shall not receive page request for supervisor SVM */
+   if (req->pm_req && (req->rd_req | req->wr_req)) {
+   pr_err("Unexpected page request in Privilege Mode\n");
+   /* No need to find the matching sdev as for bad_req */
+   goto no_pasid;
+   }
+   /* DMA read with exec request is not supported. */
+   if (req->exe_req && req->rd_req) {
+   pr_err("Execution request not supported\n");
+   goto no_pasid;
+   }
if (!svm || svm->pasid != req->pasid) {
rcu_read_lock();
svm = ioasid_find(NULL, req->pasid, NULL);
-- 
2.25.1



[PATCH 1/4] iommu/vt-d: Enable write protect for supervisor SVM

2021-02-18 Thread Jacob Pan
Write protect bit, when set, inhibits supervisor writes to the read-only
pages. In supervisor shared virtual addressing (SVA), where page tables
are shared between CPU and DMA, IOMMU PASID entry WPE bit should match
CR0.WP bit in the CPU.
This patch sets WPE bit for supervisor PASIDs if CR0.WP is set.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 26 ++
 1 file changed, 26 insertions(+)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0cceaabc3ce6..0b7e0e726ade 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -410,6 +410,15 @@ static inline void pasid_set_sre(struct pasid_entry *pe)
	pasid_set_bits(&pe->val[2], 1 << 0, 1);
 }
 
+/*
+ * Setup the WPE (Write Protect Enable) field (Bit 132) of a
+ * scalable mode PASID entry.
+ */
+static inline void pasid_set_wpe(struct pasid_entry *pe)
+{
+   pasid_set_bits(&pe->val[2], 1 << 4, 1 << 4);
+}
+
 /*
  * Setup the P(Present) field (Bit 0) of a scalable mode PASID
  * entry.
@@ -553,6 +562,20 @@ static void pasid_flush_caches(struct intel_iommu *iommu,
}
 }
 
+static inline int pasid_enable_wpe(struct pasid_entry *pte)
+{
+   unsigned long cr0 = read_cr0();
+
+   /* CR0.WP is normally set but just to be sure */
+   if (unlikely(!(cr0 & X86_CR0_WP))) {
+   pr_err_ratelimited("No CPU write protect!\n");
+   return -EINVAL;
+   }
+   pasid_set_wpe(pte);
+
+   return 0;
+};
+
 /*
  * Set up the scalable mode pasid table entry for first only
  * translation type.
@@ -584,6 +607,9 @@ int intel_pasid_setup_first_level(struct intel_iommu *iommu,
return -EINVAL;
}
pasid_set_sre(pte);
+   if (pasid_enable_wpe(pte))
+   return -EINVAL;
+
}
 
if (flags & PASID_FLAG_FL5LP) {
-- 
2.25.1



[PATCH 0/4] Misc vSVA fixes for VT-d

2021-02-18 Thread Jacob Pan
Hi Baolu et al,

This is a collection of SVA-related fixes.

Thanks,
Jacob


Jacob Pan (4):
  iommu/vt-d: Enable write protect for supervisor SVM
  iommu/vt-d: Enable write protect propagation from guest
  iommu/vt-d: Reject unsupported page request modes
  iommu/vt-d: Calculate and set flags for handle_mm_fault

 drivers/iommu/intel/pasid.c | 31 +++
 drivers/iommu/intel/svm.c   | 21 +
 include/uapi/linux/iommu.h  |  3 ++-
 3 files changed, 50 insertions(+), 5 deletions(-)

-- 
2.25.1



Re: [PATCH] staging: emxx_udc: remove unused variable driver_desc

2021-02-18 Thread Greg Kroah-Hartman
On Thu, Feb 18, 2021 at 10:41:07PM -0500, Sean Behan wrote:
> Signed-off-by: Sean Behan 
> ---
>  drivers/staging/emxx_udc/emxx_udc.c | 1 -
>  1 file changed, 1 deletion(-)
> 
> diff --git a/drivers/staging/emxx_udc/emxx_udc.c b/drivers/staging/emxx_udc/emxx_udc.c
> index 3536c03ff523..741147a4f0fe 100644
> --- a/drivers/staging/emxx_udc/emxx_udc.c
> +++ b/drivers/staging/emxx_udc/emxx_udc.c
> @@ -38,7 +38,6 @@ static struct gpio_desc *vbus_gpio;
>  static int vbus_irq;
>  
>  static const char driver_name[] = "emxx_udc";
> -static const char driver_desc[] = DRIVER_DESC;
>  
>  
> /*===*/
>  /* Prototype */
> -- 
> 2.29.2

Hi,

This is the friendly patch-bot of Greg Kroah-Hartman.  You have sent him
a patch that has triggered this response.  He used to manually respond
to these common problems, but in order to save his sanity (he kept
writing the same thing over and over, yet to different people), I was
created.  Hopefully you will not take offence and will fix the problem
in your patch and resubmit it so that it can be accepted into the Linux
kernel tree.

You are receiving this message because of the following common error(s)
as indicated below:

- You did not specify a description of why the patch is needed, or
  possibly, any description at all, in the email body.  Please read the
  section entitled "The canonical patch format" in the kernel file,
  Documentation/SubmittingPatches for what is needed in order to
  properly describe the change.

- This looks like a new version of a previously submitted patch, but you
  did not list below the --- line any changes from the previous version.
  Please read the section entitled "The canonical patch format" in the
  kernel file, Documentation/SubmittingPatches for what needs to be done
  here to properly describe this.

If you wish to discuss this problem further, or you have questions about
how to resolve this issue, please feel free to respond to this email and
Greg will reply once he has dug out from the pending patches received
from other developers.

thanks,

greg k-h's patch email bot


[PATCH 2/4] iommu/vt-d: Enable write protect propagation from guest

2021-02-18 Thread Jacob Pan
Write protect bit, when set, inhibits supervisor writes to the read-only
pages. In guest supervisor shared virtual addressing (SVA), write-protect
should be honored upon guest bind supervisor PASID request.

This patch extends the VT-d portion of the IOMMU UAPI to include WP bit.
WPE bit of the supervisor PASID entry will be set to match CPU CR0.WP bit.

Signed-off-by: Sanjay Kumar 
Signed-off-by: Jacob Pan 
---
 drivers/iommu/intel/pasid.c | 5 +
 include/uapi/linux/iommu.h  | 3 ++-
 2 files changed, 7 insertions(+), 1 deletion(-)

diff --git a/drivers/iommu/intel/pasid.c b/drivers/iommu/intel/pasid.c
index 0b7e0e726ade..c7a2ec930af4 100644
--- a/drivers/iommu/intel/pasid.c
+++ b/drivers/iommu/intel/pasid.c
@@ -763,6 +763,11 @@ intel_pasid_setup_bind_data(struct intel_iommu *iommu, struct pasid_entry *pte,
return -EINVAL;
}
pasid_set_sre(pte);
+   /* Enable write protect WP if guest requested */
+   if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_WPE) {
+   if (pasid_enable_wpe(pte))
+   return -EINVAL;
+   }
}
 
if (pasid_data->flags & IOMMU_SVA_VTD_GPASID_EAFE) {
diff --git a/include/uapi/linux/iommu.h b/include/uapi/linux/iommu.h
index 68cb558fe8db..33f3dc7a91de 100644
--- a/include/uapi/linux/iommu.h
+++ b/include/uapi/linux/iommu.h
@@ -288,7 +288,8 @@ struct iommu_gpasid_bind_data_vtd {
 #define IOMMU_SVA_VTD_GPASID_PWT   (1 << 3) /* page-level write through */
 #define IOMMU_SVA_VTD_GPASID_EMTE  (1 << 4) /* extended mem type enable */
 #define IOMMU_SVA_VTD_GPASID_CD	(1 << 5) /* PASID-level cache disable */
-#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 6)
+#define IOMMU_SVA_VTD_GPASID_WPE   (1 << 6) /* Write protect enable */
+#define IOMMU_SVA_VTD_GPASID_LAST  (1 << 7)
__u64 flags;
__u32 pat;
__u32 emt;
-- 
2.25.1



Re: [PATCH V7 5/6] of: unittest: Create overlay_common.dtsi and testcases_common.dtsi

2021-02-18 Thread Viresh Kumar
On 18-02-21, 23:20, Frank Rowand wrote:
> Hi Viresh,
> 
> I am in the wrong version with the comments below.  You are at version 8 now.

Yeah, it is fine. I have updated the patches already based on your
comments.

-- 
viresh


Re: [PATCH V7 5/6] of: unittest: Create overlay_common.dtsi and testcases_common.dtsi

2021-02-18 Thread Frank Rowand
Hi Viresh,

I am in the wrong version with the comments below.  You are at version 8 now.

-Frank

On 2/18/21 3:02 PM, Frank Rowand wrote:
> On 1/29/21 1:24 AM, Viresh Kumar wrote:
>> In order to build-test the same unit-test files using fdtoverlay tool,
>> move the device nodes from the existing overlay_base.dts and
>> testcases_common.dts files to .dtsi counterparts. The .dts files now
>> include the new .dtsi files, resulting in exactly the same behavior as
>> earlier.
>>
>> The .dtsi files can now be reused for compile time tests using
>> fdtoverlay (will be done by a later commit).
>>
>> This is required because the base files passed to fdtoverlay tool
>> shouldn't be overlays themselves (i.e. shouldn't have the /plugin/;
>> tag).
>>
>> Note that this commit also moves "testcase-device2" node to
>> testcases.dts from tests-interrupts.dtsi, as this node has a deliberate
>> error in it and is only relevant for runtime testing done with
>> unittest.c.
>>
>> Signed-off-by: Viresh Kumar 
>> ---
>>  drivers/of/unittest-data/overlay_base.dts | 90 +-
>>  drivers/of/unittest-data/overlay_common.dtsi  | 91 +++
>>  drivers/of/unittest-data/testcases.dts| 18 ++--
>>  .../of/unittest-data/testcases_common.dtsi| 19 
>>  .../of/unittest-data/tests-interrupts.dtsi|  7 --
>>  5 files changed, 118 insertions(+), 107 deletions(-)
>>  create mode 100644 drivers/of/unittest-data/overlay_common.dtsi
>>  create mode 100644 drivers/of/unittest-data/testcases_common.dtsi
>>
>> diff --git a/drivers/of/unittest-data/overlay_base.dts 
>> b/drivers/of/unittest-data/overlay_base.dts
>> index 99ab9d12d00b..ab9014589c5d 100644
>> --- a/drivers/of/unittest-data/overlay_base.dts
>> +++ b/drivers/of/unittest-data/overlay_base.dts
>> @@ -2,92 +2,4 @@
>>  /dts-v1/;
>>  /plugin/;
>>  
>> -/*
>> - * Base device tree that overlays will be applied against.
>> - *
>> - * Do not add any properties in node "/".
>> - * Do not add any nodes other than "/testcase-data-2" in node "/".
>> - * Do not add anything that would result in dtc creating node "/__fixups__".
>> - * dtc will create nodes "/__symbols__" and "/__local_fixups__".
>> - */
>> -
>> -/ {
>> -testcase-data-2 {
>> -#address-cells = <1>;
>> -#size-cells = <1>;
>> -
>> -electric_1: substation@100 {
>> -compatible = "ot,big-volts-control";
>> -reg = < 0x0100 0x100 >;
>> -status = "disabled";
>> -
>> -hvac_1: hvac-medium-1 {
>> -compatible = "ot,hvac-medium";
>> -heat-range = < 50 75 >;
>> -cool-range = < 60 80 >;
>> -};
>> -
>> -spin_ctrl_1: motor-1 {
>> -compatible = "ot,ferris-wheel-motor";
>> -spin = "clockwise";
>> -rpm_avail = < 50 >;
>> -};
>> -
>> -spin_ctrl_2: motor-8 {
>> -compatible = "ot,roller-coaster-motor";
>> -};
>> -};
>> -
>> -rides_1: fairway-1 {
>> -#address-cells = <1>;
>> -#size-cells = <1>;
>> -compatible = "ot,rides";
>> -status = "disabled";
>> -orientation = < 127 >;
>> -
>> -ride@100 {
>> -#address-cells = <1>;
>> -#size-cells = <1>;
>> -compatible = "ot,roller-coaster";
>> -reg = < 0x0100 0x100 >;
>> -hvac-provider = < &hvac_1 >;
>> -hvac-thermostat = < 29 > ;
>> -hvac-zones = < 14 >;
>> -hvac-zone-names = "operator";
>> -spin-controller = < &spin_ctrl_2 5 &spin_ctrl_2 7 >;
>> -spin-controller-names = "track_1", "track_2";
>> -queues = < 2 >;
>> -
>> -track@30 {
>> -reg = < 0x0030 0x10 >;
>> -};
>> -
>> -track@40 {
>> -reg = < 0x0040 0x10 >;
>> -};
>> -
>> -};
>> -};
>> -
>> -lights_1: lights@3 {
>> -compatible = "ot,work-lights";
>> -reg = < 0x0003 0x1000 >;
>> -status = "disabled";
>> -};
>> -
>> -lights_2: lights@4 {
>> -compatible = "ot,show-lights";
>> -reg = < 0x0004 0x1000 >;
>> -status = "disabled";
>> -rate = < 13 138 >;
>> -};

Re: Re: [PATCH] ASoC: Intel: Skylake: Fix missing check in skl_pcm_trigger

2021-02-18 Thread dinghao . liu
> 
> On 2/15/21 7:13 AM, Dinghao Liu wrote:
> > When cmd == SNDRV_PCM_TRIGGER_STOP, we should also check
> > the return value of skl_decoupled_trigger() just like what
> > we have done in case SNDRV_PCM_TRIGGER_PAUSE_RELEASE.
> >
> > Signed-off-by: Dinghao Liu 
> > ---
> >  sound/soc/intel/skylake/skl-pcm.c | 3 +++
> >  1 file changed, 3 insertions(+)
> >
> > diff --git a/sound/soc/intel/skylake/skl-pcm.c b/sound/soc/intel/skylake/skl-pcm.c
> > index b1ca64d2f7ea..a5b1f333a500 100644
> > --- a/sound/soc/intel/skylake/skl-pcm.c
> > +++ b/sound/soc/intel/skylake/skl-pcm.c
> > @@ -516,6 +516,9 @@ static int skl_pcm_trigger(struct snd_pcm_substream *substream, int cmd,
> > return ret;
> > return ret;
> 
> Is there any additional error handling to be done for the
> skl_stop_pipe that just happened?
>

I think the check is enough here.

Regards,
Dinghao

Re: [PATCH V3 XRT Alveo 00/18] XRT Alveo driver overview

2021-02-18 Thread Lizhi Hou




On 02/18/2021 05:52 AM, Tom Rix wrote:

On 2/17/21 10:40 PM, Lizhi Hou wrote:

Hello,

This is V3 of the patch series, which adds the management physical function
driver for Xilinx Alveo PCIe accelerator cards,
https://www.xilinx.com/products/boards-and-kits/alveo.html
This driver is part of Xilinx Runtime (XRT) open source stack.

XILINX ALVEO PLATFORM ARCHITECTURE

Thanks for refreshing this patchset.

It will take me a while to do the full review, so I thought I would give some
early feedback.

It applies to char-misc-next, but will have conflicts with in-flight patches 
around the MAINTAINERS file. This is not a big deal.

The checkpatch is much better over v2, the complaints are

WARNING: added, moved or deleted file(s), does MAINTAINERS need updating?
#21:
new file mode 100644

WARNING: From:/Signed-off-by: email address mismatch: 'From: Lizhi Hou 
' != 'Signed-off-by: Lizhi Hou '

The MAINTAINERS warning I believe you address in the last patch.

In the next revisions, please fix the signoff.

The test robot is complaining about hppa64.  While it may be an unlikely 
config, it would be best to fix it.
Thanks for reviewing. I will fix the signoff, and the hppa64 and arm build
issues reported by the robot, in the next revision.


Tom


Alveo PCIe FPGA based platforms have a static *shell* partition and a partially
re-configurable *user* partition. The shell partition is automatically loaded from
flash when the host is booted and PCIe is enumerated by the BIOS. The shell cannot
be changed until the next cold reboot. The shell exposes two PCIe physical functions:

1. management physical function
2. user physical function

The patch series includes Documentation/xrt.rst which describes Alveo platform,
XRT driver architecture and deployment model in more detail.

Users compile their high level design in C/C++/OpenCL or RTL into an FPGA image
using Vitis tools
(https://www.xilinx.com/products/design-tools/vitis/vitis-platform.html).
The compiled image is packaged as an xclbin, which contains a partial bitstream
for the user partition and necessary metadata. Users can dynamically swap the 
image
running on the user partition in order to switch between different workloads by
loading different xclbins.

XRT DRIVERS FOR XILINX ALVEO

XRT Linux kernel driver *xmgmt* binds to management physical function of Alveo
platform. The modular driver framework is organized into several platform 
drivers
which primarily handle the following functionality:

1.  Loading firmware container also called xsabin at driver attach time
2.  Loading of user compiled xclbin with FPGA Manager integration
3.  Clock scaling of image running on user partition
4.  In-band sensors: temp, voltage, power, etc.
5.  Device reset and rescan

The platform drivers are packaged into *xrt-lib* helper module with well
defined interfaces. The module provides a pseudo-bus implementation for the
platform drivers. More details on the driver model can be found in
Documentation/xrt.rst.

User physical function driver is not included in this patch series.

LIBFDT REQUIREMENT

XRT driver infrastructure uses Device Tree as a metadata format to discover
HW subsystems in the Alveo PCIe device. The Device Tree schema used by XRT
is documented in Documentation/xrt.rst. Unlike previous V1 and V2 version
of patch series, V3 version does not require export of libfdt symbols.

TESTING AND VALIDATION

xmgmt driver can be tested with full XRT open source stack which includes user
space libraries, board utilities and (out of tree) first generation user 
physical
function driver xocl. XRT open source runtime stack is available at
https://github.com/Xilinx/XRT

Complete documentation for XRT open source stack including sections on Alveo/XRT
security and platform architecture can be found here:

https://xilinx.github.io/XRT/master/html/index.html
https://xilinx.github.io/XRT/master/html/security.html
https://xilinx.github.io/XRT/master/html/platforms_partitions.html

Changes since v2:
- Streamlined the driver framework into *xleaf*, *group* and *xroot*
- Updated documentation to show the driver model with examples
- Addressed kernel test robot errors
- Added a selftest for basic driver framework
- Documented device tree schema
- Removed need to export libfdt symbols

Changes since v1:
- Updated the driver to use fpga_region and fpga_bridge for FPGA
   programming
- Dropped platform drivers not related to PR programming to focus on XRT
   core framework
- Updated Documentation/fpga/xrt.rst with information on XRT core framework
- Addressed checkpatch issues
- Dropped xrt- prefix from some header files

For reference V1 version of patch series can be found here:

https://lore.kernel.org/lkml/20201217075046.28553-1-son...@xilinx.com/
https://lore.kernel.org/lkml/20201217075046.28553-2-son...@xilinx.com/
https://lore.kernel.org/lkml/20201217075046.28553-3-son...@xilinx.com/
https://lore.kernel.org/lkml/20201217075046.28553-4-son...@xilinx.com/
https://lore.kernel.org/lkml/20201217075046.28553-5-son...@xilinx.com/

Re: [PATCH] [v13] wireless: Initial driver submission for pureLiFi STA devices

2021-02-18 Thread Srinivasan Raju
Hi,

Please find below a few responses to the comments. We will fix the rest of the
comments and submit v14 of the patch.

> Also, you *really* need some validation here, rather than blindly
> trusting that the file is well-formed, otherwise you immediately have a
> security issue here.

The firmware is signed and the hardware validates the signature, so we are not
validating it in software.

>> +static const struct plf_reg_alpha2_map reg_alpha2_map[] = {
>> + { PLF_REGDOMAIN_FCC, "US" },
>> + { PLF_REGDOMAIN_IC, "CA" },
>> + { PLF_REGDOMAIN_ETSI, "DE" }, /* Generic ETSI, use most restrictive */
>> + { PLF_REGDOMAIN_JAPAN, "JP" },
>> + { PLF_REGDOMAIN_SPAIN, "ES" },
>> + { PLF_REGDOMAIN_FRANCE, "FR" },
>> +};

> You actually have regulatory restrictions on this stuff?

Currently, there are no regulatory restrictions applicable to LiFi; the
regulatory_hint setting is only for the wifi core.

>> +static const struct ieee80211_rate purelifi_rates[] = {
>> + { .bitrate = 10,
>> + .hw_value = PURELIFI_CCK_RATE_1M,
>> + .flags = 0 },
>> + { .bitrate = 20,
>> + .hw_value = PURELIFI_CCK_RATE_2M,
>> + .hw_value_short = PURELIFI_CCK_RATE_2M
>> + | PURELIFI_CCK_PREA_SHORT,
>> + .flags = IEEE80211_RATE_SHORT_PREAMBLE },
>> + { .bitrate = 55,
>> + .hw_value = PURELIFI_CCK_RATE_5_5M,
>> + .hw_value_short = PURELIFI_CCK_RATE_5_5M
>> + | PURELIFI_CCK_PREA_SHORT,
>> + .flags = IEEE80211_RATE_SHORT_PREAMBLE },
>> + { .bitrate = 110,
>> + .hw_value = PURELIFI_CCK_RATE_11M,
>> + .hw_value_short = PURELIFI_CCK_RATE_11M
>> + | PURELIFI_CCK_PREA_SHORT,
>> + .flags = IEEE80211_RATE_SHORT_PREAMBLE },


> So ... how much of that is completely fake? Are you _actually_ doing 1,
> 2, 5.5 etc. Mbps over the air, and you even have short and long
> preamble? Why do all of that legacy mess, when you're essentially
> greenfield??

Yes, this is not used. We will test and remove the unused legacy settings.

> OTOH, I can sort of see why/how you're reusing wifi functionality here,
> very old versions of wifi even had an infrared PHY I think.
>
> Conceptually, it seems odd. Perhaps we should add a new band definition?
>
> And what I also asked above - how much of the rate stuff is completely
> fake? Are you really doing CCK/OFDM in some (strange?) way?

Yes, your understanding is correct, and we use OFDM.
For now we will use the existing band definition.

Thanks
Srini

From: Johannes Berg 
Sent: Friday, February 12, 2021 7:14 PM
To: Srinivasan Raju
Cc: Mostafa Afgani; Kalle Valo; David S. Miller; Jakub Kicinski; open list; 
open list:NETWORKING DRIVERS (WIRELESS); open list:NETWORKING DRIVERS
Subject: Re: [PATCH] [v13] wireless: Initial driver submission for pureLiFi STA 
devices

Hi,

Thanks for your patience, and thanks for sticking around!

I'm sorry I haven't reviewed this again in a long time, but I was able
to today.


> +PUREILIFI USB DRIVER

Did you mistype "PURELIFI" here, or was that intentional?

> +PUREILIFI USB DRIVER
> +M:   Srinivasan Raju 

Probably would be good to have an "L" entry with the linux-wireless list
here.

> +if WLAN_VENDOR_PURELIFI
> +
> +source "drivers/net/wireless/purelifi/plfxlc/Kconfig"

Seems odd to have the Makefile under purelifi/ but the Kconfig is yet
another directory deeper?

> +++ b/drivers/net/wireless/purelifi/Makefile
> @@ -0,0 +1,2 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_WLAN_VENDOR_PURELIFI)   := plfxlc/

Although this one doesn't do anything, so all you did was save a level
of Kconfig inclusion I guess ... no real objection to that.

> diff --git a/drivers/net/wireless/purelifi/plfxlc/Kconfig 
> b/drivers/net/wireless/purelifi/plfxlc/Kconfig
> new file mode 100644
> index ..07a65c0fce68
> --- /dev/null
> +++ b/drivers/net/wireless/purelifi/plfxlc/Kconfig
> @@ -0,0 +1,13 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +config PLFXLC
> +
> + tristate "pureLiFi X, XL, XC device support"

extra blank line.

Also, maybe that should be a bit more verbose? PURELIFI_XLC or so? I
don't think it shows up in many places, but if you see "PLFXLC"
somewhere that's not very helpful.

> + depends on CFG80211 && MAC80211 && USB
> + help
> +This driver makes the adapter appear as a normal WLAN interface.
> +
> +The pureLiFi device requires external STA firmware to be loaded.
> +
> +To compile this driver as a module, choose M here: the module will
> +be called purelifi.

But will it? Seems like it would be called "plfxlc"?

See here:

> +++ b/drivers/net/wireless/purelifi/plfxlc/Makefile
> @@ -0,0 +1,3 @@
> +# SPDX-License-Identifier: GPL-2.0-only
> +obj-$(CONFIG_PLFXLC) := plfxlc.o
> +plfxlc-objs  += chip.o firmware.o usb.o mac.o


> +int 

Re: [PATCH v2 1/4] hmm: Device exclusive memory access

2021-02-18 Thread kernel test robot
Hi Alistair,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.11 next-20210218]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210219-100858
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: x86_64-randconfig-s021-20210217 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce:
# apt-get install sparse
# sparse version: v0.6.3-215-g0fb77bb6-dirty
# 
https://github.com/0day-ci/linux/commit/bb5444811772d30b2e3bbaa44baeb8a4b3f03cec
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210219-100858
git checkout bb5444811772d30b2e3bbaa44baeb8a4b3f03cec
# save the attached .config to linux build tree
make W=1 C=1 CF='-fdiagnostic-prefix -D__CHECK_ENDIAN__' ARCH=x86_64 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   ld: warning: orphan section `.data..decrypted' from 
`arch/x86/kernel/cpu/vmware.o' being placed in section `.data..decrypted'
   ld: warning: orphan section `.data..decrypted' from `arch/x86/kernel/kvm.o' 
being placed in section `.data..decrypted'
   ld: mm/memory.o: in function `do_swap_page':
>> mm/memory.c:3300: undefined reference to `hmm_remove_exclusive_entry'


vim +3300 mm/memory.c

  3270  
  3271  /*
  3272   * We enter with non-exclusive mmap_lock (to exclude vma changes,
  3273   * but allow concurrent faults), and pte mapped but not yet locked.
  3274   * We return with pte unmapped and unlocked.
  3275   *
  3276   * We return with the mmap_lock locked or unlocked in the same cases
  3277   * as does filemap_fault().
  3278   */
  3279  vm_fault_t do_swap_page(struct vm_fault *vmf)
  3280  {
  3281  struct vm_area_struct *vma = vmf->vma;
  3282  struct page *page = NULL, *swapcache;
  3283  swp_entry_t entry;
  3284  pte_t pte;
  3285  int locked;
  3286  int exclusive = 0;
  3287  vm_fault_t ret = 0;
  3288  void *shadow = NULL;
  3289  
  3290  if (!pte_unmap_same(vma->vm_mm, vmf->pmd, vmf->pte, vmf->orig_pte))
  3291  goto out;
  3292  
  3293  entry = pte_to_swp_entry(vmf->orig_pte);
  3294  if (unlikely(non_swap_entry(entry))) {
  3295  if (is_migration_entry(entry)) {
  3296  migration_entry_wait(vma->vm_mm, vmf->pmd,
  3297   vmf->address);
  3298  } else if (is_device_exclusive_entry(entry)) {
  3298  } else if (is_device_exclusive_entry(entry)) {
  3299  vmf->page = device_exclusive_entry_to_page(entry);
> 3300  ret = hmm_remove_exclusive_entry(vmf);
  3301  } else if (is_device_private_entry(entry)) {
  3302  vmf->page = device_private_entry_to_page(entry);
  3303  ret = vmf->page->pgmap->ops->migrate_to_ram(vmf);
  3304  } else if (is_hwpoison_entry(entry)) {
  3305  ret = VM_FAULT_HWPOISON;
  3306  } else {
  3307  print_bad_pte(vma, vmf->address, vmf->orig_pte, NULL);
  3308  ret = VM_FAULT_SIGBUS;
  3309  }
  3310  goto out;
  3311  }
  3312  
  3313  
  3314  delayacct_set_flag(DELAYACCT_PF_SWAPIN);
  3315  page = lookup_swap_cache(entry, vma, vmf->address);
  3316  swapcache = page;
  3317  
  3318  if (!page) {
  3319  struct swap_info_struct *si = swp_swap_info(entry);
  3320  
  3321  if (data_race(si->flags & SWP_SYNCHRONOUS_IO) &&
  3322  __swap_count(entry) == 1) {
  3323  /* skip swapcache */
  3324  page = alloc_page_vma(GFP_HIGHUSER_MOVABLE, vma,
  3325  vmf->address);
  3326  if (page) {
  3327  int err;
  3328  
  3329  __SetPageLocked(page);
  3330  __SetPageSwapBacked(page);
  3331  set_page_private(page, entry.val);
  3332  
  3333  /* Tell memcg to use swap ownership records */
  3334  SetPa

Re: [PATCH V3 1/2] topology: Allow multiple entities to provide sched_freq_tick() callback

2021-02-18 Thread Viresh Kumar
On 18-02-21, 16:36, Ionela Voinescu wrote:
> Yes, we don't care if there is no cpufreq driver, as the use of AMUs won't
> get initialised either. But we do care if there is a cpufreq driver that
> does not support frequency invariance, which is the example above.
> 
> The intention with the patches that made cpufreq based invariance generic
> a while back was for it to be present, seamlessly, for as many drivers as
> possible, as a less than accurate invariance default method is still
> better than nothing.

Right.

> So only a few drivers today don't support cpufreq based FI

Only two AFAICT, both x86, and the AMU stuff doesn't conflict with
them.

drivers/cpufreq/intel_pstate.c
drivers/cpufreq/longrun.c

> but it's not a guarantee that it will stay this way.

What do you mean by "no guarantee" here ?

The very core routines (cpufreq_freq_transition_end() and
cpufreq_driver_fast_switch()) of the cpufreq core call
arch_set_freq_scale() today and this isn't going to change anytime
soon. If something gets changed there someone will need to see other
parts of the kernel which may get broken with that.

I don't see any need to complicate other parts of the kernel, like the
AMU or CPPC code, for that. They should be kept simple and should
assume cpufreq invariance will be supported as it is today.

-- 
viresh


Re: [PATCH v8 2/6] arm64: hyperv: Add Hyper-V clocksource/clockevent support

2021-02-18 Thread kernel test robot
Hi Michael,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on arm64/for-next/core]
[also build test ERROR on tip/timers/core efi/next linus/master v5.11 next-20210218]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Michael-Kelley/Enable-Linux-guests-on-Hyper-V-on-ARM64/20210219-072336
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git for-next/core
config: i386-randconfig-a003-20210218 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
        # https://github.com/0day-ci/linux/commit/a8eb25332c441e0965c0ecdfb1a86b507e3465e1
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Michael-Kelley/Enable-Linux-guests-on-Hyper-V-on-ARM64/20210219-072336
        git checkout a8eb25332c441e0965c0ecdfb1a86b507e3465e1
        # save the attached .config to linux build tree
        make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All errors (new ones prefixed by >>):

   drivers/clocksource/hyperv_timer.c:478:44: warning: 'struct acpi_table_header' declared inside parameter list will not be visible outside of this definition or declaration
     478 | static int __init hyperv_timer_init(struct acpi_table_header *table)
         |                                            ^
   drivers/clocksource/hyperv_timer.c: In function 'hyperv_timer_init':
>> drivers/clocksource/hyperv_timer.c:484:6: error: too many arguments to function 'hv_stimer_alloc'
     484 |  if (hv_stimer_alloc(true))
         |      ^~~~~~~~~~~~~~~
   drivers/clocksource/hyperv_timer.c:173:5: note: declared here
     173 | int hv_stimer_alloc(void)
         |     ^~~~~~~~~~~~~~~
   In file included from include/linux/clockchips.h:14,
                    from drivers/clocksource/hyperv_timer.c:16:
   drivers/clocksource/hyperv_timer.c: At top level:
>> include/linux/clocksource.h:283:50: error: expected ')' before numeric constant
     283 |  ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn)
         |                                                  ^
   drivers/clocksource/hyperv_timer.c:489:1: note: in expansion of macro 'TIMER_ACPI_DECLARE'
     489 | TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_timer_init);
         | ^~~~~~~~~~~~~~~~~~
   drivers/clocksource/hyperv_timer.c:478:19: warning: 'hyperv_timer_init' defined but not used [-Wunused-function]
     478 | static int __init hyperv_timer_init(struct acpi_table_header *table)
         |                   ^~~~~~~~~~~~~~~~~


vim +/hv_stimer_alloc +484 drivers/clocksource/hyperv_timer.c

   476  
   477  /* Initialize everything on ARM64 */
   478  static int __init hyperv_timer_init(struct acpi_table_header *table)
   479  {
   480          if (!hv_is_hyperv_initialized())
   481                  return -EINVAL;
   482  
   483          hv_init_clocksource();
 > 484          if (hv_stimer_alloc(true))

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org


.config.gz
Description: application/gzip


Re: [PATCH v2 1/4] hmm: Device exclusive memory access

2021-02-18 Thread Alistair Popple
Apologies for the noise, looks like I don't have a CONFIG_DEVICE_PRIVATE=n 
build in my tests and missed creating definitions for the new static inline 
functions for that configuration.

I'll wait for some feedback on the overall approach and fix this in a v3.

 - Alistair

On Friday, 19 February 2021 3:04:07 PM AEDT kernel test robot wrote:
> External email: Use caution opening links or attachments
> 
> 
> Hi Alistair,
> 
> Thank you for the patch! Yet something to improve:
> 
> [auto build test ERROR on kselftest/next]
> [also build test ERROR on linus/master v5.11 next-20210218]
> [cannot apply to hnaz-linux-mm/master]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch]
> 
> url:
> https://github.com/0day-ci/linux/commits/Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210219-100858
> base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
> config: mips-randconfig-r036-20210218 (attached as .config)
> compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project c9439ca36342fb6013187d0a69aef92736951476)
> reproduce (this is a W=1 build):
>         wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
>         chmod +x ~/bin/make.cross
>         # install mips cross compiling tool for clang build
>         # apt-get install binutils-mips-linux-gnu
>         # https://github.com/0day-ci/linux/commit/bb5444811772d30b2e3bbaa44baeb8a4b3f03cec
>         git remote add linux-review https://github.com/0day-ci/linux
>         git fetch --no-tags linux-review Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210219-100858
>         git checkout bb5444811772d30b2e3bbaa44baeb8a4b3f03cec
>         # save the attached .config to linux build tree
>         COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips
> 
> If you fix the issue, kindly add following tag as appropriate
> Reported-by: kernel test robot 
> 
> All error/warnings (new ones prefixed by >>):
> 
> >> fs/proc/task_mmu.c:521:12: error: implicit declaration of function 'is_device_exclusive_entry' [-Werror,-Wimplicit-function-declaration]
>                    else if (is_device_exclusive_entry(swpent))
>                             ^
>    fs/proc/task_mmu.c:521:12: note: did you mean 'is_device_private_entry'?
>    include/linux/swapops.h:176:20: note: 'is_device_private_entry' declared here
>    static inline bool is_device_private_entry(swp_entry_t entry)
>                       ^
> >> fs/proc/task_mmu.c:522:11: error: implicit declaration of function 'device_exclusive_entry_to_page' [-Werror,-Wimplicit-function-declaration]
>                            page = device_exclusive_entry_to_page(swpent);
>                                   ^
>    fs/proc/task_mmu.c:522:11: note: did you mean 'device_private_entry_to_page'?
>    include/linux/swapops.h:191:28: note: 'device_private_entry_to_page' declared here
>    static inline struct page *device_private_entry_to_page(swp_entry_t entry)
>                               ^
> >> fs/proc/task_mmu.c:522:9: warning: incompatible integer to pointer conversion assigning to 'struct page *' from 'int' [-Wint-conversion]
>                            page = device_exclusive_entry_to_page(swpent);
>                                 ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    fs/proc/task_mmu.c:1395:7: error: implicit declaration of function 'is_device_exclusive_entry' [-Werror,-Wimplicit-function-declaration]
>                    if (is_device_exclusive_entry(entry))
>                        ^
>    fs/proc/task_mmu.c:1396:11: error: implicit declaration of function 'device_exclusive_entry_to_page' [-Werror,-Wimplicit-function-declaration]
>                            page = device_exclusive_entry_to_page(entry);
>                                   ^
>    fs/proc/task_mmu.c:1396:9: warning: incompatible integer to pointer conversion assigning to 'struct page *' from 'int' [-Wint-conversion]
>                            page = device_exclusive_entry_to_page(entry);
>                                 ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
>    2 warnings and 4 errors generated.
> 
> 
> vim +/is_device_exclusive_entry +521 fs/proc/task_mmu.c
> 
>    490  
>    491  static void smaps_pte_entry(pte_t *pte, unsigned long addr,
>    492                              struct mm_walk *walk)
>    493  {
>    494          struct mem_size_stats *mss = walk->private;
>    495          struct vm_area_struct *vma = walk->vma;
>    496          bool locked = !!(vma->vm_flags & VM_LOCKED);
>    497          struct p

amdgpu, 5.11.0, suicide when input of monitor is switched

2021-02-18 Thread Norbert Preining
Hi all

After switching inputs of the display I use (two computers are
connected, one via DP and the current one via HDMI), the AMD GPU I use
stopped sending signals to the display.

The kernel log showed the following:

[  262.300879] [drm:drm_atomic_helper_wait_for_flip_done [drm_kms_helper]] 
*ERROR* [CRTC:77:crtc-0] flip_done timed out
[  263.068884] [drm:amdgpu_dm_atomic_check [amdgpu]] *ERROR* [CRTC:77:crtc-0] 
hw_done or flip_done timed out
[  267.338757] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
[  273.308861] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] 
*ERROR* [CRTC:77:crtc-0] flip_done timed out
[  277.928665] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
[  278.528714] [drm] amdgpu_dm_irq_schedule_work FAILED src 4
[  283.548856] [drm:drm_atomic_helper_wait_for_dependencies [drm_kms_helper]] 
*ERROR* [PLANE:65:plane-5] flip_done timed out
[  283.611061] [ cut here ]
[  283.611063] WARNING: CPU: 6 PID: 214 at 
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7754 
amdgpu_dm_atomic_commit_tail+0x21c1/0x2250 [amdgpu]
[  283.611147] Modules linked in: nf_conntrack_netlink xfrm_user xfrm_algo 
xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack xt_tcpudp 
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc 
nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs lockd grace 
nfs_ssc fscache scsi_transport_iscsi ipt_REJECT nf_reject_ipv4 xt_multiport 
nft_compat nft_counter nf_tables nfnetlink rfkill overlay binfmt_misc 
x86_pkg_temp_thermal kvm_intel kvm irqbypass crct10dif_pclmul 
ghash_clmulni_intel aesni_intel crypto_simd cryptd glue_helper pcspkr serio_raw 
iTCO_wdt ee1004 iTCO_vendor_support uvcvideo videobuf2_vmalloc videobuf2_memops 
videobuf2_v4l2 videobuf2_common snd_usb_audio snd_usbmidi_lib videodev sg 
snd_rawmidi mc mei_me mei intel_pch_thermal acpi_pad evdev joydev ext4 crc16 
mbcache jbd2 nvidia_drm(POE) nvidia_modeset(POE) loop nvidia(POE) drivetemp 
i2c_dev parport_pc parport configfs sunrpc ip_tables x_tables autofs4 raid10 
raid456 async_raid6_recov
[  283.611185]  async_memcpy async_pq async_xor async_tx raid1 raid0 multipath 
linear md_mod wacom hid_generic usbhid hid uas usb_storage amdgpu gpu_sched 
i2c_algo_bit drm_ttm_helper ttm crc32_pclmul psmouse mxm_wmi xhci_pci nvme 
drm_kms_helper xhci_hcd nvme_core cec i2c_designware_platform drm 
i2c_designware_core usbcore
[  283.611201] CPU: 6 PID: 214 Comm: kworker/6:3 Tainted: P   OE 
5.11.0 #125
[  283.611203] Hardware name: MSI MS-7A16/Z170A MPOWER GAMING 
TITANIUM(MS-7A16), BIOS 1.10 07/22/2016
[  283.611204] Workqueue: events dm_irq_work_func [amdgpu]
[  283.611264] RIP: 0010:amdgpu_dm_atomic_commit_tail+0x21c1/0x2250 [amdgpu]
[  283.611323] Code: 95 a0 fd ff ff c7 85 a4 fd ff ff 37 00 00 00 c7 85 ac fd 
ff ff 20 00 00 00 e8 3b 50 12 00 e9 4c fb ff ff 0f 0b e9 9b f9 ff ff <0f> 0b e9 
eb f9 ff ff 0f 0b 0f 0b e9 01 fa ff ff 48 89 c8 44 8b 85
[  283.611324] RSP: 0018:a93340a3fab8 EFLAGS: 00010002
[  283.611326] RAX: 0297 RBX: 0001 RCX: 942594ed9118
[  283.611327] RDX: 0001 RSI: 0297 RDI: 942596120188
[  283.611328] RBP: a93340a3fdb0 R08: 0005 R09: 
[  283.611329] R10: a93340a3fa18 R11: 0002 R12: 0297
[  283.611330] R13: 9426652bc400 R14: 942594ed9000 R15: 94258abdcd80
[  283.611331] FS:  () GS:9434bed8() 
knlGS:
[  283.611332] CS:  0010 DS:  ES:  CR0: 80050033
[  283.611333] CR2: 7f1c940f0030 CR3: 000accc0a002 CR4: 003706e0
[  283.611335] DR0:  DR1:  DR2: 
[  283.611336] DR3:  DR6: fffe0ff0 DR7: 0400
[  283.611337] Call Trace:
[  283.611342]  commit_tail+0x94/0x130 [drm_kms_helper]
[  283.611349]  drm_atomic_helper_commit+0x113/0x140 [drm_kms_helper]
[  283.611354]  dm_restore_drm_connector_state+0xef/0x170 [amdgpu]
[  283.611413]  handle_hpd_irq+0xea/0x120 [amdgpu]
[  283.611470]  dm_irq_work_func+0x49/0x60 [amdgpu]
[  283.611528]  process_one_work+0x1ec/0x380
[  283.611531]  worker_thread+0x53/0x3d0
[  283.611533]  ? process_one_work+0x380/0x380
[  283.611535]  kthread+0x11b/0x140
[  283.611537]  ? __kthread_bind_mask+0x60/0x60
[  283.611538]  ret_from_fork+0x22/0x30
[  283.611541] ---[ end trace 4c7cf81f00d90176 ]---
[  283.611555] [ cut here ]
[  283.611555] WARNING: CPU: 6 PID: 214 at 
drivers/gpu/drm/amd/amdgpu/../display/amdgpu_dm/amdgpu_dm.c:7354 
amdgpu_dm_atomic_commit_tail+0x21ca/0x2250 [amdgpu]
[  283.611613] Modules linked in: nf_conntrack_netlink xfrm_user xfrm_algo 
xt_addrtype br_netfilter xt_CHECKSUM xt_MASQUERADE xt_conntrack xt_tcpudp 
nft_chain_nat nf_nat nf_conntrack nf_defrag_ipv6 nf_defrag_ipv4 bridge stp llc 
nfsv3 nfs_acl rpcsec_gss_krb5 auth_rpcgss nfsv4 dns_resolver nfs 

[PATCH 8/8] __unix_find_socket_byname(): don't pass hash and type separately

2021-02-18 Thread Al Viro
We only care about exclusive or of those, so pass that directly.
Makes life simpler for callers as well...

Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 23 ++++++++++-------------
 1 file changed, 10 insertions(+), 13 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index bb4c6200953d..a7e20fcadfcc 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -288,11 +288,11 @@ static inline void unix_insert_socket(struct hlist_head *list, struct sock *sk)
 
 static struct sock *__unix_find_socket_byname(struct net *net,
                                               struct sockaddr_un *sunname,
-                                              int len, int type, unsigned int hash)
+                                              int len, unsigned int hash)
 {
         struct sock *s;
 
-        sk_for_each(s, &unix_socket_table[hash ^ type]) {
+        sk_for_each(s, &unix_socket_table[hash]) {
                 struct unix_sock *u = unix_sk(s);
 
                 if (!net_eq(sock_net(s), net))
@@ -307,13 +307,12 @@ static struct sock *__unix_find_socket_byname(struct net *net,
 
 static inline struct sock *unix_find_socket_byname(struct net *net,
                                                    struct sockaddr_un *sunname,
-                                                   int len, int type,
-                                                   unsigned int hash)
+                                                   int len, unsigned int hash)
 {
         struct sock *s;
 
         spin_lock(&unix_table_lock);
-        s = __unix_find_socket_byname(net, sunname, len, type, hash);
+        s = __unix_find_socket_byname(net, sunname, len, hash);
         if (s)
                 sock_hold(s);
         spin_unlock(&unix_table_lock);
@@ -900,12 +899,12 @@ static int unix_autobind(struct socket *sock)
 retry:
         addr->len = sprintf(addr->name->sun_path+1, "%05x", ordernum) + 1 + sizeof(short);
         addr->hash = unix_hash_fold(csum_partial(addr->name, addr->len, 0));
+        addr->hash ^= sk->sk_type;
 
         spin_lock(&unix_table_lock);
         ordernum = (ordernum+1)&0xFFFFF;
 
-        if (__unix_find_socket_byname(net, addr->name, addr->len, sock->type,
-                                      addr->hash)) {
+        if (__unix_find_socket_byname(net, addr->name, addr->len, addr->hash)) {
                 spin_unlock(&unix_table_lock);
                 /*
                  * __unix_find_socket_byname() may take long time if many names
@@ -920,7 +919,6 @@ static int unix_autobind(struct socket *sock)
                 }
                 goto retry;
         }
-        addr->hash ^= sk->sk_type;
 
         __unix_set_addr(sk, addr, addr->hash);
         err = 0;
@@ -966,7 +964,7 @@ static struct sock *unix_find_other(struct net *net,
                 }
         } else {
                 err = -ECONNREFUSED;
-                u = unix_find_socket_byname(net, sunname, len, type, hash);
+                u = unix_find_socket_byname(net, sunname, len, type ^ hash);
                 if (u) {
                         struct dentry *dentry;
                         dentry = unix_sk(u)->path.dentry;
@@ -1036,8 +1034,7 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
         return err == -EEXIST ? -EADDRINUSE : err;
 }
 
-static int unix_bind_abstract(struct sock *sk, unsigned hash,
-                              struct unix_address *addr)
+static int unix_bind_abstract(struct sock *sk, struct unix_address *addr)
 {
         struct unix_sock *u = unix_sk(sk);
         int err;
@@ -1053,7 +1050,7 @@ static int unix_bind_abstract(struct sock *sk, unsigned hash,
 
         spin_lock(&unix_table_lock);
         if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
-                                      sk->sk_type, hash)) {
+                                      addr->hash)) {
                 spin_unlock(&unix_table_lock);
                 mutex_unlock(&u->bindlock);
                 return -EADDRINUSE;
@@ -1095,7 +1092,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
         if (sun_path[0])
                 err = unix_bind_bsd(sk, addr);
         else
-                err = unix_bind_abstract(sk, hash, addr);
+                err = unix_bind_abstract(sk, addr);
         if (err)
                 unix_release_addr(addr);
         return err;
-- 
2.11.0



[PATCH 6/8] unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock

2021-02-18 Thread Al Viro
Final preparations for doing unlink on failure past the successful
mknod.  We can't hold ->bindlock over ->mknod() or ->unlink(), since
either might do sb_start_write() (e.g. on overlayfs).  However, we
can do it while holding filesystem and VFS locks - doing
kern_path_create()
vfs_mknod()
grab ->bindlock
if u->addr had been set
drop ->bindlock
done_path_create
return -EINVAL
else
assign the address to socket
drop ->bindlock
done_path_create
return 0
would be deadlock-free.  Here we massage unix_bind_bsd() to that
form.  We are still doing equivalent transformations.

Next commit will *not* be an equivalent transformation - it will
add a call of vfs_unlink() before done_path_create() in the "already
bound" case.

Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 24 ++++++++++--------------
 1 file changed, 10 insertions(+), 14 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 36b88c8c438b..d55035a9695f 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -989,7 +989,7 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
         struct unix_sock *u = unix_sk(sk);
         umode_t mode = S_IFSOCK |
                        (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
-        struct path parent, path;
+        struct path parent;
         struct dentry *dentry;
         unsigned int hash;
         int err;
@@ -1006,38 +1006,34 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
          * All right, let's create it.
          */
         err = security_path_mknod(&parent, dentry, mode, 0);
-        if (!err) {
+        if (!err)
                 err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
-                if (!err) {
-                        path.mnt = mntget(parent.mnt);
-                        path.dentry = dget(dentry);
-                }
-        }
-        done_path_create(&parent, dentry);
+
         if (err) {
                 if (err == -EEXIST)
                         err = -EADDRINUSE;
+                done_path_create(&parent, dentry);
                 return err;
         }
-
         err = mutex_lock_interruptible(&u->bindlock);
         if (err) {
-                path_put(&path);
+                done_path_create(&parent, dentry);
                 return err;
         }
-
         if (u->addr) {
                 mutex_unlock(&u->bindlock);
-                path_put(&path);
+                done_path_create(&parent, dentry);
                 return -EINVAL;
         }
 
         addr->hash = UNIX_HASH_SIZE;
-        hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
+        hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1);
         spin_lock(&unix_table_lock);
-        u->path = path;
+        u->path.mnt = mntget(parent.mnt);
+        u->path.dentry = dget(dentry);
         __unix_set_addr(sk, addr, hash);
         mutex_unlock(&u->bindlock);
+        done_path_create(&parent, dentry);
         return 0;
 }
 
-- 
2.11.0



[PATCH 5/8] fold unix_mknod() into unix_bind_bsd()

2021-02-18 Thread Al Viro
Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 39 +++++++++++++++------------------------
 1 file changed, 15 insertions(+), 24 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d7aeb4827747..36b88c8c438b 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -984,45 +984,36 @@ static struct sock *unix_find_other(struct net *net,
         return NULL;
 }
 
-static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
+static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
 {
+        struct unix_sock *u = unix_sk(sk);
+        umode_t mode = S_IFSOCK |
+                       (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
+        struct path parent, path;
         struct dentry *dentry;
-        struct path path;
-        int err = 0;
+        unsigned int hash;
+        int err;
+
         /*
          * Get the parent directory, calculate the hash for last
          * component.
          */
-        dentry = kern_path_create(AT_FDCWD, sun_path, &path, 0);
-        err = PTR_ERR(dentry);
+        dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0);
         if (IS_ERR(dentry))
-                return err;
+                return PTR_ERR(dentry);
 
         /*
          * All right, let's create it.
          */
-        err = security_path_mknod(&path, dentry, mode, 0);
+        err = security_path_mknod(&parent, dentry, mode, 0);
         if (!err) {
-                err = vfs_mknod(d_inode(path.dentry), dentry, mode, 0);
+                err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
                 if (!err) {
-                        res->mnt = mntget(path.mnt);
-                        res->dentry = dget(dentry);
+                        path.mnt = mntget(parent.mnt);
+                        path.dentry = dget(dentry);
                 }
         }
-        done_path_create(&path, dentry);
-        return err;
-}
-
-static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
-{
-        struct unix_sock *u = unix_sk(sk);
-        struct path path = { };
-        umode_t mode = S_IFSOCK |
-                       (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
-        unsigned int hash;
-        int err;
-
-        err = unix_mknod(addr->name->sun_path, mode, &path);
+        done_path_create(&parent, dentry);
         if (err) {
                 if (err == -EEXIST)
                         err = -EADDRINUSE;
-- 
2.11.0



[PATCH 7/8] unix_bind_bsd(): unlink if we fail after successful mknod

2021-02-18 Thread Al Viro
We can do that more or less safely, since the parent is
held locked all along.  Yes, somebody might observe the
object via dcache, only to have it disappear afterwards,
but there's really no good way to prevent that.  It won't
race with other bind(2) or attempts to move the sucker
elsewhere, or put something else in its place - locked
parent prevents that.

Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 33 ++++++++++++++++-----------------
 1 file changed, 16 insertions(+), 17 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index d55035a9695f..bb4c6200953d 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1001,30 +1001,19 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
         dentry = kern_path_create(AT_FDCWD, addr->name->sun_path, &parent, 0);
         if (IS_ERR(dentry))
                 return PTR_ERR(dentry);
-
         /*
          * All right, let's create it.
          */
         err = security_path_mknod(&parent, dentry, mode, 0);
         if (!err)
                 err = vfs_mknod(d_inode(parent.dentry), dentry, mode, 0);
-
-        if (err) {
-                if (err == -EEXIST)
-                        err = -EADDRINUSE;
-                done_path_create(&parent, dentry);
-                return err;
-        }
+        if (err)
+                goto out;
         err = mutex_lock_interruptible(&u->bindlock);
-        if (err) {
-                done_path_create(&parent, dentry);
-                return err;
-        }
-        if (u->addr) {
-                mutex_unlock(&u->bindlock);
-                done_path_create(&parent, dentry);
-                return -EINVAL;
-        }
+        if (err)
+                goto out_unlink;
+        if (u->addr)
+                goto out_unlock;
 
         addr->hash = UNIX_HASH_SIZE;
         hash = d_backing_inode(dentry)->i_ino & (UNIX_HASH_SIZE - 1);
@@ -1035,6 +1024,16 @@ static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
         mutex_unlock(&u->bindlock);
         done_path_create(&parent, dentry);
         return 0;
+
+out_unlock:
+        mutex_unlock(&u->bindlock);
+        err = -EINVAL;
+out_unlink:
+        /* failed after successful mknod?  unlink what we'd created... */
+        vfs_unlink(d_inode(parent.dentry), dentry, NULL);
+out:
+        done_path_create(&parent, dentry);
+        return err == -EEXIST ? -EADDRINUSE : err;
 }
 
 static int unix_bind_abstract(struct sock *sk, unsigned hash,
-- 
2.11.0



[PATCH 3/8] unix_bind(): separate BSD and abstract cases

2021-02-18 Thread Al Viro
We do get some duplication that way, but it's minor compared to
parts that are different.  What we get is an ability to change
locking in BSD case without making failure exits very hard to
follow.

Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 52 ++++++++++++++++++++++++++++++++--------------------
 1 file changed, 32 insertions(+), 20 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 451d81f405c0..11e18b0efbc6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1023,7 +1023,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
         int err;
         unsigned int hash;
         struct unix_address *addr;
-        struct path path = { };
 
         err = -EINVAL;
         if (addr_len < offsetofend(struct sockaddr_un, sun_family) ||
@@ -1050,6 +1049,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
         refcount_set(&addr->refcnt, 1);
 
         if (sun_path[0]) {
+                struct path path = { };
                 umode_t mode = S_IFSOCK |
                                (SOCK_INODE(sock)->i_mode & ~current_umask());
                 err = unix_mknod(sun_path, mode, &path);
@@ -1058,40 +1058,52 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
                         err = -EADDRINUSE;
                         goto out_addr;
                 }
-        }
 
-        err = mutex_lock_interruptible(&u->bindlock);
-        if (err)
-                goto out_put;
+                err = mutex_lock_interruptible(&u->bindlock);
+                if (err) {
+                        path_put(&path);
+                        goto out_addr;
+                }
 
-        err = -EINVAL;
-        if (u->addr)
-                goto out_up;
+                err = -EINVAL;
+                if (u->addr) {
+                        mutex_unlock(&u->bindlock);
+                        path_put(&path);
+                        goto out_addr;
+                }
 
-        if (sun_path[0]) {
                 addr->hash = UNIX_HASH_SIZE;
                 hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
                 spin_lock(&unix_table_lock);
                 u->path = path;
+                __unix_set_addr(sk, addr, hash);
+                mutex_unlock(&u->bindlock);
+                addr = NULL;
+                err = 0;
         } else {
+                err = mutex_lock_interruptible(&u->bindlock);
+                if (err)
+                        goto out_addr;
+
+                err = -EINVAL;
+                if (u->addr) {
+                        mutex_unlock(&u->bindlock);
+                        goto out_addr;
+                }
+
                 spin_lock(&unix_table_lock);
                 err = -EADDRINUSE;
                 if (__unix_find_socket_byname(net, sunaddr, addr_len,
                                               sk->sk_type, hash)) {
                         spin_unlock(&unix_table_lock);
-                        goto out_up;
+                        mutex_unlock(&u->bindlock);
+                        goto out_addr;
                 }
-                hash = addr->hash;
+                __unix_set_addr(sk, addr, addr->hash);
+                mutex_unlock(&u->bindlock);
+                addr = NULL;
+                err = 0;
         }
-
-        __unix_set_addr(sk, addr, hash);
-        addr = NULL;
-        err = 0;
-out_up:
-        mutex_unlock(&u->bindlock);
-out_put:
-        if (err)
-                path_put(&path);
 out_addr:
         if (addr)
                 unix_release_addr(addr);
-- 
2.11.0



[PATCH 4/8] unix_bind(): take BSD and abstract address cases into new helpers

2021-02-18 Thread Al Viro
unix_bind_bsd() and unix_bind_abstract() respectively.

Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 144 +++--
 1 file changed, 74 insertions(+), 70 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 11e18b0efbc6..d7aeb4827747 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1013,101 +1013,105 @@ static int unix_mknod(const char *sun_path, umode_t mode, struct path *res)
         return err;
 }
 
+static int unix_bind_bsd(struct sock *sk, struct unix_address *addr)
+{
+        struct unix_sock *u = unix_sk(sk);
+        struct path path = { };
+        umode_t mode = S_IFSOCK |
+                       (SOCK_INODE(sk->sk_socket)->i_mode & ~current_umask());
+        unsigned int hash;
+        int err;
+
+        err = unix_mknod(addr->name->sun_path, mode, &path);
+        if (err) {
+                if (err == -EEXIST)
+                        err = -EADDRINUSE;
+                return err;
+        }
+
+        err = mutex_lock_interruptible(&u->bindlock);
+        if (err) {
+                path_put(&path);
+                return err;
+        }
+
+        if (u->addr) {
+                mutex_unlock(&u->bindlock);
+                path_put(&path);
+                return -EINVAL;
+        }
+
+        addr->hash = UNIX_HASH_SIZE;
+        hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
+        spin_lock(&unix_table_lock);
+        u->path = path;
+        __unix_set_addr(sk, addr, hash);
+        mutex_unlock(&u->bindlock);
+        return 0;
+}
+
+static int unix_bind_abstract(struct sock *sk, unsigned hash,
+                              struct unix_address *addr)
+{
+        struct unix_sock *u = unix_sk(sk);
+        int err;
+
+        err = mutex_lock_interruptible(&u->bindlock);
+        if (err)
+                return err;
+
+        if (u->addr) {
+                mutex_unlock(&u->bindlock);
+                return -EINVAL;
+        }
+
+        spin_lock(&unix_table_lock);
+        if (__unix_find_socket_byname(sock_net(sk), addr->name, addr->len,
+                                      sk->sk_type, hash)) {
+                spin_unlock(&unix_table_lock);
+                mutex_unlock(&u->bindlock);
+                return -EADDRINUSE;
+        }
+        __unix_set_addr(sk, addr, addr->hash);
+        mutex_unlock(&u->bindlock);
+        return 0;
+}
+
 static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
 {
         struct sock *sk = sock->sk;
-        struct net *net = sock_net(sk);
-        struct unix_sock *u = unix_sk(sk);
         struct sockaddr_un *sunaddr = (struct sockaddr_un *)uaddr;
         char *sun_path = sunaddr->sun_path;
         int err;
         unsigned int hash;
         struct unix_address *addr;
 
-        err = -EINVAL;
         if (addr_len < offsetofend(struct sockaddr_un, sun_family) ||
             sunaddr->sun_family != AF_UNIX)
-                goto out;
+                return -EINVAL;
 
-        if (addr_len == sizeof(short)) {
-                err = unix_autobind(sock);
-                goto out;
-        }
+        if (addr_len == sizeof(short))
+                return unix_autobind(sock);
 
         err = unix_mkname(sunaddr, addr_len, &hash);
         if (err < 0)
-                goto out;
+                return err;
         addr_len = err;
-        err = -ENOMEM;
         addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
         if (!addr)
-                goto out;
+                return -ENOMEM;
 
         memcpy(addr->name, sunaddr, addr_len);
         addr->len = addr_len;
         addr->hash = hash ^ sk->sk_type;
         refcount_set(&addr->refcnt, 1);
 
-        if (sun_path[0]) {
-                struct path path = { };
-                umode_t mode = S_IFSOCK |
-                               (SOCK_INODE(sock)->i_mode & ~current_umask());
-                err = unix_mknod(sun_path, mode, &path);
-                if (err) {
-                        if (err == -EEXIST)
-                                err = -EADDRINUSE;
-                        goto out_addr;
-                }
-
-                err = mutex_lock_interruptible(&u->bindlock);
-                if (err) {
-                        path_put(&path);
-                        goto out_addr;
-                }
-
-                err = -EINVAL;
-                if (u->addr) {
-                        mutex_unlock(&u->bindlock);
-                        path_put(&path);
-                        goto out_addr;
-                }
-
-                addr->hash = UNIX_HASH_SIZE;
-                hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
-                spin_lock(&unix_table_lock);
-                u->path = path;
-                __unix_set_addr(sk, addr, hash);
-                mutex_unlock(&u->bindlock);
-                addr = NULL;
-                err = 0;
-        } else {
-                err = mutex_lock_interruptible(&u->bindlock);
-                if (err)
-                        goto out_addr;
-
-                err = -EINVAL;
-                if (u->addr) {
-                        mutex_unlock(&u->bindlock);
-                        goto 

[PATCH 1/8] af_unix: take address assignment/hash insertion into a new helper

2021-02-18 Thread Al Viro
Duplicated logics in all bind variants (autobind, bind-to-path,
bind-to-abstract) gets taken into a common helper.

Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 30 +++++++++++++++---------------
 1 file changed, 15 insertions(+), 15 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 41c3303c3357..179b4fe837e6 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -262,6 +262,16 @@ static void __unix_insert_socket(struct hlist_head *list, struct sock *sk)
         sk_add_node(sk, list);
 }
 
+static void __unix_set_addr(struct sock *sk, struct unix_address *addr,
+                            unsigned hash)
+        __releases(&unix_table_lock)
+{
+        __unix_remove_socket(sk);
+        smp_store_release(&unix_sk(sk)->addr, addr);
+        __unix_insert_socket(&unix_socket_table[hash], sk);
+        spin_unlock(&unix_table_lock);
+}
+
 static inline void unix_remove_socket(struct sock *sk)
 {
         spin_lock(&unix_table_lock);
@@ -912,10 +922,7 @@ static int unix_autobind(struct socket *sock)
         }
         addr->hash ^= sk->sk_type;
 
-        __unix_remove_socket(sk);
-        smp_store_release(&u->addr, addr);
-        __unix_insert_socket(&unix_socket_table[addr->hash], sk);
-        spin_unlock(&unix_table_lock);
+        __unix_set_addr(sk, addr, addr->hash);
         err = 0;
 
 out:    mutex_unlock(&u->bindlock);
@@ -1016,7 +1023,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
         int err;
         unsigned int hash;
         struct unix_address *addr;
-        struct hlist_head *list;
         struct path path = { };
 
         err = -EINVAL;
@@ -1068,26 +1074,20 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
         hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
         spin_lock(&unix_table_lock);
         u->path = path;
-        list = &unix_socket_table[hash];
         } else {
         spin_lock(&unix_table_lock);
         err = -EADDRINUSE;
         if (__unix_find_socket_byname(net, sunaddr, addr_len,
                                       sk->sk_type, hash)) {
+                        spin_unlock(&unix_table_lock);
                 unix_release_addr(addr);
-                goto out_unlock;
+                        goto out_up;
         }
-
-                list = &unix_socket_table[addr->hash];
+                hash = addr->hash;
         }
 
+        __unix_set_addr(sk, addr, hash);
         err = 0;
-        __unix_remove_socket(sk);
-        smp_store_release(&u->addr, addr);
-        __unix_insert_socket(list, sk);
-
-out_unlock:
-        spin_unlock(&unix_table_lock);
 out_up:
         mutex_unlock(&u->bindlock);
 out_put:
-- 
2.11.0



[PATCH 2/8] unix_bind(): allocate addr earlier

2021-02-18 Thread Al Viro
From 24439dbb7b78cb301c73254b364020e6a3f31902 Mon Sep 17 00:00:00 2001
From: Al Viro 
Date: Thu, 18 Feb 2021 15:52:53 -0500
Subject: [PATCH 2/8] unix_bind(): allocate addr earlier

makes it easier to massage; we do pay for that by extra work
(kmalloc+memcpy+kfree) in some error cases, but those are not
on the hot paths anyway.

Signed-off-by: Al Viro 
---
 net/unix/af_unix.c | 26 ++
 1 file changed, 14 insertions(+), 12 deletions(-)

diff --git a/net/unix/af_unix.c b/net/unix/af_unix.c
index 179b4fe837e6..451d81f405c0 100644
--- a/net/unix/af_unix.c
+++ b/net/unix/af_unix.c
@@ -1039,6 +1039,15 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (err < 0)
goto out;
addr_len = err;
+   err = -ENOMEM;
+   addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
+   if (!addr)
+   goto out;
+
+   memcpy(addr->name, sunaddr, addr_len);
+   addr->len = addr_len;
+   addr->hash = hash ^ sk->sk_type;
+	refcount_set(&addr->refcnt, 1);
 
if (sun_path[0]) {
umode_t mode = S_IFSOCK |
@@ -1047,7 +1056,7 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (err) {
if (err == -EEXIST)
err = -EADDRINUSE;
-   goto out;
+   goto out_addr;
}
}
 
@@ -1059,16 +1068,6 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (u->addr)
goto out_up;
 
-   err = -ENOMEM;
-   addr = kmalloc(sizeof(*addr)+addr_len, GFP_KERNEL);
-   if (!addr)
-   goto out_up;
-
-   memcpy(addr->name, sunaddr, addr_len);
-   addr->len = addr_len;
-   addr->hash = hash ^ sk->sk_type;
-	refcount_set(&addr->refcnt, 1);
-
if (sun_path[0]) {
addr->hash = UNIX_HASH_SIZE;
		hash = d_backing_inode(path.dentry)->i_ino & (UNIX_HASH_SIZE - 1);
@@ -1080,19 +1079,22 @@ static int unix_bind(struct socket *sock, struct sockaddr *uaddr, int addr_len)
if (__unix_find_socket_byname(net, sunaddr, addr_len,
  sk->sk_type, hash)) {
			spin_unlock(&unix_table_lock);
-   unix_release_addr(addr);
goto out_up;
}
hash = addr->hash;
}
 
__unix_set_addr(sk, addr, hash);
+   addr = NULL;
err = 0;
 out_up:
	mutex_unlock(&u->bindlock);
 out_put:
if (err)
		path_put(&path);
+out_addr:
+   if (addr)
+   unix_release_addr(addr);
 out:
return err;
 }
-- 
2.11.0



Re: [PATCH v8 0/3] CPUFreq: Add support for opp-sharing cpus

2021-02-18 Thread Viresh Kumar
On 18-02-21, 22:23, Nicola Mazzucato wrote:
> Hi Viresh,
> 
> In this V8 I have addressed your comments:
> - correct the goto in patch 1/3
> - improve comment in patch 2/3 for dev_pm_opp_get_opp_count()

LGTM. I will apply them after the merge window is over. Thanks.

-- 
viresh


[PATCH] scsi: arcmsr: catch bounds error earlier and dont panic

2021-02-18 Thread Tong Zhang
pARCMSR_CDB is calculated from a base address plus a value read from the
controller with some bit shifting. The value is not checked and could
overflow the buffer and cause a panic. The buffer is allocated with
dma_alloc_coherent() and its size is acb->uncache_size.
Instead of crashing the system, we can catch the bounds error
earlier and return an error.

[   25.820995] BUG: unable to handle page fault for address: ed1010383dd3
[   25.821451] #PF: supervisor read access in kernel mode
[   25.821737] #PF: error_code(0x) - not-present page
[   25.822023] PGD 17fff1067 P4D 17fff1067 PUD 17fff0067 PMD 0
[   25.822342] Oops:  [#1] SMP KASAN NOPTI
[   25.822578] CPU: 0 PID: 66 Comm: kworker/u2:2 Not tainted 5.11.0 #27
[   25.822931] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 
rel-1.13.0-48-gd9c812dda519-4
[   25.823553] Workqueue: scsi_tmf_6 scmd_eh_abort_handler
[   25.823853] RIP: 0010:__asan_load8+0x3c/0xa0
[   25.824097] Code: 00 00 00 00 00 00 ff 48 39 f8 77 57 48 8d 47 07 48 89 c2 
83 e2 07 48 83 fa 07 78
[   25.825123] RSP: 0018:888101ea7d10 EFLAGS: 00010a03
[   25.825417] RAX: 111010383dd3 RBX: 888102a2a8c8 RCX: c000ac23
[   25.825813] RDX: dc00 RSI: c9000430 RDI: 888081c1ee98
[   25.826210] RBP: 888081c1eec0 R08: c000a5ea R09: ed102054551a
[   25.826606] R10: 888102a2a8cb R11: ed1020545519 R12: 888081c1ee80
[   25.827004] R13: 81c1ee00 R14: 888102a28000 R15: c9000430
[   25.827400] FS:  () GS:88815b40() 
knlGS:
[   25.827853] CS:  0010 DS:  ES:  CR0: 80050033
[   25.828174] CR2: ed1010383dd3 CR3: 0001029ca000 CR4: 06f0
[   25.828570] DR0:  DR1:  DR2: 
[   25.828966] DR3:  DR6: fffe0ff0 DR7: 0400
[   25.829362] Call Trace:
[   25.829503]  arcmsr_abort.cold+0xd41/0xf40 [arcmsr]
[   25.829788]  ? __schedule+0x5ae/0xd40
[   25.82]  scmd_eh_abort_handler+0xbd/0x1a0
[   25.830247]  process_one_work+0x470/0x750
[   25.830476]  worker_thread+0x73/0x690
[   25.830685]  ? process_one_work+0x750/0x750
[   25.830922]  kthread+0x199/0x1f0
[   25.831108]  ? kthread_create_on_node+0xd0/0xd0
[   25.831363]  ret_from_fork+0x1f/0x30
[   25.831571] Modules linked in: arcmsr(+)
[   25.831796] CR2: ed1010383dd3
[   25.831984] ---[ end trace 4a558ca3660a5f82 ]---
[   25.832243] RIP: 0010:__asan_load8+0x3c/0xa0
[   25.832485] Code: 00 00 00 00 00 00 ff 48 39 f8 77 57 48 8d 47 07 48 89 c2 
83 e2 07 48 83 fa 07 78
[   25.833512] RSP: 0018:888101ea7d10 EFLAGS: 00010a03
[   25.833805] RAX: 111010383dd3 RBX: 888102a2a8c8 RCX: c000ac23
[   25.834201] RDX: dc00 RSI: c9000430 RDI: 888081c1ee98
[   25.834596] RBP: 888081c1eec0 R08: c000a5ea R09: ed102054551a
[   25.834992] R10: 888102a2a8cb R11: ed1020545519 R12: 888081c1ee80
[   25.835388] R13: 81c1ee00 R14: 888102a28000 R15: c9000430
[   25.835784] FS:  () GS:88815b40() 
knlGS:
[   25.836229] CS:  0010 DS:  ES:  CR0: 80050033
[   25.836547] CR2: ed1010383dd3 CR3: 0001029ca000 CR4: 06f0
[   25.836940] DR0:  DR1:  DR2: 
[   25.837333] DR3:  DR6: fffe0ff0 DR7: 0400

Signed-off-by: Tong Zhang 
---
 drivers/scsi/arcmsr/arcmsr_hba.c | 15 +++
 1 file changed, 15 insertions(+)

diff --git a/drivers/scsi/arcmsr/arcmsr_hba.c b/drivers/scsi/arcmsr/arcmsr_hba.c
index 4b79661275c9..e0227bf12ab2 100644
--- a/drivers/scsi/arcmsr/arcmsr_hba.c
+++ b/drivers/scsi/arcmsr/arcmsr_hba.c
@@ -1482,6 +1482,11 @@ static void arcmsr_done4abort_postqueue(struct AdapterControlBlock *acb)
 	while(((flag_ccb = readl(&reg->outbound_queueport)) != 0xFFFFFFFF)
 		&& (i++ < acb->maxOutstanding)) {
 		ccb_cdb_phy = (flag_ccb << 5) & 0xffffffff;
+		if (ccb_cdb_phy >= acb->uncache_size) {
+			printk(KERN_WARNING "arcmsr%d: ccb_cdb_phy bounds error detected",
+				acb->host->host_no);
+			break;
+		}
 		if (acb->cdb_phyadd_hipart)
 			ccb_cdb_phy = ccb_cdb_phy | acb->cdb_phyadd_hipart;
 		pARCMSR_CDB = (struct ARCMSR_CDB *)(acb->vir2phy_offset + ccb_cdb_phy);
@@ -2451,6 +2456,11 @@ static void arcmsr_hbaA_postqueue_isr(struct AdapterControlBlock *acb)
 
 	while ((flag_ccb = readl(&reg->outbound_queueport)) != 0xFFFFFFFF) {
 		cdb_phy_addr = (flag_ccb << 5) & 0xffffffff;
+		if (cdb_phy_addr >= acb->uncache_size) {
+   printk(KERN_WARNING "arcmsr%d: 

[GIT PULL] xfs: new code for 5.12

2021-02-18 Thread Darrick J. Wong
Hi Linus,

Please pull the following branch containing all the new xfs code for
5.12.  There's a lot going on this time, which seems about right for
this drama-filled year.

Community developers added some code to speed up freezing when read-only
workloads are still running, refactored the logging code, added checks
to prevent file extent counter overflow, reduced iolock cycling to speed
up fsync and gc scans, and started the slow march towards supporting
filesystem shrinking.

There's a huge refactoring of the internal speculative preallocation
garbage collection code which fixes a bunch of bugs, makes the gc
scheduling per-AG and hence multithreaded, and standardizes the retry
logic when we try to reserve space or quota, can't, and want to trigger
a gc scan.  We also enable multithreaded quotacheck to reduce mount
times further.  This is also preparation for background file gc, which
may or may not land for 5.13.

We also fixed some deadlocks in the rename code, fixed a quota
accounting leak when FSSETXATTR fails, restored the behavior that write
faults to an mmap'd region actually cause a SIGBUS, fixed a bug where
sgid directory inheritance wasn't quite working properly, and fixed a
bug where symlinks weren't working properly in ecryptfs.  We also now
advertise the inode btree counters feature that was introduced two
cycles ago.

This branch merges cleanly with 5.11, but there were a few merge
conflicts with the pidfd tree that Stephen Rothwell noticed in for-next.
Christian Brauner is trying to create per-mount id mappings, which
apparently requires passing the per-mount user namespace deep into the
filesystems, either directly or through struct files.

The first conflict arises from Christoph's fix for gid inheritance; I
think it can be resolved as follows:

diff --cc fs/xfs/xfs_inode.c
index 636ac13b1df2,95b7f2ba4e06..
--- a/fs/xfs/xfs_inode.c
+++ b/fs/xfs/xfs_inode.c
@@@ -809,13 -810,13 +810,13 @@@ xfs_init_new_inode
inode->i_rdev = rdev;
ip->i_d.di_projid = prid;
  
 -  if (pip && XFS_INHERIT_GID(pip)) {
 -  inode->i_gid = VFS_I(pip)->i_gid;
 -  if ((VFS_I(pip)->i_mode & S_ISGID) && S_ISDIR(mode))
 -  inode->i_mode |= S_ISGID;
 +  if (dir && !(dir->i_mode & S_ISGID) &&
 +  (mp->m_flags & XFS_MOUNT_GRPID)) {
-   inode->i_uid = current_fsuid();
++  inode->i_uid = fsuid_into_mnt(mnt_userns);
 +  inode->i_gid = dir->i_gid;
 +  inode->i_mode = mode;
} else {
-   inode_init_owner(inode, dir, mode);
 -  inode->i_gid = fsgid_into_mnt(mnt_userns);
++  inode_init_owner(mnt_userns, inode, dir, mode);
}
  
/*

I think the important bits here are making sure the previous
current_fs[ug]id() calls get turned into fs[ug]id_into_mnt() calls, and
making sure the mnt_userns pointer gets passed to inode_init_owner().

The second conflict involves the quota reservation rework patchset, and
I think it can be resolved as follows:

diff --cc fs/xfs/xfs_ioctl.c
index 248083ea0276,3d4c7ca080fb..
--- a/fs/xfs/xfs_ioctl.c
+++ b/fs/xfs/xfs_ioctl.c
@@@ -1275,9 -1280,9 +1280,10 @@@ xfs_ioctl_setattr_prepare_dax
   */
  static struct xfs_trans *
  xfs_ioctl_setattr_get_trans(
-   struct xfs_inode*ip,
 -  struct file *file)
++  struct file *file,
 +  struct xfs_dquot*pdqp)
  {
+   struct xfs_inode*ip = XFS_I(file_inode(file));
struct xfs_mount*mp = ip->i_mount;
struct xfs_trans*tp;
int error = -EROFS;
@@@ -1461,9 -1470,9 +1469,9 @@@ xfs_ioctl_setattr
  
xfs_ioctl_setattr_prepare_dax(ip, fa);
  
-   tp = xfs_ioctl_setattr_get_trans(ip, pdqp);
 -  tp = xfs_ioctl_setattr_get_trans(file);
++  tp = xfs_ioctl_setattr_get_trans(file, pdqp);
if (IS_ERR(tp)) {
 -  code = PTR_ERR(tp);
 +  error = PTR_ERR(tp);
goto error_free_dquots;
}
  
@@@ -1599,7 -1615,7 +1606,7 @@@ xfs_ioc_setxflags
  
	xfs_ioctl_setattr_prepare_dax(ip, &fa);
  
-   tp = xfs_ioctl_setattr_get_trans(ip, NULL);
 -  tp = xfs_ioctl_setattr_get_trans(filp);
++  tp = xfs_ioctl_setattr_get_trans(filp, NULL);
if (IS_ERR(tp)) {
error = PTR_ERR(tp);
goto out_drop_write;

Mr. Brauner swapped the xfs_inode pointer in the first argument of
xfs_ioctl_setattr_get_trans for a struct file, and I added a second
argument to pass a xfs_dquot that we're making reservations against into
the get_trans function.  The rest of the diff updates the callsite
parameters.

After the merge, the function signature should be:

static struct xfs_trans *
xfs_ioctl_setattr_get_trans(
struct file *file,
struct xfs_dquot*pdqp) {...}

The third conflict is also from the quota rework patchset, and (AFAICT)

Re: [PATCH] fs: export kern_path_locked

2021-02-18 Thread Al Viro
On Tue, Feb 16, 2021 at 06:00:34PM +, Al Viro wrote:

> Sigh...  OK, so we want something like
>   user_path_create()
>   vfs_mknod()
>   created = true
>   grab bindlock
>   
>   drop bindlock
>   if failed && created
>   vfs_unlink()
>   done_path_create()  
> in unix_bind()...  That would push ->bindlock all way down in the hierarchy,
> so that should be deadlock-free, but it looks like that'll be fucking ugly ;-/
> 
> Let me try and play with that a bit, maybe it can be massaged to something
> relatively sane...

OK...  Completely untested series follows.  Preliminary massage in first
6 patches, then actual "add cleanup on failure", then minor followup
cleanup.
  af_unix: take address assignment/hash insertion into a new helper
  unix_bind(): allocate addr earlier
  unix_bind(): separate BSD and abstract cases
  unix_bind(): take BSD and abstract address cases into new helpers
  fold unix_mknod() into unix_bind_bsd()
  unix_bind_bsd(): move done_path_create() call after dealing with ->bindlock
  unix_bind_bsd(): unlink if we fail after successful mknod
  __unix_find_socket_byname(): don't pass hash and type separately

Branch is in git.kernel.org/pub/scm/linux/kernel/git/viro/vfs.git #misc.af_unix,
individual patches in followups


Re: [PATCH] cpufreq: schedutil: Don't consider freq reduction to busy CPU if need_freq_update is set

2021-02-18 Thread Viresh Kumar
On 19-02-21, 11:38, Yue Hu wrote:
> There's a possibility: we will use the previous freq to update if next_f
> is reduced for busy CPU if need_freq_update is set in
> sugov_update_next_freq().

Right.

> This possibility would happen now? And this
> update is what we want if it happens?

This is exactly what we want here: don't reduce the speed for a busy CPU.
We also need to make sure we stay within the policy's valid range, which the
cpufreq core will take care of.

> This is related to another possible patch ready to send.

I am not sure what's there to send now.

-- 
viresh


Re: [PATCH v2 1/4] hmm: Device exclusive memory access

2021-02-18 Thread kernel test robot
Hi Alistair,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.11 next-20210218]
[cannot apply to hnaz-linux-mm/master]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210219-100858
base:   
https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: mips-randconfig-r036-20210218 (attached as .config)
compiler: clang version 12.0.0 (https://github.com/llvm/llvm-project 
c9439ca36342fb6013187d0a69aef92736951476)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# install mips cross compiling tool for clang build
# apt-get install binutils-mips-linux-gnu
# 
https://github.com/0day-ci/linux/commit/bb5444811772d30b2e3bbaa44baeb8a4b3f03cec
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review 
Alistair-Popple/Add-support-for-SVM-atomics-in-Nouveau/20210219-100858
git checkout bb5444811772d30b2e3bbaa44baeb8a4b3f03cec
# save the attached .config to linux build tree
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=mips 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All error/warnings (new ones prefixed by >>):

>> fs/proc/task_mmu.c:521:12: error: implicit declaration of function 
>> 'is_device_exclusive_entry' [-Werror,-Wimplicit-function-declaration]
   else if (is_device_exclusive_entry(swpent))
^
   fs/proc/task_mmu.c:521:12: note: did you mean 'is_device_private_entry'?
   include/linux/swapops.h:176:20: note: 'is_device_private_entry' declared here
   static inline bool is_device_private_entry(swp_entry_t entry)
  ^
>> fs/proc/task_mmu.c:522:11: error: implicit declaration of function 
>> 'device_exclusive_entry_to_page' [-Werror,-Wimplicit-function-declaration]
   page = device_exclusive_entry_to_page(swpent);
  ^
   fs/proc/task_mmu.c:522:11: note: did you mean 'device_private_entry_to_page'?
   include/linux/swapops.h:191:28: note: 'device_private_entry_to_page' 
declared here
   static inline struct page *device_private_entry_to_page(swp_entry_t entry)
  ^
>> fs/proc/task_mmu.c:522:9: warning: incompatible integer to pointer 
>> conversion assigning to 'struct page *' from 'int' [-Wint-conversion]
   page = device_exclusive_entry_to_page(swpent);
^ ~~
   fs/proc/task_mmu.c:1395:7: error: implicit declaration of function 
'is_device_exclusive_entry' [-Werror,-Wimplicit-function-declaration]
   if (is_device_exclusive_entry(entry))
   ^
   fs/proc/task_mmu.c:1396:11: error: implicit declaration of function 
'device_exclusive_entry_to_page' [-Werror,-Wimplicit-function-declaration]
   page = device_exclusive_entry_to_page(entry);
  ^
   fs/proc/task_mmu.c:1396:9: warning: incompatible integer to pointer 
conversion assigning to 'struct page *' from 'int' [-Wint-conversion]
   page = device_exclusive_entry_to_page(entry);
^ ~
   2 warnings and 4 errors generated.


vim +/is_device_exclusive_entry +521 fs/proc/task_mmu.c

   490  
   491  static void smaps_pte_entry(pte_t *pte, unsigned long addr,
   492  struct mm_walk *walk)
   493  {
   494  struct mem_size_stats *mss = walk->private;
   495  struct vm_area_struct *vma = walk->vma;
   496  bool locked = !!(vma->vm_flags & VM_LOCKED);
   497  struct page *page = NULL;
   498  
   499  if (pte_present(*pte)) {
   500  page = vm_normal_page(vma, addr, *pte);
   501  } else if (is_swap_pte(*pte)) {
   502  swp_entry_t swpent = pte_to_swp_entry(*pte);
   503  
   504  if (!non_swap_entry(swpent)) {
   505  int mapcount;
   506  
   507  mss->swap += PAGE_SIZE;
   508  mapcount = swp_swapcount(swpent);
   509  if (mapcount >= 2) {
   510  u64 pss_delta = (u64)PAGE_SIZE << 
PSS_SHIFT;
   511  
   512  do_div(pss_delta, mapcount);
   513 

Re: [PATCH v8 7/9] crypto: hisilicon/hpre - add 'ECDH' algorithm

2021-02-18 Thread Herbert Xu
On Fri, Feb 19, 2021 at 09:25:13AM +0800, yumeng wrote:
>
> And p224 and p521 are the same as p384 (they have no users and no
> generic implementation), so should they be supported by HPRE later?

Right.

Thanks,
-- 
Email: Herbert Xu 
Home Page: http://gondor.apana.org.au/~herbert/
PGP Key: http://gondor.apana.org.au/~herbert/pubkey.txt



Re: [PATCH v12 13/14] mm/vmalloc: Hugepage vmalloc mappings

2021-02-18 Thread Ding Tianhong
Hi Nicholas:

I met some problem for this patch, like this:

kva = vmalloc(3*1024k);

remap_vmalloc_range(xxx, kva, xxx)

It failed because the check on page_count(page) finds a zero count and
returns early, which breaks existing logic in some modules,
because the new huge page is not a valid compound page.

I think some users really aren't used to the change in vmalloc where
small pages are transparently replaced by a huge page
when the size is bigger than PMD_SIZE.

Can we think about adding an explicit huge-page variant to fix it? For example,
a new vmalloc_huge_xxx() function to distinguish it from the current one, so the
user could choose between transparent huge pages and explicit huge pages for
vmalloc.

Thanks
Ding


On 2021/2/2 19:05, Nicholas Piggin wrote:
> Support huge page vmalloc mappings. Config option HAVE_ARCH_HUGE_VMALLOC
> enables support on architectures that define HAVE_ARCH_HUGE_VMAP and
> supports PMD sized vmap mappings.
> 
> vmalloc will attempt to allocate PMD-sized pages if allocating PMD size
> or larger, and fall back to small pages if that was unsuccessful.
> 
> Architectures must ensure that any arch specific vmalloc allocations
> that require PAGE_SIZE mappings (e.g., module allocations vs strict
> module rwx) use the VM_NOHUGE flag to inhibit larger mappings.
> 
> This can result in more internal fragmentation and memory overhead for a
> given allocation, an option nohugevmalloc is added to disable at boot.
> 
> Signed-off-by: Nicholas Piggin 
> ---
>  arch/Kconfig|  11 ++
>  include/linux/vmalloc.h |  21 
>  mm/page_alloc.c |   5 +-
>  mm/vmalloc.c| 215 +++-
>  4 files changed, 205 insertions(+), 47 deletions(-)
> 
> diff --git a/arch/Kconfig b/arch/Kconfig
> index 24862d15f3a3..eef170e0c9b8 100644
> --- a/arch/Kconfig
> +++ b/arch/Kconfig
> @@ -724,6 +724,17 @@ config HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD
>  config HAVE_ARCH_HUGE_VMAP
>   bool
>  
> +#
> +#  Archs that select this would be capable of PMD-sized vmaps (i.e.,
> +#  arch_vmap_pmd_supported() returns true), and they must make no assumptions
> +#  that vmalloc memory is mapped with PAGE_SIZE ptes. The VM_NO_HUGE_VMAP flag
> +#  can be used to prohibit arch-specific allocations from using hugepages to
> +#  help with this (e.g., modules may require it).
> +#
> +config HAVE_ARCH_HUGE_VMALLOC
> + depends on HAVE_ARCH_HUGE_VMAP
> + bool
> +
>  config ARCH_WANT_HUGE_PMD_SHARE
>   bool
>  
> diff --git a/include/linux/vmalloc.h b/include/linux/vmalloc.h
> index 99ea72d547dc..93270adf5db5 100644
> --- a/include/linux/vmalloc.h
> +++ b/include/linux/vmalloc.h
> @@ -25,6 +25,7 @@ struct notifier_block;  /* in notifier.h */
>  #define VM_NO_GUARD  0x0040  /* don't add guard page */
>  #define VM_KASAN		0x0080	/* has allocated kasan shadow memory */
>  #define VM_MAP_PUT_PAGES	0x0100	/* put pages and free array in vfree */
> +#define VM_NO_HUGE_VMAP	0x0200	/* force PAGE_SIZE pte mapping */
>  
>  /*
>   * VM_KASAN is used slighly differently depending on CONFIG_KASAN_VMALLOC.
> @@ -59,6 +60,9 @@ struct vm_struct {
>   unsigned long   size;
>   unsigned long   flags;
>   struct page **pages;
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> +	unsigned int	page_order;
> +#endif
>  	unsigned int	nr_pages;
>   phys_addr_t phys_addr;
>   const void  *caller;
> @@ -193,6 +197,22 @@ void free_vm_area(struct vm_struct *area);
>  extern struct vm_struct *remove_vm_area(const void *addr);
>  extern struct vm_struct *find_vm_area(const void *addr);
>  
> +static inline bool is_vm_area_hugepages(const void *addr)
> +{
> + /*
> +  * This may not 100% tell if the area is mapped with > PAGE_SIZE
> +  * page table entries, if for some reason the architecture indicates
> +  * larger sizes are available but decides not to use them, nothing
> +  * prevents that. This only indicates the size of the physical page
> +  * allocated in the vmalloc layer.
> +  */
> +#ifdef CONFIG_HAVE_ARCH_HUGE_VMALLOC
> + return find_vm_area(addr)->page_order > 0;
> +#else
> + return false;
> +#endif
> +}
> +
>  #ifdef CONFIG_MMU
>  int vmap_range(unsigned long addr, unsigned long end,
>   phys_addr_t phys_addr, pgprot_t prot,
> @@ -210,6 +230,7 @@ static inline void set_vm_flush_reset_perms(void *addr)
>   if (vm)
>   vm->flags |= VM_FLUSH_RESET_PERMS;
>  }
> +
>  #else
>  static inline int
>  map_kernel_range_noflush(unsigned long start, unsigned long size,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 519a60d5b6f7..1116ce45744b 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -72,6 +72,7 @@
>  #include 
>  #include 
>  #include 
> +#include 
>  
>  #include 
>  #include 
> @@ -8240,6 +8241,7 @@ void *__init 

[PATCH] Input: zinitix - fix return type of zinitix_init_touch()

2021-02-18 Thread Dmitry Torokhov
zinitix_init_touch() returns error code or 0 for success and therefore
return type must be int, not bool.

Fixes: 26822652c85e ("Input: add zinitix touchscreen driver")
Reported-by: kernel test robot 
Reported-by: Jiapeng Chong 
Signed-off-by: Dmitry Torokhov 
---
 drivers/input/touchscreen/zinitix.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/input/touchscreen/zinitix.c 
b/drivers/input/touchscreen/zinitix.c
index f64d88170fac..3b636beb583c 100644
--- a/drivers/input/touchscreen/zinitix.c
+++ b/drivers/input/touchscreen/zinitix.c
@@ -190,7 +190,7 @@ static int zinitix_write_cmd(struct i2c_client *client, u16 reg)
return 0;
 }
 
-static bool zinitix_init_touch(struct bt541_ts_data *bt541)
+static int zinitix_init_touch(struct bt541_ts_data *bt541)
 {
struct i2c_client *client = bt541->client;
int i;
-- 
2.30.0.617.g56c4b15f3c-goog


-- 
Dmitry


[PATCH] staging: emxx_udc: remove unused variable driver_desc

2021-02-18 Thread Sean Behan
Signed-off-by: Sean Behan 
---
 drivers/staging/emxx_udc/emxx_udc.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/staging/emxx_udc/emxx_udc.c 
b/drivers/staging/emxx_udc/emxx_udc.c
index 3536c03ff523..741147a4f0fe 100644
--- a/drivers/staging/emxx_udc/emxx_udc.c
+++ b/drivers/staging/emxx_udc/emxx_udc.c
@@ -38,7 +38,6 @@ static struct gpio_desc *vbus_gpio;
 static int vbus_irq;
 
 static const char  driver_name[] = "emxx_udc";
-static const char  driver_desc[] = DRIVER_DESC;
 
 /*===*/
 /* Prototype */
-- 
2.29.2



Re: [PATCH] cpufreq: schedutil: Don't consider freq reduction to busy CPU if need_freq_update is set

2021-02-18 Thread Yue Hu
On Thu, 18 Feb 2021 15:50:29 +0530
Viresh Kumar  wrote:

> On 18-02-21, 16:25, Yue Hu wrote:
> > From: Yue Hu 
> > 
> > For busy CPU case, we do not need to avoid freq reduction if limits
> > change since commit 600f5badb78c ("cpufreq: schedutil: Don't skip
> > freq update when limits change").
> > 
> > Later, commit 23a881852f3e ("cpufreq: schedutil: Don't skip freq
> > update if need_freq_update is set") discarded the need_freq_update
> > check for special case of busy CPU because we won't abort a
> > frequency update anymore if need_freq_update is set.
> > 
> > That is illogical, since we will not reduce the freq for a busy CPU
> > even if the computed next_f is really reduced when limits change.
> 
> Schedutil governor will probably ask for a higher frequency than
> allowed, but cpufreq core will clamp the request between policy
> min/max before updating the frequency here.
> 
> We added the check in 600f5badb78c here earlier as there were chances
> that we will abort the operation without reaching to cpufreq core,
> which won't happen now.
> 

There's a possibility: if need_freq_update is set and next_f is reduced
for a busy CPU, sugov_update_next_freq() will still update using the
previous frequency. Can this happen now? And if it happens, is this
update what we want? This is related to another patch I have ready to
send.

Thank you.

Thank you.


[GIT PULL] iomap: new code for 5.12-rc1

2021-02-18 Thread Darrick J. Wong
Hi Linus,

Please pull these new changes to the iomap code for 5.12.  The big
change in this cycle is some new code to make it possible for XFS to try
unaligned directio overwrites without taking locks.  If the block is
fully written and within EOF (i.e. doesn't require any further fs
intervention) then we can let the unlocked write proceed.  If not, we
fall back to synchronizing direct writes.

Note that the btrfs developers have been working on supporting zoned
block devices, and their 5.12 pull request has a single iomap patch to
adjust directio writes to support REQ_OP_APPEND.

The branch merges cleanly with 5.11 and has been soaking in for-next for
quite a while now.  Please let me know if there are any strange
problems.  It's been a pretty quiet cycle, so I don't anticipate any
more iomap pulls other than whatever new bug fixes show up.

--D (whose pull requests are delayed by last weekend's wild ride :( )

The following changes since commit 19c329f6808995b142b3966301f217c831e7cf31:

  Linux 5.11-rc4 (2021-01-17 16:37:05 -0800)

are available in the Git repository at:

  git://git.kernel.org/pub/scm/fs/xfs/xfs-linux.git tags/iomap-5.12-merge-2

for you to fetch changes up to ed1128c2d0c87e5ff49c40f5529f06bc35f4251b:

  xfs: reduce exclusive locking on unaligned dio (2021-02-01 09:47:19 -0800)


New code for 5.12:
- Adjust the final parameter of iomap_dio_rw.
- Add a new flag to request that iomap directio writes return EAGAIN if
  the write is not a pure overwrite within EOF; this will be used to
  reduce lock contention with unaligned direct writes on XFS.
- Amend XFS' directio code to eliminate exclusive locking for unaligned
  direct writes if the circumstances permit


Christoph Hellwig (9):
  iomap: rename the flags variable in __iomap_dio_rw
  iomap: pass a flags argument to iomap_dio_rw
  iomap: add a IOMAP_DIO_OVERWRITE_ONLY flag
  xfs: factor out a xfs_ilock_iocb helper
  xfs: make xfs_file_aio_write_checks IOCB_NOWAIT-aware
  xfs: cleanup the read/write helper naming
  xfs: remove the buffered I/O fallback assert
  xfs: simplify the read/write tracepoints
  xfs: improve the reflink_bounce_dio_write tracepoint

Dave Chinner (2):
  xfs: split the unaligned DIO write code out
  xfs: reduce exclusive locking on unaligned dio

 fs/btrfs/file.c   |   7 +-
 fs/ext4/file.c|   5 +-
 fs/gfs2/file.c|   7 +-
 fs/iomap/direct-io.c  |  26 ++--
 fs/xfs/xfs_file.c | 351 --
 fs/xfs/xfs_iomap.c|  29 +++--
 fs/xfs/xfs_trace.h|  22 ++--
 fs/zonefs/super.c |   4 +-
 include/linux/iomap.h |  18 ++-
 9 files changed, 269 insertions(+), 200 deletions(-)


Re: [PATCH] Input: fix boolreturn.cocci warnings

2021-02-18 Thread Dmitry Torokhov
Hi,

On Thu, Feb 18, 2021 at 05:59:53PM +0800, kernel test robot wrote:
> From: kernel test robot 
> 
> drivers/input/touchscreen/zinitix.c:250:8-9: WARNING: return of 0/1 in 
> function 'zinitix_init_touch' with return type bool
> 
>  Return statements in functions returning bool should use
>  true/false instead of 1/0.
> Generated by: scripts/coccinelle/misc/boolreturn.cocci
> 
> Fixes: 26822652c85e ("Input: add zinitix touchscreen driver")
> CC: Michael Srba 
> Reported-by: kernel test robot 
> Signed-off-by: kernel test robot 
> ---
> 
> tree:   https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git 
> master
> head:   f40ddce88593482919761f74910f42f4b84c004b
> commit: 26822652c85eff14e40115255727b2693400c524 Input: add zinitix 
> touchscreen driver
> 
>  zinitix.c |2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> --- a/drivers/input/touchscreen/zinitix.c
> +++ b/drivers/input/touchscreen/zinitix.c
> @@ -247,7 +247,7 @@ static bool zinitix_init_touch(struct bt541_ts_data *bt541)
>   udelay(10);
>   }
>  
> - return 0;
> + return false;

This is incorrect as function is trying to return error codes. It needs
to be changed to return int. I'll take care of it.

Thanks.

-- 
Dmitry


Re: [PATCH] Input: Use true and false for bool variable

2021-02-18 Thread Dmitry Torokhov
Hi,

On Thu, Feb 18, 2021 at 06:23:55PM +0800, Jiapeng Chong wrote:
> Fix the following coccicheck warnings:
> 
> ./drivers/input/touchscreen/zinitix.c:250:8-9: WARNING: return of 0/1 in
> function 'zinitix_init_touch' with return type bool.
> 
> Reported-by: Abaci Robot 
> Signed-off-by: Jiapeng Chong 
> ---
>  drivers/input/touchscreen/zinitix.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/drivers/input/touchscreen/zinitix.c 
> b/drivers/input/touchscreen/zinitix.c
> index a3e3adb..acb1d53 100644
> --- a/drivers/input/touchscreen/zinitix.c
> +++ b/drivers/input/touchscreen/zinitix.c
> @@ -247,7 +247,7 @@ static bool zinitix_init_touch(struct bt541_ts_data *bt541)
>   udelay(10);
>   }
>  
> - return 0;
> + return false;

This is incorrect, as earlier we try to return error codes from this
function. It needs to be changed to return int, I'll take care of it.

Thanks.

-- 
Dmitry


Re: [PATCH v1] vdpa/mlx5: Restore the hardware used index after change map

2021-02-18 Thread Jason Wang



On 2021/2/18 8:43 PM, Si-Wei Liu wrote:



On 2/17/2021 8:44 PM, Jason Wang wrote:


On 2021/2/10 4:59 PM, Si-Wei Liu wrote:



On 2/9/2021 7:53 PM, Jason Wang wrote:


On 2021/2/10 10:30 AM, Si-Wei Liu wrote:



On 2/8/2021 10:37 PM, Jason Wang wrote:


On 2021/2/9 2:12 PM, Eli Cohen wrote:

On Tue, Feb 09, 2021 at 11:20:14AM +0800, Jason Wang wrote:

On 2021/2/8 下午6:04, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 05:04:27PM +0800, Jason Wang wrote:

On 2021/2/8 下午2:37, Eli Cohen wrote:

On Mon, Feb 08, 2021 at 12:27:18PM +0800, Jason Wang wrote:

On 2021/2/6 上午7:07, Si-Wei Liu wrote:

On 2/3/2021 11:36 PM, Eli Cohen wrote:
When a change of memory map occurs, the hardware resources are destroyed
and then re-created again with the new memory map. In such case, we need
to restore the hardware available and used indices. The driver failed to
restore the used index which is added here.

Also, since the driver also fails to reset the available and used
indices upon device reset, fix this here to avoid regression caused by
the fact that used index may not be zero upon device reset.

Fixes: 1a86b377aa21 ("vdpa/mlx5: Add VDPA driver for 
supported mlx5

devices")
Signed-off-by: Eli Cohen
---
v0 -> v1:
Clear indices upon device reset

 drivers/vdpa/mlx5/net/mlx5_vnet.c | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)

diff --git a/drivers/vdpa/mlx5/net/mlx5_vnet.c b/drivers/vdpa/mlx5/net/mlx5_vnet.c
index 88dde3455bfd..b5fe6d2ad22f 100644
--- a/drivers/vdpa/mlx5/net/mlx5_vnet.c
+++ b/drivers/vdpa/mlx5/net/mlx5_vnet.c
@@ -87,6 +87,7 @@ struct mlx5_vq_restore_info {
 	u64 device_addr;
 	u64 driver_addr;
 	u16 avail_index;
+	u16 used_index;
 	bool ready;
 	struct vdpa_callback cb;
 	bool restore;
@@ -121,6 +122,7 @@ struct mlx5_vdpa_virtqueue {
 	u32 virtq_id;
 	struct mlx5_vdpa_net *ndev;
 	u16 avail_idx;
+	u16 used_idx;
 	int fw_state;
 
 	/* keep last in the struct */
@@ -804,6 +806,7 @@ static int create_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtque
 
 	obj_context = MLX5_ADDR_OF(create_virtio_net_q_in, in, obj_context);
 	MLX5_SET(virtio_net_q_object, obj_context, hw_available_index, mvq->avail_idx);
+	MLX5_SET(virtio_net_q_object, obj_context, hw_used_index, mvq->used_idx);
 	MLX5_SET(virtio_net_q_object, obj_context, queue_feature_bit_mask_12_3,
 		 get_features_12_3(ndev->mvdev.actual_features));
 	vq_ctx = MLX5_ADDR_OF(virtio_net_q_object, obj_context, virtio_q_context);
@@ -1022,6 +1025,7 @@ static int connect_qps(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *m
 struct mlx5_virtq_attr {
 	u8 state;
 	u16 available_index;
+	u16 used_index;
 };
 
 static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueue *mvq,
@@ -1052,6 +1056,7 @@ static int query_virtqueue(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqueu
 	memset(attr, 0, sizeof(*attr));
 	attr->state = MLX5_GET(virtio_net_q_object, obj_context, state);
 	attr->available_index = MLX5_GET(virtio_net_q_object, obj_context, hw_available_index);
+	attr->used_index = MLX5_GET(virtio_net_q_object, obj_context, hw_used_index);
 	kfree(out);
 	return 0;
 
@@ -1535,6 +1540,16 @@ static void teardown_virtqueues(struct mlx5_vdpa_net *ndev)
 	}
 }
 
+static void clear_virtqueues(struct mlx5_vdpa_net *ndev)
+{
+	int i;
+
+	for (i = ndev->mvdev.max_vqs - 1; i >= 0; i--) {
+		ndev->vqs[i].avail_idx = 0;
+		ndev->vqs[i].used_idx = 0;
+	}
+}
+
 /* TODO: cross-endian support */
 static inline bool mlx5_vdpa_is_little_endian(struct mlx5_vdpa_dev *mvdev)
 {
@@ -1610,6 +1625,7 @@ static int save_channel_info(struct mlx5_vdpa_net *ndev, struct mlx5_vdpa_virtqu
 		return err;
 
 	ri->avail_index = attr.available_index;
+	ri->used_index = attr.used_index;
 	ri->ready = mvq->ready;
 	ri->num_ent = mvq->num_ent;
 	ri->desc_addr = mvq->desc_addr;
@@ -1654,6 +1670,7 @@ static void restore_channels_info(struct mlx5_vdpa_net *ndev)
 			continue;
 
 		mvq->avail_idx = ri->avail_index;
+		mvq->used_idx = ri->used_index;
 		mvq->ready = ri->ready;
 		mvq->num_ent = ri->num_ent;
 		mvq->desc_addr = ri->desc_addr;
@@ -1768,6 +1785,7 @@ static void mlx5_vdpa_set_status(struct vdpa_device *vdev, u8 status)
 	if (!status) {
 		mlx5_vdpa_info(mvdev, "performing device reset\n");
 		teardown_driver(ndev);
+		clear_virtqueues(ndev);
The clearing looks fine at the first glance, as it aligns with the other
state cleanups floating around at the same place. However, the thing is
get_vq_state() is supposed to be called right after to get sync'ed with
the latest internal avail_index from device while vq is stopped. The
index was saved 

Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'

2021-02-18 Thread Lakshmi Ramasubramanian

On 2/18/21 5:13 PM, Thiago Jung Bauermann wrote:


Lakshmi Ramasubramanian  writes:


On 2/18/21 4:07 PM, Mimi Zohar wrote:

Hi Mimi,


On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:

of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
a new device tree object that includes architecture specific data
for kexec system call.  This should be defined only if the architecture
being built defines kexec architecture structure "struct kimage_arch".

Define a new boolean config OF_KEXEC that is enabled if
CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
if CONFIG_OF_KEXEC is enabled.

Signed-off-by: Lakshmi Ramasubramanian 
Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
Reported-by: kernel test robot 
---
   drivers/of/Kconfig  | 6 ++
   drivers/of/Makefile | 7 +--
   2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index 18450437d5d5..f2e8fa54862a 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
# arches should select this if DMA is coherent by default for OF devices
bool
   +config OF_KEXEC
+   bool
+   depends on KEXEC_FILE
+   depends on OF_FLATTREE
+   default y if ARM64 || PPC64
+
   endif # OF
diff --git a/drivers/of/Makefile b/drivers/of/Makefile
index c13b982084a3..287579dd1695 100644
--- a/drivers/of/Makefile
+++ b/drivers/of/Makefile
@@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
   obj-$(CONFIG_OF_RESOLVE)  += resolver.o
   obj-$(CONFIG_OF_OVERLAY) += overlay.o
   obj-$(CONFIG_OF_NUMA) += of_numa.o
-
-ifdef CONFIG_KEXEC_FILE
-ifdef CONFIG_OF_FLATTREE
-obj-y  += kexec.o
-endif
-endif
+obj-$(CONFIG_OF_KEXEC) += kexec.o
 obj-$(CONFIG_OF_UNITTEST) += unittest-data/

Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?



For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.

But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the patch
set (the one for carrying forward IMA log across kexec for arm64). arm64 calls
of_kexec_alloc_and_setup_fdt() prior to enabling CONFIG_HAVE_IMA_KEXEC and hence
breaks the build for arm64.


One problem is that I believe that this patch won't placate the robot,
because IIUC it generates config files at random and this change still
allows hppa and s390 to enable CONFIG_OF_KEXEC.


I enabled CONFIG_OF_KEXEC for s390. With my patch applied, 
CONFIG_OF_KEXEC is removed. So I think the robot enabling this config 
would not be a problem.




Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
would still allow building kexec.o, but would be used inside kexec.c to
avoid accessing kimage.arch members.



I think this is a good idea - a new CONFIG_HAVE_KIMAGE_ARCH, which will 
be selected by arm64 and ppc for now. I tried this, and it fixes the 
build issue.


Although the name for the new config can be misleading, since PARISC, 
for instance, also defines "struct kimage_arch". Perhaps 
CONFIG_HAVE_ELF_KIMAGE_ARCH, since of_kexec_alloc_and_setup_fdt() is 
accessing ELF specific fields in "struct kimage_arch"?


Rob/Mimi - please let us know which approach you think is better.

thanks,
 -lakshmi


Re: [PATCH v8 2/6] arm64: hyperv: Add Hyper-V clocksource/clockevent support

2021-02-18 Thread kernel test robot
Hi Michael,

Thank you for the patch! Perhaps something to improve:

[auto build test WARNING on arm64/for-next/core]
[also build test WARNING on tip/timers/core efi/next linus/master v5.11 next-20210218]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:
https://github.com/0day-ci/linux/commits/Michael-Kelley/Enable-Linux-guests-on-Hyper-V-on-ARM64/20210219-072336
base:   https://git.kernel.org/pub/scm/linux/kernel/git/arm64/linux.git 
for-next/core
config: i386-randconfig-a003-20210218 (attached as .config)
compiler: gcc-9 (Debian 9.3.0-15) 9.3.0
reproduce (this is a W=1 build):
# https://github.com/0day-ci/linux/commit/a8eb25332c441e0965c0ecdfb1a86b507e3465e1
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Michael-Kelley/Enable-Linux-guests-on-Hyper-V-on-ARM64/20210219-072336
git checkout a8eb25332c441e0965c0ecdfb1a86b507e3465e1
# save the attached .config to linux build tree
make W=1 ARCH=i386 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot 

All warnings (new ones prefixed by >>):

>> drivers/clocksource/hyperv_timer.c:478:44: warning: 'struct acpi_table_header' declared inside parameter list will not be visible outside of this definition or declaration
 478 | static int __init hyperv_timer_init(struct acpi_table_header *table)
 |^
   drivers/clocksource/hyperv_timer.c: In function 'hyperv_timer_init':
    drivers/clocksource/hyperv_timer.c:484:6: error: too many arguments to function 'hv_stimer_alloc'
 484 |  if (hv_stimer_alloc(true))
 |  ^~~
   drivers/clocksource/hyperv_timer.c:173:5: note: declared here
 173 | int hv_stimer_alloc(void)
 | ^~~
   In file included from include/linux/clockchips.h:14,
from drivers/clocksource/hyperv_timer.c:16:
   drivers/clocksource/hyperv_timer.c: At top level:
    include/linux/clocksource.h:283:50: error: expected ')' before numeric constant
 283 |  ACPI_DECLARE_PROBE_ENTRY(timer, name, table_id, 0, NULL, 0, fn)
 |  ^
    drivers/clocksource/hyperv_timer.c:489:1: note: in expansion of macro 'TIMER_ACPI_DECLARE'
 489 | TIMER_ACPI_DECLARE(hyperv, ACPI_SIG_GTDT, hyperv_timer_init);
 | ^~
    drivers/clocksource/hyperv_timer.c:478:19: warning: 'hyperv_timer_init' defined but not used [-Wunused-function]
 478 | static int __init hyperv_timer_init(struct acpi_table_header *table)
 |   ^


vim +478 drivers/clocksource/hyperv_timer.c

   476  
   477  /* Initialize everything on ARM64 */
 > 478  static int __init hyperv_timer_init(struct acpi_table_header *table)

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-...@lists.01.org




mmotm 2021-02-18-18-29 uploaded

2021-02-18 Thread akpm
The mm-of-the-moment snapshot 2021-02-18-18-29 has been uploaded to

   https://www.ozlabs.org/~akpm/mmotm/

mmotm-readme.txt says

README for mm-of-the-moment:

https://www.ozlabs.org/~akpm/mmotm/

This is a snapshot of my -mm patch queue.  Uploaded at random hopefully
more than once a week.

You will need quilt to apply these patches to the latest Linus release (5.x
or 5.x-rcY).  The series file is in broken-out.tar.gz and is duplicated in
https://ozlabs.org/~akpm/mmotm/series

The file broken-out.tar.gz contains two datestamp files: .DATE and
.DATE--mm-dd-hh-mm-ss.  Both contain the string -mm-dd-hh-mm-ss,
followed by the base kernel version against which this patch series is to
be applied.

This tree is partially included in linux-next.  To see which patches are
included in linux-next, consult the `series' file.  Only the patches
within the #NEXT_PATCHES_START/#NEXT_PATCHES_END markers are included in
linux-next.


A full copy of the full kernel tree with the linux-next and mmotm patches
already applied is available through git within an hour of the mmotm
release.  Individual mmotm releases are tagged.  The master branch always
points to the latest release, so it's constantly rebasing.

https://github.com/hnaz/linux-mm

The directory https://www.ozlabs.org/~akpm/mmots/ (mm-of-the-second)
contains daily snapshots of the -mm tree.  It is updated more frequently
than mmotm, and is untested.

A git copy of this tree is also available at

https://github.com/hnaz/linux-mm



This mmotm tree contains the following patches against 5.11:
(patches marked "*" will be included in linux-next)

* proc-kpageflags-prevent-an-integer-overflow-in-stable_page_flags.patch
* proc-kpageflags-do-not-use-uninitialized-struct-pages.patch
* hexagon-remove-config_experimental-from-defconfigs.patch
* scripts-spellingtxt-increase-error-prone-spell-checking.patch
* scripts-spellingtxt-increase-error-prone-spell-checking-2.patch
* scripts-spellingtxt-add-allocted-and-exeeds-typo.patch
* scripts-spellingtxt-add-more-spellings-to-spellingtxt.patch
* ntfs-layouth-delete-duplicated-words.patch
* ocfs2-remove-redundant-conditional-before-iput.patch
* ocfs2-cleanup-some-definitions-which-is-not-used-anymore.patch
* ocfs2-fix-a-use-after-free-on-error.patch
* ocfs2-simplify-the-calculation-of-variables.patch
* ocfs2-clear-links-count-in-ocfs2_mknod-if-an-error-occurs.patch
* ocfs2-fix-ocfs2-corrupt-when-iputting-an-inode.patch
* fs-delete-repeated-words-in-comments.patch
* ramfs-support-o_tmpfile.patch
* kernel-watchdog-flush-all-printk-nmi-buffers-when-hardlockup-detected.patch
  mm.patch
* mm-tracing-record-slab-name-for-kmem_cache_free.patch
* mm-remove-ctor-argument-from-kmem_cache_flags.patch
* mm-slab-minor-coding-style-tweaks.patch
* mm-slub-disable-user-tracing-for-kmemleak-caches-by-default.patch
* mm-slub-stop-freeing-kmem_cache_node-structures-on-node-offline.patch
* mm-slab-slub-stop-taking-memory-hotplug-lock.patch
* mm-slab-slub-stop-taking-cpu-hotplug-lock.patch
* mm-slub-splice-cpu-and-page-freelists-in-deactivate_slab.patch
* mm-slub-remove-slub_memcg_sysfs-boot-param-and-config_slub_memcg_sysfs_on.patch
* mm-slub-minor-coding-style-tweaks.patch
* mm-debug-improve-memcg-debugging.patch
* mm-debug_vm_pgtable-basic-add-validation-for-dirtiness-after-write-protect.patch
* mm-debug_vm_pgtable-basic-iterate-over-entire-protection_map.patch
* mm-page_owner-use-helper-function-zone_end_pfn-to-get-end_pfn.patch
* mm-msync-exit-early-when-the-flags-is-an-ms_async-and-start-vm_start.patch
* mm-filemap-remove-unused-parameter-and-change-to-void-type-for-replace_page_cache_page.patch
* mm-filemap-dont-revert-iter-on-eiocbqueued.patch
* mm-filemap-rename-generic_file_buffered_read-subfunctions.patch
* mm-filemap-remove-dynamically-allocated-array-from-filemap_read.patch
* mm-filemap-convert-filemap_get_pages-to-take-a-pagevec.patch
* mm-filemap-use-head-pages-in-generic_file_buffered_read.patch
* mm-filemap-pass-a-sleep-state-to-put_and_wait_on_page_locked.patch
* mm-filemap-support-readpage-splitting-a-page.patch
* mm-filemap-inline-__wait_on_page_locked_async-into-caller.patch
* mm-filemap-dont-call-readpage-if-iocb_waitq-is-set.patch
* mm-filemap-change-filemap_read_page-calling-conventions.patch
* mm-filemap-change-filemap_create_page-calling-conventions.patch
* mm-filemap-convert-filemap_update_page-to-return-an-errno.patch
* mm-filemap-move-the-iocb-checks-into-filemap_update_page.patch
* mm-filemap-add-filemap_range_uptodate.patch
* mm-filemap-add-filemap_range_uptodate-fix.patch
* mm-filemap-split-filemap_readahead-out-of-filemap_get_pages.patch
* mm-filemap-restructure-filemap_get_pages.patch
* mm-filemap-dont-relock-the-page-after-calling-readpage.patch
* mm-filemap-rename-generic_file_buffered_read-to-filemap_read.patch
* mm-filemap-simplify-generic_file_read_iter.patch
* fs-bufferc-add-checking-buffer-head-stat-before-clear.patch
* mm-backing-dev-remove-duplicated-macro-definition.patch

[PATCH v2 1/1] PCI/RCEC: Fix RCiEP capable devices RCEC association

2021-02-18 Thread Qiuxu Zhuo
Function rcec_assoc_rciep() incorrectly used "rciep->devfn" (a single
byte encoding the device and function number) as the device number to
check whether the corresponding bit was set in the RCiEPBitmap of the
RCEC (Root Complex Event Collector) while enumerating over each bit of
the RCiEPBitmap.

As per the PCI Express Base Specification, Revision 5.0, Version 1.0,
Section 7.9.10.2, "Association Bitmap for RCiEPs", p. 935, only the
device number is needed to check whether the corresponding bit is set
in the RCiEPBitmap.

Fix rcec_assoc_rciep() by using the PCI_SLOT() macro to convert the
value of "rciep->devfn" to a device number, ensuring that the RCiEP
devices associated with the RCEC are linked when the RCEC is enumerated.

[ Krzysztof: Update commit message. ]

Fixes: 507b460f8144 ("PCI/ERR: Add pcie_link_rcec() to associate RCiEPs")
Reported-and-tested-by: Wen Jin 
Reviewed-by: Sean V Kelley 
Signed-off-by: Qiuxu Zhuo 
---
v1->v2:
  - Update the subject and the commit message.
  - Add 'Reviewed-by: Sean V Kelley ' to the SoB chain.

 drivers/pci/pcie/rcec.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/drivers/pci/pcie/rcec.c b/drivers/pci/pcie/rcec.c
index 2c5c552994e4..d0bcd141ac9c 100644
--- a/drivers/pci/pcie/rcec.c
+++ b/drivers/pci/pcie/rcec.c
@@ -32,7 +32,7 @@ static bool rcec_assoc_rciep(struct pci_dev *rcec, struct pci_dev *rciep)
 
/* Same bus, so check bitmap */
	for_each_set_bit(devn, &bitmap, 32)
-   if (devn == rciep->devfn)
+   if (devn == PCI_SLOT(rciep->devfn))
return true;
 
return false;
-- 
2.17.1



[PATCH v4 2/3] bus: mhi: core: Download AMSS image from appropriate function

2021-02-18 Thread Bhaumik Bhatt
During full boot chain firmware download, the PM state worker
downloads the AMSS image after a blocking wait for the SBL
execution environment change when running in PBL transition
itself. Improve this design by having the host download the AMSS
image from the SBL transition of PM state worker thread when a
DEV_ST_TRANSITION_SBL is queued instead of the blocking wait.

Signed-off-by: Bhaumik Bhatt 
---
 drivers/bus/mhi/core/boot.c | 48 -
 drivers/bus/mhi/core/internal.h |  1 +
 drivers/bus/mhi/core/pm.c   |  2 ++
 3 files changed, 27 insertions(+), 24 deletions(-)

diff --git a/drivers/bus/mhi/core/boot.c b/drivers/bus/mhi/core/boot.c
index c2546bf..983e6b5 100644
--- a/drivers/bus/mhi/core/boot.c
+++ b/drivers/bus/mhi/core/boot.c
@@ -389,7 +389,6 @@ static void mhi_firmware_copy(struct mhi_controller *mhi_cntrl,
 void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
 {
const struct firmware *firmware = NULL;
-   struct image_info *image_info;
	struct device *dev = &mhi_cntrl->mhi_dev->dev;
const char *fw_name;
void *buf;
@@ -493,35 +492,15 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
ret = mhi_ready_state_transition(mhi_cntrl);
 
if (!mhi_cntrl->fbc_download)
-   return;
+   goto exit_fw_load;
 
if (ret) {
dev_err(dev, "MHI did not enter READY state\n");
goto error_ready_state;
}
 
-   /* Wait for the SBL event */
-   ret = wait_event_timeout(mhi_cntrl->state_event,
-mhi_cntrl->ee == MHI_EE_SBL ||
-MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
-msecs_to_jiffies(mhi_cntrl->timeout_ms));
-
-   if (!ret || MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state)) {
-   dev_err(dev, "MHI did not enter SBL\n");
-   goto error_ready_state;
-   }
-
-   /* Start full firmware image download */
-   image_info = mhi_cntrl->fbc_image;
-   ret = mhi_fw_load_bhie(mhi_cntrl,
-  /* Vector table is the last entry */
-			       &image_info->mhi_buf[image_info->entries - 1]);
-   if (ret) {
-   dev_err(dev, "MHI did not load image over BHIe, ret: %d\n",
-   ret);
-   goto error_fw_load;
-   }
-
+exit_fw_load:
+   dev_info(dev, "Wait for device to enter SBL or Mission mode\n");
return;
 
 error_ready_state:
@@ -532,3 +511,24 @@ void mhi_fw_load_handler(struct mhi_controller *mhi_cntrl)
mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
	wake_up_all(&mhi_cntrl->state_event);
 }
+
+int mhi_download_amss_image(struct mhi_controller *mhi_cntrl)
+{
+   struct image_info *image_info = mhi_cntrl->fbc_image;
+	struct device *dev = &mhi_cntrl->mhi_dev->dev;
+   int ret;
+
+   if (!image_info)
+   return -EIO;
+
+   ret = mhi_fw_load_bhie(mhi_cntrl,
+  /* Vector table is the last entry */
+			       &image_info->mhi_buf[image_info->entries - 1]);
+   if (ret) {
+   dev_err(dev, "MHI did not load AMSS, ret:%d\n", ret);
+   mhi_cntrl->pm_state = MHI_PM_FW_DL_ERR;
+		wake_up_all(&mhi_cntrl->state_event);
+   }
+
+   return ret;
+}
diff --git a/drivers/bus/mhi/core/internal.h b/drivers/bus/mhi/core/internal.h
index 6f80ec3..6f37439 100644
--- a/drivers/bus/mhi/core/internal.h
+++ b/drivers/bus/mhi/core/internal.h
@@ -619,6 +619,7 @@ int mhi_pm_m3_transition(struct mhi_controller *mhi_cntrl);
 int __mhi_device_get_sync(struct mhi_controller *mhi_cntrl);
 int mhi_send_cmd(struct mhi_controller *mhi_cntrl, struct mhi_chan *mhi_chan,
 enum mhi_cmd_type cmd);
+int mhi_download_amss_image(struct mhi_controller *mhi_cntrl);
 static inline bool mhi_is_active(struct mhi_controller *mhi_cntrl)
 {
return (mhi_cntrl->dev_state >= MHI_STATE_M0 &&
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 8da8806..44aa7eb 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -758,6 +758,8 @@ void mhi_pm_st_worker(struct work_struct *work)
 * either SBL or AMSS states
 */
mhi_create_devices(mhi_cntrl);
+   if (mhi_cntrl->fbc_download)
+   mhi_download_amss_image(mhi_cntrl);
break;
case DEV_ST_TRANSITION_MISSION_MODE:
mhi_pm_mission_mode_transition(mhi_cntrl);
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH] bus: mhi: core: Move to polling method to wait for MHI ready

2021-02-18 Thread Bhaumik Bhatt
In certain devices, it is likely that there is no incoming MHI
interrupt for a transition to MHI READY state. One such example
is the move from Pass Through to an SBL or AMSS execution
environment. In order to facilitate faster bootup times as there
is no need to wait until timeout_ms completes, MHI host can poll
every 25 milliseconds to check if device has entered MHI READY
until a maximum timeout of twice the timeout_ms is reached.

Signed-off-by: Bhaumik Bhatt 
---
 drivers/bus/mhi/core/pm.c | 32 ++++++++++++++++----------------
 1 file changed, 16 insertions(+), 16 deletions(-)

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 65ebca8..ec0060c 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -9,6 +9,7 @@
 #include 
 #include 
 #include 
+#include 
 #include 
 #include 
 #include 
@@ -157,30 +158,29 @@ int mhi_ready_state_transition(struct mhi_controller *mhi_cntrl)
struct mhi_event *mhi_event;
enum mhi_pm_state cur_state;
	struct device *dev = &mhi_cntrl->mhi_dev->dev;
-   u32 reset = 1, ready = 0;
+   u32 reset, ready;
int ret, i;
 
-   /* Wait for RESET to be cleared and READY bit to be set by the device */
-   wait_event_timeout(mhi_cntrl->state_event,
-  MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state) ||
-  mhi_read_reg_field(mhi_cntrl, base, MHICTRL,
- MHICTRL_RESET_MASK,
- MHICTRL_RESET_SHIFT, ) ||
-  mhi_read_reg_field(mhi_cntrl, base, MHISTATUS,
- MHISTATUS_READY_MASK,
- MHISTATUS_READY_SHIFT, ) ||
-  (!reset && ready),
-  msecs_to_jiffies(mhi_cntrl->timeout_ms));
-
/* Check if device entered error state */
if (MHI_PM_IN_FATAL_STATE(mhi_cntrl->pm_state)) {
dev_err(dev, "Device link is not accessible\n");
return -EIO;
}
 
-   /* Timeout if device did not transition to ready state */
-   if (reset || !ready) {
-   dev_err(dev, "Device Ready timeout\n");
+   /* Wait for RESET to be cleared and READY bit to be set by the device */
+   ret = readl_relaxed_poll_timeout(base + MHICTRL, reset,
+!(reset & MHICTRL_RESET_MASK), 25000,
+mhi_cntrl->timeout_ms * 1000);
+   if (ret) {
+   dev_err(dev, "Device failed to clear MHI Reset\n");
+   return -ETIMEDOUT;
+   }
+
+   ret = readl_relaxed_poll_timeout(base + MHISTATUS, ready,
+(ready & MHISTATUS_READY_MASK), 25000,
+mhi_cntrl->timeout_ms * 1000);
+   if (ret) {
+   dev_err(dev, "Device failed to enter MHI Ready\n");
return -ETIMEDOUT;
}
 
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v4 0/3] Serialize execution environment changes for MHI

2021-02-18 Thread Bhaumik Bhatt
v4:
-Addressed review comments for additional info logging for EE movements
-Updated switch case for EE handling in mhi_intvec_threaded_handler()

v3:
-Update commit text to accurately reflect changes and reasoning based on reviews

v2:
-Add patch to clear devices when moving execution environments

During full boot chain firmware download, the PM state worker downloads the AMSS
image after waiting for the SBL execution environment change in PBL mode itself.
Since getting rid of the firmware load worker thread, this design needs to
change and MHI host must download the AMSS image from the SBL mode of PM state
worker thread instead of blocking waits for SBL EE in PBL transition processing.

Ensure that EE changes are handled only from appropriate places and occur
one after another and handle only PBL or RDDM EE changes as critical events
directly from the interrupt handler and the status callback is given to the
controller drivers promptly.

When moving from SBL to AMSS EE, clear SBL specific client devices by calling
remove callbacks for them so they are not left opened in a different execution
environment.

Bhaumik Bhatt (3):
  bus: mhi: core: Clear devices when moving execution environments
  bus: mhi: core: Download AMSS image from appropriate function
  bus: mhi: core: Process execution environment changes serially

 drivers/bus/mhi/core/boot.c | 48 ++---
 drivers/bus/mhi/core/internal.h |  1 +
 drivers/bus/mhi/core/main.c | 67 +++--
 drivers/bus/mhi/core/pm.c   | 10 --
 4 files changed, 77 insertions(+), 49 deletions(-)

-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v4 1/3] bus: mhi: core: Clear devices when moving execution environments

2021-02-18 Thread Bhaumik Bhatt
When moving from SBL to mission mode execution environment, there
is no remove callback notification to MHI client drivers which
operate on SBL mode only. Client driver devices are being created
in SBL or AMSS(mission mode) and only destroyed after power down
or SYS_ERROR. If there exist any SBL-specific channels, those are
left open and client drivers are thus unaware of the new execution
environment where those channels cannot operate. Close the gap and
issue remove callbacks to SBL-specific client drivers once device
enters mission mode.

Signed-off-by: Bhaumik Bhatt 
---
 drivers/bus/mhi/core/main.c | 27 +++
 drivers/bus/mhi/core/pm.c   |  3 +++
 2 files changed, 26 insertions(+), 4 deletions(-)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 4e0131b..58f1425 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -244,8 +244,10 @@ static void mhi_del_ring_element(struct mhi_controller *mhi_cntrl,
 
 int mhi_destroy_device(struct device *dev, void *data)
 {
+   struct mhi_chan *ul_chan, *dl_chan;
struct mhi_device *mhi_dev;
struct mhi_controller *mhi_cntrl;
+   enum mhi_ee_type ee = MHI_EE_MAX;
 
	if (dev->bus != &mhi_bus_type)
return 0;
@@ -257,6 +259,17 @@ int mhi_destroy_device(struct device *dev, void *data)
if (mhi_dev->dev_type == MHI_DEVICE_CONTROLLER)
return 0;
 
+   ul_chan = mhi_dev->ul_chan;
+   dl_chan = mhi_dev->dl_chan;
+
+   /*
+* If execution environment is specified, remove only those devices that
+* started in them based on ee_mask for the channels as we move on to a
+* different execution environment
+*/
+   if (data)
+   ee = *(enum mhi_ee_type *)data;
+
/*
 * For the suspend and resume case, this function will get called
 * without mhi_unregister_controller(). Hence, we need to drop the
@@ -264,11 +277,17 @@ int mhi_destroy_device(struct device *dev, void *data)
 * be sure that there will be no instances of mhi_dev left after
 * this.
 */
-	if (mhi_dev->ul_chan)
-		put_device(&mhi_dev->ul_chan->mhi_dev->dev);
+	if (ul_chan) {
+		if (ee != MHI_EE_MAX && !(ul_chan->ee_mask & BIT(ee)))
+			return 0;
+		put_device(&ul_chan->mhi_dev->dev);
+	}
 
-	if (mhi_dev->dl_chan)
-		put_device(&mhi_dev->dl_chan->mhi_dev->dev);
+	if (dl_chan) {
+		if (ee != MHI_EE_MAX && !(dl_chan->ee_mask & BIT(ee)))
+			return 0;
+		put_device(&dl_chan->mhi_dev->dev);
+	}
 
	dev_dbg(&mhi_cntrl->mhi_dev->dev, "destroy device for chan:%s\n",
 mhi_dev->name);
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 681960c..8da8806 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -377,6 +377,7 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)
 {
struct mhi_event *mhi_event;
	struct device *dev = &mhi_cntrl->mhi_dev->dev;
+   enum mhi_ee_type ee = MHI_EE_MAX, current_ee = mhi_cntrl->ee;
int i, ret;
 
dev_dbg(dev, "Processing Mission Mode transition\n");
@@ -395,6 +396,8 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)
 
	wake_up_all(&mhi_cntrl->state_event);
 
+	device_for_each_child(&mhi_cntrl->mhi_dev->dev, &current_ee,
+ mhi_destroy_device);
mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_MISSION_MODE);
 
/* Force MHI to be in M0 state before continuing */
-- 
The Qualcomm Innovation Center, Inc. is a member of the Code Aurora Forum,
a Linux Foundation Collaborative Project



[PATCH v4 3/3] bus: mhi: core: Process execution environment changes serially

2021-02-18 Thread Bhaumik Bhatt
In current design, whenever the BHI interrupt is fired, the
execution environment is updated. This can cause race conditions
and impede ongoing power up/down processing. For example, if a
power down is in progress, MHI host updates to a local "disabled"
execution environment. If a BHI interrupt fires later, that value
gets replaced with one from the BHI EE register. This impacts the
controller as it does not expect multiple RDDM execution
environment change status callbacks as an example. Another issue
would be that the device can enter mission mode and the execution
environment is updated, while device creation for SBL channels is
still going on due to slower PM state worker thread run, leading
to multiple attempts at opening the same channel.

We must handle and wait for SYS_ERROR in any case to facilitate
clean-up for the controller and handle RDDM. Ensure that EE
changes are handled only from appropriate places and occur one
after another and handle only PBL modes or RDDM EE changes as
critical events directly from the interrupt handler. This also
makes sure that we use the correct execution environment to notify
the controller driver when the device resets to one of the PBL
execution environments.

Signed-off-by: Bhaumik Bhatt 
---
 drivers/bus/mhi/core/main.c | 40 +---
 drivers/bus/mhi/core/pm.c   |  5 +++--
 2 files changed, 24 insertions(+), 21 deletions(-)

diff --git a/drivers/bus/mhi/core/main.c b/drivers/bus/mhi/core/main.c
index 58f1425..0cfe0f5 100644
--- a/drivers/bus/mhi/core/main.c
+++ b/drivers/bus/mhi/core/main.c
@@ -428,7 +428,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
	struct device *dev = &mhi_cntrl->mhi_dev->dev;
enum mhi_state state = MHI_STATE_MAX;
enum mhi_pm_state pm_state = 0;
-   enum mhi_ee_type ee = 0;
+   enum mhi_ee_type ee = MHI_EE_MAX;
 
	write_lock_irq(&mhi_cntrl->pm_lock);
if (!MHI_REG_ACCESS_VALID(mhi_cntrl->pm_state)) {
@@ -437,8 +437,7 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
}
 
state = mhi_get_mhi_state(mhi_cntrl);
-   ee = mhi_cntrl->ee;
-   mhi_cntrl->ee = mhi_get_exec_env(mhi_cntrl);
+   ee = mhi_get_exec_env(mhi_cntrl);
dev_dbg(dev, "local ee:%s device ee:%s dev_state:%s\n",
TO_MHI_EXEC_STR(mhi_cntrl->ee), TO_MHI_EXEC_STR(ee),
TO_MHI_STATE_STR(state));
@@ -450,27 +449,30 @@ irqreturn_t mhi_intvec_threaded_handler(int irq_number, void *priv)
}
	write_unlock_irq(&mhi_cntrl->pm_lock);
 
-/* If device supports RDDM don't bother processing SYS error */
-   if (mhi_cntrl->rddm_image) {
-   /* host may be performing a device power down already */
-   if (!mhi_is_active(mhi_cntrl))
-   goto exit_intvec;
+   if (pm_state != MHI_PM_SYS_ERR_DETECT || ee == mhi_cntrl->ee)
+   goto exit_intvec;
 
-   if (mhi_cntrl->ee == MHI_EE_RDDM && mhi_cntrl->ee != ee) {
+   switch (ee) {
+   case MHI_EE_RDDM:
+   if (!mhi_cntrl->rddm_image)
+   goto exit_intvec;
+   /* proceed if power down is not already in progress */
+   if (mhi_is_active(mhi_cntrl)) {
mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_EE_RDDM);
+   mhi_cntrl->ee = ee;
wake_up_all(&mhi_cntrl->state_event);
}
-   goto exit_intvec;
-   }
-
-   if (pm_state == MHI_PM_SYS_ERR_DETECT) {
+   break;
+   case MHI_EE_PBL:
+   case MHI_EE_EDL:
+   case MHI_EE_PTHRU:
+   mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
+   mhi_cntrl->ee = ee;
+   /* continue */
+   default:
+   mhi_pm_sys_err_handler(mhi_cntrl);
wake_up_all(&mhi_cntrl->state_event);
-
-   /* For fatal errors, we let controller decide next step */
-   if (MHI_IN_PBL(ee))
-   mhi_cntrl->status_cb(mhi_cntrl, MHI_CB_FATAL_ERROR);
-   else
-   mhi_pm_sys_err_handler(mhi_cntrl);
+   break;
}
 
 exit_intvec:
diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index 44aa7eb..c870fa8 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -384,14 +384,15 @@ static int mhi_pm_mission_mode_transition(struct mhi_controller *mhi_cntrl)
 
write_lock_irq(&mhi_cntrl->pm_lock);
if (MHI_REG_ACCESS_VALID(mhi_cntrl->pm_state))
-   mhi_cntrl->ee = mhi_get_exec_env(mhi_cntrl);
+   ee = mhi_get_exec_env(mhi_cntrl);
 
-   if (!MHI_IN_MISSION_MODE(mhi_cntrl->ee)) {
+   if (!MHI_IN_MISSION_MODE(ee)) {
mhi_cntrl->pm_state = MHI_PM_LD_ERR_FATAL_DETECT;
write_unlock_irq(&mhi_cntrl->pm_lock);
wake_up_all(&mhi_cntrl->state_event);
return 

Re: [PATCH v2 1/2] mm: Make alloc_contig_range handle free hugetlb pages

2021-02-18 Thread Mike Kravetz
On 2/18/21 4:00 AM, Oscar Salvador wrote:
> alloc_contig_range will fail if it ever sees a HugeTLB page within the
> range we are trying to allocate, even when that page is free and can be
> easily reallocated.
> This has proven to be problematic for some users of alloc_contig_range,
> e.g: CMA and virtio-mem, where those would fail the call even when those
> pages lay in ZONE_MOVABLE and are free.
> 
> We can do better by trying to dissolve such pages.
> 
> Free hugepages are tricky to handle: so that no userspace application
> notices disruption, we need to replace the current free hugepage with
> a new one.
> 
> In order to do that, a new function called alloc_and_dissolve_huge_page
> is introduced.
> This function will first try to get a new fresh hugepage, and if it
> succeeds, it will dissolve the old one.
> 
> If the old hugepage cannot be dissolved, we have to dissolve the new
> hugepage we just got.
> Should that fail as well, we count it as a surplus, so the pool will be
> re-balanced when a hugepage gets freed instead of being enqueued again.
> 
> With regard to the allocation, we restrict it to the node the page belongs
> to with __GFP_THISNODE, meaning we do not fall back on other nodes' zones.
> 
> Note that gigantic hugetlb pages are fenced off since there is a cyclic
> dependency between them and alloc_contig_range.
> 
> Signed-off-by: Oscar Salvador 
> ---
>  include/linux/hugetlb.h |  6 
>  mm/compaction.c | 12 
>  mm/hugetlb.c| 75 +
>  3 files changed, 93 insertions(+)
> 
> diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
> index b5807f23caf8..72352d718829 100644
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -505,6 +505,7 @@ struct huge_bootmem_page {
>   struct hstate *hstate;
>  };
>  
> +bool isolate_or_dissolve_huge_page(struct page *page);
>  struct page *alloc_huge_page(struct vm_area_struct *vma,
>   unsigned long addr, int avoid_reserve);
>  struct page *alloc_huge_page_nodemask(struct hstate *h, int preferred_nid,
> @@ -775,6 +776,11 @@ void set_page_huge_active(struct page *page);
>  #else/* CONFIG_HUGETLB_PAGE */
>  struct hstate {};
>  
> +static inline bool isolate_or_dissolve_huge_page(struct page *page)
> +{
> + return false;
> +}
> +
>  static inline struct page *alloc_huge_page(struct vm_area_struct *vma,
>  unsigned long addr,
>  int avoid_reserve)
> diff --git a/mm/compaction.c b/mm/compaction.c
> index 190ccdaa6c19..d52506ed9db7 100644
> --- a/mm/compaction.c
> +++ b/mm/compaction.c
> @@ -905,6 +905,18 @@ isolate_migratepages_block(struct compact_control *cc, unsigned long low_pfn,
>   valid_page = page;
>   }
>  
> + if (PageHuge(page) && cc->alloc_contig) {
> + if (!isolate_or_dissolve_huge_page(page))
> + goto isolate_fail;
> +
> + /*
> +  * Ok, the hugepage was dissolved. Now these pages are
> +  * Buddy and cannot be re-allocated because they are
> +  * isolated. Fall-through as the check below handles
> +  * Buddy pages.
> +  */
> + }
> +
>   /*
>* Skip if free. We read page order here without zone lock
>* which is generally unsafe, but the race window is small and
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 4bdb58ab14cb..a4fbbe924a55 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c
> @@ -2294,6 +2294,81 @@ static void restore_reserve_on_error(struct hstate *h,
>   }
>  }
>  
> +static bool alloc_and_dissolve_huge_page(struct hstate *h, struct page *page)
> +{
> + gfp_t gfp_mask = htlb_alloc_mask(h) | __GFP_THISNODE;
> + int nid = page_to_nid(page);
> + struct page *new_page;
> + bool ret = false;
> +
> + /*
> +  * Before dissolving the page, we need to allocate a new one,
> +  * so the pool remains stable.
> +  */
> + new_page = alloc_fresh_huge_page(h, gfp_mask, nid, NULL, NULL);
> + if (new_page) {
> + /*
> +  * Free it into the hugepage allocator
> +  */
> + put_page(new_page);
> +

Suppose an admin does

echo 0 > \
/sys/devices/system/node/node/hugepages/hugepages-2048kB/nr_hugepages

right now and dissolves both the original and new page.

> + /*
> +  * Ok, we got a new free hugepage to replace this one. Try to
> +  * dissolve the old page.
> +  */
> + if (!dissolve_free_huge_page(page)) {
> + ret = true;

dissolve_free_huge_page will fail for the original page

> + } else if (dissolve_free_huge_page(new_page)) {

and, will fail for the new page

> +  

[PATCH v2 1/4] hmm: Device exclusive memory access

2021-02-18 Thread Alistair Popple
Some devices require exclusive write access to shared virtual
memory (SVM) ranges to perform atomic operations on that memory. This
requires CPU page tables to be updated to deny access whilst atomic
operations are occurring.

In order to do this introduce a new swap entry
type (SWP_DEVICE_EXCLUSIVE). When a SVM range needs to be marked for
exclusive access by a device all page table mappings for the particular
range are replaced with device exclusive swap entries. This causes any
CPU access to the page to result in a fault.

Faults are resolved by replacing the faulting entry with the original
mapping. This results in MMU notifiers being called which a driver uses
to update access permissions such as revoking atomic access. After
notifiers have been called the device will no longer have exclusive
access to the region.

Signed-off-by: Alistair Popple 
---
 Documentation/vm/hmm.rst |  15 +++
 fs/proc/task_mmu.c   |   7 ++
 include/linux/hmm.h  |   4 +
 include/linux/rmap.h |   1 +
 include/linux/swap.h |  10 +-
 include/linux/swapops.h  |  32 ++
 mm/hmm.c | 206 +++
 mm/memory.c  |  34 ++-
 mm/mprotect.c|   7 ++
 mm/page_vma_mapped.c |  14 ++-
 mm/rmap.c|  29 +-
 11 files changed, 347 insertions(+), 12 deletions(-)

diff --git a/Documentation/vm/hmm.rst b/Documentation/vm/hmm.rst
index 09e28507f5b2..ffdc58cb2e7c 100644
--- a/Documentation/vm/hmm.rst
+++ b/Documentation/vm/hmm.rst
@@ -405,6 +405,21 @@ between device driver specific code and shared common code:
 
The lock can now be released.
 
+Exclusive access memory
+===
+
+Not all devices support atomic access to system memory. To support atomic
+operations to a shared virtual memory page such a device needs access to that
+page which is exclusive of any userspace access from the CPU. The
+``hmm_exclusive_range()`` function can be used to make a memory range
+inaccessible from userspace.
+
+This replaces all mappings for pages in the given range with special swap
+entries. Any attempt to access the swap entry results in a fault which is
+resolved by replacing the entry with the original mapping. A driver gets
+notified that the mapping has been changed by MMU notifiers, after which point
+it will no longer have exclusive access to the page.
+
 Memory cgroup (memcg) and rss accounting
 
 
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 602e3a52884d..79aaf8768be3 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -518,6 +518,8 @@ static void smaps_pte_entry(pte_t *pte, unsigned long addr,
page = migration_entry_to_page(swpent);
else if (is_device_private_entry(swpent))
page = device_private_entry_to_page(swpent);
+   else if (is_device_exclusive_entry(swpent))
+   page = device_exclusive_entry_to_page(swpent);
} else if (unlikely(IS_ENABLED(CONFIG_SHMEM) && mss->check_shmem_swap
&& pte_none(*pte))) {
page = xa_load(&vma->vm_file->f_mapping->i_pages,
@@ -695,6 +697,8 @@ static int smaps_hugetlb_range(pte_t *pte, unsigned long hmask,
page = migration_entry_to_page(swpent);
else if (is_device_private_entry(swpent))
page = device_private_entry_to_page(swpent);
+   else if (is_device_exclusive_entry(swpent))
+   page = device_exclusive_entry_to_page(swpent);
}
if (page) {
int mapcount = page_mapcount(page);
@@ -1387,6 +1391,9 @@ static pagemap_entry_t pte_to_pagemap_entry(struct pagemapread *pm,
 
if (is_device_private_entry(entry))
page = device_private_entry_to_page(entry);
+
+   if (is_device_exclusive_entry(entry))
+   page = device_exclusive_entry_to_page(entry);
}
 
if (page && !PageAnon(page))
diff --git a/include/linux/hmm.h b/include/linux/hmm.h
index 866a0fa104c4..5d28ff6d4d80 100644
--- a/include/linux/hmm.h
+++ b/include/linux/hmm.h
@@ -109,6 +109,10 @@ struct hmm_range {
  */
 int hmm_range_fault(struct hmm_range *range);
 
+int hmm_exclusive_range(struct mm_struct *mm, unsigned long start,
+   unsigned long end, struct page **pages);
+vm_fault_t hmm_remove_exclusive_entry(struct vm_fault *vmf);
+
 /*
  * HMM_RANGE_DEFAULT_TIMEOUT - default timeout (ms) when waiting for a range
  *
diff --git a/include/linux/rmap.h b/include/linux/rmap.h
index 70085ca1a3fc..5503fc4d1138 100644
--- a/include/linux/rmap.h
+++ b/include/linux/rmap.h
@@ -98,6 +98,7 @@ enum ttu_flags {
TTU_RMAP_LOCKED = 0x80, /* do not grab rmap lock:
 * caller holds it */
TTU_SPLIT_FREEZE= 0x100,

[PATCH v2 4/4] nouveau/svm: Implement atomic SVM access

2021-02-18 Thread Alistair Popple
Some NVIDIA GPUs do not support direct atomic access to system memory
via PCIe. Instead this must be emulated by granting the GPU exclusive
access to the memory. This is achieved by replacing CPU page table
entries with special swap entries that fault on userspace access.

The driver then grants the GPU permission to update the page undergoing
atomic access via the GPU page tables. When CPU access to the page is
required a CPU fault is raised which calls into the device driver via
MMU notifiers to revoke the atomic access. The original page table
entries are then restored allowing CPU access to proceed.

Signed-off-by: Alistair Popple 
---
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |  1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c | 86 ---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |  1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c|  6 ++
 4 files changed, 81 insertions(+), 13 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/include/nvif/if000c.h b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
index d6dd40f21eed..9c7ff56831c5 100644
--- a/drivers/gpu/drm/nouveau/include/nvif/if000c.h
+++ b/drivers/gpu/drm/nouveau/include/nvif/if000c.h
@@ -77,6 +77,7 @@ struct nvif_vmm_pfnmap_v0 {
 #define NVIF_VMM_PFNMAP_V0_APER   0x00f0ULL
 #define NVIF_VMM_PFNMAP_V0_HOST   0xULL
 #define NVIF_VMM_PFNMAP_V0_VRAM   0x0010ULL
+#define NVIF_VMM_PFNMAP_V0_A 0x0004ULL
 #define NVIF_VMM_PFNMAP_V0_W  0x0002ULL
 #define NVIF_VMM_PFNMAP_V0_V  0x0001ULL
 #define NVIF_VMM_PFNMAP_V0_NONE   0xULL
diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index cd7b47c946cf..d2ce4fb9c8ec 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -421,9 +421,9 @@ nouveau_svm_fault_cmp(const void *a, const void *b)
return ret;
if ((ret = (s64)fa->addr - fb->addr))
return ret;
-   /*XXX: atomic? */
-   return (fa->access == 0 || fa->access == 3) -
-  (fb->access == 0 || fb->access == 3);
+   /* Atomic access (2) has highest priority */
+   return (-1*(fa->access == 2) + (fa->access == 0 || fa->access == 3)) -
+  (-1*(fb->access == 2) + (fb->access == 0 || fb->access == 3));
 }
 
 static void
@@ -555,10 +555,57 @@ static void nouveau_hmm_convert_pfn(struct nouveau_drm *drm,
args->p.phys[0] |= NVIF_VMM_PFNMAP_V0_W;
 }
 
+static int nouveau_atomic_range_fault(struct nouveau_svmm *svmm,
+  struct nouveau_drm *drm,
+  struct nouveau_pfnmap_args *args, u32 size,
+  unsigned long hmm_flags, struct mm_struct *mm)
+{
+   struct page *page;
+   unsigned long start = args->p.addr;
+   struct vm_area_struct *vma;
+   int ret = 0;
+
+   mmap_read_lock(mm);
+   vma = find_vma_intersection(mm, start, start + size);
+   if (!vma || !(vma->vm_flags & VM_WRITE)) {
+   ret = -EPERM;
+   goto out;
+   }
+
+   hmm_exclusive_range(mm, start, start + PAGE_SIZE, &page);
+   if (!page) {
+   ret = -EINVAL;
+   goto out;
+   }
+
+   /* Map the page on the GPU. */
+   args->p.page = 12;
+   args->p.size = PAGE_SIZE;
+   args->p.addr = start;
+   args->p.phys[0] = page_to_phys(page) |
+   NVIF_VMM_PFNMAP_V0_V |
+   NVIF_VMM_PFNMAP_V0_W |
+   NVIF_VMM_PFNMAP_V0_A |
+   NVIF_VMM_PFNMAP_V0_HOST;
+
+   mutex_lock(&svmm->mutex);
+   svmm->vmm->vmm.object.client->super = true;
+   ret = nvif_object_ioctl(&svmm->vmm->vmm.object, args, size, NULL);
+   svmm->vmm->vmm.object.client->super = false;
+   mutex_unlock(&svmm->mutex);
+
+   unlock_page(page);
+   put_page(page);
+
+out:
+   mmap_read_unlock(mm);
+   return ret;
+}
+
 static int nouveau_range_fault(struct nouveau_svmm *svmm,
   struct nouveau_drm *drm,
   struct nouveau_pfnmap_args *args, u32 size,
-  unsigned long hmm_flags,
+  unsigned long hmm_flags, int atomic,
   struct svm_notifier *notifier)
 {
unsigned long timeout =
@@ -608,12 +655,18 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
break;
}
 
-   nouveau_hmm_convert_pfn(drm, &range, args);
+   if (atomic) {
+   mutex_unlock(>mutex);
+   ret = nouveau_atomic_range_fault(svmm, drm, args,
+   size, hmm_flags, mm);
+   } else {
+   nouveau_hmm_convert_pfn(drm, &range, args);
 
-   

[PATCH v2 3/4] nouveau/svm: Refactor nouveau_range_fault

2021-02-18 Thread Alistair Popple
Call mmu_interval_notifier_insert() as part of nouveau_range_fault().
This doesn't introduce any functional change but makes it easier for a
subsequent patch to alter the behaviour of nouveau_range_fault() to
support GPU atomic operations.

Signed-off-by: Alistair Popple 
---
 drivers/gpu/drm/nouveau/nouveau_svm.c | 34 ---
 1 file changed, 20 insertions(+), 14 deletions(-)

diff --git a/drivers/gpu/drm/nouveau/nouveau_svm.c b/drivers/gpu/drm/nouveau/nouveau_svm.c
index f18bd53da052..cd7b47c946cf 100644
--- a/drivers/gpu/drm/nouveau/nouveau_svm.c
+++ b/drivers/gpu/drm/nouveau/nouveau_svm.c
@@ -567,18 +567,27 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
unsigned long hmm_pfns[1];
struct hmm_range range = {
.notifier = &notifier->notifier,
-   .start = notifier->notifier.interval_tree.start,
-   .end = notifier->notifier.interval_tree.last + 1,
.default_flags = hmm_flags,
.hmm_pfns = hmm_pfns,
.dev_private_owner = drm->dev,
};
-   struct mm_struct *mm = notifier->notifier.mm;
+   struct mm_struct *mm = svmm->notifier.mm;
int ret;
 
+   ret = mmu_interval_notifier_insert(&notifier->notifier, mm,
+   args->p.addr, args->p.size,
+   &nouveau_svm_mni_ops);
+   if (ret)
+   return ret;
+
+   range.start = notifier->notifier.interval_tree.start;
+   range.end = notifier->notifier.interval_tree.last + 1;
+
while (true) {
-   if (time_after(jiffies, timeout))
-   return -EBUSY;
+   if (time_after(jiffies, timeout)) {
+   ret = -EBUSY;
+   goto out;
+   }
 
range.notifier_seq = mmu_interval_read_begin(range.notifier);
mmap_read_lock(mm);
@@ -587,7 +596,7 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
if (ret) {
if (ret == -EBUSY)
continue;
-   return ret;
+   goto out;
}
 
mutex_lock(&svmm->mutex);
@@ -606,6 +615,9 @@ static int nouveau_range_fault(struct nouveau_svmm *svmm,
svmm->vmm->vmm.object.client->super = false;
mutex_unlock(&svmm->mutex);
 
+out:
+   mmu_interval_notifier_remove(&notifier->notifier);
+
return ret;
 }
 
@@ -727,14 +739,8 @@ nouveau_svm_fault(struct nvif_notify *notify)
}
 
notifier.svmm = svmm;
-   ret = mmu_interval_notifier_insert(&notifier, mm,
-  args.i.p.addr, args.i.p.size,
-  &nouveau_svm_mni_ops);
-   if (!ret) {
-   ret = nouveau_range_fault(svmm, svm->drm, &args,
-   sizeof(args), hmm_flags, &notifier);
-   mmu_interval_notifier_remove(&notifier);
-   }
+   ret = nouveau_range_fault(svmm, svm->drm, &args,
+   sizeof(args), hmm_flags, &notifier);
mmput(mm);
 
limit = args.i.p.addr + args.i.p.size;
-- 
2.20.1



[PATCH v2 0/4] Add support for SVM atomics in Nouveau

2021-02-18 Thread Alistair Popple
This is the second version of a series to add support to Nouveau for atomic
memory operations on OpenCL shared virtual memory (SVM) regions. This is
achieved using the atomic PTE bits on the GPU to only permit atomic
operations to system memory when a page is not mapped in userspace on the
CPU. The previous version of this series used an unmap and pin page
migration, however this resulted in problems with ZONE_MOVABLE and other
issues so this series uses a different approach.

Instead exclusive device access is implemented by adding a new swap entry
type (SWP_DEVICE_EXCLUSIVE) which is similar to a migration entry. The
main difference is that on fault the original entry is immediately restored
by the fault handler instead of waiting.

Restoring the entry triggers calls to MMU notifiers which allows a device
driver to revoke the atomic access permission from the GPU prior to the CPU
finalising the entry.

Patch 1 contains the bulk of the memory management changes required to
implement the new entry type.

Patch 2 contains some additions to the HMM selftests to ensure everything
works as expected.

Patch 3 was posted previously and has not changed.

Patch 4 is similar to what was posted previously but updated to use the new
swap entries instead of migration.

This has been tested using the latest upstream Mesa userspace with a simple
OpenCL test program which checks the results of atomic GPU operations on a
SVM buffer whilst also writing to the same buffer from the CPU.

Alistair Popple (4):
  hmm: Device exclusive memory access
  hmm: Selftests for exclusive device memory
  nouveau/svm: Refactor nouveau_range_fault
  nouveau/svm: Implement atomic SVM access

 Documentation/vm/hmm.rst  |  15 ++
 drivers/gpu/drm/nouveau/include/nvif/if000c.h |   1 +
 drivers/gpu/drm/nouveau/nouveau_svm.c | 118 +++---
 drivers/gpu/drm/nouveau/nvkm/subdev/mmu/vmm.h |   1 +
 .../drm/nouveau/nvkm/subdev/mmu/vmmgp100.c|   6 +
 fs/proc/task_mmu.c|   7 +
 include/linux/hmm.h   |   4 +
 include/linux/rmap.h  |   1 +
 include/linux/swap.h  |  10 +-
 include/linux/swapops.h   |  32 +++
 lib/test_hmm.c| 124 ++
 lib/test_hmm_uapi.h   |   2 +
 mm/hmm.c  | 206 
 mm/memory.c   |  34 ++-
 mm/mprotect.c |   7 +
 mm/page_vma_mapped.c  |  14 +-
 mm/rmap.c |  29 ++-
 tools/testing/selftests/vm/hmm-tests.c| 219 ++
 18 files changed, 792 insertions(+), 38 deletions(-)

-- 
2.20.1



[PATCH v2 2/4] hmm: Selftests for exclusive device memory

2021-02-18 Thread Alistair Popple
Adds some selftests for exclusive device memory.

Signed-off-by: Alistair Popple 
---
 lib/test_hmm.c | 124 ++
 lib/test_hmm_uapi.h|   2 +
 tools/testing/selftests/vm/hmm-tests.c | 219 +
 3 files changed, 345 insertions(+)

diff --git a/lib/test_hmm.c b/lib/test_hmm.c
index 80a78877bd93..d517d9d4c5aa 100644
--- a/lib/test_hmm.c
+++ b/lib/test_hmm.c
@@ -25,6 +25,7 @@
 #include 
 #include 
 #include 
+#include 
 
 #include "test_hmm_uapi.h"
 
@@ -46,6 +47,7 @@ struct dmirror_bounce {
unsigned long   cpages;
 };
 
+#define DPT_XA_TAG_ATOMIC 1UL
 #define DPT_XA_TAG_WRITE 3UL
 
 /*
@@ -619,6 +621,54 @@ static void dmirror_migrate_alloc_and_copy(struct migrate_vma *args,
}
 }
 
+static int dmirror_check_atomic(struct dmirror *dmirror, unsigned long start,
+unsigned long end)
+{
+   unsigned long pfn;
+
+   for (pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++) {
+   void *entry;
+   struct page *page;
+
entry = xa_load(&dmirror->pt, pfn);
+   page = xa_untag_pointer(entry);
+   if (xa_pointer_tag(entry) == DPT_XA_TAG_ATOMIC)
+   return -EPERM;
+   }
+
+   return 0;
+}
+
+static int dmirror_atomic_map(unsigned long start, unsigned long end,
+ struct page **pages, struct dmirror *dmirror)
+{
+   unsigned long pfn, mapped = 0;
+   int i;
+
+   /* Map the migrated pages into the device's page tables. */
+   mutex_lock(&dmirror->mutex);
+
+   for (i = 0, pfn = start >> PAGE_SHIFT; pfn < (end >> PAGE_SHIFT); pfn++, i++) {
+   void *entry;
+
+   if (!pages[i])
+   continue;
+
+   entry = pages[i];
+   entry = xa_tag_pointer(entry, DPT_XA_TAG_ATOMIC);
+   entry = xa_store(&dmirror->pt, pfn, entry, GFP_ATOMIC);
+   if (xa_is_err(entry)) {
+   mutex_unlock(&dmirror->mutex);
+   return xa_err(entry);
+   }
+
+   mapped++;
+   }
+
+   mutex_unlock(&dmirror->mutex);
+   return mapped;
+}
+
 static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
struct dmirror *dmirror)
 {
@@ -661,6 +711,71 @@ static int dmirror_migrate_finalize_and_map(struct migrate_vma *args,
return 0;
 }
 
+static int dmirror_exclusive(struct dmirror *dmirror,
+struct hmm_dmirror_cmd *cmd)
+{
+   unsigned long start, end, addr;
+   unsigned long size = cmd->npages << PAGE_SHIFT;
+   struct mm_struct *mm = dmirror->notifier.mm;
+   struct page *pages[64];
+   struct dmirror_bounce bounce;
+   unsigned long next;
+   int ret;
+
+   start = cmd->addr;
+   end = start + size;
+   if (end < start)
+   return -EINVAL;
+
+   /* Since the mm is for the mirrored process, get a reference first. */
+   if (!mmget_not_zero(mm))
+   return -EINVAL;
+
+   mmap_read_lock(mm);
+   for (addr = start; addr < end; addr = next) {
+   int i, mapped;
+
+   if (end < addr + (64 << PAGE_SHIFT))
+   next = end;
+   else
+   next = addr + (64 << PAGE_SHIFT);
+
+   ret = hmm_exclusive_range(mm, addr, next, pages);
+   mapped = dmirror_atomic_map(addr, next, pages, dmirror);
+   for (i = 0; i < ret; i++) {
+   if (pages[i]) {
+   unlock_page(pages[i]);
+   put_page(pages[i]);
+   }
+   }
+
+   if (addr + (mapped << PAGE_SHIFT) < next) {
+   mmap_read_unlock(mm);
+   mmput(mm);
+   return -EBUSY;
+   }
+   }
+   mmap_read_unlock(mm);
+   mmput(mm);
+
+   /* Return the migrated data for verification. */
+   ret = dmirror_bounce_init(&bounce, start, size);
+   if (ret)
+   return ret;
+   mutex_lock(&dmirror->mutex);
+   ret = dmirror_do_read(dmirror, start, end, &bounce);
+   mutex_unlock(&dmirror->mutex);
+   if (ret == 0) {
+   if (copy_to_user(u64_to_user_ptr(cmd->ptr), bounce.ptr,
+bounce.size))
+   ret = -EFAULT;
+   }
+
+   cmd->cpages = bounce.cpages;
+   dmirror_bounce_fini(&bounce);
+   return ret;
+}
+
 static int dmirror_migrate(struct dmirror *dmirror,
   struct hmm_dmirror_cmd *cmd)
 {
@@ -949,6 +1064,15 @@ static long dmirror_fops_unlocked_ioctl(struct file *filp,
ret = dmirror_migrate(dmirror, &cmd);
break;
 
+   case HMM_DMIRROR_EXCLUSIVE:
+   ret = dmirror_exclusive(dmirror, &cmd);
+   break;
+
+   case 

Re: [PATCH] ntfs: check for valid standard information attribute

2021-02-18 Thread Anton Altaparmakov
Hi Andrew,

Can you please push this one upstream?  Thanks a lot in advance!

Best regards,

Anton

> On 17 Feb 2021, at 15:59, Rustam Kovhaev  wrote:
> 
> We should check for a valid STANDARD_INFORMATION attribute offset and
> length before trying to access it.
> 
> Reported-and-tested-by: syzbot+c584225dabdea2f71...@syzkaller.appspotmail.com
> Signed-off-by: Rustam Kovhaev 
> Acked-by: Anton Altaparmakov 
> Link: https://syzkaller.appspot.com/bug?extid=c584225dabdea2f71969
> ---
> fs/ntfs/inode.c | 6 ++
> 1 file changed, 6 insertions(+)
> 
> diff --git a/fs/ntfs/inode.c b/fs/ntfs/inode.c
> index f7e4cbc26eaf..be4ff9386ec0 100644
> --- a/fs/ntfs/inode.c
> +++ b/fs/ntfs/inode.c
> @@ -629,6 +629,12 @@ static int ntfs_read_locked_inode(struct inode *vi)
>   }
>   a = ctx->attr;
>   /* Get the standard information attribute value. */
> + if ((u8 *)a + le16_to_cpu(a->data.resident.value_offset)
> + + le32_to_cpu(a->data.resident.value_length) >
> + (u8 *)ctx->mrec + vol->mft_record_size) {
> + ntfs_error(vi->i_sb, "Corrupt standard information attribute in inode.");
> + goto unm_err_out;
> + }
>   si = (STANDARD_INFORMATION*)((u8*)a +
>   le16_to_cpu(a->data.resident.value_offset));
> 
> -- 
> 2.30.0
> 


-- 
Anton Altaparmakov  (replace at with @)
Lead in File System Development, Tuxera Inc., http://www.tuxera.com/
Linux NTFS maintainer



RE: [PATCH 1/1] PCI/RCEC: Fix failure to inject errors to some RCiEP devices

2021-02-18 Thread Zhuo, Qiuxu
> ...
> 
> I took your suggestion and came up with the following:
> 
>   Function rcec_assoc_rciep() incorrectly used "rciep->devfn" (a single
>   byte encoding the device and function number) as the device number to
>   check whether the corresponding bit was set in the RCiEPBitmap of the
>   RCEC (Root Complex Event Collector) while enumerating over each bit of
>   the RCiEPBitmap.
> 
>   As per the PCI Express Base Specification, Revision 5.0, Version 1.0,
>   Section 7.9.10.2, "Association Bitmap for RCiEPs", p. 935, only needs to
>   use a device number to check whether the corresponding bit was set in
>   the RCiEPBitmap.
> 
>   Fix rcec_assoc_rciep() by using the PCI_SLOT() macro to convert the
>   value of "rciep->devfn" to a device number, ensuring that the RCiEP
>   devices associated with the RCEC are linked when the RCEC is enumerated.
>
> Using either of the following as the subject:
> 
>   PCI/RCEC: Use device number to check RCiEPBitmap of RCEC
>   PCI/RCEC: Fix RCiEP capable devices RCEC association
> 
> What do you think?  Also, feel free to change whatever you see fit, of
> course, as this is only a suggestion.
> 

Hi Krzysztof,

Thanks for improving the commit message. It looks clearer. 
Will send out a v2 with this commit message.

Thanks!
-Qiuxu



RE: [PATCH 1/1] PCI/RCEC: Fix failure to inject errors to some RCiEP devices

2021-02-18 Thread Zhuo, Qiuxu
>...
> 
> We could probably add the following:
> 
>   Fixes: 507b460f8144 ("PCI/ERR: Add pcie_link_rcec() to associate RCiEPs")
> 

OK. Will add this to the v2.

Thanks!
-Qiuxu


Re: [PATCH 3/3] vfio/type1: batch page pinning

2021-02-18 Thread Daniel Jordan
Alex Williamson  writes:
> This might not be the first batch we fill, I think this needs to unwind
> rather than direct return.

So it does, well spotted.  And it's the same thing with the ENODEV case
farther up.

> Series looks good otherwise.

Thanks for going through it!


RE: [PATCH v5 05/14] vfio/mdev: idxd: add basic mdev registration and helper functions

2021-02-18 Thread Tian, Kevin
> From: Jason Gunthorpe 
> Sent: Wednesday, February 17, 2021 5:33 AM
> 
> On Tue, Feb 16, 2021 at 12:04:55PM -0700, Dave Jiang wrote:
> 
> > > > +   return remap_pfn_range(vma, vma->vm_start, pgoff, req_size,
> pg_prot);
> > > Nothing validated req_size - did you copy this from the Intel RDMA
> > > driver? It had a huge security bug just like this.
> 
> > Thanks. Will add. Some of the code came from the Intel i915 mdev
> > driver.
> 
> Please make sure it is fixed as well, the security bug is huge.
> 

It was already fixed two years ago:

commit 51b00d8509dc69c98740da2ad07308b630d3eb7d
Author: Zhenyu Wang 
Date:   Fri Jan 11 13:58:53 2019 +0800

drm/i915/gvt: Fix mmap range check

This is to fix missed mmap range check on vGPU bar2 region
and only allow to map vGPU allocated GMADDR range, which means
user space should support sparse mmap to get proper offset for
mmap vGPU aperture. And this takes care of actual pgoff in mmap
request as original code always does from beginning of vGPU
aperture.

Fixes: 659643f7d814 ("drm/i915/gvt/kvmgt: add vfio/mdev support to KVMGT")
Cc: "Monroy, Rodrigo Axel" 
Cc: "Orrala Contreras, Alfredo" 
Cc: sta...@vger.kernel.org # v4.10+
Reviewed-by: Hang Yuan 
Signed-off-by: Zhenyu Wang 

Thanks
Kevin



Re: [PATCH 05/14] KVM: x86/mmu: Consult max mapping level when zapping collapsible SPTEs

2021-02-18 Thread Sean Christopherson
On Thu, Feb 18, 2021, Mike Kravetz wrote:
> On 2/18/21 8:23 AM, Sean Christopherson wrote:
> > On Thu, Feb 18, 2021, Paolo Bonzini wrote:
> >> On 13/02/21 01:50, Sean Christopherson wrote:
> >>>
> >>>   pfn = spte_to_pfn(iter.old_spte);
> >>>   if (kvm_is_reserved_pfn(pfn) ||
> >>> - (!PageTransCompoundMap(pfn_to_page(pfn)) &&
> >>> -  !kvm_is_zone_device_pfn(pfn)))
> >>> + iter.level >= kvm_mmu_max_mapping_level(kvm, slot, iter.gfn,
> >>> + pfn, PG_LEVEL_NUM))
> >>>   continue;
> >>
> >>
> >> This changes the test to PageCompound.  Is it worth moving the change to
> >> patch 1?
> > 
> > Yes?  I originally did that in a separate patch, then changed my mind.
> > 
> > If PageTransCompoundMap() also detects HugeTLB pages, then it is the 
> > "better"
> > option as it checks that the page is actually mapped huge.  I dropped the 
> > change
> > because PageTransCompound() is just a wrapper around PageCompound(), and so 
> > I
> > assumed PageTransCompoundMap() would detect HugeTLB pages, too.  I'm not so 
> > sure
> > about that after rereading the code, yet again.
> 
> I have not followed this thread, but HugeTLB hit my mail filter and I can
> help with this question.
> 
> No, PageTransCompoundMap() will not detect HugeTLB.  hugetlb pages do not
> use the compound_mapcount_ptr field.  So, that final check/return in
> PageTransCompoundMap() will always be false.

Thanks Mike!

Paolo, I agree it makes sense to switch to PageCompound in the earlier patch, in
case this one needs to be reverted.


Re: [PATCH v8 7/9] crypto: hisilicon/hpre - add 'ECDH' algorithm

2021-02-18 Thread yumeng




On 2021/2/19 4:01, Herbert Xu wrote:

On Thu, Feb 18, 2021 at 10:24:40AM +0800, yumeng wrote:


Ecdh-nist-p384 is supported by HPRE now, currently there is no patch of
the generic ecdh-nist-p384.


In that case please leave it out until there is:

1) An in-kernel user of p384.
2) There is a generic implementation.

Thanks,



 OK, I will, thanks.

And p224 and p521 are in the same situation as p384 (no in-kernel user and
no generic implementation), so should they also be added to HPRE only
later, once those exist?

thanks.


Re: [PATCH v5 4/5] x86/signal: Detect and prevent an alternate signal stack overflow

2021-02-18 Thread Andy Lutomirski
On Wed, Feb 3, 2021 at 9:27 AM Chang S. Bae  wrote:
>
> The kernel pushes context on to the userspace stack to prepare for the
> user's signal handler. When the user has supplied an alternate signal
> stack, via sigaltstack(2), it is easy for the kernel to verify that the
> stack size is sufficient for the current hardware context.
>
> Check if writing the hardware context to the alternate stack will exceed
> its size. If yes, then instead of corrupting user data and proceeding with
> the original signal handler, an immediate SIGSEGV signal is delivered.
>
> While previous patches in this series allow new source code to discover and
> use a sufficient alternate signal stack size, this check is still necessary
> to protect binaries with insufficient alternate signal stack size from data
> corruption.
>
> Suggested-by: Jann Horn 
> Signed-off-by: Chang S. Bae 
> Reviewed-by: Len Brown 
> Reviewed-by: Jann Horn 
> Cc: Borislav Petkov 
> Cc: Jann Horn 
> Cc: x...@kernel.org
> Cc: linux-kernel@vger.kernel.org
> ---
> Changes from v3:
> * Updated the changelog (Borislav Petkov)
>
> Changes from v2:
> * Simplified the implementation (Jann Horn)
> ---
>  arch/x86/kernel/signal.c | 7 +--
>  1 file changed, 5 insertions(+), 2 deletions(-)
>
> diff --git a/arch/x86/kernel/signal.c b/arch/x86/kernel/signal.c
> index 0d24f64d0145..8e2df070dbfd 100644
> --- a/arch/x86/kernel/signal.c
> +++ b/arch/x86/kernel/signal.c
> @@ -242,7 +242,7 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
> unsigned long math_size = 0;
> unsigned long sp = regs->sp;
> unsigned long buf_fx = 0;
> -   int onsigstack = on_sig_stack(sp);
> +   bool onsigstack = on_sig_stack(sp);
> int ret;
>
> /* redzone */
> @@ -251,8 +251,11 @@ get_sigframe(struct k_sigaction *ka, struct pt_regs *regs, size_t frame_size,
>
> /* This is the X/Open sanctioned signal stack switching.  */
> if (ka->sa.sa_flags & SA_ONSTACK) {
> -   if (sas_ss_flags(sp) == 0)
> +   if (sas_ss_flags(sp) == 0) {
> sp = current->sas_ss_sp + current->sas_ss_size;
> +   /* On the alternate signal stack */
> +   onsigstack = true;

This is buggy.  The old code had a dubious special case for
SS_AUTODISARM, and this interacts poorly with it.  I think you could
fix it by separating the case in which you are already on the altstack
from the case in which you're switching to the altstack, or you could
fix it by changing the check at the end of the function to literally
check whether the sp value is in bounds instead of calling
on_sig_stack.

Arguably the generic helpers could be adjusted to make this less annoying.


Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'

2021-02-18 Thread Thiago Jung Bauermann


Lakshmi Ramasubramanian  writes:

> On 2/18/21 4:07 PM, Mimi Zohar wrote:
>
> Hi Mimi,
>
>> On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:
>>> of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
>>> a new device tree object that includes architecture specific data
>>> for kexec system call.  This should be defined only if the architecture
>>> being built defines kexec architecture structure "struct kimage_arch".
>>>
>>> Define a new boolean config OF_KEXEC that is enabled if
>>> CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
>>> the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
>>> if CONFIG_OF_KEXEC is enabled.
>>>
>>> Signed-off-by: Lakshmi Ramasubramanian 
>>> Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
>>> Reported-by: kernel test robot 
>>> ---
>>>   drivers/of/Kconfig  | 6 ++
>>>   drivers/of/Makefile | 7 +--
>>>   2 files changed, 7 insertions(+), 6 deletions(-)
>>>
>>> diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
>>> index 18450437d5d5..f2e8fa54862a 100644
>>> --- a/drivers/of/Kconfig
>>> +++ b/drivers/of/Kconfig
>>> @@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
>>> # arches should select this if DMA is coherent by default for OF devices
>>> bool
>>>   +config OF_KEXEC
>>> +   bool
>>> +   depends on KEXEC_FILE
>>> +   depends on OF_FLATTREE
>>> +   default y if ARM64 || PPC64
>>> +
>>>   endif # OF
>>> diff --git a/drivers/of/Makefile b/drivers/of/Makefile
>>> index c13b982084a3..287579dd1695 100644
>>> --- a/drivers/of/Makefile
>>> +++ b/drivers/of/Makefile
>>> @@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
>>>   obj-$(CONFIG_OF_RESOLVE)  += resolver.o
>>>   obj-$(CONFIG_OF_OVERLAY) += overlay.o
>>>   obj-$(CONFIG_OF_NUMA) += of_numa.o
>>> -
>>> -ifdef CONFIG_KEXEC_FILE
>>> -ifdef CONFIG_OF_FLATTREE
>>> -obj-y  += kexec.o
>>> -endif
>>> -endif
>>> +obj-$(CONFIG_OF_KEXEC) += kexec.o
>>> obj-$(CONFIG_OF_UNITTEST) += unittest-data/
>> Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?
>> 
>
> For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is enabled.
> So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.
>
> But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in the
> patch set (the one for carrying forward IMA log across kexec for arm64).
> arm64 calls of_kexec_alloc_and_setup_fdt() prior to enabling
> CONFIG_HAVE_IMA_KEXEC and hence breaks the build for arm64.

One problem is that I believe that this patch won't placate the robot,
because IIUC it generates config files at random and this change still
allows hppa and s390 to enable CONFIG_OF_KEXEC.

Perhaps a new CONFIG_HAVE_KIMAGE_ARCH option? Not having that option
would still allow building kexec.o, but would be used inside kexec.c to
avoid accessing kimage.arch members.
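A rough sketch of what that could look like (the option name comes from this
thread, but the exact shape is hypothetical, not from any posted patch):

```kconfig
# drivers/of/Kconfig -- hypothetical sketch
config HAVE_KIMAGE_ARCH
	bool
	# Selected by architectures (arm64, powerpc) whose struct kimage_arch
	# carries the members of_kexec_alloc_and_setup_fdt() wants to touch.
```

kexec.c would then still be built wherever KEXEC_FILE and OF_FLATTREE are
enabled, but its accesses to image->arch members would sit under
#ifdef CONFIG_HAVE_KIMAGE_ARCH, so randomly generated hppa or s390 configs
would keep compiling.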

-- 
Thiago Jung Bauermann
IBM Linux Technology Center


Re: [PATCH v6 0/2] fix a NULL pointer bug and simplify the code

2021-02-18 Thread Jens Axboe
On 2/18/21 5:26 AM, Sun Ke wrote:
> fix a NULL pointer bug and simplify the code
> 
> v6: Just add if (nbd->recv_workq) to nbd_disconnect_and_put().
> v5: Adjust the title and add “Suggested-by”.
> v4: Share exception handling code for if branches and 
>   move put_nbd adjustment to a separate patch.
> v3: Do not use unlock and add put_nbd.
> v2: Use jump target unlock.
> 
> Sun Ke (2):
>   nbd: Fix NULL pointer in flush_workqueue
>   nbd: share nbd_put and return by goto put_nbd
> 
>  drivers/block/nbd.c | 10 +-
>  1 file changed, 5 insertions(+), 5 deletions(-)

Applied for 5.12, thanks.

-- 
Jens Axboe



[PATCH v3] IMA: support for duplicate measurement records

2021-02-18 Thread Tushar Sugandhi
IMA does not include duplicate file, buffer, or critical data
measurement records since TPM extend is a very expensive
operation.  However, in some cases, the measurement of duplicate
records is necessary to accurately determine the current state of the
system.  For instance - the file, buffer, or critical data measurement
record may change from some value 'val#1', to 'val#2', and then back
to 'val#1'.  Currently, IMA will not measure the last change to 'val#1',
since the hash of 'val#1' for the given record is already present in the
measurement log.  This limits the ability of the attestation service to
accurately determine the current state of the system, because it would
be interpreted as the system having 'val#2' for the given record.

Update ima_add_template_entry() to support measurement of duplicate
records, driven by a Kconfig option - IMA_DISABLE_HTABLE.

Signed-off-by: Tushar Sugandhi 
---
Change Log v3:
 - Incorporated feedback from Mimi on v2.
 - Updated patch title and description to make it generic.
 - Changed config description word 'data' to 'records'.
 - Tested use cases for boot param "ima_policy=tcb".

Change Log v2:
 - Incorporated feedback from Mimi on v1.
 - The fix is not just applicable to measurement of critical data,
   it now applies to other buffers and file data as well.
 - the fix is driven by a Kconfig option IMA_DISABLE_HTABLE, rather
   than a IMA policy condition - allow_dup.

 security/integrity/ima/Kconfig | 7 +++
 security/integrity/ima/ima_queue.c | 5 +++--
 2 files changed, 10 insertions(+), 2 deletions(-)

diff --git a/security/integrity/ima/Kconfig b/security/integrity/ima/Kconfig
index 12e9250c1bec..d0ceada99243 100644
--- a/security/integrity/ima/Kconfig
+++ b/security/integrity/ima/Kconfig
@@ -334,3 +334,10 @@ config IMA_SECURE_AND_OR_TRUSTED_BOOT
help
   This option is selected by architectures to enable secure and/or
   trusted boot based on IMA runtime policies.
+
+config IMA_DISABLE_HTABLE
+   bool "Disable htable to allow measurement of duplicate records"
+   depends on IMA
+   default n
+   help
+  This option disables htable to allow measurement of duplicate records.
diff --git a/security/integrity/ima/ima_queue.c b/security/integrity/ima/ima_queue.c
index c096ef8945c7..532da87ce519 100644
--- a/security/integrity/ima/ima_queue.c
+++ b/security/integrity/ima/ima_queue.c
@@ -168,7 +168,7 @@ int ima_add_template_entry(struct ima_template_entry *entry, int violation,
int result = 0, tpmresult = 0;
 
mutex_lock(&ima_extend_list_mutex);
-   if (!violation) {
+   if (!violation && !IS_ENABLED(CONFIG_IMA_DISABLE_HTABLE)) {
if (ima_lookup_digest_entry(digest, entry->pcr)) {
audit_cause = "hash_exists";
result = -EEXIST;
@@ -176,7 +176,8 @@ int ima_add_template_entry(struct ima_template_entry *entry, int violation,
}
}
 
-   result = ima_add_digest_entry(entry, 1);
+   result = ima_add_digest_entry(entry,
+ !IS_ENABLED(CONFIG_IMA_DISABLE_HTABLE));
if (result < 0) {
audit_cause = "ENOMEM";
audit_info = 0;
-- 
2.17.1



Re: [PATCH] of: error: 'const struct kimage' has no member named 'arch'

2021-02-18 Thread Lakshmi Ramasubramanian

On 2/18/21 4:07 PM, Mimi Zohar wrote:

Hi Mimi,


On Thu, 2021-02-18 at 14:33 -0800, Lakshmi Ramasubramanian wrote:

of_kexec_alloc_and_setup_fdt() defined in drivers/of/kexec.c builds
a new device tree object that includes architecture specific data
for kexec system call.  This should be defined only if the architecture
being built defines kexec architecture structure "struct kimage_arch".

Define a new boolean config OF_KEXEC that is enabled if
CONFIG_KEXEC_FILE and CONFIG_OF_FLATTREE are enabled, and
the architecture is arm64 or powerpc64.  Build drivers/of/kexec.c
if CONFIG_OF_KEXEC is enabled.

Signed-off-by: Lakshmi Ramasubramanian 
Fixes: 33488dc4d61f ("of: Add a common kexec FDT setup function")
Reported-by: kernel test robot 
---
  drivers/of/Kconfig  | 6 ++
  drivers/of/Makefile | 7 +--
  2 files changed, 7 insertions(+), 6 deletions(-)

diff --git a/drivers/of/Kconfig b/drivers/of/Kconfig
index 18450437d5d5..f2e8fa54862a 100644
--- a/drivers/of/Kconfig
+++ b/drivers/of/Kconfig
@@ -100,4 +100,10 @@ config OF_DMA_DEFAULT_COHERENT
# arches should select this if DMA is coherent by default for OF devices
bool
  
+config OF_KEXEC

+   bool
+   depends on KEXEC_FILE
+   depends on OF_FLATTREE
+   default y if ARM64 || PPC64
+
  endif # OF
diff --git a/drivers/of/Makefile b/drivers/of/Makefile
index c13b982084a3..287579dd1695 100644
--- a/drivers/of/Makefile
+++ b/drivers/of/Makefile
@@ -13,11 +13,6 @@ obj-$(CONFIG_OF_RESERVED_MEM) += of_reserved_mem.o
  obj-$(CONFIG_OF_RESOLVE)  += resolver.o
  obj-$(CONFIG_OF_OVERLAY) += overlay.o
  obj-$(CONFIG_OF_NUMA) += of_numa.o
-
-ifdef CONFIG_KEXEC_FILE
-ifdef CONFIG_OF_FLATTREE
-obj-y  += kexec.o
-endif
-endif
+obj-$(CONFIG_OF_KEXEC) += kexec.o
  
  obj-$(CONFIG_OF_UNITTEST) += unittest-data/


Is it possible to reuse CONFIG_HAVE_IMA_KEXEC here?



For ppc64 CONFIG_HAVE_IMA_KEXEC is selected when CONFIG_KEXEC_FILE is 
enabled. So I don't see a problem in reusing CONFIG_HAVE_IMA_KEXEC for ppc.


But for arm64, CONFIG_HAVE_IMA_KEXEC is enabled in the final patch in 
the patch set (the one for carrying forward IMA log across kexec for 
arm64). arm64 calls of_kexec_alloc_and_setup_fdt() prior to enabling 
CONFIG_HAVE_IMA_KEXEC and hence breaks the build for arm64.


thanks,
 -lakshmi







Re: [PATCH] arm64: dts: qcom: sc7180: Avoid glitching SPI CS at bootup on trogdor

2021-02-18 Thread Doug Anderson
Hi,

On Thu, Feb 18, 2021 at 2:55 PM Douglas Anderson  wrote:
>
> it's believed
> that, under certain timing conditions, it could be getting the EC into
> a confused state causing the EC driver to fail to probe.

Believed => confirmed

I _think_  is public.  It
explains why this was causing the EC driver to fail to probe.  In
short: it turns out that when we glitched the EC it printed to its
console.  If the EC's uptime was long enough then it would spend
enough time printing the timestamp for this error message (a bunch of
64-bit divide by 10) that it wouldn't be ready for the message we sent
to it.  Doh!

-Doug


Re: pstore: fix compression

2021-02-18 Thread Jiri Bohac
On Thu, Feb 18, 2021 at 12:30:03PM -0800, Kees Cook wrote:
> Eek; thanks for the catch!

thanks for applying the fix;

BTW, with the compression broken, I was not getting any dmesg
stored in ERST at all, not even uncompressed. After instrumenting
the code with a lot of debug printks I found that writing
erst_erange.size worth of data into the ERST fails and the
maximum writeable size is 98 bytes smaller:

Details: 

- erst_erange.size = 65536
- this results in  erst_info.bufsize = 65336
- pstore_compress() returned -EINVAL (because of the
  just-fixed typo), zipped_len = -EINVAL.
- pstore_dump calls copy_kmsg_to_buffer to only copy bufsize
  bytes from big_oops_buf to psinfo->buf;
  record.size = bufsize = 65336

psinfo->write() then fails with -EINVAL;
by more tracing inside the ERST code I found the -EINVAL was
produced by __erst_write_to_storage()
after apei_exec_ctx_get_output() returned
val=ERST_STATUS_FAILED=3 and this got translated into -EINVAL by
erst_errno().

Once the compression was fixed everything started working because
the records are much smaller after the compression (~30kB).

My next thought was to find the largest possible record that
could be written successfully.
I modified the ERST init code to decrease erst_info.bufsize by a
value specified on the cmdline. The maximum writable record was 65238 bytes
long (i.e. erst_erange.size - sizeof(struct cper_pstore_record) - 98).
With this hack I got 65238 bytes of uncompressed dmesg stored to ERST.

Any idea what might be causing this?
As far as I can tell, there are no other records in the ERST
(checked through the erst-dbg interface).
Tested on a HPE ProLiant DL120 Gen10 server.

Thanks!


-- 
Jiri Bohac 
SUSE Labs, Prague, Czechia



[PATCH v7 4/6] userfaultfd: add UFFDIO_CONTINUE ioctl

2021-02-18 Thread Axel Rasmussen
This ioctl is how userspace ought to resolve "minor" userfaults. The
idea is, userspace is notified that a minor fault has occurred. It might
change the contents of the page using its second non-UFFD mapping, or
not. Then, it calls UFFDIO_CONTINUE to tell the kernel "I have ensured
the page contents are correct, carry on setting up the mapping".

Note that it doesn't make much sense to use UFFDIO_{COPY,ZEROPAGE} for
MINOR registered VMAs. ZEROPAGE maps the VMA to the zero page; but in
the minor fault case, we already have some pre-existing underlying page.
Likewise, UFFDIO_COPY isn't useful if we have a second non-UFFD mapping.
We'd just use memcpy() or similar instead.

It turns out hugetlb_mcopy_atomic_pte() already does something very close to
what we want, if an existing page is provided via `struct page **pagep`. We
already special-case the behavior a bit for the UFFDIO_ZEROPAGE case, so
just extend that design: add an enum for the three modes of operation,
and make the small adjustments needed for the MCOPY_ATOMIC_CONTINUE
case. (Basically, look up the existing page, and avoid adding the
existing page to the page cache or calling set_page_huge_active() on
it.)

Signed-off-by: Axel Rasmussen 
---
 fs/userfaultfd.c | 67 
 include/linux/hugetlb.h  |  3 ++
 include/linux/userfaultfd_k.h| 18 +
 include/uapi/linux/userfaultfd.h | 21 +-
 mm/hugetlb.c | 41 ---
 mm/userfaultfd.c | 37 +++---
 6 files changed, 157 insertions(+), 30 deletions(-)

diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index 041449e47870..3b42c09eb043 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -1488,6 +1488,10 @@ static int userfaultfd_register(struct userfaultfd_ctx *ctx,
if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_WP))
ioctls_out &= ~((__u64)1 << _UFFDIO_WRITEPROTECT);
 
+   /* CONTINUE ioctl is only supported for MINOR ranges. */
+   if (!(uffdio_register.mode & UFFDIO_REGISTER_MODE_MINOR))
+   ioctls_out &= ~((__u64)1 << _UFFDIO_CONTINUE);
+
/*
 * Now that we scanned all vmas we can already tell
 * userland which ioctls methods are guaranteed to
@@ -1841,6 +1845,66 @@ static int userfaultfd_writeprotect(struct userfaultfd_ctx *ctx,
return ret;
 }
 
+static int userfaultfd_continue(struct userfaultfd_ctx *ctx, unsigned long arg)
+{
+   __s64 ret;
+   struct uffdio_continue uffdio_continue;
+   struct uffdio_continue __user *user_uffdio_continue;
+   struct userfaultfd_wake_range range;
+
+   user_uffdio_continue = (struct uffdio_continue __user *)arg;
+
+   ret = -EAGAIN;
+   if (READ_ONCE(ctx->mmap_changing))
+   goto out;
+
+   ret = -EFAULT;
+   if (copy_from_user(&uffdio_continue, user_uffdio_continue,
+  /* don't copy the output fields */
+  sizeof(uffdio_continue) - (sizeof(__s64))))
+   goto out;
+
+   ret = validate_range(ctx->mm, &uffdio_continue.range.start,
+uffdio_continue.range.len);
+   if (ret)
+   goto out;
+
+   ret = -EINVAL;
+   /* double check for wraparound just in case. */
+   if (uffdio_continue.range.start + uffdio_continue.range.len <=
+   uffdio_continue.range.start) {
+   goto out;
+   }
+   if (uffdio_continue.mode & ~UFFDIO_CONTINUE_MODE_DONTWAKE)
+   goto out;
+
+   if (mmget_not_zero(ctx->mm)) {
+   ret = mcopy_continue(ctx->mm, uffdio_continue.range.start,
+uffdio_continue.range.len,
+&ctx->mmap_changing);
+   mmput(ctx->mm);
+   } else {
+   return -ESRCH;
+   }
+
+   if (unlikely(put_user(ret, &user_uffdio_continue->mapped)))
+   return -EFAULT;
+   if (ret < 0)
+   goto out;
+
+   /* len == 0 would wake all */
+   BUG_ON(!ret);
+   range.len = ret;
+   if (!(uffdio_continue.mode & UFFDIO_CONTINUE_MODE_DONTWAKE)) {
+   range.start = uffdio_continue.range.start;
+   wake_userfault(ctx, &range);
+   }
+   ret = range.len == uffdio_continue.range.len ? 0 : -EAGAIN;
+
+out:
+   return ret;
+}
+
 static inline unsigned int uffd_ctx_features(__u64 user_features)
 {
/*
@@ -1928,6 +1992,9 @@ static long userfaultfd_ioctl(struct file *file, unsigned cmd,
case UFFDIO_WRITEPROTECT:
ret = userfaultfd_writeprotect(ctx, arg);
break;
+   case UFFDIO_CONTINUE:
+   ret = userfaultfd_continue(ctx, arg);
+   break;
}
return ret;
 }
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 7e6d2f126df3..8f2a4fc11b1f 100644
--- a/include/linux/hugetlb.h
+++ 

[PATCH v7 3/6] userfaultfd: hugetlbfs: only compile UFFD helpers if config enabled

2021-02-18 Thread Axel Rasmussen
For background, mm/userfaultfd.c provides a general mcopy_atomic
implementation. But some types of memory (i.e., hugetlb and shmem) need
a slightly different implementation, so they provide their own helpers
for this. In other words, userfaultfd is the only caller of these
functions.

This patch achieves two things:

1. Don't spend time compiling code which will end up never being
referenced anyway (a small build time optimization).

2. In patches later in this series, we extend the signature of these
helpers with UFFD-specific state (a mode enumeration). Once this
happens, we *have to* either not compile the helpers, or unconditionally
define the UFFD-only state (which seems messier to me). This includes
the declarations in the headers, as otherwise they'd yield warnings
about implicitly defining the type of those arguments.

Reviewed-by: Mike Kravetz 
Reviewed-by: Peter Xu 
Signed-off-by: Axel Rasmussen 
---
 include/linux/hugetlb.h | 4 
 mm/hugetlb.c| 2 ++
 2 files changed, 6 insertions(+)

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index ef5b55dbeb9a..7e6d2f126df3 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -134,11 +134,13 @@ void hugetlb_show_meminfo(void);
 unsigned long hugetlb_total_pages(void);
 vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
unsigned long address, unsigned int flags);
+#ifdef CONFIG_USERFAULTFD
 int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm, pte_t *dst_pte,
struct vm_area_struct *dst_vma,
unsigned long dst_addr,
unsigned long src_addr,
struct page **pagep);
+#endif /* CONFIG_USERFAULTFD */
 bool hugetlb_reserve_pages(struct inode *inode, long from, long to,
struct vm_area_struct *vma,
vm_flags_t vm_flags);
@@ -310,6 +312,7 @@ static inline void hugetlb_free_pgd_range(struct mmu_gather *tlb,
BUG();
 }
 
+#ifdef CONFIG_USERFAULTFD
 static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
pte_t *dst_pte,
struct vm_area_struct *dst_vma,
@@ -320,6 +323,7 @@ static inline int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
BUG();
return 0;
 }
+#endif /* CONFIG_USERFAULTFD */
 
 static inline pte_t *huge_pte_offset(struct mm_struct *mm, unsigned long addr,
unsigned long sz)
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 0388107da4b1..301b6b64c04e 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -4624,6 +4624,7 @@ vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
return ret;
 }
 
+#ifdef CONFIG_USERFAULTFD
 /*
  * Used by userfaultfd UFFDIO_COPY.  Based on mcopy_atomic_pte with
  * modifications for huge pages.
@@ -4754,6 +4755,7 @@ int hugetlb_mcopy_atomic_pte(struct mm_struct *dst_mm,
put_page(page);
goto out;
 }
+#endif /* CONFIG_USERFAULTFD */
 
 static void record_subpages_vmas(struct page *page, struct vm_area_struct *vma,
 int refs, struct page **pages,
-- 
2.30.0.617.g56c4b15f3c-goog



[PATCH v7 5/6] userfaultfd: update documentation to describe minor fault handling

2021-02-18 Thread Axel Rasmussen
Reword / reorganize things a little bit into "lists", so new features /
modes / ioctls can sort of just be appended.

Describe how UFFDIO_REGISTER_MODE_MINOR and UFFDIO_CONTINUE can be used
to intercept and resolve minor faults. Make it clear that COPY and
ZEROPAGE are used for MISSING faults, whereas CONTINUE is used for MINOR
faults.

Signed-off-by: Axel Rasmussen 
---
 Documentation/admin-guide/mm/userfaultfd.rst | 107 ---
 1 file changed, 66 insertions(+), 41 deletions(-)

diff --git a/Documentation/admin-guide/mm/userfaultfd.rst b/Documentation/admin-guide/mm/userfaultfd.rst
index 65eefa66c0ba..3aa38e8b8361 100644
--- a/Documentation/admin-guide/mm/userfaultfd.rst
+++ b/Documentation/admin-guide/mm/userfaultfd.rst
@@ -63,36 +63,36 @@ the generic ioctl available.
 
 The ``uffdio_api.features`` bitmask returned by the ``UFFDIO_API`` ioctl
 defines what memory types are supported by the ``userfaultfd`` and what
-events, except page fault notifications, may be generated.
-
-If the kernel supports registering ``userfaultfd`` ranges on hugetlbfs
-virtual memory areas, ``UFFD_FEATURE_MISSING_HUGETLBFS`` will be set in
-``uffdio_api.features``. Similarly, ``UFFD_FEATURE_MISSING_SHMEM`` will be
-set if the kernel supports registering ``userfaultfd`` ranges on shared
-memory (covering all shmem APIs, i.e. tmpfs, ``IPCSHM``, ``/dev/zero``,
-``MAP_SHARED``, ``memfd_create``, etc).
-
-The userland application that wants to use ``userfaultfd`` with hugetlbfs
-or shared memory need to set the corresponding flag in
-``uffdio_api.features`` to enable those features.
-
-If the userland desires to receive notifications for events other than
-page faults, it has to verify that ``uffdio_api.features`` has appropriate
-``UFFD_FEATURE_EVENT_*`` bits set. These events are described in more
-detail below in `Non-cooperative userfaultfd`_ section.
-
-Once the ``userfaultfd`` has been enabled the ``UFFDIO_REGISTER`` ioctl should
-be invoked (if present in the returned ``uffdio_api.ioctls`` bitmask) to
-register a memory range in the ``userfaultfd`` by setting the
+events, except page fault notifications, may be generated:
+
+- The ``UFFD_FEATURE_EVENT_*`` flags indicate that various other events
+  other than page faults are supported. These events are described in more
+  detail below in the `Non-cooperative userfaultfd`_ section.
+
+- ``UFFD_FEATURE_MISSING_HUGETLBFS`` and ``UFFD_FEATURE_MISSING_SHMEM``
+  indicate that the kernel supports ``UFFDIO_REGISTER_MODE_MISSING``
+  registrations for hugetlbfs and shared memory (covering all shmem APIs,
+  i.e. tmpfs, ``IPCSHM``, ``/dev/zero``, ``MAP_SHARED``, ``memfd_create``,
+  etc) virtual memory areas, respectively.
+
+- ``UFFD_FEATURE_MINOR_HUGETLBFS`` indicates that the kernel supports
+  ``UFFDIO_REGISTER_MODE_MINOR`` registration for hugetlbfs virtual memory
+  areas.
+
+The userland application should set the feature flags it intends to use
+when invoking the ``UFFDIO_API`` ioctl, to request that those features be
+enabled if supported.
+
+Once the ``userfaultfd`` API has been enabled the ``UFFDIO_REGISTER``
+ioctl should be invoked (if present in the returned ``uffdio_api.ioctls``
+bitmask) to register a memory range in the ``userfaultfd`` by setting the
 uffdio_register structure accordingly. The ``uffdio_register.mode``
 bitmask will specify to the kernel which kind of faults to track for
-the range (``UFFDIO_REGISTER_MODE_MISSING`` would track missing
-pages). The ``UFFDIO_REGISTER`` ioctl will return the
+the range. The ``UFFDIO_REGISTER`` ioctl will return the
 ``uffdio_register.ioctls`` bitmask of ioctls that are suitable to resolve
 userfaults on the range registered. Not all ioctls will necessarily be
-supported for all memory types depending on the underlying virtual
-memory backend (anonymous memory vs tmpfs vs real filebacked
-mappings).
+supported for all memory types (e.g. anonymous memory vs. shmem vs.
+hugetlbfs), or all types of intercepted faults.
 
 Userland can use the ``uffdio_register.ioctls`` to manage the virtual
 address space in the background (to add or potentially also remove
@@ -100,21 +100,46 @@ memory from the ``userfaultfd`` registered range). This means a userfault
 could be triggering just before userland maps in the background the
 user-faulted page.
 
-The primary ioctl to resolve userfaults is ``UFFDIO_COPY``. That
-atomically copies a page into the userfault registered range and wakes
-up the blocked userfaults
-(unless ``uffdio_copy.mode & UFFDIO_COPY_MODE_DONTWAKE`` is set).
-Other ioctl works similarly to ``UFFDIO_COPY``. They're atomic as in
-guaranteeing that nothing can see an half copied page since it'll
-keep userfaulting until the copy has finished.
+Resolving Userfaults
+
+
+There are three basic ways to resolve userfaults:
+
+- ``UFFDIO_COPY`` atomically copies some existing page contents from
+  userspace.
+
+- ``UFFDIO_ZEROPAGE`` atomically zeros the new page.
+
+- ``UFFDIO_CONTINUE`` 

[PATCH v7 6/6] userfaultfd/selftests: add test exercising minor fault handling

2021-02-18 Thread Axel Rasmussen
Fix a dormant bug in userfaultfd_events_test(), where we did
`return faulting_process(0)` instead of `exit(faulting_process(0))`.
This caused the forked process to keep running, trying to execute any
further test cases after the events test in parallel with the "real"
process.

Add a simple test case which exercises minor faults. In short, it does
the following:

1. "Sets up" an area (area_dst) and a second shared mapping to the same
   underlying pages (area_dst_alias).

2. Register one of these areas with userfaultfd, in minor fault mode.

3. Start a second thread to handle any minor faults.

4. Populate the underlying pages with the non-UFFD-registered side of
   the mapping. Basically, memset() each page with some arbitrary
   contents.

5. Then, using the UFFD-registered mapping, read all of the page
   contents, asserting that the contents match expectations (we expect
   the minor fault handling thread can modify the page contents before
   resolving the fault).

The minor fault handling thread, upon receiving an event, flips all the
bits (~) in that page, just to prove that it can modify it in some
arbitrary way. Then it issues a UFFDIO_CONTINUE ioctl, to setup the
mapping and resolve the fault. The reading thread should wake up and see
this modification.

Currently the minor fault test is only enabled in hugetlb_shared mode,
as this is the only configuration the kernel feature supports.

Reviewed-by: Peter Xu 
Signed-off-by: Axel Rasmussen 
---
 tools/testing/selftests/vm/userfaultfd.c | 164 ++-
 1 file changed, 158 insertions(+), 6 deletions(-)

diff --git a/tools/testing/selftests/vm/userfaultfd.c b/tools/testing/selftests/vm/userfaultfd.c
index 92b8ec423201..f5ab5e0312e7 100644
--- a/tools/testing/selftests/vm/userfaultfd.c
+++ b/tools/testing/selftests/vm/userfaultfd.c
@@ -81,6 +81,8 @@ static volatile bool test_uffdio_copy_eexist = true;
 static volatile bool test_uffdio_zeropage_eexist = true;
 /* Whether to test uffd write-protection */
 static bool test_uffdio_wp = false;
+/* Whether to test uffd minor faults */
+static bool test_uffdio_minor = false;
 
 static bool map_shared;
 static int huge_fd;
@@ -96,6 +98,7 @@ struct uffd_stats {
int cpu;
unsigned long missing_faults;
unsigned long wp_faults;
+   unsigned long minor_faults;
 };
 
 /* pthread_mutex_t starts at page offset 0 */
@@ -153,17 +156,19 @@ static void uffd_stats_reset(struct uffd_stats *uffd_stats,
uffd_stats[i].cpu = i;
uffd_stats[i].missing_faults = 0;
uffd_stats[i].wp_faults = 0;
+   uffd_stats[i].minor_faults = 0;
}
 }
 
 static void uffd_stats_report(struct uffd_stats *stats, int n_cpus)
 {
int i;
-   unsigned long long miss_total = 0, wp_total = 0;
+   unsigned long long miss_total = 0, wp_total = 0, minor_total = 0;
 
for (i = 0; i < n_cpus; i++) {
miss_total += stats[i].missing_faults;
wp_total += stats[i].wp_faults;
+   minor_total += stats[i].minor_faults;
}
 
printf("userfaults: %llu missing (", miss_total);
@@ -172,6 +177,9 @@ static void uffd_stats_report(struct uffd_stats *stats, int n_cpus)
printf("\b), %llu wp (", wp_total);
for (i = 0; i < n_cpus; i++)
printf("%lu+", stats[i].wp_faults);
+   printf("\b), %llu minor (", minor_total);
+   for (i = 0; i < n_cpus; i++)
+   printf("%lu+", stats[i].minor_faults);
printf("\b)\n");
 }
 
@@ -328,7 +336,7 @@ static struct uffd_test_ops shmem_uffd_test_ops = {
 };
 
 static struct uffd_test_ops hugetlb_uffd_test_ops = {
-   .expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC,
+   .expected_ioctls = UFFD_API_RANGE_IOCTLS_BASIC & ~(1 << _UFFDIO_CONTINUE),
.allocate_area  = hugetlb_allocate_area,
.release_pages  = hugetlb_release_pages,
.alias_mapping = hugetlb_alias_mapping,
@@ -362,6 +370,22 @@ static void wp_range(int ufd, __u64 start, __u64 len, bool wp)
}
 }
 
+static void continue_range(int ufd, __u64 start, __u64 len)
+{
+   struct uffdio_continue req;
+
+   req.range.start = start;
+   req.range.len = len;
+   req.mode = 0;
+
+   if (ioctl(ufd, UFFDIO_CONTINUE, &req)) {
+   fprintf(stderr,
+   "UFFDIO_CONTINUE failed for address 0x%" PRIx64 "\n",
+   (uint64_t)start);
+   exit(1);
+   }
+}
+
 static void *locking_thread(void *arg)
 {
unsigned long cpu = (unsigned long) arg;
@@ -569,8 +593,32 @@ static void uffd_handle_page_fault(struct uffd_msg *msg,
}
 
if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_WP) {
+   /* Write protect page faults */
wp_range(uffd, msg->arg.pagefault.address, page_size, false);
stats->wp_faults++;
+   } else if (msg->arg.pagefault.flags & UFFD_PAGEFAULT_FLAG_MINOR) {
+   

[PATCH v7 1/6] userfaultfd: add minor fault registration mode

2021-02-18 Thread Axel Rasmussen
This feature allows userspace to intercept "minor" faults. By "minor"
faults, I mean the following situation:

Let there exist two mappings (i.e., VMAs) to the same page(s). One of
the mappings is registered with userfaultfd (in minor mode), and the
other is not. Via the non-UFFD mapping, the underlying pages have
already been allocated & filled with some contents. The UFFD mapping
has not yet been faulted in; when it is touched for the first time,
this results in what I'm calling a "minor" fault. As a concrete
example, when working with hugetlbfs, we have huge_pte_none(), but
find_lock_page() finds an existing page.

This commit adds the new registration mode, and sets the relevant flag
on the VMAs being registered. In the hugetlb fault path, if we find
that we have huge_pte_none(), but find_lock_page() does indeed find an
existing page, then we have a "minor" fault, and if the VMA has the
userfaultfd registration flag, we call into userfaultfd to handle it.

This is implemented as a new registration mode, instead of an API
feature. This is because the alternative implementation has significant
drawbacks [1].

However, doing it this way requires we allocate a VM_* flag for the new
registration mode. On 32-bit systems, there are no unused bits, so this
feature is only supported on architectures with
CONFIG_ARCH_USES_HIGH_VMA_FLAGS. When attempting to register a VMA in
MINOR mode on 32-bit architectures, we return -EINVAL.

[1] https://lore.kernel.org/patchwork/patch/1380226/

Signed-off-by: Axel Rasmussen 
---
 arch/arm64/Kconfig   |  1 +
 arch/x86/Kconfig |  1 +
 fs/proc/task_mmu.c   |  3 ++
 fs/userfaultfd.c | 79 +++-
 include/linux/mm.h   |  7 +++
 include/linux/userfaultfd_k.h| 15 +-
 include/trace/events/mmflags.h   |  9 +++-
 include/uapi/linux/userfaultfd.h | 15 +-
 init/Kconfig |  5 ++
 mm/hugetlb.c | 32 +
 10 files changed, 132 insertions(+), 35 deletions(-)

diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig
index 31bd885b79eb..822ff6d2a0f6 100644
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -208,6 +208,7 @@ config ARM64
select SWIOTLB
select SYSCTL_EXCEPTION_TRACE
select THREAD_INFO_IN_TASK
+   select HAVE_ARCH_USERFAULTFD_MINOR if USERFAULTFD
help
  ARM 64-bit (AArch64) Linux support.
 
diff --git a/arch/x86/Kconfig b/arch/x86/Kconfig
index 4cd5bfa91d88..0330d5dda3aa 100644
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -162,6 +162,7 @@ config X86
select HAVE_ARCH_TRANSPARENT_HUGEPAGE
select HAVE_ARCH_TRANSPARENT_HUGEPAGE_PUD if X86_64
select HAVE_ARCH_USERFAULTFD_WP if X86_64 && USERFAULTFD
+   select HAVE_ARCH_USERFAULTFD_MINOR  if X86_64 && USERFAULTFD
select HAVE_ARCH_VMAP_STACK if X86_64
select HAVE_ARCH_WITHIN_STACK_FRAMES
select HAVE_ASM_MODVERSIONS
diff --git a/fs/proc/task_mmu.c b/fs/proc/task_mmu.c
index 3cec6fbef725..e1c9095ebe70 100644
--- a/fs/proc/task_mmu.c
+++ b/fs/proc/task_mmu.c
@@ -661,6 +661,9 @@ static void show_smap_vma_flags(struct seq_file *m, struct vm_area_struct *vma)
[ilog2(VM_PKEY_BIT4)]   = "",
 #endif
 #endif /* CONFIG_ARCH_HAS_PKEYS */
+#ifdef CONFIG_HAVE_ARCH_USERFAULTFD_MINOR
+   [ilog2(VM_UFFD_MINOR)]  = "ui",
+#endif /* CONFIG_HAVE_ARCH_USERFAULTFD_MINOR */
};
size_t i;
 
diff --git a/fs/userfaultfd.c b/fs/userfaultfd.c
index e5ce3b4e6c3d..041449e47870 100644
--- a/fs/userfaultfd.c
+++ b/fs/userfaultfd.c
@@ -197,24 +197,21 @@ static inline struct uffd_msg userfault_msg(unsigned long address,
msg_init();
msg.event = UFFD_EVENT_PAGEFAULT;
msg.arg.pagefault.address = address;
+   /*
+* These flags indicate why the userfault occurred:
+* - UFFD_PAGEFAULT_FLAG_WP indicates a write protect fault.
+* - UFFD_PAGEFAULT_FLAG_MINOR indicates a minor fault.
+* - Neither of these flags being set indicates a MISSING fault.
+*
+* Separately, UFFD_PAGEFAULT_FLAG_WRITE indicates it was a write
+* fault. Otherwise, it was a read fault.
+*/
if (flags & FAULT_FLAG_WRITE)
-   /*
-* If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-* uffdio_api.features and UFFD_PAGEFAULT_FLAG_WRITE
-* was not set in a UFFD_EVENT_PAGEFAULT, it means it
-* was a read fault, otherwise if set it means it's
-* a write fault.
-*/
msg.arg.pagefault.flags |= UFFD_PAGEFAULT_FLAG_WRITE;
if (reason & VM_UFFD_WP)
-   /*
-* If UFFD_FEATURE_PAGEFAULT_FLAG_WP was set in the
-* uffdio_api.features and UFFD_PAGEFAULT_FLAG_WP was
-* not set in a UFFD_EVENT_PAGEFAULT, it means it was
-
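The flag semantics spelled out in the new comment can be modeled as two small predicates. The bit values below are illustrative stand-ins, not necessarily the real UFFD_PAGEFAULT_FLAG_* values from include/uapi/linux/userfaultfd.h:

```c
#include <stdbool.h>

/* Illustrative bit assignments only. */
#define UFFD_PAGEFAULT_FLAG_WRITE	(1UL << 0)
#define UFFD_PAGEFAULT_FLAG_WP		(1UL << 1)
#define UFFD_PAGEFAULT_FLAG_MINOR	(1UL << 2)

/* Neither WP nor MINOR set => the fault was a MISSING fault. */
static bool is_missing_fault(unsigned long flags)
{
	return !(flags & (UFFD_PAGEFAULT_FLAG_WP | UFFD_PAGEFAULT_FLAG_MINOR));
}

/* WRITE is orthogonal to the fault kind: it only says read vs. write. */
static bool is_write_fault(unsigned long flags)
{
	return flags & UFFD_PAGEFAULT_FLAG_WRITE;
}
```

So a message with only UFFD_PAGEFAULT_FLAG_WRITE set still reports a MISSING fault, just one triggered by a write.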

[PATCH v7 2/6] userfaultfd: disable huge PMD sharing for MINOR registered VMAs

2021-02-18 Thread Axel Rasmussen
As the comment says: for the MINOR fault use case, although the page
might be present and populated in the other (non-UFFD-registered) half
of the mapping, it may be out of date, and we explicitly want userspace
to get a minor fault so it can check and potentially update the page's
contents.

Huge PMD sharing would prevent these faults from occurring for
suitably aligned areas, so disable it upon UFFD registration.

Reviewed-by: Peter Xu 
Signed-off-by: Axel Rasmussen 
---
 include/linux/userfaultfd_k.h | 13 ++---
 1 file changed, 10 insertions(+), 3 deletions(-)

diff --git a/include/linux/userfaultfd_k.h b/include/linux/userfaultfd_k.h
index 0390e5ac63b3..e060d5f77cc5 100644
--- a/include/linux/userfaultfd_k.h
+++ b/include/linux/userfaultfd_k.h
@@ -56,12 +56,19 @@ static inline bool is_mergeable_vm_userfaultfd_ctx(struct vm_area_struct *vma,
 }
 
 /*
- * Never enable huge pmd sharing on uffd-wp registered vmas, because uffd-wp
- * protect information is per pgtable entry.
+ * Never enable huge pmd sharing on some uffd registered vmas:
+ *
+ * - VM_UFFD_WP VMAs, because write protect information is per pgtable entry.
+ *
+ * - VM_UFFD_MINOR VMAs, because otherwise we would never get minor faults for
+ *   VMAs which share huge pmds. (If you have two mappings to the same
+ *   underlying pages, and fault in the non-UFFD-registered one with a write,
+ *   with huge pmd sharing this would *also* set up the second UFFD-registered
+ *   mapping, and we'd not get minor faults.)
  */
 static inline bool uffd_disable_huge_pmd_share(struct vm_area_struct *vma)
 {
-   return vma->vm_flags & VM_UFFD_WP;
+   return vma->vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
 }
 
 static inline bool userfaultfd_missing(struct vm_area_struct *vma)
-- 
2.30.0.617.g56c4b15f3c-goog
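The predicate changed by this hunk can be modeled outside the kernel as a small, testable function. The VM_* bit values below are placeholders for illustration, not the kernel's real flag values:

```c
#include <stdbool.h>

/* Placeholder bit values standing in for the kernel's vm_flags bits. */
#define VM_UFFD_WP	(1UL << 12)
#define VM_UFFD_MINOR	(1UL << 13)

/* Mirrors the shape of uffd_disable_huge_pmd_share(): huge pmd sharing
 * is disabled when the VMA is registered in either WP or MINOR mode. */
static bool uffd_disable_huge_pmd_share(unsigned long vm_flags)
{
	return vm_flags & (VM_UFFD_WP | VM_UFFD_MINOR);
}
```

Before this patch only the VM_UFFD_WP term was present; adding VM_UFFD_MINOR is what guarantees that suitably aligned MINOR-registered areas still take their own faults instead of riding on a shared pmd.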
