Re: [PATCH 00/13] [RFC] Rust support

2021-04-19 Thread Greg Kroah-Hartman
On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote:
> On Fri, Apr 16, 2021 at 10:39 AM Willy Tarreau  wrote:
> >
> > resources usage, I'm really not convinced at all it's suited for
> > low-level development. I understand the interest of the experiment
> > to help the language evolve into that direction, but I fear that
> > the kernel will soon be as bloated and insecure as a browser, and
> > that's really not to please me.
> 
> Dunno, I don't think the introduction of Rust made Firefox _more_ insecure.
> https://wiki.mozilla.org/Oxidation#Within_Firefox
> 
> I pray no executives ever see Dmitry Vyukov's 2019 Linux Plumbers Conf
> talk "Reflections on kernel quality, development process and testing."
> https://www.youtube.com/watch?v=iAfrrNdl2f4
> or his 2018 Linux Security Summit talk "Syzbot and the Tale of
> Thousand Kernel Bugs" https://www.youtube.com/watch?v=qrBVXxZDVQY
> (and they're just fuzzing the syscall interface and USB devices.
> Imagine once folks can more easily craft malformed bluetooth and wifi
> packets.)
> 
> I'd imagine the first term that comes to mind for them might be
> "liability."  They are quite sensitive to these vulnerabilities with
> silly names, logos, and websites.  Many of us believe that incrementally
> introducing a memory-safe language into our existing infrastructure, at
> the very least to try to improve the quality of drivers for those who
> choose to use such tools, is a better approach.

I would LOVE it if some "executives" would see the above presentations,
because then they would maybe actually fund developers to fix bugs and
maintain the kernel code, instead of only allowing them to add new
features.

Seriously, that's the real problem that Dmitry's work has exposed: the
lack of people allowed to do this type of bugfixing and maintenance on
company time, for something that the company relies on, is a huge issue.
"Executives" are willing to fund the initial work and then "throw it over
the wall to the community" once it is merged, at which point they can
forget about it because "the community" will maintain it for them for
free.  And that's a lie, as Dmitry's work shows.

The world creates new use cases and testing abilities all the time, which
exposes bugs that have been lurking in old code.  Once the bugs are fixed
in that layer of code, the next layer down can finally be tested, and
then, look, more corner-case issues show up.

Rewriting the kernel in another language is not going to fix the
majority of the issues that fuzzing finds here automagically, as that
work has exposed us to things like fault-injection and malicious USB
packets that no language would have saved us from "automatically".  All
of those code paths deal with "unsafe" data that doesn't magically
become "safe" because we switch languages.

And over time, what we have determined is "safe" has changed!  People
forget that it was only a few years ago that we decided the kernel now has
to protect userspace programs from malicious hardware.  That's a major
shift in thinking: data that we used to blindly trust can no longer be
trusted at all.  And "executives" want us to fix all of those issues for
free, when really it's a major design shift for loads of kernel
subsystems to handle this new threat model.

So yes, please spread that talk around.  Maybe then we will actually get
funding and support to FIX the bugs that those tools find.  Right now,
the primary fixers of those findings are _INTERNS_, as that's all
companies are willing to fund to fix this type of thing.

And don't get me started on the inability of "executives" to fund other
parts of Linux that they rely on, because they want "other companies" to
do it instead.  The tragedy-of-the-commons is a real threat to Linux,
and always has been...

thanks,

greg k-h


[PATCH v3 2/2] mmc: block: Update ext_csd.cache_ctrl if it was written

2021-04-19 Thread Avri Altman
The cache function can be turned ON and OFF by writing to the CACHE_CTRL
byte (EXT_CSD byte [33]).  However, card->ext_csd.cache_ctrl is only
set on init if cache size > 0.

Fix that by explicitly setting ext_csd.cache_ctrl on EXT_CSD write.

Signed-off-by: Avri Altman 
---
 drivers/mmc/core/block.c | 12 
 1 file changed, 12 insertions(+)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 5b6501fc9fb7..8b07ed5e08de 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -572,6 +572,18 @@ static int __mmc_blk_ioctl_cmd(struct mmc_card *card, struct mmc_blk_data *md,
main_md->part_curr = value & EXT_CSD_PART_CONFIG_ACC_MASK;
}
 
+   /*
+* Make sure to update CACHE_CTRL in case it was changed. The cache
+* will get turned back on if the card is re-initialized, e.g.
+* suspend/resume or hw reset in recovery.
+*/
+   if ((MMC_EXTRACT_INDEX_FROM_ARG(cmd.arg) == EXT_CSD_CACHE_CTRL) &&
+   (cmd.opcode == MMC_SWITCH)) {
+   u8 value = MMC_EXTRACT_VALUE_FROM_ARG(cmd.arg) & 1;
+
+   card->ext_csd.cache_ctrl = value;
+   }
+
/*
 * According to the SD specs, some commands require a delay after
 * issuing the command.
-- 
2.25.1



[PATCH v3 1/2] mmc: block: Issue flush only if allowed

2021-04-19 Thread Avri Altman
The cache may be flushed to the nonvolatile storage by writing to the
FLUSH_CACHE byte (EXT_CSD byte [32]). When in command queueing mode, the
cache may be flushed by issuing a CMDQ_TASK_DEV_MGMT (CMD48) with a
FLUSH_CACHE op-code.  Either way, verify that the cache function is
turned ON before doing so.

Fixes: 1e8e55b67030 ("mmc: block: Add CQE support")

Reported-by: Brendan Peter 
Tested-by: Brendan Peter 
Signed-off-by: Avri Altman 
---
 drivers/mmc/core/block.c   | 7 +++
 drivers/mmc/core/mmc.c | 2 +-
 drivers/mmc/core/mmc_ops.h | 5 +
 3 files changed, 13 insertions(+), 1 deletion(-)

diff --git a/drivers/mmc/core/block.c b/drivers/mmc/core/block.c
index 8bfd4d95b386..5b6501fc9fb7 100644
--- a/drivers/mmc/core/block.c
+++ b/drivers/mmc/core/block.c
@@ -1476,6 +1476,11 @@ static int mmc_blk_cqe_issue_flush(struct mmc_queue *mq, struct request *req)
struct mmc_queue_req *mqrq = req_to_mmc_queue_req(req);
struct mmc_request *mrq = mmc_blk_cqe_prep_dcmd(mqrq, req);
 
+   if (mmc_card_mmc(mq->card) && !mmc_flush_allowed(mq->card)) {
+   blk_mq_end_request(req, BLK_STS_OK);
+   return -EPERM;
+   }
+
mrq->cmd->opcode = MMC_SWITCH;
mrq->cmd->arg = (MMC_SWITCH_MODE_WRITE_BYTE << 24) |
(EXT_CSD_FLUSH_CACHE << 16) |
@@ -2226,6 +2231,8 @@ enum mmc_issued mmc_blk_mq_issue_rq(struct mmc_queue *mq, struct request *req)
switch (req_op(req)) {
case REQ_OP_FLUSH:
ret = mmc_blk_cqe_issue_flush(mq, req);
+   if (ret == -EPERM)
+   return MMC_REQ_FINISHED;
break;
case REQ_OP_READ:
case REQ_OP_WRITE:
diff --git a/drivers/mmc/core/mmc.c b/drivers/mmc/core/mmc.c
index 9ad4aa537867..e3da62ffcb5e 100644
--- a/drivers/mmc/core/mmc.c
+++ b/drivers/mmc/core/mmc.c
@@ -2037,7 +2037,7 @@ static int _mmc_flush_cache(struct mmc_card *card)
 {
int err = 0;
 
-   if (card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1) {
+   if (mmc_flush_allowed(card)) {
err = mmc_switch(card, EXT_CSD_CMD_SET_NORMAL,
 EXT_CSD_FLUSH_CACHE, 1,
 CACHE_FLUSH_TIMEOUT_MS);
diff --git a/drivers/mmc/core/mmc_ops.h b/drivers/mmc/core/mmc_ops.h
index 5782fdf4e8e9..2682bf66708a 100644
--- a/drivers/mmc/core/mmc_ops.h
+++ b/drivers/mmc/core/mmc_ops.h
@@ -19,6 +19,11 @@ enum mmc_busy_cmd {
 struct mmc_host;
 struct mmc_card;
 
+static inline bool mmc_flush_allowed(struct mmc_card *card)
+{
+   return card->ext_csd.cache_size > 0 && card->ext_csd.cache_ctrl & 1;
+}
+
 int mmc_select_card(struct mmc_card *card);
 int mmc_deselect_cards(struct mmc_host *host);
 int mmc_set_dsr(struct mmc_host *host);
-- 
2.25.1



[PATCH v3 0/2] Do not flush cache when it is disabled

2021-04-19 Thread Avri Altman
v2 -> v3:
 - rebase onto recent cache changes

v1 -> v2:
 - Address Adrian's comments

Cache is a temporary storage space in an eMMC device. Volatile by
nature, the cache should in the typical case reduce the access time
compared to an access to the main nonvolatile storage.

The cache function can be turned ON and OFF. Once OFF, the host is not
expected to issue a flush-cache command to the device.

Avri Altman (2):
  mmc: block: Issue flush only if allowed
  mmc: block: Update ext_csd.cache_ctrl if it was written

 drivers/mmc/core/block.c   | 19 +++
 drivers/mmc/core/mmc.c |  2 +-
 drivers/mmc/core/mmc_ops.h |  5 +
 3 files changed, 25 insertions(+), 1 deletion(-)

-- 
2.25.1



Re: Enabling pmbus power control

2021-04-19 Thread Zev Weiss

On Mon, Apr 19, 2021 at 10:36:48PM CDT, Guenter Roeck wrote:

On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote:

On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote:
> On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote:
> > On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote:
> >
> > > Okay, to expand a bit on the description in my initial message -- we've
> > > got a single chassis with multiple server boards and a single manager 
board
> > > that handles, among other things, power control for the servers.
> > > The manager board has one LM25066 for each attached server, which acts as
> > > the "power switch" for that server.  There thus really isn't any driver to
> > > speak of for the downstream device.
> >
> > This sounds like you need a driver representing those server boards (or
> > the slots they plug into perhaps) that represents everything about those
> > boards to userspace, including power switching.  I don't see why you
> > wouldn't have a driver for that - it's a thing that physically exists
> > and clearly has some software control, and you'd presumably also expect
> > to represent some labelling about the slot as well.
>
> Absolutely agree.
>
> Thanks,
> Guenter

Hi Guenter, Mark,

I wanted to return to this to try to explain why structuring the kernel
support for this in a way that's specific to the device behind the PMIC
seems like an awkward fit to me, and ultimately kind of artificial.

In the system I described, the manager board with the LM25066 devices is its
own little self-contained BMC system running its own Linux kernel (though
"BMC" is perhaps a slightly misleading term because there's no host system
that it manages).  The PMICs are really the only relevant connection it has
to the servers it controls power for -- they have their own dedicated local
BMCs on board as well doing all the usual BMC tasks.  A hypothetical
dedicated driver for this on the manager board wouldn't have any other
hardware to touch aside from the pmbus interface of the LM25066 itself, so
calling it a server board driver seems pretty arbitrary -- and in fact the
same system also has another LM25066 that controls the power supply to the
chassis cooling fans (a totally different downstream device, but one that
would be equally well-served by the same software).

More recently, another system has entered the picture for us that might
illustrate it more starkly -- the Delta Open19 power shelf [0] supported by
a recent code release from LinkedIn [1].  This is a rackmount power supply


All I can see is that this code is a mess.


with fifty outputs, each independently switchable via an LM25066 attached to
an on-board BMC-style management controller (though again, no host system
involved).  We (Equinix Metal) are interested in porting a modern OpenBMC to
it to replace the dated, crufty, pre-Linux-Foundation version of OpenBMC it
currently runs (as found in the linked repo).  The exact nature of the
devices powered by its outputs is well outside the scope of the firmware
running on that controller (it could be any arbitrary thing that runs on
12VDC), but we still want to be able to both (a) retrieve per-output
voltage/current/power readings as provided by the existing LM25066 hwmon
support, and (b) control the on/off state of those outputs from userspace.

Given the array of possible use-cases, an approach of adding power-switch
functionality to the existing LM25066 support seems like the most obvious
way to support this, so I'm hoping to see if I can get some idea of what an
acceptable implementation of that might look like.



Sorry, that is simply a no-go for the LM25066 driver. I mentioned it before,
and it is still true: The hwmon subsystem is not in the business of power
control.

Have you looked into enclosures (drivers/misc/enclosure.c) ? Maybe that
or something similar could be used for your purposes.

Thanks,
Guenter


I had a glance at the enclosure driver; it looks pretty geared toward 
SES-like things (drivers/scsi/ses.c being its only usage I can see in 
the kernel at the moment) and while it could perhaps be pressed into 
working for this it seems like it would probably drag in a fair amount 
of boilerplate and result in a somewhat gratuitously confusing driver 
arrangement (calling the things involved in the cases we're looking at 
"enclosures" seems like a bit of a stretch).


As an alternative, would something like the patch below be more along 
the lines of what you're suggesting?  And if so, would it make sense to 
generalize it into something like 'pmbus-switch.c' and add a 
PMBUS_HAVE_POWERSWITCH functionality bit or similar in the pmbus code 
instead of hardcoding it for only LM25066 support?




Thanks,
Zev


From f4c0a3637371d69a414b6fb882497d22e5de3ba0 Mon Sep 17 00:00:00 2001
From: Zev Weiss 
Date: Wed, 31 Mar 2021 01:58:35 -0500
Subject: [PATCH] misc: add lm25066-switch driver

This driver allows an lm25066 to be switched on and off from userspace
via 

Doubt regarding memory allocation in KVM

2021-04-19 Thread Shivank Garg
Hi,
I'm learning about qemu KVM, looking into code and experimenting on
it. I have the following doubts regarding it, I would be grateful if
you help me to get some idea on them.

1. I observe that KVM allocates memory to guests when it needs it but
doesn't take it back (except for the ballooning case).
Also, the Qemu/KVM process does not free the memory even when the
guest is rebooted. In this case, does the guest VM get access to
memory already pre-filled with some garbage from the previous run?
(Since the host would allocate zeroed pages to the guest the first time
it requests them, but after that it's up to the guest.) Can it be a
security issue?

2. How does KVM know whether a GPFN (guest physical frame number) is
backed by an actual machine frame number in the host? If it is not
mapped, it faults in the host and a physical frame is allocated for the
guest in the host (kvm_mmu_page_fault).

3. How/where can I access the GPFNs in the host? Are "gfn_t gfn = gpa
>> PAGE_SHIFT" and "gpa_t cr2_or_gpa" in the x86 KVM page fault handler
the same as the GPFN? (That is, can I use pfn_to_page in the guest VM to
access the struct page in the guest?)

Thank You.

Best Regards,
Shivank Garg
M.Tech Student,
IIT Kanpur


Re: [PATCH] watchdog: qcom: Move suspend/resume to suspend_late/resume_early

2021-04-19 Thread Sai Prakash Ranjan

Hi Guenter,

On 2021-03-11 01:53, Guenter Roeck wrote:

On Thu, Mar 11, 2021 at 01:50:04AM +0530, Sai Prakash Ranjan wrote:

During suspend/resume usecases and tests, it is common to see issues
such as lockups either in the suspend path or the resume path because of
bugs in the corresponding device driver pm handling code. In such cases,
it is important that the watchdog is active to make sure that we either
receive a watchdog pretimeout notification or a bite causing a reset,
instead of a hang causing us to hard reset the machine.

There are good reasons as to why we need this because:

* We can have a watchdog pretimeout governor set to panic in which
  case we can have a backtrace which would help identify the issue
  with the particular driver and cause a normal reboot.

* Even in case where there is no pretimeout support, a watchdog
  bite is still useful because some firmware has debug support to dump
  CPU core context on watchdog bite for post-mortem analysis.

* One more usecase which comes to mind is warm reboot. In case we
  hard reset the target, a cold reboot could be induced, resulting in
  loss of DDR contents and thereby losing all the debug info.

Currently, the watchdog pm callback just invokes the usual suspend
and resume callback which do not have any special ordering in the
sense that a watchdog can be suspended before the buggy device driver
suspend callback and watchdog resume can happen after the buggy device
driver resume callback. This would mean that the watchdog will not be
active when the buggy driver cause the lockups thereby hanging the
system. So to make sure this doesn't happen, move the watchdog pm to
use late/early system pm callbacks which will ensure that the watchdog
is suspended late and resumed early so that it can catch such issues.

Signed-off-by: Sai Prakash Ranjan 


Reviewed-by: Guenter Roeck 



Gentle Ping. I don't see this in linux-next or linux-watchdog, please let
me know if anything is pending from my side.

Thanks,
Sai


---
 drivers/watchdog/qcom-wdt.c | 4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/watchdog/qcom-wdt.c b/drivers/watchdog/qcom-wdt.c
index e38a87ffe5f5..0d2209c5eaca 100644
--- a/drivers/watchdog/qcom-wdt.c
+++ b/drivers/watchdog/qcom-wdt.c
@@ -329,7 +329,9 @@ static int __maybe_unused qcom_wdt_resume(struct device *dev)

return 0;
 }

-static SIMPLE_DEV_PM_OPS(qcom_wdt_pm_ops, qcom_wdt_suspend, qcom_wdt_resume);

+static const struct dev_pm_ops qcom_wdt_pm_ops = {
+   SET_LATE_SYSTEM_SLEEP_PM_OPS(qcom_wdt_suspend, qcom_wdt_resume)
+};

 static const struct of_device_id qcom_wdt_of_table[] = {
{ .compatible = "qcom,kpss-timer", .data = _data_apcs_tmr },
--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation



--
QUALCOMM INDIA, on behalf of Qualcomm Innovation Center, Inc. is a member
of Code Aurora Forum, hosted by The Linux Foundation


[PATCH] x86/mm: Fix copy error in comments

2021-04-19 Thread Cao jin
Direct page mapping in the bottom-up way allocates memory for page
structures from the low address of a range, which is the *bottom*,
not the *end*.

Signed-off-by: Cao jin 
---
 arch/x86/mm/init.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/x86/mm/init.c b/arch/x86/mm/init.c
index e26f5c5c6565..bc2f871c75f1 100644
--- a/arch/x86/mm/init.c
+++ b/arch/x86/mm/init.c
@@ -663,7 +663,7 @@ static void __init memory_map_bottom_up(unsigned long map_start,
/*
 * We start from the bottom (@map_start) and go to the top (@map_end).
 * The memblock_find_in_range() gets us a block of RAM from the
-* end of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
+* bottom of RAM in [min_pfn_mapped, max_pfn_mapped) used as new pages
 * for page table.
 */
while (start < map_end) {
-- 
2.30.1



Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem

2021-04-19 Thread Leonardo Bras
On Tue, 2021-04-20 at 15:18 +1000, Alexey Kardashevskiy wrote:
> 
> On 20/04/2021 14:54, Leonardo Bras wrote:
> > As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's
> > possible to use direct DMA mapping even with pmem region.
> > 
> > But, if that happens, the window size (len) is set to
> > (MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a
> > pagesize times smaller DDW to be created, being insufficient for correct
> > usage.
> > 
> > Fix this so the correct window size is used in this case.
> 
> Good find indeed.
> 
> afaict this does not create a huge problem though as 
> query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - 
> page_shift) where it matters (phyp).
> 
> 
> Reviewed-by: Alexey Kardashevskiy 
> 

Thanks for reviewing!

Leonardo Bras



Re: [PATCH 5/7] mfd: sec: Simplify getting of_device_id match data

2021-04-19 Thread Marek Szyprowski


On 19.04.2021 10:17, Krzysztof Kozlowski wrote:
> Use of_device_get_match_data() to make the code slightly smaller.
>
> Signed-off-by: Krzysztof Kozlowski 
> ---
>   drivers/mfd/sec-core.c | 9 +++--
>   1 file changed, 3 insertions(+), 6 deletions(-)
>
> diff --git a/drivers/mfd/sec-core.c b/drivers/mfd/sec-core.c
> index 8d55992da19e..3126c39f3203 100644
> --- a/drivers/mfd/sec-core.c
> +++ b/drivers/mfd/sec-core.c
> @@ -10,6 +10,7 @@
>   #include 
>   #include 
>   #include 
> +#include 
>   #include 
>   #include 
>   #include 
> @@ -324,12 +325,8 @@ static inline unsigned long 
> sec_i2c_get_driver_data(struct i2c_client *i2c,
>   const struct i2c_device_id *id)
>   {
>   #ifdef CONFIG_OF
> - if (i2c->dev.of_node) {
> - const struct of_device_id *match;
> -
> - match = of_match_node(sec_dt_match, i2c->dev.of_node);
> - return (unsigned long)match->data;
> - }
> + if (i2c->dev.of_node)
> + return (unsigned long)of_device_get_match_data(>dev);
>   #endif

Does it make any sense to keep the #ifdef CONFIG_OF after this change? I 
would also skip the (i2c->dev.of_node) check, because 
of_device_get_match_data() already does that (although indirectly).

>   return id->driver_data;
>   }
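
For what it's worth, a minimal sketch of what I mean (my own illustration, 
not the posted patch), relying on of_device_get_match_data() returning NULL 
when there is no OF node or no match:

static inline unsigned long
sec_i2c_get_driver_data(struct i2c_client *i2c, const struct i2c_device_id *id)
{
	/* NULL when probed without OF data, so fall back to the i2c id table */
	unsigned long data = (unsigned long)of_device_get_match_data(&i2c->dev);

	return data ? data : id->driver_data;
}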

Best regards
-- 
Marek Szyprowski, PhD
Samsung R&D Institute Poland



Re: [PATCH v2 8/8] MIPS: pci-legacy: use generic pci_enable_resources

2021-04-19 Thread Guenter Roeck
On Tue, Apr 13, 2021 at 08:12:40PM -0700, Ilya Lipnitskiy wrote:
> Follow the reasoning from commit 842de40d93e0 ("PCI: add generic
> pci_enable_resources()"):
> 
>   The only functional difference from the MIPS version is that the
>   generic one uses "!r->parent" to check for resource collisions
>   instead of "!r->start && r->end".
> 
> That should have no effect on any pci-legacy driver.
> 

Unfortunately it does. With this patch in place, all my mips qemu emulations
fail to boot from ide/ata drive; the driver is not instantiated. The error
message is:

ata_piix 0000:00:0a.1: can't enable device: BAR 0 [io  0x01f0-0x01f7] not claimed
ata_piix: probe of 0000:00:0a.1 failed with error -22

Reverting this patch fixes the problem, and the message displayed by the driver
is:

ata_piix 0000:00:0a.1: enabling device (0000 -> 0001)

Bisect log is attached.

Guenter

---
# bad: [50b8b1d699ac313c0a07a3c185ffb23aecab8abb] Add linux-next specific files 
for 20210419
# good: [bf05bf16c76bb44ab5156223e1e58e26dfe30a88] Linux 5.12-rc8
git bisect start 'HEAD' 'v5.12-rc8'
# bad: [c4bb91fc07e59241cde97f913d7a2fbedc248f0d] Merge remote-tracking branch 
'crypto/master'
git bisect bad c4bb91fc07e59241cde97f913d7a2fbedc248f0d
# bad: [499f739ad70f2a58aac985dceb25ca7666da88be] Merge remote-tracking branch 
'jc_docs/docs-next'
git bisect bad 499f739ad70f2a58aac985dceb25ca7666da88be
# bad: [11b56408a328d1c5c4dfa7667c5dc46956b64aec] Merge remote-tracking branch 
'parisc-hd/for-next'
git bisect bad 11b56408a328d1c5c4dfa7667c5dc46956b64aec
# good: [09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f] Merge branch 'arm/defconfig' 
into for-next
git bisect good 09ccc0ee1227f2cfe50d8dbbe241d115d9b3885f
# good: [a5b76c2f17330e266a5c56dde21430e27b0d0dbb] Merge remote-tracking branch 
'arm-soc/for-next'
git bisect good a5b76c2f17330e266a5c56dde21430e27b0d0dbb
# good: [1e4241f6813f1c1a0027d96df075ffd01808b3cf] Merge remote-tracking branch 
'ti-k3/ti-k3-next'
git bisect good 1e4241f6813f1c1a0027d96df075ffd01808b3cf
# good: [7496a43be7a362391607d78e49a3f28de80029ce] Merge remote-tracking branch 
'h8300/h8300-next'
git bisect good 7496a43be7a362391607d78e49a3f28de80029ce
# good: [66633abd0642f1e89d26e15f36fb13d3a1c535ff] MIPS/bpf: Enable 
bpf_probe_read{, str}() on MIPS again
git bisect good 66633abd0642f1e89d26e15f36fb13d3a1c535ff
# good: [2c92ef8ff8d327797c1920ae7f938bcc6f3f7421] MIPS: Fix strnlen_user 
access check
git bisect good 2c92ef8ff8d327797c1920ae7f938bcc6f3f7421
# good: [3a070801c61f4e3987d59b1068368ba71d727208] Merge remote-tracking branch 
'microblaze/next'
git bisect good 3a070801c61f4e3987d59b1068368ba71d727208
# good: [317f553bb677e324c9c865ff7f14597bc5ceeb9c] MIPS: pci-legacy: remove 
redundant info messages
git bisect good 317f553bb677e324c9c865ff7f14597bc5ceeb9c
# bad: [6ce48897ce476bed86fde28752c27596e8753277] MIPS: Loongson64: Add 
kexec/kdump support
git bisect bad 6ce48897ce476bed86fde28752c27596e8753277
# bad: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: pci-legacy: use generic 
pci_enable_resources
git bisect bad 99bca615d89510917864fac6b26fd343eff2aba2
# good: [0af83d2e447af3e5098583cb6320bb1b1fb0976b] MIPS: pci-legacy: remove 
busn_resource field
git bisect good 0af83d2e447af3e5098583cb6320bb1b1fb0976b
# first bad commit: [99bca615d89510917864fac6b26fd343eff2aba2] MIPS: 
pci-legacy: use generic pci_enable_resources


Re: [PATCH] mips: kdump: Crash kernel should be able to see old memories

2021-04-19 Thread Youling Tang

Hi, Jiaxun

On 04/20/2021 09:11 AM, Jiaxun Yang wrote:


在 2021/4/19 18:56, Youling Tang 写道:

From: Huacai Chen 

kexec-tools uses mem=X@Y to pass usable memory to the crash kernel, but in
commit a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map") all
memory passed by the BIOS is removed by early_parse_mem(). I think this is
reasonable for a normal kernel but not for a crash kernel, because a
crash kernel should be able to see all old memory, even though it is
not supposed to use it.

Fixes: a94e4f24ec836c8984f83959 ("MIPS: init: Drop boot_mem_map")
Signed-off-by: Huacai Chen 
Signed-off-by: Youling Tang 
---
  arch/mips/kernel/setup.c | 2 ++
  1 file changed, 2 insertions(+)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index b86e241..ac90d3b 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -351,8 +351,10 @@ static int __init early_parse_mem(char *p)
   */
  if (usermem == 0) {
  usermem = 1;
+#ifndef CONFIG_CRASH_DUMP


Why depend on a config instead of a runtime variable?


If we don't depend on the config option, we can determine whether the command
line contains the "elfcorehdr=" parameter, because the "mem=" and "elfcorehdr="
parameters are automatically added by kexec-tools. So if there is an
"elfcorehdr=" parameter in the command line, it means that the currently
running kernel is a capture kernel, and the memblock_remove() operation is
not called.

The revised patch is as follows:
 if (usermem == 0) {
 	usermem = 1;
-	memblock_remove(memblock_start_of_DRAM(),
-			memblock_end_of_DRAM() - memblock_start_of_DRAM());
+	if (!strstr(boot_command_line, "elfcorehdr")) {
+		memblock_remove(memblock_start_of_DRAM(),
+				memblock_end_of_DRAM() - memblock_start_of_DRAM());
+	}

Do you think it is feasible?

Btw as you are fixing my commit please keep me CCed.

Sorry, I will add you to CC.

Thanks,
Youling


Thanks.


- Jiaxun





Re: [PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem

2021-04-19 Thread Alexey Kardashevskiy




On 20/04/2021 14:54, Leonardo Bras wrote:

As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's
possible to use direct DMA mapping even with pmem region.

But, if that happens, the window size (len) is set to
(MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing a
pagesize times smaller DDW to be created, being insufficient for correct
usage.

Fix this so the correct window size is used in this case.


Good find indeed.

afaict this does not create a huge problem though as 
query.largest_available_block is always smaller than (MAX_PHYSMEM_BITS - 
page_shift) where it matters (phyp).
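
To put rough numbers on it (an illustration only, assuming MAX_PHYSMEM_BITS 
is 51 and page_shift is 16, i.e. 64K pages):

	/* buggy: len = MAX_PHYSMEM_BITS - page_shift = 35 -> window of 1ULL << 35 = 32 GiB */
	/* fixed: len = MAX_PHYSMEM_BITS              = 51 -> window of 1ULL << 51 =  2 PiB */

i.e. the mis-sized window ends up 2^page_shift times smaller than the
(1 << MAX_PHYSMEM_BITS) range it has to cover.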



Reviewed-by: Alexey Kardashevskiy 



Fixes: bf6e2d562bbc4 ("powerpc/dma: Fallback to dma_ops when persistent memory present")
Signed-off-by: Leonardo Bras 
---
  arch/powerpc/platforms/pseries/iommu.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..836cbbe0ecc5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
if (pmem_present) {
if (query.largest_available_block >=
(1ULL << (MAX_PHYSMEM_BITS - page_shift)))
-   len = MAX_PHYSMEM_BITS - page_shift;
+   len = MAX_PHYSMEM_BITS;
else
dev_info(&dev->dev, "Skipping ibm,pmemory");
}



--
Alexey


[tip:master] BUILD SUCCESS 75ce17f4d207d047b991e28cd5243f2345889c0c

2021-04-19 Thread kernel test robot
tree/branch: https://git.kernel.org/pub/scm/linux/kernel/git/tip/tip.git master
branch HEAD: 75ce17f4d207d047b991e28cd5243f2345889c0c  Merge branch 'x86/build'

elapsed time: 722m

configs tested: 117
configs skipped: 3

The following configs have been built successfully.
More configs may be tested in the coming days.

gcc tested configs:
arm defconfig
arm64    allyesconfig
arm64   defconfig
arm  allyesconfig
arm  allmodconfig
x86_64   allyesconfig
riscv    allmodconfig
i386 allyesconfig
riscv    allyesconfig
powerpc   motionpro_defconfig
sh ap325rxa_defconfig
mips  pic32mzda_defconfig
powerpc xes_mpc85xx_defconfig
m68k    mvme16x_defconfig
arm   versatile_defconfig
s390 alldefconfig
mips  pistachio_defconfig
arm    spear3xx_defconfig
sh  rsk7203_defconfig
powerpc mpc8272_ads_defconfig
mips  rb532_defconfig
sh    edosk7705_defconfig
powerpc asp8347_defconfig
arm    realview_defconfig
sparc   sparc32_defconfig
um   alldefconfig
sh   rts7751r2dplus_defconfig
sh   se7705_defconfig
powerpc redwood_defconfig
powerpc  katmai_defconfig
sh ecovec24_defconfig
sh  rsk7201_defconfig
ia64 allmodconfig
ia64defconfig
parisc   alldefconfig
powerpc   holly_defconfig
sh    sh7785lcr_defconfig
m68k  multi_defconfig
arm   omap2plus_defconfig
sh   sh7724_generic_defconfig
mips loongson1b_defconfig
powerpc  g5_defconfig
arm  footbridge_defconfig
powerpc sequoia_defconfig
ia64 allyesconfig
m68k allmodconfig
m68k    defconfig
m68k allyesconfig
nios2   defconfig
arc  allyesconfig
nds32 allnoconfig
nds32   defconfig
nios2    allyesconfig
csky    defconfig
alpha   defconfig
alpha    allyesconfig
xtensa   allyesconfig
h8300    allyesconfig
arc defconfig
sh   allmodconfig
parisc  defconfig
s390 allyesconfig
s390 allmodconfig
parisc   allyesconfig
s390    defconfig
sparc    allyesconfig
sparc   defconfig
i386defconfig
mips allyesconfig
mips allmodconfig
powerpc  allyesconfig
powerpc  allmodconfig
powerpc   allnoconfig
x86_64   randconfig-a003-20210419
x86_64   randconfig-a001-20210419
x86_64   randconfig-a005-20210419
x86_64   randconfig-a002-20210419
x86_64   randconfig-a006-20210419
x86_64   randconfig-a004-20210419
i386 randconfig-a003-20210419
i386 randconfig-a001-20210419
i386 randconfig-a006-20210419
i386 randconfig-a005-20210419
i386 randconfig-a004-20210419
i386 randconfig-a002-20210419
i386 randconfig-a012-20210420
i386 randconfig-a014-20210420
i386 randconfig-a011-20210420
i386 randconfig-a013-20210420
i386 randconfig-a015-20210420
i386 randconfig-a016-20210420
i386 randconfig-a015-20210419
i386 randconfig-a013-20210419
i386 randconfig-a014-20210419
i386 randconfig-a016-20210419
i386 randconfig-a012-20210419
i386 randconfig-a011-20210419
riscv    nommu_k210_defconfig
riscv    nommu_virt_defconfig
riscv allnoconfig
riscv   defconfig
riscv

[syzbot] KASAN: use-after-free Read in sctp_do_8_2_transport_strike

2021-04-19 Thread syzbot
Hello,

syzbot found the following issue on:

HEAD commit:    bf05bf16 Linux 5.12-rc8
git tree:   upstream
console output: https://syzkaller.appspot.com/x/log.txt?x=16a00dfed0
kernel config:  https://syzkaller.appspot.com/x/.config?x=9404cfa686df2c05
dashboard link: https://syzkaller.appspot.com/bug?extid=bbe538efd1046586f587

Unfortunately, I don't have any reproducer for this issue yet.

IMPORTANT: if you fix the issue, please add the following tag to the commit:
Reported-by: syzbot+bbe538efd1046586f...@syzkaller.appspotmail.com

==
BUG: KASAN: use-after-free in 
sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 
net/sctp/sm_sideeffect.c:531
Read of size 4 at addr 888024d65154 by task swapper/1/0

CPU: 1 PID: 0 Comm: swapper/1 Not tainted 5.12.0-rc8-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 print_address_description.constprop.0.cold+0x5b/0x2f8 mm/kasan/report.c:232
 __kasan_report mm/kasan/report.c:399 [inline]
 kasan_report.cold+0x7c/0xd8 mm/kasan/report.c:416
 sctp_do_8_2_transport_strike.constprop.0+0xa27/0xab0 
net/sctp/sm_sideeffect.c:531
 sctp_cmd_interpreter net/sctp/sm_sideeffect.c:1636 [inline]
 sctp_side_effects net/sctp/sm_sideeffect.c:1185 [inline]
 sctp_do_sm+0x114f/0x5120 net/sctp/sm_sideeffect.c:1156
 sctp_generate_timeout_event+0x1bb/0x3d0 net/sctp/sm_sideeffect.c:295
 call_timer_fn+0x1a5/0x6b0 kernel/time/timer.c:1431
 expire_timers kernel/time/timer.c:1476 [inline]
 __run_timers.part.0+0x67c/0xa50 kernel/time/timer.c:1745
 __run_timers kernel/time/timer.c:1726 [inline]
 run_timer_softirq+0xb3/0x1d0 kernel/time/timer.c:1758
 __do_softirq+0x29b/0x9f6 kernel/softirq.c:345
 invoke_softirq kernel/softirq.c:221 [inline]
 __irq_exit_rcu kernel/softirq.c:422 [inline]
 irq_exit_rcu+0x134/0x200 kernel/softirq.c:434
 sysvec_apic_timer_interrupt+0x93/0xc0 arch/x86/kernel/apic/apic.c:1100
 
 asm_sysvec_apic_timer_interrupt+0x12/0x20 arch/x86/include/asm/idtentry.h:632
RIP: 0010:native_save_fl arch/x86/include/asm/irqflags.h:29 [inline]
RIP: 0010:arch_local_save_flags arch/x86/include/asm/irqflags.h:70 [inline]
RIP: 0010:arch_irqs_disabled arch/x86/include/asm/irqflags.h:137 [inline]
RIP: 0010:acpi_safe_halt drivers/acpi/processor_idle.c:112 [inline]
RIP: 0010:acpi_idle_do_entry+0x1c9/0x250 drivers/acpi/processor_idle.c:517
Code: ed 56 6e f8 84 db 75 ac e8 34 50 6e f8 e8 3f 3f 74 f8 e9 0c 00 00 00 e8 
25 50 6e f8 0f 00 2d be a5 c7 00 e8 19 50 6e f8 fb f4 <9c> 5b 81 e3 00 02 00 00 
fa 31 ff 48 89 de e8 24 58 6e f8 48 85 db
RSP: 0018:c9d57d18 EFLAGS: 0293
RAX:  RBX:  RCX: 
RDX: 8880111c54c0 RSI: 8905a167 RDI: 
RBP: 888141433864 R08: 0001 R09: 0001
R10: 8179e0c8 R11:  R12: 0001
R13: 888141433800 R14: 888141433864 R15: 888017879004
 acpi_idle_enter+0x361/0x500 drivers/acpi/processor_idle.c:652
 cpuidle_enter_state+0x1b1/0xc80 drivers/cpuidle/cpuidle.c:237
 cpuidle_enter+0x4a/0xa0 drivers/cpuidle/cpuidle.c:351
 call_cpuidle kernel/sched/idle.c:158 [inline]
 cpuidle_idle_call kernel/sched/idle.c:239 [inline]
 do_idle+0x3e1/0x590 kernel/sched/idle.c:300
 cpu_startup_entry+0x14/0x20 kernel/sched/idle.c:397
 start_secondary+0x274/0x350 arch/x86/kernel/smpboot.c:272
 secondary_startup_64_no_verify+0xb0/0xbb

Allocated by task 14433:
 kasan_save_stack+0x1b/0x40 mm/kasan/common.c:38
 kasan_set_track mm/kasan/common.c:46 [inline]
 set_alloc_info mm/kasan/common.c:427 [inline]
 kasan_kmalloc mm/kasan/common.c:506 [inline]
 kasan_kmalloc mm/kasan/common.c:465 [inline]
 __kasan_kmalloc+0x99/0xc0 mm/kasan/common.c:515
 kmalloc include/linux/slab.h:554 [inline]
 kzalloc include/linux/slab.h:684 [inline]
 sctp_transport_new+0x8c/0x690 net/sctp/transport.c:96
 sctp_assoc_add_peer+0x28f/0x1160 net/sctp/associola.c:618
 sctp_process_init+0x12a/0x2940 net/sctp/sm_make_chunk.c:2345
 sctp_sf_do_dupcook_a net/sctp/sm_statefuns.c:1800 [inline]
 sctp_sf_do_5_2_4_dupcook+0x1401/0x2d50 net/sctp/sm_statefuns.c:2200
 sctp_do_sm+0x179/0x5120 net/sctp/sm_sideeffect.c:1153
 sctp_assoc_bh_rcv+0x386/0x6c0 net/sctp/associola.c:1048
 sctp_inq_push+0x1da/0x270 net/sctp/inqueue.c:80
 sctp_rcv+0xf64/0x2f10 net/sctp/input.c:256
 sctp6_rcv+0x38/0x60 net/sctp/ipv6.c:1077
 ip6_protocol_deliver_rcu+0x2e9/0x17f0 net/ipv6/ip6_input.c:422
 ip6_input_finish+0x7f/0x160 net/ipv6/ip6_input.c:463
 NF_HOOK include/linux/netfilter.h:301 [inline]
 NF_HOOK include/linux/netfilter.h:295 [inline]
 ip6_input+0x9c/0xd0 net/ipv6/ip6_input.c:472
 dst_input include/net/dst.h:458 [inline]
 ip6_rcv_finish net/ipv6/ip6_input.c:76 [inline]
 NF_HOOK include/linux/netfilter.h:301 [inline]
 NF_HOOK include/linux/netfilter.h:295 [inline]
 ipv6_rcv+0x28e/0x3c0 

Re: [PATCH] bonding: 3ad: update slave arr after initialize

2021-04-19 Thread Jay Vosburgh
jin yiting  wrote:
[...]
>>  The described issue is a race condition (in that
>> ad_agg_selection_logic clears agg->is_active under mode_lock, but
>> bond_open -> bond_update_slave_arr is inspecting agg->is_active outside
>> the lock).  I don't see how the above change will reliably manage this;
>> the real issue looks to be that bond_update_slave_arr is committing
>> changes to the array (via bond_reset_slave_arr) based on a racy
>> inspection of the active aggregator state while it is in flux.
>>
>>  Also, the description of the issue says "The best aggregator in
>> ad_agg_selection_logic has not changed, no need to update slave arr,"
>> but the change above does the opposite, and will set update_slave_arr
>> when the aggregator has not changed (update_slave_arr remains false at
>> return of ad_agg_selection_logic).
>>
>>  I believe I understand the described problem, but I don't see
>> how the patch fixes it.  I suspect (but haven't tested) that the proper
>> fix is to acquire mode_lock in bond_update_slave_arr while calling
>> bond_3ad_get_active_agg_info to avoid conflict with the state machine.
>>
>>  -J
>>
>> ---
>>  -Jay Vosburgh, jay.vosbu...@canonical.com
>> .
>>
>
>   Thank you for your reply. The last patch does have a redundant
>update of the slave arr. Thank you for your correction.
>
>As you said, holding mode_lock in bond_update_slave_arr while
>calling bond_3ad_get_active_agg_info can avoid conflict with the state
>machine. I have tested this patch, with ifdown/ifup operations for the bond or
>slaves.
>
>But bond_update_slave_arr is expected to hold RTNL only and NO
>other lock. And it has WARN_ON(lockdep_is_held(&bond->mode_lock)); in
>bond_update_slave_arr. I'm not sure that holding mode_lock in
>bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a
>correct action.

That WARN_ON came up in discussion recently, and my opinion is
that it's incorrect, and is trying to ensure bond_update_slave_arr is
safe for a potential sleep when allocating memory.

https://lore.kernel.org/netdev/20210322123846.3024549-1-maxi...@nvidia.com/

The original authors haven't replied, so I would suggest you
remove the WARN_ON and the surrounding CONFIG_LOCKDEP ifdefs as part of
your patch and replace it with a call to might_sleep.
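
	A rough sketch of what I mean (assuming the check in
bond_update_slave_arr still looks like the usual CONFIG_LOCKDEP block;
adjust to the actual code):

	/* instead of:
	 *
	 *   #ifdef CONFIG_LOCKDEP
	 *           WARN_ON(lockdep_is_held(&bond->mode_lock));
	 *   #endif
	 *
	 * document the sleeping context directly; the function can sleep
	 * when it allocates the new slave array with GFP_KERNEL:
	 */
	might_sleep();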

The other callers of bond_3ad_get_active_agg_info are generally
obtaining the state in order to report it to user space, so I think it's
safe to leave those calls not holding the mode_lock.  The race is still
there, but the data returned to user space is a snapshot and so may
reflect an incomplete state during a transition.  Further, having the
inspection functions acquire the mode_lock permits user space to spam
the lock with little effort.

-J

>diff --git a/drivers/net/bonding/bond_main.c b/drivers/net/bonding/bond_main.c
>index 74cbbb2..db988e5 100644
>--- a/drivers/net/bonding/bond_main.c
>+++ b/drivers/net/bonding/bond_main.c
>@@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>if (BOND_MODE(bond) == BOND_MODE_8023AD) {
>struct ad_info ad_info;
>
>+   spin_lock_bh(&bond->mode_lock);
>if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
>+   spin_unlock_bh(&bond->mode_lock);
>pr_debug("bond_3ad_get_active_agg_info failed\n");
>/* No active aggragator means it's not safe to use
> * the previous array.
>@@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, struct slave *skipslave)
>bond_reset_slave_arr(bond);
>goto out;
>}
>+   spin_unlock_bh(&bond->mode_lock);
>agg_id = ad_info.aggregator_id;
>}
>bond_for_each_slave(bond, slave, iter) {

---
-Jay Vosburgh, jay.vosbu...@canonical.com


Re: PPC_FPU, ALTIVEC: enable_kernel_fp, put_vr, get_vr

2021-04-19 Thread Christophe Leroy




Le 19/04/2021 à 23:39, Randy Dunlap a écrit :

On 4/19/21 6:16 AM, Michael Ellerman wrote:

Randy Dunlap  writes:



Sure.  I'll post them later today.
They keep FPU and ALTIVEC as independent (build) features.


Those patches look OK.

But I don't think it makes sense to support that configuration, FPU=n
ALTVEC=y. No one is ever going to make a CPU like that. We have enough
testing surface due to configuration options, without adding artificial
combinations that no one is ever going to use.

IMHO :)

So I'd rather we just make ALTIVEC depend on FPU.


That's rather simple. See below.
I'm doing a bunch of randconfig builds with it now.

---
From: Randy Dunlap 
Subject: [PATCH] powerpc: make ALTIVEC depend on PPC_FPU

On a kernel config with ALTIVEC=y and PPC_FPU not set/enabled,
there are build errors:

drivers/cpufreq/pmac32-cpufreq.c:262:2: error: implicit declaration of function 
'enable_kernel_fp' [-Werror,-Wimplicit-function-declaration]
enable_kernel_fp();
../arch/powerpc/lib/sstep.c: In function 'do_vec_load':
../arch/powerpc/lib/sstep.c:637:3: error: implicit declaration of function 
'put_vr' [-Werror=implicit-function-declaration]
   637 |   put_vr(rn, );
   |   ^~
../arch/powerpc/lib/sstep.c: In function 'do_vec_store':
../arch/powerpc/lib/sstep.c:660:3: error: implicit declaration of function 
'get_vr'; did you mean 'get_oc'? [-Werror=implicit-function-declaration]
   660 |   get_vr(rn, );
   |   ^~

In theory ALTIVEC is independent of PPC_FPU but in practice nobody
is going to build such a machine, so make ALTIVEC require PPC_FPU
by depending on PPC_FPU.

Signed-off-by: Randy Dunlap 
Reported-by: kernel test robot 
Cc: Michael Ellerman 
Cc: linuxppc-...@lists.ozlabs.org
Cc: Christophe Leroy 
Cc: Segher Boessenkool 
Cc: l...@intel.com
---
  arch/powerpc/platforms/86xx/Kconfig|1 +
  arch/powerpc/platforms/Kconfig.cputype |2 ++
  2 files changed, 3 insertions(+)

--- linux-next-20210416.orig/arch/powerpc/platforms/86xx/Kconfig
+++ linux-next-20210416/arch/powerpc/platforms/86xx/Kconfig
@@ -4,6 +4,7 @@ menuconfig PPC_86xx
bool "86xx-based boards"
depends on PPC_BOOK3S_32
select FSL_SOC
+   select PPC_FPU
select ALTIVEC
help
  The Freescale E600 SoCs have 74xx cores.
--- linux-next-20210416.orig/arch/powerpc/platforms/Kconfig.cputype
+++ linux-next-20210416/arch/powerpc/platforms/Kconfig.cputype
@@ -186,6 +186,7 @@ config E300C3_CPU
  config G4_CPU
bool "G4 (74xx)"
depends on PPC_BOOK3S_32
+   select PPC_FPU
select ALTIVEC
  
  endchoice

@@ -309,6 +310,7 @@ config PHYS_64BIT
  
  config ALTIVEC

bool "AltiVec Support"
+   depends on PPC_FPU


Shouldn't we do it the other way round? That is, make ALTIVEC select PPC_FPU and avoid the two 
selects that are above?



depends on PPC_BOOK3S_32 || PPC_BOOK3S_64 || (PPC_E500MC && PPC64)
help
  This option enables kernel support for the Altivec extensions to the



Re: [PATCH v5 09/12] media: uvcvideo: Implement UVC_QUIRK_PRIVACY_DURING_STREAM

2021-04-19 Thread Tomasz Figa
On Wed, Dec 23, 2020 at 9:56 PM Ricardo Ribalda  wrote:
>
> Hi again
>
> On Wed, Dec 23, 2020 at 9:31 AM Ricardo Ribalda  wrote:
> >
> > Hi Laurent
> >
> > On Wed, Dec 23, 2020 at 9:05 AM Laurent Pinchart
> >  wrote:
> > >
> > > Hi Ricardo,
> > >
> > > On Tue, Dec 22, 2020 at 09:04:19PM +0100, Ricardo Ribalda wrote:
> > > > On Tue, Dec 22, 2020 at 11:30 AM Laurent Pinchart wrote:
> > > > > On Mon, Dec 21, 2020 at 05:48:16PM +0100, Ricardo Ribalda wrote:
> > > > > > Some devices, can only read the privacy_pin if the device is
> > > > >
> > > > > s/devices,/devices/
> > > > >
> > > > > > streaming.
> > > > > >
> > > > > > This patch implement a quirk for such devices, in order to avoid 
> > > > > > invalid
> > > > > > reads and/or spurious events.
> > > > > >
> > > > > > Signed-off-by: Ricardo Ribalda 
> > > > > > ---
> > > > > >  drivers/media/usb/uvc/uvc_driver.c | 57 
> > > > > > --
> > > > > >  drivers/media/usb/uvc/uvc_queue.c  |  3 ++
> > > > > >  drivers/media/usb/uvc/uvcvideo.h   |  4 +++
> > > > > >  3 files changed, 61 insertions(+), 3 deletions(-)
> > > > > >
> > > > > > diff --git a/drivers/media/usb/uvc/uvc_driver.c 
> > > > > > b/drivers/media/usb/uvc/uvc_driver.c
> > > > > > index 72516101fdd0..7af37d4bd60a 100644
> > > > > > --- a/drivers/media/usb/uvc/uvc_driver.c
> > > > > > +++ b/drivers/media/usb/uvc/uvc_driver.c
> > > > > > @@ -7,6 +7,7 @@
> > > > > >   */
> > > > > >
> > > > > >  #include 
> > > > > > +#include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > >  #include 
> > > > > > @@ -1472,6 +1473,17 @@ static int uvc_parse_control(struct 
> > > > > > uvc_device *dev)
> > > > > >  /* 
> > > > > > -
> > > > > >   * Privacy GPIO
> > > > > >   */
> > > > >
> > > > > There should be a blank line here.
> > > > >
> > > > > > +static bool uvc_gpio_is_streaming(struct uvc_device *dev)
> > > > > > +{
> > > > > > + struct uvc_streaming *streaming;
> > > > > > +
> > > > > > + list_for_each_entry(streaming, &dev->streams, list) {
> > > > > > + if (uvc_queue_streaming(&streaming->queue))
> > > > > > + return true;
> > > > > > + }
> > > > > > +
> > > > > > + return false;
> > > > > > +}
> > > > > >
> > > > > >
> > > > >
> > > > > But not two blank lines here.
> > > > >
> > > > > >  static u8 uvc_gpio_update_value(struct uvc_device *dev,
> > > > > > @@ -1499,7 +1511,12 @@ static int uvc_gpio_get_cur(struct 
> > > > > > uvc_device *dev, struct uvc_entity *entity,
> > > > > >   if (cs != UVC_CT_PRIVACY_CONTROL || size < 1)
> > > > > >   return -EINVAL;
> > > > > >
> > > > > > + if ((dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM) &&
> > > > > > + !uvc_gpio_is_streaming(dev))
> > > > > > + return -EBUSY;
> > > > > > +
> > > > > >   *(uint8_t *)data = uvc_gpio_update_value(dev, entity);
> > > > > > +
> > > > > >   return 0;
> > > > > >  }
> > > > > >
> > > > > > @@ -1528,19 +1545,50 @@ static struct uvc_entity 
> > > > > > *uvc_gpio_find_entity(struct uvc_device *dev)
> > > > > >   return NULL;
> > > > > >  }
> > > > > >
> > > > > > -static irqreturn_t uvc_gpio_irq(int irq, void *data)
> > > > > > +void uvc_privacy_gpio_event(struct uvc_device *dev)
> > > > > >  {
> > > > > > - struct uvc_device *dev = data;
> > > > > >   struct uvc_entity *unit;
> > > > > >
> > > > > > +
> > > > > >   unit = uvc_gpio_find_entity(dev);
> > > > > >   if (!unit)
> > > > > > - return IRQ_HANDLED;
> > > > > > + return;
> > > > > >
> > > > > >   uvc_gpio_update_value(dev, unit);
> > > > > > +}
> > > > > > +
> > > > > > +static irqreturn_t uvc_gpio_irq(int irq, void *data)
> > > > > > +{
> > > > > > + struct uvc_device *dev = data;
> > > > > > +
> > > > > > + /* Ignore privacy events during streamoff */
> > > > > > + if (dev->quirks & UVC_QUIRK_PRIVACY_DURING_STREAM)
> > > > > > + if (!uvc_gpio_is_streaming(dev))
> > > > > > + return IRQ_HANDLED;
> > > > >
> > > > > I'm still a bit concerned of race conditions. When stopping the 
> > > > > stream,
> > > > > vb2_queue.streaming is set to 0 after calling the driver's 
> > > > > .stop_stream()
> > > > > handler. This means that the device will cut power before
> > > > > uvc_gpio_is_streaming() can detect that streaming has stopped, and the
> > > > > GPIO could thus trigger an IRQ.
> > > >
> > > > On the affected devices I have not seen this. I guess it takes some
> > > > time to discharge. Anyway I am implementing a workaround. Tell me if
> > > > it is too ugly.
> > > >
> > > > > You mentioned that devices have a pull-up or pull-down on the GPIO 
> > > > > line.
> > > > > As there are only two devices affected, do you know if it's a pull-up 
> > > > > or
> > > > > pull-down ? Would it be worse to expose that state to userspace than 
> > > > > to
> > > > > return -EBUSY when reading the control ?
> > 

[PATCH 1/1] powerpc/pseries/iommu: Fix window size for direct mapping with pmem

2021-04-19 Thread Leonardo Bras
As of today, if the DDW is big enough to fit (1 << MAX_PHYSMEM_BITS) it's
possible to use direct DMA mapping even with a pmem region.

But if that happens, the window size (len) is set to
(MAX_PHYSMEM_BITS - page_shift) instead of MAX_PHYSMEM_BITS, causing the
DDW created to be smaller by a factor of the page size, which is
insufficient for correct usage.

Fix this so the correct window size is used in this case.

Fixes: bf6e2d562bbc4 ("powerpc/dma: Fallback to dma_ops when persistent memory present")
Signed-off-by: Leonardo Bras 
---
 arch/powerpc/platforms/pseries/iommu.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/powerpc/platforms/pseries/iommu.c b/arch/powerpc/platforms/pseries/iommu.c
index 9fc5217f0c8e..836cbbe0ecc5 100644
--- a/arch/powerpc/platforms/pseries/iommu.c
+++ b/arch/powerpc/platforms/pseries/iommu.c
@@ -1229,7 +1229,7 @@ static u64 enable_ddw(struct pci_dev *dev, struct device_node *pdn)
if (pmem_present) {
if (query.largest_available_block >=
(1ULL << (MAX_PHYSMEM_BITS - page_shift)))
-   len = MAX_PHYSMEM_BITS - page_shift;
+   len = MAX_PHYSMEM_BITS;
else
dev_info(&dev->dev, "Skipping ibm,pmemory");
}
-- 
2.30.2



Re: [PATCH] bpf: Fix backport of "bpf: restrict unknown scalars of mixed signed bounds for unprivileged"

2021-04-19 Thread Balbir Singh
On Mon, Apr 19, 2021 at 04:56:41PM -0700, Samuel Mendoza-Jonas wrote:
> The 4.14 backport of 9d7eceede ("bpf: restrict unknown scalars of mixed
> signed bounds for unprivileged") adds the PTR_TO_MAP_VALUE check to the
> wrong location in adjust_ptr_min_max_vals(), most likely because 4.14
> doesn't include the commit that updates the if-statement to a
> switch-statement (aad2eeaf4 "bpf: Simplify ptr_min_max_vals adjustment").
> 
> Move the check to the proper location in adjust_ptr_min_max_vals().
> 
> Fixes: 17efa65350c5a ("bpf: restrict unknown scalars of mixed signed bounds 
> for unprivileged")
> Signed-off-by: Samuel Mendoza-Jonas 
> Reviewed-by: Frank van der Linden 
> Reviewed-by: Ethan Chen 
> ---

Thanks for catching it :)

Reviewed-by: Balbir Singh 


RE: [PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically

2021-04-19 Thread Michael Kelley
From: Andrea Parri (Microsoft)  Sent: Monday, April 19, 
2021 6:44 PM
> 
> If a malicious or compromised Hyper-V sends a spurious message of type
> CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will
> call complete() on an uninitialized event, and cause an oops.
> 
> Reported-by: Michael Kelley 
> Signed-off-by: Andrea Parri (Microsoft) 
> ---
> Changes since v1[1]:
>   - add inline comment in vmbus_unload_response()
> 
> [1] 
> https://lore.kernel.org/linux-hyperv/20210416143932.16512-1-parri.and...@gmail.com/
> 
>  drivers/hv/channel_mgmt.c | 7 ++-
>  drivers/hv/connection.c   | 2 ++
>  2 files changed, 8 insertions(+), 1 deletion(-)
> 

Reviewed-by: Michael Kelley 

> diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
> index 4c9e45d1f462c..335a10ee03a5e 100644
> --- a/drivers/hv/channel_mgmt.c
> +++ b/drivers/hv/channel_mgmt.c
> @@ -826,6 +826,11 @@ static void vmbus_unload_response(struct vmbus_channel_message_header *hdr)
>   /*
>* This is a global event; just wakeup the waiting thread.
>* Once we successfully unload, we can cleanup the monitor state.
> +  *
> +  * NB.  A malicious or compromised Hyper-V could send a spurious
> +  * message of type CHANNELMSG_UNLOAD_RESPONSE, and trigger a call
> +  * of the complete() below.  Make sure that unload_event has been
> +  * initialized by the time this complete() is executed.
>*/
>   complete(&vmbus_connection.unload_event);
>  }
> @@ -841,7 +846,7 @@ void vmbus_initiate_unload(bool crash)
>   if (vmbus_proto_version < VERSION_WIN8_1)
>   return;
> 
> - init_completion(&vmbus_connection.unload_event);
> + reinit_completion(&vmbus_connection.unload_event);
>   memset(&hdr, 0, sizeof(struct vmbus_channel_message_header));
>   hdr.msgtype = CHANNELMSG_UNLOAD;
>   vmbus_post_msg(&hdr, sizeof(struct vmbus_channel_message_header),
> diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
> index dc19d5ae4373c..311cd005b3be6 100644
> --- a/drivers/hv/connection.c
> +++ b/drivers/hv/connection.c
> @@ -26,6 +26,8 @@
> 
>  struct vmbus_connection vmbus_connection = {
>   .conn_state = DISCONNECTED,
> + .unload_event   = COMPLETION_INITIALIZER(
> +   vmbus_connection.unload_event),
>   .next_gpadl_handle  = ATOMIC_INIT(0xE1E10),
> 
>   .ready_for_suspend_event = COMPLETION_INITIALIZER(
> --
> 2.25.1



[PATCH v2 1/1] Drivers: hv: vmbus: Increase wait time for VMbus unload

2021-04-19 Thread Michael Kelley
When running in Azure, disks may be connected to a Linux VM with
read/write caching enabled. If a VM panics and issues a VMbus
UNLOAD request to Hyper-V, the response is delayed until all dirty
data in the disk cache is flushed.  In extreme cases, this flushing
can take tens of seconds, depending on the disk speed and the amount
of dirty data. If kdump is configured for the VM, the current 10 second
timeout in vmbus_wait_for_unload() may be exceeded, and the UNLOAD
complete message may arrive well after the kdump kernel is already
running, causing problems.  Note that no problem occurs if kdump is
not enabled because Hyper-V waits for the cache flush before doing
a reboot through the BIOS/UEFI code.

Fix this problem by increasing the timeout in vmbus_wait_for_unload()
to 100 seconds. Also output periodic messages so that if anyone is
watching the serial console, they won't think the VM is completely
hung.

Fixes: 911e1987efc8 ("Drivers: hv: vmbus: Add timeout to vmbus_wait_for_unload")
Signed-off-by: Michael Kelley 
---

Changed in v2: Fixed silly error in the argument to mdelay()

---
 drivers/hv/channel_mgmt.c | 30 +-
 1 file changed, 25 insertions(+), 5 deletions(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index f3cf4af..ef4685c 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -755,6 +755,12 @@ static void init_vp_index(struct vmbus_channel *channel)
free_cpumask_var(available_mask);
 }
 
+#define UNLOAD_DELAY_UNIT_MS   10  /* 10 milliseconds */
+#define UNLOAD_WAIT_MS (100*1000)  /* 100 seconds */
+#define UNLOAD_WAIT_LOOPS  (UNLOAD_WAIT_MS/UNLOAD_DELAY_UNIT_MS)
+#define UNLOAD_MSG_MS  (5*1000)/* Every 5 seconds */
+#define UNLOAD_MSG_LOOPS   (UNLOAD_MSG_MS/UNLOAD_DELAY_UNIT_MS)
+
 static void vmbus_wait_for_unload(void)
 {
int cpu;
@@ -772,12 +778,17 @@ static void vmbus_wait_for_unload(void)
 * vmbus_connection.unload_event. If not, the last thing we can do is
 * read message pages for all CPUs directly.
 *
-* Wait no more than 10 seconds so that the panic path can't get
-* hung forever in case the response message isn't seen.
+* Wait up to 100 seconds since an Azure host must writeback any dirty
+* data in its disk cache before the VMbus UNLOAD request will
+* complete. This flushing has been empirically observed to take up
+* to 50 seconds in cases with a lot of dirty data, so allow additional
+* leeway and for inaccuracies in mdelay(). But eventually time out so
+* that the panic path can't get hung forever in case the response
+* message isn't seen.
 */
-   for (i = 0; i < 1000; i++) {
+   for (i = 1; i <= UNLOAD_WAIT_LOOPS; i++) {
if (completion_done(&vmbus_connection.unload_event))
-   break;
+   goto completed;
 
for_each_online_cpu(cpu) {
struct hv_per_cpu_context *hv_cpu
@@ -800,9 +811,18 @@ static void vmbus_wait_for_unload(void)
vmbus_signal_eom(msg, message_type);
}
 
-   mdelay(10);
+   /*
+* Give a notice periodically so someone watching the
+* serial output won't think it is completely hung.
+*/
+   if (!(i % UNLOAD_MSG_LOOPS))
+   pr_notice("Waiting for VMBus UNLOAD to complete\n");
+
+   mdelay(UNLOAD_DELAY_UNIT_MS);
}
+   pr_err("Continuing even though VMBus UNLOAD did not complete\n");
 
+completed:
/*
 * We're crashing and already got the UNLOAD_RESPONSE, cleanup all
 * maybe-pending messages on all CPUs to be able to receive new
-- 
1.8.3.1



Re: [PATCH bpf-next v2 4/4] libbpf: add selftests for TC-BPF API

2021-04-19 Thread Andrii Nakryiko
On Mon, Apr 19, 2021 at 5:18 AM Kumar Kartikeya Dwivedi
 wrote:
>
> This adds some basic tests for the low level bpf_tc_cls_* API.
>
> Reviewed-by: Toke Høiland-Jørgensen 
> Signed-off-by: Kumar Kartikeya Dwivedi 
> ---
>  .../selftests/bpf/prog_tests/test_tc_bpf.c| 112 ++
>  .../selftests/bpf/progs/test_tc_bpf_kern.c|  12 ++
>  2 files changed, 124 insertions(+)
>  create mode 100644 tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
>  create mode 100644 tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
>
> diff --git a/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c 
> b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> new file mode 100644
> index ..945f3a1a72f8
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/prog_tests/test_tc_bpf.c
> @@ -0,0 +1,112 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +#include 
> +
> +#define LO_IFINDEX 1
> +
> +static int test_tc_cls_internal(int fd, __u32 parent_id)
> +{
> +   DECLARE_LIBBPF_OPTS(bpf_tc_cls_opts, opts, .handle = 1, .priority = 10,
> +   .class_id = TC_H_MAKE(1UL << 16, 1),
> +   .chain_index = 5);
> +   struct bpf_tc_cls_attach_id id = {};
> +   struct bpf_tc_cls_info info = {};
> +   int ret;
> +
> +   ret = bpf_tc_cls_attach(fd, LO_IFINDEX, parent_id, &opts, &id);
> +   if (CHECK_FAIL(ret < 0))
> +   return ret;
> +
> +   ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> +   if (CHECK_FAIL(ret < 0))
> +   goto end;
> +
> +   ret = -1;
> +
> +   if (CHECK_FAIL(info.id.handle != id.handle) ||
> +   CHECK_FAIL(info.id.chain_index != id.chain_index) ||
> +   CHECK_FAIL(info.id.priority != id.priority) ||
> +   CHECK_FAIL(info.id.handle != 1) ||
> +   CHECK_FAIL(info.id.priority != 10) ||
> +   CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)) ||
> +   CHECK_FAIL(info.id.chain_index != 5))
> +   goto end;
> +
> +   ret = bpf_tc_cls_replace(fd, LO_IFINDEX, parent_id, &opts, &id);
> +   if (CHECK_FAIL(ret < 0))
> +   return ret;
> +
> +   if (CHECK_FAIL(info.id.handle != 1) ||
> +   CHECK_FAIL(info.id.priority != 10) ||
> +   CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 1)))
> +   goto end;
> +
> +   /* Demonstrate changing attributes */
> +   opts.class_id = TC_H_MAKE(1UL << 16, 2);
> +
> +   ret = bpf_tc_cls_change(fd, LO_IFINDEX, parent_id, &opts, &id);
> +   if (CHECK_FAIL(ret < 0))
> +   goto end;
> +
> +   ret = bpf_tc_cls_get_info(fd, LO_IFINDEX, parent_id, NULL, &info);
> +   if (CHECK_FAIL(ret < 0))
> +   goto end;
> +
> +   if (CHECK_FAIL(info.class_id != TC_H_MAKE(1UL << 16, 2)))
> +   goto end;
> +   if (CHECK_FAIL((info.bpf_flags & TCA_BPF_FLAG_ACT_DIRECT) != 1))
> +   goto end;
> +
> +end:
> +   ret = bpf_tc_cls_detach(LO_IFINDEX, parent_id, &id);
> +   CHECK_FAIL(ret < 0);
> +   return ret;
> +}
> +
> +void test_test_tc_bpf(void)
> +{
> +   const char *file = "./test_tc_bpf_kern.o";
> +   struct bpf_program *clsp;
> +   struct bpf_object *obj;
> +   int cls_fd, ret;
> +
> +   obj = bpf_object__open(file);
> +   if (CHECK_FAIL(IS_ERR_OR_NULL(obj)))
> +   return;
> +
> +   clsp = bpf_object__find_program_by_title(obj, "classifier");
> +   if (CHECK_FAIL(IS_ERR_OR_NULL(clsp)))
> +   goto end;
> +
> +   ret = bpf_object__load(obj);
> +   if (CHECK_FAIL(ret < 0))
> +   goto end;
> +
> +   cls_fd = bpf_program__fd(clsp);
> +
> +   system("tc qdisc del dev lo clsact");
> +
> +   ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_INGRESS);
> +   if (CHECK_FAIL(ret < 0))
> +   goto end;
> +
> +   if (CHECK_FAIL(system("tc qdisc del dev lo clsact")))
> +   goto end;
> +
> +   ret = test_tc_cls_internal(cls_fd, BPF_TC_CLSACT_EGRESS);
> +   if (CHECK_FAIL(ret < 0))
> +   goto end;
> +
> +   CHECK_FAIL(system("tc qdisc del dev lo clsact"));

please don't use CHECK_FAIL. And prefer ASSERT_xxx over CHECK().

> +
> +end:
> +   bpf_object__close(obj);
> +}
> diff --git a/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c 
> b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> new file mode 100644
> index ..3dd40e21af8e
> --- /dev/null
> +++ b/tools/testing/selftests/bpf/progs/test_tc_bpf_kern.c
> @@ -0,0 +1,12 @@
> +// SPDX-License-Identifier: GPL-2.0
> +
> +#include 
> +#include 
> +
> +// Dummy prog to test TC-BPF API

no C++-style comments, please (except for SPDX header, of course)
> +
> +SEC("classifier")
> +int cls(struct __sk_buff *skb)
> +{
> +   return 0;
> +}
> --
> 2.30.2
>
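
To make the CHECK_FAIL/ASSERT_xxx comment concrete, here is the open/load part
of the test above rewritten with the test_progs ASSERT_* helpers. Treat the
exact helper set (ASSERT_OK_PTR(), ASSERT_OK(), ASSERT_GE()) as an assumption
about what is available in that tree; the TC attach calls themselves are left
out since that API is still under review.

// SPDX-License-Identifier: GPL-2.0
#include <test_progs.h>

/* Illustrative only: same open/load flow as above, using ASSERT_* helpers
 * so each failure gets a named, self-describing message instead of a bare
 * CHECK_FAIL().
 */
void test_tc_bpf_asserts_sketch(void)
{
	const char *file = "./test_tc_bpf_kern.o";
	struct bpf_program *clsp;
	struct bpf_object *obj;
	int cls_fd, ret;

	obj = bpf_object__open(file);
	if (!ASSERT_OK_PTR(obj, "bpf_object__open"))
		return;

	clsp = bpf_object__find_program_by_title(obj, "classifier");
	if (!ASSERT_OK_PTR(clsp, "find classifier program"))
		goto end;

	ret = bpf_object__load(obj);
	if (!ASSERT_OK(ret, "bpf_object__load"))
		goto end;

	cls_fd = bpf_program__fd(clsp);
	ASSERT_GE(cls_fd, 0, "bpf_program__fd");

end:
	bpf_object__close(obj);
}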


Re: [PATCH] arm64: dts: ls1028a-rdb: enable optee node

2021-04-19 Thread Li Yang
On Tue, Mar 30, 2021 at 9:53 AM Sahil Malhotra
 wrote:
>
> From: Sahil Malhotra 
>
> optee node was disabled by default, enabling it for ls1028a-rdb.
>
> Signed-off-by: Michael Walle 
> Signed-off-by: Sahil Malhotra 

Acked-by: Li Yang 

> ---
>  arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts | 4 
>  arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi| 2 +-
>  2 files changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> index 41ae6e7675ba..1bdf0104d492 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a-rdb.dts
> @@ -274,6 +274,10 @@
> status = "okay";
>  };
>
> +&optee {
> +   status = "okay";
> +};
> +
>   {
> status = "okay";
>  };
> diff --git a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi 
> b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> index 262fbad8f0ec..fb155ac192b1 100644
> --- a/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> +++ b/arch/arm64/boot/dts/freescale/fsl-ls1028a.dtsi
> @@ -92,7 +92,7 @@
> };
>
> firmware {
> -   optee {
> +   optee: optee  {
> compatible = "linaro,optee-tz";
> method = "smc";
> status = "disabled";
> --
> 2.17.1
>


[PATCH v2 2/2] usb: cdnsp: Fix lack of removing request from pending list.

2021-04-19 Thread Pawel Laszczak
From: Pawel Laszczak 

The patch fixes a missing removal of the request from ep->pending_list on
failure of the stop endpoint command. Even after this command fails, the
driver must remove the request from ep->pending_list.
Without this fix the driver can get stuck in the cdnsp_gadget_ep_disable()
loop:
while (!list_empty(&pep->pending_list)) {
preq = next_request(&pep->pending_list);
cdnsp_ep_dequeue(pep, preq);
}

Fixes: 3d82904559f4 ("usb: cdnsp: cdns3 Add main part of Cadence USBSSP DRD 
Driver")
Signed-off-by: Pawel Laszczak 

---
Changelog:
v2:
- removed blank space
- added "Fixes" tag

 drivers/usb/cdns3/cdnsp-gadget.c | 14 +++---
 1 file changed, 7 insertions(+), 7 deletions(-)

diff --git a/drivers/usb/cdns3/cdnsp-gadget.c b/drivers/usb/cdns3/cdnsp-gadget.c
index 56707b6b0f57..c083985e387b 100644
--- a/drivers/usb/cdns3/cdnsp-gadget.c
+++ b/drivers/usb/cdns3/cdnsp-gadget.c
@@ -422,17 +422,17 @@ int cdnsp_ep_enqueue(struct cdnsp_ep *pep, struct 
cdnsp_request *preq)
 int cdnsp_ep_dequeue(struct cdnsp_ep *pep, struct cdnsp_request *preq)
 {
struct cdnsp_device *pdev = pep->pdev;
-   int ret;
+   int ret_stop = 0;
+   int ret_rem;
 
trace_cdnsp_request_dequeue(preq);
 
-   if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING) {
-   ret = cdnsp_cmd_stop_ep(pdev, pep);
-   if (ret)
-   return ret;
-   }
+   if (GET_EP_CTX_STATE(pep->out_ctx) == EP_STATE_RUNNING)
+   ret_stop = cdnsp_cmd_stop_ep(pdev, pep);
+
+   ret_rem = cdnsp_remove_request(pdev, preq, pep);
 
-   return cdnsp_remove_request(pdev, preq, pep);
+   return ret_rem ? ret_rem : ret_stop;
 }
 
 static void cdnsp_zero_in_ctx(struct cdnsp_device *pdev)
-- 
2.25.1



Re: linux-next: Tree for Apr 19 (bcache)

2021-04-19 Thread Coly Li
On 4/20/21 1:50 AM, Jens Axboe wrote:
> On 4/19/21 10:26 AM, Coly Li wrote:
>> On 4/19/21 11:40 PM, Randy Dunlap wrote:
>>> On 4/19/21 3:23 AM, Stephen Rothwell wrote:
 Hi all,

 Changes since 20210416:

>>>
>>> on x86_64:
>>>
>>> when
>>> # CONFIG_BLK_DEV is not set
>>>
>>>
>>> WARNING: unmet direct dependencies detected for LIBNVDIMM
>>>   Depends on [n]: PHYS_ADDR_T_64BIT [=y] && HAS_IOMEM [=y] && BLK_DEV [=n]
>>>   Selected by [y]:
>>>   - BCACHE_NVM_PAGES [=y] && MD [=y] && BCACHE [=y] && PHYS_ADDR_T_64BIT 
>>> [=y]
>>>
>>>
>>> Full randconfig file is attached.
>>>
>>
>> I need hint from kbuild expert.
>>
>> My original idea to use "select LIBNVDIMM" is to avoid the
>> BCACHE_NVM_PAGES option to disappear if LIBNVDIMM is not enabled.
>> Otherwise if nvdimm driver is not configure, users won't know there is a
>> BCACHE_NVM_PAGES option unless they read bcache Kconfig file.
> 
> But why? That's exactly how it should work. Just use depends to set the
> dependency.
> 
>> But I see that nvdimm's Kconfig uses "depends on BLK_DEV", so I understand
>> it is acceptable for the LIBNVDIMM option to disappear from "make
>> menuconfig" if BLK_DEV is not enabled.
>>
>> For such a condition, which one is the proper way to set the dependency?
>> - Change "select LIBNVDIMM" and "select DAX" to "depends on LIBNVDIMM"
>> and "depends on DAX" in bcache Kconfig
>> - Or change "depends on BLK_DEV" to "select BLK_DEV" in nvdimm Kconfig.
> 
> The former.
> 

Copied. Thanks for the hint. I will post a fix soon.

Coly Li


Re: [PATCH] riscv: Remove 32b kernel mapping from page table dump

2021-04-19 Thread Anup Patel
On Sun, Apr 18, 2021 at 4:59 PM Alexandre Ghiti  wrote:
>
> The 32b kernel mapping lies in the linear mapping, there is no point in
> printing its address in page table dump, so remove this leftover that
> comes from moving the kernel mapping outside the linear mapping for 64b
> kernel.
>
> Fixes: e9efb21fe352 ("riscv: Prepare ptdump for vm layout dynamic addresses")
> Signed-off-by: Alexandre Ghiti 

Looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/mm/ptdump.c | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
>
> diff --git a/arch/riscv/mm/ptdump.c b/arch/riscv/mm/ptdump.c
> index 0aba4421115c..a4ed4bdbbfde 100644
> --- a/arch/riscv/mm/ptdump.c
> +++ b/arch/riscv/mm/ptdump.c
> @@ -76,8 +76,8 @@ enum address_markers_idx {
> PAGE_OFFSET_NR,
>  #ifdef CONFIG_64BIT
> MODULES_MAPPING_NR,
> -#endif
> KERNEL_MAPPING_NR,
> +#endif
> END_OF_SPACE_NR
>  };
>
> @@ -99,8 +99,8 @@ static struct addr_marker address_markers[] = {
> {0, "Linear mapping"},
>  #ifdef CONFIG_64BIT
> {0, "Modules mapping"},
> -#endif
> {0, "Kernel mapping (kernel, BPF)"},
> +#endif
> {-1, NULL},
>  };
>
> @@ -379,8 +379,8 @@ static int ptdump_init(void)
> address_markers[PAGE_OFFSET_NR].start_address = PAGE_OFFSET;
>  #ifdef CONFIG_64BIT
> address_markers[MODULES_MAPPING_NR].start_address = MODULES_VADDR;
> -#endif
> address_markers[KERNEL_MAPPING_NR].start_address = kernel_virt_addr;
> +#endif
>
> kernel_ptd_info.base_addr = KERN_VIRT_START;
>
> --
> 2.20.1
>


Re: [PATCH] riscv: Fix 32b kernel caused by 64b kernel mapping moving outside linear mapping

2021-04-19 Thread Anup Patel
On Sat, Apr 17, 2021 at 10:52 PM Alexandre Ghiti  wrote:
>
> Fix multiple leftovers when moving the kernel mapping outside the linear
> mapping for 64b kernel that left the 32b kernel unusable.
>
> Fixes: 4b67f48da707 ("riscv: Move kernel mapping outside of linear mapping")
> Signed-off-by: Alexandre Ghiti 

Quite a few #ifdef but I don't see any better way at the moment. Maybe we can
clean this later. Otherwise looks good to me.

Reviewed-by: Anup Patel 

Regards,
Anup

> ---
>  arch/riscv/include/asm/page.h|  9 +
>  arch/riscv/include/asm/pgtable.h | 16 
>  arch/riscv/mm/init.c | 25 -
>  3 files changed, 45 insertions(+), 5 deletions(-)
>
> diff --git a/arch/riscv/include/asm/page.h b/arch/riscv/include/asm/page.h
> index 22cfb2be60dc..f64b61296c0c 100644
> --- a/arch/riscv/include/asm/page.h
> +++ b/arch/riscv/include/asm/page.h
> @@ -90,15 +90,20 @@ typedef struct page *pgtable_t;
>
>  #ifdef CONFIG_MMU
>  extern unsigned long va_pa_offset;
> +#ifdef CONFIG_64BIT
>  extern unsigned long va_kernel_pa_offset;
> +#endif
>  extern unsigned long pfn_base;
>  #define ARCH_PFN_OFFSET(pfn_base)
>  #else
>  #define va_pa_offset   0
> +#ifdef CONFIG_64BIT
>  #define va_kernel_pa_offset0
> +#endif
>  #define ARCH_PFN_OFFSET(PAGE_OFFSET >> PAGE_SHIFT)
>  #endif /* CONFIG_MMU */
>
> +#ifdef CONFIG_64BIT
>  extern unsigned long kernel_virt_addr;
>
>  #define linear_mapping_pa_to_va(x) ((void *)((unsigned long)(x) + 
> va_pa_offset))
> @@ -112,6 +117,10 @@ extern unsigned long kernel_virt_addr;
> (_x < kernel_virt_addr) ? 
>   \
> linear_mapping_va_to_pa(_x) : kernel_mapping_va_to_pa(_x);
>   \
> })
> +#else
> +#define __pa_to_va_nodebug(x)  ((void *)((unsigned long) (x) + va_pa_offset))
> +#define __va_to_pa_nodebug(x)  ((unsigned long)(x) - va_pa_offset)
> +#endif
>
>  #ifdef CONFIG_DEBUG_VIRTUAL
>  extern phys_addr_t __virt_to_phys(unsigned long x);
> diff --git a/arch/riscv/include/asm/pgtable.h 
> b/arch/riscv/include/asm/pgtable.h
> index 80e63a93e903..5afda75cc2c3 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -16,19 +16,27 @@
>  #else
>
>  #define ADDRESS_SPACE_END  (UL(-1))
> -/*
> - * Leave 2GB for kernel and BPF at the end of the address space
> - */
> +
> +#ifdef CONFIG_64BIT
> +/* Leave 2GB for kernel and BPF at the end of the address space */
>  #define KERNEL_LINK_ADDR   (ADDRESS_SPACE_END - SZ_2G + 1)
> +#else
> +#define KERNEL_LINK_ADDR   PAGE_OFFSET
> +#endif
>
>  #define VMALLOC_SIZE (KERN_VIRT_SIZE >> 1)
>  #define VMALLOC_END  (PAGE_OFFSET - 1)
>  #define VMALLOC_START(PAGE_OFFSET - VMALLOC_SIZE)
>
> -/* KASLR should leave at least 128MB for BPF after the kernel */
>  #define BPF_JIT_REGION_SIZE(SZ_128M)
> +#ifdef CONFIG_64BIT
> +/* KASLR should leave at least 128MB for BPF after the kernel */
>  #define BPF_JIT_REGION_START   PFN_ALIGN((unsigned long)&_end)
>  #define BPF_JIT_REGION_END (BPF_JIT_REGION_START + BPF_JIT_REGION_SIZE)
> +#else
> +#define BPF_JIT_REGION_START   (PAGE_OFFSET - BPF_JIT_REGION_SIZE)
> +#define BPF_JIT_REGION_END (VMALLOC_END)
> +#endif
>
>  /* Modules always live before the kernel */
>  #ifdef CONFIG_64BIT
> diff --git a/arch/riscv/mm/init.c b/arch/riscv/mm/init.c
> index 093f3a96ecfc..dc9b988e0778 100644
> --- a/arch/riscv/mm/init.c
> +++ b/arch/riscv/mm/init.c
> @@ -91,8 +91,10 @@ static void print_vm_layout(void)
>   (unsigned long)VMALLOC_END);
> print_mlm("lowmem", (unsigned long)PAGE_OFFSET,
>   (unsigned long)high_memory);
> +#ifdef CONFIG_64BIT
> print_mlm("kernel", (unsigned long)KERNEL_LINK_ADDR,
>   (unsigned long)ADDRESS_SPACE_END);
> +#endif
>  }
>  #else
>  static void print_vm_layout(void) { }
> @@ -165,9 +167,11 @@ static struct pt_alloc_ops pt_ops;
>  /* Offset between linear mapping virtual address and kernel load address */
>  unsigned long va_pa_offset;
>  EXPORT_SYMBOL(va_pa_offset);
> +#ifdef CONFIG_64BIT
>  /* Offset between kernel mapping virtual address and kernel load address */
>  unsigned long va_kernel_pa_offset;
>  EXPORT_SYMBOL(va_kernel_pa_offset);
> +#endif
>  unsigned long pfn_base;
>  EXPORT_SYMBOL(pfn_base);
>
> @@ -410,7 +414,9 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
> load_sz = (uintptr_t)(&_end) - load_pa;
>
> va_pa_offset = PAGE_OFFSET - load_pa;
> +#ifdef CONFIG_64BIT
> va_kernel_pa_offset = kernel_virt_addr - load_pa;
> +#endif
>
> pfn_base = PFN_DOWN(load_pa);
>
> @@ -469,12 +475,16 @@ asmlinkage void __init setup_vm(uintptr_t dtb_pa)
>pa + PMD_SIZE, PMD_SIZE, PAGE_KERNEL);
> dtb_early_va = (void *)DTB_EARLY_BASE_VA + (dtb_pa & (PMD_SIZE - 1));
>  #else /* CONFIG_BUILTIN_DTB */
> +#ifdef CONFIG_64BIT
> /*
> 

Re: [PATCH v20 1/2] scsi: ufs: Enable power management for wlun

2021-04-19 Thread Adrian Hunter
On 20/04/21 12:53 am, Asutosh Das (asd) wrote:
> On 4/19/2021 11:37 AM, Adrian Hunter wrote:
>> On 16/04/21 10:49 pm, Asutosh Das wrote:
>>>
>>> Co-developed-by: Can Guo 
>>> Signed-off-by: Can Guo 
>>> Signed-off-by: Asutosh Das 
>>> ---
>>
>> I came across 3 issues while testing.  See comments below.
>>
> Hi Adrian
> Thanks for the comments.
>> 
>>
>>> @@ -5794,7 +5839,7 @@ static void ufshcd_err_handling_unprepare(struct 
>>> ufs_hba *hba)
>>>   if (ufshcd_is_clkscaling_supported(hba))
>>>   ufshcd_clk_scaling_suspend(hba, false);
>>>   ufshcd_clear_ua_wluns(hba);
>>
>> ufshcd_clear_ua_wluns() deadlocks trying to clear UFS_UPIU_RPMB_WLUN
>> if sdev_rpmb is suspended and sdev_ufs_device is suspending.
>> e.g. ufshcd_wl_suspend() is waiting on host_sem while ufshcd_err_handler()
>> is running, at which point sdev_rpmb has already suspended.
>>
> Umm, I didn't understand this deadlock.
> When you say, sdev_rpmb is suspended, does it mean runtime_suspended?
> sdev_ufs_device is suspending - this can't be runtime_suspending, while 
> ufshcd_err_handling_unprepare is running.
> 
> If you've a call-stack of this deadlock, please can you share it with me. 
> I'll also try to reproduce this.

Yes it is system suspend. sdev_rpmb has suspended, sdev_ufs_device is waiting 
on host_sem.
ufshcd_err_handler() holds host_sem. ufshcd_clear_ua_wlun(UFS_UPIU_RPMB_WLUN) 
gets stuck.
I will get some call-stacks.

> 
> I'll address the other comments in the next version.
> 
> 
> Thank you!
> 
>>> -    pm_runtime_put(hba->dev);
>>> +    ufshcd_rpm_put(hba);
>>>   }
>>
>> 
>>
>>> +void ufshcd_resume_complete(struct device *dev)
>>> +{
> 



linux-next: manual merge of the ftrace tree with the bpf-next tree

2021-04-19 Thread Stephen Rothwell
Hi all,

Today's linux-next merge of the ftrace tree got a conflict in:

  kernel/trace/bpf_trace.c

between commit:

  d9c9e4db186a ("bpf: Factorize bpf_trace_printk and bpf_seq_printf")

from the bpf-next tree and commit:

  f2cc020d7876 ("tracing: Fix various typos in comments")

from the ftrace tree.

I fixed it up (the former removed the comment updated by the latter) and
can carry the fix as necessary. This is now fixed as far as linux-next
is concerned, but any non trivial conflicts should be mentioned to your
upstream maintainer when your tree is submitted for merging.  You may
also want to consider cooperating with the maintainer of the conflicting
tree to minimise any particularly complex conflicts.

-- 
Cheers,
Stephen Rothwell




Re: [PATCH 0/2] Change struct page layout for page_pool

2021-04-19 Thread Michael Ellerman
"Matthew Wilcox (Oracle)"  writes:
> The first patch here fixes two bugs on ppc32, and mips32.  It fixes one
> bug on arc and arm32 (in certain configurations).  It probably makes
> sense to get it in ASAP through the networking tree.  I'd like to see
> testing on those four architectures if possible?

Sorry I don't have easy access to any hardware that fits the bill. At
some point I'll be able to go to the office and setup a machine that (I
think) can test these paths, but I don't have an ETA on that.

You and others seem to have done lots of analysis on this though, so I
think you should merge the fixes without waiting on ppc32 testing.

cheers


>
> The second patch enables new functionality.  It is much less urgent.
> I'd really like to see Mel & Michal's thoughts on it.
>
> I have only compile-tested these patches.
>
> Matthew Wilcox (Oracle) (2):
>   mm: Fix struct page layout on 32-bit systems
>   mm: Indicate pfmemalloc pages in compound_head
>
>  include/linux/mm.h   | 12 +++-
>  include/linux/mm_types.h |  9 -
>  include/net/page_pool.h  | 12 +++-
>  net/core/page_pool.c | 12 +++-
>  4 files changed, 29 insertions(+), 16 deletions(-)
>
> -- 
> 2.30.2


RE: [PATCH 1/2] usb: gadget: f_uac2: Stop endpoint before enabling it.

2021-04-19 Thread Pawel Laszczak
>On 21-04-19 09:50:53, Pawel Laszczak wrote:
>> From: Pawel Laszczak 
>>
>> The patch adds disabling of the endpoint before enabling it when changing
>> the alternate setting. Without this, in some cases uac2 queues the same
>> request multiple times. Such a situation can occur when the host sends a
>> set-interface request with alternate setting 1 twice.
>>
>> Signed-off-by: Pawel Laszczak 
>> ---
>>  drivers/usb/gadget/function/f_uac2.c | 8 
>>  1 file changed, 4 insertions(+), 4 deletions(-)
>>
>> diff --git a/drivers/usb/gadget/function/f_uac2.c 
>> b/drivers/usb/gadget/function/f_uac2.c
>> index 9cc5c512a5cd..7d20a9d8a1b4 100644
>> --- a/drivers/usb/gadget/function/f_uac2.c
>> +++ b/drivers/usb/gadget/function/f_uac2.c
>> @@ -890,17 +890,17 @@ afunc_set_alt(struct usb_function *fn, unsigned intf, 
>> unsigned alt)
>>  if (intf == uac2->as_out_intf) {
>>  uac2->as_out_alt = alt;
>>
>> +u_audio_stop_capture(&uac2->g_audio);
>> +
>>  if (alt)
>>  ret = u_audio_start_capture(&uac2->g_audio);
>> -else
>> -u_audio_stop_capture(&uac2->g_audio);
>>  } else if (intf == uac2->as_in_intf) {
>>  uac2->as_in_alt = alt;
>>
>> +u_audio_stop_playback(&uac2->g_audio);
>> +
>>  if (alt)
>>  ret = u_audio_start_playback(&uac2->g_audio);
>> -else
>> -u_audio_stop_playback(&uac2->g_audio);
>>  } else {
>>  dev_err(dev, "%s:%d Error!\n", __func__, __LINE__);
>>  return -EINVAL;
>
>To avoid this, you may use prm->ep_enabled to judge if the endpoint has
>already enabled.

Such a check is already the first instruction inside the
u_audio_stop_playback->free_ep function,
so we don't need to duplicate it here.

>
>--
>
>Thanks,
>Peter Chen

--

Regards,
Pawel Laszczak
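
A minimal sketch of the guard being referred to, assuming the stop path bails
out early when the endpoint was never enabled (the struct and function here
are illustrative stand-ins, not quotes of u_audio.c):

#include <linux/types.h>
#include <linux/usb/gadget.h>

/* Illustrative stand-in for the fields of interest; not the real struct. */
struct rtd_params_sketch {
	bool ep_enabled;
};

/* Sketch of the early-return guard: calling stop before start is harmless,
 * so afunc_set_alt() does not need to duplicate the ep_enabled check.
 */
static void free_ep_sketch(struct rtd_params_sketch *prm, struct usb_ep *ep)
{
	if (!prm->ep_enabled)
		return;			/* nothing was started, nothing to undo */

	prm->ep_enabled = false;
	usb_ep_disable(ep);		/* plus request cleanup in the real driver */
}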


[PATCH] mhi: add MHI_STATE_M2 to resume success criteria

2021-04-19 Thread Baochen Qiang
During system resume, the mhi driver triggers the M3->M0 transition and then
waits for the target device to enter the M0 state. Once done, the device
queues a state change event into the ctrl event ring and notifies the mhi
driver by raising an interrupt, where a tasklet is scheduled to process this
event. In most cases, the tasklet is served in time and the wait operation
succeeds.

However, there are cases where the CPU is busy and cannot serve this tasklet
for some time. Once the delay grows long enough, the device moves itself to
the M1 state and also interrupts the mhi driver after inserting a new state
change event into the ctrl ring. When the CPU finally has time to process the
ring, there are now two events in it:
1. for the M3->M0 event, which is processed first because it was queued first,
   the tasklet handler updates the device state to M0 and wakes up the task,
   i.e., the mhi driver.
2. for the M0->M1 event, which is processed later, the tasklet handler
   triggers the M1->M2 transition and updates the device state to M2 directly,
   then wakes up the mhi driver (if it is still sleeping on this wait queue).
Note that although the mhi driver has been woken up while processing the first
event, it may still have had no chance to run before the second event is
processed. In other words, the mhi driver has to keep waiting until the
timeout because the M0 state has been missed.

kernel log here:
...
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.911251] mhi :06:00.0: Entered 
with PM state: M3, MHI state: M3
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917762] mhi :06:00.0: State 
change event to state: M0
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4247.917767] mhi :06:00.0: State 
change event to state: M1
Apr 15 01:45:14 test-NUC8i7HVK kernel: [ 4338.788231] mhi :06:00.0: Did not 
enter M0 state, MHI state: M2, PM state: M2
...

Fix this issue by simply adding M2 as a valid state for resume.

Tested-on: WCN6855 hw2.0 PCI 
WLAN.HSP.1.1-01720.1-QCAHSPSWPL_V1_V2_SILICONZ_LITE-1

Signed-off-by: Baochen Qiang 
---
 drivers/bus/mhi/core/pm.c | 1 +
 1 file changed, 1 insertion(+)

diff --git a/drivers/bus/mhi/core/pm.c b/drivers/bus/mhi/core/pm.c
index ce73cfa63cb3..ca5f2feed9d5 100644
--- a/drivers/bus/mhi/core/pm.c
+++ b/drivers/bus/mhi/core/pm.c
@@ -900,6 +900,7 @@ int mhi_pm_resume(struct mhi_controller *mhi_cntrl)
 
ret = wait_event_timeout(mhi_cntrl->state_event,
 mhi_cntrl->dev_state == MHI_STATE_M0 ||
+mhi_cntrl->dev_state == MHI_STATE_M2 ||
 MHI_PM_IN_ERROR_STATE(mhi_cntrl->pm_state),
 msecs_to_jiffies(mhi_cntrl->timeout_ms));
 
-- 
2.25.1



Re: [PATCH v4 4/4] pinctrl: add rsel setting on MT8195

2021-04-19 Thread Sean Wang
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao  wrote:


> @@ -176,6 +180,12 @@ static int mtk_pinconf_get(struct pinctrl_dev *pctldev,
> else
> err = -ENOTSUPP;
> break;
> +   case MTK_PIN_CONFIG_RSEL:
> +   if (hw->soc->rsel_get)
> +   err = hw->soc->rsel_get(hw, desc, );
> +   else
> +   err = -EOPNOTSUPP;

I think that should want to be -ENOTSUPP to align other occurrences.

> +   break;
> default:
> err = -ENOTSUPP;
> }
> @@ -295,6 +305,12 @@ static int mtk_pinconf_set(struct pinctrl_dev *pctldev, 
> unsigned int pin,
> else
> err = -ENOTSUPP;
> break;
> +   case MTK_PIN_CONFIG_RSEL:
> +   if (hw->soc->rsel_set)
> +   err = hw->soc->rsel_set(hw, desc, arg);
> +   else
> +   err = -EOPNOTSUPP;

Ditto

> +   break;
> default:
> err = -ENOTSUPP;
> }
> --
> 2.18.0
>


linux-next: build warning after merge of the spi tree

2021-04-19 Thread Stephen Rothwell
Hi all,

After merging the spi tree, today's linux-next build (x86_64 allmodconfig)
produced this warning:

In file included from include/linux/printk.h:409,
 from include/linux/kernel.h:16,
 from include/linux/clk.h:13,
 from drivers/spi/spi-stm32-qspi.c:7:
drivers/spi/spi-stm32-qspi.c: In function 'stm32_qspi_dirmap_read':
drivers/spi/spi-stm32-qspi.c:481:21: warning: format '%x' expects argument of 
type 'unsigned int', but argument 5 has type 'size_t' {aka 'long unsigned int'} 
[-Wformat=]
  481 |  dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", 
__func__, len, offs, buf);
  | ^~
include/linux/dynamic_debug.h:129:15: note: in definition of macro 
'__dynamic_func_call'
  129 |   func(, ##__VA_ARGS__);  \
  |   ^~~
include/linux/dynamic_debug.h:161:2: note: in expansion of macro 
'_dynamic_func_call'
  161 |  _dynamic_func_call(fmt,__dynamic_dev_dbg,   \
  |  ^~
include/linux/dev_printk.h:123:2: note: in expansion of macro 'dynamic_dev_dbg'
  123 |  dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__)
  |  ^~~
include/linux/dev_printk.h:123:23: note: in expansion of macro 'dev_fmt'
  123 |  dynamic_dev_dbg(dev, dev_fmt(fmt), ##__VA_ARGS__)
  |   ^~~
drivers/spi/spi-stm32-qspi.c:481:2: note: in expansion of macro 'dev_dbg'
  481 |  dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", 
__func__, len, offs, buf);
  |  ^~~
drivers/spi/spi-stm32-qspi.c:481:34: note: format string is defined here
  481 |  dev_dbg(qspi->dev, "%s len = 0x%x offs = 0x%llx buf = 0x%p\n", 
__func__, len, offs, buf);
  | ~^
  |  |
  |  unsigned int
  | %lx

Introduced by commit

  18674dee3cd6 ("spi: stm32-qspi: Add dirmap support")

-- 
Cheers,
Stephen Rothwell


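
The warning is the usual size_t-versus-%x mismatch; a minimal sketch of the
kind of change that silences it (the %zx length modifier for size_t, and an
explicit unsigned long long for the 64-bit offset), not the actual
spi-stm32-qspi fix:

#include <linux/device.h>
#include <linux/types.h>

/* Illustrative only: size_t takes the %zx/%zu modifiers, and a u64 offset
 * printed with %llx is safest cast to unsigned long long explicitly.
 */
static void dirmap_dbg_sketch(struct device *dev, size_t len, u64 offs,
			      const void *buf)
{
	dev_dbg(dev, "%s len = 0x%zx offs = 0x%llx buf = 0x%p\n",
		__func__, len, (unsigned long long)offs, buf);
}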


Re: [PATCH 00/13] [RFC] Rust support

2021-04-19 Thread Willy Tarreau
Hi Nick,

On Mon, Apr 19, 2021 at 05:24:33PM -0700, Nick Desaulniers wrote:
> I don't think the introduction of Rust made Firefox _more_ insecure.
> https://wiki.mozilla.org/Oxidation#Within_Firefox

Browsers are human interfaces and do not fundamentally require low
level access to memory/hardware/whatever. They can be written in
about any language, only the resource usage and performance will
make a difference. As such, some were even written in Java or JS
for example.

Operating systems, and particularly drivers *do* require low-level
accesses, and stuff that can hardly be abstracted or understood by
a compiler. You may have to perform two 16-bit reads/writes on a
32-bit MMIO address to perform an operation and the compiler does
not have to know it, just to obey.
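
A minimal illustration of the kind of access pattern described above (not from
the thread; the register layout is hypothetical): a 32-bit register that the
hardware requires to be accessed as two 16-bit halves, which the compiler
simply has to emit as written.

#include <linux/io.h>
#include <linux/types.h>

/* Hypothetical device rule: program the 32-bit register at "reg" as two
 * 16-bit accesses, low half first. The compiler cannot know or check this;
 * the driver just states it.
 */
static void write_reg32_as_two_halves(void __iomem *reg, u32 val)
{
	writew(val & 0xffff, reg);	/* low half first */
	writew(val >> 16, reg + 2);	/* then high half */
}

static u32 read_reg32_as_two_halves(void __iomem *reg)
{
	u16 lo = readw(reg);
	u16 hi = readw(reg + 2);

	return ((u32)hi << 16) | lo;
}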

> Really, a key point is that a lot of common mistakes in C are compile
> time errors in Rust. I know no "true" kernel dev would make such
> mistakes in C,

Everyone makes mistakes, the level of attention varies over time and
the focus often changes when dealing with build errors. How many times have
some of us, facing a bug, remembered having changed the code very late after
a build error, and having been less careful from that point on when the goal
changed from "let's do it right" to "let's get this to build"?

> but is there nothing we can do to help our peers
> writing drivers?  The point is to transfer cost from runtime to
> compile time to avoid costs at runtime; like all of the memory safety
> bugs which are costing our industry.

And do we have stats on the number of logical bugs, some of which are
caused by developers trying to work around compilers' stubbornness ?
For me, personally speaking, they have *increased* over time, usually
trying to avoid some annoying modern gcc warnings, resulting in integer
casts being placed close to string formats, or returns being placed in
switch/case to avoid the fall-through warning, etc. Thus I'm worried that
a non-negligible part of the 70% of bugs caused by memory safety issues
could be replaced with logic bugs to get to the point where the rust
compiler finally accepts to compile the code. It makes me think about
researchers trying to reduce the causes of certain deaths and claiming
to "save lives" while in the end the people they "save" will simply die
from something else.

And I'm not particularly trying to blindly defend C here. I'm complaining
every single day about some of its shortcomings like the vast amount of
UB, stupid type promotion, counter-intuitive operators precedence when
combining bit-ops with arithmetic, limited size of enums, lack of rotate
operator, strict aliasing, or the recourse to asm() statements every 10
lines to do stuff that can hardly be expressed in a way understandable
by a compiler. I'm just seeing that a lot of the griefs I'm having
against C come from the compiler trying to be too smart or too stubborn,
so giving even more of the handle to a compiler doesn't appeal me at all.

In addition, we all know how painful it is to work around compiler bugs
by writing complex code that carefully avoids certain constructs. I'm
wondering if we'll still have that luxury with a stricter compiler, or
if the only response will have to be between "let's disable this driver
that does not compile" or "please force distros to upgrade their
compilers".

But we'll see :-/

Regards,
Willy


Re: [PATCH v4 3/4] pinctrl: add drive for I2C related pins on MT8195

2021-04-19 Thread Sean Wang
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao  wrote:
>
> This patch provides the advanced drive raw data setting version
> for I2C used pins on MT8195.
>
> Signed-off-by: Zhiyong Tao 

Acked-by: Sean Wang 

> ---
>  drivers/pinctrl/mediatek/pinctrl-mt8195.c | 22 +++
>  .../pinctrl/mediatek/pinctrl-mtk-common-v2.c  | 14 
>  .../pinctrl/mediatek/pinctrl-mtk-common-v2.h  |  5 +
>  3 files changed, 41 insertions(+)
>
> diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c 
> b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
> index 063f164d7c9b..a7500e18bb1d 100644
> --- a/drivers/pinctrl/mediatek/pinctrl-mt8195.c
> +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
> @@ -760,6 +760,25 @@ static const struct mtk_pin_field_calc 
> mt8195_pin_drv_range[] = {
> PIN_FIELD_BASE(143, 143, 1, 0x020, 0x10, 24, 3),
>  };
>
> +static const struct mtk_pin_field_calc mt8195_pin_drv_adv_range[] = {
> +   PIN_FIELD_BASE(8, 8, 4, 0x020, 0x10, 15, 3),
> +   PIN_FIELD_BASE(9, 9, 4, 0x020, 0x10, 0, 3),
> +   PIN_FIELD_BASE(10, 10, 4, 0x020, 0x10, 18, 3),
> +   PIN_FIELD_BASE(11, 11, 4, 0x020, 0x10, 3, 3),
> +   PIN_FIELD_BASE(12, 12, 4, 0x020, 0x10, 21, 3),
> +   PIN_FIELD_BASE(13, 13, 4, 0x020, 0x10, 6, 3),
> +   PIN_FIELD_BASE(14, 14, 4, 0x020, 0x10, 24, 3),
> +   PIN_FIELD_BASE(15, 15, 4, 0x020, 0x10, 9, 3),
> +   PIN_FIELD_BASE(16, 16, 4, 0x020, 0x10, 27, 3),
> +   PIN_FIELD_BASE(17, 17, 4, 0x020, 0x10, 12, 3),
> +   PIN_FIELD_BASE(29, 29, 2, 0x020, 0x10, 0, 3),
> +   PIN_FIELD_BASE(30, 30, 2, 0x020, 0x10, 3, 3),
> +   PIN_FIELD_BASE(34, 34, 1, 0x040, 0x10, 0, 3),
> +   PIN_FIELD_BASE(35, 35, 1, 0x040, 0x10, 3, 3),
> +   PIN_FIELD_BASE(44, 44, 1, 0x040, 0x10, 6, 3),
> +   PIN_FIELD_BASE(45, 45, 1, 0x040, 0x10, 9, 3),
> +};
> +
>  static const struct mtk_pin_reg_calc mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = {
> [PINCTRL_PIN_REG_MODE] = MTK_RANGE(mt8195_pin_mode_range),
> [PINCTRL_PIN_REG_DIR] = MTK_RANGE(mt8195_pin_dir_range),
> @@ -773,6 +792,7 @@ static const struct mtk_pin_reg_calc 
> mt8195_reg_cals[PINCTRL_PIN_REG_MAX] = {
> [PINCTRL_PIN_REG_PUPD] = MTK_RANGE(mt8195_pin_pupd_range),
> [PINCTRL_PIN_REG_R0] = MTK_RANGE(mt8195_pin_r0_range),
> [PINCTRL_PIN_REG_R1] = MTK_RANGE(mt8195_pin_r1_range),
> +   [PINCTRL_PIN_REG_DRV_ADV] = MTK_RANGE(mt8195_pin_drv_adv_range),
>  };
>
>  static const char * const mt8195_pinctrl_register_base_names[] = {
> @@ -801,6 +821,8 @@ static const struct mtk_pin_soc mt8195_data = {
> .bias_get_combo = mtk_pinconf_bias_get_combo,
> .drive_set = mtk_pinconf_drive_set_rev1,
> .drive_get = mtk_pinconf_drive_get_rev1,
> +   .adv_drive_get = mtk_pinconf_adv_drive_get_raw,
> +   .adv_drive_set = mtk_pinconf_adv_drive_set_raw,
>  };
>
>  static const struct of_device_id mt8195_pinctrl_of_match[] = {
> diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c 
> b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
> index 72f17f26acd8..2b51f4a9b860 100644
> --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
> +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.c
> @@ -1027,6 +1027,20 @@ int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw,
>  }
>  EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get);
>
> +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw,
> + const struct mtk_pin_desc *desc, u32 arg)
> +{
> +   return mtk_hw_set_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, arg);
> +}
> +EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_set_raw);
> +
> +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl *hw,
> + const struct mtk_pin_desc *desc, u32 *val)
> +{
> +   return mtk_hw_get_value(hw, desc, PINCTRL_PIN_REG_DRV_ADV, val);
> +}
> +EXPORT_SYMBOL_GPL(mtk_pinconf_adv_drive_get_raw);
> +
>  MODULE_LICENSE("GPL v2");
>  MODULE_AUTHOR("Sean Wang ");
>  MODULE_DESCRIPTION("Pin configuration library module for mediatek SoCs");
> diff --git a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h 
> b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h
> index e2aae285b5fc..fd5ce9c5dcbd 100644
> --- a/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h
> +++ b/drivers/pinctrl/mediatek/pinctrl-mtk-common-v2.h
> @@ -66,6 +66,7 @@ enum {
> PINCTRL_PIN_REG_DRV_EN,
> PINCTRL_PIN_REG_DRV_E0,
> PINCTRL_PIN_REG_DRV_E1,
> +   PINCTRL_PIN_REG_DRV_ADV,
> PINCTRL_PIN_REG_MAX,
>  };
>
> @@ -314,6 +315,10 @@ int mtk_pinconf_adv_drive_set(struct mtk_pinctrl *hw,
>   const struct mtk_pin_desc *desc, u32 arg);
>  int mtk_pinconf_adv_drive_get(struct mtk_pinctrl *hw,
>   const struct mtk_pin_desc *desc, u32 *val);
> +int mtk_pinconf_adv_drive_set_raw(struct mtk_pinctrl *hw,
> + const struct mtk_pin_desc *desc, u32 arg);
> +int mtk_pinconf_adv_drive_get_raw(struct mtk_pinctrl 

[PATCH] drivers: regulator: core.c: Fix indentation of comment

2021-04-19 Thread Shubhankar Kuranagatti
Shifted the closing */ of multiline comments to a new line.
This is done to maintain code uniformity.

Signed-off-by: Shubhankar Kuranagatti 
---
 drivers/regulator/core.c | 21 ++---
 1 file changed, 14 insertions(+), 7 deletions(-)

diff --git a/drivers/regulator/core.c b/drivers/regulator/core.c
index 16114aea099a..06fbf18b6524 100644
--- a/drivers/regulator/core.c
+++ b/drivers/regulator/core.c
@@ -538,7 +538,8 @@ static int regulator_mode_constrain(struct regulator_dev 
*rdev,
 
/* The modes are bitmasks, the most power hungry modes having
 * the lowest values. If the requested mode isn't supported
-* try higher modes. */
+* try higher modes.
+*/
while (*mode) {
if (rdev->constraints->valid_modes_mask & *mode)
return 0;
@@ -931,7 +932,8 @@ static DEVICE_ATTR(bypass, 0444,
   regulator_bypass_show, NULL);
 
 /* Calculate the new optimum regulator operating mode based on the new total
- * consumer load. All locks held by caller */
+ * consumer load. All locks held by caller
+ */
 static int drms_uA_update(struct regulator_dev *rdev)
 {
struct regulator *sibling;
@@ -1219,7 +1221,8 @@ static int machine_constraints_voltage(struct 
regulator_dev *rdev,
int cmax = constraints->max_uV;
 
/* it's safe to autoconfigure fixed-voltage supplies
-  and the constraints are used by list_voltage. */
+* and the constraints are used by list_voltage.
+*/
if (count == 1 && !cmin) {
cmin = 1;
cmax = INT_MAX;
@@ -2525,7 +2528,8 @@ static int _regulator_do_enable(struct regulator_dev 
*rdev)
 
/* Allow the regulator to ramp; it would be useful to extend
 * this for bulk operations so that the regulators can ramp
-* together.  */
+* together.
+*/
trace_regulator_enable_delay(rdev_get_name(rdev));
 
/* If poll_enabled_time is set, poll upto the delay calculated
@@ -5337,10 +5341,12 @@ regulator_register(const struct regulator_desc 
*regulator_desc,
ret = set_machine_constraints(rdev);
if (ret == -EPROBE_DEFER) {
/* Regulator might be in bypass mode and so needs its supply
-* to set the constraints */
+* to set the constraints
+*/
/* FIXME: this currently triggers a chicken-and-egg problem
 * when creating -SUPPLY symlink in sysfs to a regulator
-* that is just being created */
+* that is just being created
+*/
rdev_dbg(rdev, "will resolve supply early: %s\n",
 rdev->supply_name);
ret = regulator_resolve_supply(rdev);
@@ -5899,7 +5905,8 @@ static int regulator_late_cleanup(struct device *dev, 
void *data)
 
if (have_full_constraints()) {
/* We log since this may kill the system if it goes
-* wrong. */
+* wrong.
+*/
rdev_info(rdev, "disabling\n");
ret = _regulator_do_disable(rdev);
if (ret != 0)
-- 
2.17.1



Re: [PATCH v4 2/4] pinctrl: add pinctrl driver on mt8195

2021-04-19 Thread Sean Wang
On Mon, Apr 12, 2021 at 10:57 PM Zhiyong Tao  wrote:
>
> This commit includes pinctrl driver for mt8195.
>
> Signed-off-by: Zhiyong Tao 

Acked-by: Sean Wang 

> ---
>  drivers/pinctrl/mediatek/Kconfig  |6 +
>  drivers/pinctrl/mediatek/Makefile |1 +
>  drivers/pinctrl/mediatek/pinctrl-mt8195.c |  828 
>  drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h | 1669 +
>  4 files changed, 2504 insertions(+)
>  create mode 100644 drivers/pinctrl/mediatek/pinctrl-mt8195.c
>  create mode 100644 drivers/pinctrl/mediatek/pinctrl-mtk-mt8195.h
>
> diff --git a/drivers/pinctrl/mediatek/Kconfig 
> b/drivers/pinctrl/mediatek/Kconfig
> index eef17f228669..90f0c8255eaf 100644
> --- a/drivers/pinctrl/mediatek/Kconfig
> +++ b/drivers/pinctrl/mediatek/Kconfig
> @@ -147,6 +147,12 @@ config PINCTRL_MT8192
> default ARM64 && ARCH_MEDIATEK
> select PINCTRL_MTK_PARIS
>
> +config PINCTRL_MT8195
> +   bool "Mediatek MT8195 pin control"
> +   depends on OF
> +   depends on ARM64 || COMPILE_TEST
> +   select PINCTRL_MTK_PARIS
> +
>  config PINCTRL_MT8516
> bool "Mediatek MT8516 pin control"
> depends on OF
> diff --git a/drivers/pinctrl/mediatek/Makefile 
> b/drivers/pinctrl/mediatek/Makefile
> index 01218bf4dc30..06fde993ace2 100644
> --- a/drivers/pinctrl/mediatek/Makefile
> +++ b/drivers/pinctrl/mediatek/Makefile
> @@ -21,5 +21,6 @@ obj-$(CONFIG_PINCTRL_MT8167)  += pinctrl-mt8167.o
>  obj-$(CONFIG_PINCTRL_MT8173)   += pinctrl-mt8173.o
>  obj-$(CONFIG_PINCTRL_MT8183)   += pinctrl-mt8183.o
>  obj-$(CONFIG_PINCTRL_MT8192)   += pinctrl-mt8192.o
> +obj-$(CONFIG_PINCTRL_MT8195)+= pinctrl-mt8195.o
>  obj-$(CONFIG_PINCTRL_MT8516)   += pinctrl-mt8516.o
>  obj-$(CONFIG_PINCTRL_MT6397)   += pinctrl-mt6397.o
> diff --git a/drivers/pinctrl/mediatek/pinctrl-mt8195.c 
> b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
> new file mode 100644
> index ..063f164d7c9b
> --- /dev/null
> +++ b/drivers/pinctrl/mediatek/pinctrl-mt8195.c
> @@ -0,0 +1,828 @@
> +// SPDX-License-Identifier: GPL-2.0
> +/*
> + * Copyright (C) 2020 MediaTek Inc.
> + *
> + * Author: Zhiyong Tao 
> + *
> + */
> +
> +#include "pinctrl-mtk-mt8195.h"
> +#include "pinctrl-paris.h"
> +
> +/* MT8195 have multiple bases to program pin configuration listed as the 
> below:
> + * iocfg[0]:0x10005000, iocfg[1]:0x11d1, iocfg[2]:0x11d3,
> + * iocfg[3]:0x11d4, iocfg[4]:0x11e2, iocfg[5]:0x11eb,
> + * iocfg[6]:0x11f4.
> + * _i_based could be used to indicate what base the pin should be mapped 
> into.
> + */
> +
> +#define PIN_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits) 
> \
> +   PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits, \
> +  32, 0)
> +
> +#define PINS_FIELD_BASE(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, 
> x_bits) \
> +   PIN_FIELD_CALC(s_pin, e_pin, i_base, s_addr, x_addrs, s_bit, x_bits,  
> \
> +  32, 1)
> +
> +static const struct mtk_pin_field_calc mt8195_pin_mode_range[] = {
> +   PIN_FIELD(0, 144, 0x300, 0x10, 0, 4),
> +};
> +
> +static const struct mtk_pin_field_calc mt8195_pin_dir_range[] = {
> +   PIN_FIELD(0, 144, 0x0, 0x10, 0, 1),
> +};
> +
> +static const struct mtk_pin_field_calc mt8195_pin_di_range[] = {
> +   PIN_FIELD(0, 144, 0x200, 0x10, 0, 1),
> +};
> +
> +static const struct mtk_pin_field_calc mt8195_pin_do_range[] = {
> +   PIN_FIELD(0, 144, 0x100, 0x10, 0, 1),
> +};
> +
> +static const struct mtk_pin_field_calc mt8195_pin_ies_range[] = {
> +   PIN_FIELD_BASE(0, 0, 4, 0x040, 0x10, 0, 1),
> +   PIN_FIELD_BASE(1, 1, 4, 0x040, 0x10, 1, 1),
> +   PIN_FIELD_BASE(2, 2, 4, 0x040, 0x10, 2, 1),
> +   PIN_FIELD_BASE(3, 3, 4, 0x040, 0x10, 3, 1),
> +   PIN_FIELD_BASE(4, 4, 4, 0x040, 0x10, 4, 1),
> +   PIN_FIELD_BASE(5, 5, 4, 0x040, 0x10, 5, 1),
> +   PIN_FIELD_BASE(6, 6, 4, 0x040, 0x10, 6, 1),
> +   PIN_FIELD_BASE(7, 7, 4, 0x040, 0x10, 7, 1),
> +   PIN_FIELD_BASE(8, 8, 4, 0x040, 0x10, 13, 1),
> +   PIN_FIELD_BASE(9, 9, 4, 0x040, 0x10, 8, 1),
> +   PIN_FIELD_BASE(10, 10, 4, 0x040, 0x10, 14, 1),
> +   PIN_FIELD_BASE(11, 11, 4, 0x040, 0x10, 9, 1),
> +   PIN_FIELD_BASE(12, 12, 4, 0x040, 0x10, 15, 1),
> +   PIN_FIELD_BASE(13, 13, 4, 0x040, 0x10, 10, 1),
> +   PIN_FIELD_BASE(14, 14, 4, 0x040, 0x10, 16, 1),
> +   PIN_FIELD_BASE(15, 15, 4, 0x040, 0x10, 11, 1),
> +   PIN_FIELD_BASE(16, 16, 4, 0x040, 0x10, 17, 1),
> +   PIN_FIELD_BASE(17, 17, 4, 0x040, 0x10, 12, 1),
> +   PIN_FIELD_BASE(18, 18, 2, 0x040, 0x10, 5, 1),
> +   PIN_FIELD_BASE(19, 19, 2, 0x040, 0x10, 12, 1),
> +   PIN_FIELD_BASE(20, 20, 2, 0x040, 0x10, 11, 1),
> +   PIN_FIELD_BASE(21, 21, 2, 0x040, 0x10, 10, 1),
> +   PIN_FIELD_BASE(22, 22, 2, 0x040, 0x10, 0, 1),
> +   PIN_FIELD_BASE(23, 23, 2, 0x040, 0x10, 1, 1),
> +   PIN_FIELD_BASE(24, 24, 2, 0x040, 0x10, 2, 1),
> +   

Re: [PATCH] powerpc/pseries: Add shutdown() to vio_driver and vio_bus

2021-04-19 Thread Michael Ellerman
Tyrel Datwyler  writes:
> On 4/17/21 5:30 AM, Michael Ellerman wrote:
>> Tyrel Datwyler  writes:
>>> On 4/1/21 5:13 PM, Tyrel Datwyler wrote:
 Currently, neither the vio_bus or vio_driver structures provide support
 for a shutdown() routine.

 Add support for shutdown() by allowing drivers to provide a
 implementation via function pointer in their vio_driver struct and
 provide a proper implementation in the driver template for the vio_bus
 that calls a vio drivers shutdown() if defined.

 In the case that no shutdown() is defined by a vio driver and a kexec is
 in progress we implement a big hammer that calls remove() to ensure no
 further DMA for the devices is possible.

 Signed-off-by: Tyrel Datwyler 
 ---
>>>
>>> Ping... any comments, problems with this approach?
>> 
>> The kexec part seems like a bit of a hack.
>> 
>> It also doesn't help for kdump, when none of the shutdown code is run.
>
> If I understand correctly for kdump we have a reserved memory space where the
> kdump kernel is loaded, but for kexec the memory region isn't reserved ahead 
> of
> time meaning we can try and load the kernel over potential memory used for DMA
> by the current kernel.

That's correct.

>> How many drivers do we have? Can we just implement a proper shutdown for
>> them?
>
> Well that is the end goal. I just don't currently have the bandwidth to do 
> each
> driver myself with a proper shutdown sequence, and thought this was a 
> launching
> off point to at least introduce the shutdown callback to the VIO bus.

Fair enough.

> Off the top of my head we have 3 storage drivers, 2 network drivers, vtpm, 
> vmc,
> pseries_rng, nx, nx842, hvcs, hvc_vio.
>
> I can drop the kexec_in_progress hammer and just have each driver call 
> remove()
> themselves in their shutdown function. Leave it to each maintainer to decide 
> if
> remove() is enough or if there is a more lightweight quiesce sequence they
> choose to implement.

That's OK, you've convinced me. I'll take it as-is.

Eventually it would be good for drivers to implement shutdown in the
optimal way for their device, but that can be done incrementally.

cheers
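
For illustration, a sketch of what a vio driver opting into the new hook might
look like, assuming the patch adds a void (*shutdown)(struct vio_dev *) member
to struct vio_driver (treat the exact signature as an assumption); the quiesce
helper and IDs are made up, and probe/remove are omitted.

#include <asm/vio.h>
#include <linux/mod_devicetable.h>

/* Hypothetical helper: whatever "stop all DMA and go idle" means for the
 * device in question (mask interrupts, free the CRQ, cancel I/O, ...).
 */
static void example_quiesce(struct vio_dev *vdev)
{
}

/* Lighter than remove(): only has to guarantee no further DMA happens. */
static void example_vio_shutdown(struct vio_dev *vdev)
{
	example_quiesce(vdev);
}

static const struct vio_device_id example_vio_ids[] = {
	{ "example", "IBM,example" },
	{ "", "" },
};

static struct vio_driver example_vio_driver = {
	.name		= "example",
	.id_table	= example_vio_ids,
	.shutdown	= example_vio_shutdown,	/* the hook under discussion */
	/* .probe and .remove as in any existing vio driver */
};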


Re: [PATCH] capabilities: require CAP_SETFCAP to map uid 0 (v3.3)

2021-04-19 Thread Serge E. Hallyn
On Mon, Apr 19, 2021 at 06:09:11PM +0200, Christian Brauner wrote:
> On Mon, Apr 19, 2021 at 07:25:14AM -0500, Serge Hallyn wrote:
> > cap_setfcap is required to create file capabilities.
> > 
> > Since 8db6c34f1dbc ("Introduce v3 namespaced file capabilities"), a
> > process running as uid 0 but without cap_setfcap is able to work around
> > this as follows: unshare a new user namespace which maps parent uid 0
> > into the child namespace.  While this task will not have new
> > capabilities against the parent namespace, there is a loophole due to
> > the way namespaced file capabilities are represented as xattrs.  File
> > capabilities valid in userns 1 are distinguished from file capabilities
> > valid in userns 2 by the kuid which underlies uid 0.  Therefore the
> > restricted root process can unshare a new self-mapping namespace, add a
> > namespaced file capability onto a file, then use that file capability in
> > the parent namespace.
> > 
> > To prevent that, do not allow mapping parent uid 0 if the process which
> > opened the uid_map file does not have CAP_SETFCAP, which is the capability
> > for setting file capabilities.
> > 
> > As a further wrinkle:  a task can unshare its user namespace, then
> > open its uid_map file itself, and map (only) its own uid.  In this
> > case we do not have the credential from before unshare,  which was
> > potentially more restricted.  So, when creating a user namespace, we
> > record whether the creator had CAP_SETFCAP.  Then we can use that
> > during map_write().
> > 
> > With this patch:
> > 
> > 1. Unprivileged user can still unshare -Ur
> > 
> > ubuntu@caps:~$ unshare -Ur
> > root@caps:~# logout
> > 
> > 2. Root user can still unshare -Ur
> > 
> > ubuntu@caps:~$ sudo bash
> > root@caps:/home/ubuntu# unshare -Ur
> > root@caps:/home/ubuntu# logout
> > 
> > 3. Root user without CAP_SETFCAP cannot unshare -Ur:
> > 
> > root@caps:/home/ubuntu# /sbin/capsh --drop=cap_setfcap --
> > root@caps:/home/ubuntu# /sbin/setcap cap_setfcap=p /sbin/setcap
> > unable to set CAP_SETFCAP effective capability: Operation not permitted
> > root@caps:/home/ubuntu# unshare -Ur
> > unshare: write failed /proc/self/uid_map: Operation not permitted
> > 
> > Note: an alternative solution would be to allow uid 0 mappings by
> > processes without CAP_SETFCAP, but to prevent such a namespace from
> > writing any file capabilities.  This approach can be seen here:
> > 
> > https://git.kernel.org/pub/scm/linux/kernel/git/sergeh/linux.git/log/?h=2021-04-15/setfcap-nsfscaps-v4
> > 
> 
> Ah, can you link to the previous fix and its revert, please? I think
> that was mentioned in the formerly private thread as well but we forgot:
> 
> commit 95ebabde382c371572297915b104e55403674e73
> Author: Eric W. Biederman 
> Date:   Thu Dec 17 09:42:00 2020 -0600
> 
> capabilities: Don't allow writing ambiguous v3 file capabilities
> 
> commit 3b0c2d3eaa83da259d7726192cf55a137769012f
> Author: Eric W. Biederman 
> Date:   Fri Mar 12 15:07:09 2021 -0600
> 
> Revert 95ebabde382c ("capabilities: Don't allow writing ambiguous v3 file 
> capabilities")

Sure.

Is there a tag for that kind of thing or do I just mention it at the end
of the description?


Re: [PATCH v4 5/7] cpufreq: qcom-hw: Implement CPRh aware OSM programming

2021-04-19 Thread Viresh Kumar
On 19-04-21, 13:52, Bjorn Andersson wrote:
> On Tue 19 Jan 11:45 CST 2021, AngeloGioacchino Del Regno wrote:
> @Viresh, do you have any suggestion regarding my last comment?

> >  static int qcom_cpufreq_hw_driver_probe(struct platform_device *pdev)
> >  {
> > +   const struct qcom_cpufreq_soc_data *soc_data;
> > +   struct device_node *pd_node;
> > +   struct platform_device *pd_dev;
> > struct device *cpu_dev;
> > struct clk *clk;
> > -   int ret;
> > +   int clk_div, ret;
> > +
> > +   cpu_dev = get_cpu_device(0);
> > +   if (!cpu_dev)
> > +   return -EPROBE_DEFER;
> > +
> > +   soc_data = of_device_get_match_data(>dev);
> > +   if (!soc_data)
> > +   return -EINVAL;
> > +
> > +   if (!soc_data->uses_tz) {
> > +   /*
> > +* When the OSM is not pre-programmed from TZ, we will
> > +* need to program the sequencer through SCM calls.
> > +*/
> > +   if (!qcom_scm_is_available())
> > +   return -EPROBE_DEFER;
> > +
> > +   /*
> > +* If there are no power-domains, OSM programming cannot be
> > +* performed, as in that case, we wouldn't know where to take
> > +* the params from...
> > +*/
> > +   pd_node = of_parse_phandle(cpu_dev->of_node,
> > +  "power-domains", 0);
> > +   if (!pd_node) {
> > +   ret = PTR_ERR(pd_node);
> > +   dev_err(cpu_dev, "power domain not found: %d\n", ret);
> > +   return ret;
> > +   }
> > +
> > +   /*
> > +* If the power domain device is not registered yet, then
> > +* defer probing this driver until that is available.
> > +*/
> > +   pd_dev = of_find_device_by_node(pd_node);
> > +   if (!pd_dev || !pd_dev->dev.driver ||
> +   !device_is_bound(&pd_dev->dev))
> > +   return -EPROBE_DEFER;
> 
> I wonder if there's a more appropriate way to probe defer on resources
> described in the CPU nodes...

Recently we made some updates to the OPP core to start returning
EPROBE_DEFER on failure to acquire resources. I think you can get rid
of many checks for resources here by just trying to create the OPP
table and check its return value for EPROBE_DEFER.

-- 
viresh
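
To make the suggestion concrete, a minimal sketch (illustrative only, not the
actual driver change): let the OPP core report the missing resource and just
propagate -EPROBE_DEFER instead of open-coding the power-domain lookup.

#include <linux/cpu.h>
#include <linux/device.h>
#include <linux/pm_opp.h>

/* Sketch: rely on the OPP core to notice that a required resource (such as
 * the CPU's power domain) is not ready yet and to return -EPROBE_DEFER.
 */
static int example_init_cpu0_opp(void)
{
	struct device *cpu_dev = get_cpu_device(0);
	int ret;

	if (!cpu_dev)
		return -EPROBE_DEFER;

	ret = dev_pm_opp_of_add_table(cpu_dev);
	if (ret == -EPROBE_DEFER)
		return ret;	/* a supplier isn't bound yet, try again later */
	if (ret)
		dev_err(cpu_dev, "failed to add OPP table: %d\n", ret);

	return ret;
}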


Re: Enabling pmbus power control

2021-04-19 Thread Guenter Roeck
On Mon, Apr 19, 2021 at 08:29:53PM -0500, Zev Weiss wrote:
> On Tue, Mar 30, 2021 at 02:38:10PM CDT, Guenter Roeck wrote:
> > On Tue, Mar 30, 2021 at 07:02:00PM +0100, Mark Brown wrote:
> > > On Tue, Mar 30, 2021 at 12:56:56PM -0500, Zev Weiss wrote:
> > > 
> > > > Okay, to expand a bit on the description in my initial message -- we've
> > > > got a single chassis with multiple server boards and a single manager 
> > > > board
> > > > that handles, among other things, power control for the servers.
> > > > The manager board has one LM25066 for each attached server, which acts 
> > > > as
> > > > the "power switch" for that server.  There thus really isn't any driver 
> > > > to
> > > > speak of for the downstream device.
> > > 
> > > This sounds like you need a driver representing those server boards (or
> > > the slots they plug into perhaps) that represents everything about those
> > > boards to userspace, including power switching.  I don't see why you
> > > wouldn't have a driver for that - it's a thing that physically exists
> > > and clearly has some software control, and you'd presumably also expect
> > > to represent some labelling about the slot as well.
> > 
> > Absolutely agree.
> > 
> > Thanks,
> > Guenter
> 
> Hi Guenter, Mark,
> 
> I wanted to return to this to try to explain why structuring the kernel
> support for this in a way that's specific to the device behind the PMIC
> seems like an awkward fit to me, and ultimately kind of artificial.
> 
> In the system I described, the manager board with the LM25066 devices is its
> own little self-contained BMC system running its own Linux kernel (though
> "BMC" is perhaps a slightly misleading term because there's no host system
> that it manages).  The PMICs are really the only relevant connection it has
> to the servers it controls power for -- they have their own dedicated local
> BMCs on board as well doing all the usual BMC tasks.  A hypothetical
> dedicated driver for this on the manager board wouldn't have any other
> hardware to touch aside from the pmbus interface of the LM25066 itself, so
> calling it a server board driver seems pretty arbitrary -- and in fact the
> same system also has another LM25066 that controls the power supply to the
> chassis cooling fans (a totally different downstream device, but one that
> would be equally well-served by the same software).
> 
> More recently, another system has entered the picture for us that might
> illustrate it more starkly -- the Delta Open19 power shelf [0] supported by
> a recent code release from LinkedIn [1].  This is a rackmount power supply

All I can see is that this code is a mess.

> with fifty outputs, each independently switchable via an LM25066 attached to
> an on-board BMC-style management controller (though again, no host system
> involved).  We (Equinix Metal) are interested in porting a modern OpenBMC to
> it to replace the dated, crufty, pre-Linux-Foundation version of OpenBMC it
> currently runs (as found in the linked repo).  The exact nature of the
> devices powered by its outputs is well outside the scope of the firmware
> running on that controller (it could be any arbitrary thing that runs on
> 12VDC), but we still want to be able to both (a) retrieve per-output
> voltage/current/power readings as provided by the existing LM25066 hwmon
> support, and (b) control the on/off state of those outputs from userspace.
> 
> Given the array of possible use-cases, an approach of adding power-switch
> functionality to the existing LM25066 support seems like the most obvious
> way to support this, so I'm hoping to see if I can get some idea of what an
> acceptable implementation of that might look like.
> 

Sorry, that is simply a no-go for the LM25066 driver. I mentioned it before,
and it is still true: The hwmon subsystem is not in the business of power
control.

Have you looked into enclosures (drivers/misc/enclosure.c) ? Maybe that
or something similar could be used for your purposes.

Thanks,
Guenter


[Patch v3 7/7] crypto: qce: aead: Schedule fallback algorithm

2021-04-19 Thread Thara Gopinath
The Qualcomm crypto engine does not handle the following scenarios and
will issue an abort. In such cases, pass the transformation on to
a fallback algorithm.

- DES3 algorithms with all three keys the same.
- AES192 algorithms.
- 0 length messages.

Signed-off-by: Thara Gopinath 
---

v1->v2:
- Updated crypto_aead_set_reqsize to include the size of fallback
  request as well.

 drivers/crypto/qce/aead.c | 64 ---
 drivers/crypto/qce/aead.h |  3 ++
 2 files changed, 56 insertions(+), 11 deletions(-)

diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c
index ef66ae21eae3..6d06a19b48e4 100644
--- a/drivers/crypto/qce/aead.c
+++ b/drivers/crypto/qce/aead.c
@@ -512,7 +512,23 @@ static int qce_aead_crypt(struct aead_request *req, int 
encrypt)
/* CE does not handle 0 length messages */
if (!rctx->cryptlen) {
if (!(IS_CCM(rctx->flags) && IS_DECRYPT(rctx->flags)))
-   return -EINVAL;
+   ctx->need_fallback = true;
+   }
+
+   /* If fallback is needed, schedule and exit */
+   if (ctx->need_fallback) {
+   /* Reset need_fallback in case the same ctx is used for another 
transaction */
+   ctx->need_fallback = false;
+
+   aead_request_set_tfm(&rctx->fallback_req, ctx->fallback);
+   aead_request_set_callback(&rctx->fallback_req, req->base.flags,
+ req->base.complete, req->base.data);
+   aead_request_set_crypt(&rctx->fallback_req, req->src,
+  req->dst, req->cryptlen, req->iv);
+   aead_request_set_ad(&rctx->fallback_req, req->assoclen);
+
+   return encrypt ? crypto_aead_encrypt(&rctx->fallback_req) :
+crypto_aead_decrypt(&rctx->fallback_req);
}
 
/*
@@ -553,7 +569,7 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, 
const u8 *key,
memcpy(ctx->ccm4309_salt, key + keylen, QCE_CCM4309_SALT_SIZE);
}
 
-   if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256)
+   if (keylen != AES_KEYSIZE_128 && keylen != AES_KEYSIZE_256 && keylen != 
AES_KEYSIZE_192)
return -EINVAL;
 
ctx->enc_keylen = keylen;
@@ -562,7 +578,12 @@ static int qce_aead_ccm_setkey(struct crypto_aead *tfm, 
const u8 *key,
memcpy(ctx->enc_key, key, keylen);
memcpy(ctx->auth_key, key, keylen);
 
-   return 0;
+   if (keylen == AES_KEYSIZE_192)
+   ctx->need_fallback = true;
+
+   return IS_CCM_RFC4309(flags) ?
+   crypto_aead_setkey(ctx->fallback, key, keylen + 
QCE_CCM4309_SALT_SIZE) :
+   crypto_aead_setkey(ctx->fallback, key, keylen);
 }
 
 static int qce_aead_setkey(struct crypto_aead *tfm, const u8 *key, unsigned 
int keylen)
@@ -593,20 +614,21 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const 
u8 *key, unsigned int
 * The crypto engine does not support any two keys
 * being the same for triple des algorithms. The
 * verify_skcipher_des3_key does not check for all the
-* below conditions. Return -EINVAL in case any two keys
-* are the same. Revisit to see if a fallback cipher
-* is needed to handle this condition.
+* below conditions. Schedule fallback in this case.
 */
memcpy(_key, authenc_keys.enckey, DES3_EDE_KEY_SIZE);
if (!((_key[0] ^ _key[2]) | (_key[1] ^ _key[3])) ||
!((_key[2] ^ _key[4]) | (_key[3] ^ _key[5])) ||
!((_key[0] ^ _key[4]) | (_key[1] ^ _key[5])))
-   return -EINVAL;
+   ctx->need_fallback = true;
} else if (IS_AES(flags)) {
/* No random key sizes */
if (authenc_keys.enckeylen != AES_KEYSIZE_128 &&
+   authenc_keys.enckeylen != AES_KEYSIZE_192 &&
authenc_keys.enckeylen != AES_KEYSIZE_256)
return -EINVAL;
+   if (authenc_keys.enckeylen == AES_KEYSIZE_192)
+   ctx->need_fallback = true;
}
 
ctx->enc_keylen = authenc_keys.enckeylen;
@@ -617,7 +639,7 @@ static int qce_aead_setkey(struct crypto_aead *tfm, const 
u8 *key, unsigned int
memset(ctx->auth_key, 0, sizeof(ctx->auth_key));
memcpy(ctx->auth_key, authenc_keys.authkey, authenc_keys.authkeylen);
 
-   return 0;
+   return crypto_aead_setkey(ctx->fallback, key, keylen);
 }
 
 static int qce_aead_setauthsize(struct crypto_aead *tfm, unsigned int authsize)
@@ -632,15 +654,33 @@ static int qce_aead_setauthsize(struct crypto_aead *tfm, 
unsigned int authsize)
return -EINVAL;
}
ctx->authsize = authsize;
-   return 0;
+
+   return crypto_aead_setauthsize(ctx->fallback, authsize);
 }
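
For context on the changelog note about crypto_aead_set_reqsize(): a minimal
sketch of the tfm-init side that pairs with the fallback scheduling above,
allocating the software fallback and growing the request context so it can
embed the fallback's aead_request. Names are illustrative and the real
qce_aead_init()/qce_aead_exit() may differ.

#include <crypto/aead.h>
#include <crypto/internal/aead.h>
#include <linux/crypto.h>
#include <linux/err.h>

struct example_aead_ctx {
	struct crypto_aead *fallback;
};

struct example_aead_reqctx {
	/* driver per-request state ... */
	struct aead_request fallback_req;	/* must stay last */
};

/* Sketch: allocate a software fallback for the same algorithm and size the
 * request context so the crypt path can hand requests over when the
 * hardware cannot process them (AES-192, all-equal DES3 keys, 0-length).
 */
static int example_aead_init(struct crypto_aead *tfm)
{
	struct example_aead_ctx *ctx = crypto_aead_ctx(tfm);

	ctx->fallback = crypto_alloc_aead(crypto_tfm_alg_name(&tfm->base), 0,
					  CRYPTO_ALG_ASYNC |
					  CRYPTO_ALG_NEED_FALLBACK);
	if (IS_ERR(ctx->fallback))
		return PTR_ERR(ctx->fallback);

	crypto_aead_set_reqsize(tfm, sizeof(struct example_aead_reqctx) +
				     crypto_aead_reqsize(ctx->fallback));
	return 0;
}

static void example_aead_exit(struct crypto_aead *tfm)
{
	struct example_aead_ctx *ctx = crypto_aead_ctx(tfm);

	crypto_free_aead(ctx->fallback);
}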
 
 

[Patch v3 6/7] crypto: qce: common: Add support for AEAD algorithms

2021-04-19 Thread Thara Gopinath
Add register programming sequence for enabling AEAD
algorithms on the Qualcomm crypto engine.

Signed-off-by: Thara Gopinath 
---

v2->v3:
- Made qce_be32_to_cpu_array truly be32 to cpu endian by using 
be32_to_cpup
instead of cpu_to_be32p. Also remove the (u32 *) typecasting of arrays 
obtained
  as output from qce_be32_to_cpu_array as per Bjorn's review comments.
- Wrapped newly introduced std_iv_sha1, std_iv_sha256 and 
qce_be32_to_cpu_array
  in CONFIG_CRYPTO_DEV_QCE_AEAD to prevent W1 warnings as reported by 
kernel
  test robot .

v1->v2:
- Minor fixes like removing not needed initializing of variables
  and using bool values in lieu of 0 and 1 as pointed out by Bjorn.
- Introduced qce_be32_to_cpu_array, which converts a u8 string in big
  endian order to an array of u32 and returns the total number of words,
  as per Bjorn's review comments. Presently this function is used only by
  qce_setup_regs_aead to format keys, iv and nonce. Cipher and hash
  algorithms can be made to use this function in a separate cleanup patch.

 drivers/crypto/qce/common.c | 162 +++-
 1 file changed, 160 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c
index 7b3d6caec1b2..6d6b3792323b 100644
--- a/drivers/crypto/qce/common.c
+++ b/drivers/crypto/qce/common.c
@@ -15,6 +15,7 @@
 #include "core.h"
 #include "regs-v5.h"
 #include "sha.h"
+#include "aead.h"
 
 static inline u32 qce_read(struct qce_device *qce, u32 offset)
 {
@@ -96,7 +97,7 @@ static inline void qce_crypto_go(struct qce_device *qce, bool 
result_dump)
qce_write(qce, REG_GOPROC, BIT(GO_SHIFT));
 }
 
-#ifdef CONFIG_CRYPTO_DEV_QCE_SHA
+#if defined(CONFIG_CRYPTO_DEV_QCE_SHA) || defined(CONFIG_CRYPTO_DEV_QCE_AEAD)
 static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size)
 {
u32 cfg = 0;
@@ -139,7 +140,9 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size, 
u32 auth_size)
 
return cfg;
 }
+#endif
 
+#ifdef CONFIG_CRYPTO_DEV_QCE_SHA
 static int qce_setup_regs_ahash(struct crypto_async_request *async_req)
 {
struct ahash_request *req = ahash_request_cast(async_req);
@@ -225,7 +228,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request 
*async_req)
 }
 #endif
 
-#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER
+#if defined(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) || 
defined(CONFIG_CRYPTO_DEV_QCE_AEAD)
 static u32 qce_encr_cfg(unsigned long flags, u32 aes_key_size)
 {
u32 cfg = 0;
@@ -271,7 +274,9 @@ static u32 qce_encr_cfg(unsigned long flags, u32 
aes_key_size)
 
return cfg;
 }
+#endif
 
+#ifdef CONFIG_CRYPTO_DEV_QCE_SKCIPHER
 static void qce_xts_swapiv(__be32 *dst, const u8 *src, unsigned int ivsize)
 {
u8 swap[QCE_AES_IV_LENGTH];
@@ -386,6 +391,155 @@ static int qce_setup_regs_skcipher(struct 
crypto_async_request *async_req)
 }
 #endif
 
+#ifdef CONFIG_CRYPTO_DEV_QCE_AEAD
+static const u32 std_iv_sha1[SHA256_DIGEST_SIZE / sizeof(u32)] = {
+   SHA1_H0, SHA1_H1, SHA1_H2, SHA1_H3, SHA1_H4, 0, 0, 0
+};
+
+static const u32 std_iv_sha256[SHA256_DIGEST_SIZE / sizeof(u32)] = {
+   SHA256_H0, SHA256_H1, SHA256_H2, SHA256_H3,
+   SHA256_H4, SHA256_H5, SHA256_H6, SHA256_H7
+};
+
+static unsigned int qce_be32_to_cpu_array(u32 *dst, const u8 *src, unsigned 
int len)
+{
+   u32 *d = dst;
+   const u8 *s = src;
+   unsigned int n;
+
+   n = len / sizeof(u32);
+   for (; n > 0; n--) {
+   *d = be32_to_cpup((const __be32 *)s);
+   s += sizeof(u32);
+   d++;
+   }
+   return DIV_ROUND_UP(len, sizeof(u32));
+}
+
+static int qce_setup_regs_aead(struct crypto_async_request *async_req)
+{
+   struct aead_request *req = aead_request_cast(async_req);
+   struct qce_aead_reqctx *rctx = aead_request_ctx(req);
+   struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm);
+   struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req));
+   struct qce_device *qce = tmpl->qce;
+   u32 enckey[QCE_MAX_CIPHER_KEY_SIZE / sizeof(u32)] = {0};
+   u32 enciv[QCE_MAX_IV_SIZE / sizeof(u32)] = {0};
+   u32 authkey[QCE_SHA_HMAC_KEY_SIZE / sizeof(u32)] = {0};
+   u32 authiv[SHA256_DIGEST_SIZE / sizeof(u32)] = {0};
+   u32 authnonce[QCE_MAX_NONCE / sizeof(u32)] = {0};
+   unsigned int enc_keylen = ctx->enc_keylen;
+   unsigned int auth_keylen = ctx->auth_keylen;
+   unsigned int enc_ivsize = rctx->ivsize;
+   unsigned int auth_ivsize;
+   unsigned int enckey_words, enciv_words;
+   unsigned int authkey_words, authiv_words, authnonce_words;
+   unsigned long flags = rctx->flags;
+   u32 encr_cfg, auth_cfg, config, totallen;
+   u32 iv_last_word;
+
+   qce_setup_config(qce);
+
+   /* Write encryption key */
+   enckey_words = qce_be32_to_cpu_array(enckey, ctx->enc_key, 

[Patch v3 5/7] crypto: qce: common: Clean up qce_auth_cfg

2021-04-19 Thread Thara Gopinath
Remove various redundant checks in qce_auth_cfg. Also allow qce_auth_cfg
to take auth_size as a parameter, which is a required setting for ccm(aes)
algorithms.

Signed-off-by: Thara Gopinath 
---
 drivers/crypto/qce/common.c | 21 +
 1 file changed, 9 insertions(+), 12 deletions(-)

diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c
index 7b5bc5a6ae81..7b3d6caec1b2 100644
--- a/drivers/crypto/qce/common.c
+++ b/drivers/crypto/qce/common.c
@@ -97,11 +97,11 @@ static inline void qce_crypto_go(struct qce_device *qce, 
bool result_dump)
 }
 
 #ifdef CONFIG_CRYPTO_DEV_QCE_SHA
-static u32 qce_auth_cfg(unsigned long flags, u32 key_size)
+static u32 qce_auth_cfg(unsigned long flags, u32 key_size, u32 auth_size)
 {
u32 cfg = 0;
 
-   if (IS_AES(flags) && (IS_CCM(flags) || IS_CMAC(flags)))
+   if (IS_CCM(flags) || IS_CMAC(flags))
cfg |= AUTH_ALG_AES << AUTH_ALG_SHIFT;
else
cfg |= AUTH_ALG_SHA << AUTH_ALG_SHIFT;
@@ -119,15 +119,16 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size)
cfg |= AUTH_SIZE_SHA256 << AUTH_SIZE_SHIFT;
else if (IS_CMAC(flags))
cfg |= AUTH_SIZE_ENUM_16_BYTES << AUTH_SIZE_SHIFT;
+   else if (IS_CCM(flags))
+   cfg |= (auth_size - 1) << AUTH_SIZE_SHIFT;
 
if (IS_SHA1(flags) || IS_SHA256(flags))
cfg |= AUTH_MODE_HASH << AUTH_MODE_SHIFT;
-   else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags) ||
-IS_CBC(flags) || IS_CTR(flags))
+   else if (IS_SHA1_HMAC(flags) || IS_SHA256_HMAC(flags))
cfg |= AUTH_MODE_HMAC << AUTH_MODE_SHIFT;
-   else if (IS_AES(flags) && IS_CCM(flags))
+   else if (IS_CCM(flags))
cfg |= AUTH_MODE_CCM << AUTH_MODE_SHIFT;
-   else if (IS_AES(flags) && IS_CMAC(flags))
+   else if (IS_CMAC(flags))
cfg |= AUTH_MODE_CMAC << AUTH_MODE_SHIFT;
 
if (IS_SHA(flags) || IS_SHA_HMAC(flags))
@@ -136,10 +137,6 @@ static u32 qce_auth_cfg(unsigned long flags, u32 key_size)
if (IS_CCM(flags))
cfg |= QCE_MAX_NONCE_WORDS << AUTH_NONCE_NUM_WORDS_SHIFT;
 
-   if (IS_CBC(flags) || IS_CTR(flags) || IS_CCM(flags) ||
-   IS_CMAC(flags))
-   cfg |= BIT(AUTH_LAST_SHIFT) | BIT(AUTH_FIRST_SHIFT);
-
return cfg;
 }
 
@@ -171,7 +168,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request 
*async_req)
qce_clear_array(qce, REG_AUTH_KEY0, 16);
qce_clear_array(qce, REG_AUTH_BYTECNT0, 4);
 
-   auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen);
+   auth_cfg = qce_auth_cfg(rctx->flags, rctx->authklen, 
digestsize);
}
 
if (IS_SHA_HMAC(rctx->flags) || IS_CMAC(rctx->flags)) {
@@ -199,7 +196,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request 
*async_req)
qce_write_array(qce, REG_AUTH_BYTECNT0,
(u32 *)rctx->byte_count, 2);
 
-   auth_cfg = qce_auth_cfg(rctx->flags, 0);
+   auth_cfg = qce_auth_cfg(rctx->flags, 0, digestsize);
 
if (rctx->last_blk)
auth_cfg |= BIT(AUTH_LAST_SHIFT);
-- 
2.25.1



[Patch v3 2/7] crypto: qce: common: Make result dump optional

2021-04-19 Thread Thara Gopinath
The Qualcomm crypto engine allows the IV registers and status register
to be concatenated to the output. This option is enabled by setting the
RESULTS_DUMP field in the GOPROC register. This is useful for most
algorithms, either to retrieve the status of the operation or, in the
case of authentication algorithms, to retrieve the MAC. But for CCM
algorithms the MAC is part of the output stream and is not retrieved
from the IV registers, so a separate buffer is needed to retrieve it.
Make enabling the RESULTS_DUMP field optional so that algorithms can
choose whether or not to enable it.
Note that in this patch the enabled algorithms always choose
RESULTS_DUMP to be enabled. But later, with the introduction of CCM
algorithms, this changes.

Reviewed-by: Bjorn Andersson 
Signed-off-by: Thara Gopinath 
---
 drivers/crypto/qce/common.c | 11 +++
 1 file changed, 7 insertions(+), 4 deletions(-)

diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c
index dd76175d5c62..7b5bc5a6ae81 100644
--- a/drivers/crypto/qce/common.c
+++ b/drivers/crypto/qce/common.c
@@ -88,9 +88,12 @@ static void qce_setup_config(struct qce_device *qce)
qce_write(qce, REG_CONFIG, config);
 }
 
-static inline void qce_crypto_go(struct qce_device *qce)
+static inline void qce_crypto_go(struct qce_device *qce, bool result_dump)
 {
-   qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | BIT(RESULTS_DUMP_SHIFT));
+   if (result_dump)
+   qce_write(qce, REG_GOPROC, BIT(GO_SHIFT) | 
BIT(RESULTS_DUMP_SHIFT));
+   else
+   qce_write(qce, REG_GOPROC, BIT(GO_SHIFT));
 }
 
 #ifdef CONFIG_CRYPTO_DEV_QCE_SHA
@@ -219,7 +222,7 @@ static int qce_setup_regs_ahash(struct crypto_async_request 
*async_req)
config = qce_config_reg(qce, 1);
qce_write(qce, REG_CONFIG, config);
 
-   qce_crypto_go(qce);
+   qce_crypto_go(qce, true);
 
return 0;
 }
@@ -380,7 +383,7 @@ static int qce_setup_regs_skcipher(struct 
crypto_async_request *async_req)
config = qce_config_reg(qce, 1);
qce_write(qce, REG_CONFIG, config);
 
-   qce_crypto_go(qce);
+   qce_crypto_go(qce, true);
 
return 0;
 }
-- 
2.25.1



[Patch v3 4/7] crypto: qce: Add support for AEAD algorithms

2021-04-19 Thread Thara Gopinath
Introduce support to enable the following algorithms in the Qualcomm Crypto Engine.

- authenc(hmac(sha1),cbc(des))
- authenc(hmac(sha1),cbc(des3_ede))
- authenc(hmac(sha256),cbc(des))
- authenc(hmac(sha256),cbc(des3_ede))
- authenc(hmac(sha256),cbc(aes))
- ccm(aes)
- rfc4309(ccm(aes))

Signed-off-by: Thara Gopinath 
---

v1->v2:
- Removed spurious totallen from qce_aead_async_req_handle since
  it was unused as pointed out by Hubert.
- Updated qce_dma_prep_sgs to use #nents returned by dma_map_sg rather 
than
  using precomputed #nents.


 drivers/crypto/Kconfig  |  15 +
 drivers/crypto/qce/Makefile |   1 +
 drivers/crypto/qce/aead.c   | 799 
 drivers/crypto/qce/aead.h   |  53 +++
 drivers/crypto/qce/common.h |   2 +
 drivers/crypto/qce/core.c   |   4 +
 6 files changed, 874 insertions(+)
 create mode 100644 drivers/crypto/qce/aead.c
 create mode 100644 drivers/crypto/qce/aead.h

diff --git a/drivers/crypto/Kconfig b/drivers/crypto/Kconfig
index 9a4c275a1335..1fe5b7eafc02 100644
--- a/drivers/crypto/Kconfig
+++ b/drivers/crypto/Kconfig
@@ -627,6 +627,12 @@ config CRYPTO_DEV_QCE_SHA
select CRYPTO_SHA1
select CRYPTO_SHA256
 
+config CRYPTO_DEV_QCE_AEAD
+   bool
+   depends on CRYPTO_DEV_QCE
+   select CRYPTO_AUTHENC
+   select CRYPTO_LIB_DES
+
 choice
prompt "Algorithms enabled for QCE acceleration"
default CRYPTO_DEV_QCE_ENABLE_ALL
@@ -647,6 +653,7 @@ choice
bool "All supported algorithms"
select CRYPTO_DEV_QCE_SKCIPHER
select CRYPTO_DEV_QCE_SHA
+   select CRYPTO_DEV_QCE_AEAD
help
  Enable all supported algorithms:
- AES (CBC, CTR, ECB, XTS)
@@ -672,6 +679,14 @@ choice
- SHA1, HMAC-SHA1
- SHA256, HMAC-SHA256
 
+   config CRYPTO_DEV_QCE_ENABLE_AEAD
+   bool "AEAD algorithms only"
+   select CRYPTO_DEV_QCE_AEAD
+   help
+ Enable AEAD algorithms only:
+   - authenc()
+   - ccm(aes)
+   - rfc4309(ccm(aes))
 endchoice
 
 config CRYPTO_DEV_QCE_SW_MAX_LEN
diff --git a/drivers/crypto/qce/Makefile b/drivers/crypto/qce/Makefile
index 14ade8a7d664..2cf8984e1b85 100644
--- a/drivers/crypto/qce/Makefile
+++ b/drivers/crypto/qce/Makefile
@@ -6,3 +6,4 @@ qcrypto-objs := core.o \
 
 qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SHA) += sha.o
 qcrypto-$(CONFIG_CRYPTO_DEV_QCE_SKCIPHER) += skcipher.o
+qcrypto-$(CONFIG_CRYPTO_DEV_QCE_AEAD) += aead.o
diff --git a/drivers/crypto/qce/aead.c b/drivers/crypto/qce/aead.c
new file mode 100644
index ..ef66ae21eae3
--- /dev/null
+++ b/drivers/crypto/qce/aead.c
@@ -0,0 +1,799 @@
+// SPDX-License-Identifier: GPL-2.0-only
+
+/*
+ * Copyright (C) 2021, Linaro Limited. All rights reserved.
+ */
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+#include "aead.h"
+
+#define CCM_NONCE_ADATA_SHIFT  6
+#define CCM_NONCE_AUTHSIZE_SHIFT   3
+#define MAX_CCM_ADATA_HEADER_LEN   6
+
+static LIST_HEAD(aead_algs);
+
+static void qce_aead_done(void *data)
+{
+   struct crypto_async_request *async_req = data;
+   struct aead_request *req = aead_request_cast(async_req);
+   struct qce_aead_reqctx *rctx = aead_request_ctx(req);
+   struct qce_aead_ctx *ctx = crypto_tfm_ctx(async_req->tfm);
+   struct qce_alg_template *tmpl = to_aead_tmpl(crypto_aead_reqtfm(req));
+   struct qce_device *qce = tmpl->qce;
+   struct qce_result_dump *result_buf = qce->dma.result_buf;
+   enum dma_data_direction dir_src, dir_dst;
+   bool diff_dst;
+   int error;
+   u32 status;
+   unsigned int totallen;
+   unsigned char tag[SHA256_DIGEST_SIZE] = {0};
+   int ret = 0;
+
+   diff_dst = (req->src != req->dst) ? true : false;
+   dir_src = diff_dst ? DMA_TO_DEVICE : DMA_BIDIRECTIONAL;
+   dir_dst = diff_dst ? DMA_FROM_DEVICE : DMA_BIDIRECTIONAL;
+
+   error = qce_dma_terminate_all(&qce->dma);
+   if (error)
+   dev_dbg(qce->dev, "aead dma termination error (%d)\n",
+   error);
+   if (diff_dst)
+   dma_unmap_sg(qce->dev, rctx->src_sg, rctx->src_nents, dir_src);
+
+   dma_unmap_sg(qce->dev, rctx->dst_sg, rctx->dst_nents, dir_dst);
+
+   if (IS_CCM(rctx->flags)) {
+   if (req->assoclen) {
+   sg_free_table(&rctx->src_tbl);
+   if (diff_dst)
+   sg_free_table(&rctx->dst_tbl);
+   } else {
+   if (!(IS_DECRYPT(rctx->flags) && !diff_dst))
+   sg_free_table(&rctx->dst_tbl);
+   }
+   } else {
+   sg_free_table(&rctx->dst_tbl);
+   }
+
+   error = qce_check_status(qce, &status);
+   if (error < 0 && (error != 

[Patch v3 3/7] crypto: qce: Add mode for rfc4309

2021-04-19 Thread Thara Gopinath
RFC 4309 is the specification that uses AES CCM algorithms with IPsec ESP
packets. Add a submode to identify the rfc4309 ccm(aes) algorithm
in the crypto driver.
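
For context on what the new submode distinguishes: in rfc4309(ccm(aes)) the
11-byte CCM nonce is built from a 3-byte salt stored alongside the key plus
the 8-byte per-packet IV. A generic sketch of that assembly (illustrative
only, not the driver's actual request-path code) looks like:

#include <string.h>

#define RFC4309_SALT_SIZE  3   /* matches the driver's QCE_CCM4309_SALT_SIZE */
#define RFC4309_IV_SIZE    8

/* CCM nonce for rfc4309: salt (taken from the key material) followed by the IV */
static void rfc4309_build_nonce(unsigned char *nonce,
                                const unsigned char *salt,
                                const unsigned char *iv)
{
        memcpy(nonce, salt, RFC4309_SALT_SIZE);
        memcpy(nonce + RFC4309_SALT_SIZE, iv, RFC4309_IV_SIZE);
}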

Reviewed-by: Bjorn Andersson 
Signed-off-by: Thara Gopinath 
---

v1->v2:
- Moved up the QCE_ENCRYPT AND QCE_DECRYPT bit positions so that
  addition of other algorithms in future will not affect these
  macros.

 drivers/crypto/qce/common.h | 7 +--
 1 file changed, 5 insertions(+), 2 deletions(-)

diff --git a/drivers/crypto/qce/common.h b/drivers/crypto/qce/common.h
index 3bc244bcca2d..b135440bf72b 100644
--- a/drivers/crypto/qce/common.h
+++ b/drivers/crypto/qce/common.h
@@ -51,9 +51,11 @@
 #define QCE_MODE_CCM   BIT(12)
 #define QCE_MODE_MASK  GENMASK(12, 8)
 
+#define QCE_MODE_CCM_RFC4309   BIT(13)
+
 /* cipher encryption/decryption operations */
-#define QCE_ENCRYPTBIT(13)
-#define QCE_DECRYPTBIT(14)
+#define QCE_ENCRYPTBIT(30)
+#define QCE_DECRYPTBIT(31)
 
 #define IS_DES(flags)  (flags & QCE_ALG_DES)
 #define IS_3DES(flags) (flags & QCE_ALG_3DES)
@@ -73,6 +75,7 @@
 #define IS_CTR(mode)   (mode & QCE_MODE_CTR)
 #define IS_XTS(mode)   (mode & QCE_MODE_XTS)
 #define IS_CCM(mode)   (mode & QCE_MODE_CCM)
+#define IS_CCM_RFC4309(mode)   ((mode) & QCE_MODE_CCM_RFC4309)
 
 #define IS_ENCRYPT(dir)(dir & QCE_ENCRYPT)
 #define IS_DECRYPT(dir)(dir & QCE_DECRYPT)
-- 
2.25.1



[Patch v3 0/7] Add support for AEAD algorithms in Qualcomm Crypto Engine driver

2021-04-19 Thread Thara Gopinath
Enable support for AEAD algorithms in Qualcomm CE driver.  The first three
patches in this series are cleanups and add a few missing pieces required
to add support for AEAD algorithms.  Patch 4 introduces supported AEAD
transformations on Qualcomm CE.  Patches 5 and 6 implement the h/w
infrastructure needed to enable and run the AEAD transformations on
Qualcomm CE.  Patch 7 adds support to queue fallback algorithms in case of
unsupported special inputs.

This patch series has been tested with the in-kernel crypto testing module
tcrypt.ko, with fuzz tests enabled as well.

Thara Gopinath (7):
  crypto: qce: common: Add MAC failed error checking
  crypto: qce: common: Make result dump optional
  crypto: qce: Add mode for rfc4309
  crypto: qce: Add support for AEAD algorithms
  crypto: qce: common: Clean up qce_auth_cfg
  crypto: qce: common: Add support for AEAD algorithms
  crypto: qce: aead: Schedule fallback algorithm

 drivers/crypto/Kconfig  |  15 +
 drivers/crypto/qce/Makefile |   1 +
 drivers/crypto/qce/aead.c   | 841 
 drivers/crypto/qce/aead.h   |  56 +++
 drivers/crypto/qce/common.c | 196 -
 drivers/crypto/qce/common.h |   9 +-
 drivers/crypto/qce/core.c   |   4 +
 7 files changed, 1102 insertions(+), 20 deletions(-)
 create mode 100644 drivers/crypto/qce/aead.c
 create mode 100644 drivers/crypto/qce/aead.h

-- 
2.25.1



[Patch v3 1/7] crypto: qce: common: Add MAC failed error checking

2021-04-19 Thread Thara Gopinath
MAC_FAILED gets set in the status register if authentication fails
for CCM algorithms (during decryption). Add support to catch and flag
this error.

Reviewed-by: Bjorn Andersson 
Signed-off-by: Thara Gopinath 
---

v1->v2:
- Split the error checking for -ENXIO and -EBADMSG into if-else clause
  so that the code is more readable as per Bjorn's review comment.

 drivers/crypto/qce/common.c | 2 ++
 1 file changed, 2 insertions(+)

diff --git a/drivers/crypto/qce/common.c b/drivers/crypto/qce/common.c
index dceb9579d87a..dd76175d5c62 100644
--- a/drivers/crypto/qce/common.c
+++ b/drivers/crypto/qce/common.c
@@ -419,6 +419,8 @@ int qce_check_status(struct qce_device *qce, u32 *status)
 */
if (*status & STATUS_ERRORS || !(*status & BIT(OPERATION_DONE_SHIFT)))
ret = -ENXIO;
+   else if (*status & BIT(MAC_FAILED_SHIFT))
+   ret = -EBADMSG;
 
return ret;
 }
-- 
2.25.1



Re: [PATCH] net: ethernet: ravb: Fix release of refclk

2021-04-19 Thread Adam Ford
On Mon, Apr 19, 2021 at 5:45 PM David Miller  wrote:
>
> From: Adam Ford 
> Date: Sat, 17 Apr 2021 08:23:29 -0500
>
> > The call to clk_disable_unprepare() can happen before priv is
> > initialized. This means moving clk_disable_unprepare out of
> > out_release into a new label.
> >
> > Fixes: 8ef7adc6beb2("net: ethernet: ravb: Enable optional refclk")
> > Signed-off-by: Adam Ford 
> Thjis does not apply cleanly, please rebbase and resubmit.

Which branch should I use as the rebase?  I used net-next because
that's where the bug is, but I know it changes frequently.

>
> Please fix the formatting of your Fixes tag while you are at it, thank you.

no problem.  Sorry about that

adam


RE: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within a die

2021-04-19 Thread Song Bao Hua (Barry Song)



> -Original Message-
> From: Greg KH [mailto:gre...@linuxfoundation.org]
> Sent: Friday, March 19, 2021 11:02 PM
> To: Jonathan Cameron 
> Cc: Song Bao Hua (Barry Song) ;
> tim.c.c...@linux.intel.com; catalin.mari...@arm.com; w...@kernel.org;
> r...@rjwysocki.net; vincent.guit...@linaro.org; b...@alien8.de;
> t...@linutronix.de; mi...@redhat.com; l...@kernel.org; pet...@infradead.org;
> dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com;
> mgor...@suse.de; msys.miz...@gmail.com; valentin.schnei...@arm.com;
> juri.le...@redhat.com; mark.rutl...@arm.com; sudeep.ho...@arm.com;
> aubrey...@linux.intel.com; linux-arm-ker...@lists.infradead.org;
> linux-kernel@vger.kernel.org; linux-a...@vger.kernel.org; x...@kernel.org;
> xuwei (O) ; Zengtao (B) ;
> guodong...@linaro.org; yangyicong ; Liguozhu (Kenneth)
> ; linux...@openeuler.org; h...@zytor.com
> Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs within
> a die
> 
> On Fri, Mar 19, 2021 at 09:36:16AM +, Jonathan Cameron wrote:
> > On Fri, 19 Mar 2021 06:57:08 +
> > "Song Bao Hua (Barry Song)"  wrote:
> >
> > > > -Original Message-
> > > > From: Greg KH [mailto:gre...@linuxfoundation.org]
> > > > Sent: Friday, March 19, 2021 7:35 PM
> > > > To: Song Bao Hua (Barry Song) 
> > > > Cc: tim.c.c...@linux.intel.com; catalin.mari...@arm.com;
> w...@kernel.org;
> > > > r...@rjwysocki.net; vincent.guit...@linaro.org; b...@alien8.de;
> > > > t...@linutronix.de; mi...@redhat.com; l...@kernel.org;
> pet...@infradead.org;
> > > > dietmar.eggem...@arm.com; rost...@goodmis.org; bseg...@google.com;
> > > > mgor...@suse.de; msys.miz...@gmail.com; valentin.schnei...@arm.com;
> Jonathan
> > > > Cameron ; juri.le...@redhat.com;
> > > > mark.rutl...@arm.com; sudeep.ho...@arm.com; aubrey...@linux.intel.com;
> > > > linux-arm-ker...@lists.infradead.org; linux-kernel@vger.kernel.org;
> > > > linux-a...@vger.kernel.org; x...@kernel.org; xuwei (O)
> ;
> > > > Zengtao (B) ; guodong...@linaro.org;
> yangyicong
> > > > ; Liguozhu (Kenneth) ;
> > > > linux...@openeuler.org; h...@zytor.com
> > > > Subject: Re: [RFC PATCH v5 1/4] topology: Represent clusters of CPUs 
> > > > within
> > > > a die
> > > >
> > > > On Fri, Mar 19, 2021 at 05:16:15PM +1300, Barry Song wrote:
> > > > > diff --git a/Documentation/admin-guide/cputopology.rst
> > > > b/Documentation/admin-guide/cputopology.rst
> > > > > index b90dafc..f9d3745 100644
> > > > > --- a/Documentation/admin-guide/cputopology.rst
> > > > > +++ b/Documentation/admin-guide/cputopology.rst
> > > > > @@ -24,6 +24,12 @@ core_id:
> > > > >   identifier (rather than the kernel's).  The actual value is
> > > > >   architecture and platform dependent.
> > > > >
> > > > > +cluster_id:
> > > > > +
> > > > > + the Cluster ID of cpuX.  Typically it is the hardware platform's
> > > > > + identifier (rather than the kernel's).  The actual value is
> > > > > + architecture and platform dependent.
> > > > > +
> > > > >  book_id:
> > > > >
> > > > >   the book ID of cpuX. Typically it is the hardware platform's
> > > > > @@ -56,6 +62,14 @@ package_cpus_list:
> > > > >   human-readable list of CPUs sharing the same 
> > > > > physical_package_id.
> > > > >   (deprecated name: "core_siblings_list")
> > > > >
> > > > > +cluster_cpus:
> > > > > +
> > > > > + internal kernel map of CPUs within the same cluster.
> > > > > +
> > > > > +cluster_cpus_list:
> > > > > +
> > > > > + human-readable list of CPUs within the same cluster.
> > > > > +
> > > > >  die_cpus:
> > > > >
> > > > >   internal kernel map of CPUs within the same die.
> > > >
> > > > Why are these sysfs files in this file, and not in a Documentation/ABI/
> > > > file which can be correctly parsed and shown to userspace?
> > >
> > > Well. Those ABIs have been there for much a long time. It is like:
> > >
> > > [root@ceph1 topology]# ls
> > > core_id  core_siblings  core_siblings_list  physical_package_id
> thread_siblings  thread_siblings_list
> > > [root@ceph1 topology]# pwd
> > > /sys/devices/system/cpu/cpu100/topology
> > > [root@ceph1 topology]# cat core_siblings_list
> > > 64-127
> > > [root@ceph1 topology]#
> > >
> > > >
> > > > Any chance you can fix that up here as well?
> > >
> > > Yes. we will send a separate patch to address this, which won't
> > > be in this patchset. This patchset will base on that one.
> > >
> > > >
> > > > Also note that "list" is not something that goes in sysfs, sysfs is "one
> > > > value per file", and a list is not "one value".  How do you prevent
> > > > overflowing the buffer of the sysfs file if you have a "list"?
> > > >
> > >
> > > At a glance, the list is using "-" rather than a real list
> > > [root@ceph1 topology]# cat core_siblings_list
> > > 64-127
> > >
> > > Anyway, I will take a look if it has any chance to overflow.
> >
> > It could in theory be alternate CPUs as comma separated list.
> > So it's would get interesting around 500-1000 cpus (guessing).

Re: [PATCH bpf-next 1/2] bpf: Remove bpf_jit_enable=2 debugging mode

2021-04-19 Thread Alexei Starovoitov
On Sat, Apr 17, 2021 at 1:16 AM Christophe Leroy
 wrote:
>
>
>
> Le 16/04/2021 à 01:49, Alexei Starovoitov a écrit :
> > On Thu, Apr 15, 2021 at 8:41 AM Quentin Monnet  
> > wrote:
> >>
> >> 2021-04-15 16:37 UTC+0200 ~ Daniel Borkmann 
> >>> On 4/15/21 11:32 AM, Jianlin Lv wrote:
>  For debugging JITs, dumping the JITed image to kernel log is discouraged,
>  "bpftool prog dump jited" is much better way to examine JITed dumps.
>  This patch get rid of the code related to bpf_jit_enable=2 mode and
>  update the proc handler of bpf_jit_enable, also added auxiliary
>  information to explain how to use bpf_jit_disasm tool after this change.
> 
>  Signed-off-by: Jianlin Lv 
> >>
> >> Hello,
> >>
> >> For what it's worth, I have already seen people dump the JIT image in
> >> kernel logs in Qemu VMs running with just a busybox, not for kernel
> >> development, but in a context where buiding/using bpftool was not
> >> possible.
> >
> > If building/using bpftool is not possible then majority of selftests won't
> > be exercised. I don't think such environment is suitable for any kind
> > of bpf development. Much so for JIT debugging.
> > While bpf_jit_enable=2 is nothing but the debugging tool for JIT developers.
> > I'd rather nuke that code instead of carrying it from kernel to kernel.
> >
>
> When I implemented JIT for PPC32, it was extremely helpfull.
>
> As far as I understand, for the time being bpftool is not usable in my 
> environment because it
> doesn't support cross compilation when the target's endianess differs from 
> the building host
> endianess, see discussion at
> https://lore.kernel.org/bpf/21e66a09-514f-f426-b9e2-13baab0b9...@csgroup.eu/
>
> That's right that selftests can't be exercised because they don't build.
>
> The question might be candid as I didn't investigate much about the 
> replacement of "bpf_jit_enable=2
> debugging mode" by bpftool, how do we use bpftool exactly for that ? 
> Especially when using the BPF
> test module ?

Kernel developers can add any amount of printk and dumps to debug
their code,
but such debugging aids should not be part of the production kernel.
That sysctl was two things at once: a debugging tool for kernel devs and
introspection for users.
bpftool jit dump solves the 2nd part. It provides JIT introspection to users.
Debugging of the kernel can be done with any amount of auxiliary code,
including calling print_hex_dump() during JITing.
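
As an illustration of that print_hex_dump() approach, here is a minimal
sketch of the kind of throwaway dump a JIT author might add locally while
debugging (the function name is a placeholder, not an in-tree API):

#include <linux/printk.h>

/* Throwaway debugging aid: hex-dump a freshly JITed image to the log,
 * 16 bytes per line, with offsets and no ASCII column.
 */
static void jit_debug_dump(const void *image, unsigned int image_size)
{
        print_hex_dump(KERN_DEBUG, "jit: ", DUMP_PREFIX_OFFSET,
                       16, 1, image, image_size, false);
}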


Re: [PATCH] bonding: 3ad: update slave arr after initialize

2021-04-19 Thread jin yiting




在 2021/4/16 12:28, Jay Vosburgh 写道:

jinyiting  wrote:


From: jin yiting 

The bond works in mode 4, and performs down/up operations on the bond
that is normally negotiated. The probability of bond-> slave_arr is NULL

Test commands:
ifconfig bond1 down
ifconfig bond1 up

The conflict occurs in the following process:

__dev_open (CPU A)
--bond_open
   --queue_delayed_work(bond->wq, &bond->ad_work, 0);
   --bond_update_slave_arr
 --bond_3ad_get_active_agg_info

ad_work(CPU B)
--bond_3ad_state_machine_handler
   --ad_agg_selection_logic

ad_work runs on cpu B. In the function ad_agg_selection_logic, all
agg->is_active will be cleared. Before the new active aggregator is
selected on CPU B, bond_3ad_get_active_agg_info failed on CPU A,
bond->slave_arr will be set to NULL. The best aggregator in
ad_agg_selection_logic has not changed, no need to update slave arr.

Signed-off-by: jin yiting 
---
drivers/net/bonding/bond_3ad.c | 6 ++
1 file changed, 6 insertions(+)

diff --git a/drivers/net/bonding/bond_3ad.c b/drivers/net/bonding/bond_3ad.c
index 6908822..d100079 100644
--- a/drivers/net/bonding/bond_3ad.c
+++ b/drivers/net/bonding/bond_3ad.c
@@ -2327,6 +2327,12 @@ void bond_3ad_state_machine_handler(struct work_struct 
*work)

aggregator = __get_first_agg(port);
ad_agg_selection_logic(aggregator, &update_slave_arr);
+   if (!update_slave_arr) {
+   struct aggregator *active = __get_active_agg(aggregator);
+
+   if (active && active->is_active)
+   update_slave_arr = true;
+   }
}
bond_3ad_set_carrier(bond);
}


The described issue is a race condition (in that
ad_agg_selection_logic clears agg->is_active under mode_lock, but
bond_open -> bond_update_slave_arr is inspecting agg->is_active outside
the lock).  I don't see how the above change will reliably manage this;
the real issue looks to be that bond_update_slave_arr is committing
changes to the array (via bond_reset_slave_arr) based on a racy
inspection of the active aggregator state while it is in flux.

Also, the description of the issue says "The best aggregator in
ad_agg_selection_logic has not changed, no need to update slave arr,"
but the change above does the opposite, and will set update_slave_arr
when the aggregator has not changed (update_slave_arr remains false at
return of ad_agg_selection_logic).

I believe I understand the described problem, but I don't see
how the patch fixes it.  I suspect (but haven't tested) that the proper
fix is to acquire mode_lock in bond_update_slave_arr while calling
bond_3ad_get_active_agg_info to avoid conflict with the state machine.

-J

---
-Jay Vosburgh, jay.vosbu...@canonical.com
.



	Thank you for your reply. The last patch does have a redundant update
of the slave arr. Thank you for your correction.


As you said, holding mode_lock in bond_update_slave_arr while
calling bond_3ad_get_active_agg_info can avoid conflict with the state
machine. I have tested this patch, with ifdown/ifup operations for the
bond or slaves.


	But bond_update_slave_arr is expected to hold RTNL only and NO
other lock. And it has WARN_ON(lockdep_is_held(&bond->mode_lock)); in
bond_update_slave_arr. I'm not sure that holding mode_lock in
bond_update_slave_arr while calling bond_3ad_get_active_agg_info is a
correct action.



diff --git a/drivers/net/bonding/bond_main.c 
b/drivers/net/bonding/bond_main.c

index 74cbbb2..db988e5 100644
--- a/drivers/net/bonding/bond_main.c
+++ b/drivers/net/bonding/bond_main.c
@@ -4406,7 +4406,9 @@ int bond_update_slave_arr(struct bonding *bond, 
struct slave *skipslave)

if (BOND_MODE(bond) == BOND_MODE_8023AD) {
struct ad_info ad_info;

+   spin_lock_bh(&bond->mode_lock);
if (bond_3ad_get_active_agg_info(bond, &ad_info)) {
+   spin_unlock_bh(&bond->mode_lock);
pr_debug("bond_3ad_get_active_agg_info failed\n");
/* No active aggragator means it's not safe to use
 * the previous array.
@@ -4414,6 +4416,7 @@ int bond_update_slave_arr(struct bonding *bond, 
struct slave *skipslave)

bond_reset_slave_arr(bond);
goto out;
}
+   spin_unlock_bh(&bond->mode_lock);
agg_id = ad_info.aggregator_id;
}
bond_for_each_slave(bond, slave, iter) {
















[PATCH v7 9/9] Documentation: arm64: Document PMU counters access from userspace

2021-04-19 Thread Rob Herring
From: Raphael Gault 

Add documentation to describe the access to the pmu hardware counters from
userspace.

Signed-off-by: Raphael Gault 
Signed-off-by: Rob Herring 
---
v7:
 - Merge into existing arm64 perf.rst
v6:
  - Update the chained event section with attr.config1 details
v2:
  - Update links to test examples

Changes from Raphael's v4:
  - Convert to rSt
  - Update chained event status
  - Add section for heterogeneous systems
---
 Documentation/arm64/perf.rst | 67 +++-
 1 file changed, 66 insertions(+), 1 deletion(-)

diff --git a/Documentation/arm64/perf.rst b/Documentation/arm64/perf.rst
index b567f177d385..c40fc2adc451 100644
--- a/Documentation/arm64/perf.rst
+++ b/Documentation/arm64/perf.rst
@@ -2,7 +2,10 @@
 
 .. _perf_index:
 
-=====================
+====
+Perf
+====
+
 Perf Event Attributes
 =====================
 
@@ -88,3 +91,65 @@ exclude_host. However when using !exclude_hv there is a 
small blackout
 window at the guest entry/exit where host events are not captured.
 
 On VHE systems there are no blackout windows.
+
+Perf Userspace PMU Hardware Counter Access
+==========================================
+
+Overview
+--------
+The perf userspace tool relies on the PMU to monitor events. It offers an
+abstraction layer over the hardware counters since the underlying
+implementation is cpu-dependent.
+Arm64 allows userspace tools to have access to the registers storing the
+hardware counters' values directly.
+
+This targets specifically self-monitoring tasks in order to reduce the overhead
+by directly accessing the registers without having to go through the kernel.
+
+How-to
+--
+The focus is set on the armv8 PMUv3 which makes sure that the access to the pmu
+registers is enabled and that the userspace has access to the relevant
+information in order to use them.
+
+In order to have access to the hardware counter it is necessary to open the 
event
+using the perf tool interface: the sys_perf_event_open syscall returns a fd 
which
+can subsequently be used with the mmap syscall in order to retrieve a page of
+memory containing information about the event.
+The PMU driver uses this page to expose to the user the hardware counter's
+index and other necessary data. Using this index enables the user to access the
+PMU registers using the `mrs` instruction.
+
+The userspace access is supported in libperf using the perf_evsel__mmap()
+and perf_evsel__read() functions. See `tools/lib/perf/tests/test-evsel.c`_ for
+an example.
+
+About heterogeneous systems
+---------------------------
+On heterogeneous systems such as big.LITTLE, userspace PMU counter access can
+only be enabled when the tasks are pinned to a homogeneous subset of cores and
+the corresponding PMU instance is opened by specifying the 'type' attribute.
+The use of generic event types is not supported in this case.
+
+Have a look at `tools/perf/arch/arm64/tests/user-events.c`_ for an example. It
+can be run using the perf tool to check that the access to the registers works
+correctly from userspace:
+
+.. code-block:: sh
+
+  perf test -v user
+
+About chained events and 64-bit counters
+----------------------------------------
+Chained events are not supported in conjunction with userspace counter
+access. If a 64-bit counter is requested (attr.config1:0), then
+userspace access must also be requested with attr.config1:1 set. This
+will disable counter chaining. The 'pmc_width' in the user page will
+indicate the actual width of the counter which could be only 32-bits
+depending on event and PMU features.
+
+.. Links
+.. _tools/perf/arch/arm64/tests/user-events.c:
+   
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/perf/arch/arm64/tests/user-events.c
+.. _tools/lib/perf/tests/test-evsel.c:
+   
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/tools/lib/perf/tests/test-evsel.c
-- 
2.27.0
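
To make the flow described in the documentation above concrete, here is a
minimal self-monitoring sketch modelled on tools/lib/perf/tests/test-evsel.c,
using the libperf perf_evsel__mmap()/perf_evsel__read() calls named in the
text. Error handling is trimmed; treat it as an illustration, not part of
the patch:

#include <linux/perf_event.h>
#include <perf/core.h>
#include <perf/evsel.h>
#include <perf/threadmap.h>
#include <stdio.h>

/* Open an event on the current task, mmap the user page and read the
 * counter through libperf.
 */
int main(void)
{
        struct perf_event_attr attr = {
                .type           = PERF_TYPE_HARDWARE,
                .config         = PERF_COUNT_HW_INSTRUCTIONS,
                .exclude_kernel = 1,
        };
        struct perf_thread_map *threads = perf_thread_map__new_dummy();
        struct perf_evsel *evsel;
        struct perf_counts_values counts = { .val = 0 };

        perf_thread_map__set_pid(threads, 0, 0);        /* monitor self */

        evsel = perf_evsel__new(&attr);
        if (!evsel || perf_evsel__open(evsel, NULL, threads) < 0)
                return 1;
        if (perf_evsel__mmap(evsel, 0) < 0)             /* map the user page */
                return 1;

        perf_evsel__read(evsel, 0, 0, &counts);         /* direct read when enabled */
        printf("instructions: %llu\n", (unsigned long long)counts.val);

        perf_evsel__close(evsel);
        perf_evsel__delete(evsel);
        perf_thread_map__put(threads);
        return 0;
}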



[PATCH v7 7/9] perf: arm64: Add test for userspace counter access on heterogeneous systems

2021-04-19 Thread Rob Herring
On heterogeneous systems, userspace counter access only works with some
restrictions: the userspace process must be pinned to a homogeneous
subset of CPUs and must open the corresponding PMU for those CPUs. This
commit adds a test implementing these requirements.
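
For readers who just want the shape of those two requirements, a minimal
sketch outside the perf test framework might look like the following. The
raw event number 0x11 (assumed here to be CPU_CYCLES) and the way the PMU
'type' value is obtained are assumptions; the type would normally be read
from the PMU's sysfs 'type' file:

#define _GNU_SOURCE
#include <sched.h>
#include <unistd.h>
#include <sys/syscall.h>
#include <linux/perf_event.h>

/* Pin the calling task to one CPU, then open a counting event on the
 * PMU instance covering that CPU.
 */
static int open_pinned_event(int cpu, unsigned int pmu_type)
{
        struct perf_event_attr attr = {
                .size           = sizeof(attr),
                .type           = pmu_type,     /* a specific PMU, not PERF_TYPE_HARDWARE */
                .config         = 0x11,         /* assumption: CPU_CYCLES event number */
                .exclude_kernel = 1,
        };
        cpu_set_t mask;

        CPU_ZERO(&mask);
        CPU_SET(cpu, &mask);
        if (sched_setaffinity(0, sizeof(mask), &mask))
                return -1;

        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}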

Signed-off-by: Rob Herring 
---
v6:
 - Add a check on cap_user_rdpmc
v5:
 - Adapt to libperf mmap API changes
v4:
 - Update perf_evsel__mmap params
v2:
 - Drop all but heterogeneous test as others covered by libperf tests
 - Rework to use libperf
---
 tools/perf/arch/arm64/include/arch-tests.h |   7 +
 tools/perf/arch/arm64/tests/Build  |   1 +
 tools/perf/arch/arm64/tests/arch-tests.c   |   4 +
 tools/perf/arch/arm64/tests/user-events.c  | 177 +
 4 files changed, 189 insertions(+)
 create mode 100644 tools/perf/arch/arm64/tests/user-events.c

diff --git a/tools/perf/arch/arm64/include/arch-tests.h 
b/tools/perf/arch/arm64/include/arch-tests.h
index 90ec4c8cb880..380ad34a3f09 100644
--- a/tools/perf/arch/arm64/include/arch-tests.h
+++ b/tools/perf/arch/arm64/include/arch-tests.h
@@ -2,11 +2,18 @@
 #ifndef ARCH_TESTS_H
 #define ARCH_TESTS_H
 
+#include 
+
 #ifdef HAVE_DWARF_UNWIND_SUPPORT
 struct thread;
 struct perf_sample;
+int test__arch_unwind_sample(struct perf_sample *sample,
+struct thread *thread);
 #endif
 
 extern struct test arch_tests[];
+int test__rd_pinned(struct test __maybe_unused *test,
+  int __maybe_unused subtest);
+
 
 #endif
diff --git a/tools/perf/arch/arm64/tests/Build 
b/tools/perf/arch/arm64/tests/Build
index a61c06bdb757..3f9a20c17fc6 100644
--- a/tools/perf/arch/arm64/tests/Build
+++ b/tools/perf/arch/arm64/tests/Build
@@ -1,4 +1,5 @@
 perf-y += regs_load.o
 perf-$(CONFIG_DWARF_UNWIND) += dwarf-unwind.o
 
+perf-y += user-events.o
 perf-y += arch-tests.o
diff --git a/tools/perf/arch/arm64/tests/arch-tests.c 
b/tools/perf/arch/arm64/tests/arch-tests.c
index 5b1543c98022..80ce7bd3c16d 100644
--- a/tools/perf/arch/arm64/tests/arch-tests.c
+++ b/tools/perf/arch/arm64/tests/arch-tests.c
@@ -10,6 +10,10 @@ struct test arch_tests[] = {
.func = test__dwarf_unwind,
},
 #endif
+   {
+   .desc = "Pinned CPU user counter access",
+   .func = test__rd_pinned,
+   },
{
.func = NULL,
},
diff --git a/tools/perf/arch/arm64/tests/user-events.c 
b/tools/perf/arch/arm64/tests/user-events.c
new file mode 100644
index ..c8efc6b369e6
--- /dev/null
+++ b/tools/perf/arch/arm64/tests/user-events.c
@@ -0,0 +1,177 @@
+// SPDX-License-Identifier: GPL-2.0
+#include 
+#include 
+#include 
+
+#include 
+#include 
+#include 
+
+#include "pmu.h"
+#include "debug.h"
+#include "tests/tests.h"
+#include "arch-tests.h"
+
+static int run_test(struct perf_evsel *evsel)
+{
+   int n;
+   volatile int tmp = 0;
+   u64 delta, i, loops = 1000;
+   struct perf_counts_values counts = { .val = 0 };
+
+   for (n = 0; n < 6; n++) {
+   u64 stamp, now;
+
+   perf_evsel__read(evsel, 0, 0, &counts);
+   stamp = counts.val;
+
+   for (i = 0; i < loops; i++)
+   tmp++;
+
+   perf_evsel__read(evsel, 0, 0, &counts);
+   now = counts.val;
+   loops *= 10;
+
+   delta = now - stamp;
+   pr_debug("%14d: %14llu\n", n, (long long)delta);
+
+   if (!delta)
+   break;
+   }
+   return delta ? 0 : -1;
+}
+
+static struct perf_pmu *pmu_for_cpu(int cpu)
+{
+   int acpu, idx;
+   struct perf_pmu *pmu = NULL;
+
+   while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+   if (pmu->is_uncore)
+   continue;
+   perf_cpu_map__for_each_cpu(acpu, idx, pmu->cpus)
+   if (acpu == cpu)
+   return pmu;
+   }
+   return NULL;
+}
+
+static bool pmu_is_homogeneous(void)
+{
+   int core_cnt = 0;
+   struct perf_pmu *pmu = NULL;
+
+   while ((pmu = perf_pmu__scan(pmu)) != NULL) {
+   if (!pmu->is_uncore && !perf_cpu_map__empty(pmu->cpus))
+   core_cnt++;
+   }
+   return core_cnt == 1;
+}
+
+static int libperf_print(enum libperf_print_level level,
+const char *fmt, va_list ap)
+{
+   (void)level;
+   return vfprintf(stderr, fmt, ap);
+}
+
+static struct perf_evsel *perf_init(struct perf_event_attr *attr)
+{
+   int err;
+   struct perf_thread_map *threads;
+   struct perf_evsel *evsel;
+   struct perf_event_mmap_page *pc;
+
+   libperf_init(libperf_print);
+
+   threads = perf_thread_map__new_dummy();
+   if (!threads) {
+   pr_err("failed to create threads\n");
+   return NULL;
+   }
+
+   perf_thread_map__set_pid(threads, 0, 0);
+
+   evsel = perf_evsel__new(attr);
+   if (!evsel) {
+   

[PATCH v7 8/9] perf: arm64: Add tests for 32-bit and 64-bit counter size userspace access

2021-04-19 Thread Rob Herring
Add arm64 specific tests for 32-bit and 64-bit counter user access. On
arm64, counters default to 32-bit unless attr.config1:0 is set to 1. In
order to enable user access for 64-bit counters, attr.config1 must be set
to 3.

Signed-off-by: Rob Herring 
---
v6:
 - New patch
---
 tools/perf/arch/arm64/include/arch-tests.h |  5 +++
 tools/perf/arch/arm64/tests/arch-tests.c   |  8 +
 tools/perf/arch/arm64/tests/user-events.c  | 38 ++
 3 files changed, 51 insertions(+)

diff --git a/tools/perf/arch/arm64/include/arch-tests.h 
b/tools/perf/arch/arm64/include/arch-tests.h
index 380ad34a3f09..ddfa7460e1e1 100644
--- a/tools/perf/arch/arm64/include/arch-tests.h
+++ b/tools/perf/arch/arm64/include/arch-tests.h
@@ -15,5 +15,10 @@ extern struct test arch_tests[];
 int test__rd_pinned(struct test __maybe_unused *test,
   int __maybe_unused subtest);
 
+int test__rd_64bit(struct test __maybe_unused *test,
+  int __maybe_unused subtest);
+
+int test__rd_32bit(struct test __maybe_unused *test,
+  int __maybe_unused subtest);
 
 #endif
diff --git a/tools/perf/arch/arm64/tests/arch-tests.c 
b/tools/perf/arch/arm64/tests/arch-tests.c
index 80ce7bd3c16d..bbdb81aa3229 100644
--- a/tools/perf/arch/arm64/tests/arch-tests.c
+++ b/tools/perf/arch/arm64/tests/arch-tests.c
@@ -14,6 +14,14 @@ struct test arch_tests[] = {
.desc = "Pinned CPU user counter access",
.func = test__rd_pinned,
},
+   {
+   .desc = "User 64-bit counter access",
+   .func = test__rd_64bit,
+   },
+   {
+   .desc = "User 32-bit counter access",
+   .func = test__rd_32bit,
+   },
{
.func = NULL,
},
diff --git a/tools/perf/arch/arm64/tests/user-events.c 
b/tools/perf/arch/arm64/tests/user-events.c
index c8efc6b369e6..546323b5242c 100644
--- a/tools/perf/arch/arm64/tests/user-events.c
+++ b/tools/perf/arch/arm64/tests/user-events.c
@@ -175,3 +175,41 @@ int test__rd_pinned(struct test __maybe_unused *test,
perf_evsel__delete(evsel);
return ret;
 }
+
+static int test__rd_counter_size(struct test __maybe_unused *test,
+int config1)
+{
+   int ret;
+   struct perf_evsel *evsel;
+   struct perf_event_attr attr = {
+   .type = PERF_TYPE_HARDWARE,
+   .config = PERF_COUNT_HW_INSTRUCTIONS,
+   .config1 = config1,
+   .exclude_kernel = 1,
+   };
+
+   if (!pmu_is_homogeneous())
+   return TEST_SKIP;
+
+   evsel = perf_init();
+   if (!evsel)
+   return -1;
+
+   ret = run_test(evsel);
+
+   perf_evsel__close(evsel);
+   perf_evsel__delete(evsel);
+   return ret;
+}
+
+int test__rd_64bit(struct test __maybe_unused *test,
+  int __maybe_unused subtest)
+{
+   return test__rd_counter_size(test, 0x3);
+}
+
+int test__rd_32bit(struct test __maybe_unused *test,
+  int __maybe_unused subtest)
+{
+   return test__rd_counter_size(test, 0x2);
+}
-- 
2.27.0



[PATCH v7 6/9] libperf: Add arm64 support to perf_mmap__read_self()

2021-04-19 Thread Rob Herring
Add the arm64 variants for read_perf_counter() and read_timestamp().
Unfortunately the counter number is encoded into the instruction, so the
code is a bit verbose to enumerate all possible counters.

Signed-off-by: Rob Herring 
---
v7:
 - Move enabling of libperf user read test for arm64 to this patch
---
 tools/lib/perf/mmap.c | 98 +++
 tools/lib/perf/tests/test-evsel.c |  2 +-
 2 files changed, 99 insertions(+), 1 deletion(-)

diff --git a/tools/lib/perf/mmap.c b/tools/lib/perf/mmap.c
index 915469f00cf4..216e6c6212a2 100644
--- a/tools/lib/perf/mmap.c
+++ b/tools/lib/perf/mmap.c
@@ -13,6 +13,7 @@
 #include 
 #include 
 #include 
+#include 
 #include "internal.h"
 
 void perf_mmap__init(struct perf_mmap *map, struct perf_mmap *prev,
@@ -294,6 +295,103 @@ static u64 read_timestamp(void)
 
return low | ((u64)high) << 32;
 }
+#elif defined(__aarch64__)
+#define read_sysreg(r) ({  \
+   u64 __val;  \
+   asm volatile("mrs %0, " __stringify(r) : "=r" (__val)); \
+   __val;  \
+})
+
+static u64 read_pmccntr(void)
+{
+   return read_sysreg(pmccntr_el0);
+}
+
+#define PMEVCNTR_READ(idx) \
+   static u64 read_pmevcntr_##idx(void) {  \
+   return read_sysreg(pmevcntr##idx##_el0);\
+   }
+
+PMEVCNTR_READ(0);
+PMEVCNTR_READ(1);
+PMEVCNTR_READ(2);
+PMEVCNTR_READ(3);
+PMEVCNTR_READ(4);
+PMEVCNTR_READ(5);
+PMEVCNTR_READ(6);
+PMEVCNTR_READ(7);
+PMEVCNTR_READ(8);
+PMEVCNTR_READ(9);
+PMEVCNTR_READ(10);
+PMEVCNTR_READ(11);
+PMEVCNTR_READ(12);
+PMEVCNTR_READ(13);
+PMEVCNTR_READ(14);
+PMEVCNTR_READ(15);
+PMEVCNTR_READ(16);
+PMEVCNTR_READ(17);
+PMEVCNTR_READ(18);
+PMEVCNTR_READ(19);
+PMEVCNTR_READ(20);
+PMEVCNTR_READ(21);
+PMEVCNTR_READ(22);
+PMEVCNTR_READ(23);
+PMEVCNTR_READ(24);
+PMEVCNTR_READ(25);
+PMEVCNTR_READ(26);
+PMEVCNTR_READ(27);
+PMEVCNTR_READ(28);
+PMEVCNTR_READ(29);
+PMEVCNTR_READ(30);
+
+/*
+ * Read a value direct from PMEVCNTR
+ */
+static u64 read_perf_counter(unsigned int counter)
+{
+   static u64 (* const read_f[])(void) = {
+   read_pmevcntr_0,
+   read_pmevcntr_1,
+   read_pmevcntr_2,
+   read_pmevcntr_3,
+   read_pmevcntr_4,
+   read_pmevcntr_5,
+   read_pmevcntr_6,
+   read_pmevcntr_7,
+   read_pmevcntr_8,
+   read_pmevcntr_9,
+   read_pmevcntr_10,
+   read_pmevcntr_11,
+   read_pmevcntr_12,
+   read_pmevcntr_13,
+   read_pmevcntr_14,
+   read_pmevcntr_15,
+   read_pmevcntr_16,
+   read_pmevcntr_17,
+   read_pmevcntr_18,
+   read_pmevcntr_19,
+   read_pmevcntr_20,
+   read_pmevcntr_21,
+   read_pmevcntr_22,
+   read_pmevcntr_23,
+   read_pmevcntr_24,
+   read_pmevcntr_25,
+   read_pmevcntr_26,
+   read_pmevcntr_27,
+   read_pmevcntr_28,
+   read_pmevcntr_29,
+   read_pmevcntr_30,
+   read_pmccntr
+   };
+
+   if (counter < ARRAY_SIZE(read_f))
+   return (read_f[counter])();
+
+   return 0;
+}
+
+static u64 read_timestamp(void) { return read_sysreg(cntvct_el0); }
+
 #else
 static u64 read_perf_counter(unsigned int counter) { return 0; }
 static u64 read_timestamp(void) { return 0; }
diff --git a/tools/lib/perf/tests/test-evsel.c 
b/tools/lib/perf/tests/test-evsel.c
index 288b5feaefe2..670fcdb6d6af 100644
--- a/tools/lib/perf/tests/test-evsel.c
+++ b/tools/lib/perf/tests/test-evsel.c
@@ -148,7 +148,7 @@ static int test_stat_user_read(int event)
 
pc = perf_evsel__mmap_base(evsel, 0, 0);
 
-#if defined(__i386__) || defined(__x86_64__)
+#if defined(__i386__) || defined(__x86_64__) || defined(__aarch64__)
__T("userspace counter access not supported", pc->cap_user_rdpmc);
__T("userspace counter access not enabled", pc->index);
__T("userspace counter width not set", pc->pmc_width >= 32);
-- 
2.27.0



[PATCH v7 5/9] arm64: perf: Add userspace counter access disable switch

2021-04-19 Thread Rob Herring
Like x86, some users may want to disable userspace PMU counter access
altogether. Add a sysfs 'rdpmc' file to control userspace counter
access. The default is '1', which means enabled. Writing '0' disables
access.

In the case of multiple PMUs (i.e. big.LITTLE), the control is per PMU
and userspace must disable access on each PMU.

Note that x86 also supports writing '2' to globally enable user access.
As there's no existing userspace support to worry about, this shouldn't
be necessary for Arm. It could be added later if the need arises.
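
As a usage illustration, access could be disabled from userspace with
something like the sketch below; the sysfs device name (e.g.
"armv8_pmuv3_0") varies by platform, so the path built here is an
assumption:

#include <fcntl.h>
#include <stdio.h>
#include <unistd.h>

/* Write '0' to the per-PMU rdpmc attribute to disable user access. */
static int disable_user_counter_access(const char *pmu_name)
{
        char path[128];
        int fd, ret;

        snprintf(path, sizeof(path),
                 "/sys/bus/event_source/devices/%s/rdpmc", pmu_name);
        fd = open(path, O_WRONLY);
        if (fd < 0)
                return -1;
        ret = (write(fd, "0", 1) == 1) ? 0 : -1;
        close(fd);
        return ret;
}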

Signed-off-by: Rob Herring 
---
 arch/arm64/kernel/perf_event.c | 61 --
 include/linux/perf/arm_pmu.h   |  4 ++-
 2 files changed, 61 insertions(+), 4 deletions(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index bfbb7f449aca..1ab6308ca89c 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -336,6 +336,54 @@ static const struct attribute_group 
armv8_pmuv3_caps_attr_group = {
.attrs = armv8_pmuv3_caps_attrs,
 };
 
+static void armv8pmu_disable_user_access(void *mm);
+
+static ssize_t get_attr_rdpmc(struct device *dev,
+ struct device_attribute *attr,
+ char *buf)
+{
+   struct pmu *pmu = dev_get_drvdata(dev);
+   struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
+
+   return snprintf(buf, 40, "%d\n", cpu_pmu->attr_rdpmc);
+}
+
+static ssize_t set_attr_rdpmc(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct pmu *pmu = dev_get_drvdata(dev);
+   struct arm_pmu *cpu_pmu = container_of(pmu, struct arm_pmu, pmu);
+   unsigned long val;
+   ssize_t ret;
+
+   ret = kstrtoul(buf, 0, &val);
+   if (ret)
+   return ret;
+
+   if (val > 1)
+   return -EINVAL;
+
+   if (val != cpu_pmu->attr_rdpmc) {
+   cpu_pmu->attr_rdpmc = !!val;
+   if (!val)
+   on_each_cpu_mask(&cpu_pmu->supported_cpus, armv8pmu_disable_user_access, NULL, 1);
+   }
+
+   return count;
+}
+
+static DEVICE_ATTR(rdpmc, S_IRUSR | S_IWUSR, get_attr_rdpmc, set_attr_rdpmc);
+
+static struct attribute *armv8_pmuv3_rdpmc_attrs[] = {
+   &dev_attr_rdpmc.attr,
+   NULL,
+};
+
+static const struct attribute_group armv8_pmuv3_rdpmc_attr_group = {
+   .attrs = armv8_pmuv3_rdpmc_attrs,
+};
+
 /*
  * Perf Events' indices
  */
@@ -950,7 +998,8 @@ static void armv8pmu_sched_task(struct perf_event_context 
*ctx, bool sched_in)
 * If a new task has user access enabled, clear the dirty counters
 * to prevent the potential leak.
 */
-   if (ctx && current->mm && atomic_read(&current->mm->context.pmu_direct_access)) {
+   if (ctx && to_arm_pmu(ctx->pmu)->attr_rdpmc &&
+   current->mm && atomic_read(&current->mm->context.pmu_direct_access)) {
armv8pmu_enable_user_access();
armv8pmu_clear_dirty_counters(to_arm_pmu(ctx->pmu));
} else {
@@ -1093,7 +1142,8 @@ static int __armv8_pmuv3_map_event(struct perf_event 
*event,
   &armv8_pmuv3_perf_cache_map,
   ARMV8_PMU_EVTYPE_EVENT);
 
-   if (armv8pmu_event_want_user_access(event) || 
!armv8pmu_event_is_64bit(event))
+   if (armpmu->attr_rdpmc &&
+   (armv8pmu_event_want_user_access(event) || 
!armv8pmu_event_is_64bit(event)))
event->hw.flags |= ARMPMU_EL0_RD_CNTR;
 
/*
@@ -1218,7 +1268,9 @@ static int armv8pmu_probe_pmu(struct arm_pmu *cpu_pmu)
 
 static int armv8pmu_undef_handler(struct pt_regs *regs, u32 insn)
 {
-   if (atomic_read(&current->mm->context.pmu_direct_access)) {
+   struct arm_pmu *armpmu = *this_cpu_ptr(&cpu_armpmu);
+
+   if (armpmu->attr_rdpmc && atomic_read(&current->mm->context.pmu_direct_access)) {
armv8pmu_enable_user_access();
return 0;
}
@@ -1277,6 +1329,9 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char 
*name,
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_CAPS] = caps ?
caps : &armv8_pmuv3_caps_attr_group;
 
+   cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_RDPMC] = &armv8_pmuv3_rdpmc_attr_group;
+   cpu_pmu->attr_rdpmc = true;
+
return 0;
 }
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 1daad3b2cce5..9303cd07ac57 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -82,6 +82,7 @@ enum armpmu_attr_groups {
ARMPMU_ATTR_GROUP_EVENTS,
ARMPMU_ATTR_GROUP_FORMATS,
ARMPMU_ATTR_GROUP_CAPS,
+   ARMPMU_ATTR_GROUP_RDPMC,
ARMPMU_NR_ATTR_GROUPS
 };
 
@@ -107,7 +108,8 @@ struct arm_pmu {
int (*map_event)(struct perf_event *event);
int (*filter_match)(struct perf_event *event);
int num_events;
-   bool  

[PATCH v7 3/9] arm64: perf: Enable PMU counter direct access for perf event

2021-04-19 Thread Rob Herring
From: Raphael Gault 

Keep track of events opened with direct access to the hardware counters
and modify permissions while they are open.

The strategy used here is the same one x86 uses: every time an event
is mapped, the permissions are set if required. The atomic field added
in the mm_context helps keep track of the different events opened and
de-activates the permissions when all are unmapped.
We also need to update the permissions in the context switch code so
that tasks keep the right permissions.

In order to enable 64-bit counters for userspace when available, a new
config1 bit is added for userspace to indicate it wants userspace counter
access. This bit allows the kernel to decide if chaining should be
disabled, as chaining and userspace access are incompatible.
The modes for config1 are as follows:

config1 = 0 or 2 : user access enabled and always 32-bit
config1 = 1 : user access disabled and always 64-bit (using chaining if needed)
config1 = 3 : user access enabled and counter size matches underlying counter.

User access is enabled with config1 == 0 so that we match x86 behavior
and don't need Arm specific code (outside the counter read).
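
A quick sketch of how a self-monitoring task would request the config1 = 3
mode from the table above; the event choice mirrors the perf test added
later in this series, and the rest is illustrative:

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Request user access together with a full-width counter (config1 = 3). */
static int open_user_rdpmc_event(void)
{
        struct perf_event_attr attr = {
                .size           = sizeof(attr),
                .type           = PERF_TYPE_HARDWARE,
                .config         = PERF_COUNT_HW_INSTRUCTIONS,
                /* config1 bit 0: "long" (64-bit), bit 1: "rdpmc" (user access) */
                .config1        = 0x3,
                .exclude_kernel = 1,
        };

        return syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
}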

Signed-off-by: Raphael Gault 
Signed-off-by: Rob Herring 
---
Peter Z says (event->oncpu == smp_processor_id()) in the user page
update is always true, but my testing says otherwise[1].

v7:
 - Clear disabled counters when user access is enabled for a task to
   avoid leaking other tasks counter data.
 - Rework context switch handling utilizing sched_task callback
 - Add armv8pmu_event_can_chain() helper
 - Rework config1 flags handling structure
 - Use ARMV8_IDX_CYCLE_COUNTER_USER define for remapped user cycle
   counter index

v6:
 - Add new attr.config1 rdpmc bit for userspace to hint it wants
   userspace access when also requesting 64-bit counters.

v5:
 - Only set cap_user_rdpmc if event is on current cpu
 - Limit enabling/disabling access to CPUs associated with the PMU
   (supported_cpus) and with the mm_struct matching current->active_mm.

v2:
 - Move mapped/unmapped into arm64 code. Fixes arm32.
 - Rebase on cap_user_time_short changes

Changes from Raphael's v4:
  - Drop homogeneous check
  - Disable access for chained counters
  - Set pmc_width in user page

[1] 
https://lore.kernel.org/lkml/cal_jsqk+ekef5navnbfarcjre3myhfbfe54f9yhkbstnwql...@mail.gmail.com/
---
 arch/arm64/include/asm/mmu.h   |   5 +
 arch/arm64/kernel/perf_event.c | 179 +++--
 include/linux/perf/arm_pmu.h   |   6 ++
 3 files changed, 183 insertions(+), 7 deletions(-)

diff --git a/arch/arm64/include/asm/mmu.h b/arch/arm64/include/asm/mmu.h
index 75beffe2ee8a..ee08447455da 100644
--- a/arch/arm64/include/asm/mmu.h
+++ b/arch/arm64/include/asm/mmu.h
@@ -18,6 +18,11 @@
 
 typedef struct {
atomic64_t  id;
+   /*
+* non-zero if userspace has direct access to the
+* hardware counters.
+*/
+   atomic_t pmu_direct_access;
 #ifdef CONFIG_COMPAT
void*sigpage;
 #endif
diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 40cf59455ce8..bfbb7f449aca 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -8,9 +8,11 @@
  * This code is based heavily on the ARMv7 perf event code.
  */
 
+#include 
 #include 
 #include 
 #include 
+#include 
 #include 
 
 #include 
@@ -288,15 +290,22 @@ static const struct attribute_group 
armv8_pmuv3_events_attr_group = {
 
 PMU_FORMAT_ATTR(event, "config:0-15");
 PMU_FORMAT_ATTR(long, "config1:0");
+PMU_FORMAT_ATTR(rdpmc, "config1:1");
 
 static inline bool armv8pmu_event_is_64bit(struct perf_event *event)
 {
return event->attr.config1 & 0x1;
 }
 
+static inline bool armv8pmu_event_want_user_access(struct perf_event *event)
+{
+   return event->attr.config1 & 0x2;
+}
+
 static struct attribute *armv8_pmuv3_format_attrs[] = {
&format_attr_event.attr,
&format_attr_long.attr,
+   &format_attr_rdpmc.attr,
NULL,
 };
 
@@ -344,6 +353,15 @@ static bool armv8pmu_has_long_event(struct arm_pmu 
*cpu_pmu)
return (cpu_pmu->pmuver >= ID_AA64DFR0_PMUVER_8_5);
 }
 
+static inline bool armv8pmu_event_can_chain(struct perf_event *event)
+{
+   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
+
+   return !armv8pmu_event_want_user_access(event) &&
+  armv8pmu_event_is_64bit(event) &&
+  !armv8pmu_has_long_event(cpu_pmu);
+}
+
 /*
  * We must chain two programmable counters for 64 bit events,
  * except when we have allocated the 64bit cycle counter (for CPU
@@ -353,11 +371,9 @@ static bool armv8pmu_has_long_event(struct arm_pmu 
*cpu_pmu)
 static inline bool armv8pmu_event_is_chained(struct perf_event *event)
 {
int idx = event->hw.idx;
-   struct arm_pmu *cpu_pmu = to_arm_pmu(event->pmu);
 
return !WARN_ON(idx < 0) &&
-  armv8pmu_event_is_64bit(event) &&
-  !armv8pmu_has_long_event(cpu_pmu) &&
+  

[PATCH v7 4/9] drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu

2021-04-19 Thread Rob Herring
The arm64 PMU driver needs to retrieve the struct arm_pmu pointer for
the current CPU. As the common code already maintains this with the
percpu cpu_armpmu, let's make it global.

Signed-off-by: Rob Herring 
---
 drivers/perf/arm_pmu.c   | 2 +-
 include/linux/perf/arm_pmu.h | 2 ++
 2 files changed, 3 insertions(+), 1 deletion(-)

diff --git a/drivers/perf/arm_pmu.c b/drivers/perf/arm_pmu.c
index 2d10d84fb79c..6ac56280b9c7 100644
--- a/drivers/perf/arm_pmu.c
+++ b/drivers/perf/arm_pmu.c
@@ -99,7 +99,7 @@ static const struct pmu_irq_ops percpu_pmunmi_ops = {
.free_pmuirq = armpmu_free_percpu_pmunmi
 };
 
-static DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
+DEFINE_PER_CPU(struct arm_pmu *, cpu_armpmu);
 static DEFINE_PER_CPU(int, cpu_irq);
 static DEFINE_PER_CPU(const struct pmu_irq_ops *, cpu_irq_ops);
 
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 63af052e3909..1daad3b2cce5 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -127,6 +127,8 @@ struct arm_pmu {
 
 #define to_arm_pmu(p) (container_of(p, struct arm_pmu, pmu))
 
+DECLARE_PER_CPU(struct arm_pmu *, cpu_armpmu);
+
 u64 armpmu_event_update(struct perf_event *event);
 
 int armpmu_event_set_period(struct perf_event *event);
-- 
2.27.0



[PATCH v7 2/9] arm64: pmu: Add function implementation to update event index in userpage

2021-04-19 Thread Rob Herring
From: Raphael Gault 

In order to be able to access the counter directly from userspace,
we need to provide the index of the counter using the userpage.
We thus need to override the event_idx function to retrieve and
convert the perf_event index to the armv8 hardware index.

Since the arm_pmu driver can be used by any implementation, even
if not armv8, two components play a role in making sure the
behaviour is correct and consistent with the PMU capabilities:

* the ARMPMU_EL0_RD_CNTR flag which denotes the capability to access
counter from userspace.
* the event_idx call back, which is implemented and initialized by
the PMU implementation: if no callback is provided, the default
behaviour applies, returning 0 as index value.
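
To show how that index is consumed, here is a sketch of the standard
lock-free read loop over the mmap'd user page; read_hw_counter() is a
placeholder for the mrs-based counter read added later in this series:

#include <linux/perf_event.h>
#include <stdint.h>

extern uint64_t read_hw_counter(uint32_t idx);  /* placeholder: mrs pmevcntr<idx>_el0 */

/* Read the event value using the index and offset exposed in the user page. */
static uint64_t read_user_counter(struct perf_event_mmap_page *pc)
{
        uint32_t seq, idx, width;
        uint64_t count, pmc;

        do {
                seq = pc->lock;
                __asm__ volatile("" ::: "memory");      /* compiler barrier */
                idx = pc->index;
                count = pc->offset;
                if (pc->cap_user_rdpmc && idx) {
                        width = pc->pmc_width;
                        pmc = read_hw_counter(idx - 1);
                        /* sign-extend a possibly 32-bit counter value */
                        pmc <<= 64 - width;
                        pmc = (uint64_t)((int64_t)pmc >> (64 - width));
                        count += pmc;
                }
                __asm__ volatile("" ::: "memory");
        } while (pc->lock != seq);

        return count;
}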

Signed-off-by: Raphael Gault 
Signed-off-by: Rob Herring 
---
v7:
 - Add define ARMV8_IDX_CYCLE_COUNTER_USER for userspace index
   of cycle counter
---
 arch/arm64/kernel/perf_event.c | 20 +++-
 include/linux/perf/arm_pmu.h   |  2 ++
 2 files changed, 21 insertions(+), 1 deletion(-)

diff --git a/arch/arm64/kernel/perf_event.c b/arch/arm64/kernel/perf_event.c
index 4658fcf88c2b..40cf59455ce8 100644
--- a/arch/arm64/kernel/perf_event.c
+++ b/arch/arm64/kernel/perf_event.c
@@ -332,7 +332,7 @@ static const struct attribute_group 
armv8_pmuv3_caps_attr_group = {
  */
 #define	ARMV8_IDX_CYCLE_COUNTER	0
 #define	ARMV8_IDX_COUNTER0	1
-
+#define	ARMV8_IDX_CYCLE_COUNTER_USER	32
 
 /*
  * We unconditionally enable ARMv8.5-PMU long event counter support
@@ -871,6 +871,22 @@ static void armv8pmu_clear_event_idx(struct pmu_hw_events 
*cpuc,
clear_bit(idx - 1, cpuc->used_mask);
 }
 
+static int armv8pmu_access_event_idx(struct perf_event *event)
+{
+   if (!(event->hw.flags & ARMPMU_EL0_RD_CNTR))
+   return 0;
+
+   /*
+* We remap the cycle counter index to 32 to
+* match the offset applied to the rest of
+* the counter indices.
+*/
+   if (event->hw.idx == ARMV8_IDX_CYCLE_COUNTER)
+   return ARMV8_IDX_CYCLE_COUNTER_USER;
+
+   return event->hw.idx;
+}
+
 /*
  * Add an event filter to a given event.
  */
@@ -1098,6 +1114,8 @@ static int armv8_pmu_init(struct arm_pmu *cpu_pmu, char 
*name,
cpu_pmu->set_event_filter   = armv8pmu_set_event_filter;
cpu_pmu->filter_match   = armv8pmu_filter_match;
 
+   cpu_pmu->pmu.event_idx  = armv8pmu_access_event_idx;
+
cpu_pmu->name   = name;
cpu_pmu->map_event  = map_event;
cpu_pmu->attr_groups[ARMPMU_ATTR_GROUP_EVENTS] = events ?
diff --git a/include/linux/perf/arm_pmu.h b/include/linux/perf/arm_pmu.h
index 505480217cf1..d29aa981d989 100644
--- a/include/linux/perf/arm_pmu.h
+++ b/include/linux/perf/arm_pmu.h
@@ -26,6 +26,8 @@
  */
 /* Event uses a 64bit counter */
 #define ARMPMU_EVT_64BIT   1
+/* Allow access to hardware counter from userspace */
+#define ARMPMU_EL0_RD_CNTR 2
 
 #define HW_OP_UNSUPPORTED  0xffff
 #define C(_x)  PERF_COUNT_HW_CACHE_##_x
-- 
2.27.0



[PATCH v7 0/9] arm64 userspace counter access support

2021-04-19 Thread Rob Herring
Hi all,

Another version of arm64 userspace counter access support. I sent out
the libperf bits separately and those are now applied (Thanks!), so this
is just the arm64 bits.


This originally resurrected Raphael's series[1] to enable userspace counter
access on arm64. My previous versions are here[2][3][4][5][6][7]. A git
branch is here[8].


Changes in v7:
 - Handling of dirty counter leakage and reworking of context switch and
   user access enabling. The .sched_task hook and undef instruction handler
   are now utilized. (Patch 3)
 - Add a userspace disable switch like x86. (Patch 5)

Changes in v6:
 - Reworking of the handling of 64-bit counters and user access. There's
   a new config1 flag to request user access. This takes priority over
   the 64-bit flag and the user will get the maximum size the h/w
   supports without chaining.
 - The libperf evsel mmap struct is stored in its own xyarray
 - New tests for user 64-bit and 32-bit counters
 - Rebase to v5.12-rc2

Changes in v5:
 - Limit enabling/disabling access to CPUs associated with the PMU
   (supported_cpus) and with the mm_struct matching current->active_mm.
   The x86 method of using mm_cpumask doesn't work for arm64 as it is not
   updated.
 - Only set cap_user_rdpmc if event is on current cpu. See patch 2.
 - Create an mmap for every event in an evsel. This results in some changes
   to the libperf mmap API from the last version.
 - Rebase to v5.11-rc2

Changes in v4:
 - Dropped 'arm64: pmu: Add hook to handle pmu-related undefined instructions'.
   The onus is on userspace to pin itself to a homogeneous subset of CPUs
   and avoid any aborts on heterogeneous systems, so the hook is not needed.
 - Make perf_evsel__mmap() take pages rather than bytes for size
 - Fix building arm64 heterogeneous test.

Changes in v3:
 - Dropped removing x86 rdpmc test until libperf tests can run via 'perf test'
 - Added verbose prints for tests
 - Split adding perf_evsel__mmap() to separate patch

The following changes to the arm64 support have been made compared to
Raphael's last version:

The major change is support for heterogeneous systems with some
restrictions. Specifically, userspace must pin itself to like CPUs, open
a specific PMU by type, and use h/w specific events. The tests have been
reworked to demonstrate this.

Chained events are not supported. The problem with supporting chained
events was that there's no way to distinguish between a chained event and a
native 64-bit counter. We could add some flag, but do self-monitoring
processes really need that? Native 64-bit counters are supported if the
PMU h/w has support. As there's already an explicit ABI to request 64-bit
counters, userspace can request 64-bit counters, and if user access is not
enabled, it must retry with 32-bit counters; a sketch of this fallback is
shown below.
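
For illustration only, a sketch of that open-and-retry flow; the config1 bit
assignments (0x1 for a 64-bit counter, 0x2 to request user access) are
assumptions based on this series rather than a documented ABI:

#include <linux/perf_event.h>
#include <sys/mman.h>
#include <sys/syscall.h>
#include <unistd.h>

/* Open a counter with direct user access, preferring a 64-bit counter
 * and falling back to 32 bits when the combination is not granted. */
static int open_rdpmc_event(__u32 pmu_type, __u64 config,
			    struct perf_event_mmap_page **pcp)
{
	struct perf_event_attr attr = {
		.size    = sizeof(attr),
		.type    = pmu_type,	/* open a specific PMU by type */
		.config  = config,	/* h/w specific event */
		.config1 = 0x3,		/* user access + 64-bit (assumed bits) */
	};
	long psz = sysconf(_SC_PAGESIZE);
	struct perf_event_mmap_page *pc;
	int fd;

retry:
	fd = syscall(__NR_perf_event_open, &attr, 0, -1, -1, 0);
	if (fd < 0)
		return -1;
	pc = mmap(NULL, psz, PROT_READ, MAP_SHARED, fd, 0);
	if (pc == MAP_FAILED) {
		close(fd);
		return -1;
	}
	if (!pc->cap_user_rdpmc && attr.config1 == 0x3) {
		/* 64-bit + user access not granted: retry with 32 bits */
		munmap(pc, psz);
		close(fd);
		attr.config1 = 0x2;
		goto retry;
	}
	*pcp = pc;
	return fd;
}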

Prior versions broke the build on arm32 (surprisingly never caught by
0-day). As a result, event_mapped and event_unmapped implementations have
been moved into the arm64 code.

There was a bug in that pmc_width was not set in the user page. The tests
now check for this.

The documentation has been converted to rST. I've added sections on
chained events and heterogeneous systems.

The tests have been expanded to test the cycle counter access.

Rob

[1] https://lore.kernel.org/r/20190822144220.27860-1-raphael.ga...@arm.com/
[2] https://lore.kernel.org/r/20200707205333.624938-1-r...@kernel.org/
[3] https://lore.kernel.org/r/20200828205614.3391252-1-r...@kernel.org/
[4] https://lore.kernel.org/r/20200911215118.2887710-1-r...@kernel.org/
[5] https://lore.kernel.org/r/20201001140116.651970-1-r...@kernel.org/
[6] https://lore.kernel.org/r/20210114020605.3943992-1-r...@kernel.org/
[7] https://lore.kernel.org/r/20210311000837.3630499-1-r...@kernel.org/
[8] git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git 
user-perf-event-v7

Raphael Gault (4):
  arm64: Restrict undef hook for cpufeature registers
  arm64: pmu: Add function implementation to update event index in
userpage
  arm64: perf: Enable PMU counter direct access for perf event
  Documentation: arm64: Document PMU counters access from userspace

Rob Herring (5):
  drivers/perf: arm_pmu: Export the per_cpu cpu_armpmu
  arm64: perf: Add userspace counter access disable switch
  libperf: Add arm64 support to perf_mmap__read_self()
  perf: arm64: Add test for userspace counter access on heterogeneous
systems
  perf: arm64: Add tests for 32-bit and 64-bit counter size userspace
access

 Documentation/arm64/perf.rst   |  67 +-
 arch/arm64/include/asm/mmu.h   |   5 +
 arch/arm64/kernel/cpufeature.c |   4 +-
 arch/arm64/kernel/perf_event.c | 254 -
 drivers/perf/arm_pmu.c |   2 +-
 include/linux/perf/arm_pmu.h   |  14 +-
 tools/lib/perf/mmap.c  |  98 
 tools/lib/perf/tests/test-evsel.c  |   2 +-
 tools/perf/arch/arm64/include/arch-tests.h |  12 +
 

[PATCH v7 1/9] arm64: Restrict undef hook for cpufeature registers

2021-04-19 Thread Rob Herring
From: Raphael Gault 

This commit narrows the mask of the mrs_hook declared in
arch/arm64/kernel/cpufeature.c, which emulates feature register
access only. The hook's mask was previously too broad and matched any
mrs instruction, even those unrelated to the emulated registers, which
made the PMU emulation inefficient.

Signed-off-by: Raphael Gault 
Signed-off-by: Rob Herring 
---
v7:
 - Split off from Raphael's original undef hook patch as this change stands
   on its own.
---
 arch/arm64/kernel/cpufeature.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

diff --git a/arch/arm64/kernel/cpufeature.c b/arch/arm64/kernel/cpufeature.c
index 066030717a4c..aa0777690ab1 100644
--- a/arch/arm64/kernel/cpufeature.c
+++ b/arch/arm64/kernel/cpufeature.c
@@ -2888,8 +2888,8 @@ static int emulate_mrs(struct pt_regs *regs, u32 insn)
 }

 static struct undef_hook mrs_hook = {
-   .instr_mask = 0xfff00000,
-   .instr_val  = 0xd5300000,
+   .instr_mask = 0xffff0000,
+   .instr_val  = 0xd5380000,
.pstate_mask = PSR_AA32_MODE_MASK,
.pstate_val = PSR_MODE_EL0t,
.fn = emulate_mrs,
--
2.27.0


Re: [PATCH] hrtimer: Update softirq_expires_next correctly after __hrtimer_get_next_event()

2021-04-19 Thread Maciej Żenczykowski
On Thu, Apr 15, 2021 at 9:47 AM Thomas Gleixner  wrote:
>
> On Wed, Apr 14 2021 at 11:49, Lorenzo Colitti wrote:
> > On Wed, Apr 14, 2021 at 2:14 AM Greg KH  wrote:
> >> To give context, the commit is now 46eb1701c046 ("hrtimer: Update
> >> softirq_expires_next correctly after __hrtimer_get_next_event()") and is
> >> attached below.
> >>
> >> The f_ncm.c driver is doing a lot of calls to hrtimer_start() with mode
> >> HRTIMER_MODE_REL_SOFT for I think every packet it gets.  If that should
> >> not be happening, we can easily fix it but I guess no one has actually
> >> had fast USB devices that noticed this until now :)
> >
> > AIUI the purpose of this timer is to support packet aggregation. USB
> > transfers are relatively expensive/high latency. 6 Gbps is 500k
> > 1500-byte packets per second, or one every 2us. So f_ncm buffers as
> > many packets as will fit into 16k (usually, 10 1500-byte packets), and
> > only initiates a USB transfer when those packets have arrived. That
> > ends up doing only one transfer every 20us. It sets a 300us timer to
> > ensure that if the 10 packets haven't arrived, it still sends out
> > whatever it has when the timer fires. The timer is set/updated on
> > every packet buffered by ncm.
> >
> > Is this somehow much more expensive in 5.10.24 than it was before?
> > Even if this driver is somehow "holding it wrong", might there not be
> > other workloads that have a similar problem? What about regressions on
> > those workloads?
>
> Let's put the question of whether this hrtimer usage is sensible or not
> aside for now.
>
> I stared at the change for a while and did some experiments to recreate
> the problem, but that didn't get me anywhere.
>
> Could you please do the following?
>
> Enable tracing and enable the following tracepoints:
>
> timers/hrtimer_cancel
> timers/hrtimer_start
> timers/hrtimer_expire_entry
> irq/softirq_raise
> irq/softirq_enter
> irq/softirq_exit
>
> and function tracing filtered on ncm_wrap_ntb() and
> package_for_tx() only (to reduce the noise).
>
> Run the test on a kernels with and without that commit and collect trace
> data for both.
>
> That should give me a pretty clear picture what's going on.

Lorenzo is trying to get the traces you asked for, or rather he’s
trying to get confirmation he can post them.

Our initial observation of these results seems to suggest that
updating the timer (hrtimer_start, which seems to also call
hrtimer_cancel) takes twice as long as it used to.

My gut feeling is that softirq_activated is usually false, and the old
code in such a case calls just __hrtimer_get_next_event(cpu_base,
HRTIMER_ACTIVE_ALL).  The new code will first call
__hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_SOFT) and then
__hrtimer_get_next_event(cpu_base, HRTIMER_ACTIVE_HARD).

Perhaps __hrtimer_get_next_event() should return both soft and hard
event times in one function call?
Or perhaps hrtimer_start should not call hrtimer_cancel?

We've also been looking at the f_ncm driver itself, and trying to
reduce the number of hrtimer_cancel/start calls.
It's pretty easy to reduce this by a factor of 10 (we're not yet 100%
certain this is race free), which does drastically improve
performance; a rough sketch of the idea is below.
However, it still only takes the regression from 60% to 20%.
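
For illustration, a minimal sketch of the kind of change we are testing in
f_ncm; ncm->task_timer and TX_TIMEOUT_NSECS are the names used in
drivers/usb/gadget/function/f_ncm.c, and the approach has not been validated
for races:

/* Arm the aggregation flush timer only when it is not already pending,
 * so the 300us deadline is set once per batch instead of being restarted
 * for every packet. */
static void ncm_kick_flush_timer(struct f_ncm *ncm)
{
	if (!hrtimer_is_queued(&ncm->task_timer))
		hrtimer_start(&ncm->task_timer, TX_TIMEOUT_NSECS,
			      HRTIMER_MODE_REL_SOFT);
}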

- Maciej


Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-19 Thread Matthew Wilcox
On Tue, Apr 20, 2021 at 02:48:17AM +, Vineet Gupta wrote:
> > 32-bit architectures which expect 8-byte alignment for 8-byte integers
> > and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> > page inadvertently expanded in 2019.
> 
> FWIW, ARC doesn't require 8 byte alignment for 8 byte integers. This is 
> only needed for 8-byte atomics due to the requirements of LLOCKD/SCOND 
> instructions.

Ah, like x86?  OK, great, I'll drop your arch from the list of
affected.  Thanks!


RE: [EXT] Re: [net-next] net: dsa: felix: disable always guard band bit for TAS config

2021-04-19 Thread Xiaoliang Yang
Hi Vladimir.

On Mon, Apr 19, 2021 at 20:38PM +0800, Vladimir Oltean wrote:
>
>What is a scheduled queue? When time-aware scheduling is enabled on the port, 
>why are some queues scheduled and some not?

The felix vsc9959 device can set the SCH_TRAFFIC_QUEUES field bits to define
which queues are scheduled. Only the queues that are set serve scheduled
traffic. In this driver we set all 8 queues to be scheduled by default, so all
traffic goes to scheduled queues; a sketch of the relevant register write is
below.
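
For reference, this is roughly the write that marks every queue as scheduled
when TAS is configured on a port. The QSYS_TAG_CONFIG field macros below are
quoted from the felix driver from memory and should be treated as assumptions
rather than verified identifiers:

/* Mark all 8 traffic classes as scheduled queues for this port. */
ocelot_rmw_rix(ocelot,
	       QSYS_TAG_CONFIG_ENABLE |
	       QSYS_TAG_CONFIG_INIT_GATE_STATE(0xFF) |
	       QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES(0xFF),
	       QSYS_TAG_CONFIG_ENABLE |
	       QSYS_TAG_CONFIG_INIT_GATE_STATE_M |
	       QSYS_TAG_CONFIG_SCH_TRAFFIC_QUEUES_M,
	       QSYS_TAG_CONFIG, port);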

Thanks,
Xiaoliang Yang


Re: [PATCH V4 1/3] vDPA/ifcvf: deduce VIRTIO device ID when probe

2021-04-19 Thread Jason Wang



On 2021/4/19 2:33 PM, Zhu Lingshan wrote:

This commit deduces the VIRTIO device ID as the device type at probe
time, so that ifcvf_vdpa_get_device_id() can simply return the ID.
ifcvf_vdpa_get_features() and ifcvf_vdpa_get_config_size()
can then work properly based on the device ID.

Signed-off-by: Zhu Lingshan 



Acked-by: Jason Wang 



---
  drivers/vdpa/ifcvf/ifcvf_base.h |  1 +
  drivers/vdpa/ifcvf/ifcvf_main.c | 27 +++
  2 files changed, 16 insertions(+), 12 deletions(-)

diff --git a/drivers/vdpa/ifcvf/ifcvf_base.h b/drivers/vdpa/ifcvf/ifcvf_base.h
index b2eeb16b9c2c..1c04cd256fa7 100644
--- a/drivers/vdpa/ifcvf/ifcvf_base.h
+++ b/drivers/vdpa/ifcvf/ifcvf_base.h
@@ -84,6 +84,7 @@ struct ifcvf_hw {
u32 notify_off_multiplier;
u64 req_features;
u64 hw_features;
+   u32 dev_type;
struct virtio_pci_common_cfg __iomem *common_cfg;
void __iomem *net_cfg;
struct vring_info vring[IFCVF_MAX_QUEUE_PAIRS * 2];
diff --git a/drivers/vdpa/ifcvf/ifcvf_main.c b/drivers/vdpa/ifcvf/ifcvf_main.c
index 44d7586019da..66927ec81fa5 100644
--- a/drivers/vdpa/ifcvf/ifcvf_main.c
+++ b/drivers/vdpa/ifcvf/ifcvf_main.c
@@ -323,19 +323,9 @@ static u32 ifcvf_vdpa_get_generation(struct vdpa_device 
*vdpa_dev)
  
  static u32 ifcvf_vdpa_get_device_id(struct vdpa_device *vdpa_dev)

  {
-   struct ifcvf_adapter *adapter = vdpa_to_adapter(vdpa_dev);
-   struct pci_dev *pdev = adapter->pdev;
-   u32 ret = -ENODEV;
-
-   if (pdev->device < 0x1000 || pdev->device > 0x107f)
-   return ret;
-
-   if (pdev->device < 0x1040)
-   ret =  pdev->subsystem_device;
-   else
-   ret =  pdev->device - 0x1040;
+   struct ifcvf_hw *vf = vdpa_to_vf(vdpa_dev);
  
-	return ret;

+   return vf->dev_type;
  }
  
  static u32 ifcvf_vdpa_get_vendor_id(struct vdpa_device *vdpa_dev)

@@ -466,6 +456,19 @@ static int ifcvf_probe(struct pci_dev *pdev, const struct 
pci_device_id *id)
pci_set_drvdata(pdev, adapter);
  
	vf = &adapter->vf;

+
+   /* This driver drives both modern virtio devices and transitional
+* devices in modern mode.
+* vDPA requires feature bit VIRTIO_F_ACCESS_PLATFORM,
+* so legacy devices and transitional devices in legacy
+* mode will not work for vDPA, this driver will not
+* drive devices with legacy interface.
+*/
+   if (pdev->device < 0x1040)
+   vf->dev_type =  pdev->subsystem_device;
+   else
+   vf->dev_type =  pdev->device - 0x1040;
+
vf->base = pcim_iomap_table(pdev);
  
  	adapter->pdev = pdev;




[RESEND v2] iommu/vt-d: Use passthrough mode for the Intel IPUs

2021-04-19 Thread Bingbu Cao
Intel IPU (Image Processing Unit) has its own (IO)MMU hardware.
The IPU driver allocates its own page table that is not mapped
via the DMA API, and thus the Intel IOMMU driver blocks access, giving
this error:

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [00:05.0] PASID 
  fault addr 76406000 [fault reason 06] PTE Read access is not set

As the IPU is not an external-facing device and is therefore not risky, use
IOMMU passthrough mode for Intel IPUs.

Fixes: 26f5689592e2 ("media: staging/intel-ipu3: mmu: Implement driver")
Signed-off-by: Bingbu Cao 
---
 drivers/iommu/intel/iommu.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee0932307d64..7e2fbdae467e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -55,6 +55,12 @@
 #define IS_GFX_DEVICE(pdev) ((pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY)
 #define IS_USB_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_SERIAL_USB)
 #define IS_ISA_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
+#define IS_INTEL_IPU(pdev) ((pdev)->vendor == PCI_VENDOR_ID_INTEL &&   \
+   ((pdev)->device == 0x9a19 ||\
+(pdev)->device == 0x9a39 ||\
+(pdev)->device == 0x4e19 ||\
+(pdev)->device == 0x465d ||\
+(pdev)->device == 0x1919))
 #define IS_AZALIA(pdev) ((pdev)->vendor == 0x8086 && (pdev)->device == 0x3a3e)
 
 #define IOAPIC_RANGE_START (0xfee0)
@@ -360,6 +366,7 @@ int intel_iommu_enabled = 0;
 EXPORT_SYMBOL_GPL(intel_iommu_enabled);
 
 static int dmar_map_gfx = 1;
+static int dmar_map_ipu = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
@@ -368,6 +375,7 @@ static int iommu_skip_te_disable;
 
 #define IDENTMAP_GFX   2
 #define IDENTMAP_AZALIA	4
+#define IDENTMAP_IPU   8
 
 int intel_iommu_gfx_mapped;
 EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
@@ -2839,6 +2847,9 @@ static int device_def_domain_type(struct device *dev)
 
if ((iommu_identity_mapping & IDENTMAP_GFX) && 
IS_GFX_DEVICE(pdev))
return IOMMU_DOMAIN_IDENTITY;
+
+   if ((iommu_identity_mapping & IDENTMAP_IPU) && 
IS_INTEL_IPU(pdev))
+   return IOMMU_DOMAIN_IDENTITY;
}
 
return 0;
@@ -3278,6 +3289,9 @@ static int __init init_dmars(void)
if (!dmar_map_gfx)
iommu_identity_mapping |= IDENTMAP_GFX;
 
+   if (!dmar_map_ipu)
+   iommu_identity_mapping |= IDENTMAP_IPU;
+
check_tylersburg_isoch();
 
ret = si_domain_init(hw_pass_through);
@@ -5622,6 +5636,18 @@ static void quirk_iommu_igfx(struct pci_dev *dev)
dmar_map_gfx = 0;
 }
 
+static void quirk_iommu_ipu(struct pci_dev *dev)
+{
+   if (!IS_INTEL_IPU(dev))
+   return;
+
+   if (risky_device(dev))
+   return;
+
+   pci_info(dev, "Passthrough IOMMU for integrated Intel IPU\n");
+   dmar_map_ipu = 0;
+}
+
 /* G4x/GM45 integrated gfx dmar support is totally busted. */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e00, quirk_iommu_igfx);
@@ -5657,6 +5683,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1632, 
quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163A, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx);
 
+/* disable IPU dmar support */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_iommu_ipu);
+
 static void quirk_iommu_rwbf(struct pci_dev *dev)
 {
if (risky_device(dev))
-- 
2.7.4



[PATCH] scsi: ufs: Add batched WB buffer flush

2021-04-19 Thread Daejun Park
Currently, the WriteBooster (WB) buffer is always flushed during hibern8.
However, this is inefficient because, due to the spatial locality of the IO
workload, data in the WB buffer may already have been overwritten and thus be
invalid. If the WB buffer is instead flushed in a batched manner, the amount
of data migration and the power consumption can be reduced, because such
overwritten (invalid) WB buffer data no longer needs to be migrated.

This patch adds support for batched flushing of the WB buffer. When batched
flush is enabled, fWriteBoosterBufferFlushDuringHibernate is set during
runtime suspend only when b_rpm_dev_flush_capable is true. When the device is
resumed, fWriteBoosterBufferFlushDuringHibernate is cleared to stop flushing
during hibern8. A sketch of the intended suspend-path behaviour is below.
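
Below is a rough sketch, for illustration only, of the intended
runtime-suspend behaviour; it follows the fields and helpers referenced by
this patch (hba->vps->wb_batched_flush, b_rpm_dev_flush_capable,
ufshcd_wb_toggle_flush_during_h8) and omits error handling:

/* On runtime suspend, only request a flush during hibern8 when batched
 * flush is enabled and the device reports it actually needs one. */
static void ufshcd_wb_batched_flush_suspend(struct ufs_hba *hba)
{
	if (!hba->vps->wb_batched_flush)
		return;

	ufshcd_wb_toggle_flush_during_h8(hba,
			hba->dev_info.b_rpm_dev_flush_capable);
}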

Co-developed-by: Keoseong Park 
Signed-off-by: Keoseong Park 
Signed-off-by: Daejun Park 
---
 Documentation/ABI/testing/sysfs-driver-ufs |  9 
 drivers/scsi/ufs/ufs-sysfs.c   | 50 ++
 drivers/scsi/ufs/ufshcd.c  | 14 --
 drivers/scsi/ufs/ufshcd.h  |  2 +
 4 files changed, 71 insertions(+), 4 deletions(-)

diff --git a/Documentation/ABI/testing/sysfs-driver-ufs 
b/Documentation/ABI/testing/sysfs-driver-ufs
index d1bc23cb6a9d..b67b8449e840 100644
--- a/Documentation/ABI/testing/sysfs-driver-ufs
+++ b/Documentation/ABI/testing/sysfs-driver-ufs
@@ -1172,3 +1172,12 @@ Description: This node is used to set or display 
whether UFS WriteBooster is
(if the platform supports UFSHCD_CAP_CLK_SCALING). For a
platform that doesn't support UFSHCD_CAP_CLK_SCALING, we can
disable/enable WriteBooster through this sysfs node.
+
+What:  /sys/bus/platform/drivers/ufshcd/*/wb_batched_flush
+Date:  April 2021
+Contact:   Daejun Park 
+Description:   This entry shows whether batch flushing of UFS WriteBooster
+   buffers is enabled. Writing 1 to this entry allows the device 
to flush
+   the WriteBooster buffer only when it needs to perform a buffer 
flush
+   during runtime suspend. Writing 0 to this entry allows the 
device to
+   flush the WriteBooster buffer during link hibernation.
diff --git a/drivers/scsi/ufs/ufs-sysfs.c b/drivers/scsi/ufs/ufs-sysfs.c
index d7c3cff9662f..943ac12e59c6 100644
--- a/drivers/scsi/ufs/ufs-sysfs.c
+++ b/drivers/scsi/ufs/ufs-sysfs.c
@@ -253,6 +253,54 @@ static ssize_t wb_on_store(struct device *dev, struct 
device_attribute *attr,
return res < 0 ? res : count;
 }
 
+
+static ssize_t wb_batched_flush_show(struct device *dev,
+struct device_attribute *attr, char *buf)
+{
+   struct ufs_hba *hba = dev_get_drvdata(dev);
+
+   return sysfs_emit(buf, "%d\n", hba->vps->wb_batched_flush);
+}
+
+static ssize_t wb_batched_flush_store(struct device *dev,
+ struct device_attribute *attr,
+ const char *buf, size_t count)
+{
+   struct ufs_hba *hba = dev_get_drvdata(dev);
+   unsigned int wb_batched_flush;
+   ssize_t res;
+
+   if (!ufshcd_is_wb_allowed(hba)) {
+   dev_warn(dev, "To control WB through wb_batched_flush is not 
allowed!\n");
+   return -EOPNOTSUPP;
+   }
+
+   if (kstrtouint(buf, 0, &wb_batched_flush))
+   return -EINVAL;
+
+   if (wb_batched_flush != 0 && wb_batched_flush != 1)
+   return -EINVAL;
+
+   down(&hba->host_sem);
+   if (!ufshcd_is_user_access_allowed(hba)) {
+   res = -EBUSY;
+   goto out;
+   }
+
+   if (wb_batched_flush == hba->vps->wb_batched_flush)
+   goto out;
+
+   pm_runtime_get_sync(hba->dev);
+   res = ufshcd_wb_toggle_flush_during_h8(hba, !wb_batched_flush);
+   pm_runtime_put_sync(hba->dev);
+   if (!res)
+   hba->vps->wb_batched_flush = wb_batched_flush;
+
+out:
+   up(&hba->host_sem);
+   return res < 0 ? res : count;
+}
+
 static DEVICE_ATTR_RW(rpm_lvl);
 static DEVICE_ATTR_RO(rpm_target_dev_state);
 static DEVICE_ATTR_RO(rpm_target_link_state);
@@ -261,6 +309,7 @@ static DEVICE_ATTR_RO(spm_target_dev_state);
 static DEVICE_ATTR_RO(spm_target_link_state);
 static DEVICE_ATTR_RW(auto_hibern8);
 static DEVICE_ATTR_RW(wb_on);
+static DEVICE_ATTR_RW(wb_batched_flush);
 
 static struct attribute *ufs_sysfs_ufshcd_attrs[] = {
_attr_rpm_lvl.attr,
@@ -271,6 +320,7 @@ static struct attribute *ufs_sysfs_ufshcd_attrs[] = {
_attr_spm_target_link_state.attr,
_attr_auto_hibern8.attr,
_attr_wb_on.attr,
+   _attr_wb_batched_flush.attr,
NULL
 };
 
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c
index 0625da7a42ee..e11dc578a17c 100644
--- a/drivers/scsi/ufs/ufshcd.c
+++ b/drivers/scsi/ufs/ufshcd.c
@@ -244,7 +244,6 @@ static int ufshcd_setup_vreg(struct ufs_hba *hba, bool on);
 static inline int ufshcd_config_vreg_hpm(struct ufs_hba *hba,
 

[PATCH 3/4] MIPS: Reinstate platform `__div64_32' handler

2021-04-19 Thread Maciej W. Rozycki
Our current MIPS platform `__div64_32' handler is inactive, because it 
is incorrectly only enabled for 64-bit configurations, for which generic 
`do_div' code does not call it anyway.

The handler is not suitable for being called from there though, as it
only calculates 32 bits of the quotient under the assumption that the 64-bit
dividend has been suitably reduced.  Code for such reduction used to be
there, however it has been incorrectly removed with commit c21004cd5b4c
("MIPS: Rewrite <asm/div64.h> to work with gcc 4.4.0."), which should
have only updated an obsoleted constraint for an inline asm involving
$hi and $lo register outputs, while possibly wiring the original MIPS
variant of the `do_div' macro as the `__div64_32' handler for the generic
`do_div' implementation.  A simplified sketch of that generic wrapper is
included below for reference.
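
For reference, a simplified sketch of the generic do_div() wrapper in
include/asm-generic/div64.h (the constant-divisor optimisations are omitted),
showing that the platform __div64_32() handler is only reached once the high
32 bits of the dividend are known to be non-zero:

#define do_div(n, base) ({					\
	uint32_t __base = (base);				\
	uint32_t __rem;						\
	if (likely(((n) >> 32) == 0)) {				\
		/* fast path: plain 32-bit division */		\
		__rem = (uint32_t)(n) % __base;			\
		(n) = (uint32_t)(n) / __base;			\
	} else							\
		/* platform handler reduces the high word */	\
		__rem = __div64_32(&(n), __base);		\
	__rem;							\
 })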

Correct the handler as follows then:

- Revert most of the commit referred, however retaining the current 
  formatting, except for the final two instructions of the inline asm 
  sequence, which the original commit missed.  Omit the original 64-bit 
  parts though.

- Rename the original `do_div' macro to `__div64_32'.  Use the combined 
  `x' constraint referring to the MD accumulator as a whole, replacing 
  the original individual `h' and `l' constraints used for $hi and $lo 
  registers respectively, of which `h' has been obsoleted with GCC 4.4. 
  Update surrounding code accordingly.

  We have since removed support for GCC versions before 4.9, so no need 
  for a special arrangement here; GCC has supported the `x' constraint 
  since forever anyway, or at least going back to 1991.

- Rename the `__base' local variable in `__div64_32' to `__radix' to 
  avoid a conflict with a local variable in `do_div'.

- Actually enable this code for 32-bit rather than 64-bit configurations
  by qualifying it with BITS_PER_LONG being 32 instead of 64.  Include
  <asm/bitsperlong.h> for this macro rather than <linux/types.h> as we
  don't need anything else.

- Finally include <asm-generic/div64.h> last rather than first.

This has passed correctness verification with test_div64 and reduced the 
module's average execution time down to 1.0668s and 0.2629s from 2.1529s 
and 0.5647s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.  
For a reference 64-bit `do_div' code where we have the DDIVU instruction 
available to do the whole calculation right away averages at 0.0660s for 
the latter CPU.

Reported-by: Huacai Chen 
Signed-off-by: Maciej W. Rozycki 
Fixes: c21004cd5b4c ("MIPS: Rewrite <asm/div64.h> to work with gcc 4.4.0.")
Cc: sta...@vger.kernel.org # v2.6.30+
---
Our handcrafted handler seems to run at ~25% of the performance of the 
64-bit hardware instruction; not too bad I would say.  Though there's 
likely some overhead from surrounding code that interferes with the 
figures.

Then there are a couple of `checkpatch.pl' nits about trailing whitespace 
in inline asm, which however makes it more readable.  So the change stays 
as it is.
---
 arch/mips/include/asm/div64.h |   57 ++
 1 file changed, 41 insertions(+), 16 deletions(-)

linux-mips-div64-generic-fix.diff
Index: linux-3maxp-div64/arch/mips/include/asm/div64.h
===
--- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h
+++ linux-3maxp-div64/arch/mips/include/asm/div64.h
@@ -1,5 +1,5 @@
 /*
- * Copyright (C) 2000, 2004  Maciej W. Rozycki
+ * Copyright (C) 2000, 2004, 2021  Maciej W. Rozycki
  * Copyright (C) 2003, 07 Ralf Baechle (r...@linux-mips.org)
  *
  * This file is subject to the terms and conditions of the GNU General Public
@@ -9,25 +9,18 @@
 #ifndef __ASM_DIV64_H
 #define __ASM_DIV64_H
 
-#include 
-
-#if BITS_PER_LONG == 64
+#include 
 
-#include 
+#if BITS_PER_LONG == 32
 
 /*
  * No traps on overflows for any of these...
  */
 
-#define __div64_32(n, base)\
-({ \
+#define do_div64_32(res, high, low, base) ({   \
unsigned long __cf, __tmp, __tmp2, __i; \
unsigned long __quot32, __mod32;\
-   unsigned long __high, __low;\
-   unsigned long long __n; \
\
-   __high = *__n >> 32;\
-   __low = __n;\
__asm__(\
"   .setpush\n" \
"   .setnoat\n" \
@@ -51,18 +44,50 @@
"   subu%0, %0, %z6 \n" \
"   addiu   %2, %2, 1   \n" \
"3: \n" \
-   "   bnez

[PATCH 1/4] lib/math: Add a `do_div' test module

2021-04-19 Thread Maciej W. Rozycki
Implement a module for correctness and performance evaluation for the 
`do_div' function, often handled in an optimised manner by platform 
code.  Use a somewhat randomly generated set of inputs that is supposed 
to be representative, using the same set of divisors twice, expressed as 
a constant and as a variable each, so as to verify the implementation 
for both cases should they be handled by different code execution paths.  
Reference results were produced with GNU bc.

At the conclusion output the total execution time elapsed.

Signed-off-by: Maciej W. Rozycki 
---
NB there's a `checkpatch.pl' warning for a split line, but I can see no 
gain from joining it.
---
 lib/Kconfig.debug |   10 ++
 lib/math/Makefile |2 
 lib/math/test_div64.c |  249 ++
 3 files changed, 261 insertions(+)

linux-div64-test.diff
Index: linux-3maxp-div64/lib/Kconfig.debug
===
--- linux-3maxp-div64.orig/lib/Kconfig.debug
+++ linux-3maxp-div64/lib/Kconfig.debug
@@ -2027,6 +2027,16 @@ config TEST_SORT
 
  If unsure, say N.
 
+config TEST_DIV64
+   tristate "64bit/32bit division and modulo test"
+   depends on DEBUG_KERNEL || m
+   help
+ Enable this to turn on 'do_div()' function test. This test is
+ executed only once during system boot (so affects only boot time),
+ or at module load time.
+
+ If unsure, say N.
+
 config KPROBES_SANITY_TEST
bool "Kprobes sanity tests"
depends on DEBUG_KERNEL
Index: linux-3maxp-div64/lib/math/Makefile
===
--- linux-3maxp-div64.orig/lib/math/Makefile
+++ linux-3maxp-div64/lib/math/Makefile
@@ -4,3 +4,5 @@ obj-y += div64.o gcd.o lcm.o int_pow.o i
 obj-$(CONFIG_CORDIC)   += cordic.o
 obj-$(CONFIG_PRIME_NUMBERS)+= prime_numbers.o
 obj-$(CONFIG_RATIONAL) += rational.o
+
+obj-$(CONFIG_TEST_DIV64)   += test_div64.o
Index: linux-3maxp-div64/lib/math/test_div64.c
===
--- /dev/null
+++ linux-3maxp-div64/lib/math/test_div64.c
@@ -0,0 +1,249 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Copyright (C) 2021  Maciej W. Rozycki
+ */
+
+#define pr_fmt(fmt) KBUILD_MODNAME ": " fmt
+
+#include 
+#include 
+#include 
+#include 
+#include 
+#include 
+
+#include 
+
+#define TEST_DIV64_N_ITER 1024
+
+static const u64 test_div64_dividents[] = {
+   0xab275080,
+   0x000fe73c1959,
+   0x00e54c0a74b1,
+   0x0d4398ff1ef9,
+   0xa18c2ee1c097,
+   0x00079fb80b072e4a,
+   0x0072db27380dd689,
+   0x0842f488162e2284,
+   0xf66745411d8ab063,
+};
+#define SIZE_DIV64_DIVIDENTS ARRAY_SIZE(test_div64_dividents)
+
+#define TEST_DIV64_DIVISOR_0 0x0009
+#define TEST_DIV64_DIVISOR_1 0x007c
+#define TEST_DIV64_DIVISOR_2 0x0204
+#define TEST_DIV64_DIVISOR_3 0xcb5b
+#define TEST_DIV64_DIVISOR_4 0x0001
+#define TEST_DIV64_DIVISOR_5 0x0008a880
+#define TEST_DIV64_DIVISOR_6 0x003fd3ae
+#define TEST_DIV64_DIVISOR_7 0x0b658fac
+#define TEST_DIV64_DIVISOR_8 0xdc08b349
+
+static const u32 test_div64_divisors[] = {
+   TEST_DIV64_DIVISOR_0,
+   TEST_DIV64_DIVISOR_1,
+   TEST_DIV64_DIVISOR_2,
+   TEST_DIV64_DIVISOR_3,
+   TEST_DIV64_DIVISOR_4,
+   TEST_DIV64_DIVISOR_5,
+   TEST_DIV64_DIVISOR_6,
+   TEST_DIV64_DIVISOR_7,
+   TEST_DIV64_DIVISOR_8,
+};
+#define SIZE_DIV64_DIVISORS ARRAY_SIZE(test_div64_divisors)
+
+static const struct {
+   u64 quotient;
+   u32 remainder;
+} test_div64_results[SIZE_DIV64_DIVISORS][SIZE_DIV64_DIVIDENTS] = {
+   {
+   { 0x13045e47, 0x0001 },
+   { 0x0161596c, 0x0030 },
+   { 0x0054e9d4, 0x0130 },
+   { 0xd776, 0x278e },
+   { 0xab27, 0x5080 },
+   { 0x13c4, 0x0004ce80 },
+   { 0x02ae, 0x001e143c },
+   { 0x000f, 0x0033e56c },
+   { 0x, 0xab275080 },
+   }, {
+   { 0x0001c45c02d1, 0x },
+   { 0x20d5213c, 0x0049 },
+   { 0x07e3d65f, 0x01dd },
+   { 0x00140531, 0x65ee },
+   { 0x000fe73c, 0x1959 },
+   { 0x0001d637, 0x0004e5d9 },
+   { 0x3fc9, 0x000713bb },
+   { 0x0165, 0x029abe7d },
+   { 0x0012, 0x6e9f7e37 },
+   }, {
+   { 0x00197a3a0cf7, 0x0002 },
+   { 0x0001d9632e5c, 0x0021 },
+   { 0x71c28039, 0x01cd },
+   { 0x0120a844, 0xb885 },
+   { 0x00e54c0a, 0x74b1 },
+   { 0x001a7bb3, 

[PATCH 2/4] div64: Correct inline documentation for `do_div'

2021-04-19 Thread Maciej W. Rozycki
Correct inline documentation for `do_div', which is a function-like 
macro the `n' parameter of which has the semantics of a C++ reference: 
it is both read and written in the context of the caller without an 
explicit dereference such as with a pointer.

In the C programming language it has no equivalent for proper functions, 
in terms of which the documentation expresses the semantics of `do_div', 
but substituting a pointer in documentation is misleading, and using the 
C++ notation should at least raise the reader's attention and encourage 
to seek explanation even if the C++ semantics is not readily understood.
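
For illustration, a minimal usage example showing the in-place update the
notation is meant to convey:

#include <linux/types.h>
#include <asm/div64.h>

static inline u32 example_div(void)
{
	u64 n = 12345678;
	u32 rem;

	rem = do_div(n, 1000);	/* n is now 12345, rem is 678 */
	return rem;
}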

While at it observe that "semantics" is an uncountable noun, so refer to 
it with a singular rather than plural verb.

Signed-off-by: Maciej W. Rozycki 
---
NB there's a `checkpatch.pl' warning for tabs preceded by spaces, but that 
is just the style of the piece of code quoted and I can see no gain from 
changing it or worse yet making inconsistent.
---
 include/asm-generic/div64.h |   10 ++
 1 file changed, 6 insertions(+), 4 deletions(-)

linux-div64-doc-fix.diff
Index: linux-3maxp-div64/include/asm-generic/div64.h
===
--- linux-3maxp-div64.orig/include/asm-generic/div64.h
+++ linux-3maxp-div64/include/asm-generic/div64.h
@@ -8,12 +8,14 @@
  * Optimization for constant divisors on 32-bit machines:
  * Copyright (C) 2006-2015 Nicolas Pitre
  *
- * The semantics of do_div() are:
+ * The semantics of do_div() is, in C++ notation, observing that the name
+ * is a function-like macro and the n parameter has the semantics of a C++
+ * reference:
  *
- * uint32_t do_div(uint64_t *n, uint32_t base)
+ * uint32_t do_div(uint64_t &n, uint32_t base)
  * {
- * uint32_t remainder = *n % base;
- * *n = *n / base;
+ * uint32_t remainder = n % base;
+ * n = n / base;
  * return remainder;
  * }
  *


[PATCH 4/4] MIPS: Avoid DIVU in `__div64_32' if result would be zero

2021-04-19 Thread Maciej W. Rozycki
We already check the high part of the dividend against zero to avoid the
costly DIVU instruction in that case, needed to reduce the high part of
the dividend, so we may as well check against the divisor instead and set
the high part of the quotient to zero right away.  We need to treat the
high part of the dividend in that case though as the remainder that would
have been calculated by the DIVU instruction we avoided.

This has passed correctness verification with test_div64 and reduced the
module's average execution time down to 1.0445s and 0.2619s from 1.0668s
and 0.2629s respectively for an R3400 CPU @40MHz and a 5Kc CPU @160MHz.

Signed-off-by: Maciej W. Rozycki 
---
I have made an experimental change on top of this to put `__div64_32' out
of line, and that increases the averages respectively up to 1.0785s and
0.2705s.  Not a terrible loss, especially compared to the generic times quoted
with 3/4, but still, so I think it would be best done only where optimising for
size, as noted in the cover letter.
---
 arch/mips/include/asm/div64.h |6 --
 1 file changed, 4 insertions(+), 2 deletions(-)

Index: linux-3maxp-div64/arch/mips/include/asm/div64.h
===
--- linux-3maxp-div64.orig/arch/mips/include/asm/div64.h
+++ linux-3maxp-div64/arch/mips/include/asm/div64.h
@@ -68,9 +68,11 @@
\
__high = __div >> 32;   \
__low = __div;  \
-   __upper = __high;   \
\
-   if (__high) {   \
+   if (__high < __radix) { \
+   __upper = __high;   \
+   __high = 0; \
+   } else {\
__asm__("divu   $0, %z1, %z2"   \
: "=x" (__modquot)  \
: "Jr" (__high), "Jr" (__radix));   \


[PATCH 0/4] Reinstate and improve MIPS `do_div' implementation

2021-04-19 Thread Maciej W. Rozycki
Hi,

 As Huacai has recently discovered, the MIPS backend for `do_div' has been
broken and inadvertently disabled with commit c21004cd5b4c ("MIPS: Rewrite
<asm/div64.h> to work with gcc 4.4.0.").  As it is code I originally wrote
myself, and Huacai had issues bringing it back to life, leading even to a
request to discard it, I have decided to step in.

 In the end I have fixed the code and measured its performance to be ~100% 
better on average than our generic code.  I have decided it would be worth 
having the test module I have prepared for correctness evaluation as well 
as benchmarking, so I have included it with the series, also so that I can 
refer to the results easily.

 In the end I have included four patches on this occasion: 1/4 is the test 
module, 2/4 is an inline documentation fix/clarification for the `do_div' 
wrapper, 3/4 enables the MIPS `__div64_32' backend and 4/4 adds a small 
performance improvement to it.

 I have investigated a fifth change as a potential improvement where I
replaced the call to `do_div64_32' with a DIVU instruction for cases where
the high part of the intermediate dividend is zero, but it has turned out
to regress performance a little, so I have discarded it.

 Also a follow-up change might be worth having to reduce the code size and 
place `__div64_32' out of line for CC_OPTIMIZE_FOR_SIZE configurations, 
but I have not fully prepared such a change at this time.  I did use the 
WIP form I have for performance evaluation however; see the figures quoted 
with 4/4.

 These changes have been verified with a DECstation system with an R3400 
MIPS I processor @40MHz and a MTI Malta system with a 5Kc MIPS64 processor 
@160MHz.

 See individual change descriptions and any additional discussions for
further details.

 Questions, comments or concerns?  Otherwise please apply.

  Maciej


[PATCH v2] iommu/vt-d: Use passthrough mode for the Intel IPUs

2021-04-19 Thread Bingbu Cao
Intel IPU (Image Processing Unit) has its own (IO)MMU hardware.
The IPU driver allocates its own page table that is not mapped
via the DMA API, and thus the Intel IOMMU driver blocks access, giving
this error:

DMAR: DRHD: handling fault status reg 3
DMAR: [DMA Read] Request device [00:05.0] PASID 
  fault addr 76406000 [fault reason 06] PTE Read access is not set

As the IPU is not an external-facing device and is therefore not risky, use
IOMMU passthrough mode for Intel IPUs.

Fixes: 26f5689592e2 ("media: staging/intel-ipu3: mmu: Implement driver")
Signed-off-by: Bingbu Cao 
---
Changes since v1:
 - Use IPU PCI DID value instead of macros to align with others
 - Check IPU PCI device ID in quirk

---
 drivers/iommu/intel/iommu.c | 29 +
 1 file changed, 29 insertions(+)

diff --git a/drivers/iommu/intel/iommu.c b/drivers/iommu/intel/iommu.c
index ee0932307d64..7e2fbdae467e 100644
--- a/drivers/iommu/intel/iommu.c
+++ b/drivers/iommu/intel/iommu.c
@@ -55,6 +55,12 @@
 #define IS_GFX_DEVICE(pdev) ((pdev->class >> 16) == PCI_BASE_CLASS_DISPLAY)
 #define IS_USB_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_SERIAL_USB)
 #define IS_ISA_DEVICE(pdev) ((pdev->class >> 8) == PCI_CLASS_BRIDGE_ISA)
+#define IS_INTEL_IPU(pdev) ((pdev)->vendor == PCI_VENDOR_ID_INTEL &&   \
+   ((pdev)->device == 0x9a19 ||\
+(pdev)->device == 0x9a39 ||\
+(pdev)->device == 0x4e19 ||\
+(pdev)->device == 0x465d ||\
+(pdev)->device == 0x1919))
 #define IS_AZALIA(pdev) ((pdev)->vendor == 0x8086 && (pdev)->device == 0x3a3e)
 
 #define IOAPIC_RANGE_START (0xfee0)
@@ -360,6 +366,7 @@ int intel_iommu_enabled = 0;
 EXPORT_SYMBOL_GPL(intel_iommu_enabled);
 
 static int dmar_map_gfx = 1;
+static int dmar_map_ipu = 1;
 static int dmar_forcedac;
 static int intel_iommu_strict;
 static int intel_iommu_superpage = 1;
@@ -368,6 +375,7 @@ static int iommu_skip_te_disable;
 
 #define IDENTMAP_GFX   2
 #define IDENTMAP_AZALIA	4
+#define IDENTMAP_IPU   8
 
 int intel_iommu_gfx_mapped;
 EXPORT_SYMBOL_GPL(intel_iommu_gfx_mapped);
@@ -2839,6 +2847,9 @@ static int device_def_domain_type(struct device *dev)
 
if ((iommu_identity_mapping & IDENTMAP_GFX) && 
IS_GFX_DEVICE(pdev))
return IOMMU_DOMAIN_IDENTITY;
+
+   if ((iommu_identity_mapping & IDENTMAP_IPU) && 
IS_INTEL_IPU(pdev))
+   return IOMMU_DOMAIN_IDENTITY;
}
 
return 0;
@@ -3278,6 +3289,9 @@ static int __init init_dmars(void)
if (!dmar_map_gfx)
iommu_identity_mapping |= IDENTMAP_GFX;
 
+   if (!dmar_map_ipu)
+   iommu_identity_mapping |= IDENTMAP_IPU;
+
check_tylersburg_isoch();
 
ret = si_domain_init(hw_pass_through);
@@ -5622,6 +5636,18 @@ static void quirk_iommu_igfx(struct pci_dev *dev)
dmar_map_gfx = 0;
 }
 
+static void quirk_iommu_ipu(struct pci_dev *dev)
+{
+   if (!IS_INTEL_IPU(dev))
+   return;
+
+   if (risky_device(dev))
+   return;
+
+   pci_info(dev, "Passthrough IOMMU for integrated Intel IPU\n");
+   dmar_map_ipu = 0;
+}
+
 /* G4x/GM45 integrated gfx dmar support is totally busted. */
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2a40, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x2e00, quirk_iommu_igfx);
@@ -5657,6 +5683,9 @@ DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x1632, 
quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163A, quirk_iommu_igfx);
 DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, 0x163D, quirk_iommu_igfx);
 
+/* disable IPU dmar support */
+DECLARE_PCI_FIXUP_HEADER(PCI_VENDOR_ID_INTEL, PCI_ANY_ID, quirk_iommu_ipu);
+
 static void quirk_iommu_rwbf(struct pci_dev *dev)
 {
if (risky_device(dev))
-- 
2.7.4



Re: [PATCH] Revert "MIPS: make userspace mapping young by default".

2021-04-19 Thread Huang Pei
Hi,
On Mon, Apr 19, 2021 at 10:21:40PM +0800, Zhou Yanjie wrote:
> Hi
> 
> On 2021/4/19 12:56 PM, Huang Pei wrote:
> > On Sat, Apr 17, 2021 at 12:45:59AM +0800, Zhou Yanjie wrote:
> > > > On 2021/4/16 5:20 PM, 黄沛 wrote:
> > > > Is there any log about the panic?
> > > 
> > > Yes, below is the log:
> > > 
> > > 
> > > [  195.436017] CPU 0 Unable to handle kernel paging request at virtual
> > > address 77eb8000, epc == 80117868, ra == 80118208
> > > [  195.446709] Oops[#1]:
> > > [  195.448977] CPU: 0 PID: 1461 Comm: Xsession Not tainted
> > > 5.12.0-rc6-00227-gc8fc6defbd2e-dirty #1
> > > [  195.457661] $ 0   :  0001 80117864 77eb9000
> > > [  195.462888] $ 4   : 77eb8000 82419600 838ea000 82482ba0
> > > [  195.468116] $ 8   : 826f8b18 8306f800 72d5 8306f800
> > > [  195.473343] $12   : 0002 0a03 0001 0402
> > > [  195.478568] $16   : 77eb8000 809faf60 0004 82482ba0
> > > [  195.483794] $20   : 77eb8000 82419600 82482ba0 8086
> > > [  195.489021] $24   : 8086121c 80117864
> > > [  195.494248] $28   : 838ea000 838ebd70  80118208
> > > [  195.499475] Hi    : 8c4e
> > > [  195.502343] Lo    : 4627
> > > [  195.505212] epc   : 80117868 r4k_blast_dcache_page_dc32+0x4/0x9c
> > > [  195.511217] ra    : 80118208 local_r4k_flush_cache_page+0x120/0x1b8
> > > [  195.517476] Status: 10001403 KERNEL EXL IE
> > > [  195.521657] Cause : 4080800c (ExcCode 03)
> > > [  195.525654] BadVA : 77eb8000
> > > [  195.528523] PrId  : 00d00100 (Ingenic XBurst)
> > > [  195.532866] Modules linked in:
> > > [  195.535911] Process Xsession (pid: 1461, threadinfo=00975a3e,
> > > task=3724fd66, tls=77ebd690)
> > > [  195.544162] Stack : 808a05ec f7edcbfd 8306f800  8086 
> > > 809faf60
> > > 80990a3c 80117f90
> > > [  195.552524] 809faf60 82419600 8306f800 801fd84c  
> > > 801180b4
> > > 838ebe80 80110b7c
> > > [  195.560887] 80990a3c 82482ba0 82482ba0 77eb8000 4627 
> > > f7edcbfd
> > > 838ebe80 801cbc08
> > > [  195.569249] 0001 181b2000  801fa06c  
> > > 83999ae0
> > > 8086 0004
> > > [  195.577610] 80990a3c f7edcbfd 80990a3c 838ebe80 0004 
> > > 80990a3c
> > > 82482ba0 04627685
> > > [  195.585973] ...
> > > [  195.588413] Call Trace:
> > > [  195.590849] [<80117868>] r4k_blast_dcache_page_dc32+0x4/0x9c
> > > [  195.596501] [<80118208>] local_r4k_flush_cache_page+0x120/0x1b8
> > > [  195.602413] [<80117f90>] r4k_on_each_cpu.isra.8+0x24/0x58
> > > [  195.607805] [<801180b4>] r4k_flush_cache_page+0x34/0x58
> > > [  195.613023] [<801cbc08>] wp_page_copy+0x3a8/0x56c
> > > [  195.617723] [<801ce944>] do_swap_page+0x4cc/0x558
> > > [  195.622419] [<801cf3f8>] handle_mm_fault+0x790/0x93c
> > > [  195.627374] [<8011025c>] do_page_fault+0x19c/0x540
> > > [  195.632159] [<801142f0>] tlb_do_page_fault_1+0x10c/0x11c
> > > [  195.637465]
> > > [  195.638947] Code: 03e8    24831000  bc950020
> > > bc950040  bc950060  bc950080  bc9500a0
> > > [  195.648706]
> > > [  195.650243] ---[ end trace 7cc7d7f611932c42 ]---
> > > [  195.654857] Kernel panic - not syncing: Fatal exception
> > > [  195.660072] Rebooting in 10 seconds..
> > > 
> > > 
> > > this problem can be triggered stably (by use Microsoft Remote Desktop 
> > > client
> > > to login to debian9 running on CU1830-Neo).
> > > 
> > Could you print out the PTE value at 0x77eb8000 ?
> 
> 
> Here is the new log:
> 
> 
> [   33.681712] CPU 0 Unable to handle kernel paging request at virtual
> address 77ea4000, epc == 801178ac, ra == 80118250
> [   33.692395] Oops[#1]:
> [   33.694662] CPU: 0 PID: 1389 Comm: Xsession Not tainted 5.12.0-rc8-dirty
> #2
> [   33.701612] $ 0   :  0001 801178a8 77ea5000
> [   33.706839] $ 4   : 77ea4000 81bcd220 80118130 856712a0
> [   33.712066] $ 8   : 833e4a80 8544b800 70a8 8544b800
> [   33.717293] $12   : 0002 05b7 0001 
> [   33.722518] $16   : 81bcd220 77ea4000 80a11ad8 0004
> [   33.727745] $20   : 77ea4000 81bcd220 856712a0 8086
> [   33.732972] $24   : 001c 801178a8
> [   33.738197] $28   : 82564000 82565d68  80118250
> [   33.743424] Hi    : f0cc
> [   33.746293] Lo    : 7866
> [   33.749162] epc   : 801178ac r4k_blast_dcache_page_dc32+0x4/0x9c
> [   33.755166] ra    : 80118250 local_r4k_flush_cache_page+0x120/0x2c8
> [   33.761425] Status: 10001403 KERNEL EXL IE
> [   33.765605] Cause : 4080800c (ExcCode 03)
> [   33.769603] BadVA : 77ea4000
> [   33.772472] PrId  : 00d00100 (Ingenic XBurst)
> [   33.776816] Modules linked in:
> [   33.779861] Process Xsession (pid: 1389, threadinfo=c8bdf64c,
> task=2372d853, tls=77ea9690)
> [   33.788111] Stack : 808a256c  808a256c bfa6939a 8544b800 8086
> 8094d308 80a11ad8
> [   33.796474] 856712a0 80117fd8 8094d308 81bcd220 8544b800 801fdb10
> 80945ce8 801180fc
> [   33.804838] 82565e80 80110b8c 80a11ad8 856712a0 856712a0 77ea4000
> 7866 bfa6939a
> [   33.813201] 

Re: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems

2021-04-19 Thread Vineet Gupta
Hi Matthew,

On 4/16/21 7:45 PM, Matthew Wilcox wrote:
> Replacement patch to fix compiler warning.
>
> From: "Matthew Wilcox (Oracle)" 
> Date: Fri, 16 Apr 2021 16:34:55 -0400
> Subject: [PATCH 1/2] mm: Fix struct page layout on 32-bit systems
> To: bro...@redhat.com
> Cc: linux-kernel@vger.kernel.org,
>  linux...@kvack.org,
>  net...@vger.kernel.org,
>  linuxppc-...@lists.ozlabs.org,
>  linux-arm-ker...@lists.infradead.org,
>  linux-m...@vger.kernel.org,
>  ilias.apalodi...@linaro.org,
>  mcr...@linux.microsoft.com,
>  grygorii.stras...@ti.com,
>  a...@kernel.org,
>  h...@lst.de,
>  linux-snps-...@lists.infradead.org,
>  mho...@kernel.org,
>  mgor...@suse.de
>
> 32-bit architectures which expect 8-byte alignment for 8-byte integers
> and need 64-bit DMA addresses (arc, arm, mips, ppc) had their struct
> page inadvertently expanded in 2019.

FWIW, ARC doesn't require 8 byte alignment for 8 byte integers. This is 
only needed for 8-byte atomics due to the requirements of LLOCKD/SCOND 
instructions.

> When the dma_addr_t was added,
> it forced the alignment of the union to 8 bytes, which inserted a 4 byte
> gap between 'flags' and the union.
>
> Fix this by storing the dma_addr_t in one or two adjacent unsigned longs.
> This restores the alignment to that of an unsigned long, and also fixes a
> potential problem where (on a big endian platform), the bit used to denote
> PageTail could inadvertently get set, and a racing get_user_pages_fast()
> could dereference a bogus compound_head().
>
> Fixes: c25fff7171be ("mm: add dma_addr_t to struct page")
> Signed-off-by: Matthew Wilcox (Oracle) 
> ---
>   include/linux/mm_types.h |  4 ++--
>   include/net/page_pool.h  | 12 +++-
>   net/core/page_pool.c | 12 +++-
>   3 files changed, 20 insertions(+), 8 deletions(-)
>
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index 6613b26a8894..5aacc1c10a45 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -97,10 +97,10 @@ struct page {
>   };
>   struct {/* page_pool used by netstack */
>   /**
> -  * @dma_addr: might require a 64-bit value even on
> +  * @dma_addr: might require a 64-bit value on
>* 32-bit architectures.
>*/
> - dma_addr_t dma_addr;
> + unsigned long dma_addr[2];
>   };
>   struct {/* slab, slob and slub */
>   union {
> diff --git a/include/net/page_pool.h b/include/net/page_pool.h
> index b5b195305346..ad6154dc206c 100644
> --- a/include/net/page_pool.h
> +++ b/include/net/page_pool.h
> @@ -198,7 +198,17 @@ static inline void page_pool_recycle_direct(struct 
> page_pool *pool,
>   
>   static inline dma_addr_t page_pool_get_dma_addr(struct page *page)
>   {
> - return page->dma_addr;
> + dma_addr_t ret = page->dma_addr[0];
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + ret |= (dma_addr_t)page->dma_addr[1] << 16 << 16;
> + return ret;
> +}
> +
> +static inline void page_pool_set_dma_addr(struct page *page, dma_addr_t addr)
> +{
> + page->dma_addr[0] = addr;
> + if (sizeof(dma_addr_t) > sizeof(unsigned long))
> + page->dma_addr[1] = addr >> 16 >> 16;
>   }
>   
>   static inline bool is_page_pool_compiled_in(void)
> diff --git a/net/core/page_pool.c b/net/core/page_pool.c
> index ad8b0707af04..f014fd8c19a6 100644
> --- a/net/core/page_pool.c
> +++ b/net/core/page_pool.c
> @@ -174,8 +174,10 @@ static void page_pool_dma_sync_for_device(struct 
> page_pool *pool,
> struct page *page,
> unsigned int dma_sync_size)
>   {
> + dma_addr_t dma_addr = page_pool_get_dma_addr(page);
> +
>   dma_sync_size = min(dma_sync_size, pool->p.max_len);
> - dma_sync_single_range_for_device(pool->p.dev, page->dma_addr,
> + dma_sync_single_range_for_device(pool->p.dev, dma_addr,
>pool->p.offset, dma_sync_size,
>pool->p.dma_dir);
>   }
> @@ -226,7 +228,7 @@ static struct page *__page_pool_alloc_pages_slow(struct 
> page_pool *pool,
>   put_page(page);
>   return NULL;
>   }
> - page->dma_addr = dma;
> + page_pool_set_dma_addr(page, dma);
>   
>   if (pool->p.flags & PP_FLAG_DMA_SYNC_DEV)
>   page_pool_dma_sync_for_device(pool, page, pool->p.max_len);
> @@ -294,13 +296,13 @@ void page_pool_release_page(struct page_pool *pool, 
> struct page *page)
>*/
>   goto skip_dma_unmap;
>   
> - dma = page->dma_addr;
> + dma = page_pool_get_dma_addr(page);
>   
> - /* When page is unmapped, it cannot be returned our pool */
> + /* When page is unmapped, it cannot be 

Re: [PATCH] dt-bindings: net: mediatek: support MT7621 SoC

2021-04-19 Thread Ilya Lipnitskiy
On Sun, Apr 18, 2021 at 11:24 PM Bjørn Mork  wrote:
>
> Ilya Lipnitskiy  writes:
>
> > Add missing binding documentation for SoC support that has been in place
> > since v5.1
> >
> > Fixes: 889bcbdeee57 ("net: ethernet: mediatek: support MT7621 SoC ethernet 
> > hardware")
> > Cc: Bjørn Mork 
> > Signed-off-by: Ilya Lipnitskiy 
> > ---
> >  Documentation/devicetree/bindings/net/mediatek-net.txt | 1 +
> >  1 file changed, 1 insertion(+)
> >
> > diff --git a/Documentation/devicetree/bindings/net/mediatek-net.txt 
> > b/Documentation/devicetree/bindings/net/mediatek-net.txt
> > index 72d03e07cf7c..950ef6af20b1 100644
> > --- a/Documentation/devicetree/bindings/net/mediatek-net.txt
> > +++ b/Documentation/devicetree/bindings/net/mediatek-net.txt
> > @@ -10,6 +10,7 @@ Required properties:
> >  - compatible: Should be
> >   "mediatek,mt2701-eth": for MT2701 SoC
> >   "mediatek,mt7623-eth", "mediatek,mt2701-eth": for MT7623 SoC
> > + "mediatek,mt7621-eth": for MT7621 SoC
> >   "mediatek,mt7622-eth": for MT7622 SoC
> >   "mediatek,mt7629-eth": for MT7629 SoC
> >   "ralink,rt5350-eth": for Ralink Rt5350F and MT7628/88 SoC
>
>
> Thanks for taking care of this!
>
> Note, however, that this compatible value is defined in
> Documentation/devicetree/bindings/net/ralink,rt2880-net.txt
>
> I believe that file should go away. These two files are both documenting
> the same compatible property AFAICS.
Removed along with two others in
https://lore.kernel.org/lkml/20210420024222.101615-1-ilya.lipnits...@gmail.com/T/#u

Ilya


[PATCH] dt-bindings: net: mediatek/ralink: remove unused bindings

2021-04-19 Thread Ilya Lipnitskiy
Revert commit 663148e48a66 ("Documentation: DT: net: add docs for
ralink/mediatek SoC ethernet binding")

No in-tree drivers use the compatible strings present in these bindings,
and some have been superseded by the DSA-capable mtk_eth_soc driver, so
remove these obsolete bindings.

Cc: John Crispin 
Signed-off-by: Ilya Lipnitskiy 
---
 .../bindings/net/mediatek,mt7620-gsw.txt  | 24 
 .../bindings/net/ralink,rt2880-net.txt| 59 ---
 .../bindings/net/ralink,rt3050-esw.txt| 30 --
 3 files changed, 113 deletions(-)
 delete mode 100644 
Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt
 delete mode 100644 Documentation/devicetree/bindings/net/ralink,rt2880-net.txt
 delete mode 100644 Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt

diff --git a/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt 
b/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt
deleted file mode 100644
index 358fed2fab43..
--- a/Documentation/devicetree/bindings/net/mediatek,mt7620-gsw.txt
+++ /dev/null
@@ -1,24 +0,0 @@
-Mediatek Gigabit Switch
-===
-
-The mediatek gigabit switch can be found on Mediatek SoCs (mt7620, mt7621).
-
-Required properties:
-- compatible: Should be "mediatek,mt7620-gsw" or "mediatek,mt7621-gsw"
-- reg: Address and length of the register set for the device
-- interrupts: Should contain the gigabit switches interrupt
-- resets: Should contain the gigabit switches resets
-- reset-names: Should contain the reset names "gsw"
-
-Example:
-
-gsw@1011 {
-   compatible = "ralink,mt7620-gsw";
-   reg = <0x1011 8000>;
-
-   resets = < 23>;
-   reset-names = "gsw";
-
-   interrupt-parent = <>;
-   interrupts = <17>;
-};
diff --git a/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt 
b/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt
deleted file mode 100644
index 9fe1a0a22e44..
--- a/Documentation/devicetree/bindings/net/ralink,rt2880-net.txt
+++ /dev/null
@@ -1,59 +0,0 @@
-Ralink Frame Engine Ethernet controller
-===
-
-The Ralink frame engine ethernet controller can be found on Ralink and
-Mediatek SoCs (RT288x, RT3x5x, RT366x, RT388x, rt5350, mt7620, mt7621, mt76x8).
-
-Depending on the SoC, there is a number of ports connected to the CPU port
-directly and/or via a (gigabit-)switch.
-
-* Ethernet controller node
-
-Required properties:
-- compatible: Should be one of "ralink,rt2880-eth", "ralink,rt3050-eth",
-  "ralink,rt3050-eth", "ralink,rt3883-eth", "ralink,rt5350-eth",
-  "mediatek,mt7620-eth", "mediatek,mt7621-eth"
-- reg: Address and length of the register set for the device
-- interrupts: Should contain the frame engines interrupt
-- resets: Should contain the frame engines resets
-- reset-names: Should contain the reset names "fe". If a switch is present
-  "esw" is also required.
-
-
-* Ethernet port node
-
-Required properties:
-- compatible: Should be "ralink,eth-port"
-- reg: The number of the physical port
-- phy-handle: reference to the node describing the phy
-
-Example:
-
-mdio-bus {
-   ...
-   phy0: ethernet-phy@0 {
-   phy-mode = "mii";
-   reg = <0>;
-   };
-};
-
-ethernet@40 {
-   compatible = "ralink,rt2880-eth";
-   reg = <0x0040 1>;
-
-   #address-cells = <1>;
-   #size-cells = <0>;
-
-   resets = < 18>;
-   reset-names = "fe";
-
-   interrupt-parent = <>;
-   interrupts = <5>;
-
-   port@0 {
-   compatible = "ralink,eth-port";
-   reg = <0>;
-   phy-handle = <>;
-   };
-
-};
diff --git a/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt 
b/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt
deleted file mode 100644
index 87e315856efa..
--- a/Documentation/devicetree/bindings/net/ralink,rt3050-esw.txt
+++ /dev/null
@@ -1,30 +0,0 @@
-Ralink Fast Ethernet Embedded Switch
-
-
-The ralink fast ethernet embedded switch can be found on Ralink and Mediatek
-SoCs (RT3x5x, RT5350, MT76x8).
-
-Required properties:
-- compatible: Should be "ralink,rt3050-esw"
-- reg: Address and length of the register set for the device
-- interrupts: Should contain the embedded switches interrupt
-- resets: Should contain the embedded switches resets
-- reset-names: Should contain the reset names "esw"
-
-Optional properties:
-- ralink,portmap: can be used to choose if the default switch setup is
-  w or w
-- ralink,led_polarity: override the active high/low settings of the leds
-
-Example:
-
-esw@1011 {
-   compatible = "ralink,rt3050-esw";
-   reg = <0x1011 8000>;
-
-   resets = < 23>;
-   reset-names = "esw";
-
-   interrupt-parent = <>;
-   interrupts = <17>;
-};
-- 
2.31.1



Re: [PATCH 1/3] powerpc/8xx: Enhance readability of trap types

2021-04-19 Thread Xiongwei Song
On Mon, Apr 19, 2021 at 11:48 PM Christophe Leroy
 wrote:
>
> This patch makes use of trap types in head_8xx.S
>
> Signed-off-by: Christophe Leroy 
> ---
>  arch/powerpc/include/asm/interrupt.h | 29 
>  arch/powerpc/kernel/head_8xx.S   | 49 ++--
>  2 files changed, 47 insertions(+), 31 deletions(-)
>
> diff --git a/arch/powerpc/include/asm/interrupt.h 
> b/arch/powerpc/include/asm/interrupt.h
> index ed2c4042c3d1..cf2c5c3ae716 100644
> --- a/arch/powerpc/include/asm/interrupt.h
> +++ b/arch/powerpc/include/asm/interrupt.h
> @@ -2,13 +2,6 @@
>  #ifndef _ASM_POWERPC_INTERRUPT_H
>  #define _ASM_POWERPC_INTERRUPT_H
>
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -#include 
> -
>  /* BookE/4xx */
>  #define INTERRUPT_CRITICAL_INPUT  0x100
>
> @@ -39,9 +32,11 @@
>  /* BookE/BookS/4xx/8xx */
>  #define INTERRUPT_DATA_STORAGE0x300
>  #define INTERRUPT_INST_STORAGE0x400
> +#define INTERRUPT_EXTERNAL 0x500
>  #define INTERRUPT_ALIGNMENT   0x600
>  #define INTERRUPT_PROGRAM 0x700
>  #define INTERRUPT_SYSCALL 0xc00
> +#define INTERRUPT_TRACE0xd00

The INTERRUPT_TRACE macro is defined in the BookS section.
In BookE, 0xd00 stands for the debug interrupt, so I defined it as
INTERRUPT_DEBUG.  I understand they are similar things,
but the terminology differs in the reference manuals.

Regards,
Xiongwei


Re: [PATCH 2/2] USB: misc: Add driver for the WCH CH341 in I2C/GPIO mode

2021-04-19 Thread Randy Dunlap
Hi--

On 4/19/21 7:25 PM, Frank Zago wrote:
> From: frank zago 
> 

> 
> Signed-off-by: Frank Zago 
> Signed-off-by: frank zago 

Don't think you need both of those.

> ---
>  MAINTAINERS |   6 +
>  drivers/usb/misc/Kconfig|  18 ++
>  drivers/usb/misc/Makefile   |   1 +
>  drivers/usb/misc/ch341/Kconfig  |   0
>  drivers/usb/misc/ch341/Makefile |   3 +
>  drivers/usb/misc/ch341/ch341-core.c |  87 +
>  drivers/usb/misc/ch341/ch341-gpio.c | 249 ++
>  drivers/usb/misc/ch341/ch341-i2c.c  | 267 
>  drivers/usb/misc/ch341/ch341.h  |  50 ++
>  9 files changed, 681 insertions(+)
>  create mode 100644 drivers/usb/misc/ch341/Kconfig
>  create mode 100644 drivers/usb/misc/ch341/Makefile
>  create mode 100644 drivers/usb/misc/ch341/ch341-core.c
>  create mode 100644 drivers/usb/misc/ch341/ch341-gpio.c
>  create mode 100644 drivers/usb/misc/ch341/ch341-i2c.c
>  create mode 100644 drivers/usb/misc/ch341/ch341.h
> 

> diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig
> index 8f1144359012..2d4db92f0de4 100644
> --- a/drivers/usb/misc/Kconfig
> +++ b/drivers/usb/misc/Kconfig
> @@ -284,3 +284,21 @@ config BRCM_USB_PINMAP
> This option enables support for remapping some USB external
> signals, which are typically on dedicated pins on the chip,
> to any gpio.
> +
> +config USB_CH341_CORE
> + tristate "USB WinChipHead CH341 in I2C/SPI/GPIO mode"
> + depends on USB && GPIOLIB && I2C && SPI
> + help
> +
> +   If you say yes to this option, support for the CH341 chips,
> +   running in I2C/SPI/GPIO mode will be included. Some versions
> +   of the chip do not support all the functionalities but
> +   there is no way to differentiate them. For instance the
> +   CH341A and CH341B support everything while the CH341T can
> +   only do I2C.
> +
> +   The serial mode is not supported by this driver. Use the
> +   CH341 USB serial driver.
> +
> +   This driver can also be built as a module.  If so, the
> +   module will be called ch341-buses.

Above should be in drivers/usb/misc/ch341/Kconfig file instead?

> diff --git a/drivers/usb/misc/Makefile b/drivers/usb/misc/Makefile
> index 5f4e598573ab..95c0ca15b8c9 100644
> --- a/drivers/usb/misc/Makefile
> +++ b/drivers/usb/misc/Makefile
> @@ -32,3 +32,4 @@ obj-$(CONFIG_USB_CHAOSKEY)  += chaoskey.o
>  obj-$(CONFIG_USB_SISUSBVGA)  += sisusbvga/
>  obj-$(CONFIG_USB_LINK_LAYER_TEST)+= lvstest.o
>  obj-$(CONFIG_BRCM_USB_PINMAP)+= brcmstb-usb-pinmap.o
> +obj-$(CONFIG_USB_CH341_CORE) += ch341/
> diff --git a/drivers/usb/misc/ch341/Kconfig b/drivers/usb/misc/ch341/Kconfig
> new file mode 100644
> index ..e69de29bb2d1

Is that file (above) empty on purpose?

> diff --git a/drivers/usb/misc/ch341/Makefile b/drivers/usb/misc/ch341/Makefile
> new file mode 100644
> index ..7c6429e7a46e
> --- /dev/null
> +++ b/drivers/usb/misc/ch341/Makefile
> @@ -0,0 +1,3 @@
> +obj-$(CONFIG_USB_CH341_CORE) := ch341-buses.o
> +
> +ch341-buses-objs := ch341-core.o ch341-i2c.o ch341-gpio.o


-- 
~Randy



[PATCH 1/2] Revert "USB: serial: ch341: add new Product ID for CH341A"

2021-04-19 Thread Frank Zago
From: frank zago 

The 0x5512 USB PID is for the I2C/GPIO/SPI interfaces. UART is still
present but only the TX and RX pins are available; DTS, DTR, ... are
used for other things. Remove the PID, and let a I2C driver bind to
it.

Existing CH341 boards usually have physical jumpers to switch between
the 3 modes.

This reverts commit 46ee4abb10a07bd8f8ce910ee6b4ae6a947d7f63.

Signed-off-by: Frank Zago 
Signed-off-by: frank zago 
---
 drivers/usb/serial/ch341.c | 1 -
 1 file changed, 1 deletion(-)

diff --git a/drivers/usb/serial/ch341.c b/drivers/usb/serial/ch341.c
index 2db917eab799..235adc77ee0e 100644
--- a/drivers/usb/serial/ch341.c
+++ b/drivers/usb/serial/ch341.c
@@ -81,7 +81,6 @@
 #define CH341_QUIRK_SIMULATE_BREAK BIT(1)
 
 static const struct usb_device_id id_table[] = {
-   { USB_DEVICE(0x1a86, 0x5512) },
{ USB_DEVICE(0x1a86, 0x5523) },
{ USB_DEVICE(0x1a86, 0x7522) },
{ USB_DEVICE(0x1a86, 0x7523) },
-- 
2.27.0



[PATCH 2/2] USB: misc: Add driver for the WCH CH341 in I2C/GPIO mode

2021-04-19 Thread Frank Zago
From: frank zago 

The CH341 is a multifunction chip, presenting 3 different USB PID. One
of these functions is for I2C/SPI/GPIO. This new driver manages I2C
and GPIO. A future update will manage the SPI part as well.

The I2C interface can run at 4 different speeds. This driver currently
only offers 100 kHz. Tested with a variety of I2C sensors and the IIO
subsystem.

The GPIO interface offers 8 GPIOs. 6 are read/write, and 2 are
read-only. However, the SPI interface will use 6 of them, leaving 2
available GPIOs.
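
For illustration only, here is a minimal sketch of how a USB bridge like this
typically registers an i2c_adapter with the I2C core. Every name below
(ch341_dev, ch341_i2c_xfer(), ch341_i2c_register()) is an assumption made for
the example and not the actual contents of ch341-i2c.c; a real master_xfer
would translate the messages into USB bulk transfers.

#include <linux/i2c.h>
#include <linux/module.h>
#include <linux/usb.h>

struct ch341_dev {				/* hypothetical layout, sketch only */
	struct usb_interface *iface;
	struct i2c_adapter adapter;
};

static int ch341_i2c_xfer(struct i2c_adapter *adap, struct i2c_msg *msgs, int num)
{
	/* Placeholder: a real driver builds USB bulk transfers from msgs[]. */
	return num;				/* number of messages transferred */
}

static u32 ch341_i2c_func(struct i2c_adapter *adap)
{
	return I2C_FUNC_I2C | I2C_FUNC_SMBUS_EMUL;
}

static const struct i2c_algorithm ch341_i2c_algo = {
	.master_xfer	= ch341_i2c_xfer,
	.functionality	= ch341_i2c_func,
};

static int ch341_i2c_register(struct ch341_dev *dev)
{
	dev->adapter.owner = THIS_MODULE;
	dev->adapter.algo = &ch341_i2c_algo;
	dev->adapter.dev.parent = &dev->iface->dev;
	snprintf(dev->adapter.name, sizeof(dev->adapter.name), "ch341-i2c");

	return i2c_add_adapter(&dev->adapter);
}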

Signed-off-by: Frank Zago 
Signed-off-by: frank zago 
---
 MAINTAINERS |   6 +
 drivers/usb/misc/Kconfig|  18 ++
 drivers/usb/misc/Makefile   |   1 +
 drivers/usb/misc/ch341/Kconfig  |   0
 drivers/usb/misc/ch341/Makefile |   3 +
 drivers/usb/misc/ch341/ch341-core.c |  87 +
 drivers/usb/misc/ch341/ch341-gpio.c | 249 ++
 drivers/usb/misc/ch341/ch341-i2c.c  | 267 
 drivers/usb/misc/ch341/ch341.h  |  50 ++
 9 files changed, 681 insertions(+)
 create mode 100644 drivers/usb/misc/ch341/Kconfig
 create mode 100644 drivers/usb/misc/ch341/Makefile
 create mode 100644 drivers/usb/misc/ch341/ch341-core.c
 create mode 100644 drivers/usb/misc/ch341/ch341-gpio.c
 create mode 100644 drivers/usb/misc/ch341/ch341-i2c.c
 create mode 100644 drivers/usb/misc/ch341/ch341.h

diff --git a/MAINTAINERS b/MAINTAINERS
index 95e6766718b0..17e3d03cf9bf 100644
--- a/MAINTAINERS
+++ b/MAINTAINERS
@@ -19349,6 +19349,12 @@ M: David Härdeman 
 S: Maintained
 F: drivers/media/rc/winbond-cir.c
 
+WINCHIPHEAD CH341 I2C/GPIO/SPI DRIVER
+M: Frank Zago 
+L: linux-...@vger.kernel.org
+S: Maintained
+F: drivers/usb/misc/ch341/
+
 WINSYSTEMS EBC-C384 WATCHDOG DRIVER
 M: William Breathitt Gray 
 L: linux-watch...@vger.kernel.org
diff --git a/drivers/usb/misc/Kconfig b/drivers/usb/misc/Kconfig
index 8f1144359012..2d4db92f0de4 100644
--- a/drivers/usb/misc/Kconfig
+++ b/drivers/usb/misc/Kconfig
@@ -284,3 +284,21 @@ config BRCM_USB_PINMAP
  This option enables support for remapping some USB external
  signals, which are typically on dedicated pins on the chip,
  to any gpio.
+
+config USB_CH341_CORE
+   tristate "USB WinChipHead CH341 in I2C/SPI/GPIO mode"
+   depends on USB && GPIOLIB && I2C && SPI
+   help
+
+ If you say yes to this option, support for the CH341 chips,
+ running in I2C/SPI/GPIO mode will be included. Some versions
+ of the chip do not support all the functionalities but
+ there is no way to differentiate them. For instance the
+ CH341A and CH341B support everything while the CH341T can
+ only do I2C.
+
+ The serial mode is not supported by this driver. Use the
+ CH341 USB serial driver.
+
+ This driver can also be built as a module.  If so, the
+ module will be called ch341-buses.
diff --git a/drivers/usb/misc/Makefile b/drivers/usb/misc/Makefile
index 5f4e598573ab..95c0ca15b8c9 100644
--- a/drivers/usb/misc/Makefile
+++ b/drivers/usb/misc/Makefile
@@ -32,3 +32,4 @@ obj-$(CONFIG_USB_CHAOSKEY)+= chaoskey.o
 obj-$(CONFIG_USB_SISUSBVGA)+= sisusbvga/
 obj-$(CONFIG_USB_LINK_LAYER_TEST)  += lvstest.o
 obj-$(CONFIG_BRCM_USB_PINMAP)  += brcmstb-usb-pinmap.o
+obj-$(CONFIG_USB_CH341_CORE)   += ch341/
diff --git a/drivers/usb/misc/ch341/Kconfig b/drivers/usb/misc/ch341/Kconfig
new file mode 100644
index ..e69de29bb2d1
diff --git a/drivers/usb/misc/ch341/Makefile b/drivers/usb/misc/ch341/Makefile
new file mode 100644
index ..7c6429e7a46e
--- /dev/null
+++ b/drivers/usb/misc/ch341/Makefile
@@ -0,0 +1,3 @@
+obj-$(CONFIG_USB_CH341_CORE) := ch341-buses.o
+
+ch341-buses-objs := ch341-core.o ch341-i2c.o ch341-gpio.o
diff --git a/drivers/usb/misc/ch341/ch341-core.c 
b/drivers/usb/misc/ch341/ch341-core.c
new file mode 100644
index ..19c531715ea3
--- /dev/null
+++ b/drivers/usb/misc/ch341/ch341-core.c
@@ -0,0 +1,87 @@
+// SPDX-License-Identifier: GPL-2.0
+/*
+ * Driver for the CH341A, and CH341B USB to I2C/GPIO adapter
+ * Driver for the CH341T USB to I2C adapter
+ *
+ * Copyright 2021, Frank Zago
+ * Copyright (c) 2017 Gunar Schorcht (gu...@schorcht.net)
+ * Copyright (c) 2016 Tse Lun Bien
+ * Copyright (c) 2014 Marco Gittler
+ * Copyright (c) 2006-2007 Till Harbaum (t...@harbaum.org)
+ *
+ * The full UART functionality is handled by the CH341 serial driver
+ */
+
+#include 
+#include 
+
+#include "ch341.h"
+
+static void ch341_usb_free_device(struct ch341_device *dev)
+{
+   ch341_gpio_remove(dev);
+   ch341_i2c_remove(dev);
+
+   usb_set_intfdata(dev->iface, NULL);
+   usb_put_dev(dev->usb_dev);
+
+   kfree(dev);
+}
+
+static int ch341_usb_probe(struct usb_interface *iface,
+  const struct usb_device_id *usb_id)
+{
+   struct ch341_device *dev;
+   

Re: [PATCH net v4 1/2] net: sched: fix packet stuck problem for lockless qdisc

2021-04-19 Thread Yunsheng Lin
On 2021/4/20 7:55, Michal Kubecek wrote:
> On Mon, Apr 19, 2021 at 05:29:46PM +0200, Michal Kubecek wrote:
>>
>> As pointed out in the discussion on v3, this patch may result in
>> significantly higher CPU consumption with multiple threads competing on
>> a saturated outgoing device. I missed this submission so that I haven't
>> checked it yet but given the description of v3->v4 changes above, it's
>> quite likely that it suffers from the same problem.
> 
> And it indeed does. However, with the additional patch from the v3
> discussion, the numbers are approximately the same as with an unpatched
> mainline kernel.
> 
> As with v4, I tried this patch on top of 5.12-rc7 with real devices.
> I used two machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs
> (2 8-core CPUs with HT disabled) and 16 Rx/Tx queues, receiver has
> 48 CPUs (2 12-core CPUs with HT enabled) and 48 Rx/Tx queues.
> 
>   threads    5.12-rc7    5.12-rc7 + v4    5.12-rc7 + v4 + stop
>         1       25.1%            38.1%                   22.9%
>         8       66.2%           277.0%                   74.1%
>        16       90.1%           150.7%                   91.0%
>        32      107.2%           272.6%                  108.3%
>        64      116.3%           487.5%                  118.1%
>       128      126.1%           946.7%                  126.9%
> 
> (The values are normalized to one core, i.e. 100% corresponds to one
> fully used logical CPU.)
> 
> So it seems that repeated scheduling while the queue was stopped is
> indeed the main performance issue and that other cases of the logic
> being too pessimistic do not play a significant role. There is an
> exception with 8 connections/threads, and the result with just this
> series also looks abnormally high (e.g. much higher than with
> 16 threads). It might be worth investigating what happens there and
> what the results with other thread counts around 8 look like.

Will try to investigate the 8 connections/threads case.

> 
> I'll run some more tests with other traffic patterns tomorrow and
> I'm also going to take a closer look at the additional patch.

Thanks for taking the detail testing and looking.

> 
> Michal
> 
> .
> 



Re: [PATCH v3 02/12] iommu: Add iommu_split_block interface

2021-04-19 Thread Lu Baolu

Hi Keqian,

On 4/20/21 9:25 AM, Keqian Zhu wrote:

Hi Baolu,

On 2021/4/19 21:33, Lu Baolu wrote:

Hi Keqian,

On 2021/4/19 17:32, Keqian Zhu wrote:

+EXPORT_SYMBOL_GPL(iommu_split_block);

Do you really have any consumers of this interface other than the dirty
bit tracking? If not, I don't suggest to make this as a generic IOMMU
interface.

There is an implicit requirement for such interfaces. The
iommu_map/unmap(iova, size) shouldn't be called at the same time.
Currently there's no such sanity check in the iommu core. A poorly
written driver could mess up the kernel by misusing this interface.

Yes, I can't think of a scenario other than dirty tracking.

Indeed, we'd better not make them a generic interface.

Do you have any suggestion for how the underlying IOMMU drivers could share this
code without making it a generic IOMMU interface?

One not-so-good idea: make the "split" interfaces static functions and pass the
function pointers to start_dirty_log. But that looks weird and inflexible.


I understand splitting/merging super pages is an optimization, but not a
functional requirement. So is it possible to let the vendor iommu driver
decide whether to split super pages when starting dirty bit tracking,
and to do the opposite operation when stopping it? The requirement for

Right. If I understand you correctly, that is actually what this series does.


I mean no generic APIs; just do it inside the iommu subsystem itself.
It's totally transparent to the upper level, just like what map() does.
The upper layer doesn't care whether a super page or a small page is
in use when doing a mapping, right?

If you want to consolidate some code, how about putting them in
start/stop_tracking()?

Best regards,
baolu


We implemented split/merge in the IOMMU core layer, but don't force the vendor
driver to use it.

The problem is that when we expose these interfaces to the vendor IOMMU driver,
we will also expose them to the upper driver.


upper layer is that starting/stopping dirty bit tracking and
mapping/unmapping are mutually exclusive.

OK, I will explicitly add the hints. Thanks.

Thanks,
Keqian




On the other hand, if a driver calls map/unmap and split/merge at the same time,
that's a bug in the driver; it should follow the rule.



Best regards,
baolu
.



Re: [PATCH 0/5] Bring the BusLogic host bus adapter driver up to Y2021

2021-04-19 Thread Khalid Aziz
On 4/19/21 10:01 AM, Maciej W. Rozycki wrote:
> On Mon, 19 Apr 2021, Khalid Aziz wrote:
> 
>> On 4/18/21 2:21 PM, Ondrej Zary wrote:
>>>
>>> Found the 3000763 document here:
>>> https://doc.lagout.org/science/0_Computer Science/0_Computer 
>>> History/old-hardware/buslogic/3000763_PCI_EISA_Wide_SCSI_Tech_Ref_Dec94.pdf
>>>
>>> There's also 3002593 there:
>>> https://doc.lagout.org/science/0_Computer Science/0_Computer 
>>> History/old-hardware/buslogic/
>>>
>>
>> Thanks!!!
> 
>  Ondrej: Thanks a lot indeed!  These documents seem to have the essential 
> interface details all covered, except for Fast-20 SCSI adapters, which I 
> guess are a minor modification from the software's point of view.
> 
>  Khalid: I have skimmed over these documents and I infer 24-bit addressing 
> can be verified with any MultiMaster adapter, including ones that do have 
> 32-bit addressing implemented, by using the legacy Initialize Mailbox HBA 
> command.  That could be used to stop Christoph's recent changes for older 
> adapter support removal and replace them with proper fixes for whatever 
> has become broken.  Is that something you'd be willing as the driver's 
> maintainer to look into, or shall I?
> 

Hi Maciej,

Do you mean use OpCode 01 (INITIALIZE MAILBOX) to set a 24-bit address
for mailbox in place of OpCode 81? Verifying the change would be a
challenge. Do you have an old adapter to test it with? If you do, go
ahead and make the changes. I will be happy to review. I have only a
BT-757 adapter.

Thanks,
Khalid


Re: [PATCH] MIPS: Fix cmdline "mem=" parameter parsing

2021-04-19 Thread Youling Tang

Hi, Jiaxun

On 04/20/2021 09:05 AM, Jiaxun Yang wrote:


On 2021/4/19 18:50, Youling Tang wrote:

This problem may only occur on NUMA platforms. When a machine starts with the
"mem=" parameter on Loongson64, it cannot boot. When parsing the "mem="
parameter, all RAM is first removed and memory is then added back through
memblock_add(), which causes the newly added memory to be located on
MAX_NUMNODES.

The solution is to add the current "mem=" parameter range to the memory area
of the corresponding node, instead of adding all of it to the MAX_NUMNODES
node area. Get the node number corresponding to the "mem=" parameter range
through pa_to_nid(), and then add it to the corresponding node through
memblock_add_node().

Signed-off-by: Jinyang He 
Signed-off-by: Youling Tang 
---
  arch/mips/kernel/setup.c | 2 +-
  1 file changed, 1 insertion(+), 1 deletion(-)

diff --git a/arch/mips/kernel/setup.c b/arch/mips/kernel/setup.c
index 279be01..b86e241 100644
--- a/arch/mips/kernel/setup.c
+++ b/arch/mips/kernel/setup.c
@@ -359,7 +359,7 @@ static int __init early_parse_mem(char *p)
  if (*p == '@')
  start = memparse(p + 1, );
  -memblock_add(start, size);
+memblock_add_node(start, size, pa_to_nid(start));


pa_to_nid is not available for all platforms.


Thanks for your correction.

pa_to_nid() only has actual definitions in mach-ip27 and mach-loongson64
(i.e. only for NUMA platforms).

In arch/mips/include/asm/mmzone.h:
#ifndef pa_to_nid
#define pa_to_nid(addr) 0
#endif

So we only need to #include <asm/mmzone.h> to solve the "error: implicit
declaration of function 'pa_to_nid'" compilation error.
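
Putting the pieces together, here is a minimal sketch of the discussed change
(the extra #include plus the switch to memblock_add_node()); the rest of
early_parse_mem() in arch/mips/kernel/setup.c, including the usermem handling,
is assumed unchanged and abridged here:

#include <linux/memblock.h>
#include <asm/mmzone.h>		/* pa_to_nid(), falls back to 0 on non-NUMA */

static int __init early_parse_mem(char *p)
{
	phys_addr_t start, size;

	/* ... existing "mem=" handling (usermem flag, removal of the
	 * firmware-reported RAM on the first occurrence) kept as-is ... */
	start = 0;
	size = memparse(p, &p);
	if (*p == '@')
		start = memparse(p + 1, &p);

	/* Place the range on the node it belongs to, not MAX_NUMNODES. */
	memblock_add_node(start, size, pa_to_nid(start));

	return 0;
}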

Thanks,
Youling


Thanks.

- Jiaxun


return 0;
  }




Re: [v9,0/7] PCI: mediatek: Add new generation controller support

2021-04-19 Thread Jianjun Wang
On Mon, 2021-04-19 at 11:44 +0100, Lorenzo Pieralisi wrote:
> On Fri, Apr 16, 2021 at 02:21:00PM -0500, Bjorn Helgaas wrote:
> > On Wed, Mar 24, 2021 at 11:05:03AM +0800, Jianjun Wang wrote:
> > > These series patches add pcie-mediatek-gen3.c and dt-bindings file to
> > > support new generation PCIe controller.
> > 
> > Incidental: b4 doesn't work on this thread, I suspect because the
> > usual subject line format is:
> > 
> >   [PATCH v9 9/7]
> > 
> > instead of:
> > 
> >   [v9,0/7]
> > 
> > For b4 info, see 
> > https://git.kernel.org/pub/scm/utils/b4/b4.git/tree/README.rst
> 
> Jianjun will update the series accordingly (and please add to v10 the
> review tags you received).
> 
> Lorenzo

Yes, I will update this series in v10 to fix the subject line format and
use EXPORT_SYMBOL_GPL(), thanks for your comments.

Thanks.



Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

2021-04-19 Thread Jue Wang
On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:

> This patch suggests to do page table walk to find the error virtual
> address.  If we find multiple virtual addresses in walking, we now can't
> determine which one is correct, so we fall back to sending SIGBUS in
> kill_me_maybe() without error info as we do now.  This corner case needs
> to be solved in the future.

Instead of walking the page tables, I wonder about the following idea:

When failing to get the vaddr, memory_failure() just ensures the mapping is removed
and a hwpoisoned swap pte is put in place; or the original page is flagged with
PG_HWPOISONED and kept in the radix tree (e.g., for SHMEM THP).

NOTE: no SIGBUS is sent to user space.

Then do_machine_check just returns to user space to resume execution, the
re-execution will result in a #PF and should land to the exact page fault
handling code that generates a SIGBUS with the precise vaddr info:

https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3290
https://github.com/torvalds/linux/blob/7af08140979a6e7e12b78c93b8625c8d25b084e2/mm/memory.c#L3647

Thanks,
-Jue


Re: [PATCH 1/1] of/pci: Add IORESOURCE_MEM_64 to resource flags for 64-bit memory addresses

2021-04-19 Thread Leonardo Bras
On Mon, 2021-04-19 at 20:39 -0500, Rob Herring wrote:
> On Mon, Apr 19, 2021 at 7:35 PM Leonardo Bras  wrote:
> > 
> > On Mon, 2021-04-19 at 10:44 -0500, Rob Herring wrote:
> > > On Fri, Apr 16, 2021 at 3:58 PM Leonardo Bras  wrote:
> > > > 
> > > > Hello Rob, thanks for this feedback!
> > > > 
> > > > On Thu, 2021-04-15 at 13:59 -0500, Rob Herring wrote:
> > > > > +PPC and PCI lists
> > > > > 
> > > > > On Thu, Apr 15, 2021 at 1:01 PM Leonardo Bras  
> > > > > wrote:
> > > > > > 
> > > > > > Many other resource flag parsers already add this flag when the 
> > > > > > input
> > > > > > has bits 24 & 25 set, so update this one to do the same.
> > > > > 
> > > > > Many others? Looks like sparc and powerpc to me.
> > > > > 
> > > > 
> > > > s390 also does that, but it looks like it comes from a device-tree.
> > > 
> > > I'm only looking at DT based platforms, and s390 doesn't use DT.
> > 
> > Correct.
> > Sorry, I somehow wrote above the opposite of what I was thinking.
> > 
> > > 
> > > > > Those would be the
> > > > > ones I worry about breaking. Sparc doesn't use of/address.c so it's
> > > > > fine. Powerpc version of the flags code was only fixed in 2019, so I
> > > > > don't think powerpc will care either.
> > > > 
> > > > In powerpc I reach this function with this stack, while configuring a
> > > > virtio-net device for a qemu/KVM pseries guest:
> > > > 
> > > > pci_process_bridge_OF_ranges+0xac/0x2d4
> > > > pSeries_discover_phbs+0xc4/0x158
> > > > discover_phbs+0x40/0x60
> > > > do_one_initcall+0x60/0x2d0
> > > > kernel_init_freeable+0x308/0x3a8
> > > > kernel_init+0x2c/0x168
> > > > ret_from_kernel_thread+0x5c/0x70
> > > > 
> > > > For this, both MMIO32 and MMIO64 resources will have flags 0x200.
> > > 
> > > Oh good, powerpc has 2 possible flags parsing functions. So in the
> > > above path, do we need to set PCI_BASE_ADDRESS_MEM_TYPE_64?
> > > 
> > > Does pci_parse_of_flags() get called in your case?
> > > 
> > 
> > It's called in some cases, but not for the device I am debugging
> > (virtio-net pci@8002000).
> > 
> > For the above device, here is an expanded stack trace:
> > 
> > of_bus_pci_get_flags() (from parser->bus->get_flags())
> > of_pci_range_parser_one() (from macro for_each_of_pci_range)
> > pci_process_bridge_OF_ranges+0xac/0x2d4
> > pSeries_discover_phbs+0xc4/0x158
> > discover_phbs+0x40/0x60
> > do_one_initcall+0x60/0x2d0
> > kernel_init_freeable+0x308/0x3a8
> > kernel_init+0x2c/0x168
> > ret_from_kernel_thread+0x5c/0x70
> > 
> > For other devices, I could also see the following stack trace:
> > ## device ethernet@8
> > 
> > pci_parse_of_flags()
> > of_create_pci_dev+0x7f0/0xa40
> > __of_scan_bus+0x248/0x320
> > pcibios_scan_phb+0x370/0x3b0
> > pcibios_init+0x8c/0x12c
> > do_one_initcall+0x60/0x2d0
> > kernel_init_freeable+0x308/0x3a8
> > kernel_init+0x2c/0x168
> > ret_from_kernel_thread+0x5c/0x70
> > 
> > Devices that get parsed with of_bus_pci_get_flags() appears first at
> > dmesg (around 0.015s in my test), while devices that get parsed by
> > pci_parse_of_flags() appears later (0.025s in my test).
> > 
> > I am not really used to this code, but having the term "discover phbs"
> > in the first trace and the term "scan phb" in the second, makes me
> > wonder if the first trace is seen on devices that are seen/described in
> > the device-tree and the second trace is seen in devices not present in
> > the device-tree and found scanning pci bus.
> 
> That was my guess as well. I think on pSeries that most PCI devices
> are in the DT whereas on Arm and other flattened DT (non OpenFirmware)
> platforms PCI devices are not in DT.
> 

It makes sense to me. 

>  Of course, for virtio devices,
> they would not be in DT in either case.

I don't get this part... in pseries it looks like virtio devices can be
in device-tree.

Oh, I think I get it... this pci@8002000 looks like a bus
(described in device-tree, so discovered), and then the devices are
inside it, getting scanned.

The virtio device gets the correct flags (from pci_parse_of_flags), but
the bus (pci@8002000) does not seem to get it correctly,
because it comes from of_bus_pci_get_flags() which makes sense
according to the name of the function.

(see lspci below, output without patch)


00:08.0 Ethernet controller: Red Hat, Inc. Virtio network device (rev
01)
Subsystem: Red Hat, Inc. Device 1100
Device tree node:
/sys/firmware/devicetree/base/pci@8002000/ethernet@8
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop-
ParErr- Stepping- SERR+ FastB2B- DisINTx+
Status: Cap+ 66MHz- UDF- FastB2B- ParErr- DEVSEL=fast >TAbort-
SERR- 
BAR=0 offset= size=
Capabilities: [70] Vendor Specific Information: VirtIO: Notify
BAR=4 offset=3000 size=1000 multiplier=0004
Capabilities: [60] Vendor Specific Information: VirtIO:
DeviceCfg
BAR=4 offset=2000 size=1000
Capabilities: 
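
For reference, the change being discussed amounts to flagging the 0b11 (64-bit
memory) space code in the of/address.c parser. A sketch of the idea, not the
exact posted patch:

static unsigned int of_bus_pci_get_flags(const __be32 *addr)
{
	unsigned int flags = 0;
	u32 w = be32_to_cpup(addr);

	switch ((w >> 24) & 0x03) {
	case 0x01:
		flags |= IORESOURCE_IO;
		break;
	case 0x02:		/* 32-bit memory space */
		flags |= IORESOURCE_MEM;
		break;
	case 0x03:		/* 64-bit memory space */
		flags |= IORESOURCE_MEM | IORESOURCE_MEM_64;
		break;
	}
	if (w & 0x40000000)
		flags |= IORESOURCE_PREFETCH;

	return flags;
}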

Re: [PATCH v2] tools/power turbostat: Fix RAPL summary collection on AMD processors

2021-04-19 Thread Chen Yu
On Mon, Apr 19, 2021 at 02:58:12PM -0500, Terry Bowman wrote:
> Turbostat fails to correctly collect and display RAPL summary information
> on Family 17h and 19h AMD processors. Running turbostat on these processors
> returns immediately. If turbostat is working correctly then RAPL summary
> data is displayed until the user provided command completes. If a command
> is not provided by the user then turbostat is designed to continuously
> display RAPL information until interrupted.
> 
> The issue is due to offset_to_idx() and idx_to_offset() missing support for
> AMD MSR addresses/offsets. offset_to_idx()'s switch statement is missing
> cases for AMD MSRs and idx_to_offset() does not include a path to return
> AMD MSR(s) for any idx.
> 
> The solution is add AMD MSR support to offset_to_idx() and idx_to_offset().
> These functions are split-out and renamed along architecture vendor lines
> for supporting both AMD and Intel MSRs.
> 
> Fixes: 9972d5d84d76 ("tools/power turbostat: Enable accumulate RAPL display")
> Signed-off-by: Terry Bowman 
Thanks for the fix, Terry. There was previously a patch for this from Bas
Nieuwenhuizen:
https://lkml.org/lkml/2021/3/12/682
and it is expected to have been merged into Len's branch already.

thanks,
Chenyu


Re: [syzbot] INFO: task hung in __io_uring_cancel

2021-04-19 Thread syzbot
Hello,

syzbot has tested the proposed patch but the reproducer is still triggering an 
issue:
KASAN: null-ptr-deref Write in io_uring_cancel_sqpoll

==
BUG: KASAN: null-ptr-deref in instrument_atomic_read_write 
include/linux/instrumented.h:101 [inline]
BUG: KASAN: null-ptr-deref in atomic_inc 
include/asm-generic/atomic-instrumented.h:240 [inline]
BUG: KASAN: null-ptr-deref in io_uring_cancel_sqpoll+0x150/0x310 
fs/io_uring.c:8930
Write of size 4 at addr 0114 by task iou-sqp-31588/31596

CPU: 0 PID: 31596 Comm: iou-sqp-31588 Not tainted 5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 __kasan_report mm/kasan/report.c:403 [inline]
 kasan_report.cold+0x5f/0xd8 mm/kasan/report.c:416
 check_region_inline mm/kasan/generic.c:180 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
 instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
 atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
 io_uring_cancel_sqpoll+0x150/0x310 fs/io_uring.c:8930
 io_sq_thread+0x47e/0x1310 fs/io_uring.c:6873
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
==
Kernel panic - not syncing: panic_on_warn set ...
CPU: 1 PID: 31596 Comm: iou-sqp-31588 Tainted: GB 
5.12.0-rc7-syzkaller #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 
01/01/2011
Call Trace:
 __dump_stack lib/dump_stack.c:79 [inline]
 dump_stack+0x141/0x1d7 lib/dump_stack.c:120
 panic+0x306/0x73d kernel/panic.c:231
 end_report mm/kasan/report.c:102 [inline]
 end_report.cold+0x5a/0x5a mm/kasan/report.c:88
 __kasan_report mm/kasan/report.c:406 [inline]
 kasan_report.cold+0x6a/0xd8 mm/kasan/report.c:416
 check_region_inline mm/kasan/generic.c:180 [inline]
 kasan_check_range+0x13d/0x180 mm/kasan/generic.c:186
 instrument_atomic_read_write include/linux/instrumented.h:101 [inline]
 atomic_inc include/asm-generic/atomic-instrumented.h:240 [inline]
 io_uring_cancel_sqpoll+0x150/0x310 fs/io_uring.c:8930
 io_sq_thread+0x47e/0x1310 fs/io_uring.c:6873
 ret_from_fork+0x1f/0x30 arch/x86/entry/entry_64.S:294
Kernel Offset: disabled
Rebooting in 86400 seconds..


Tested on:

commit: 734551df io_uring: fix shared sqpoll cancellation hangs
git tree:   git://git.kernel.dk/linux-block for-5.13/io_uring
console output: https://syzkaller.appspot.com/x/log.txt?x=175fec6dd0
kernel config:  https://syzkaller.appspot.com/x/.config?x=601d16d8cd22e315
dashboard link: https://syzkaller.appspot.com/bug?extid=47fc00967b06a3019bd2
compiler:   



Re:Re: [PATCH] libnvdimm.h: Remove duplicate struct declaration

2021-04-19 Thread Jiabing Wan
 
>On Mon, Apr 19, 2021 at 07:27:25PM +0800, Wan Jiabing wrote:
>> struct device is declared at 133rd line.
>> The declaration here is unnecessary. Remove it.
>> 
>> Signed-off-by: Wan Jiabing 
>> ---
>>  include/linux/libnvdimm.h | 1 -
>>  1 file changed, 1 deletion(-)
>> 
>> diff --git a/include/linux/libnvdimm.h b/include/linux/libnvdimm.h
>> index 01f251b6e36c..89b69e645ac7 100644
>> --- a/include/linux/libnvdimm.h
>> +++ b/include/linux/libnvdimm.h
>> @@ -141,7 +141,6 @@ static inline void __iomem *devm_nvdimm_ioremap(struct 
>> device *dev,
>>  
>>  struct nvdimm_bus;
>>  struct module;
>> -struct device;
>>  struct nd_blk_region;
>
>What is the coding style preference for pre-declarations like this?  Should
>they be placed at the top of the file?
>
>The patch is reasonable but if the intent is to declare right before use for
>clarity, both devm_nvdimm_memremap() and nd_blk_region_desc() use struct
>device.  So perhaps this duplicate is on purpose?
>
>Ira

OK, my script just caught this duplicate.
It reports a duplicate whenever there is no macro dependence.
But I hadn't considered whether the duplicate might be there on purpose as a hint.
Sorry.

Thanks for your reply.
Wan Jiabing

>>  struct nd_blk_region_desc {
>>  int (*enable)(struct nvdimm_bus *nvdimm_bus, struct device *dev);
>> -- 
>> 2.25.1
>> 




Re: [PATCH] f2fs: fix to cover allocate_segment() with lock

2021-04-19 Thread Chao Yu

On 2021/4/20 0:57, Jaegeuk Kim wrote:

On 04/14, Chao Yu wrote:

As we did for other cases, in fix_curseg_write_pointer(), let's
change as below:
- use callback function s_ops->allocate_segment() instead of
raw function allocate_segment_by_default();
- cover allocate_segment() with curseg_lock and sentry_lock.

Signed-off-by: Chao Yu 
---
  fs/f2fs/segment.c | 7 ++-
  1 file changed, 6 insertions(+), 1 deletion(-)

diff --git a/fs/f2fs/segment.c b/fs/f2fs/segment.c
index b2ee6b7791b0..daf9531ec58f 100644
--- a/fs/f2fs/segment.c
+++ b/fs/f2fs/segment.c
@@ -4848,7 +4848,12 @@ static int fix_curseg_write_pointer(struct f2fs_sb_info 
*sbi, int type)
  
  	f2fs_notice(sbi, "Assign new section to curseg[%d]: "

"curseg[0x%x,0x%x]", type, cs->segno, cs->next_blkoff);
-   allocate_segment_by_default(sbi, type, true);
+
+   down_read(_I(sbi)->curseg_lock);
+   down_write(_I(sbi)->sentry_lock);
+   SIT_I(sbi)->s_ops->allocate_segment(sbi, type, true);
+   up_write(_I(sbi)->sentry_lock);
+   up_read(_I(sbi)->curseg_lock);


Seems f2fs_allocate_new_section()?


f2fs_allocate_new_section() will allocate a new section only when the current
section has been initialized and has a valid block/ckpt_block.

It looks like fix_curseg_write_pointer() wants to force migrating the current
segment to a new section whenever the write pointer and curseg->next_blkoff are
inconsistent.

So how about adding a parameter to force f2fs_allocate_new_section() to
allocate a new section?
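
A rough sketch of that suggestion; the 'force' parameter and the call into
__allocate_new_segment() are illustrative only, the real f2fs internals may
differ:

/* Hypothetical: let callers demand a fresh section while keeping the
 * curseg_lock/sentry_lock handling in one place. */
void f2fs_allocate_new_section(struct f2fs_sb_info *sbi, int type, bool force)
{
	down_read(&SM_I(sbi)->curseg_lock);
	down_write(&SIT_I(sbi)->sentry_lock);
	__allocate_new_segment(sbi, type, true, force);
	up_write(&SIT_I(sbi)->sentry_lock);
	up_read(&SM_I(sbi)->curseg_lock);
}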

Thanks,



  
  	/* check consistency of the zone curseg pointed to */

if (check_zone_write_pointer(sbi, zbd, ))
--
2.29.2

.



Re: [PATCH v1 3/3] mm,hwpoison: add kill_accessing_process() to find error virtual address

2021-04-19 Thread Jue Wang
On Tue, 13 Apr 2021 07:43:20 +0900, Naoya Horiguchi wrote:
...
> + * This function is intended to handle "Action Required" MCEs on already
> + * hardware poisoned pages. They could happen, for example, when
> + * memory_failure() failed to unmap the error page at the first call, or
> + * when multiple Action Optional MCE events races on different CPUs with
> + * Local MCE enabled.

+Tony Luck

Hey Tony, I thought SRAO MCEs are broadcast to all cores in the system,
as they come without an execution context; is that correct?

If Yes, Naoya, I think we might want to remove the comments about the
"multiple Action Optional MCE racing" part.

Best,
-Jue


Re: [PATCH net v3] net: sched: fix packet stuck problem for lockless qdisc

2021-04-19 Thread Yunsheng Lin
On 2021/4/19 22:57, Michal Kubecek wrote:
> On Mon, Apr 19, 2021 at 10:04:27AM +0800, Yunsheng Lin wrote:
>>>
>>> I tried this patch o top of 5.12-rc7 with real devices. I used two
>>> machines with 10Gb/s Intel ixgbe NICs, sender has 16 CPUs (2 8-core CPUs
>>> with HT disabled) and 16 Rx/Tx queues, receiver has 48 CPUs (2 12-core
>>> CPUs with HT enabled) and 48 Rx/Tx queues. With multiple TCP streams on
>>> a saturated ethernet, the CPU consumption grows quite a lot:
>>>
>>> threads unpatched 5.12-rc75.12-rc7 + v3   
>>>   1   25.6%   30.6%
>>>   8   73.1%  241.4%
>>> 128  132.2% 1012.0%
>>>
>>> (The values are normalized to one core, i.e. 100% corresponds to one
>>> fully used logical CPU.) I didn't perform a full statistical evaluation
>>> but the growth is way beyond any statistical fluctuation with one
>>> exception: 8-thread test of patched kernel showed values from 155.5% to
>>> 311.4%. Closer look shows that most of the CPU time was spent in softirq
>>> and running top in parallel with the test confirms that there are
>>> multiple ksofirqd threads running at 100% CPU. I had similar problem
>>> with earlier versions of my patch (work in progress, I still need to
>>> check some corner cases and write commit message explaining the logic)
>>
>> Great, if there is a better idea, maybe share the core idea first so
>> that we both can work on the that?
> 
> I'm not sure if it's really better but to minimize the false positives
> and unnecessary calls to __netif_schedule(), I replaced q->seqlock with
> an atomic combination of a "running" flag (which corresponds to current
> seqlock being locked) and a "drainers" count (number of other threads
> going to clean up the qdisc queue). This way we could keep track of them
> and get reliable information if another thread is going to run a cleanup
> after we leave the qdisc_run() critical section (so that there is no
> need to schedule).

It seems you are trying to match the skb enqueuing with the calling of
__qdisc_run() here, which is not reliable when considering the dequeue
batching, see try_bulk_dequeue_skb() or try_bulk_dequeue_skb_slow() in
dequeue_skb().

> 
>>> The biggest problem IMHO is that the loop in __qdisc_run() may finish
>>> without rescheduling not only when the qdisc queue is empty but also
>>> when the corresponding device Tx queue is stopped which devices tend to
>>> do whenever they cannot send any more packets out. Thus whenever
>>> __QDISC_STATE_NEED_RESCHEDULE is set with device queue stopped or
>>> frozen, we keep rescheduling the queue cleanup without any chance to
>>> progress or clear __QDISC_STATE_NEED_RESCHEDULE. For this to happen, all
>>> we need is another thready to fail the first spin_trylock() while device
>>> queue is stopped and qdisc queue not empty.
>>
>> Yes, we could just return false before doing the second spin_trylock() if
>> the netdev queue corresponding to the qdisc is stopped, and when the netdev
>> queue is restarted, __netif_schedule() is called again, see netif_tx_wake_queue().
>>
>> Maybe add a sch_qdisc_stopped() function and do the testing in
>> qdisc_run_begin():
>>
>> if (dont_retry || sch_qdisc_stopped())
>>  return false;
>>
>> bool sch_qdisc_stopped(struct Qdisc *q)
>> {
>>  const struct netdev_queue *txq = q->dev_queue;
>>
>>  if (netif_xmit_frozen_or_stopped(txq))
>>      return true;
>>
>>  return false;
>> }
>>
>> At least for a qdisc with the TCQ_F_ONETXQUEUE flag set, is this doable?
> 
> Either this or you can do the check in qdisc_run_end() - when the device
> queue is stopped or frozen, there is no need to schedule as we know it's
> going to be done when the flag is cleared again (and we cannot do
> anything until then anyway).
> 
>>> Another part of the problem may be that to avoid the race, the logic is
>>> too pessimistic: consider e.g. (dotted lines show "barriers" where
>>> ordering is important):
>>>
>>> CPU A                               CPU B
>>> spin_trylock() succeeds
>>>                                     pfifo_fast_enqueue()
>>> ...................................................................
>>> skb_array empty, exit loop
>>>                                     first spin_trylock() fails
>>>                                     set __QDISC_STATE_NEED_RESCHEDULE
>>>                                     second spin_trylock() fails
>>> ...................................................................
>>> spin_unlock()
>>> call __netif_schedule()
>>>
>>> When we switch the order of spin_lock() on CPU A and second
>>> spin_trylock() on CPU B (but keep setting __QDISC_STATE_NEED_RESCHEDULE
>>> before CPU A tests it), we end up scheduling a queue cleanup even if
>>> there is already one running. And either of these is quite realistic.
>>
>> But I am not sure it is a good thing or bad thing when the above happen,
>> because it may be able to enable the 

[RFC] memory reserve for userspace oom-killer

2021-04-19 Thread Shakeel Butt
Proposal: Provide memory guarantees to userspace oom-killer.

Background:

Issues with kernel oom-killer:
1. Very conservative and prefer to reclaim. Applications can suffer
for a long time.
2. Borrows the context of the allocator which can be resource limited
(low sched priority or limited CPU quota).
3. Serialized by global lock.
4. Very simplistic oom victim selection policy.

These issues are resolved through userspace oom-killer by:
1. Ability to monitor arbitrary metrics (PSI, vmstat, memcg stats) to
early detect suffering.
2. Independent process context which can be given dedicated CPU quota
and high scheduling priority.
3. Can be more aggressive as required.
4. Can implement sophisticated business logic/policies.

Android's LMKD and Facebook's oomd are the prime examples of userspace
oom-killers. One of the biggest challenges for userspace oom-killers
is that they may have to function under intense memory pressure and are
prone to getting stuck in memory reclaim themselves. Current userspace
oom-killers aim to avoid this situation by preallocating user memory
and protecting themselves from global reclaim by either mlocking or
memory.min. However, a new allocation from the userspace oom-killer can
still get stuck in reclaim, and a policy-rich oom-killer does trigger
new allocations through syscalls or even its heap.

Our attempt of userspace oom-killer faces similar challenges.
Particularly at the tail on the very highly utilized machines we have
observed userspace oom-killer spectacularly failing in many possible
ways in the direct reclaim. We have seen oom-killer stuck in direct
reclaim throttling, stuck in reclaim and allocations from interrupts
keep stealing reclaimed memory. We have even observed systems where
all the processes were stuck in throttle_direct_reclaim() and only
kswapd was running and the interrupts kept stealing the memory
reclaimed by kswapd.

To reliably solve this problem, we need to give guaranteed memory to
the userspace oom-killer. At the moment we are contemplating between
the following options and I would like to get some feedback.

1. prctl(PF_MEMALLOC)

The idea is to give userspace oom-killer (just one thread which is
finding the appropriate victims and will be sending SIGKILLs) access
to MEMALLOC reserves. Most of the time the preallocation, mlock and
memory.min will be good enough but for rare occasions, when the
userspace oom-killer needs to allocate, the PF_MEMALLOC flag will
protect it from reclaim and let the allocation dip into the memory
reserves.

The misuse of this feature would be risky but it can be limited to
privileged applications. Userspace oom-killer is the only appropriate
user of this feature. This option is simple to implement.
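
A minimal sketch of what option 1 could look like; PR_SET_MEMALLOC and the
CAP_SYS_ADMIN policy are assumptions for the example, not an existing API:

/* Hypothetical helper, called from the prctl() switch in kernel/sys.c. */
static int prctl_set_memalloc(unsigned long enable)
{
	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	if (enable)
		current->flags |= PF_MEMALLOC;	/* dip into reserves, skip reclaim */
	else
		current->flags &= ~PF_MEMALLOC;

	return 0;
}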

2. Mempool

The idea is to preallocate mempool with a given amount of memory for
userspace oom-killer. Preferably this will be per-thread and
oom-killer can preallocate mempool for its specific threads. The core
page allocator can check before going to the reclaim path if the task
has private access to the mempool and return page from it if yes.

This option would be more complicated than the previous option as the
lifecycle of the page from the mempool would be more sophisticated.
Additionally the current mempool does not handle higher order pages
and we might need to extend it to allow such allocations. Though this
feature might have more use-cases and it would be less risky than the
previous option.
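
For reference, the existing mempool API already covers order-0 page pools;
the new work in option 2 would be making such a pool per-thread, letting the
page allocator consult it before entering reclaim, and handling higher-order
pages. A sketch using only current interfaces:

#include <linux/mempool.h>
#include <linux/gfp.h>

static mempool_t *oom_reserve;

/* Preallocate 64 pages the oom-killer thread could fall back to. */
static int oom_reserve_init(void)
{
	oom_reserve = mempool_create_page_pool(64, 0);	/* order-0 pages */
	return oom_reserve ? 0 : -ENOMEM;
}

static struct page *oom_reserve_alloc(void)
{
	/* GFP_NOWAIT: never enter direct reclaim from the oom-killer itself. */
	return mempool_alloc(oom_reserve, GFP_NOWAIT);
}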

Another idea I had was to use a kthread-based oom-killer and provide the
policies through an eBPF program, though I am not sure how to make it
monitor arbitrary metrics and whether that can be done without any
allocations.

Please do provide feedback on these approaches.

thanks,
Shakeel


[PATCH v2] Drivers: hv: vmbus: Initialize unload_event statically

2021-04-19 Thread Andrea Parri (Microsoft)
If a malicious or compromised Hyper-V sends a spurious message of type
CHANNELMSG_UNLOAD_RESPONSE, the function vmbus_unload_response() will
call complete() on an uninitialized event, and cause an oops.

Reported-by: Michael Kelley 
Signed-off-by: Andrea Parri (Microsoft) 
---
Changes since v1[1]:
  - add inline comment in vmbus_unload_response()

[1] https://lkml.kernel.org/r/20210416143932.16512-1-parri.and...@gmail.com

 drivers/hv/channel_mgmt.c | 7 ++-
 drivers/hv/connection.c   | 2 ++
 2 files changed, 8 insertions(+), 1 deletion(-)

diff --git a/drivers/hv/channel_mgmt.c b/drivers/hv/channel_mgmt.c
index 4c9e45d1f462c..335a10ee03a5e 100644
--- a/drivers/hv/channel_mgmt.c
+++ b/drivers/hv/channel_mgmt.c
@@ -826,6 +826,11 @@ static void vmbus_unload_response(struct 
vmbus_channel_message_header *hdr)
/*
 * This is a global event; just wakeup the waiting thread.
 * Once we successfully unload, we can cleanup the monitor state.
+*
+* NB.  A malicious or compromised Hyper-V could send a spurious
+* message of type CHANNELMSG_UNLOAD_RESPONSE, and trigger a call
+* of the complete() below.  Make sure that unload_event has been
+* initialized by the time this complete() is executed.
 */
complete(_connection.unload_event);
 }
@@ -841,7 +846,7 @@ void vmbus_initiate_unload(bool crash)
if (vmbus_proto_version < VERSION_WIN8_1)
return;
 
-   init_completion(_connection.unload_event);
+   reinit_completion(_connection.unload_event);
memset(, 0, sizeof(struct vmbus_channel_message_header));
hdr.msgtype = CHANNELMSG_UNLOAD;
vmbus_post_msg(, sizeof(struct vmbus_channel_message_header),
diff --git a/drivers/hv/connection.c b/drivers/hv/connection.c
index dc19d5ae4373c..311cd005b3be6 100644
--- a/drivers/hv/connection.c
+++ b/drivers/hv/connection.c
@@ -26,6 +26,8 @@
 
 struct vmbus_connection vmbus_connection = {
.conn_state = DISCONNECTED,
+   .unload_event   = COMPLETION_INITIALIZER(
+ vmbus_connection.unload_event),
.next_gpadl_handle  = ATOMIC_INIT(0xE1E10),
 
.ready_for_suspend_event = COMPLETION_INITIALIZER(
-- 
2.25.1


